News in Proteomics Research: Learning R for proteomics

Friday, November 24, 2017

Learning R for proteomics

We keep seeing more and more awesome R tools for proteomics. I'm sure there is some nerd in your department who is doing something with R -- especially if you have transcriptomics or metabolomics people around. It is starting to seem rare around here for me to run into a grad student or postdoc who isn't proficient in it.

In case you aren't aware, however:
R is a programming language that has a few serious limitations:
1) Every object R is working with must be stored in the RAM on your computer. If you've got 1e6 MS/MS spectra that you don't filter down to something very small immediately and you've only got 4GB of free RAM on your PC -- you're going to have a bad time.

2) R is super ugly. Apparently nothing can be done about the Times New Roman 1.5-spaced text and grainy plots.

3) Anyone can write a package in R and upload it where it can be accessed and used.
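
To put a rough number on limitation 1, here is a back-of-the-envelope sketch in R. The 200 peaks per spectrum is purely my assumption for illustration (real spectra vary a lot), and it ignores all the overhead R adds on top of the raw numbers, so treat the result as a lower bound rather than a measurement.

```r
# Rough estimate of the RAM needed to hold 1e6 MS/MS spectra as numbers.
n_spectra          <- 1e6
peaks_per_spectrum <- 200  # assumption for illustration only
values_per_peak    <- 2    # one m/z value and one intensity value
bytes_per_double   <- 8    # R stores numeric values as 8-byte doubles

est_bytes <- n_spectra * peaks_per_spectrum * values_per_peak * bytes_per_double
est_gib   <- est_bytes / 1024^3
est_gib  # roughly 3 GiB, before any R overhead

# You can also ask R how big an object you already have in memory is:
sample_vec <- numeric(1e6)  # one million doubles
format(object.size(sample_vec), units = "MB")
```

Under those assumptions the raw numbers alone eat most of a 4GB budget, which is why filtering early matters.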

I'm honestly just being a jerk about number 2. Since I mostly saw younger people using R, I assumed at first that the output I saw was some sort of retro hipster MacBook protest thing. "Only the data matter, who cares about the resolution of the pixels on the plot?" he said as he chained his single speed bike to the rack. (You can actually make the output really pretty if you make it a priority to do so. I obviously can't do it, but I've seen experts who can.)

Number 3 can be a huge advantage. R is true open source software. You can get a new R package loaded into your text editor, change it however you want (cite it if you publish with it of course!) and it's yours to run. This is only a disadvantage if you download some R package and run it when you don't understand how it works. If it sucks, your output also sucks.

I've been really excited about all the Shiny Apps recently as well. Shiny is a way for someone to take an R package and make a GUI (user interface) so the end user doesn't have to load R and know the right formats to use the programs.

MSStatsQC
ProteoSign and
GiaPronto

all appear to be Shiny apps (I'm not 100% sure, but they look like them) that I've used and really like (especially GiaPronto -- which I'm using the heck out of!)

There aren't yet Shiny Apps for everything, and this leaves you with 2 very good reasons to spend some time using R:

1) There are some amazing tools out there for proteomics already (too many to name, but the package R for proteomics is a great place to start!)
2) R provides you with what can only be described as super powers for digging through data. Things you've always wanted to do but seemed impossible are things you can get R to do just by writing a single word in the right place in a bracket. Got a thousand text files (.mgfs, maybe?) and you want to pull one data point out of each one of them and get the mean and standard deviation across all of them? One sentence. For real, I had to do something similar for a homework assignment recently.
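
Here is a minimal sketch of what that "thousand files, one number each" trick in point 2 can look like. Everything specific in it is my assumption for illustration: the demo files are fake three-file stand-ins written to a temp folder, the data point I pull is the first PEPMASS line in each file, and `first_pepmass` is a hypothetical helper name, not a function from any package.

```r
# Demo setup (hypothetical data): write three tiny MGF-like files to a temp folder.
demo_dir <- file.path(tempdir(), "mgf_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_masses <- c(500.25, 600.50, 700.75)
for (i in seq_along(demo_masses)) {
  writeLines(
    c("BEGIN IONS", paste0("PEPMASS=", demo_masses[i]), "CHARGE=2+", "END IONS"),
    file.path(demo_dir, sprintf("run%02d.mgf", i))
  )
}

# Hypothetical helper: return the first PEPMASS value found in one file.
first_pepmass <- function(path) {
  lines <- readLines(path)
  hit   <- grep("^PEPMASS=", lines, value = TRUE)[1]
  # PEPMASS lines may carry an intensity after the mass, so keep the first field only
  as.numeric(strsplit(sub("^PEPMASS=", "", hit), "[ \t]+")[[1]][1])
}

# The "one sentence" part: apply the helper to every .mgf file in the folder.
files   <- list.files(demo_dir, pattern = "\\.mgf$", full.names = TRUE)
pepmass <- vapply(files, first_pepmass, numeric(1))

mean(pepmass)
sd(pepmass)
```

Point `demo_dir` at a real folder and the last four lines work the same way whether it holds three files or a thousand.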

Which brings us around to -- how do you learn this oddity? I signed up for this program a couple years ago:

I was feeling really confident about it through the first class, then I failed the one called "R programming" and decided to spend my time doing something else. Unfortunately, I've got something on my desktop I'd like to finish and R looks like the easiest way, so I buckled down and -- at the half-way point -- I still haven't failed out of it. The program is online and stupid hard (at least for me), and I think you can pay by the class or by the month now. I paid a lump sum up front for a number of classes I could complete (maybe 15 or something? I forget, but I think that was a really good deal, maybe $30 a class?) anytime in a 3-year period. You can audit the class for free, but you can't take the quizzes/homework assignments.

There are loads of other ways to learn R -- there are great videos on YouTube, books with condescending titles like the one at the top of this blog post, and -- my favorite -- SWIRL.

SWIRL is a game you play in R that teaches you how to use R by playing the game. It also provides you with excessive positive feedback when you get things right. Get an answer right (finally...ugh...) and you'll be rewarded by a statement like "you are truly exceptional!" and all of a sudden you don't feel quite as dumb as you did before.
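
If you want to try it, this is a minimal sketch of getting SWIRL going from the R console. It assumes you have an internet connection so the swirl package can be pulled from CRAN, and I've wrapped it so it only does anything when you are sitting at an interactive R session.

```r
# Install (if needed) and start swirl; does nothing outside an interactive session.
if (interactive()) {
  if (!requireNamespace("swirl", quietly = TRUE)) {
    install.packages("swirl")  # one-time download from CRAN
  }
  library(swirl)
  swirl()  # starts the lessons and prompts you from there
}
```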

This is a lot of words for this point -- if you see an awesome new paper with tools in R -- don't dismiss it out of hand. If you're smart enough to run a mass spectrometer, I guarantee you can totally download R and use that tool! Just don't get down on yourself if a 500-page book suggests you'll do it by tomorrow.

Relevant (from AbstruseGoose):
