Friday, November 17, 2017

What is the best digestion method for ultralow levels of protein?

I don't have to tell anyone doing proteomics that the demands on us are getting harder as time goes by. We're seeing a lot less "how many proteins can you find in this unlimited amount of cell culture I have here" and more -- "what PTM changes the most in the membrane prep of these 4,000 cells my grad student spent 11 years isolating from the earlobes of these 2 gnat species."

This new study takes aim at the latter problem -- how do different sample prep techniques compare when we've got incredibly low amounts of sample to work with?

The comparison pits my favorite prep method, FASP, against single-pot solid-phase-enhanced sample preparation (SP3) and in-StageTip digestion (iST).

Full disclaimer here -- I've never personally tried the SP3 or iST -- I'm not even sure if I've heard of the latter.

SP3 was originally described here, in case you're interested (they validated it by digesting a single drosophila embryo. I'm no expert on the subject, but that sounds like it might be comparable in size to 4,000 gnat earlobes)

iST was introduced in this Nature Methods paper, and now I know what they're talking about. We've seen this in more than a few papers over the years. I think the abbreviation was throwing me off.

These authors start with low amounts of protein and/or cell counts for all 3 methods. I'm talking low -- their high point is 20ug of starting material!

They compare the number of peptides identified after using each digestion methodology and, perhaps more importantly (at least to me) the reproducibility of the peptides ID'ed.

Most of this is given away in the abstract so I don't feel bad telling you about it here.

At "high" load (20ug of protein is high load these days, LOL!) FASP is right there with the other 2 methods, both in peptides ID'ed and in reproducibility -- it looks like it has the best CV, though iST is right there with it.

However -- as you drop down in starting amount, the effectiveness of FASP drops faster than the resale value of a BMW i8 after Tesla's new roadster was revealed. (Maybe not that fast)

SP3 looks good in terms of peptides ID'ed, but the CV gets wonky as the load drops. The clear winner in both categories is the iST methodology.

The authors go on to validate this by flow sorting some cells (!!! awesome !!!) down to just a few thousand and revealing that the sorting still left a heterogeneous mixture of cells.

I want to give a big shoutout to Dr. M for sending me this ultracool paper. It is quite seriously my favorite thing I read this week.

Oh, and all the RAW data has been uploaded to ProteomeXchange here via PRIDE (PXD006760).

Wednesday, November 15, 2017

Spectral libraries for contaminants!

Matrix Science has a really great and super professional blog. They don't produce as much content as this one does, but the content is significantly better.  You can check it out here.

Last month's post has a really great idea and a walkthrough on how to set it up.  Instead of searching all our data with a contaminants FASTA, what if we had a spectral library of those nasty things? John describes why this is a good idea -- and when it is an awesome idea -- far better in his post than I could here, along with how to set it up if no one will let you use their Mascot server....

Tuesday, November 14, 2017

Survival proteomes in bacteria!

This is open access and I've retweeted it twice so I'll remember to read it -- and that didn't work, so I'm leaving it here now (disclaimer -- I still haven't done more than gloss through it)

However -- we HAVE to do something about this antimicrobial resistance stuff and bacteria are simple organisms in comparison to poplar trees and 60,000 gallons of salt water and the other things some proteomics researchers out there are working on. We should totally be the ones solving all these problems with the "simple" organisms!

If the Coon lab is completing simple eukaryote proteomes in an hour and the Olsen lab is completing comprehensive human proteomes in 32 hours, a consistent, focused effort on annoyances like C. difficile -- which has picked up resistance to as many as 4 extremely important compounds that ought to kill it -- should be no big deal for us to puzzle out, right?

If you're writing a grant right now that has anything to do with bacteria and proteomics and you don't drop the term "survival proteomes" into the title or abstract, that's certainly not my fault.

Monday, November 13, 2017

Great perspective paper on WikiPathways!

At some point a new column called "WikiPathways" appeared in my annotation columns in Proteome Discoverer -- and I've found the column pretty useful in helping me make sense of these filtered down differential protein lists.

As of yet, I haven't seen this cool resource make its way to Compound Discoverer, but I'm gonna guess I'll see it soon.

For a great perspective on what this is -- and why we should care about it -- check out this short and interesting paper in Nucleic Acids Research here.

In the end, if any of these big data tools -- gene ontology, annotated pathway analysis, yeast 2 hybrid (...I'm stretching it a little bit here...) -- can help me get from a global dataset to the answer, I'm probably going to try it.  And if it can integrate genetic observations with the ones our instruments are making -- even better!

Friday, November 10, 2017

How to do dependent peptide search using Proteome Discoverer!

Sometimes we have to stand on the shoulders of giants. This is more polite than just up and stating that you are going to steal someone's super cool idea and show people how you can do something similar without using their uber powerful software.

I recently discovered a MaxQuant feature called "dependent peptide search." It has been in place for years, but my attention has been on another software package for most of that time.

Dependent search goes kinda like this (not exact, but you didn't come to a blog for exactness):

There are modified peptides present in your RAW file.
If you looked for them all (with traditional search engines) it would take FOREVER.
However, if there are modified peptides from a protein there -- there are definitely unmodified peptides from that protein -- and they're almost always easier to detect.

SO -- it's time to reduce some variables.

MaxQuant is fancy enough to do all this with some button presses. You, my friend, paid for your software (unless you are using IMP-PD), so you have to do a bit more work.

Step 1: Process your RAW files with your FASTA database. Go easy on the modifications. (Your cysteine alkylation, oxidation of Met, protein N-terminal acetylation maybe)

Open your processed report. Right click anywhere on it and check ALL THE THINGS!

You'll notice I have a Contaminant flagged. I'm gonna leave it in there. I'm not getting paid for this. Actually, it will just be a redundant entry in the later steps and won't matter.

Now that everything is checkmarked -- File > Export > To FASTA.  Then you'll discover you don't actually have to do the right click checkmark thing.

Now you have a FASTA that is only made up of the proteins that you actually discovered. If you are using a big database this could be a massive search space reduction. You'll notice my Filters are open. I'm running some stuff as I'm writing this to see what filters are the most effective. First run, I filtered down to just the "Master" proteins and things with >1 unique peptide ID. If you're going to find a phosphopeptide you're sure as heck gonna find at least 2 peptides from that protein first -- right?

Now you can input this new FASTA database and go crazy.
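If you're curious what that database-reduction step amounts to under the hood, here's a minimal sketch of my own (not anything from PD or MaxQuant) that keeps only the FASTA entries matching your first-pass hits. It assumes UniProt-style ">sp|P12345|NAME" headers, so adjust the accession parsing for your database.

```python
# Sketch of the database-reduction idea: keep only FASTA entries whose
# accessions showed up in the first-pass search. The header parsing
# assumes UniProt-style ">sp|P12345|NAME" records -- adjust as needed.

def filter_fasta(fasta_text, identified_accessions):
    """Return a reduced FASTA containing only first-pass hits."""
    kept, keep_current = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split("|")
            accession = parts[1] if len(parts) > 1 else parts[0].split()[0]
            keep_current = accession in identified_accessions
        if keep_current:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Point it at your full FASTA and the accessions from your first-pass result report and you get the same kind of search-space reduction the export step gives you.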

Check the phosphoSTY, add all the acetylations, throw in some GlyGly. If you've got Byonic or Mascot you can get closer to dependent peptide search by actually doing a deltaM or wildcard search.

If you're concerned about FDR considerations -- you definitely should be. That's why I don't have data from this to look at. You're artificially shrinking your database and potentially forcing the search engine to make some matches that might not be the best ones.

I'm dealing with it (for now) by letting the Peptide FDR in the consensus work things out -- taking my first-run data and the new stuff that just processed with the 10 PTMs I care about, and combining it all into a new (Multi)Consensus report.

I have a lot of settings (hit "Advanced") under the peptide validator that I can toy around with:

And I think that optimization of these is the trick to getting the best data out of this. You can always go back to deltaM or search engine PSM score (and manual validation of the ugly ones) if you need to.

Thursday, November 9, 2017

GiaPronto -- One-click quantification visualizations!

I don't know if I've ever sat down and mastered a data analysis tool this powerful in even 4 times the time it took me to get beautiful plots out of GiaPronto.  I seriously almost didn't write anything on the blog about it, because I was afraid more people using it would mean it wouldn't run so lightning quick for me.

I've been impressed with the ingenuity of GIA before. But now I have all these tools in a ridiculously easy web-interface?

Welcome to GiaPronto!

You pull up the quantitative protein list you want to analyze (it only supports pairwise comparisons, but it's so fast -- just do a bunch of them!), put it in the format it wants (I just exported my PD 2.2 result reports to Excel, deleted anything I didn't need, and saved it as a tab-delimited .txt file), load it into the interface, and hit the Go button.  If you're using MaxQuant it's even easier -- you don't have to change your column titles to say iBAQ; they might already include this program-critical term!

It normalizes your data (as shown above), it makes PCA and volcano plots for you, and it pulls out your list of significantly differential proteins, which you can export as full tables and ---

it does really powerful and intuitive biomarker analysis! "Hey, what proteins are my important biomarkers?" Don't change anything. Just hit the Go button and tab over!

It also does GO (Gene Ontology), but I don't think I have my data formatted correctly for that. AND I haven't even tried the PTM analysis functionality.

You can read more about it in this new paper at MCP here.

Wednesday, November 8, 2017

Peptide Prediction with Abundance (PPA)

Some peptides are invisible to mass spectrometry. One of my favorite pathways is a phosphorylation cascade where the active sites are something like KXS(p)XK  -- if you try to study this pathway and are smart enough to use trypsin, you'll enrich a lot of stuff that is singly charged and/or too small to ever identify.

X!Tandem at the GPM has a really neat function where instead of getting percent coverage of your total protein, you can choose to get percent coverage of what should actually be visible in your protein (singly charged and super small peptides don't count against you).  It's cool to run it at least once to see that you really have been getting 100% coverage of BSA in every sample you've run -- for years and years.
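I don't know exactly which rules X!Tandem applies under the hood, but the flavor of the calculation is easy to sketch: digest in silico, decide which peptides are plausibly observable, and compute coverage against only those. The 7-35 residue window below is my own illustrative assumption, not the GPM's:

```python
import re

# Back-of-the-envelope "detectable coverage": digest the protein in
# silico with trypsin (cut after K/R, not before P), call peptides of
# 7-35 residues "observable", and report what fraction of the sequence
# those observable peptides cover. Cutoffs are illustrative assumptions.

def tryptic_peptides(sequence):
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def detectable_fraction(sequence, min_len=7, max_len=35):
    peptides = tryptic_peptides(sequence)
    observable = sum(len(p) for p in peptides if min_len <= len(p) <= max_len)
    return observable / len(sequence)
```

Run it on BSA once and you'll see why "100% coverage" was never actually on the table.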

Check this out, though!

Please keep in mind that this tool, PPA, is far more powerful than what I'm about to use it for, and I'm going to return to its more advanced functionality in the future, for sure. However, it just did a really neat trick and that's why I'm talking about it here.

PPA is a fancy machine learning algorithm that can figure out how likely your peptides -- including modified versions of them -- are to show up in your MS/MS analysis. The authors validate it with some really complex datasets using files from several instruments. You can load in your theoretical databases and your experimental data -- that's all the advanced stuff.

You can use PPA on the online portal here, or you can download it to run locally if you're good at Perl.

The neat trick that I'm very impressed by is that you can just give it a FASTA file and it will predict the likelihood of each individual peptide being detected by MS/MS, using 15 known properties of peptides in general, on a scale of 0 to 1 (with 1 being very good).
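To give a feel for what "properties of peptides" means here -- and this toy is purely illustrative, NOT the actual PPA feature set or scoring function -- detectability predictors start from simple sequence-derived numbers like these:

```python
# Illustrative peptide features of the kind detectability predictors
# use: length, basic residue count, and Kyte-Doolittle hydrophobicity.
# This is NOT PPA's model -- just the flavor of its inputs.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def peptide_features(peptide):
    return {
        "length": len(peptide),
        "basic_residues": sum(peptide.count(aa) for aa in "KRH"),
        "mean_hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in peptide)
        / len(peptide),
    }
```

PPA's trained model turns ~15 such properties into that 0-to-1 score; the features are cheap to compute, which is why it can chew through a whole FASTA.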

And...this could be small sample size...but I've got some data I've been trying to help troubleshoot on my desktop in my off hours. The problem has been the decrease in total % protein coverage of the protein of interest as the experiment has progressed...and PPA is surprisingly predictive of the peptides that are still around late in the experiment. The authors of this software have more lofty goals for this algorithm, but seeing it do something simple that matches experimental data really well lends it a ton of credibility in my mind.

Tuesday, November 7, 2017

JS-MS - A new 3D MS viewer for all platforms!

My brain doesn't process 3 dimensional data very well. I'm ALWAYS gonna go for a classic 2D arcade game over any of the fancy new 3D stuff!

I've also never been able to understand 3D mass spec data interpretation software and it mostly just makes me dizzy.  However, I know super smart people that can and do use these, and JS-MS appears to be a nice new tool for everyone!

It has these advantages:
1) Vendor neutral (it looks like it takes any mzML-converted data)
2) It runs in Java (which you probably already have)
3) It is heavily optimized for speed and usability
4) It's free! (You can get it here).

The authors focus on comparing it to TOPPView, which is a really nice program that operates within the OpenMS framework.  JS-MS doesn't require the installation of a full program like OpenMS, but it does appear to require a configured Maven environment. If you're already using OpenMS, you'll probably want to stick with TOPPView. Heck, if you've got an LC-MS background, you might still be better off -- OpenMS is built for you and you'll be able to figure it out.

However -- If you're a bioinformatics person or a programmer in general, you may already have Maven set up and you'd be better off going the JS-MS route. It's great to have more options as our field continues to expand!

The comparison I was really curious about was how JS-MS would compare to BatMass, which seems to have a lot of similar functionality, but I've only seen it utilized with Thermo data and I'm too lazy to look up what files it is capable of accepting.

BatMass has the additional advantage of being the coolest looking icon and startup screen on your desktop -- maybe even your personal one, even if it's full of old 2D arcade games!

Monday, November 6, 2017

Cell-type specific metabolic labeling!!!

Did you know this was possible?!?!

Rather than metabolically label every protein in a cell like we normally do with SILAC or similar reagents, this group utilized cell-type specific promoters to only incorporate labels according to where they wanted them! 

In this particular study they use this awesome technology to show what proteins change when mice are bored or when they have fun things to play with. The list is surprisingly large, but it is the promise of this technology that I find really exciting!

If you can label and enrich things as small and complex as different areas in mice brains -- accurately -- and you have cell-type specific promoters that are truly cell-type specific -- you can do loads of awesome stuff with this!  Maybe settle some of these tumor vs. stromal cell/protein identifications once and for all, for example....

Downstream analysis was all done with a quadrupole Orbitrap (Plus) system and an LTQ Orbitrap Elite. Data analysis all appears to be MaxQuant/Perseus.

Thursday, November 2, 2017

The immuno-peptidome of ovarian cancer!

Are you interested in MHCs?

If you aren't, I bet there is some guy in your department who won't stop talking about them if you get on the topic!

If you can identify the antigens that are present on the surface of cells that you don't want in a system with a functioning immune system (for example cancer cells, intracellular parasites, those sorts of things) then you can utilize the immune system to go ahead and get rid of those cells for you. It's totally probably not really that simple.

Flip to the method section in this brand new paper and you'll be filled with confidence that you, too, can identify the essential peptides that will make great new immunotherapeutic targets!

In all seriousness, they do make this sound quite straight-forward and present some very positive findings in the tumors that they analyze here. I don't know enough about the biology to contribute anything meaningful, but the authors seem excited about it. What I do know is that if I wanted to start looking for immunogenic antigen presenting peptides (or whatever they're called -- come on, my job would just be to identify a bunch of them, right?) I'd start by following the very clear method section in this paper!

Wednesday, November 1, 2017

Quick tutorial -- how to estimate your digestion efficiency in PD 2.x

I got an email from a reader who is having some trouble assessing digestion efficiency with the awesome free Preview node from Protein Metrics. While they get that sorted out, I suggested doing it the way we did before that team gave us all free software that would do it for us. Then I realized that it is even easier in the new Proteome Discoverererers than it was in the past. 

Quick, lazy tutorial time!!

You need a couple of things.
A representative FASTA of the most abundant proteins in your samples (you probably have to switch this if you run lots of different organisms)
RAW files
PD 2.x

Firstly, you need to process one or more of your representative RAW files in Proteome Discoverer and get a result report.

I just grabbed a quick file. This is some HeLa digest I recently ran on a CE-QEHF (ZipChip 8 minute runs), processed vs. UniProt/SwissProt and sorted by highest number of PTMs

Alternatively (and probably more validly [wait..."validly" is a word?!?!]) you should go with the intensity or something if you've got it from LFQ or whatever.

Next you'll want to "Check" these highest hits. I just grabbed everything on the front page of the 46 inch TV someone was throwing away (!!) that I now use as a PC monitor. As long as you select more than 25 proteins, you'll be fine. You can draw a box around all of them and then right click "Check selected", or you can checkmark each one of them.

Then make a new .FASTA from these proteins. File>Export>To FASTA> Checked Proteins Only

BOOM! Tiny FASTA file.

Now you can import that FASTA and then use that to search the data that you're concerned about digestion efficiency.

With a database this small it doesn't matter if you allow your search engine to run with 10 missed cleavages. It still won't take very long.

Once you have an output report, find where you can plot your data. The icon looks different in PD 2.2 than in the other versions, but it's at the top. Then toggle over to your histogram, choose PSMs and "# Missed Cleavages", and hit Refresh (cut off in this screenshot).

Now you have a simple representative FASTA and a quick way to use the search engine of your choice to get a picture of your sample digestion efficiency.
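If you'd rather have a number than a chart, you can get the same answer by exporting the PSM table and tallying the missed-cleavage column yourself. Quick sketch below -- the "# Missed Cleavages" column name matches what I see in recent PD exports, but check your own report:

```python
from collections import Counter

# Tally missed cleavages from an exported PSM table (list of dicts,
# e.g. from csv.DictReader). Returns the full histogram plus the
# fraction of PSMs that are fully cleaved -- a quick digestion
# efficiency readout.

def missed_cleavage_summary(psm_rows, col="# Missed Cleavages"):
    counts = Counter(int(r[col]) for r in psm_rows)
    total = sum(counts.values())
    fully_cleaved = counts.get(0, 0) / total
    return counts, fully_cleaved
```

A fully-cleaved fraction somewhere north of 0.8 is what most people seem happy with, but that's a rule of thumb, not gospel.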

Of course -- this is all assuming that digestion efficiency affects proteins across the abundance range in the same way, but that seems like a reasonably safe assumption to make. To be super thorough you could just run your whole FASTA with 10 missed cleavages, but this could take a really long time....

Thanks to Dr. A.H. for the informed questions and really interesting problem I haven't seen before (and still don't know how to solve) that led me around to putting this together.

Tuesday, October 31, 2017

Cool SIM-DIA technique for screening DNA adducts!

I had NO idea that you could even do this. However, this appears to be mostly my ignorance, because these authors have been developing techniques like these for years! 

Before I go further, this is the new paper at JPR. 

There are a ton of classical genetics techniques for quantifying DNA adducts (essentially DNA that is messed up or altered in some way). Unless a lot of new stuff has popped up since I left JHU, they have:
Low sensitivity
High false negatives/false positives
Detect adducts that...

Maybe I'm exaggerating, but I'm not sure that I am. The techniques my department used were fluorescence based and/or gel migration based (sure -- the gels were like 0.5M long, but still...gels...)

I think this paper is an invitation for DNA damage researchers to join us in the 21st century! Check this out:

Yeah -- this method is no joke. As I mentioned above, the Turesky lab has apparently been developing mass spec based DNA adduct detection/quantification methods for years, so they know what they are looking for in terms of mass shifts. However, a classical dd-MS2 method isn't going to cut it here. Honestly, this figure doesn't do this study justice.

At first, I thought this was the first Wi-SIM-DIA paper I'd come across (this is a method on a Tribrid mass spec where the Orbitrap does wide SIM scans while the ion trap does simultaneous small DIA scans). However, to get the level of precision this group needs to detect and identify these adducts, they do all the steps in the Orbitrap. This requires careful timing, because they are eluting digested nucleotides off nano-LC columns and all these Orbitrap scans take time.
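To put some (completely invented) numbers on why the timing is tight -- this is just my arithmetic, not anything from the paper:

```python
# Rough cycle-time arithmetic for an Orbitrap-only method (all numbers
# invented for illustration): one wide-SIM scan plus several DIA MS2
# scans per cycle, measured against a nano-LC peak width. Longer
# transients (higher resolution) stretch the cycle and cost you points
# across the chromatographic peak.

def points_per_peak(sim_scan_s, dia_scan_s, n_dia_windows, peak_width_s):
    cycle = sim_scan_s + dia_scan_s * n_dia_windows
    return peak_width_s / cycle

# e.g. one 0.25 s SIM scan + six 0.125 s DIA scans is a 1.0 s cycle;
# against a 12 s peak that's 12 points -- enough to quantify, but add
# MS3 scans on top and the margin shrinks fast.
```

That's the squeeze: every extra Orbitrap scan per cycle comes straight out of your points across the peak.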

Looking at it this way, I immediately wonder if someone could pull this off on a simpler instrument if they had complete hardware control -- but then you realize they throw in MS3 as well. Quantification and confirmation of these nasty DNA modifications all in one go!

Monday, October 30, 2017

New Stuff on the Thermo Omics Portal!

If you've visited the Thermo-Omics Portal recently you'll notice it's been under renovations and looks a whole lot spiffier. You might also notice that they've announced the 2017 User meeting in Bremen!

The navigation on-site has changed and I've gotten a couple questions about how to find things. For example, if you are looking for the PD 2.2 demo you'll want to follow the horizontal lines, then sideways arrows. This is ugly, but this is what I mean...

If you click on the horizontal lines by "Navigate products" the big red bar marked 2 shows up. Then you expand that product by clicking on the arrow pointing right.  It makes sense once you do it once or twice, but if you click on the rotating blocks on the homescreen with the product names as you did before, you won't get to where you or your collaborators can download the demo and Viewer software versions.

Saturday, October 28, 2017

Where do you get all those values for proteomics ruler calculations?

The proteomic ruler approach has appeared a couple of times on this blog.  While incredibly cool, when you start trying to do it you may run into this issue -- "where the heck do I get these total intensity values?" The authors are using MaxQuant, but maybe you're using Proteome Discoverer? And you don't want to go back to the RAW file and start taking averages, right?

BIG SHOUTOUT TO DR. CLELAND at the Smithsonian! We were having lunch and he told me exactly where you get the numbers you need!

If you are using PD 2.0 or newer (you should be using 2.2...) there is a post-processing node called Result Statistics.

I always throw it into my workflows -- because it doesn't seem to add any time to the data processing and it adds a tab of data -- but I've honestly never used it for anything.  AND EVERYTHING YOU NEED IS HERE!

For example, what if you want to normalize your data against the total area or intensity of every feature identified in the entire dataset?

It is on row 612!  There is so much information here, including the stuff you'll need to start calculating your absolute protein abundances.
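And for reference, once you have those numbers, the ruler calculation itself is one line. This is my sketch of the published formula (Wisniewski et al.), with the DNA-mass constant assumed for a diploid human cell -- sanity check it against the paper before trusting any copy numbers:

```python
# Proteomic ruler sketch: the histone MS signal stands in for a known
# mass of DNA per cell, which anchors everything else to copies per
# cell. The 6.5 pg DNA mass is an assumed value for a diploid human
# cell -- adjust for your organism/ploidy.

AVOGADRO = 6.022e23
DNA_MASS_PER_CELL_G = 6.5e-12

def copies_per_cell(protein_signal, histone_signal, molar_mass_g_per_mol):
    return (protein_signal * AVOGADRO * DNA_MASS_PER_CELL_G) / (
        histone_signal * molar_mass_g_per_mol
    )
```

Feed it a protein's summed intensity, the summed histone intensity from the same run, and the protein's molar mass, and you've got an absolute abundance estimate without a single spike-in standard.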