Thursday, May 29, 2014

Two human proteome maps! How do they compare?


We have two first drafts of the Human Proteome!  What did you expect me to do?  Let's compare what they did and what we end up getting out of it!

First of all, both these studies are awesome and big and give our field a load of credibility, but they are very different.

Instrumentation:  Both groups used Orbitraps, of course.  Pandey's lab exclusively used high/high data (high resolution for both the MS1 and MS/MS scans), so their MS/MS spectra were high-res accurate mass.  The Kuster lab used a mix of high/high and high/low data.  Due to the increased sensitivity and speed of the high/low experiments, we'd expect Kuster to end up with more MS/MS spectra, and they do, by a long shot, but the overall quality of the data is probably a little better from the Pandey lab.  Pandey's lab generated all of the data on their own.  The Kuster study draws heavily on Orbitrap data that was previously generated.

Tissues analyzed:  
The Pandey lab evaluated:  30 histologically normal human samples.
The Kuster lab evaluated:  60 tissue samples, 13 body fluids and 147 cell lines...holy cow....this was 6,380 runs.  I'm not joking.  This study redefines what we consider a HUGE proteomics study.
In defense of the Pandey lab, the Telegraph reported that the entire project was pulled off for under $700,000.  That's pretty amazing, considering that they generated all of this data on their own!

Okay, so both of these studies kick ass.  They took tons of individual tissues and painstakingly detailed them via shotgun proteomics using the world's best instrumentation.  Next question?  What's in it for me?!?!?


The Pandey lab's data is available at HumanProteomeMap.org.

The site has a simple/handy interface:


You can search by genes or by preloaded pathways, you can compare different tissues and cell lines.  No instructions necessary.

The output is even more simple:

Perhaps...disappointingly simple.  For this example protein we see that it is expressed in two tissues.  Clicking on the gene identified doesn't help much:


We see that for this protein, the study identified one single peptide, and only in 2 tissues.  It was not identified in any other tissue, including the human pancreas.  This doesn't mean that it wasn't there (losing it is strongly linked to cancer, by the way), it just wasn't detected.

Let's try something easier.  What about HPRT1 (a housekeeping gene strongly expressed in virtually all human tissues)?
Okay, that's much better!  The protein is seen in every tissue here.

Let's test the same proteins on ProteomicsDB.org.


Not as simple as the other interface, but there is a lot more that we can do here!

Searching for CDKN2A?


Wait a minute!  ProteomicsDB knows that CDKN2A has important isoforms?  We're looking at the data from a protein-centric level.  Yes, it's less clean, but there is so much more data here!  This makes me really happy.  The Human Proteome Map looks at proteomics data like it's analogous to genes, which is how we've always thought about it.  ProteomicsDB looks at proteins the way Neil Kelleher and Albert Heck look at proteins, in that isoforms and variants are seriously, seriously important and we need to think about them, regardless of how much we don't want to.
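If the gene-centric vs. protein-centric distinction feels abstract, here is a minimal Python sketch of it.  Everything in it is made up for illustration: the peptide labels, the tissue names and the isoform assignments are hypothetical toy rows, not data from either site, and the function names are mine.  It also assumes every peptide maps to exactly one isoform, which real shotgun data rarely lets you get away with.

```python
from collections import defaultdict

# Hypothetical toy identifications: (peptide label, gene, isoform, tissue).
# None of these rows come from the Human Proteome Map or ProteomicsDB.
IDENTIFICATIONS = [
    ("pep_A", "CDKN2A", "isoform 1", "tissue_X"),
    ("pep_B", "CDKN2A", "isoform 1", "tissue_Y"),
    ("pep_C", "CDKN2A", "isoform 2", "tissue_Y"),
    ("pep_D", "HPRT1", "isoform 1", "tissue_X"),
]


def gene_centric(rows):
    """Collapse all isoforms of a gene into one entry per gene."""
    tissues = defaultdict(set)
    for _peptide, gene, _isoform, tissue in rows:
        tissues[gene].add(tissue)
    return {gene: sorted(found) for gene, found in tissues.items()}


def protein_centric(rows):
    """Keep each (gene, isoform) pair as its own entry."""
    tissues = defaultdict(set)
    for _peptide, gene, isoform, tissue in rows:
        tissues[(gene, isoform)].add(tissue)
    return {key: sorted(found) for key, found in tissues.items()}


if __name__ == "__main__":
    print(gene_centric(IDENTIFICATIONS))
    print(protein_centric(IDENTIFICATIONS))
```

The gene-centric rollup gives you one tidy row per gene, while the protein-centric one keeps the isoforms apart, so you can see that an isoform turned up in only some of the tissues where the gene was "detected".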

What about expression profiles for this protein (I'm looking at isoform #1)?


Check out how much information is here!  They must have been working on this for years!  The expression tab is just one of 8 pages of information on this protein!  Unreal!  And the increased coverage here shows that we're seeing this protein isoform in tons of tissues (as we should...I won't show it here, but we're also seeing virtually every peptide for the protein).  This is a mind-boggling amount of work and data.

I can spend an hour looking through information on just this one protein.  I'm not joking.  What if I said that you could directly examine the MS/MS spectra for every peptide identified?  Would you believe me?  Check it out.  It's there.  All of it at your fingertips.  This might be the most thorough resource ever developed for human proteomics.

There is no way I have time to tell you everything that you can do on this page.  Not without taking the day off from my real job.  But I want to leave you with this bit of awesomeness:


Chromosome maps.  Incredibly well curated proteomics data of every human chromosome.  Expandable to just a crazy level.  The amount of information here is unbelievable.  Have we really come this far?!?

Let me sum this up.  Both these studies obviously belong in Nature.  They represent enormous undertakings that provide new information for everyone, and I haven't even gotten into all the protein data we now have from regions of DNA that genetics thought don't make protein (which is a primary focus of Pandey's paper!).  These are super powerful new tools that really demonstrate where proteomics is right now and where it's going.

The Pandey lab did an amazing amazing job with the resources they had to work with.
The Kuster lab just changed the scale.  This may be the most thorough and sophisticated study anyone has ever done in our field, and an enormous amount of effort has gone into making all of this data available to everyone.  Unbelievable.


Update: 6/5/14:  For even more on what ProteomicsDB can do, check out part 2 here!
