Saturday, October 19, 2013

New UCSD paper shows novel way to think about proteogenomics!



This paper, currently in press at MCP, may get my vote for bioinformatics paper of 2013.  The study comes from Natalie Castellana et al., out of UCSD.

Let me frame it first, as I see it.  We have an organism that lacks a complete, annotated database.  What we do have, however, is a ton of high-quality next-gen sequencing data and a few million MS/MS spectra from shotgun proteomics on this organism.  Can we put the sequencing data and the MS/MS spectra together without having the complete sequence?

It turns out the answer is yes.  Yes we can.  In this impressive study, the team took next-gen sequencing data and MS/MS spectra from corn (Zea mays) and combined the two, following a really impressive logical progression.  I don't want to ruin the story for you, but what if we stopped thinking that unique MS/MS spectra were the coolest part of our data?  What if we instead took a probability-based approach and treated the repeat occurrence of spectra as an indication of how likely it is that an observation is true?  Obviously, we're doing some de novo type sequencing here, and every peptide spectral match has a degree of uncertainty to it (de novo even more so!).  The fact that we've made the same identification more than once can therefore be treated as a measure of how certain we are of that identification.
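To make that intuition concrete, here is a minimal Python sketch of my own.  It is not the scoring used in the paper.  It assumes each peptide spectral match comes with a probability of being correct and that repeat matches are independent, which real spectra only approximate.  The function names, peptide strings, and probabilities are all made up for illustration.

```python
from collections import defaultdict
from math import prod


def combined_confidence(match_probs):
    """Probability that at least one match is correct.

    Assumes the matches are independent (a simplification).
    """
    return 1.0 - prod(1.0 - p for p in match_probs)


def score_peptides(psms):
    """Group (peptide, probability) pairs by peptide and combine them."""
    by_peptide = defaultdict(list)
    for peptide, prob in psms:
        by_peptide[peptide].append(prob)
    return {pep: combined_confidence(ps) for pep, ps in by_peptide.items()}


# Hypothetical data: one peptide seen three times, one seen only once,
# every individual match equally (un)certain.
psms = [
    ("PEPTIDEK", 0.6),
    ("PEPTIDEK", 0.6),
    ("PEPTIDEK", 0.6),
    ("LONELYR", 0.6),
]
scores = score_peptides(psms)
for peptide, score in scores.items():
    print(f"{peptide}: {score:.3f}")
```

Under those assumptions, the peptide seen three times ends up with a much higher combined confidence than the one seen once, even though no single match was any better.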

I'm going to stop here.  I lied.  I do want to ruin the story for you, but I am not doing the story justice.

If you are working with an organism that is not fully sequenced, or you want to work with one but the lack of sequencing is stopping you, definitely check out this paper.  "An Automated Proteogenomic Method Utilizes Mass Spectrometry to Reveal Novel Genes in Zea mays" is available in pre-release version at MCP here.
