Saturday, April 18, 2015

Proteome Discoverer 2.0 Ultra deep search options

Alright!  I've been wanting to show you guys this for a while.  Here is the question:  In PD 2.0 we currently have access to a handful of different search engines.  What is the benefit of using multiple engines?

There have been several papers over the years where people compared searching the same dataset with Sequest + Mascot (or X!Tandem), and the crude summary I give people when they ask (from what I remember of these papers {go brain go!}) is this:  adding a second engine gives you somewhere between 5% and 20% new peptide identifications.  It depends on the settings you use, the mass tolerance of the instrument, the database, and whatever.

Now, on my personal desktop version of PD 2.0 I have access to three search engines: SequestHT, MSAmanda, and Byonic. (I don't mean to exclude Mascot, but my employer's IT department cited 30+ reasons why I can't access the Mascot server from my home PC. I didn't read them, but I suspect they state things like: "our primary directive is to deter science at any cost and to personally annoy you." You know...the stuff every IT department says but actually doesn't say....man, I need another coffee.)

To keep it simple I say "adding one engine will probably give you 10% more IDs. Adding a third engine will give you maybe another 5%".  Again, these are based off of those studies with Sequest, Mascot, X!Tandem, and OMSSA that I can't actually cite, but they are in this big filing cabinet somewhere.  But here, here we have 3 very different search engines (not to say the others aren't different, but MSAmanda and Byonic are new stuff and so is the XMAn database).

The question?  Is it worth my time to set up a search that uses all 3 engines?  What is the most efficient way to do it?  And what is the net result?

Sample:  1ug HeLa digest. 2 hour gradient. Orbitrap Fusion operating in super speed mode (HCD ion trap MS/MS scans acquired while the Orbi gets the MS1s).  ~78,000 MS/MS scans.

Baseline processing:  SequestHT, UniprotHuman database, 10ppm MS1, 0.6 Da MS/MS tolerance, Percolator

Super processing method:  All 3 engines above. MSAmanda and SequestHT searched with UniprotHuman + XMAn database + cRAP; Byonic set with UniprotHuman database + all the modifications recommended from the same file searched with the Preview node.

Baseline processing results:  27,535 peptide groups; 5026 proteins

Super processing method results: 33,530 peptide groups; 5444 proteins.

6k new peptides!!!  That's about 22%  w00t!  The punchline?  I think I can dig deeper.  Honestly.  I short-changed Byonic pretty badly.  It only got to search Uniprot AND only the modifications that Preview recommended from its quick scan of the highest intensity peptides.  If I wanted to do this right, I would (will! this is cool!) allow Byonic to use the same databases, search for more PTMs & open up the wildcard search capabilities.
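
If you want to check the math yourself, here is a quick Python sketch. The counts are the ones reported in this post; the function name `percent_gain` is just something I made up for illustration, not anything from PD.

```python
def percent_gain(baseline: int, combined: int) -> float:
    """Relative increase of `combined` over `baseline`, in percent."""
    return 100.0 * (combined - baseline) / baseline


# Counts reported in this post (SequestHT alone vs. all 3 engines)
baseline = {"peptide groups": 27535, "proteins": 5026}
combined = {"peptide groups": 33530, "proteins": 5444}

for level in baseline:
    added = combined[level] - baseline[level]
    gain = percent_gain(baseline[level], combined[level])
    print(f"{level}: +{added} ({gain:.1f}%)")
```

That works out to 5,995 extra peptide groups (21.8%) and 418 extra proteins (8.3%), so the gain at the protein level is a good bit smaller than at the peptide level.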

I'll queue up some more stuff and then go find something outside to climb.
