Monday, July 3, 2017

Reprocessing one of the Bekker-Jensen et al. datasets!



Ummm....so....yesterday I was pretty psyched about the Bekker-Jensen et al. paper in Cell Systems. I pulled all the files for one cell line that I have a LOT of data on, some of which I fractionated and ran myself.

And nothing I've got has given me even 75% of the PSM/peptide/protein IDs that this dataset does.

33 min nanoLC runs?!? No way I'm going to see over 12,000 unique human proteins at high confidence...No way....




I rechecked my settings to make sure I didn't do anything stupid, then reran a fractionated dataset from the same cell line, acquired on a similar instrument, to verify. Nope. This method is legit. I've just been running nanoLC on today's super-fast instruments the same way I ran it on yesterday's not-as-super-fast instruments, and inadvertently handicapping the potential they have for 2D fractionated samples!

Important disclaimer about my numbers above -- I am using a supplemented FASTA (that I don't want to talk about quite yet) that may be inflating my numbers to some degree. It is hard to tell the exact inflation effect without another full processing run against a smaller database (going right now), but from my results I am convinced that what these authors report is real.

If you are skeptical -- you should be, this is a serious paradigm shift! (33 min nLC runs?!?!? what!?!?) -- you should get some files from ProteomeXchange/PRIDE here and check them out for yourself!

Original paper I'm ranting about is here!

Edit 7/4/17: I reran this dataset with MS Amanda 2.0 using just UniProt/Swiss-Prot parsed on "sapiens". Nearly 11,000 unique proteins ID'ed.
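
If you want to try the "parsed on sapiens" step yourself, here is a rough sketch of one way to do it in Python. This is not the exact tool or script I used; the function name `filter_fasta` and the toy entries are made up for illustration. The assumption is simply that an entry is kept when its header line contains the keyword.

```python
def filter_fasta(lines, keyword="sapiens"):
    """Yield the lines of every FASTA entry whose header contains `keyword`.

    Hypothetical helper for illustration: matching is a case-insensitive
    substring test on the header line (the one starting with '>').
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new entry starts here; decide whether to keep it.
            keep = keyword.lower() in line.lower()
        if keep:
            yield line


# Toy entries with invented accessions and sequences, NOT real UniProt records.
example = [
    ">sp|X00001|FAKE1_HUMAN Example protein OS=Homo sapiens\n",
    "MKTAYIAKQR\n",
    ">sp|X00002|FAKE2_MOUSE Example protein OS=Mus musculus\n",
    "MKSAYLAKQK\n",
    ">sp|X00003|FAKE3_HUMAN Another protein OS=Homo sapiens\n",
    "MSTNPKPQRK\n",
    "TKRNTNRRPQ\n",
]

filtered = list(filter_fasta(example))
headers = [line for line in filtered if line.startswith(">")]
print(len(headers), "entries kept")
```

For a real database you would pass an open file handle instead of the list and write the kept lines to a new FASTA file. A plain substring match on the header is crude, so it is worth spot-checking the output before searching against it.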
