Wednesday, May 14, 2014

What is the maximum theoretical coverage of a protein?


Recently, I worked with a couple of labs that use single protein digests and % coverage as a QC metric.  Lots of people do this.  This isn't my favorite QC, but as long as people are benchmarking their instruments with some sort of constant standard, I'm sure not going to stand in the way.  A question occurred to me when I saw very high % of peptide coverage:  how much can we actually see with a single enzyme digest and mass spectrometry?

1MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIA
51FSQYLQQCPFDEHVKLVNELTEFAKTCVADESHAGCEKSLHTLFGDELCK
101VASLRETYGDMADCCEKQEPERNECFLSHKDDSPDLPKLKPDPNTLCDEF
151KADEKKFWGKYLYEIARRHPYFYAPELLYYANKYNGVFQECCQAEDKGAC
201LLPKIETMREKVLASSARQRLRCASIQKFGERALKAWSVARLSQKFPKAE
251FVEVTKLVTDLTKVHKECCHGDLLECADDRADLAKYICDNQDTISSKLKE
301CCDKPLLEKSHCIAEVEKDAIPENLPPLTADFAEDKDVCKNYQEAKDAFL
351GSFLYEYSRRHPEYAVSVLLRLAKEYEATLEECCAKDDPHACYSTVFDKL
401KHLVDEPQNLIKQNCDQFEKLGEYGFQNALIVRYTRKVPQVSTPTLVEVS
451RSLGKVGTRCCTKPESERMPCTEDYLSLILNRLCVLHEKTPVSEKVTKCC
501TESLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQT
551ALVELLKHKPKATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPKLVV
601STQTALA

Take the coverage map above, for example.  This is the Mascot coverage output for one of these QC proteins.  Mascot says 79% coverage (what was found is in red).

Something that I've started to be very concerned about, due to the amount of intact and top-down analysis I've been doing, is the signal peptide and propeptide sequences.  This protein is BSA, but the first 24 amino acids are not actually part of the mature BSA sequence.  They are part of the translational process and are cleaved off before the mature protein is produced, so I don't think they should count.

Let's look at what is left.  If we assume 100% cleavage, we have:

DTHK
SEIAHR
FK
DLGEEHFK
VASLR
FWGK
IETMR
EK
VLASSAR
QR
LR
CASIQK
FGER
ALK
AWSVAR
LK
CCDK
PLLEK
NYQEAK
SLGK
AFDEK
HKPK
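
The 100% cleavage assumption above can be sketched in a few lines of Python. This is only a minimal illustration, assuming trypsin cuts after every K and R with no exception for a following proline (which is what a peptide like PLLEK in the list implies). The function name `tryptic_digest` is made up for this example, and only the first 50 residues of the FASTA entry are used.

```python
# First 50 residues of the BSA FASTA entry from the coverage map.
FASTA_START = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIA"

# Signal + pro sequence removed before the mature protein (from the post).
SIGNAL_AND_PRO_LENGTH = 24


def tryptic_digest(sequence):
    """Split a sequence after every K or R (assumed 100% cleavage,
    no proline rule, no missed cleavages)."""
    peptides = []
    current = []
    for residue in sequence:
        current.append(residue)
        if residue in "KR":
            peptides.append("".join(current))
            current = []
    if current:
        # Trailing residues with no K/R at the end.
        peptides.append("".join(current))
    return peptides


mature_start = FASTA_START[SIGNAL_AND_PRO_LENGTH:]
peptides = tryptic_digest(mature_start)
print(peptides)
# ['DTHK', 'SEIAHR', 'FK', 'DLGEEHFK', 'GLVLIA']
```

The last fragment is incomplete only because the input was truncated at residue 50; in the full sequence that peptide continues to the next K.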


What are our requirements for settings for our instruments?  I, for one, almost never look at ions with a mass to charge of <400.  I also ignore anything with fewer than 2 charges, because they don't sequence in most cases.  Ignoring the fact that not all amino acids can/will accept protons, if I only use the requirement that my peptide has a mass >800 Da (roughly what a +2 ion needs to land above 400 m/z), only DLGEEHFK makes the cut.  It also has two basic amino acids, so it should charge to at least +2.  If it charges to +3 or above, this would explain why we didn't see it, as it won't meet our >400 m/z cutoff as a +3.
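
The mass and m/z cutoffs above are easy to check with a rough calculation. The sketch below assumes unmodified residues (no carbamidomethylation on cysteine) and standard monoisotopic residue masses; the helper names `peptide_mass` and `mz` are made up for this example.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of each charge carrier

# The peptides listed in the post (AWSVAR spelled as in the FASTA sequence).
PEPTIDES = [
    "DTHK", "SEIAHR", "FK", "DLGEEHFK", "VASLR", "FWGK", "IETMR", "EK",
    "VLASSAR", "QR", "LR", "CASIQK", "FGER", "ALK", "AWSVAR", "LK",
    "CCDK", "PLLEK", "NYQEAK", "SLGK", "AFDEK", "HKPK",
]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


over_800 = [p for p in PEPTIDES if peptide_mass(p) > 800]
print(over_800)                           # ['DLGEEHFK']
print(round(peptide_mass("DLGEEHFK"), 2)) # 973.45
print(round(mz("DLGEEHFK", 2), 2))        # 487.73, above the 400 m/z cutoff
print(round(mz("DLGEEHFK", 3), 2))        # 325.49, below the 400 m/z cutoff
```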

So what is our coverage of what is actually possible?  If we start with the FASTA BSA sequence of 608 a.a. and subtract our non-expressed region (24 a.a.), we get 584 amino acids in the fully expressed protein.  There are 109 amino acids in the peptides I just deemed too short for my mass spec analysis.  584-109 = 475.  Let's assume that DLGEEHFK will charge +2, so it counts as a peptide that we could have seen but didn't.  That gives (475-8)/475 = 98% achievable coverage of BSA in this example.
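
For anyone who wants to rerun this arithmetic on a different protein, here is a minimal sketch. The numbers are simply the ones quoted in the paragraph above, and the variable names are made up for this example.

```python
# Inputs taken directly from the paragraph above.
fasta_length = 608        # residues in the FASTA entry
signal_and_pro = 24       # residues removed before the mature protein
too_short = 109           # residues in peptides deemed too short to see
missed_but_visible = len("DLGEEHFK")  # 8 residues we could have seen

mature_length = fasta_length - signal_and_pro   # 584
observable = mature_length - too_short          # 475
achieved = (observable - missed_but_visible) / observable

print(mature_length, observable)  # 584 475
print(f"{achieved:.1%}")          # 98.3%
```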

Real achievable coverage (RAC? is that in use?) is 475/608 = 78% of the FASTA sequence.  I wonder if that is anywhere near consistent in natural proteins?

5 comments:

  1. If you use X!Tandem (the GPM) you will get both: the achieved coverage and the corrected one after removing/correcting for unlikely peptides (I think that in the case of the GPM that is also based on previous observations of this protein in the database)

  2. A fantastic blog and great post! Thanks so much for sharing your views and thoughts, which the rest of us are "not able to think of". Just as a suggestion, would it be possible for you to share some of your thoughts on how you optimize an MS run (especially length and MS2 injection time) when you have a low complexity sample, say an IP or a gel band from an IP? Would love to see some Rawmeat pictures and hear your thoughts about that! (PS: yes, I have read your posts on how to optimize the LC-MS gradient)

    Replies
    1. Kristian,
      Sure! Let me give it a think and I'll take a swing at it. I did a talk up in Minneapolis a couple weeks ago and we really dug into this topic with the crowd there, so it's pretty fresh in my mind.

    2. Great! As our Elite was available this week, I got to do some testing of altering the gradient and MS2 injection times (ion trap) using a low complexity sample (band from IP). Doing this, I got a +23% increase in protein identifications using PD 1.4 (and yes, for a Ph.D. student that felt AWESOME!).

  3. Congrats! A 23% coverage improvement is a big deal for anyone anytime!
