Tuesday, January 3, 2017
Where does the billions of proteoforms idea come from?!?
A really interesting post from a much more serious blogger has been making the rounds on Twitter today. You can check it out here -- it calls into question the possibility of billions of different proteoforms in humans.
I don't plan to argue this -- and, in fact, the references that are cited on the page he refers to do appear to be wrong. I've contacted some people inside that company (I know a few) to see if we can find better references. While looking for them -- I thought "wow...are you EVER going to finish your Favorite Papers of 2016 Post?" Might as well look at this thing!
As is often the case when I start digging through the amazing work you guys are out there doing -- I got distracted...
Assume for a minute that the blog post above didn't question things that every person who's tried doing top-down proteomics knows are factual -- and that there are millions to billions of possible proteoforms. As top-down proteomics continues to grow -- how are we going to keep millions of things straight?
Qiang Kou et al. have a really interesting solution that is a step forward from some combinatorial bioinformatics graphs -- they call these mass graphs.
There is a ton of maths in here, but I think this figure sums it up (can I show this? email me: email@example.com if I can't -- I'll take it down! Oh..it is in the manual, shouldn't be a problem!)
How sharp is that? It summarizes the possibilities semi-linearly!
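If you want to see why that matters, here is a toy sketch of the idea as I read it. This is NOT the TopMG code or its actual algorithm; the peptide "GSAK", the helper names, and the choice of modifications are all made up for illustration. Each residue becomes one or more edges between neighboring nodes, one edge per mass that residue could have, so the graph grows by addition while the number of proteoforms it encodes grows by multiplication.

```python
# Toy mass graph for a hypothetical peptide "GSAK" (illustration only,
# not TopMG's implementation). Masses are approximate monoisotopic
# residue masses in Da; modified forms are assumptions for the example.
RESIDUE_MASSES = [
    [57.02146],                # G, no modification considered
    [87.03203, 166.99836],     # S, unmodified or phosphorylated
    [71.03711],                # A, no modification considered
    [128.09496, 170.10552],    # K, unmodified or acetylated
]


def build_mass_graph(residue_masses):
    """Return edges (from_node, to_node, mass); node i sits before residue i."""
    edges = []
    for i, masses in enumerate(residue_masses):
        for mass in masses:
            edges.append((i, i + 1, mass))
    return edges


def count_paths(edges, n_nodes):
    """Count start-to-end paths; each path is one distinct proteoform."""
    ways = [0] * n_nodes
    ways[0] = 1
    # Edges are emitted in node order, so ways[src] is final when used.
    for src, dst, _mass in edges:
        ways[dst] += ways[src]
    return ways[-1]


edges = build_mass_graph(RESIDUE_MASSES)
n_proteoforms = count_paths(edges, len(RESIDUE_MASSES) + 1)
print(len(edges), "edges encode", n_proteoforms, "proteoforms")
```

Six edges, four proteoforms. Add ten more modifiable sites and you add ten more edges, but the proteoform count doubles each time.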
You can check the software out directly here! P.S. It is called TopMG
What was I talking about?
Oh....who first said that if the average protein had X possible length variants times Y post-translational modifications that there might be billions of proteoforms?
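The back-of-the-envelope version of that math is easy to sketch. Every number below is an assumption I picked for illustration (roughly 20,000 protein-coding genes, 5 length variants, 10 independent on/off PTM sites); none of them come from the papers in this post, and real PTM sites are certainly not all independent.

```python
# Back-of-the-envelope proteoform count. All inputs are illustrative
# assumptions, not measured values.
def proteoforms_per_protein(length_variants, ptm_sites, states_per_site=2):
    """X length variants times every on/off combination of Y PTM sites."""
    return length_variants * states_per_site ** ptm_sites


GENES = 20_000  # rough number of human protein-coding genes (assumption)

per_protein = proteoforms_per_protein(length_variants=5, ptm_sites=10)
total = GENES * per_protein
print(f"{per_protein:,} per protein, {total:,} overall")
```

With these made-up inputs you land at about a hundred million. Bump ptm_sites to 16 and the same function passes six billion, so the answer depends almost entirely on how many sites you assume can vary independently.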
Uh oh! Gonna need this (couldn't find a T.A.R.D.I.S.)
This is a paper we can blame for this idea! 2006!?!?
Check out this nugget!
Here in 2006 we have a hint of the number. Not considering alternative splicing events -- just PTMs, we might have millions of distinct proteins floating around. Quick -- those people at Northwestern talk about all the proteoforms -- grab a quick paper from them and see who they blame.
Wow! Okay, I'll come back to 21 Tesla ETD top down. 'Cause that sounds great -- I love doing fragmentation at 1 Hz...
I need to borrow some more equipment!
To get me back to 2009 -- to this nice open paper!
This tackles the topic head-on. Showing how even in canonical proteins like GAPDH, there are multiple proteoforms performing multiple functions.
But even this paper doesn't drop the --
Okay -- I'm gonna have to keep digging. The evidence is here -- and somebody totally was first to do the math and drop the B-bomb.
EDIT (1/5/17): Found it! Had to go to the Proteoform man, himself!
In this beautiful (and Open Access!) paper from 2012, we're going to run into this here figure.
...and this makes an awful lot more sense, right? Heck, I'm impressed to find out there are 4,000 cell types. Now...it is fair to consider that each cell type isn't going to necessarily have a completely different protein profile...but it makes sense that we'd see different proteoforms in some of them. And..I'd like to point out one of my favorite papers from 2015.
59!! Proteoforms of Ovalbumin!!