Tuesday, September 30, 2014
Sure we have our choice of a bunch of stuff out there, but how many can we use that will actually catch our native interactions? This one will! (New paper in press at MCP and you can get it, for now, here!)
Now...it isn't without caveats. The analysis here is MS3 based, so you probably can't do this with a QE (though....in source CID? Maybe...tricky....and would require the QE API and custom software..the more I think about it the more possible it seems, though this confidence may come from espresso!)
If you are interested in crosslinking though, you should absolutely check this out. It's really smart and can be applied to all sorts of systems. Check out how the MS side of things works:
This is similar to the crosslinkers lots of people use that come from UVic. The difference is that our crosslinked peptides are revealed by specific MS2 shifts. If you can recognize those delta masses, then you grab those for MS3. MS3 then reveals your peptide IDs. In this way you can selectively enrich for your crosslinked groups.
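The pairing logic at the heart of that trigger step is simple enough to sketch in a few lines. Everything below is a toy: the 31.9721 Da delta, the tolerance, and the peak list are made-up stand-ins for illustration, not values from the paper.

```python
# Sketch: flag MS2 peak pairs separated by a characteristic crosslinker
# delta mass, which would mark candidates for MS3 selection.

def find_delta_pairs(mz_list, delta, ppm_tol=10.0):
    """Return (low, high) m/z pairs whose spacing matches `delta` within ppm_tol."""
    peaks = sorted(mz_list)
    pairs = []
    for i, low in enumerate(peaks):
        target = low + delta
        tol = target * ppm_tol / 1e6
        for high in peaks[i + 1:]:
            if abs(high - target) <= tol:
                pairs.append((low, high))
            elif high > target + tol:
                break  # peaks are sorted, nothing further can match
    return pairs

# toy spectrum with two pairs spaced by an assumed 31.9721 Da delta
spectrum = [300.1593, 332.1314, 415.2105, 447.1826, 620.3341]
print(find_delta_pairs(spectrum, 31.9721))
```

A real implementation has to run on the fly in the instrument control software, but the recognition step is this simple at heart.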
Monday, September 29, 2014
I just received this image from a reader along with the subject line I used for this blog post (thanks Chris!). He said that he saw a low convectron gauge warning and no spectra (you'd think some intact glycos from chitin would come through at least...)
Twice in about a week, two insects attempt to get into mass spectrometers? I'm not usually paranoid, but I smell a conspiracy....
Friday, September 26, 2014
Okay, this is pretty cool.
As proteomics and metabolomics have gained speed over the years, people from all walks of life have been sucked into the field. In a lot of places, good Analytical Chemists are now running the proteomics facilities. I love visiting labs like this because you always know that there are going to be good quality control standards. The weakness for these labs is in the biological side. Going the complete opposite direction, my biology brethren who have been pulled, perhaps unwillingly, into running instrumentation have a great idea of what to do after the instrument spits out a protein list, but we are probably the worst group at running controls and standards (sorry, but its true).
Uniformly, though, we all suck at statistics. Again, I'm just being honest. The best labs I know of have all three: chemists keeping everything under control, biologists doing validation, and a bioinformatician in a cubicle making sure that the list in the Supplemental data is statistically valid.
If you're a smaller facility of just biologists or chemists, sometimes it's really hard to make sense of your data in a robust way. AccuraScience hopes to fill that gap. This is a group of mercenary (by that I mean "for hire", no implications here, it just makes a better title) bioinformaticians somewhere out in Iowa who can fill that last step for you. My guess is that it is probably a whole lot cheaper than hiring your own bioinformatics guy or gal (if you can find one...so in demand these days!!)
You can check out AccuraScience here.
This is a request for you guys!
I'm trying to keep a running list of all the proteomics conferences and meetings in the world. This is partially to be useful and partially for selfish reasons.
1) (Useful) This is useful for everybody in the field. Google "Proteomics conferences world-wide" and get a list of the meetings out there. (Google loves me and this blog)
2) (Selfish) I get a decent amount of vacation time for an American. And I have a lot of airline miles since I'm on a plane virtually every week. If I know about your conference, maybe I can go. I have a great time traveling alone, but sometimes it's fun to travel for a reason. Yes, I'm saying I would possibly take vacation time and fly to your state/province or country to go to your meeting on my dime. I'm still super bummed out I missed a cool conference in Vienna in August for some training I had to do.
But for any of this to happen I need to know about your meeting. Big or small. Leave a comment in the section below or email me: firstname.lastname@example.org and I'll add it to the list.
Thursday, September 25, 2014
Okay, so do you see a theme developing? Everybody is working on co-eluting peptides right now. Or...maybe I'm paying a lot of attention to it for secret reasons all my own.
I am very excited about the potential of MixGF and the DeMix algorithms I discussed earlier. Realistically, however, I get about 2-3 hours/week, if I'm lucky, to work on my own research these days. This typically amounts to 1 or 2 phone calls with collaborators where we divvy up the "to-do" list, I load up a bunch of runs on my new Proteome Destroyer and I remotely send spreadsheets from whatever hotel I'm staying in that week. (Surprisingly, this is somehow productive. This is looking like it might be the most prolific year of my career...though that really isn't saying much....) Realistically, there really isn't much time for me to learn new workflows right now.
That is why this is so exciting for me. I already have MSAmanda. I already know how to run MSAmanda. I've done tons and tons of benchmarking with MSAmanda and I know that I can trust the data that comes out of it (no offense to the other new algorithms! Both workflows published well and come from groups I respect a lot, but I have hard data from this one that I generated myself.)
Viktoria presented this poster this week at APRS and the results look great. It is still in its early stages, but there may be a nice, easy solution for chimeric spectra just around the corner for those of us that are addicted to a certain commercial proteomics package.
And the results look pretty great. This is an extraction of 0.1ug of HeLa digest from QE data using a 3Da isolation window.
Even with a relatively small window, the second search leads to a big boost in PSMs.
Wednesday, September 24, 2014
Every once in a while we get an MS/MS spectrum that matches more than one peptide in our database. Maybe the two sequences have exactly the same amino acids, but the order is different, or maybe two combinations of different amino acids will give you extremely similar masses. With high resolution / accurate mass measurements at the MS1 level this is a whole lot rarer than with lower resolution devices, but it can happen.
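To make that concrete, here's a minimal sketch (monoisotopic residue masses rounded to 5 decimals, so treat them as illustrative): two peptides with the same residues in a different order have exactly the same precursor mass, and even different compositions can collide, since two glycines weigh essentially the same as one asparagine.

```python
# Monoisotopic residue masses (Da) for a few amino acids -- enough to show
# why different sequences can share one precursor mass.
RESIDUE = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'N': 114.04293,
    'L': 113.08406, 'I': 113.08406, 'V': 99.06841, 'E': 129.04259,
}
WATER = 18.01056  # add once per peptide

def peptide_mass(seq):
    return sum(RESIDUE[aa] for aa in seq) + WATER

# 1) Same residues, different order: identical mass, indistinguishable at MS1
print(peptide_mass('GAVE'), peptide_mass('VAGE'))

# 2) Different compositions, near-identical mass: Gly-Gly vs a single Asn
print(RESIDUE['G'] * 2, RESIDUE['N'])
```

This is exactly why the fragment ions matter: MS1 can't break these ties, but the MS/MS ladder usually can.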
MixGF is a probability algorithm from some guys at the proteomics powerhouse UCSD. Description of the algorithm is currently in press at MCP and you can read it here.
In the introduction to this paper, the authors introduced me to a very scary statistic (and the references to back it up). 50% of "isolated" peptides for MS/MS are (at the very least) coisolated. Sure, I knew the number was high, but 50% of what we're looking at and assuming is a single peptide is actually 2 or more peptides, with signal high enough to be statistically relevant to our scoring? The more I think about it the more it makes sense, but I do find it scary.
MixGF is a program that attempts to turn this seeming disadvantage into an advantage for us. What if we could identify the co-isolated peptides?
A similar analysis program, with a similar name, was recently released. When two or more top end labs work on tackling the same problem, you know it's a big one. I don't have time to really dig through the math from the two to tell you the differences, but it sure puts some perspective on this issue. If we ignore it, we're missing a lot of information. If we tackle it head-on, however, there may be lots to gain!
Fortunately, there is light at the end of the tunnel here. This study finds a much lower level of statistically relevant coisolated peptides than the studies they cite, estimating around 30% for the digests that they analyze. (This would match surprisingly well with the running averages I have for most Proteome Discoverer analyses...simply by averaging the "coisolation" column on big datasets.)
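That running average takes about three lines. The interference values below are invented, and the Proteome Discoverer column I have in mind is named something like "Isolation Interference [%]", but check the export from your own version.

```python
from statistics import mean

# Invented values standing in for a Proteome Discoverer PSM export column
# (named something like "Isolation Interference [%]" -- check your version).
interference = [12.0, 45.5, 0.0, 68.2]

avg = mean(interference)
frac_over_30 = sum(v > 30 for v in interference) / len(interference)
print(f"mean interference: {avg:.1f}%")
print(f"fraction of PSMs over 30% interference: {frac_over_30:.2f}")
```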
And they also show that the use of MixGF leads to a great big boost in peptide ID numbers! Great study. Let's all get thinking about these things (and if you want to write a free node for Proteome Discoverer I'll do what I can to help you out!)
Monday, September 22, 2014
EDIT: Just to clarify, I put this title in to be funny. When I saw the relatively huge number of hits this entry has received in the past few days, I figured it would be a good idea to throw in this edit. LC-MS/MS proteomics is obviously not going anywhere. It is worth thinking about, however, that this isn't the only way we can do proteomics. Protein arrays are gaining speed and improving in quality every day, and this method for amplifying DNA-tagged proteins is very very interesting for proteins that we can tag.
Genomics techniques are more sensitive than proteomics techniques because DNA can be amplified. PCR didn't get the Nobel for nothing.
For years people have been working on "something like PCR" for proteins. Because that is really what we need to get down to the bottom of the dynamic range pit. But proteins don't like to amplify.
So what if you just tag the protein with a DNA barcode? You make the protein have a little specific DNA tag, then you use all the amazingly sensitive and specific tools the genomics people have been using forever to 1) concentrate the proteins with matching tags and 2) amplify that tag. You'll know what protein is present and how much you started with!
The authors of this terrifying paper demonstrate that this approach works on both large and small studies and describe the benefits and limitations of their techniques. One of the limitations? It would be tough to DNA barcode every human protein in a living person. For now this technology is limited to small well-known experiments. So maybe LC-MS based proteomics will survive, but this approach (if its valid...getting too skeptical about what I read....) sure is an interesting new tool to put in our belts!
By the way, I'm really sleepy (long day of traveling thanks to high winds on the coast) and I'm just being alarmist. You can check this paper out yourself here.
Friday, September 19, 2014
I LOVE this paper.
I've been trying to write this blog post for 2 days but hadn't been able to find a second to put it down around the 12 hours/day I put into my real job. As it is, this is still going to be really short, cause I gotta hit the road!
Measuring peptide concentrations is tough. Have you looked at the CVs you get from peptide quan when using a nanodrop?
In most cases, it's the best that we have, and it sucks. Most people resort to tricks like doing the protein concentrations and then assuming that they will always see the same loss in the digestion for every sample. My old method was to oversaturate my desalting tips that had a maximum capacity of 5 ug of peptides and assume that the error bar on that maximum capacity wasn't through the roof. Both techniques are probably valid, but wouldn't it be awesome to know for sure?
This is where the Proteomics Ruler comes in. A trick that genomics people use is cell counts based on DNA concentration. The amount of DNA/cell is reasonably constant. You have X number of chromosomes per cell and X percentage of these cells are dividing at any point in time.
The Proteomics Ruler takes this concept one step further -- histones are proteins that associate with DNA at a basically constant ratio....so, if you know the concentration of your histone peptides, then you should be able to derive how much DNA is there AND how many cells your peptides came from. Brilliant, right?!?!?! And now you basically have an internal control on every cell line.
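As I understand it, the arithmetic boils down to one line: a protein's share of the summed histone signal, scaled by the DNA mass per cell and Avogadro's number, gives copies per cell. The numbers below are illustrative toys, not values from the paper.

```python
# Sketch of the ruler logic: total histone MS signal stands in for a known
# DNA mass per cell, which anchors every other protein's copy number.

AVOGADRO = 6.022e23
DNA_PG_PER_DIPLOID_HUMAN_CELL = 6.5  # ~6.5 pg; depends on genome size

def copies_per_cell(protein_intensity, histone_intensity_total, protein_mw_da):
    """Estimate protein copy number per cell from label-free intensities."""
    dna_grams = DNA_PG_PER_DIPLOID_HUMAN_CELL * 1e-12
    moles_per_cell = (protein_intensity / histone_intensity_total) * dna_grams / protein_mw_da
    return moles_per_cell * AVOGADRO

# toy numbers: a 50 kDa protein whose signal is 2% of the summed histone signal
print(f"{copies_per_cell(2e8, 1e10, 50_000):.3g}")  # lands in the ~1e6 range
```

The nice part is that nothing here requires a spike-in: the histones ride along in every lysate for free.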
Here is a screenshot I quickly grabbed from the paper!
As you can imagine, they validate the heck out of this approach. They compare it to one of their complex SILAC spike in experiments and find that once they normalize everything out you can get quan nearly as good with label free as with SILAC.
This paper is currently in press at MCP and open access here. It is one that everyone should read and that I should read more in depth.
Wednesday, September 17, 2014
I get this question all the time: "Can I do targeted quan on my LTQ Orbitrap?"
Answer? Of course you can. If you know what you're doing there isn't much you can't do with an LTQ Orbitrap. There is a major limitation to the targeted stuff, though.
You can have maximum sensitivity and selectivity OR you can look at a ton of targets at once. You can't do both. Tara Schroeder said "you can have breadth OR depth," which would be catchy, but I have trouble saying it out loud. Seriously! Try saying it 3 times fast.
If I want sensitivity on the LTQ Orbitraps the LTQ is the way to go. It is simply more sensitive. The ions are coming in the front and get there first. It also uses detectors where the ions make physical contact. The LTQ is also faster than the Orbitrap. This is less of an issue with the speedy Orbitrap on an Elite, but is very very obvious on a Discovery, XL or Velos instrument.
We can also get a different kind of selectivity. We can do SRM/MRM and then do our quan on the fragment ions!
First select a generic MS experiment (don't do data dependent): Also, click to expand pics.
For scan event one you get two choices.
Full scan for targets that vary a lot by mass.
SIM scan for more similar targets and sensitivity. You can also have multiple SIM ranges.
For the MS/MS you have a couple of options. If you check that little box at the bottom, "Use MS/MS Mass List," this is no longer a data dependent experiment. This becomes akin to fragmentation on a QQQ instrument. We're gonna fragment everything within the ranges specified. Going to the Mass Lists tab at the top of the screen will give you a new option under this tab and a new function for the table.
It's called (+) MS/MS l. Why? Who knows.
The important part is that you get to put targets into this table at the bottom AND you can time them. Here I'm targeting two peptides with 2 different collision energies. Since I have 1 event set up and I only have 1 event timed at a time, this is perfect. The instrument will do my MS1 scan followed by a fragmentation of anything in the range of my parent mass, then go back to MS1. If we need to look for more than 1 target at a time, we'll get MS1 followed by MS2 of target 1 then MS2 of target 2.
Small target lists are best here. We shouldn't look at more than 10 targets at a time under most circumstances. In the tune file you can start by matching your fill times to your scan times (say 100ms for earlier instruments, more like 80ms for the LTQ Velos systems). If you need more sensitivity, crank up that fill time. 200ms gives you twice as good a chance of getting good MS/MS fragments as 100ms. The best part is that you don't use the extra fill time unless you need it. For target values you'd be fine anywhere from 5e3 through 1e5, but I'd probably start at 5e4.
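A quick back-of-envelope shows why the small target list matters: every added target stretches the cycle and costs you points across each chromatographic peak. All numbers below are illustrative assumptions, not instrument specs.

```python
# Back-of-envelope: cycle time and points-per-peak for a targeted method
# that does one MS1 followed by one timed MS2 per target, then repeats.

def points_per_peak(n_targets, ms2_time_s=0.2, ms1_time_s=0.1, peak_width_s=20.0):
    """Rough data points across a chromatographic peak for a given cycle."""
    cycle_s = ms1_time_s + n_targets * ms2_time_s
    return peak_width_s / cycle_s

for n in (1, 5, 10):
    print(n, round(points_per_peak(n), 1))
```

At 10 targets you're already down to single digits of points per peak with these assumed times, which is about the floor for decent quan.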
Does that cover it? Again, there are more ways. We have a huge amount of flexibility with an LTQ Orbitrap, but this ought to get you started on one way of using it.
Tuesday, September 16, 2014
It took me a second to realize what I was looking at... I confirmed that the cylindrical thing on the far right is an ion transfer tube.
Thanks to Karl for sending me this!
Monday, September 15, 2014
Soooo...about a year ago someone asked me if I could help them do NeuCode intact protein quan. And I said...yeah...maybe if you gave me a year or two.
You remember NeuCode, right? Like SILAC but with tiny shifts in the mass labels. How tiny? How bout you can't see them until you get to 100k resolution, and some can't be resolved until you nearly reach a half million resolution. More resolution reveals more and more channels and allows crazy levels of multiplexing.
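A rough way to see why those resolution numbers get scary: the resolving power you need scales as m/Δm. The 6 mDa spacing below is an assumed round number for illustration; real NeuCode spacings vary with the label pair.

```python
# Rough resolving power needed to split a NeuCode-style doublet.
# R ~ m/delta-m is an optimistic floor; real separation needs somewhat more.

def resolution_needed(mz, delta_mda):
    """Resolving power to distinguish two peaks delta_mda (milli-Da) apart."""
    return mz / (delta_mda * 1e-3)

for mz in (400, 600, 1000):
    print(mz, f"{resolution_needed(mz, 6):,.0f}")
```

For a 6 mDa split, a peptide at m/z 600 already demands ~100k, which lines up with why you can't see these channels on a lower resolution scan.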
It makes sense to apply tiny shifts like this to proteins, cause an intact protein would carry tons and tons of these tiny shifts and you wouldn't need so much resolution to separate the channels. There is a big problem, though. You wouldn't know the total shift unless you knew the protein. A protein that picked up 100 labels would have a different "heavy" mass than one that picked up 110.
The processing would be an absolute nightmare. Honestly, I had no idea where to start. This, by itself, could be a damned good Ph.D. project and really, could honestly only come out of one of the very best protein bioinformatics labs.
Wisconsin got there first. And this is every bit as complicated and elegant as I figured it would be.
You can read about this in this month's MCP (link to abstract here.) For those of you (like me) without an MCP subscription who don't want to wait until your library sends it to you (like me), you can read all about the technique on Lloyd Smith's website here.
I'm going to brazenly steal a couple images, but it is explained there better than I can, I just want to impress upon you how freaking smart this is.
So we label and do our intact analysis, but part of our sample goes to RNA-Seq? What for? Well, for one, they only want to search for proteoforms that are actually present. And they get those from the variant call file? What else do they get from it?
I really need to sit down and dig through this paper when the library gets it to me, but damn...it seems really freaking smart. Definitely check it out!
Sunday, September 14, 2014
How did I miss putting this post up? I meant to write this quite a while ago.
Anyway, the Kelleher lab is now giving away fully functional Prosight for single protein top down analysis for free. It is called Prosight Lite and you can get it here.
The video below has more details.
Saturday, September 13, 2014
I have to admit, on my first glance through this paper early this morning it made me drop a bunch of profanities in my living room. Then I moved down to the figures and stared at the ceiling for a while and realized that this has some merit and may just be a way of looking at MS/MS spectra that I've never considered, and that maybe I was just being sleepy, under caffeinated, and a little old fashioned. BTW, if you know me you know I pretty much drop profanities all the time. (Recent uncited studies circulating Facebook have suggested that people who swear a lot may be more honest...just saying...aaaannnnnddddd....Snopes says this is another viral silly thing...fucking skeptics...nevermind.....)
So what is the idea behind JUMP? It is that we may be able to make a positive ID of a peptide from an MS/MS spectrum by looking for specific mass tags without requiring the comprehensive b/y ion distribution we need for a traditional match. JUMP is even capable of helping to make confirmations based on tags of individual amino acids present. Yes, I know, this is what made me start cursing, but let's take this apart.
Let's say I selected a doubly charged peptide for fragmentation in my Q Exactive and it had a mass per charge of: 413.232
And I had a peptide in my database that matched that: AHVETLR
How much information would I really need to say that this m/z is this peptide? I've got high res accurate mass, that has to narrow it down some. Could I confirm it was that peptide if I only had b ions? My hunch is that most engines wouldn't score it super well, but they would probably make a match under most conditions if we had all of the b's.
What if I just had the TLR as a dominant ion (389.25)? Is that enough for a confirmation? What are the odds of that occurring at random? Roughly 25*25*25 (leucine=isoleucine), so 1 in 16k. It would happen a bunch of times in a complex proteome, but how often would it perfectly coincide in fragmentation with an ion of this mass range? Probably pretty rarely.
Okay, so I'm less mad about this. Single peptide tags? That I'm still more than a little uneasy about. 1 in 25? But I see what they are getting at.
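For what it's worth, the back-of-envelope odds can be sanity-checked in two lines. Note that a 25-letter alphabet is generous; with the 20 standard residues (19 once leucine and isoleucine are merged) a chance match is actually more likely than 1 in 16k, which is part of why single-residue tags still make me uneasy.

```python
# Chance that one specific tag of a given length appears at random, assuming
# residues are drawn uniformly (they aren't, so treat these as rough floors).

def random_tag_odds(tag_len, alphabet_size):
    """Return N such that a specific tag occurs about 1 time in N."""
    return alphabet_size ** tag_len

for alpha in (19, 20, 25):  # 19 = standard alphabet with Leu/Ile merged
    print(f"1 in {random_tag_odds(3, alpha):,} with a {alpha}-letter alphabet")
```

Either way, a 3-residue tag plus accurate precursor mass is a much stronger constraint than the raw odds suggest, since the tag also has to land at the right fragment mass.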
The authors run through all the requisite statistics in the paper, FDR plots with various metrics, and demonstrate how many more peptides they get than the existing, accepted engines. They ought to; I often tell Proteome Discoverer "if that MS/MS spectrum has <10 peaks with a S/N of 5, throw that spectrum in the garbage". And JUMP would make identifications off of lots of those that I throw away.
Interested in a new way of thinking about MS/MS spectra and matching them to your proteome? You should check out this paper in press at MCP.
P.S. Did the math that helped lower my blood pressure with the always awesome and reliable Protein Prospector.
Friday, September 12, 2014
Oh, E. coli, I've missed you! Since I left my happy little microbiology department at Virginia Tech and stumbled around the field of human proteomics, the world has gotten so much more complex for me. Not to say there isn't complexity in microbiology! But I sure do prefer thinking about organisms with one big circular piece of DNA rather than exomes and introns and whatever. Let's just do: this is a gene and it makes a protein under this condition. I see a TATA box!
Obviously, though, E. coli hasn't given up all its secrets just yet. StepDB is a project that takes a swing at solving a few. This project aims to reveal the sub-cellular localization of proteins in our favorite model organism. How'd they do it? Through an intensive bioinformatics analysis that involves a lot of predictions and previously existing data from proteomic and genomic studies. Using this they are able to break E. coli proteins into 13 subcellular categories.
A lazy person would probably just do K12, but StepDB has information on the K12 and BL21 strains and even has a tab down at the bottom where you can compare the two. This'll give you a rough measurement of the consistency of these predictions for when you check the homology of the protein from your organism of interest.
You can read more about StepDB here (currently in press at MCP)
You can actually visit it here.
Thursday, September 11, 2014
Did you go to ASMS and come back thinking "man, I totally have to set up this wiSIMDIA thing?"
I did. And then I spent a couple of hours looking at the interface and came back empty handed.
There is not currently a wiSIMDIA button. There may be one day, but until that day comes you have to set up wiSIMDIA manually.
It isn't the easiest thing in the world to do, unfortunately.
Want to set it up? Follow this link (apologies to you guys with FTP access blocked; email me and I'll send it to you when I have time).
The easiest way to do it? Simply upload the wiSIMDIA quad isolation window.
Want to set it up yourself? Follow the directions in the wiSIMDIA Powerpoint.
Monday, September 8, 2014
How do we normally figure out protein-protein interactions? I think, normally, it goes this way: Grad student studies one protein and over the course of his/her Ph.D., he/she identifies 1-10 proteins that aid the protein of interest in that protein's function. Ingenuity and Protein Centre collect the publications and add these links to their databases.
This approach is extremely thorough, and all of a sudden we have a Ph.D. level scientist out there in the world who knows this protein and its interactors inside and out. This person is a resource that couldn't really be replaced by anything. There are down-sides, however. The first is that this approach is slow. The second is that not every study can be reproduced by everyone. Experimental designs vary a lot over time. The third is that we tend to bias the proteins that we are studying to whatever the funding agencies happen to find interesting at any particular point in time. Not exactly the most democratic method of expanding our knowledge of the Universe.
So...what if someone were to systematically take human proteins and, using the same exact method, just go right through and map what proteins they interact with? That would be immeasurably valuable, right? What if this study isn't even published yet, and this group has already mapped over 27,000 protein-protein interactions? Would that be enough for you to Tweet "Holy Shit!" on Twitter?
In case you hadn't guessed, this resource is real. And it is amazing! The bulk of the work appears to stem from the Gygi lab and Biogen IDEC in Cambridge. They have been using HA-Tags (an awesome, relatively easy to pull down glyco tag from the flu virus) and have been just going through proteins and mapping who that protein is friends with.
The website says that the current dataset will be continually updated -- until they finish the ORFeome! What's that mean? They aren't going to stop until they catch them all!
Ugh...I know....it did seem funny for a second...it's like 4am, give me a break....
Anyway! You can check out this awesome resource here!
Sunday, September 7, 2014
Around the 4th of July I was lucky enough to spend a few days working with Mak Saito and Matt McIlvin at the Woods Hole Oceanographic Institute. My background is in human proteomics. I've worked with some other mammalian samples as well -- some mouse, rat, and different monkeys here and there. I've also helped some people set up plant experiments and occasionally yeast, but that is probably what 90% of us are doing out there in the field.
So when I heard that this lab was taking ocean water and doing proteomics on that, I seriously wondered how that would possibly work. Turns out it not only works, but it works well enough that you can track all sorts of things, and well enough that those findings drop right into Science. Obviously, this kind of work is nowhere near trivial and has required adaptation of existing methods and technology, as well as the development of completely novel techniques.
In this recent paper, this group and their collaborators demonstrate the tracking of stress biomarkers from an abundant bacterium across different areas and depths of the Pacific Ocean. They demonstrate how these markers correlate with nutrient abundance levels to a degree that you could probably stop actually tracking the nutrients and just go with the quantitative proteomics (I repeat, of the ocean, lol!)
Super cool study that shows how much we can do when we take this technology out of the box we're used to keeping it in!
Saturday, September 6, 2014
I stumbled upon this over the weekend (thanks Twitter!). BRIMS has been making an effort to keep us up to date on advances in Biomarker research and there are tons of cool articles there. Most articles still link you to pay wall abstracts, but if you have subscriptions to these journals it helps you filter down to what is interesting.
You can check out BRIMS news here.
Friday, September 5, 2014
If you have been through this blog very much you know what a nutter I am about good statistics. This mostly stems from years of doing transcription microarray work that was almost impossible to publish following the Potti scandal. I'm probably just paranoid now, but I'm frankly pretty terrified something like this is going to happen to Proteomics.
Well, no bad statistical analyses should be slipping into Science any time soon. Cause Science just started the SBoRE (the Statistical Board of Reviewing Editors), and no one is going to sneak crappy stats past this journal again.
I guess the next question is -- when is everyone else adding one?
You can read more about SBoRE, and why it exists, here!
Wednesday, September 3, 2014
So you pulled down your protein complex and high resolution accurate mass spectrometry has identified all of the protein interactors. You follow up with pathway analysis and it all makes sense. All you have to do now is write it up and submit it to Science, right?
That's probably how it ought to work, right? Unfortunately, not everyone thinks that way, and we are often left "validating" these findings with mixtures of mouse and rabbit blood. Even these old fashioned (outdated...?) techniques have trouble backing up in vivo interactions.
You know what you need? KISS.
No, not these old weirdos...
This KISS: KInase Substrate Sensors, a new technique described in this paper from Brian Chait's lab.
In KISS, multiple antibodies and fluorescent tags can be used to validate that the proteins you said were interacting in vivo really are. Best of all? Metabolite X that you said was involved can be proven to be there as well.
Need to validate something soon? Check this out!