Wednesday, September 20, 2017

APOSTL -- A staggering number of Galaxy AP-MS tools in a user friendly interface!


Thanks to whoever sent me the link to this paper! That doesn't mean that I'll write about it, btw. However, if the paper leads me to an easy user interface that allows me to use a bunch of tools I've heard about, but all require Perlython or Gava or Lunyx or whatever to use otherwise, there's a pretty good shot!

This is the paper that describes this awesome new tool! 


As far as I can tell, bioinformaticians fall into a couple of different branches. You've got the hardcore computational camps that are writing all their stuff in Python, Perl or whatever. And you've got your more data science people who seem to be using either R or Galaxy. From my perspective, all the awesome tools have one common denominator when I give them a shot...


Okay...maybe 2 common denominators...


....joking of course! But it isn't uncommon for these shells or studios to require some extremely minor alteration or library installation that is a challenge for me to do. And honestly, as cool as all the stuff in Galaxy looks -- I don't even know where to start with that one.

And here is where we click on the link to the APOSTL SERVER, where all the Galaxy tools for affinity purification/enrichment MS experiments are located in a form we can all use!

APOSTL has a flexible input format. Workflows are already established for MaxQuant and PeptideShaker but it looks like you'd just need to match formatting to bring in data from anything else.


I don't have any AP-MS data on my desktop right now (and it sounds like it's still working on a big queue anyway) but I have some stuff that needs to go into this later. I'll let you know how it goes.


...and sometimes Matthias Mann uses a picture from your silly blog!!!


One of the downsides of my current job is that I have to miss my favorite conference, HUPO. Fortunately, though, loads of really cool people are there and I've been able to keep on top of what is happening via Twitter.

...and...yesterday I got this picture that made me feel included and also made me laugh a lot!!

Tuesday, September 19, 2017

Two cool new (to me?) tools I somehow missed (?)

I'm leaving these links here so I don't forget them (again?). They've both been around for a while, and I'm wondering if I forgot or if our field just has a lot of software!

You can check out MZmine 2 here. (now I can close that tab -- WAY too many are open!)



And Mass++ is here. P.S. PubMed thinks + signs mean something else and doesn't like searching it as text.

Both are free -- look super powerful -- and are waiting on my desktop for me to bring my PC into a hyperbolic time chamber.

Let's plunder some .PDresult files, matey!

(Wait. There's a guy on our team who dresses as a pirate?!?)

As I'm sure you're aware, it's talk like a pirate day. You can go two ways with a holiday like this as a blogger. You can ignore it completely OR you can make a sad attempt to tie it in with what your blog is about. I, unfortunately, chose the latter.

Recently, I've been helping some people with some impressively complex experiments. The days of "how many proteins can you identify" are just about gone. The days of "how does this glycosylation even change globally in relation to this phosphorylation event and how the fourier are you going to normalize this" Arrr upon us. 

The Proteome Discoverer user interface has gotten remarkably powerful over the years. However, I imagine the developers sit back and have meetings like -- "we have this measurement that is made as a consequence of this node's calculation, but I can't imagine any circumstance where someone would want it." To keep from overwhelming us with useless measurements, they don't output some of them.



.MSF files and .pdresult files are really just SQLite databases in (pirate? groan....) disguise. DB Browser for SQLite uses virtually no space and can pillage these files to reveal all the behind-the-scenes data.

For this reporter quan experiment, I can get to:  


78 different tables! Add more nodes into your workflow and there are more! You can get in and pillage the files for tables you can't access otherwise.

Is this useful to you? Maybe if you're doing something really weird. If the weird thing you are doing is really smart, you could also make a suggestion to the PD development team to include it in the next release.  In the meantime, maybe this will do, ya scurvy dog (ugh...sorry...)
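If you'd rather plunder programmatically than point-and-click, Python's built-in sqlite3 module can do the same thing DB Browser does. A minimal sketch -- the file name below is a placeholder for your own .msf or .pdresult file, and which tables you'll actually find depends on your workflow:

```python
import sqlite3

# Placeholder path -- point this at your own .msf or .pdresult file
con = sqlite3.connect("MyReporterQuanStudy.pdresult")

# Every table hiding in the file lives in sqlite_master
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(f"{len(tables)} tables found")

# Peek at the column names and first few rows of each table
for name in tables[:5]:
    cols = [c[1] for c in con.execute(f"PRAGMA table_info('{name}')")]
    print(name, cols)
    for row in con.execute(f"SELECT * FROM '{name}' LIMIT 3"):
        print("  ", row)

con.close()
```

From there it's one line to dump any table you can't reach through the PD interface into a CSV or a spreadsheet.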

Monday, September 18, 2017

Metabolic labeling for tracking acetylation dynamics in human cells!


I was going to call this a "new strategy" in the title, but I wasn't 100% sure on that fact. I just know that I've never seen anything like this!



The strategy involves using heavy labeled acetate and glucose -- metabolic labeling techniques normally reserved for advanced metabolomics experiments -- but here they are used to track acetylation in the proteome!

One of the advantages of using the heavy labeled acetate media is that you pretty much know where that is heading -- through Acetyl-CoA to protein acetylation (sure it can go other places, but that's where proteins are going to pull from) -- the first observation this awesome study provides is -- HOLY COW HISTONE ACETYLATION is FAST!

Not phosphorylation fast -- but still freaky fast. Turnover, so de-acetylation back to re-acetylation in around an hour. Maybe that doesn't sound fast until you remember what histone 3D structure looks like.

(Borrowed from MB-Info here)

Histones aren't just hanging around linear in the cytoplasm where their modification sites may be easily accessed by acetyltransferases. They're balled up and complex, and I'd expect alterations to happen on a more glacial time scale because of it. Sounds like whatever the acetylations are doing in there is seriously important, because a lot of energy is being used to get them in and out of there.

Super cool new (at least to me) methodology and seriously interesting biological implications make this a great Monday morning read.


Sunday, September 17, 2017

What 2D peptide fractionation technique yields the highest number of IDs?


The field of proteomics has changed at a dazzling rate in a remarkably short period of time and it's an absolute challenge to keep up on instruments, software and methods. Separation science is also evolving and it's yet another factor to try to keep up on.

During my postdoc, methods for in-solution isoelectric focusing of peptides were the cutting edge, and that's what I used for all of my 2D fractionations. 7-8 years later...not so much...I've seen one paper that used this separation methodology this year -- and I do watch out for it.

What does the current literature say about the best offline fractionation techniques for shotgun proteomics?  I know a lot of my friends are using high-pH reversed phase offline fractionation. But -- are they doing it for the same reason I was using IEF? Cause it's the cool thing right now?

(Did you know you can install this button in your browser window in Chrome?)

The first paper Scholar directed me to is this short review on the topic:


For studies where they have lots of sample, >10ug total peptides, these authors report dramatically higher peptide IDs (80% more peptides) when using offline high pH reversed phase fractionation rather than offline SCX. Interestingly, they report that the desalting step post-SCX is a major point of sample loss, with up to 50% lost in their typical protocol.

However, they do find online SCX (MudPIT type methodologies) more sensitive when they are limited to less than 10ug of total peptides.

This review spends a lot of time stressing the importance of concatenation techniques. Its one shortcoming is that it doesn't give me much to work with in terms of technical details -- resins and gradients and so forth.

However, it appears that all of these details can be found in this study:



The second paper Scholar directs me to is a phosphoproteomics one --


I know this group has been primarily using high pH reversed-phase (which they usefully abbreviate HpH), but I hadn't seen this technical note.


In terms of phosphopeptide identification, the work seems quite clear cut. Wow. Does the concatenation ever look like a pain in the...


Okay -- these studies have all compared HpH to SCX. What about isoelectric focusing (IEF)? Scholar?

First study that pops up doesn't have a very pro-IEF title...


I have to say, however, that the results aren't quite that drastic. HpH does outperform IEF in every way, but it isn't a night and day comparison. Interestingly, the overlap isn't huge between the two techniques. They are sampling a pretty small percentage of the proteome (<3,000 proteins), so they're getting some interesting variations based on stochastic sampling of the whole.

This paper really goes beyond just a comparison of these two techniques. There are some really insightful charts showing peptide/protein distribution and relative protein coverage. Not that the other papers I link in this post aren't worth checking out -- but if you are just interested in peptide chemical properties -- this is a seriously interesting read.

Whoa...I've read too many papers for a Sunday morning. Time to get out and do something!

(No -- Ben -- do not investigate the impact all this concatenation has on LFQ....do that later...)

HUPO 2017 Kickoff!!

SUPER JEALOUS of everyone at HUPO 2017 in Dublin. Definitely my favorite conference, and the second year I haven't been able to go due to how crazy September is for my day job.

If you are thinking to yourself -- "I probably won't Tweet this. No one will read it anyway." You're wrong!

Saturday, September 16, 2017

msVolcano -- Visualize quantitative proteomic data!


Want to visualize any quantitative proteomic data out of any platform or processing pipeline? Want that output to look like a scientist did it?

Check out msVolcano! 


This should not be confused with Ms. Volcano, which I just discovered is a pop song that I could not handle in its entirety...shudders....

While optimized for MaxQuant output, it sure looks to me like you just need to move some columns around if you are using Proteome Discoverer or other software packages...but I haven't verified.
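If you wanted to try that column shuffling, it's a one-off script. A hypothetical sketch with Python's stdlib csv module -- every column name here is a placeholder, since I haven't verified which headers msVolcano actually requires:

```python
import csv

# Tiny demo file standing in for a real Proteome Discoverer export
with open("pd_export.txt", "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows([
        ["Accession", "Abundance: F1: Ctrl", "Abundance: F2: Bait"],
        ["P12345", "1200000", "5600000"],
    ])

# Hypothetical mapping from PD-style headers to MaxQuant-style ones --
# check msVolcano's docs for the headers it really expects
column_map = {
    "Accession":           "Protein IDs",
    "Abundance: F1: Ctrl": "LFQ intensity Ctrl",
    "Abundance: F2: Bait": "LFQ intensity Bait",
}

with open("pd_export.txt", newline="") as src, \
     open("proteinGroups.txt", "w", newline="") as dst:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t")
    header = next(reader)
    writer.writerow([column_map.get(h, h) for h in header])  # rename known columns
    writer.writerows(reader)  # data rows pass through untouched
```

Swap in your real export path and whatever headers the tool wants, and the data rows come along for free.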


While it definitely seems valuable for any dataset, the authors stress the power this workflow has in terms of affinity purification and affinity enrichment experiments!


Friday, September 15, 2017

Make a histogram of up to 300 masses in 10 bins in 5 seconds!


Feel free to react like Miss Jones above because, yes, this is totally the laziest thing ever. However -- if you need to make a histogram in 5 seconds and you aren't real concerned about what people think about you or your computational skills -- I present the Social Science Statistics Histogram tool!


Put up to 300 masses in the box -- and BOOM!


Laziest mass distribution histogram of all time!

I made an assertion today regarding small molecule quan and a much smarter person challenged that assertion. This little tool proved in 5 seconds (maybe minutes to cut the data from the CSV file...that I opened in Excel first...which I hope elicits more eyerolls...) that I was completely wrong!!  I'd based my entire mental framework on what I know from HRAM peptide quan and one small molecule anecdote from a study I worked on last year.  Is there anything cooler than finding out you are completely wrong about something? It made my day for sure! Sure, there are 100 ways to make a better histogram, but I'd argue there isn't a faster one.
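And if the website ever disappears, here's an equally lazy fallback: an ASCII histogram of up to 300 masses in 10 bins, Python stdlib only. The masses below are random stand-ins for whatever you'd paste out of your CSV:

```python
import random

# Stand-in data -- replace with your own list of masses
masses = [random.uniform(150.0, 900.0) for _ in range(300)]

bins = 10
lo, hi = min(masses), max(masses)
width = (hi - lo) / bins
counts = [0] * bins
for m in masses:
    i = min(int((m - lo) / width), bins - 1)  # clamp the max mass into the last bin
    counts[i] += 1

# Print one row of #'s per bin -- the laziest possible bar chart
for i, c in enumerate(counts):
    left = lo + i * width
    print(f"{left:7.1f} - {left + width:7.1f} | {'#' * c}")
```

Still not pretty, but it runs offline and you can paste the output straight into an email.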

Mitochondrial Human Proteomics Initiative!


Doing mitochondrial proteomics? You're going to be excited about this new resource! 


How 'bout:
Overview of instrument methods from different vendor nanoLC and MS systems?

Step by step data analysis methodologies and rationale?

What about mitochondrial proteome reference datasets you can get from ProteomeXchange (here)?

That do anything for you?

How 'bout the power to kill a yak from 200 yards away with...mind bullets? Sorry...Reddit reminded me of this video and the song has been stuck in my head all week.

The construction of the mitochondrial reference pathways and data processing schemes alone make this a super valuable resource. The fact that you can check your instrument setup and essentially follow the established protocols and know what to expect from your data because of the RAW files they deposit -- that's skadoosh.




Thursday, September 14, 2017

Could FASP filter shape explain a lot of the reported variability for the technique?


Filter aided sample prep (FASP) has always been a bit of an enigma. There are people who use it for everything (me) and people who tried it a few times and have never done it again.

There was a visiting fellow in my old department who spent months prepping samples and bringing them to me for LC-MS, and I never identified a darned thing in his samples except my normal keratin background. When we finally reviewed the whole method with him we realized he was using the wrong spin filters. No protein was ever retained. It was a total bummer because his samples took foooorrreeevvveeer for him to get, and he lost most of his research time because of a wrong catalog number or something.

Never ever underestimate the value of a good biological QC for your sample prep methodology -- especially if it is really complicated! This also impressed upon me the need to match the filter to the method paper (or to just buy the pre-assembled kits!)

Turns out it's even more complicated than that! This team looks at different filter shapes and finds remarkably different numbers of peptides are retained depending on how the filter is constructed! 


They use interesting mouse and human cell samples and find what I've always seen -- FASP outperforms in-solution digestion markedly (they use ethanol crashed -- I'm going to go for acetone -- and not just because I love the smell). They also experiment with some buffers for their protocol, but the interesting thing is the difference in coverage between the flat bottom filters and the conical filters. Since they don't give away the answer in the abstract, I'm not going to give it away here either. Let's just say that the kits I've always bought in the past are on the right track.

The samples are run on a Q Exactive Plus and the data is processed both in PD 1.4 using Mascot and with MaxQuant, which is where all the LFQ measurements came from.

I really wonder how much these observations play into whether one researcher thinks FASP is a great idea -- or a dumb one.

Wednesday, September 13, 2017

The first "Zero Length" MS-Cleavable peptide crosslinker!


Crosslinking is taking off here in Maryland, thanks primarily to the DSSO crosslinker and the Xlink nodes.

Okay -- I got distracted. If you are interested in why there is a protein structural revolution going on -- thanks to hardware, crosslinker, and software advancements -- check out this awesome open review from Andrea Sinz et al.


I particularly like the illustrations showing how the crosslinking molecules fit within the protein or between protein species. (It's open access, you'll have to check them out for yourself).

When you do look at the current crosslinkers you'll notice they're kind of long. DSSO is like 10 or 12 Angstroms (please note, funny symbols over the letters are missing due to blog author ignorance/laziness -- we're going to go with the letter A) long and others are even longer than this.

10 A is great if you're looking to link things together that aren't super close together, but what if you could link things that are only in extremely close proximity?!?

BOOM!


A 2.6 A MS-cleavable crosslinker!  First off..."zero-length" does sound way cooler than 2.6 A. Let's check with Drake.


Now that this is put into perspective, I'm going to be serious for a second.

This is seriously powerful! First off -- these longer crosslinks can provide really valuable information on what proteins are interacting and when -- but can you imagine how much information DSSO plus CDI (the zero length reagent that performed the best) would provide?

DSSO -- this crosslinks
CDI -- it doesn't

Right there -- you have information on the relative proximity of the residues in terms of their 3 dimensional structure! Information that I wouldn't know how to ever get otherwise. Maybe bug one of those NMR or crystal nerds...?

Best of all? CDI is MS-cleavable and provides a pattern almost identical to DSSO. Which means all these new tools, including the automated instrument methods the manufacturer has developed AND the Xlink nodes, R package, and Xlink 2.0 server should all directly accept data from this reagent.

What an amazing time to be doing structural proteomics!!


Tuesday, September 12, 2017

An outstanding application of MHC peptidomics in Nature!

Image from this Wikipedia article and used in accordance with the Commons guidelines

The analysis of MHC peptides has been going on for quite a while -- most notably in cancer research, but we've seen this mechanism evaluated in other diseases as well.

This new paper in Nature is a great study showing where the cutting edge is right now!


I nearly used the title of the paper as the title of the blog post, but if I had I'd then feel like I should learn what all those words mean and thought better of it!

Instead, I'm going to stick to what I know here.

The authors use a series of genomics technologies but supplement them with studying the cell surface peptides using an Orbitrap Elite and processing the data with Sequest and PEAKS. They also do deep proteomics on the cells. They discover something that is really biologically interesting to them AND some Nature reviewers (...ugh...seriously, Ben...that's the best you can do here...?)

I'll be honest -- I really just wanted some peptidomics RAW files to test an idea out on this morning -- gotta get this thing queued up before I go to work -- and a new paper on MHCs that made it into Nature seems like a pretty good place to start!

The RAW files are on PRIDE/ProteomeXchange via
PXD004746
and
PXD005704

And they're OT Elite, so they're relatively small!

Monday, September 11, 2017

bioRxiv is going to get even better -- hypothes.is!

Look -- I hope the traditional peer-review process never goes anywhere.  It has some faults, but it is one of the central pillars of the scientific process and why it works.

However, I'm not extremely patient and I'm not exactly getting younger. These aren't the only reasons to love hybrid peer-review models like bioRxiv and f1000...but they are reasons... and I'm a huge fan of these new resources!

bioRxiv is having a pretty good year. Loads of submissions. Chan Zuckerberg foundation partnership and now -- annotation on submissions from hypothes.is!!

You can learn more about this here.

Sunday, September 10, 2017

Find out how much protein you have -- and how many copies per cell!!!


This new review is short and fantastic!!


I love it because it immediately clarified some things in my head regarding the application of the Total Protein and Proteomic Ruler approaches that can determine:

1) the absolute abundance of proteins in a mixture and
2) the copy numbers of each protein in each cell of your lysate

If I'm honest, I was confused by these two techniques -- when to apply where and with whom. This sets it all out clearly and makes it seem like something that can be easily applied to every LFQ dataset!
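The review covers the real bookkeeping, but the core idea of the histone-based ruler is simple enough to sketch: total histone MS signal is assumed to scale with the DNA mass per cell (roughly 6.5 pg for a diploid human cell), which anchors everything else. All the example numbers below are made up for illustration -- check the paper before trusting any output:

```python
# Back-of-envelope sketch of the histone-based proteomic ruler idea.
# Constants and example intensities are illustrative, not from the paper.
N_A = 6.022e23          # Avogadro's number, 1/mol
DNA_MASS_G = 6.5e-12    # approx. DNA mass per diploid human cell, grams

def copies_per_cell(protein_intensity, total_histone_intensity, molar_mass_da):
    """Estimated protein copies per cell via the histone ruler."""
    # Histone signal ~ DNA mass, so this ratio converts MS signal to grams/cell
    mass_per_cell_g = (protein_intensity / total_histone_intensity) * DNA_MASS_G
    # Grams/cell -> molecules/cell
    return mass_per_cell_g * N_A / molar_mass_da

# Hypothetical example: a 50 kDa protein whose summed MS intensity is
# 2% of the summed histone intensity
print(f"{copies_per_cell(2.0e7, 1.0e9, 50_000):.2e} copies/cell")
```

The appeal is obvious: no spike-in standards, just intensities you already have in any LFQ output.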

In case you were wondering -- yeah...I did order that ruler.... I had some half idea of posing one of my dogs with a meter stick...but couldn't find one anywhere. Ebay solved both problems at once. I'm going to call it the Pugeomic Ruler in honor of this awesome technique that I understand a whole lot better thanks to this great review!

Saturday, September 9, 2017

Re-annotating (is "fixing" too strong of a word?) genomes with proteomics!


I live in "Genome Alley" -- the sequencer is king here -- maybe it always will be.

I LOVE to see a paper where someone goes through and repairs incorrect genome sequences with proteomics data. It's not just that I'm petty -- it's that -- well...it's important to me to point out the FDR of these sequencing instruments. Sure -- they're amazing -- but they don't always get it right. And when you look at disagreements between the mass spec and the sequencer we're generally (if not always) right.

I like this new methods chapter because it makes genome reannotation seem a lot less daunting than some of these big papers and is specifically focused on prokaryotes!

Friday, September 8, 2017

NCI Cancer Data Commons!


LOTS of cool stuff is going on at the National Cancer Institute, thanks in large part to the Precision Medicine Initiative and the Beau Biden Cancer Moonshot.

The NCI Cancer Research Data Commons is another huge step in the right direction! Anything that helps make important data more accessible to researchers is a great thing in my book.

You can read more about it here (I'm just going to assume that proteomics is going to be integrated into this down the road, of course!)

Thursday, September 7, 2017

Improvements in the OmicsDI


Our online resources just keep getting cooler!!

Proof -- OmicsDI.org (where you can find information on all sorts of -Omics data and repositories) now has a "Watch Notification" feature.

It's like the Google Scholar Alerts setting.

If this is your organism of interest, you can get a notification every time someone posts something for it in a public repository!

In the example above, Yasset has set an alert for anytime someone posts anything on extremely high copy number proteins in cancer cells.




Wednesday, September 6, 2017

PROCAL -- QC for retention time AND collision energy!!


If you are using HCD fragmentation for proteomics, I recommend you stop what you are doing right now and check out this awesome new paper just accepted at Proteomics!



I'm serious. 

Yes, I'm talking to you! (This pug on loop is cracking me up. Hard.to.type!)

Seriously, I've been looking for something like this resource for years! Chances are you have been as well. I've tried answering the question "What does a good HCD spectrum look like?" a couple times on the blog. But the rules I've been using to troubleshoot fragmentation are imperfect at best -- and wrong at worst.

PROCAL is a resource that improves upon the PRTC idea (15 peptide standards? -- how about 40 retention time standards!) but goes a whole lot further.

The peptides are not heavy labeled -- their sequences don't exist in ANY protein on earth -- this allows the standards to be integrated into your proteomic processing workflow without any crummy work-arounds -- except adding the peptide sequences into your FASTA for quick identification checks (AWESOME!!)

The retention time calculations are even better with PROCAL -- several peptides were picked on purpose that have extremely similar retention times. This allows extremely sensitive monitoring of column performance.

Okay -- this is all super cool -- BUT -- they study how these peptides fragment under HCD in different instruments -- allowing the HCD collision energy to be calibrated between platforms!!  I'm fuzzy on the details and I'll have to come back to this later, but it looks like they used Skyline transition intensities to do the measurements.
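Just to make the retention time monitoring idea concrete -- here's one way standards like these could be used (a sketch, NOT PROCAL's actual calibration procedure): fit the observed RTs of the standards against their reference values and watch the fit for drift. The reference and observed minutes below are invented for illustration:

```python
# Invented example values -- five RT standards, reference vs. this run
reference = [10.0, 25.0, 40.0, 55.0, 70.0]   # expected RT, minutes
observed  = [11.2, 26.9, 42.4, 58.1, 73.6]   # this run's RT, minutes

# Ordinary least-squares fit of observed = slope * reference + intercept
n = len(reference)
mean_x = sum(reference) / n
mean_y = sum(observed) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, observed))
         / sum((x - mean_x) ** 2 for x in reference))
intercept = mean_y - slope * mean_x
print(f"slope={slope:.3f}, intercept={intercept:.2f} min")
```

A slope creeping away from 1.0 (or a growing intercept) from run to run would suggest the column or gradient has changed since the reference run -- which is exactly the kind of thing closely-eluting standard pairs should catch early.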

I've got to run, but not before I start downloading these RAW files from PRIDE / ProteomeXchange.

Nuts. The paper isn't proofed yet, but when it is the RAW files are available here.

Gotta get out the door, but I'm so inspired that such a huge technical hurdle has been addressed for all of us!