
A Software Pipeline for High-Throughput Quantification of Mass Spectrometry-Based Proteomics Data: Applications to Streptococcus pyogenes

Student: Hendrik Weisser
Type: PhD
Completion Date: 2013-01-10
Abstract

All living organisms are composed of cells, of which proteins are the primary structural building blocks. Proteins also form the molecular machines (enzymes) with which cells accomplish their many functions. The study of proteins is thus of central importance in biology. The scientific discipline that investigates the entirety of the proteins in a cell or a set of cells (the proteome) is called proteomics. Modern proteomics owes much of its success to liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), which it uses to identify and quantify the protein contents of complex biological samples. Label-free shotgun mass spectrometry is a quantitative proteomics technique that is especially suited for high-throughput applications. The label-free shotgun approach relies heavily on the computational processing of LC-MS/MS data to identify proteins and determine their relative abundances in different samples.

This thesis describes the development of a software pipeline for the quantification of peptides and proteins in LC-MS/MS data from label-free shotgun experiments, and introduces new algorithms for several of the data processing tasks involved. Implemented in the OpenMS software framework, the label-free pipeline improves upon existing alternatives by being highly flexible, applicable to large datasets (50+ samples), and amenable to automation. The pipeline's performance was evaluated on two datasets that provide ground truths for the quantification and was found to be on par with the state of the art, with high accuracy and good coverage.
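The abstract does not say how peptide-level signals are combined into protein abundances. As an illustration of one widely used label-free strategy (averaging the most intense peptides of a protein, often called the "top-3" approach), the following is a minimal Python sketch. The function names, the input layout (a dict mapping peptide sequences to per-sample intensities), and the choice of top-N averaging are assumptions made for this example; it is not the implementation described in the thesis.

```python
from math import log2
from statistics import mean


def top_n_abundances(peptides, top_n=3):
    """Hypothetical helper: protein abundance per sample as the mean
    intensity of its top_n peptides.

    `peptides` maps a peptide sequence to a list of intensities, one per
    sample (None where the peptide was not quantified). Only peptides
    quantified in every sample are used, so all samples are compared on
    the same set of peptides.
    """
    complete = {seq: ints for seq, ints in peptides.items()
                if all(i is not None for i in ints)}
    # Rank the complete peptides by their total intensity across samples.
    ranked = sorted(complete.values(), key=sum, reverse=True)[:top_n]
    if not ranked:
        return None  # protein cannot be quantified consistently
    n_samples = len(ranked[0])
    return [mean(ints[s] for ints in ranked) for s in range(n_samples)]


def log2_ratio(abundances, a, b):
    """log2 fold change of sample a relative to sample b."""
    return log2(abundances[a] / abundances[b])


# Example: one protein measured in two samples (made-up intensities).
protein = {
    "PEPTIDEA": [400.0, 200.0],
    "PEPTIDEB": [300.0, 150.0],
    "PEPTIDEC": [100.0, 50.0],
    "PEPTIDED": [900.0, None],  # missing in sample 2, so it is skipped
    "PEPTIDEE": [20.0, 10.0],   # complete, but not among the top 3
}
abund = top_n_abundances(protein)
print(log2_ratio(abund, 0, 1))  # 1.0: twice as abundant in sample 1
```

Restricting the calculation to peptides seen in all samples trades some coverage for comparability: a ratio built from different peptides in each sample could reflect peptide-specific ionization differences rather than a change in protein abundance.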

To realize its full potential for processing large datasets in a high-throughput fashion, the label-free quantification pipeline was adapted for use on a powerful computing cluster. Using a software framework for workflow management, an automated workflow was created that implements the label-free pipeline and can run on distributed computing resources. The connection to a data management system gives this workflow access to LC-MS/MS raw data and allows traceable storage of the processing results. Further, a web interface for configuring and submitting label-free analyses was designed, and a software module that supports aspects of workflow development was implemented.

This thesis also presents two biological studies in which the label-free quantification pipeline was applied. Both studies investigate the proteome of the bacterium Streptococcus pyogenes. S. pyogenes is an important pathogen that causes a variety of diseases ranging from mild (superficial skin and throat infections) to severe and life-threatening (necrotizing fasciitis, toxic shock syndrome). The use of shotgun mass spectrometry in connection with the label-free quantification pipeline allowed the reliable analysis of a large part of the S. pyogenes proteome (over 800 proteins in both studies). The first investigation focused on the adaptation of S. pyogenes to growth in the presence of human blood plasma and found a marked downregulation of proteins required for fatty acid biosynthesis. As follow-up experiments clarified, the uptake of fatty acids from plasma, mediated by the binding of a fatty acid-carrying human protein (albumin) to specific proteins on the bacterial surface, allows S. pyogenes to conserve energy by reducing its internal fatty acid production.

The second study compared two S. pyogenes strains, a virulent wild-type and a hypervirulent mutant strain, under 26 different growth conditions. The analysis focused on the impact of increased virulence on the proteome of the bacterium. The results reveal specific expression patterns of virulence factors and transcriptional regulators in the hypervirulent strain, largely confirming previous findings. Further, a remarkable downregulation of the protein biosynthesis machinery in the hypervirulent mutant strain was observed across a broad range of conditions, matching an apparent growth deficit of the mutant bacteria.

These applications illustrate the power and utility of mass spectrometry-based proteomics strategies in general, and of shotgun approaches combined with the OpenMS label-free quantification pipeline in particular.