|48 (Research article)
|Röst, Hannes; Schmitt, Uwe; Aebersold, Ruedi; Malmström, Lars
|Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry.
|PLoS One (2015) 10(4) e0125108
|23 citations (journal impact: 3.53)
|MOTIVATION: In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are used by a multitude of open-source and cross-platform tools, which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size. RESULTS: Here we present a fast and versatile parsing library for mass spectrometric XML formats, available in C++ and Python and based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold compared to OpenMS 1.11, making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data. AVAILABILITY: Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.
|We describe software that processes XML files quickly. This matters because most MS data is stored in XML, and the files are growing as instruments become faster and more precise.
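The sequential-access idea in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. This is not the OpenMS or pyOpenMS API: the function names `iter_spectra` and `count_by_ms_level`, the simplified `<spectrum>` elements, and the `mslevel` attribute are all assumptions made for illustration. Real mzML files use XML namespaces, `cvParam` children, and base64-encoded binary arrays.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document (hypothetical, heavily simplified compared to real mzML).
SAMPLE = b"""<run>
  <spectrum id="scan=1" mslevel="1"/>
  <spectrum id="scan=2" mslevel="2"/>
  <spectrum id="scan=3" mslevel="2"/>
</run>"""


def iter_spectra(source):
    """Yield the attributes of each <spectrum> element, one at a time.

    iterparse reads the file incrementally, so the whole document is
    never held in memory as a fully populated tree.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "spectrum":
            attrs = dict(elem.attrib)  # copy before clearing the element
            elem.clear()               # release the element's children, text, and attributes
            yield attrs


def count_by_ms_level(source):
    """Count spectra per MS level in a single streaming pass."""
    counts = {}
    for attrs in iter_spectra(source):
        level = int(attrs["mslevel"])
        counts[level] = counts.get(level, 0) + 1
    return counts


counts = count_by_ms_level(io.BytesIO(SAMPLE))
print(counts)
```

The same functions accept a file path instead of an in-memory buffer. Because each spectrum is handed to the caller and then cleared, peak memory depends on the largest single spectrum rather than on the file size, which is the property the abstract relies on for files of dozens of gigabytes.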