1. Introduction

This documentation was last generated on September 30, 2015.

1.1. General information

Module to parse mzML data in Python based on cElementTree

Copyright 2010-2014 by:

T. Bald,
J. Barth,
A. Niehues,
M. Specht,
M. Hippler,
C. Fufezan

1.1.1. Contact information

Please refer to:

Dr. Christian Fufezan
Institute of Plant Biology and Biotechnology
Schlossplatz 8, R 105
University of Muenster
Germany
Tel: +49 251 83 24861

1.2. Summary

pymzML is an extension to Python that offers
    1. easy access to mass spectrometry (MS) data, allowing the rapid development of tools,
    2. a very fast parser for mzML data, the standard mass spectrometry data format, and
    3. a set of functions to compare or handle spectra.
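As a rough illustration of what comparing two spectra can involve, the sketch below scores two peak lists with a binned cosine similarity. This is a generic technique chosen for illustration, not pymzML's own comparison function; the function names, the (m/z, intensity) tuple representation and the default bin width of 1.0 m/z are assumptions.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=1.0):
    """Sum the intensities of (mz, intensity) pairs into fixed-width m/z bins.

    Hypothetical helper for illustration; not part of pymzML.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors.

    Returns a value between 0.0 (no shared bins) and 1.0 (identical
    intensity profiles). Hypothetical helper for illustration.
    """
    a = bin_peaks(peaks_a, bin_width)
    b = bin_peaks(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    spec1 = [(100.1, 10.0), (200.2, 50.0), (300.3, 40.0)]
    spec2 = [(100.4, 10.0), (200.0, 50.0), (300.9, 40.0)]
    print(cosine_similarity(spec1, spec2))
```

Binning lets peaks that differ only slightly in m/z fall into the same bin, so the two example spectra score approximately 1.0. A smaller bin_width makes the comparison stricter, at the cost that peaks lying just either side of a bin edge are no longer matched.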

1.3. Implementation

pymzML requires Python 2.6.5+ and is fully compatible with Python 3. The module is freely available on pymzml.github.com or PyPI, is published under the LGPL, and requires no additional modules to be installed.

1.4. Download

Get the latest version via GitHub or the latest package from PyPI.
The complete documentation is also available as a PDF.

1.5. Citation

Please cite us when using pymzML in your work.

Bald, T., Barth, J., Niehues, A., Specht, M., Hippler, M., and Fufezan, C. (2012) pymzML - Python module for high throughput bioinformatics on mass spectrometry data, Bioinformatics, doi: 10.1093/bioinformatics/bts066

The original publication can be found via the DOI above.

1.6. Installation

sudo python setup.py install

1.7. Introduction

Mass spectrometry has evolved into a very diverse field that relies heavily on high-throughput bioinformatic tools. Because of the increasing complexity of the questions asked and the biological problems addressed, standard tools may not be sufficient, and tailored tools still have to be developed. However, the development of such tools has been hindered by proprietary data formats and the lack of a unified mass spectrometry data file standard. The latter has been overcome by the publication of the mzML standard by the HUPO Proteomics Standards Initiative (Deutsch, 2008) (http://www.psidev.info/), and hopefully all manufacturers will soon offer a way to convert their formats into this standardized one in order to stay comparable and competitive. Therefore, to rapidly develop bioinformatic tools that can explore mass spectrometry data, one needs a portable, robust, yet quick and easy interface to mzML files. The Python scripting language (http://python.org) is well suited to such a task.

Scripting languages carry several advantages over compiled programs, and although compiled programs tend to be faster, scripting languages can already compete successfully in some tasks. For example, XML parsing is extremely optimized in Python thanks to the cElementTree module (http://effbot.org/zone/element-index.htm), which parses XML in a fraction of the time required by classical C/C++ libraries such as libxml2 or sgmlop. It therefore seems natural that a well-designed Python mzML parser can compete successfully with the C/C++ libraries currently available while offering the advantages of a scripting language.
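The incremental, ElementTree-based style of XML parsing mentioned above can be sketched with the standard library alone. In Python 3 the plain xml.etree.ElementTree module uses its C accelerator automatically (xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9). The sketch assumes the mzML namespace http://psi.hupo.org/ms/mzml and reads each spectrum's id together with its "ms level" cvParam (PSI-MS accession MS:1000511). The function name iter_spectra and the heavily trimmed, not schema-valid sample document are hypothetical, and this is not how pymzML itself is implemented.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"  # PSI-MS controlled vocabulary accession for "ms level"


def iter_spectra(source):
    """Stream (spectrum id, ms level) pairs from an mzML file or file object.

    Hypothetical helper for illustration; not part of pymzML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MZML_NS + "spectrum":
            ms_level = None
            # The ms level cvParam is a direct child of <spectrum>.
            for cv in elem.findall(MZML_NS + "cvParam"):
                if cv.get("accession") == MS_LEVEL:
                    ms_level = int(cv.get("value"))
                    break
            yield elem.get("id"), ms_level
            elem.clear()  # discard this spectrum's subtree once it has been read


# Trimmed example document (hypothetical; real mzML files contain much more).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <run id="r1">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>
"""

if __name__ == "__main__":
    for spectrum_id, level in iter_spectra(io.BytesIO(SAMPLE)):
        print(spectrum_id, level)
```

Because each spectrum's subtree is cleared as soon as it has been handled, memory use stays low even for large mzML files. A real parser additionally has to decode the base64-encoded binary data arrays that hold the m/z and intensity values.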