5. OBO parser Class

Class to parse the obo file and set up the accessions library

The OBO parser has been designed to convert MS:xxxxxxx tags to their appropriate names. A minimal set of MS accessions is used in pymzML, but additional accessions can easily be added using the extraAccession parameter during run.Reader initialization.

The obo translator is used internally to associate names with MS:xxxxxxx tags.

The oboTranslator class generates a dictionary and several lookup tables, for example:

>>> from pymzml.obo import oboTranslator as OT
>>> translator = OT()
>>> len(translator.id.keys()) # Number of parsed entries
737
>>> translator['MS:1000127']
'centroid mass spectrum'
>>> translator['positive scan']
{'is_a': 'MS:1000465 ! scan polarity', 'id': 'MS:1000130', 'def': '"Polarity
of the scan is positive." [PSI:MS]', 'name': 'positive scan'}
>>> translator['scan']
{'relationship': 'part_of MS:0000000 ! Proteomics Standards Initiative Mass
Spectrometry Ontology', 'id': 'MS:1000441', 'def': '"Function or process of
the mass spectrometer where it records a spectrum." [PSI:MS]', 'name':
'scan'}
>>> translator['unit']
{'relationship': 'part_of MS:0000000 ! Proteomics Standards Initiative Mass
Spectrometry Ontology', 'id': 'MS:1000460', 'def': '"Terms to describe
units." [PSI:MS]', 'name': 'unit'}
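
The two lookup directions shown above (accession to name, name to full record) can be illustrated with a small stand-alone sketch. This is not the oboTranslator implementation: parse_obo and lookup are hypothetical names, and the sample text reuses only the 'positive scan' term quoted above.

```python
def parse_obo(text):
    """Parse the [Term] stanzas of an OBO document into two lookup tables."""
    by_id = {}    # accession -> full record
    by_name = {}  # term name -> full record
    record = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith('['):
            # A new stanza begins; only [Term] stanzas are collected.
            record = {} if line == '[Term]' else None
        elif record is not None and ': ' in line:
            key, value = line.split(': ', 1)
            record[key] = value  # for repeated keys the last value wins
            if key == 'id':
                by_id[value] = record
            elif key == 'name':
                by_name[value] = record
    return by_id, by_name


def lookup(tables, key):
    """Return the name for an accession, or the full record for a name."""
    by_id, by_name = tables
    if key in by_id:
        return by_id[key]['name']
    return by_name[key]


SAMPLE = '''format-version: 1.2

[Term]
id: MS:1000130
name: positive scan
def: "Polarity of the scan is positive." [PSI:MS]
is_a: MS:1000465 ! scan polarity
'''

tables = parse_obo(SAMPLE)
print(lookup(tables, 'MS:1000130'))           # positive scan
print(lookup(tables, 'positive scan')['id'])  # MS:1000130
```

Header lines before the first stanza are ignored because no record is open at that point.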

pymzML comes with the queryOBO.py script that can be used to interrogate the OBO file.

$ ./example_scripts/queryOBO.py "scan time"
MS:1000016
scan time
"The time taken for an acquisition by scanning analyzers." [PSI:MS]
Is a: MS:1000503 ! scan attribute
$
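
The output layout above can be reproduced from a term record with a few lines of Python. The following is only a sketch of the idea, not the code of queryOBO.py: query and format_term are hypothetical names, and matching by case-insensitive substring is an assumption.

```python
def query(terms, needle):
    """Return the records whose name contains the search string."""
    needle = needle.lower()
    return [term for term in terms if needle in term['name'].lower()]


def format_term(record):
    """Render one record as id, name, definition and optional parent."""
    lines = [record['id'], record['name'], record['def']]
    if 'is_a' in record:
        lines.append('Is a: ' + record['is_a'])
    return '\n'.join(lines)


# Records copied from the examples in this section.
TERMS = [
    {'id': 'MS:1000016',
     'name': 'scan time',
     'def': '"The time taken for an acquisition by scanning analyzers." [PSI:MS]',
     'is_a': 'MS:1000503 ! scan attribute'},
    {'id': 'MS:1000130',
     'name': 'positive scan',
     'def': '"Polarity of the scan is positive." [PSI:MS]',
     'is_a': 'MS:1000465 ! scan polarity'},
]

for term in query(TERMS, 'scan time'):
    print(format_term(term))
```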

5.1. Accessing specific OBO MS tags

This section describes how to access some common MS tags by their names as they are defined in the OBO file.

First pymzML is imported and the run is defined.

>>> import get_example_file  # helper module used by the pymzML example scripts
>>> import pymzml
>>> example_file = get_example_file.open_example('dta_example.mzML')
>>> msrun = pymzml.run.Reader(example_file)

Now we can fetch specific information from the spectrum object.

MS level:

>>> for spectrum in msrun:
...     print(spectrum['ms level'])

Total ion current:

>>> for spectrum in msrun:
...     print(spectrum['total ion current'])

Furthermore, we can check for the presence of parameters and thereby for properties of the spectrum.

Differentiation of e.g. HCD and CID fragmentation:

>>> for spectrum in msrun:
...     if spectrum['ms level'] == 2:
...         if 'collision-induced dissociation' in spectrum.keys():
...             print('Spectrum {0} is a CID spectrum'.format(spectrum['id']))
...         elif 'high-energy collision-induced dissociation' in spectrum.keys():
...             print('Spectrum {0} is a HCD spectrum'.format(spectrum['id']))
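
The same key-presence check can be wrapped in a helper and tried out without an mzML file. In this sketch plain dictionaries stand in for spectrum objects; fragmentation_type and the three sample spectra are hypothetical and only mimic the keys used above.

```python
from collections import Counter

CID = 'collision-induced dissociation'
HCD = 'high-energy collision-induced dissociation'


def fragmentation_type(spectrum):
    """Return 'CID', 'HCD' or None for a dict-like spectrum."""
    if spectrum.get('ms level') != 2:
        return None  # only MS2 spectra are classified
    if CID in spectrum:
        return 'CID'
    if HCD in spectrum:
        return 'HCD'
    return None


# Stand-in spectra; real spectrum objects come from iterating over the run.
spectra = [
    {'id': 1, 'ms level': 1},
    {'id': 2, 'ms level': 2, CID: CID},
    {'id': 3, 'ms level': 2, HCD: HCD},
]

counts = Counter(fragmentation_type(s) for s in spectra)
print(counts['CID'], counts['HCD'])  # 1 1
```

The membership test compares whole keys, so the CID name does not accidentally match the longer HCD name.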

5.2. Minimal accession set

The following list shows the minimal set of accessions necessary to run pymzML.

MIN_REQ = [
#
#!NOTE!   exact names will be extracted from the current OBO file, comments are just for orientation
#         pymzML comes with a small script (queryOBO.py) to query the OBO file
#
#         $ ./example_scripts/queryOBO.py "scan time"
#         MS:1000016
#         scan time
#         "The time taken for an acquisition by scanning analyzers." [PSI:MS]
#         Is a: MS:1000503 ! scan attribute
#
('MS:1000016',['value']             ), #"scan time"
# -> Could also be ['value','unitName'] to retrieve a
# tuple of time and unit by calling spectrum['scan time']
('MS:1000040',['value']             ), #"m/z"
('MS:1000041',['value']             ), #"charge state"
('MS:1000127',['name']              ), #"centroid spectrum"
('MS:1000128',['name']              ), #"profile spectrum"
('MS:1000133',['name']              ), #"collision-induced dissociation"
('MS:1000285',['value']             ), #"total ion current"
('MS:1000422',['name']              ), #"high-energy collision-induced dissociation"
('MS:1000511',['value']             ), #"ms level"
('MS:1000512',['value']             ), #"filter string"
('MS:1000514',['name']              ), #"m/z array"
('MS:1000515',['name']              ), #"intensity array"
('MS:1000521',['name']              ), #"32-bit float"
('MS:1000523',['name']              ), #"64-bit float"
('MS:1000744',['value']             ), # legacy precursor mz value ...
]
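
Each entry pairs an accession with the cvParam attributes to keep, and the comment on 'scan time' notes that ['value','unitName'] would yield a tuple instead of a single value. The sketch below illustrates that convention and how extra entries could override the minimal ones. It is an assumption about the behaviour, not pymzML's internal code: extract, merge_accessions and the sample attribute values are hypothetical.

```python
def extract(cv_param, fields):
    """Pick the requested attributes from one cvParam attribute dict.

    One field gives a bare value, several fields give a tuple.
    """
    values = tuple(cv_param[field] for field in fields)
    return values[0] if len(values) == 1 else values


def merge_accessions(minimal, extra):
    """Combine two (accession, fields) lists; entries in extra win."""
    merged = dict(minimal)
    merged.update(extra)
    return sorted(merged.items())


# Hypothetical attribute values for a 'scan time' cvParam.
scan_time = {'accession': 'MS:1000016', 'name': 'scan time',
             'value': '0.5', 'unitName': 'minute'}

print(extract(scan_time, ['value']))              # 0.5
print(extract(scan_time, ['value', 'unitName']))  # ('0.5', 'minute')

minimal = [('MS:1000016', ['value']), ('MS:1000511', ['value'])]
extra = [('MS:1000016', ['value', 'unitName'])]
print(merge_accessions(minimal, extra))
```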