4. Class Spectrum

The Spectrum class offers a Python object for mass spectrometry data. The spectrum object holds the basic information on the spectrum and offers methods to interrogate its properties. The data, i.e. mass-over-charge (m/z) and intensity values, are decoded on demand and can be accessed via the corresponding properties, e.g. spec.Spectrum.peaks.

The Spectrum class is used by the run.Run class, where each spectrum is accessible as a Spectrum object.

Theoretical spectra can also be created using the setter functions. For example, m/z values, intensities, and peaks can be set by the corresponding properties: spec.Spectrum.mz, spec.Spectrum.i, spec.Spectrum.peaks.

class spec.Spectrum
__init__(measuredPrecision = value*)

Initializes a pymzml.spec.Spectrum class.

Parameters:measuredPrecision (float) – in m/z, mandatory
xmlTree

The xmlTree property returns an iterator over the original xmlTree structure the spectrum was initialized with.

Example:

>>> for element in spectrum.xmlTree:
...   print( element, element.tag, element.items() )

Please refer to the XML documentation of Python and cElementTree for more details.

mz

Returns the list of m/z values. If the m/z values are encoded, the function _decode() is used to decode the encoded data.

The mz property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set m/z and intensity tuples at the same time.

Return type:list
Returns:Returns a list of mz from the actual analysed spectrum
i

Returns the list of the intensity values. If the intensity values are encoded, the function _decode() is used to decode the encoded data.

The i property can also be set, e.g. for theoretical data. However, it is recommended to use the peaks property to set m/z and intensity tuples at the same time.

Return type:list
Returns:Returns a list of intensity values from the actual analysed spectrum.
peaks

Returns the list of peaks of the spectrum as tuples (m/z, intensity).

Return type:list of tuples
Returns:Returns list of tuples (m/z, intensity)

Example:

>>> import pymzml
>>> run = pymzml.run.Reader("spectra.mzML.gz", MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     for mz, i in spectrum.peaks:
...         print(mz, i)

Note

The peaks property can also be set, e.g. for theoretical data. It requires a list of (m/z, intensity) tuples.

centroidedPeaks

Returns the centroided version of a profile spectrum. If the spectrum was measured in profile mode, a Gauss fit is performed to determine the centroided m/z values and intensities, and a list of tuples of fitted m/z-intensity values is returned. If the spectrum peaks are already centroided, these peaks are returned.

Return type:list of tuples
Returns:Returns list of tuples (m/z, intensity)

Example:

>>> import pymzml
>>> run = pymzml.run.Reader("spectra.mzML.gz", MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     for mz, i in spectrum.centroidedPeaks:
...         print(mz, i)
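
To illustrate how a Gauss fit can turn profile points into a centroid, here is a minimal, self-contained sketch. It fits a Gaussian through the three points around a local maximum by interpolating a parabola through the logarithms of the intensities. This is an illustration only and not necessarily the fit pymzML performs internally; the function name gaussian_centroid and the sample values are hypothetical, and the three points are assumed to be equally spaced with positive intensities.

```python
import math


def gaussian_centroid(p1, p2, p3):
    """Fit a Gaussian through three equally spaced (m/z, intensity) points.

    p2 is the local maximum, p1 and p3 are its neighbours.
    Returns the fitted (centre m/z, peak height).
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    spacing = (x3 - x1) / 2.0
    l1, l2, l3 = math.log(y1), math.log(y2), math.log(y3)
    # The log of a Gaussian is a parabola; its curvature is negative at a maximum.
    curvature = l1 - 2.0 * l2 + l3
    center = x2 + 0.5 * spacing * (l1 - l3) / curvature
    sigma_sq = -spacing ** 2 / curvature
    height = y2 * math.exp((x2 - center) ** 2 / (2.0 * sigma_sq))
    return center, height


def _gauss(x, mu=500.003, sigma=0.01, amplitude=1000.0):
    """Synthetic profile peak used as test data (hypothetical values)."""
    return amplitude * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


points = [(x, _gauss(x)) for x in (499.995, 500.0, 500.005)]
center, height = gaussian_centroid(*points)
print(center, height)
```

For an ideal Gaussian the three-point fit recovers the centre and height of the peak; on real, noisy profile data the result is only an estimate.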
reprofiledPeaks

Returns the reprofiled version of a centroided spectrum.

Return type:list of reprofiled mz,i tuples
Returns:Reprofiled peaks as tuple list

Example:

>>> import pymzml
>>> run = pymzml.run.Reader("spectra.mzML.gz", MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     for mz, i in spectrum.reprofiledPeaks:
...         print(mz, i)
measuredPrecision

Sets the measured and internal precision

Parameters:value (float) – measured precision (e.g. 5e-6)
__add__(otherSpec)

Adds two pymzml spectra together.

Parameters:otherSpec (object) – Spectrum object

Example:

>>> import pymzml
>>> s = pymzml.spec.Spectrum( measuredPrecision = 20e-6 )
>>> file_to_read = "../mzML_example_files/xy.mzML.gz"
>>> run = pymzml.run.Reader(file_to_read , MS1_Precision = 5e-6 , MSn_Precision = 20e-6)
>>> for spec in run:
...     s += spec
__mul__(value)

Multiplies each intensity with a float, i.e. scales the spectrum.

Parameters:value (float) – Value to multiply the spectrum
__truediv__(value)

Divides each intensity by a float, i.e. scales the spectrum.

Parameters:value (float, int) – Value to divide the spectrum
strip(scope='all')

Reduces the size of the spectrum. Useful if spectra need to be added or stored.

Parameters:scope (string) – accepts currently [“all”]

“all” will remove the raw and profiled data and some internal lookup tables as well.

extremeValues(key)

Finds the extreme values, i.e. the minimum and maximum, of the m/z or intensity values.

Parameters:key (string) – m/z : “mz” or intensity : “i”
Return type:tuple
Returns:tuple of minimal and maximum m/z or intensity
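
The behaviour described above can be sketched on a plain list of (m/z, intensity) tuples. This is a standalone illustration, not the library code; the helper name extreme_values and the sample peaks are hypothetical.

```python
def extreme_values(peaks, key):
    """Return (minimum, maximum) of the m/z ("mz") or intensity ("i") values."""
    index = {"mz": 0, "i": 1}[key]  # any other key raises a KeyError
    values = [peak[index] for peak in peaks]
    return (min(values), max(values))


peaks = [(100.0, 20.0), (150.5, 5.0), (200.0, 7.0)]
print(extreme_values(peaks, "mz"))  # (100.0, 200.0)
print(extreme_values(peaks, "i"))   # (5.0, 20.0)
```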
reduce(mzRange=(None, None))

Works on peaks and reduces spectrum to a m/z range.

Example:

>>> run = pymzml.run.Reader(file_to_read, MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spec in run:
...     spec.reduce( mzRange = (100,200) )
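
What the reduction to an m/z range amounts to can be sketched on a plain peak list. The sketch below is an illustration only: the helper reduce_peaks is hypothetical, and treating both range limits as inclusive and None as "no limit" are assumptions, not confirmed library behaviour.

```python
def reduce_peaks(peaks, mz_range=(None, None)):
    """Keep only (m/z, intensity) tuples whose m/z lies inside mz_range.

    A limit of None leaves that side of the range open (assumption).
    """
    low, high = mz_range
    return [
        (mz, i)
        for mz, i in peaks
        if (low is None or mz >= low) and (high is None or mz <= high)
    ]


peaks = [(99.9, 10.0), (100.0, 20.0), (150.5, 5.0), (200.0, 7.0), (200.1, 3.0)]
reduced = reduce_peaks(peaks, mz_range=(100, 200))
print(reduced)
```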
deRef()

Strips some heavy data and returns a deep copy of the spectrum.

Example:

>>> run = pymzml.run.Reader(file_to_read, MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spec in run:
...     tmp = spec.deRef()
removeNoise(mode='median', noiseLevel=None)

Function to remove noise from peaks, centroided peaks and reprofiled peaks.

Parameters:mode (string) – define mode for removing noise. Default = “median” (other modes: “mean”, “mad”)
Return type:list of tuples
Returns:Returns a list with tuples of m/z-intensity pairs above the noise threshold

mad < median < mean

Threshold is calculated over the mad/median/mean of all intensity values. (mad = mean absolute deviation)

Example:

>>> import pymzml
>>> run = pymzml.run.Reader("spectra.mzML.gz", MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     for mz, i in spectrum.removeNoise( mode = 'mean'):
...         print(mz, i)
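
The thresholding idea can be sketched without the library. The sketch below covers only the "median" and "mean" modes; the "mad" mode is left out. The helper names, the sample peaks, and the choice to keep only peaks strictly above the threshold are assumptions made for illustration.

```python
from statistics import mean, median


def noise_threshold(intensities, mode="median"):
    """Return the noise threshold over all intensity values."""
    if mode == "median":
        return median(intensities)
    if mode == "mean":
        return mean(intensities)
    raise ValueError("unsupported mode in this sketch: {0!r}".format(mode))


def remove_noise(peaks, mode="median", noise_level=None):
    """Return the (m/z, intensity) tuples above the noise threshold."""
    if noise_level is None:
        noise_level = noise_threshold([i for _, i in peaks], mode)
    return [(mz, i) for mz, i in peaks if i > noise_level]


peaks = [(100.0, 1.0), (101.0, 2.0), (102.0, 2.0), (103.0, 3.0), (104.0, 100.0)]
print(remove_noise(peaks, mode="median"))
print(remove_noise(peaks, mode="mean"))
```

With these sample peaks the median threshold keeps two peaks and the mean threshold keeps only the most intense one, which shows that the mean is the stricter of the two modes here.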
highestPeaks(n)

Function to retrieve the n-highest centroided peaks of the spectrum.

Parameters:n (int) – Number of n-highest peaks
Return type:list
Returns:list of centroided peaks (mz, intensity tuples)

Example:

>>> run = pymzml.run.Reader("../mzML_example_files/deconvolution.mzML.gz", MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     if spectrum["ms level"] == 2:
...         if spectrum["id"] == 1770:
...             for mz,i in spectrum.highestPeaks(5):
...                 print(mz, i)
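
Selecting the n most intense peaks from a list of (m/z, intensity) tuples can be sketched with the standard library. This is not the library implementation; the helper highest_peaks is hypothetical, and returning the peaks ordered by decreasing intensity is a choice of this sketch.

```python
import heapq


def highest_peaks(peaks, n):
    """Return the n (m/z, intensity) tuples with the highest intensity."""
    return heapq.nlargest(n, peaks, key=lambda peak: peak[1])


peaks = [(100.0, 5.0), (200.0, 50.0), (300.0, 20.0), (400.0, 1.0)]
top_two = highest_peaks(peaks, 2)
print(top_two)
```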
estimatedNoiseLevel(mode='median')

Calculates noise threshold for function removeNoise()

hasOverlappingPeak(mz)

Checks if a spectrum has more than one peak for a given m/z value within the measured precision.

Parameters:mz (float) – m/z value which should be checked
Returns:Returns True if a nearby peak is detected, otherwise False
Return type:bool
hasPeak(mz2find)

Checks if a Spectrum has a certain peak. Needs a certain mz value as input and returns a list of peaks if a peak is found in the spectrum, otherwise [] is returned. Every peak is a tuple of m/z and intensity.

Parameters:mz2find (float) – mz value which should be found
Return type:list
Returns:m/z and intensity as tuple in list

Example:

>>> import pymzml, get_example_file
>>> example_file = get_example_file.open_example('deconvolution.mzML.gz')
>>> run = pymzml.run.Reader(example_file, MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     if spectrum["ms level"] == 2:
...         peak_to_find = spectrum.hasPeak(1016.5404)
...         print(peak_to_find)
[(1016.5404, 19141.735187697403)]
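
The idea of matching a peak within the measured precision can be sketched as a relative (ppm-style) tolerance check. The sketch is an illustration under assumptions: the helper has_peak, its precision parameter, and the linear scan are hypothetical and need not match how the library performs the lookup.

```python
def has_peak(peaks, mz2find, precision=5e-6):
    """Return all (m/z, intensity) tuples within mz2find * precision of mz2find."""
    tolerance = mz2find * precision
    return [(mz, i) for mz, i in peaks if abs(mz - mz2find) <= tolerance]


peaks = [(500.0, 1.0), (1016.5404, 19141.7), (1016.6, 50.0)]
print(has_peak(peaks, 1016.5404, precision=20e-6))
print(has_peak(peaks, 700.0, precision=20e-6))
```

At 20 ppm the tolerance around m/z 1016.5404 is roughly 0.02, so the neighbouring peak at 1016.6 is not reported as a match.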
hasDeconvolutedPeak(mass2find)

Checks if a deconvoluted spectrum contains a certain peak. Needs a mass value as input and returns a list of peaks if a peak is found in the spectrum. If the mass is not found, [] is returned. Every peak is a tuple of mass and intensity.

Parameters:mass2find (float) – mass value which should be found
Return type:list
Returns:mass and intensity as tuple in list if mass is found, otherwise []

Example:

>>> import pymzml, get_example_file
>>> example_file = get_example_file.open_example('deconvolution.mzML.gz')
>>> run = pymzml.run.Reader(example_file, MS1_Precision = 5e-6, MSn_Precision = 20e-6)
>>> for spectrum in run:
...     if spectrum["ms level"] == 2:
...         peak_to_find = spectrum.hasDeconvolutedPeak(1044.5804)
...         print(peak_to_find)
[(1044.5596, 3809.4356300564586)]
similarityTo(spec2)

Compares two spectra and returns the cosine between them.

Parameters:spec2 (pymzml.spec.Spectrum) – another pymzml spectrum that is compared to the current spectrum.
Returns:value between 0 and 1, i.e. the cosine between the two spectra.
Return type:float

Note

The spectrum data is transformed into an n-dimensional vector, where m/z values are binned into bins of 10 m/z and the intensities within each bin are summed. The cosine is then calculated between the two vectors. The more similar the spectra are, the closer the value is to 1.
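
The binning and cosine calculation described in the note can be sketched as follows. This is a standalone illustration on plain peak lists, not the library code; the helper names, the bin_width parameter, and returning 0.0 for an empty spectrum are assumptions.

```python
import math
from collections import defaultdict


def binned_vector(peaks, bin_width=10.0):
    """Sum the intensities of all peaks that fall into the same m/z bin."""
    vector = defaultdict(float)
    for mz, intensity in peaks:
        vector[int(mz // bin_width)] += intensity
    return vector


def cosine_similarity(peaks_a, peaks_b, bin_width=10.0):
    """Cosine between the binned intensity vectors of two peak lists."""
    a = binned_vector(peaks_a, bin_width)
    b = binned_vector(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty spectrum is similar to nothing
    return dot / (norm_a * norm_b)


spec_a = [(101.0, 10.0), (105.0, 5.0), (250.0, 20.0)]
spec_b = [(102.0, 30.0), (251.0, 40.0)]
print(cosine_similarity(spec_a, spec_b))
```

Because the cosine ignores the overall scale, spec_b, whose binned intensities are exactly twice those of spec_a, gives a value of 1 (up to floating point rounding), while spectra without a shared bin give 0.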

tmzSet

Create set out of transformed m/z values (including all values in the defined imprecision).

Return type:set
tmassSet

Create a set out of transformed mass values (including all values in the defined imprecision).

Return type:set
transformedPeaks

m/z value is multiplied by the internal precision

Return type:list of tuples
Returns:Returns a list of peaks (tuples of mz and intensity). Float m/z values are adjusted by the internal precision to integers.
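
The transformation to integer m/z values can be sketched as a multiplication followed by rounding. The factor of 1000 used here is a hypothetical internal precision chosen for illustration, and rounding to the nearest integer (rather than truncating) is an assumption; the value the library derives from measuredPrecision is not reproduced here.

```python
def transform_peaks(peaks, internal_precision):
    """Multiply each m/z by internal_precision and round it to an integer."""
    return [(int(round(mz * internal_precision)), i) for mz, i in peaks]


peaks = [(100.0004, 5.0), (100.0016, 7.0)]
transformed = transform_peaks(peaks, internal_precision=1000)
print(transformed)
```

Integer m/z values make it cheap to test for matching peaks with set operations, which is what the tmzSet and tmassSet properties build on.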
transformed_deconvolutedPeaks

Deconvoluted mz value is multiplied by the internal precision

Return type:list of tuples
Returns:Returns a list of peaks (tuples of mz and intensity). Float m/z values are adjusted by the internal precision to integers.
deconvolute_peaks(ppmFactor=4, minCharge=1, maxCharge=8, maxNextPeaks=100, returnCharge=False, debug=False)

Calculating uncharged masses and returning deconvoluted peaks.

The deconvolution of spectra is done by first identifying isotope envelopes and the charge state of these envelopes. The first peak of an isotope envelope is chosen as the monoisotopic peak, for which the mass is calculated from the m/z ratio.

Isotope envelopes are identified by searching the centroided spectrum for peaks which show no preceding isotope peak within a specified mass accuracy. To be on the safe side, the measured mass accuracy is multiplied by a user-adjustable factor (ppmFactor). When the current peak meets the criterion of having no preceding peak, the following peaks are analysed. The following peaks are considered to be part of the isotope envelope as long as they fit within the measured precision and only one local maximum is present. The second local maximum is not considered as the starting point of a new isotope envelope, as one cannot be sure where this isotope envelope starts. However, the last peak before the second local maximum is considered to be part of the isotope envelope from the first local maximum, as the intensity of this peak should not have a big influence on the whole isotope envelope intensity.

The charge range for detecting isotope envelopes can be specified (minCharge, maxCharge). An isotope envelope always gets the highest possible charge. With the charge, the mass can be calculated from the m/z value of the first peak of the isotope envelope. The intensity of the deconvoluted peak results from the sum of all isotope envelope peaks.

In a last step, deconvoluted peaks are grouped together within the measured precision. This is necessary because isotope envelopes from the same fragment but with different charge states can lead to slightly different deconvoluted peaks.

Parameters:
  • ppmFactor (int) – ppm factor (imprecision factor)
  • minCharge (int) – minimum charge considered
  • maxCharge (int) – maximum charge considered
  • maxNextPeaks – maximum length for isotope envelope
Return type:tuple (mass, intensity)
Returns:Deconvoluted peaks, mass (instead of m/z) and intensity are returned
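
The last steps of the description, calculating the uncharged mass from the first peak of an isotope envelope and summing the envelope intensities, can be sketched as follows. The sketch assumes that the envelope and its charge have already been identified, that the ions are protonated (positive mode), and it uses a rounded proton mass; the helper name and the sample envelope are hypothetical, and envelope detection itself is not shown.

```python
PROTON_MASS = 1.00727647  # approximate proton mass in Da (rounded, assumption)


def deconvolute_envelope(envelope, charge):
    """Return (uncharged mass, summed intensity) for one isotope envelope.

    envelope is a list of (m/z, intensity) tuples ordered by m/z; its first
    peak is taken as the monoisotopic peak.
    """
    first_mz = envelope[0][0]
    mass = (first_mz - PROTON_MASS) * charge
    intensity = sum(i for _, i in envelope)
    return (mass, intensity)


# Isotope peaks spaced by about 0.5 m/z, as expected for a doubly charged ion.
envelope = [(500.5, 100.0), (501.0, 60.0), (501.5, 20.0)]
mass, intensity = deconvolute_envelope(envelope, charge=2)
print(mass, intensity)
```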

deconvolutedPeaks

Calls spec.Spectrum.deconvolute_peaks() with default parameters, i.e. calculates uncharged masses and returns deconvoluted peaks.

Return type:list
Returns:list of deconvoluted peaks (mass (instead of m/z) / intensity tuples)