3.7. Regex patterns

This file hosts a centralized point for all regex patterns used in pymzML

Note

We need some comment lines for each regex, i.e. what does it catch .. maybe with examples and stuff …

Collection of regular expressions to catch spectrum XML-tags.

pymzml.regex_patterns.CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID = re.compile('<\\s*(chromatogram|spectrum)\\s*(id=(\\".*?\\")|index=\\".*?\\")\\s(id=(\\".*?\\"))*\\s*.*\\sdefaultArrayLength=\\"[0-9]+\\">')

Regex to catch combined chromatogram and spectrum patterns

pymzml.regex_patterns.CHROMATOGRAM_CLOSE_PATTERN = re.compile(b'</chromatogram>')

Regex to catch spectrum xml close tags

pymzml.regex_patterns.CHROMATOGRAM_ID_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')

Regex to catch chromatogram id patterns

pymzml.regex_patterns.CHROMATOGRAM_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')

Regex to catch chromatogram id pattern (again ?)

pymzml.regex_patterns.FILE_ENCODING_PATTERN = re.compile(b'encoding="(?P<encoding>[A-Za-z0-9-]*)"')

Regex to catch xml file encoding

pymzml.regex_patterns.MOBY_DICK_CHAPTER_PATTERN = re.compile('CHAPTER ([0-9]+).*')

Regex to catch moby dick chapter number used in the index gezip writer example.

pymzml.regex_patterns.SIM_INDEX_PATTERN = re.compile(b'(?P<type>idRef=")(?P<nativeID>.*)">(?P<offset>[0-9]*)</offset>')

Regex pattern for SIM index

pymzml.regex_patterns.SPECTRUM_CLOSE_PATTERN = re.compile(b'</spectrum>')

Regex to catch spectrum xml close tags

pymzml.regex_patterns.SPECTRUM_ID_PATTERN2 = re.compile('(scan|scanId)=(\\d+)')

Simplified spectrum id regex. Greedly catches ints at the end of line

pymzml.regex_patterns.SPECTRUM_INDEX_PATTERN = re.compile(b'(?P<type>(scan=|nativeID="))(?P<nativeID>[0-9]*)">"(?P<offset>[0-9]*)</offset>')

Regex pattern for spectrum index works for obo format 1.1.0 until <last version checked>

Catches:
  1. demo 1

  2. demo 2

pymzml.regex_patterns.SPECTRUM_OPEN_PATTERN = re.compile(b'<*spectrum[^>]*(index|id)="(.*?)".*(index|id)="(.*?)"')

Regex to catch specturm open xml tag with encoded array length

pymzml.regex_patterns.SPECTRUM_TAG_PATTERN = re.compile('<spectrum.*?id="(?P<index>[^"]+)".*?>')

Regex to catch spectrum tag pattern