3.7. Regex patterns
This file hosts a centralized point for all regex patterns used in pymzML
Note
We need some comment lines for each regex, i.e. what does it catch .. maybe with examples and stuff …
Collection of regular expressions to catch spectrum XML-tags.
- pymzml.regex_patterns.CHROMATOGRAM_AND_SPECTRUM_PATTERN_WITH_ID = re.compile('<\\s*(chromatogram|spectrum)\\s*(id=(\\".*?\\")|index=\\".*?\\")\\s(id=(\\".*?\\"))*\\s*.*\\sdefaultArrayLength=\\"[0-9]+\\">')
Regex to catch combined chromatogram and spectrum patterns
- pymzml.regex_patterns.CHROMATOGRAM_CLOSE_PATTERN = re.compile(b'</chromatogram>')
Regex to catch spectrum xml close tags
- pymzml.regex_patterns.CHROMATOGRAM_ID_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')
Regex to catch chromatogram id patterns
- pymzml.regex_patterns.CHROMATOGRAM_PATTERN = re.compile('<chromatogram.*id="(.*?)".*?>')
Regex to catch chromatogram id pattern (again ?)
- pymzml.regex_patterns.FILE_ENCODING_PATTERN = re.compile(b'encoding="(?P<encoding>[A-Za-z0-9-]*)"')
Regex to catch xml file encoding
- pymzml.regex_patterns.MOBY_DICK_CHAPTER_PATTERN = re.compile('CHAPTER ([0-9]+).*')
Regex to catch moby dick chapter number used in the index gezip writer example.
- pymzml.regex_patterns.SIM_INDEX_PATTERN = re.compile(b'(?P<type>idRef=")(?P<nativeID>.*)">(?P<offset>[0-9]*)</offset>')
Regex pattern for SIM index
- pymzml.regex_patterns.SPECTRUM_CLOSE_PATTERN = re.compile(b'</spectrum>')
Regex to catch spectrum xml close tags
- pymzml.regex_patterns.SPECTRUM_ID_PATTERN2 = re.compile('(scan|scanId)=(\\d+)')
Simplified spectrum id regex. Greedly catches ints at the end of line
- pymzml.regex_patterns.SPECTRUM_INDEX_PATTERN = re.compile(b'(?P<type>(scan=|nativeID="))(?P<nativeID>[0-9]*)">"(?P<offset>[0-9]*)</offset>')
Regex pattern for spectrum index works for obo format 1.1.0 until <last version checked>
- Catches:
demo 1
demo 2
- pymzml.regex_patterns.SPECTRUM_OPEN_PATTERN = re.compile(b'<*spectrum[^>]*(index|id)="(.*?)".*(index|id)="(.*?)"')
Regex to catch specturm open xml tag with encoded array length
- pymzml.regex_patterns.SPECTRUM_TAG_PATTERN = re.compile('<spectrum.*?id="(?P<index>[^"]+)".*?>')
Regex to catch spectrum tag pattern