3.3. Utils
3.3.1. GSGW
- class pymzml.utils.GSGW.GSGW(file=None, max_idx=10000, max_idx_len=8, max_offset_len=8, output_path='./test.dat.igzip', comp_str=-1)[source]
Generalized Gzip writer class with random access to indexed offsets.
- Keyword Arguments:
file (str) – Filename for the resulting file
max_idx (int) – max number of indices which can be saved in this file
max_idx_len (int) – maximal length of the index in bytes, must be between 1 and 255
max_offset_len (int) – maximal length of the offset in bytes
output_path (str) – path to the output file
- Attributes:
encoding – Returns the encoding used for this file
file_out – Output filehandler
Methods
add_data(data, identifier) – Create a new gzip member with compressed ‘data’ indexed with ‘identifier’.
write_index() – Only called after all the data is written, i.e. all calls to add_data() have been done.
- _allocate_index_bytes()[source]
Allocate ‘self.max_index_num’ bytes of length ‘self.max_idx_len’ in the header for inserting the index later on.
- _write_gen_header(Index=False, FLAGS=None)[source]
Write a valid gzip header with creation time, user defined flag fields and allocated index.
- Keyword Arguments:
Index (bool) – whether or not to write an index into this header.
FLAGS (list, optional) – list of flags (FTEXT, FHCRC, FEXTRA, FNAME) to set for this header.
- Returns:
byte offset of the file pointer
- Return type:
offset (int)
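The header layout that _write_gen_header produces is defined by the gzip format (RFC 1952). The following is a minimal sketch of a hand-built gzip member with an optional comment field, written with the standard library only. It is not pymzML's implementation; the function name gzip_member and the choice of the FCOMMENT field for the placeholder are assumptions made for illustration.

```python
import gzip
import struct
import time
import zlib

# Flag bits as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16


def gzip_member(payload, comment=b""):
    """Build one gzip member by hand (hypothetical helper, not pymzML API).

    `comment` must not contain zero bytes, because the comment field
    is zero-terminated.
    """
    flags = FCOMMENT if comment else 0
    header = struct.pack(
        "<BBBBIBB",
        0x1F, 0x8B,        # magic bytes ID1, ID2
        8,                 # CM: 8 = deflate
        flags,             # FLG
        int(time.time()),  # MTIME: creation time
        0,                 # XFL: no extra compression flags
        255,               # OS: 255 = unknown
    )
    if comment:
        header += comment + b"\x00"
    # Negative wbits selects a raw deflate stream without zlib framing.
    deflater = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    body = deflater.compress(payload) + deflater.flush()
    # Trailer: CRC32 and size of the uncompressed data, both little-endian.
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


member = gzip_member(b"hello", comment=b"space reserved for an index")
```

Because the comment is part of the header, standard gzip tools skip it and still decompress the payload, which is what makes it a convenient place to reserve bytes for an index.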
- _write_identifier(identifier)[source]
Convert and write the identifier into output file.
- Parameters:
identifier (str or int) – identifier to write into index
- _write_offset(offset)[source]
Convert and write offset to output file.
- Parameters:
offset (int) – offset which will be formatted and written into file index
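Since max_idx, max_idx_len and max_offset_len are fixed when the writer is created, the space needed for the index can be computed before any data is written. The sketch below illustrates one possible fixed-width encoding of identifier and offset. The padding byte, the ASCII encoding and all function names are assumptions chosen for illustration and may differ from what GSGW actually writes.

```python
PAD = b"\xac"  # arbitrary padding byte chosen for this sketch


def pack_entry(identifier, offset, max_idx_len=8, max_offset_len=8):
    """Encode one index entry as two fixed-width fields (assumed layout)."""
    ident = str(identifier).encode("ascii")
    off = str(offset).encode("ascii")
    if len(ident) > max_idx_len or len(off) > max_offset_len:
        raise ValueError("identifier or offset does not fit its field width")
    # Left-pad both fields so every entry has exactly the same length.
    return ident.rjust(max_idx_len, PAD) + off.rjust(max_offset_len, PAD)


def unpack_entry(entry, max_idx_len=8):
    """Decode an entry produced by pack_entry back to (identifier, offset)."""
    ident = entry[:max_idx_len].lstrip(PAD).decode("ascii")
    offset = int(entry[max_idx_len:].lstrip(PAD).decode("ascii"))
    return ident, offset


def index_region_size(max_idx=10000, max_idx_len=8, max_offset_len=8):
    """Number of bytes to reserve for a full index."""
    return max_idx * (max_idx_len + max_offset_len)


entry = pack_entry(42, 1024)
```

With the default values from the class signature, such a layout would reserve 10000 * (8 + 8) bytes for the index.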
- add_data(data, identifier)[source]
Create a new gzip member with compressed ‘data’ indexed with ‘identifier’.
- Parameters:
data (str) – uncompressed data to write to file
identifier (str or int) – unique identifier for the data
- property encoding
Returns the encoding used for this file
- property file_out
Output filehandler
- write_index()[source]
Only called after all the data is written, i.e. all calls to add_data() have been done.
Seek back to the beginning of the file and write the index into the allocated comment bytes (see _write_gen_header(Index=True)).
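The idea behind add_data and write_index can be sketched with the standard library alone: every record becomes its own gzip member, and the byte offset at which each member starts is remembered under its identifier. This is an illustration of the technique under that assumption, not the GSGW code; write_members is a hypothetical helper, and the index is kept in a dict instead of being written back into the header.

```python
import gzip
import io


def write_members(records, out):
    """Write each (identifier, text) pair as a separate gzip member.

    Returns a dict mapping identifier -> byte offset of the member start.
    """
    index = {}
    for identifier, text in records:
        index[identifier] = out.tell()
        out.write(gzip.compress(text.encode("utf-8")))
    return index


buffer = io.BytesIO()
index = write_members([(1, "first spectrum"), (2, "second spectrum")], buffer)

# Concatenated members still form one valid gzip stream, so ordinary
# gzip tools can decompress the whole file front to back.
whole = gzip.decompress(buffer.getvalue()).decode("utf-8")
```

Keeping each record in its own member is what later allows a reader to jump to an offset and decompress only that record.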
3.3.2. GSGR
- class pymzml.utils.GSGR.GSGR(file=None)[source]
Generalized Gzip reader class which enables random access in files written with the GSGW class.
- Keyword Arguments:
file (str) – path to file to read
Methods
read([size]) – Read the content of the input file in binary mode.
read_block(index) – Read and return the data block with the unique index ‘index’.
seek(offset) – Seek to byte offset in input file.
- _read_basic_header()[source]
Read and save compression method, bitflags, changetime, compression speed and os.
- read(size=-1)[source]
Read the content of the input file in binary mode
- Keyword Arguments:
size (int, optional) – number of bytes to read, -1 for everything
- Returns:
parsed bytes from input file
- Return type:
data (bytes)
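The random access offered by seek and read_block relies on the fact that a single gzip member can be decompressed on its own once its start offset is known. A minimal sketch with the standard library follows; read_member is a hypothetical helper and not part of GSGR, and the offset is computed directly here instead of being read from an index.

```python
import gzip
import io
import zlib


def read_member(handle, offset):
    """Seek to `offset` and decompress exactly one gzip member."""
    handle.seek(offset)
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    chunks = []
    while not inflater.eof:
        raw = handle.read(4096)
        if not raw:
            raise EOFError("gzip member ended unexpectedly")
        # Bytes past the end of this member end up in inflater.unused_data.
        chunks.append(inflater.decompress(raw))
    return b"".join(chunks)


first = gzip.compress(b"block one")
second = gzip.compress(b"block two")
handle = io.BytesIO(first + second)

# The second member starts right after the first one.
block = read_member(handle, len(first))
```

Only the requested member is decompressed; the data before it is skipped by the seek.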
Example:
    class SQLiteDatabase(object):
        """
        Example implementation of a database Connector,
        which can be used to make run accept paths to
        sqlite db files.
        """
Example:
    def _open(self, path):
        if path.endswith('.gz'):
            if self._indexed_gzip(path):
                self.file_handler = indexedGzip.IndexedGzip(path, self.encoding)
            else:
                self.file_handler = standardGzip.StandardGzip(path, self.encoding)
        # Insert a new condition to enable your new fileclass
        elif path.endswith('.db'):
            self.file_handler = utils.SQLiteConnector.SQLiteDatabase(path, self.encoding)
        else:
            self.file_handler = standardMzml.StandardMzml(path, self.encoding)
        return self.file_handler
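To make the dispatch above more concrete, here is a rough sketch of what a file class backed by SQLite might look like. Everything in it is hypothetical: the class name SQLiteSpectra, the table name spectra, its columns id and xml, and the lookup via __getitem__ are assumptions for illustration, and the interface that pymzML actually expects from a file class may differ.

```python
import sqlite3


class SQLiteSpectra:
    """Hypothetical file class that serves spectra from an SQLite table."""

    def __init__(self, path, encoding="utf-8"):
        self.encoding = encoding
        self.connection = sqlite3.connect(path)

    def __getitem__(self, identifier):
        # Assumed schema: table `spectra` with columns `id` and `xml`.
        row = self.connection.execute(
            "SELECT xml FROM spectra WHERE id = ?", (identifier,)
        ).fetchone()
        if row is None:
            raise KeyError(identifier)
        return row[0]

    def close(self):
        self.connection.close()


# Demonstration with an in-memory database.
db = SQLiteSpectra(":memory:")
db.connection.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, xml TEXT)")
db.connection.execute("INSERT INTO spectra VALUES (1, '<spectrum id=\"1\"/>')")
spectrum_xml = db[1]
```

The constructor takes a path and an encoding so that it could be called the same way as the other file classes in the dispatch.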