3.3. Utils

3.3.1. GSGW

class pymzml.utils.GSGW.GSGW(file=None, max_idx=10000, max_idx_len=8, max_offset_len=8, output_path='./test.dat.igzip', comp_str=-1)[source]

Generalized Gzip writer class with random access to indexed offsets.

Keyword Arguments:
  • file (string) – Filename for the resulting file

  • max_idx (int) – max number of indices which can be saved in this file

  • max_idx_len (int) – maximal length of the index in bytes, must be between 1 and 255

  • max_offset_len (int) – maximal length of the offset in bytes

  • output_path (str) – path to the output file

Attributes:
encoding

Returns the encoding used for this file

file_out

Output filehandler

Methods

add_data(data, identifier)

Create a new gzip member with compressed 'data' indexed with 'identifier'.

write_index()

Call only after all the data has been written, i.e. after all calls to add_data() have been made.

_allocate_index_bytes()[source]

Reserve space in the header for ‘self.max_index_num’ index entries of length ‘self.max_idx_len’ bytes, so that the index can be inserted later on.

_write_data(data)[source]

Write data into file-stream.

Parameters:

data (str) – uncompressed data

_write_gen_header(Index=False, FLAGS=None)[source]

Write a valid gzip header with creation time, user defined flag fields and allocated index.

Keyword Arguments:
  • Index (bool) – whether or not to write an index into this header.

  • FLAGS (list, optional) – list of flags (FTEXT, FHCRC, FEXTRA, FNAME) to set for this header.

Returns:

byte offset of the file pointer

Return type:

offset (int)
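
As an illustration of what "a valid gzip header with user defined flag fields" involves, the following is a minimal sketch built only on the standard library. It is not pymzML's implementation: the function name build_member and its parameters are hypothetical, and it sets only the FCOMMENT flag from RFC 1952 to carry a comment field.

```python
import gzip
import struct
import time
import zlib

FCOMMENT = 0x10  # flag bit defined by RFC 1952 (header carries a comment)


def build_member(payload: bytes, comment: bytes = b"") -> bytes:
    """Return one gzip member; hypothetical helper, not part of pymzML."""
    flags = FCOMMENT if comment else 0
    header = struct.pack(
        "<2sBBIBB",
        b"\x1f\x8b",       # magic bytes
        8,                 # CM: 8 means deflate
        flags,             # FLG
        int(time.time()),  # MTIME: creation time as a Unix timestamp
        0,                 # XFL
        255,               # OS: 255 means unknown
    )
    if comment:
        # The comment is zero-terminated, so it must not contain NUL bytes.
        header += comment + b"\x00"
    compressor = zlib.compressobj(wbits=-15)  # raw deflate, no wrapper
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 and length (mod 2**32) of the uncompressed payload.
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


member = build_member(b"spectrum data", comment=b"reserved")
```

Because the result follows the gzip format, gzip.decompress(member) returns the original payload and skips the comment field.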

_write_identifier(identifier)[source]

Convert and write the identifier into output file.

Parameters:

identifier (str or int) – identifier to write into index

_write_offset(offset)[source]

Convert and write offset to output file.

Parameters:

offset (int) – offset which will be formatted and written into file index
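
One possible way to "convert" identifiers and offsets is to render them as fixed-width text, which keeps every index entry the same size. The sketch below makes that assumption explicit: the helper names, the padding characters and the text encoding are hypothetical and are not taken from pymzML's on-disk format.

```python
def encode_identifier(identifier, max_idx_len: int = 8) -> bytes:
    """Left-align the identifier in a field of max_idx_len bytes (assumed layout)."""
    raw = str(identifier).encode("latin-1")
    if len(raw) > max_idx_len:
        raise ValueError(f"identifier {identifier!r} exceeds {max_idx_len} bytes")
    return raw.ljust(max_idx_len, b" ")


def encode_offset(offset: int, max_offset_len: int = 8) -> bytes:
    """Zero-pad the decimal offset to max_offset_len bytes (assumed layout)."""
    raw = str(offset).encode("ascii")
    if len(raw) > max_offset_len:
        raise ValueError(f"offset {offset} exceeds {max_offset_len} digits")
    return raw.rjust(max_offset_len, b"0")


entry = encode_identifier(42) + encode_offset(1234)
```

A text representation has the side benefit that it never produces NUL bytes, which would terminate a gzip comment field early.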

add_data(data, identifier)[source]

Create a new gzip member with compressed ‘data’ indexed with ‘identifier’.

Parameters:
  • data (str) – uncompressed data to write to file

  • identifier (str or int) – unique identifier for the data
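
The idea of one gzip member per data block can be sketched with the standard library alone. This is an illustration of the technique, not the GSGW code: append_member, the in-memory stream and the offsets dict are hypothetical names.

```python
import gzip
import io


def append_member(stream, data: str, offsets: dict, identifier, encoding="utf-8"):
    """Record where the new member starts, then append it to the stream."""
    offsets[identifier] = stream.tell()
    stream.write(gzip.compress(data.encode(encoding)))


stream = io.BytesIO()
offsets = {}
append_member(stream, "<spectrum id='1'/>", offsets, 1)
append_member(stream, "<spectrum id='2'/>", offsets, 2)

# Concatenated members still form a valid gzip stream.
everything = gzip.decompress(stream.getvalue())
```

Since each member is compressed independently, a reader that knows a member's offset can decompress it without touching the members before it.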

property encoding

Returns the encoding used for this file

property file_out

Output filehandler

write_index()[source]

Call only after all the data has been written, i.e. after all calls to add_data() have been made.

Seek back to the beginning of the file and write the index into the allocated comment bytes (see _write_gen_header(Index=True)).
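
The reserve-then-fill pattern described here can be sketched as follows. The entry layout (8 bytes identifier plus 8 bytes offset), the constants and the function names are assumptions made for the example, not pymzML's actual format.

```python
import io

ENTRY_LEN = 16   # hypothetical: 8 bytes identifier + 8 bytes offset
MAX_ENTRIES = 4  # hypothetical capacity of the reserved region


def reserve_index(stream) -> int:
    """Write placeholder bytes and return the position where they start."""
    start = stream.tell()
    stream.write(b" " * (ENTRY_LEN * MAX_ENTRIES))
    return start


def fill_index(stream, start: int, entries: dict) -> None:
    """Seek back and overwrite the placeholder with the real entries."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("more entries than reserved space")
    end = stream.tell()
    stream.seek(start)
    for identifier, offset in entries.items():
        stream.write(f"{str(identifier):<8}{offset:08d}".encode("ascii"))
    stream.seek(end)  # restore the write position


stream = io.BytesIO()
stream.write(b"HEAD")
start = reserve_index(stream)
stream.write(b"BODY")
fill_index(stream, start, {1: 68, "TIC": 120})
value = stream.getvalue()
```

Reserving the space up front means the data offsets do not shift when the index is written, which is why the capacity has to be fixed before any data is added.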

3.3.2. GSGR

class pymzml.utils.GSGR.GSGR(file=None)[source]

Generalized Gzip reader class which enables random access in files written with the GSGW class.

Keyword Arguments:

file (str) – path to file to read

Methods

read([size])

Read the content of the input file in binary mode

read_block(index)

Read and return the data block with the unique index 'index'

seek(offset)

Seek to byte offset in input file.

_check_magic_bytes()[source]

Check if file is a gzip file.
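
Every gzip member starts with the two bytes 0x1f 0x8b (RFC 1952), so such a check can be as small as the sketch below. The function name is_gzip is hypothetical and the code is not taken from GSGR.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(stream) -> bool:
    """Peek at the first two bytes without moving the read position."""
    position = stream.tell()
    magic = stream.read(2)
    stream.seek(position)
    return magic == GZIP_MAGIC


compressed = io.BytesIO(gzip.compress(b"<mzML/>"))
plain = io.BytesIO(b"<?xml version='1.0'?>")
```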

_read_basic_header()[source]

Read and save compression method, bit flags, modification time, compression speed (extra flags) and operating system.
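
The fixed part of a gzip header is ten bytes long, and the five fields after the magic bytes can be unpacked in one call. The following is a sketch under the RFC 1952 layout; BasicHeader and read_basic_header are hypothetical names, not GSGR attributes.

```python
import gzip
import io
import struct
from typing import NamedTuple


class BasicHeader(NamedTuple):
    method: int       # CM, 8 means deflate
    flags: int        # FLG bit field
    mtime: int        # MTIME as a Unix timestamp
    extra_flags: int  # XFL, hints at the compression level used
    os: int           # OS code of the writing system


def read_basic_header(stream) -> BasicHeader:
    """Parse the 10-byte fixed gzip header from the current position."""
    raw = stream.read(10)
    if len(raw) != 10 or raw[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member")
    return BasicHeader(*struct.unpack("<BBIBB", raw[2:]))


header = read_basic_header(io.BytesIO(gzip.compress(b"abc", mtime=0)))
```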

_read_index()[source]

Read and save offset dict from indexed gzip file

read(size=-1)[source]

Read the content of the input file in binary mode

Keyword Arguments:

size (int, optional) – number of bytes to read, -1 for everything

Returns:

parsed bytes from input file

Return type:

data (bytes)

read_block(index)[source]

Read and return the data block with the unique index 'index'

Parameters:

index (int or str) – identifier associated with a specific block

Returns:

indexed text block as string

Return type:

data (str)
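
Random access to a single block boils down to seeking to the member's offset and decompressing until that member ends. The sketch below shows the technique with zlib; read_member and the offsets list are hypothetical, and the real read_block looks offsets up in the index stored in the file.

```python
import gzip
import io
import zlib


def read_member(stream, offset: int, encoding: str = "utf-8") -> str:
    """Decompress exactly one gzip member starting at the given byte offset."""
    stream.seek(offset)
    decompressor = zlib.decompressobj(wbits=31)  # 16 + 15: expect a gzip wrapper
    chunks = []
    while not decompressor.eof:
        chunk = stream.read(4096)
        if not chunk:
            raise EOFError("stream ended inside a gzip member")
        chunks.append(decompressor.decompress(chunk))
    return b"".join(chunks).decode(encoding)


# Build a small multi-member stream and remember where each member starts.
stream = io.BytesIO()
offsets = []
for text in ("first", "second", "third"):
    offsets.append(stream.tell())
    stream.write(gzip.compress(text.encode("utf-8")))

block = read_member(stream, offsets[1])
```

Bytes that belong to later members end up in the decompressor's unused_data and are simply discarded here.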

seek(offset)[source]

Seek to byte offset in input file.

Parameters:

offset (int) – byte offset to seek to in the input file

Returns:

None

Example:

    class SQLiteDatabase(object):
        """
        Example implementation of a database Connector,
        which can be used to make run accept paths to
        sqlite db files.
        """

Example:

    def _open(self, path):
        if path.endswith('.gz'):
            if self._indexed_gzip(path):
                self.file_handler = indexedGzip.IndexedGzip(path, self.encoding)
            else:
                self.file_handler = standardGzip.StandardGzip(path, self.encoding)
        # Insert a new condition to enable your new fileclass
        elif path.endswith('.db'):
            self.file_handler = utils.SQLiteConnector.SQLiteDatabase(path, self.encoding)
        else:
            self.file_handler = standardMzml.StandardMzml(path, self.encoding)
        return self.file_handler