Cannot load a gzipped JSON trace with multiple blocks #872

@vmarkovtsev

Perfetto trace loader doesn't support "FEXTRA" multi-block gzip files. How to reproduce:

  1. Install https://github.com/vinlyx/mgzip
  2. Take any existing JSON trace
  3. Compress it with mgzip:

         import mgzip

         # Write a multi-block gzip file: 8 threads, 64 KiB blocksize
         with open("trace.json") as fin:
             with mgzip.open("trace.json.gz", "wt", thread=8, blocksize=1 << 16) as fout:
                 while buffer := fin.read(1 << 16):
                     fout.write(buffer)

  4. Try to load the result, for example:

         ../trace_processor --httpd trace.json.gz
         JSON trace file is incomplete

  5. Recompressing with plain gzip makes it load:

         gzip -d trace.json.gz
         gzip trace.json
         ../trace_processor --httpd trace.json.gz
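
The "incomplete" error would be consistent with a reader that stops at the end of the first gzip member, although I have not confirmed that this is what the loader does. Below is a minimal sketch of the difference using only the Python standard library. It assumes the multi-block file is a sequence of concatenated gzip members; the two-member `blob` is a hypothetical stand-in for real mgzip output, and `decompress_all_members` is a name made up for illustration.

```python
import gzip
import zlib


def decompress_all_members(data: bytes) -> bytes:
    """Decompress every gzip member in `data`, not just the first one."""
    out = []
    while data:
        # wbits=31 makes zlib expect a gzip header and trailer;
        # optional header fields such as FEXTRA are skipped by zlib.
        member = zlib.decompressobj(wbits=31)
        out.append(member.decompress(data))
        if not member.eof:
            raise ValueError("truncated gzip member")
        # Bytes after this member's trailer belong to the next member.
        data = member.unused_data
    return b"".join(out)


# Stand-in for a multi-block file: two independently compressed members.
blob = gzip.compress(b'{"traceEvents": [') + gzip.compress(b"]}")

# A reader that stops after the first member sees incomplete JSON.
first_only = zlib.decompressobj(wbits=31).decompress(blob)

# Restarting the decompressor on the leftover bytes recovers everything.
full = decompress_all_members(blob)
```

Here `first_only` holds only the opening half of the JSON, while `full` is the complete document. `gzip -d` and Python's `gzip.decompress` already handle concatenated members this way, which matches step 5 working.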

Why does this weird gzip format property matter to me? We train 100B-parameter base LLMs in PyTorch and produce a few hundred megabytes of profile data every few minutes. Compressing that takes considerable time, so spreading the work across the 192 available CPU cores gives a considerable benefit.
