hxim/paq8px
PAQ8PX – Experimental Lossless Data Compressor & Entropy Estimator

About

PAQ is a family of experimental, high-end lossless data compression programs. paq8px is one of the longest-running branches of PAQ, started by Jan Ondrus in 2009 with major contributions from Márcio Pais and Zoltán Gotthardt (see Contribution Timeline).

paq8px consistently achieves state-of-the-art compression ratios on various data compression benchmarks (see Benchmark Results). This performance comes at the cost of speed and memory usage, which makes it impractical for production use or long-term storage. However, it is particularly well-suited for file entropy estimation and as a reference for compression research.

For detailed history and ongoing development discussions, see the paq8px thread on encode.su.

Quick start

paq8px is portable software – no installation required.

Get the latest binary for Windows (x64) from the Releases page or from the paq8px thread on encode.su, or build it from source for your platform – see below.

Command line interface

paq8px does not include a graphical user interface (GUI). All operations are performed from the command line.

Open a terminal and run paq8px with the desired options to compress your file (such as paq8px -8 file.txt).
Start with a small file – compression takes time.

Example output (on Windows):

c:\>paq8px.exe -8 file.txt
paq8px archiver v214 (c) 2026, Matt Mahoney et al.

Creating archive file.txt.paq8px214 in single file mode...

Filename: file.txt (111261 bytes)
Block segmentation:
 0           | text             |    111261 bytes [0 - 111260]
-----------------------
Total input size     : 111261
Total archive size   : 19597

Time 16.62 sec, used 2164 MB (2269587029 bytes) of memory
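Since paq8px compresses close to the entropy of its input, the archive size doubles as an entropy estimate. A minimal sketch of that calculation, using the sizes from the run above:

```python
def estimate_entropy(original_size: int, compressed_size: int) -> float:
    """Approximate information content in bits per byte, using the
    compressed size as a (near-)upper bound on the data's entropy."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size * 8 / original_size

# Sizes taken from the example run above (file.txt):
print(f"~{estimate_entropy(111261, 19597):.2f} bits/byte")  # ~1.41 bits/byte
```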

Note

The output archive extension is versioned (e.g., .paq8px214).

Note

You can place the binary anywhere and reference inputs/outputs by path.

Some examples

Compress a file at level 8 (balanced speed and compression ratio):

paq8px.exe -8 filename_to_compress 

Compress at the maximum level with LSTM modeling included (-12L):

paq8px.exe -12L filename_to_compress 

Warning

This mode is extremely slow and memory-intensive. Make sure you have 32 GB+ RAM.

Getting help

To view available options, run paq8px without arguments. To view available options + detailed help pages, run paq8px -help.

Click to expand: full paq8px help
paq8px archiver v214 (c) 2026, Matt Mahoney et al.
Free under GPL, http://www.gnu.org/licenses/gpl.txt

Usage:
  to compress       ->   paq8px -LEVEL[FLAGS] [OPTIONS] INPUT [OUTPUT]
  to decompress     ->   paq8px -d INPUT.paq8px214 [OUTPUT]
  to test           ->   paq8px -t INPUT.paq8px214 [OUTPUT]
  to list contents  ->   paq8px -l INPUT.paq8px214

LEVEL:
  -1 -2 -3 -4          | Compress using less memory (529, 543, 572, 630 MB)
  -5 -6 -7 -8          | Use more memory (747, 980, 1446, 2377 MB)
  -9 -10 -11 -12       | Use even more memory (4241, 7968, 15421, 29305 MB)
  -0                   | Segment and transform only, no compression
  -0L                  | Segment and transform then LSTM-only compression (alternative: -lstmonly)

FLAGS:
  L                    | Enable LSTM model (+24 MB per block type)
  A                    | Use adaptive learning rate
  S                    | Skip RGB color transform (images)
  B                    | Brute-force DEFLATE detection
  E                    | Pre-train x86/x64 model
  T                    | Pre-train text models (dictionary-based)

  Example: paq8px -8LA file.txt   <- Level 8 + LSTM + adaptive learning rate

Block detection control (compression-only):
  -forcebinary         | Force generic (binary) mode
  -forcetext           | Force text mode

LSTM-specific options (expert-only):
  -lstmlayers=N        | Set the number of LSTM layers to N (1..5, default is 2)
  -savelstm:text FILE  | Save learned LSTM model weights after compression
  -loadlstm:text FILE  | Load LSTM model weights before compression/decompression

Misc options:
  -v                   | Verbose output
  -log FILE            | Append compression results to log file
  -simd MODE           | Override SIMD detection - expert only (NONE|SSE2|AVX2|AVX512|NEON)

Notes:
  INPUT may be FILE, PATH/FILE, or @FILELIST
  OUTPUT is optional: FILE, PATH, PATH/FILE
  The archive is created in the current folder with .paq8px214 extension if OUTPUT omitted
  FLAGS are case-insensitive and only needed for compression; they may appear in any order
  INPUT must precede OUTPUT; all other OPTIONS may appear anywhere

=============
Detailed Help
=============

---------------
 1. Compression
---------------

  Compression levels control the amount of memory used during both compression and decompression.
  Higher levels generally improve compression ratio at the cost of higher memory usage and slower speed.
  Specifying the compression level is needed only for compression - no need to specify it for decompression.
  Approximately the same amount of memory will be used during compression and decompression.

  The listed memory usage for each LEVEL (-1 = 529 MB .. -12 = 29305 MB) is typical/indicative for compressing binary
  files with no preprocessing. Actual memory use is lower for text files and higher when a preprocessing step
  (segmentation and transformations) requires temporary memory. When special file types are detected, special models
  (image, jpg, audio) will be used and thus will require extra RAM.

------------------
 2. Special Levels
------------------

  -0   Only block type detection (segmentation) and block transformations are performed.
       The data is copied (verbatim or transformed); no compression happens.
       This mode is similar to a preprocessing-only tool like precomp.
       Uses approximately 3-7 MB total.

  -0L  Uses only a single LSTM model for prediction which is shared across all block types.
       Uses approximately 20-24 MB total RAM.
       Alternative: -lstmonly

---------------------
 3. Compression Flags
---------------------

  Compression flags are single-letter, case-insensitive, and appended directly to the level.
  They are valid only during compression. No need to specify them for decompression.

  L   Enable the LSTM (Long Short-Term Memory) model.
      Uses a fixed-size model, independent of compression level.

      At level -0L (also: -lstmonly) a single LSTM model is used for prediction for all detected block types.
      Block detection and segmentation are still performed, but no context mixing or Secondary Symbol
      Estimation (SSE) stage is used.

      At higher levels (-1L .. -12L) the LSTM model is included as a submodel in Context Mixing and its predictions
      are mixed with the other models.
      When special block types are detected, for each block type an individual LSTM model is created dynamically and
      used within that block type. Each such LSTM model adds approximately 24 MB to the total memory use.

  A   Enable adaptive learning rate in the CM mixer.
      May improve compression for some files.

  S   Skip RGB color transform for 24/32-bit images.
      Useful when the transform worsens compression.
      This flag has no effect when no image block types are detected.

  B   Enable brute-force DEFLATE stream detection.
      Slower but may improve detection of compressed streams.

  E   Pre-train the x86/x64 executable model.
      This option pre-trains the EXE model using the paq8px.exe binary itself.
      Archives created with a different paq8px.exe executable (even when built from the same source and build options)
      will differ. To decompress an archive created with -E, you must use the exact same executable that created it.

  T   Pre-train text-oriented models using a dictionary and expression list.
      The word list (english.dic) and expression list (english.exp) are used only to pre-train models before
      compression and they are not stored in the archive.
      You must have these same files available to decompress archives created with -T.

---------------------------
 4. Block Detection Control
---------------------------

  Block detection and segmentation always happen regardless of the memory level or other options - except when forced:

  -forcebinary

      Disable block detection; the whole file is considered as a single binary block and only the generic (binary)
      model set will be used.
      Useful when block detection produces false positives.

  -forcetext

      Disable block detection; consider the whole file as a single text block and use the text model set only.
      Useful when text data is misclassified as binary or fragments in a text file are incorrectly detected as some
      other block type.

---------------------------------------
 5. LSTM-Specific Options (expert-only)
---------------------------------------

  -lstmlayers=N

      Set the number of LSTM layers to N. Using more layers generally leads to better compression, but memory use
      will be higher (scales linearly with N) and compression time will be significantly slower. The default is N=2.

  -savelstm:text FILE

      Saves the LSTM model's learned parameters as a lossless snapshot to the specified file when compression finishes.
      Only the model used for text block(s) will be saved.
      It's not possible to save a snapshot from other block types. This is an experimental feature.

  -loadlstm:text FILE

      Loads the LSTM model's learned parameters from the specified file (which was saved earlier
      by the -savelstm:text option) before compression starts. The LSTM model will use this loaded
      snapshot to bootstrap its predictions.
      At levels -1L .. -12L only text blocks are affected.
      At level -0L all blocks are affected (because a single LSTM model is used for all block types).
      Critical: The same snapshot file MUST be used during decompression or the original content cannot be recovered.

----------------------
 6. Archive Operations
----------------------

  -d  Decompress an archive.
      In single-file mode the content is decompressed; the output is named after the archive, with the
      .paq8px214 extension removed.
      In multi-file mode first the @LISTFILE is extracted then the rest of the files. Any required folders will
      be created recursively, all files will be extracted with their original names.
      If the output file or files already exist they will be overwritten.

      Example: to decompress file.txt to the current folder:
      paq8px -d file.txt.paq8px214

  -t  Test archive contents by decompressing to memory and comparing with the original data on-the-fly.
      If a file fails the test, the first mismatched position will be printed to screen.

      Example: to test archive contents:
      paq8px -t file.txt.paq8px214

  -l  List archive contents.
      Extracts the embedded @FILELIST (if present) and prints it.
      Applicable only to multi-file archives.

      Example: to list the file list (when the archive was created using @files):
      paq8px -l files.paq8px214

----------------------------------
 7. INPUT and OUTPUT Specification
----------------------------------

  INPUT may be:

  * A single file
  * A path/file
  * A [path/]@FILELIST

  In multi-file mode (i.e. when @FILELIST is provided) only file names, file contents and file sizes are stored
  in the archive. Timestamps, permissions, attributes or any other metadata are not preserved unless stored
  separately and manually by the user in the FILELIST.

  OUTPUT is optional:

    For compression:

    * If omitted, the archive is created in the current directory.
      The name of the archive: INPUT + paq8px214 extension appended.
    * If a filename is given, it is used as the archive name.
    * If a directory is given, the archive is created inside it.
    * If the archive file already exists, it will be overwritten.

    For decompression:

    * If an output filename is not provided, the output will be named the same as the archive without
      the paq8px214 extension.
    * If a filename is given, it is used as the output name.
    * If a directory is given, the restored file will be created inside it (the directory must exist).
    * If the output file(s) already exist, they will be overwritten.

  Examples:

  To create data.txt.paq8px214 in current directory:
  paq8px -8 data.txt

  To create archive.paq8px214 in current directory:
  paq8px -8 data.txt archive.paq8px214

  To create data.txt.paq8px214 in results/ directory:
  paq8px -8 data.txt results/

---------------------------------
 8. @FILELIST Format and Behavior
---------------------------------

  When a @FILELIST is provided, the FILELIST file itself is compressed as the first file in the archive and
  automatically extracted during decompression.

  The FILELIST is a tab-separated text file with this structure:

    Column 1:  Filenames and optional relative paths (required, used by compressor)
    Column 2+: Arbitrary metadata - timestamps, ownership, etc. (optional, preserved but ignored)

    First line: Header (preserved but ignored while processing the file list)

  Only the first column is used by the compressor and decompressor.
  All other columns are preserved but ignored.
  Paths must be relative to the FILELIST location.

  Using this mechanism allows full restoration of file metadata with third-party tools after decompression.


-------------------------
 9. Miscellaneous Options
-------------------------

  -v

    Enable verbose output.

  -log FILE

    Append compression results to a tab-separated log file.
    Logging applies only to compression.

  -simd MODE

    Normally, the highest usable SIMD instruction set is detected and used automatically:

    - for the CM mixer - supported: SSE2, AVX2, AVX512, ARM NEON
    - for neural network operations in the LSTM model - supported: SSE2, AVX2
    - for the LMS and OLS predictors (used mainly in image and audio models) - supported: SSE2.

    This option overrides the detected SIMD instruction set. Intended for expert use and benchmarking.
    Supported values (case-insensitive):
       NONE
       SSE2, AVX2, AVX512 (on x64)
       NEON (on ARM)

    Note that when paq8px is compiled for a specific CPU architecture, the compiler may automatically
    vectorize some parts of the code. While selecting 'NONE' disables all manually optimized SIMD
    implementations, the remaining scalar code may still be auto-vectorized by the compiler and
    therefore may not be entirely free of vector instructions.

----------------------
 10. Argument Ordering
----------------------

  Command-line arguments may appear in any order with the following exception:
  INPUT must always precede OUTPUT.

  Example: the following two are equivalent:

    paq8px -v -simd sse2 enwik8 -log results.txt output/ -8
    paq8px -8 enwik8 -log results.txt output/ -v -simd sse2

  Further examples:

    paq8px -8 file.txt         | Compress using ~2.3 GB RAM
    paq8px -12L enwik8         | Compress 'enwik8' with maximum compression (~29 GB RAM), use the LSTM model as well
    paq8px -4 image.jpg        | Compress the 'image.jpg' file - using less memory, even faster
    paq8px -8ba b64sample.xml  | Compress 'b64sample.xml' faster and using less memory
                                 Put more effort into finding and transforming DEFLATE blocks
                                 Use adaptive learning rate.
    paq8px -8s rafale.bmp      | Compress the 'rafale.bmp' image file
                                 Skip color transform - this file compresses better without it

Compatibility & archive basics

A paq8px archive stores one or more files in a highly compressed format.

Note

Files and archives larger than 2 GB are not supported.

Note

paq8px archives are not compatible across different paq8px releases (past or future).

Note

A paq8px archive may contain multiple files, but once created, you cannot add to or remove files from the archive.

How to recognize it

The file extension reflects the exact paq8px version that created it (e.g., .paq8px214).
You can also check the header: if the first bytes read "paq8px", it is likely a paq8px archive.
Exact version information cannot be inferred from the archive content: the archive header does not encode the specific paq8px version used. Only the file extension reflects the version.
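A minimal sketch of such a header check, based on the magic-byte note above (a heuristic only, not a format parser):

```python
def looks_like_paq8px(path: str) -> bool:
    """Heuristic check: a paq8px archive starts with the bytes 'paq8px'."""
    with open(path, "rb") as f:
        return f.read(6) == b"paq8px"
```

Remember that this identifies the archive family only; the exact version must be taken from the file extension.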

Single file vs multiple file modes

In single-file mode, only file contents are stored – no paths, names, timestamps, attributes, permissions, or other metadata.

In multi-file mode, you may preserve such metadata via the @FILELIST mechanism (see the help screen for details).
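As a sketch of that mechanism, the following snippet writes a FILELIST in the tab-separated format described in the help screen. The metadata columns chosen here (size and mtime) are arbitrary examples; paq8px itself reads only the first column:

```python
import os

def write_filelist(path: str, files: list[str]) -> None:
    """Write a tab-separated FILELIST: header line first, then one row
    per file with the (relative) path in column 1 and metadata after."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("filename\tsize\tmtime\n")  # header: preserved but ignored
        for name in files:
            st = os.stat(name)
            f.write(f"{name}\t{st.st_size}\t{int(st.st_mtime)}\n")

# write_filelist("files.txt", ["docs/readme.txt", "data/sample.bin"])
# then compress with: paq8px -8 @files.txt
```

A third-party tool can read the extra columns back after decompression to restore timestamps or other metadata.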

Notes on pre-training

Warning

Archives made with pre-training-like options (-E, -T, -R) are fragile — decompression requires the same binary and/or external files.

  1. The exe pre-training (-E)
    This option pre-trains the EXE model using the paq8px.exe binary itself.
    Archives created with a different paq8px.exe binary (even when built from the same source and build options) will differ.
    To decompress an archive created with -E, you must use the exact same executable that created it.

  2. Text pre-training (-T)
    The word list (english.dic) and expression list (english.exp) are used only to pre-train models before compression and they are not stored in the archive.
    You must have these same files available to decompress archives created with -T.

  3. LSTM pre-trained weight repositories (-R)
    If you use pre-trained LSTM repositories, ensure the same RNN weight files (english.rnn, x86_64.rnn) are available during decompression.

Warning

The LSTM repositories are temporarily unavailable in the latest release due to the refactoring of the model. The latest version supporting this feature was v209.

How to compile

Building paq8px requires a C++17 capable C++ compiler:
https://en.cppreference.com/w/cpp/compiler_support#cpp17

Windows:
On Windows, you can download a prebuilt executable instead of compiling. Just grab the latest executable from the https://encode.su/threads/342-paq8px thread.
If you would like to build an executable yourself, you may use the Visual Studio solution file; for MinGW-w64, see the build-mingw-w64-generic-publish.cmd batch file in the build subfolder.

Linux/macOS:
The ./build folder already contains helper scripts.
You may use the following commands to build with cmake:

sudo apt-get install build-essential zlib1g-dev cmake make
cd build
./build-linux-with-cmake.sh

Testing in a Linux VM

  • Get a Linux VM (such as Lubuntu 25.04 Plucky Puffin)
  • Install the required compilers and tools with the following commands:
sudo apt update
sudo apt install gcc clang gcc-aarch64-linux-gnu g++-aarch64-linux-gnu build-essential cmake zlib1g-dev

Sample build scripts are provided in the build/ folder:

  • build/build-linux-with-cmake.sh
  • build/build-linux-with-gcc.sh
  • build/build-linux-with-clang.sh
  • build/build-linux-cross-compile-aarch64.sh

Tested toolchains

The following compiler/OS combinations have been tested successfully:

Version OS Compiler/IDE
v214 Windows Visual Studio 2022 Community Edition 17.14.14
v214 Windows Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35216
v214 Windows MinGW-w64 13.0.0 (gcc-15.2.0)
v211 Lubuntu 25.04 Plucky Puffin gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
v211 Lubuntu 25.04 Plucky Puffin Ubuntu clang version 20.1.2 (0ubuntu1), Target: x86_64-pc-linux-gnu
v211 Lubuntu 25.04 Plucky Puffin aarch64-linux-gnu-gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0

Other modern C++17 compilers may also work but are not routinely tested.

Note

We build and test 64-bit releases. 32-bit releases are seldom built or tested.
A known limitation of 32-bit releases is the 2 GB memory barrier. As a consequence, compression and decompression with 32-bit releases may not work ("out of memory") on level 8 and above.

Release checklist

When you make a new release:

  • Please update the version number in the "Versioning" section in the paq8px.cpp source file.
  • Please append a short description of your modifications to the CHANGELOG file.
  • Please carry out some sanity checks. Run these tests with asserts on (remove the NDEBUG preprocessor directive).
  • Please verify that paq8px can be properly built on different platforms (i.e. test all the build scripts).
  • Update README.md, especially the Benchmark results.

References

How it works

paq8px compresses files bit by bit using a technique called context mixing: multiple models make probabilistic predictions for the next bit, and a mixer combines them into a single, more accurate probability, which is then encoded with an arithmetic coder.

This approach is computationally intensive but highly adaptive, making paq8px especially effective for entropy estimation, compressibility testing and research purposes.
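As an illustration of the mixing principle, here is a minimal, self-contained sketch of a logistic mixer. It is a simplification for clarity, not paq8px's actual implementation; the model probabilities and learning rate are made-up values. Each model's probability is mapped to the logit domain ("stretch"), combined with learned weights, mapped back ("squash"), and the weights are nudged toward whichever models predicted the actual bit best:

```python
import math

def stretch(p: float) -> float:
    return math.log(p / (1 - p))   # probability -> logit

def squash(x: float) -> float:
    return 1 / (1 + math.exp(-x))  # logit -> probability

class Mixer:
    def __init__(self, n_models: int, lr: float = 0.02):
        self.w = [0.0] * n_models      # one weight per model
        self.lr = lr
        self.inputs = [0.0] * n_models

    def mix(self, probs: list[float]) -> float:
        """Combine per-model bit probabilities into one prediction."""
        self.inputs = [stretch(p) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.inputs)))

    def update(self, p_mixed: float, bit: int) -> None:
        """Gradient step: reward models that predicted the actual bit."""
        err = bit - p_mixed
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.inputs)]

# Two toy models: one confident, one uninformed.
m = Mixer(2)
for _ in range(200):
    p = m.mix([0.9, 0.5])  # model 0 says "1" with 90%, model 1 is neutral
    m.update(p, 1)         # the actual bit is always 1
print(m.mix([0.9, 0.5]) > 0.8)  # the mixer learned to trust model 0 -> True
```

Mixing in the logit domain lets confident predictions dominate the sum, which is one reason context mixing outperforms simple probability averaging.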

For an in-depth technical explanation, see the DOC file.

Benchmark results

Benchmark results are provided on various corpora for comparison with other compressors.
Rankings are based solely on compression ratio, not speed or memory usage, to show the reference compressed sizes achievable on these datasets.
Results are drawn from official listings where available, or from community testing when benchmarks are no longer maintained.
Results last verified: Sept 21, 2025.

Summary:

Corpus / Benchmark Version Rank
Calgary v213 #2
Canterbury v213 #2
Silesia v213 #1
RareWares test samples (16-bit stereo audio) v213 --
Kodak Lossless True Color Image Suite v213 #1
ImgInfo RGB test set v213 #1
Lossless Photo Compression Benchmark (LPCB) v206 #1
Large Text Compression Benchmark (LTCB) v206 #10
Darek's corpus (DBA) v210 #1
Maximumcompression benchmark v207fix1 #1
fenwik9 benchmark by Sportman v210 #1
World English Bible benchmark by Sportman v208fix1 #1

For the Calgary, Canterbury, Silesia and MaximumCompression benchmarks, see paq8px evolution up to paq8px_v207fix1, run by Darek in his post in the paq8px thread

Calgary corpus

The Calgary corpus does not have an official maintained ranking, and most published results do not include modern experimental compressors.

Below are compressed sizes under various options, compared to cmix v21 as reference.

File (v213) -8 (v213) -12L (v210) -12LT (v209) -12RT cmix v21 cmix v21 (with dict)
bib 19597 19530 17492 17376 19746 17180
book1 183307 181513 175722 163431 182429 173709
book2 113968 113152 108844 106668 113286 105918
geo 42476 42257 42265 42367 42651 42760
news 83024 82689 78490 77166 82869 76389
obj1 7060 6981 6841 6892 7154 7053
obj2 40935 40137 39820 39950 40380 40139
paper1 12365 12324 11041 10749 12449 10831
paper2 19541 19473 17478 16589 19636 17169
pic 19621 19637 19669 19677 21487 21883
progc 8872 8808 8206 8189 8900 8193
progl 9507 9450 8876 8864 9524 8788
progp 6373 6296 6061 6097 6395 6126
trans 10986 10951 10056 10045 10822 9990
Total compressed size 577'632 573'198 550'861 534'060 577'728 546'128
Compression time (approx. sec) 264 772 1231 1567 3746 n/a

With fair options (-12LT), paq8px v210 achieved results close to cmix v21 (with dictionary pre-processing).
With unfair options (-12RT), results surpass cmix, but these should be excluded (see Benchmarking Notes).

At the time of writing, paq8px v214 likely ranks #2 on Calgary behind cmix v21.

Canterbury corpus

The same general notes apply to the Canterbury corpus as to the Calgary corpus.

Below are compressed sizes under various options, compared to cmix v21.

File (v213) -8 (v213) -12L (v210) -12LT (v209) -12RT cmix v21 cmix v21 (with dict)
alice29.txt 33071 32863 31138 28317 33360 31076
asyoulik.txt 31514 31431 29601 28062 31665 29434
cp.html 5408 5393 4740 4720 5478 4746
fields.c 2028 2018 1856 1848 2087 1909
grammar.lsp 861 862 750 732 874 771
kennedy.xls 8152 7817 7850 7972 7926 7955
lcet10.txt 79113 78813 74655 72594 79550 73365
plrabn12.txt 117447 116705 112546 108648 116984 112263
ptt5 19621 19637 19669 19677 21487 21883
sum 6826 6801 6657 6679 6968 6870
xargs.1 1296 1294 1097 1061 1326 1123
Total compressed size 305'337 303'634 290'559 280'310 307'705 291'395
Compression time (approx. sec) 218 645 1015 1352 3354 n/a

At the time of writing, paq8px v214 likely ranks #2 on Canterbury behind cmix v21.

Silesia corpus

paq8px v210 ranked #1 in The Silesia Open Source Compression Benchmark at the time of writing.

Results for paq8px v213 together with cmix v21 as reference:

File -12L precomp v0.4.7 -cn + cmix v21 (with dict)
dickens 1'860'120 1'802'071
mozilla 6'128'852 6'634'210
mr 1'852'546 1'828'423
nci 776'633 781'325
ooffice 1'218'714 1'221'977
osdb 1'968'170 1'963'597
reymont 699'462 704'817
samba 1'588'070 1'588'875
sao 3'723'788 3'726'502
webster 4'401'642 4'271'915
xml 245'786 233'696
x-ray 3'521'253 3'503'686
Total compressed size 27'985'036 28'261'094
Compression time (approx. sec) 63'651 n/a

Here paq8px outperformed cmix v21 overall, though performance varies per file.

RareWares test samples (16-bit stereo audio)

The RareWares test samples have no official benchmarking for lossless audio compression. The files were converted from WavPack to WAV before compression.

Results for paq8px v212 and paq8px v213 together with OptimFrog as reference:

File (v212) -6 (v213) -6 OptimFrog*
41_30sec.wav 3'284'213 3'283'811 3'269'665
ATrain.wav 1'551'889 1'549'875 1'510'497
Bachpsichord.wav 2'373'830 2'372'713 2'150'210
Bartok_strings2.wav 1'685'993 1'683'560 1'650'617
BeautySlept.wav 1'348'613 1'348'818 1'342'402
BigYellow.wav 3'107'572 3'108'409 3'092'722
Blackwater.wav 2'005'865 2'003'290 1'961'874
bodyheat.wav 2'403'241 2'401'078 2'464'752
chanchan.wav 1'292'080 1'293'093 1'299'421
DaFunk.wav 2'259'027 2'259'112 2'276'973
death2.wav 1'077'353 1'075'118 1'129'132
Debussy.wav 1'325'814 1'304'118 1'300'765
EnolaGay.wav 2'964'656 2'967'231 2'915'459
experiencia.wav 2'418'250 2'419'769 2'407'521
female_speech.wav 1'001'434 941'697 951'494
FloorEssence.wav 2'092'472 2'093'559 2'075'225
getiton.wav 2'617'374 2'613'600 2'603'002
gone.wav 3'318'939 3'316'859 3'288'315
Hongroise.wav 1'757'526 1'740'649 1'718'751
Illinois.wav 2'777'370 2'776'349 2'740'986
ItCouldBeSweet.wav 1'838'377 1'837'187 1'833'977
kraftwerk.wav 1'800'761 1'800'019 1'875'449
Layla.wav 2'126'370 2'127'201 2'092'815
Leahy.wav 3'657'206 3'658'074 3'642'629
LifeShatters.wav 2'385'773 2'384'127 2'372'681
macabre.wav 1'781'129 1'779'770 1'738'196
Mahler.wav 2'456'386 2'452'483 2'418'657
male_speech.wav 895'904 848'470 842'498
Mama.wav 3'268'372 3'265'384 3'339'379
MidnightVoyage.wav 2'305'443 2'304'076 2'282'623
mybloodrusts.wav 2'364'972 2'367'582 2'363'087
NewYorkCity.wav 3'997'780 3'996'749 3'990'058
OrdinaryWorld.wav 3'115'705 3'116'192 3'120'641
Polonaise.wav 1'541'904 1'522'442 1'471'865
Quizas.wav 2'823'305 2'825'411 2'825'230
riteofspring.wav 1'686'084 1'684'226 1'779'253
rosemary.wav 2'734'582 2'732'578 2'723'780
Scars.wav 2'200'952 2'199'884 2'190'466
SinceAlways.wav 2'096'819 2'097'599 2'087'695
thear1.wav 2'443'956 2'442'228 2'428'164
TheSource.wav 2'325'523 2'325'891 2'317'006
TomsDiner.wav 1'545'070 1'544'343 1'556'186
trust.wav 2'885'710 2'884'743 2'920'069
Twelve.wav 3'619'506 3'619'004 3'590'123
velvet.wav 1'313'525 1'315'243 1'308'290
Waiting.wav 2'187'301 2'185'463 2'171'128
Total compressed size 104'061'926 103'869'077 103'431'728
Compression time (approx. sec) 6131 5464 n.a.

*OptimFrog: ofr --encode --preset max %1

At the time of writing, paq8px v213 is unranked.

Kodak Lossless True Color Image Suite

The Kodak Lossless True Color Image Suite has no official benchmarking for lossless image compression. The images were converted from PNG to PPM before compression.

Results for paq8px v213 and paq8px v214:

File (v213) -8 (v213) -8L (v214) -8 (v214) -8L
kodim01.ppm 315'510 312'246 311'386 308'621
kodim02.ppm 257'732 255'672 254'005 252'320
kodim03.ppm 201'093 199'991 198'223 197'404
kodim04.ppm 267'012 264'499 262'669 260'569
kodim05.ppm 339'543 335'871 332'641 329'738
kodim06.ppm 290'075 287'476 286'119 283'942
kodim07.ppm 222'406 220'656 218'511 217'107
kodim08.ppm 353'115 348'643 346'164 342'504
kodim09.ppm 245'384 243'749 241'422 240'025
kodim10.ppm 252'837 251'080 248'722 247'300
kodim11.ppm 279'161 276'482 274'932 272'722
kodim12.ppm 231'887 230'026 228'755 227'222
kodim13.ppm 398'692 392'799 391'737 386'548
kodim14.ppm 313'938 311'208 308'775 306'562
kodim15.ppm 253'281 251'250 249'470 247'873
kodim16.ppm 238'022 236'228 234'886 233'322
kodim17.ppm 252'459 250'915 248'414 247'115
kodim18.ppm 362'678 357'394 354'562 349'938
kodim19.ppm 293'100 290'411 287'755 285'541
kodim20.ppm 239'030 237'359 235'864 234'483
kodim21.ppm 297'485 294'970 292'545 290'341
kodim22.ppm 329'634 325'054 322'563 318'509
kodim23.ppm 249'698 247'729 245'520 243'895
kodim24.ppm 298'519 294'247 293'309 289'773
Total compressed size 6'782'291 6'715'955 6'668'949 6'613'374
Compression time (approx. sec) 1'750 6'012 2'007 6'330

At the time of writing, paq8px v214 likely ranks #1 on the Kodak test set among lossless compressors with no pre-trained models.

Other compressors for reference: GitHub - WangXuan95/Image-Compression-Benchmark: A comparison of many lossless image compression formats.

ImgInfo RGB test set

The ImgInfo RGB test set has no official benchmarking for lossless image compression.

Results for paq8px v213 and paq8px v214:

File (v213) -8L (v214) -8L
artificial.ppm 396'742 394'150
big_building.ppm 43'524'971 42'805'256
big_tree.ppm 37'369'579 36'668'068
bridge.ppm 16'824'955 16'805'572
cathedral.ppm 6'576'508 6'468'273
deer.ppm 18'168'804 18'110'156
fireworks.ppm 3'169'176 3'129'441
flower_foveon.ppm 1'621'443 1'613'759
hdr.ppm 4'621'868 4'602'334
leaves_iso_1600.ppm 8'194'684 8'034'152
leaves_iso_200.ppm 6'327'733 6'212'126
nightshot_iso_100.ppm 4'558'484 4'477'648
nightshot_iso_1600.ppm 9'268'001 9'139'073
spider_web.ppm 5'492'330 5'413'843
Total compressed size 166'115'278 163'873'851
Compression time (approx. sec) n.a. 111'420

At the time of writing, paq8px v214 likely ranks #1 on the ImgInfo RGB test set among lossless compressors with no pre-trained models.

Other compressors for reference: GitHub - WangXuan95/Image-Compression-Benchmark: A comparison of many lossless image compression formats.

Lossless Photo Compression Benchmark (LPCB)

paq8px v206 ranked #1 at Lossless Photo Compression Benchmark.

The benchmark has not been rerun for later versions.

Large Text Compression Benchmark (LTCB)

paq8px v206 ranked #10 at Large Text Compression Benchmark at the time of writing.
Note that, unlike paq8px, most higher-ranked compressors are tuned specifically for enwik8/enwik9 and often apply enwik-specific preprocessing (e.g., word replacement, article reordering).

The benchmark has not been rerun for later versions.

Darek's corpus (DBA)

Darek's benchmark is not an exhaustive benchmark – it targets only high-end compressors.

See the last results in Darek's post to the encode.su forum from 2026 including results for v210.

paq8px v210 ranked #1 at that time.

MaximumCompression benchmark

The MaximumCompression benchmark is no longer actively maintained and has no up-to-date official listing.
The official site was last updated in 2011. At that time paq8px ranked #1.

See paq8px evolution on the MaximumCompression benchmark up until paq8px v207fix1 in Darek's post to the encode.su forum from 2022.

Compressed sizes for v210 and v213 with compression option -12L (-12Ls for rafale.bmp).

File -12L (v210) -12L (v213) size diff
A10.jpg 624023 624043 20
acrord32.exe 786553 786547 -6
english_mc.dic 333089 333052 -37
FlashMX.pdf 1289571 1267622 -21949
fp.log 199933 199754 -179
mso97.dll 1121228 1121280 52
ohs.doc 452209 452642 433
rafale.bmp 463390 455272 -8118
vcfiu.hlp 245448 244682 -766
world95.txt 309236 309216 -20
Total compressed size 5'824'680 5'794'110 -30'570
Compression time (sec) 19'384 19'048 -336
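The "size diff" column above is simply the v213 size minus the v210 size (negative means v213 did better). A quick sketch, with the byte counts copied from the table, reproduces the per-file diffs and the totals:

```python
# Sanity-check of the MaximumCompression table above: recompute the per-file
# size differences and the totals from the listed compressed sizes (bytes).
sizes = {
    # file: (v210 size, v213 size), both with option -12L (-12Ls for rafale.bmp)
    "A10.jpg":        (624023, 624043),
    "acrord32.exe":   (786553, 786547),
    "english_mc.dic": (333089, 333052),
    "FlashMX.pdf":    (1289571, 1267622),
    "fp.log":         (199933, 199754),
    "mso97.dll":      (1121228, 1121280),
    "ohs.doc":        (452209, 452642),
    "rafale.bmp":     (463390, 455272),
    "vcfiu.hlp":      (245448, 244682),
    "world95.txt":    (309236, 309216),
}

# Negative diff = v213 compressed the file to fewer bytes than v210.
diffs = {name: new - old for name, (old, new) in sizes.items()}
total_v210 = sum(old for old, _ in sizes.values())
total_v213 = sum(new for _, new in sizes.values())

print(total_v210, total_v213, total_v213 - total_v210)
# → 5824680 5794110 -30570
```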

To the best of our knowledge, the latest version, v214, would still rank #1 on this benchmark at the time of writing.

fenwik9 benchmark

paq8px v210 ranked #1 in the fenwik9 benchmark.
This is a non-standard but exhaustive single-file benchmark maintained by Sportman.

World English Bible benchmark (WEB)

paq8px v208fix1 ranked #1 in the World English Bible benchmark.
This is a non-standard but exhaustive single-file benchmark maintained by Sportman.

Benchmarking Notes

Warning

  1. Using -R to load pre-trained LSTM weight repositories is unfair if the target file to be compressed was part of the training data.
  2. Some compressors apply text preprocessing with an external dictionary (e.g., cmix v21). paq8px uses no such technique, though text files may be preprocessed with an external tool to boost its compression.
  3. Benchmarks and leaderboards change over time – rankings may shift.
  4. Hardware does not affect compression ratio and memory use, but it does affect runtime; reported times are approximate and for context only.

PAQ8PX contribution timeline

paq8px is a branch of the PAQ compressor series, descended from earlier versions such as PAQ7 and the PAQ8 variants (e.g., PAQ8A-PAQ8P).

Development began in 2009 and remains active, supported by a global community of contributors.

Work has focused on expanding model coverage (images, audio, executables, text) with emphasis on compression ratio.

The table below highlights milestones, contributors, and notable changes over the years.

Year Versions Contributors & Highlights
Pre-2009 PAQ roots Matt Mahoney: Original PAQ author. Early branches (paq8hp*, paq8fthis*, paq8p3, lpaq1) introduced context maps with 16-bit checksums, probabilistic state tables, specialized models (JPEG, sparse, DMC, distance-based), exe model/filter. Added directory compression and drag-and-drop (PAQ8A), BMP/PGM/JPEG/WAV support, APM/StateMap optimizations.
2009 v0–v67 Jan Ondrus: Founded paq8px, adding TGA/TIFF/AIFF/MOD/S3M models, PPM/PBM compression, CD sector transform, exe filters, recursive sub-blocks, WAV-model improvements.
Simon Berger: TGA 24/8-bit, TIFF/AIFF improvements, MSVC fixes, compression pipeline rewrite.
LovePimple: Portability fixes.
2010 v68–v69 Jan Ondrus: Added -l listing option, fix for multi-path file compression.
2016 v70–v75 Jan Ondrus: Add zlib recompression (initially unstable), PDF image support, Base64 transform, GIF recompression, and paq8pxd model updates (incl. im8bitModel), plus multiple bugfixes (zlib header/progress display, Base64, GIF).
2017 v76–v127 Márcio Pais: JPEG upgrades (subsampling, thumbnails, MJPEG), record/BMP models, grayscale detection, XML model, x86/x64 pre-training, PNG recompression, DEFLATE MTF + brute force, dBASE parsing, adaptive learning rate, English stemmer.
Jan Ondrus: JPEG tweaks, PAM format detection, block handling, PDF 4-bit fix.
Zoltán Gotthardt: Fixes, MSVC/Array/ilog2 fixes, faster JPEG learning rate, IO improvements.
Mauro Vezzosi: Bug reports, dmcModel patch.
2018 v128–v173 Márcio Pais: Extended text modeling (English/French/German stemmers, language detection, SparseMatchModel, SSE refinements, RLE/EOL transforms), 8bpp/24–32bpp image model improvements, JPEG tweaks, pre-training refinements.
Zoltán Gotthardt: New CLI and file handling, DMC enhancements, hashing improvements, charGroupModel, compiler/portability fixes.
Andrew Epstein: AVX2 optimizations, macOS build fixes.
2019 v174–v183 Márcio Pais: Added linearPredictionModel, audio8bModel, audio16bModel, new image/GIF/TIFF handling, text model with word embeddings.
Zoltán Gotthardt: refactoring (global scope cleanup, model factory, Shared struct), improved WordModel (PDF text extraction, pre-training), enhancements to StateMap, ContextMap2, MatchModel, and NormalModel.
2020 v184–v200 Andrew Epstein: Code cleanup, modularization, Doxygen docs.
Moisés Cardona: ARM/NEON support, base64 fix, SIMD work.
Zoltán Gotthardt: Refactoring (predictor separation, RNG, ContextMap), Sparse/SparseBit/Indirect model improvements, fixes, cleanup.
Márcio Pais: LSTM model (pre-training, retraining, x86/64 optimizations), DEC Alpha transform/model, new SSE stages.
Surya Kandau: JPEG model refinements.
2021 v201–v206 Zoltán Gotthardt: Improved IndirectContext/MatchModel, added high-precision arithmetic encoder & APMPost, introduced ChartModel, MRB detection, metadata modeling, separate mixers per block type, refined text detection, and -skipdetection option.
2022 v207 Zoltán Gotthardt: PNG filtering moved to transform layer; DEC-Alpha detection via object signature; TAR detection/transform; base85 filter (from paq8pxd); structured-text WordModel (linemodel) enhancements; separate LSTM per main context.
2023 v208 Zoltán Gotthardt: TAR detection fixes; new -forcetext option; enhanced 1-bit image model; shifted contexts (fewer in IndirectModel, added to WordModel for TEXT); refactors.
Pavel Rosický: AVX512 detection.
2025 v209 Zoltán Gotthardt: Model tweaks (initialized mixer weights; corrected matchmodel context); TEXT detection fixes; build/toolchain updates.
2026 v210-v214 Zoltán Gotthardt: LSTM model enhancements, speed improvements, tuned Audio16BitModel, enhanced 24/32-bit image model.

This timeline is not exhaustive; for details, see the CHANGELOG.

Notable borrows

paq8px incorporates ideas and code from a range of sources, often adapted and extended to fit the project’s design:

  • UTF-8 detection – based on Bjoern Hoehrmann's UTF decoder DFA; integrated by Zoltán Gotthardt
  • Base64 transform – from paq8pxd by Kaido Orav; integrated by Jan Ondrus
  • Base85 transform – from paq8pxd by Kaido Orav; integrated by Zoltán Gotthardt
  • MRB detection – from paq8pxd by Kaido Orav; integrated with enhancements by Zoltán Gotthardt
  • zlib recompression – from AntiZ; integrated by Jan Ondrus
  • Text modeling with stemming – based on the Porter/Porter2 stemmers; integrated by Márcio Pais
  • Audio modeling ideas – based on 'An Asymptotically Optimal Predictor for Stereo Lossless Audio Compression' by Florin Ghido; integrated with enhancements by Márcio Pais
  • Image modeling ideas – from Emma by Márcio Pais
  • EXE model – incorporates ideas from DisFilter by Fabian Giesen; integrated with enhancements by Márcio Pais
  • ChartModel – from paq8kx7; integrated with enhancements by Zoltán Gotthardt
  • MatchModel – ideas from Emma; integrated by Márcio Pais
  • MatchModel – improvements from paq8gen; integrated by Zoltán Gotthardt
  • LSTM model – adapted from cmix by Byron Knoll; integrated with enhancements by Márcio Pais, further enhancements based on ligru-compress by Zoltán Gotthardt
  • OLS predictor – by Sebastian Lehmann; integrated by Márcio Pais
  • LMS predictor – by Sebastian Lehmann; integrated by Márcio Pais

Similar compressors

Copyright

Copyright (C) 2009-2026 Matt Mahoney, Serge Osnach, Alexander Ratushnyak, Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson, Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, Kaido Orav, Simon Berger, Neill Corlett, Márcio Pais, Andrew Epstein, Mauro Vezzosi, Zoltán Gotthardt, Moisés Cardona and others.

We would like to express our gratitude for the endless support of many contributors who encouraged paq8px development with ideas, testing, compiling, debugging: LovePimple, Skymmer, Darek, Stephan Busch, m^2, Christian Schneider, pat357, Rugxulo, Gonzalo, a902cd23, pinguin2, Luca Biondi, and the broader community at encode.su.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU General Public License for more details at http://www.gnu.org/copyleft/gpl.html.

A summary in plain language is available at https://tldrlegal.com/license/gnu-general-public-license-v2.
