hxim/paq8px
PAQ8PX – Experimental Lossless Data Compressor & Entropy Estimator

About

PAQ is a family of experimental, high-end lossless data compression programs. paq8px is one of the longest-running branches of PAQ, started by Jan Ondrus in 2009 with major contributions from Márcio Pais and Zoltán Gotthardt (see Contribution Timeline).

paq8px consistently achieves state-of-the-art compression ratios on various data compression benchmarks (see Benchmark Results). This performance comes at the cost of speed and memory usage, which makes it impractical for production use or long-term storage. However, it is particularly well-suited for file entropy estimation and as a reference for compression research.

For detailed history and ongoing development discussions, see the paq8px thread on encode.su.

Quick start

paq8px is portable software – no installation required.

Get the latest binary for Windows (x64) from the Releases page or from the paq8px thread on encode.su, or build it from source for your platform – see below.

Command line interface

paq8px does not include a graphical user interface (GUI). All operations are performed from the command line.

Open a terminal and run paq8px with the desired options to compress your file (such as paq8px -8 file.txt).
Start with a small file – compression takes time.

Example output (on Windows):

c:\>paq8px.exe -8 file.txt
paq8px archiver v214 (c) 2026, Matt Mahoney et al.

Creating archive file.txt.paq8px214 in single file mode...

Filename: file.txt (111261 bytes)
Block segmentation:
 0           | text             |    111261 bytes [0 - 111260]
-----------------------
Total input size     : 111261
Total archive size   : 19597

Time 16.62 sec, used 2164 MB (2269587029 bytes) of memory
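Since paq8px compresses close to the entropy of its input, the archive size doubles as an entropy estimate. A minimal sketch of that calculation, using the sizes from the run above:

```python
def estimate_entropy(original_size: int, compressed_size: int) -> float:
    """Approximate information content in bits per byte, using the
    compressed size as a (near-)upper bound on the data's entropy."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size * 8 / original_size

# Sizes taken from the example run above (file.txt):
print(f"~{estimate_entropy(111261, 19597):.2f} bits/byte")  # ~1.41 bits/byte
```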

Note

The output archive extension is versioned (e.g., .paq8px214).

Note

You can place the binary anywhere and reference inputs/outputs by path.

Some examples

Compress a file at level 8 (balanced speed and compression ratio):

paq8px.exe -8 filename_to_compress 

Compress at the maximum level with LSTM modeling included (-12L):

paq8px.exe -12L filename_to_compress 

Warning

This mode is extremely slow and memory-intensive. Make sure you have 32 GB+ RAM.

Getting help

To view available options, run paq8px without arguments. To view available options + detailed help pages, run paq8px -help.

Click to expand: full paq8px help
paq8px archiver v214 (c) 2026, Matt Mahoney et al.
Free under GPL, http://www.gnu.org/licenses/gpl.txt

Usage:
  to compress       ->   paq8px -LEVEL[FLAGS] [OPTIONS] INPUT [OUTPUT]
  to decompress     ->   paq8px -d INPUT.paq8px214 [OUTPUT]
  to test           ->   paq8px -t INPUT.paq8px214 [OUTPUT]
  to list contents  ->   paq8px -l INPUT.paq8px214

LEVEL:
  -1 -2 -3 -4          | Compress using less memory (529, 543, 572, 630 MB)
  -5 -6 -7 -8          | Use more memory (747, 980, 1446, 2377 MB)
  -9 -10 -11 -12       | Use even more memory (4241, 7968, 15421, 29305 MB)
  -0                   | Segment and transform only, no compression
  -0L                  | Segment and transform then LSTM-only compression (alternative: -lstmonly)

FLAGS:
  L                    | Enable LSTM model (+24 MB per block type)
  A                    | Use adaptive learning rate
  S                    | Skip RGB color transform (images)
  B                    | Brute-force DEFLATE detection
  E                    | Pre-train x86/x64 model
  T                    | Pre-train text models (dictionary-based)

  Example: paq8px -8LA file.txt   <- Level 8 + LSTM + adaptive learning rate

Block detection control (compression-only):
  -forcebinary         | Force generic (binary) mode
  -forcetext           | Force text mode

LSTM-specific options (expert-only):
  -lstmlayers=N        | Set the number of LSTM layers to N (1..5, default is 2)
  -savelstm:text FILE  | Save learned LSTM model weights after compression
  -loadlstm:text FILE  | Load LSTM model weights before compression/decompression

Misc options:
  -v                   | Verbose output
  -log FILE            | Append compression results to log file
  -simd MODE           | Override SIMD detection - expert only (NONE|SSE2|AVX2|AVX512|NEON)

Notes:
  INPUT may be FILE, PATH/FILE, or @FILELIST
  OUTPUT is optional: FILE, PATH, PATH/FILE
  The archive is created in the current folder with .paq8px214 extension if OUTPUT omitted
  FLAGS are case-insensitive and only needed for compression; they may appear in any order
  INPUT must precede OUTPUT; all other OPTIONS may appear anywhere

=============
Detailed Help
=============

---------------
 1. Compression
---------------

  Compression levels control the amount of memory used during both compression and decompression.
  Higher levels generally improve compression ratio at the cost of higher memory usage and slower speed.
  Specifying the compression level is needed only for compression - no need to specify it for decompression.
  Approximately the same amount of memory will be used during compression and decompression.

  The listed memory usage for each LEVEL (-1 = 529 MB .. -12 = 29305 MB) is typical/indicative for compressing binary
  files with no preprocessing. Actual memory use is lower for text files and higher when a preprocessing step
  (segmentation and transformations) requires temporary memory. When special file types are detected, special models
  (image, jpg, audio) will be used and thus will require extra RAM.

------------------
 2. Special Levels
------------------

  -0   Only block type detection (segmentation) and block transformations are performed.
       The data is copied (verbatim or transformed); no compression happens.
       This mode is similar to a preprocessing-only tool like precomp.
       Uses approximately 3-7 MB total.

  -0L  Uses only a single LSTM model for prediction which is shared across all block types.
       Uses approximately 20-24 MB total RAM.
       Alternative: -lstmonly

---------------------
 3. Compression Flags
---------------------

  Compression flags are single-letter, case-insensitive, and appended directly to the level.
  They are valid only during compression. No need to specify them for decompression.

  L   Enable the LSTM (Long Short-Term Memory) model.
      Uses a fixed-size model, independent of compression level.

      At level -0L (also: -lstmonly) a single LSTM model is used for prediction for all detected block types.
      Block detection and segmentation are still performed, but no context mixing or Secondary Symbol
      Estimation (SSE) stage is used.

      At higher levels (-1L .. -12L) the LSTM model is included as a submodel in Context Mixing and its predictions
      are mixed with the other models.
      When special block types are detected, for each block type an individual LSTM model is created dynamically and
      used within that block type. Each such LSTM model adds approximately 24 MB to the total memory use.

  A   Enable adaptive learning rate in the CM mixer.
      May improve compression for some files.

  S   Skip RGB color transform for 24/32-bit images.
      Useful when the transform worsens compression.
      This flag has no effect when no image block types are detected.

  B   Enable brute-force DEFLATE stream detection.
      Slower but may improve detection of compressed streams.

  E   Pre-train the x86/x64 executable model.
      This option pre-trains the EXE model using the paq8px.exe binary itself.
      Archives created with a different paq8px.exe executable (even when built from the same source and build options)
      will differ. To decompress an archive created with -E, you must use the exact same executable that created it.

  T   Pre-train text-oriented models using a dictionary and expression list.
      The word list (english.dic) and expression list (english.exp) are used only to pre-train models before
      compression and they are not stored in the archive.
      You must have these same files available to decompress archives created with -T.

---------------------------
 4. Block Detection Control
---------------------------

  Block detection and segmentation always happen regardless of the memory level or other options - except when forced:

  -forcebinary

      Disable block detection; the whole file is considered as a single binary block and only the generic (binary)
      model set will be used.
      Useful when block detection produces false positives.

  -forcetext

      Disable block detection; consider the whole file as a single text block and use the text model set only.
      Useful when text data is misclassified as binary or fragments in a text file are incorrectly detected as some
      other block type.

---------------------------------------
 5. LSTM-Specific Options (expert-only)
---------------------------------------

  -lstmlayers=N

      Set the number of LSTM layers to N. Using more layers generally leads to better compression, but memory use
      will be higher (scales linearly with N) and compression time will be significantly slower. The default is N=2.

  -savelstm:text FILE

      Saves the LSTM model's learned parameters as a lossless snapshot to the specified file when compression finishes.
      Only the model used for text block(s) will be saved.
      It's not possible to save a snapshot from other block types. This is an experimental feature.

  -loadlstm:text FILE

      Loads the LSTM model's learned parameters from the specified file (which was saved earlier
      by the -savelstm:text option) before compression starts. The LSTM model will use this loaded
      snapshot to bootstrap its predictions.
      At levels -1L .. -12L only text blocks are affected.
      At level -0L all blocks are affected (because a single LSTM model is used for all block types).
      Critical: The same snapshot file MUST be used during decompression or the original content cannot be recovered.

----------------------
 6. Archive Operations
----------------------

  -d  Decompress an archive.
      In single-file mode the content is decompressed; the output is named after the archive, with the
      .paq8px214 extension removed.
      In multi-file mode first the @LISTFILE is extracted then the rest of the files. Any required folders will
      be created recursively, all files will be extracted with their original names.
      If the output file or files already exist they will be overwritten.

      Example: to decompress file.txt to the current folder:
      paq8px -d file.txt.paq8px214

  -t  Test archive contents by decompressing to memory and comparing with the original data on-the-fly.
      If a file fails the test, the first mismatched position will be printed to screen.

      Example: to test archive contents:
      paq8px -t file.txt.paq8px214

  -l  List archive contents.
      Extracts the embedded @FILELIST (if present) and prints it.
      Applicable only to multi-file archives.

      Example: to list the file list (when the archive was created using @files):
      paq8px -l files.paq8px214

----------------------------------
 7. INPUT and OUTPUT Specification
----------------------------------

  INPUT may be:

  * A single file
  * A path/file
  * A [path/]@FILELIST

  In multi-file mode (i.e. when @FILELIST is provided) only file names, file contents and file sizes are stored
  in the archive. Timestamps, permissions, attributes or any other metadata are not preserved unless stored
  separately and manually by the user in the FILELIST.

  OUTPUT is optional:

    For compression:

    * If omitted, the archive is created in the current directory.
      The name of the archive: INPUT + paq8px214 extension appended.
    * If a filename is given, it is used as the archive name.
    * If a directory is given, the archive is created inside it.
    * If the archive file already exists, it will be overwritten.

    For decompression:

    * If an output filename is not provided, the output will be named the same as the archive without
      the paq8px214 extension.
    * If a filename is given, it is used as the output name.
    * If a directory is given, the restored file will be created inside it (the directory must exist).
    * If the output file(s) already exist, they will be overwritten.

  Examples:

  To create data.txt.paq8px214 in current directory:
  paq8px -8 data.txt

  To create archive.paq8px214 in current directory:
  paq8px -8 data.txt archive.paq8px214

  To create data.txt.paq8px214 in results/ directory:
  paq8px -8 data.txt results/

---------------------------------
 8. @FILELIST Format and Behavior
---------------------------------

  When a @FILELIST is provided, the FILELIST file itself is compressed as the first file in the archive and
  automatically extracted during decompression.

  The FILELIST is a tab-separated text file with this structure:

    Column 1:  Filenames and optional relative paths (required, used by compressor)
    Column 2+: Arbitrary metadata - timestamps, ownership, etc. (optional, preserved but ignored)

    First line: Header (preserved but ignored while processing the file list)

  Only the first column is used by the compressor and decompressor.
  All other columns are preserved but ignored.
  Paths must be relative to the FILELIST location.

  Using this mechanism allows full restoration of file metadata with third-party tools after decompression.


-------------------------
 9. Miscellaneous Options
-------------------------

  -v

    Enable verbose output.

  -log FILE

    Append compression results to a tab-separated log file.
    Logging applies only to compression.

  -simd MODE

    Normally, the highest usable SIMD instruction set is detected and used automatically:

    - for the CM mixer - supported: SSE2, AVX2, AVX512, ARM NEON
    - for neural network operations in the LSTM model - supported: SSE2, AVX2
    - for the LMS and OLS predictors (used mainly in image and audio models) - supported: SSE2.

    This option overrides the detected SIMD instruction set. Intended for expert use and benchmarking.
    Supported values (case-insensitive):
       NONE
       SSE2, AVX2, AVX512 (on x64)
       NEON (on ARM)

    Note that when paq8px is compiled for a specific CPU architecture, the compiler may automatically
    vectorize some parts of the code. While selecting 'NONE' disables all manually optimized SIMD
    implementations, the remaining scalar code may still be auto-vectorized by the compiler and
    therefore may not be entirely free of vector instructions.

----------------------
 10. Argument Ordering
----------------------

  Command-line arguments may appear in any order with the following exception:
  INPUT must always precede OUTPUT.

  Example: the following two are equivalent:

    paq8px -v -simd sse2 enwik8 -log results.txt output/ -8
    paq8px -8 enwik8 -log results.txt output/ -v -simd sse2

  Further examples:

    paq8px -8 file.txt         | Compress using ~2.3 GB RAM
    paq8px -12L enwik8         | Compress 'enwik8' with maximum compression (~29 GB RAM), use the LSTM model as well
    paq8px -4 image.jpg        | Compress the 'image.jpg' file - using less memory, even faster
    paq8px -8ba b64sample.xml  | Compress 'b64sample.xml' faster and using less memory
                                 Put more effort into finding and transforming DEFLATE blocks
                                 Use adaptive learning rate.
    paq8px -8s rafale.bmp      | Compress the 'rafale.bmp' image file
                                 Skip color transform - this file compresses better without it

Compatibility & archive basics

A paq8px archive stores one or more files in a highly compressed format.

Note

Files and archives larger than 2 GB are not supported.

Note

paq8px archives are not compatible across different paq8px releases (past or future).

Note

A paq8px archive may contain multiple files, but once created, you cannot add to or remove files from the archive.

How to recognize it

The file extension reflects the exact paq8px version that created it (e.g., .paq8px214).
You can also check the header: if the first bytes read "paq8px", it is likely a paq8px archive.
Exact version information cannot be inferred from the archive content: the archive header does not encode the specific paq8px version used. Only the file extension reflects the version.
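A minimal sketch of such a header check, based on the magic-byte note above (a heuristic only, not a format parser):

```python
def looks_like_paq8px(path: str) -> bool:
    """Heuristic check: a paq8px archive starts with the bytes 'paq8px'."""
    with open(path, "rb") as f:
        return f.read(6) == b"paq8px"
```

Remember that this identifies the archive family only; the exact version must be taken from the file extension.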

Single file vs multiple file modes

In single-file mode, only file contents are stored – no paths, names, timestamps, attributes, permissions, or other metadata.

In multi-file mode, you may preserve such metadata via the @FILELIST mechanism (see the help screen for details).
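As a sketch of that mechanism, the following snippet writes a FILELIST in the tab-separated format described in the help screen. The metadata columns chosen here (size and mtime) are arbitrary examples; paq8px itself reads only the first column:

```python
import os

def write_filelist(path: str, files: list[str]) -> None:
    """Write a tab-separated FILELIST: header line first, then one row
    per file with the (relative) path in column 1 and metadata after."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("filename\tsize\tmtime\n")  # header: preserved but ignored
        for name in files:
            st = os.stat(name)
            f.write(f"{name}\t{st.st_size}\t{int(st.st_mtime)}\n")

# write_filelist("files.txt", ["docs/readme.txt", "data/sample.bin"])
# then compress with: paq8px -8 @files.txt
```

A third-party tool can read the extra columns back after decompression to restore timestamps or other metadata.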

Notes on pre-training

Warning

Archives made with pre-training-like options (-E, -T, -R) are fragile — decompression requires the same binary and/or external files.

  1. The exe pre-training (-E)
    This option pre-trains the EXE model using the paq8px.exe binary itself.
    Archives created with a different paq8px.exe binary (even when built from the same source and build options) will differ.
    To decompress an archive created with -E, you must use the exact same executable that created it.

  2. Text pre-training (-T)
    The word list (english.dic) and expression list (english.exp) are used only to pre-train models before compression and they are not stored in the archive.
    You must have these same files available to decompress archives created with -T.

  3. LSTM pre-trained weight repositories (-R)
    If you use pre-trained LSTM repositories, ensure the same RNN weight files (english.rnn, x86_64.rnn) are available during decompression.

Warning

The LSTM repositories are temporarily unavailable in the latest release due to the refactoring of the model. The latest version supporting this feature was v209.

How to compile

Building paq8px requires a C++17 capable C++ compiler:
https://en.cppreference.com/w/cpp/compiler_support#cpp17

Windows:
On Windows, you can download a prebuilt executable instead of compiling. Just grab the latest executable from the https://encode.su/threads/342-paq8px thread.
If you would like to build an executable yourself, you may use the Visual Studio solution file; for MinGW-w64, see the build-mingw-w64-generic-publish.cmd batch file in the build subfolder.

Linux/macOS:
The ./build folder already contains helper scripts.
You may use the following commands to build with cmake:

sudo apt-get install build-essential zlib1g-dev cmake make
cd build
./build-linux-with-cmake.sh

Testing in a Linux VM

  • Get a Linux VM (such as Lubuntu 25.04 Plucky Puffin)
  • Install the required compilers and tools with the following commands:
sudo apt update
sudo apt install gcc clang gcc-aarch64-linux-gnu g++-aarch64-linux-gnu build-essential cmake zlib1g-dev

Sample build scripts are provided in the build/ folder:

  • build/build-linux-with-cmake.sh
  • build/build-linux-with-gcc.sh
  • build/build-linux-with-clang.sh
  • build/build-linux-cross-compile-aarch64.sh

Tested toolchains

The following compiler/OS combinations have been tested successfully:

Version OS Compiler/IDE
v214 Windows Visual Studio 2022 Community Edition 17.14.14
v214 Windows Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35216
v214 Windows MinGW-w64 13.0.0 (gcc-15.2.0)
v211 Lubuntu 25.04 Plucky Puffin gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
v211 Lubuntu 25.04 Plucky Puffin Ubuntu clang version 20.1.2 (0ubuntu1), Target: x86_64-pc-linux-gnu
v211 Lubuntu 25.04 Plucky Puffin aarch64-linux-gnu-gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0

Other modern C++17 compilers may also work but are not routinely tested.

Note

We build and test 64-bit releases. 32-bit releases are seldom built or tested.
A known limitation of 32-bit releases is the 2 GB memory barrier. As a consequence, compression and decompression with 32-bit releases may not work ("out of memory") on level 8 and above.

Release checklist

When you make a new release:

  • Please update the version number in the "Versioning" section in the paq8px.cpp source file.
  • Please append a short description of your modifications to the CHANGELOG file.
  • Please carry out some sanity checks. Run these tests with asserts on (remove the NDEBUG preprocessor directive).
  • Please verify that paq8px can be properly built on different platforms (i.e. test all the build scripts).
  • Update README.md, especially the Benchmark results.

References

How it works

paq8px compresses files bit by bit using a technique called context mixing: multiple models make probabilistic predictions for the next bit, and a mixer combines them into a single, more accurate probability, which is then encoded with an arithmetic coder.

This approach is computationally intensive but highly adaptive, making paq8px especially effective for entropy estimation, compressibility testing and research purposes.
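As an illustration of the mixing principle, here is a minimal, self-contained sketch of a logistic mixer. It is a simplification for clarity, not paq8px's actual implementation; the model probabilities and learning rate are made-up values. Each model's probability is mapped to the logit domain ("stretch"), combined with learned weights, mapped back ("squash"), and the weights are nudged toward whichever models predicted the actual bit best:

```python
import math

def stretch(p: float) -> float:
    return math.log(p / (1 - p))   # probability -> logit

def squash(x: float) -> float:
    return 1 / (1 + math.exp(-x))  # logit -> probability

class Mixer:
    def __init__(self, n_models: int, lr: float = 0.02):
        self.w = [0.0] * n_models      # one weight per model
        self.lr = lr
        self.inputs = [0.0] * n_models

    def mix(self, probs: list[float]) -> float:
        """Combine per-model bit probabilities into one prediction."""
        self.inputs = [stretch(p) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.inputs)))

    def update(self, p_mixed: float, bit: int) -> None:
        """Gradient step: reward models that predicted the actual bit."""
        err = bit - p_mixed
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.inputs)]

# Two toy models: one confident, one uninformed.
m = Mixer(2)
for _ in range(200):
    p = m.mix([0.9, 0.5])  # model 0 says "1" with 90%, model 1 is neutral
    m.update(p, 1)         # the actual bit is always 1
print(m.mix([0.9, 0.5]) > 0.8)  # the mixer learned to trust model 0 -> True
```

Mixing in the logit domain lets confident predictions dominate the sum, which is one reason context mixing outperforms simple probability averaging.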

For an in-depth technical explanation, see the DOC file.

Benchmark results

Benchmark results are provided on various corpora for comparison with other compressors.
Rankings are based solely on compression ratio, not speed or memory usage, to show the reference compressed sizes achievable on these datasets.
Results are drawn from official listings where available, or from community testing when benchmarks are no longer maintained.
Results last verified: Sept 21, 2025.

Summary:

Corpus / Benchmark Version Rank
Calgary v213 #2
Canterbury v213 #2
Silesia v213 #1
RareWares test samples (16-bit stereo audio) v213 --
Kodak Lossless True Color Image Suite v213 #1
ImgInfo RGB test set v213 #1
Lossless Photo Compression Benchmark (LPCB) v206 #1
Large Text Compression Benchmark (LTCB) v206 #10
Darek's corpus (DBA) v210 #1
Maximumcompression benchmark v207fix1 #1
fenwik9 benchmark by Sportman v210 #1
World English Bible benchmark by Sportman v208fix1 #1

For the Calgary, Canterbury, Silesia and MaximumCompression benchmarks, see paq8px evolution up to paq8px_v207fix1, run by Darek in his post in the paq8px thread

Calgary corpus

The Calgary corpus does not have an official maintained ranking, and most published results do not include modern experimental compressors.

Below are compressed sizes under various options, compared to cmix v21 as reference.

File (v213) -8 (v213) -12L (v210) -12LT (v209) -12RT cmix v21 cmix v21 (with dict)
bib 19597 19530 17492 17376 19746 17180
book1 183307 181513 175722 163431 182429 173709
book2 113968 113152 108844 106668 113286 105918
geo 42476 42257 42265 42367 42651 42760
news 83024 82689 78490 77166 82869 76389
obj1 7060 6981 6841 6892 7154 7053
obj2 40935 40137 39820 39950 40380 40139
paper1 12365 12324 11041 10749 12449 10831
paper2 19541 19473 17478 16589 19636 17169
pic 19621 19637 19669 19677 21487 21883
progc 8872 8808 8206 8189 8900 8193
progl 9507 9450 8876 8864 9524 8788
progp 6373 6296 6061 6097 6395 6126
trans 10986 10951 10056 10045 10822 9990
Total compressed size 577'632 573'198 550'861 534'060 577'728 546'128
Compression time (approx. sec) 264 772 1231 1567 3746 n/a

With fair options (-12LT), paq8px v210 achieved results close to cmix v21 (with dictionary pre-processing).
With unfair options (-12RT), results surpass cmix, but these should be excluded (see Benchmarking Notes).

At the time of writing, paq8px v214 likely ranks #2 on Calgary behind cmix v21.

Canterbury corpus

The same general notes apply to the Canterbury corpus as to the Calgary corpus.

Below are compressed sizes under various options, compared to cmix v21.

File (v213) -8 (v213) -12L (v210) -12LT (v209) -12RT cmix v21 cmix v21 (with dict)
alice29.txt 33071 32863 31138 28317 33360 31076
asyoulik.txt 31514 31431 29601 28062 31665 29434
cp.html 5408 5393 4740 4720 5478 4746
fields.c 2028 2018 1856 1848 2087 1909
grammar.lsp 861 862 750 732 874 771
kennedy.xls 8152 7817 7850 7972 7926 7955
lcet10.txt 79113 78813 74655 72594 79550 73365
plrabn12.txt 117447 116705 112546 108648 116984 112263
ptt5 19621 19637 19669 19677 21487 21883
sum 6826 6801 6657 6679 6968 6870
xargs.1 1296 1294 1097 1061 1326 1123
Total compressed size 305'337 303'634 290'559 280'310 307'705 291'395
Compression time (approx. sec) 218 645 1015 1352 3354 n/a

At the time of writing, paq8px v214 likely ranks #2 on Canterbury behind cmix v21.

Silesia corpus

paq8px v210 ranked #1 in The Silesia Open Source Compression Benchmark at the time of writing.

Results for paq8px v213 together with cmix v21 as reference:

File -12L precomp v0.4.7 -cn + cmix v21 (with dict)
dickens 1'860'120 1'802'071
mozilla 6'128'852 6'634'210
mr 1'852'546 1'828'423
nci 776'633 781'325
ooffice 1'218'714 1'221'977
osdb 1'968'170 1'963'597
reymont 699'462 704'817
samba 1'588'070 1'588'875
sao 3'723'788 3'726'502
webster 4'401'642 4'271'915
xml 245'786 233'696
x-ray 3'521'253 3'503'686
Total compressed size 27'985'036 28'261'094
Compression time (approx. sec) 63'651 n/a

Here paq8px outperformed cmix v21 overall, though performance varies per file.

RareWares test samples (16-bit stereo audio)

The RareWares test samples have no official benchmarking for lossless audio compression. The files were converted from WavPack to WAV before compression.

Results for paq8px v212 and paq8px v213 together with OptimFrog as reference:

File (v212) -6 (v213) -6 OptimFrog*
41_30sec.wav 3'284'213 3'283'811 3'269'665
ATrain.wav 1'551'889 1'549'875 1'510'497
Bachpsichord.wav 2'373'830 2'372'713 2'150'210
Bartok_strings2.wav 1'685'993 1'683'560 1'650'617
BeautySlept.wav 1'348'613 1'348'818 1'342'402
BigYellow.wav 3'107'572 3'108'409 3'092'722
Blackwater.wav 2'005'865 2'003'290 1'961'874
bodyheat.wav 2'403'241 2'401'078 2'464'752
chanchan.wav 1'292'080 1'293'093 1'299'421
DaFunk.wav 2'259'027 2'259'112 2'276'973
death2.wav 1'077'353 1'075'118 1'129'132
Debussy.wav 1'325'814 1'304'118 1'300'765
EnolaGay.wav 2'964'656 2'967'231 2'915'459
experiencia.wav 2'418'250 2'419'769 2'407'521
female_speech.wav 1'001'434 941'697 951'494
FloorEssence.wav 2'092'472 2'093'559 2'075'225
getiton.wav 2'617'374 2'613'600 2'603'002
gone.wav 3'318'939 3'316'859 3'288'315
Hongroise.wav 1'757'526 1'740'649 1'718'751
Illinois.wav 2'777'370 2'776'349 2'740'986
ItCouldBeSweet.wav 1'838'377 1'837'187 1'833'977
kraftwerk.wav 1'800'761 1'800'019 1'875'449
Layla.wav 2'126'370 2'127'201 2'092'815
Leahy.wav 3'657'206 3'658'074 3'642'629
LifeShatters.wav 2'385'773 2'384'127 2'372'681
macabre.wav 1'781'129 1'779'770 1'738'196
Mahler.wav 2'456'386 2'452'483 2'418'657
male_speech.wav 895'904 848'470 842'498
Mama.wav 3'268'372 3'265'384 3'339'379
MidnightVoyage.wav 2'305'443 2'304'076 2'282'623
mybloodrusts.wav 2'364'972 2'367'582 2'363'087
NewYorkCity.wav 3'997'780 3'996'749 3'990'058
OrdinaryWorld.wav 3'115'705 3'116'192 3'120'641
Polonaise.wav 1'541'904 1'522'442 1'471'865
Quizas.wav 2'823'305 2'825'411 2'825'230
riteofspring.wav 1'686'084 1'684'226 1'779'253
rosemary.wav 2'734'582 2'732'578 2'723'780
Scars.wav 2'200'952 2'199'884 2'190'466
SinceAlways.wav 2'096'819 2'097'599 2'087'695
thear1.wav 2'443'956 2'442'228 2'428'164
TheSource.wav 2'325'523 2'325'891 2'317'006
TomsDiner.wav 1'545'070 1'544'343 1'556'186
trust.wav 2'885'710 2'884'743 2'920'069
Twelve.wav 3'619'506 3'619'004 3'590'123
velvet.wav 1'313'525 1'315'243 1'308'290
Waiting.wav 2'187'301 2'185'463 2'171'128
Total compressed size 104'061'926 103'869'077 103'431'728
Compression time (approx. sec) 6131 5464 n.a.

*OptimFrog: ofr --encode --preset max %1

At the time of writing, paq8px v213 is unranked.

Kodak Lossless True Color Image Suite

The Kodak Lossless True Color Image Suite has no official benchmarking for lossless image compression. The images were converted from PNG to PPM before compression.

Results for paq8px v213 and paq8px v214:

File (v213) -8 (v213) -8L (v214) -8 (v214) -8L
kodim01.ppm 315'510 312'246 311'386 308'621
kodim02.ppm 257'732 255'672 254'005 252'320
kodim03.ppm 201'093 199'991 198'223 197'404
kodim04.ppm 267'012 264'499 262'669 260'569
kodim05.ppm 339'543 335'871 332'641 329'738
kodim06.ppm 290'075 287'476 286'119 283'942
kodim07.ppm 222'406 220'656 218'511 217'107
kodim08.ppm 353'115 348'643 346'164 342'504
kodim09.ppm 245'384 243'749 241'422 240'025
kodim10.ppm 252'837 251'080 248'722 247'300
kodim11.ppm 279'161 276'482 274'932 272'722
kodim12.ppm 231'887 230'026 228'755 227'222
kodim13.ppm 398'692 392'799 391'737 386'548
kodim14.ppm 313'938 311'208 308'775 306'562
kodim15.ppm 253'281 251'250 249'470 247'873
kodim16.ppm 238'022 236'228 234'886 233'322
kodim17.ppm 252'459 250'915 248'414 247'115
kodim18.ppm 362'678 357'394 354'562 349'938
kodim19.ppm 293'100 290'411 287'755 285'541
kodim20.ppm 239'030 237'359 235'864 234'483
kodim21.ppm 297'485 294'970 292'545 290'341
kodim22.ppm 329'634 325'054 322'563 318'509
kodim23.ppm 249'698 247'729 245'520 243'895
kodim24.ppm 298'519 294'247 293'309 289'773
Total compressed size 6'782'291 6'715'955 6'668'949 6'613'374
Compression time (approx. sec) 1'750 6'012 2'007 6'330

At the time of writing, paq8px v214 likely ranks #1 on the Kodak test set among lossless compressors with no pre-trained models.

Other compressors for reference: GitHub - WangXuan95/Image-Compression-Benchmark: A comparison of many lossless image compression formats.

ImgInfo RGB test set

The ImgInfo RGB test set has no official benchmarking for lossless image compression.

Results for paq8px v213 and paq8px v214:

File (v213) -8L (v214) -8L
artificial.ppm 396'742 394'150
big_building.ppm 43'524'971 42'805'256
big_tree.ppm 37'369'579 36'668'068
bridge.ppm 16'824'955 16'805'572
cathedral.ppm 6'576'508 6'468'273
deer.ppm 18'168'804 18'110'156
fireworks.ppm 3'169'176 3'129'441
flower_foveon.ppm 1'621'443 1'613'759
hdr.ppm 4'621'868 4'602'334
leaves_iso_1600.ppm 8'194'684 8'034'152
leaves_iso_200.ppm 6'327'733 6'212'126
nightshot_iso_100.ppm 4'558'484 4'477'648
nightshot_iso_1600.ppm 9'268'001 9'139'073
spider_web.ppm 5'492'330 5'413'843
Total compressed size 166'115'278 163'873'851
Compression time (approx. sec) n.a. 111'420

At the time of writing, paq8px v214 likely ranks #1 on the ImgInfo RGB test set among lossless compressors with no pre-trained models.

Other compressors for reference: GitHub - WangXuan95/Image-Compression-Benchmark: A comparison of many lossless image compression formats.

Lossless Photo Compression Benchmark (LPCB)

paq8px v206 ranked #1 at Lossless Photo Compression Benchmark.

The benchmark has not been rerun for later versions.

Large Text Compression Benchmark (LTCB)

paq8px v206 ranked #10 at Large Text Compression Benchmark at the time of writing.
Note that, unlike paq8px, most higher-ranked compressors are tuned specifically for enwik8/enwik9 and often apply enwik-specific preprocessing (e.g., word replacement, article reordering).

The benchmark has not been rerun for later versions.

Darek's corpus (DBA)

Darek's benchmark is not an exhaustive benchmark – it targets only high-end compressors.

See the last results in Darek's post to the encode.su forum from 2026 including results for v210.

paq8px v210 ranked #1 at that time.

MaximumCompression benchmark

The MaximumCompression benchmark is no longer actively maintained and has no up-to-date official listing.
The official site was last updated in 2011. At that time paq8px ranked #1.

See paq8px evolution on the MaximumCompression benchmark up until paq8px v207fix1 in Darek's post to the encode.su forum from 2022.

Compressed sizes for v210 and v213 with compression option -12L (-12Ls for rafale.bmp).

File -12L (v210) -12L (v213) size diff
A10.jpg 624023 624043 20
acrord32.exe 786553 786547 -6
english_mc.dic 333089 333052 -37
FlashMX.pdf 1289571 1267622 -21949
fp.log 199933 199754 -179
mso97.dll 1121228 1121280 52
ohs.doc 452209 452642 433
rafale.bmp 463390 455272 -8118
vcfiu.hlp 245448 244682 -766
world95.txt 309236 309216 -20
Total compressed size 5'824'680 5'794'110 -30'570
Compression time (sec) 19'384 19'048 -336
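The "size diff" column above is simply the v213 size minus the v210 size (negative means v213 did better). A quick sketch, with the byte counts copied from the table, reproduces the per-file diffs and the totals:

```python
# Sanity-check of the MaximumCompression table above: recompute the per-file
# size differences and the totals from the listed compressed sizes (bytes).
sizes = {
    # file: (v210 size, v213 size), both with option -12L (-12Ls for rafale.bmp)
    "A10.jpg":        (624023, 624043),
    "acrord32.exe":   (786553, 786547),
    "english_mc.dic": (333089, 333052),
    "FlashMX.pdf":    (1289571, 1267622),
    "fp.log":         (199933, 199754),
    "mso97.dll":      (1121228, 1121280),
    "ohs.doc":        (452209, 452642),
    "rafale.bmp":     (463390, 455272),
    "vcfiu.hlp":      (245448, 244682),
    "world95.txt":    (309236, 309216),
}

# Negative diff = v213 compressed the file to fewer bytes than v210.
diffs = {name: new - old for name, (old, new) in sizes.items()}
total_v210 = sum(old for old, _ in sizes.values())
total_v213 = sum(new for _, new in sizes.values())

print(total_v210, total_v213, total_v213 - total_v210)
# → 5824680 5794110 -30570
```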

To the best of our knowledge, the latest version, v214, would still rank #1 on this benchmark at the time of writing.

fenwik9 benchmark

paq8px v210 ranked #1 in the fenwik9 benchmark.
This is a non-standard but exhaustive single-file benchmark maintained by Sportman.

World English Bible benchmark (WEB)

paq8px v208fix1 ranked #1 in the World English Bible benchmark.
This is a non-standard but exhaustive single-file benchmark maintained by Sportman.

Benchmarking Notes

Warning

  1. Using -R to load pre-trained LSTM weight repositories is unfair if the target file to be compressed was part of the training data.
  2. Some compressors apply text preprocessing with an external dictionary (e.g., cmix v21). paq8px uses no such technique, though text files may be preprocessed with an external tool to boost its compression.
  3. Benchmarks and leaderboards change over time – rankings may shift.
  4. Hardware does not affect compression ratio and memory use, but it does affect runtime; reported times are approximate and for context only.

PAQ8PX contribution timeline

paq8px is a branch of the PAQ compressor series, descended from earlier versions such as PAQ7 and the PAQ8 variants (e.g., PAQ8A-PAQ8P).

Development began in 2009 and remains active, supported by a global community of contributors.

Work has focused on expanding model coverage (images, audio, executables, text) with emphasis on compression ratio.

The table below highlights milestones, contributors, and notable changes over the years.

Year Versions Contributors & Highlights
Pre-2009 PAQ roots Matt Mahoney: Original PAQ author. Early branches (paq8hp*, paq8fthis*, paq8p3, lpaq1) introduced context maps with 16-bit checksums, probabilistic state tables, specialized models (JPEG, sparse, DMC, distance-based), exe model/filter. Added directory compression and drag-and-drop (PAQ8A), BMP/PGM/JPEG/WAV support, APM/StateMap optimizations.
2009 v0–v67 Jan Ondrus: Founded paq8px, adding TGA/TIFF/AIFF/MOD/S3M models, PPM/PBM compression, CD sector transform, exe filters, recursive sub-blocks, WAV-model improvements.
Simon Berger: TGA 24/8-bit, TIFF/AIFF improvements, MSVC fixes, compression pipeline rewrite.
LovePimple: Portability fixes.
2010 v68–v69 Jan Ondrus: Added -l listing option, fix for multi-path file compression.
2016 v70–v75 Jan Ondrus: Add zlib recompression (initially unstable), PDF image support, Base64 transform, GIF recompression, and paq8pxd model updates (incl. im8bitModel), plus multiple bugfixes (zlib header/progress display, Base64, GIF).
2017 v76–v127 Márcio Pais: JPEG upgrades (subsampling, thumbnails, MJPEG), record/BMP models, grayscale detection, XML model, x86/x64 pre-training, PNG recompression, DEFLATE MTF + brute force, dBASE parsing, adaptive learning rate, English stemmer.
Jan Ondrus: JPEG tweaks, PAM format detection, block handling, PDF 4-bit fix.
Zoltán Gotthardt: Fixes, MSVC/Array/ilog2 fixes, faster JPEG learning rate, IO improvements.
Mauro Vezzosi: Bug reports, dmcModel patch.
2018 v128–v173 Márcio Pais: Extended text modeling (English/French/German stemmers, language detection, SparseMatchModel, SSE refinements, RLE/EOL transforms), 8bpp/24–32bpp image model improvements, JPEG tweaks, pre-training refinements.
Zoltán Gotthardt: New CLI and file handling, DMC enhancements, hashing improvements, charGroupModel, compiler/portability fixes.
Andrew Epstein: AVX2 optimizations, macOS build fixes.
2019 v174–v183 Márcio Pais: Added linearPredictionModel, audio8bModel, audio16bModel, new image/GIF/TIFF handling, text model with word embeddings.
Zoltán Gotthardt: refactoring (global scope cleanup, model factory, Shared struct), improved WordModel (PDF text extraction, pre-training), enhancements to StateMap, ContextMap2, MatchModel, and NormalModel.
2020 v184–v200 Andrew Epstein: Code cleanup, modularization, Doxygen docs.
Moisés Cardona: ARM/NEON support, base64 fix, SIMD work.
Zoltán Gotthardt: Refactoring (predictor separation, RNG, ContextMap), Sparse/SparseBit/Indirect model improvements, fixes, cleanup.
Márcio Pais: LSTM model (pre-training, retraining, x86/64 optimizations), DEC Alpha transform/model, new SSE stages.
Surya Kandau: JPEG model refinements.
2021 v201–v206 Zoltán Gotthardt: Improved IndirectContext/MatchModel, added high-precision arithmetic encoder & APMPost, introduced ChartModel, MRB detection, metadata modeling, separate mixers per block type, refined text detection, and -skipdetection option.
2022 v207 Zoltán Gotthardt: PNG filtering moved to transform layer; DEC-Alpha detection via object signature; TAR detection/transform; base85 filter (from paq8pxd); structured-text WordModel (linemodel) enhancements; separate LSTM per main context.
2023 v208 Zoltán Gotthardt: TAR detection fixes; new -forcetext option; enhanced 1-bit image model; shifted contexts (fewer in IndirectModel, added to WordModel for TEXT); refactors.
Pavel Rosický: AVX512 detection.
2025 v209 Zoltán Gotthardt: Model tweaks (initialized mixer weights; corrected matchmodel context); TEXT detection fixes; build/toolchain updates.
2026 v210-v214 Zoltán Gotthardt: LSTM model enhancements, speed improvements, tuned Audio16BitModel, enhanced 24/32-bit image model.

This timeline is not exhaustive; for details, see the CHANGELOG.

Notable borrows

paq8px incorporates ideas and code from a range of sources, often adapted and extended to fit the project’s design:

  • UTF-8 detection – based on Bjoern Hoehrmann's UTF decoder DFA; integrated by Zoltán Gotthardt
  • Base64 transform – from paq8pxd by Kaido Orav; integrated by Jan Ondrus
  • Base85 transform – from paq8pxd by Kaido Orav; integrated by Zoltán Gotthardt
  • MRB detection – from paq8pxd by Kaido Orav; integrated with enhancements by Zoltán Gotthardt
  • zlib recompression – from AntiZ; integrated by Jan Ondrus
  • Text modeling with stemming – based on the Porter/Porter2 stemmers; integrated by Márcio Pais
  • Audio modeling ideas – based on 'An Asymptotically Optimal Predictor for Stereo Lossless Audio Compression' by Florin Ghido; integrated with enhancements by Márcio Pais
  • Image modeling ideas – from Emma by Márcio Pais
  • EXE model – incorporates ideas from DisFilter by Fabian Giesen; integrated with enhancements by Márcio Pais
  • ChartModel – from paq8kx7; integrated with enhancements by Zoltán Gotthardt
  • MatchModel – ideas from Emma; integrated by Márcio Pais
  • MatchModel – improvements from paq8gen; integrated by Zoltán Gotthardt
  • LSTM model – adapted from cmix by Byron Knoll; integrated with enhancements by Márcio Pais, further enhancements based on ligru-compress by Zoltán Gotthardt
  • OLS predictor – by Sebastian Lehmann; integrated by Márcio Pais
  • LMS predictor – by Sebastian Lehmann; integrated by Márcio Pais

Similar compressors

Copyright

Copyright (C) 2009-2026 Matt Mahoney, Serge Osnach, Alexander Ratushnyak, Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson, Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, Kaido Orav, Simon Berger, Neill Corlett, Márcio Pais, Andrew Epstein, Mauro Vezzosi, Zoltán Gotthardt, Moisés Cardona and others.

We would like to express our gratitude for the endless support of many contributors who encouraged paq8px development with ideas, testing, compiling, debugging: LovePimple, Skymmer, Darek, Stephan Busch, m^2, Christian Schneider, pat357, Rugxulo, Gonzalo, a902cd23, pinguin2, Luca Biondi, and the broader community at encode.su.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU General Public License for more details at http://www.gnu.org/copyleft/gpl.html.

A summary in plain language is available at https://tldrlegal.com/license/gnu-general-public-license-v2.
