Parquet Compression

View the compression ratio, uncompressed size, compressed size, compression codec, encodings, and type of each column in your Parquet file.

The Parquet specification supports several compression codecs.

How to View Parquet File Compression Information

  1. Upload your Parquet file using the input at the top of the page.
  2. View the Parquet compression information in the table that appears.

Should Parquet Files Be Compressed

Compression is built into the Parquet format: writers compress the data pages of each column as the file is created. You therefore do not normally need to apply a separate compression step to Parquet files that you create.

Parquet Compression Techniques

Snappy

Snappy is the default compression codec in many Parquet writers, including Spark and pyarrow. It offers a good tradeoff between CPU usage and compression ratio on large workloads.

Snappy tends to perform well on numeric and other low-cardinality data. It tends to perform less well on high-entropy data, such as the ByteArray type used for strings.

LZO

LZO gives low compression ratios but is very fast at both compression and decompression. It is a good choice if storage is cheap and has high throughput.

Brotli

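Brotli gives a high compression ratio, but compression and decompression are slow. The Brotli codec is useful when storage space is limited or storage throughput is low.

Some Brotli implementations offer multithreaded compression. Within Parquet, however, parallelism usually comes from processing row groups and column chunks independently, which works with any codec.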

GZIP

GZIP gives a high compression ratio with reasonable compression and decompression speeds. It is a common codec for Parquet compression.

Parquet Compression Ratio

The term "Parquet compression ratio" can refer to two different measurements:

  1. The size of the equivalent CSV file compared to the size of the Parquet file
  2. The size of the Parquet data before compression compared to its size after compression

This tool reports the second kind for each column: the uncompressed size divided by the compressed size.

In the flights-1m sample file, each column has a compression ratio between 1 and 2 relative to its size before Snappy compression was applied.

Parquet Compression vs CSV

How much smaller a Parquet file is than the equivalent CSV file depends on the types of the columns in the data and on the Parquet compression codec used.

The flights-1m file has a compression ratio of 5.6 compared to the CSV file. It is 7.3MB as a Snappy-compressed Parquet file and 41.1MB as a CSV file. It contains 1 million rows of data.
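
The arithmetic behind that figure is a simple division of the larger size by the smaller one. A small sketch using the sizes quoted above; the function name `compression_ratio` is hypothetical.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of original size to compressed size; values above 1 mean savings."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


csv_mb = 41.1     # flights-1m as a CSV file
parquet_mb = 7.3  # flights-1m as a Snappy-compressed Parquet file
ratio = compression_ratio(csv_mb, parquet_mb)
print(f"{ratio:.1f}")  # prints 5.6
```

The same function applies to the per-column ratio, with the uncompressed column size as the first argument and the compressed column size as the second.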