Parquet Compression
View the compression ratio, uncompressed size, compressed size, compression codec, encodings and physical type of each column in your Parquet file.
Parquet supports multiple compression techniques as part of the specification.
How to View Parquet File Compression Information
- Upload your Parquet file using the input at the top of the page
- View the Parquet compression information in the table that appears
Should Parquet Files Be Compressed?
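Compression is part of the Parquet file specification, and Parquet files are generally compressed, column chunk by column chunk, when they are created. This means that you do not need to further compress Parquet files that you create. Compressing the whole file again, for example with gzip, usually saves little extra space and leaves a file that Parquet readers cannot open until it is decompressed.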
Parquet Compression Techniques
Snappy
Snappy is the default compression codec in many Parquet writers, including Apache Spark and PyArrow. It offers a good tradeoff between CPU usage and compression ratio on large workloads.
Snappy performs well on numeric and other low-cardinality data. It is less effective on high-entropy data such as BYTE_ARRAY (string) columns.
LZO
LZO has a low compression ratio but is very fast to compress and decompress. It is a good choice when storage is cheap and has high throughput.
Brotli
Brotli has a high compression ratio but slow compression and decompression. It is useful when storage space is limited or storage throughput is slow.
Brotli has parallel processing, which the other codecs do not provide.
GZIP
GZIP offers a high compression ratio with reasonable compression and decompression speeds. It is a common codec for Parquet files.
Parquet Compression Ratio
There are two kinds of Parquet compression ratio:
- The size of the Parquet file compared to the equivalent CSV file
- The size of the Parquet file after compression compared to its size before compression
This tool reports the compression ratio of each column: its size before compression divided by its size after compression.
Each column in the flights-1m sample file has a compression ratio between 1 and 2 relative to its size before Snappy compression was applied.
Parquet Compression vs CSV
The compression of a Parquet file compared to a CSV file depends on the column types in the data and the Parquet compression codec used.
The flights-1m file has a compression ratio of 5.6 compared to the CSV file: it is 7.3 MB as a Snappy-compressed Parquet file and 41.1 MB as a CSV file. It contains 1 million rows of data.