Convert CSV to Parquet
Convert CSV to Parquet in seconds with this free online CSV to Parquet converter
CSV (Comma Separated Values) files are one of the most common formats for storing tabular data. Values in a row are separated by commas and rows are separated by newlines.
CSV files often start with a header row that has column names, but this is not required.
Each row in a CSV file must have the same number of values as the header row.
CSV files do not enforce types or a schema. This means that a single column can contain values of different types, which can make analysis difficult and compression inefficient.
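As a rough illustration of the missing schema, the sketch below uses only the Python standard library to report which value types appear in each column of a small CSV. The sample data and the helper names guess_type and column_types are made up for this example.

```python
import csv
import io

# Hypothetical sample data: the "age" column mixes integers, decimals and text.
SAMPLE_CSV = "name,age\nAda,36\nGrace,unknown\nLinus,54.5\n"

def guess_type(value):
    """Return the name of the narrowest type (int, float or str) that parses the string."""
    for caster in (int, float):
        try:
            caster(value)
            return caster.__name__
        except ValueError:
            continue
    return "str"

def column_types(csv_text):
    """Map each column name to the set of value types seen in that column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    types = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            types[name].add(guess_type(value))
    return types

# The "age" column reports three different types, which a CSV file allows.
print(column_types(SAMPLE_CSV))
```

A CSV reader hands back every value as text, so any typing like this has to be guessed after the fact.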
Parquet files can be easier to analyze and compress better than CSV files.
Apache Parquet (.parquet) is a columnar format designed for storing tabular data on disk. Its design is based on the format described in Google's Dremel paper (Dremel later became the engine behind BigQuery).
Parquet files store data in a binary format, which means that they can be efficiently read by computers but are difficult for people to read.
Parquet files have a schema, which means that every value in a column must have the same type. The schema makes Parquet files easier to analyze than CSV files and also helps them compress better, so they are smaller on disk.
Supercharge your data exploration
Open CSV, Parquet, Arrow, JSON and TSV files straight from your desktop
Share and embed
Share your graphs and data sets. Or embed them directly into web pages.
Work straight from Google Drive
Open CSV, Parquet, Arrow, JSON and TSV files directly from Drive, Gmail and Classroom by installing the Google Workspace App
How to Convert CSV to Parquet
- Select your input CSV file
- Your CSV file will be converted to Parquet
- Download your Parquet file
- Click the View button to view your file
How to Convert CSV to Parquet in Python
We can convert CSV to Parquet in Python using Pandas or DuckDB
How to Convert CSV to Parquet using Pandas
First, we need to install pandas, along with a Parquet engine such as pyarrow, which pandas uses to write Parquet files
pip install pandas pyarrow
Then we can load the CSV file into a dataframe
import pandas as pd
df = pd.read_csv('path/to/file.csv')
Finally, we can export the dataframe to the Parquet format
How to Convert CSV to Parquet using DuckDB
First, we need to install duckdb for Python
pip install duckdb
The following DuckDB query will read a CSV file and output a Parquet file
import duckdb
duckdb.sql("""COPY (SELECT * FROM 'path/to/file.csv') TO 'path/to/file.parquet' (FORMAT 'parquet')""")
When to Convert CSV to Parquet
A Parquet file can be good for storing data on disk. A CSV file can be useful when you need to upload data to an application.
Parquet files are useful for storing data because they compress well on disk, so they have a small file size. One downside of Parquet files is that many applications cannot read the Apache Parquet format.
Apache Parquet files also convert nicely to the Apache Arrow data format, which is useful for columnar analytics. Big data sets can be stored in multiple Parquet files.
CSV files can be opened by many different applications, but they do not compress well on disk. This means that data stored in the CSV file format usually takes up more disk space than the same data stored in Parquet files.
Sometimes it can be useful to store data in Parquet files and then convert Parquet to CSV when the data is needed in an application.
Use these Parquet Tools to work with Parquet files on Windows, Mac, Linux, ChromeOS and Android.
View and filter Parquet files
Query Parquet With SQL
Write SQL to query your Parquet File
Find correlations in your Parquet File
File format converter
Parquet Compression Viewer
View the compression of a parquet file
Parquet Data Types Viewer
View the data types of a parquet file
Parquet Encoding Viewer
View the encoding of a parquet file
Parquet Metadata Viewer
View the metadata of a parquet file
Parquet Row Groups Viewer
View the row groups of a parquet file
Parquet Schema Viewer
View the schema of a parquet file
Sample Parquet File
Download a sample parquet file for testing