Parquet Data Types


How to View the Data Types in a Parquet File

  1. Upload your parquet file using the input at the top of the page
  2. View the parquet data types in the table that appears

What are the Parquet data types

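Parquet files describe each column with a primitive type and, optionally, a logical type. The primitive types are a small set of compact storage formats that define how values are laid out on disk. A logical type is a primitive type plus an annotation that tells readers how to interpret the stored value.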

Primitive types

The Parquet file format supports 8 primitive data types:

  • Boolean - a 1 bit boolean
  • Int32 - a 32 bit signed integer
  • Int64 - a 64 bit signed integer
  • Int96 - a 96 bit signed integer. Deprecated, and kept mainly for legacy timestamps.
  • Float - a 32 bit floating point number
  • Double - a 64 bit floating point number
  • Byte Array - a byte array of any length. Generally used for strings.
  • Fixed Len Byte Array - a byte array whose length is fixed by the schema
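
To make the sizes above concrete, here is a minimal sketch using only Python's standard library. It assumes Parquet's PLAIN encoding, where fixed-width values are written little-endian, a Byte Array is prefixed with a 4-byte length, and booleans are packed 8 to a byte. The function names are hypothetical, and real files usually apply further encodings and compression on top of this.

```python
import struct


def plain_int32(value: int) -> bytes:
    # Int32: 4 bytes, little-endian, signed
    return struct.pack("<i", value)


def plain_int64(value: int) -> bytes:
    # Int64: 8 bytes, little-endian, signed
    return struct.pack("<q", value)


def plain_float(value: float) -> bytes:
    # Float: 4 bytes, IEEE 754 single precision
    return struct.pack("<f", value)


def plain_double(value: float) -> bytes:
    # Double: 8 bytes, IEEE 754 double precision
    return struct.pack("<d", value)


def plain_byte_array(value: bytes) -> bytes:
    # Byte Array: 4-byte little-endian length followed by the raw bytes
    return struct.pack("<I", len(value)) + value


def plain_booleans(values: list) -> bytes:
    # Boolean: 1 bit per value, packed least significant bit first
    out = bytearray((len(values) + 7) // 8)
    for i, flag in enumerate(values):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


for name, encoded in [
    ("Int32", plain_int32(1)),
    ("Int64", plain_int64(1)),
    ("Float", plain_float(1.5)),
    ("Double", plain_double(1.5)),
    ("Byte Array", plain_byte_array("hi".encode("utf-8"))),
    ("Boolean x3", plain_booleans([True, False, True])),
]:
    print(f"{name}: {len(encoded)} byte(s)")
```

The three booleans fit in a single byte, which is why the Boolean type is described as 1 bit.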

Logical types

Logical types are built from primitive types. They are the combination of a primitive type and an annotation that tells the client application how to interpret the primitive type.

Some common logical types are:

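  • String - a Byte Array with a STRING (UTF8) annotation. The bytes are UTF-8 encoded text.
  • Enum - a Byte Array with an ENUM annotation
  • UUID - a 16-byte Fixed Len Byte Array with a UUID annotation
  • Int - Integer primitives can have an Int(bitWidth, isSigned) annotation. bitWidth specifies the number of bits (8, 16, 32 or 64) and isSigned specifies whether the number is signed.
  • Decimal - An Int32, Int64, Fixed Len Byte Array or Byte Array that holds an unscaled integer, with annotations for the precision and scale.
  • Date - An Int32 that stores the number of days since the Unix epoch (1 January 1970).
  • Time - An Int32 (for MILLIS) or Int64 (for MICROS and NANOS) that has annotation values for isAdjustedToUTC and the unit.
  • Timestamp - An Int64 that has annotation values for isAdjustedToUTC and the unit (MILLIS, MICROS, NANOS)
  • Interval - A 12-byte Fixed Len Byte Array that represents an interval of time as months, days and milliseconds.
  • JSON - a Byte Array with a JSON annotation. The bytes are a UTF-8 encoded JSON document.
  • BSON - a Byte Array with a BSON annotation. The bytes are a BSON document.
  • List - a nested group with a LIST annotation that contains a repeated element
  • Map - a nested group with a MAP annotation that contains repeated key and value pairs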
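
As an illustration of how an annotation changes the meaning of a primitive value, the sketch below decodes a few raw values the way a reader might. It uses only Python's standard library, and the function names are hypothetical. It assumes the Parquet conventions that a Date counts days since the Unix epoch, a UUID is stored as 16 bytes, and a Decimal stores an unscaled integer alongside a scale.

```python
import datetime as dt
import uuid
from decimal import Decimal

EPOCH_DATE = dt.date(1970, 1, 1)
EPOCH_UTC = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def as_string(raw: bytes) -> str:
    # String: the bytes are UTF-8 encoded text
    return raw.decode("utf-8")


def as_uuid(raw: bytes) -> uuid.UUID:
    # UUID: exactly 16 bytes, most significant byte first
    return uuid.UUID(bytes=raw)


def as_date(days: int) -> dt.date:
    # Date: an Int32 counting days since 1970-01-01
    return EPOCH_DATE + dt.timedelta(days=days)


def as_timestamp_micros_utc(micros: int) -> dt.datetime:
    # Timestamp with unit=MICROS and isAdjustedToUTC=true
    return EPOCH_UTC + dt.timedelta(microseconds=micros)


def as_decimal(unscaled: int, scale: int) -> Decimal:
    # Decimal: value = unscaled * 10 ** (-scale), built exactly from digits
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


print(as_string(b"caf\xc3\xa9"))
print(as_uuid(bytes(range(16))))
print(as_date(19723))
print(as_timestamp_micros_utc(1_000_000))
print(as_decimal(12345, 2))
```

The same Int32 value 19723 is just a number without an annotation, and a calendar date with the Date annotation.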

Parquet files have a schema at the end of the file that identifies what kind of data is in the file.

Columns are stored separately in Parquet files. This means that data systems can choose to load only the columns that are required for a query.
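
The idea of loading only the required columns can be sketched with a toy layout. This is not the real Parquet format: the functions and the in-memory index below are hypothetical and only show why storing each column contiguously lets a reader seek to the columns it needs and skip the rest.

```python
import io
import struct


def write_toy_columnar(columns: dict):
    """Write each column's int64 values contiguously; return (blob, index)."""
    buf = io.BytesIO()
    index = {}
    for name, values in columns.items():
        offset = buf.tell()
        buf.write(struct.pack(f"<{len(values)}q", *values))
        index[name] = (offset, len(values))  # where the column starts, how many values
    return buf.getvalue(), index


def read_columns(blob: bytes, index: dict, wanted: list) -> dict:
    """Read only the requested columns by seeking to their offsets."""
    stream = io.BytesIO(blob)
    result = {}
    for name in wanted:
        offset, count = index[name]
        stream.seek(offset)
        raw = stream.read(8 * count)  # 8 bytes per int64 value
        result[name] = list(struct.unpack(f"<{count}q", raw))
    return result


blob, index = write_toy_columnar(
    {"id": [1, 2, 3], "price": [10, 20, 30], "qty": [5, 6, 7]}
)
selected = read_columns(blob, index, ["price"])
print(selected)
```

Only the 24 bytes that hold the price column are read. The bytes for id and qty are never touched.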

Does Parquet store data types

Yes, the Parquet file format contains a schema that specifies what data types are in the file.

Do Parquet files need a schema

Yes, Parquet files require a schema as part of the specification.

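Parquet files store the schema in the footer at the end of the file. This makes it easier to create Parquet files from a streaming data source, because the writer can write the data in a single pass and append the metadata once all of the data has been written. Other file formats, like CSV, have at most a header row of column names at the start of the file and do not record data types at all.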
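
The sketch below shows how a reader might locate that footer using only Python's standard library. It assumes the standard layout, where a file starts and ends with the 4-byte magic PAR1 and the 4 bytes before the final magic hold the footer length as a little-endian integer. The function name and the sample bytes are hypothetical. The footer itself is Thrift-encoded metadata, and decoding it is out of scope here.

```python
import struct

MAGIC = b"PAR1"


def locate_footer(data: bytes):
    """Return (start, length) of the footer metadata inside a Parquet file."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic bytes)")
    # The footer length sits just before the trailing magic bytes.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length is larger than the file")
    return start, footer_len


# Hypothetical stand-in for a real file: magic, data, footer, length, magic.
fake_file = MAGIC + b"column-data" + b"META!" + struct.pack("<I", 5) + MAGIC
start, length = locate_footer(fake_file)
print(start, length, fake_file[start:start + length])
```

With a real file, the same function could be applied to the bytes returned by open(path, "rb").read(). A reader only needs the last few bytes of the file to find the schema.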