Partitioning & Clustering in Google bigquery

Bhanu chander Shyamala

Data Analyst

Partitioning and Clustering in google bigquery

What is Partitioning?

A partitioned table is a spacial table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and you can control costs by reducing the number of bytes read by a query.

Before partitioning, the performace of the query is low and costly

It took almost 7.3 seconds and processed 18 gb of data. so lets look at the query performance after partitioning

Create a partitioned table as shown below

Now query the same data as we used in the first screenshot and check the result you will be astonished

It took just 1.1 seconds and processed 980.5 MB of data

What if we want to partition data on something like text column?

Is clustering can solve this problem, yes.

What is Clustering

When you create a clustered table in BigQuery, the table data is automatically organized based on the contents of one or more columns in the table’s schema. The columns you specify are used to colocate related data.

Let's create a cluster table as shown below and see the results

Let's run the same query as we used in the partitioning table, after it got created another table by clustering with column name tags

It took just 0.8 seconds and processed 47.7 MB of data

Search This Blog

SQL