Partitioning & Clustering in Google bigquery

Bhanu chander Shyamala

Data Analyst

Partitioning and Clustering in google bigquery

What is Partitioning?

A partitioned table is a spacial table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and you can control costs by reducing the number of bytes read by a query.

Before partitioning, the performace of the query is low and costly

It took almost 7.3 seconds and processed 18 gb of data. so lets look at the query performance after partitioning

Create a partitioned table as shown below

Now query the same data as we used in the first screenshot and check the result you will be astonished

It took just 1.1 seconds and processed 980.5 MB of data

What if we want to partition data on something like text column?

Is clustering can solve this problem, yes.

What is Clustering

When you create a clustered table in BigQuery, the table data is automatically organized based on the contents of one or more columns in the tableโ€™s schema. The columns you specify are used to colocate related data.

Let's create a cluster table as shown below and see the results

Let's run the same query as we used in the partitioning table, after it got created another table by clustering with column name tags

It took just 0.8 seconds and processed 47.7 MB of data

Comments

  1. Easily explained and very helpful. Thank you Bhanu

    ReplyDelete
  2. Nice explaination.all the doubts are cleared.Thank you bhanu๐Ÿ‘Œ๐Ÿ‘Œ๐Ÿ™‚

    ReplyDelete
  3. awesome explanation ๐Ÿ‘..thank you bhanu

    ReplyDelete
  4. Superb Bhanu thanks for explaining...

    ReplyDelete

Post a Comment

Popular posts from this blog

Overview of data storage and sql commands