Posts

Arrays in BigQuery

Bhanu chander Shyamala, Data Analyst

What is an array?

An array is an ordered list of values of the same data type. You can construct arrays of simple data types, such as INT64, and of complex data types, such as STRUCTs. Arrays of arrays are not supported in BigQuery. You can combine arrays using functions like ARRAY_CONCAT(), and convert arrays to strings using ARRAY_TO_STRING().

Creating a basic array in BigQuery

Finding the length of an array with ARRAY_LENGTH()

ARRAY_AGG expression in BigQuery

Because the query uses the ARRAY_AGG function, it returns 2 records as output; otherwise it would have returned 4 records.

ARRAY UNION ALL

ARRAY_CONCAT_AGG

BigQuery also supports an aggregate function, ARRAY_CONCAT_AGG(), which concatenates the elements of an array column across rows.

Converting arrays to strings

The ARRAY_TO_STRING() function converts an ARRAY<STRING> to a single STRING value.
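The array functions named above can be mirrored with plain Python lists. This is only a rough sketch of the semantics, not BigQuery itself: the SQL is kept in a string for reference and is not executed, and the table `t`, its columns `category` and `item`, and the sample rows are all made up for illustration.

```python
from collections import defaultdict

# BigQuery SQL shown for reference only; nothing here is sent to BigQuery.
# Table `t` and columns `category` / `item` are hypothetical.
REFERENCE_SQL = """
SELECT [1, 2, 3] AS numbers;
SELECT ARRAY_LENGTH([1, 2, 3]) AS n;
SELECT ARRAY_CONCAT([1, 2], [3, 4]) AS merged;
SELECT ARRAY_TO_STRING(['a', 'b', 'c'], ',') AS joined;
SELECT category, ARRAY_AGG(item) AS items FROM t GROUP BY category;
"""

# Four input rows that fall into two groups (made-up sample data).
ROWS = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("veg", "carrot"),
    ("veg", "leek"),
]


def array_agg(rows):
    """Mimic GROUP BY key + ARRAY_AGG(value): one list per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def array_concat_agg(arrays):
    """Mimic ARRAY_CONCAT_AGG: concatenate arrays coming from several rows."""
    result = []
    for array in arrays:
        result.extend(array)
    return result


numbers = [1, 2, 3]                      # array literal
length = len(numbers)                    # like ARRAY_LENGTH
merged = [1, 2] + [3, 4]                 # like ARRAY_CONCAT
joined = ",".join(["a", "b", "c"])       # like ARRAY_TO_STRING
grouped = array_agg(ROWS)                # 4 rows collapse into 2 groups
flattened = array_concat_agg(grouped.values())

print(length, merged, joined)
print(grouped)
print(flattened)
```

The grouping step is the reason an ARRAY_AGG query returns fewer records than the input: four rows become two, one per group. In BigQuery the order of elements inside an aggregated array is not guaranteed unless the aggregate has an ORDER BY clause, whereas this Python sketch simply keeps input order.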

Partitioning & Clustering in Google BigQuery

Bhanu chander Shyamala, Data Analyst

What is partitioning?

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query.

Before partitioning, the query is slow and costly: it took almost 7.3 seconds and processed 18 GB of data. Let's look at the query performance after partitioning.

Create a partitioned table, then run the same query against it and check the result. It took just 1.1 seconds and processed 980.5 MB of data.

What if we want to partition data on something like a text column? Can clustering solve this problem? Yes.

What is clustering?

When you create a clustered table in BigQuery, the table d...
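Why partitioning cuts the bytes processed can be illustrated with a toy model in Python. This is a simplified sketch, not how BigQuery is implemented: the rows, sizes and dates are invented, and the DDL string (with hypothetical dataset, table and column names) is included only as a reference and is never executed.

```python
from collections import defaultdict
from datetime import date

# Reference DDL only (not executed). All names are hypothetical, and
# `event_date` is assumed to be a DATE column.
REFERENCE_DDL = """
CREATE TABLE my_dataset.events_partitioned
PARTITION BY event_date
AS SELECT * FROM my_dataset.events;
"""

# Toy table: (event_date, row_size_in_bytes). The numbers are made up.
ROWS = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 1), 150),
    (date(2024, 1, 2), 200),
    (date(2024, 1, 2), 25),
    (date(2024, 1, 3), 50),
]


def scan_unpartitioned(rows, wanted_day):
    """Read every row, then filter: all bytes count as processed."""
    bytes_read = sum(size for _, size in rows)
    matches = [size for day, size in rows if day == wanted_day]
    return bytes_read, matches


def build_partitions(rows):
    """Split the table into one segment per day."""
    partitions = defaultdict(list)
    for day, size in rows:
        partitions[day].append(size)
    return partitions


def scan_partitioned(partitions, wanted_day):
    """Read only the partition that matches the filter."""
    matches = list(partitions.get(wanted_day, []))
    return sum(matches), matches


wanted = date(2024, 1, 2)
full_bytes, full_matches = scan_unpartitioned(ROWS, wanted)
pruned_bytes, pruned_matches = scan_partitioned(build_partitions(ROWS), wanted)

print("unpartitioned bytes read:", full_bytes)
print("partitioned bytes read:", pruned_bytes)
```

Both scans return the same matching rows; only the amount of data read differs, which is the effect behind the drop in processed bytes reported in the post.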

Loading a file to Cloud Storage (GCP)

Bhanu chander Shyamala, Data Analyst

Cloud Storage

Google Cloud Storage, also known as Google Storage, is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities.

Steps to load a CSV file to Cloud Storage

1. In the Google Cloud console, click Storage.
2. Click Create bucket.
3. Complete the bucket setup steps.
4. Upload files to the created bucket.

The file is stored in the bucket successfully.
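The post uses the console, but the same upload could also be scripted. The sketch below assumes the `google-cloud-storage` Python package and an already existing bucket; the function name `upload_csv`, the bucket name and the file path are placeholders invented for this example. The client is passed in as an argument so the function can be tried with a stand-in object when no credentials are available.

```python
from pathlib import Path


def upload_csv(client, bucket_name: str, local_path: str) -> str:
    """Upload a local file to an existing bucket and return its gs:// URI.

    `client` is expected to behave like google.cloud.storage.Client:
    client.bucket(name) returns a bucket, bucket.blob(name) returns a blob,
    and blob.upload_from_filename(path) performs the upload.
    """
    object_name = Path(local_path).name        # reuse the file name as the object name
    bucket = client.bucket(bucket_name)        # reference to the target bucket
    blob = bucket.blob(object_name)            # handle for the object to be written
    blob.upload_from_filename(local_path)      # send the file contents
    return f"gs://{bucket_name}/{object_name}"


# Possible usage (needs `pip install google-cloud-storage` and credentials;
# the bucket and file names are placeholders):
#
#   from google.cloud import storage
#   uri = upload_csv(storage.Client(), "my-example-bucket", "sales.csv")
#   print(uri)
```

Returning the `gs://` URI is a convenience of this sketch, since that is the form other Google Cloud tools generally expect when pointing at a stored file.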

CSV file to BigQuery

Bhanu chander Shyamala, Data Analyst

BigQuery intro

BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It also has built-in machine learning capabilities. BigQuery was announced in May 2010 and became generally available in November 2011.

How to upload a CSV file into BigQuery?

Use the Create table option in the BigQuery interface. Follow these steps to upload a CSV file to BigQuery:

1. Select the BigQuery option in the Google Cloud console to open the BigQuery interface.
2. Click the Create table option.
3. Select the Upload option.
4. Browse to the CSV file and upload it.
5. Enter the table name in the input box.
6. Set the number of header rows to skip to 1, because the CSV file already contains a header row.
7. Finally, click Create table. Your table will be created within seconds.
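The effect of skipping one header row can be seen with a few lines of Python and the standard `csv` module. This is only an illustration of what the setting does to the data, not BigQuery's loader; the sample CSV content and the function name `read_rows` are made up. The parameter is named `skip_leading_rows` to echo the load option of the same name in the BigQuery Python client.

```python
import csv
import io

# Made-up CSV content whose first line is a header row.
CSV_TEXT = "id,name\n1,Asha\n2,Ravi\n"


def read_rows(text: str, skip_leading_rows: int = 1):
    """Parse CSV text and drop the first `skip_leading_rows` lines."""
    reader = csv.reader(io.StringIO(text))
    for _ in range(skip_leading_rows):
        next(reader, None)          # discard a leading row if one exists
    return list(reader)


with_header_skipped = read_rows(CSV_TEXT)        # data rows only
nothing_skipped = read_rows(CSV_TEXT, 0)         # header is treated as data

print(with_header_skipped)
print(nothing_skipped)
```

When nothing is skipped, the header line `id,name` comes through as if it were an ordinary record, which is what setting the value to 1 avoids.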

Overview of data storage and SQL commands

Data acquired from various sources

Bhanuchander Shyamala, Data Analyst

Before databases, data had to be recorded on paper. Millions of such records were stored in racks, and whenever one of them was needed, finding the files and searching for specific information was a laborious task. Problems ranged from misplaced records to fires that wiped out entire archives and destroyed the history of societies, organizations and governments.

File based: In the late 1960s, file-based databases were introduced. In a file-based database, data is maintained in flat files.

Hierarchical data model: This was the model behind IBM's first DBMS, the Information Management System (IMS).

Network model: The first DBMS of this kind was the Integrated Data Store (IDS), developed by Charles Bachman at General Electric, whose computer business Honeywell later acquired.

Relational database: This is the era of relational databases and database management systems. In 1970, the relational model was proposed by E. F. Codd.

Cloud database: Cloud datastore...