Skip to main content

Question 111

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the
Data Science team runs a query filtered on a date column and limited to 30`"90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries.
What should you do?

  • A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE Type.
  • B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
  • C. Modify your pipeline to maintain the last 30ג€"90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
  • D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.