Skip to main content

Question 72

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

  • A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
  • B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
  • C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
  • D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.

https://cloud.google.com/blog/products/gcp/accessing-external-federated-data-sources-with-bigquerys-data-access-layer.

Permanent table—You create a table in a BigQuery dataset that is linked to your external data source. This allows you to use BigQuery dataset-level IAM roles to share the table with others who may have access to the underlying external data source. Use permanent tables when you need to share the table with others.

Temporary table—You submit a command that includes a query and creates a non-permanent table linked to the external data source. With this approach you do not create a table in one of your BigQuery datasets, so make sure to give consideration towards sharing the query or table. Consider using a temporary table for one-time, ad-hoc queries, or for one time extract, transform, or load (ETL) workflows