Question 180

You are migrating an application that tracks library books and information about each book, such as author or year published, from an on-premises data warehouse to BigQuery. In your current relational database, the author information is kept in a separate table and joined to the book information on a common key. Based on Google's recommended practice for schema design, how would you structure the data to ensure optimal speed of queries about the author of each book that has been borrowed?

  • A. Keep the schema the same, maintain the different tables for the book and each of the attributes, and query as you are doing today.
  • B. Create a table that is wide and includes a column for each attribute, including the author's first name, last name, date of birth, etc.
  • C. Create a table that includes information about the books and authors, but nest the author fields inside the author column.
  • D. Keep the schema the same, create a view that joins all of the tables, and always query the view.

If the data is time-based or sequential, look for a partitioning and clustering option.

If the data is not time-based, look for a denormalization/nesting option.

https://cloud.google.com/bigquery/docs/best-practices-performance-nested

Best practice: Use nested and repeated fields to denormalize data storage and increase query performance. Denormalization is a common strategy for increasing read performance for relational datasets that were previously normalized. The recommended way to denormalize data in BigQuery is to use nested and repeated fields. It's best to use this strategy when the relationships are hierarchical and frequently queried together, such as in parent-child relationships.
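To make the contrast between the options concrete, here is a minimal sketch in plain Python dicts (not BigQuery itself; the table and field names are illustrative, invented for this question's library scenario). It shows why option C avoids the per-query join that the normalized schema in option A requires: the author attributes are nested directly inside each book row.

```python
# Normalized schema (option A): books reference authors by key,
# so reading an author's attributes requires a join/lookup.
authors = {1: {"first_name": "Ursula", "last_name": "Le Guin"}}
books_normalized = [{"title": "The Dispossessed", "author_id": 1}]

def author_of(book):
    # Join step: look up the author row in the separate table.
    return authors[book["author_id"]]

# Denormalized schema with a nested field (option C): each book row
# carries its author record inside the "author" column.
books_nested = [
    {
        "title": "The Dispossessed",
        "author": {"first_name": "Ursula", "last_name": "Le Guin"},
    }
]

def author_of_nested(book):
    # No join: the author attributes live inside the book row.
    return book["author"]

print(author_of(books_normalized[0])["last_name"])    # -> Le Guin
print(author_of_nested(books_nested[0])["last_name"])  # -> Le Guin
```

In BigQuery the nested column would be declared as a STRUCT (a RECORD field), which is exactly the nested-field strategy the quoted best practice recommends for hierarchical, frequently co-queried data such as a book and its author.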