Skip to main content

Question 8

You are building new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

  • A. Include ORDER BY DESK on timestamp column and LIMIT to 1.
  • B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
  • C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
  • D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

https://cloud.google.com/bigquery/streaming-data-into-bigquery#manually_removing_duplicates