Question 33

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you apply Z-score normalization to data stored in BigQuery and write the result back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

  • A. Normalize the data using Google Kubernetes Engine.
  • B. Translate the normalization algorithm into SQL for use with BigQuery.
  • C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
  • D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
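For context, Z-score normalization rescales each value by subtracting the column mean and dividing by the standard deviation, which is expressible directly in SQL — the idea behind option B, keeping the computation inside BigQuery instead of moving data through Dataflow. A minimal Python sketch of the transformation (the BigQuery SQL in the comment is illustrative, with hypothetical table and column names):

```python
import math

def z_score_normalize(values):
    """Z-score normalization: (x - mean) / population stddev."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std for x in values]

# Roughly equivalent BigQuery SQL, using analytic functions
# (hypothetical table/column names):
#   SELECT (demand - AVG(demand) OVER ()) / STDDEV_POP(demand) OVER () AS demand_z
#   FROM `project.dataset.sales`;

normalized = z_score_normalize([10.0, 20.0, 30.0, 40.0])
```

The normalized column then has mean 0 and standard deviation 1, and re-running the query each week requires no separate preprocessing cluster.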