Lab: Exploratory Data Analysis Using Python and BigQuery

Overview

This lab is an introduction to linear regression using Python and Scikit-Learn. It serves as a foundation for the more complex algorithms and machine learning models that you will encounter in the course. You will train a linear regression model to predict housing prices.

Learning objectives

  • Analyze a Pandas DataFrame

  • Create Seaborn plots for Exploratory Data Analysis in Python

  • Write a SQL query to select specific fields from a BigQuery dataset

  • Perform exploratory analysis in BigQuery
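
As a rough illustration of the first two objectives, the sketch below summarizes a small Pandas DataFrame and defines a Seaborn scatter-plot helper. The column names and values are hypothetical sample data, not the dataset used in the lab notebook, and `summarize` and `plot_relationship` are illustrative helper names, not functions from the notebook.

```python
import pandas as pd

# Hypothetical sample data; the lab notebook loads its own dataset.
df = pd.DataFrame(
    {
        "median_income": [8.3, 7.2, 5.6, None, 3.8],
        "housing_median_age": [41, 21, 52, 52, 30],
        "median_house_value": [452600, 358500, 352100, 341300, 269700],
    }
)


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype, missing-value count, and mean of each column."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isnull().sum(),
            "mean": frame.mean(numeric_only=True),
        }
    )


def plot_relationship(frame: pd.DataFrame, x: str, y: str):
    """Draw a scatter plot of two columns with Seaborn.

    Seaborn is imported here so that the summary above also works
    in an environment where Seaborn is not installed.
    """
    import seaborn as sns

    return sns.scatterplot(data=frame, x=x, y=y)


summary = summarize(df)
print(summary)
```

In a notebook cell, calling `plot_relationship(df, "median_income", "median_house_value")` would render the plot inline, assuming Seaborn and Matplotlib are available in the kernel.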

Setup and requirements

Enable the Vertex AI API

  1. In the Google Cloud Console, on the Navigation menu, click Vertex AI > Dashboard.

  2. Click Enable Vertex AI API.

Enable the Notebooks API

  1. In the Google Cloud Console, on the Navigation menu, click APIs & Services > Library.

  2. Search for Notebooks API, and press ENTER.

  3. Click on the Notebooks API result.

  4. If the API is not enabled, click Enable.

Task 1. Create a Vertex AI Workbench instance

  1. In the Google Cloud Console, on the Navigation menu, click Vertex AI > Workbench. Select User-Managed Notebooks.

  2. On the Notebook instances page, click New Notebook > TensorFlow Enterprise > TensorFlow Enterprise 2.6 (with LTS) > Without GPUs.

  3. In the New notebook instance dialog, confirm the name of the deep learning VM. If you don’t want to change the region and zone, leave all settings as they are, and then click Create. The new VM will take 2-3 minutes to start.

  4. Click Open JupyterLab.
    A JupyterLab window will open in a new tab.

  5. If a “Build recommended” pop-up appears, click Build. If the build fails, ignore the failure.

Task 2. Clone a course repo within your Vertex AI Notebooks instance

To clone the training-data-analyst repository in your JupyterLab instance:

  1. In JupyterLab, to open a new terminal, click the Terminal icon.

  2. At the command-line prompt, run the following command:

      git clone https://github.com/GoogleCloudPlatform/training-data-analyst

  3. To confirm that you have cloned the repository, double-click on the training-data-analyst directory and ensure that you can see its contents. The files for all the Jupyter notebook-based labs throughout this course are available in this directory.
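
The clone can also be checked from a notebook cell or Python session. The following is a minimal sketch, assuming the repository was cloned into the current working directory; `clone_looks_complete` is a hypothetical helper name, and the notebook path is the one used in Task 3.

```python
from pathlib import Path

# Relative path of the lab notebook inside the cloned repository.
LAB_NOTEBOOK = Path(
    "training-data-analyst/courses/machine_learning/deepdive2/"
    "launching_into_ml/labs/python.BQ_explore_data.ipynb"
)


def clone_looks_complete(base_dir: str = ".") -> bool:
    """Return True if the lab notebook exists under base_dir."""
    return (Path(base_dir) / LAB_NOTEBOOK).is_file()


# Assumes the current directory is where `git clone` was run.
print(clone_looks_complete())
```

If this prints False, the working directory may differ from the directory in which the repository was cloned; pass that directory as `base_dir` instead.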

Task 3. Exploratory data analysis using Python and BigQuery

  1. In the notebook interface, navigate to training-data-analyst > courses > machine_learning > deepdive2 > launching_into_ml > labs and open python.BQ_explore_data.ipynb.

  2. In the notebook interface, click Edit > Clear All Outputs.

  3. Carefully read through the notebook instructions and fill in lines marked with #TODO where you need to complete the code.

Tip: To run the current cell, click the cell and press SHIFT+ENTER. Other cell commands are listed in the notebook UI under Run.

  • Some tasks include hints to guide you. The hints are written in white text; highlight the text to read them.
  • If you need more help, see the complete solution: navigate to training-data-analyst > courses > machine_learning > deepdive2 > launching_into_ml > solutions and open python.BQ_explore_data.ipynb.
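
For orientation before opening the notebook, here is a minimal sketch of selecting specific fields from a BigQuery table and loading them into a Pandas DataFrame. It is not the notebook's solution: the table and column names are placeholders, `build_query` and `run_query` are hypothetical helpers, and `run_query` assumes that the google-cloud-bigquery package is installed and that the environment has credentials for a project with BigQuery access.

```python
def build_query(table: str, fields: list, limit: int = 1000) -> str:
    """Compose a SELECT statement that picks specific fields from a table."""
    columns = ", ".join(fields)
    return f"SELECT {columns} FROM `{table}` LIMIT {limit}"


def run_query(sql: str):
    """Run the query in BigQuery and return the result as a DataFrame.

    Assumes google-cloud-bigquery is installed and default credentials
    are available (as they typically are on a Workbench instance).
    """
    from google.cloud import bigquery  # imported lazily; needs credentials

    client = bigquery.Client()
    return client.query(sql).to_dataframe()


# Placeholder table and column names; replace with those from the notebook.
sql = build_query(
    "my-project.my_dataset.my_table", ["column_a", "column_b"], limit=10
)
print(sql)
```

Keeping query construction separate from execution makes it possible to inspect the SQL before any data is scanned. In a notebook cell, `df = run_query(sql)` would then return a DataFrame ready for the usual Pandas and Seaborn exploration.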