Question 212
You are troubleshooting your Dataflow pipeline that processes data from Cloud Storage to BigQuery. You have discovered that the Dataflow worker nodes cannot communicate with one another. Your networking team relies on Google Cloud network tags to define firewall rules. You need to identify the issue while following Google-recommended networking security practices. What should you do?
- A. Determine whether your Dataflow pipeline has a custom network tag set.
- B. Determine whether there is a firewall rule set to allow traffic on TCP ports 12345 and 12346 for the Dataflow network tag.
- C. Determine whether there is a firewall rule set to allow traffic on TCP ports 12345 and 12346 on the subnet used by Dataflow workers.
- D. Determine whether your Dataflow pipeline is deployed with the external IP address option enabled.
The best approach would be to check if there is a firewall rule allowing traffic on TCP ports 12345 and 12346 for the Dataflow network tag. Dataflow uses TCP ports 12345 and 12346 for communication between worker nodes. Using network tags and associated firewall rules is a Google-recommended security practice for controlling access between Compute Engine instances like Dataflow workers. So the key things to check would be:
- Ensure your Dataflow pipeline is using the Dataflow network tag on the worker nodes. This tag is applied by default unless overridden.
- 2. Check if there is a firewall rule allowing TCP 12345 and 12346 ingress and egress traffic for instances with the Dataflow network tag. If not, add the rule.
Because network tags are used and Dataflow uses TCP ports 12345 and 12346 as stated on https://cloud.google.com/dataflow/docs/guides/routes-firewall