Here are around 30 questions I remember from the Google Cloud Professional Machine Learning certification exam. You can find all the sources for exam training questions in my preparation tips.

Obviously I do not remember the exact wording and options, and I do not know if my answers were correct. Hopefully these give you an idea of the relevant exam questions in 2023.

The questions listed here are summarized. In the real exam they were written in a verbose format, so you needed to identify the significant part.

My exam had no questions about the Feature Store, which was surprising. One option involved Data Fusion, but it was clearly wrong. Dataprep was not mentioned at all. Cloud SQL was present in one question as part of a combination of multiple services.

A logistics department needs to know what to order next month. How should you deliver the prediction results to them?

In the exam the question was long and formulated in a tricky way.

I first chose predicting the total sales per product for the next month and delivering that information once a month. That would leave room for the logistics department to make the final decision.

In the end I switched to predicting the stock increase once a month. This would eliminate the need for additional guesses and would be the most automated approach. Also, Google does not recommend gluing additional layers of logic on top of the original predictions.

You are predicting house prices. The distance_to_school feature is missing for some records. How do you replace the missing values without removing the records?

My answer was to fit another linear regression model to populate the missing values.

Other options were converting the missing values to zero, doing a feature cross with another column, or removing the records (duh!).

I could not figure out how a feature cross would help in this situation, which feature to use, or how to convert a number to a category.
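
Purely for my own notes, here is a minimal sketch of how that regression-based imputation could look; the dataset file and the predictor columns are made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("houses.csv")                   # hypothetical dataset
predictors = ["latitude", "longitude", "sqft"]   # assumed fully populated columns

known = df[df["distance_to_school"].notna()]
missing_mask = df["distance_to_school"].isna()

# Fit a regression on the records where the feature is known, then fill the gaps.
imputer = LinearRegression().fit(known[predictors], known["distance_to_school"])
df.loc[missing_mask, "distance_to_school"] = imputer.predict(df.loc[missing_mask, predictors])
```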

You are training a model with multiple categorical variables (features). Train and test data are split 50-50. You perform one-hot encoding on the training data. You realize that the training data is missing one category. What do you do?

This question was just weird; I did not get the actual problem.

In the end my pick was creating a sparse feature. Training and test data should have the same features, and a sparse format should be chosen for the missing data.

Other options were redistributing the train and test data 70-30, doing one-hot encoding on the test data, or adding more data. Even though more data helps, in a business environment and on the Google MLE exam it is usually not the right answer.
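
For what it is worth, this is a small sketch of the idea as I understood it: declare the full category vocabulary up front so the train and test splits end up with identical one-hot columns even when a category is missing from one of them. The category values are made up.

```python
import pandas as pd

categories = ["red", "green", "blue"]                       # hypothetical full vocabulary
train = pd.Series(["red", "green", "red"], name="color")    # "blue" missing from the training split
test = pd.Series(["blue", "red"], name="color")

def one_hot(series: pd.Series) -> pd.DataFrame:
    # Fixing the dtype to the full vocabulary guarantees identical columns
    # for both splits, even when a category is absent from one of them.
    return pd.get_dummies(series.astype(pd.CategoricalDtype(categories)), prefix=series.name)

print(one_hot(train))
print(one_hot(test))
```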

A game has millions of active players who make in-app purchases. You want to predict whether a player will spend more than $10 during the 7 days following some event. Information about the event is streamed to Pub/Sub. You want to show customized content based on the prediction. How do you implement the pipeline?

Both BigQuery and Dataflow would support streaming from Pub/Sub. Batch processing in BigQuery would not be a good idea because not all players are active all the time.

I chose Dataflow, as the predictions would be instant and concern only specific players.
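
To illustrate what I had in mind, here is a rough Apache Beam sketch of the streaming part. The topic names are hypothetical and the predict function is only a placeholder for the real model call.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def predict(event: dict) -> dict:
    # Placeholder for the real model call, e.g. a request to a deployed endpoint.
    event["will_spend_over_10"] = False
    return event

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/game-events")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Predict" >> beam.Map(predict)
        | "Serialize" >> beam.Map(lambda event: json.dumps(event).encode("utf-8"))
        | "WriteResults" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/predictions")
    )
```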

A cinema ticket system wants to make low-latency predictions (<50 ms) after a ticket purchase. The model is developed in TensorFlow. The steps are x, y and z, and you are using Dataflow somewhere in the pipeline… Where do you run the predictions?

I only realized the similarity to the previous question now when writing these down.

I chose applying the TensorFlow RunInference API within Dataflow. I had not heard about such an API before, but the workflow sounded right.

This answer receives a special award for the most irrelevant detail in the exam.

Edge deployment with TensorFlow Lite would have been another candidate.
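
Since I had not used the API before, the following is only my rough guess at how a RunInference step might sit inside a Beam/Dataflow pipeline; the model path, topic names and feature parsing are all made up.

```python
import numpy as np
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerNumpy

def parse_purchase(msg: bytes) -> np.ndarray:
    # Placeholder: turn the Pub/Sub message into the model's input vector.
    return np.zeros(10, dtype=np.float32)

model_handler = TFModelHandlerNumpy("gs://my-bucket/models/ticket-model")

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadPurchases" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/purchases")
        | "ToFeatures" >> beam.Map(parse_purchase)
        | "Predict" >> RunInference(model_handler)
        | "Format" >> beam.Map(lambda result: str(result.inference).encode("utf-8"))
        | "WriteOut" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/predictions")
    )
```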

You run a video streaming company and have complete attributes for all your movies. Your AI ethics team has reviewed the solution. How do you create recommendations for the users?

One option suggested Recommendation AI and the MovieLens dataset. Readily available Google services are often a good choice, but here our own data was already complete. And the ethics department might not like public datasets.

I chose the autoencoder approach: create embeddings with TensorFlow and compare their similarity.
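
Roughly, the idea I was thinking of looks like this in Keras; the layer sizes and the input dimension are arbitrary.

```python
import tensorflow as tf

n_features = 100  # assumed size of the movie attribute vector

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),   # 16-dimensional embedding
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_features),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(movie_features, movie_features, epochs=20)

# After training, embeddings = encoder(movie_features), and similar movies are
# found by cosine similarity between embedding rows.
```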

You are training a large model with TPUs and have performance issues. What should you do?

I chose using bfloat16 instead of float32. This should decrease the memory footprint and improve training performance.
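
In Keras this could be as simple as switching the global precision policy; this is just my illustration of the option, not necessarily how the exam framed it.

```python
import tensorflow as tf

# Compute in bfloat16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])
```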

Similar phrasing appeared among Google's sample questions.

You are executing a long-running training job. It takes weeks with any hardware. What is the best action?

Use a preemptible TPU. If your job runs for a long time anyway, this option decreases costs.

The training curve decreases slowly but stays at a high level during training. What do you do?

Considering the way the question was formulated, the reason seemed to be a low learning rate. If I remember correctly, my choice was hyperparameter tuning, which would enable finding the correct learning rate automatically.
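
As an illustration of tuning the learning rate automatically, here is a small sketch with KerasTuner; the exam option probably meant a managed tuning service, so treat this only as the general idea.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Search the learning rate on a log scale.
    lr = hp.Float("learning_rate", min_value=1e-4, max_value=1e-1, sampling="log")
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=10)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```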

What is the best way to rate videos in a service like YouTube?

The option that mentioned click-bait was clearly not right. Two other answers were not convincing either, as something like the number of clicks is not a comprehensive metric.

My answer was the total watch time within 30 days for 95% of videos. I do not remember exactly what the percentage referred to, but this metric had by far the best objective.

This question might have been on the Exam Topics website.

Training and test data are randomly split with an 80/20 ratio in BigQuery. Accuracy is 85% in training but drops to 65% in production. What causes this?

At least one obviously wrong option suggested first running the RAND() function over all the data and taking the rows with a value less than 0.8 for training, and then applying the random function again and assigning the rows with a value less than 0.2 to the test set. This is wrong because the same rows would almost certainly end up in both the train and test sets, while other rows would not be present in either set.

The correct way is to apply RAND() just once, so each row is strictly assigned to either the train or the test set.
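
As a sketch, a single-pass split could look something like this in BigQuery (table names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
CREATE OR REPLACE TABLE dataset.split AS
SELECT
  *,
  IF(RAND() < 0.8, 'TRAIN', 'TEST') AS split   -- RAND() evaluated exactly once per row
FROM dataset.source_table
"""
client.query(query).result()
```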

Time-series sensor data is cross-validated but suffers from data leakage between the train and test sets. How do you avoid this?

Order the dataset by timestamps. The earliest records need to be in the training set and the rest in the test set.
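
A minimal illustration of a chronological split, assuming the data has a timestamp column and sits in a CSV file:

```python
import pandas as pd

# Sort by time so the model never trains on data from the future.
df = pd.read_csv("sensor_readings.csv").sort_values("timestamp")
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]
```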

The training curve oscillates (goes up and down). What solves this?

Too large a learning rate causes the training curve to oscillate; gradient descent "bounces from one side of the valley to the other".

The solution is to decrease the learning rate.

One billion rows of training data are stored in CSV files. What is the best way to read them efficiently in TensorFlow?

Convert them to TFRecord format with a Dataflow batch job and save the files in Google Cloud Storage.
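
Here is a sketch of the reading side in TensorFlow, assuming the Dataflow job has already written sharded TFRecord files to Cloud Storage; the paths and the feature schema are made up.

```python
import tensorflow as tf

files = tf.data.Dataset.list_files("gs://my-bucket/training-data/*.tfrecord")
dataset = (
    files.interleave(tf.data.TFRecordDataset,
                     num_parallel_calls=tf.data.AUTOTUNE)       # read shards in parallel
         .map(lambda record: tf.io.parse_single_example(
                  record, {"label": tf.io.FixedLenFeature([], tf.int64)}),  # assumed schema
              num_parallel_calls=tf.data.AUTOTUNE)
         .batch(1024)
         .prefetch(tf.data.AUTOTUNE)
)
```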

This question has appeared in multiple forms on exam question sites and in training materials.

The team is using Vertex AI autoscaling for predictions. A new deployment gives weird results.

Many answers suggested something fully automated.

As the source of the problem was not known, my answer was to undo the deployment and review the code manually.

A Vertex AI endpoint gives an out-of-memory error when lots of records are submitted at once. What makes this happen?

For me the most sensible alternative was to submit fewer predictions in a single HTTP request.
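
A simple way to do that is to chunk the instances client-side; the endpoint resource name and the batch size below are made up.

```python
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint("projects/123/locations/us-central1/endpoints/456")

def predict_in_batches(instances, batch_size=100):
    # Send several small requests instead of one huge one.
    results = []
    for i in range(0, len(instances), batch_size):
        results.extend(endpoint.predict(instances=instances[i:i + batch_size]).predictions)
    return results
```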

How do you find the correct BigQuery table among hundreds of them?

Use GCP Data Catalog. Do not use a SQL query against the INFORMATION_SCHEMA.

This was in one of the training exams.

How do you structure Google Cloud Storage buckets with the Cloud Data Loss Prevention service to handle sensitive files?

The files should first arrive in a quarantine bucket. Process the files periodically and move them to sensitive and non-sensitive buckets.

Other options would have exposed sensitive data, for example by tagging the files as non-sensitive by default.
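
A very rough sketch of the quarantine pattern could look like this; the bucket names and the single info type are made up, and error handling is omitted.

```python
from google.cloud import dlp_v2, storage

dlp = dlp_v2.DlpServiceClient()
gcs = storage.Client()

def triage(blob_name: str, project: str = "my-project"):
    source = gcs.bucket("quarantine-bucket")
    blob = source.blob(blob_name)

    # Inspect the file content with Cloud DLP.
    response = dlp.inspect_content(
        request={
            "parent": f"projects/{project}",
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
            "item": {"value": blob.download_as_text()},
        }
    )

    # Route the file based on whether anything sensitive was found.
    target = "sensitive-bucket" if response.result.findings else "non-sensitive-bucket"
    source.copy_blob(blob, gcs.bucket(target), blob_name)
    blob.delete()
```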

This was also present in the example questions.

You are training a model with a dataset that has three different labels. In the training data 10,000 rows are labeled as A, 1,000 as B, and 1,000 as C. Which loss function would be most suitable?

Categorical cross-entropy was my pick. If I have understood correctly, it works on the class probabilities coming from a softmax output and generalizes log loss to multiple classes, while binary cross-entropy is log loss for two classes.

Categorical hinge was another available option. It tries to maximize the margin between the categories, as in an SVM.
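
As a small refresher for myself, this is how the categorical cross-entropy loss is wired up in Keras; the architecture is arbitrary.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),   # three classes: A, B, C
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(),   # expects one-hot labels
    metrics=["accuracy"],
)
# With integer labels, SparseCategoricalCrossentropy would be used instead.
```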

The company is building a fraud detection model. Sometimes you need to answer customer complaints and explain why their transactions were considered fraudulent. How can this be done?

If I understood the idea of local feature importance correctly, it can explain the predictions row by row.

Applying Explainable AI to an AutoML model would not be sufficient, as it does not provide explanations per customer.
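
If the model is deployed to a Vertex AI endpoint with an explanation spec, per-transaction attributions can be requested roughly like this; the endpoint ID and the features are made up, and I have not verified the exact response fields.

```python
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint("projects/123/locations/us-central1/endpoints/789")
response = endpoint.explain(instances=[{"amount": 950.0, "country": "FI", "hour": 3}])

for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Local importance of each feature for this single transaction.
        print(attribution.feature_attributions)
```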

How do you compare results from multiple different models?

Vertex AI ML Metadata service would do the job.

How do you deal with sensitive passport data?

I probably answered something related to Cloud Data Loss Prevention, which I am not very familiar with.

Storing the data in BigQuery and hashing it with the MD5 algorithm just did not sound right.

The customer needs to pre-process data, train a model, and store the results in an ANSI SQL 2011 compliant database. What combination of GCP services can achieve this?

Dataflow, Vertex AI and BigQuery.

I was not sure about the ANSI SQL part. The other option for the storage service was Cloud SQL, which is typically better suited for applications.

A car manufacturer wants you to build a model that predicts car sales in different cities per car type. How should you prepare the features?

In my opinion, a feature cross of bucketized latitude, bucketized longitude, and car type was right. It helps the model learn from multiple features simultaneously.

All the other options leaned towards independent features.
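
Here is roughly what I pictured, using the classic tf.feature_column API; the bucket boundaries and vocabulary are made up.

```python
import tensorflow as tf

latitude = tf.feature_column.numeric_column("latitude")
longitude = tf.feature_column.numeric_column("longitude")
car_type = tf.feature_column.categorical_column_with_vocabulary_list(
    "car_type", ["sedan", "suv", "truck"])

lat_buckets = tf.feature_column.bucketized_column(latitude, boundaries=list(range(-90, 91, 5)))
lon_buckets = tf.feature_column.bucketized_column(longitude, boundaries=list(range(-180, 181, 5)))

# Cross the bucketized coordinates with the car type so the model can learn
# effects that are specific to a city and a car type together.
cross = tf.feature_column.crossed_column([lat_buckets, lon_buckets, car_type],
                                         hash_bucket_size=10_000)
cross_feature = tf.feature_column.indicator_column(cross)
```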

You need to train an ML model, deploy it and make predictions in an automated way… Which service can fulfill these requirements?

This was possibly the easiest question. A few silly options, but Vertex AI was obviously correct.

How do you normalize numeric values in Dataflow with minimum effort?

Use normalizer_fn in TensorFlow.
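
As far as I understand it, normalizer_fn is a parameter of a numeric feature column, so the answer would look something like this; the scaling constants are made up.

```python
import tensorflow as tf

price = tf.feature_column.numeric_column(
    "price",
    # z-score scaling with precomputed statistics.
    normalizer_fn=lambda x: (x - 320_000.0) / 85_000.0,
)
```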

Two questions about collaborative filtering

Learn about Google's recommendation system principles. Collaborative filtering is their preferred way to predict a user's preferences by learning from other similar users.

Two similar questions about how to minimize false positives and false negatives in an imbalanced dataset

Both of them had options for precision, recall and F1-score.

I chose the F1-score, as it balances both. Accuracy might not work as a metric for imbalanced data.

Additionally, I remember one long question where I answered "maximize precision while keeping recall at least 0.5".

The conclusion: Learn precision, recall, accuracy, F1 and the confusion matrix inside out for binary classification.
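
For reference, a tiny refresher on these metrics with scikit-learn and a made-up imbalanced example:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: only two positives
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual, columns: predicted
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
```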