Free Databricks Databricks-Machine-Learning-Associate Exam Actual Questions

The questions for Databricks-Machine-Learning-Associate were last updated on Feb 17, 2025

At ValidExamDumps, we consistently monitor updates to the Databricks-Machine-Learning-Associate exam questions by Databricks. Whenever our team identifies changes in the exam questions, exam objectives, exam focus areas, or exam requirements, we immediately update our questions for both the PDF and online practice exams. This commitment ensures our customers always have access to the most current and accurate questions. By preparing with these actual questions, our customers can pass the Databricks Certified Machine Learning Associate exam on their first attempt without needing additional materials or study guides.

Other certification materials providers often include outdated or retired Databricks questions in their Databricks-Machine-Learning-Associate materials. These outdated questions lead to customers failing their Databricks Certified Machine Learning Associate exam. In contrast, we ensure our question bank includes only precise and up-to-date questions, guaranteeing their presence in your actual exam. Our main priority is your success in the Databricks-Machine-Learning-Associate exam, not profiting from selling obsolete exam questions in PDF or online practice test form.

 

Question No. 1

A machine learning engineer has identified the best run from an MLflow Experiment. They have stored the run ID in the run_id variable and identified the logged model name as "model". They now want to register that model in the MLflow Model Registry with the name "best_model".

Which lines of code can they use to register the model associated with run_id to the MLflow Model Registry?

Correct Answer: B

To register a model that has been identified by a specific run_id in the MLflow Model Registry, the appropriate line of code is:

mlflow.register_model(f'runs:/{run_id}/model', 'best_model')

This code correctly specifies the path to the model within the run (runs:/{run_id}/model) and registers it under the name 'best_model' in the Model Registry. This allows the model to be tracked, managed, and transitioned through different stages (e.g., Staging, Production) within the MLflow ecosystem.

Reference

MLflow documentation on model registry: https://www.mlflow.org/docs/latest/model-registry.html#registering-a-model


Question No. 2

A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run.

Which of the following approaches can the team use to identify which task is the cause of the failure?

Correct Answer: B

To identify which task is causing the failure in the job, the team should review the matrix view in the Job's runs. The matrix view provides a clear and detailed overview of each task's status, allowing the team to quickly identify which task failed. This approach is more efficient than running each notebook interactively, as it provides immediate insights into the job's execution flow and any issues that occurred during the run.


Databricks documentation on Jobs: Jobs in Databricks

Question No. 3

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

Correct Answer: C

The OneHotEncoder in Spark ML requires numerical indices as inputs rather than string labels. Therefore, you need to first convert the string columns to numerical indices using StringIndexer. After that, you can apply OneHotEncoder to these indices.

Corrected code:

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder

# Convert each string column to a numerical index
indexers = [StringIndexer(inputCol=col, outputCol=col + '_index') for col in input_columns]
indexer_model = Pipeline(stages=indexers).fit(features_df)
indexed_features_df = indexer_model.transform(features_df)

# One-hot encode the indexed columns
ohe = OneHotEncoder(
    inputCols=[col + '_index' for col in input_columns],
    outputCols=[col + '_ohe' for col in input_columns]
)
ohe_model = ohe.fit(indexed_features_df)
ohe_features_df = ohe_model.transform(indexed_features_df)
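To see what the StringIndexer → OneHotEncoder pipeline conceptually produces, here is a minimal pure-Python sketch. The labels are made up, and it simplifies two details: Spark's StringIndexer orders labels by frequency by default (here we sort alphabetically for determinism), and Spark's OneHotEncoder emits sparse vectors and drops the last category by default.

```python
# StringIndexer step: map each distinct string label to an index.
labels = ["red", "green", "blue", "green"]

categories = sorted(set(labels))                     # ['blue', 'green', 'red']
index = {cat: i for i, cat in enumerate(categories)}

indexed = [index[label] for label in labels]         # [2, 1, 0, 1]

# OneHotEncoder step: expand each index into a dense 0/1 vector.
one_hot = [[1 if i == idx else 0 for i in range(len(categories))]
           for idx in indexed]
```

This illustrates why the encoder fails on raw strings: the one-hot expansion is defined over integer indices, so the indexing step must come first.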


PySpark ML Documentation

Question No. 4

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.

Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

Correct Answer: B

For the feature set schema provided, the columns that need to be imputed using the most common value (mode) are typically the categorical columns. In this case, loyalty_tier is the only categorical column that should be imputed using the most common value. customer_id is a unique identifier and should not be imputed, while spend and units are numerical columns that should typically be imputed using the mean or median values, not the mode.
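The distinction between mode imputation for categorical columns and median (or mean) imputation for numeric ones can be illustrated with a small pure-Python sketch. The rows and values below are hypothetical, chosen only to mirror the schema described in the question.

```python
from statistics import median, mode

# Hypothetical feature rows with missing values (None), mirroring the schema:
# customer_id (key), spend (numeric), units (numeric), loyalty_tier (categorical).
rows = [
    {"customer_id": 1, "spend": 120.0, "units": 3,    "loyalty_tier": "gold"},
    {"customer_id": 2, "spend": None,  "units": 1,    "loyalty_tier": "silver"},
    {"customer_id": 3, "spend": 80.0,  "units": None, "loyalty_tier": "gold"},
    {"customer_id": 4, "spend": 95.0,  "units": 2,    "loyalty_tier": None},
]

def impute(rows, column, strategy):
    """Fill missing values in `column` with the column's mode or median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mode(observed) if strategy == "mode" else median(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return rows

rows = impute(rows, "loyalty_tier", "mode")   # categorical -> most common value
rows = impute(rows, "spend", "median")        # numeric -> median
rows = impute(rows, "units", "median")
```

Note that customer_id is never imputed: as the primary key, a missing value there indicates a data-quality problem rather than a feature to be filled.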


Databricks documentation on missing value imputation: Handling Missing Data


Question No. 5

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:

prediction DOUBLE

actual DOUBLE

Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

A)

B)

C)

D)

E)

Correct Answer: C

The code block to compute the root mean-squared error (RMSE) for a linear regression model in Spark ML should use the RegressionEvaluator class with metricName set to 'rmse'. Given the schema of preds_df, the evaluator must specify predictionCol='prediction' and labelCol='actual'. Option C, which configures RegressionEvaluator this way, is therefore correct: it measures the model's performance using the predictions and actual outcomes in the DataFrame.

Reference:

Spark ML documentation (Using RegressionEvaluator to Compute RMSE)
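For intuition, the quantity that RegressionEvaluator(labelCol='actual', predictionCol='prediction', metricName='rmse').evaluate(preds_df) returns is just the square root of the mean squared difference between prediction and actual. A minimal pure-Python sketch of that formula, using made-up (prediction, actual) pairs standing in for the rows of preds_df:

```python
import math

# Toy (prediction, actual) pairs standing in for the rows of preds_df.
pairs = [(3.0, 2.5), (5.0, 5.5), (4.0, 4.0)]

# RMSE = sqrt(mean((prediction - actual)^2)) -- the metric that
# RegressionEvaluator computes when metricName='rmse'.
rmse = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
```

Knowing the formula also helps eliminate distractor options that compute MSE (no square root) or MAE (absolute differences) instead.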