Which of the following DataFrame operators is never classified as a wide transformation?
As a general rule: after having gone through the practice tests, you probably have a good feeling for what counts as a wide and what counts as a narrow transformation. If you are unsure, feel free to play around in Spark and display the execution plan via DataFrame.[operation].explain(), for example DataFrame.sort().explain(). If the plan shows that data is repartitioned (an Exchange node), the operation counts as a wide transformation; see the sketch below.
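For illustration, here is a minimal sketch (the DataFrame and column names are made up) showing how a shuffle becomes visible in the plan: a wide transformation such as sort() produces an Exchange node, while a narrow transformation such as select() does not.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical single-column DataFrame for demonstration purposes
df = spark.range(100).withColumnRenamed("id", "value")

# Narrow transformation: no data moves between partitions,
# so the physical plan contains no Exchange (shuffle) node.
df.select("value").explain()

# Wide transformation: sorting compares rows across partitions,
# so the physical plan contains an Exchange node, e.g. "Exchange rangepartitioning(...)".
df.sort("value").explain()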
DataFrame.select()
Correct! A wide transformation involves a shuffle, in which data from an input partition ends up in one or more different output partitions. This is expensive and causes traffic across the cluster. With the select() operation, however, Spark only needs to operate on the slice of data inside each partition. No data needs to be exchanged across partitions; each partition can be processed independently. Thus, select() does not cause a wide transformation.
DataFrame.repartition()
Incorrect. When you repartition a DataFrame, you redefine partition boundaries. Data will flow across your cluster and end up in different partitions after the repartitioning is completed. This is
known as a shuffle and, in turn, is classified as a wide transformation.
DataFrame.aggregate()
No. When you aggregate, you may compare and summarize data across partitions. In the process, data is exchanged across the cluster, and the newly formed output partitions depend on one or more input partitions. This is the typical characteristic of a shuffle, so the aggregate operation may be classified as a wide transformation.
DataFrame.join()
Wrong. Joining multiple DataFrames usually means that large amounts of data are exchanged across the cluster, as new partitions are formed. This is a shuffle and therefore DataFrame.join()
counts as a wide transformation.
DataFrame.sort()
False. When sorting, Spark needs to compare rows across all partitions with each other. This is an expensive operation, since data is exchanged across the cluster and new partitions are formed as the data is reordered. This process classifies as a shuffle and, as a result, DataFrame.sort() counts as a wide transformation.
More info: Understanding Apache Spark Shuffle | Philipp Brunenberg
Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?
Schema of first partition:
root
|-- transactionId: integer (nullable = true)
|-- predError: integer (nullable = true)
|-- value: integer (nullable = true)
|-- storeId: integer (nullable = true)
|-- productId: integer (nullable = true)
|-- f: integer (nullable = true)
Schema of second partition:
root
|-- transactionId: integer (nullable = true)
|-- predError: integer (nullable = true)
|-- value: integer (nullable = true)
|-- storeId: integer (nullable = true)
|-- rollId: integer (nullable = true)
|-- f: integer (nullable = true)
|-- tax_id: integer (nullable = false)
This is a very tricky question: it requires knowledge both of schema merging and of how schemas are handled when reading parquet files.
spark.read.option('mergeSchema', 'true').parquet(filePath)
Correct. The DataFrameReader's mergeSchema option works well here, since the columns that appear in both partitions have matching data types. Note that mergeSchema would fail if one or more columns with the same name had different data types across partitions. A short sketch after the answer explanations below illustrates this behavior.
spark.read.parquet(filePath)
Incorrect. While this would read in data from both partitions, only the schema in the parquet file that is read in first would be considered, so some columns that appear only in the second partition
(e.g. tax_id) would be lost.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith('.parquet'):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df
Wrong. The key idea of this solution is the DataFrame.union() command. While this command merges all data, it requires that both DataFrames have exactly the same number of columns, matched by position and with identical data types. Since the two partitions here have different columns, this approach does not work.
spark.read.parquet(filePath, mergeSchema='y')
False. While using the mergeSchema option is the correct way to solve this problem, and it can even be passed to DataFrameReader.parquet() as in this code block, the option accepts the value True as a boolean or a string. 'y' is not a valid value.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith('.parquet'):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how='outer')
    nx = nx+1
df
No. This provokes a full outer join. While the resulting DataFrame will have all columns of both partitions, the columns that appear in both partitions will be duplicated, whereas the question says that all columns included in the partitions should appear exactly once.
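For reference, here is a minimal sketch (the path and values are made up) that reproduces the situation above: parquet files with two different schemas in one location, read back with and without mergeSchema.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/mixed_schema_example"  # hypothetical location

# Write parquet files with two different (but compatible) schemas to the same path.
spark.createDataFrame([(1, 3)], ["transactionId", "productId"]).write.mode("overwrite").parquet(path)
spark.createDataFrame([(2, 42)], ["transactionId", "tax_id"]).write.mode("append").parquet(path)

# Default read: only one of the two schemas is picked up, so one column is missing.
spark.read.parquet(path).printSchema()

# With mergeSchema, the schemas of all files are reconciled and every column appears exactly once.
spark.read.option("mergeSchema", "true").parquet(path).printSchema()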
More info: Merging different schemas in Apache Spark | by Thiago Cordon | Data Arena | Medium
Static notebook | Dynamic notebook: See test 3, Question 37 (Databricks import instructions)
Which of the following code blocks silently writes DataFrame itemsDf in avro format to location fileLocation if a file does not yet exist at that location?
The trick in this question is knowing the 'modes' of the DataFrameWriter. Mode ignore silently skips the write if a file already exists at the target location: it neither replaces that file nor throws an error. Mode errorifexists, the default mode of the DataFrameWriter, would throw an error instead. The question explicitly calls for the DataFrame to be 'silently' written if it does not exist, so you need to specify mode('ignore') here to avoid having Spark report an error if the file already exists.
The 'overwrite' mode would not be right here since, although it would be silent, it would overwrite the already-existing file. This is not what the question asks for.
It is worth noting that the option starting with spark.DataFrameWriter(itemsDf) cannot work, since spark references the SparkSession object, but that object does not provide the DataFrameWriter.
As you can see in the documentation (below), DataFrameWriter is part of PySpark's SQL API, but not of its SparkSession API.
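Based on the explanation above, the correct pattern presumably looks like the following minimal sketch (fileLocation is a placeholder; depending on your environment, the avro format may require the external spark-avro package):

# DataFrameWriter is reached via itemsDf.write, not via the SparkSession object.
# mode('ignore') silently skips the write if data already exists at fileLocation,
# instead of throwing an error (errorifexists, the default) or replacing it (overwrite).
itemsDf.write.format("avro").mode("ignore").save(fileLocation)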
More info:
DataFrameWriter: pyspark.sql.DataFrameWriter.save --- PySpark 3.1.1 documentation
SparkSession API: Spark SQL --- PySpark 3.1.1 documentation
Static notebook | Dynamic notebook: See test 1, Question 59 (Databricks import instructions)
Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?
The key to solving this question is knowing (or reading in the documentation) that, by default, cache() stores values in memory and writes any partitions for which there is insufficient memory to disk. persist() can achieve the exact same behavior, but not with the StorageLevel.MEMORY_ONLY option listed here. It is also worth noting that cache() does not take any arguments.
If you have trouble finding the storage level information in the documentation, please also see this student Q&A thread, which sheds some light on the topic.
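As a reference, here is a minimal sketch of the two equivalent forms described above; the exact storage level matching cache()'s default is an assumption and may differ slightly between Spark versions:

from pyspark import StorageLevel

# cache() takes no arguments; by default it keeps partitions in executor memory
# and writes partitions that do not fit in memory to disk.
itemsDf.cache()

# persist() can express the same behavior with an explicit storage level;
# unpersist() first so the new storage level actually takes effect.
itemsDf.unpersist()
itemsDf.persist(StorageLevel.MEMORY_AND_DISK)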
Static notebook | Dynamic notebook: See test 2, Question 30 (Databricks import instructions)
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of a maximum of 4 strings. The arrays should be composed of the values in column itemName, split at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
This question deals with the parameters of Spark's split operator for strings.
To solve this question, you first need to understand the difference between DataFrame.withColumn() and DataFrame.withColumnRenamed(). The correct option here is DataFrame.withColumn()
since, according to the question, we want to add a column and not rename an existing column. This leaves you with only 3 answers to consider.
The second gap should be filled with the name of the new column to be added to the DataFrame. One of the remaining answers states the column name as itemNameBetweenSeparators, while the
other two state it as 'itemNameBetweenSeparators'. The correct option here is 'itemNameBetweenSeparators', since the other option would let Python try to interpret itemNameBetweenSeparators
as the name of a variable, which we have not defined. This leaves you with 2 answers to consider.
The decision boils down to how to fill gap 5: either with 4 or with 5. The question asks for arrays of a maximum of four strings. The code in gap 5 relates to the limit parameter of Spark's split operator (see the documentation linked below). The documentation states that 'the resulting array's length will not be more than limit', meaning that we should pick the answer option with 4 as the code in the fifth gap.
On a side note: one answer option includes a function str_split. This function does not exist in PySpark.
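Putting the pieces together, and assuming the source column is itemName (as suggested by the sample DataFrame above), the completed code block would look roughly like this:

from pyspark.sql.functions import split

# Gap 1: withColumn adds a new column.
# Gap 2: the new column's name, passed as a string.
# Gaps 3 and 4: split() applied to the itemName column.
# Gap 5: the limit parameter 4 caps each resulting array at four strings.
itemsDf.withColumn("itemNameBetweenSeparators", split("itemName", "[\s\-]", 4))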
More info: pyspark.sql.functions.split --- PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3, Question 38 (Databricks import instructions)