Name: CCA Spark and Hadoop Developer
Brand: ValidExamDumps
SKU: CCA175
Price: 20 USD
Availability: InStock
Rating: 4.8 (370 reviews)

Free Cloudera CCA175 Exam Actual Questions

The questions for CCA175 were last updated on Mar 31, 2025.

At ValidExamDumps, we consistently monitor Cloudera's updates to the CCA175 exam questions. Whenever our team identifies changes in the exam questions, exam objectives, exam focus areas, or exam requirements, we immediately update our exam questions for both the PDF and online practice exams. This commitment ensures our customers always have access to the most current and accurate questions. By preparing with these actual questions, our customers can successfully pass the Cloudera CCA Spark and Hadoop Developer exam on their first attempt without needing additional materials or study guides.

Other certification material providers often include outdated questions, or questions Cloudera has removed, in their Cloudera CCA175 exam materials. These outdated questions lead to customers failing their Cloudera CCA Spark and Hadoop Developer exam. In contrast, we ensure our question bank includes only precise and up-to-date questions, guaranteeing their presence in your actual exam. Our main priority is your success in the Cloudera CCA175 exam, not profiting from selling obsolete exam questions in PDF or online practice tests.

Question No. 1

Problem Scenario 17: You have been given the following MySQL database details and other information.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please complete the following assignment.

1. Create a table in Hive as follows: create table departments_hive01(department_id int, department_name string, avg_salary int);

2. Create another table in MySQL using the following statement: CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);

3. Copy all the data from the departments table to departments_hive01 using: insert into departments_hive01 select a.*, null from departments a;

Also insert the following records:

insert into departments_hive01 values(777, "Not known",1000);

insert into departments_hive01 values(8888, null,1000);

insert into departments_hive01 values(666, null,1100);

4. Now import data from the MySQL table departments_hive01 into this Hive table. Make sure that the data is visible using the Hive command below. Also, while importing, if a null value is found in the department_name column, replace it with "" (empty string), and in the id column with -999.
select * from departments_hive;
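The null-handling requirement can be illustrated with a small pure-Scala model (no Sqoop or cluster needed) of the ^A-delimited text lines the import is expected to write. DeptRow, NullSubstitution, and the ExampleDept row are hypothetical names for this sketch, which assumes only what the two Sqoop options document: --null-string supplies the text for NULL string columns and --null-non-string the text for NULL non-string columns. Note that --null-non-string applies to every non-string column, so rows copied from departments, whose avg_salary is NULL, get -999 in avg_salary as well, not only in id.

```scala
// Hypothetical row model: Option marks a column that may be NULL in MySQL.
final case class DeptRow(id: Option[Int], departmentName: Option[String], avgSalary: Option[Int])

object NullSubstitution {
  val FieldDelimiter = "\u0001" // '\001' (^A), the value given to --fields-terminated-by
  val NullString = ""           // value given to --null-string (string columns)
  val NullNonString = "-999"    // value given to --null-non-string (every non-string column)

  // Render one row as a delimited text line, replacing NULLs as the two options describe.
  def render(row: DeptRow): String =
    Seq(
      row.id.map(_.toString).getOrElse(NullNonString),
      row.departmentName.getOrElse(NullString),
      row.avgSalary.map(_.toString).getOrElse(NullNonString)
    ).mkString(FieldDelimiter)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      DeptRow(Some(1), Some("ExampleDept"), None), // hypothetical row copied from departments (avg_salary NULL)
      DeptRow(Some(777), Some("Not known"), Some(1000)),
      DeptRow(Some(8888), None, Some(1000)),
      DeptRow(Some(666), None, Some(1100))
    )
    // Show ^A as '|' so the lines are readable on a terminal.
    rows.map(render).foreach(line => println(line.replace(FieldDelimiter, "|")))
  }
}
```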

A. Solution:
Step 1: Create the Hive table as below.
hive
show tables;
create table departments_hive01(department_id int, department_name string, avg_salary int);
Step 2: Create the table in the MySQL database as well.
mysql --user=retail_dba --password=cloudera
use retail_db;
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
show tables;
Step 3: Insert data into the MySQL table.
insert into departments_hive01 select a.*, null from departments a;
Check the inserted data:
select * from departments_hive01;
Now insert the records with null values, as given in the problem:
insert into departments_hive01 values(777, 'Not known',1000);
insert into departments_hive01 values(8888, null,1000);
insert into departments_hive01 values(666, null,1100);
Step 4: Now import the data into Hive as required.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_hive01 \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive01 \
--fields-terminated-by '\001' \
--null-string "" \
--null-non-string -999 \
--split-by id \
-m 1
Step 5: Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive01
hdfs dfs -cat /user/hive/warehouse/departments_hive01/part*
Check the data in the Hive table:
select * from departments_hive01;

B. Solution:
Step 1: Create the Hive table as below.
hive
show tables;
create table departments_hive01(department_id int, department_name string, avg_salary int);
Step 2: Create the table in the MySQL database as well.
mysql --user=retail_dba --password=cloudera
use retail_db;
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
show tables;
Step 3: Insert data into the MySQL table.
insert into departments_hive01 select a.*, null from departments a;
Check the inserted data:
select * from departments_hive01;
Now insert the records with null values, as given in the problem:
insert into departments_hive01 values(777, 'Not known',1000);
insert into departments_hive01 values(8888, null,1000);
insert into departments_hive01 values(888, null,3300);
Step 4: Now import the data into Hive as required.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_hive07 \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive01 \
--fields-terminated-by '\001' \
--null-string "" \
--null-non-string -888 \
--split-by id \
-m 1
Step 5: Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive01
hdfs dfs -cat /user/hive/warehouse/departments_hive01/part*
Check the data in the Hive table:
select * from departments_hive01;


Correct Answer: A
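To read the part file printed in Step 5 back by hand, split each line on the ^A delimiter. The sketch below (PartFileReader is a hypothetical name) assumes the three-column layout of departments_hive01. Because Hive text tables use \N as their default NULL marker, the empty string and -999 written by the import are expected to appear in select * as an ordinary empty string and -999, not as NULL.

```scala
object PartFileReader {
  // Parse one ^A-delimited line into (department_id, department_name, avg_salary).
  def parse(line: String): (Int, String, Int) = {
    // A limit of -1 keeps empty fields, including trailing ones, instead of dropping them.
    val fields = line.split("\u0001", -1)
    require(fields.length == 3, s"expected 3 fields, got ${fields.length}")
    (fields(0).toInt, fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit = {
    // The line the import would write for the record (8888, null, 1000).
    val line = Seq("8888", "", "1000").mkString("\u0001")
    println(parse(line))
  }
}
```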

Question No. 2

Problem Scenario 87: You have been given the following three files:

product.csv (create this file in HDFS)

productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502

supplier.csv

supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333

products_suppliers.csv

productID,supplierID
2001,501
2002,501
2003,501
2004,502
2001,503
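Each file above starts with a header row, which must be dropped before the numeric columns are converted with toInt or toFloat; otherwise converting the header text fails. The following sketch uses plain Scala collections in place of RDDs, and ProductRow, SupplierRow, and CsvWithHeader are hypothetical names. Filtering out the line equal to the header mirrors the usual rdd.filter(_ != header) idiom.

```scala
// Hypothetical row types for product.csv and supplier.csv.
final case class ProductRow(productId: Int, code: String, name: String, quantity: Int, price: Float, supplierId: Int)
final case class SupplierRow(supplierId: Int, name: String, phone: String)

object CsvWithHeader {
  // Drop the header the way one would on an RDD: remove the line equal to the first line.
  def dataLines(lines: Seq[String]): Seq[String] = {
    val header = lines.head
    lines.filter(_ != header)
  }

  def parseProducts(lines: Seq[String]): Seq[ProductRow] =
    dataLines(lines).map(_.split(",")).map(p =>
      ProductRow(p(0).toInt, p(1), p(2), p(3).toInt, p(4).toFloat, p(5).toInt))

  def parseSuppliers(lines: Seq[String]): Seq[SupplierRow] =
    dataLines(lines).map(_.split(",")).map(p => SupplierRow(p(0).toInt, p(1), p(2)))

  def main(args: Array[String]): Unit = {
    val productLines = Seq(
      "productID,productCode,name,quantity,price,supplierid",
      "1001,PEN,Pen Red,5000,1.23,501",
      "1004,PEC,Pencil 2B,10000,0.48,502")
    val supplierLines = Seq(
      "supplierid,name,phone",
      "501,ABC Traders,88881111",
      "502,XYZ Company,88882222")
    parseProducts(productLines).foreach(println)
    parseSuppliers(supplierLines).foreach(println)
  }
}
```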

Now complete the query given in the solution.

Using Spark SQL, select each product, its price, and its supplier name where the product price is less than 0.6.
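The expected result of this query can be checked without a cluster. The sketch below is a plain-Scala stand-in for the Spark SQL inner join and filter, not Spark itself; ProductInfo, SupplierInfo, and CheapProducts are hypothetical names, and only the columns the query needs are kept. With the data above it should return Pencil 2B, Pencil 2H, and Pencil 6B from XYZ Company and Pencil 3B from ABC Traders.

```scala
final case class ProductInfo(name: String, price: Float, supplierId: Int)
final case class SupplierInfo(supplierId: Int, name: String)

object CheapProducts {
  // Rows from product.csv and supplier.csv, reduced to the columns the query uses.
  val products = Seq(
    ProductInfo("Pen Red", 1.23f, 501), ProductInfo("Pen Blue", 1.25f, 501),
    ProductInfo("Pen Black", 1.25f, 501), ProductInfo("Pencil 2B", 0.48f, 502),
    ProductInfo("Pencil 2H", 0.49f, 502), ProductInfo("Pencil HB", 9999.99f, 502),
    ProductInfo("Pencil 3B", 0.52f, 501), ProductInfo("Pencil 4B", 0.62f, 501),
    ProductInfo("Pencil 5B", 0.73f, 501), ProductInfo("Pencil 6B", 0.47f, 502))
  val suppliers = Seq(SupplierInfo(501, "ABC Traders"), SupplierInfo(502, "XYZ Company"), SupplierInfo(503, "QQ Corp"))

  // Same logic as: SELECT products.name, price, suppliers.name AS sup_name
  //   FROM products JOIN suppliers ON products.supplierid = suppliers.supplierid WHERE price < 0.6
  def query: Seq[(String, Float, String)] =
    for {
      p <- products
      s <- suppliers
      if p.supplierId == s.supplierId // inner-join condition
      if p.price < 0.6                // WHERE clause
    } yield (p.name, p.price, s.name)

  def main(args: Array[String]): Unit = query.foreach(println)
}
```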

A. Solution:
Step 1:
hdfs dfs -mkdir sparksql2
hdfs dfs -put product.csv sparksql2/
hdfs dfs -put supplier.csv sparksql2/
hdfs dfs -put products_suppliers.csv sparksql2/
Step 2: Now, in the Spark shell:
// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
// Import Spark SQL data types and Row.
import org.apache.spark.sql._
// Load the data into new RDDs, dropping the header line of each CSV file.
val products = sc.textFile("sparksql2/product.csv").filter(!_.startsWith("productID"))
val supplier = sc.textFile("sparksql2/supplier.csv").filter(!_.startsWith("supplierid"))
val prdsup = sc.textFile("sparksql2/products_suppliers.csv").filter(!_.startsWith("productID"))
// Return the first element in each RDD.
products.first()
supplier.first()
prdsup.first()
// Define the schema using case classes.
case class Product(productid: Integer, code: String, name: String, quantity: Integer, price: Float, supplierid: Integer)
case class Supplier(supplierid: Integer, name: String, phone: String)
case class PRDSUP(productid: Integer, supplierid: Integer)
// Create RDDs of case-class objects.
val prdRDD = products.map(_.split(",")).map(p => Product(p(0).toInt, p(1), p(2), p(3).toInt, p(4).toFloat, p(5).toInt))
val supRDD = supplier.map(_.split(",")).map(p => Supplier(p(0).toInt, p(1), p(2)))
val prdsupRDD = prdsup.map(_.split(",")).map(p => PRDSUP(p(0).toInt, p(1).toInt))
prdRDD.first()
prdRDD.count()
supRDD.first()
supRDD.count()
prdsupRDD.first()
prdsupRDD.count()
// Convert the RDDs of case-class objects to DataFrames.
val prdDF = prdRDD.toDF()
val supDF = supRDD.toDF()
val prdsupDF = prdsupRDD.toDF()
// Register the DataFrames as temp tables.
prdDF.registerTempTable("products")
supDF.registerTempTable("suppliers")
prdsupDF.registerTempTable("productssuppliers")
// Select each product, its price, and its supplier name where the product price is less than 0.6.
val results = sqlContext.sql("""SELECT products.name, price, suppliers.name AS sup_name FROM products JOIN suppliers ON products.supplierid = suppliers.supplierid WHERE price < 0.6""")
results.show()

B. Solution:
Step 1:
hdfs dfs -mkdir sparksql2
hdfs dfs -put product.csv sparksql2/
hdfs dfs -put supplier.csv sparksql2/
hdfs dfs -put products_suppliers.csv sparksql2/
Step 2: Now, in the Spark shell:
// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
// Import Spark SQL data types and Row.
import org.apache.spark.sql._
// Load the data into new RDDs, dropping the header line of each CSV file.
val products = sc.textFile("sparksql2/product.csv").filter(!_.startsWith("productID"))
val supplier = sc.textFile("sparksql2/supplier.csv").filter(!_.startsWith("supplierid"))
val prdsup = sc.textFile("sparksql2/products_suppliers.csv").filter(!_.startsWith("productID"))
// Return the first element in each RDD.
products.first()
supplier.first()
prdsup.first()
// Define the schema using case classes.
case class Product(productid: Integer, code: String, name: String, quantity: Integer, price: Float, supplierid: Integer)
case class Supplier(supplierid: Integer, name: String, phone: String)
case class PRDSUP(productid: Integer, supplierid: Integer)
// Create RDDs of case-class objects.
val prdRDD = products.map(_.split(",")).map(p => Product(p(0).toInt, p(1), p(2), p(3).toInt, p(4).toFloat, p(5).toInt))
val supRDD = supplier.map(_.split(",")).map(p => Supplier(p(0).toInt, p(1), p(2)))
val prdsupRDD = prdsup.map(_.split(",")).map(p => PRDSUP(p(0).toInt, p(1).toInt))
val prdsupDF = prdsupRDD.toDF()
// Register the DataFrames as temp tables.
prdDF.registerTempTable("products")
supDF.registerTempTable("suppliers")
prdsupDF.registerTempTable("productssuppliers")
// Select each product, its price, and its supplier name where the product price is less than 0.6.
val results = sqlContext.sql("""SELECT products.name, price, suppliers.name AS sup_name FROM products JOIN suppliers ON products.supplierid = suppliers.supplierid WHERE price < 0.6""")
results.show()


Correct Answer: A

Question No. 3

Problem Scenario 28: You need to implement a near-real-time solution for collecting information as soon as it is submitted in a file, as shown below.

Data:

echo "IBM,100,20160104" >> /tmp/spooldir2/.bb.txt

echo "IBM,103,20160105" >> /tmp/spooldir2/.bb.txt

mv /tmp/spooldir2/.bb.txt /tmp/spooldir2/bb.txt

After a few minutes:

echo "IBM,100.2,20160104" >> /tmp/spooldir2/.dr.txt

echo "IBM,103.1,20160105" >> /tmp/spooldir2/.dr.txt

mv /tmp/spooldir2/.dr.txt /tmp/spooldir2/dr.txt
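The data is first written to a hidden file (.bb.txt, .dr.txt) and only then renamed because the spooling directory source expects files to be complete and unchanged once it sees them. As we understand its defaults, it skips names that start with a dot and renames each ingested file with the .COMPLETED suffix (its fileSuffix property). The sketch below models only that file-selection rule; SpoolFilter is a hypothetical name, and the rule is a simplification, not Flume's code.

```scala
object SpoolFilter {
  // Assumed default of the spooldir source's fileSuffix property.
  val CompletedSuffix = ".COMPLETED"

  // Simplified rule: hidden files (leading dot) and already-ingested files are skipped.
  def eligible(fileName: String): Boolean =
    !fileName.startsWith(".") && !fileName.endsWith(CompletedSuffix)

  def main(args: Array[String]): Unit =
    Seq(".bb.txt", "bb.txt", "bb.txt.COMPLETED", ".dr.txt", "dr.txt")
      .foreach(name => println(s"$name -> ${eligible(name)}"))
}
```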

You have been given the directory location below (if it is not available, create it): /tmp/spooldir2

As soon as a file is committed in this directory, it needs to be available in HDFS in both /tmp/flume/primary and /tmp/flume/secondary.

However, note that /tmp/flume/secondary is optional: if a transaction that writes to this directory fails, it need not be rolled back.
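The replicating selector with an optional channel can be modeled in a few lines: each event goes to every channel, a failure on a required channel fails the whole delivery (the transaction is rolled back), and a failure on an optional channel is ignored. This is a simplified sketch of those semantics in plain Scala; ReplicatingSelector, Channel, and deliver are hypothetical names, not Flume's API.

```scala
import scala.util.{Failure, Success, Try}

object ReplicatingSelector {
  // A channel is modeled as a name plus a put function that either accepts the event or throws.
  final case class Channel(name: String, put: String => Unit)

  // Copy one event to every channel. Required channels are written first; any failure there
  // propagates, so the whole delivery fails (the transaction would be rolled back).
  // Failures on optional channels are swallowed; their names are returned for logging.
  def deliver(event: String, required: Seq[Channel], optional: Seq[Channel]): Try[Seq[String]] =
    Try {
      required.foreach(_.put(event))
      optional.flatMap { ch =>
        Try(ch.put(event)) match {
          case Success(_) => None
          case Failure(_) => Some(ch.name)
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val primary = Channel("channel1a", _ => ())
    val fullSecondary = Channel("channel1b", _ => throw new IllegalStateException("channel full"))
    // The event still reaches channel1a; channel1b's failure is only reported.
    println(deliver("IBM,100,20160104", Seq(primary), Seq(fullSecondary)))
  }
}
```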

Write a Flume configuration file named flume8.conf and use it to load data into HDFS with the following additional properties:

1. Spool the /tmp/spooldir2 directory.

2. The file prefix in HDFS should be events.

3. The file suffix should be .log.

4. If a file is not yet committed and is still in use, it should have _ as a prefix.

5. Data should be written as text to HDFS.
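Properties 2 to 4 combine into the file names the HDFS sink produces. The sketch below is a simplified, hypothetical model (HdfsSinkNaming is not a Flume class). It assumes the sink names an open file inUsePrefix + filePrefix + "." + counter + fileSuffix + inUseSuffix and renames it to filePrefix + "." + counter + fileSuffix when it is closed, with inUseSuffix left at its default of .tmp. The counter value used here is made up; the real sink generates its own.

```scala
object HdfsSinkNaming {
  // Hypothetical model of the naming-related HDFS sink properties used in flume8.conf.
  final case class Naming(path: String, filePrefix: String, fileSuffix: String,
                          inUsePrefix: String, inUseSuffix: String)

  // Name while the file is still open (not yet committed).
  def inUseName(n: Naming, counter: Long): String =
    s"${n.path}/${n.inUsePrefix}${n.filePrefix}.${counter}${n.fileSuffix}${n.inUseSuffix}"

  // Name after the file is closed and renamed.
  def finalName(n: Naming, counter: Long): String =
    s"${n.path}/${n.filePrefix}.${counter}${n.fileSuffix}"

  def main(args: Array[String]): Unit = {
    val primary = Naming("/tmp/flume/primary", "events", ".log", "_", ".tmp")
    val counter = 1L // made-up value; the real sink generates its own
    println(inUseName(primary, counter))
    println(finalName(primary, counter))
  }
}
```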

A. Solution:
Step 1: Create the directory.
mkdir /tmp/spooldir2
Step 2: Create a Flume configuration file with the configuration below for the source, sinks, and channels, and save it as flume8.conf.
agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.channels = channel1a channel1b
# Replicate every event to both channels; channel1b is optional, so its failures are not rolled back.
agent1.sources.source1.channels = channel1a channel1b
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = channel1b
agent1.sinks.sink1a.channel = channel1a
agent1.sinks.sink1b.channel = channel1b
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir2
agent1.sinks.sink1a.type = hdfs
agent1.sinks.sink1a.hdfs.path = /tmp/flume/primary
agent1.sinks.sink1a.hdfs.filePrefix = events
agent1.sinks.sink1a.hdfs.fileSuffix = .log
# Files still being written get a _ prefix (requirement 4).
agent1.sinks.sink1a.hdfs.inUsePrefix = _
agent1.sinks.sink1a.hdfs.fileType = DataStream
agent1.sinks.sink1b.type = hdfs
agent1.sinks.sink1b.hdfs.path = /tmp/flume/secondary
agent1.sinks.sink1b.hdfs.filePrefix = events
agent1.sinks.sink1b.hdfs.fileSuffix = .log
agent1.sinks.sink1b.hdfs.inUsePrefix = _
agent1.sinks.sink1b.hdfs.fileType = DataStream
agent1.channels.channel1a.type = file
agent1.channels.channel1b.type = memory
Step 3: Run the command below, which uses this configuration file and appends data in HDFS.
Start the Flume agent:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume8.conf --name agent1
Step 4: Open another terminal and create files in /tmp/spooldir2/.
echo "IBM,100,20160104" >> /tmp/spooldir2/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir2/.bb.txt
mv /tmp/spooldir2/.bb.txt /tmp/spooldir2/bb.txt
After a few minutes:
echo "IBM,100.2,20160104" >> /tmp/spooldir2/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir2/.dr.txt
mv /tmp/spooldir2/.dr.txt /tmp/spooldir2/dr.txt

B. Solution:
Step 1: Create the directory.
mkdir /tmp/spooldir2
Step 2: Create a Flume configuration file with the configuration below for the source, sinks, and channels, and save it as flume8.conf.
agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.channels = channel1a channel1b
agent1.sources.source1.channels = channel1a channel1b
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = channel1b
agent1.sinks.sink1a.channel = channel1a
agent1.channels.channel1b.type = memory
Step 3: Run the command below, which uses this configuration file and appends data in HDFS.
Start the Flume agent:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume8.conf --name agent1
Step 4: Open another terminal and create files in /tmp/spooldir2/.
echo "IBM,100,20160104" >> /tmp/spooldir2/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir2/.bb.txt
mv /tmp/spooldir2/.bb.txt /tmp/spooldir2/bb.txt
After a few minutes:
echo "IBM,100.2,20160104" >> /tmp/spooldir2/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir2/.dr.txt
mv /tmp/spooldir2/.dr.txt /tmp/spooldir2/dr.txt


Correct Answer: A

Question No. 4

Problem Scenario 92: You have been given a Spark Scala application bundled in a JAR named hadoopexam.jar.

Your application class name is com.hadoopexam.MyTask

When you submit the application, it should launch the driver on one of the cluster nodes.

Please complete the following command to submit the application.

spark-submit XXX --master yarn \
YYY $SPARK_HOME/lib/hadoopexam.jar 10

A. Solution:
XXX: --class com.hadoopexam.MyTask

B. Solution:
XXX: --class com.hadoopexam.MyTask
YYY: --deploy-mode cluster


Correct Answer: B
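--deploy-mode cluster is the missing piece: in the default client mode the driver runs on the machine where spark-submit is invoked, while in cluster mode on YARN it runs inside the application master on one of the cluster nodes. Option order also matters, since spark-submit options must come before the application JAR and everything after the JAR is passed to the application. The hypothetical helper below (SubmitArgs is not part of Spark) assembles the argument list in that order.

```scala
object SubmitArgs {
  // Build a spark-submit argument list. spark-submit options must come before the
  // application JAR; everything after the JAR is handed to the application's main method.
  def build(mainClass: String, deployMode: String, jar: String, appArgs: Seq[String]): Seq[String] = {
    require(Set("client", "cluster").contains(deployMode), s"unknown deploy mode: $deployMode")
    Seq("spark-submit", "--class", mainClass, "--master", "yarn", "--deploy-mode", deployMode, jar) ++ appArgs
  }

  def main(args: Array[String]): Unit =
    println(build("com.hadoopexam.MyTask", "cluster", "$SPARK_HOME/lib/hadoopexam.jar", Seq("10")).mkString(" "))
}
```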

Question No. 5

Problem Scenario 60: You have been given the code snippet below.

val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)

val b = a.keyBy(_.length)

val c = sc.parallelize(List("dog","cat","gnu","salmon","rabbit","turkey","wolf","bear","bee"), 3)

val d = c.keyBy(_.length)

operation1
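Before looking at the options, it helps to see what keyBy produces. The sketch below models keyBy on ordinary Scala collections instead of RDDs (KeyByModel and its keyBy helper are hypothetical): each element x becomes the pair (f(x), x), so with _.length the word length is the key and the word is the value.

```scala
object KeyByModel {
  // keyBy(f) turns each element x into the pair (f(x), x): the function result is the key,
  // the original element is the value.
  def keyBy[A, K](xs: Seq[A])(f: A => K): Seq[(K, A)] = xs.map(x => (f(x), x))

  val a = Seq("dog", "salmon", "salmon", "rat", "elephant")
  val c = Seq("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee")
  val b = keyBy(a)(_.length)
  val d = keyBy(c)(_.length)

  def main(args: Array[String]): Unit = {
    println(b)
    println(d)
  }
}
```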

Write a correct code snippet for operation1 that will produce the desired output, shown below.

Array[(Int, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), (3,(dog,gnu)), (3,(dog,bee)), (3,(rat,dog)), (3,(rat,cat)), (3,(rat,gnu)), (3,(rat,bee)))

A. Solution:
b.join(d).collect
join [Pair]: Performs an inner join using two key-value RDDs. Note that the keys must be comparable for equality (with consistent equals and hashCode) for this to work.
keyBy: Constructs two-component tuples (key-value pairs) by applying a function to each data item.

B. Solution:
b.join(d).collect
join [Pair]: Performs an inner join using two key-value RDDs. Note that the keys must be comparable for equality (with consistent equals and hashCode) for this to work.
keyBy: Constructs two-component tuples (key-value pairs) by applying a function to each data item. The result of the function becomes the key, and the original data item becomes the value of the newly created tuple.


Correct Answer: B
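The 14 pairs in the expected output follow from an inner join on the word-length keys: the two salmon entries in b each match the three 6-letter words in d, and dog and rat each match the four 3-letter words. Keys 8 (elephant) and 4 (wolf, bear) have no partner and drop out. The plain-Scala model below (JoinModel and innerJoin are hypothetical names) reproduces those pairs; Spark's output order depends on partitioning, so only the set of pairs, not their order, is comparable.

```scala
object JoinModel {
  // The same keyed data as b and d, built with plain collections.
  val b: Seq[(Int, String)] = Seq("dog", "salmon", "salmon", "rat", "elephant").map(s => (s.length, s))
  val d: Seq[(Int, String)] =
    Seq("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee").map(s => (s.length, s))

  // Inner join: for every left pair (k, v) and every right value w with the same key k, emit (k, (v, w)).
  // Keys present on only one side produce nothing.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    val rightByKey: Map[K, Seq[W]] = right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    for {
      (k, v) <- left
      w <- rightByKey.getOrElse(k, Seq.empty[W])
    } yield (k, (v, w))
  }

  val joined: Seq[(Int, (String, String))] = innerJoin(b, d)

  def main(args: Array[String]): Unit = joined.foreach(println)
}
```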