You are using a Python notebook in an Apache Spark pool in Azure Synapse Analytics.
You need to present the data distribution statistics from a DataFrame in a tabular view.
Which method should you invoke on the DataFrame?
pandas.DataFrame.corr computes pairwise correlation of columns, excluding NA/null values.
Incorrect:
* freqItems
pyspark.sql.DataFrame.freqItems
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in https://doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou.
* summary is used for index.
* There is no pandas method named rollup; rollup would not be correct here anyway.
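For context, here is a minimal PySpark sketch of how these DataFrame methods behave; the DataFrame and its column names are hypothetical, not taken from the question.

```python
# Minimal sketch, assuming a Synapse Spark (PySpark) notebook; the DataFrame
# and column names here are hypothetical.
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(price=10.0, quantity=2),
    Row(price=12.5, quantity=5),
    Row(price=9.0, quantity=2),
])

# summary()/describe() return a DataFrame of distribution statistics
# (count, mean, stddev, min, quartiles, max) that renders as a table.
df.summary().show()
df.describe().show()

# corr() returns a single pairwise correlation value, not a table.
print(df.corr("price", "quantity"))

# freqItems() returns frequent items per column, possibly with false positives.
df.freqItems(["quantity"], support=0.5).show()
```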
You have a Power BI report that contains the visual shown in the following exhibit.
You need to make the visual more accessible to users who have color vision deficiency. What should you do?
Themes, contrast and colorblind-friendly colors
You should ensure that your reports have enough contrast between text and any background colors.
Certain color combinations are particularly difficult for users with color vision deficiencies to distinguish. These include the following combinations:
* **green and black**
* green and red
* green and brown
* blue and purple
* green and blue
* light green and yellow
* blue and grey
* green and grey
Avoid using these colors together in a chart, or on the same report page.
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Power BI dataset named Dataset1.
In Dataset1, you currently have 50 measures that use the same time intelligence logic.
You need to reduce the number of measures, while maintaining the current functionality.
Solution: From Power BI Desktop, you create a hierarchy.
Does this meet the goal?
No, this does not meet the goal. Instead, use the following solution: from DAX Studio, write a query that uses grouping sets.
A grouping is a set of discrete values that are used to group measure fields.
Note: A hierarchy is an ordered set of values that are linked to the level above. An example of a hierarchy could be Country, State, and City: cities are in a State, and States make up a Country. In Power BI, visuals can handle hierarchy data and provide controls for the user to navigate up and down the hierarchy.
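Grouping sets themselves are a general aggregation construct rather than something specific to DAX Studio. As a rough, hedged illustration only (written in Spark SQL from a Python notebook, with a hypothetical sales table, rather than in DAX against Dataset1), a single GROUPING SETS query returns several aggregation levels at once instead of requiring a separate aggregation for each level:

```python
# Sketch only: illustrates the general GROUPING SETS idea in Spark SQL.
# The sales table and its columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.createDataFrame(
    [("2023", "Q1", 100.0), ("2023", "Q2", 150.0), ("2024", "Q1", 120.0)],
    ["year", "quarter", "amount"],
).createOrReplaceTempView("sales")

# One query yields totals by (year, quarter), by year alone, and a grand total,
# rather than a separate query (or measure) per aggregation level.
spark.sql("""
    SELECT year, quarter, SUM(amount) AS total
    FROM sales
    GROUP BY GROUPING SETS ((year, quarter), (year), ())
    ORDER BY year, quarter
""").show()
```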
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using an Azure Synapse Analytics serverless SQL pool to query a collection of Apache Parquet files by using automatic schema inference. The files contain more than 40 million rows of UTF-8-encoded business names, survey names, and participant counts. The database is configured to use the default collation.
The queries use OPENROWSET and infer the schema shown in the following table.
You need to recommend changes to the queries to reduce I/O reads and tempdb usage.
Solution: You recommend using OPENROWSET WITH to explicitly define the collation for businessName and surveyName as Latin1_General_100_BIN2_UTF8.
Does this meet the goal?
Query Parquet files using serverless SQL pool in Azure Synapse Analytics.
Important
Ensure you are using a UTF-8 database collation (for example, Latin1_General_100_BIN2_UTF8) because string values in PARQUET files are encoded using UTF-8 encoding. A mismatch between the text encoding in the PARQUET file and the collation may cause unexpected conversion errors. You can easily change the default collation of the current database using the following T-SQL statement: ALTER DATABASE CURRENT COLLATE Latin1_General_100_BIN2_UTF8;
Note: If you use the Latin1_General_100_BIN2_UTF8 collation, you get an additional performance boost compared to the other collations. The Latin1_General_100_BIN2_UTF8 collation is compatible with Parquet string sorting rules, so the SQL pool can eliminate parts of the Parquet files that do not contain data needed by the queries (file/column-segment pruning). If you use other collations, all data from the Parquet files is loaded into Synapse SQL and the filtering happens within the SQL process. The Latin1_General_100_BIN2_UTF8 collation has an additional performance optimization that works only for Parquet and Azure Cosmos DB. The downside is that you lose fine-grained comparison rules, such as case insensitivity.
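As a hedged sketch of what the recommended explicit schema looks like when submitted from Python, the snippet below uses OPENROWSET ... WITH to pin each string column to the Latin1_General_100_BIN2_UTF8 collation. The server name, authentication method, storage path, and column types and widths are assumptions for illustration, not values from the scenario.

```python
# Sketch only: endpoint, credentials, storage path, and column types/widths are
# placeholders, not values taken from the scenario.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=<database>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# The WITH clause fixes the schema and applies a UTF-8 BIN2 collation per
# string column, avoiding conversion work and enabling file/column-segment
# pruning in the serverless SQL pool.
query = """
SELECT businessName, surveyName, participantCount
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/surveys/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    businessName     VARCHAR(200) COLLATE Latin1_General_100_BIN2_UTF8,
    surveyName       VARCHAR(200) COLLATE Latin1_General_100_BIN2_UTF8,
    participantCount INT
) AS surveys;
"""

for row in conn.cursor().execute(query):
    print(row.businessName, row.surveyName, row.participantCount)
```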
You have an Azure Synapse Analytics dataset that contains data about jet engine performance. You need to score the dataset to identify the likelihood of an engine failure. Which function should you use in the query?