Data used for an object detection ML system was found to have been labelled incorrectly in many cases.
Which ONE of the following options is most likely the reason for this problem?
SELECT ONE OPTION
The question refers to a problem where data used for an object detection ML system was labelled incorrectly. This issue is most closely related to 'accuracy issues.' Here's a detailed explanation:
Accuracy Issues: The primary goal of labeling data in machine learning is to ensure that the model can accurately learn and make predictions based on the given labels. Incorrectly labeled data directly impacts the model's accuracy, leading to poor performance because the model learns incorrect patterns.
Why Not Other Options:
Security Issues: This pertains to data breaches or unauthorized access, which is not relevant to the problem of incorrect data labeling.
Privacy Issues: This concerns the protection of personal data and is not related to the accuracy of data labeling.
Bias Issues: While bias in data can affect model performance, it specifically refers to systematic errors or prejudices in the data rather than outright incorrect labeling.
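As a rough illustration of how incorrect labels lower accuracy, the sketch below trains a toy 1-nearest-neighbour classifier twice: once on clean labels and once on the same points with two labels flipped. This is not an object detection system. The one-dimensional feature, the class names, and the helper functions are all hypothetical and chosen only to keep the example small.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if nearest_neighbor_predict(train, x) == y)
    return correct / len(test)


# Hypothetical 1-D feature (e.g. object size): small = "cat", large = "truck".
clean_train = [(1.0, "cat"), (2.0, "cat"), (3.0, "cat"),
               (7.0, "truck"), (8.0, "truck"), (9.0, "truck")]

# Same points, but two labels were recorded incorrectly (2.0 and 8.0).
noisy_train = [(1.0, "cat"), (2.0, "truck"), (3.0, "cat"),
               (7.0, "truck"), (8.0, "cat"), (9.0, "truck")]

# Correctly labelled held-out points used only for evaluation.
test = [(1.9, "cat"), (2.2, "cat"), (3.1, "cat"),
        (6.8, "truck"), (8.1, "truck"), (9.2, "truck")]

clean_acc = accuracy(clean_train, test)
noisy_acc = accuracy(noisy_train, test)
print(f"clean labels: {clean_acc:.2f}, noisy labels: {noisy_acc:.2f}")
```

On this toy data the model trained on the mislabelled points reproduces the wrong labels for nearby test points, which is the "learns incorrect patterns" effect described above.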
Max. Score: 2
AI-enabled medical devices are now used to automate certain parts of medical diagnostic processes. Since these are life-critical processes, the relevant authorities are considering introducing suitable certifications for these AI-enabled medical devices. This certification may involve several facets of AI testing (I - V).
I . Autonomy
II . Maintainability
III . Safety
IV . Transparency
V . Side Effects
Which ONE of the following options contains the three MOST required aspects to be satisfied for the above scenario of certification of AI-enabled medical devices?
SELECT ONE OPTION
For AI-enabled medical devices, the most required aspects for certification are safety, transparency, and side effects. Here's why:
Safety (Aspect III): Critical for ensuring that the AI system does not cause harm to patients.
Transparency (Aspect IV): Important for understanding and verifying the decisions made by the AI system.
Side Effects (Aspect V): Necessary to identify and mitigate any unintended consequences of the AI system.
Why Not Other Options:
Autonomy and Maintainability (Aspects I and II): While important, they are secondary to the immediate concerns of safety, transparency, and managing side effects in life-critical processes.
Which ONE of the following activities is MOST relevant when addressing the scenario where you have more than the required amount of data available for the training?
SELECT ONE OPTION
A . Feature selection
Feature selection is the process of selecting the most relevant features from the data. While important, it is not directly about handling excess data.
B . Data sampling
Data sampling involves selecting a representative subset of the data for training. When there is more data than needed, sampling can be used to create a manageable dataset that maintains the statistical properties of the full dataset.
C . Data labeling
Data labeling involves annotating data for supervised learning. It is necessary for training models but does not address the issue of having excess data.
D . Data augmentation
Data augmentation is used to increase the size of the training dataset by creating modified versions of existing data. It is useful when there is insufficient data, not when there is excess data.
Therefore, the correct answer is B because data sampling is the most relevant activity when dealing with an excess amount of data for training.
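One way to keep the statistical properties of the full dataset when sampling is to sample each class separately (stratified sampling). The sketch below uses only the Python standard library. The function name `stratified_sample`, the record layout, and the 10% fraction are assumptions made for illustration, not part of the question.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Draw roughly `fraction` of the records from each label group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)

    sample = []
    for group in by_label.values():
        # Keep at least one record per class so no class disappears.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))  # sampling without replacement
    return sample


# Hypothetical dataset: 80 "car" images and 20 "bike" images.
records = [(f"img{i:03d}", "car") for i in range(80)]
records += [(f"img{i:03d}", "bike") for i in range(80, 100)]

subset = stratified_sample(records, label_of=lambda r: r[1], fraction=0.1)
counts = {label: sum(1 for r in subset if r[1] == label) for label in ("car", "bike")}
print(len(subset), counts)
```

The subset is one tenth of the original size but keeps the same 80/20 class ratio, so a model trained on it sees the same class balance as the full dataset.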
Which ONE of the following combinations of Training, Validation, Testing data is used during the process of learning/creating the model?
SELECT ONE OPTION
The process of developing a machine learning model typically involves the use of three types of datasets:
Training Data: This is used to train the model, i.e., to learn the patterns and relationships in the data.
Validation Data: This is used to tune the model's hyperparameters and to prevent overfitting during the training process.
Test Data: This is used to evaluate the final model's performance and to estimate how it will perform on unseen data.
Let's analyze each option:
A . Training data - validation data - test data
This option correctly includes all three types of datasets used in the process of creating and validating a model. The training data is used for learning, validation data for tuning, and test data for final evaluation.
B . Training data - validation data
This option misses the test data, which is crucial for evaluating the model's performance on unseen data after the training and validation phases.
C . Training data - test data
This option misses the validation data, which is important for tuning the model and preventing overfitting during training.
D . Validation data - test data
This option misses the training data, which is essential for the initial learning phase of the model.
Therefore, the correct answer is A because it includes all necessary datasets used during the process of learning and creating the model: training, validation, and test data.
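The three datasets described above are usually carved out of one pool of data so that they do not overlap. A minimal sketch follows, assuming a 70/15/15 split and a hypothetical helper named `three_way_split`. The proportions are a common convention rather than something the question specifies.

```python
import random


def three_way_split(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the items and cut them into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used to tune hyperparameters
    test = shuffled[n_train + n_val:]          # held back for final evaluation
    return train, val, test


# Hypothetical dataset of 100 example IDs.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Because the three slices come from a single shuffled list, no example appears in more than one set, so the test data stays unseen until the final evaluation.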
"AllerEgo" is a product that uses self-learning to predict the behavior of a pilot in combat situations for a variety of terrains and enemy aircraft formations. After training, the model was exposed to real-world data and was found to be performing poorly. Many data quality tests had been performed on the data to make it fit for training and testing.
Which ONE of the following options is least likely to describe the possible reason for the fall in performance, especially when considering the self-learning nature of the AI system?
SELECT ONE OPTION
The difficulty of defining criteria for improvement before the model can be accepted.
The fast pace of change did not allow sufficient time for testing.
The unknown nature and insufficient specification of the operating environment might have caused the poor performance.
There was an algorithmic bias in the AI system.