# Data Analytics MCQ Questions and Answers

1. What is the minimum number of variables/features required to perform clustering?

1. 0
2. 1
3. 2
4. 3

1

2. For two runs of K-means clustering, is it expected to get the same clustering results?

1. Yes
2. No

No
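Why the answer is No: K-means usually starts from randomly chosen centroids, and its iterations only converge to a local optimum. The plain-Python sketch below assumes random initialization from the data points; the `kmeans` and `partition` functions and the four toy points are hypothetical, not any library's API.

```python
import random

def kmeans(points, k, seed, iters=100):
    """Lloyd's algorithm on 2-D points, initialized with k randomly sampled points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random start: the source of run-to-run differences
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid where it is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels

def partition(points, labels):
    """Label-independent view of a clustering: a set of clusters, each a set of points."""
    return frozenset(frozenset(p for p, lab in zip(points, labels) if lab == c) for c in set(labels))

# Corners of a wide rectangle: "left vs right" is the best split, "top vs bottom" is a local optimum.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
results = {partition(points, kmeans(points, 2, seed)) for seed in range(50)}
print(len(results))  # number of distinct partitions reached across 50 seeds
```

Some starting centroids lead to the left/right split and others get stuck in the top/bottom one, which is why practitioners often run K-means several times and keep the result with the lowest within-cluster sum of squares.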

3. Which of the following algorithm is most sensitive to outliers?

1. K-means clustering algorithm
2. K-medians clustering algorithm
3. K-modes clustering algorithm
4. K-medoids clustering algorithm

K-means clustering algorithm
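K-means places each centroid at the mean of its cluster, K-medians at the median, and K-medoids at an actual data point. A one-dimensional sketch with hypothetical values shows how a single outlier drags the mean far more than the median:

```python
from statistics import mean, median

cluster = [1.0, 2.0, 3.0, 4.0]
with_outlier = cluster + [100.0]  # one extreme point joins the cluster

# K-means-style centre (mean) vs K-medians-style centre (median).
print(mean(cluster), mean(with_outlier))      # 2.5 22.0
print(median(cluster), median(with_outlier))  # 2.5 3.0
```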

4. Discrete variables and continuous variables are two types of

1. Open end classification
2. Time series classification
3. Qualitative classification
4. Quantitative classification

Quantitative classification

5. A Bayesian classifier is

1. A class of learning algorithms that tries to find an optimal classification of a set of examples using probability theory.
2. Any mechanism employed by a learning system to constrain the search space of a hypothesis
3. An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation.
4. None of these

A class of learning algorithms that tries to find an optimal classification of a set of examples using probability theory.

6. Classification accuracy is

1. A subdivision of a set of examples into a number of classes
2. A measure of the accuracy of the classification of a concept that is given by a certain theory
3. The task of assigning a classification to a set of examples
4. None of these

A measure of the accuracy of the classification of a concept that is given by a certain theory
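In practice, classification accuracy is usually computed as the fraction of examples whose predicted class matches the true class. A minimal sketch; the `accuracy` function and the labels are hypothetical:

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(["spam", "ham", "ham", "spam"], ["spam", "ham", "spam", "spam"]))  # 0.75
```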

7. Euclidean distance measure is

1. A stage of the KDD process in which new data is added to the existing selection.
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagorean theorem
4. None of the above

The distance between two points as calculated using the Pythagorean theorem
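Generalizing the Pythagorean theorem to n dimensions gives the square root of the sum of squared coordinate differences. A sketch, compared against the standard library's `math.dist` (available from Python 3.8); the `euclidean` helper is hypothetical:

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0, the 3-4-5 right triangle
print(math.dist((0, 0), (3, 4)))  # 5.0, standard-library equivalent
```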

8. Hybrid is

1. Combining different types of methods or information
2. An approach to the design of learning algorithms that is structured along the lines of the theory of evolution.
3. Decision support systems that contain an information base filled with the knowledge of an expert formulated in terms of if-then rules.
4. None of the above

Combining different types of methods or information

9. Decision trees use ______, in that they always choose the option that seems best at that moment.

1. Greedy Algorithms
2. Divide and conquer
3. Backtracking
4. Shortest path algorithm

Greedy Algorithms
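Decision-tree learners such as ID3 pick, at each node, the attribute with the highest information gain and never revisit that choice. The sketch below shows that single greedy step; the weather-style rows and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Entropy of the parent node minus the weighted entropy of the children after splitting on attr."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([label for _, label in rows]) - remainder

def best_split(rows, attributes):
    """Greedy step: take the locally best attribute now, with no lookahead or backtracking."""
    return max(attributes, key=lambda a: information_gain(rows, a))

# Hypothetical training rows: (features, class label).
rows = [
    ({"outlook": "sunny", "windy": "no"}, "no"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "rainy", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "yes"}, "yes"),
]
print(best_split(rows, ["outlook", "windy"]))  # outlook (gain 1.0 vs 0.0 for windy)
```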

10. Discovery is

1. It is hidden within a database and can only be recovered if one is given certain clues (an example is encrypted information).
2. The process of extracting implicit, previously unknown, and potentially useful information from data
3. An extremely complex molecule that occurs in human chromosomes and that carries genetic information in the form of genes.
4. None of these

The process of extracting implicit, previously unknown, and potentially useful information from data

## Data Analytics SPPU MCQ

11. Hidden knowledge refers to

1. A set of databases from different vendors, possibly using different database paradigms
2. An approach to a problem that is not guaranteed to work but performs well in most cases
3. Information that is hidden in a database and that cannot be recovered by a simple SQL query.
4. None of these

Information that is hidden in a database and that cannot be recovered by a simple SQL query.

12. Decision trees cannot handle categorical attributes with many distinct values, such as country codes for telephone numbers.

1. True
2. False

False

13. Enrichment is

1. A stage of the KDD process in which new data is added to the existing selection
2. The process of finding a solution for a problem simply by enumerating all possible solutions according to some pre-defined order and then testing them
3. The distance between two points as calculated using the Pythagorean theorem.
4. None of these

A stage of the KDD process in which new data is added to the existing selection

14. _____ are easy to implement and can execute efficiently even without prior knowledge of the data; they are among the most popular algorithms for classifying text documents.

1. ID3
2. Naïve Bayes classifiers
3. CART
4. None of the above

Naïve Bayes classifiers
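A minimal multinomial Naïve Bayes text classifier in plain Python, assuming whitespace tokenization and add-one (Laplace) smoothing; the training sentences and the `train_nb`/`predict_nb` names are made up for illustration. It shows why the method is cheap: training is just counting words per class.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (text, label). Training is just counting documents and words per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = log(n_docs / total_docs)
        for w in text.lower().split():
            # Naive assumption: words are independent given the class, so log-probabilities add.
            # Add-one smoothing keeps unseen words from zeroing out the whole product.
            score += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set.
model = train_nb([
    ("cheap pills buy now", "spam"),
    ("win cash now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])
print(predict_nb(model, "buy cheap cash"))      # spam
print(predict_nb(model, "agenda for meeting"))  # ham
```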

15. High entropy means that the partitions in classification are

1. Pure
2. Not pure
3. Useful
4. Useless

Not pure
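High entropy signals a mixed (impure) partition, while a partition containing a single class has entropy 0. A small sketch computing Shannon entropy in bits for two hypothetical partitions:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0, pure partition
print(entropy(["yes", "yes", "no", "no"]))    # 1.0, maximally mixed for two classes
```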

16. Which of the following statements about Naive Bayes is incorrect?

1. Attributes are equally important.
2. Attributes are statistically dependent on one another given the class value.
3. Attributes are statistically independent of one another given the class value.
4. Attributes can be nominal or numeric

Attributes are statistically dependent on one another given the class value.

17. The maximum value of entropy depends on the number of classes. If there are 8 classes, what is the maximum entropy?

1. Max Entropy is 1
2. Max Entropy is 2
3. Max Entropy is 3
4. Max Entropy is 4

Max Entropy is 3
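Entropy peaks when all classes are equally likely, so with $k$ classes the maximum is $\log_2 k$ (assuming base-2 logarithms, the usual convention in decision-tree texts). For 8 classes:

$$
H_{\max} = -\sum_{i=1}^{8} \frac{1}{8}\log_2\frac{1}{8} = \log_2 8 = 3 \text{ bits}
$$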

18. Point out the wrong statement.

1. k-nearest neighbor is same as k-means
2. k-means clustering is a method of vector quantization
3. k-means clustering aims to partition n observations into k clusters
4. None of the mentioned

k-nearest neighbor is same as k-means
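The difference in one sketch: k-nearest neighbors is a supervised classifier that votes among the k closest labeled examples, whereas k-means groups unlabeled points around k centroids. The `knn_predict` function and the toy points below are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k nearest labeled training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled data: k-NN needs these labels; k-means would ignore them entirely.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # a
print(knn_predict(train, (5.5, 5.5)))  # b
```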

19. Consider the following example: “How can we divide a set of articles into groups such that the articles in each group share the same theme (we do not know the themes ahead of time)?” This is an example of:

1. Clustering
2. Classification
3. Regression
4. None of these

Clustering

## Data Analytics MCQs with Answers

20. Can we use K-means clustering to identify objects in video?

1. Yes
2. No

Yes

21. Clustering techniques are ______ in the sense that the data scientist does not determine, in advance, the labels to apply to the clusters.

1. Unsupervised
2. Supervised
3. Reinforcement
4. Neural network

Unsupervised

22. The _____ metric is examined to determine a reasonably optimal value of k.

1. Mean Square Error
2. Within Sum of Squares (WSS)
3. Speed
4. None of these

Within Sum of Squares (WSS)
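The elbow method computes WSS for several values of k and looks for the point where adding clusters stops reducing it much. A sketch, assuming 1-D data and cluster assignments chosen by hand for illustration (a real run would obtain them from K-means); the `wss` function is hypothetical:

```python
def wss(clusters):
    """Within Sum of Squares: squared distance of every point to its own cluster mean, summed."""
    total = 0.0
    for members in clusters:
        centre = sum(members) / len(members)
        total += sum((x - centre) ** 2 for x in members)
    return total

# Hypothetical 1-D data and hand-picked partitions for k = 1, 2, 3.
partitions = {
    1: [[1, 2, 3, 10, 11, 12]],
    2: [[1, 2, 3], [10, 11, 12]],
    3: [[1, 2, 3], [10, 11], [12]],
}
for k, clusters in partitions.items():
    print(k, wss(clusters))  # 1 125.5 / 2 4.0 / 3 2.5
```

WSS drops sharply from k=1 to k=2 and then barely changes, so the "elbow" here points to k=2.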

23. “If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.” This property is known as the:

1. Apriori Property
2. Downward Closure Property
3. Either 1 or 2
4. Both 1 and 2

Both 1 and 2
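The Apriori (downward-closure) property is what lets the algorithm prune candidates: a k-itemset cannot be frequent if any of its (k-1)-subsets is infrequent. A sketch of that pruning step; the `prune_candidates` function and the itemsets are hypothetical:

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev):
    """Drop any candidate k-itemset that has an infrequent (k-1)-subset (downward closure)."""
    kept = []
    for cand in candidates:
        subsets = combinations(cand, len(cand) - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            kept.append(cand)
    return kept

# Hypothetical frequent 2-itemsets from a previous Apriori pass.
frequent_2 = {
    frozenset({"bread", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"eggs", "milk"}),
    frozenset({"bread", "jam"}),
}
candidates_3 = [
    frozenset({"bread", "eggs", "milk"}),  # all three 2-subsets are frequent: kept
    frozenset({"bread", "eggs", "jam"}),   # {eggs, jam} is not frequent: pruned
]
print(prune_candidates(candidates_3, frequent_2))
```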

24. If {bread, eggs, milk} has a support of 0.15 and {bread, eggs} also has a support of 0.15, the confidence of the rule {bread, eggs} → {milk} is

1. 0
2. 1
3. 2
4. 3

1
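Confidence divides the support of the whole rule by the support of its left-hand side:

$$
\text{conf}(\{\text{bread}, \text{eggs}\} \rightarrow \{\text{milk}\}) = \frac{\text{supp}(\{\text{bread}, \text{eggs}, \text{milk}\})}{\text{supp}(\{\text{bread}, \text{eggs}\})} = \frac{0.15}{0.15} = 1
$$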

25. Confidence is a measure of how X and Y are really related rather than coincidentally happening together.

1. True
2. False

False
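Lift, not confidence, is the measure that checks whether X and Y co-occur more often than chance would predict: lift(X → Y) = confidence(X → Y) / support(Y), and a lift near 1 means no real association. A sketch with hypothetical transactions in which milk appears in every basket, so {bread} → {milk} has confidence 1 but lift 1; the function names are hypothetical:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """support(X and Y together) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Confidence divided by the baseline support of y; about 1 means no real association."""
    return confidence(transactions, x, y) / support(transactions, y)

# Hypothetical baskets: milk is in every one of them.
transactions = [{"bread", "milk"}, {"bread", "milk"}, {"milk", "eggs"}, {"milk"}]
print(confidence(transactions, {"bread"}, {"milk"}))  # 1.0
print(lift(transactions, {"bread"}, {"milk"}))        # 1.0
```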

26. ______ recommend items based on similarity measures between users and/or items.

1. Content Based Systems
2. Hybrid System
3. Collaborative Filtering Systems
4. None of these

Collaborative Filtering Systems
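A minimal user-based collaborative-filtering sketch: similarity between users is the cosine of their ratings on co-rated items, and a missing rating is predicted as a similarity-weighted average of other users' ratings. The ratings table, user names, and function names are hypothetical; real systems typically add rating normalization and neighborhood selection.

```python
from math import sqrt

# Hypothetical user -> {item: rating} table.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m4": 4},
    "carol": {"m1": 1, "m3": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(u[i] ** 2 for i in common)) * sqrt(sum(v[i] ** 2 for i in common))
    return num / den

def predict(user, item):
    """Similarity-weighted average of other users' ratings for item (None if nobody rated it)."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

# Alice's taste resembles Bob's more than Carol's, so the prediction leans toward Bob's rating of 4.
print(predict("alice", "m4"))
```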

27. There are ______ major classifications of collaborative filtering mechanisms.

1. 1
2. 2
3. 3
4. None of the above

2

28. Movie recommendation to people is an example of

1. User Based Recommendation
2. Item Based Recommendation
3. Knowledge Based Recommendation
4. content based recommendation

Item Based Recommendation

29. _____ recommenders rely on an explicitly defined set of recommendation rules.

1. Constraint Based
2. Case Based
3. Content Based
4. User Based

Constraint Based

30. Parallelized hybrid recommender systems operate dependently of one another and produce separate recommendation lists.

1. True
2. False