**Data Analytics MCQ with Answers | Big Data Analytics MCQ**

**1. This clustering algorithm terminates when mean values computed for the current iteration of the algorithm are identical to the computed mean values for the previous iteration.**

- K-Means clustering
- conceptual clustering
- expectation maximization
- agglomerative clustering

K-Means clustering
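
The stopping rule in the question can be sketched in a few lines. This is a minimal illustration on one-dimensional data, not a production implementation; the function name `k_means_1d`, the sample points, and the starting means are all assumptions made for the example.

```python
def k_means_1d(points, means):
    """Lloyd-style k-means on 1-D data; stops when the means stop changing."""
    iterations = 0
    while True:
        iterations += 1
        # Assignment step: attach each point to its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        # Termination test: means identical to those of the previous iteration.
        if new_means == means:
            return means, iterations
        means = new_means


# Hypothetical data with two obvious groups and deliberately poor starting means.
final_means, n_iter = k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
print(final_means, n_iter)
```

Practical implementations usually compare the means with a small tolerance instead of exact equality, since floating-point means may keep changing by tiny amounts.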

**2. The correlation coefficient for two real-valued attributes is -0.85. What does this value tell you?**

- The attributes are not linearly related.
- As the value of one attribute decreases the value of the second attribute increases.
- As the value of one attribute increases the value of the second attribute also increases.
- The attributes show a linear relationship

As the value of one attribute decreases the value of the second attribute increases.
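
A small sketch of how a negative Pearson correlation arises. The helper `pearson_r` and the sample values are assumptions chosen for illustration; the data is made up so that one attribute tends to fall as the other rises.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes: ys mostly decreases while xs increases.
xs = [1, 2, 3, 4, 5]
ys = [10, 7, 8, 4, 3]
r = pearson_r(xs, ys)
print(round(r, 3))  # strongly negative, roughly -0.93
```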

**3. Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that**

- Y is false when X is known to be false.
- Y is true when X is known to be true.
- X is true when Y is known to be true
- X is false when Y is known to be false.

Y is true when X is known to be true.
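
As a rough illustration, rule confidence can be estimated from data as count(X and Y) / count(X). The records, field names, and the helper `rule_confidence` below are hypothetical.

```python
def rule_confidence(records, x, y):
    """Estimate P(Y | X) as count(X and Y) / count(X)."""
    x_count = sum(1 for r in records if x(r))
    xy_count = sum(1 for r in records if x(r) and y(r))
    return xy_count / x_count


# Hypothetical records for the rule: IF age < 30 THEN insured.
records = [
    {"age": 25, "insured": True},
    {"age": 28, "insured": True},
    {"age": 22, "insured": False},
    {"age": 45, "insured": True},
    {"age": 27, "insured": True},
]
conf = rule_confidence(records, lambda r: r["age"] < 30, lambda r: r["insured"])
print(conf)  # 3 of the 4 records with age < 30 are insured
```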

**4. Chameleon is**

- Density based clustering algorithm
- Partitioning based algorithm
- Model based algorithm
- Hierarchical clustering algorithm

Hierarchical clustering algorithm

**5. Find the odd one out**

- DBSCAN
- K-Means
- PAM
- None of above

DBSCAN

**6. The number of iterations in Apriori ______**

- increases with the size of the data
- decreases with the increase in size of the data
- increases with the size of the maximum frequent set
- decreases with increase in size of the maximum frequent set

increases with the size of the maximum frequent set

**7. Which of the following are interestingness measures for association rules?**

- Recall
- Lift
- Accuracy
- All of Above

Lift
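
For reference, lift compares the joint support of a rule with what independence would predict: lift(X -> Y) = support(X and Y) / (support(X) * support(Y)). The sketch below uses a made-up transaction list; the helper names `support` and `lift` are assumptions for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / (support(transactions, antecedent) * support(transactions, consequent))


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
value = lift(transactions, {"bread"}, {"butter"})
print(value)  # above 1, so bread and butter co-occur more often than chance
```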

**8. Given a frequent itemset L, If |L| = k, then there are**

- 2^k - 1 candidate association rules
- 2^k candidate association rules
- 2^k - 2 candidate association rules
- 2k - 2 candidate association rules

2^k - 2 candidate association rules (2 to the power k, minus 2)
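
The count can be checked by enumeration: every non-empty proper subset of the itemset is a possible antecedent, and the remaining items form the consequent, which gives 2^k - 2 rules. The function name `candidate_rules` is an assumption for this sketch.

```python
from itertools import combinations


def candidate_rules(itemset):
    """List all rules antecedent -> consequent that split a frequent itemset."""
    items = sorted(itemset)
    rules = []
    # Antecedent sizes 1 .. k-1 exclude the empty set and the full set.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


rules = candidate_rules({"A", "B", "C"})
print(len(rules))  # 2**3 - 2 = 6
```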

**9. ______ is an example of case-based learning**

- Decision trees
- Neural networks
- Genetic algorithm
- K-nearest neighbor

K-nearest neighbor

**10. The average positive difference between computed and desired outcome values.**

- mean positive error
- mean squared error
- mean absolute error
- root mean squared error

mean absolute error
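
A minimal sketch of the mean absolute error: the average of the absolute differences between desired and computed outputs. The function name and the sample values are made up for illustration.

```python
def mean_absolute_error(desired, computed):
    """Average absolute difference between desired and computed outputs."""
    return sum(abs(d - c) for d, c in zip(desired, computed)) / len(desired)


# Hypothetical desired vs. computed values; absolute errors are 0.5, 0, 2, 1.
mae = mean_absolute_error([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 4.0, 8.0])
print(mae)  # 3.5 / 4 = 0.875
```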

**11. The set of frequent item sets is a**

- Superset of only closed frequent item sets
- Superset of only maximal frequent item sets
- Subset of maximal frequent item sets
- Superset of both closed frequent item sets and maximal frequent item sets

Superset of both closed frequent item sets and maximal frequent item sets

**12. Assume that we have a dataset containing information about 200 individuals. A supervised data mining session has discovered the following rule: IF age < 30 & credit card insurance = yes THEN life insurance = yes. Rule accuracy: 70%; rule coverage: 63%. How many individuals in the class life insurance = no have credit card insurance and are less than 30 years old?**

- 63
- 38
- 40
- 89

38
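
The arithmetic behind this answer, written out as a sketch. It assumes that rule coverage is the fraction of all individuals who satisfy the IF part, and that rule accuracy is the fraction of those covered individuals for whom the THEN part holds; variable names are illustrative.

```python
total = 200        # individuals in the dataset
coverage = 0.63    # assumed: fraction satisfying "age < 30 & credit card insurance = yes"
accuracy = 0.70    # assumed: fraction of covered individuals with life insurance = yes

covered = total * coverage                 # individuals matching the rule's condition
misclassified = covered * (1 - accuracy)   # covered, but life insurance = no
answer = round(misclassified)              # nearest whole person
print(covered, answer)
```

Under these assumptions, 126 individuals are covered and about 30% of them (37.8, rounded to 38) belong to the class life insurance = no.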

**13. Which of the following is cluster analysis?**

- Simple segmentation
- Grouping similar objects
- Labeled classification
- Query results grouping

Grouping similar objects

**14. A good clustering method will produce high quality clusters with**

- high inter class similarity
- high intra class similarity
- low intra class similarity
- None of above

high intra class similarity

**15. Which two parameters are needed for DBSCAN**

- Min threshold
- Min points and eps
- Min sup and min confidence
- Number of centroids

Min points and eps

**16. Which statement is true about neural network and linear regression models?**

- Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
- The output of both models is a categorical attribute value.
- Both models require numeric attributes to range between 0 and 1.
- Both models require input attributes to be numeric.

Both models require input attributes to be numeric.

**17. In the Apriori algorithm, if there are 100 frequent 1-itemsets, then the number of candidate 2-itemsets is**

- 100
- 200
- 4950
- 5000

4950
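
The figure is the number of unordered pairs that can be formed from 100 items, C(100, 2) = 100 * 99 / 2. A quick check, with placeholder item names that are assumptions for the example:

```python
from itertools import combinations
from math import comb

# Hypothetical frequent 1-itemsets: item0 .. item99.
frequent_items = [f"item{i}" for i in range(100)]

# Candidate 2-itemsets are all unordered pairs of frequent items.
candidate_pairs = list(combinations(frequent_items, 2))
print(len(candidate_pairs), comb(100, 2))
```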

**18. A significant bottleneck in the Apriori algorithm is**

- Finding frequent itemsets
- Pruning
- Candidate generation
- Number of iterations

Candidate generation

**19. Machine learning techniques differ from statistical techniques in that machine learning methods**

- are better able to deal with missing and noisy data
- typically assume an underlying distribution for the data
- have trouble with large-sized datasets
- are not able to explain their behavior.

are better able to deal with missing and noisy data

**20. The probability of a hypothesis before the presentation of evidence.**

- a priori
- posterior
- conditional
- subjective

a priori

**21. KDD represents extraction of**

- data
- knowledge
- rules
- model

knowledge

**22. Which statement about outliers is true?**

- Outliers should be part of the training dataset but should not be present in the test data.
- Outliers should be identified and removed from a dataset.
- The nature of the problem determines how outliers are used
- Outliers should be part of the test dataset but should not be present in the training data.

The nature of the problem determines how outliers are used

**23. The most general form of distance is**

- Manhattan
- Euclidean
- Mean
- Minkowski

Minkowski
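
Minkowski distance is the general form because its order parameter p recovers the other metrics: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. A minimal sketch, with the function name and sample points chosen for illustration:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0, 0), (3, 4)
manhattan = minkowski(a, b, 1)  # |3| + |4|
euclidean = minkowski(a, b, 2)  # sqrt(3**2 + 4**2)
print(manhattan, euclidean)
```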

**24. Which association rule would you prefer?**

- High support and medium confidence
- High support and low confidence
- Low support and high confidence
- Low support and low confidence

Low support and high confidence

**25. In a rule-based classifier, if there is a rule for each combination of attribute values, what do you call that rule set R?**

- Exhaustive
- Inclusive
- Comprehensive
- Mutually exclusive

Exhaustive

**26. The Apriori property means**

- If a set cannot pass a test, its supersets will also fail the same test
- To decrease the efficiency, do level-wise generation of frequent item sets
- To improve the efficiency, do level-wise generation of frequent item sets
- If a set can pass a test, its supersets will fail the same test

If a set cannot pass a test, its supersets will also fail the same test

**27. If an item set 'XYZ' is a frequent item set, then all subsets of that frequent item set are**

- Undefined
- Not frequent
- Frequent
- Can not say

Frequent

**28. The probability that a person owns a sports car given that they subscribe to an automotive magazine is 40%. We also know that 3% of the adult population subscribes to an automotive magazine. The probability of a person owning a sports car given that they do not subscribe to an automotive magazine is 30%. Use this information to compute the probability that a person subscribes to an automotive magazine given that they own a sports car.**

- 0.0368
- 0.0396
- 0.0389
- 0.0398

0.0396
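
The answer follows from Bayes' theorem, with the total probability of owning a sports car in the denominator. The sketch below simply plugs in the numbers from the question; the variable names are illustrative.

```python
p_car_given_sub = 0.40       # P(sports car | subscribes)
p_sub = 0.03                 # P(subscribes)
p_car_given_not_sub = 0.30   # P(sports car | does not subscribe)

# Law of total probability: P(sports car).
p_car = p_car_given_sub * p_sub + p_car_given_not_sub * (1 - p_sub)

# Bayes' theorem: P(subscribes | sports car).
p_sub_given_car = p_car_given_sub * p_sub / p_car
print(round(p_sub_given_car, 4))  # 0.012 / 0.303
```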

**29. Simple regression assumes a ______ relationship between the input attribute and output attribute.**

- quadratic
- inverse
- linear
- reciprocal

linear

**30. To determine association rules from frequent item sets**

- Only minimum confidence needed
- Neither support nor confidence needed
- Both minimum support and confidence are needed
- Minimum support is needed

Both minimum support and confidence are needed

**31. If {A,B,C,D} is a frequent itemset, a candidate rule that is not possible is**

- C -> A
- D -> ABCD
- A -> BC
- B -> ADC

D -> ABCD

**32. Classification rules are extracted from ______**

- decision tree
- root node
- branches
- siblings

decision tree

**33. What does K refer to in the K-Means algorithm, which is a non-hierarchical clustering approach?**

- Complexity
- Fixed value
- No of iterations
- number of clusters

number of clusters

**34. If a linear regression model fits perfectly, i.e., train error is zero, then ______**

- Test error is also always zero
- Test error is non zero
- Couldn't comment on Test error
- Test error is equal to Train error

Couldn't comment on Test error

**35. How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?**

- 1
- 2
- 3
- 4

2

**36. In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, by how much will the output variable change?**

- by 1
- no change
- by intercept
- by its slope

by its slope
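
A least-squares sketch that shows both points at once: a simple linear regression has exactly two coefficients (intercept and slope), and raising the input by one unit changes the prediction by the slope. The function name and the data are assumptions; the sample points lie exactly on the line y = 1 + 2x.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x.
intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])


def predict(x):
    return intercept + slope * x


# Changing the input by one unit changes the output by the slope.
change = predict(11) - predict(10)
print(intercept, slope, change)
```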

**37. In the syntax of R's linear model function lm(formula, data, ...), data refers to ______**

- Matrix
- array
- vector
- list

list (a data frame, which is what `data` usually holds, is a list of columns)

**38. In the mathematical equation of linear regression Y = β1 + β2X + ϵ, (β1, β2) refers to ______**

- (X-intercept, Slope)
- (Slope, X-Intercept)
- (Y-Intercept, Slope)
- (slope, Y-Intercept)

(Y-Intercept, Slope)
