Data Mining MCQ Questions | DMW MCQs SPPU

Data Mining MCQ with Answers

Here are 60 of the most important data mining and warehousing MCQs that can be asked in your BE Comp online examination. These are the kinds of DMW MCQs that SPPU can ask for 1 mark each.

In the data mining and warehousing MCQs given below, the bold option is the correct answer.

What is the method to interpret the results after rule generation?

A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
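
As a rough illustration of how a generated rule can be evaluated, the sketch below computes support, confidence, and lift for a rule A -> B over a small made-up transaction list. The transactions and function names are assumptions chosen for illustration and do not come from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B together) divided by support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(A -> B) divided by support(B)."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical toy data, not taken from the quiz
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

rule_lift = lift(transactions, {"milk"}, {"bread"})
```

A lift above 1 suggests the antecedent and consequent occur together more often than independence would predict, a lift near 1 suggests independence, and a lift below 1 (as in this toy data) suggests a negative association.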

OLAP database design is

A : Application-oriented
B : Object-oriented
C : Goal-oriented
D : Subject-oriented

Multilevel association rules can be mined efficiently using

A : Support
B : Confidence
C : Support count
D : Concept Hierarchies under support-confidence framework

Accuracy is used to measure

A : classifier’s true abilities
B : classifier’s analytic abilities
C : classifier’s decision abilities
D : classifier’s predictive abilities

Supervised learning and unsupervised clustering both require at least one

A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute

The task of building a decision model from labeled training data is called

A : Supervised Learning
B : Unsupervised Learning
C : Reinforcement Learning
D : Structure Learning

What is the range of the cosine similarity of the two documents?

A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
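
To make the range concrete, here is a minimal sketch of cosine similarity between term-frequency vectors. The three document vectors are invented for illustration. Because term counts are never negative, the dot product cannot be negative, so the value stays between 0 and 1 (an angle between 0 and 90 degrees).

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical term-frequency vectors over the same five terms
doc1 = [5, 0, 3, 0, 2]
doc2 = [3, 0, 2, 0, 1]
doc3 = [0, 4, 0, 1, 0]  # shares no terms with doc1

similar = cosine_similarity(doc1, doc2)
unrelated = cosine_similarity(doc1, doc3)
```

Documents with no terms in common score 0, and documents whose term counts are proportional score 1.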

Multi-class classification makes the assumption that each sample is assigned to

A : one and only one label
B : many labels
C : one or many labels
D : no label

Which of these is not a frequent pattern mining algorithm?

A : Decision trees
B : Eclat
C : FP growth
D : Apriori

The first step involved in knowledge discovery is?

A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning

The distance between two points calculated using Pythagoras theorem is

A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
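
The distance measures named in the options can be compared side by side. The following is a small sketch with two made-up points; the function names are illustrative only.

```python
def euclidean(p, q):
    """Straight-line distance (Pythagoras theorem generalized to n dimensions)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    """City block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def supremum(p, q):
    """Supremum (Chebyshev) distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical points: the differences are 3 and 4
p, q = (1, 2), (4, 6)
```

For these points the Euclidean distance is the hypotenuse of a 3-4-5 right triangle, the Manhattan distance adds the two legs, and the supremum distance keeps only the longer leg.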

What do you mean by dissimilarity measure of two objects?

A : Is a numerical measure of how alike two data objects are.
B : Is a numerical measure of how different two data objects are.
C : Higher when objects are more alike
D : Lower when objects are more different

An ROC curve for a given model shows the trade-off between

A : random sampling
B : test data and train data
C : cross validation
D : the true positive rate (TPR) and the false positive rate (FPR)
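
As a hedged sketch of where the points of an ROC curve come from, the code below computes one (FPR, TPR) pair per decision threshold. The labels, scores, and thresholds are invented for illustration.

```python
def roc_point(labels, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)  # false positive rate
    return fpr, tpr


# Hypothetical ground truth (1 = positive) and classifier scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Lowering the threshold moves the point from (0, 0) towards (1, 1)
roc_curve = [roc_point(labels, scores, t) for t in (1.0, 0.75, 0.5, 0.0)]
```

Lowering the threshold captures more true positives but also admits more false positives, which is the trade-off the curve displays.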

Each dimension is represented by only one table. Recognize the type of schema.

A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema

Choose the correct concept hierarchy.

A : city < street < state < country
B : street < city < state < country
C : street > city > state > country
D : street > city > country > state

Height is an example of which type of attribute?

A : Nominal
B : Binary
C : Ordinal
D : Numeric

Which angle is used to measure document similarity?

A : Sin
B : Tan
C : Cos
D : Sec


Which of the following is a data mining tool?

A : Borland C
B : Weka
C : Borland C++
D : Visual C

A decision tree is also known as

A : general tree
B : binary tree
C : prediction tree
D : None of the options

Recall is a measure of

A : completeness: what percentage of positive tuples are labeled as such
B : exactness for misclassification
C : exactness: what percentage of tuples are not classified
D : exactness: what percentage of tuples labeled as negative are actually such

What is the approach of the basic algorithm for decision tree induction?

A : Greedy
B : Top Down
C : Procedural
D : Step by Step

Rules are considered interesting if

A : They satisfy both minimum support and minimum confidence threshold
B : They satisfy both maximum support and maximum confidence threshold
C : They satisfy maximum support and minimum confidence threshold
D : They satisfy minimum support and maximum confidence threshold

For mining frequent itemsets, the data formats used by the Apriori and FP-Growth algorithms are

A : Apriori uses horizontal and FP-Growth uses vertical data format
B : Apriori uses vertical and FP-Growth uses horizontal data format
C : Apriori and FP-Growth both use vertical data format
D : Apriori and FP-Growth both use horizontal data format

Which of the following sequences is used to calculate proximity measures for an ordinal attribute?

A : Replacement, discretization and distance measure
B : Replacement, characterization and distance measure
C : Normalization, discretization and distance measure
D : Replacement, normalization and distance measure
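
One common textbook treatment of ordinal attributes replaces each value by its rank r in 1..M, maps the rank onto [0, 1] with (r - 1) / (M - 1), and then applies an ordinary distance measure. The sketch below assumes that treatment; the scale and function names are hypothetical.

```python
def ordinal_to_unit_interval(value, ordered_levels):
    """Replace an ordinal value by its rank r, then normalize to (r - 1) / (M - 1)."""
    rank = ordered_levels.index(value) + 1  # replacement by rank, 1..M
    m = len(ordered_levels)
    return (rank - 1) / (m - 1)             # normalization onto [0, 1]


def ordinal_dissimilarity(a, b, ordered_levels):
    """Absolute difference between two ordinal values after normalization."""
    return abs(ordinal_to_unit_interval(a, ordered_levels)
               - ordinal_to_unit_interval(b, ordered_levels))


# Hypothetical ordered scale, lowest to highest
levels = ["fair", "good", "excellent"]
```

With three levels, adjacent values are 0.5 apart and the two extremes are 1.0 apart.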

Multilevel association rule mining is

A : Association rules generated from the candidate-generation method
B : Association rules generated without the candidate-generation method
C : Association rules generated from mining data at multiple abstraction levels
D : Association rules generated from frequent itemsets

Which of the following is not a correct use of cross validation?

A : Selecting variables to include in a model
B : Comparing predictors
C : Selecting parameters in prediction function
D : classification

What do you mean by support(A)?

A : Total number of transactions containing A
B : Total Number of transactions not containing A
C : Number of transactions containing A / Total number of transactions
D : Number of transactions not containing A / Total number of transactions

Data mining and warehousing mcq pdf

The fact table contains

A : The names of the facts
B : Keys to each of the related dimension tables
C : Facts and keys
D : Facts or keys

Every key structure in the data warehouse contains a time element

A : records
B : Explicitly
C : Implicitly and explicitly
D : Implicitly or explicitly

The accuracy of a classifier on a given test set is the percentage of

A : test set tuples that are correctly classified by the classifier
B : test set tuples that are incorrectly classified by the classifier
C : test set tuples that are incorrectly misclassified by the classifier
D : test set tuples that are not classified by the classifier

How will you counter over-fitting in a decision tree?

A : By creating new rules
B : By pruning the longer rules
C : Both ‘By pruning the longer rules’ and ‘By creating new rules’
D : By creating a new tree

The confusion matrix is a useful tool for analyzing

A : Regression
B : Classification
C : Sampling
D : Cross validation

If A and B are two sets of items, and A is a subset of B, which of the following statements is always true?

A : Support(A) is less than or equal to Support(B)
B : Support(A) is greater than or equal to Support(B)
C : Support(A) is equal to Support(B)
D : Support(A) is not equal to Support(B)
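
The relationship between the support of an itemset and the support of its supersets can be checked on a small example. The transactions below are made up. The point is only that every transaction containing the larger itemset also contains the smaller one, so support cannot increase as items are added.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical transactions
transactions = [
    {"tomato", "potato", "onion"},
    {"tomato", "potato"},
    {"tomato"},
    {"onion"},
]

small = support(transactions, {"tomato"})                      # subset
medium = support(transactions, {"tomato", "potato"})
large = support(transactions, {"tomato", "potato", "onion"})   # superset
```

This property (often called the Apriori or anti-monotone property) is what lets Apriori prune any candidate that has an infrequent subset.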

What is the limitation behind rule generation in the Apriori algorithm?

A : Need to generate a huge number of candidate sets
B : Need to repeatedly scan the whole database and check a large set of candidates by pattern matching
C : Dropping itemsets with valued information
D : Both (a) and (b)

In an asymmetric attribute

A : No value is considered important over other values
B : All values are equal
C : Only non-zero value is important
D : Range of values is important

One of the most well-known software tools used for classification is

A : Java
B : C4.5
C : Oracle
D : C++

Identify the example of sequence data

A : weather forecast
B : data matrix
C : market basket data
D : genomic data

What type of matrix is required to represent binary data for proximity measures?

A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix

A company wants to divide its customers into distinct groups to send offers. This is an example of

A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection

This operation may add new dimension to the cube

A : Roll up
B : Drill down
C : Slice
D : Dice

Which of the following sentence is FALSE regarding regression?

A : It relates inputs to outputs.
B : It is used for prediction.
C : It may be used for interpretation.
D : It discovers causal relationships.

The following represents the age distribution of students in an elementary class. Find the mode of the values: 7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11.

A : 7
B : 9
C : 10
D : 11
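
The mode is the value that occurs most often. A short sketch that counts occurrences of the ages listed in the question:

```python
from collections import Counter

ages = [7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11]

# Count how often each age occurs, then take the most frequent one
counts = Counter(ages)
mode_value, mode_count = counts.most_common(1)[0]
```

Here 9 appears four times, while 7 and 11 appear three times each.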

These numbers are taken from the number of people that attended a particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the mean.

A : 25
B : 210
C : 62
D : 30
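
The mean is the sum of the values divided by how many values there are. Worked through with the attendance figures from the question:

```python
attendance = [62, 18, 39, 13, 16, 37, 25]

total = sum(attendance)          # sum of the seven weekly counts
mean = total / len(attendance)   # divide by the number of weeks
```

The total of 210 appears among the options as a distractor; dividing it by the 7 weeks gives the mean.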

Effectiveness of the browsing is highest. Recognize the type of schema.

A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema

The cuboid that holds the lowest level of summarization is called

A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid

The tables are easy to maintain and save storage space. Recognize the type of schema.

A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema

A database has 4 transactions. Of these, 4 transactions include milk and bread. Further, of the given 4 transactions, 3 transactions include cheese. Find the support percentage for the following association rule: “If milk and bread are purchased, then cheese is also purchased”.

A : 0.6
B : 0.75
C : 0.8
D : 0.7
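
The arithmetic can be written out as follows. Since all 4 transactions contain milk and bread, the 3 transactions that contain cheese contain all three items. The support of a rule is taken here as the fraction of all transactions that contain both sides of the rule.

```latex
\[
\operatorname{support}(\{\text{milk},\text{bread}\} \Rightarrow \{\text{cheese}\})
  = \frac{\text{transactions containing milk, bread and cheese}}{\text{total transactions}}
  = \frac{3}{4} = 0.75
\]
```

In this particular case the confidence is also 3/4, because the antecedent appears in every transaction.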

What is the range of the angle between two term frequency vectors?

A : Zero to Thirty
B : Zero to Ninety
C : Zero to One Eighty
D : Zero to Forty-Five

What does a Pearson’s product-moment allow you to identify?

A : Whether there is a relationship between variables
B : Whether there is a significant effect and interaction of independent variables
C : Whether there is a significant difference between variables
D : Whether there is a significant effect and interaction of dependent variables

Consider three itemsets V1 = {tomato, potato, onion}, V2 = {tomato, potato}, V3 = {tomato}. Which of the following statements is correct?

A : support(V1) is greater than support (V2)
B : support(V3) is greater than support (V2)
C : support(V1) is greater than support(V3)
D : support(V2) is greater than support(V3)

What is another name for supremum distance?

A : Weighted Euclidean distance
B : City Block distance
C : Chebyshev distance
D : Euclidean distance

This technique uses mean and standard deviation scores to transform real-valued attributes.

A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
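
A minimal sketch of normalization by mean and standard deviation, where each value v becomes (v - mean) / std. The input values are made up, and using the population standard deviation rather than the sample one is an assumption of this sketch.

```python
import statistics


def z_score_normalize(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]


# Hypothetical attribute values
incomes = [10.0, 20.0, 30.0]
normalized = z_score_normalize(incomes)
```

After the transformation, a value equal to the mean maps to 0, and values above or below the mean become positive or negative multiples of the standard deviation.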

When do we use Manhattan distance in data mining?

A : Dimension of the data decreases
B : Dimension of the data increases
C : Underfitting
D : Moderate size of the dimensions

Correlation analysis is used for

A : handling missing values
B : identifying redundant attributes
C : handling different data formats
D : eliminating noise

If True Positives (TP): 7, False Positives (FP): 1,False Negatives (FN): 4, True Negatives (TN): 18. Calculate Precision and Recall.

A : Precision = 0.88, Recall=0.64
B : Precision = 0.44, Recall=0.78
C : Precision = 0.88, Recall=0.22
D : Precision = 0.77, Recall=0.55
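
The two measures follow directly from the confusion-matrix counts given in the question: precision is TP / (TP + FP) and recall is TP / (TP + FN). A short worked sketch, with accuracy added for comparison:

```python
tp, fp, fn, tn = 7, 1, 4, 18  # counts from the question

precision = tp / (tp + fp)                  # exactness of positive predictions
recall = tp / (tp + fn)                     # completeness over actual positives
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all tuples classified correctly

precision_rounded = round(precision, 2)
recall_rounded = round(recall, 2)
```

Precision works out to 7/8 and recall to 7/11, which round to 0.88 and 0.64.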

A sub-database which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern is called

A : Suffix path
B : FP-tree
C : Prefix path
D : Conditional pattern base

Cost complexity pruning algorithm is used in?

A : CART
B : C4.5
C : ID3
D : ALL

Which is the most well-known association rule algorithm and is used in most commercial products?

A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm

Which operation is required to calculate Hamming distance between two objects?

A : AND
B : OR
C : NOT
D : XOR
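
For binary objects, the Hamming distance is the number of positions in which the two bit patterns differ. A small sketch, assuming the objects are encoded as Python integers (the bit patterns are invented for illustration):

```python
def hamming_distance(a, b):
    """Number of differing bit positions between two non-negative integers."""
    differing_bits = a ^ b                # XOR sets a 1 wherever the bits differ
    return bin(differing_bits).count("1")


# Hypothetical binary objects
x = 0b1011101
y = 0b1001001
distance = hamming_distance(x, y)
```

The XOR of the two patterns is 0b0010100, which has two 1 bits, so the objects differ in two positions.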
