Unit 1
1. What is Machine learning?
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
Answer: a
Explanation: Machine learning is the autonomous acquisition of knowledge through
the use of computer programs.
2. Which of the following is not a factor that affects the performance of a learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
Answer: d
Explanation: The factors that affect the performance of a learner system do not include
good data structures.
3. Which of the following is not a learning method?
a) Memorization
b) Analogy
c) Deduction
d) Introduction
Answer: d
Explanation: Learning methods do not include introduction (induction, by contrast, is a learning method).
4. In language understanding, which of the following is not a level of knowledge?
a) Phonological
b) Syntactic
c) Empirical
d) Logical
Answer: c
Explanation: In language understanding, the levels of knowledge do not include
empirical knowledge.
5. Which of the following is not one of the categories that make up a model of language?
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
Answer: d
Explanation: The categories that make up a model of language do not include
structural units.
6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively predicting
lower level constituents until individual preterminal symbols are written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting
upper level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower level constituents and successively predicting a
sentence (the symbol S)
d) Begins by hypothesizing upper level constituents and successively predicting a
sentence (the symbol S)
Answer: a
Explanation: A top-down parser begins by hypothesizing a sentence (the symbol S)
and successively predicting lower level constituents until individual preterminal
symbols are written.
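The prediction process described above can be sketched as a small recursive-descent recognizer. The grammar, the lexicon, and the function names are assumptions chosen for illustration; they are not part of the quiz. The recognizer starts from the hypothesis S and keeps predicting lower-level constituents until it reaches preterminal symbols, which it matches against words.

```python
# Toy grammar (an assumption for illustration):
#   S -> NP VP ; NP -> Det N ; VP -> V NP | V
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
# Preterminal symbols and the words they cover (also assumed).
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat"},
    "V": {"sees", "sleeps"},
}


def expand(symbol, words, pos):
    """Yield every position reachable after deriving `symbol` from words[pos:]."""
    if symbol in LEXICON:  # preterminal: try to match exactly one word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:  # predict lower-level constituents
        positions = [pos]
        for child in production:
            positions = [
                end for start in positions for end in expand(child, words, start)
            ]
        yield from positions


def accepts(sentence):
    """Hypothesize S and check that some derivation covers the whole sentence."""
    words = sentence.split()
    return len(words) in expand("S", words, 0)
```

This toy grammar has no left-recursive rules; a naive top-down parser of this kind would loop forever on a rule such as NP -> NP PP.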
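7. Which of the following is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
Answer: d
Explanation: p → ¬q is equivalent to ¬p ∨ ¬q, which has no positive literal, so it is not a Horn clause in the sense of a definite clause with exactly one positive literal.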
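The answer can be checked by counting positive literals once each option is written as a disjunction. The sketch below assumes a clause is a list of strings with "~" marking negation; that representation and the function names are chosen here for illustration. Under the broader textbook definition (at most one positive literal), ¬p ∨ ¬q would still count as a Horn clause (a goal clause), so the quiz answer fits the stricter reading of exactly one positive literal.

```python
def positive_literals(clause):
    """Return the literals that are not negated ('~' marks negation)."""
    return [lit for lit in clause if not lit.startswith("~")]


def is_definite_clause(clause):
    """Exactly one positive literal: the stricter reading used by the quiz."""
    return len(positive_literals(clause)) == 1


def is_horn_clause(clause):
    """At most one positive literal: the broader textbook definition."""
    return len(positive_literals(clause)) <= 1


# The four options rewritten as disjunctions (p -> q is ~p v q).
OPTIONS = {
    "a": ["p"],
    "b": ["~p", "q"],
    "c": ["~p", "q"],   # p -> q
    "d": ["~p", "~q"],  # p -> ~q
}

not_definite = [name for name, clause in OPTIONS.items()
                if not is_definite_clause(clause)]
```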
8. The action ‘STACK(A, B)’ of a robot arm specifies to _______________
a) Place block B on Block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
Answer: d
1. Which of the following terms is appropriate to the figure below?
a) Large Data
b) Big Data
c) Dark Data
d) None of the mentioned
Answer: b
Explanation: Big data is a broad term for data sets so large or complex that traditional
data processing applications are inadequate.
2. Point out the correct statement.
a) Machine learning focuses on prediction, based on known properties learned from
the training data
b) Data Cleaning focuses on prediction, based on known properties learned from the
training data
c) Representing data in a form which mere mortals can both understand and get
valuable insights from is as much a science as it is an art
d) None of the mentioned
Answer: d
Explanation: Visualization is becoming a very important aspect.
3. Which of the following characteristics of big data is of relatively more concern to
data science?
a) Velocity
b) Variety
c) Volume
d) None of the mentioned
Answer: b
Explanation: Big data enables organizations to store, manage, and manipulate vast
amounts of disparate data at the right speed and at the right time.
4. Which of the following analytical capabilities are provided by information
management company?
a) Stream Computing
b) Content Management
c) Information Integration
d) All of the mentioned
Answer: d
Explanation: With stream computing, store less, analyze more and make better
decisions faster.
5. Point out the wrong statement.
a) The big volume indeed represents Big Data
b) The data growth and social media explosion have changed how we look at the data
c) Big Data is just about lots of data
d) All of the mentioned
Answer: c
Explanation: Big Data is actually a concept providing an opportunity to find new
insight into your existing data as well as guidelines for capturing and analyzing your future
data.
6. Which of the following steps is performed by a data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Answer: a
Explanation: Data cleansing, data cleaning or data scrubbing is the process of
detecting and correcting (or removing) corrupt or inaccurate records from a record
set, table, or database.
7. The 3 V’s are not sufficient to describe big data.
a) True
b) False
Answer: a
Explanation: IBM data scientists break big data into four dimensions: volume,
variety, velocity and veracity.
8. Which of the following focuses on the discovery of (previously) unknown
properties on the data?
a) Data mining
b) Big Data
c) Data wrangling
d) Machine Learning
Answer: a
Explanation: Data munging or data wrangling is loosely the process of manually
converting or mapping data from one “raw” form into another format that allows
for more convenient consumption of the data with the help of semi-automated
tools.
9. Which of the following languages should replace the question mark in the
figure below?
a) Java
b) PHP
c) COBOL
d) None of the mentioned
Answer: a
Explanation: Java is used for processing data in Big data Analytics.
10. Beyond volume, variety and velocity, veracity is also an issue for big data.
a) True
b) False
Answer: a
Explanation: Data veracity concerns uncertain or imprecise data.
1. Which of the following would be the most appropriate replacement for the question
mark in the following figure?
a) Data Analysis
b) Data Science
c) Descriptive Analytics
d) None of the mentioned
Answer: b
Explanation: Data Science is a multidisciplinary field which involves the extraction of
knowledge from large volumes of data that are structured or unstructured.
2. Point out the correct statement.
a) Raw data is original source of data
b) Preprocessed data is original source of data
c) Raw data is the data obtained after processing steps
d) None of the mentioned
Answer: a
Explanation: Accounting programs are prototypical examples of data processing
applications.
3. Which of the following is performed by Data Scientist?
a) Define the question
b) Create reproducible code
c) Challenge results
d) All of the mentioned
Answer: d
Explanation: A data scientist is a job title for an employee or business intelligence
(BI) consultant who excels at analyzing data, particularly large amounts of data.
4. Which of the following is the most important language for Data Science?
a) Java
b) Ruby
c) R
d) None of the mentioned
Answer: c
Explanation: R is free software for statistical computing and analysis.
5. Point out the wrong statement.
a) Merging concerns combining datasets on the same observations to produce a result
with more variables
b) Data visualization is the organization of information according to preset
specifications
c) Subsetting can be used to select and exclude variables and observations
d) All of the mentioned
Answer: b
Explanation: Data formatting is the organization of information according to preset
specifications.
6. Which of the following approaches should be used to ask a data analysis question?
a) Find only one solution for particular problem
b) Find out the question which is to be answered
c) Find out answer from dataset without asking question
d) None of the mentioned
Answer: b
Explanation: Data analysis has multiple facets and approaches.
7. Which of the following is one of the key data science skills?
a) Statistics
b) Machine Learning
c) Data Visualization
d) All of the mentioned
Answer: d
Explanation: Data visualization is the presentation of data in a pictorial or graphical
format.
8. Which of the following is a key characteristic of a hacker?
a) Afraid to say they don’t know the answer
b) Willing to find answers on their own
c) Not Willing to find answers on their own
d) All of the mentioned
Answer: b
Explanation: Hacker is an expert at programming and solving problems with a
computer.
9. Which of the following is characteristic of Processed Data?
a) Data is not ready for analysis
b) All steps should be noted
c) Hard to use for data analysis
d) None of the mentioned
Answer: b
Explanation: Processing includes merging, summarizing and subsetting data.
10. Raw data should be processed only one time.
a) True
b) False
Answer: b
Explanation: Raw data may only need to be processed once.
Q. Which of the following can be used to impute data sets based only on information
in the training set?
A. postProcess
B. preProcess
C. process
D. All of the Mentioned
Answer: Option B
Explanation:
This can be done with K-nearest neighbors.
Q. Which of the following is a characteristic of the best machine learning method?
A. Fast
B. Accuracy
C. Scalable
D. All of the Mentioned
Answer: Option D
Explanation:
There is always a trade-off in prediction accuracy.
Q. According to analysts, for what can traditional IT systems provide a foundation
when they’re integrated with big data technologies like Hadoop?
A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
Answer: Option A
Q. All of the following accurately describe Hadoop, EXCEPT:
A. Open source
B. Real-time
C. Java-based
D. Distributed computing approach
Answer: Option B
UNIT 2
1) Which of the following statements is true in the following case?
A) Feature F1 is an example of nominal variable.
B) Feature F1 is an example of ordinal variable.
C) It doesn’t belong to any of the above category.
D) Both of these
Solution: (B)
Ordinal variables are variables that have some order in their categories. For
example, grade A should be considered a higher grade than grade B.
2) Which of the following is an example of a deterministic algorithm?
A) PCA
B) K-Means
C) None of the above
Solution: (A)
A deterministic algorithm is one whose output does not change across different runs.
PCA would give the same result if we ran it again, but k-means with random initialization would not.
3) [True or False] The Pearson correlation between two variables can be zero even
though their values are still related to each other.
A) TRUE
B) FALSE
Solution: (A)
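As a quick numerical check of this answer, the sketch below computes the Pearson coefficient for a variable that is an exact function of another. The sample values are an assumption, chosen to be symmetric around zero, and `pearson` is a hand-written helper rather than a library call.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]   # symmetric around zero (assumed sample)
ys = [x ** 2 for x in xs]       # y is fully determined by x
r = pearson(xs, ys)             # yet the linear correlation vanishes
```

The coefficient only measures linear association, so a perfectly deterministic but symmetric relationship can still score zero.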
Consider Y = X^2 with X distributed symmetrically around zero. The two variables are not only associated, one is a function of the other, yet the
Pearson correlation between them is 0.
4) Which of the following statement(s) is/are true for Gradient Descent (GD) and
Stochastic Gradient Descent (SGD)?
1. In GD and SGD, you update a set of parameters in an iterative manner to
minimize the error function.
2. In SGD, you have to run through all the samples in your training set for a
single update of a parameter in each iteration.
3. In GD, you either use the entire data or a subset of training data to
update a parameter in each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: (A)
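The difference between the two update rules can be sketched on a one-parameter regression. Everything below (the noise-free data, the learning rate, the step counts, and the function names) is an assumption made for illustration: batch gradient descent uses all samples for each update, while stochastic gradient descent uses one randomly chosen sample per update.

```python
import random


def make_data(n=50, true_w=3.0, seed=0):
    """Noise-free samples of y = true_w * x, to keep the check simple."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x for x in xs]
    return xs, ys


def gradient(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def batch_gd(xs, ys, lr=0.1, steps=500):
    w = 0.0
    for _ in range(steps):
        w -= lr * gradient(w, xs, ys)  # every update uses all samples
    return w


def sgd(xs, ys, lr=0.1, steps=5000, seed=1):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))     # one randomly chosen sample
        w -= lr * gradient(w, [xs[i]], [ys[i]])
    return w


xs, ys = make_data()
w_gd = batch_gd(xs, ys)
w_sgd = sgd(xs, ys)
```

Both variants update the parameter iteratively, which is why only statement 1 holds; statements 2 and 3 swap the roles of the two methods.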
In SGD, each iteration uses a batch that generally contains a random
sample of the data, whereas in GD each iteration uses all of the training
observations.
5) Which of the following hyperparameter(s), when increased, may cause a
random forest to overfit the data?
1. Number of Trees
2. Depth of Tree
3. Learning Rate
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: (B)
Usually, increasing the depth of the trees causes overfitting. Learning rate is not
a hyperparameter in random forest. Increasing the number of trees generally does not
cause overfitting; averaging over more trees tends to reduce variance.
6) Imagine, you are working with “Analytics Vidhya” and you want to develop a
machine learning algorithm which predicts the number of views on the articles.
Your analysis is based on features like author name, number of articles written
by the same author on Analytics Vidhya in past and a few other features. Which
of the following evaluation metric would you choose in that case?
1. Mean Square Error
2. Accuracy
3. F1 Score
A) Only 1
B) Only 2
C) Only 3
D) 1 and 3
E) 2 and 3
F) 1 and 2
Solution: (A)
The number of views of an article is a continuous target variable,
which makes this a regression problem. So mean squared error is used as the
evaluation metric.
7) Given below are three images (1, 2, 3). Which of the following options is correct
for these images?
[Images 1, 2 and 3 not shown]
A) 1 is tanh, 2 is ReLU and 3 is SIGMOID activation functions.
B) 1 is SIGMOID, 2 is ReLU and 3 is tanh activation functions.
C) 1 is ReLU, 2 is tanh and 3 is SIGMOID activation functions.
D) 1 is tanh, 2 is SIGMOID and 3 is ReLU activation functions.
Solution: (D)
The range of the SIGMOID function is (0, 1).
The range of the tanh function is (-1, 1).
The range of the ReLU function is [0, infinity).
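These ranges can be checked numerically. The sketch below evaluates the three functions on a handful of assumed sample inputs; `sigmoid` and `relu` are hand-written helpers, and `math.tanh` comes from the standard library. For finite inputs, sigmoid and tanh approach their bounds without reaching them, while ReLU returns exactly 0 for every non-positive input.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positives, clamps the rest to 0."""
    return max(0.0, x)


inputs = [-5.0, -1.0, -0.0001, 0.0, 0.0001, 1.0, 5.0]  # assumed sample points

sigmoid_out = [sigmoid(x) for x in inputs]
tanh_out = [math.tanh(x) for x in inputs]
relu_out = [relu(x) for x in inputs]

# Of the three, only tanh can produce a negative output.
negative_capable = {
    "sigmoid": min(sigmoid_out) < 0,
    "tanh": min(tanh_out) < 0,
    "relu": min(relu_out) < 0,
}
```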
So Option D is the right answer.
8) Below are the 8 actual values of target variable in the train file.
[0,0,0,1,1,1,1,1]
What is the entropy of the target variable?
A) -(5/8 log(5/8) + 3/8 log(3/8))
B) 5/8 log(5/8) + 3/8 log(3/8)
C) 3/8 log(5/8) + 5/8 log(3/8)
D) 5/8 log(3/8) – 3/8 log(5/8)
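Solution: (A)
The formula for entropy is H = -Σ p_i log(p_i), where p_i is the proportion of class i. Here p = 5/8 for class 1 and p = 3/8 for class 0.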
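Applied to the eight target values of this question, the calculation can be sketched as follows. The base-2 logarithm is an assumption, since the question does not specify a base, and the function name is illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits: -sum(p * log2(p)) over the observed classes."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)

# Option A written out term by term, for comparison.
option_a = -(5 / 8 * log2(5 / 8) + 3 / 8 * log2(3 / 8))
```

The leading minus sign matters: each p * log(p) term is negative for 0 < p < 1, so option B (the same sum without the minus) would give a negative entropy.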
So the answer is A.
9) Let’s say you are working with categorical feature(s) and you have not looked
at the distribution of the categorical variable in the test data.
You want to apply one hot encoding (OHE) on the categorical feature(s). What
challenges may you face if you have applied OHE on a categorical variable of the
train dataset?
A) All categories of categorical variable are not present in the test dataset.
B) Frequency distribution of categories is different in train as compared to the test
dataset.
C) Train and Test always have same distribution.
D) Both A and B
E) None of these
Solution: (D)
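The first challenge can be illustrated with a hand-rolled encoder whose vocabulary is learned from the training data only. The data and function names are assumptions for illustration, and mapping an unseen category to an all-zero row is just one possible policy; an encoder may instead raise an error.

```python
def fit_categories(train_values):
    """Learn the category vocabulary from the training data only."""
    return sorted(set(train_values))


def one_hot(value, categories):
    """Encode one value; a category unseen in training becomes all zeros."""
    return [1 if value == c else 0 for c in categories]


train = ["red", "green", "red", "blue"]
test = ["green", "purple"]  # "purple" never appears in train

categories = fit_categories(train)
encoded_test = [one_hot(v, categories) for v in test]

# Categories the encoder cannot represent because it never saw them.
unseen = [v for v in test if v not in categories]
```

The all-zero row for "purple" carries no information about the value, which is why checking the test distribution before encoding matters.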
Both are true. OHE will fail to encode categories that are present in test but
not in train, which is one of the main challenges when applying OHE. The
challenge given in option B is also real: you need to be more careful when applying OHE
if the frequency distribution is not the same in train and test.
10) The skip-gram model is one of the best models used in the Word2vec algorithm for
word embedding. Which one of the following models depicts the skip-gram
model?
A) A
B) B
C) Both A and B
D) None of these
Solution: (B)
Both models (model1 and model2) are used in the Word2vec algorithm. Model1
represents a CBOW model, whereas model2 represents the skip-gram model.
11) Let’s say you are using activation function X in hidden layers of a neural
network. At a particular neuron for any given input, you get the output as
“-0.0001”. Which of the following activation function could X represent?
A) ReLU
B) tanh
C) SIGMOID
D) None of these
Solution: (B)
The function is tanh because its output range is (-1, 1), whereas ReLU and sigmoid never produce negative outputs.
12) [True or False] The LogLoss evaluation metric can have negative values.
A) TRUE
B) FALSE
Solution: (B)
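The reason is that each term of the loss is the negative logarithm of a probability in (0, 1], which is never negative. A minimal sketch of binary log loss follows; the function name, the clipping constant, and the sample predictions are assumptions for illustration.

```python
from math import log


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 1]
good = log_loss(y_true, [0.9, 0.1, 0.8, 0.7])  # confident and mostly right
bad = log_loss(y_true, [0.1, 0.9, 0.2, 0.3])   # confident and mostly wrong
```

Worse predictions push the loss up without bound, while perfect predictions push it toward zero from above.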
Log loss cannot have negative values.
13) Which of the following statements is/are true about “Type-1” and “Type-2”
errors?
1. Type1 is known as false positive and Type2 is known as false negative.
2. Type1 is known as false negative and Type2 is known as false positive.
3. Type1 error occurs when we reject a null hypothesis when it is actually
true.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 1 and 3
F) 2 and 3
Solution: (E)
In statistical hypothesis testing, a type I error is the incorrect rejection of a true null
hypothesis (a “false positive”), while a type II error is incorrectly retaining a false null
hypothesis (a “false negative”).
14) Which of the following is/are important step(s) to pre-process the
text in NLP based projects?
1. Stemming
2. Stop word removal
3. Object Standardization
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
Stemming is a rudimentary rule-based process of stripping the suffixes (“ing”, “ly”,
“es”, “s” etc) from a word.
Stop words are words that are not relevant to the context of the data, for
example is/am/are.
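The first two steps can be sketched in a few lines. This is a deliberately naive illustration: the stop-word list is a tiny assumed sample, the suffix list is the one named in the explanation, and the minimum-length guard is an added assumption to avoid stripping very short words. Real stemmers apply many more rules.

```python
STOP_WORDS = {"is", "am", "are", "the", "a", "an"}  # tiny illustrative list
SUFFIXES = ("ing", "ly", "es", "s")                 # suffixes named in the text


def strip_suffix(word):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, strip suffixes."""
    tokens = text.lower().split()
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The dogs are barking loudly")
```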
Object standardization is also a good way to pre-process the text.
15) Suppose you want to project high dimensional data into lower dimensions.
The two most famous dimensionality reduction algorithms used here are PCA
and t-SNE. Let’s say you have applied both algorithms respectively on data “X”
and you got the datasets “X_projected_PCA” , “X_projected_tSNE”.
Which of the following statements is true for “X_projected_PCA” &
“X_projected_tSNE” ?
A) X_projected_PCA will have interpretation in the nearest neighbour space.
B) X_projected_tSNE will have interpretation in the nearest neighbour space.
C) Both will have interpretation in the nearest neighbour space.
D) None of them will have interpretation in the nearest neighbour space.
Solution: (B)
The t-SNE algorithm considers nearest neighbour points when reducing the dimensionality of the
data. So, after using t-SNE, the reduced dimensions can also be
interpreted in nearest neighbour space. This is not the case for PCA.
Context: 16-17
Given below are three scatter plots for two features (Image 1, 2 & 3 from left to
right).
16) In the above images, which of the following is/are example(s) of multi-collinear
features?
A) Features in Image 1
B) Features in Image 2
C) Features in Image 3
D) Features in Image 1 & 2
E) Features in Image 2 & 3
F) Features in Image 3 & 1
Solution: (D)
In Image 1, the features have a high positive correlation, whereas in Image 2 they have a high
negative correlation, so in both images the pairs of features are
examples of multicollinear features.
17) In the previous question, suppose you have identified multi-collinear features.
Which of the following action(s) would you perform next?
1. Remove both collinear variables.
2. Instead of removing both variables, we can remove only one variable.
3. Removing correlated variables might lead to loss of information. In order
to retain those variables, we can use penalized regression models like
ridge or lasso regression.
A) Only 1
B) Only 2
C) Only 3
D) Either 1 or 3
E) Either 2 or 3
Solution: (E)
You should not remove both features, because after removing both you
will lose all of the information. You should either remove only one feature or
use a regularization method such as L1 or L2.
18) Adding a non-important feature to a linear regression model may result in:
1. Increase in R-square
2. Decrease in R-square
A) Only 1 is correct
B) Only 2 is correct
C) Either 1 or 2
D) None of these
Solution: (A)
After adding a feature to the feature space, whether that feature is important or
unimportant, the R-squared never decreases (in practice it almost always increases).
19) Suppose you are given three variables X, Y and Z. The Pearson correlation
coefficients for (X, Y), (Y, Z) and (X, Z) are C1, C2 & C3 respectively.
Now, you have added 2 to all values of X (i.e. new values become X+2), subtracted
2 from all values of Y (i.e. new values are Y-2) and Z remains the same. The new
coefficients for (X,Y), (Y,Z) and (X,Z) are given by D1, D2 & D3 respectively.
How do the values of D1, D2 & D3 relate to C1, C2 & C3?
A) D1= C1, D2 < C2, D3 > C3
B) D1 = C1, D2 > C2, D3 > C3
C) D1 = C1, D2 > C2, D3 < C3
D) D1 = C1, D2 < C2, D3 < C3
E) D1 = C1, D2 = C2, D3 = C3
F) Cannot be determined
Solution: (E)
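The invariance follows because the Pearson coefficient is computed from deviations about the mean, and adding a constant shifts every value and the mean by the same amount. A small numerical check, with made-up sample values and a hand-written `pearson` helper (both assumptions for illustration):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample values for X, Y and Z.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 9.0]
z = [9.0, 7.0, 8.0, 3.0, 1.0]

x_shifted = [v + 2 for v in x]  # X + 2
y_shifted = [v - 2 for v in y]  # Y - 2

c = (pearson(x, y), pearson(y, z), pearson(x, z))                        # C1, C2, C3
d = (pearson(x_shifted, y_shifted), pearson(y_shifted, z),
     pearson(x_shifted, z))                                              # D1, D2, D3
```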
Correlation between the features won’t change if you add a constant to, or subtract a constant from, the
features.
20) Imagine you are solving a classification problem with highly imbalanced
classes. The majority class is observed 99% of the time in the training data.
Your model has 99% accuracy after taking the predictions on test data. Which
of the following is true in such a case?
1. Accuracy metric is not a good idea for imbalanced class problems.
2. Accuracy metric is a good idea for imbalanced class problems.
3. Precision and recall metrics are good for imbalanced class problems.
4. Precision and r...
- Spring '16
- Mrs. Smita patil
- Linear Regression, Regression Analysis, Data Mining, Wind, Type I and type II errors