ML FI.pdf

Unit 1

1. What is machine learning?
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs
Answer: a
Explanation: Machine learning is the autonomous acquisition of knowledge through the use of computer programs.

2. Which of the following is not a factor that affects the performance of a learner system?
a) Representation scheme used
b) Training scenario
c) Type of feedback
d) Good data structures
Answer: d
Explanation: The factors that affect the performance of a learner system do not include good data structures.

3. Which of the following is not a learning method?
a) Memorization
b) Analogy
c) Deduction
d) Introduction
Answer: d
Explanation: The different learning methods do not include introduction.

4. In language understanding, which of the following is not a level of knowledge?
a) Phonological
b) Syntactic
c) Empirical
d) Logical
Answer: c
Explanation: In language understanding, the levels of knowledge do not include empirical knowledge.

5. A model of language consists of categories that do not include which of the following?
a) Language units
b) Role structure of units
c) System constraints
d) Structural units
Answer: d
Explanation: The categories of a model of language do not include structural units.

6. What is a top-down parser?
a) Begins by hypothesizing a sentence (the symbol S) and successively predicting lower-level constituents until individual preterminal symbols are written
b) Begins by hypothesizing a sentence (the symbol S) and successively predicting upper-level constituents until individual preterminal symbols are written
c) Begins by hypothesizing lower-level constituents and successively predicting a sentence (the symbol S)
d) Begins by hypothesizing upper-level constituents and successively predicting a sentence (the symbol S)
Answer: a
Explanation: A top-down parser begins by hypothesizing a sentence (the symbol S) and successively predicting lower-level constituents until individual preterminal symbols are written.

7. Among the following, which is not a Horn clause?
a) p
b) ¬p ∨ q
c) p → q
d) p → ¬q
Answer: d
Explanation: p → ¬q is not a Horn clause.

8. The action ‘STACK(A, B)’ of a robot arm specifies to _______________
a) Place block B on block A
b) Place blocks A, B on the table in that order
c) Place blocks B, A on the table in that order
d) Place block A on block B
Answer: d

1. Which of the following terms is appropriate to the figure below?
a) Large Data
b) Big Data
c) Dark Data
d) None of the mentioned
Answer: b
Explanation: Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.

2. Point out the correct statement.
a) Machine learning focuses on prediction, based on known properties learned from the training data
b) Data cleaning focuses on prediction, based on known properties learned from the training data
c) Representing data in a form which mere mortals can both understand and get valuable insights from is as much a science as it is an art
d) None of the mentioned
Answer: d
Explanation: Visualization is becoming a very important aspect.

3. Which of the following characteristics of big data is of relatively more concern to data science?
a) Velocity
b) Variety
c) Volume
d) None of the mentioned
Answer: b
Explanation: Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time.

4. Which of the following analytical capabilities are provided by an information management company?
a) Stream Computing
b) Content Management
c) Information Integration
d) All of the mentioned
Answer: d
Explanation: With stream computing, you store less, analyze more, and make better decisions faster.

5. Point out the wrong statement.
a) The big volume indeed represents Big Data
b) The data growth and social media explosion have changed how we look at the data
c) Big Data is just about lots of data
d) All of the mentioned
Answer: c
Explanation: Big Data is actually a concept providing an opportunity to find new insight into your existing data, as well as guidelines to capture and analyze your future data.

6. Which of the following steps is performed by a data scientist after acquiring the data?
a) Data Cleansing
b) Data Integration
c) Data Replication
d) All of the mentioned
Answer: a
Explanation: Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.

7. The 3 V’s are not sufficient to describe big data.
a) True
b) False
Answer: a
Explanation: IBM data scientists break big data into four dimensions: volume, variety, velocity and veracity.

8. Which of the following focuses on the discovery of (previously) unknown properties in the data?
a) Data mining
b) Big Data
c) Data wrangling
d) Machine Learning
Answer: a
Explanation: Data munging or data wrangling is loosely the process of manually converting or mapping data from one “raw” form into another format that allows for more convenient consumption of the data with the help of semi-automated tools.

9. Which of the following languages should replace the question mark in the figure below?
a) Java
b) PHP
c) COBOL
d) None of the mentioned
Answer: a
Explanation: Java is used for processing data in big data analytics.

10. Beyond volume, variety and velocity, veracity is also an issue of big data.
a) True
b) False
Answer: a
Explanation: Data veracity concerns uncertain or imprecise data.

1. Which of the following would be most appropriate to replace the question mark in the following figure?
a) Data Analysis
b) Data Science
c) Descriptive Analytics
d) None of the mentioned
Answer: b
Explanation: Data science is a multidisciplinary field that involves extraction of knowledge from large volumes of data that are structured or unstructured.

2. Point out the correct statement.
a) Raw data is the original source of data
b) Preprocessed data is the original source of data
c) Raw data is the data obtained after processing steps
d) None of the mentioned
Answer: a
Explanation: Accounting programs are prototypical examples of data processing applications.

3. Which of the following is performed by a data scientist?
a) Define the question
b) Create reproducible code
c) Challenge results
d) All of the mentioned
Answer: d
Explanation: Data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data.

4. Which of the following is the most important language for data science?
a) Java
b) Ruby
c) R
d) None of the mentioned
Answer: c
Explanation: R is free software for statistical computing and analysis.

5. Point out the wrong statement.
a) Merging concerns combining datasets on the same observations to produce a result with more variables
b) Data visualization is the organization of information according to preset specifications
c) Subsetting can be used to select and exclude variables and observations
d) All of the mentioned
Answer: b
Explanation: Data formatting is the organization of information according to preset specifications.

6. Which of the following approaches should be used to ask a data analysis question?
a) Find only one solution for a particular problem
b) Find out the question which is to be answered
c) Find out the answer from the dataset without asking a question
d) None of the mentioned
Answer: b
Explanation: Data analysis has multiple facets and approaches.

7. Which of the following is one of the key data science skills?
a) Statistics
b) Machine Learning
c) Data Visualization
d) All of the mentioned
Answer: d
Explanation: Data visualization is the presentation of data in a pictorial or graphical format.

8. Which of the following is a key characteristic of a hacker?
a) Afraid to say they don’t know the answer
b) Willing to find answers on their own
c) Not willing to find answers on their own
d) All of the mentioned
Answer: b
Explanation: A hacker is an expert at programming and solving problems with a computer.

9. Which of the following is a characteristic of processed data?
a) Data is not ready for analysis
b) All steps should be noted
c) Hard to use for data analysis
d) None of the mentioned
Answer: b
Explanation: Processing includes merging, summarizing and subsetting data.

10. Raw data should be processed only one time.
a) True
b) False
Answer: b
Explanation: Raw data may only need to be processed once.

Q. Which of the following can be used to impute data sets based only on information in the training set?
A. postProcess
B. preProcess
C. process
D. All of the Mentioned
Answer: Option B
Explanation: This can be done with K-nearest neighbors.

Q. Which of the following is a characteristic of the best machine learning method?
A. Fast
B. Accuracy
C. Scalable
D. All of the Mentioned
Answer: Option D
Explanation: There is always a trade-off in prediction accuracy.

Q. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
Answer: Option A

Q. All of the following accurately describe Hadoop, EXCEPT:
A. Open source
B. Real-time
C. Java-based
D. Distributed computing approach
Answer: Option B

UNIT 2

1) Which of the following statements is true in the following case?
A) Feature F1 is an example of a nominal variable.
B) Feature F1 is an example of an ordinal variable.
C) It doesn’t belong to any of the above categories.
D) Both of these
Solution: (B) Ordinal variables are variables that have some order in their categories. For example, grade A should be considered a higher grade than grade B.

2) Which of the following is an example of a deterministic algorithm?
A) PCA
B) K-Means
C) None of the above
Solution: (A) A deterministic algorithm is one whose output does not change across different runs. PCA would give the same result if we ran it again, but k-means would not.

3) [True or False] The Pearson correlation between two variables is zero, but their values can still be related to each other.
A) TRUE
B) FALSE
Solution: (A) Consider Y = X², with X distributed symmetrically about zero. The two variables are not only associated; one is a function of the other, yet the Pearson correlation between them is 0.

4) Which of the following statement(s) is/are true for Gradient Descent (GD) and Stochastic Gradient Descent (SGD)?
1. In GD and SGD, you update a set of parameters in an iterative manner to minimize the error function.
2. In SGD, you have to run through all the samples in your training set for a single update of a parameter in each iteration.
3. In GD, you either use the entire data or a subset of training data to update a parameter in each iteration.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1, 2 and 3
Solution: (A) In SGD, each iteration uses a batch that generally contains a random sample of the data, whereas in GD each iteration uses all of the training observations.

5) Which of the following hyperparameter(s), when increased, may cause a random forest to overfit the data?
1. Number of Trees
2. Depth of Tree
3. Learning Rate
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 2 and 3
F) 1, 2 and 3
Solution: (B) Usually, increasing the depth of the trees causes overfitting. Learning rate is not a hyperparameter in random forest. Increasing the number of trees does not generally cause overfitting.

6) Imagine you are working with “Analytics Vidhya” and you want to develop a machine learning algorithm which predicts the number of views on the articles. Your analysis is based on features like author name, number of articles written by the same author on Analytics Vidhya in the past, and a few other features. Which of the following evaluation metrics would you choose in that case?
1. Mean Square Error
2. Accuracy
3. F1 Score
A) Only 1
B) Only 2
C) Only 3
D) 1 and 3
E) 2 and 3
F) 1 and 2
Solution: (A) The number of views of an article is a continuous target variable, so this is a regression problem. Mean squared error is therefore used as the evaluation metric.

7) Given below are three images (1, 2, 3). Which of the following options is correct for these images?
[Images 1, 2 and 3: plots of three activation functions]
A) 1 is tanh, 2 is ReLU and 3 is SIGMOID activation functions.
B) 1 is SIGMOID, 2 is ReLU and 3 is tanh activation functions.
C) 1 is ReLU, 2 is tanh and 3 is SIGMOID activation functions.
D) 1 is tanh, 2 is SIGMOID and 3 is ReLU activation functions.
Solution: (D) The range of the SIGMOID function is (0, 1). The range of the tanh function is (-1, 1). The range of the ReLU function is [0, infinity). So option D is the right answer.

8) Below are the 8 actual values of the target variable in the train file.
[0,0,0,1,1,1,1,1]
What is the entropy of the target variable?
A) -(5/8 log(5/8) + 3/8 log(3/8))
B) 5/8 log(5/8) + 3/8 log(3/8)
C) 3/8 log(5/8) + 5/8 log(3/8)
D) 5/8 log(3/8) - 3/8 log(5/8)
Solution: (A) The formula for entropy is H = -Σ p_i log(p_i), summed over the classes. Here p = 5/8 for class 1 and p = 3/8 for class 0, so the answer is A.

9) Let’s say you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one-hot encoding (OHE) on the categorical feature(s). What challenges may you face if you have applied OHE on a categorical variable of the train dataset?
A) All categories of the categorical variable are not present in the test dataset.
B) The frequency distribution of categories is different in train as compared to the test dataset.
C) Train and test always have the same distribution.
D) Both A and B
E) None of these
Solution: (D) Both are true. OHE will fail to encode categories that are present in test but not in train, so this could be one of the main challenges while applying OHE. The challenge given in option B is also true: you need to be more careful while applying OHE if the frequency distribution is not the same in train and test.

10) The skip-gram model is one of the best models used in the Word2vec algorithm for word embedding. Which one of the following models depicts the skip-gram model?
A) A
B) B
C) Both A and B
D) None of these
Solution: (B) Both models (Model 1 and Model 2) are used in the Word2vec algorithm. Model 1 represents a CBOW model, whereas Model 2 represents the skip-gram model.

11) Let’s say you are using activation function X in the hidden layers of a neural network. At a particular neuron, for a given input, you get the output “-0.0001”. Which of the following activation functions could X represent?
A) ReLU
B) tanh
C) SIGMOID
D) None of these
Solution: (B) The function is tanh, because its output range is (-1, 1). ReLU and sigmoid never produce negative outputs.

12) [True or False] The LogLoss evaluation metric can have negative values.
A) TRUE
B) FALSE
Solution: (B) Log loss cannot have negative values.

13) Which of the following statements is/are true about “Type-1” and “Type-2” errors?
1. Type 1 is known as false positive and Type 2 is known as false negative.
2. Type 1 is known as false negative and Type 2 is known as false positive.
3. A Type 1 error occurs when we reject a null hypothesis when it is actually true.
A) Only 1
B) Only 2
C) Only 3
D) 1 and 2
E) 1 and 3
F) 2 and 3
Solution: (E) In statistical hypothesis testing, a type I error is the incorrect rejection of a true null hypothesis (a “false positive”), while a type II error is incorrectly retaining a false null hypothesis (a “false negative”).

14) Which of the following is/are important step(s) to pre-process the text in NLP-based projects?
1. Stemming
2. Stop word removal
3. Object Standardization
A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1, 2 and 3
Solution: (D) Stemming is a rudimentary rule-based process of stripping the suffixes (“ing”, “ly”, “es”, “s” etc.) from a word. Stop words are words that are not relevant to the context of the data, for example is/am/are. Object standardization is also a good way to pre-process the text.

15) Suppose you want to project high-dimensional data into lower dimensions. The two most famous dimensionality reduction algorithms used here are PCA and t-SNE. Let’s say you have applied both algorithms respectively on data “X” and you got the datasets “X_projected_PCA” and “X_projected_tSNE”. Which of the following statements is true for “X_projected_PCA” and “X_projected_tSNE”?
A) X_projected_PCA will have interpretation in the nearest neighbour space.
B) X_projected_tSNE will have interpretation in the nearest neighbour space.
C) Both will have interpretation in the nearest neighbour space.
D) None of them will have interpretation in the nearest neighbour space.
Solution: (B) The t-SNE algorithm considers nearest-neighbour points to reduce the dimensionality of the data.
So, after using t-SNE, the reduced dimensions can also be interpreted in the nearest-neighbour space. This is not the case for PCA.

Context: 16-17
Given below are three scatter plots for two features (Image 1, 2 & 3 from left to right).

16) In the above images, which of the following is/are an example of multi-collinear features?
A) Features in Image 1
B) Features in Image 2
C) Features in Image 3
D) Features in Image 1 & 2
E) Features in Image 2 & 3
F) Features in Image 3 & 1
Solution: (D) In Image 1, the features have a high positive correlation, whereas in Image 2 they have a high negative correlation, so in both images the pair of features is an example of multicollinear features.

17) In the previous question, suppose you have identified multi-collinear features. Which of the following action(s) would you perform next?
1. Remove both collinear variables.
2. Instead of removing both variables, we can remove only one variable.
3. Removing correlated variables might lead to loss of information. In order to retain those variables, we can use penalized regression models like ridge or lasso regression.
A) Only 1
B) Only 2
C) Only 3
D) Either 1 or 3
E) Either 2 or 3
Solution: (E) You cannot remove both features, because after removing both you would lose all of the information. You should either remove only one feature or use a regularization method such as L1 or L2.

18) Adding a non-important feature to a linear regression model may result in:
1. Increase in R-square
2. Decrease in R-square
A) Only 1 is correct
B) Only 2 is correct
C) Either 1 or 2
D) None of these
Solution: (A) After adding a feature to the feature space, whether that feature is important or unimportant, R-squared increases (or, at worst, stays the same); it never decreases.

19) Suppose you are given three variables X, Y and Z. The Pearson correlation coefficients for (X, Y), (Y, Z) and (X, Z) are C1, C2 & C3 respectively.
Now, you have added 2 to all values of X (i.e. new values become X+2), subtracted 2 from all values of Y (i.e. new values are Y-2), and Z remains the same. The new coefficients for (X, Y), (Y, Z) and (X, Z) are given by D1, D2 & D3 respectively. How do the values of D1, D2 & D3 relate to C1, C2 & C3?
A) D1 = C1, D2 < C2, D3 > C3
B) D1 = C1, D2 > C2, D3 > C3
C) D1 = C1, D2 > C2, D3 < C3
D) D1 = C1, D2 < C2, D3 < C3
E) D1 = C1, D2 = C2, D3 = C3
F) Cannot be determined
Solution: (E) The correlation between features does not change if you add a constant to, or subtract a constant from, the features.

20) Imagine you are solving a classification problem with highly imbalanced classes. The majority class is observed 99% of the time in the training data. Your model has 99% accuracy after taking the predictions on test data. Which of the following is true in such a case?
1. Accuracy metric is not a good idea for imbalanced class problems.
2. Accuracy metric is a good idea for imbalanced class problems.
3. Precision and recall metrics are good for imbalanced class problems.
4. Precision and r...
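
Three of the Unit 2 solutions above (questions 3, 8 and 19) make numerical claims that are easy to check. The following is a minimal Python sketch using only the standard library. The helper names pearson and entropy and the sample data are illustrative assumptions, not part of the original quiz, and the base-2 logarithm is also an assumption, since the quiz does not state a base.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def entropy(labels, base=2):
    """Shannon entropy H = -sum(p_i * log(p_i)) of a list of class labels.

    The base-2 default is an assumption; the quiz leaves the base unspecified.
    """
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


# Question 3: Y = X^2 on values symmetric about zero (hypothetical sample).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_square = pearson(x, y)

# Question 8: entropy of the target column given in the question.
target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)

# Question 19: shifting X by +2 and Y by -2 (hypothetical sample data).
a = [1.0, 2.0, 4.0, 7.0, 11.0]
b = [2.0, 1.0, 5.0, 6.0, 12.0]
c_before = pearson(a, b)
c_after = pearson([v + 2 for v in a], [v - 2 for v in b])

print("Q3  correlation of X and X^2:", r_square)
print("Q8  entropy in bits:", round(h, 4))
print("Q19 before/after shift:", c_before, c_after)
```

With this sample, the correlation in question 3 comes out as zero even though Y is fully determined by X, the entropy in question 8 matches option A evaluated with base-2 logarithms (about 0.954 bits), and the coefficient in question 19 is unchanged by the shifts up to floating-point rounding.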