ANDREW ID (CAPITALS):

NAME (CAPITALS):

10-701/15-781 Final, Fall 2003

You have 3 hours. There are 10 questions. If you get stuck on one question, move on to others and come back to the difficult question later. The maximum possible total score is 100. Unless otherwise stated there is no need to show your working. Good luck!
1 Short Questions (16 points)

(a) Traditionally, when we have a real-valued input attribute during decision-tree learning we consider a binary split according to whether the attribute is above or below some threshold. Pat suggests that instead we should just have a multiway split with one branch for each of the distinct values of the attribute. From the list below choose the single biggest problem with Pat's suggestion (an illustrative code sketch follows this question block):

(i) It is too computationally expensive.
(ii) It would probably result in a decision tree that scores badly on the training set and a test set.
(iii) It would probably result in a decision tree that scores well on the training set but badly on a test set.
(iv) It would probably result in a decision tree that scores well on a test set but badly on a training set.

(b) You have a dataset with three categorical input attributes A, B and C. There is one categorical output attribute Y. You are trying to learn a Naive Bayes Classifier for predicting Y. Which of these Bayes Net diagrams represents the Naive Bayes classifier assumption (stated algebraically in the note after this block)?

[Figure: four candidate Bayes net diagrams over the nodes A, B, C and Y, labeled (i) through (iv), each connecting the nodes in a different way.]

(c) For a neural network, which one of these structural assumptions is the one that most affects the trade-off between underfitting (i.e. a high bias model) and overfitting (i.e. a high variance model):

(i) The number of hidden nodes
(ii) The learning rate
(iii) The initial choice of weights
(iv) The use of a constant-term unit input
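For question (a), here is a minimal sketch of why a per-value multiway split memorizes the training data. Everything in it (the synthetic data, the helper names, and the fixed threshold) is invented for illustration and is not part of the exam.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a real-valued attribute x whose sign weakly predicts
# the label y (20% of the labels are flipped as noise).
def make_data(n):
    x = rng.normal(size=n)
    y = (x > 0).astype(int) ^ (rng.random(n) < 0.2).astype(int)
    return x, y

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

# Pat's multiway split: one leaf per distinct training value of x.
# Real values are almost surely all distinct, so this memorizes the
# training set; unseen test values fall through to a majority default.
leaves = dict(zip(x_train, y_train))
default = int(y_train.mean() >= 0.5)

def multiway_predict(xs):
    return np.array([leaves.get(v, default) for v in xs])

# Traditional binary split at a single threshold (0 here, for simplicity).
def threshold_predict(xs):
    return (xs > 0).astype(int)

for name, predict in [("multiway", multiway_predict),
                      ("threshold", threshold_predict)]:
    train_acc = (predict(x_train) == y_train).mean()
    test_acc = (predict(x_test) == y_test).mean()
    print(f"{name:9s}  train acc = {train_acc:.2f}  test acc = {test_acc:.2f}")

On a typical run the multiway rule gets perfect training accuracy but near-chance test accuracy, while the single threshold scores around 0.8 on both: exactly the pattern described by option (iii).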
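A quick algebraic reference for question (b): the Naive Bayes assumption is that the input attributes are conditionally independent given the class, i.e.

    P(A, B, C | Y) = P(A | Y) * P(B | Y) * P(C | Y)

which corresponds to the Bayes net in which Y is the sole parent of each of A, B and C, with no arcs among A, B and C themselves.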
(d) For polynomial regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting (a sketch of this trade-off follows this question block):

(i) The polynomial degree
(ii) Whether we learn the weights by matrix inversion or gradient descent
(iii) The assumed variance of the Gaussian noise
(iv) The use of a constant-term unit input

(e) For a Gaussian Bayes classifier, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:

(i) Whether we learn the class centers by Maximum Likelihood or Gradient Descent
(ii) Whether we assume full class covariance matrices or diagonal class covariance matrices
(iii) Whether we have equal class priors or priors estimated from the data
(iv) Whether we allow classes to have different mean vectors or we force them to share the same mean vector

(f) For Kernel Regression, which one of these structural assumptions is the one that most affects the trade-off between underfitting and overfitting:

(i) Whether the kernel function is Gaussian versus triangular versus box-shaped
(ii) Whether we use Euclidean versus L1 versus L-infinity metrics
(iii) The kernel width
(iv) The maximum height of the kernel function

(g) (True or False) Given two classifiers A and B, if A has a lower VC-dimension than B then A will almost certainly perform better on a test set.
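Questions (d) and (f) both turn on a single capacity knob (the polynomial degree, the kernel width). Here is a minimal sketch of the degree version, fitting polynomials of increasing degree to noisy data with numpy and comparing training and test error; the data-generating function and sample sizes are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a smooth target function (invented for illustration).
def make_data(n):
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    def mse(x, y):
        return np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}  train MSE = {mse(x_train, y_train):.3f}  "
          f"test MSE = {mse(x_test, y_test):.3f}")

Degree 1 underfits (both errors high), a moderate degree tracks the signal, and degree 15 drives training error far below test error, the overfitting pattern. The kernel width in question (f) behaves the same way: a very wide kernel underfits and a very narrow one overfits.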
