
# Learning Structured Prediction by Demonstration

SP/LBD @ CS 297 · Hal Daumé III ([email protected])
Computer Science, University of Maryland
CMSC 297 – That Honors Class
Wednesday 28 September 2011

## Examples of structured problems

## Examples of demonstrations

## Examples of demonstrations (cont'd)
## What does it mean to learn?

Informally: to predict the future based on the past.

Slightly less informally: to take labeled examples and construct a function that will label them as a human would.

Formally, given:

- a fixed, unknown distribution D over X × Y,
- a loss function over Y × Y,
- a finite sample of (x, y) pairs drawn i.i.d. from D,

construct a function f that has low expected loss with respect to D.

## Feature extractors

A feature extractor Φ maps examples to vectors. Feature vectors in NLP are frequently sparse.

Example input (a spam email, misspellings and all):

> Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …

Φ maps it to a sparse feature vector:

    W=dear     : 1
    W=sir      : 1
    W=this     : 2
    ...
    W=wish     : 0
    ...
    MISSPELLED : 2
    NAMELESS   : 1
    ALL_CAPS   : 0
    NUM_URLS   : 0
    ...
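A sparse extractor like the slide's Φ can be sketched as a dictionary of counts. This is an illustrative subset of the slide's features: word counts plus a couple of surface-pattern features (the `MISSPELLED` and `NAMELESS` features would need a dictionary and a name list, so they are omitted here).

```python
import re
from collections import Counter

def extract_features(text: str) -> dict:
    """Map a document to a sparse {feature_name: value} dict, like the slide's Phi.

    Zero-valued features are simply absent from the dict; that absence is
    exactly what makes the representation sparse.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    phi = dict(Counter(f"W={t}" for t in tokens))          # word-count features
    # Illustrative surface features (an assumption, not the slide's exact set):
    phi["ALL_CAPS"] = sum(w.isupper() and len(w) > 1 for w in text.split())
    phi["NUM_URLS"] = text.lower().count("http")
    return phi

spam = ("Dear Sir. First, I must solicit your confidence in this transaction, "
        "this is by virture of its nature as being utterly confidencial and top secret.")
phi = extract_features(spam)   # e.g. phi["W=this"] == 2, and "W=wish" is absent
```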
## The perceptron

Inputs are feature values Φ_1, Φ_2, Φ_3, …; the parameters are weights w_1, w_2, w_3, …. The sum Σ_i w_i Φ_i is the response. If the response is positive, output +1; if negative, output −1.

When training, update on errors. An "error" occurs when

    sign(Σ_i w_i Φ_i) ≠ y

and the update is

    w_i ← w_i + y Φ_i
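The error condition and update rule above fit in a few lines over sparse feature dicts; the tiny training set is a toy illustration, not from the slides.

```python
def predict(w: dict, phi: dict) -> int:
    """Perceptron response: sign of the weighted sum of feature values."""
    response = sum(w.get(f, 0.0) * v for f, v in phi.items())
    return 1 if response > 0 else -1

def perceptron_train(data, epochs: int = 10) -> dict:
    """Train by the slide's rule: on an error, w_i <- w_i + y * phi_i."""
    w = {}
    for _ in range(epochs):
        for phi, y in data:
            if predict(w, phi) != y:            # "error": sign(sum) != y
                for f, v in phi.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

# Toy data: spam (+1) vs. ham (-1), as sparse feature dicts.
data = [({"free": 1, "money": 1}, 1),
        ({"meeting": 1, "notes": 1}, -1)]
w = perceptron_train(data)
```

After training, `predict(w, {"free": 1, "money": 1})` returns +1.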

## Linear models for binary classification

The decision boundary is the set of "uncertain" points in feature space (f_1, f_2). Linear decision boundaries are characterized by weight vectors: a point Φ(x) is classified by the sign of Σ_i w_i Φ_i(x).

Example: x = "free money", with

    Φ(x):  BIAS : 1    free : 1    money : 1    the : 0   ...
    w:     BIAS : -3   free : 4    money : 2    the : 0   ...
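Plugging the slide's "free money" numbers into Σ_i w_i Φ_i(x) gives a worked example of which side of the boundary the point falls on:

```python
def score(w: dict, phi: dict) -> float:
    """Linear response w . Phi(x); the decision boundary is score == 0."""
    return sum(w[f] * phi.get(f, 0) for f in w)

# Weights and feature vector taken directly from the slide's example.
w   = {"BIAS": -3, "free": 4, "money": 2, "the": 0}
phi = {"BIAS": 1, "free": 1, "money": 1, "the": 0}

s = score(w, phi)   # -3 + 4 + 2 + 0 = 3: positive side of the boundary
```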
## Structured Prediction 101

Learn a function mapping inputs to complex outputs, f : X → Y; decoding maps the input space to the output space.

Sequence labeling: the sentence "I can can a can" has many candidate tag sequences, e.g.

    Pro Md Vb Dt Nn        Pro Md Nn Dt Nn        Pro Md Md Dt Nn
    Pro Md Vb Dt Vb        Pro Md Nn Dt Vb        Pro Md Md Dt Vb
    Pro Md Vb Dt Md        Pro Md Nn Dt Md

Other structured problems: parsing; coreference resolution ("Barack Obama … Joe Biden … Biden … he … the President"); machine translation ("Mary did not slap the green witch ." ↔ "Mary no dio una bofetada a la bruja verda .").
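The output space in the sequence-labeling example is combinatorial: with a tag set of size k, an n-word sentence has k^n candidate taggings. A quick count, using the tag set from the slide:

```python
from itertools import product

tags = ["Pro", "Md", "Vb", "Dt", "Nn"]      # tag set from the slide
sentence = "I can can a can".split()

# Every assignment of one tag per word is a candidate output structure.
candidates = list(product(tags, repeat=len(sentence)))
n = len(candidates)                          # 5 tags, 5 words -> 5**5 = 3125
```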

## Why is structure important?

Correlations among outputs:

- Determiners often precede nouns.
- Sentences usually have verbs.

My objective (aka "loss function") forces it:

- Translations should have good sequences of words.
- Summaries should be coherent.
## Argmax is hard!
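The argmax in question is finding the highest-scoring output structure; brute force must enumerate all k^n taggings, which is only feasible for toy inputs. A sketch under an assumed toy score (additive per-word and tag-bigram weights, chosen here for illustration, not from the slides):

```python
from itertools import product

def sequence_score(words, tag_seq, emit, trans):
    """Toy additive score: per-word emission weights plus tag-bigram weights."""
    s = sum(emit.get((w, t), 0.0) for w, t in zip(words, tag_seq))
    s += sum(trans.get(bigram, 0.0) for bigram in zip(tag_seq, tag_seq[1:]))
    return s

def brute_force_argmax(words, tags, emit, trans):
    """Exhaustive argmax over all len(tags)**len(words) taggings.

    This is exactly the hardness on the slide: the candidate set grows
    exponentially with sentence length, so enumeration does not scale.
    """
    return max(product(tags, repeat=len(words)),
               key=lambda seq: sequence_score(words, seq, emit, trans))

words = "I can can".split()
tags = ["Pro", "Md", "Vb"]
emit = {("I", "Pro"): 2.0, ("can", "Md"): 1.0, ("can", "Vb"): 1.0}
trans = {("Pro", "Md"): 1.0, ("Md", "Vb"): 1.0}   # rewards the Pro-Md-Vb pattern

best = brute_force_argmax(words, tags, emit, trans)
```

With this score the winning tagging is `("Pro", "Md", "Vb")`; structured prediction methods exist largely to compute (or approximate) this argmax without the exponential enumeration.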

