Jurafsky & Martin, 3rd Ed., Chapter 17 (excerpt from Section 21.2, Relation Extraction)

As with named entity recognition, the most important step in this process is to identify surface features that will be useful for relation classification. Let’s look at some common features in the context of classifying the relationship between American Airlines (Mention 1, or M1) and Tim Wagner (Mention 2, or M2) from this sentence:

(21.5) American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said

Useful word features include
  • The headwords of M1 and M2 and their concatenation
    Airlines   Wagner   Airlines-Wagner
  • Bag-of-words and bigrams in M1 and M2
    American, Airlines, Tim, Wagner, American Airlines, Tim Wagner
  • Words or bigrams in particular positions
    M2: -1 spokesman
    M2: +1 said
  • Bag of words or bigrams between M1 and M2
    a, AMR, of, immediately, matched, move, spokesman, the, unit
  • Stemmed versions of the same

Useful named entity features include
  • Named-entity types and their concatenation
    (M1: ORG, M2: PER, M1M2: ORG-PER)
  • Entity level of M1 and M2 (from the set NAME, NOMINAL, PRONOUN)
    M1: NAME [it would be PRONOUN; the company would be NOMINAL]
    M2: NAME [he would be PRONOUN]
  • Number of entities between the arguments (in this case 1, for AMR)
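A minimal Python sketch of the word and named-entity features listed above, applied to example (21.5). It assumes the sentence is already tokenized and that mention spans, entity types, and entity levels come from an upstream NER step. The `Mention` class, the `relation_features` function, and the feature key names are hypothetical, and the headword is approximated by the last token of each mention rather than found with a parser. Stemmed variants are omitted.

```python
import string
from dataclasses import dataclass


@dataclass
class Mention:
    start: int    # index of the mention's first token
    end: int      # index one past its last token
    ne_type: str  # named-entity type, e.g. "ORG" or "PER"
    level: str    # entity level: "NAME", "NOMINAL", or "PRONOUN"


def is_word(tok):
    # Treat tokens made only of punctuation (e.g. ",") as non-words.
    return not all(ch in string.punctuation for ch in tok)


def bigrams(words):
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def relation_features(tokens, m1, m2, entities):
    """Word and named-entity features for a mention pair where m1 precedes m2."""
    f = {}
    w1, w2 = tokens[m1.start:m1.end], tokens[m2.start:m2.end]

    # Headwords and their concatenation (assumption: head = last token).
    h1, h2 = w1[-1], w2[-1]
    f["M1_head"], f["M2_head"] = h1, h2
    f["M1M2_heads"] = f"{h1}-{h2}"

    # Bag of words and bigrams inside M1 and M2.
    f["mention_bow"] = set(w1) | set(w2) | set(bigrams(w1)) | set(bigrams(w2))

    # Words in particular positions relative to the mentions.
    f["M1:-1"] = tokens[m1.start - 1] if m1.start > 0 else "NONE"
    f["M2:-1"] = tokens[m2.start - 1]
    f["M2:+1"] = tokens[m2.end] if m2.end < len(tokens) else "NONE"

    # Bag of words and bigrams between the mentions, punctuation dropped.
    between = [t for t in tokens[m1.end:m2.start] if is_word(t)]
    f["between_bow"] = set(between)
    f["between_bigrams"] = set(bigrams(between))

    # Named-entity types, their concatenation, and entity levels.
    f["M1_type"], f["M2_type"] = m1.ne_type, m2.ne_type
    f["M1M2_types"] = f"{m1.ne_type}-{m2.ne_type}"
    f["M1_level"], f["M2_level"] = m1.level, m2.level

    # Number of other entity mentions lying entirely between M1 and M2.
    f["n_entities_between"] = sum(
        1 for e in entities if e.start >= m1.end and e.end <= m2.start
    )
    return f


# Example (21.5), tokenized with commas as separate tokens.
tokens = ("American Airlines , a unit of AMR , immediately matched "
          "the move , spokesman Tim Wagner said").split()
m1 = Mention(0, 2, "ORG", "NAME")    # American Airlines
amr = Mention(6, 7, "ORG", "NAME")   # AMR
m2 = Mention(14, 16, "PER", "NAME")  # Tim Wagner
feats = relation_features(tokens, m1, m2, [m1, amr, m2])
print(feats["M1M2_heads"], feats["M1M2_types"], feats["n_entities_between"])
# Airlines-Wagner ORG-PER 1
```

Keeping the features as a flat dictionary makes it straightforward to one-hot encode each key/value pair (for instance `M1M2_types=ORG-PER`) as input to any standard classifier; that encoding step is not shown here.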

Finally, the syntactic structure of a sentence can signal many of the relationships among its entities. One simple and effective way to featurize a structure is to use strings representing syntactic paths: the path traversed through the tree in getting from one mention to the other. Constituency and dependency paths can both be helpful.
  • Base syntactic chunk sequence from M1 to M2
    NP NP PP VP NP NP
  • Constituent paths between M1 and M2
    NP ↑ NP ↑ S ↑ S ↓ NP
  • Dependency-tree paths
    Airlines ←subj matched ←comp said →subj Wagner

Figure 21.13 summarizes many of the features we have discussed that could be used for classifying the relationship between American Airlines and Tim Wagner from our example text.

  M1 headword              airlines
  M2 headword              Wagner
  Word(s) before M1        NONE
  Word(s) after M2         said
  Bag of words between     {a, unit, of, AMR, immediately, matched, the, move, spokesman}
  M1 type                  ORG
  M2 type                  PER
  Concatenated types       ORG-PER
  Constituent path         NP ↑ NP ↑ S ↑ S ↓ NP
  Base phrase path         NP → NP → PP → NP → VP → NP → NP
  Typed-dependency path    Airlines ←subj matched ←comp said →subj Wagner

Figure 21.13 Sample of features extracted during classification of the <American Airlines, Tim Wagner> tuple; M1 is the first mention, M2 the second.

Supervised systems can achieve high accuracy with enough hand-labeled training data, provided the test set is similar enough to the training set. But labeling a large training set is extremely expensive, and supervised models are brittle: they don’t generalize well to different genres.

21.2.3 Semisupervised Relation Extraction via Bootstrapping

Supervised machine learning assumes that we have a large collection of previously annotated material with which to train classifiers. Unfortunately, such collections are hard to come by.