jurafsky&martin_3rdEd_17 (1).pdf

We’ve shown a schematic of these backpointers in Fig. 2.17, after a similar diagram in Gusfield (1997). Some cells have multiple backpointers because the minimum extension could have come from multiple previous cells. In the second step, we perform a backtrace. In a backtrace, we start from the last cell (at the final row and column) and follow the pointers back through the dynamic programming matrix. Each complete path between the final cell and the initial cell is a minimum distance alignment. Exercise 2.7 asks you to modify the minimum edit distance algorithm to store the pointers and compute the backtrace to output an alignment.

        #   e   x   e   c   u   t   i   o   n
    #   0   1   2   3   4   5   6   7   8   9
    i   1   2   3   4   5   6   7   6   7   8
    n   2   3   4   5   6   7   8   7   8   7
    t   3   4   5   6   7   8   7   8   9   8
    e   4   3   4   5   6   7   8   9  10   9
    n   5   4   5   6   7   8   9  10  11  10
    t   6   5   6   7   8   9   8   9  10  11
    i   7   6   7   8   9  10   9   8   9  10
    o   8   7   8   9  10  11  10   9   8   9
    n   9   8   9  10  11  12  11  10   9   8

Figure 2.17 When entering a value in each cell, we mark which of the three neighboring cells we came from with up to three arrows. After the table is full we compute an alignment (minimum edit path) by using a backtrace, starting at the 8 in the lower-right corner and following the arrows back. The sequence of bold cells represents one possible minimum cost alignment between the two strings.

While we worked our example with simple Levenshtein distance, the algorithm in Fig. 2.15 allows arbitrary weights on the operations. For spelling correction, for example, substitutions are more likely to happen between letters that are next to each other on the keyboard. We’ll discuss how these weights can be estimated in Ch. 5. The Viterbi algorithm, for example, is an extension of minimum edit distance that uses probabilistic definitions of the operations. Instead of computing the “minimum edit distance” between two strings, Viterbi computes the “maximum probability alignment” of one string with another. We’ll discuss this more in Chapter 9.

2.5 Summary

This chapter introduced a fundamental tool in language processing, the regular expression, and showed how to perform basic text normalization tasks including word segmentation and normalization, sentence segmentation, and stemming. We also introduced the important minimum edit distance algorithm for comparing strings. Here’s a summary of the main points we covered about these ideas:

• The regular expression language is a powerful tool for pattern-matching.
• Basic operations in regular expressions include concatenation of symbols, disjunction of symbols ([], |, and .), counters (*, +, and {n,m}), anchors (^, $), and precedence operators (parentheses).
• Word tokenization and normalization are generally done by cascades of simple regular expression substitutions or finite automata.
• The Porter algorithm is a simple and efficient way to do stemming, stripping off affixes. It does not have high accuracy but may be useful for some tasks.
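The minimum edit distance computation and backtrace described above can be sketched in Python. This is a minimal illustration, not the book's Fig. 2.15 code or a solution to Exercise 2.7; `min_edit_distance`, `backtrace`, and the pointer labels are hypothetical names. Insertions and deletions cost 1 and the default substitution cost is 2 (0 for a match), which reproduces the values in Fig. 2.17. The `sub_cost` argument is where a weighting such as keyboard adjacency could be supplied, though no such weights are given here. When several pointers tie, the sketch simply follows the first one, so it returns one of possibly several minimum-cost alignments.

```python
from typing import Callable, List, Tuple

# Hypothetical labels for the three neighboring cells a value can come from.
DEL, INS, SUB = "up", "left", "diag"


def levenshtein_sub_cost(a: str, b: str) -> int:
    """Substitution cost consistent with Fig. 2.17: 0 for a match, 2 otherwise."""
    return 0 if a == b else 2


def min_edit_distance(
    source: str,
    target: str,
    del_cost: int = 1,
    ins_cost: int = 1,
    sub_cost: Callable[[str, str], int] = levenshtein_sub_cost,
) -> Tuple[int, List[List[List[str]]]]:
    """Fill the DP matrix, keeping every backpointer that achieves the minimum."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr: List[List[List[str]]] = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: delete all source characters
        D[i][0] = D[i - 1][0] + del_cost
        ptr[i][0] = [DEL]
    for j in range(1, m + 1):  # first row: insert all target characters
        D[0][j] = D[0][j - 1] + ins_cost
        ptr[0][j] = [INS]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = {
                DEL: D[i - 1][j] + del_cost,
                INS: D[i][j - 1] + ins_cost,
                SUB: D[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            }
            best = min(options.values())
            D[i][j] = best
            # A cell gets several pointers when more than one move ties.
            ptr[i][j] = [move for move, cost in options.items() if cost == best]
    return D[n][m], ptr


def backtrace(source: str, target: str, ptr: List[List[List[str]]]) -> List[Tuple[str, str]]:
    """Follow one chain of pointers from the last cell back to (0, 0)."""
    i, j = len(source), len(target)
    pairs: List[Tuple[str, str]] = []
    while i > 0 or j > 0:
        move = ptr[i][j][0]  # take the first of possibly several tied pointers
        if move == SUB:  # substitution, or a copy when the characters match
            pairs.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif move == DEL:
            pairs.append((source[i - 1], "*"))
            i -= 1
        else:  # INS
            pairs.append(("*", target[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


distance, pointers = min_edit_distance("intention", "execution")
print(distance)  # 8, the value in the lower-right cell of Fig. 2.17
for s, t in backtrace("intention", "execution", pointers):
    print(s, t)
```

Because every cell keeps all tied pointers, enumerating every minimum-cost alignment would only require exploring each branch instead of the first one.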
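The basic regular expression operations listed in the summary can be tried out with Python's standard `re` module. This is an illustrative sketch: the example strings are made up, and Python's dialect is assumed, although these particular operators behave the same way in most regex engines.

```python
import re

# Concatenation: characters must occur in sequence.
concat = re.findall(r"cat", "cat concatenate")                   # ['cat', 'cat']
# Disjunction: a character class [], the pipe |, and the wildcard . (any one character).
char_class = re.findall(r"[wW]oodchuck", "Woodchuck and woodchuck")  # ['Woodchuck', 'woodchuck']
pipe = re.findall(r"groundhog|woodchuck", "a groundhog, a woodchuck")  # ['groundhog', 'woodchuck']
wildcard = re.findall(r"beg.n", "begin began begun")             # ['begin', 'began', 'begun']
# Counters: * (zero or more), + (one or more), {n,m} (from n to m occurrences).
star = re.findall(r"baa*!", "ba! baa! baaaa!")                   # ['ba!', 'baa!', 'baaaa!']
plus = re.findall(r"baa+!", "ba! baa! baaaa!")                   # ['baa!', 'baaaa!']
bounded = re.findall(r"ba{2,3}!", "ba! baa! baaa! baaaa!")       # ['baa!', 'baaa!']
# Anchors: ^ matches only at the start of the string, $ only at the end.
start = re.findall(r"^The", "The cat saw The dog")               # ['The']
end = re.findall(r"dog$", "dog and dog")                         # ['dog']
# Precedence: parentheses limit the scope of the disjunction.
no_parens = re.fullmatch(r"guppy|ies", "guppies")                # None
parens = re.fullmatch(r"gupp(y|ies)", "guppies")                 # matches, group(1) == 'ies'

for name, value in [("concat", concat), ("star", star), ("bounded", bounded),
                    ("no_parens", no_parens), ("parens", parens and parens.group(0))]:
    print(name, value)
```

The last pair shows why precedence matters: without parentheses, `guppy|ies` means "guppy" or "ies", so neither alternative matches the whole word.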
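To make "cascades of simple regular expression substitutions" concrete, here is a toy tokenizer in which each rule rewrites the whole string before the next one runs. The rule list `CASCADE` and the function `tokenize` are hypothetical, and the clitic handling loosely imitates Penn Treebank conventions as an assumption; this is not the tokenizer the chapter describes.

```python
import re
from typing import List

# Hypothetical cascade of (pattern, replacement) rules, applied in order.
CASCADE = [
    (r'([.,!?;:()"])', r" \1 "),              # pad punctuation with spaces
    (r"(\w)n't\b", r"\1 n't"),                # don't -> do n't
    (r"(\w)'(s|re|ve|ll|d|m)\b", r"\1 '\2"),  # We'll -> We 'll, I'm -> I 'm
]


def tokenize(text: str) -> List[str]:
    """Run each substitution over the whole string, then split on whitespace."""
    for pattern, replacement in CASCADE:
        text = re.sub(pattern, replacement, text)
    return text.split()


print(tokenize("We'll discuss this more, I'm sure."))
# ['We', "'ll", 'discuss', 'this', 'more', ',', 'I', "'m", 'sure', '.']
print(tokenize("don't stop"))
# ['do', "n't", 'stop']
```

Rule order matters in a cascade: the `n't` rule runs before the general clitic rule, which would otherwise have no way to handle an apostrophe followed by `t`.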
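As a small taste of how the Porter algorithm strips affixes, the sketch below implements only its first rule group (Step 1a), which handles plural endings. The full stemmer continues with further steps whose rules depend on the "measure" of the remaining stem; those are omitted here, and the assumption of lowercase input is this sketch's, not the algorithm's.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter stemmer: the rule with the longest matching suffix wins."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS: caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I: ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS: caress -> caress
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing): cats -> cat
    return word           # no Step 1a rule applies


for w in ["caresses", "ponies", "ties", "caress", "cats", "cat"]:
    print(w, "->", porter_step_1a(w))
```

Outputs like "poni" and "ti" illustrate the low accuracy mentioned above: a stem only needs to group related forms, not be a real word.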