jurafsky&martin_3rdEd_17 (1).pdf



We've shown a schematic of these backpointers in Fig. 2.17, after a similar diagram in Gusfield (1997). Some cells have multiple backpointers because the minimum extension could have come from multiple previous cells. In the second step, we perform a backtrace. In a backtrace, we start from the last cell (at the final row and column), and follow the pointers back through the dynamic programming matrix. Each complete path between the final cell and the initial cell is a minimum distance alignment. Exercise 2.7 asks you to modify the minimum edit distance algorithm to store the pointers and compute the backtrace to output an alignment.

| | # | e | x | e | c | u | t | i | o | n |
|---|---|---|---|---|---|---|---|---|---|---|
| # | **0** | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| i | **1** | ↖←↑ 2 | ↖←↑ 3 | ↖←↑ 4 | ↖←↑ 5 | ↖←↑ 6 | ↖←↑ 7 | ↖ 6 | ← 7 | ← 8 |
| n | 2 | **↖←↑ 3** | ↖←↑ 4 | ↖←↑ 5 | ↖←↑ 6 | ↖←↑ 7 | ↖←↑ 8 | ↑ 7 | ↖←↑ 8 | ↖ 7 |
| t | 3 | ↖←↑ 4 | **↖←↑ 5** | ↖←↑ 6 | ↖←↑ 7 | ↖←↑ 8 | ↖ 7 | ←↑ 8 | ↖←↑ 9 | ↑ 8 |
| e | 4 | ↖ 3 | ← 4 | **↖← 5** | **← 6** | ← 7 | ←↑ 8 | ↖←↑ 9 | ↖←↑ 10 | ↑ 9 |
| n | 5 | ↑ 4 | ↖←↑ 5 | ↖←↑ 6 | ↖←↑ 7 | **↖←↑ 8** | ↖←↑ 9 | ↖←↑ 10 | ↖←↑ 11 | ↖↑ 10 |
| t | 6 | ↑ 5 | ↖←↑ 6 | ↖←↑ 7 | ↖←↑ 8 | ↖←↑ 9 | **↖ 8** | ← 9 | ← 10 | ←↑ 11 |
| i | 7 | ↑ 6 | ↖←↑ 7 | ↖←↑ 8 | ↖←↑ 9 | ↖←↑ 10 | ↑ 9 | **↖ 8** | ← 9 | ← 10 |
| o | 8 | ↑ 7 | ↖←↑ 8 | ↖←↑ 9 | ↖←↑ 10 | ↖←↑ 11 | ↑ 10 | ↑ 9 | **↖ 8** | ← 9 |
| n | 9 | ↑ 8 | ↖←↑ 9 | ↖←↑ 10 | ↖←↑ 11 | ↖←↑ 12 | ↑ 11 | ↑ 10 | ↑ 9 | **↖ 8** |

Figure 2.17 When entering a value in each cell, we mark which of the three neighboring cells we came from with up to three arrows. After the table is full we compute an alignment (minimum edit path) by using a backtrace, starting at the 8 in the lower-right corner and following the arrows back. The sequence of bold cells represents one possible minimum cost alignment between the two strings.

While we worked our example with simple Levenshtein distance, the algorithm in Fig. 2.15 allows arbitrary weights on the operations. For spelling correction, for example, substitutions are more likely to happen between letters that are next to each other on the keyboard. We'll discuss how these weights can be estimated in Ch. 5. The Viterbi algorithm, for example, is an extension of minimum edit distance that uses probabilistic definitions of the operations. Instead of computing the "minimum edit distance" between two strings, Viterbi computes the "maximum probability alignment" of one string with another. We'll discuss this more in Chapter 9.

## 2.5 Summary

This chapter introduced a fundamental tool in language processing, the regular expression, and showed how to perform basic text normalization tasks including word segmentation and normalization, sentence segmentation, and stemming. We also introduced the important minimum edit distance algorithm for comparing strings. Here's a summary of the main points we covered about these ideas:

• The regular expression language is a powerful tool for pattern-matching.
• Basic operations in regular expressions include concatenation of symbols, disjunction of symbols (`[]`, `|`, and `.`), counters (`*`, `+`, and `{n,m}`), anchors (`^`, `$`) and precedence operators (`(`, `)`).
• Word tokenization and normalization are generally done by cascades of simple regular expression substitutions or finite automata.
• The Porter algorithm is a simple and efficient way to do stemming, stripping off affixes. It does not have high accuracy but may be useful for some tasks.
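To make the backpointer and backtrace steps described with Fig. 2.17 concrete, here is a minimal Python sketch. It is one possible way to approach the idea, not the book's own code: the function name `min_edit_alignment`, the `"diag"`/`"left"`/`"up"` labels, and the `*` gap symbol are our own choices, and we assume insertion and deletion cost 1 and substitution costs 2, which is the setting that yields the 8 in the lower-right corner of the figure. When a cell has several backpointers the sketch simply follows the first one, so it returns only one of the possible minimum cost alignments.

```python
def min_edit_alignment(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Return (distance, alignment) for two strings.

    alignment is a list of (source_char, target_char) pairs;
    '*' marks a gap (an insertion or a deletion).
    Costs are assumptions chosen to match the worked example.
    """
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] holds every neighbor that achieves the minimum for cell (i, j).
    back = [[[] for _ in range(m + 1)] for _ in range(n + 1)]

    # First column: only deletions are possible; first row: only insertions.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
        back[i][0] = ["up"]
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
        back[0][j] = ["left"]

    # Step 1: fill the table, storing backpointers as we go.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            candidates = {
                "diag": dist[i - 1][j - 1] + (0 if same else sub_cost),
                "left": dist[i][j - 1] + ins_cost,
                "up": dist[i - 1][j] + del_cost,
            }
            best = min(candidates.values())
            dist[i][j] = best
            back[i][j] = [move for move, cost in candidates.items() if cost == best]

    # Step 2: backtrace from the final cell to the initial cell.
    alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j][0]  # follow the first stored pointer
        if move == "diag":
            alignment.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif move == "left":
            alignment.append(("*", target[j - 1]))
            j -= 1
        else:  # "up"
            alignment.append((source[i - 1], "*"))
            i -= 1
    alignment.reverse()
    return dist[n][m], alignment


distance, pairs = min_edit_alignment("intention", "execution")
print(distance)                        # 8
print(" ".join(s for s, _ in pairs))   # i n t e * n t i o n
print(" ".join(t for _, t in pairs))   # * e x e c u t i o n
```

Passing different values for `ins_cost`, `del_cost`, and `sub_cost` gives a crude form of weighting; per-letter weights, such as cheaper substitutions between neighboring keys, would need a cost function in place of the single `sub_cost` number.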
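The regular expression operations listed in the summary can be tried out with Python's standard `re` module. The snippet below is only an illustrative sketch: the sample strings and the toy `tokenize` function are hypothetical, and the two-step substitution cascade is far simpler than a real tokenizer.

```python
import re

sheep = "ba baa baaa baaaa"

# Concatenation: symbols must appear in sequence.
concat = re.findall(r"cat", "the cat sat")          # ['cat']

# Disjunction: [] picks one character from a set, | picks one alternative,
# and . matches any single character (except a newline by default).
bracket = re.findall(r"[Tt]he", "The other")        # ['The', 'the']
pipe = re.findall(r"cat|dog", "cat dog cow")        # ['cat', 'dog']
dot = re.findall(r"c.t", "cat cut coat")            # ['cat', 'cut']

# Counters: * is zero or more, + is one or more, {n,m} is between n and m.
star = re.findall(r"baa*", sheep)                   # ['ba', 'baa', 'baaa', 'baaaa']
plus = re.findall(r"baa+", sheep)                   # ['baa', 'baaa', 'baaaa']
braces = [w for w in sheep.split() if re.fullmatch(r"ba{2,3}", w)]  # ['baa', 'baaa']

# Anchors: ^ matches at the start of the string, $ at the end.
starts = bool(re.search(r"^The", "The end"))        # True
ends = bool(re.search(r"end$", "The end"))          # True
not_start = bool(re.search(r"^end", "The end"))     # False

# Precedence: parentheses make the counter apply to the whole group.
grouped = re.fullmatch(r"(ha)+", "hahaha") is not None    # True
ungrouped = re.fullmatch(r"ha+", "hahaha") is not None    # False


def tokenize(text):
    """Toy tokenizer: a cascade of two regular expression substitutions."""
    text = re.sub(r"([.,!?;])", r" \1 ", text)  # split punctuation off words
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.lower().split()


print(tokenize("The cat sat. The end!"))
# ['the', 'cat', 'sat', '.', 'the', 'end', '!']
```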