
Question

The Levenshtein distance

Written Assignment 3: The Levenshtein Distance
A spell checker is a word processing program that makes suggestions when it finds a word not in
the dictionary. To determine which words to suggest, it tries to find similar words. One measure
of word similarity is the Levenshtein distance: the minimum number of single-letter substitutions,
additions, or deletions required to change one word into another.
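As a minimal sketch of how this distance can be computed, here is the standard dynamic-programming (Wagner-Fischer) table in Python, assuming every substitution, addition, and deletion costs 1. The function name `levenshtein` is illustrative, not something the assignment defines.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter substitutions, additions, or
    deletions needed to change word `a` into word `b`."""
    # prev[j] = distance between the first i-1 letters of a and the first j letters of b
    prev = list(range(len(b) + 1))  # "" -> b[:j] takes j additions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # add cb to a
                prev[j - 1] + (ca != cb),  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


for x, y in [("spit", "spot"), ("spit", "pit"), ("spit", "spite"), ("spit", "soot")]:
    print(x, y, levenshtein(x, y))
# spit spot 1
# spit pit 1
# spit spite 1
# spit soot 2
```

The table only keeps two rows at a time, so memory grows with the length of `b` rather than with the product of both word lengths.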
For example, the words spit and spot are a distance of 1 apart; changing spit to spot requires
one substitution (i for o). Likewise, spit is distance 1 from pit since the change requires one
deletion (the s). The word spite is also distance 1 from spit since it requires one addition (the
e). The word soot is distance 2 from spit since two substitutions would be required (i for o
and p for o). This situation can be represented using the graph below whose vertices are the
words and the edges connect words at distance one.
[Figure: graph with vertices spite, spit, spot, pit, and soot; an edge joins each pair of words at distance 1.]
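The graph above can be rebuilt mechanically. The sketch below assumes a small word list is given and joins two words whenever they are exactly one edit apart; instead of computing the full distance, the hypothetical helper `one_edit_apart` only checks for a single substitution, addition, or deletion, which is cheaper.

```python
from itertools import combinations


def one_edit_apart(a: str, b: str) -> bool:
    """True when a and b are at Levenshtein distance exactly 1."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: the words must differ in exactly one position (one substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one letter from the longer word must give the shorter.
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def distance_one_graph(words):
    """Adjacency sets: an edge joins every pair of words at distance 1."""
    graph = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if one_edit_apart(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


graph = distance_one_graph(["spite", "spit", "spot", "pit", "soot"])
for word in sorted(graph):
    print(word, sorted(graph[word]))
# pit ['spit']
# soot ['spot']
# spit ['pit', 'spite', 'spot']
# spite ['spit']
# spot ['soot', 'spit']
```

Comparing every pair is quadratic in the number of words, which is fine for a handful of candidates but not for a whole dictionary.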
Here is another example. There are several words at distance 1 from the misspelled word
"aed": aid, and, led, med. These words are included in the following graph, together with the
words mad and let, which are at distance 2 from aed. Note that the three words aed, aid, and
"and" differ only in the middle letter, so each is at distance 1 from the other two, forming a
'triangle' in the graph.
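[Figure: graph with vertices aed, aid, "and", led, med, mad, and let; edges join words at distance 1, including the aed/aid/"and" triangle.]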
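A spell checker could find the distance-1 neighbors of a misspelled word such as "aed" without scanning the whole dictionary by generating every string one edit away and keeping those that are real words. This is a sketch under stated assumptions: words use only the letters a-z, and `DICTIONARY` is a hypothetical mini-dictionary made of the words in this example.

```python
import string


def edits1(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Every string exactly one deletion, substitution, or addition away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right
                     for c in alphabet if c != right[0]}
    additions = {left + c + right for left, right in splits for c in alphabet}
    return deletions | substitutions | additions


def suggestions(word: str, dictionary: set[str]) -> list[str]:
    """Dictionary words at Levenshtein distance 1 from word."""
    return sorted(edits1(word) & dictionary)


# Hypothetical mini-dictionary built from the words in the example above.
DICTIONARY = {"aid", "and", "led", "med", "mad", "let"}
print(suggestions("aed", DICTIONARY))  # ['aid', 'and', 'led', 'med']
```

For a word of length n this generates on the order of 53n + 26 strings, independent of dictionary size, so a set lookup replaces a full dictionary scan.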

a. Create a graph using words as vertices, and edges connecting words with a Levenshtein
distance of 1. Use the misspelled word "moke" as the center, and try to find at least 10
connected dictionary words. How might a spell checker use this graph?
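One plausible way to build and use such a graph is a breadth-first search outward from the misspelled word: the first ring holds words one edit away, the second ring words two edges away, and a spell checker could offer the first ring before the second. The word list below is hypothetical and chosen only for illustration; a real checker would search its full dictionary. Note that the hop count in this graph can exceed the true Levenshtein distance when no dictionary word lies on a shortest edit path.

```python
from collections import deque


def one_edit_apart(a: str, b: str) -> bool:
    """True when a and b are at Levenshtein distance exactly 1."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def rings(center: str, words: list[str], max_hops: int = 2) -> dict[int, list[str]]:
    """Breadth-first search from center through the distance-1 graph.
    Returns {hops: sorted words} for every word within max_hops edges."""
    hops = {center: 0}
    queue = deque([center])
    while queue:
        word = queue.popleft()
        if hops[word] == max_hops:
            continue  # do not expand past the outer ring
        for other in words:
            if other not in hops and one_edit_apart(word, other):
                hops[other] = hops[word] + 1
                queue.append(other)
    result: dict[int, list[str]] = {}
    for word, h in hops.items():
        if h > 0:
            result.setdefault(h, []).append(word)
    return {h: sorted(ws) for h, ws in sorted(result.items())}


# Hypothetical word list for illustration only.
WORDS = ["make", "mike", "mode", "mole", "more", "coke", "joke", "poke",
         "woke", "smoke", "smoky", "mile", "mate"]
for h, ws in rings("moke", WORDS).items():
    print(h, ws)
# 1 ['coke', 'joke', 'make', 'mike', 'mode', 'mole', 'more', 'poke', 'smoke', 'woke']
# 2 ['mate', 'mile', 'smoky']
```

Running it on the "aed" example instead returns aid, and, led, med in ring 1 and let, mad in ring 2, matching the distances given for that example.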
b. Improve the method from above by assigning a weight to each edge based on the
likelihood of making the substitution, addition, or deletion. You can base the weights
on any reasonable approach: proximity of keys on a keyboard, common language
errors, etc. Include the weights on your graph from part (a) and explain how you
assigned the weights.
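As one possible weighting (an assumption, not the assignment's prescribed scheme), the sketch below charges 0.5 for substituting a letter with a neighboring key on a QWERTY keyboard and 1.0 for every other substitution, addition, or deletion. The row offsets used to decide which keys are neighbors only approximate the keyboard's stagger, and the names `keys_adjacent`, `sub_cost`, and `weighted_levenshtein` are hypothetical.

```python
# Assumed QWERTY layout: each row's horizontal offset approximates the key stagger.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
POS = {ch: (r, col + offset)
       for r, (keys, offset) in enumerate(ROWS)
       for col, ch in enumerate(keys)}

ADJACENT_COST = 0.5  # hypothetical: a slip onto a neighboring key counts as half an error
DEFAULT_COST = 1.0   # any other substitution, addition, or deletion


def keys_adjacent(a: str, b: str) -> bool:
    """Rough test for physically neighboring keys on the assumed layout."""
    if a not in POS or b not in POS or a == b:
        return False
    (ra, xa), (rb, xb) = POS[a], POS[b]
    if ra == rb:
        return abs(xa - xb) == 1  # left/right neighbor in the same row
    return abs(ra - rb) == 1 and abs(xa - xb) < 1  # diagonal neighbor one row away


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return ADJACENT_COST if keys_adjacent(a, b) else DEFAULT_COST


def weighted_levenshtein(a: str, b: str) -> float:
    """Levenshtein DP where substitution costs depend on key proximity."""
    prev = [j * DEFAULT_COST for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * DEFAULT_COST]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + DEFAULT_COST,            # delete ca
                            curr[j - 1] + DEFAULT_COST,        # add cb
                            prev[j - 1] + sub_cost(ca, cb)))   # substitute
        prev = curr
    return prev[-1]


for word in ["mike", "mole", "joke", "make", "poke", "smoke"]:
    print(word, weighted_levenshtein("moke", word))
# mike 0.5
# mole 0.5
# joke 0.5
# make 1.0
# poke 1.0
# smoke 1.0
```

Under this scheme, mike, mole, and joke look like likely typos for "moke", while make, poke, and smoke keep weight 1.0. Other error sources the assignment mentions, such as common language errors, are not modeled here.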
Note that these weights and Dijkstra's algorithm can be used to find the shortest path
from any word to "moke". A word with the shortest distance to "moke" is a good candidate
for the spell checker to suggest.
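A minimal sketch of that last step, using Python's `heapq` module as the priority queue. The edge weights are hypothetical values following a keyboard-proximity idea (0.5 for an adjacent-key substitution, 1.0 otherwise); because the graph is undirected, running Dijkstra's algorithm once from "moke" gives the shortest distance from every word to "moke".

```python
import heapq


def build_graph(edges):
    """Undirected adjacency map {word: {neighbor: weight}} from (u, v, weight) triples."""
    graph: dict[str, dict[str, float]] = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra(graph, source):
    """Shortest weighted distance from source to every reachable word."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, word = heapq.heappop(heap)
        if d > dist[word]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, w in graph[word].items():
            nd = d + w
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical edge weights (0.5 = adjacent-key substitution, 1.0 = anything else).
EDGES = [("moke", "mike", 0.5), ("moke", "mole", 0.5), ("moke", "make", 1.0),
         ("moke", "smoke", 1.0), ("mike", "mile", 0.5), ("mole", "mile", 0.5),
         ("make", "mate", 1.0), ("smoke", "smoky", 1.0)]

dist = dijkstra(build_graph(EDGES), "moke")
ranked = sorted((d, w) for w, d in dist.items() if w != "moke")
for d, w in ranked:
    print(f"{w}: {d}")
# mike: 0.5
# mole: 0.5
# make: 1.0
# mile: 1.0
# smoke: 1.0
# mate: 2.0
# smoky: 2.0
```

Sorting by distance ranks mike and mole first, ahead of make, mile, and smoke. Mile ranks high even though it is two edits from "moke", because both of its edits are adjacent-key slips.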