in an HTML document
• create variables from information found in text
• clean and transform text into a uniform format, resolving inconsistencies in format between files
• mine text by treating documents directly as data
• “scrape” the web for data
• A regular expression (aka regex or regexp) is a pattern that describes a set of strings.
• This set may be finite or infinite, depending on the particular regexp. We say the regexp “matches” each element of that set.
• For example, the regexp grey|gray matches both grey and gray, whereas ^A.* matches any string starting with capital A.
• The idea is similar to wildcards in UNIX, but with many more possibilities.
Syntax:
• Literal characters are matched only by the character itself.
• A character...
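The two example patterns from the text, grey|gray and ^A.*, can be tried out in a short sketch. The text does not name a programming language, so Python's standard re module is an assumption here, and the helper name matches and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# Patterns taken from the text above.
alternation = re.compile(r"grey|gray")   # a finite set: exactly two strings
starts_with_a = re.compile(r"^A.*")      # an infinite set: anything starting with capital A


def matches(pattern, text):
    """Return True if the whole string belongs to the set the pattern describes."""
    # fullmatch requires the entire string to match, not just a substring.
    return pattern.fullmatch(text) is not None


# Hypothetical sample strings, not from the original text.
samples = ["grey", "gray", "groy", "Apple", "apple"]
for s in samples:
    print(s, matches(alternation, s), matches(starts_with_a, s))
```

The sketch uses fullmatch rather than search because the text defines a regexp as describing a set of strings, so the question asked is whether a whole string is a member of that set.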
