More Special Characters (^ * +) and Replacing the Matched
pattern: ^[0-9][0-9]*⊔
replacement: ⊔
Discussion
• “^” matches the beginning of a line
• “*” means “0 or more repetitions of the preceding character”
More concise but equivalent patterns
• ^[0-9]+⊔ : “+” means “one or more repetitions of the preceding character”
• ^\d+⊔ : “\d” matches any digit, so “\d+⊔” matches any string of one or more digits followed by a blank
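You can check that the three patterns are equivalent by running them side by side. The sketch below assumes Python's re module, writes ⊔ as an ordinary blank, and uses made-up sample lines. It passes re.ASCII so that \d means exactly [0-9], because by default Python's \d also matches non-ASCII digits.

```python
import re

# The slide's three patterns, with ⊔ written as an ordinary blank " ".
# re.ASCII restricts \d to [0-9]; by default Python's \d also accepts
# other Unicode digits.
PATTERNS = [
    re.compile(r"^[0-9][0-9]* ", re.ASCII),
    re.compile(r"^[0-9]+ ", re.ASCII),
    re.compile(r"^\d+ ", re.ASCII),
]


def leading_number_matches(line):
    """Return what each pattern matches in line (None where it fails)."""
    results = []
    for pattern in PATTERNS:
        m = pattern.search(line)
        results.append(m.group(0) if m else None)
    return results


# Hypothetical sample lines, not taken from the slide.
for line in ["12 apples", "7 dwarfs", "x12 nope", "2018", " 42 leading blank"]:
    found = leading_number_matches(line)
    assert len(set(found)) == 1  # all three patterns agree on this line
    print(repr(line), "->", repr(found[0]))
```

The ^ anchors each pattern at the start of the line. That is why "x12 nope" fails even though it contains "12 ", and why "2018" fails: no blank follows the digits.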
CS200 Winter 2018
Regular Expressions
Excluding Characters
pattern: "[^"]+"
Source text: <a href="">
What’s matched: <a href="">
Discussion
• [^ab0-3] matches any character except a, b, 0, 1, 2, or 3
• ^ has this meaning only at the beginning of a character class, i.e., when it comes immediately after a [
Question: could we instead have used the following pattern to match URLs in HTML?
".+"
Hint: + and * are “greedy”: they match the longest string they can.
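The effect of greediness shows up clearly in a quick run. This is a minimal Python sketch. The HTML line, with two quoted attribute values, is hypothetical.

```python
import re

# Hypothetical HTML line: two quoted attribute values on one line.
html = '<a href="http://example.com/page.html" title="Example">'

# ".+" is greedy: it runs from the first " to the LAST " on the line.
greedy = re.search(r'".+"', html).group(0)

# "[^"]+" cannot cross a ", so it stops at the first closing quote.
excluding = re.search(r'"[^"]+"', html).group(0)

print(greedy)     # "http://example.com/page.html" title="Example"
print(excluding)  # "http://example.com/page.html"

# [^ab0-3] matches any single character except a, b, 0, 1, 2, or 3.
others = re.findall(r"[^ab0-3]", "ab0123xyz9")
print(others)     # ['x', 'y', 'z', '9']
```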
Matching the end of a line: $
pattern: ⊔+$
replacement:
What does this pattern/replacement pair do? Remember that we’re using ⊔ to represent a blank.
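One way to try this pattern/replacement pair is the Python sketch below. It writes ⊔ as an ordinary blank and uses made-up sample text. Unless re.MULTILINE is given, Python's $ matches only at the very end of the string or just before a final newline. The sketch therefore passes that flag so $ matches at the end of every line.

```python
import re

# Made-up text whose lines end in stray blanks.
text = "first line   \nsecond  line \nthird line\n"

# ⊔+$ with an empty replacement; re.MULTILINE lets $ match before each "\n".
cleaned = re.sub(r" +$", "", text, flags=re.MULTILINE)

print(repr(cleaned))  # 'first line\nsecond  line\nthird line\n'
```

The double blank inside the second line is left alone because the end of the line does not immediately follow it.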
Sub / Replacement Patterns
pattern: (.+)⊔(.+)
replacement: <TR><TD>\1</TD><TD>\2</TD></TR>⊔
Source text:
bmzister 21DalB
bbunny 22BunB
hvbingen 31BinH
What’s found:
<TR><TD>bmzister</TD><TD>21DalB</TD></TR>
<TR><TD>bbunny</TD><TD>22BunB</TD></TR>
<TR><TD>hvbingen</TD><TD>31BinH</TD></TR>
Discussion
• typically up to nine subpatterns can be referenced in the replacement (\1 through \9)
• “&” represents the entire matched pattern
• why is it unnecessary to surround the pattern with “^” and “$”?
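The same substitution can be sketched with Python's re.sub, which also understands numbered references such as \1 and \2 in the replacement string. The sketch assumes ⊔ is an ordinary blank and leaves out the trailing blank after </TR>.

```python
import re

source = "bmzister 21DalB\nbbunny 22BunB\nhvbingen 31BinH"

# \1 and \2 stand for the text matched by the first and second ( ) subpatterns.
# "." never matches a newline, so each match stays within a single line.
rows = re.sub(r"(.+) (.+)", r"<TR><TD>\1</TD><TD>\2</TD></TR>", source)

print(rows)
```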
Another Invisible Character (tab) and \w
pattern: (\w+)⊔(\w+)
replacement: \1\t\2
Source text:
bmzister 21DalB
bbunny 22BunB
hvbingen 31BinH
What’s found (→ marks a tab):
bmzister→21DalB
bbunny→22BunB
hvbingen→31BinH
Discussion
• \w is equivalent to the character class [a-zA-Z0-9_]
• \t represents the tab character
• \w is more general than what we see in the source text: is that OK?
Other invisible (aka non-printing) characters
• \s white space (blank, tab, newline, carriage return, form feed), i.e., the character class [⊔\t\n\r\f]
• \r classic (pre-OS X) Macintosh end-of-line (the “carriage return” character), often represented by ¶
• \n Unix end-of-line (the “newline” character), often represented by ¶
• \r\n Windows end-of-line, often represented by ¶
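Below is a Python sketch of the tab substitution, followed by one way to split text whose lines may end in any of the three end-of-line conventions. The re.ASCII flag keeps \w equal to [a-zA-Z0-9_]. Without it, Python's \w also matches accented and other Unicode letters, which is one concrete sense in which \w can be more general than the sample needs. The mixed line-ending string is made up.

```python
import re

source = "bmzister 21DalB\nbbunny 22BunB\nhvbingen 31BinH"

# (\w+)⊔(\w+) -> \1\t\2 ; re.sub turns the \t in the replacement into a tab.
tabbed = re.sub(r"(\w+) (\w+)", r"\1\t\2", source, flags=re.ASCII)
print(tabbed)

# Made-up text mixing classic Mac (\r), Unix (\n) and Windows (\r\n) endings.
mixed = "mac line\runix line\nwindows line\r\nlast line"

# Try \r\n first so a Windows ending is not split into two breaks.
lines = re.split(r"\r\n|\r|\n", mixed)
print(lines)  # ['mac line', 'unix line', 'windows line', 'last line']
```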
Alternation: |
pattern: (Jack|Jill)
replacement: "&"
Source text:
Jack and Jill
Went up the hill
To fetch a pail of water.
Jack fell down
And broke his crown
And Jill came tumbling after.
What’s found:
"Jack" and "Jill"
Went up the hill
To fetch a pail of water.
"Jack" fell down
And broke his crown
And "Jill" came tumbling after.
Discussion
• & represents the entire string matched, so the result encloses the names Jack and Jill in double quotes
• What strings does the following pattern match?
((Jack|Jill)⊔ran⊔very(,⊔very)*⊔fast⊔up⊔the⊔hill!⊔+)+
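Python's re.sub writes the whole match as \g<0> rather than &, so a sketch of this slide's substitution looks like the following. The fullmatch checks at the end are one way to experiment with the question. They write ⊔ as a blank and use made-up test sentences.

```python
import re

rhyme = (
    "Jack and Jill\n"
    "Went up the hill\n"
    "To fetch a pail of water.\n"
    "Jack fell down\n"
    "And broke his crown\n"
    "And Jill came tumbling after."
)

# (Jack|Jill) matches either name; \g<0> plays the role of & (whole match).
quoted = re.sub(r"(Jack|Jill)", r'"\g<0>"', rhyme)
print(quoted)

# The question's pattern, with ⊔ written as a blank.
question = re.compile(r"((Jack|Jill) ran very(, very)* fast up the hill! +)+")

# Made-up candidate strings, each tested against the whole pattern.
for s in [
    "Jack ran very fast up the hill! ",
    "Jill ran very, very, very fast up the hill!  Jack ran very fast up the hill! ",
    "Jack ran very fast up the hill!",
]:
    print(repr(s), bool(question.fullmatch(s)))
```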
The replacement: \1:\3:\2:
• \1 is replaced by what the 1st ( ) subpattern matched: (\*?)
• \2 is replaced by what the 2nd ( ) subpattern matched: ([^:,]*)
• \3 is replaced by what the 3rd ( ) subpattern matched: ([^:]*)
The pattern: (\*?)([^:,]*),\s([^:]*)
p? matches 0 or 1 instances of p
p* matches 0 or more instances of p
\* matches * (i.e., without its special meaning)
[^:,]