More Special Characters (^ * +) and Replacing the Matched …

pattern: ^[0-9][0-9]*
replacement:

Discussion
  “^” matches the beginning of a line
  “*” means “0 or more repetitions of the preceding character”

More concise but equivalent patterns
  ^[0-9]+   + means “one or more repetitions of the preceding character”
  ^\d+      \d matches any digit, so “\d+” matches any string of one or more digits followed by a …
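The claim that the three patterns are equivalent can be checked with a short sketch. It uses Python's `re` module, which is an assumption (the slides do not name a tool), and the sample lines are hypothetical. In Python 3, `\d` also matches non-ASCII digits by default, so the sketch passes `re.ASCII` to make `\d` agree exactly with `[0-9]`.

```python
import re

# Hypothetical sample lines; the slide's own source text is not shown.
lines = ["42 apples", "7", "no digits here", "2018 Winter"]

# The three spellings the slide calls equivalent.
patterns = [r"^[0-9][0-9]*", r"^[0-9]+", r"^\d+"]


def leading_digits(pattern, text):
    """Return the run of digits at the start of text, or None if there is none."""
    # re.ASCII restricts \d to 0-9 so that it agrees with [0-9].
    match = re.search(pattern, text, flags=re.ASCII)
    return match.group(0) if match else None


# One list of results per pattern; all three lists should be identical.
results = [[leading_digits(p, line) for line in lines] for p in patterns]

for pattern, found in zip(patterns, results):
    print(pattern, found)
```

All three patterns should report the same leading digits for every line, and `None` for the line that does not start with a digit.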
CS200 Winter 2018 Regular Expressions

Excluding Characters

pattern: "[^"]+"

Source text: <a href= " " >
What’s matched: <a href= " " >

Discussion
  [^ab0-3] matches any character except a, b, 0, 1, 2, or 3
  ^ has this meaning only at the beginning of a character class, i.e., when it’s immediately after a [

Question: could we instead have used the following pattern to match URLs in HTML?
  ".+"
  hint: + and * are “greedy”: they match the longest string they can
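To see what the greedy hint means in practice, the sketch below runs both patterns over one line of HTML. Python's `re` module and the sample tag (including the `example.com` URL and the `title` attribute) are assumptions made for illustration; they are not taken from the slide.

```python
import re

# Hypothetical HTML line with two quoted attribute values.
html = '<a href="http://example.com/a" title="A">'

# Greedy: .+ runs from the first double quote to the LAST one on the line.
greedy = re.findall(r'".+"', html)

# Excluding the quote character stops each match at the next double quote.
excluded = re.findall(r'"[^"]+"', html)

print(greedy)    # one long match spanning both attributes
print(excluded)  # one match per quoted value
```

With a single quoted value on the line the two patterns agree; they differ as soon as a second quoted string appears on the same line.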
Matching the end of a line: $

pattern:  +$   (a blank, followed by +$)
replacement:

What does this pattern/replacement pair do?
Remember that the slides use a visible symbol to represent a blank; the pattern here starts with one.
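One way to explore the question is to run the pair on a small sample. This is a sketch under two assumptions: the replacement is empty (the slide's replacement field is blank in this copy), and the tool is Python's `re` module, where `$` matches at the end of every line only with `re.MULTILINE`. The sample text is hypothetical.

```python
import re

# Hypothetical text: the first and third lines end in blanks.
text = "first line   \nsecond line\nthird line "

# " +$" = one or more blanks immediately before the end of a line.
# Assumption: the replacement is the empty string.
cleaned = re.sub(r" +$", "", text, flags=re.MULTILINE)

print(repr(cleaned))
```

Blanks inside a line are left alone, because they are not followed directly by the end of the line.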
Sub / Replacement Patterns

pattern: (.+) (.+)
replacement: <TR><TD> \1 </TD><TD> \2 </TD></TR>

Source text:
  bmzister 21DalB
  bbunny 22BunB
  hvbingen 31BinH

What’s found:
  <TR><TD> bmzister </TD><TD> 21DalB </TD></TR>
  <TR><TD> bbunny </TD><TD> 22BunB </TD></TR>
  <TR><TD> hvbingen </TD><TD> 31BinH </TD></TR>

Discussion
  typically up to nine subpatterns (\1 through \9)
  “&” represents the entire matched pattern
  why is it unnecessary to surround the pattern with “^” and “$”?
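The same pattern and replacement can be tried in Python's `re` module (an assumption; the slides do not say which tool they use). Python accepts `\1` and `\2` in the replacement as written on the slide, but it does not support `&`; the whole match is written `\g<0>` there.

```python
import re

# Source text from the slide, one record per line.
source = "bmzister 21DalB\nbbunny 22BunB\nhvbingen 31BinH"

# Each parenthesized subpattern is captured and reused as \1 and \2.
# "." does not match a newline, so every line is matched separately.
rows = re.sub(r"(.+) (.+)", r"<TR><TD> \1 </TD><TD> \2 </TD></TR>", source)

print(rows)
```

Since `.` never crosses a line break and `+` is greedy, each match already covers a whole line here, which bears on the slide's question about “^” and “$”.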
Another Invisible Character (tab) and \w

pattern: (\w+) (\w+)
replacement: \1 \t \2

Source text:
  bmzister 21DalB
  bbunny 22BunB
  hvbingen 31BinH

What’s found:
  bmzister 21DalB
  bbunny 22BunB
  hvbingen 31BinH

Discussion
  \w    [a-zA-Z0-9_]
  \t    represents the tab character
  \w is more general than what we see; is that ok?

Other invisible (aka non-printing) characters
  \s    white space (blank, tab, newline, carriage return, form feed; i.e., the character class [ \t\n\r\f])
  \r    Macintosh end-of-line (the “carriage return” character), often represented by ¶
  \n    Unix end-of-line (the “newline” character), often represented by ¶
  \r\n  Windows end-of-line, often represented by ¶
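A minimal sketch of the blank-to-tab replacement, again assuming Python's `re` module. It also assumes the replacement is `\1\t\2` with no blanks around `\t`; the spaces shown in this copy of the slide may be extraction artifacts. `re.ASCII` keeps `\w` equal to the class `[a-zA-Z0-9_]` given in the discussion.

```python
import re

# Source text from the slide: two fields separated by one blank.
source = "bmzister 21DalB\nbbunny 22BunB\nhvbingen 31BinH"

# Assumption: the replacement is \1, a tab, then \2, with no extra blanks.
# re.sub expands \t in the replacement template to a real tab character.
tabbed = re.sub(r"(\w+) (\w+)", r"\1\t\2", source, flags=re.ASCII)

print(repr(tabbed))
```

The tab is invisible when printed normally, which is why the result looks unchanged on the slide; `repr` makes it show up as `\t`.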
Alternation: |

pattern: (Jack|Jill)
replacement: "&"

Source text:
  Jack and Jill
  Went up the hill
  To fetch a pail of water.
  Jack fell down
  And broke his crown
  And Jill came tumbling after.

What’s found:
  "Jack" and "Jill"
  Went up the hill
  To fetch a pail of water.
  "Jack" fell down
  And broke his crown
  And "Jill" came tumbling after.

Discussion
  & represents the entire string matched, so the result encloses the names Jack and Jill in double quotes

What strings does the following pattern match?
  ((Jack|Jill) ran very(, very)* fast up the hill! +)+
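The alternation example can be reproduced as follows. This is a sketch in Python's `re` module (an assumption), where the slide's `&` has no direct equivalent and the whole match is written `\g<0>`. The two candidate strings for the longer pattern are hypothetical test inputs, not from the slide.

```python
import re

rhyme = (
    "Jack and Jill\n"
    "Went up the hill\n"
    "To fetch a pail of water.\n"
    "Jack fell down\n"
    "And broke his crown\n"
    "And Jill came tumbling after."
)

# \g<0> plays the role of "&" on the slide: the entire matched string.
quoted = re.sub(r"(Jack|Jill)", r'"\g<0>"', rhyme)
print(quoted)

# The longer pattern from the slide's question, tried on hypothetical inputs.
question = r"((Jack|Jill) ran very(, very)* fast up the hill! +)+"
candidate_ok = "Jack ran very, very fast up the hill! Jill ran very fast up the hill! "
candidate_bad = "Jack ran fast up the hill! "

matches_ok = re.fullmatch(question, candidate_ok) is not None
matches_bad = re.fullmatch(question, candidate_bad) is not None
print(matches_ok, matches_bad)
```

The second candidate fails because the pattern requires at least one “very”, and every sentence must end with one or more blanks after the exclamation mark.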
The replacement: \1 : \3 : \2 :
  \1 is replaced by the 1st ( ) pattern matched: (\*?)
  \2 is replaced by the 2nd ( ) pattern matched: ([^:,]*)
  \3 is replaced by the 3rd ( ) pattern matched: ([^:]*)

The pattern: (\*?) ([^:,]*) ,\s ([^:]*)
  p? matches 0 or 1 instances of p
  p* matches 0 or more instances of p
  \* matches * (i.e., without its special meaning)
  [^:,]