to mean “any capital letter”). In cases where there is a well-defined sequence associated with a set of characters, the brackets can be used with the dash (-) to specify any one character in a range. The pattern /[2-5]/ specifies any one of the characters 2, 3, 4, or 5. The pattern /[b-g]/ specifies one of the characters b, c, d, e, f, or g. Some other examples are shown in Fig. 2.3.

RE          Match                   Example Patterns Matched
/[A-Z]/     an upper case letter    “we should call it ‘Drenched Blossoms’ ”
/[a-z]/     a lower case letter     “my beans were impatient to be hoed!”
/[0-9]/     a single digit          “Chapter 1: Down the Rabbit Hole”
Figure 2.3 The use of the brackets [] plus the dash - to specify a range.

The square braces can also be used to specify what a single character cannot be, by use of the caret ^. If the caret ^ is the first symbol after the open square brace [, the resulting pattern is negated. For example, the pattern /[^a]/ matches any single character (including special characters) except a. This is only true when the caret is the first symbol after the open square brace. If it occurs anywhere else, it usually stands for a caret; Fig. 2.4 shows some examples.

RE          Match (single characters)   Example Patterns Matched
/[^A-Z]/    not an upper case letter    “Oyfn pripetchik”
/[^Ss]/     neither ‘S’ nor ‘s’         “I have no exquisite reason for’t”
/[^\.]/     not a period                “our resident Djinn”
/[e^]/      either ‘e’ or ‘^’           “look up ^ now”
/a^b/       the pattern ‘a^b’           “look up a^b now”
Figure 2.4 Uses of the caret ^ for negation or just to mean ^. We discuss below the need to escape the period by a backslash.

How can we talk about optional elements, like an optional s in woodchuck and woodchucks? We can’t use the square brackets, because while they allow us to say “s or S”, they don’t allow us to say “s or nothing”. For this we use the question mark /?/, which means “the preceding character or nothing”, as shown in Fig. 2.5.
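The bracket, caret, and question-mark patterns discussed so far can be tried out directly. What follows is a minimal sketch, assuming Python's standard re module as the engine (the text itself uses Perl-style /.../ notation, so the slashes are dropped here). The helper name first_match and the sample string "room 1729" are ours, introduced only for illustration. One assumption worth flagging: in Python an unbracketed caret acts as a start-of-string anchor, so the sketch escapes it as \^ to match a literal caret between a and b.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# Ranges: any one character from a well-defined sequence.
print(first_match(r"[A-Z]", "we should call it 'Drenched Blossoms'"))  # D
print(first_match(r"[0-9]", "Chapter 1: Down the Rabbit Hole"))        # 1
print(first_match(r"[2-5]", "room 1729"))                              # 2

# Negation: a caret directly after the open bracket negates the set.
print(first_match(r"[^A-Z]", "Oyfn pripetchik"))                       # y
print(first_match(r"[^Ss]", "I have no exquisite reason for't"))       # I

# A caret elsewhere inside the brackets is just a caret.
print(first_match(r"[e^]", "look up ^ now"))                           # ^

# Outside brackets, Python treats ^ as an anchor, so it is escaped here.
print(first_match(r"a\^b", "look up a^b now"))                         # a^b
print(first_match(r"a^b", "look up a^b now"))                          # None

# Optionality: ? means zero or one of the preceding character.
print(re.findall(r"woodchucks?", "a woodchuck and two woodchucks"))
print(re.findall(r"colou?r", "color or colour"))
```

The last two calls return ['woodchuck', 'woodchucks'] and ['color', 'colour'] respectively: the question mark lets the same pattern cover both spellings, which the square brackets alone cannot do.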
RE              Match                      Example Patterns Matched
/woodchucks?/   woodchuck or woodchucks    “woodchuck”
/colou?r/       color or colour            “colour”
Figure 2.5 The question mark ? marks optionality of the previous expression.

We can think of the question mark as meaning “zero or one instances of the previous character”. That is, it’s a way of specifying how many of something that we want, something that is very important in regular expressions. For example, consider the language of certain sheep, which consists of strings that look like the following:

baa!
baaa!
baaaa!
baaaaa!
. . .

This language consists of strings with a b, followed by at least two a’s, followed by an exclamation point. The set of operators that allows us to say things like “some number of a’s” are based on the asterisk or *, commonly called the Kleene * (generally pronounced “cleany star”). The Kleene star means “zero or more occurrences of the immediately previous character or regular expression”. So /a*/ means “any string of zero or more a’s”. This will match a or aaaaaa, but it will also match Off Minor since the string Off Minor has zero a’s. So the regular expression for matching
