jurafsky&martin_3rdEd_17 (1).pdf

It is often useful to be able to refer to a particular subpart of the string matching the first pattern. Suppose, for example, that we wanted to put angle brackets around all integers in a text, changing the 35 boxes to the <35> boxes. We’d like a way to refer to the integer we’ve found so that we can easily add the brackets. To do this, we put parentheses ( and ) around the first pattern and use the number operator \1 in the second pattern to refer back. Here’s how it looks:

    s/([0-9]+)/<\1>/

The parenthesis and number operators can also specify that a certain string or expression must occur twice in the text. For example, suppose we are looking for the pattern “the Xer they were, the Xer they will be”, where we want to constrain the two X’s to be the same string. We do this by surrounding the first X with the parenthesis operator, and replacing the second X with the number operator \1, as follows:

    /the (.*)er they were, the \1er they will be/
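
The two patterns above can be tried out directly. The following is a minimal sketch in Python, assuming the standard `re` module as a stand-in for the s/.../.../ notation used in the text; the function names `bracket_integers` and `same_comparative` are hypothetical and chosen only for illustration.

```python
import re

def bracket_integers(text: str) -> str:
    """Wrap every run of digits in angle brackets.

    re.sub plays the role of s/([0-9]+)/<\\1>/: the parentheses capture
    the integer, and \\1 in the replacement refers back to it.
    """
    return re.sub(r"([0-9]+)", r"<\1>", text)

# A backreference inside the pattern itself: \1 must repeat exactly
# the string that the first parenthesized group matched.
SAME_X = re.compile(r"the (.*)er they were, the \1er they will be")

def same_comparative(sentence: str) -> bool:
    """Return True if both X's in the sentence are the same string."""
    return SAME_X.search(sentence) is not None

print(bracket_integers("the 35 boxes"))
# the <35> boxes
print(same_comparative("the bigger they were, the bigger they will be"))
# True
print(same_comparative("the bigger they were, the faster they will be"))
# False
```

Note that the test sentences are lowercase: the pattern begins with a lowercase "the" and matching is case-sensitive unless a flag such as `re.IGNORECASE` is passed.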

Chapter 2: Regular Expressions, Text Normalization, Edit Distance

In the pattern above, \1 matches whatever string matched the first item in parentheses. So this will match the bigger they were, the bigger they will be but not the bigger they were, the faster they will be.

This use of parentheses to store a pattern in memory is called a capture group. Every time a capture group is used (i.e., parentheses surround a pattern), the resulting match is stored in a numbered register. If you match two different sets of parentheses, \2 means whatever matched the second capture group. Thus

    /the (.*)er they (.*), the \1er we \2/

will match the faster they ran, the faster we ran but not the faster they ran, the faster we ate. Similarly, the third capture group is stored in \3, the fourth is \4, and so on.

Parentheses thus have a double function in regular expressions; they are used to group terms for specifying the order in which operators should apply, and they are used to capture something in a register. Occasionally we might want to use parentheses for grouping, but don’t want to capture the resulting pattern in a register. In that case we use a non-capturing group, which is specified by putting the characters ?: after the open parenthesis, in the form (?: pattern ).

    /(?:some|a few) (people|cats) like some \1/

will match some cats like some cats but not some cats like some a few. Because the first group is non-capturing, \1 refers to whatever (people|cats) matched.

Substitutions and capture groups are very useful in implementing simple chatbots like ELIZA (Weizenbaum, 1966). Recall that ELIZA simulates a Rogerian psychologist by carrying on conversations like the following:

    User 1: Men are all alike.
    ELIZA 1: IN WHAT WAY
    User 2: They’re always bugging us about something or other.
    ELIZA 2: CAN YOU THINK OF A SPECIFIC EXAMPLE
    User 3: Well, my boyfriend made me come here.
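
The numbered-register and non-capturing-group patterns discussed earlier in this section can be checked the same way. This is a small sketch using Python's standard `re` module; the names `TWO_GROUPS`, `NON_CAPTURING`, and `matches` are hypothetical helpers introduced here, not part of the text.

```python
import re

# Two capture groups: \1 repeats the first match, \2 repeats the second.
TWO_GROUPS = re.compile(r"the (.*)er they (.*), the \1er we \2")

# (?:...) groups the alternation without storing it in a register,
# so (people|cats) is group 1 and \1 refers to it.
NON_CAPTURING = re.compile(r"(?:some|a few) (people|cats) like some \1")

def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None

print(matches(TWO_GROUPS, "the faster they ran, the faster we ran"))
# True
print(matches(TWO_GROUPS, "the faster they ran, the faster we ate"))
# False
print(matches(NON_CAPTURING, "some cats like some cats"))
# True
print(matches(NON_CAPTURING, "some cats like some a few"))
# False

# The compiled pattern reports how many numbered registers it uses:
# the non-capturing group is not counted.
print(TWO_GROUPS.groups, NON_CAPTURING.groups)
# 2 1
```

The final line illustrates the double function of parentheses: both patterns contain two parenthesized units, but only the capturing ones are assigned a register number.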