14:440:127– Introduction to Computers for Engineers
Notes for Lecture 12
Rutgers University, Spring 2010
Instructor- Blase E. Ur

1 Pattern Matching- Regular Expressions

(This section is not covered in your book. You can refer to www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/f0-42649.html )

Regular expressions are a class of tools for pattern matching, that is, identifying strings of letters or numbers that match a certain pattern. Some of the most powerful tools for regular expressions are in the Perl programming language; you also might encounter people writing regular expressions with the Unix command grep, which uses regular expressions to search the contents of files for matching lines. Matlab also allows you to use regular expressions with the following series of functions:

• regexp matches a pattern (case sensitive)
• regexpi matches a pattern (case insensitive, i.e. A and a are the same)
• regexprep replaces a pattern with something else

1.1 Matching the most basic patterns

The arguments to regexp are 1) a string in which you're searching for matches, and 2) a pattern (also given as a string). In the most basic case, let's find where cat is located in the string 'the cat in the hat':

    mystring = 'the cat in the hat';
    regexp(mystring,'cat')
    ans =
         5

This result tells us that the pattern 'cat' begins at the 5th character of the string. You could instead call regexp or regexpi as follows, requesting multiple outputs (and specifying what they are):

    [mat ix1 ix2] = regexp(pstr, expr, 'match', 'start', 'end')

Here pstr is your string, and expr is your regular expression. mat will be a cell array of the matches themselves, ix1 will be a vector of the starting indices of the matches, and ix2 will be a vector of the ending indices of the matches.
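As a minimal sketch of the multiple-output call form and of the difference between regexp and regexpi, the snippet below searches a string that contains the pattern more than once. The capitalized test string and the variable names mati, ix1i, ix2i and newstring are illustrative choices, not part of the original notes.

```matlab
% Illustrative string (an assumption for this sketch): note the capital 'T'.
mystring = 'The cat in the hat';

% Case-sensitive search: only the lowercase 'the' matches.
[mat, ix1, ix2] = regexp(mystring, 'the', 'match', 'start', 'end');
% mat = {'the'}, ix1 = 12, ix2 = 14

% Case-insensitive search: 'The' and 'the' both match.
[mati, ix1i, ix2i] = regexpi(mystring, 'the', 'match', 'start', 'end');
% mati = {'The', 'the'}, ix1i = [1 12], ix2i = [3 14]

% regexprep replaces each match of the pattern with the given text.
newstring = regexprep(mystring, 'cat', 'dog');
% newstring = 'The dog in the hat'
```

When a pattern matches several times, the start and end outputs are vectors with one entry per match, in the same order as the cell array of matches.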
If you just want the matches, you can simply say regexp(str,expr,'match'):

    mystring = 'the cat in the hat';
    regexp(mystring,'cat','match')
    ans =
        'cat'

    [a b c] = regexp(mystring,'cat','match','start','end')
    a =
        'cat'
    b =
         5
    c =
         7

1.2 Matching Symbols

Of course, it's not all that useful to only match words you can identify already. Thus, Matlab has a number of special symbols you can use for creating patterns. Note that whitespace means empty spaces, the characters that represent tabs, new line characters, etc.

    .       matches any single character, including white space
    [abc]   matches ANY single one of the characters in [ ]
    [a-z]   matches ANY single character in that range (a,b,c,d...,x,y,z)
    [^abc]  matches any single character NOT contained in [ ]
    \s      matches any white-space character: [ \f\n\r\t\v]
    \S      matches any non-whitespace character: [^ \f\n\r\t\v]
    \w      matches any single alphanumeric/underscore character: [a-zA-Z_0-9]
    \W      matches any character that's not alphanumeric or an underscore
    \d      matches any numeric digit: [0-9]
    \D      matches any non-numeric character

Let's look at an example. Let's find all words that contain a letter, an...
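Separately from the lecture's own example, which is cut off above, here is a minimal sketch of a few of the symbols listed in Section 1.2. The second test string and the variable names m1, m2, s and d are assumptions made for illustration.

```matlab
mystring = 'the cat in the hat';

% [ch]at : a 'c' or an 'h', followed by 'at'
m1 = regexp(mystring, '[ch]at', 'match');
% m1 = {'cat', 'hat'}

% .at : any single character, followed by 'at'
m2 = regexp(mystring, '.at', 'match');
% m2 = {'cat', 'hat'}

% \s : with a single output, regexp returns the start index of each match,
% here the position of every whitespace character
s = regexp(mystring, '\s');
% s = [4 8 11 15]

% \d : each numeric digit is a separate match (hypothetical test string)
d = regexp('room 12, floor 3', '\d', 'match');
% d = {'1', '2', '3'}
```

Note that \d matches one digit at a time, so '12' comes back as two separate matches rather than one.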