CS 200 Fall 2012
Regular Expressions
A Ubiquitous Tool for Manipulating Text
12 Nov 2007
Wednesday, November 21, 2012

The Executive Summary
Regular expressions are
• a way of representing patterns in text
Why are they useful?
• finding text that matches a pattern
• replacing it with something else
• saving you time
Where do you find them?
• major word processors (e.g. MS Word)
  obscure keyword: “wildcards”
• most text editors (e.g. BBEdit, TextWrangler, TextPad, vi, emacs)
  obscure keyword: “grep”
• various command line tools
  UNIX grep, find
  Windows XP's findstr
• etc., etc., etc.
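The two operations named in the summary, finding text that matches a pattern and replacing it with something else, can be sketched with Python's standard `re` module. The sample sentence and the phone-number pattern below are illustrative assumptions and do not come from the course material.

```python
import re

# Hypothetical sample text (an assumption, not from the slides).
text = "Call 519-555-0123 or 519-555-0199 for details."

# Pattern: three digits, a hyphen, three digits, a hyphen, four digits.
phone = re.compile(r"\d{3}-\d{3}-\d{4}")

# Finding: collect every substring that matches the pattern.
found = phone.findall(text)

# Replacing: substitute each match with something else.
masked = phone.sub("[phone]", text)

print(found)
print(masked)
```

One pattern serves both jobs, which is where the time saving comes from.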

Example Problem 1: Reformat a Classlist (1)
00035091:ygavet:Gavet,Yann::math:NN:ND
92013945:alaustaris:*Ustaris,Arsenyk Lord Alexis:math:math:4A:H
95011647:awowkodaw:Wowkodaw,Andrij:sy de:eng:4B:H
95014052:lpdcunha:D'Cunha,Larry Paul:sy de:eng:4B:H
95032773:djferraro:*Ferraro,David Joseph:sy de:eng:4B:H
95044104:majarvis:Jarvis,Michael Andrew:civ e:eng:4B:H
95075344:hzshahid:Shahid,Hasan Ziaulhaq::sci:4N:H
95082835:sfodell:O'Dell,Shane Fund::sci:4N:H
95084257:jmkonik:Konik,Jason:sy de:eng:4B:H
96001912:kmemberson:*Emberson,Kathleen Marie:math:math:4A:H
96007474:ajbehm:*Behm,Aaron Jeffrey:math:math:4B:H
96007733:tdshillington:Shillington,Tara Dawn:phys:sci:4B:H
96012020:mdoris:Doris,Matthew:phys:sci:4B:H
• • •
(1) These student ID numbers were randomly generated.
To import this data into a FileMaker table, we need
• separate fields for given & family names
• a separate field for the *’s flagging privacy
So we want to
• replace
  • • • :*Ferraro,David Joseph: • • •
  • • • :Shillington,Tara Dawn: • • •
• by
  • • • :*:David Joseph:Ferraro: • • •
  • • • ::Tara Dawn:Shillington: • • •
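One possible way to carry out this classlist change is sketched below in Python. It is not necessarily the pattern the course develops. The record layout (ID, userid, optional `*`, family name, comma, given names, then the remaining fields) is inferred from the sample lines, and the names `CLASSLIST` and `reformat` are hypothetical. The pattern tolerates optional whitespace after the comma, since the exact spacing in the real file is an assumption.

```python
import re

# Assumed layout: id:userid:[*]Family,Given:remaining:fields
# Groups: 1 = id and userid, 2 = optional privacy star,
#         3 = family name, 4 = given names, 5 = remaining fields.
CLASSLIST = re.compile(r"^(\d+:[^:]*):(\*?)([^,:]+),\s*([^:]*):(.*)$")

def reformat(line: str) -> str:
    """Give the star, the given names, and the family name their own fields."""
    return CLASSLIST.sub(r"\1:\2:\4:\3:\5", line)

for record in [
    "95032773:djferraro:*Ferraro,David Joseph:sy de:eng:4B:H",
    "96007733:tdshillington:Shillington,Tara Dawn:phys:sci:4B:H",
]:
    print(reformat(record))
```

Because the star group may match nothing, a record without a privacy flag still gets an (empty) field in that position, which keeps the column count the same for every row.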

Example Problem 2: Changing a Large Number of File Names
Change
RC 741 20061112 1500.m4a
RC 741 20061112 1501.m4a
RC 741 20061112 1502.1.m4a
RC 741 20061112 1502.m4a
RC 741 20061112 1503.m4a
RC 741 20061112 1504.1.m4a
RC 741 20061112 1504.m4a
RC 741 20061112 1505.m4a
RC 741 20061112 1506.m4a
RC 741 20061112 1507.1.m4a
RC 741 20061112 1507.m4a
RC 741 20061112 1508.m4a
RC 741 20061112 1509.1.m4a
RC 741 20061112 1509.m4a
RC 741 20061112 1510.m4a
RC 741 20061112 1511.1.m4a
• • •
To
Baroque 01.m4a
Baroque 02.m4a
Baroque 03.m4a
Baroque 04.m4a
Baroque 05.m4a
Baroque 06.m4a
Baroque 07.m4a
Baroque 08.m4a
Baroque 09.m4a
Baroque 10.m4a
Baroque 11.m4a
Baroque 12.m4a
Baroque 13.m4a
Baroque 14.m4a
Baroque 15.m4a
Baroque 16.m4a
• • •
These files were generated by Audio Hijack Pro as it recorded music from a digital cable channel (to play in my car via an iPod).
That is, for convenience we want to change
• a sequence of date-time stamped file names generated by an audio recording program
  RC 741 20061112 1500.m4a
• to a more usable sequence having the form
  Baroque nn.m4a
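A minimal Python sketch of this renaming follows. It assumes the new numbers are assigned in plain sorted order of the old names, which agrees with the listing on the slide. The names `STAMPED` and `plan_renames` are hypothetical, and the course may well solve the problem differently, for example by generating rename commands in a text editor.

```python
import re

# Assumed shape of the old names: "RC 741 <8-digit date> <4-digit time>",
# an optional ".n" suffix, then the file extension.
STAMPED = re.compile(r"^RC 741 \d{8} \d{4}(?:\.\d+)?\.(\w+)$")

def plan_renames(names, prefix="Baroque"):
    """Map each date-time stamped name to 'prefix nn.ext', in sorted order."""
    plan = {}
    stamped = sorted(name for name in names if STAMPED.match(name))
    for number, old in enumerate(stamped, start=1):
        extension = STAMPED.match(old).group(1)
        plan[old] = f"{prefix} {number:02d}.{extension}"
    return plan

plan = plan_renames([
    "RC 741 20061112 1502.m4a",
    "RC 741 20061112 1500.m4a",
    "RC 741 20061112 1502.1.m4a",
    "RC 741 20061112 1501.m4a",
])
for old, new in plan.items():
    print(old, "->", new)
```

The sketch only builds the mapping. Passing each pair to `os.rename` would carry out the change, and printing the plan first makes it easy to check before any file is touched.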

We want to change
• dates represented in the European “Day Month Year” style
  25 Sep 1986
• to a format more suitable for importation into an SQL database
  09-25-1986
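The date conversion described above can be sketched in Python as follows. The month table, the names `MONTHS`, `DATE` and `to_month_first`, and the use of plain ASCII hyphens as separators are assumptions made for illustration. A pure search-and-replace tool cannot look up month numbers, so the course's own solution may instead use one replacement per month.

```python
import re

# Three-letter English month abbreviations mapped to two-digit numbers.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

# Day (1 or 2 digits), a space, a month abbreviation, a space, a 4-digit year.
DATE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")

def to_month_first(text: str) -> str:
    """Rewrite every 'DD Mon YYYY' date in text as 'MM-DD-YYYY'."""
    def swap(match):
        day, month, year = match.groups()
        return f"{MONTHS[month]}-{int(day):02d}-{year}"
    return DATE.sub(swap, text)

print(to_month_first("'ALG','1','15.00','Written problem','25 Sep 1986'"))
```

Only the date field is touched: values such as `'15.00'` are not followed by a month name, so the pattern leaves them alone.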
'ALG','1','15.00','Written problem','25 Sep 1986'
'ALG','2','15.00','Written problem','19 Oct 1986'
'ALG','3','20.00','Midterm','29 Oct 1986'
'ALG','4','10.00','Group problem','15 Nov 1986'
'ALG','5','10.00','Oral presentation','27 Nov 1986'
'ALG','6','30.00','Final Exam','14 Dec 1986'
'BIOL','1','10.00','Written assignment','24 Sep 1986'
'BIOL','2','15.00','Written assignment','20 Oct 1986'
'BIOL','3','30.00','Midterm','
