CS 200
CS 200 Spring 2016
Regular Expressions
Regular Expressions
A Ubiquitous Tool for Manipulating Text

CS 200 Fall 2016
Regular Expressions
Regular expressions are
• a way of representing patterns in text
Why are they useful?
• finding text that matches a pattern
• replacing it with something else
• saving you time
Where do you find them?
• major word processors (e.g. MS Word); obscure keyword: “wildcards”
• most text editors (e.g. BBEdit, TextWrangler, TextPad, vi, emacs); obscure keyword: “grep”
• various command line tools: UNIX grep and find, Windows XP's findstr
• etc.
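As a rough illustration of "finding" and "replacing", here is a minimal sketch using Python's standard re module. The sample text and both patterns are made up for this example; they are not taken from the slides, and the tools listed above each have their own pattern syntax.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "CS 100, CS 200 and CS 230"

# Finding: every "CS" followed by a space and exactly three digits.
found = re.findall(r"CS \d{3}", text)

# Replacing: capture the digits and re-emit them without the space.
replaced = re.sub(r"CS (\d{3})", r"CS\1", text)

print(found)
print(replaced)
```

The parentheses in the second pattern capture part of the match so the replacement can reuse it as \1.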
The Executive Summary

To import this data into a FileMaker table, we need
• separate fields for given & family names
• a separate field for the *’s flagging privacy
So we want to
• replace
• • • :*Ferraro, David Joseph: • • •
• • • :Shillington, Tara Dawn: • • •
• by
• • • :*:David Joseph:Ferraro: • • •
• • • ::Tara Dawn:Shillington: • • •
00035091:ygavet:Gavet, Yann::math:NN:ND
92013945:alaustaris:*Ustaris, Arsenyk Lord Alexis:math:math:4A:H
95011647:awowkodaw:Wowkodaw, Andrij:sy de:eng:4B:H
95014052:lpdcunha:D'Cunha, Larry Paul:sy de:eng:4B:H
95032773:djferraro:*Ferraro, David Joseph:sy de:eng:4B:H
95044104:majarvis:Jarvis, Michael Andrew:civ e:eng:4B:H
95075344:hzshahid:Shahid, Hasan Ziaulhaq::sci:4N:H
95082835:sfodell:O'Dell, Shane Fund::sci:4N:H
95084257:jmkonik:Konik, Jason:sy de:eng:4B:H
96001912:kmemberson:*Emberson, Kathleen Marie:math:math:4A:H
96007474:ajbehm:*Behm, Aaron Jeffrey:math:math:4B:H
96007733:tdshillington:Shillington, Tara Dawn:phys:sci:4B:H
96012020:mdoris:Doris, Matthew:phys:sci:4B:H
• • •
(1) These student ID numbers were randomly generated.
Example Problem 1: Reformat a Classlist (1)
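One possible way to express the classlist replacement is sketched below with Python's re module. This is not necessarily the pattern used in the course. It assumes the name is always the third colon-separated field, that family names contain no comma or colon, and that whitespace after the comma is optional (the slide layout does not show whether a space is there). The function name reformat is hypothetical.

```python
import re

# Two records taken from the classlist above.
records = [
    "95032773:djferraro:*Ferraro, David Joseph:sy de:eng:4B:H",
    "96007733:tdshillington:Shillington, Tara Dawn:phys:sci:4B:H",
]

# Group 1: the first two fields with their colons.
# Group 2: the optional privacy flag "*".
# Group 3: family name (anything up to the comma).
# Group 4: given names (anything up to the next colon).
pattern = re.compile(r"^([^:]*:[^:]*:)(\*?)([^,:]+),\s*([^:]*)")

def reformat(record):
    # Emit flag, given names and family name as three separate fields.
    return pattern.sub(r"\1\2:\4:\3", record, count=1)

for record in records:
    print(reformat(record))
```

Because group 2 matches an empty string when there is no asterisk, unflagged students simply get an empty privacy field.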

That is, for convenience we want to change
• a sequence of date-time stamped file names generated by an audio recording program
RC 741 20061112 1500.m4a
• to a more usable sequence having the form
Baroque 01.m4a
These files were generated by Audio Hijack Pro as it records music from a digital cable channel (to play in my car via an iPod).
Change
RC 741 20061112 1500.m4a
RC 741 20061112 1501.m4a
RC 741 20061112 1502.1.m4a
RC 741 20061112 1502.m4a
RC 741 20061112 1503.m4a
RC 741 20061112 1504.1.m4a
RC 741 20061112 1504.m4a
RC 741 20061112 1505.m4a
RC 741 20061112 1506.m4a
RC 741 20061112 1507.1.m4a
RC 741 20061112 1507.m4a
RC 741 20061112 1508.m4a
RC 741 20061112 1509.1.m4a
RC 741 20061112 1509.m4a
RC 741 20061112 1510.m4a
RC 741 20061112 1511.1.m4a
• • •
To
Baroque 01.m4a
Baroque 02.m4a
Baroque 03.m4a
Baroque 04.m4a
Baroque 05.m4a
Baroque 06.m4a
Baroque 07.m4a
Baroque 08.m4a
Baroque 09.m4a
Baroque 10.m4a
Baroque 11.m4a
Baroque 12.m4a
Baroque 13.m4a
Baroque 14.m4a
Baroque 15.m4a
Baroque 16.m4a
• • •
Example Problem 2: Changing a Large Number of File Names
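A minimal Python sketch of the renaming plan follows. A regular expression recognises the date-time stamped names, but the sequence numbers come from a plain counter, since a pattern alone cannot count. It assumes the new numbers simply follow the order in which the names are listed. The function name rename_plan is hypothetical, and the sketch only builds (old, new) pairs without touching any files.

```python
import re

# The first few names from the "Change" list above.
old_names = [
    "RC 741 20061112 1500.m4a",
    "RC 741 20061112 1501.m4a",
    "RC 741 20061112 1502.1.m4a",
    "RC 741 20061112 1502.m4a",
]

# "RC 741", an 8-digit date, a 4-digit time, an optional ".N" suffix, then ".m4a".
stamp = re.compile(r"^RC 741 (\d{8}) (\d{4})(?:\.(\d+))?\.m4a$")

def rename_plan(names, prefix="Baroque"):
    plan = []
    counter = 0
    for name in names:
        if stamp.match(name) is None:
            continue  # skip anything that is not a stamped recording
        counter += 1
        plan.append((name, f"{prefix} {counter:02d}.m4a"))
    return plan

for old, new in rename_plan(old_names):
    print(old, "->", new)
```

Keeping the plan as data makes it easy to inspect the pairs before any real renaming is attempted.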

We want to change
• dates represented in the European “Day Month Year” style
25 Sep 1986
• to a format more suitable for importation into an SQL database
09-25-1986
'ALG','1','15.00','Written problem','25 Sep 1986'
'ALG','2','15.00','Written problem','19 Oct 1986'
'ALG','3','20.00','Midterm','29 Oct 1986'
'ALG','4','10.00','Group problem','15 Nov 1986'
'ALG','5','10.00','Oral presentation','27 Nov 1986'
'ALG','6','30.00','Final Exam','14 Dec 1986'
'BIOL','1','10.00','Written assignment','24 Sep 1986'
'BIOL','2','15.00','Written assignment','20 Oct 1986'
'BIOL','3','30.00','Midterm','16 Nov 1986'
• • •
Example Problem 3: Change Date Format
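The date change can be sketched in Python as follows. In an editor's find-and-replace this would likely take one pass per month; here a lookup table does the month conversion inside a replacement function. The table MONTHS and the function name to_month_day_year are hypothetical, three-letter English month abbreviations are assumed, and the output uses plain hyphens.

```python
import re

# Assumed mapping from three-letter month abbreviations to two-digit numbers.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

# Day (1 or 2 digits), month abbreviation, 4-digit year.
date_pattern = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")

def to_month_day_year(text):
    def repl(match):
        day, month, year = match.groups()
        # Reorder to month-day-year and zero-pad the day.
        return f"{MONTHS[month]}-{int(day):02d}-{year}"
    return date_pattern.sub(repl, text)

line = "'ALG','1','15.00','Written problem','25 Sep 1986'"
print(to_month_day_year(line))
```

Only text shaped like a date is rewritten, so fields such as '15.00' are left alone.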

Source text:
He is a rat.
