beginning of a word end of a word n preceding item

les
 • mine text by treating documents directly as data
 • scrape the web for data

 Syntax:
 • Literal characters are matched only by the character itself.
 • A character class is matched by any single member of the specified class. For example, [A-Z] is matched by any capital letter.
 • Modifiers operate on literal characters, character classes, or combinations of the two. For example, ^ is an anchor that indicates the literal must appear at the beginning of the string.
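
The three syntax elements above can be tried out in a few lines. This is a minimal sketch in Python's standard `re` module, chosen here only for illustration since the slides do not show which language they use; the sample strings and variable names are made up for the example.

```python
import re

# Literal characters: "cat" is matched only by the exact sequence c, a, t.
literal_hit = re.search(r"cat", "concatenate") is not None

# Character class: [A-Z] is matched by any single capital letter.
capitals = re.findall(r"[A-Z]", "Regular Expressions in R")

# Modifier (anchor): ^ requires the literal to appear at the beginning of the string.
anchored_miss = re.search(r"^cat", "concatenate") is not None
anchored_hit = re.search(r"^cat", "catalog") is not None

print(literal_hit)    # True: "cat" occurs inside "concatenate"
print(capitals)       # ['R', 'E', 'R']
print(anchored_miss)  # False: "concatenate" does not start with "cat"
print(anchored_hit)   # True: "catalog" starts with "cat"
```

Note how adding the anchor turns a match on "concatenate" into a non-match even though the literal part of the pattern is unchanged.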
 10/18/12
 Warning
 • The syntax for regexps is extremely concise
 • It can be overwhelming if you try to read it like you would regular text.
 • Always break it down into these three components: literals, character classes, modifiers
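
One way to practice this breakdown is to write the pattern one component per line with a comment on each. The sketch below assumes Python's `re.VERBOSE` flag, which ignores whitespace and allows `#` comments inside the pattern; the pattern and the word list are hypothetical examples, not taken from the slides.

```python
import re

# The compact form of this pattern is ^[A-Z]at
pattern = re.compile(r"""
    ^        # modifier: anchor to the beginning of the string
    [A-Z]    # character class: any single capital letter
    at       # literals: the character a followed by t
""", re.VERBOSE)

words = ["Cat", "cat", "Hat", "That"]
matches = [w for w in words if pattern.search(w)]
print(matches)  # ['Cat', 'Hat']
```

"cat" fails on the character class (no capital letter) and "That" fails on the literals (the capital is followed by "ha", not "at").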

 How to find fake words?

 rep1!c@ted

 • What makes this different from a regular word?
 • Numbers and punctuation surrounded by letters
 • Concepts of numbers, punctuation,...
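
The observation "numbers and punctuation surrounded by letters" can be turned into a pattern directly. This is a rough sketch in Python, not necessarily the approach the slides go on to develop: the name `looks_fake`, the sample sentence, and the particular set of punctuation characters in the middle class are all assumptions made for illustration. Apostrophes and hyphens are left out of the class on purpose so that ordinary words such as "don't" are not flagged.

```python
import re

# letters, then one or more digits/punctuation marks, then letters again
FAKE_WORD = re.compile(r"[A-Za-z]+[0-9!@#$%&*]+[A-Za-z]+")

def looks_fake(token):
    """Return True if the token has digits or punctuation sandwiched between letters."""
    return FAKE_WORD.search(token) is not None

text = "Get rep1!c@ted watches today, not replicated ones!"
flagged = [t for t in text.split() if looks_fake(t)]
print(flagged)  # ['rep1!c@ted']
```

Trailing punctuation, as in "ones!", is not flagged because no letters follow it.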