Copyright @ 2009 Ananda Gunawardena

Lecture 18
Regular Expressions

Many of today's web applications require matching patterns in a text document to look for specific information. A good example is parsing an HTML file to extract the <img> tags of a web document. If the image locations are available, then we can write a script to automatically download these images to a location we specify. Looking for tags like <img> is a form of searching for a pattern. Pattern searches are widely used in many applications, such as search engines.

A regular expression (regex) is a pattern that defines a class of strings. Given a string, we can then test whether it belongs to this class. Regular expressions are used by many Unix utilities, such as grep, sed, awk, vi, and emacs. We will learn the syntax for describing a regex later.

We are already doing some level of pattern search when we use wildcards such as *. For example,

> ls *.c

lists all files with the .c extension, and

> ls ab*

lists all file names in the current directory that start with ab. These types of commands (ls, dir, etc.) work on Windows, Unix, and most other operating systems. That is, the ls command looks for files whose names match a certain pattern, but wildcards are limited in the kinds of patterns they can describe.

The wildcard (*) is typically used with many commands in Unix. For example,

> cp *.c /afs/andrew.cmu.edu/course/15/123/handin/Lab6/guna

copies all .c files in the current directory to the given directory.

Unix commands like ls and cp can use simple wildcard (*) syntax to describe specific patterns and perform the corresponding tasks. However, there are many powerful Unix utilities that can look for patterns described in a general-purpose notation. One such utility is the grep command.

The grep command

The grep command is a Unix tool that can be used for pattern matching. Its description is given by the following.
grep (Global Regular Expression Print) is a Unix command-line utility that finds specific patterns described by "regular expressions", a notation which we will learn shortly. For example, grep can be used to print all lines containing a specific pattern:

> grep "<a href" guna.html > output.txt

writes all lines containing the substring <a href to the file output.txt.

The grep command can be an extremely handy tool for searching for patterns. If we do

> grep "foo" filename

it returns all lines of the file given by filename that contain the string foo.

Unix provides the | (pipe) operator to send the output of one process as input to another process. Say, for example, that we would like to find all files whose names contain the pattern guna. We can do the following to accomplish that task:

> ls | grep guna
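The commands discussed so far can be tried together in a small, self-contained sketch. It is only an illustration: the scratch directory, the file names (guna.c, main.c, guna_notes.txt, readme.txt, page.html), and the variable names are hypothetical and chosen for this example; they are not part of the lecture's handin setup.

```shell
#!/bin/sh
# Hypothetical scratch directory, so the example does not touch real files.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create a few empty files to list and a small HTML file to search.
touch guna.c main.c guna_notes.txt readme.txt
printf '%s\n' \
  '<html>' \
  '<a href="a.html">A</a>' \
  '<p>no link here</p>' \
  '<a href="b.html">B</a>' > page.html

# Wildcard: the shell expands *.c to every file name ending in .c.
c_files=$(ls *.c)

# grep: keep only the lines of page.html that contain the substring <a href.
links=$(grep '<a href' page.html)

# Pipe: send the output of ls to grep, keeping only names that contain guna.
guna_files=$(ls | grep guna)

printf 'C files:\n%s\n\n' "$c_files"
printf 'Lines with links:\n%s\n\n' "$links"
printf 'Files named like guna:\n%s\n' "$guna_files"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

In this sketch the wildcard is expanded by the shell before ls runs, while in the last two commands the matching is done by grep itself, line by line, on whatever text it is given.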

This note was uploaded on 11/27/2009 for the course CS 123 taught by Professor Bajkzek during the Fall '08 term at Carnegie Mellon.