Unstructured Information Processing
with Apache UIMA
CSE 392, Computers Playing Jeopardy!, Fall 2011
Stony Brook University
http://www.cs.stonybrook.edu/~cse392
What is UIMA?
UIMA is a framework, a means to integrate text or other
unstructured information
Lecture 2:
Data Structures
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
String/Character I/O
There are several approaches to reading in the text input
required by many o
Lecture 1:
Getting Started
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Course Goals
To provide a challenging, self-motivating course for good
students to learn what mak
Lecture 3:
Strings
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Character Codes
Character codes are mappings between numbers and the
symbols which make up a particular a
Lecture 4:
Sorting
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Applications of Sorting
The key to understanding sorting is seeing how it can be used
to solve many impor
Lecture 5:
Arithmetic and Algebra
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
How Long is Long?
Today's PCs are typically 32-bit machines, so standard
integer data types
CSE 392 Programming Challenges
Prof. Steven Skiena
Spring 2012
The 100 Problem Read
Due Thursday, May 3, 2012
Learning to read and understand problems quickly is an important skill for solving them. In
this assignment, you will be given a block of 100 prog
1 Miscellaneous (6 points)
(a) Fill in the missing right-hand side for the production of A in the grammar below such that the
rules form an LL(1) grammar.
S → A a A | A b A
A →
Solution:
A → ε.
Consider the following grammar:
S —~ 0.3 bS oboA
A—ieloA
1 Compiler Stages (14 points)
The following diagram shows the stages of a compiler. Label each of the eleven unlabeled
diagram elements. Each unlabeled element is either a generating tool used in compiler
construction, a representation of the subject pr
1 Type Checking (16 points)
Consider an extension of Cool to support arrays of objects. We introduce an Array class
that inherits from Object. Other classes cannot inherit from the Array class. We introduce
four new expressions for manipulating Cool a
Lecture 6:
Combinatorics
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Learning to Count
Combinatorics problems are notorious for their reliance on
cleverness and insight
Lecture 7:
Number Theory
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Number Theory and Divisibility
G-d created the integers. All else is the work of man.
Kronecker.
XSB Prolog
CSE 392, Computers Playing Jeopardy!, Fall 2011
Stony Brook University
http://www.cs.stonybrook.edu/~cse392
IBM Watson Question Analysis for
Jeopardy! = UIMA + Prolog
(c) 2011 P.Fodor (CS Stony Brook) & Wikipedia
What Is Prolog?
Prolog is a l
XSB Prolog (cont.)
CSE 392, Computers Playing Jeopardy!, Fall 2011
Stony Brook University
http://www.cs.stonybrook.edu/~cse392
Definite clause grammar (DCG)
A DCG is a way of expressing grammar in a logic
programming language such as Prolog
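The definition above can be illustrated outside Prolog. The sketch below mimics the usual DCG translation, in which each nonterminal becomes a relation between an input token list and the remaining suffix (a difference list) and alternatives are explored by backtracking. In Prolog, a rule `a --> b, c.` is commonly translated to `a(S0, S) :- b(S0, S1), c(S1, S).`. This is a hedged Python emulation, not Prolog and not the course's code. Generators stand in for Prolog backtracking. The toy grammar (`sentence`, `noun_phrase`, `verb_phrase`, `noun`) and the helpers `word`, `seq`, `alt` and `accepts` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterator, List

# A parser maps a token list to every possible remaining suffix,
# mirroring the two hidden list arguments a DCG rule receives in Prolog.
Parser = Callable[[List[str]], Iterator[List[str]]]


def word(w: str) -> Parser:
    """Terminal, like [w] in a DCG body: consume one matching token."""
    def parse(tokens: List[str]) -> Iterator[List[str]]:
        if tokens and tokens[0] == w:
            yield tokens[1:]
    return parse


def seq(*parts: Parser) -> Parser:
    """Conjunction, like 'a, b' in a DCG body: thread the remainder through."""
    def parse(tokens: List[str]) -> Iterator[List[str]]:
        if not parts:
            yield tokens
            return
        first, rest = parts[0], seq(*parts[1:])
        for remainder in first(tokens):
            yield from rest(remainder)
    return parse


def alt(*options: Parser) -> Parser:
    """Disjunction, like separate clauses or ';': try each option in turn."""
    def parse(tokens: List[str]) -> Iterator[List[str]]:
        for option in options:
            yield from option(tokens)
    return parse


# Hypothetical toy grammar, written in DCG notation:
#   sentence    --> noun_phrase, verb_phrase.
#   noun_phrase --> [the], noun.
#   noun        --> [cat] ; [dog].
#   verb_phrase --> [sleeps] ; [chases], noun_phrase.
noun = alt(word("cat"), word("dog"))
noun_phrase = seq(word("the"), noun)
verb_phrase = alt(word("sleeps"), seq(word("chases"), noun_phrase))
sentence = seq(noun_phrase, verb_phrase)


def accepts(parser: Parser, text: str) -> bool:
    """Like phrase/2: succeed only if some parse consumes every token."""
    return any(rest == [] for rest in parser(text.split()))


if __name__ == "__main__":
    print(accepts(sentence, "the cat sleeps"))          # True
    print(accepts(sentence, "the dog chases the cat"))  # True
    print(accepts(sentence, "cat the sleeps"))          # False
```

Returning every possible remainder, rather than a single one, is what lets `alt` and `seq` backtrack the way a Prolog DCG does. The cost is that ambiguous or left-recursive grammars need the same care they would in Prolog.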
Lecture 9:
Graph Traversal
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Graphs
Graphs are one of the unifying themes of computer science.
A graph G = (V, E) is defined by
Lecture 8:
Backtracking
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Backtracking
Backtracking is a systematic method to iterate through all
the possible configurations of
Lecture 10:
Graph Algorithms
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Properties of Graphs
Graph theory is the study of the properties of graph
structures. It provid
Lecture 11:
Dynamic Programming
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Dynamic Programming
Dynamic programming is a very powerful, general tool for
solving optimiz
Lecture 13:
Geometry
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Geometry
This chapter will deal with programming problems associated
with real geometry: lines, points,
Lecture 14:
Computational Geometry
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/~skiena
Line Segments
Arbitrary closed curves or shapes can be represented by
ordered collections
1. For each of the following prompts, write any non-empty sentence:
• Name one thing you would like to learn in this class.
• Write a question you would like the professor to answer, on any
topic, from personal opinions to the class material.
2. Consid