inkOutline_IWFHR2004 - Learning to Parse Hierarchical Lists...

Learning to Parse Hierarchical Lists and Outlines using Conditional Random Fields

Ming Ye and Paul Viola
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052
{ mingye,viola }

Abstract

Handwritten notes are complex structures that include blocks of text, drawings, and annotations. The main challenge for the newly emerging tablet computer is to provide high-level tools for editing and authoring handwritten documents using a natural interface. Lists and hierarchical outlines are a frequent component of natural notes; they correspond directly to the bulleted lists and itemized structures in conventional text editing tools. We present a system which automatically recognizes lists and hierarchical outlines in handwritten notes, and then computes the correct structure. This inferred structure provides the foundation for new user interfaces and facilitates the importation of handwritten notes into conventional editing tools.

1. Introduction

Spontaneous on-line ink notes taken on a tablet PC frequently have hierarchical structures. Users typically write out paragraphs which are composed of lines, lines which are composed of words, and words which are composed of strokes. It is also very common for users to create hierarchical structures between paragraphs using different indentation and/or bullet schemes. Automatically interpreting these hierarchical structures allows for complex high-level manipulations such as insertion of a line, moving or collapsing sub-trees, and porting ink into text preparation systems like Microsoft Word with appropriate formatting.

This paper focuses on outline parsing, the problem of segmenting a block of text lines into paragraphs and determining the hierarchical structures between the paragraphs. Each paragraph has certain formatting attributes such as its bullet and indentation styles (see Figure 1) [1]. After parsing, the resulting outline tree has a single invisible ROOT node and a number of paragraph nodes, each containing at least one line and possibly a number of child nodes.

[1] This definition includes a list item as a special case.

The presented outline parser assumes that graphical elements have been filtered out, strokes have been grouped correctly into words, lines and blocks, and annotations have been segmented and removed. These pre-processing modules are beyond the scope of this paper and will be described elsewhere.

Two observations make outline parsing much simpler. The first is that the lines within each block are naturally ordered from top to bottom and that the nodes in the tree have the same depth-first order [2]. The second observation implies that the hierarchical structure can be encoded by assigning each line a label. The labels encode both the depth of the node in the tree and whether the line is a continuation of the same paragraph (see Figure 1). Given these two observations, the inference of the outline tree can be achieved as a line classification problem, where each...
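To make the label encoding concrete, here is a minimal sketch of how an outline tree could be rebuilt from per-line labels. It is an illustration, not the authors' implementation: the preview does not show the paper's actual label set, so the `(depth, is_continuation)` pair, the `Node` class, and the `build_outline` function are all hypothetical names and assumptions. Depth 1 is taken to mean a direct child of the invisible ROOT node.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    """A paragraph node: one or more text lines plus child paragraphs."""
    lines: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def build_outline(lines: Sequence[str],
                  labels: Sequence[Tuple[int, bool]]) -> Node:
    """Rebuild an outline tree from top-to-bottom lines and their labels.

    Assumed label format (hypothetical): (depth, is_continuation), where
    depth 1 is a child of ROOT and is_continuation marks a line that
    extends the paragraph started by an earlier line.
    """
    root = Node()      # invisible ROOT node; it holds no lines itself
    stack = [root]     # stack[d] is the most recent open node at depth d
    for text, (depth, is_continuation) in zip(lines, labels):
        if is_continuation:
            # A continuation must extend the most recently opened paragraph.
            if len(stack) == 1 or depth != len(stack) - 1:
                raise ValueError(f"invalid continuation label for {text!r}")
            stack[-1].lines.append(text)
            continue
        # A new paragraph may sit at most one level below the current one.
        if depth < 1 or depth > len(stack):
            raise ValueError(f"invalid depth {depth} for {text!r}")
        del stack[depth:]               # close deeper and sibling paragraphs
        node = Node(lines=[text])
        stack[-1].children.append(node)  # parent is the node at depth - 1
        stack.append(node)
    return root


example_lines = ["Groceries", "milk", "and eggs", "bread", "Errands"]
example_labels = [(1, False), (2, False), (2, True), (2, False), (1, False)]
example_tree = build_outline(example_lines, example_labels)
```

Because the lines arrive in depth-first order, a single pass with a stack of open paragraphs is enough; no backtracking over earlier lines is needed once the labels are known.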

This note was uploaded on 06/12/2011 for the course CAP 6105 taught by Professor Lavoila during the Spring '09 term at University of Central Florida.

