

Knowtator: A Protégé plug-in for annotated corpus construction

Philip V. Ogren
Division of Biomedical Informatics
Mayo Clinic
Rochester, MN, USA
Ogren.Philip@mayo.edu

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume, pages 273–275, New York City, June 2006. © 2006 Association for Computational Linguistics

Abstract

A general-purpose text annotation tool called Knowtator is introduced. Knowtator facilitates the manual creation of annotated corpora that can be used for evaluating or training a variety of natural language processing systems. Building on the strengths of the widely used Protégé knowledge representation system, Knowtator has been developed as a Protégé plug-in that leverages Protégé's knowledge representation capabilities to specify annotation schemas. Knowtator's unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between annotation types) can be defined and incorporated into use. Knowtator is available under the Mozilla Public License 1.1 at http://bionlp.sourceforge.net/Knowtator.

1 Introduction

Knowtator is a general-purpose text annotation tool for creating annotated corpora suitable for evaluating Natural Language Processing (NLP) systems. Such corpora consist of texts (e.g. documents, abstracts, or sentences) and annotations that associate structured information (e.g. POS tags, named entities, shallow parses) with extents of the texts. An annotation schema is a specification of the kinds of annotations that can be created. Knowtator provides a very flexible mechanism for defining annotation schemas. This allows it to be employed for a large variety of corpus annotation tasks.

Protégé is a widely used knowledge representation system that facilitates construction and visualization of knowledge-bases (Noy, 2003) [1]. A Protégé knowledge-base typically consists of class, instance, slot, and facet frames. Class definitions represent the concepts of a domain and are organized in a subsumption hierarchy. Instances correspond to individuals of a class. Slots define properties of a class or instance and relationships between classes or instances. Facets constrain the values that slots can have.

Protégé has garnered widespread usage by providing an architecture that facilitates the creation of third-party plug-ins such as visualization tools and inference engines. Knowtator has been implemented as a Protégé plug-in and runs in the Protégé environment. In Knowtator, an annotation schema is defined with Protégé class, instance, slot, and facet definitions using the Protégé knowledge-base editing functionality. The defined annotation schema can then be applied to a text annotation task without having to write any task-specific software or edit specialized configuration files. Annotation schemas in Knowtator can model both syntactic (e.g. shallow parses) and semantic phenomena (e.g. protein-protein interactions).

[1] http://protege.stanford.edu

2 Related work

There exists a plethora of manual text annotation tools for creating annotated corpora. While it has been common for individual research groups to build customized annotation tools for their specific annotation tasks, several text annotation tools have emerged in the last few years that can be employed to accomplish a wide variety of annotation tasks. Some of the better general-purpose annotation tools include Callisto [2], WordFreak [3] (Morton and LaCivita, 2003), GATE [4], and MMAX2 [5]. Each of these tools is distributed with a limited number of annotation tasks that can be used "out of the box". Many of the tasks that are provided can be customized to a limited extent to suit the requirements of a user's annotation task via configuration files. In Callisto, for example, a simple annotation schema can be defined with an XML DTD that allows the creation of an annotation schema that is essentially a tag set augmented with simple (e.g. string) attributes for each tag. In addition to configuration files, WordFreak provides a plug-in architecture for creating task-specific code modules that can be integrated into the user interface.

A complex annotation schema might include hierarchical relationships between annotation types and constrained relationships between the types. Creating such an annotation schema can be a formidable challenge for the available tools, either because configuration options are too limiting or because implementing a new plug-in is too expensive or time-consuming.

[2] http://callisto.mitre.org
[3] http://wordfreak.sourceforge.net
[4] http://gate.ac.uk/. GATE is a software architecture for NLP that has, as one of its many components, text annotation functionality.
[5] http://mmax.eml-research.de/

Figure 1: Simple co-reference annotations in Knowtator

3 Implementation

3.1 Annotation schema

Knowtator approaches the definition of an annotation schema as a knowledge engineering task by leveraging Protégé's strengths as a knowledge-base editor. Protégé has user interface components for defining class, instance, slot, and facet frames. A Knowtator annotation schema is created by defining frames using these user interface components, as a knowledge engineer would when creating a conceptual model of some domain. For Knowtator, the frame definitions model the phenomena that the annotation task seeks to capture.

As a simple example, the co-reference annotation task that comes with Callisto can be modeled in Protégé with two class definitions called markable and chain. The chain class has two slots, references and primary_reference, which are constrained by facets to have values of type markable. This simple annotation schema can now be used to annotate co-reference phenomena occurring in text using Knowtator. Annotations in Knowtator created using this simple annotation schema are shown in Figure 1.

A key strength of Knowtator is its ability to relate annotations to each other via the slot definitions of the corresponding annotated classes. In the co-reference example, the slot references of the class chain relates the markable annotations for the text extents "the cat" and "It" to the chain annotation. The constraints on the slots ensure that the relationships between annotations are consistent.

Protégé is capable of representing much more sophisticated and complex conceptual models which can be used, in turn, by Knowtator for text annotation. Also, because Protégé is often used to create conceptual models of domains relating to biomedical disciplines, Knowtator is especially well suited for capturing named entities and relations between named entities for those domains.

3.2 Features

In addition to its flexible annotation schema definition capabilities, Knowtator has many other features that are useful for executing text annotation projects. A consensus set creation mode allows on...

[...] modified. Annotation data can be exported to a simple XML format.

Annotation filters can be used to view a subset of available annotations. This may be important if, for example, viewing only named entity annotations is desired in an annotation project that also contains many part-of-speech annotations. Filters are also used to focus inter-annotator agreement (IAA) analysis and the export of annotations to XML.

Knowtator can be run as a stand-alone system (e.g. on a laptop) without a network connection. For increased scalability, Knowtator can be used with a relational database backend (via JDBC).

Knowtator and Protégé are provided under the Mozilla Public License 1.1 and are freely available with source code at http://bionlp.sourceforge.net/Knowtator and http://protege.stanford.edu, respectively. Both applications are implemented in the Java programming language and have been successfully deployed and used in the Windows, MacOS, and Linux environments.
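The co-reference schema described in Section 3.1 can be illustrated with a small sketch. The Python below is not Knowtator or Protégé code and uses none of their APIs; every name in it (SlotDef, Annotation, SCHEMA, set_slot, mark) is hypothetical, as is the sample sentence. It only models the idea that a slot carries facet constraints and that those constraints keep relationships between annotations consistent. The single-value limit on primary_reference is an assumption made for illustration; the text states only that both slots are constrained to values of type markable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SlotDef:
    """A slot together with its facet constraints (hypothetical model)."""
    name: str
    allowed_type: str                 # facet: values must be annotations of this class
    max_values: Optional[int] = None  # facet: cardinality limit, None means unbounded


@dataclass
class Annotation:
    """An annotation of a given class, optionally tied to a text extent."""
    cls: str
    span: Optional[Tuple[int, int]] = None  # (start, end) character offsets
    slots: Dict[str, List["Annotation"]] = field(default_factory=dict)


# Schema from Section 3.1: two classes, with the slots of "chain"
# constrained to "markable" values. max_values=1 is an assumption.
SCHEMA: Dict[str, List[SlotDef]] = {
    "markable": [],
    "chain": [
        SlotDef("references", allowed_type="markable"),
        SlotDef("primary_reference", allowed_type="markable", max_values=1),
    ],
}


def set_slot(schema, annotation, slot_name, values):
    """Assign slot values after checking them against the slot's facets."""
    slot = next((s for s in schema[annotation.cls] if s.name == slot_name), None)
    if slot is None:
        raise ValueError(f"class {annotation.cls!r} has no slot {slot_name!r}")
    if slot.max_values is not None and len(values) > slot.max_values:
        raise ValueError(f"slot {slot_name!r} accepts at most {slot.max_values} value(s)")
    for value in values:
        if value.cls != slot.allowed_type:
            raise ValueError(
                f"slot {slot_name!r} requires {slot.allowed_type!r}, got {value.cls!r}"
            )
    annotation.slots[slot_name] = list(values)


def mark(text, phrase):
    """Create a markable annotation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Annotation("markable", span=(start, start + len(phrase)))


# Hypothetical sample text containing the two extents named in the paper.
text = "I saw the cat. It purred."
cat = mark(text, "the cat")
it = mark(text, "It")

chain = Annotation("chain")
set_slot(SCHEMA, chain, "references", [cat, it])
set_slot(SCHEMA, chain, "primary_reference", [cat])
```

Under this sketch, an attempt to place a chain annotation in the references slot raises a ValueError, which is the kind of consistency check the slot constraints are described as providing.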
