Programming the Semantic Web

Toby Segaran, Colin Evans, and Jamie Taylor

Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

Programming the Semantic Web
by Toby Segaran, Colin Evans, and Jamie Taylor

Copyright © 2009 Toby Segaran, Colin Evans, and Jamie Taylor. All rights reserved. Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles ( ). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]

Editor: Mary E. Treseler
Production Editor: Sarah Schneider
Copyeditor: Emily Quill
Proofreader: Sarah Schneider
Indexer: Seth Maislin
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
July 2009: First Edition.

O’Reilly and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Programming the Semantic Web, the image of a red panda, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-15381-6
[M]
1246569738

Table of Contents

Foreword
Preface

Part I. Semantic Data

1. Why Semantics?
    Data Integration Across the Web
    Traditional Data-Modeling Methods
    Tabular Data
    Relational Data
    Evolving and Refactoring Schemas
    Very Complicated Schemas
    Getting It Right the First Time
    Semantic Relationships
    Metadata Is Data
    Building for the Unexpected
    “Perpetual Beta”

2. Expressing Meaning
    An Example: Movie Data
    Building a Simple Triplestore
    Indexes
    The add and remove Methods
    Querying
    Merging Graphs
    Adding and Querying Movie Data
    Other Examples
    Places
    Celebrities
    Business

3. Using Semantic Data
    A Simple Query Language
    Variable Binding
    Implementing a Query Language
    Feed-Forward Inference
    Inferring New Triples
    Geocoding
    Chains of Rules
    A Word About “Artificial Intelligence”
    Searching for Connections
    Six Degrees of Kevin Bacon
    Shared Keys and Overlapping Graphs
    Example: Joining the Business and Places Graphs
    Querying the Joined Graph
    Basic Graph Visualization
    Graphviz
    Displaying Sets of Triples
    Displaying Query Results
    Semantic Data Is Flexible

Part II. Standards and Sources

4. Just Enough RDF
    What Is RDF?
    The RDF Data Model
    URIs As Strong Keys
    Resources
    Blank Nodes
    Literal Values
    RDF Serialization Formats
    A Graph of Friends
    N-Triples
    N3
    RDF/XML
    RDFa
    Introducing RDFLib
    Persistence with RDFLib
    SPARQL
    SELECT Query Form
    OPTIONAL and FILTER Constraints
    Multiple Graph Patterns
    CONSTRUCT Query Form
    ASK and DESCRIBE Query Forms
    SPARQL Queries in RDFLib
    Useful Query Modifiers

5. Sources of Semantic Data
    Friend of a Friend (FOAF)
    Graph Analysis of a Social Network
    Linked Data
    The Cloud of Data
    Are You Your FOAF file?
    Consuming Linked Data
    Freebase
    An Identity Database
    RDF Interface
    Freebase Schema
    MQL Interface
    Using the metaweb.py Library
    Interacting with Humans

6. What Do You Mean, “Ontology”?
    What Is It Good For?
    A Contract for Meaning
    Models Are Data
    An Introduction to Data Modeling
    Classes and Properties
    Modeling Films
    Reifying Relationships
    Just Enough OWL
    Using Protégé
    Creating a New Ontology
    Editing an Ontology
    Just a Bit More OWL
    Functional and Inverse Functional Properties
    Inverse Properties
    Disjoint Classes
    Keepin’ It Real
    Some Other Ontologies
    Describing FOAF
    A Beer Ontology
    This Is Not My Beautiful Relational Schema!

7. Publishing Semantic Data
    Embedding Semantics
    Microformats
    RDFa
    Yahoo! SearchMonkey
    Google’s Rich Snippets
    Dealing with Legacy Data
    Internet Video Archive
    Tables and Spreadsheets
    Legacy Relational Data
    RDFLib to Linked Data

Part III. Putting It into Practice

8. Overview of Toolkits
    Sesame
    Using the Sesame Java API
    RDFS Inferencing in Sesame
    A Servlet Container for the Sesame Server
    Installing the Sesame Web Application
    The Workbench
    Adding Data
    SPARQL Queries
    REST API
    Other RDF Stores
    Jena (Open Source)
    Redland (Open Source)
    Mulgara (Open Source)
    OpenLink Virtuoso (Commercial and Open Source)
    Franz AllegroGraph (Commercial)
    Oracle (Commercial)
    SIMILE/Exhibit
    A Simple Exhibit Page
    Searching, Filtering, and Prettier Views
    Linking Up to Sesame
    Timelines

9. Introspecting Objects from Data
    RDFObject Examples
    RDFObject Framework
    How RDFObject Works

10. Tying It All Together
    A Job Listing Application
    Application Requirements
    Job Listing Data
    Converting to RDF
    Loading the Data into Sesame
    Serving the Website
    CherryPy
    Mako Page Templates
    A Generic Viewer
    Getting Data from Sesame
    The Generic Template
    Getting Company Data
    Crunchbase
    Yahoo! Finance
    Reconciling Freebase Connections
    Specialized Views
    Publishing for Others
    RDFa
    RDF/XML
    Expanding the Data
    Locations
    Geography, Economy, Demography
    Sophisticated Queries
    Visualizing the Job Data
    Further Expansion

Part IV. Epilogue

11. The Giant Global Graph
    Vision, Hype, and Reality
    Participating in the Global Graph Community
    Releasing Data into the Commons
    License Considerations
    The Data Cycle
    Bracing for Continuous Change

Index

Foreword

Some years back, Tim Berners-Lee opined that we would know that the semantic web was becoming a success when people stopped asking “why?” and started asking “how?”—the same way they did with the World Wide Web many years earlier. With this book, I finally feel comfortable saying we have turned that corner. This book is about the “how”—it provides the tools a programmer needs to get going now!

This book’s approach to the semantic web is well matched to the community that is most actively ready to start exploiting these new web technologies: programmers. More than a decade ago, researchers such as myself started playing with some of the ideas behind the semantic web, and from about 1999 to 2005, significant research funding went into the field. The “noise” from all those researchers sometimes obscured the fact that the practical technology spinning off of this research was not rocket science. In fact, that technology, which you will read about in this book, has been maturing extremely well, and it is now becoming an important component of the web developer’s toolkit.

In 2000 and 2001, articles about the semantic web started to appear in the memespace of the Web. Around 2005, we started to see not just small companies in the space, but some bigger players like Oracle embracing the technology. Late in 2006, John Markoff wrote a New York Times article referring to “Web 3.0,” and more developers started to take a serious look at the semantic web—and they liked what they saw.
This developer community has helped create the tools and technologies so that, here in 2009, we’re starting to see a real take-off happening.

Announcements of different uses of semantic web and related technologies are appearing on an almost daily basis. Semantic web technologies are being used by the Obama administration to provide transparency to government data, a move also being explored by many other governments around the world. Google and Yahoo! now collect and process embedded RDFa from web documents, and Microsoft recently discussed some of its semantic efforts in language-based web applications. Web 3.0 applications are attracting the sorts of user numbers that brought the early Web 2.0 apps to public attention, while a bunch of innovative startups you may not have heard of yet are exploring how to bring semantic technologies into an ever-widening range of web applications.

With all this excitement, however, has come an obvious problem. There are now a lot more people asking “how?”, but since this technology is just coming into its own, there aren’t many people who know how to answer the question. Where the early semantic web evangelists like me have gotten pretty good at explaining the vision to a wide range of people, including database administrators, government employees, industrialists, and academics, the questions being asked lately have been harder and harder to address. When the CTO of a Fortune 500 company asks me why he should pay attention to the technology, I can’t wait to answer. However, when his developer asks me how best to find the appropriate objects for the predicates expressed in some embedded RDFa, or how the bindings of a BNode in the OPTIONAL clause of a SPARQL query work, I know that I’m soon going to be out of my depth.
With the publication of this book, however, I can now point to it and say, “The answer’s in there.” The hole in the literature about how to make the semantic web work from the programmer’s viewpoint has finally been filled.

This book also addresses another important need. Given that the top of the semantic web “layer cake” (see Chapter 11) is still in the research world, there’s been a lot of confusion. On one hand, terms like “Linked Data” and “Web 3.0” are being used to describe the immediately applicable and rapidly expanding technology that is needed for web applications today. Meanwhile, people are also exploring the “semantic web 2.0” developments that will power the next generation. This book provides an easy way for the reader to tell the “practical now” from the pie in the sky.

Finally, I like this book for another reason: it embraces a philosophy I’ve often referred to as “a little Semantics goes a long way.” On the Web, a developer doesn’t need to be a philosopher, an AI researcher, or a logician to understand how to make the semantic web work for him. However, figuring out just how much knowledge is enough to get going is a real challenge. In this book, Toby, Jamie, and Colin will show you “just enough RDF” (Chapter 4) and “just enough OWL” (Chapter 6) to allow you, the programmer, to get in there and start hacking.

In short, the technologies are here, the tools are ready, and this book will show you how to make it all work for you. So what are you waiting for? The future of the Web is at your fingertips.

—Jim Hendler
Albany, NY
March 2009

Preface

Like biological organisms, computers operate in complex, interconnected environments where each element of the system constrains the behavior of many others. Similar to predator-prey relationships, applications and the data they consume tend to follow co-evolutionary paths.
Cumulative changes in an application eventually require modification to the data structures on which it operates. Conversely, when enhancements to a data source emerge, the structures for expressing the additional information generally force applications to change. Unfortunately, because of the significant efforts involved, this type of lock-step evolution tends to dampen enhancements in both applications and data sources. At their core, semantic technologies decouple applications from data through the use of a simple, abstract model for knowledge representation. This model releases the mutual constraints on applications and data, allowing both to evolve independently. And by design, this degree of application-data independence promotes data portability. Any application that understands the model can consume any data source using the model. It is this level of data portability that underlies the notion of a machine-readable semantic web. The current Web works well because we as humans are very flexible data processors. Whether the information on a web page is arranged as a table, an outline, or a multipage narrative, we are able to extract the important information and use it to guide further knowledge discovery. However, this heterogeneity of information is indecipherable to machines, and the wide range of representations for data on the Web only compounds the problem. If the diversity of information available on the Web can be encoded by content providers into semantic data structures, any application could access and use the rich array of data we have come to rely on. In this vision, data is seamlessly woven together from disparate sources, and new knowledge is derived from the confluence. This is the vision of the semantic web. Now, whether an application can do anything interesting with this wealth of data is where you, the developer, come into the story! 
Semantic technologies allow you to focus on the behavior of your applications instead of on the data processing. What does this system do when given new data sources? How can it use enhanced data models? How does the user experience improve when multiple data sources enrich one another? Partitioning low-level data operations from knowledge utilization allows you to concentrate on what drives value in your application.

While the vision of the semantic web holds a great deal of promise, the real value of this vision is the technology that it has spawned for making data more portable and extensible. Whether you’re writing a simple “mashup” or maintaining a high-performance enterprise solution, this book provides a standard, flexible approach for integrating and future-proofing systems and data.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Programming the Semantic Web by Toby Segaran, Colin Evans, and Jamie Taylor. Copyright 2009 Toby Segaran, Colin Evans, and Jamie Taylor, 978-0-596-15381-6.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected]

Safari® Books Online

When you see a Safari® Books Online icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at .

How to Contact Us

Please address comments and questions concerning this book to the publisher:

    O’Reilly Media, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    800-998-9938 (in the United States or Canada)
    707-829-0515 (international or local)
    707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

To comment or ask technical questions about this book, send email to:

    [email protected]

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our website at:

The authors have established a website as a community resource for demonstrating practical approaches to semantic technology. You can access this site at:

PART I
Semantic Data

CHAPTER 1
Why Semantics?

Natural language is amazing.
Without any effort you can ask a stranger how to find the nearest coffee shop; you can share your knowledge of music and martini making with your community of friends; you can go to the library, pick up a book, and learn from an author who lived hundreds of years ago. It is hard to imagine a better API for knowledge.

As a simple example, think about the following two sentences. Both are of the form “subject-verb-object,” one of the simplest possible grammatical structures:

1. Colin enjoys mushrooms.
2. Mushrooms scare Jamie.

Each of these sentences represents a piece of information. The words “Jamie” and “Colin” refer to specific people, the word “mushroom” refers to a class of organisms, and the words “enjoys” and “scare” tell you the relationship between the person and the organism. Because you know from previous experience what the verbs “enjoy” and “scare” mean, and you’ve probably seen a mushroom before, you’re able to understand the two sentences. And now that you’ve read them, you...
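
As a rough illustration of the subject-verb-object form described above, the two example sentences can be written down as plain Python tuples. This is only a minimal sketch and is not taken from the book's own code: the names `facts` and `statements_about` are hypothetical, and writing "mushrooms" in lowercase in both tuples is an assumption made here so that the two sentences share a term.

```python
# Hypothetical sketch: each sentence becomes a (subject, verb, object) tuple.
# "mushrooms" is lowercased in both tuples (an assumption) so the terms match.
facts = [
    ("Colin", "enjoys", "mushrooms"),
    ("mushrooms", "scare", "Jamie"),
]


def statements_about(term, triples):
    """Return every triple that mentions `term` as its subject or object."""
    return [t for t in triples if term in (t[0], t[2])]


# Print each statement that involves mushrooms, one per line.
for subject, verb, obj in statements_about("mushrooms", facts):
    print(subject, verb, obj)
```

Because both sentences mention mushrooms, looking up that one term returns both statements, while looking up "Colin" returns only the first.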