p313-ercegovac - The TEXTURE Benchmark: Measuring...

The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS

Vuk Ercegovac, David J. DeWitt, Raghu Ramakrishnan
University of Wisconsin, Madison, WI 53706, USA
{vuk, dewitt, raghu}@cs.wisc.edu

Abstract

We introduce a benchmark called TEXTURE (TEXT Under RElations) to measure the relative strengths and weaknesses of combining text processing with a relational workload in an RDBMS. While the well-known TREC benchmarks focus on quality, we focus on efficiency. TEXTURE is a micro-benchmark for query workloads, and considers two central text support issues that previous benchmarks did not: (1) queries with relevance ranking, rather than those that just compute all answers, and (2) a richer mix of text and relational processing, reflecting the trend toward seamless integration. In developing this benchmark, we had to address the problem of generating large text collections that reflected the (performance) characteristics of a given “seed” collection; this is essential for a controlled study of specific data characteristics and their effects on performance. In addition to presenting the benchmark, with performance numbers for three commercial DBMSs, we present and validate a synthetic generator for populating text fields.

1 Introduction

As applications emerge that require queries over both text and relations, supporting text as a new data type (TextType) has become a focal point for relational database systems. For example, consider an on-line store where each item in the catalog has an associated description and discussion forum. A user may search by relational attributes such as price, product category, or brand, or by TextTypes such as “description” and “forum content”. Combining the two classes of attribute types, a user may request information about inexpensive systems for graphic design that are viewed positively by users. The items in the result may be sorted according to price, or ranked by a TextType query on descriptions.

The topic of integrating TextType into an RDBMS has been widely studied ([20, 11, 17, 8]), and most commercial RDBMSs have integrated TextTypes. However, an application developer currently has no way to assess how a system that stores text in a relational DBMS will perform. In this paper, we propose a new benchmark, called TEXTURE, that compares performance of query workloads running on relational database systems....

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
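The mixed query in the on-line store example (a relational predicate on price combined with a text predicate on descriptions, with results either ranked by relevance or sorted by price) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TEXTURE's workload or any DBMS's text engine: the in-memory catalog, the `Item` class, the `ranked_query` function, the naive tokenizer, and the smoothed TF-IDF scoring are all hypothetical stand-ins chosen only to make the two orderings concrete.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical catalog row: relational attributes plus one text field."""
    item_id: int
    price: float
    description: str


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): split on whitespace, strip punctuation, lowercase.
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_scores(docs: dict[int, list[str]], query: list[str]) -> dict[int, float]:
    # Smoothed TF-IDF, used here only as a stand-in relevance function.
    n = len(docs)
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))  # document frequency: each term counted once per doc
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query:
            if tf[term]:
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0
                score += tf[term] * idf
        scores[doc_id] = score
    return scores


def ranked_query(items: list[Item], max_price: float, query_text: str,
                 k: int, order_by: str = "relevance") -> list[tuple[int, float]]:
    """Hypothetical mixed query: price filter + text match, top-k by relevance or price."""
    # IDF statistics come from the whole collection, not just the filtered rows.
    docs = {it.item_id: tokenize(it.description) for it in items}
    scores = tfidf_scores(docs, tokenize(query_text))
    # Relational predicate (inexpensive items) AND text predicate (score > 0).
    matches = [it for it in items if it.price <= max_price and scores[it.item_id] > 0]
    if order_by == "relevance":
        matches.sort(key=lambda it: (-scores[it.item_id], it.price))
    elif order_by == "price":
        matches.sort(key=lambda it: (it.price, -scores[it.item_id]))
    else:
        raise ValueError(f"unknown order_by: {order_by!r}")
    return [(it.item_id, round(scores[it.item_id], 3)) for it in matches[:k]]


# Hypothetical toy catalog for illustration only.
catalog = [
    Item(1, 499.0, "Workstation for graphic design and photo editing"),
    Item(2, 1899.0, "High-end graphic design workstation with a color-calibrated display"),
    Item(3, 349.0, "Budget laptop for office work"),
    Item(4, 599.0, "Compact desktop for graphic design students, tuned for design software"),
]

if __name__ == "__main__":
    print(ranked_query(catalog, 700, "graphic design", k=10))
    # -> [(4, 3.669), (1, 2.446)]
    print(ranked_query(catalog, 700, "graphic design", k=10, order_by="price"))
    # -> [(1, 2.446), (4, 3.669)]
```

With this toy catalog, ranking by relevance puts item 4 ahead of item 1, while ordering by price reverses them; item 2 matches the text query but is removed by the price predicate. Computing term statistics over the full collection, rather than over the filtered rows, is one design choice among several, and a real system would typically evaluate the text predicate through a text index rather than by scanning every description.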