3.2 A Unified Abstraction

[24] lists a seamless integration of relational algebra and linear algebra as one of the current open research problems. Its authors highlight the need for a holistic framework that supports both the relational operations required for the feature engineering phase and the linear algebra needed for the learning algorithms themselves. AIDA accomplishes this via a unified abstraction of data called TabularData, providing both relational and linear algebra support for data sets.

TabularData. TabularData objects reside in AIDA, and therefore in the RDBMS address space. They remain in memory beyond individual remote method invocations. TabularData objects can work both with data stored in database tables and with host language objects such as NumPy arrays. Users perform linear algebra and relational operations on a TabularData object using the client API, regardless of whether the actual data set is stored in the database or in NumPy. Behind the scenes, AIDA utilizes the underlying RDBMS's SQL engine to execute relational operations and relies on NumPy to execute linear algebra. When required, AIDA performs data transformations seamlessly between the two systems (see Figure 2) without user involvement and, as we will see later, can often accomplish this without the need to copy the actual data.
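The division of labor described above, with relational operations going to the SQL engine and linear algebra going to NumPy behind a single object, can be illustrated with a minimal sketch. This is not AIDA's API: `TabularSketch`, `from_table`, `filter`, `matrix`, and the temporary table `tmp_sketch` are hypothetical names invented for illustration, SQLite stands in for the RDBMS, and the sketch copies data on every conversion, which AIDA is said to avoid in many cases.

```python
import sqlite3
import numpy as np


class TabularSketch:
    """Hypothetical stand-in for a TabularData-like object (not AIDA's API)."""

    def __init__(self, conn, columns, rows):
        self.conn = conn
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]

    @classmethod
    def from_table(cls, conn, table):
        # Load a database table; the table name is trusted input in this sketch.
        cur = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cur.description]
        return cls(conn, columns, cur.fetchall())

    def matrix(self):
        # Materialize the rows as a 2-D NumPy array (this copies the data).
        return np.array(self.rows, dtype=float)

    def filter(self, condition):
        # Relational selection, delegated to the SQL engine via a temp table.
        cols = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        self.conn.execute("DROP TABLE IF EXISTS temp.tmp_sketch")
        self.conn.execute(f"CREATE TEMP TABLE tmp_sketch ({cols})")
        self.conn.executemany(f"INSERT INTO tmp_sketch VALUES ({marks})", self.rows)
        cur = self.conn.execute(f"SELECT {cols} FROM tmp_sketch WHERE {condition}")
        return TabularSketch(self.conn, self.columns, cur.fetchall())

    def __matmul__(self, other):
        # Linear algebra, delegated to NumPy; `other` must be a 2-D array.
        result = self.matrix() @ np.asarray(other, dtype=float)
        columns = [f"c{i}" for i in range(result.shape[1])]
        return TabularSketch(self.conn, columns, result.tolist())


# Usage: start from a database table, apply a NumPy operation, then a SQL filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.executemany("INSERT INTO points VALUES (?, ?)",
                 [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])

points = TabularSketch.from_table(conn, "points")
projected = points @ np.array([[1.0], [1.0]])  # single column c0 = x + y
large = projected.filter("c0 > 5")             # evaluated by SQLite
```

The caller chains `@` and `filter` on the same kind of object without caring where the rows currently live, which is the property the unified abstraction is meant to provide.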
Figure 2: TabularData Abstraction

Linear algebra and relational operations. AIDA cashes in on the influence of contemporary popular systems for its client API. For linear algebra, it simply emulates the syntax and semantics of the statistical package it uses: NumPy. For relational operators, we decided not to use pure SQL, as it would make it difficult to provide a seamlessly unified abstraction. Instead, we resort to object-relational mappings (ORMs) [33], which allow an object-oriented view and method-based access to the data in database tables. While not as sophisticated as SQL, ORMs are fairly versatile. ORMs have proven to be very useful for web developers, who are familiar with object-oriented programming but not with SQL. ORMs make it easy to query the database from a procedural language without having to write SQL or work with the nuances of JDBC/ODBC APIs. By borrowing syntax and semantics from ORMs (we mainly based our system on Django's ORM module, a popular framework in Python [6]), we believe that data scientists who are familiar with Python and NumPy, but not so much with SQL, will be at ease writing database queries with AIDA.

3.3 Overview Example

Let's have a look at two very simple code snippets that can be run in a client-based Python interpreter using AIDA's client API. The first code snippet represents a relational operator as it accesses the supplier table of the TPC-H benchmark [53] to calculate the number of suppliers and their total account balance.