Building Knowledge Graphs
Fall 2017
Craig Knoblock
Pedro Szekely
University of Southern California
Course Overview
Challenges in Building Knowledge Graphs
Finding the data
Extracting it fro
Quiz 11 Solution
Consider natural-joining two relations R(A, B) and S(B, C) using the simple sort-based join algorithm
(where input relations are completely sorted first).
Suppose M = 100 pages, B(R) = 5,000 blocks, and B(S) = 20,000 blocks. Assume all me
INF 551 Fall 2016 (Morning)
Quiz 6: Data modeling (10 points)
10 minutes
Consider stores that sell electronic products such as laptops. The following tables record the purchase of
products from the stores. Note that underline indicates the
INF 551 Fall 2016 (Morning)
Quiz 12: Hadoop MapReduce (10 points)
10 minutes
Consider the WordCount program you have seen in class (with TokenizerMapper and IntSumReducer).
Assume that the Combiner is used which executes the same
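The effect of a combiner on WordCount can be illustrated with a small in-memory simulation. The sketch below is plain Python, not Hadoop code: the function names (map_tokens, sum_counts, run_job) and the two input splits are made up for illustration, and it assumes the combiner applies the same summing logic as the reducer.

```python
from collections import defaultdict

def map_tokens(line):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]

def sum_counts(pairs):
    """Summing logic shared by the combiner and the reducer (an assumption)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run_job(splits, use_combiner=True):
    """Simulate one job; return (final counts, number of pairs shuffled)."""
    shuffled = []
    for split in splits:
        pairs = [p for line in split for p in map_tokens(line)]
        if use_combiner:
            # Pre-aggregate on the map side before anything is shuffled.
            pairs = list(sum_counts(pairs).items())
        shuffled.extend(pairs)
    return sum_counts(shuffled), len(shuffled)

# Hypothetical input: one inner list of lines per map task.
splits = [["a b a", "b a"], ["a c"]]
counts, with_combiner = run_job(splits, use_combiner=True)
_, without_combiner = run_job(splits, use_combiner=False)
```

In this made-up input the final counts are the same either way, but the number of pairs sent to the reducer drops from 7 to 4 when the combiner runs.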
INF 551 Fall 2016 (Morning)
Quiz 10: External Sorting (10 points)
10 minutes
Consider external-sorting 10 blocks of integers as shown below. Assume that each block can hold only
two integers. Assume only one buffer page is used in P
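A two-way external merge sort over blocks of two integers can be sketched as follows. This is an illustration under stated assumptions, not the quiz solution: the input values are made up (the quiz's own data is not shown here), pass 0 is assumed to sort each block on its own, and every later pass is assumed to merge adjacent pairs of runs. The helper names are hypothetical.

```python
import heapq

BLOCK_SIZE = 2  # integers per block, as in the quiz statement

def pass_zero(blocks):
    """Pass 0: sort each block on its own, producing one-block runs."""
    return [sorted(block) for block in blocks]

def merge_pass(runs):
    """Merge adjacent pairs of sorted runs; an odd run out is carried over."""
    merged = []
    for i in range(0, len(runs), 2):
        merged.append(list(heapq.merge(*runs[i:i + 2])))
    return merged

def external_sort(blocks):
    """Return (sorted integers, total number of passes including pass 0)."""
    runs = pass_zero(blocks)
    passes = 1
    while len(runs) > 1:
        runs = merge_pass(runs)
        passes += 1
    return runs[0], passes

# Hypothetical data: 10 blocks of 2 integers each (not the quiz's input).
values = [17, 3, 9, 12, 1, 20, 8, 5, 14, 2,
          19, 7, 11, 4, 16, 6, 13, 10, 18, 15]
blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
result, passes = external_sort(blocks)
```

With 10 one-block runs after pass 0, the run count goes 10, 5, 3, 2, 1, so this sketch takes 5 passes in total.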
Quiz 9 Solution
Consider the following tables. Write a relational-algebra expression (which does NOT contain Cartesian
product) for each of the following questions.
Laptop(SerialNo, OperatingSystem, HardDrive)
Product(SerialNo, Brand, Model, Price)
Store(
INF 551 Fall 2016 (Morning)
Quiz 8: SQL, Constraints, and Views (10 points)
10 minutes
Consider again the tables you have seen in the last quiz. Write an SQL query for each of the following
questions.
Laptop(SerialNo, OperatingSy
Cheatsheet: Scikit-Learn
Scikit-Learn is the most popular and widely used library for
machine learning in Python.
Pre-Processing
#   Function                               Description
1   sklearn.preprocessing.StandardScaler   Standardize features by removing the mean and scaling to unit varia
INF552: Programming Assignment 2
Part 1: Implementation [7 points]
Implement the K-means algorithm AND the Expectation Maximization algorithm for clustering using a
Gaussian Mixture Model (GMM). Run your algorithms on the data file "clusters.txt" using K,
Programming Assignment 1: Decision Trees
Part 1: Implementation [7 points]
Your job in this exercise is to predict whether you will have a good night-out in Jerusalem for the
coming New Year's Eve. Assume that you have kept a record of your previous night
INF552: Programming Assignment 3
Part 1: Implementation [7 points]
PCA (2 points)
Use PCA to reduce the dimensionality of the data points in pca-data.txt from 3D to 2D. Each line of
the data file represents the 3D coordinates of a single point. Please out
Part 2: Software Familiarization [2 points]
Do your own research and find out about a library function that offers a good implementation of the
decision tree algorithm. Learn how to use it. Compare it against your implementation and suggest some
ideas for
INF552: Programming Assignment 4
Part 1: Implementation [7 points]
[2 points] Implement the Perceptron Learning algorithm. Run it on the data file "classification.txt"
ignoring the 5th column. That is, consider only the first 4 columns in each row. The fi
INF552: Programming Assignment 6 [Hidden Markov Models]
Part 1: Implementation [7 points]
The file hmm-data.txt contains a map of a 10-by-10 2D grid-world. The top-left cell has a
coordinate of (0, 0). The ith row has an x coordinate of i, and the jth
submission is required for each group by one of the group members. Please submit your homework on
BlackBoard (do NOT email the homework to the instructor or the TA).
INF552: Programming Assignment 5 [Neural Networks]
Part 1: Implementation [7 points]
In the directory gestures, there is a set of images that display "down" gestures (i.e., thumbs-down
images) or other gestures. In this assignment, you are required to im
Query Execution
INF 551
Wensheng Wu
Components of Query Processor
[Figure: overview. SQL query -> query compilation (uses metadata) -> query plan -> query execution (accesses data).]
[Figure: compilation steps. SQL query -> parse query -> query expression tree -> select logical query plan -> logical query plan tree -> select physical plan -> ...]
Indexing
INF 551
Wensheng Wu
Outline
Types of indexes
B+ trees
Indexes
An index is a data structure that speeds up
selections on the search key field(s)
Fields = attributes
Search key = any subset of the fields of a relation
Search key is not th
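The idea of an index over a search key can be sketched with an in-memory dictionary. This is only an illustration: the relation, its field names, and the helper functions build_index and select are hypothetical, and a real DBMS index (such as the B+ trees listed in the outline) is a disk-based structure rather than a Python dict.

```python
from collections import defaultdict

def build_index(rows, key_fields):
    """Map each search-key value (a tuple of field values) to row positions."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[tuple(row[f] for f in key_fields)].append(position)
    return index

def select(rows, index, key_value):
    """Answer an equality selection through the index instead of a full scan."""
    return [rows[p] for p in index.get(key_value, [])]

# Hypothetical relation; several rows may share one search-key value.
rows = [
    {"SerialNo": 1, "Brand": "Acme", "Price": 900},
    {"SerialNo": 2, "Brand": "Zenith", "Price": 1200},
    {"SerialNo": 3, "Brand": "Acme", "Price": 700},
]

by_brand = build_index(rows, ["Brand"])      # search key = {Brand}
acme = select(rows, by_brand, ("Acme",))     # rows with Brand = 'Acme'
```

Passing more than one field name to build_index gives a composite search key, matching the definition of a search key as any subset of the fields.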
INF 551 Fall 2016
Homework #5: Hadoop MapReduce & Apache Spark
[100 points]
Due: 11:59pm on 11/29/2016 to Blackboard
1. [40 points] Write a Hadoop MapReduce program LengthCount.java (by modifying
WordCount.java) to compute a histogram on the length of wor