DEIM Forum 2013 A1-6
A Study on Efficient Similar Sentence Matching
Yanhui GU, Zhenglu YANG, Miyuki NAKANO, and Masaru KITSUREGAWA
Institute of Industrial Science, the University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo 153-8505 Japan
E-mail: {guyanhui,yan
Imashi Wijewickrama
R582V262
b-Bit Minwise Hashing
This paper is about b-bit minwise hashing. Minwise hashing was originally used to estimate data
resemblance in information retrieval, in data management in social networks, and in online
advertising. Accordi
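To make the idea concrete, here is a minimal Python sketch of resemblance estimation with b-bit minwise hashing. It is an illustration under stated assumptions, not the paper's implementation: the names `min_hash`, `b_bit_signature`, and `estimate_resemblance` are hypothetical, salted BLAKE2b stands in for random permutations, and the correction step assumes the hash values behave like independent uniform 64-bit numbers, so that two different minima agree on their lowest b bits with probability 1/2^b.

```python
import hashlib


def min_hash(items, seed):
    """Smallest 64-bit hash of any item under the hash function picked by `seed`."""
    salt = seed.to_bytes(8, "little")
    return min(
        int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8, salt=salt).digest(),
            "little",
        )
        for x in items
    )


def b_bit_signature(items, k=512, b=2):
    """Keep only the lowest b bits of each of k min-hash values."""
    mask = (1 << b) - 1
    return [min_hash(items, seed) & mask for seed in range(k)]


def estimate_resemblance(sig1, sig2, b=2):
    """Estimate Jaccard resemblance from two b-bit signatures.

    Assumption: the match probability is 2^-b + (1 - 2^-b) * R, so the
    observed match rate is corrected for accidental b-bit agreement.
    """
    match_rate = sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
    chance = 1.0 / (1 << b)  # probability that b bits agree by accident
    return (match_rate - chance) / (1.0 - chance)


if __name__ == "__main__":
    a = set(range(0, 150))
    c = set(range(50, 200))  # true resemblance |a & c| / |a | c| = 100 / 200
    print(estimate_resemblance(b_bit_signature(a), b_bit_signature(c)))
```

Storing b bits per hash instead of 64 shrinks each signature considerably; the price is a noisier estimate for the same k, which is why k is chosen fairly large here.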
Evaluating approximate sketch counts. To show that low-frequency words are prone to higher ARE,
we have plotted only word pairs with count/frequency at most 100.
Where Exact and Predicted denote the exact and CM/CU counts, respectively; N
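The kind of evaluation described here can be sketched in Python as follows. This is a hedged illustration, not the code behind the plots: the class `CountMinSketch` and the function `average_relative_error` are hypothetical names, CM and CU are taken to mean Count-Min and Count-Min with conservative update, and the error is assumed to be the mean of |Exact - Predicted| / Exact over the queried items, since the formula itself is not shown in the text.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch; conservative=True switches to conservative update (CU)."""

    def __init__(self, width=64, depth=4, conservative=False):
        self.width = width
        self.depth = depth
        self.conservative = conservative
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a row-salted hash of the item.
        cols = []
        for row in range(self.depth):
            digest = hashlib.blake2b(
                str(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            cols.append(int.from_bytes(digest, "little") % self.width)
        return cols

    def add(self, item, count=1):
        cols = self._columns(item)
        if self.conservative:
            # CU: raise counters only as far as the new estimate requires.
            target = min(self.table[r][c] for r, c in enumerate(cols)) + count
            for r, c in enumerate(cols):
                self.table[r][c] = max(self.table[r][c], target)
        else:
            for r, c in enumerate(cols):
                self.table[r][c] += count

    def query(self, item):
        # The smallest counter is the least-contaminated estimate.
        return min(self.table[r][c] for r, c in enumerate(self._columns(item)))


def average_relative_error(exact, sketch):
    """Mean of |Exact - Predicted| / Exact over the items in `exact` (assumed definition)."""
    errors = [abs(n - sketch.query(w)) / n for w, n in exact.items()]
    return sum(errors) / len(errors)
```

Because both variants can only overestimate, a collision with a frequent item adds the same absolute error to any colliding item, which is a much larger relative error when the exact count is small.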
Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data
This paper is about a sketch-based conditional random sampling (CRS) technique for sparse data.
This method overcomes the challenges in conventiona
Counting Triangles and the Curse of the Last Reducer
Siddharth Suri
Sergei Vassilvitskii
Yahoo! Research
11 W. 40th St, 17th Floor
New York, NY 10018, USA
{suri, sergei}@yahoo-inc.com
ABSTRACT
The clustering coefficient of a node in a social network is
a f
One Sketch for All: Theory and Application of Conditional Random Sampling
This study is about modifying the Conditional Random Sampling (CRS) method to handle dynamic,
streaming data, since the original version of CRS was crea
Chapter 2
Data Models
Discussion Focus
Although all of the topics covered in this chapter are important, our students have given us consistent
feedback: If you can write precise business rules from a description of
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a graph statistic that is used in random graph models as well as in
real-world graph applications such as spam detection. To count
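For orientation, the following Python sketch shows a simple exact triangle counter. It is not the degree-based vertex partitioning method named in the title; it is a common baseline that directs each edge from its lower-degree to its higher-degree endpoint so that every triangle is found exactly once. The function name `count_triangles` and the edge-list input format are assumptions made for this illustration.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs.

    Assumes vertex labels are mutually comparable (e.g. all ints),
    because labels are used to break degree ties.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Rank vertices by (degree, label) so every edge gets a single direction.
    ordered = sorted(neighbors, key=lambda v: (len(neighbors[v]), v))
    rank = {v: i for i, v in enumerate(ordered)}
    higher = {v: {w for w in neighbors[v] if rank[w] > rank[v]} for v in neighbors}

    triangles = 0
    for u in higher:
        for v in higher[u]:
            # A common higher-ranked neighbor w closes the triangle u < v < w once.
            triangles += len(higher[u] & higher[v])
    return triangles


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))  # complete graph on 4 vertices: 4 triangles
```

Directing edges toward higher-degree vertices keeps the sets intersected for high-degree hubs small, which is the usual reason for choosing a degree-based order over an arbitrary one.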