6.338_Project_Report_Runmi&Lu


...t number of processes in Julia were tested.

Memory usage saving

Different hash functions can result in different levels of memory usage. With the 2-pass method and hashing, memory usage was reduced by roughly 76% on average.

Computational time saving

Figure 2 shows the running times of single-item counting, item-pair counting, and the total, for different numbers of processes. Single-item counting required much less time than item-pair counting, so the trend in total time is determined mainly by item-pair counting. The running time decreases as more processes are used, up to 5 processes. Beyond that, the overhead of the multi-processing framework exceeds the time saved, and the total running time increases as more processes are added. Running this type of test on a relatively small dataset is important for deciding how many processes to use in real large-scale cases.

Figure 2: Running time with different numbers of processes

V. Conclusions

In this project, we attempted to improve on the naive algorithm for a realistic problem: finding frequent item pairs in massive transaction datasets. Our methods, though relatively intuitive, significantly reduce both the required memory and the running time. The work helped us better understand the MapReduce process and appreciate Julia's easy-to-use parallel computing features. In future research, Julia will be a strong candidate when we choose a scientific computing language.

There are research papers focusing on more sophisticated methods with rigorous mathematical derivations and proofs. For example:

Pasquier, Nicolas, et al. "Discovering frequent closed itemsets for association rules." Database Theory—ICDT'99. Springer Berlin Heidelberg, 1999. 398-416.

Zaki, Mohammed J. "Efficiently mining frequent trees in a forest." Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.

Bodon, Ferenc. "A trie-based APRIORI implementation for mining frequent item sequences." Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations. ACM, 2005. ...
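
The 2-pass method with hashing mentioned under "Memory usage saving" is not spelled out in this excerpt. The following is a minimal sketch of one common way to do it (a PCY-style bucket filter), written in Python for illustration rather than in Julia. The function name `frequent_pairs`, the parameters `min_support` and `n_buckets`, and the use of Python's built-in `hash` are all assumptions for this example, not the report's implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support, n_buckets=1024):
    """Two-pass frequent-pair counting with a hashed pair filter.

    Hypothetical sketch: names and defaults are illustrative only.
    """
    # Pass 1: count single items, and hash every pair into a
    # fixed-size array of bucket counters instead of storing the pair.
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % n_buckets] += 1

    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: keep an exact counter only for candidate pairs, i.e. pairs
    # whose items are both frequent and whose bucket reached min_support.
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bucket_counts[hash(pair) % n_buckets] >= min_support:
                pair_counts[pair] += 1

    # Bucket collisions can admit extra candidates, so filter on exact counts.
    return {p: c for p, c in pair_counts.items() if c >= min_support}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "d"]]
    print(frequent_pairs(baskets, min_support=3))
```

In this sketch the memory saving comes from the second pass storing exact counts only for candidate pairs, while the first pass uses a bucket array whose size is fixed in advance. The choice of hash function and bucket count affects how many non-frequent pairs slip through as candidates, which is consistent with the observation that different hash functions lead to different memory usage.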

This document was uploaded on 02/27/2014 for the course CS 18.337 at MIT.
