Partitioning large data to scale up lattice-based algorithm

Huaiguo Fu, Engelbert Mephu Nguifo
CRIL-CNRS FRE2499, Université d'Artois
Rue de l'université SP 16, 62307 Lens cedex, France
fu,mephu

Abstract

The concept lattice is an effective tool and platform for data analysis and knowledge discovery tasks such as classification and association rule mining. The lattice algorithm that builds formal concepts and the concept lattice plays an essential role in applications of concept lattices. We propose a new efficient and scalable lattice-based algorithm, ScalingNextClosure, which decomposes the search space of any huge data set into partitions and then generates the concepts (or closed itemsets) of each partition independently. The experimental results show the efficiency of this algorithm.

1. Introduction

The concept lattice structure [6, 8] has been shown to be an effective tool for data analysis and knowledge discovery. It has been applied to machine learning, data mining, information retrieval, and other fields. A concept lattice derives conceptual structures from data: it studies how objects can be grouped hierarchically according to their common attributes, and it generates formal concepts from the data to reveal the relations between objects and attributes. The concept lattice is therefore a natural framework for data mining, and its characteristics suit that task well. For example, a closed itemset (or the intent of a formal concept) is a maximal itemset for association rules [11]. So the problem of finding frequent itemsets for association rules can be reduced to finding frequent closed itemsets with a closed itemset lattice or a concept lattice.

The lattice algorithm that builds formal concepts and the concept lattice plays an essential role in applications of concept lattices.
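To make the notion of a closed itemset (the intent of a formal concept) concrete, here is a minimal Python sketch. The toy context `CONTEXT` and the helper names `extent`, `intent` and `closure` are assumptions introduced for illustration only; they do not come from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context: each object maps to the attributes it has.
CONTEXT: Dict[str, Set[str]] = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"b"},
    "o4": {"b", "c"},
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def extent(itemset: Iterable[str]) -> FrozenSet[str]:
    """Objects that have every attribute in `itemset`."""
    wanted = set(itemset)
    return frozenset(o for o, attrs in CONTEXT.items() if wanted <= attrs)


def intent(objects: Iterable[str]) -> FrozenSet[str]:
    """Attributes shared by every object in `objects`."""
    rows = [CONTEXT[o] for o in objects]
    if not rows:
        # No object at all: by convention every attribute is shared.
        return ATTRIBUTES
    return frozenset(set.intersection(*rows))


def closure(itemset: Iterable[str]) -> FrozenSet[str]:
    """Smallest closed itemset containing `itemset`."""
    return intent(extent(itemset))


def is_closed(itemset: Iterable[str]) -> bool:
    """An itemset is closed when it equals its own closure."""
    return closure(itemset) == frozenset(itemset)
```

In this toy context, `closure({"a"})` is `{"a", "b"}`, because every object that has `a` also has `b`. The pair (`extent({"a", "b"})`, `{"a", "b"}`) is then a formal concept, and `{"a"}` on its own is not a closed itemset.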
Several algorithms have been proposed to generate the concepts or the concept lattice of a data context, for example Bordat [2], Ganter (the NextClosure algorithm) [5], Chein [3], Norris [9], Godin [7] and Nourine [10]. Experimental comparisons of existing algorithms show that NextClosure performs best on large and dense data [8, 4]. However, it is still very expensive in time when dealing with huge data. So in this paper, we propose a new efficient lattice-based algorithm, ScalingNextClosure, which decomposes the search space of any huge data set into partitions and then generates the concepts or closed itemsets of each partition independently.

The new algorithm is a kind of decomposition algorithm for concept lattices. All existing decomposition algorithms for generating concept lattices rely on context decomposition, which differs from our approach. Our algorithm uses a new method that decomposes the search space instead. It can freely decompose the search space into any set of partitions, provided that each partition contains concepts, and then generate the concepts of each partition independently. So this algorithm can be used to analyze huge data and to generate formal concepts. Moreover for this algorithm, each partition only
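The idea of generating closed itemsets independently per partition can be sketched as follows. This excerpt does not say how ScalingNextClosure chooses its partitions, so the sketch rests on an assumption: each partition is an interval of the lectic order used by Ganter's NextClosure, given by two integer keys. The toy context `ROWS` and all function names are hypothetical, and the code is not the authors' implementation.

```python
from typing import FrozenSet, List, Optional

# Hypothetical toy context: each row holds the attribute indices of one object.
ROWS: List[FrozenSet[int]] = [
    frozenset({0, 1}),
    frozenset({0, 1, 2}),
    frozenset({1}),
    frozenset({1, 2}),
]
N_ATTRS = 3


def closure(itemset: FrozenSet[int]) -> FrozenSet[int]:
    """Attributes shared by all rows that contain `itemset`."""
    rows = [r for r in ROWS if itemset <= r]
    if not rows:
        return frozenset(range(N_ATTRS))
    return frozenset.intersection(*rows)


def lectic_key(itemset: FrozenSet[int]) -> int:
    """Integer whose numeric order matches the lectic order (attribute 0 is the top bit)."""
    return sum(1 << (N_ATTRS - 1 - i) for i in itemset)


def from_key(key: int) -> FrozenSet[int]:
    """Inverse of lectic_key."""
    return frozenset(i for i in range(N_ATTRS) if (key >> (N_ATTRS - 1 - i)) & 1)


def next_closure(itemset: FrozenSet[int]) -> Optional[FrozenSet[int]]:
    """Lectically smallest closed set strictly greater than `itemset`, or None."""
    current = set(itemset)
    for i in reversed(range(N_ATTRS)):
        if i in current:
            current.discard(i)
        else:
            candidate = closure(frozenset(current | {i}))
            # Accept only if the closure adds no attribute smaller than i.
            if all(j >= i for j in candidate - current):
                return candidate
    return None


def closed_in_partition(lo_key: int, hi_key: int) -> List[FrozenSet[int]]:
    """Closed itemsets whose lectic key lies in the interval [lo_key, hi_key)."""
    start = from_key(lo_key)
    current = start if closure(start) == start else next_closure(start)
    found = []
    while current is not None and lectic_key(current) < hi_key:
        found.append(current)
        current = next_closure(current)
    return found


# Two partitions (an assumed split): itemsets without attribute 0, then with it.
partitions = [closed_in_partition(0, 4), closed_in_partition(4, 8)]
```

Each call to `closed_in_partition` needs only its own interval bounds and the context, so under this assumption the partitions could be processed in any order or on separate machines. Concatenating the two results gives the same list as a single run over the interval `[0, 8)`.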