
# Suppose we are given a document set D of n documents



A feature clustering algorithm is proposed to deal with these issues. Suppose we are given a document set D of n documents d_1, d_2, ..., d_n, together with the feature vector W of m words w_1, w_2, ..., w_m and p classes c_1, c_2, ..., c_p, as specified. We construct one word pattern for each word in W. For word w_i, its word pattern x_i is defined, similarly as in [27], by

    x_i = <x_{i1}, x_{i2}, ..., x_{ip}> = <P(c_1|w_i), P(c_2|w_i), ..., P(c_p|w_i)>

where, for 1 ≤ j ≤ p,

    P(c_j|w_i) = ( Σ_{q=1}^{n} d_{qi} × δ_{qj} ) / ( Σ_{q=1}^{n} d_{qi} ).

Note that d_{qi} indicates the number of occurrences of w_i in document d_q. Also, δ_{qj} is defined as either 1 or 0: it is 1 if document d_q belongs to class c_j and 0 otherwise. Therefore, we have m word patterns in total.
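The word-pattern construction above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `word_patterns` and the array layout (a document-by-word count matrix plus a per-document class label vector) are assumptions made for the example.

```python
import numpy as np

def word_patterns(counts, labels, p):
    """Build one word pattern per word: x_i = <P(c_1|w_i), ..., P(c_p|w_i)>.

    counts : (n, m) array, counts[q, i] = d_qi, occurrences of word w_i in doc d_q
    labels : length-n integer array, labels[q] = class index of document d_q (0..p-1)
    Returns an (m, p) array whose i-th row is the word pattern x_i.
    """
    n, m = counts.shape
    # delta[q, j] = 1 if document d_q belongs to class c_j, else 0
    delta = np.zeros((n, p))
    delta[np.arange(n), labels] = 1.0
    # numerator[i, j] = sum_q d_qi * delta_qj ; denominator[i] = sum_q d_qi
    numer = counts.T @ delta
    denom = counts.sum(axis=0)[:, None]
    return numer / denom
```

Each row of the result sums to 1, since P(c_j|w_i) is a class distribution conditioned on the word.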


Sainani et al., International Journal of Advanced Research in Computer Science and Software Engineering 2 (8), August 2012, pp. 258-262. © 2012, IJARCSSE. All Rights Reserved.

Fig. 1: Flow diagram of preprocessing (read each document from the document set, remove stop words and do stemming, get the feature vector, construct word patterns).

Self-Constructing Clustering

Our clustering algorithm is an incremental, self-constructing learning approach. Word patterns are considered one by one, and the user does not need to have any idea about the number of clusters in advance. No clusters exist at the beginning, and clusters are created as necessary. For each word pattern, its similarity to each existing cluster is calculated to decide whether it is combined into an existing cluster or a new cluster is created for it. When a new cluster is created, the corresponding membership function is initialized; conversely, when the word pattern is combined into an existing cluster, the membership function of that cluster is updated accordingly.

Fig. 2: Flow diagram of self-constructing clustering (read off all word patterns, compute the similarity of each pattern to the existing clusters, compare with the threshold, and either merge the pattern into an existing cluster or generate a new cluster).

Let k be the number of currently existing clusters, denoted G_1, G_2, ..., G_k. Each cluster G_j has mean m_j = <m_{j1}, m_{j2}, ..., m_{jp}> and deviation σ_j = <σ_{j1}, σ_{j2}, ..., σ_{jp}>. Let S_j be the size of cluster G_j. Initially, k = 0, so no clusters exist at the beginning. For each word pattern x_i = <x_{i1}, x_{i2}, ..., x_{ip}>, we calculate the similarity of x_i to each existing cluster by the membership function of [40] (Fig. 3):

    μ_{G_j}(x_i) = Π_{q=1}^{p} exp( -( (x_{iq} - m_{jq}) / σ_{jq} )^2 )

for 1 ≤ j ≤ k. We say that x_i passes the similarity test on cluster G_j if μ_{G_j}(x_i) ≥ ρ, where ρ, 0 ≤ ρ ≤ 1, is a predefined threshold. If the user intends to have larger clusters, he or she can give a smaller threshold; otherwise, a bigger threshold can be given. As the threshold increases, the number of clusters also increases. Note that, as usual, the power in the above function is 2 [34], [35]. Its value has an effect on the number of clusters obtained: a larger value makes the boundaries of the Gaussian function sharper, and more clusters will be obtained for a given threshold.
Feature Extraction

Word patterns have been grouped into clusters, and words in the feature vector W are also clustered accordingly. For



