Coupet hehenberger 1998 do not only apply clustering


...ally every document consists of 5,000 words on average. The European Patent Office (EPO) has to handle more than 140,000 documents per year, which are processed by 2,500 patent examiners in three locations. Several studies have analyzed the classification quality of state-of-the-art methods. Koster et al. (2001) reported very good results, with a 3% error rate for 16,000 full-text documents classified into 16 classes (mono-classification) and a 6% error rate in the same setting for abstracts only, using the Winnow (Littlestone 1988) and Rocchio (Rocchio 1971) algorithms. These results are possible because of the large number of available training documents. Good results are also reported by Krier & Zacca (2002) for an internal EPO text classification application, with a precision of 81% and a recall of 78%. Text clustering techniques for patent analysis are often applied to support the analysis of patents in large companies by structuring and visualizing the investigated corpus. Thus, these methods find their way into a lot of commercia...
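
The quality measures quoted above (error rate, precision, recall) can be made concrete with a small sketch. The function names, the labels "A" and "B", and the toy predictions below are all illustrative assumptions; they are not taken from the cited studies and do not reproduce their figures.

```python
def error_rate(predicted, actual):
    """Fraction of documents assigned to the wrong class (mono-classification)."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def precision_recall(predicted, actual, label):
    """Precision and recall for one class label."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == label and a == label)  # correctly assigned
    fp = sum(1 for p, a in pairs if p == label and a != label)  # wrongly assigned
    fn = sum(1 for p, a in pairs if p != label and a == label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 12 documents truly in class "A", 8 in class "B".
actual = ["A"] * 12 + ["B"] * 8
predicted = ["A"] * 8 + ["B"] * 4 + ["A"] * 2 + ["B"] * 6

precision, recall = precision_recall(predicted, actual, "A")
print(f"error rate: {error_rate(predicted, actual):.2f}")
print(f"precision:  {precision:.2f}")
print(f"recall:     {recall:.2f}")
```

Precision is the share of documents assigned to a class that really belong to it, while recall is the share of documents belonging to a class that were actually assigned to it. The two are therefore reported together, as in the EPO application.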