ally every document consists of 5,000 words on average. More
than 140,000 documents have to be handled by the European Patent Office (EPO)
per year. They are processed by 2,500 patent examiners in three locations.
Several studies have analyzed the classification quality of state-of-the-art
methods. Koster et al. (2001) reported very good results, with a 3% error rate
for 16,000 full-text documents to be classified into 16 classes (mono-classification)
and a 6% error rate in the same setting for abstracts only, using the Winnow
(Littlestone 1988) and the Rocchio algorithm (Rocchio 1971). These results are
possible due to the large amount of available training documents. Good results
are also reported in (Krier & Zacca 2002) for an internal EPO text classification
application, with a precision of 81% and a recall of 78%.
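The quality measures quoted above (error rate, precision, recall) can be made concrete with a small sketch. The code below is not taken from any of the cited studies; it only illustrates the standard definitions. The function names and the toy class labels "A" and "B" are assumptions made for illustration.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of documents assigned to the wrong class (one class per document)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, treated as the 'positive' class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly assigned
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly assigned
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): five documents, two classes.
true_labels = ["A", "A", "B", "B", "A"]
predicted_labels = ["A", "B", "B", "A", "A"]

print(error_rate(true_labels, predicted_labels))
print(precision_recall(true_labels, predicted_labels, positive="A"))
```

In this reading, a 3% error rate means that 3 out of every 100 documents receive the wrong class, while precision and recall are computed per class: precision is the share of documents assigned to a class that really belong to it, and recall is the share of documents belonging to a class that were actually assigned to it.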
Text clustering techniques are often applied in large companies to support
patent analysis by structuring and visualizing the investigated corpus.
Thus, these methods find their way into a lot of commercia...