CITATION ANALYSIS

Citation-based plagiarism detection (CbPD) relies on citation analysis and is the only approach to plagiarism detection that does not rely on textual similarity. CbPD examines the citation and reference information in texts to identify similar patterns in the citation sequences. As such, this approach is suitable for scientific texts and other academic documents that contain citations. Citation analysis to detect plagiarism is a relatively young concept. It has not been adopted by commercial software, but a first prototype of a citation-based plagiarism detection system exists. Similar order and proximity of citations in the examined documents are the main criteria used to compute citation pattern similarities.

CITATION PATTERNS

Citation patterns are subsequences that contain, though not necessarily exclusively, citations shared by the documents being compared. Factors including the absolute number or relative fraction of shared citations in the pattern, as well as the probability that citations co-occur in a document, are also considered to quantify the patterns’ degree of similarity.

STYLOMETRY

Stylometry subsumes statistical methods for quantifying an author’s unique writing style and is mainly used for authorship attribution or intrinsic plagiarism detection. Detecting plagiarism by authorship attribution requires checking whether the writing style of the suspicious document, supposedly written by a certain author, matches that of a corpus of documents written by the same author. Intrinsic plagiarism detection, on the other hand, uncovers plagiarism based on internal evidence in the suspicious document without comparing it with other documents. This is done by constructing and comparing stylometric models for different text segments of the suspicious document; passages that are stylistically different from the others are marked as potentially plagiarized or infringing. Although they are simple to extract, character n-grams have proven to be among the best stylometric features for intrinsic plagiarism detection.

PERFORMANCE

Comparative evaluations of content similarity detection systems indicate that their performance depends on the type of plagiarism present (see figure). Except for citation pattern analysis, all detection approaches rely on textual similarity.
It is therefore symptomatic that detection accuracy decreases the more plagiarism cases are obfuscated.

[Figure: Detection performance of CaPD approaches depending on the type of plagiarism present]

Literal copies, also known as copy-and-paste (c&p) plagiarism or blatant copyright infringement, and modestly disguised plagiarism cases can be detected with high accuracy by current external plagiarism detection systems (PDS) if the source is accessible to the software. Substring matching procedures in particular perform well for c&p plagiarism, since they commonly use lossless document models, such as suffix trees. The performance of systems using fingerprinting or bag-of-words analysis in detecting copies depends on the information loss incurred by the document model used.
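
The difference between a lossless and a lossy document model can be illustrated with a minimal Python sketch. It is not taken from any particular detection system. The function names, the use of CRC32 as the hash, the word 3-gram size, and the "keep hashes divisible by `modulus`" selection rule are all assumptions chosen for brevity, and a plain dynamic-programming search stands in for a suffix tree.

```python
import zlib


def longest_common_substring(a: str, b: str) -> str:
    """Lossless comparison: return the longest character run shared by a and b.

    Dynamic programming is used here as a simple stand-in for a suffix tree
    (an assumption for brevity); no character of either text is discarded.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def fingerprint(text: str, k: int = 3, modulus: int = 4) -> set:
    """Lossy model: hash every word k-gram, keep only hashes divisible by modulus.

    Larger values of `modulus` discard more k-grams, so more information is lost.
    The selection rule is a hypothetical choice made for this illustration.
    """
    words = text.lower().split()
    selected = set()
    for i in range(len(words) - k + 1):
        gram = " ".join(words[i:i + k])
        h = zlib.crc32(gram.encode("utf-8"))
        if h % modulus == 0:
            selected.add(h)
    return selected


def fingerprint_overlap(a: str, b: str, k: int = 3, modulus: int = 4) -> float:
    """Fraction of the smaller fingerprint that also occurs in the other one."""
    fa, fb = fingerprint(a, k, modulus), fingerprint(b, k, modulus)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / min(len(fa), len(fb))


# The lossless model recovers the exact shared passage.
shared = longest_common_substring("the quick brown fox", "a quick brown dog")
# shared == " quick brown "
```

With `modulus=1` the fingerprint keeps every k-gram. Raising `modulus` shrinks the fingerprint, which makes comparison cheaper but can drop exactly the k-grams that a copied passage shares with its source.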