Plagiarism encompasses the use of ideas, concepts, words, or structures without appropriately acknowledging the source to benefit in a setting where originality is expected. Academic plagiarism is a serious problem that harms society and the scientific process, because it distorts the mechanisms for tracing and correcting results. In the worst case, academic plagiarism can jeopardize lives, e.g., if medical or pharmaceutical studies are plagiarized and wrong findings affect later research or practical applications.
Plagiarism detection is an information retrieval task supported by specialized information retrieval systems, called plagiarism detection systems. Today’s available plagiarism detection systems exclusively perform literal text string comparisons. These systems reliably identify verbatim copies, but often fail to detect disguised plagiarism, such as paraphrases, translations, or idea plagiarism. As a result, a large fraction of disguised plagiarism goes undetected.
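To illustrate why literal string comparison misses disguised plagiarism, the following minimal sketch compares two texts by their shared word 3-grams and computes a Jaccard similarity. It is an illustrative assumption, not the method of any particular detection system; the function names (word_ngrams, jaccard, literal_overlap) and the example sentences are hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (shingles) found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def literal_overlap(source, suspicious, n=3):
    """Share of word n-grams that two texts have in common (0.0 to 1.0)."""
    return jaccard(word_ngrams(source, n), word_ngrams(suspicious, n))


# Hypothetical example sentences, chosen only for illustration.
source = "Plagiarism detection systems compare text strings to find copied passages."
verbatim = "Plagiarism detection systems compare text strings to find copied passages."
paraphrase = ("Tools that uncover plagiarized work match character sequences "
              "in order to locate duplicated sections.")

print(literal_overlap(source, verbatim))    # identical wording
print(literal_overlap(source, paraphrase))  # same idea, different wording
```

In this sketch the verbatim copy shares every 3-gram with the source, while the paraphrase shares none, although both express the same idea. Approaches that target semantic similarity aim to close this gap.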
The challenge of identifying semantic similarity of documents to detect disguised forms of plagiarism has attracted intense research that spans several fields, including information retrieval, computational linguistics, big data management and analysis, and information visualization.
- N. Meuschke and B. Gipp. State of the Art in Detecting Academic Plagiarism. International Journal for Educational Integrity, 9 (1): 50–71, June 2013. (pdf: http://www.ojs.unisa.edu.au/index.php/IJEI/article/view/847)
- S. M. Alzahrani, N. Salim, and A. Abraham. Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, volume 42, pages 133–149, Mar. 2012. doi:10.1109/TSMCC.2011.2134847. (pdf: http://isda01.softcomputing.net/smcc2011.pdf)
- C. D. Manning, P. Raghavan, and H. Schütze. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, England, 2009. (free online edition: http://www-nlp.stanford.edu/IR-book/)
Topic-specific literature will be distributed during the initial meeting.
The seminar can serve as a starting point for a topically related bachelor’s or master’s project and thesis. For current project and thesis proposals, visit: http://www.isg.uni-konstanz.de/students-corner/
1. presentation (30 min)
2. term paper (8-10 pages per person, ACM style)
Group work is possible.
Seminar participants will explore current research approaches that tackle the challenges of identifying semantic similarity and detecting disguised plagiarism.
Participants will pick a research topic from a pool of suggestions or propose a topic according to their own interests. For their topic, the participants will give an overview of the relevant research in a presentation during the seminar (30 min) and in a term paper (8-10 pages per person, ACM style) due at the end of the seminar. Topic-specific literature suggestions will be provided during the initial meeting. Participants are expected to conduct additional literature research and to work independently.
Seminar participants will gain an overview of the state-of-the-art technologies for plagiarism detection and their individual strengths and weaknesses. They will be able to describe the current research trends and challenges in plagiarism detection, as well as the predominant approaches for tackling these research challenges.
Each participant will perform an in-depth literature review of one current approach for identifying semantic similarity between documents and of how this approach can be applied to detecting plagiarism. The participants will present their findings in an academic paper and a 30-minute presentation during one of the seminar sessions. Through this process, which the lecturers supervise and guide, the participants will train their ability to:
- find, organize, and systematically read relevant research papers
- analyze, compare, and contrast research approaches and findings
- structure, write, and format an academic paper
- present their work using appropriate presentation techniques and presentation aids
- answer questions and discuss their work with peers
By successfully completing the seminar, participants will acquire the knowledge and methodological skills required to complete a bachelor’s or master’s project related to identifying semantic document similarity (e.g., for plagiarism detection). These skills also apply to many other information retrieval tasks.
Total workload: 4 ECTS = 120 hours