plagiarismDetector

This is a command-line program that performs plagiarism detection using an N-tuple comparison algorithm that allows for synonyms in the text. The program takes three required arguments and one optional argument, in the order listed below. In any other case, such as no arguments, the program prints usage instructions.

1. File name for a list of synonyms
2. Input file 1
3. Input file 2
4. (Optional) The number N, the tuple size. If not supplied, N defaults to 3.
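The argument handling described above could be sketched as follows. This is only an illustration, not the repository's actual code: the class name `DetectorArgs`, its fields, and the choice to treat a non-numeric or non-positive N as a usage error are all assumptions.

```java
import java.util.Optional;

/** Hypothetical holder for the parsed command-line arguments. */
final class DetectorArgs {
    static final int DEFAULT_TUPLE_SIZE = 3;

    final String synonymFile;
    final String inputFile1;
    final String inputFile2;
    final int tupleSize;

    private DetectorArgs(String synonymFile, String inputFile1,
                         String inputFile2, int tupleSize) {
        this.synonymFile = synonymFile;
        this.inputFile1 = inputFile1;
        this.inputFile2 = inputFile2;
        this.tupleSize = tupleSize;
    }

    /**
     * Returns the parsed arguments, or an empty Optional when the caller
     * should print usage instructions instead.
     */
    static Optional<DetectorArgs> parse(String[] args) {
        // Three required arguments plus one optional tuple size.
        if (args.length < 3 || args.length > 4) {
            return Optional.empty();
        }
        int n = DEFAULT_TUPLE_SIZE;
        if (args.length == 4) {
            try {
                n = Integer.parseInt(args[3]);
            } catch (NumberFormatException e) {
                // Assumption: an unparsable N is treated like a usage error.
                return Optional.empty();
            }
            if (n < 1) {
                return Optional.empty();
            }
        }
        return Optional.of(new DetectorArgs(args[0], args[1], args[2], n));
    }
}
```

A caller would print the usage text whenever `parse` returns an empty `Optional`.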

Each line of the synonym file contains one group of synonyms. For example, a line containing "run sprint jog" means these three words are treated as equal.
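One way to represent the synonym groups is a map from every word to a single representative of its group, so that two words are equal exactly when their representatives match. The sketch below assumes whitespace-separated words, case-insensitive matching, and the first word of each line as the representative; the class name `SynonymIndex` is hypothetical and none of these choices are confirmed by the repository.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that maps each word to a representative of its group. */
final class SynonymIndex {
    /** Builds the word-to-representative map from the synonym file's lines. */
    static Map<String, String> build(List<String> lines) {
        Map<String, String> canonical = new HashMap<>();
        for (String line : lines) {
            // Assumption: words are whitespace-separated and case-insensitive.
            String[] words = line.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (words[0].isEmpty()) {
                continue; // skip blank lines
            }
            for (String word : words) {
                // If a word appears in several groups, the first group wins.
                canonical.putIfAbsent(word, words[0]);
            }
        }
        return canonical;
    }
}
```

With the line "run sprint jog", the words "run", "sprint", and "jog" all map to "run".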

The input files are declared plagiarized based on the number of N-tuples in file1 that appear in file2, where the tuples are compared by accounting for synonyms as described above. For example, the text "go for a run" has two 3-tuples, ["go for a", "for a run"], both of which appear in the text "go for a jog" because "run" and "jog" are synonyms.
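The tuple extraction can be illustrated with a sliding window over the words of a text, replacing each word by its synonym representative before the window is joined. This is a minimal sketch under stated assumptions: the class name `TupleBuilder` is hypothetical, the map is expected to send each synonym to one representative word, and punctuation is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that produces synonym-normalized N-tuples. */
final class TupleBuilder {
    /**
     * Returns every run of n consecutive words in the text, with each word
     * replaced by its representative from the canonical map when present.
     */
    static List<String> tuples(String text, int n, Map<String, String> canonical) {
        if (n < 1) {
            throw new IllegalArgumentException("tuple size must be at least 1");
        }
        List<String> result = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = canonical.getOrDefault(words[i], words[i]);
        }
        List<String> wordList = Arrays.asList(words);
        // Slide a window of n words across the text.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", wordList.subList(i, i + n)));
        }
        return result;
    }
}
```

Because "jog" is rewritten to "run" before the window is joined, "go for a run" and "go for a jog" yield identical tuple lists.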

The output of the program is the percentage of tuples in file1 that appear in file2. For the example above, the output would be one line saying "100%". In another example, for the texts "go for a run" and "went for a jog" with N=3, the output would be "50%" because only one 3-tuple in the first text ("for a run") appears in the second.
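The percentage itself reduces to a set-membership count once both texts have been turned into synonym-normalized tuples. The sketch below takes those tuple lists as input. The class name `MatchPercent`, rounding to the nearest whole percent, and returning "0%" for a first text with no tuples are assumptions, since the description does not cover those cases.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that formats the share of matching tuples. */
final class MatchPercent {
    /**
     * Returns the percentage of tuples from the first list that also occur
     * in the second list, formatted like "50%".
     */
    static String percent(List<String> tuples1, List<String> tuples2) {
        if (tuples1.isEmpty()) {
            return "0%"; // assumption: nothing to compare counts as no match
        }
        Set<String> seen = new HashSet<>(tuples2);
        long hits = tuples1.stream().filter(seen::contains).count();
        // Assumption: round to the nearest whole percent.
        return Math.round(100.0 * hits / tuples1.size()) + "%";
    }
}
```

For the second example above, the normalized tuples are ["go for a", "for a run"] and ["went for a", "for a run"], so one of two tuples matches and the result is "50%".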

doneria-anjali/plagiarism-detector

Folders and files

Latest commit

History

Repository files navigation

plagiarismDetector

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages