Plagarism-Detector

The project is implemented in Java. The plagiarism detector helps determine whether the contents of two files match beyond a certain extent.

Steps:

  1. Classifying the document:
  • The given documents are classified into two types, i.e. code files (containing code) or normal text files, based on their content.
  • If common coding keywords are detected in a file, it is classified as a code file; otherwise it is treated as a normal text file.
  2. Preprocessing of data:
  • Preprocessing is performed according to the file type assigned during classification.
  • If the given file is a code file, brackets and other mathematical keywords are removed.
  • If the given file is a normal text file, the whole string is converted to lower case, and numbers, brackets and website references are removed to make the comparison easier and shorter.
  • Then the common stop words of the English language are removed from both files to reduce the repetition of less significant words.
  3. Frequency Map:
  • I have implemented a frequency map for each document to store the frequency of common words.
  • The algorithm creates two separate hash maps, stores the word frequencies in them, and then compares the two maps to determine the frequency of the words they have in common.
  • The frequency map is used to calculate the similarity between the two given documents.
  4. Longest Common Subsequence:
  • After testing common string comparison algorithms such as Edit Distance, Rabin-Karp, Knuth-Morris-Pratt and Cosine distance, I was able to generate the most favourable results using the LCS algorithm.
  • The LCS algorithm compares the strings character by character and uses a matrix to record the longest common subsequence present in both of the given files.
  • With a combination of LCS and the frequency map, I was able to get satisfactory results on the given data set for a threshold value of 55.
  • If the score is greater than 55, the program returns 1 (plagiarism found); otherwise it returns 0 (plagiarism not found).
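
Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class name `FrequencyMapSketch`, the tiny keyword and stop-word lists, the cutoff of three keyword hits, and the similarity formula (shared word occurrences as a percentage of the larger document) are all assumptions made for the example. It also strips punctuation, which goes slightly beyond the numbers, brackets and website references mentioned above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class FrequencyMapSketch {
    // Hypothetical, deliberately tiny lists; the project's real lists are not shown.
    static final Set<String> CODE_KEYWORDS =
            Set.of("public", "static", "void", "int", "return", "class");
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "is", "are", "of", "to", "and", "in");

    // Step 1: call it a code file if at least 3 keyword hits occur (assumed cutoff).
    static boolean isCodeFile(String content) {
        long hits = Arrays.stream(content.split("\\W+"))
                .filter(CODE_KEYWORDS::contains)
                .count();
        return hits >= 3;
    }

    // Step 2 (text files): lower-case, drop URLs, then drop every non-letter
    // (digits, brackets and, as an added assumption, punctuation), then stop words.
    static List<String> preprocessText(String content) {
        String cleaned = content.toLowerCase()
                .replaceAll("https?://\\S+|www\\.\\S+", " ")
                .replaceAll("[^a-z\\s]", " ");
        return Arrays.stream(cleaned.trim().split("\\s+"))
                .filter(w -> !w.isEmpty() && !STOP_WORDS.contains(w))
                .collect(Collectors.toList());
    }

    // Step 3: one hash map per document, mapping word -> number of occurrences.
    static Map<String, Integer> frequencyMap(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // Assumed formula: shared occurrences as a percentage of the larger document.
    static double frequencySimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        int shared = 0, totalA = 0, totalB = 0;
        for (int c : a.values()) totalA += c;
        for (int c : b.values()) totalB += c;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            shared += Math.min(e.getValue(), b.getOrDefault(e.getKey(), 0));
        }
        int denom = Math.max(totalA, totalB);
        return denom == 0 ? 0.0 : 100.0 * shared / denom;
    }

    public static void main(String[] args) {
        List<String> first = preprocessText("The quick brown fox jumps over the lazy dog.");
        List<String> second = preprocessText("A quick brown dog jumps over the fence (2020).");
        System.out.println(frequencySimilarity(frequencyMap(first), frequencyMap(second)));
    }
}
```

Dividing by the larger document keeps a short file that is fully contained in a long one from scoring 100; whether the project normalises this way is not stated.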

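Step 4 can be illustrated with the textbook dynamic-programming form of LCS. The sketch below is a hedged example rather than the project's implementation: the class name `LcsSketch`, the normalisation by the longer string, and the way the LCS and frequency scores are combined (a plain average) are assumptions, since the exact combination is not described above. Only the threshold of 55 and the 1/0 return values come from the description.

```java
final class LcsSketch {
    // Threshold taken from the description above.
    static final double THRESHOLD = 55.0;

    // Classic LCS: dp[i][j] holds the LCS length of the first i characters of a
    // and the first j characters of b, filled in character by character.
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Assumed normalisation: LCS length as a percentage of the longer string.
    static double lcsSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 0.0 : 100.0 * lcsLength(a, b) / longer;
    }

    // Hypothetical combination: average the two scores, then apply the threshold.
    // Returns 1 (plagiarism found) only when the score is strictly greater than 55.
    static int detect(double lcsScore, double frequencyScore) {
        double score = (lcsScore + frequencyScore) / 2.0;
        return score > THRESHOLD ? 1 : 0;
    }

    public static void main(String[] args) {
        double lcs = lcsSimilarity("quick brown fox jumps", "quick brown dog jumps");
        // The frequency score would come from the frequency-map step; 60.0 is a placeholder.
        System.out.println(detect(lcs, 60.0));
    }
}
```

The full matrix needs memory proportional to the product of the two input lengths, so for large files a two-row variant of the same recurrence may be preferable.
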
Contributors

nipunh
