Giter Site home page Giter Site logo

martinwang2014 / ac2 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from manuel-freire/ac2

0.0 0.0 0.0 854 KB

Source code plagiarism detection tool

License: GNU General Public License v3.0

C 1.04% Java 86.65% Pascal 0.05% ANTLR 12.16% HTML 0.08% Dockerfile 0.02%

ac2's Introduction

AC

Read the wiki for a step-by-step example of an analysis, and documentation on how to perform the most common tasks in AC2.

Introduction

AC is a source code plagiarism detection tool. It aids instructors and graders to detect plagiarism within a group of assignments written in languages such as C, C++, Java, PHP, XML, Python, ECMAScript, Pascal or VHDL (plaintext works too, but is less precise). AC incorporates multiple similarity detection algorithms found in the scientific literature, and allows their results to be visualized graphically.

Manuel Freire, the main author, is currently teaching Computer Science at the Universidad Complutense de Madrid. AC is still being used in other Spanish universities (UAM, URJC) and abroad (UTorino), although this list is by no means either complete or exhaustive. By moving to Github, the project expects to gain contributors and ensure future development.

There are other plagiarism detection tools, such as JPlag, Moss, or Plaggie. AC is, in this context

  • local, and does not require sending data to remote servers (not the case of Moss)
  • robust, using Normalized Compression Distance as its main measure-of-similarity, but with the possibility of integrating other, additional measures to gain better pictures of what is going on (JPlag, Moss and Plaggie have hard-coded analyses, mostly based on sub-string matching after tokenization).
  • heavy on information visualization. AC will not provide "percentage of copy"; instead, it will create graphical representations of the degree of similarity between student submissions within a group, so instructors can build their own explanations regarding what really happened (see here and here for papers on AC's visualizations).

Installing and running the program

You will need a Java Runtime Environment installed (JRE 9 or above for versions 2.2+, JRE 8 for prior versions). You can download the latest Java JRE from Oracle's website, although OpenJDK will work just as well.

Once you have Java installed, simply download the latest release (look for the latest one that has a .jar file available), and either double-click it (assuming the JRE is correctly installed), or use the command-line to execute it via java -jar name-of-jar

Building the source

If you want to build the binaries yourself or change the code, you will also need to download Apache Maven.

To build everything,

  • run mvn install.
  • the executable jar-file will be stored in ac-ui/target/ac-version-githash.jar (note that version is set in pom.xml, and githash is calculated by checking the commit in git). There are other, non-default entry-points (for instance, without UI or skipping the assignment selection phase).

Pull requests and issues are very welcome.

Docker

You can also run this with docker. You will need an X-server to use as the display.

docker build -t ac2 . docker run -it --rm -e DISPLAY=<x-server-host>:0.0 -v <local_path>:<container_path> ac2

Adding support for programming languages

AC2 uses Antlr4 grammars to generate lexers (= tokenizers) and parsers for languages. This makes adding support for new languages a breeze: you only need to plug in a good Antlr4 grammar.

Currently, AC supports

To add support for tokenizing more programming languages, place the grammar file (.g4) into the grammars, and update the AntlrTokenizerFactory so that files with your chosen extensions are parsed using the corresponding parsers and lexers.

License and code-structure

The code is split into 4 modules:

  • ac-lexers: contains source-code lexers and parsers. The lexers and parsers are generated by Antlr4 from their .g4 grammars.
  • ac-core: contains the main similarity-detection engine. Depends on the ac-lexers to compare token-streams instead of raw text. Use of token-streams greatly reduces comparison noise due to extraneous comments, or differences in whitespace or identifier names.
  • clover is used as a graph-layout library. Uses JGraphT and JGraph for graph representation and rendering. Note that JGraph is now known as GraphMX; clover relies on an old version.
  • ac-ui provides the user interface, and relies on all other AC modules. It is entirely possible to build a command-line tool to run comparisons without any interface.

All modules of AC are licensed under the GPLv3.

The direct dependencies of each module have their own licences - check the output of mvn dependency:tree (after running mvn install on the project) for a full list of transitive dependencies.

History

AC was born in the Escuela Politécnica Superior of the Universidad Autónoma de Madrid to deter and detect source-code plagiarism in programming assignments.

Notable versions:

  • 1.0: a collection of csh scripts, written by an unnamed teacher at EPS/UAM
  • 1.1 (March 2006): Manuel Freire rewrites it in bash, using a single script, and using graphviz for graph visualization
  • 1.2: Manuel Cebrián and Juan del Rosal join the team, and the code is rewritten in Java, with clover used for graph visualization instead of graphviz. Widespread use in UAM. Old website (http://tangow.ii.uam.es/ac).
  • 1.8 (Oct 2010): Successfully used to uncover gene mutation paths in that year's VAST analytics contest (although a custom test was added).
  • 2.0 (Oct 2016): Switched to Maven, Antlr4, moved to github

ac2's People

Contributors

manuel-freire avatar dependabot[bot] avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.