
detect-ai-text-easily's Introduction


Detect AI Text by Just Looking at It

Abstract of a Research Paper written using ChatGPT

ChatGPT often produces words that send you to a dictionary, or words that simply sound magical. This isn't limited to ChatGPT; open-source language models such as Mistral do the same. There's no harm in getting help from AI to create content, as long as it's done ethically.

In a science-writing competition for 14–16 year-olds, however, a judge grew suspicious when he saw the phrase “Labyrinthian mazes” in an essay, because it seemed too advanced for a teenage writer. So he checked the essay with AI-detection tools. Unfortunately, all four tools gave the same verdict: almost the entire essay, around 90–96%, appeared to be written by AI rather than a human. Not all of us are professionals, though. Had we seen that phrase, many of us would probably have read right past it.

There is a need for critical thinking skills to identify whether AI is the author

The easiest way to spot AI-generated text is to look for words that you rarely use but that are common for ChatGPT. Consider the NOW (News on the Web) corpus, a massive collection of over 19 billion English words from blogs, articles, news, and more, updated daily from 2010 to the present. I searched it for the word **“delve”** using a string search algorithm and found 52,388 occurrences. When I plotted its yearly frequency, I noticed an unusual pattern: roughly 200% growth in its appearance in online text starting in 2022, the year ChatGPT was released (on November 30th).

Trend of “delve” occurrences in the NOW Corpus (by Fareed Khan)

Other words, such as **“intricacies”** and “unwavering”, show a similar increase to “delve”: they are being used more often lately.

Trend of “intricacies” and “unwavering” in the NOW Corpus (by Fareed Khan)
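To make the counting step concrete, here is a minimal Python sketch of how yearly occurrence counts and growth could be computed. It assumes the corpus is available as hypothetical (year, text) pairs; the toy documents are made up, not NOW Corpus data, and this is not necessarily how the plots above were produced.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs; real NOW Corpus access is not shown.
docs = [
    (2021, "Researchers explore the data."),
    (2021, "We delve into the results."),
    (2022, "Let us delve deeper. We delve again."),
    (2022, "The team will delve into costs."),
    (2023, "Delve into this. They delved further."),
]


def yearly_counts(docs, word):
    """Count whole-word, case-insensitive occurrences of `word` per year."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    counts = Counter()
    for year, text in docs:
        # Adding 0 still records the year, so years without hits show up as 0.
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


def growth_percent(old, new):
    """Percentage change from `old` to `new`, e.g. 1 -> 3 is +200%."""
    if old == 0:
        raise ValueError("growth is undefined when the baseline count is 0")
    return (new - old) / old * 100


counts = yearly_counts(docs, "delve")
print(counts)  # {2021: 1, 2022: 3, 2023: 1}
print(growth_percent(counts[2021], counts[2022]))  # 200.0
```

The same function works for “intricacies” or “unwavering”. Because the regex matches whole words only, “delved” is not counted, whereas a plain substring search would count it. Raw counts also rise as the corpus itself grows, so dividing by the number of words collected each year gives a fairer trend.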

Words like these are not used exclusively by AI; humans also draw on a wide range of vocabulary. That said, in academic writing we usually reach for phrases like “explore” or “discuss in more detail” rather than “delve”. Yet when I ask ChatGPT to rephrase “discuss in more detail …”, its first five suggestions typically include “delve”.

Rephrasing using ChatGPT

I also analyzed the arXiv database, a well-known platform for publishing papers that held more than 2 million papers up to 2023. I searched the paper abstracts for **“delve”** and plotted its yearly frequency. I was amazed to see how widely the word was used in abstracts in 2023: the same word that appeared among ChatGPT's top five suggestions.

Trend of “delve” occurrences in the arXiv Database (by Fareed Khan)
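Since arXiv receives more submissions every year, a per-year share of abstracts containing the word is easier to compare than raw counts. The sketch below is a minimal illustration of that idea under stated assumptions: the records and their `year`/`abstract` fields are hypothetical stand-ins, not the real arXiv metadata format, and the pattern also catches inflected forms such as “delves” and “delving”.

```python
import re
from collections import defaultdict

# Whole-word match for delve, delves, delved and delving, in any case.
DELVE = re.compile(r"\bdelv(?:e|es|ed|ing)\b", re.IGNORECASE)

# Hypothetical records; parsing a real arXiv metadata dump is not shown.
abstracts = [
    {"year": 2022, "abstract": "We study graph neural networks."},
    {"year": 2022, "abstract": "We delve into sparse attention."},
    {"year": 2022, "abstract": "A survey of retrieval methods."},
    {"year": 2022, "abstract": "We propose a new optimizer."},
    {"year": 2023, "abstract": "This paper delves into alignment."},
    {"year": 2023, "abstract": "Delving deeper, we find gaps."},
    {"year": 2023, "abstract": "We delve into benchmark design."},
    {"year": 2023, "abstract": "We report negative results."},
]


def share_with_word(records, pattern):
    """Fraction of abstracts in each year that contain at least one match."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        total[rec["year"]] += 1
        if pattern.search(rec["abstract"]):
            hits[rec["year"]] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


print(share_with_word(abstracts, DELVE))  # {2022: 0.25, 2023: 0.75}
```

Counting each abstract at most once keeps a single abstract that repeats the word from inflating the trend.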

The 2023 jump suggests that some academic writers may be using ChatGPT, either to rephrase or to generate content. Finding “delve” in a student's submission or an online blog post is therefore a hint, and a reason for doubt, that the paragraph or passage containing it has been rephrased or enhanced with ChatGPT.

Drawing on my research expertise and two years of experience working with LLMs, I've put together a fairly comprehensive list of 100 words to watch for in a piece of text, to help you figure out whether it has been generated or paraphrased using AI.

Checking a text for that many words by hand is tedious, so I built a Streamlit web app that does it quickly. Just upload your file or paste your text, and it'll do the rest. Easy peasy!
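The core check can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the four-word `WATCH_WORDS` tuple is a hypothetical stand-in for the full 100-word list, and `find_watch_words` is a made-up helper that reports matches per paragraph, so you can see which portion of the text raised the flag.

```python
import re

# Hypothetical watch list: a few words mentioned in this post, not the full 100-word list.
WATCH_WORDS = ("delve", "intricacies", "unwavering", "labyrinthian")


def find_watch_words(text, words=WATCH_WORDS):
    """Return {paragraph_index: sorted matched words} for paragraphs with hits."""
    patterns = {w: re.compile(rf"\b{re.escape(w)}\b", re.IGNORECASE) for w in words}
    # Paragraphs are separated by one or more blank lines; empty ones are dropped.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    report = {}
    for i, para in enumerate(paragraphs):
        hits = sorted(w for w, pat in patterns.items() if pat.search(para))
        if hits:
            report[i] = hits
    return report


sample = (
    "Let us delve into the intricacies of this topic.\n\n"
    "The results were clear.\n\n"
    "Their unwavering support helped."
)
print(find_watch_words(sample))  # {0: ['delve', 'intricacies'], 2: ['unwavering']}
```

A hit only marks a paragraph worth a second look; as noted above, people use these words too.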

Hope you enjoy the read!


Contributors

fareedkhan-dev
