This project is a web application built using ReactJS for the frontend and Python for the backend. It aims to detect whether a given text or prompt is authored by a human or generated by an AI model. The application participates in the Voight-Kampff Generative AI Authorship Verification 2024 challenge.
- AI vs Human Detection: Detects whether a text is written by a human or generated by an AI.
- Machine Learning Models: Uses the pre-trained GPT-2 language model to score text and a threshold-based rule to classify it.
- ReactJS Frontend: User-friendly interface built with ReactJS.
- Python Backend: Backend processing with Python to handle AI detection logic.
Flask: A lightweight Python web framework, used here to route HTTP requests and generate responses.
Flask-CORS: A Flask extension that handles Cross-Origin Resource Sharing (CORS), allowing the server to respond to requests from other origins, such as the React frontend running on a different port.
PyTorch: A deep learning library used to load and run models. Here it runs the GPT-2 model and performs computations such as calculating Perplexity.
Transformers: The GPT2LMHeadModel and GPT2TokenizerFast classes from the Hugging Face transformers library load the GPT-2 language model and its tokenizer. GPT-2 is a pre-trained language model that can generate and analyze text.
Regex: Used for text processing tasks such as filtering out invalid characters and splitting the input text into lines.
OrderedDict: Keeps results in insertion order, making it easier to track and display the analysis steps.
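The regex and OrderedDict roles above can be illustrated with a minimal preprocessing sketch. The character whitelist, the line-splitting rule, and the function name `preprocess` are assumptions for illustration, not the project's actual code.

```python
import re
from collections import OrderedDict

# Hypothetical whitelist: letters, digits, whitespace and common punctuation.
# Anything else (symbols, emoji, accented letters) is dropped.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9\s.,!?;:'\"()-]")

# Hypothetical rule: split after sentence-ending punctuation followed by
# whitespace, or on one or more newlines.
LINE_SPLIT = re.compile(r"(?<=[.!?])\s+|\n+")


def preprocess(text: str) -> OrderedDict:
    """Clean the text, split it into lines, and record each step in order."""
    steps = OrderedDict()
    cleaned = INVALID_CHARS.sub("", text)
    steps["cleaned_text"] = cleaned
    lines = [line.strip() for line in LINE_SPLIT.split(cleaned) if line.strip()]
    steps["lines"] = lines
    steps["line_count"] = len(lines)
    return steps


result = preprocess("Hi @there! How are you?\nFine.")
for step, value in result.items():
    print(step, "->", value)
```

Because the steps are stored in an OrderedDict, iterating over the result replays the analysis in the order it was performed, which is convenient when the frontend displays intermediate values.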
The core functionality uses the GPT-2 model to calculate the Perplexity and Burstiness of a given text. Here is how it works:
Perplexity: A measure of how well a model predicts a given text. Lower perplexity means the text is more predictable to the model and therefore more likely to be AI-generated, while higher perplexity suggests human authorship. The algorithm uses GPT-2 to compute the negative log likelihood (NLL) of each token in the input text; Perplexity is the exponential of the mean NLL.
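A minimal sketch of the Perplexity calculation described above. `perplexity` is pure Python and takes per-token NLL values in nats; `gpt2_token_nlls` shows one way to obtain those values with the transformers GPT-2 classes. Both function names are hypothetical, and the project's `GPT2PPL` class may compute the scores differently.

```python
import math
from typing import List, Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log likelihood per token)."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def gpt2_token_nlls(text: str, model_name: str = "gpt2") -> List[float]:
    """Per-token NLLs from GPT-2 (needs torch + transformers; downloads weights)."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    # GPT-2 accepts at most 1024 tokens, so longer input is truncated here.
    input_ids = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=1024
    ).input_ids
    with torch.no_grad():
        logits = model(input_ids).logits

    # The logits at position t predict token t + 1, so shift before scoring.
    shift_logits = logits[0, :-1, :]
    shift_labels = input_ids[0, 1:]
    nlls = torch.nn.functional.cross_entropy(
        shift_logits, shift_labels, reduction="none"
    )
    return nlls.tolist()


# Four tokens, each predicted with probability 0.5, give a perplexity of 2
# (up to floating-point rounding).
print(perplexity([-math.log(0.5)] * 4))
```

In a long-running server, the model and tokenizer would normally be loaded once at startup rather than on every call, as this sketch does for brevity.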
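Burstiness: Measures how much Perplexity varies across the lines of the input text. Human writing tends to mix predictable and unpredictable lines, so high burstiness is usually read as a sign of human authorship, while uniformly low Perplexity across lines is more typical of AI-generated text.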
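The exact Burstiness formula is not given here, so the sketch below assumes one common choice: the population standard deviation of the per-line Perplexity scores. The function names are hypothetical, and the per-line NLL lists would come from scoring each line with GPT-2.

```python
import math
import statistics
from typing import List, Sequence


def line_perplexities(lines_nlls: Sequence[Sequence[float]]) -> List[float]:
    """Perplexity of each non-empty line, given that line's per-token NLLs."""
    return [math.exp(sum(nlls) / len(nlls)) for nlls in lines_nlls if nlls]


def burstiness(lines_nlls: Sequence[Sequence[float]]) -> float:
    """Assumed definition: population standard deviation of per-line Perplexity."""
    scores = line_perplexities(lines_nlls)
    if not scores:
        raise ValueError("need at least one non-empty line")
    return statistics.pstdev(scores)


# Two lines scoring Perplexity 2 and 4 give a burstiness of about 1.0;
# lines with identical scores give 0.0.
print(burstiness([[math.log(2)] * 3, [math.log(4)] * 3]))
```

A standard deviation is used because it grows with the spread of the line scores regardless of their average level; other choices, such as the maximum line Perplexity, would also be reasonable.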
Thresholding: The code applies a threshold-based decision rule to the Perplexity score to categorize the text:
- Below 60: the text is likely AI-generated.
- Between 60 and 80: the text is classified as "most likely AI", but more text is needed for a reliable judgment.
- Above 80: the text is likely human-written.
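The threshold rule can be written as a small function. The 60 and 80 cut-offs come from the description above; the function name, the exact label strings, and which band the boundary values 60 and 80 fall into are assumptions.

```python
def label_from_perplexity(
    ppl: float, ai_below: float = 60.0, human_from: float = 80.0
) -> str:
    """Map a Perplexity score to a label using the 60/80 thresholds.

    Assumed boundary handling: 60 falls in the middle band, 80 counts as human.
    """
    if ppl < ai_below:
        return "AI-generated"
    if ppl < human_from:
        return "Most likely AI-generated (more text needed)"
    return "Human-written"


for score in (35.0, 72.5, 140.0):
    print(score, "->", label_from_perplexity(score))
```

Keeping the thresholds as parameters makes it easy to re-tune them against a labelled dataset without touching the decision logic.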
The code exposes a POST route ('/') where a user can submit a JSON payload with a text field. The system processes the text using the GPT2PPL class, calculates Perplexity and Burstiness, and then returns a response with a label ("AI-generated" or "Human-written") and the computed values.
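A client-side sketch of calling the POST route with Python's standard library. It assumes the backend listens on Flask's default development address, http://127.0.0.1:5000 (app.py may configure a different host or port), and that the payload field is named `text` as described above. The response keys are not specified here, so `detect` simply returns the decoded JSON; both function names are hypothetical.

```python
import json
import urllib.request

# Assumed backend address: Flask's default development server port.
API_URL = "http://127.0.0.1:5000/"


def build_detection_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the text to analyse in a `text` field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(text: str, url: str = API_URL, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_detection_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_detection_request("The quick brown fox jumps over the lazy dog.")
print(req.get_method(), req.full_url, req.data)
```

With `python app.py` running, calling `detect("some text")` should return the label and computed scores; the exact JSON keys depend on the backend.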
The code can be used wherever you need to determine whether a text was written by a human or generated by an AI model, for example:
- Content authenticity checks: identifying whether an article, blog post, or essay was written by a human or AI.
- AI detection in education: detecting whether students have submitted AI-generated text as their own work.
- Content moderation: flagging AI-generated content on social media or online forums.
Modules used: Flask, Flask-CORS, PyTorch, transformers, regex, and OrderedDict.
Algorithms used: Perplexity (text predictability), Burstiness (variation in sentence predictability), and thresholding for labeling the text as AI or human-written.
Use case: AI vs. human text detection, content authenticity verification.
To set up the project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/AkashKobal/Generative-AI-Detection.git
  cd Generative-AI-Detection
- Install the frontend dependencies:
  cd frontend
  npm install
- Install the backend dependencies:
  cd ../server
  pip install -r requirements.txt
- Start the backend server from the server directory:
  python app.py
- In a separate terminal, start the frontend server from the frontend directory:
  npm start
- Open your browser and go to http://localhost:3000.
- Upload a pair of texts (one human-written and one AI-generated).
- Click on the "Analyse" button to see the results.
- The dataset used in this project consists of a collection of human-written and AI-generated texts.
- Texts are analyzed to calculate metrics like Perplexity and Burstiness to determine their likely origin (AI or human).
- Perplexity: Measures how well the AI model predicts the next token in a text. Lower perplexity suggests AI generation, while higher perplexity suggests human authorship.
- Burstiness: Measures the variation in perplexity across different lines of text. Higher burstiness is usually associated with human writing, while more uniform perplexity across lines is typical of AI-generated text.
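For reference, the standard definition of Perplexity over a token sequence w_1, ..., w_N is given below, together with one possible Burstiness measure over L lines with per-line Perplexities PPL_1, ..., PPL_L. The Burstiness formula (a population standard deviation) is an assumption for illustration; the project may define it differently.

```latex
\mathrm{PPL}(w_1,\dots,w_N) = \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_{<i})\Big)

\mathrm{Burstiness} = \sqrt{\frac{1}{L}\sum_{j=1}^{L}\big(\mathrm{PPL}_j - \overline{\mathrm{PPL}}\big)^2},
\qquad \overline{\mathrm{PPL}} = \frac{1}{L}\sum_{j=1}^{L}\mathrm{PPL}_j
```

For example, if every token is predicted with probability 0.5, the mean NLL is ln 2 and the Perplexity is exp(ln 2) = 2; a perfectly predictable text (probability 1 for every token) has Perplexity 1.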
Contributions are welcome! If you have suggestions for improvements or new features, please fork the repository and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.