CONTEXI


Contexi lets you interact with your entire codebase as a code-review co-pilot, using an LLM running locally.

-----------------------------------------------------

Contexi uses:

  • Multi-prompt, contextually guided Retrieval-Augmented Generation (RAG)
  • Self-critique and self-correction using chain-of-thought reasoning
  • Document re-ranking

These highly optimized techniques combine to provide the most relevant, context-aware responses to questions about your code or data.
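
Conceptually, the retrieve/critique/refine loop looks roughly like the sketch below. This is an illustrative outline only; the callables are placeholders, not Contexi's actual API (the project's own entry point for this loop is perform_crag, per the Customization section):

    from typing import Callable, List

    def crag_answer(
        question: str,
        retrieve: Callable[[str], List[str]],           # fetch candidate chunks
        rerank: Callable[[str, List[str]], List[str]],  # document re-ranking
        generate: Callable[[str, List[str]], str],      # LLM answer generation
        is_good: Callable[[str, str], bool],            # chain-of-thought self-critique
        max_iterations: int = 3,
    ) -> str:
        """Illustrative self-corrective RAG loop; placeholders, not Contexi's code."""
        context = rerank(question, retrieve(question))
        answer = generate(question, context)
        for _ in range(max_iterations):
            if is_good(question, answer):
                break
            # Critique failed: retrieve again, guided by the draft answer
            # (multi-prompt, contextually guided retrieval).
            context = rerank(question, retrieve(question + "\n" + answer))
            answer = generate(question, context)
        return answer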

Key Features

✅ Analyzes and understands your entire codebase and data, not just isolated code snippets.
✅ Answers questions about potential security vulnerabilities anywhere in the code.
✅ Imports code from a Git URL for analysis.
✅ Learns from follow-up questions and keeps answering based on the chat-history context.
✅ Runs entirely on your local machine for free; no internet connection is required.

Web UI

(Screenshot: Contexi web UI)

How it works

🚀 Get started with the Wiki

Pre-requisites

  • Ollama, with a preferred model such as qwen2.5 (for more precise results); see the pull command below
  • 16 GB of RAM recommended, plus plenty of free disk space
  • Python 3.7+
  • Various Python dependencies (see requirements.txt)
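
With Ollama installed, you can fetch the preferred model ahead of time (a standard Ollama CLI command; qwen2.5 is available from the Ollama model library):

    ollama pull qwen2.5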

Supported Programming Languages/Data:

  • Tested on Java codebases (you can configure config.yml to load other code/file formats; a sketch follows below)
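
As a rough illustration, such a config.yml might look like the sketch below. The keys here are hypothetical, invented for this example; consult the config.yml shipped in the repository for the actual schema:

    # Hypothetical keys for illustration; see the repository's config.yml
    # for the real schema.
    model_name: qwen2.5     # Ollama model to use
    file_extensions:        # code/file formats to load for analysis
      - .java
      - .py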

Installation

We recommend installing the app in a Python virtual environment.
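
For example, on macOS or Linux (standard venv commands, not specific to Contexi):

    python3 -m venv venv
    source venv/bin/activate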

  1. Clone this repository:

    git clone https://github.com/AI-Security-Research-Group/Contexi.git
    cd Contexi
    
  2. Install the required Python packages:

    pip install -r requirements.txt
    
  3. Edit config.yml parameters based on your requirements.

  4. Run

    python3 main.py
    

Usage

Upon running main.py, select one of the options below:

(venv) coder@system ~/home/Contexi $

Welcome to Contexi!
Please select a mode to run:
1. Interactive session
2. UI
3. API
Enter your choice (1, 2, or 3): 

You are ready to use the magic stick. 🪄

API Mode

Send POST requests to http://localhost:8000/ask with your questions.

Example using curl:

curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question": "What is the purpose of the Login class?"}'

Response format:

{
  "answer": "The Login class is responsible for handling user authentication..."
}
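
The same request from Python, as a minimal sketch using the requests library (assumes the API mode started from main.py is listening on port 8000, as above):

    import requests

    # Ask Contexi's local API a question about the codebase.
    resp = requests.post(
        "http://localhost:8000/ask",
        json={"question": "What is the purpose of the Login class?"},
        timeout=300,  # local LLM answers can take a while
    )
    print(resp.json()["answer"])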

Open an issue if you have problems installing or running the script. (The script has been tested in a macOS environment.)

Customization

You can customize various aspects of the script:

  • Adjust chunk_size and chunk_overlap in the split_documents_into_chunks function to change how documents are split (see the sketch after this list).
  • Modify the PROMPT_TEMPLATE to alter how the LLM interprets queries and generates responses.
  • Change max_iterations in perform_crag to adjust how many times the system will attempt to refine an answer.
  • Modify num_ctx in initialize_llm to adjust the LLM context window for better results.
  • Adjust the n_ideas parameter to set the depth of accuracy and completeness you need in the answers.
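
To see how chunk_size and chunk_overlap interact, here is a deliberately simplified character-based splitter; the project's split_documents_into_chunks may differ in detail:

    from typing import List

    # Simplified illustration only; not Contexi's actual implementation.
    def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
        step = chunk_size - chunk_overlap  # how far each new chunk advances
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]

    # With the defaults, a 2,600-character file yields chunks starting at
    # 0, 800, 1600, 2400 -- each sharing 200 characters with its neighbor.

A larger overlap preserves more cross-chunk context at the cost of producing more chunks (and using more memory), which is why the Troubleshooting section suggests reducing chunk_size when memory is tight.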

Troubleshooting

  • If you encounter memory issues, try reducing chunk_size and num_ctx, or the number of documents processed at once.
  • Ensure that Ollama is running and that the correct model name is set in the config.yml file.
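
To verify that Ollama is up, its HTTP endpoint answers on the default port (11434):

    curl http://localhost:11434    # should print "Ollama is running"
    ollama list                    # shows the models available locally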

Use Cases

  • Codebase Analysis: Understand and explore large code repositories by asking natural language questions.
  • Security Auditing: Identify potential security vulnerabilities by querying specific endpoints or functions.
  • Educational Tools: Help new developers understand codebases by providing detailed answers to their questions.
  • Documentation Generation: Generate explanations or documentation for code segments, and more.

To-Do List for Contributors

  • Make the important parameters configurable via the YAML file
  • Drag-and-drop folders in the UI for analysis
  • Scan the source folder and suggest file extensions to be analyzed
  • Make config.yml configurable in the UI
  • Session-based chat to switch context on each new session
  • Persistent chat UI interface across page refreshes
  • Add only the most recent response to the history context
  • Implement the tree-of-thoughts concept
  • Create a web interface
  • Integrate the repository import feature, which automatically imports a repo locally to perform analysis

Security Workflow (To-Do)

  • Use Semgrep to identify potential vulnerabilities based on patterns.
  • Pass the identified snippets to a data flow analysis tool to determine if the input is user-controlled.
  • Provide the LLM with the code snippet, data flow information, and any relevant AST representations.
  • Ask the LLM to assess the risk based on this enriched context.
  • Use the LLM's output to prioritize vulnerabilities, focusing on those where user input reaches dangerous functions.
  • Optionally, perform dynamic analysis or manual code review on high-risk findings to confirm exploitability.
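
As a rough illustration of how the first steps of this planned workflow might be glued together, here is a hypothetical sketch: it shells out to Semgrep's CLI and sends each finding to a local Ollama model over Ollama's HTTP API. None of this exists in Contexi yet; the paths, prompt, and triage logic are placeholders:

    import json
    import subprocess

    import requests

    # Hypothetical glue for the planned workflow; not part of Contexi yet.
    # 1. Run Semgrep and collect findings as JSON.
    semgrep = subprocess.run(
        ["semgrep", "--config", "auto", "--json", "src/"],
        capture_output=True, text=True, check=True,
    )
    findings = json.loads(semgrep.stdout)["results"]

    # 2. Ask a local Ollama model to triage each finding.
    for f in findings:
        prompt = (
            "Assess whether this static-analysis finding is likely exploitable, "
            "assuming the input may be user-controlled.\n"
            f"Rule: {f['check_id']}\nFile: {f['path']}\n"
            f"Code:\n{f['extra']['lines']}"
        )
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "qwen2.5", "prompt": prompt, "stream": False},
        )
        print(f["check_id"], "->", resp.json()["response"].strip()[:200])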

Contributing

Contributions to Contexi are welcome! Please submit pull requests or open issues on the GitHub repository.

Acknowledgments
