
CONTEXI


Contexi lets you interact with your entire codebase as a code-review co-pilot, using an LLM locally.

-----------------------------------------------------

Contexi combines several highly optimized techniques to provide the most relevant, context-aware responses to questions about your code and data:

  • Multi-prompt, contextually guided Retrieval-Augmented Generation
  • Self-critique and self-correction using chain-of-thought reasoning
  • Document re-ranking
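
To make that loop concrete, here is a minimal sketch of a self-corrective retrieval cycle. Every name in it (the retriever, rerank, and critique fields) is a hypothetical illustration of the concept, not Contexi's actual internals:

```python
def answer_with_self_correction(question, retriever, llm, max_iterations=3):
    """Sketch of a self-corrective RAG loop: retrieve, re-rank, draft an
    answer, self-critique it, and retry with a refined query if needed."""
    query = question
    answer = None
    for _ in range(max_iterations):
        docs = retriever.retrieve(query)           # contextually guided retrieval
        docs = rerank(docs, question)              # document re-ranking step
        answer = llm.generate(question, docs)      # draft an answer
        critique = llm.critique(question, answer)  # chain-of-thought self-critique
        if critique.is_satisfactory:               # good enough: stop iterating
            break
        query = critique.refined_query             # otherwise sharpen the query
    return answer
```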

Key Features

✅ Analyzes and understands your entire codebase and data, not just isolated code snippets.
✅ Answers questions about potential security vulnerabilities anywhere in the code.
✅ Imports code from a git URL for analysis.
✅ Learns from follow-up questions and keeps answering based on the chat-history context.
✅ Runs entirely on your local machine for free; no internet connection is required.

Web UI

(Screenshot of the Contexi web UI.)

How it works

🚀 Get started with the Wiki

Pre-requisites

  • Ollama - preferred model: qwen2.5 (for more precise results)
  • 16 GB of RAM recommended, plus plenty of free disk space
  • Python 3.7+
  • Various Python dependencies (see requirements.txt)

Supported Programming Languages/Data:

  • Tested on Java codebases (you can configure config.yml to load other code/file formats)

Installation

We recommend installing the app in a Python virtual environment.

  1. Clone this repository:

    git clone https://github.com/AI-Security-Research-Group/Contexi.git
    cd Contexi
    
  2. Install the required Python packages:

    pip install -r requirements.txt
    
  3. Edit the parameters in config.yml to match your requirements.

  4. Run

    python3 main.py
    

Usage

Upon running main.py, select one of the options below:

(venv) coder@system ~/home/Contexi $

Welcome to Contexi!
Please select a mode to run:
1. Interactive session
2. UI
3. API
Enter your choice (1, 2, or 3): 

You are ready to use the magic wand. 🪄

API Mode

Send POST requests to http://localhost:8000/ask with your questions.

Example using curl:

curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question": "What is the purpose of the Login class?"}'

Response format:

{
  "answer": "The Login class is responsible for handling user authentication..."
}
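
The same request can be sent from Python. This minimal sketch assumes only the requests library plus the /ask endpoint and JSON shape shown above:

```python
import requests

# Ask Contexi's API mode a question and print the "answer" field
# from the JSON response shown above.
resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What is the purpose of the Login class?"},
    timeout=300,  # local LLM responses can take a while
)
resp.raise_for_status()
print(resp.json()["answer"])
```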

Open an issue if you have problems installing or running this script. (The script has been tested on macOS.)

Customization

You can customize various aspects of the script:

  • Adjust chunk_size and chunk_overlap in the split_documents_into_chunks function to change how documents are split.
  • Modify the PROMPT_TEMPLATE to alter how the LLM interprets queries and generates responses.
  • Change max_iterations in perform_crag to adjust how many times the system attempts to refine an answer.
  • Modify num_ctx in initialize_llm to adjust the LLM context window for better results.
  • Adjust the n_ideas parameter to define the depth of accuracy and completeness you need in the answers.
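
For intuition, the chunk_size/chunk_overlap trade-off from the first bullet works roughly like this. The function below is a standalone sketch, not Contexi's actual split_documents_into_chunks:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into chunks of chunk_size characters, with consecutive
    chunks sharing chunk_overlap characters so context that spans a chunk
    boundary is not lost. Larger chunks give more context per retrieval
    hit; more overlap costs extra memory and processing time."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```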

Troubleshooting

  • If you encounter memory issues, try reducing chunk_size and num_ctx, or the number of documents processed at once.
  • Ensure that Ollama is running and that the correct model name is set in the config.yml file.
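
If you're unsure whether Ollama is up, its HTTP API (served on port 11434 by default) exposes a model listing you can probe. This is standard Ollama behavior, independent of Contexi:

```python
import requests

# GET /api/tags lists the models the local Ollama server has pulled;
# a connection error here means Ollama is not running.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print("Available models:", [m["name"] for m in resp.json()["models"]])
```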

Use Cases

  • Codebase Analysis: Understand and explore large code repositories by asking natural language questions.
  • Security Auditing: Identify potential security vulnerabilities by querying specific endpoints or functions.
  • Educational Tools: Help new developers understand codebases by providing detailed answers to their questions.
  • Documentation Generation: Generate explanations or documentation for code segments, and more.

To-Do List for Contributors

  • Make the important parameters configurable via the YAML file
  • Drag-and-drop folder upload in the UI for analysis
  • Scan the source folder and suggest file extensions to analyze
  • Make config.yml configurable from the UI
  • Session-based chat to switch context on each new session
  • Persistent chat UI that survives page refreshes
  • Add only the most recent response to the history context
  • Implement the tree-of-thoughts concept
  • Create a web interface
  • Integrate the repository-import feature so repositories are automatically imported locally for analysis

Security Workflow (To-Do)

  • Use Semgrep to identify potential vulnerabilities based on patterns.
  • Pass the identified snippets to a data-flow analysis tool to determine whether the input is user-controlled.
  • Provide the LLM with the code snippet, data flow information, and any relevant AST representations.
  • Ask the LLM to assess the risk based on this enriched context.
  • Use the LLM's output to prioritize vulnerabilities, focusing on those where user input reaches dangerous functions.
  • Optionally, perform dynamic analysis or manual code review on high-risk findings to confirm exploitability.
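
As a rough sketch of the first two steps, Semgrep's JSON output can be collected and turned into enriched LLM prompts like this (the semgrep invocation and JSON fields are standard; the prompt wording and everything downstream are illustrative assumptions):

```python
import json
import subprocess

# Step 1: run Semgrep (requires the semgrep CLI) and parse its JSON findings.
result = subprocess.run(
    ["semgrep", "--config", "auto", "--json", "src/"],
    capture_output=True, text=True, check=True,
)
findings = json.loads(result.stdout)["results"]

# Step 2: build an enriched prompt per finding for the local LLM to assess.
for f in findings:
    prompt = (
        f"Rule: {f['check_id']}\n"
        f"Location: {f['path']}:{f['start']['line']}\n"
        f"Snippet:\n{f['extra']['lines']}\n\n"
        "Assess whether user-controlled input can reach this code and "
        "rate the risk (high/medium/low) with a short justification."
    )
    # ...send `prompt` to the local LLM and collect its assessment...
```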

Contributing

Contributions to Contexi are welcome! Please submit pull requests or open issues on the GitHub repository.

Acknowledgments