This project implements a Multimodal Retrieval-Augmented Generation (RAG) system, named LoomRAG, that leverages OpenAI's CLIP model for neural cross-modal image retrieval and semantic search, and OpenAI's Whisper model for audio processing. The system allows users to input text queries, images, or audio to retrieve multimodal responses seamlessly through vector embeddings. It features a comprehensive annotation interface for creating custom datasets and supports CLIP model fine-tuning with configurable parameters for domain-specific applications. The system also supports uploading images, PDFs, and audio files (including real-time recording) for enhanced interaction and intelligent retrieval capabilities through a Streamlit-based interface.
Experience the project in action:
| Data Upload Page | Data Search / Retrieval |
| --- | --- |
| Data Annotation Page | CLIP Fine-Tuning |
- **Cross-Modal Retrieval**: Search with text to retrieve both text and image results using deep learning
- **Image-Based Search**: Search the database by uploading an image to find similar content
- **Embedding-Based Search**: Uses OpenAI's CLIP and Whisper together with SentenceTransformers embedding models to embed the input data (see the sketch after this list)
- **CLIP Fine-Tuning**: Supports custom model training with configurable parameters, including test dataset split size, learning rate, optimizer, and weight decay
- **Fine-Tuned Model Integration**: Seamlessly load and use fine-tuned CLIP models for enhanced search and retrieval
- **Upload Options**: Allows users to upload images, PDFs, and audio files for AI-powered processing and retrieval
- **Audio Integration**: Upload audio files or record audio directly through the interface
- **URL Integration**: Add images directly via URLs and scrape website data, including text and images
- **Web Scraping**: Automatically extract and index content from websites for comprehensive search
- **Image Annotation**: Lets users annotate uploaded images through an intuitive interface
- **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs
- **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system
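All of these search features rest on a shared embedding space. Below is a minimal sketch of how text and image inputs can be embedded with CLIP through the Hugging Face `transformers` library; the checkpoint name and helper functions are illustrative assumptions rather than LoomRAG's exact implementation.

```python
# Minimal sketch: embed text and images into CLIP's shared vector space.
# The checkpoint and helper names are illustrative, not LoomRAG's exact code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(text: str) -> torch.Tensor:
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)  # unit-normalize for cosine similarity

def embed_image(path: str) -> torch.Tensor:
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)
```

Because text and images land in the same space, either modality can be matched against the indexed embeddings.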
- Fine-tuning CLIP for domain-specific datasets
- Image-based search and retrieval
- Adding support for audio modalities
- Adding support for video modalities
- Improving the re-ranking system for better contextual relevance
- Enhanced PDF parsing with semantic section segmentation
Data Indexing:
- Text, images, and PDFs are preprocessed and embedded using the CLIP model
- Embeddings are stored in a vector database for fast and efficient retrieval
- Support for direct URL-based image indexing and website content scraping
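As a rough illustration, the indexing step might look like the sketch below, assuming the `embed_text` / `embed_image` helpers from the earlier snippet and a flat FAISS index; metadata handling is deliberately simplified to a parallel Python list.

```python
# Sketch of indexing: embed each item and add it to a FAISS index.
# Assumes the embed_text / embed_image helpers sketched above.
import faiss
import numpy as np

dim = 512                       # CLIP ViT-B/32 embedding size
index = faiss.IndexFlatIP(dim)  # inner product == cosine similarity on unit vectors
metadata = []                   # row id -> information about the original item

def index_item(vector, info):
    index.add(vector.cpu().numpy().astype(np.float32))
    metadata.append(info)

index_item(embed_image("uploads/sunset.jpg"), {"type": "image", "path": "uploads/sunset.jpg"})
index_item(embed_text("A sunset over the mountains"), {"type": "text", "source": "paper.pdf"})
# Audio files can be transcribed first (e.g., with Whisper) and the transcript indexed as text.
```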
Query Processing:
- Text queries / image-based queries are converted into embeddings for semantic search
- Uploaded images, audio files and PDFs are processed and embedded for comparison
- The system performs a nearest neighbor search in the vector database to retrieve relevant text, images, and audio
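Retrieval can then be sketched as a nearest-neighbour lookup over that index, with audio queries transcribed by Whisper and routed through the same text path; the Whisper model size and helper names below are assumptions.

```python
# Sketch of query-time retrieval against the FAISS index built above.
import whisper  # openai-whisper

def search(query_vector, k: int = 5):
    scores, ids = index.search(query_vector.cpu().numpy().astype(np.float32), k)
    return [(metadata[i], float(s)) for i, s in zip(ids[0], scores[0]) if i != -1]

# Text query
results = search(embed_text("sunset over mountains"))

# Audio query: transcribe, then reuse the text path
asr_model = whisper.load_model("base")
transcript = asr_model.transcribe("query.wav")["text"]
audio_results = search(embed_text(transcript))
```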
Response Generation:
- For text results: Optionally refined or augmented using a language model
- For image results: Directly returned or enhanced with image captions
- For audio results: Returned with relevant metadata and transcriptions where applicable
- For PDFs: Extracts text content and provides relevant sections
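For the optional augmentation step, one plausible shape is shown below using the OpenAI Python client; the model name, prompt wording, and function signature are illustrative, not necessarily the project's configuration.

```python
# Sketch of LLM-based augmentation of retrieved text snippets.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def augment(query: str, snippets: list[str]) -> str:
    context = "\n\n".join(snippets)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```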
Image Annotation:
- Dedicated annotation page for managing uploaded images
- Support for creating and managing multiple datasets simultaneously
- Flexible annotation workflow for efficient data labeling
- Dataset organization and management capabilities
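One plausible way to persist annotations is as simple image-caption records that can feed directly into fine-tuning; the JSONL layout below is an assumption, not LoomRAG's actual schema.

```python
# Sketch: store annotations as image-caption pairs, one JSON record per line.
import json
from pathlib import Path

def save_annotation(dataset_dir: str, image_path: str, caption: str) -> None:
    records = Path(dataset_dir) / "annotations.jsonl"
    records.parent.mkdir(parents=True, exist_ok=True)
    with records.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"image": image_path, "caption": caption}) + "\n")

save_annotation("datasets/landscapes", "uploads/sunset.jpg", "a sunset over mountains")
```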
Model Fine-Tuning:
- Custom CLIP model training on annotated images
- Configurable training parameters for optimization
- Integration of fine-tuned models into the search pipeline
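The sketch below shows how such a fine-tuning loop might be wired up with `transformers`, and where the configurable split size, learning rate, optimizer, and weight decay plug in; dataset paths and helper names are illustrative assumptions.

```python
# Sketch of fine-tuning CLIP on annotated image-caption pairs.
import json
import torch
from PIL import Image
from sklearn.model_selection import train_test_split
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def load_pairs(path):
    with open(path, encoding="utf-8") as f:
        return [(r["image"], r["caption"]) for r in map(json.loads, f)]

pairs = load_pairs("datasets/landscapes/annotations.jsonl")
train_pairs, test_pairs = train_test_split(pairs, test_size=0.2)  # configurable split size

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)  # configurable optimizer

model.train()
batch_size = 8
for epoch in range(3):
    for start in range(0, len(train_pairs), batch_size):
        batch = train_pairs[start:start + batch_size]
        images = [Image.open(p).convert("RGB") for p, _ in batch]
        captions = [c for _, c in batch]
        inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
        outputs = model(**inputs, return_loss=True)  # CLIP's built-in contrastive loss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("models/clip-finetuned")  # later loaded by the search pipeline
```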
Clone the repository:
```bash
git clone https://github.com/NotShrirang/LoomRAG.git
cd LoomRAG
```
Create a virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Running the Streamlit Interface:
Start the Streamlit app:
```bash
streamlit run app.py
```
Access the interface in your browser to:
- Submit natural language queries
- Upload images or PDFs to retrieve contextually relevant results
- Upload or record audio files
- Add images using URLs
- Scrape and index website content
- Search using uploaded images
- Annotate uploaded images
- Fine-tune CLIP models with custom parameters
- Use fine-tuned models for improved search results
Example Queries:
- Text Query: "sunset over mountains"
  Output: An image of a sunset over mountains along with descriptive text
- PDF Upload: Upload a PDF of a scientific paper
  Output: Extracted key sections or contextually relevant images
- **Vector Database**: Uses FAISS for efficient similarity search
- **Model**: Uses OpenAI's CLIP for neural embedding generation
- **Augmentation**: Optional LLM-based augmentation for text responses
- **Fine-Tuning**: Configurable parameters for model training and optimization
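As a final hedged sketch, dropping a fine-tuned checkpoint back into the retrieval pipeline could look like the following (paths are illustrative); re-embedding the corpus with the fine-tuned encoder keeps query and index vectors in the same space.

```python
# Sketch: load a fine-tuned CLIP checkpoint for search instead of the base model.
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("models/clip-finetuned")                 # fine-tuned weights
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # preprocessing is unchanged
# Re-embed indexed items with this encoder so stored and query embeddings stay comparable.
```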
Contributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.