Screenshot Scribe

Perform batch OCR processing on your images and screenshots, and gather the text into one formatted file with Gemini 1.5 Flash.

About

I take lots of screenshots of things that I want to reference later but almost never actually do. This tool is intended to make accessing that info much easier.

Gemini 1.5 Flash was chosen for the job because it greatly outperforms open-source OCR solutions like Tesseract and EasyOCR, while being faster and free or cheaper compared with many other multimodal LLMs.

Features

  • Batch OCR processing of images
  • Easy-to-customize system prompt
  • .json output by default, with optional formatted .md and .docx file types
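As a rough illustration of how .json output might be turned into a formatted .md file, here is a minimal sketch. The JSON shape (a list of objects with `file` and `text` keys) and the function name `results_to_markdown` are assumptions made for this example; the tool's actual output schema and formatting code may differ.

```python
from __future__ import annotations

import json


def results_to_markdown(results: list[dict[str, str]]) -> str:
    """Render OCR results as Markdown, one heading per source image.

    Assumes each entry has hypothetical "file" and "text" keys.
    """
    sections = []
    for entry in results:
        # One section per image: the file name as a heading, then its text.
        sections.append(f"## {entry['file']}\n\n{entry['text'].strip()}\n")
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the tool's .json output.
    sample = json.loads(
        '[{"file": "receipt.png", "text": "Total: 12.50"},'
        ' {"file": "slide.png", "text": "Quarterly goals"}]'
    )
    print(results_to_markdown(sample))
```

Keeping the JSON as the primary output and deriving other formats from it means additional formats only need a new rendering function like this one.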

Getting Started

Prerequisites

You will need the following:

  • Poetry
  • A Gemini API key

Installation

  1. Clone the repo
    git clone https://github.com/yuandere/screenshotscribe
  2. Install Poetry packages and start a virtual environment
    poetry install
    
    poetry shell
  3. Create a .env file in the project directory and add your API key
    GEMINI_API_KEY = XXXXXX
    
  4. Move any images you want processed into the folder /images_to_process
  5. Run
    screenshotscribe
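Before running the tool, it can help to confirm that steps 3 and 4 above are in place. The following is a minimal standalone sketch using only the Python standard library; it is not part of Screenshot Scribe. The helper names (`read_env_key`, `list_images`, `preflight`) and the list of image extensions are assumptions for illustration, and the tool's actual supported formats may differ.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of image extensions; the tool's real list may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def read_env_key(env_path: Path, name: str = "GEMINI_API_KEY") -> str | None:
    """Return the value of `name` from a .env file, or None if absent or empty."""
    if not env_path.is_file():
        return None
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Tolerate spaces around "=" and optional surrounding quotes.
            value = value.strip().strip("'\"")
            return value or None
    return None


def list_images(folder: Path) -> list[Path]:
    """Return image files directly inside `folder`, sorted by name."""
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def preflight(project_dir: Path) -> list[str]:
    """Collect setup problems; an empty list means the steps look complete."""
    problems = []
    if read_env_key(project_dir / ".env") is None:
        problems.append("GEMINI_API_KEY is missing from .env")
    if not list_images(project_dir / "images_to_process"):
        problems.append("no images found in images_to_process")
    return problems


if __name__ == "__main__":
    # Run from the project directory to check the current setup.
    for message in preflight(Path.cwd()) or ["setup looks complete"]:
        print(message)
```

Running the script from the project directory prints each missing piece, or a single confirmation line when both the API key and at least one image are found.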
    

Contributing

Contributions are appreciated! If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Acknowledgments