GitHub - yuandere/screenshotscribe: Perform batch OCR processing for images, and gather the text into one formatted file with Gemini 1.5 Flash

Screenshot Scribe

Perform batch OCR processing on your images and screenshots, and gather the text into one formatted file with Gemini 1.5 Flash.

About

I take lots of screenshots of things that I want to reference later but almost never actually do. This tool is intended to make accessing that info much easier.

Gemini 1.5 Flash was chosen for the job because it greatly outperforms open-source OCR solutions such as Tesseract and EasyOCR, while being faster and cheaper (or free) than many other multimodal LLMs.
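Below is a minimal sketch of how one screenshot might be sent to Gemini 1.5 Flash for OCR. It assumes the `google-generativeai` Python SDK (`genai.configure`, `genai.GenerativeModel`, `generate_content`). The function names `make_model` and `ocr_image`, the prompt wording, and the exact model string are illustrative assumptions, not this project's actual code. The model object is passed in as a parameter so the OCR step can be exercised offline with a stand-in.

```python
import mimetypes
from pathlib import Path
from typing import Any, Protocol

# Hypothetical default prompt; the real tool ships its own customizable system prompt.
DEFAULT_SYSTEM_PROMPT = (
    "Transcribe all text visible in the image. "
    "Preserve line breaks and do not add commentary."
)


class SupportsGenerateContent(Protocol):
    """Anything with a generate_content() method, e.g. genai.GenerativeModel."""

    def generate_content(self, contents: Any) -> Any: ...


def make_model(api_key: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT):
    """Build a Gemini 1.5 Flash client (assumes google-generativeai is installed)."""
    import google.generativeai as genai  # lazy import so offline code paths still work

    genai.configure(api_key=api_key)
    return genai.GenerativeModel("gemini-1.5-flash", system_instruction=system_prompt)


def ocr_image(path: Path, model: SupportsGenerateContent) -> str:
    """Send one image to the model as inline bytes and return the transcribed text."""
    mime_type = mimetypes.guess_type(path.name)[0] or "image/png"
    image_part = {"mime_type": mime_type, "data": path.read_bytes()}
    response = model.generate_content([image_part, "Extract the text from this image."])
    return response.text.strip()
```

With a real key, a call would look like `ocr_image(Path("images_to_process/example.png"), make_model(api_key))`, where `example.png` is a placeholder file name.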

Features

- Batch OCR processing of images
- Easily customizable system prompt
- JSON output by default, with optional formatted Markdown (.md) and Word (.docx) output
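To illustrate the idea of gathering per-image text into one file, here is a hedged sketch of the JSON (default) and Markdown outputs. The `OcrResult` record and its field names are hypothetical, not the project's actual schema; a .docx writer (for example with the third-party python-docx package) is left out to keep the example dependency-free.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OcrResult:
    """One processed image; the field names here are hypothetical."""

    filename: str
    text: str


def to_json(results: list[OcrResult]) -> str:
    """Serialize all results into a single JSON array (the default output)."""
    return json.dumps([asdict(r) for r in results], indent=2, ensure_ascii=False)


def to_markdown(results: list[OcrResult]) -> str:
    """Render results as Markdown, one level-2 heading per source image."""
    sections = [f"## {r.filename}\n\n{r.text}\n" for r in results]
    return "\n".join(sections)
```

Keeping results as plain records means each output format is just a different renderer over the same list.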

Getting Started

Prerequisites

You will need the following:

- Python 3.12 or later
- Poetry 1.8 or later
- A free API key from Google AI Studio (note the free tier's usage limits)

Installation

Clone the repo
```
git clone https://github.com/yuandere/screenshotscribe
```

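Install the dependencies with Poetry and activate the virtual environment (on Poetry 2.0 and later, `poetry shell` is no longer built in and requires the poetry-plugin-shell plugin; alternatively, prefix commands with `poetry run`, e.g. `poetry run screenshotscribe`)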
```
poetry install

poetry shell
```
Create a .env file in the project directory and add your API key
```
GEMINI_API_KEY = XXXXXX
```
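The project may read this file with a library such as python-dotenv; the stdlib-only loader below is a hypothetical stand-in (the name `load_env` is an assumption) that shows how a `KEY = VALUE` line like the one above can be parsed, tolerating spaces around `=` and optional quotes.

```python
from pathlib import Path


def load_env(path: Path) -> dict[str, str]:
    """Parse simple KEY = VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip whitespace around '=' and any surrounding single or double quotes.
        env[key.strip()] = value.strip().strip("\"'")
    return env
```

For example, `load_env(Path(".env"))["GEMINI_API_KEY"]` would return the key string.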
Move any images you want processed into the images_to_process folder in the project directory
Run
```
screenshotscribe
```
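As a rough sketch of the first step the command performs, collecting the images to process might look like the following. The `find_images` helper and the accepted extension set are assumptions for illustration; the real tool may accept other formats or scan the folder differently.

```python
from pathlib import Path

# Assumed set of accepted extensions; the real tool may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_images(folder: Path) -> list[Path]:
    """Return image files directly inside `folder`, sorted by path."""
    return sorted(
        p
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Matching on the lowercased suffix means files such as `photo.JPG` are picked up, while subfolders and non-image files are skipped.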

Contributing

Contributions are appreciated! If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the label "enhancement".

1. Fork the project
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a pull request

License

Distributed under the MIT License. See LICENSE.txt for more information.
