TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding
⭐️ To support the hard work, consider leaving a star!
Official release of TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding.
- Authors: Itay Etelis*, David Sarne*, Avi Rosenfeld
- Institute: Bar-Ilan University, Department of Computer Science
- 30K GPT-4-Vision-generated captions.
- TextGPT4V-7B, a superior vision-language model specializing in visual text reasoning.
- An AWS-based image captioning pipeline approaching GPT-4-Vision's captioning capability (see the sketch below).
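For illustration only, here is a minimal sketch of how such a pipeline might prompt GPT-4V to caption an image stored on S3. The bucket, key, prompt, and model name are placeholders and do not describe the released infrastructure:

```python
import base64
import boto3
from openai import OpenAI

# Placeholders -- not the project's actual bucket, key, or prompt.
BUCKET, KEY = "my-image-bucket", "images/example.jpg"
PROMPT = "Describe this image in detail, transcribing any visible text."

# Fetch the image from S3 and base64-encode it for the API request.
s3 = boto3.client("s3")
image_bytes = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

# Ask GPT-4V for a text-rich caption.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```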
[2023/11/25] TextGPT4V dataset, paper, and project page are released!
- Release TextGPT4V dataset
- Release TextGPT4V model fine-tuned from LLaVA
- Checkpoints of TextGPT4V-7B (to be released)
- GPT-4V prompting AWS infrastructure (to be released)
Our caption data is available at TextGPT4V in JSON format.
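A minimal sketch of loading the captions in Python (the file name is a placeholder and the record schema is an assumption; check the released JSON for the exact keys):

```python
import json

# Placeholder path -- point this at the released TextGPT4V caption file.
with open("textgpt4v_captions.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"Loaded {len(records)} caption records")

# Inspect one record to confirm the actual field names before use.
print(records[0])
```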
- LLaVA: the dataset is constructed with LLaVA in mind and is intended to be used with this model, though it can of course be used with any other VLM.
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{etelis2023textgpt4v,
      title={TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding},
      author={Itay Etelis and David Sarne and Avi Rosenfeld},
      year={2023},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}