TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding
⭐️ To support the hard work, consider leaving a star!
Official release of TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding.
- Authors: Itay Etelis*, David Sarne*, Avi Rosenfeld
- Institute: Bar-Ilan University, Department of Computer Science
- 30K GPT-4-Vision-generated captions.
- TextGPT4V-7B, a superior vision-language model specializing in visual text reasoning.
- An AWS-based image captioning pipeline approaching GPT-4-Vision's captioning capability (see the sketch below).
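For illustration only, here is a minimal sketch of how such a pipeline might prompt GPT-4V to caption an image stored on S3. The bucket, key, prompt, and model name are placeholders and do not describe the released infrastructure:

```python
import base64
import boto3
from openai import OpenAI

# Placeholders -- not the project's actual bucket, key, or prompt.
BUCKET, KEY = "my-image-bucket", "images/example.jpg"
PROMPT = "Describe this image in detail, transcribing any visible text."

# Fetch the image from S3 and base64-encode it for the API request.
s3 = boto3.client("s3")
image_bytes = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

# Ask GPT-4V for a text-rich caption.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```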
[2023/11/25] TextGPT4V dataset, paper, and project page are released!
- Release TextGPT4V dataset
- Release TextGPT4V model fine-tuned from LLaVA
- Checkpoints of TextGPT4V-7B (to be released)
- GPT-4V prompting AWS infrastructure (to be released)
Our caption data is available at TextGPT4V in JSON format.
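A minimal sketch of loading the captions in Python (the file name is a placeholder and the record schema is an assumption; check the released JSON for the exact keys):

```python
import json

# Placeholder path -- point this at the released TextGPT4V caption file.
with open("textgpt4v_captions.json", "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"Loaded {len(records)} caption records")

# Inspect one record to confirm the actual field names before use.
print(records[0])
```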
- LLaVA: the dataset is constructed with LLaVA in mind and is intended to be used with this model, though it can of course be used with any other VLM.
If you find our work useful for your research or applications, please cite using this BibTeX:
@misc{etelis2023textgpt4v,
      title={TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding},
      author={Itay Etelis and David Sarne and Avi Rosenfeld},
      year={2023},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}