Add 421.image caption generator benchmark and added its data in bench… #218
base: master
…marks-data submodule Signed-off-by: Abhishek Kumar <abhishek22512@gmail.com>
Walkthrough
The changes introduce multiple new files for an image captioning benchmark within a subproject. This includes configuration files, input generation functionality, and an implementation for generating captions using pre-trained models. Additionally, requirements files are added to specify necessary Python packages. The subproject reference has also been updated to a new commit in the version control system.
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- benchmarks-data (1 hunks)
- benchmarks/700.image/701.image-captioning/config.json (1 hunks)
- benchmarks/700.image/701.image-captioning/input.py (1 hunks)
- benchmarks/700.image/701.image-captioning/python/function.py (1 hunks)
- benchmarks/700.image/701.image-captioning/python/requirements.txt (1 hunks)
Files skipped from review due to trivial changes (2)
- benchmarks-data
- benchmarks/700.image/701.image-captioning/config.json
benchmarks/700.image/701.image-captioning/python/requirements.txt
The benchmark data PR is here: spcl/serverless-benchmarks-data#4
benchmarks/400.inference/421.image-captioning/python/function.py
Signed-off-by: Abhishek Kumar <abhishek22512@gmail.com>
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- benchmarks/400.inference/421.image-captioning/config.json (1 hunks)
- benchmarks/400.inference/421.image-captioning/input.py (1 hunks)
- benchmarks/400.inference/421.image-captioning/python/function.py (1 hunks)
- benchmarks/400.inference/421.image-captioning/python/requirements.txt (1 hunks)
Files skipped from review due to trivial changes (1)
- benchmarks/400.inference/421.image-captioning/config.json
Additional context used
Ruff
benchmarks/400.inference/421.image-captioning/python/function.py
34-34: Undefined name `os` (F821)
Additional comments not posted (1)
benchmarks/400.inference/421.image-captioning/python/requirements.txt (1)
1-3: Verify Python version compatibility. The listed package versions need to be verified for compatibility with the Python versions used in this project.
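One way to start such a compatibility check is to read the `Requires-Python` metadata of the installed packages. The sketch below is only an illustration: the helper names `requires_python` and `report` are hypothetical, and the package names `torch`, `transformers`, and `pillow` are an assumption inferred from the imports in function.py, since the contents of requirements.txt are not shown here.

```python
import sys
from importlib import metadata


def requires_python(dist_name):
    """Return the Requires-Python specifier of an installed distribution.

    Returns None when the distribution is not installed or does not
    declare a specifier.
    """
    try:
        return metadata.metadata(dist_name).get("Requires-Python")
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its declared Requires-Python value."""
    return {name: requires_python(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed package names; adjust to whatever requirements.txt pins.
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Running Python {current}")
    for name, spec in report(["torch", "transformers", "pillow"]).items():
        print(f"{name}: Requires-Python = {spec}")
```

This only reports what each installed package declares; comparing the specifier against the Python versions the project targets is left to the reader.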
    input_files = []
    for ext in ['*.jpg', '*.jpeg', '*.png']:
        input_files.extend(glob.glob(os.path.join(data_dir, ext)))

    if not input_files:
        raise ValueError("No input files found in the provided directory.")

    for file in input_files:
        img = os.path.relpath(file, data_dir)
        upload_func(0, img, file)

    input_config = {
        'object': {
            'key': img,
            'width': 200,
            'height': 200
        },
        'bucket': {
            'bucket': benchmarks_bucket,
            'input': input_paths[0],
            'output': output_paths[0]
        }
    }

    return input_config
Fix variable scope and enhance error handling.
The variable img is used outside its defining loop, which could lead to unexpected behavior if no files are found. Consider defining img outside the loop or handling this case explicitly.
Apply this diff to address the scope issue and enhance error handling:
-def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
+def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func):
     input_files = []
     for ext in ['*.jpg', '*.jpeg', '*.png']:
         input_files.extend(glob.glob(os.path.join(data_dir, ext)))
     if not input_files:
         raise ValueError("No input files found in the provided directory.")
+    img = None  # Define img outside the loop to ensure it's available later
     for file in input_files:
         img = os.path.relpath(file, data_dir)
         upload_func(0, img, file)
+    if img is None:
+        raise ValueError("No valid image files processed.")
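To make the scoping point concrete, here is a minimal sketch, assuming a hypothetical helper named `upload_all` and a stand-in `upload_func`; it is not the benchmark's code. In Python a `for` loop variable stays bound after the loop ends, so `img` would hold the key of the last uploaded file, and it is left unbound only if the loop body never runs. Collecting the keys in a list makes it explicit which image the returned config refers to.

```python
import os


def upload_all(files, data_dir, upload_func):
    """Upload every file and return the object keys in upload order.

    Hypothetical helper for illustration; `upload_func` mirrors the
    (index, key, path) call shape used in the reviewed snippet.
    """
    if not files:
        raise ValueError("No input files found in the provided directory.")
    keys = []
    for path in files:
        key = os.path.relpath(path, data_dir)
        upload_func(0, key, path)
        keys.append(key)
    return keys


if __name__ == "__main__":
    uploaded = []
    keys = upload_all(
        [os.path.join("data", "a.jpg"), os.path.join("data", "b.png")],
        "data",
        lambda index, key, path: uploaded.append(key),
    )
    # The last key is what the original loop variable would still hold.
    print(keys[-1])
```

With this shape the caller picks `keys[-1]` (or any other entry) deliberately instead of relying on the leftover loop variable.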
import datetime
import io
from urllib.parse import unquote_plus
from PIL import Image
import torch
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from . import storage

# Load the pre-trained ViT-GPT2 model
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
image_processor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

model.eval()

client = storage.storage.get_instance()

def generate_caption(image_bytes):
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

    with torch.no_grad():
        generated_ids = model.generate(pixel_values, max_length=16, num_beams=4)
        generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

    return generated_text

def handler(event):
    bucket = event.get('bucket').get('bucket')
    input_prefix = event.get('bucket').get('input')
    key = unquote_plus(event.get('object').get('key'))

    download_begin = datetime.datetime.now()
    img = client.download_stream(bucket, os.path.join(input_prefix, key))
    download_end = datetime.datetime.now()

    process_begin = datetime.datetime.now()
    caption = generate_caption(img)
    process_end = datetime.datetime.now()

    download_time = (download_end - download_begin) / datetime.timedelta(microseconds=1)
    process_time = (process_end - process_begin) / datetime.timedelta(microseconds=1)

    return {
        'result': {
            'caption': caption,
        },
        'measurement': {
            'download_time': download_time,
            'download_size': len(img),
            'compute_time': process_time
        }
    }
Fix undefined name and approve model usage.
The code uses os without importing it, which will raise a NameError at runtime. The model loading and image processing are implemented correctly.
Fix the missing import:
+import os
Tools
Ruff
34-34: Undefined name `os` (F821)
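As a small illustration of why this kind of error is easy to miss, the sketch below shows that a function referencing an unimported module can be defined without complaint and fails only when the offending line runs. The names `build_key` and `try_build_key` are hypothetical and exist only for this demonstration; the function source is executed in a fresh namespace so the result does not depend on what the surrounding script has imported.

```python
SOURCE_WITHOUT_IMPORT = """
def build_key(prefix, key):
    return os.path.join(prefix, key)
"""

SOURCE_WITH_IMPORT = "import os\n" + SOURCE_WITHOUT_IMPORT


def try_build_key(source):
    """Define build_key from `source` in a fresh namespace and call it."""
    namespace = {}
    exec(source, namespace)  # Defining the function succeeds either way.
    try:
        return namespace["build_key"]("input", "cat.jpg")
    except NameError as exc:
        # Raised only now, when the body first looks up the name `os`.
        return f"NameError: {exc}"


if __name__ == "__main__":
    print(try_build_key(SOURCE_WITHOUT_IMPORT))
    print(try_build_key(SOURCE_WITH_IMPORT))
```

A static check such as Ruff's F821 catches the problem before deployment, whereas at runtime it would surface only on the first invocation of the handler.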
This pull request introduces a new Image Caption Generation benchmark for SEBS, implemented in Python. The benchmark uses the Hugging Face model nlpconnect/vit-gpt2-image-captioning.
Model: https://huggingface.co/nlpconnect/vit-gpt2-image-captioning
License (Apache 2.0): https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
cc @mcopik
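For readers who want to see the shape of what the benchmark reports without downloading the model, here is a minimal sketch of the measurement pattern used by the handler in this PR: time a step with `datetime` and convert the elapsed time to microseconds by dividing two `timedelta` values. The functions `timed`, `fake_caption`, and `run` are hypothetical stand-ins; the real handler calls the ViT-GPT2 model and also records download time.

```python
import datetime


def timed(func, *args):
    """Run func(*args) and return (result, elapsed time in microseconds)."""
    begin = datetime.datetime.now()
    result = func(*args)
    end = datetime.datetime.now()
    # Dividing one timedelta by another yields a float ratio; with a
    # one-microsecond divisor, that ratio is the elapsed microseconds.
    elapsed_us = (end - begin) / datetime.timedelta(microseconds=1)
    return result, elapsed_us


def fake_caption(image_bytes):
    # Hypothetical stand-in for the real model call.
    return f"an image of {len(image_bytes)} bytes"


def run(image_bytes):
    """Return a result/measurement dict shaped like the handler's output."""
    caption, compute_time = timed(fake_caption, image_bytes)
    return {
        "result": {"caption": caption},
        "measurement": {
            "download_size": len(image_bytes),
            "compute_time": compute_time,
        },
    }


if __name__ == "__main__":
    print(run(b"\x89PNG placeholder bytes"))
```

Note that `datetime.datetime.now()` follows the wall clock; a monotonic clock such as `time.perf_counter()` is a common alternative when only durations matter.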