PIL Image conversion issues with Gemini API Parts #5033

Open
xuefei-wang opened this issue Jan 14, 2025 · 1 comment
Labels
awaiting-op-response (issue or PR has been triaged or responded to and is now awaiting a reply from the original poster), needs-triage

Comments

xuefei-wang commented Jan 14, 2025

What happened?

I ran into type compatibility issues when passing images to the Gemini API. The images were converted into PIL Images but never into Part objects, so the request failed with the TypeError shown below.

I wrote the conversion functions below, which work for my use case, and added them to autogen/oai/gemini.py. I'm posting them here in case anyone else needs them.

What did you expect to happen?

Error message:

TypeError: Parameter to MergeFrom() must be instance of same class: expected Part got PIL.PngImagePlugin.PngImageFile.

How can we reproduce it (as minimally and precisely as possible)?

import os

from dotenv import load_dotenv

from autogen import UserProxyAgent
from autogen.agentchat.contrib.multimodal_conversable_agent import (
    MultimodalConversableAgent,
)

load_dotenv()  # load variables such as GEMINI_API_KEY from a local .env file


visual_critic_agent = MultimodalConversableAgent(
    "visual_critic_agent",
    llm_config={
        "config_list": [
            {
                "model": "gemini-1.5-flash",
                "api_key": os.environ["GEMINI_API_KEY"],
                "api_type": "google",
            }
        ],
        "cache_seed": None,
    },
)

user_agent = UserProxyAgent(
    "user_agent", human_input_mode="ALWAYS", max_consecutive_auto_reply=0
)


user_agent.initiate_chat(
    visual_critic_agent,
    message="""Please tell me what is in this image?
    <img https://goldenmeadowsretrievers.com/wp-content/uploads/2023/08/golden-retriever-dog-with-newborn-golden-retriever.jpg>
""",
)
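
For context, the 0.2 MultimodalConversableAgent treats each <img ...> tag in the message as an image reference, and (per the report above) those images reach the Gemini client as PIL images. The stdlib-only sketch below illustrates just the tag-splitting step. split_img_tags is a hypothetical helper: it does not fetch or decode anything, and AutoGen's own parser may handle more cases.

```python
import re
from typing import List, Tuple

# Matches "<img SOURCE>" where SOURCE is a URL or file path containing no '>'.
IMG_TAG = re.compile(r"<img\s+([^>]+)>")


def split_img_tags(message: str) -> List[Tuple[str, str]]:
    """Split a prompt into ("text", ...) and ("image", source) segments.

    Hypothetical helper for illustration only; whitespace-only text between
    tags is dropped, and image sources are returned as plain strings.
    """
    segments: List[Tuple[str, str]] = []
    last = 0
    for match in IMG_TAG.finditer(message):
        text = message[last:match.start()]
        if text.strip():
            segments.append(("text", text))
        segments.append(("image", match.group(1).strip()))
        last = match.end()
    tail = message[last:]
    if tail.strip():
        segments.append(("text", tail))
    return segments


prompt = "Please tell me what is in this image?\n<img https://example.com/dog.jpg>\n"
for kind, value in split_img_tags(prompt):
    print(kind, repr(value))
```

In the real 0.2 flow, each image segment is then loaded into a PIL image, which is the object the Gemini client later fails to merge into a Part.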

AutoGen version

0.4.1

Which package was this bug in

Core

Model used

gemini

Python version

No response

Operating system

No response

Any additional info you think would be helpful for fixing this bug

from io import BytesIO

from PIL import Image
# Assumption: Part and Blob are the proto types already used by autogen 0.2's gemini.py.
from google.ai.generativelanguage import Blob, Part


def _pil_to_part(image: Image.Image) -> Part:
    """Encode a PIL image as an inline-data Part."""
    # Keep the source format when known; images created in memory have format None.
    fmt = image.format or "PNG"
    byte_arr = BytesIO()
    image.save(byte_arr, format=fmt)

    blob = Blob(mime_type=f"image/{fmt.lower()}", data=byte_arr.getvalue())
    return Part(inline_data=blob)


def _convert_pil_images_in_parts(curr_parts):
    """
    Converts any PIL Images in a list of parts to Part objects while preserving other parts.

    Args:
        curr_parts: List of mixed content (PIL Images and Parts)

    Returns:
        List where all PIL Images have been converted to Parts
    """
    updated_parts = []
    for part in curr_parts:
        if isinstance(part, Image.Image):
            updated_parts.append(_pil_to_part(part))
        else:
            updated_parts.append(part)
    return updated_parts
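
One caveat with deriving the MIME type from image.format: PIL reports formats such as "MPO" (used for some camera JPEGs) or "GIF", which would produce MIME types like image/mpo that Gemini may reject. A minimal sketch of a stricter mapping, assuming the accepted set is PNG, JPEG and WebP (check the current Gemini documentation); choose_encoding and SUPPORTED_MIME are hypothetical names:

```python
from typing import Optional, Tuple

# Assumption: image formats Gemini accepts inline, keyed by PIL's format name.
SUPPORTED_MIME = {
    "PNG": "image/png",
    "JPEG": "image/jpeg",
    "WEBP": "image/webp",
}


def choose_encoding(pil_format: Optional[str]) -> Tuple[str, str]:
    """Return (format to pass to Image.save, MIME type for the Blob).

    Hypothetical helper: keeps the source format when it maps to a supported
    MIME type, otherwise falls back to re-encoding as PNG.
    """
    fmt = (pil_format or "").upper()
    if fmt in SUPPORTED_MIME:
        return fmt, SUPPORTED_MIME[fmt]
    return "PNG", "image/png"


print(choose_encoding("JPEG"))
print(choose_encoding("MPO"))
print(choose_encoding(None))
```

Inside _pil_to_part, the pair would replace the format and mime_type computations: save with the first element and pass the second to Blob. Falling back to PNG keeps the pixels lossless at the cost of a larger payload.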

ekzhu commented Jan 14, 2025

Thanks for the issue.

I believe this has already been fixed in 0.4.1. The code you are showing uses the 0.2 API.

Would you like to submit a fix to the 0.2 package?

Make sure you are using autogen-agentchat and autogen-ext. See the README.
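
The report lists AutoGen 0.4.1, but the reproduction imports from the 0.2 API, so it can help to check which distributions are actually installed. A small stdlib sketch; the distribution names below are assumptions (the 0.2 line has been published under names such as pyautogen and autogen), and installed_versions is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed PyPI distribution names for the 0.2 line and the 0.4 packages.
CANDIDATES = ("pyautogen", "autogen", "autogen-agentchat", "autogen-ext", "autogen-core")


def installed_versions(dists: Iterable[str] = CANDIDATES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it in the same environment as the failing script; if only a 0.2 package shows up, the 0.4 packages named above still need to be installed.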

ekzhu added the awaiting-op-response label on Jan 14, 2025