
Ollama VLMs not working with LlamaBot StructuredBot #218

@ericmjl

Description


I'm encountering issues when trying to use Ollama Vision Language Models (VLMs) with LlamaBot's StructuredBot. The following code example fails:

from pydantic import BaseModel, Field
from typing import List
import llamabot as lmb
from pyprojroot import here

class Item(BaseModel):
    name: str = Field(description="Name of the item on the receipt.")
    quantity: int = Field(description="Number of items purchased", default=1)
    amount: float = Field(description="Total amount for this item.")

class Receipt(BaseModel):
    items: List[Item]
    total_amount: float = Field(description="Total amount paid.")

receipt_bot = lmb.StructuredBot(
    system_prompt="You are a skilled OCR bot for receipts.",
    pydantic_model=Receipt,
    # Using an Ollama VLM here triggers the failure described below
    model_name="ollama_chat/qwen2.5vl:32b",
    api_base="https://ericmjl--ollama-service-ollamaservice-server.modal.run",
)

receipt = receipt_bot(
    lmb.user(here() / "notebooks" / "assets" / "receipt.png")
)

The issue appears to be specific to using Ollama VLMs with StructuredBot for image-processing tasks: the call fails when the receipt image is sent through the Ollama VLM endpoint.

Expected behavior: The StructuredBot should successfully process the receipt image and return structured data according to the Receipt model.
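For reference, structured-output bots like this typically derive a JSON schema from the Pydantic model and ask the LLM to emit conforming JSON. A quick way to inspect the schema the bot should be targeting, using only standard Pydantic v2 APIs (not LlamaBot internals), is:

```python
from typing import List

from pydantic import BaseModel, Field


class Item(BaseModel):
    name: str = Field(description="Name of the item on the receipt.")
    quantity: int = Field(description="Number of items purchased", default=1)
    amount: float = Field(description="Total amount for this item.")


class Receipt(BaseModel):
    items: List[Item]
    total_amount: float = Field(description="Total amount paid.")


# Pydantic v2: the JSON schema that a structured response should conform to.
schema = Receipt.model_json_schema()
print(sorted(schema["properties"]))  # ['items', 'total_amount']

# A well-formed model response validates directly into the model;
# `quantity` falls back to its default of 1 when omitted.
receipt = Receipt.model_validate(
    {"items": [{"name": "milk", "amount": 2.50}], "total_amount": 2.50}
)
print(receipt.items[0].quantity)  # 1
```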

Actual behavior: The operation fails (specific error details to be added).

Environment:

  • LlamaBot version: [current version]
  • Ollama model: qwen2.5vl:32b
  • Python version: [current version]

This may be related to how LlamaBot handles image inputs with Ollama VLM endpoints or how the structured output is processed.
