Labels: scipy-sprints (Issues for SciPy sprints)

Description
I'm encountering issues when trying to use Ollama Vision Language Models (VLMs) with LlamaBot's `StructuredBot`. The following code example fails:

```python
from typing import List

from pydantic import BaseModel, Field
from pyprojroot import here

import llamabot as lmb


class Item(BaseModel):
    name: str = Field(description="Name of the item on the receipt.")
    quantity: int = Field(description="Number of items purchased", default=1)
    amount: float = Field(description="Total amount for this item.")


class Receipt(BaseModel):
    items: List[Item]
    total_amount: float = Field(description="Total amount paid.")


receipt_bot = lmb.StructuredBot(
    system_prompt="You are a skilled OCR bot for receipts.",
    pydantic_model=Receipt,
    # Ollama models should fail
    model_name="ollama_chat/qwen2.5vl:32b",
    api_base="https://ericmjl--ollama-service-ollamaservice-server.modal.run",
)

receipt = receipt_bot(
    lmb.user(here() / "notebooks" / "assets" / "receipt.png")
)
```

The issue appears to be specific to using Ollama VLMs with `StructuredBot` for image processing tasks. The code fails when attempting to process images through the Ollama VLM endpoint.
Expected behavior: The StructuredBot should successfully process the receipt image and return structured data according to the Receipt model.
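For reference, the structured output a successful run should produce can be checked locally against the `Receipt` model, independent of any model call. The payload values below are hypothetical example data, not output from an actual run:

```python
from typing import List

from pydantic import BaseModel, Field


class Item(BaseModel):
    name: str = Field(description="Name of the item on the receipt.")
    quantity: int = Field(description="Number of items purchased", default=1)
    amount: float = Field(description="Total amount for this item.")


class Receipt(BaseModel):
    items: List[Item]
    total_amount: float = Field(description="Total amount paid.")


# Hypothetical payload illustrating the shape a successful run should return.
data = {
    "items": [
        {"name": "Coffee", "amount": 3.50},
        {"name": "Bagel", "quantity": 2, "amount": 5.00},
    ],
    "total_amount": 8.50,
}

receipt = Receipt.model_validate(data)
```

Note that `quantity` falls back to its default of 1 when omitted, so any failure here is in getting the VLM response into this shape, not in the Pydantic schema itself.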
Actual behavior: The operation fails (specific error details to be added).
Environment:
- LlamaBot version: [current version]
- Ollama model: qwen2.5vl:32b
- Python version: [current version]
This may be related to how LlamaBot handles image inputs with Ollama VLM endpoints or how the structured output is processed.