AgenticNER

This repository hosts the source code for my master's thesis 'Collaborative Multi-Agent Architecture for Domain-Agnostic Named Entity Recognition'. Instructions for running the benchmarks and reproducing the results from the thesis can be found below.

Installation

Requirements:
- Python 3.12+

Install Poetry:

curl -sSL https://install.python-poetry.org | python3 -

Create a virtual environment and install the dependencies:

poetry install

Activate the virtual environment:

poetry shell

Running Benchmarks

The repo includes a command-line interface to run various benchmarks and evaluate different variants of AgenticNER.

Prerequisites

Get an API key from Anthropic and export it:

export ANTHROPIC_API_KEY="<your api key>"

Get an API key from tavily.com to enable internet access for the research agent, and export it:

export TAVILY_API_KEY="<your api key>"

Set the Anthropic API key in config.yaml. The LiteLLM proxy uses this file to expose an OpenAI-compatible interface for Anthropic models, which AutoGen requires.

Start the LiteLLM proxy in a separate command-line session:

poetry run litellm --config config.yaml

Export the LiteLLM server address as the OpenAI base URL:

export OPENAI_BASE_URL=http://0.0.0.0:4000
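
A missing variable from the steps above tends to surface only once a run is already under way. The following is a minimal sketch of a pre-flight check, not part of the repository: the function name check_env is hypothetical, and it assumes all three variables are needed (the Tavily key may be unnecessary for variants that do not use internet access).

```shell
# Hypothetical helper: report every required variable that is unset or empty.
# The variable names are the ones exported in the prerequisite steps.
check_env() {
    missing=0
    for name in ANTHROPIC_API_KEY TAVILY_API_KEY OPENAI_BASE_URL; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_env && echo "environment looks complete"
```

The function returns a non-zero status when anything is missing, so it can guard a benchmark command with `check_env && poetry run ...`.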

Command Structure

poetry run python src/ner/eval/run.py --benchmark <benchmark> --variant <variant> [--llm <llm>] [--sample-size <size>]

Parameters

- --benchmark: Choose the benchmark dataset
  - Options: genia, music, buster, astro
- --variant: Choose the evaluation variant
  - few-shot: Basic few-shot learning approach
  - agentic-ner-no-grounding: AgenticNER without grounding
  - agentic-ner-grounding: AgenticNER with grounding enabled
  - agentic-ner-grounding-no-internet: AgenticNER with grounding but no internet access
  - agentic-ner-grounding-no-researcher: AgenticNER with grounding but no researcher agent
- --llm: Choose the LLM model (default: haiku)
  - Options: haiku, sonnet
- --sample-size: Number of samples to evaluate (default: 500)

Example Commands

# Run few-shot evaluation on GENIA using Haiku
poetry run python src/ner/eval/run.py --benchmark genia --variant few-shot

# Run full AgenticNER on MusicRecoNER using Sonnet
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-grounding --llm sonnet

# Run AgenticNER without internet on Buster with 100 samples
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-grounding-no-internet --sample-size 100

Reproducing results from the thesis

To reproduce the results from the thesis, run the following commands for each benchmark:

GENIA Benchmark

# Baseline (Few-shot single LLM call with Haiku)
poetry run python src/ner/eval/run.py --benchmark genia --variant few-shot

# Baseline (Few-shot single LLM call with Sonnet)
poetry run python src/ner/eval/run.py --benchmark genia --variant few-shot --llm sonnet

# AgenticNER variants
poetry run python src/ner/eval/run.py --benchmark genia --variant agentic-ner-no-grounding
poetry run python src/ner/eval/run.py --benchmark genia --variant agentic-ner-grounding
poetry run python src/ner/eval/run.py --benchmark genia --variant agentic-ner-grounding-no-internet
poetry run python src/ner/eval/run.py --benchmark genia --variant agentic-ner-grounding-no-researcher
poetry run python src/ner/eval/run.py --benchmark genia --variant agentic-ner-grounding --llm sonnet

MusicRecoNER Benchmark

poetry run python src/ner/eval/run.py --benchmark music --variant few-shot
poetry run python src/ner/eval/run.py --benchmark music --variant few-shot --llm sonnet
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-no-grounding
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-grounding
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-grounding-no-internet
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-grounding-no-researcher
poetry run python src/ner/eval/run.py --benchmark music --variant agentic-ner-grounding --llm sonnet

Buster Benchmark

poetry run python src/ner/eval/run.py --benchmark buster --variant few-shot
poetry run python src/ner/eval/run.py --benchmark buster --variant few-shot --llm sonnet
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-no-grounding
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-grounding
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-grounding-no-internet
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-grounding-no-researcher
poetry run python src/ner/eval/run.py --benchmark buster --variant agentic-ner-grounding --llm sonnet

AstroNER Benchmark

poetry run python src/ner/eval/run.py --benchmark astro --variant few-shot
poetry run python src/ner/eval/run.py --benchmark astro --variant few-shot --llm sonnet
poetry run python src/ner/eval/run.py --benchmark astro --variant agentic-ner-no-grounding
poetry run python src/ner/eval/run.py --benchmark astro --variant agentic-ner-grounding
poetry run python src/ner/eval/run.py --benchmark astro --variant agentic-ner-grounding-no-internet
poetry run python src/ner/eval/run.py --benchmark astro --variant agentic-ner-grounding-no-researcher
poetry run python src/ner/eval/run.py --benchmark astro --variant agentic-ner-grounding --llm sonnet
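
The per-benchmark lists above all follow the same pattern of seven runs, so they can be generated with a loop. The sketch below is an illustration rather than a script shipped with the repository: the function names run_eval and run_all and the DRY_RUN variable are hypothetical. By default it only prints the commands; setting DRY_RUN=0 would execute them.

```shell
# Hypothetical wrapper around the evaluation CLI shown above.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run_eval() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "poetry run python src/ner/eval/run.py $*"
    else
        poetry run python src/ner/eval/run.py "$@"
    fi
}

run_all() {
    for benchmark in genia music buster astro; do
        # Few-shot baselines with Haiku (the default) and Sonnet
        run_eval --benchmark "$benchmark" --variant few-shot
        run_eval --benchmark "$benchmark" --variant few-shot --llm sonnet
        # AgenticNER variants with Haiku
        for variant in agentic-ner-no-grounding agentic-ner-grounding \
                       agentic-ner-grounding-no-internet \
                       agentic-ner-grounding-no-researcher; do
            run_eval --benchmark "$benchmark" --variant "$variant"
        done
        # Full AgenticNER with Sonnet
        run_eval --benchmark "$benchmark" --variant agentic-ner-grounding --llm sonnet
    done
}

run_all
```

In dry-run mode the output is the same 28 commands listed in the four sections above, in the same order, which makes it easy to compare against them before launching anything.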
