AleSteB/Demo-MetadataExtract4Cat

Demo - MetaExtract4Cat

A small demo Streamlit application to extract simple metadata from free text and add it to a local ontology-based Knowledge Graph (KG).

This repository demonstrates how to use an NLP backend to extract fields such as name, organization, age, favorite reaction, and catalysis research field from user text; present and validate the extracted metadata in a Streamlit UI; and add it to a local OWL Knowledge Graph.

Highlights

  • Streamlit front-end for interactive metadata extraction and KG management.
  • Uses an ontology (OWL) as the underlying data model.
  • Provides KG query UI to inspect people, organizations, reactions and research fields stored in the KG.
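As a rough sketch, the extraction step aims to turn a free-text paragraph into a small metadata record like the one below. Note that the field names here are illustrative assumptions, not the exact keys defined in codebase.py:

```python
# Illustrative shape of the metadata the extraction step aims to produce.
# The field names are assumptions, not the exact keys used by codebase.py.
extracted = {
    "name": "Jane Doe",
    "organization": "Example University",
    "age": 34,
    "favorite_reaction": "Haber-Bosch process",
    "research_field": "heterogeneous catalysis",
}

# A minimal completeness check of the kind the UI's validation form implies:
required = {"name", "organization", "age", "favorite_reaction", "research_field"}
missing = sorted(required - extracted.keys())
print(missing)  # an empty list means every field was extracted
```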

Repository structure

  • streamlitApp.py — Main Streamlit app. Handles the UI, calls extraction functions (from codebase.py), performs basic validation, shows KG queries and writes to the local ontology.
  • codebase.py — Project logic used by the Streamlit UI (extraction calls, helper functions such as dict2kg, kg_query, call_ollama, etc.).
  • config.json — Configuration used by the app (ontology paths and other settings).
  • ontologies/ — Folder containing the base ontologies and the knowledge graph files.
    • BaseOntology.owl and BaseOntology.properties
    • KnowledgeGraph.owl and KnowledgeGraph.properties
  • modelfiles/ — Folder containing the Modelfiles of the LLMs used.
  • evaluation/ — Folder containing JSON files with abstracts and extracted labels.
  • LICENSE — Project license.
  • README.md — (this file)

Prerequisites

  • Python 3.8+ (3.10/3.11 recommended)
  • The following Python packages are used (install with pip):
    • streamlit
    • owlready2
    • pandas

Note: Core functionality lives in codebase.py. Depending on the implementation of the extraction backend (for example, a local LLM served via Ollama, or a cloud API), that file may require additional packages or API keys; inspect codebase.py and config.json for any additional requirements. The LLMs themselves are published on Zenodo.

Suggested setup (Windows PowerShell)

  1. Create and activate a virtual environment:

     python -m venv .venv; .\.venv\Scripts\Activate.ps1

  2. Install common dependencies:

     python -m pip install --upgrade pip
     pip install streamlit owlready2 pandas

  3. Review and update config.json if necessary (ontology paths, URLs, or endpoints used by codebase.py).
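The exact contents of config.json are defined by the repository; as a hedged sketch, it is a plain JSON file that the app reads at startup, along these lines (the key names below are hypothetical):

```python
import json
import os
import tempfile

# Hypothetical config.json contents; the real keys live in the repository's
# config.json and may differ. Paths point at the ontologies/ folder.
example = {
    "base_ontology": "ontologies/BaseOntology.owl",
    "knowledge_graph": "ontologies/KnowledgeGraph.owl",
}

# Write and re-read the file the way streamlitApp.py plausibly loads it:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(example, f)
    path = f.name

with open(path) as f:
    config = json.load(f)
os.unlink(path)

print(config["knowledge_graph"])  # ontologies/KnowledgeGraph.owl
```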

Run the app

From the repository root (PowerShell):

streamlit run .\streamlitApp.py

This will open the Streamlit UI in your browser. The app expects a short free-text paragraph describing a person (name, organization, age, favorite reaction, catalysis research field). Click "Digest Data" to run the extraction, validate the extracted metadata in the form on the right, and click "Add to local Knowledge Graph" to persist it to ontologies/KnowledgeGraph_inferred.owl.
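The digest-and-persist flow above can be sketched as plain function calls. The names mirror those mentioned in this README (call_ollama, dict2kg), but the signatures are assumptions, and the LLM call and KG write are replaced by stubs:

```python
# Stand-in for call_ollama: the real function sends the text to a local LLM
# and parses its response; here we return a fixed dict for illustration.
def call_ollama(text: str) -> dict:
    return {
        "name": "Jane Doe",
        "organization": "Example University",
        "age": 34,
    }

# Stand-in for dict2kg: the real function creates individuals in the OWL
# Knowledge Graph; here the "KG" is just a list.
def dict2kg(metadata: dict, kg: list) -> None:
    kg.append(metadata)

kg = []  # stands in for the local Knowledge Graph
metadata = call_ollama("Jane Doe, 34, works at Example University ...")
if metadata.get("name"):  # basic validation before persisting
    dict2kg(metadata, kg)

print(len(kg))  # 1
```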

Important notes

  • The app imports everything from codebase.py (via from codebase import *). That module provides the extraction function (call_ollama in the current streamlitApp.py), KG-writing (dict2kg) and query functions (kg_query). Make sure codebase.py is present and configured appropriately.
  • owlready2 ships with Java-based reasoners (HermiT and Pellet). Installing owlready2 via pip covers the Python side, but running the reasoner requires a Java runtime on the system, and on some systems additional Java configuration may be necessary.
