LogTagger

🔍 Overview

LogTagger is a tool for automated and semi-automated labeling of cybersecurity logs, built to produce high-quality datasets for training AI models, including Large Language Models (LLMs). It integrates with Security Information and Event Management (SIEM) systems to receive logs, classifies them automatically, applies standardized tags (e.g., MITRE ATT&CK), supports manual refinement by experts, and exports the processed data for model training.

🌟 Key Features

SIEM Integration: Connect with Wazuh, Splunk, Elastic, and other SIEM systems via REST API
Automatic Log Labeling: Apply tags based on predefined rules (True_positive, False_positive, Attack_Type)
MITRE ATT&CK Framework: Automatic identification of tactics and techniques
Semi-Automatic Labeling: Support for expert review and manual tag adjustment
Advanced ML Classification:
- Modular ML provider system supporting local, API and demo modes
- Classification confidence metrics with configurable thresholds
- Human verification workflow for ML-classified events
- Performance metrics tracking and visualization
Dataset Export: Generate structured CSV or JSON datasets for AI training
Visualization Dashboard: Web interface for log review, manual tagging, and analytics
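
To illustrate the rule-based labeling feature listed above, here is a minimal sketch of how predefined rules might map a log message to an attack type and a MITRE ATT&CK technique. The `Rule` class, the `label_log` function, the rule patterns, and the output field names are all assumptions made for illustration; this README does not describe LogTagger's actual rule format.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One hypothetical labeling rule: a regex plus the tags it applies."""
    pattern: str
    attack_type: str
    mitre_technique: str  # MITRE ATT&CK technique ID


# Illustrative rules only; a real deployment would load its own rule set.
RULES = [
    Rule(r"failed password", "Brute_Force", "T1110"),               # Brute Force
    Rule(r"powershell.*-enc", "Command_Execution", "T1059.001"),    # PowerShell
]


def label_log(message: str) -> dict:
    """Return the tags of the first matching rule, or flag the log for review."""
    for rule in RULES:
        if re.search(rule.pattern, message, re.IGNORECASE):
            return {
                "message": message,
                "attack_type": rule.attack_type,
                "mitre_technique": rule.mitre_technique,
                "needs_review": False,
            }
    # No rule matched: leave the log unlabeled and route it to an analyst.
    return {
        "message": message,
        "attack_type": None,
        "mitre_technique": None,
        "needs_review": True,
    }


if __name__ == "__main__":
    print(label_log("sshd[311]: Failed password for root from 10.0.0.5"))
```

Logs that match no rule are marked for review rather than silently dropped, which mirrors the semi-automatic workflow described in the feature list.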

🔧 Tech Stack

Backend: Flask (Python)
Frontend: React (JavaScript)
Database: PostgreSQL
Containerization: Docker
Authentication: JWT-based authentication system
Machine Learning:
- Local ML with scikit-learn
- Remote ML API integration
- Performance metrics tracking

⚙️ Installation

Prerequisites

Python 3.8+
Node.js 14+
PostgreSQL 12+
Git

Quick Setup (Automated)

The easiest way to get started is by using our automated setup script:

# Clone the repository
git clone https://github.com/yourusername/logtagger.git
cd logtagger

# Run the setup script
chmod +x setup.sh
./setup.sh

The setup script will:

Install all required dependencies
Set up the PostgreSQL database
Configure the application
Create a default admin user (username: admin, password: admin)

Manual Setup

If you prefer to set up manually:

Clone the repository:

git clone https://github.com/yourusername/logtagger.git
cd logtagger

Set up the backend:

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
cd backend
pip install -r requirements.txt

Set up the database:

# Create PostgreSQL database
createuser -P logtagger  # Use 'logtagger' as the password when prompted
createdb -O logtagger logtagger

# Update config if needed
# Edit backend/config.py with your database details
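
The contents of `backend/config.py` are not shown in this README, so the following is only a sketch of one common way to assemble a PostgreSQL connection URI from environment variables. The function name `build_database_uri` and the `LOGTAGGER_DB_*` variable names are hypothetical; the defaults match the `logtagger` user and database created above.

```python
import os
from urllib.parse import quote


def build_database_uri(env=None) -> str:
    """Build a postgresql:// URI from (hypothetical) LOGTAGGER_DB_* variables."""
    env = os.environ if env is None else env
    user = env.get("LOGTAGGER_DB_USER", "logtagger")
    password = env.get("LOGTAGGER_DB_PASSWORD", "logtagger")
    host = env.get("LOGTAGGER_DB_HOST", "localhost")
    port = env.get("LOGTAGGER_DB_PORT", "5432")
    name = env.get("LOGTAGGER_DB_NAME", "logtagger")
    # Percent-encode credentials so characters like '@' or '/' do not break the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{name}"
    )


if __name__ == "__main__":
    print(build_database_uri())
```

Reading credentials from the environment keeps passwords out of version control, which matters once the default `logtagger` password is replaced.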

Set up the frontend:
```
cd ../frontend
npm install
```

🚀 Running the Application

Using the Start Script

After installation, you can start both backend and frontend with:

./start.sh

Manual Start

Start the backend server:

cd backend
source ../venv/bin/activate  # On Windows: ..\venv\Scripts\activate
python app.py

The backend API will be available at http://localhost:5000

Start the frontend development server:
```
cd frontend
npm start
```
The frontend will be available at http://localhost:3000

Log in with the default credentials:
- Username: admin
- Password: admin

Important: Change the default password immediately after first login.

📊 Database Structure

LogTagger uses a PostgreSQL database with three main tables:

events - Structured security events with labeling information
raw_logs - Raw log data from SIEM systems
ml_performance_metrics - Metrics tracking ML model performance

To inspect your database structure:

cd backend
python tools/inspect_database.py

🤖 Machine Learning Integration

LogTagger features a flexible ML subsystem with the following capabilities:

Modular ML Provider System:
- Local ML: Use scikit-learn based models for offline classification
- API ML: Connect to an external ML service via REST API
- Demo Provider: Run with simulated ML for testing and demonstrations
ML Dashboard:
- Monitor model performance with precision, recall, and F1 metrics
- Track performance by attack type classification
- Review ML-classified events and provide human verification
Configuration Options:
- Set confidence thresholds for auto-applying labels
- Configure human verification requirements
- Enable/disable ML classification system-wide
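
The confidence threshold and human verification options can be pictured as a small routing decision. The sketch below is an assumption about how such a gate could work, not LogTagger's actual code: `Prediction`, `route_prediction`, the default threshold of 0.9, and the status strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # expected to lie in [0, 1]


def route_prediction(
    pred: Prediction,
    auto_apply_threshold: float = 0.9,   # hypothetical default
    require_verification: bool = False,
) -> str:
    """Decide whether an ML label is applied automatically or queued for review."""
    if not 0.0 <= pred.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {pred.confidence}")
    # A label is auto-applied only when it is confident enough AND
    # the system is not configured to require human verification.
    if pred.confidence >= auto_apply_threshold and not require_verification:
        return "auto_applied"
    return "pending_review"


if __name__ == "__main__":
    print(route_prediction(Prediction("Brute_Force", 0.95)))
    print(route_prediction(Prediction("Brute_Force", 0.60)))
```

Raising the threshold sends more events to analysts and lowers the risk of wrong labels entering the dataset; lowering it does the opposite.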

To use ML features:

Navigate to "System Configuration" and enable ML classification
Configure ML API endpoints or use the built-in local model
Access the ML Dashboard to monitor performance and verify events
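
For readers unfamiliar with the metrics the dashboard reports, the sketch below computes precision, recall, and F1 per attack type from pairs of ML-predicted and human-verified labels. It illustrates the standard definitions only; the function name `per_class_metrics` and the input shape are assumptions, and LogTagger's own implementation may differ.

```python
from collections import Counter


def per_class_metrics(pairs):
    """Compute precision, recall, and F1 per label.

    `pairs` is an iterable of (predicted_label, verified_label) tuples.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, verified in pairs:
        if predicted == verified:
            tp[verified] += 1
        else:
            fp[predicted] += 1   # predicted this label, but it was wrong
            fn[verified] += 1    # the true label was missed

    metrics = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


if __name__ == "__main__":
    verified_events = [
        ("brute_force", "brute_force"),
        ("brute_force", "benign"),
        ("benign", "benign"),
        ("benign", "brute_force"),
    ]
    for label, scores in sorted(per_class_metrics(verified_events).items()):
        print(label, scores)
```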

🔒 Security

All API requests use HTTPS with SSL/TLS
Authentication is handled via JWT tokens
Role-based authorization (Admin, Analyst, Viewer)
Regular database backups are recommended
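
To make the JWT and role-based points concrete, here is a standard-library sketch of issuing and verifying an HS256-signed token and then checking a role against a permission table. This README does not say which JWT library LogTagger uses or which actions each role may perform, so `issue_token`, `verify_token`, `is_allowed`, the claim names, and the `ROLE_PERMISSIONS` mapping are all hypothetical. A production system would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def issue_token(username, role, secret, ttl_seconds=3600, now=None):
    """Create an HS256 token carrying the user's name, role, and expiry."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": username, "role": role, "exp": now + ttl_seconds}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token is unexpired."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    now = int(time.time()) if now is None else now
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims


# Hypothetical mapping: the README names the roles but not their permissions.
ROLE_PERMISSIONS = {
    "Admin": {"view", "tag", "configure"},
    "Analyst": {"view", "tag"},
    "Viewer": {"view"},
}


def is_allowed(claims, action):
    return action in ROLE_PERMISSIONS.get(claims.get("role"), set())


if __name__ == "__main__":
    secret = b"change-me"
    token = issue_token("alice", "Analyst", secret)
    claims = verify_token(token, secret)
    print(claims["sub"], is_allowed(claims, "tag"), is_allowed(claims, "configure"))
```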

📄 Documentation

For more detailed documentation:

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📝 License

This project is licensed under the MIT License.

