LogTagger is a specialized tool designed for automated and semi-automated labeling of cybersecurity logs to create high-quality datasets for training AI models, including Large Language Models (LLMs). It integrates with Security Information and Event Management (SIEM) systems, receives logs, performs automatic classification, applies standardized tags (e.g., MITRE ATT&CK), allows for expert manual refinement, and exports the processed data for AI model training.
- SIEM Integration: Connect with Wazuh, Splunk, Elastic, and other SIEM systems via REST API
- Automatic Log Labeling: Apply tags based on predefined rules (True_positive, False_positive, Attack_Type)
- MITRE ATT&CK Framework: Automatic identification of tactics and techniques
- Semi-Automatic Labeling: Support for expert review and manual tag adjustment
- Advanced ML Classification:
  - Modular ML provider system supporting local, API, and demo modes
  - Classification confidence metrics with configurable thresholds
  - Human verification workflow for ML-classified events
  - Performance metrics tracking and visualization
- Dataset Export: Generate structured CSV or JSON datasets for AI training
- Visualization Dashboard: Web interface for log review, manual tagging, and analytics
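To illustrate the rule-based labeling and dataset export features listed above, here is a minimal sketch using only the Python standard library. The rule patterns, the field names (`label`, `attack_type`, `mitre_technique`), and the functions `label_event` and `export_csv` are assumptions made for illustration; they are not LogTagger's actual rule format or API.

```python
import csv
import io
import re

# Hypothetical rules: (regex, Attack_Type value, MITRE ATT&CK technique ID).
# LogTagger's real rule set is not shown in this README.
RULES = [
    (re.compile(r"Failed password for .+ from", re.IGNORECASE), "brute_force", "T1110"),
    (re.compile(r"powershell.+-enc", re.IGNORECASE), "powershell_execution", "T1059.001"),
]

FIELDS = ["message", "label", "attack_type", "mitre_technique"]


def label_event(message):
    """Return a labeled record; unmatched messages keep empty tags for manual review."""
    for pattern, attack_type, technique in RULES:
        if pattern.search(message):
            return {
                "message": message,
                "label": "True_positive",
                "attack_type": attack_type,
                "mitre_technique": technique,
            }
    return {"message": message, "label": "", "attack_type": "", "mitre_technique": ""}


def export_csv(events):
    """Serialize labeled records to CSV text suitable for a training dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


logs = [
    "Failed password for root from 203.0.113.7 port 22 ssh2",
    "Accepted publickey for deploy from 198.51.100.4",
]
labeled = [label_event(message) for message in logs]
print(export_csv(labeled))
```

Leaving unmatched events untagged, rather than marking them False_positive, keeps them available for the expert review step.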
- Backend: Flask (Python)
- Frontend: React (JavaScript)
- Database: PostgreSQL
- Containerization: Docker
- Authentication: JWT-based authentication system
- Machine Learning:
  - Local ML with scikit-learn
  - Remote ML API integration
  - Performance metrics tracking
- Python 3.8+
- Node.js 14+
- PostgreSQL 12+
- Git
The easiest way to get started is by using our automated setup script:
# Clone the repository
git clone https://github.com/yourusername/logtagger.git
cd logtagger
# Run the setup script
chmod +x setup.sh
./setup.sh
The setup script will:
- Install all required dependencies
- Set up the PostgreSQL database
- Configure the application
- Create a default admin user (username: admin, password: admin)
If you prefer to set up manually:
- Clone the repository:
  git clone https://github.com/yourusername/logtagger.git
  cd logtagger
- Set up the backend:
  # Create and activate virtual environment
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  # Install dependencies
  cd backend
  pip install -r requirements.txt
- Set up the database:
  # Create PostgreSQL database
  createuser -P logtagger  # Use 'logtagger' as the password when prompted
  createdb -O logtagger logtagger
  # Update config if needed
  # Edit backend/config.py with your database details
- Set up the frontend:
  cd ../frontend
  npm install
After installation, you can start both backend and frontend with:
./start.sh
To start the components separately:
- Start the backend server:
  cd backend
  source ../venv/bin/activate  # On Windows: ..\venv\Scripts\activate
  python app.py
  The backend API will be available at http://localhost:5000
- Start the frontend development server:
  cd frontend
  npm start
  The frontend will be available at http://localhost:3000
- Log in with the default credentials:
  - Username: admin
  - Password: admin
  Important: Change the default password immediately after first login.
LogTagger uses a PostgreSQL database with three main tables:
- events - Structured security events with labeling information
- raw_logs - Raw log data from SIEM systems
- ml_performance_metrics - Metrics tracking ML model performance
To inspect your database structure:
cd backend
python tools/inspect_database.py
LogTagger features a flexible ML subsystem with the following capabilities:
- Modular ML Provider System:
  - Local ML: Use scikit-learn based models for offline classification
  - API ML: Connect to an external ML service via REST API
  - Demo Provider: Run with simulated ML for testing and demonstrations
- ML Dashboard:
  - Monitor model performance with precision, recall, and F1 metrics
  - Track performance by attack type classification
  - Review ML-classified events and provide human verification
- Configuration Options:
  - Set confidence thresholds for auto-applying labels
  - Configure human verification requirements
  - Enable/disable ML classification system-wide
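The configuration options above could interact roughly as in the following sketch. The `MLConfig` fields, the default threshold of 0.9, the status strings, and the `route_prediction` function are all hypothetical names chosen for illustration; the real settings and routing behavior may differ.

```python
from dataclasses import dataclass


@dataclass
class MLConfig:
    # All field names and defaults are assumptions, not LogTagger's actual settings.
    enabled: bool = True
    auto_apply_threshold: float = 0.9
    require_human_verification: bool = True


def route_prediction(label, confidence, config):
    """Decide what happens to one ML prediction under the given configuration."""
    if not config.enabled:
        # ML classification is switched off system-wide.
        return {"label": None, "status": "ml_disabled"}
    if confidence >= config.auto_apply_threshold:
        # Confident predictions are applied without waiting for an analyst.
        return {"label": label, "status": "auto_applied"}
    if config.require_human_verification:
        # Low-confidence predictions are queued for expert review.
        return {"label": label, "status": "needs_verification"}
    # Low confidence and no review requirement: leave the event unlabeled.
    return {"label": None, "status": "unlabeled"}


config = MLConfig()
print(route_prediction("brute_force", 0.95, config))
print(route_prediction("brute_force", 0.60, config))
```

Raising `auto_apply_threshold` sends more events to analysts and lowers the risk of wrong labels entering the training dataset.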
To use ML features:
- Navigate to "System Configuration" and enable ML classification
- Configure ML API endpoints or use the built-in local model
- Access the ML Dashboard to monitor performance and verify events
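As a rough illustration of the per-attack-type metrics the ML Dashboard reports, the sketch below computes precision, recall, and F1 from pairs of ML-predicted and human-verified labels. The function `per_class_metrics` and the sample labels are assumptions for illustration; how LogTagger actually fills `ml_performance_metrics` is not described in this README.

```python
from collections import Counter


def per_class_metrics(pairs):
    """Compute precision, recall, and F1 per label from (predicted, verified) pairs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for predicted, verified in pairs:
        if predicted == verified:
            tp[predicted] += 1
        else:
            fp[predicted] += 1  # the model claimed this label wrongly
            fn[verified] += 1   # the model missed the true label
    metrics = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical verification results: (ML prediction, analyst-verified label).
pairs = [
    ("brute_force", "brute_force"),
    ("brute_force", "benign"),
    ("phishing", "phishing"),
    ("benign", "brute_force"),
]
metrics = per_class_metrics(pairs)
for label in sorted(metrics):
    print(label, metrics[label])
```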
- All API requests use HTTPS with SSL/TLS
- Authentication is handled via JWT tokens
- Role-based authorization (Admin, Analyst, Viewer)
- Regular database backups are recommended
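The role-based authorization mentioned above can be pictured with the following sketch. The three role names come from this README; the permission names, the `role` claim key, and the `is_allowed` function are assumptions for illustration. The sketch also assumes the JWT has already been verified and decoded into a dictionary of claims.

```python
# Hypothetical permission sets; only the role names come from this README.
ROLE_PERMISSIONS = {
    "Admin": {"view_events", "edit_labels", "export_dataset", "manage_config"},
    "Analyst": {"view_events", "edit_labels", "export_dataset"},
    "Viewer": {"view_events"},
}


def is_allowed(claims, permission):
    """Check a permission against the role found in already-verified JWT claims."""
    role = claims.get("role")
    # Unknown or missing roles get no permissions at all.
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_allowed({"sub": "alice", "role": "Analyst"}, "edit_labels"))
print(is_allowed({"sub": "bob", "role": "Viewer"}, "edit_labels"))
```

Denying by default for unknown roles means a malformed or outdated token cannot gain access through a missing entry.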
For more detailed documentation:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.
