Orangewarrior/vec-eyes-cli

Vec-Eyes 🔍🧠

High-performance behavior intelligence engine and CLI written in Rust

Vec-Eyes is a powerful, extensible behavior analysis platform designed to classify, detect, and analyze patterns across text, logs, datasets, and system traces.

Built with performance, flexibility, and real-world use cases in mind, Vec-Eyes combines:

  • 🧠 Machine Learning (KNN, Naive Bayes)
  • πŸ”‘ NLP Pipelines (Tokenization, TF-IDF, Embeddings)
  • ⚑ Vector-based similarity (Word2Vec, FastText)
  • πŸ”Ž Rule-based detection (Regex / optional VectorScan)
  • πŸ“Š Hybrid scoring engine

🚀 Why Vec-Eyes?

Vec-Eyes is not just a spam classifier.

It is a behavior intelligence engine capable of detecting patterns across multiple domains:

🔐 Security & Fraud Detection

  • Spam & phishing emails
  • Web attacks (SQLi, XSS, fuzzing)
  • Malware behavior
  • Fraud patterns in logs and transactions

🧬 Biological & Scientific Classification

Vec-Eyes can also be used for:

  • Virus pattern classification
  • Human / biological data classification
  • Bacteria & fungus identification (textual/log patterns)
  • Bioinformatics-style sequence classification (adaptable pipelines)

📊 General Pattern Recognition

  • Log anomaly detection
  • Dataset classification
  • Behavioral clustering

⚙️ Core Features

🧠 Machine Learning

  • KNN (Cosine, Euclidean, Manhattan, Minkowski)
  • Naive Bayes (Count, TF-IDF)
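
The four KNN metrics listed above can be sketched in a few lines. This is a minimal illustration, not the vec-eyes-lib implementation; the function names are hypothetical. Manhattan and Euclidean are treated as the p = 1 and p = 2 cases of Minkowski distance.

```rust
/// Cosine distance: 1 - cos(angle between a and b).
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0; // assumption: treat a zero vector as maximally dissimilar
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Minkowski distance of order p; p = 1 is Manhattan, p = 2 is Euclidean.
fn minkowski_distance(a: &[f64], b: &[f64], p: f64) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

fn main() {
    let a = [1.0, 0.0, 2.0];
    let b = [0.0, 1.0, 2.0];
    println!("cosine         = {:.4}", cosine_distance(&a, &b));
    println!("manhattan      = {:.4}", minkowski_distance(&a, &b, 1.0));
    println!("euclidean      = {:.4}", minkowski_distance(&a, &b, 2.0));
    println!("minkowski(p=3) = {:.4}", minkowski_distance(&a, &b, 3.0));
}
```

Cosine distance ignores vector length, which tends to suit TF-IDF and embedding vectors; the Minkowski family is sensitive to magnitude.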

🔡 NLP Engine

  • Tokenization & normalization
  • TF-IDF vectorization
  • Word2Vec (lightweight training)
  • FastText-style embeddings (subword support)
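
To make the tokenization and TF-IDF steps concrete, here is a small sketch using only the standard library. It assumes one common TF-IDF variant (term frequency normalized by document length, idf = ln(N / df)); the variant Vec-Eyes actually uses is not stated here, and `tokenize` and `tf_idf` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercase the text and split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// TF-IDF weights per document, with idf = ln(N / df).
fn tf_idf(docs: &[Vec<String>]) -> Vec<HashMap<String, f64>> {
    let n = docs.len() as f64;
    // Document frequency: number of documents containing each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in docs {
        let unique: HashSet<&str> = doc.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }
    docs.iter()
        .map(|doc| {
            let mut weights: HashMap<String, f64> = HashMap::new();
            for term in doc {
                // Term frequency, normalized by document length.
                *weights.entry(term.clone()).or_insert(0.0) += 1.0 / doc.len() as f64;
            }
            for (term, w) in weights.iter_mut() {
                *w *= (n / df[term.as_str()] as f64).ln();
            }
            weights
        })
        .collect()
}

fn main() {
    let docs: Vec<Vec<String>> = ["Win a FREE bonus!", "Free casino bonus", "Meeting notes attached"]
        .iter()
        .map(|d| tokenize(d))
        .collect();
    for (i, weights) in tf_idf(&docs).iter().enumerate() {
        println!("doc {}: {:?}", i, weights);
    }
}
```

Terms that occur in every document get a weight of zero under this variant, so only distinguishing terms contribute to the vector.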

🔎 Rule Engine

  • Regex matcher (default, no dependencies)
  • Optional high-performance engine (VectorScan)
  • YAML-driven rules with scoring system

📊 Hybrid Scoring

Combine:

  • ML probability
  • Rule matches
  • Custom weights
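
One plausible way to combine these three inputs is a weighted average on a 0-100 scale. The sketch below is an assumption for illustration only: the formula, the `Weights` struct, and the choice to take the strongest matched rule are all hypothetical, not the documented Vec-Eyes scoring method.

```rust
/// A matched rule and its configured score (0-100, as in the YAML `score` field).
struct RuleMatch {
    title: String,
    score: f64,
}

/// Hypothetical weights; these names are illustrative, not Vec-Eyes options.
struct Weights {
    ml: f64,
    rules: f64,
}

/// Weighted average of the ML probability (scaled to 0-100) and the
/// highest score among the matched rules.
fn hybrid_score(ml_probability: f64, matches: &[RuleMatch], w: &Weights) -> f64 {
    let ml_component = ml_probability.clamp(0.0, 1.0) * 100.0;
    let rule_component = matches.iter().map(|m| m.score).fold(0.0_f64, f64::max);
    let total = w.ml + w.rules;
    if total == 0.0 {
        return 0.0;
    }
    (w.ml * ml_component + w.rules * rule_component) / total
}

fn main() {
    let matches = vec![RuleMatch { title: "SQL Injection".to_string(), score: 90.0 }];
    let weights = Weights { ml: 0.6, rules: 0.4 };
    for m in &matches {
        println!("matched rule: {} ({})", m.title, m.score);
    }
    println!("hybrid score: {:.1}", hybrid_score(0.8, &matches, &weights));
}
```

With an ML probability of 0.8, one rule scoring 90, and weights 0.6 / 0.4, this yields 0.6 × 80 + 0.4 × 90 = 84.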

📂 Project Structure

vec-eyes-lib/
vec-eyes-cli/

Vec-Eyes CLI 🚀

High-performance behavior classification CLI powered by Vec-Eyes Core

Vec-Eyes CLI is a production-ready command-line interface built on top of vec-eyes-lib, designed for real-world workflows in:

  • 🔐 Security (web attacks, phishing, malware)
  • 💰 Fraud detection (financial transactions, risk scoring)
  • 🧬 Biological classification (virus, bacteria, anomaly patterns)
  • 📊 General behavior intelligence pipelines

⚡ Why Vec-Eyes CLI?

  • YAML-first configuration (reproducible pipelines)
  • Multi-model ML engine (KNN, Bayes, SVM, RF, Boosting, IsolationForest)
  • Hybrid scoring (ML + rule engine)
  • Parallel execution via Rayon (threads)
  • Designed for real datasets, not toy examples

🚀 Quick Start

cargo run -- --rules-yaml rules.yaml --classify-objects ./samples/

🧪 Validate YAML

cargo run -- --validate-yaml rules.yaml

✔ Validates:

  • required parameters
  • model-specific constraints
  • dataset paths

📄 Example 1: Spam Detection (KNN + FastText)

method: KnnCosine
nlp: FastText
k: 5
threads: 4

datasets:
  hot:
    - /data/email/spam/
  cold:
    - /data/email/normal/

rules:
  - title: Spam Keywords
    match_rule: "free|bonus|win|casino"
    score: 70

Run:

vec-eyes --rules-yaml spam.yaml --classify-objects ./emails/

📄 Example 2: Web Attack Detection (RandomForest + OOB)

method: RandomForest
nlp: FastText
threads: 8

random_forest_mode: ExtraTrees
random_forest_n_trees: 200
random_forest_bootstrap: true
random_forest_oob_score: true

datasets:
  hot:
    - /data/http/attacks/
  cold:
    - /data/http/normal/

rules:
  - title: SQL Injection
    match_rule: "union select|or 1=1"
    score: 90

📄 Example 3: Fraud Detection (Logistic Regression)

method: LogisticRegression
nlp: TfIdf
threads: 4

logistic_learning_rate: 0.01
logistic_epochs: 100

datasets:
  hot:
    - /data/fraud/high-risk/
  cold:
    - /data/fraud/low-risk/
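
For readers unfamiliar with the two logistic parameters, the sketch below shows what a learning rate and an epoch count typically control in gradient-descent training. It is a generic textbook formulation on toy one-dimensional data, assumed for illustration; it does not reproduce the Vec-Eyes training code, and `train` and `predict` are hypothetical names.

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Probability of the "hot" class; the last entry of `w` is the bias.
fn predict(w: &[f64], x: &[f64]) -> f64 {
    let z: f64 = x.iter().zip(w).map(|(xi, wi)| xi * wi).sum::<f64>() + w[x.len()];
    sigmoid(z)
}

/// Per-sample gradient descent. `learning_rate` is the step size and
/// `epochs` is the number of full passes over the data.
/// Assumes `xs` is non-empty and all rows have the same length.
fn train(xs: &[Vec<f64>], ys: &[f64], learning_rate: f64, epochs: usize) -> Vec<f64> {
    let dim = xs[0].len();
    let mut w = vec![0.0; dim + 1];
    for _ in 0..epochs {
        for (x, &y) in xs.iter().zip(ys) {
            let error = predict(&w, x) - y;
            for j in 0..dim {
                w[j] -= learning_rate * error * x[j];
            }
            w[dim] -= learning_rate * error; // bias update
        }
    }
    w
}

fn main() {
    // Toy data: negative feature values are "cold" (0), positive are "hot" (1).
    let xs = vec![vec![-2.0], vec![-1.0], vec![1.0], vec![2.0]];
    let ys = vec![0.0, 0.0, 1.0, 1.0];
    let w = train(&xs, &ys, 0.01, 100);
    println!("p(hot | x =  2) = {:.3}", predict(&w, &[2.0]));
    println!("p(hot | x = -2) = {:.3}", predict(&w, &[-2.0]));
}
```

A smaller learning rate makes each update more conservative and usually needs more epochs to converge.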

📄 Example 4: Anomaly Detection (Isolation Forest)

method: IsolationForest
nlp: FastText

isolation_forest_n_trees: 150
isolation_forest_contamination: 0.02

datasets:
  hot:
    - /data/anomaly/outliers/
  cold:
    - /data/anomaly/normal/
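
In isolation-forest implementations, a contamination value is commonly read as the expected share of anomalies and used to pick the score cutoff. The sketch below shows that interpretation under the assumption that higher scores mean more anomalous; whether Vec-Eyes applies `isolation_forest_contamination` this way is not confirmed here, and `contamination_threshold` is a hypothetical helper.

```rust
/// Return the score such that roughly `contamination` of the samples
/// score at or above it (higher score = more anomalous).
fn contamination_threshold(scores: &[f64], contamination: f64) -> f64 {
    if scores.is_empty() {
        return f64::INFINITY; // nothing to flag
    }
    let mut sorted = scores.to_vec();
    // Sort descending; total_cmp gives a total order over f64.
    sorted.sort_by(|a, b| b.total_cmp(a));
    let n_outliers = ((scores.len() as f64) * contamination).ceil() as usize;
    // Flag at least one sample and at most all of them.
    let idx = n_outliers.clamp(1, sorted.len()) - 1;
    sorted[idx]
}

fn main() {
    // 100 synthetic anomaly scores: 0.00, 0.01, ..., 0.99.
    let scores: Vec<f64> = (0..100).map(|i| i as f64 / 100.0).collect();
    let threshold = contamination_threshold(&scores, 0.02);
    let flagged = scores.iter().filter(|&&s| s >= threshold).count();
    println!("threshold = {:.2}, flagged = {}", threshold, flagged);
}
```

With 100 samples and a contamination of 0.02, the two highest-scoring samples fall at or above the threshold.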

⚙️ CLI Arguments Overview

| Argument | Description |
|---|---|
| --rules-yaml | Path to YAML config |
| --validate-yaml | Validate config only |
| --classify-objects | Directory of files to classify |
| --threads | Override thread count |
| --output-json | Export results as JSON |
| --output-csv | Export results as CSV |

🧠 Performance Notes

  • threads controls Rayon parallelism
  • KNN β†’ parallel distance computation
  • Bayes β†’ parallel scoring
  • RandomForest / Boosting β†’ parallel training
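
The idea behind parallel distance computation can be shown without external crates. The sketch below uses `std::thread::scope` as a stand-in for Rayon, which is what Vec-Eyes is stated to use; the chunking strategy and the `parallel_distances` function are assumptions for illustration.

```rust
use std::thread;

fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Distance from `query` to every training vector, with the training set
/// split into at most `threads` chunks that are processed concurrently.
fn parallel_distances(train: &[Vec<f64>], query: &[f64], threads: usize) -> Vec<f64> {
    if train.is_empty() {
        return Vec::new();
    }
    let workers = threads.max(1);
    let chunk_size = (train.len() + workers - 1) / workers; // ceiling division
    thread::scope(|s| {
        // Spawn all workers first, then join them in order so the
        // output keeps the same ordering as `train`.
        let handles: Vec<_> = train
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk.iter().map(|v| euclidean(v, query)).collect::<Vec<f64>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let train = vec![vec![0.0, 0.0], vec![3.0, 4.0], vec![6.0, 8.0]];
    let distances = parallel_distances(&train, &[0.0, 0.0], 2);
    println!("{:?}", distances);
}
```

Each distance is independent of the others, which is why KNN scoring parallelizes well: no locking is needed, only a final ordered merge.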

🧠 Model Selection Guide

Model Best For
KNN Similarity / noisy text
Bayes Fast baseline
Logistic Fraud / production baseline
SVM Text classification
RandomForest Structured signals
IsolationForest Anomaly detection

🧩 Example (Advanced CLI Override)

vec-eyes   --rules-yaml rules.yaml   --threads 8   --random-forest-n-trees 300   --classify-objects ./traffic/

📊 Output

Example JSON output:

{
  "file": "sample.txt",
  "classification": ["WEB_ATTACK"],
  "score": 92.5
}
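
A record with this shape could be modeled and serialized as follows. The struct and its `to_json` method are hypothetical and written against the standard library only; the real CLI may use a different type or a serialization crate. Field names are taken from the example above.

```rust
/// Mirrors the three fields shown in the example output.
struct ClassificationResult {
    file: String,
    classification: Vec<String>,
    score: f64,
}

/// Escape the characters that JSON strings require to be escaped.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl ClassificationResult {
    /// Compact JSON encoding. Assumes `score` is finite, since NaN and
    /// infinity have no JSON representation.
    fn to_json(&self) -> String {
        let labels: Vec<String> = self
            .classification
            .iter()
            .map(|label| format!("\"{}\"", escape_json(label)))
            .collect();
        format!(
            "{{\"file\":\"{}\",\"classification\":[{}],\"score\":{}}}",
            escape_json(&self.file),
            labels.join(","),
            self.score
        )
    }
}

fn main() {
    let result = ClassificationResult {
        file: "sample.txt".to_string(),
        classification: vec!["WEB_ATTACK".to_string()],
        score: 92.5,
    };
    println!("{}", result.to_json());
}
```

Because `classification` is an array, a single file can carry more than one label.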

🀝 Contributing

We welcome contributions:

  • New datasets
  • Performance improvements
  • New classifiers
  • Better YAML validation

💬 Final Note

Vec-Eyes CLI is not just a wrapper.

It is a production-grade behavior intelligence interface designed to bridge:

  • ML pipelines
  • rule-based detection
  • real-world data workflows

Built in Rust. Designed for performance. Ready for serious use.

👤 Author

Orangewarrior


⭐ Star the project if you like it!
