High-performance behavior intelligence engine and CLI written in Rust
Vec-Eyes is a powerful, extensible behavior analysis platform designed to classify, detect, and analyze patterns across text, logs, datasets, and system traces.
Built with performance, flexibility, and real-world use cases in mind, Vec-Eyes combines:
- Machine Learning (KNN, Naive Bayes)
- NLP Pipelines (Tokenization, TF-IDF, Embeddings)
- Vector-based similarity (Word2Vec, FastText)
- Rule-based detection (Regex / optional VectorScan)
- Hybrid scoring engine
Vec-Eyes is not just a spam classifier.
It is a behavior intelligence engine capable of detecting patterns across multiple domains:
- Spam & phishing emails
- Web attacks (SQLi, XSS, fuzzing)
- Malware behavior
- Fraud patterns in logs and transactions
Vec-Eyes can also be used for:
- Virus pattern classification
- Human / biological data classification
- Bacteria & fungus identification (textual/log patterns)
- Bioinformatics-style sequence classification (adaptable pipelines)
- Log anomaly detection
- Dataset classification
- Behavioral clustering

Classifiers:
- KNN (Cosine, Euclidean, Manhattan, Minkowski)
- Naive Bayes (Count, TF-IDF)

NLP pipeline:
- Tokenization & normalization
- TF-IDF vectorization
- Word2Vec (lightweight training)
- FastText-style embeddings (subword support)

Rule engine:
- Regex matcher (default, no dependencies)
- Optional high-performance engine (VectorScan)
- YAML-driven rules with scoring system
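
The four KNN distance metrics listed above can be sketched in a few lines of Rust. This is an illustration using only the standard library; the `Metric` enum and the `distance` function are hypothetical names, not the vec-eyes-lib API. Euclidean and Manhattan are written as the `p = 2` and `p = 1` cases of Minkowski distance.

```rust
/// Distance metrics named in the feature list.
/// Hypothetical types for illustration, not the vec-eyes-lib API.
#[derive(Debug, Clone, Copy)]
pub enum Metric {
    Cosine,
    Euclidean,
    Manhattan,
    /// Minkowski distance of order `p` (p >= 1).
    Minkowski(f64),
}

/// Distance between two equal-length vectors under the chosen metric.
pub fn distance(metric: Metric, a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    match metric {
        Metric::Cosine => {
            let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
            let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
            if norm_a == 0.0 || norm_b == 0.0 {
                // Convention chosen for this sketch: a zero vector is
                // treated as orthogonal to everything.
                1.0
            } else {
                1.0 - dot / (norm_a * norm_b)
            }
        }
        Metric::Euclidean => minkowski(a, b, 2.0),
        Metric::Manhattan => minkowski(a, b, 1.0),
        Metric::Minkowski(p) => minkowski(a, b, p),
    }
}

/// Minkowski distance: (sum |x - y|^p)^(1/p).
fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}
```

For the vectors `[3.0, 4.0]` and `[0.0, 0.0]`, the Euclidean distance is 5 and the Manhattan distance is 7. Cosine distance ignores magnitude, which is one reason it is a common choice for TF-IDF and embedding vectors.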

The hybrid scoring engine combines:
- ML probability
- Rule matches
- Custom weights
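
One way to combine the three inputs above is a weighted blend of the ML probability and the strongest matching rule score. The sketch below is an assumption about how such a blend might look, not the formula Vec-Eyes uses. The `Rule` struct and `hybrid_score` function are hypothetical, and rule patterns are treated as plain `a|b|c` alternations of literals so that the example needs no regex crate.

```rust
/// One rule, reduced to what the sketch needs. Field names mirror the
/// YAML keys used in this README; the struct itself is hypothetical.
pub struct Rule {
    pub title: String,
    /// Alternation of literals, e.g. "free|bonus|win|casino".
    pub match_rule: String,
    /// Rule score on a 0..=100 scale.
    pub score: f64,
}

impl Rule {
    /// Assumption: the pattern is a plain `a|b|c` alternation, matched
    /// case-insensitively as substrings. A real regex engine does far more.
    pub fn matches(&self, text: &str) -> bool {
        let lower = text.to_lowercase();
        self.match_rule
            .split('|')
            .any(|alt| !alt.is_empty() && lower.contains(&alt.to_lowercase()))
    }
}

/// Weighted blend of the ML probability (0..=1) and the highest score among
/// matching rules (0..=100). The result is on a 0..=100 scale.
pub fn hybrid_score(
    ml_probability: f64,
    text: &str,
    rules: &[Rule],
    ml_weight: f64,
    rule_weight: f64,
) -> f64 {
    // Highest score among the rules that match; 0.0 if none match.
    let rule_score = rules
        .iter()
        .filter(|r| r.matches(text))
        .map(|r| r.score)
        .fold(0.0, f64::max);

    let total = ml_weight + rule_weight;
    if total <= 0.0 {
        return 0.0;
    }
    let blended = (ml_weight * ml_probability * 100.0 + rule_weight * rule_score) / total;
    blended.clamp(0.0, 100.0)
}
```

With equal weights, an ML probability of 0.8 and a matching rule scored 70 blend to 75; if no rule matches, the same probability yields 40. Taking the maximum rule score rather than the sum keeps the result bounded when many rules fire at once.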

Repository layout:
- vec-eyes-lib/
- vec-eyes-cli/

High-performance behavior classification CLI powered by Vec-Eyes Core
Vec-Eyes CLI is a production-ready command-line interface built on top of vec-eyes-lib, designed for real-world workflows in:
- Security (web attacks, phishing, malware)
- Fraud detection (financial transactions, risk scoring)
- Biological classification (virus, bacteria, anomaly patterns)
- General behavior intelligence pipelines
- YAML-first configuration (reproducible pipelines)
- Multi-model ML engine (KNN, Bayes, SVM, RF, Boosting, IsolationForest)
- Hybrid scoring (ML + rule engine)
- Parallel execution via Rayon (`threads`)
- Designed for real datasets, not toy examples

Classify a directory of samples:

    cargo run -- --rules-yaml rules.yaml --classify-objects ./samples/

Validate a configuration without classifying:

    cargo run -- --validate-yaml rules.yaml

Validates:
- required parameters
- model-specific constraints
- dataset paths
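
The three validation categories above can be illustrated with a small checker. This is a sketch under stated assumptions: the `Config` struct is a hypothetical subset of the YAML keys, and the specific constraints (for example `k >= 1` for KNN, or a contamination value in `(0, 0.5]`) are plausible guesses rather than the rules `--validate-yaml` actually enforces.

```rust
use std::path::Path;

/// Hypothetical subset of the YAML keys shown in this README.
#[derive(Debug, Default)]
pub struct Config {
    pub method: String,
    pub nlp: String,
    pub k: Option<usize>,
    pub isolation_forest_contamination: Option<f64>,
    pub hot: Vec<String>,
    pub cold: Vec<String>,
}

/// Returns every problem found, rather than stopping at the first one.
pub fn validate(cfg: &Config) -> Vec<String> {
    let mut errors = Vec::new();

    // 1. Required parameters.
    if cfg.method.is_empty() {
        errors.push("missing required key: method".to_string());
    }
    if cfg.nlp.is_empty() {
        errors.push("missing required key: nlp".to_string());
    }

    // 2. Model-specific constraints (assumed for illustration).
    if cfg.method.starts_with("Knn") && cfg.k.map_or(true, |k| k == 0) {
        errors.push("KNN methods need k >= 1".to_string());
    }
    if cfg.method == "IsolationForest" {
        if let Some(c) = cfg.isolation_forest_contamination {
            if !(c > 0.0 && c <= 0.5) {
                errors.push("isolation_forest_contamination must be in (0, 0.5]".to_string());
            }
        }
    }

    // 3. Dataset paths.
    check_paths("hot", &cfg.hot, &mut errors);
    check_paths("cold", &cfg.cold, &mut errors);

    errors
}

/// Each dataset group must be non-empty and point at existing directories.
fn check_paths(label: &str, paths: &[String], errors: &mut Vec<String>) {
    if paths.is_empty() {
        errors.push(format!("datasets.{} is empty", label));
    }
    for p in paths {
        if !Path::new(p).is_dir() {
            errors.push(format!("datasets.{}: not a directory: {}", label, p));
        }
    }
}
```

Collecting all errors in one pass lets a user fix a configuration in a single edit instead of rerunning validation after each message.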

Spam detection (KNN):

    method: KnnCosine
    nlp: FastText
    k: 5
    threads: 4
    datasets:
      hot:
        - /data/email/spam/
      cold:
        - /data/email/normal/
    rules:
      - title: Spam Keywords
        match_rule: "free|bonus|win|casino"
        score: 70

Run:

    vec-eyes --rules-yaml spam.yaml --classify-objects ./emails/

Web attack detection (RandomForest):

    method: RandomForest
    nlp: FastText
    threads: 8
    random_forest_mode: ExtraTrees
    random_forest_n_trees: 200
    random_forest_bootstrap: true
    random_forest_oob_score: true
    datasets:
      hot:
        - /data/http/attacks/
      cold:
        - /data/http/normal/
    rules:
      - title: SQL Injection
        match_rule: "union select|or 1=1"
        score: 90

Fraud detection (LogisticRegression):

    method: LogisticRegression
    nlp: TfIdf
    threads: 4
    logistic_learning_rate: 0.01
    logistic_epochs: 100
    datasets:
      hot:
        - /data/fraud/high-risk/
      cold:
        - /data/fraud/low-risk/

Anomaly detection (IsolationForest):

    method: IsolationForest
    nlp: FastText
    isolation_forest_n_trees: 150
    isolation_forest_contamination: 0.02
    datasets:
      hot:
        - /data/anomaly/outliers/
      cold:
        - /data/anomaly/normal/

| Argument | Description |
|---|---|
| `--rules-yaml` | Path to YAML config |
| `--validate-yaml` | Validate config only |
| `--classify-objects` | Directory of files to classify |
| `--threads` | Override thread count |
| `--output-json` | Export results as JSON |
| `--output-csv` | Export results as CSV |

`threads` controls Rayon parallelism:
- KNN → parallel distance computation
- Bayes → parallel scoring
- RandomForest / Boosting → parallel training

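
To show what parallel distance computation for KNN might involve, the sketch below splits the samples across a fixed number of workers and scores each chunk independently. Vec-Eyes is described as using Rayon for this; the example swaps in `std::thread::scope` from the standard library so that it stays self-contained. The function names are hypothetical. With Rayon, the same idea is usually written with `par_iter()` over the samples.

```rust
use std::thread;

/// Cosine similarity between two equal-length vectors
/// (0.0 if either vector is all zeros).
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores `query` against every sample, splitting the samples across
/// `threads` workers. Results come back in the original sample order.
pub fn parallel_similarities(query: &[f64], samples: &[Vec<f64>], threads: usize) -> Vec<f64> {
    if samples.is_empty() {
        return Vec::new();
    }
    let workers = threads.max(1);
    // Ceiling division so every sample lands in exactly one chunk.
    let chunk_size = (samples.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = samples
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|s| cosine_similarity(query, s))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<f64>>()
    })
}
```

Because each sample is scored independently, the work divides cleanly and needs no locking. Joining the handles in spawn order keeps the output aligned with the input.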
| Model | Best For |
|---|---|
| KNN | Similarity / noisy text |
| Bayes | Fast baseline |
| Logistic | Fraud / production baseline |
| SVM | Text classification |
| RandomForest | Structured signals |
| IsolationForest | Anomaly detection |

Example run with a thread override:

    vec-eyes --rules-yaml rules.yaml --threads 8 --random-forest-n-trees 300 --classify-objects ./traffic/

Example JSON output:

    {
      "file": "sample.txt",
      "classification": ["WEB_ATTACK"],
      "score": 92.5
    }

We welcome contributions:
- New datasets
- Performance improvements
- New classifiers
- Better YAML validation
Vec-Eyes CLI is not just a wrapper.
It is a production-grade behavior intelligence interface designed to bridge:
- ML pipelines
- rule-based detection
- real-world data workflows
Built in Rust. Designed for performance. Ready for serious use.
Orangewarrior