Deterministic-first SOC Investigation Platform • MCP-Compatible Blue-Team Toolkit
HuntMCP is a portfolio-grade, deterministic-first SOC investigation prototype. It parses exported security logs, normalizes events, applies rule-based hunting logic, extracts IOCs, enriches them with CTI sources, and produces an analyst-style investigation report.
- Deterministic Detection Engine: Rule-based, auditable detection with Sigma-inspired logic
- MCP-Compatible: Designed for Model Context Protocol integration with security agents
- LLM-Assisted Triage Only: The LLM is an analyst assistant, never the detection engine
- Offline-First: Mock/local defaults for CTI and LLM enable safe, offline demos
- Security-Hardened: Path validation, secret redaction, upload limits, no shell execution
HuntMCP can:
- Parse: CSV, JSON, and JSONL log exports (Windows Security, Sysmon, DNS, Proxy/Web, generic CSV)
- Detect: Deterministic rule engine with ~20 detection rules covering common attack patterns
- Enrich: IOC extraction with mock/local CTI and optional external CTI connectors (URLhaus, AbuseIPDB, OTX, VirusTotal)
- Triage: OpenAI-compatible LLM for analyst assistance with deterministic mock fallback
- Report: Markdown and HTML investigation reports with timeline, IOC tables, and MITRE ATT&CK mapping
- Persist: SQLite case storage for investigation review and API access
- Serve: FastAPI backend with MCP server tools for parse, detect, enrich, triage, report, timeline, entity graph, and case management
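As a rough illustration of the extract, normalize, and deduplicate steps behind the Enrich stage, the sketch below pulls IPv4 addresses and domains out of free text. It is a minimal sketch, not HuntMCP's actual extractor: the function name `extract_iocs` and both regular expressions are assumptions made for this example, and only the `ioc_type` / `ioc_value` field names come from the sample output in this README.

```python
import ipaddress
import re

# Illustrative patterns only (assumptions, not HuntMCP's real ones).
# The domain pattern also matches file names such as "certutil.exe",
# so a real extractor would need an allowlist of TLDs or similar.
_IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,24}\b",
    re.IGNORECASE,
)


def extract_iocs(text: str) -> list[dict[str, str]]:
    """Return deduplicated IP and domain IOCs in order of first appearance."""
    seen: set[tuple[str, str]] = set()
    iocs: list[dict[str, str]] = []

    def add(ioc_type: str, value: str) -> None:
        key = (ioc_type, value)
        if key not in seen:
            seen.add(key)
            iocs.append({"ioc_type": ioc_type, "ioc_value": value})

    for candidate in _IPV4_RE.findall(text):
        try:
            # ip_address() rejects strings like 999.1.1.1 that the regex accepts.
            add("ip", str(ipaddress.ip_address(candidate)))
        except ValueError:
            continue
    for candidate in _DOMAIN_RE.findall(text):
        # Normalize case so EVIL.example.com and evil.example.com collapse.
        add("domain", candidate.lower())
    return iocs
```

Deduplicating on the normalized `(type, value)` pair keeps CTI lookups to one per indicator, which matters when external connectors are rate limited.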
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                  HuntMCP Architecture                                   │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌────────────────┐    ┌────────────────┐    ┌────────────────┐    ┌────────────────┐
│ Parsers        │───▶│ Detection      │───▶│ IOC Engine     │───▶│ Enrichment     │
│                │    │ Engine         │    │                │    │                │
│ • Windows      │    │ • Rules        │    │ • Extract      │    │ • Mock CTI     │
│ • Sysmon       │    │ • Correlation  │    │ • Normalize    │    │ • URLhaus      │
│ • DNS          │    │ • Thresholds   │    │ • Deduplicate  │    │ • AbuseIPDB    │
│ • Proxy/Web    │    │                │    │                │    │ • OTX          │
│ • Generic      │    │                │    │                │    │ • VirusTotal   │
└────────────────┘    └────────────────┘    └────────────────┘    └────────────────┘
        │                     │                     │                     │
        ▼                     ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                    Core Engine Layer                                    │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┤
│ Timeline        │ Entity Graph    │ Storage         │ Security        │ Reporting       │
│                 │                 │                 │                 │                 │
│ • Chronological │ • Build         │ • SQLite        │ • Path          │ • Markdown      │
│ • Events        │ • Pivot         │ • Cases         │   Validation    │ • HTML          │
│ • Findings      │ • Relations     │ • Runs          │ • Secret        │ • JSON          │
│ • IOCs          │ • Centrality    │ • Events        │   Redaction     │ • Evidence      │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┘
        │                     │                     │                     │
        ▼                     ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                     Interface Layer                                     │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┤
│ CLI             │ FastAPI         │ MCP Server      │ Optional        │                 │
│                 │                 │                 │ AI Layer        │                 │
│ • huntmcp       │ • /health       │ • parse         │ • LLM Triage    │                 │
│ • parse         │ • /investigate  │ • detect        │ • Mock Fallback │                 │
│ • detect        │ • /cases        │ • enrich        │ • Redaction     │                 │
│ • enrich        │ • /findings     │ • triage        │ • Safety        │                 │
│ • triage        │ • /iocs         │ • report        │   Prompts       │                 │
│ • report        │ • /timeline     │ • timeline      │                 │                 │
│ • coverage      │ • /entity-graph │ • entity        │                 │                 │
└─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┘
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
huntmcp self-test
Expected output:
{
  "ok": true,
  "event_count": 5,
  "finding_count": 5,
  "enriched_count": 5,
  "triage_count": 5
}
# Parse demo logs
huntmcp parse --input data/sample_logs/demo_attack.csv --type auto
# Run detection
huntmcp detect
# Enrich with mock CTI (offline-safe)
huntmcp enrich --cti mock
# Generate report
huntmcp report
python -m pytest -q
python -m ruff check huntmcp huntmcp.py tests
python -m ruff format --check huntmcp huntmcp.py tests
The repository includes a demo attack dataset at data/sample_logs/demo_attack.csv containing a realistic attack chain:
- Multiple failed logons (credential spraying)
- Successful logon after failures (brute force success)
- Suspicious PowerShell encoded command (obfuscation)
- LOLBin execution (certutil downloading payload)
- DNS beacon-like events (C2 communication)
- Suspicious proxy/web URL (malware delivery)
- Possible data exfiltration (large POST upload)
This dataset triggers multiple detection rules and produces a realistic investigation report for portfolio demonstrations.
python -m huntmcp.mcp_server.server
- huntmcp_parse_logs: Parse security logs into normalized events
- huntmcp_detect: Run deterministic detection rules
- huntmcp_extract_and_enrich_iocs: Extract and enrich IOCs
- huntmcp_triage_findings: LLM-assisted triage (mock fallback)
- huntmcp_generate_report: Generate investigation report
- huntmcp_run_pipeline: Full investigation pipeline
- huntmcp_generate_timeline: Generate event timeline
- huntmcp_build_entity_graph: Build entity relationship graph
- huntmcp_get_cases: List all cases
- huntmcp_get_case_details: Get case details
- huntmcp_get_case_timeline: Get case timeline
# Parse sample logs
parse_result = huntmcp_parse_logs(
    input_path="data/sample_logs/demo_attack.csv",
    log_type="auto"
)
# Run detection
detect_result = huntmcp_detect(events=parse_result["events"])
# Generate timeline
timeline_result = huntmcp_get_case_timeline(
    case_id=detect_result["case_id"]
)
# Get report
report_result = huntmcp_get_case_report(
    case_id=detect_result["case_id"]
)
HuntMCP is designed with security as a core principle:
- CTI enrichment uses mock/local data by default
- LLM triage uses deterministic mock fallback when no API key is configured
- All tests run offline without external dependencies
- Input paths are validated to prevent directory traversal
- /investigations path restricted to workspace
- Blocked paths: .env, .git, config files, sensitive directories
- API keys are never logged or sent to LLM
- Redaction supports: usernames, internal IPs, hostnames, emails, tokens, cookies
- Secret patterns are stripped from all outputs
- Maximum file size: 10 MB (configurable via HUNTMCP_MAX_UPLOAD_SIZE_BYTES)
- Blocked extensions: .exe, .dll, .bat, .cmd, .ps1, .vbs, .js, .jar, .sh
- Content validation for malicious payloads
- LLM is assistant-only, never the detection engine
- Prompt templates warn LLM not to follow instructions from log content
- Log content treated as untrusted input
- Only selected suspicious context sent to LLM
- No arbitrary shell execution via MCP tools
- File operations restricted to safe paths
- No exposure of sensitive system information
- Tool schemas match actual implementations
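The path validation and redaction rules listed above could be approximated as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `is_allowed_input`, `redact`, the placeholder tokens, and all three regular expressions are hypothetical, while the blocked names and extensions are copied from the list above.

```python
import re
from pathlib import Path

BLOCKED_NAMES = {".env", ".git"}  # taken from the blocked-paths bullet
BLOCKED_EXTENSIONS = {".exe", ".dll", ".bat", ".cmd", ".ps1", ".vbs", ".js", ".jar", ".sh"}

# Hypothetical redaction patterns, for illustration only.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_PRIVATE_IP_RE = re.compile(
    r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}\b"
)
_SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)=\S+")


def is_allowed_input(candidate: str, workspace: Path) -> bool:
    """Accept only paths that stay inside the workspace and are not blocked."""
    root = workspace.resolve()
    # resolve() collapses ".." segments, so traversal attempts end up outside root.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        return False
    if any(part in BLOCKED_NAMES for part in target.relative_to(root).parts):
        return False
    return target.suffix.lower() not in BLOCKED_EXTENSIONS


def redact(text: str) -> str:
    """Mask emails, RFC 1918 addresses, and key=value secrets before text leaves the host."""
    text = _EMAIL_RE.sub("[EMAIL]", text)
    text = _PRIVATE_IP_RE.sub("[INTERNAL_IP]", text)
    return _SECRET_RE.sub(r"\1=[REDACTED]", text)
```

Resolving the path before the containment check is the important design choice here: comparing raw strings would let `data/../../secrets.csv` slip through.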
{
  "rule_id": "multiple_failed_logons",
  "title": "Multiple Failed Logons",
  "severity": "high",
  "summary": "12 failed logons within 600s for user=jsmith host=WORKSTATION-01",
  "matched_event_count": 12,
  "first_seen": "2024-01-15T10:23:45Z",
  "last_seen": "2024-01-15T10:33:12Z",
  "mitre_attack": ["T1110.003"],
  "iocs": ["192.168.1.100", "jsmith"]
}
2024-01-15 10:23:45 - Failed logon (user: jsmith, source: 192.168.1.100)
2024-01-15 10:24:12 - Failed logon (user: jsmith, source: 192.168.1.100)
2024-01-15 10:25:01 - Failed logon (user: jsmith, source: 192.168.1.100)
2024-01-15 10:33:12 - Successful logon (user: jsmith, source: 192.168.1.100)
2024-01-15 10:35:22 - PowerShell encoded command execution
2024-01-15 10:36:45 - DNS query to suspicious domain (evil.example.com)
2024-01-15 10:38:01 - Large POST upload to external endpoint
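A timeline like the one above is essentially a stable sort of events by timestamp followed by formatting. The sketch below shows that idea only; `build_timeline` and the `timestamp` / `summary` event keys are assumptions for illustration, and timestamps are assumed to be ISO 8601 strings with a `Z` suffix or explicit offset.

```python
from datetime import datetime, timezone


def _parse_ts(event: dict) -> datetime:
    # Replace the "Z" suffix so fromisoformat() also works before Python 3.11.
    parsed = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def build_timeline(events: list[dict]) -> list[str]:
    """Return one formatted line per event, oldest first, in UTC."""
    return [
        f"{_parse_ts(event):%Y-%m-%d %H:%M:%S} - {event['summary']}"
        for event in sorted(events, key=_parse_ts)
    ]
```

Because `sorted` is stable, events that share a timestamp keep their original log order.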
{
  "ioc_value": "evil.example.com",
  "ioc_type": "domain",
  "cti_sources": {
    "urlhaus": {"status": "malicious", "threat": "malware_download"},
    "abuseipdb": {"abuse_confidence_score": 100, "last_reported_at": "2024-01-14"},
    "otx": {"pulses": 3, "sections": ["malware_domains", "c2"]}
  }
}
- Markdown: Structured report with executive summary, findings, timeline, IOC tables
- HTML: Styled HTML report with severity highlighting and interactive tables
- JSON: Machine-readable output for integration
This repository is intended as a cybersecurity portfolio project and advanced MVP, not as a production SIEM or detection replacement.
Current validated state:
- CLI pipeline works end to end
- FastAPI backend supports investigation jobs and persisted case summaries
- SQLite persistence stores cases, runs, events, findings, IOCs, enrichments, triage results, reports
- Deterministic tests run offline even when a local .env exists
- Current validation baseline: 300+ passed, ruff check clean, ruff format clean
- MCP server with 20+ tools for security agent integration
- HuntMCP is not a SIEM
- HuntMCP is not a production detection replacement
- HuntMCP does not perform actor attribution
- HuntMCP does not automatically prove malicious activity
- Generated reports require analyst review
- Public CTI sources can be noisy, stale, incomplete, or unavailable
- Windows Security
- Sysmon
- DNS
- Proxy/Web
- Generic CSV
The default rule set includes ~20 detection rules covering:
Windows Security/Sysmon:
- Multiple failed logons (password spraying)
- Failed logon followed by successful logon (brute force success)
- Suspicious PowerShell encoded command
- New local admin user creation
- PowerShell download cradle
- LOLBin execution (certutil, rundll32, regsvr32, mshta, bitsadmin)
- Suspicious parent-child process
- Possible credential dumping / LSASS access
- Remote service creation
- Scheduled task creation
- Registry persistence
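A threshold rule such as "multiple failed logons" can be expressed as a sliding time window per user and host. The following is a minimal sketch of that technique, not the shipped rule: the function name, the `action` / `user` / `host` / `timestamp` event keys, and the default threshold of 10 are assumptions (the 600-second window mirrors the sample finding in this README).

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_failed_logon_bursts(events, threshold=10, window_seconds=600):
    """Return one finding per (user, host) whose failures reach the threshold in the window."""
    window = timedelta(seconds=window_seconds)
    recent = defaultdict(deque)  # (user, host) -> timestamps inside the window
    findings = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["action"] != "logon_failed":
            continue
        key = (event["user"], event["host"])
        times = recent[key]
        times.append(event["timestamp"])
        # Drop failures that have aged out of the window.
        while event["timestamp"] - times[0] > window:
            times.popleft()
        best = findings.get(key)
        if len(times) >= threshold and (
            best is None or len(times) > best["matched_event_count"]
        ):
            # Keep only the largest burst seen for this user/host pair.
            findings[key] = {
                "rule_id": "multiple_failed_logons",
                "user": key[0],
                "host": key[1],
                "matched_event_count": len(times),
                "first_seen": times[0],
                "last_seen": times[-1],
            }
    return list(findings.values())
```

The same input always yields the same findings, and every finding carries the count and time bounds an analyst needs to re-check it against the raw log.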
DNS:
- DNS beaconing candidate
- High entropy/random-looking domain
- Long domain name
- Repeated beacon-like DNS queries
- NXDOMAIN spike
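The "high entropy/random-looking domain" check is commonly built on Shannon entropy of the domain's labels. The sketch below shows that general technique; it is not confirmed to be how HuntMCP scores domains, and the function names, the 3.5-bit threshold, and the 10-character minimum are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Entropy in bits per character of the character distribution in label."""
    if not label:
        return 0.0
    total = len(label)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(label).values()
    )


def is_high_entropy_domain(domain: str, threshold: float = 3.5, min_length: int = 10) -> bool:
    """Flag a domain whose longest label is both long and close to uniformly random."""
    labels = domain.lower().rstrip(".").split(".")
    longest = max(labels, key=len)
    # Short labels cannot reach high entropy, so a length floor reduces noise.
    return len(longest) >= min_length and shannon_entropy(longest) >= threshold
```

Entropy alone flags many CDN and cloud hostnames, so a result from a check like this is better treated as a hunting lead than as a verdict.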
Proxy/Web:
- Known suspicious URL/domain access
- Suspicious user-agent
- Large POST/upload
- Suspicious TLD access
- Repeated C2-like callback pattern
The engine is deterministic and auditable. All rules include MITRE ATT&CK tactic/technique mapping where applicable.
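One way to keep rules auditable is to store each rule's ATT&CK mapping as plain metadata and derive coverage from it. The sketch below illustrates that idea only: the `Rule` dataclass, the `coverage_by_technique` helper, and every rule entry except the `multiple_failed_logons` ID and its `T1110.003` mapping (both shown in the sample finding) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    title: str
    severity: str
    mitre_attack: tuple[str, ...] = ()  # empty when no mapping applies


# Hypothetical registry entries used only to demonstrate the structure.
RULES = [
    Rule("multiple_failed_logons", "Multiple Failed Logons", "high", ("T1110.003",)),
    Rule("powershell_encoded_command", "Suspicious PowerShell Encoded Command", "high", ("T1059.001",)),
    Rule("long_domain_name", "Long Domain Name", "low"),
]


def coverage_by_technique(rules: list[Rule]) -> dict[str, list[str]]:
    """Map each ATT&CK technique ID to the rule IDs that claim to cover it."""
    coverage: dict[str, list[str]] = defaultdict(list)
    for rule in rules:
        for technique in rule.mitre_attack:
            coverage[technique].append(rule.rule_id)
    return dict(coverage)
```

Keeping the mapping in data rather than inside detection logic means a coverage report can be generated without running any rule.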
Create a local .env from the example:
Copy-Item .env.example .env
Example variables:
OPENAI_API_KEY=
LLM_MODEL=mimo-v2.5
OPENAI_BASE_URL=
URLHAUS_AUTH_KEY=
ABUSEIPDB_API_KEY=
OTX_API_KEY=
VIRUSTOTAL_API_KEY=
LLM_TIMEOUT_SECONDS=30
HUNTMCP_MAX_UPLOAD_SIZE_BYTES=10485760
HUNTMCP_MAX_LLM_FINDINGS=50
HUNTMCP_CTI_LOOKUP_LIMIT=250
HUNTMCP_CORS_ALLOW_ORIGINS=
After installing with pip install -e ., you can use the huntmcp command directly:
# Parse logs
huntmcp parse --input data/sample_logs/demo_attack.csv --type auto
# Run detection
huntmcp detect
# Enrich findings
huntmcp enrich --cti mock
# Run LLM/mock triage
huntmcp triage --limit 5
# Generate report
huntmcp report
# Run persisted investigation workflow
huntmcp init-db
huntmcp investigate --input data/sample_logs/demo_attack.csv --type auto --cti mock --case-id demo-attack --case-name "Demo Attack Investigation"
huntmcp case-summary --case-id demo-attack
# Generate MITRE ATT&CK coverage report
huntmcp coverage
# Validate configuration
huntmcp validate-config
# List detection rules
huntmcp rules list
huntmcp rules show --rule-id multiple_failed_logons
Start the API:
uvicorn huntmcp.api:app --reload
Set explicit CORS origins if needed:
$env:HUNTMCP_CORS_ALLOW_ORIGINS="http://localhost:3000,http://localhost:5173"
uvicorn huntmcp.api:app --reload
Useful endpoints:
GET /health
POST /investigate
POST /investigations
GET /jobs/{job_id}
GET /cases
GET /cases/{case_id}/summary
GET /cases/{case_id}/timeline
GET /cases/{case_id}/entity-graph
GET /findings/{case_id}
GET /iocs/{case_id}
GET /reports/{case_id}
GET /rules
GET /rules/{rule_id}
GET /coverage
GET /admin/migration/status
POST /admin/migrate
GET /admin/audit-logs
GET /dashboard
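For scripted access, the read-only endpoints above can be called with the standard library alone. This is a minimal sketch under assumptions: it presumes the API was started with the uvicorn command shown earlier on uvicorn's default address (127.0.0.1:8000), the helper names `case_url` and `get_json` are invented for this example, and the response bodies are not documented here, so the code only assumes they are JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def case_url(case_id: str, resource: str) -> str:
    """Build a /cases/{case_id}/{resource} URL, escaping the case ID."""
    return f"{BASE_URL}/cases/{quote(case_id, safe='')}/{resource}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (shape depends on the endpoint)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `get_json(f"{BASE_URL}/health")` or `get_json(case_url("demo-attack", "timeline"))` would fetch the corresponding resource for the demo case created by the investigate command.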
Run the full validation suite:
python -m pytest -q
python -m ruff check huntmcp huntmcp.py tests
python -m ruff format --check huntmcp huntmcp.py tests
Current local validation:
300+ passed
All ruff checks passed
All files formatted
The repository intentionally excludes:
- .env and .env.example (contains sensitive configuration)
- Python caches (__pycache__, .pytest_cache, .ruff_cache)
- SQLite databases (*.sqlite, *.sqlite3)
- Generated normalized events (data/normalized/*.json)
- Generated findings (data/findings/*.json)
- Generated enrichment output (data/enriched/*.json)
- Generated reports (reports/*.md, reports/*.json)
- Downloaded public CTI datasets (data/cache/, data/uploads/)
- Build artifacts (dist/, build/, *.egg-info/)
Only source code, tests, docs, configs, and sample logs should be pushed.
- This is a portfolio-grade prototype, not a production SOC platform
- The current detection set is intentionally small (~20 rules)
- SQLite is suitable for local/single-user workflows, not multi-tenant production
- Very large datasets need a true stateful streaming detection engine
- External CTI quality depends on third-party availability and rate limits
- LLM output can be wrong and must be reviewed by an analyst
- API authentication, RBAC, audit logging, and deployment hardening are future work
- More Sigma-compatible rule loading and rule metadata
- Richer analyst UI for timeline, IOC pivoting, and finding review
- API authentication and role-based access control
- Job queue backend for longer investigations
- More CTI connector response normalization and rate-limit handling
- Docker deployment profiles and production secret management
- True stateful streaming detection for very large datasets
- Enhanced MCP server wrappers for all modules
See LICENSE file for details.