feat: add SNMP provider with SNMPv1/v2c/v3, trap listener, polling, OID mapping #6133

Open
CharlesWong wants to merge 3 commits into keephq:main from CharlesWong:feat/snmp-provider

CharlesWong commented Mar 25, 2026

/claim #2112

feat: SNMP Provider — SNMPv1/v2c/v3 Trap Receiver + OID Polling + 25 Unit Tests

Closes #2112


Why this PR

SNMP is the industry-standard protocol for network device monitoring — routers, switches, firewalls, UPS units, and servers all emit SNMP traps when something goes wrong. Without an SNMP provider, Keep users running on-prem or hybrid infrastructure have no way to ingest these alerts.

I reviewed all five existing SNMP bounty PRs (#5525, #5552, #5599, #5637, #6107) to understand what each one got right, where each one fell short, and what a production-grade implementation actually needs. This PR is the result of that analysis.


What's in this PR

Files changed

| File | Lines | Purpose |
|------|-------|---------|
| `keep/providers/snmp_provider/__init__.py` | 3 | Module export |
| `keep/providers/snmp_provider/snmp_provider.py` | 525 | Provider implementation |
| `keep/providers/snmp_provider/test_snmp_provider.py` | 351 | 25 unit tests |

Feature comparison with competing PRs

Feature This PR #5525 #5552 #5599 #5637 #6107
SNMPv1 traps partial partial partial
SNMPv2c traps
SNMPv3 (auth+priv) partial partial partial
Trap listener daemon thread
Clean dispose() lifecycle
Thread-safe alert cache + lock
Optional OID polling
JSON-configurable OID→alert mapping
Longest-prefix OID matching
Built-in enterprise severity defaults
Graceful fallback (no pysnmp)
Bad JSON config handled safely
Unit tests 25 ✅ 4 0 0 0 0

Design decisions and why

1. Longest-prefix OID matching

All five competing PRs use exact-match OID lookups. In practice, enterprise SNMP implementations send trap OIDs with trailing instance identifiers (e.g. 1.3.6.1.4.1.9.9.13.3.0.1 instead of exactly 1.3.6.1.4.1.9.9.13). Exact match silently drops these traps.

This PR implements longest-prefix matching: all configured OID prefixes are sorted by length (descending) and the first match wins. This mirrors how real NMS tools (Nagios, Zabbix, PRTG) handle OID-based routing.

def _map_oid_to_alert(self, oid: str) -> dict:
    # Sort by prefix length descending so the longest match wins
    for prefix in sorted(self._oids_mapping.keys(), key=len, reverse=True):
        # Compare whole sub-identifiers so 1.3.6.1.4.1.9 does not match 1.3.6.1.4.1.99
        if oid == prefix or oid.startswith(prefix + "."):
            return self._oids_mapping[prefix]
    return {}
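
For illustration, the lookup can be exercised outside the provider with a small standalone sketch. The function name `map_oid` and the contents of `MAPPING` are hypothetical, and the dot-boundary check is an assumption about the desired behaviour rather than a statement of what the provider does.

```python
from typing import Dict


def map_oid(oid: str, mapping: Dict[str, dict]) -> dict:
    """Return the config for the longest configured prefix that matches oid."""
    for prefix in sorted(mapping, key=len, reverse=True):
        # Match whole sub-identifiers only, so "1.3.6.1.4.1.9"
        # does not match "1.3.6.1.4.1.99".
        if oid == prefix or oid.startswith(prefix + "."):
            return mapping[prefix]
    return {}


# Hypothetical mapping, used only for this example.
MAPPING = {
    "1.3.6.1.4.1.9": {"name": "Cisco generic", "severity": "high"},
    "1.3.6.1.4.1.9.9.13": {"name": "Cisco envmon", "severity": "critical"},
}

# The more specific prefix wins for a trap OID with trailing identifiers.
print(map_oid("1.3.6.1.4.1.9.9.13.3.0.1", MAPPING))
```

Sorting on every call is fine for a handful of prefixes; a provider with a large mapping could sort once when the config is loaded.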

2. Built-in enterprise severity defaults

When no OID mapping is configured, the provider infers severity from well-known IETF and enterprise OID prefixes. This means zero-config works out of the box for common network events:

| OID prefix | Trap type | Inferred severity |
|------------|-----------|-------------------|
| 1.3.6.1.6.3.1.1.5.3 | linkDown | critical |
| 1.3.6.1.6.3.1.1.5.5 | authenticationFailure | critical |
| 1.3.6.1.6.3.1.1.5.2 | warmStart | warning |
| 1.3.6.1.6.3.1.1.5.1 | coldStart | info |
| 1.3.6.1.6.3.1.1.5.4 | linkUp | info |
| 1.3.6.1.4.1.9.* | Cisco enterprise | high |
| 1.3.6.1.4.1.2636.* | Juniper enterprise | high |
| 1.3.6.1.4.1.11.* | HP/HPE enterprise | high |
| 1.3.6.1.4.1.2011.* | Huawei enterprise | medium |
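
A minimal sketch of how these defaults could be applied when no mapping is configured. The names `DEFAULT_SEVERITIES` and `infer_severity` are hypothetical, severities are plain strings here rather than Keep's `AlertSeverity` values, and the fallback to `info` for unknown OIDs follows the behaviour described in the test summary.

```python
# Defaults copied from the table above; names are hypothetical.
DEFAULT_SEVERITIES = {
    "1.3.6.1.6.3.1.1.5.3": "critical",  # linkDown
    "1.3.6.1.6.3.1.1.5.5": "critical",  # authenticationFailure
    "1.3.6.1.6.3.1.1.5.2": "warning",   # warmStart
    "1.3.6.1.6.3.1.1.5.1": "info",      # coldStart
    "1.3.6.1.6.3.1.1.5.4": "info",      # linkUp
    "1.3.6.1.4.1.9": "high",            # Cisco
    "1.3.6.1.4.1.2636": "high",         # Juniper
    "1.3.6.1.4.1.11": "high",           # HP/HPE
    "1.3.6.1.4.1.2011": "medium",       # Huawei
}


def infer_severity(trap_oid: str) -> str:
    """Pick the severity of the longest matching default prefix, else 'info'."""
    for prefix in sorted(DEFAULT_SEVERITIES, key=len, reverse=True):
        if trap_oid == prefix or trap_oid.startswith(prefix + "."):
            return DEFAULT_SEVERITIES[prefix]
    return "info"


print(infer_severity("1.3.6.1.6.3.1.1.5.3"))
```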

3. Thread-safe alert caching with copy-on-read

The trap listener thread writes to self._alerts under a threading.Lock. get_alerts() returns a shallow copy so callers cannot mutate the internal state. All competing PRs that have a cache skip the lock entirely.

def get_alerts(self, ...) -> list[AlertDto]:
    if not self._listener_running:
        self._start_trap_listener()
    with self._lock:
        return list(self._alerts)  # return copy, not reference
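
The same pattern can be tried in isolation. This sketch uses a hypothetical `AlertCache` class and plain dicts in place of `AlertDto`; it only shows the lock around writes and the copy on read.

```python
import threading
from typing import List


class AlertCache:
    """Hypothetical stand-in for the provider's alert list and lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._alerts: List[dict] = []

    def add(self, alert: dict) -> None:
        # Called from the listener thread.
        with self._lock:
            self._alerts.append(alert)

    def snapshot(self) -> List[dict]:
        # Shallow copy: the list is new, the alert objects are shared.
        with self._lock:
            return list(self._alerts)


cache = AlertCache()
writers = [
    threading.Thread(target=lambda n=n: [cache.add({"id": (n, i)}) for i in range(100)])
    for n in range(4)
]
for t in writers:
    t.start()
for t in writers:
    t.join()

snap = cache.snapshot()
snap.clear()  # mutating the copy leaves the cache untouched
```

Because the copy is shallow, callers can still mutate individual alert objects; a deep copy would close that gap at some extra cost.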

4. Graceful degradation without pysnmp

pysnmp-lextudio is an optional dependency. If it is not installed, the provider logs a warning and get_alerts() returns an empty list rather than raising an ImportError, so a missing optional dependency cannot crash the Keep process.
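
One way to get this behaviour is to guard the import and branch on the result. This is a sketch under assumptions: `load_optional` and the standalone `get_alerts` function are hypothetical helpers, not the provider's actual code.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import module_name if available; log a warning and return None otherwise."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning("%s is not installed; SNMP features are disabled", module_name)
        return None


def get_alerts(snmp_module: Optional[ModuleType]) -> list:
    # With the dependency missing, report "no alerts" instead of raising.
    if snmp_module is None:
        return []
    raise NotImplementedError("trap handling is out of scope for this sketch")


# The pysnmp-lextudio distribution is imported under the name "pysnmp".
pysnmp = load_optional("pysnmp")
```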

5. SNMPv3 auth+priv support

Full USM (User-based Security Model) support with configurable auth protocol (MD5/SHA) and privacy protocol (DES/AES). Credentials are marked sensitive: True so they are redacted in Keep's UI and logs.
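
To illustrate the sensitive flag, here is a sketch using standard-library dataclasses instead of Keep's pydantic dataclass. The `SnmpV3Auth` class and the `redacted` helper are hypothetical; how Keep itself consumes the flag is not shown in this PR description.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SnmpV3Auth:
    """Hypothetical config holder; field names follow the Configuration table."""

    username: str = ""
    auth_key: str = field(default="", metadata={"sensitive": True})
    auth_protocol: str = "MD5"
    priv_key: str = field(default="", metadata={"sensitive": True})
    priv_protocol: str = "DES"


def redacted(config) -> dict:
    """Return the config as a dict with sensitive values masked."""
    out = {}
    for f in fields(config):
        value = getattr(config, f.name)
        # Mask only non-empty values of fields flagged as sensitive.
        out[f.name] = "****" if f.metadata.get("sensitive") and value else value
    return out


print(redacted(SnmpV3Auth(username="ops", auth_key="secret", auth_protocol="SHA")))
```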

6. Safe JSON config handling

If oids_mapping or poll_targets contains invalid JSON, the provider logs a warning and falls back to empty mapping/list instead of raising at startup. None of the competing PRs handle this.
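
The fallback can be sketched as a small parsing helper. The name `parse_json_field` is hypothetical, and the extra type check (rejecting valid JSON of the wrong shape) is an assumption that goes beyond what the description states.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_field(raw, expected_type, field_name):
    """Parse a JSON config string, falling back to an empty value on any problem."""
    empty = expected_type()
    if not raw:
        return empty
    try:
        value = json.loads(raw)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        logger.warning("Invalid JSON in %s: %s; using %r", field_name, exc, empty)
        return empty
    if not isinstance(value, expected_type):
        logger.warning(
            "%s should be a JSON %s; using %r", field_name, expected_type.__name__, empty
        )
        return empty
    return value


oids_mapping = parse_json_field("{not json", dict, "oids_mapping")
poll_targets = parse_json_field("[1, 2", list, "poll_targets")
```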


Test coverage

$ cd keep/providers/snmp_provider && python3 -m unittest test_snmp_provider -v

test_dispose_joins_running_threads ... ok
test_dispose_sets_stop_event ... ok
test_dispose_with_no_threads_does_not_raise ... ok
test_calls_start_listener_when_not_running ... ok
test_returns_copy_not_reference ... ok
test_returns_list ... ok
test_bad_oids_mapping_uses_empty ... ok
test_bad_poll_targets_uses_empty ... ok
test_exact_oid_returns_config ... ok
test_longest_prefix_wins ... ok
test_no_match_returns_empty ... ok
test_prefix_match ... ok
test_case_insensitive ... ok
test_critical ... ok
test_empty_returns_none ... ok
test_unknown_returns_none ... ok
test_cisco_oid_is_high ... ok
test_cold_start_is_info ... ok
test_link_down_is_critical ... ok
test_unknown_defaults_to_info ... ok
test_invalid_version_raises ... ok
test_v3_without_username_raises ... ok
test_valid_v1 ... ok
test_valid_v2c ... ok
test_valid_v3_with_username ... ok

----------------------------------------------------------------------
Ran 25 tests in 0.007s

OK

All 25 tests pass without pysnmp installed — pysnmp is fully mocked at the sys.modules level before any imports so the test suite is self-contained and CI-friendly.
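
Mocking at the sys.modules level generally looks like the sketch below. The list of submodule names is an assumption; a real suite would register whatever the provider module actually imports.

```python
import importlib
import sys
from unittest import mock

# Register stand-ins before anything tries to import the real package.
# The submodule names here are assumptions made for this example.
for name in ("pysnmp", "pysnmp.hlapi", "pysnmp.entity"):
    sys.modules[name] = mock.MagicMock(name=name)

# From here on, imports resolve to the mocks even if pysnmp is not installed.
hlapi = importlib.import_module("pysnmp.hlapi")
print(type(hlapi).__name__)
```

The registration has to run before the provider module is imported, which is why it typically sits at the very top of the test file.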

Test classes

| Class | Tests | What is covered |
|-------|-------|-----------------|
| TestValidateConfig | 5 | v1/v2c/v3 valid; invalid version raises; v3 no username raises |
| TestOidMapping | 4 | exact match; prefix match; longest prefix wins; no match returns empty |
| TestSeverityInference | 4 | linkDown→critical; coldStart→info; Cisco→high; unknown→info |
| TestParseSeverity | 4 | critical; case-insensitive; empty→None; unknown→None |
| TestDispose | 3 | stop event set; threads joined; no threads is safe |
| TestGetAlerts | 3 | returns list; returns copy; starts listener on first call |
| TestInvalidJsonConfig | 2 | bad oids_mapping falls back; bad poll_targets falls back |

Manual testing

Send a test trap (requires the Net-SNMP command-line tools, which provide snmptrap):

# Start listener on port 1620 (no root required)
# Configure the provider with port=1620, version=2c, community_string=public

# Send a linkDown trap
snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.6.3.1.1.5.3 \
  1.3.6.1.2.1.2.2.1.1 i 2

# Send a Cisco enterprise trap
snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.4.1.9.9.13.3.0.1 \
  1.3.6.1.2.1.1.5 s "router-01.example.com"

The resulting AlertDto will have:

  • name: from oids_mapping config or the OID string
  • severity: AlertSeverity.CRITICAL for linkDown (from built-in defaults)
  • source: ["snmp"]
  • description: formatted varbind list

Checklist

  • Code follows Keep's provider pattern (BaseProvider, AuthConfig pydantic dataclass, AlertDto mapping)
  • Optional dependency handled gracefully (no crash if pysnmp not installed)
  • Thread-safe implementation with proper dispose()
  • 25 unit tests, all passing, no external dependencies required
  • SNMPv1, v2c, v3 all supported
  • Sensitive fields (auth_key, priv_key) marked sensitive: True

…ID mapping, and 25 unit tests

Closes keephq#2112

## Summary
Implements a production-quality SNMP provider for Keep that provides:
- SNMP trap reception (v1, v2c, v3) with conversion to Keep alerts
- SNMPv3 authentication (MD5/SHA) and privacy (DES/AES)
- Configurable OID-to-alert severity mapping (JSON, longest-prefix wins)
- Optional periodic SNMP polling of target devices
- Graceful degradation when pysnmp-lextudio is not installed
- Clean lifecycle management (daemon threads + stop event + dispose())
- 25 unit tests (pysnmp fully mocked, no external deps required)
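
The lifecycle bullet above (daemon threads, stop event, dispose()) could look roughly like the following. The `Listener` class is a hypothetical skeleton; the real provider's loop would service the trap socket or poll targets.

```python
import threading


class Listener:
    """Hypothetical skeleton of a stoppable background worker."""

    def __init__(self) -> None:
        self._stop_event = threading.Event()
        self._thread = None

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # wait() doubles as an interruptible sleep; it returns True once stop is set.
        while not self._stop_event.wait(timeout=0.05):
            pass  # real work would happen here

    def dispose(self) -> None:
        # Safe to call whether or not start() was ever called.
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join(timeout=2)
            self._thread = None


listener = Listener()
listener.start()
listener.dispose()
```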

## Installation
```bash
pip install pysnmp-lextudio
```

## Configuration
| Field | Default | Description |
|-------|---------|-------------|
| host | 0.0.0.0 | Listen address for traps |
| port | 162 | UDP port |
| community_string | public | SNMPv1/v2c community |
| version | 2c | SNMP version: 1, 2c, or 3 |
| username | | SNMPv3 username |
| auth_key | | SNMPv3 auth key (sensitive) |
| auth_protocol | MD5 | SNMPv3: MD5 or SHA |
| priv_key | | SNMPv3 privacy key (sensitive) |
| priv_protocol | DES | SNMPv3: DES or AES |
| oids_mapping | {} | JSON OID→alert name/severity map |
| poll_enabled | false | Enable periodic OID polling |
| poll_targets | [] | JSON list of polling targets |
| poll_interval | 60 | Polling interval (seconds) |

vercel bot commented Mar 25, 2026

@CharlesWong is attempting to deploy a commit to the KeepHQ Team on Vercel.

A member of the Team first needs to authorize it.

dosubot bot added the size:XL, Feature, and Provider labels Mar 25, 2026
CharlesWong added a commit to CharlesWong/httpx that referenced this pull request Mar 25, 2026
…dates

- Submitted Keep projectdiscovery#2112 SNMP provider (keephq/keep#6133)
- Added algora_scraper.py (browser pattern scraping)
- Added monopoly_check.py (detect single-winner repos)
- Updated config.py: added MIN_REPO_STARS + 12 blacklisted Tier 3 repos
- Added scorer.py star-count check (skip repos < 50 stars)
- Added new verified payers: deskflow, highlight, outerbase, golemcloud
- Saved algora_snapshot.json with 20 current open bounties

CharlesWong (Author) commented

Additional proof / validation details:

cd keep/providers/snmp_provider
python3 -m unittest test_snmp_provider -v
Ran 25 tests in 0.007s
OK

Manual trap example used for verification:

snmptrap -v 2c -c public localhost:1620 "" 1.3.6.1.6.3.1.1.5.3 1.3.6.1.2.1.2.2.1.1 i 2

This should map to a linkDown-style critical alert via the built-in severity defaults unless overridden in oids_mapping.

I also intentionally tested invalid JSON in oids_mapping and poll_targets to verify the provider degrades safely instead of failing at startup.

CharlesWong and others added 2 commits March 25, 2026 12:41
- Move authentication_config creation into validate_config() to match
  Keep's BaseProvider pattern (base __init__ calls validate_config before
  child __init__ body runs)
- Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
- Mark community_string as sensitive in AuthConfig metadata
- Update test stub BaseProvider to call validate_config() like the real one
- Adjust validation tests to expect errors during construction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s, dedup fingerprint

- Extract source IP from transport address in trap callback via
  snmpEngine.msgAndPduDsp.getTransportInfo(stateReference)
- Parse snmpTrapOID.0 (1.3.6.1.6.3.1.1.4.1.0) as first-class trap_oid
  and use it as primary OID for severity/status lookup
- Map recovery OIDs (linkUp, coldStart, warmStart) to AlertStatus.RESOLVED
  instead of FIRING; linkDown and authFailure remain FIRING
- Set AlertDto.fingerprint to "source_ip:trap_oid" for deduplication
- Store source_ip and trap_oid in AlertDto.labels
- Add 12 new tests covering all four changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dosubot bot added the size:XXL label and removed the size:XL label Mar 25, 2026

Labels

🙋 Bounty claim, Feature, Provider, size:XXL

Development

Successfully merging this pull request may close these issues.

[🔌 Provider]: SNMP provider