perf: optimize database pools, batch commits, and node health checks#605
perf: optimize database pools, batch commits, and node health checks#605wahh3b-lgtm wants to merge 2 commits into
Conversation
- DB pool: increase pool_recycle to 1800s, add pool_timeout (default 15s), enable pool_use_lifo, add ge=1 validation - Alembic migration: add composite indexes (status/expire, status/used_traffic, node_id/created_at, etc.) - Record usages: cache dialect lookup, batch node stats in single transaction with per-row safe_execute fallback - Node health checks: concurrent asyncio.gather with snapshot revalidation and exception logging - XRayConfig: revert TLS reads to synchronous (run_until_complete fails on running loop) - Update .env.example with new pool defaults
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR tunes database connection pool settings ( ChangesPerformance & Reliability Improvements
Sequence Diagram(s)sequenceDiagram
rect rgba(70, 130, 180, 0.5)
Note over Scheduler,safe_execute: Batched node stats recording
Scheduler->>record_node_stats_batched: invoke with node_stats
record_node_stats_batched->>record_node_stats_batched: aggregate upsert_params per node
record_node_stats_batched->>engine.begin: single transaction under JOB_SEM
alt batch success
engine.begin-->>record_node_stats_batched: committed
else OperationalError/DatabaseError
engine.begin-->>record_node_stats_batched: exception raised
record_node_stats_batched->>safe_execute: per-node fallback write
safe_execute-->>record_node_stats_batched: done
end
end
sequenceDiagram
rect rgba(60, 179, 113, 0.5)
Note over Caller,Node: Concurrent node health lookup
Caller->>NodeManager: get_healthy_nodes()
NodeManager->>NodeManager: snapshot nodes under reader lock
NodeManager->>asyncio.gather: node.get_health() for all nodes concurrently
asyncio.gather->>Node: get_health()
Node-->>asyncio.gather: Health or exception
asyncio.gather-->>NodeManager: results list
NodeManager->>NodeManager: skip exceptions, re-validate node identity
NodeManager-->>Caller: filtered node list
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/jobs/record_usages.py (1)
515-516:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPython 2 syntax error: Invalid exception handling.
Line 515 uses Python 2 exception syntax
except ValueError, TypeError:which is invalid in Python 3 and will cause aSyntaxErrorwhen the module is loaded. The correct Python 3 syntax requires parentheses for multiple exception types.🐛 Proposed fix
- except ValueError, TypeError: + except (ValueError, TypeError):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/jobs/record_usages.py` around lines 515 - 516, The except clause uses Python 2 syntax with comma-separated exception types which is invalid in Python 3 and causes a SyntaxError. In the exception handling block where invalid_uids.append(uid) is called, change the except statement from using a comma between ValueError and TypeError to wrapping both exception types in parentheses, using the correct Python 3 syntax for handling multiple exception types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/node/__init__.py`:
- Around line 78-81: The asyncio.gather() call in the get_healthy_nodes()
function lacks a timeout on individual node.get_health() calls, which means a
single hung probe can block the entire operation and stall downstream work. Wrap
each node.get_health() call in the generator expression with
asyncio.wait_for(..., timeout=10) to match the timeout pattern already
implemented in app/jobs/node_checker.py. The existing return_exceptions=True
parameter will automatically catch and handle the timeout exceptions, allowing
timed-out nodes to be skipped gracefully.
---
Outside diff comments:
In `@app/jobs/record_usages.py`:
- Around line 515-516: The except clause uses Python 2 syntax with
comma-separated exception types which is invalid in Python 3 and causes a
SyntaxError. In the exception handling block where invalid_uids.append(uid) is
called, change the except statement from using a comma between ValueError and
TypeError to wrapping both exception types in parentheses, using the correct
Python 3 syntax for handling multiple exception types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d494fb77-7c18-4819-b000-f555f9df5079
📒 Files selected for processing (8)
.env.exampleapp/core/xray.pyapp/db/__init__.pyapp/db/base.pyapp/db/migrations/versions/a1b2c3d4e5f6_add_performance_indexes.pyapp/jobs/record_usages.pyapp/node/__init__.pyconfig.py
💤 Files with no reviewable changes (1)
- app/db/init.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
|
|
|
|
Superseded by #615 targeting dev branch |
Changes
config.py,app/db/base.py): Increasedpool_recycleto 1800s, added configurablepool_timeout(default 15s) withge=1validation, enabledpool_use_lifo=Truea1b2c3d4e5f6) —(status, expire),(status, used_traffic),(node_id, created_at)and others withif_not_exists=Trueapp/jobs/record_usages.py): Cached dialect lookup, batched node stats in single transaction with per-rowsafe_executefallback that continues on individual failuresapp/node/__init__.py): Concurrentasyncio.gather()with snapshot revalidation and exception logging for failed probesapp/core/xray.py): Reverted TLS reads to synchronous (run_until_completeon a running loop always raisedRuntimeError)Files Changed
config.pyapp/db/base.pyapp/db/__init__.pyapp/db/migrations/versions/a1b2c3d4e5f6_add_performance_indexes.py(new)app/jobs/record_usages.pyapp/node/__init__.pyapp/core/xray.py.env.exampleVerification
Summary by CodeRabbit