fix: update Storj deployment and add stop_timeout support#25
Conversation
- Fix Storj image name (storjlabs/storagenode, not storj/storagenode) - Add stop_timeout field to service schema and orchestrator - Respect per-service stop_timeout on stop/restart (Storj needs 300s) - Add one-time setup instructions to Storj YAML - Improve Storj collector error message with LAN IP hint Closes #24
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 18 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughUpdates Storj deployment to use Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/collectors/storj.py (1)
61-70:⚠️ Potential issue | 🟡 MinorAlso catch
httpx.ConnectTimeoutfor the same LAN-IP hint.When a LAN IP is unreachable or blackholed, httpx may raise
ConnectTimeoutinstead ofConnectError, which currently bypasses this helpful error message.Proposed fix
- except httpx.ConnectError: + except (httpx.ConnectError, httpx.ConnectTimeout):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/collectors/storj.py` around lines 61 - 70, Update the Storj collector's exception handling so the LAN-IP hint is also shown for timeout cases: in the try/except block that currently catches httpx.ConnectError (used when calling the Storj API in the collector code that returns an EarningsResult with platform/self.platform and api_url), extend the except clause to also catch httpx.ConnectTimeout (e.g. change the except to handle both httpx.ConnectError and httpx.ConnectTimeout) so both connection failures and connect timeouts produce the same helpful LAN-IP error message.app/orchestrator.py (1)
140-144:⚠️ Potential issue | 🟠 MajorUse graceful shutdown when replacing an existing container.
The redeploy code at lines 140-144 force-removes the old container, bypassing the
_get_stop_timeout()helper already used instop_service(). This skips graceful shutdown and can corrupt Storj data. The same issue exists at line 261 indeploy_raw().Proposed fix
# Remove any existing container with the same name try: old = client.containers.get(name) logger.info("Removing existing container %s", name) + old.stop(timeout=_get_stop_timeout(slug)) - old.remove(force=True) + old.remove() except NotFound: passApply the same fix to line 261 in
deploy_raw()and consider adding graceful stop toremove_service()at line 328 as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/orchestrator.py` around lines 140 - 144, Replace the force-remove of an existing container (currently using client.containers.get(...) then old.remove(force=True)) with a graceful stop followed by a normal removal: call old.stop(timeout=_get_stop_timeout(...)) (matching how stop_service() computes timeout), then call old.remove() without force so Docker can cleanly shut down; apply the same change in deploy_raw() where old.remove(force=True) is used and also add a graceful stop before removal in remove_service() so all three code paths use _get_stop_timeout() and avoid force removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/orchestrator.py`:
- Around line 227-228: Validate stop_timeout before casting by ensuring the
value is a positive integer and falling back to the default 30 on invalid input;
replace direct int(stop_timeout) usage with a small validation block that tries
to coerce to int inside a try/except (catch ValueError/TypeError), verifies the
resulting int is >= 0, and only then assigns run_kwargs["stop_timeout"] =
parsed_value, otherwise leave run_kwargs unset or explicitly set it to 30; apply
the same change for every place in orchestrator.py where stop_timeout is cast
(the occurrences around the current if stop_timeout block and the similar logic
at the 302-308 region) so deploy/stop/restart use the safe fallback.
In `@services/storage/storj.yml`:
- Around line 60-65: The documented one-time setup under the setup block drops
the Docker user flag and can create root-owned bind-mounted identity/config
files; update the docker run invocation in the setup description to include
--user $(id -u):$(id -g) so the container creates files with the calling host
user's UID:GID (preserve host ownership for the identity and storage mounts
referenced in the setup block).
---
Outside diff comments:
In `@app/collectors/storj.py`:
- Around line 61-70: Update the Storj collector's exception handling so the
LAN-IP hint is also shown for timeout cases: in the try/except block that
currently catches httpx.ConnectError (used when calling the Storj API in the
collector code that returns an EarningsResult with platform/self.platform and
api_url), extend the except clause to also catch httpx.ConnectTimeout (e.g.
change the except to handle both httpx.ConnectError and httpx.ConnectTimeout) so
both connection failures and connect timeouts produce the same helpful LAN-IP
error message.
In `@app/orchestrator.py`:
- Around line 140-144: Replace the force-remove of an existing container
(currently using client.containers.get(...) then old.remove(force=True)) with a
graceful stop followed by a normal removal: call
old.stop(timeout=_get_stop_timeout(...)) (matching how stop_service() computes
timeout), then call old.remove() without force so Docker can cleanly shut down;
apply the same change in deploy_raw() where old.remove(force=True) is used and
also add a graceful stop before removal in remove_service() so all three code
paths use _get_stop_timeout() and avoid force removal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 59a1231f-14f7-4f9d-b26e-dcf550ab5731
📒 Files selected for processing (4)
app/collectors/storj.pyapp/orchestrator.pyservices/_schema.ymlservices/storage/storj.yml
Address CodeRabbit review: - Add _parse_stop_timeout() to safely handle malformed values - Add --user $(id -u):$(id -g) to Storj setup command to preserve host ownership
Summary
storjlabs/storagenode(was incorrectlystorj/storagenode)stop_timeoutschema field: services can now specify graceful shutdown time; orchestrator passes it tocontainers.run()and respects it on stop/restartstop_timeout: 300, one-timesetup:instructions, updated notes about image name and identity generationTest plan
stop_timeoutset → verify Docker container has correct stop timeoutstop_timeout→ confirm default 30s behavior unchangedCloses #24
Summary by CodeRabbit
New Features
Bug Fixes
Documentation