-
Notifications
You must be signed in to change notification settings - Fork 1
docs(logging): document async durability semantics and testing patterns #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
IntegerAlex
wants to merge
4
commits into
master
Choose a base branch
from
v0.3.0
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
4dd30eb
docs(logging): document async durability semantics and testing patterns
IntegerAlex 794771f
refactor(async): deprecate legacy AsyncLogger and add config validation
IntegerAlex c860b2c
feat(async): add flush semantics and enhanced test utilities
IntegerAlex 75b88bf
docs: update migration guide and performance testing instructions
IntegerAlex File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -167,6 +167,16 @@ def setup_analytics_logging(): | |
| return logger | ||
| ``` | ||
|
|
||
| ### Flush Semantics (Important) | ||
|
|
||
| **`flush()` on async loggers does NOT guarantee durability.** | ||
|
|
||
| - **Legacy AsyncLogger** (from `kakashi.core.logger`): `flush()` only sleeps 1ms to yield to the background worker. It does **not** wait for queued messages to be written. | ||
| - **Functional async loggers** (from `kakashi.core.async_interface`): There is no `flush()` that drains the queue. Messages are processed asynchronously in batches. | ||
| - **Only `shutdown_async_backend(timeout=...)` guarantees** that all queued messages are processed before the process exits. This is the ONLY way to ensure no message loss at shutdown. | ||
|
|
||
| **Do not rely on `flush()` for application logic.** Treat async logging as best-effort telemetry. For critical audit trails or state, write to a reliable store (database, message queue) and log secondarily. | ||
|
|
||
| ### Error Handling and Recovery | ||
|
|
||
| ```python | ||
|
|
@@ -181,27 +191,45 @@ def graceful_shutdown(): | |
| print(f"Error during shutdown: {e}") | ||
| ``` | ||
|
|
||
| ### Best Practices | ||
| ### Application Shutdown Patterns | ||
|
|
||
| 1. **Queue Sizing**: Start with 25,000-50,000 max queue size, adjust based on throughput | ||
| 2. **Worker Count**: Use 2-4 workers for most applications, 8+ for extreme throughput | ||
| 3. **Batch Sizing**: Start with 200-500 batch size, increase for higher throughput | ||
| 4. **Monitoring**: Monitor queue sizes and worker health in production | ||
| 5. **Error Handling**: Implement graceful shutdown to prevent message loss | ||
| 6. **Graceful Shutdown**: Ensure all queued messages are processed on shutdown | ||
| **You must call `shutdown_async_backend()` at application exit** to prevent message loss. If the process exits without shutdown, queued messages may never be written. | ||
|
|
||
| #### Recommended: Register with atexit | ||
|
|
||
| ```python | ||
| import atexit | ||
| from kakashi.core.async_interface import shutdown_async_backend | ||
|
|
||
| # Register shutdown handler | ||
| # Register early (e.g. at startup) | ||
| atexit.register(shutdown_async_backend) | ||
|
|
||
| # Or manually at exit: | ||
| def cleanup(): | ||
| shutdown_async_backend(timeout=10.0) | ||
| # Or with custom timeout for high-volume apps: | ||
| atexit.register(lambda: shutdown_async_backend(timeout=10.0)) | ||
| ``` | ||
|
|
||
| #### Timeout Considerations | ||
|
|
||
| - **Default (5s)**: Suitable for most applications | ||
| - **High-volume**: Use 10-30s if you expect large queues or slow disk/network I/O | ||
| - **Tests**: Use shorter timeouts (1-2s) to avoid hanging test suites | ||
| - If timeout is exceeded, some messages may be dropped during shutdown | ||
|
|
||
| #### What Happens Without Shutdown | ||
|
|
||
| - Worker threads are daemon threads in some configurations; the process can exit before the queue drains | ||
| - Even with non-daemon workers, abrupt exit (SIGKILL, `os._exit`) bypasses atexit | ||
| - Result: silent message loss. Always register shutdown for production services. | ||
|
|
||
| ### Best Practices | ||
|
|
||
| 1. **Queue Sizing**: Start with 25,000-50,000 max queue size, adjust based on throughput | ||
| 2. **Worker Count**: Use 2-4 workers for most applications, 8+ for extreme throughput | ||
| 3. **Batch Sizing**: Start with 200-500 batch size, increase for higher throughput | ||
| 4. **Monitoring**: Monitor queue sizes and worker health in production | ||
| 5. **Error Handling**: Implement graceful shutdown to prevent message loss | ||
| 6. **Graceful Shutdown**: Ensure all queued messages are processed on shutdown (see Application Shutdown Patterns above) | ||
|
|
||
| --- | ||
|
|
||
| *Last updated: 2025-08-27* | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Stale "Last updated" date. The date reads 🤖 Prompt for AI Agents |
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| --- | ||
| id: common-pitfalls | ||
| title: Common Pitfalls with Async Loggers | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| This guide documents common mistakes when using Kakashi's async logging and how to avoid them. | ||
|
|
||
| ## 1. Relying on `flush()` for Durability | ||
|
|
||
| **Wrong**: Assuming `flush()` waits for all queued messages to be written. | ||
|
|
||
| ```python | ||
| async_logger.info("Critical audit event") | ||
| async_logger.flush() # Does NOT guarantee durability! | ||
| # Message may still be in queue - process could exit before it's written | ||
| ``` | ||
|
|
||
| **Why**: `AsyncLogger.flush()` only sleeps 1ms to yield to the background worker. Functional async loggers have no `flush()` that drains the queue. | ||
|
|
||
| **Right**: Call `shutdown_async_backend(timeout=...)` at application exit. | ||
|
|
||
| ```python | ||
| import atexit | ||
| from kakashi.core.async_interface import shutdown_async_backend | ||
| atexit.register(shutdown_async_backend) | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 2. Not Shutting Down Async Backends | ||
|
|
||
| **Wrong**: Letting the process exit without calling shutdown. | ||
|
|
||
| ```python | ||
| # main.py | ||
| logger = get_async_logger(__name__) | ||
| logger.info("Server stopping") | ||
| # Process exits - queued messages are lost | ||
| ``` | ||
|
|
||
| **Why**: Worker threads may not get CPU time to drain the queue before the process exits. Daemon threads are terminated immediately. | ||
|
|
||
| **Right**: Register shutdown with atexit or call it explicitly before exit. | ||
|
|
||
| ```python | ||
| atexit.register(lambda: shutdown_async_backend(timeout=10.0)) | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 3. Using Async Loggers in Tests Without Cleanup | ||
|
|
||
| **Wrong**: Asserting on log output immediately after async logging. | ||
|
|
||
| ```python | ||
| def test_login(): | ||
| logger = get_async_logger("auth") | ||
| logger.info("User logged in", user_id="123") | ||
| assert "User logged in" in open("app.log").read() # Flaky - message may not be written yet | ||
| ``` | ||
|
|
||
| **Why**: Messages are enqueued asynchronously. The assertion runs before the worker processes the queue. | ||
|
|
||
| **Right**: Shut down the backend before asserting, or use a sync logger for this test. | ||
|
|
||
| ```python | ||
| def test_login(): | ||
| logger = get_async_logger("auth") | ||
| logger.info("User logged in", user_id="123") | ||
| shutdown_async_backend(timeout=2.0) | ||
| assert "User logged in" in open("app.log").read() | ||
| ``` | ||
|
|
||
| See [Testing Patterns](/docs/guides/testing-patterns) for more. | ||
|
|
||
| --- | ||
|
|
||
| ## 4. Queue Overflow and Silent Message Drops | ||
|
|
||
| **Wrong**: Assuming every log call results in a written message. | ||
|
|
||
| ```python | ||
| # High throughput - queue fills up | ||
| for i in range(1_000_000): | ||
| logger.info("Event", id=i) # Some messages may be dropped | ||
| ``` | ||
|
|
||
| **Why**: When the queue is full, the configured overflow strategy (e.g. `drop_oldest`) causes silent drops. There is no exception. | ||
|
|
||
| **Right**: Monitor queue size via `get_async_stats()`, size the queue appropriately, and treat async logging as best-effort for non-critical telemetry. | ||
|
|
||
| ```python | ||
| stats = get_async_stats() | ||
| if stats.get("queue_size", 0) > 0.8 * stats.get("max_queue_size", 1): | ||
| # Consider backpressure or alerting | ||
| pass | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 5. Mixing Legacy and Functional Async APIs | ||
|
|
||
| **Wrong**: Calling both shutdown functions and getting confused about which backend is active. | ||
|
|
||
| ```python | ||
| from kakashi import get_async_logger, shutdown_async_logging | ||
| from kakashi.core.async_interface import get_async_logger as get_func_async, shutdown_async_backend | ||
|
|
||
| # Two different systems - easy to mix up | ||
| legacy = get_async_logger("a") | ||
| func = get_func_async("b") | ||
| # Which shutdown drains which queue? | ||
| ``` | ||
|
|
||
| **Why**: Kakashi has two separate async implementations. Each has its own queue and shutdown. | ||
|
|
||
| **Right**: Stick to one system. For new code, use the functional API throughout. | ||
|
|
||
| ```python | ||
| from kakashi.core.async_interface import get_async_logger, shutdown_async_backend | ||
| logger = get_async_logger(__name__) | ||
| # ... | ||
| shutdown_async_backend() | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 6. Blocking on Full Queue with Default Config | ||
|
|
||
| **Wrong**: Assuming `put` always succeeds; in `block` mode a full queue can block the calling thread. | ||
|
|
||
| ```python | ||
| # Default overflow_strategy is "block" | ||
| for i in range(100_000): | ||
| logger.info("x") # Can block indefinitely if workers are slow | ||
| ``` | ||
|
|
||
| **Right**: For high-throughput or latency-sensitive paths, use `queue_overflow_strategy="drop_oldest"` and accept potential drops, or increase queue size and worker count. | ||
|
|
||
| --- | ||
|
|
||
| ## Summary | ||
|
|
||
| | Pitfall | Mitigation | | ||
| |---------|------------| | ||
| | Relying on `flush()` | Use `shutdown_async_backend()` at exit | | ||
| | No shutdown | Register `atexit.register(shutdown_async_backend)` | | ||
| | Flaky tests | Shut down before asserting, or use sync logger | | ||
| | Queue overflow | Monitor stats, size queue, treat as best-effort | | ||
| | Mixing APIs | Use functional async only for new code | | ||
| | Blocking on full queue | Use `drop_oldest` or increase capacity | |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version in footer appears stale.
The footer says "Kakashi v0.2.1" but
kakashi/__init__.pydeclares__version__ = "2.0.0"and this PR targets the v0.3.0 branch. Consider updating this to match the actual release version.🤖 Prompt for AI Agents