
[WORKFLOW SDK BUG] workflow connection to sidecar doesn't auto-heal #900

@olitomlinson

Description

1.17 RC2

Expected Behavior

When Dapr goes away (think transient errors, sidecar restarts, etc.), the Python workflow worker should reconnect automatically once Dapr becomes available again.

FYI: the .NET SDK has full auto-healing and tolerates the loss of either the app or the sidecar.
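For comparison, here is a minimal sketch of the expected auto-healing behavior in plain Python, with no gRPC dependency. `run_with_auto_heal` and `run_stream` are hypothetical names, not durabletask or Dapr SDK APIs. `run_stream` stands in for "open a fresh channel and listen for work items until the stream ends".

```python
import logging
import random
import time
from typing import Callable

log = logging.getLogger("reconnect-sketch")


def run_with_auto_heal(
    run_stream: Callable[[], None],
    should_stop: Callable[[], bool],
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep re-opening the work-item stream until should_stop() returns True.

    Returns the total number of reconnects, for observability and testing.
    """
    failures = 0    # consecutive failures; drives the backoff
    reconnects = 0  # total reconnects performed
    while not should_stop():
        try:
            # Hypothetical: opens a *fresh* channel and blocks until the stream ends.
            run_stream()
            failures = 0  # a clean run resets the backoff
        except Exception as exc:  # e.g. UNAVAILABLE while the sidecar is down
            failures += 1
            log.warning("work-item stream failed: %s", exc)
        if should_stop():
            break
        # Reconnect even if the stream ended without an exception.
        reconnects += 1
        # Exponential backoff with full jitter, capped at max_delay.
        delay = random.uniform(0, min(max_delay, base_delay * 2 ** failures))
        log.info("reconnecting in %.2fs (reconnect #%d)", delay, reconnects)
        sleep(delay)
    return reconnects
```

The loop reconnects whether the stream raised or simply returned. In the logs in this issue, the worker reports "Work item stream ended normally" right after the UNAVAILABLE error, so treating a "normal" end as final would still leave the worker stuck.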

Actual Behavior

The Python workflow worker never reconnects to the sidecar.

The only mitigation is to stop the Python app and then start it again.

Steps to Reproduce the Problem

So far I've only reproduced this in a multi-app scenario (a .NET workflow calling a Python activity). I'm not suggesting this is the only scenario; it could very well occur in a regular single-app Python workflow too. I just haven't tested that yet.

Anyway...

While the Python app is midway through processing an in-flight activity, kill the Python app's sidecar and watch the Python app's logs. They show that the connection is dropped, but there is no evidence that the worker tries to re-establish it.
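To check for this stuck state automatically during a repro, here is a hypothetical helper (not part of any SDK). It scans the worker's log lines and reports whether the last stream-state message was "No longer listening for work items" with no later "Successfully connected to". The two marker strings are copied from the durabletask-worker logs below. Everything else is an illustrative assumption.

```python
from typing import Iterable

# Marker strings copied from the durabletask-worker log lines in this issue.
STOPPED_MARKER = "No longer listening for work items"
CONNECTED_MARKER = "Successfully connected to"


def worker_is_stuck(lines: Iterable[str]) -> bool:
    """Return True if the worker stopped listening and never reconnected.

    Only the most recent stream-state marker matters: a later
    "Successfully connected" line clears an earlier "No longer listening".
    """
    stuck = False
    for line in lines:
        if STOPPED_MARKER in line:
            stuck = True
        elif CONNECTED_MARKER in line:
            stuck = False
    return stuck
```

You could feed it the container output, for example by passing `sys.stdin` after piping `docker logs <python-app-container> 2>&1` into a script (the container name is a placeholder).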

Below are the logs of my Python app processing an activity and then failing to deliver the result of the completed activity, because the connection had been dropped. The activity is only restarted after I stop and start the Python app again. No auto-healing occurs.

2026-01-28 23:37:05.752 | 2026-01-28 23:37:05,751 - semantic_search.activities.embedding_activity - WARNING - 🐌 SANDBAG MODE: Sleeping for 40 seconds before processing (artificial delay for testing)
2026-01-28 23:37:35.455 | 2026-01-28 23:37:35,454 - durabletask-worker - WARNING - Stream reader: RPC error (code=StatusCode.UNAVAILABLE): <_MultiThreadedRendezvous of RPC that terminated with:
2026-01-28 23:37:35.455 | 	status = StatusCode.UNAVAILABLE
2026-01-28 23:37:35.455 | 	details = "Socket closed"
2026-01-28 23:37:35.455 | 	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.18.0.8:50001 {grpc_status:14, grpc_message:"Socket closed"}"
2026-01-28 23:37:35.455 | >
2026-01-28 23:37:35.455 | 2026-01-28 23:37:35.454 durabletask-worker WARNING: Stream reader: RPC error (code=StatusCode.UNAVAILABLE): <_MultiThreadedRendezvous of RPC that terminated with:
2026-01-28 23:37:35.455 | 	status = StatusCode.UNAVAILABLE
2026-01-28 23:37:35.455 | 	details = "Socket closed"
2026-01-28 23:37:35.455 | 	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.18.0.8:50001 {grpc_status:14, grpc_message:"Socket closed"}"
2026-01-28 23:37:35.455 | >
2026-01-28 23:37:35.456 | 2026-01-28 23:37:35,455 - durabletask-worker - INFO - Work item stream ended normally
2026-01-28 23:37:35.456 | 2026-01-28 23:37:35,456 - durabletask-worker - INFO - No longer listening for work items
2026-01-28 23:37:35.456 | 2026-01-28 23:37:35.455 durabletask-worker INFO: Work item stream ended normally
2026-01-28 23:37:35.456 | 2026-01-28 23:37:35.456 durabletask-worker INFO: No longer listening for work items
2026-01-28 23:37:45.752 | 2026-01-28 23:37:45,751 - semantic_search.activities.embedding_activity - INFO - Sandbag delay complete, proceeding with embedding generation
2026-01-28 23:37:45.752 | 2026-01-28 23:37:45,751 - semantic_search.activities.embedding_activity - INFO - Generating embeddings for 1 text(s) using model: all-MiniLM-L6-v2
2026-01-28 23:37:48.376 | 2026-01-28 23:37:48,376 - semantic_search.activities.embedding_activity - INFO - Loading model on device: cpu (no GPU available)
2026-01-28 23:37:48.376 | 2026-01-28 23:37:48,376 - semantic_search.activities.embedding_activity - INFO - Loading sentence transformer model: all-MiniLM-L6-v2
2026-01-28 23:37:48.376 | 2026-01-28 23:37:48,376 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2026-01-28 23:37:55.399 | 2026-01-28 23:37:55,399 - semantic_search.activities.embedding_activity - INFO - Model all-MiniLM-L6-v2 successfully loaded on cpu
2026-01-28 23:37:55.471 | 2026-01-28 23:37:55,471 - semantic_search.activities.embedding_activity - INFO - Generated 1 embeddings using all-MiniLM-L6-v2 (dim=384) in 49720.09ms on cpu
2026-01-28 23:37:55.471 | 
2026-01-28 23:37:55.475 | 2026-01-28 23:37:55,472 - durabletask-worker - ERROR - Failed to deliver activity response for 'generate_embeddings#0' of orchestration ID 'semantic-search-5522a6df' to sidecar: Cannot invoke RPC on closed channel!
2026-01-28 23:37:55.475 | Traceback (most recent call last):
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/durabletask/worker.py", line 821, in _execute_activity
2026-01-28 23:37:55.475 |     stub.CompleteActivityTask(res)
2026-01-28 23:37:55.475 | 2026-01-28 23:37:55.472 durabletask-worker ERROR: Failed to deliver activity response for 'generate_embeddings#0' of orchestration ID 'semantic-search-5522a6df' to sidecar: Cannot invoke RPC on closed channel!
2026-01-28 23:37:55.475 | Traceback (most recent call last):
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/durabletask/worker.py", line 821, in _execute_activity
2026-01-28 23:37:55.475 |     stub.CompleteActivityTask(res)
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1163, in __call__
2026-01-28 23:37:55.475 |     state, call = self._blocking(
2026-01-28 23:37:55.475 |                   ^^^^^^^^^^^^^^^
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1134, in _blocking
2026-01-28 23:37:55.475 |     call = self._channel.segregated_call(
2026-01-28 23:37:55.475 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 547, in grpc._cython.cygrpc.Channel.segregated_call
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 403, in grpc._cython.cygrpc._segregated_call
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 407, in grpc._cython.cygrpc._segregated_call
2026-01-28 23:37:55.475 | ValueError: Cannot invoke RPC on closed channel!
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1163, in __call__
2026-01-28 23:37:55.475 |     state, call = self._blocking(
2026-01-28 23:37:55.475 |                   ^^^^^^^^^^^^^^^
2026-01-28 23:37:55.475 |   File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1134, in _blocking
2026-01-28 23:37:55.475 |     call = self._channel.segregated_call(
2026-01-28 23:37:55.475 |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 547, in grpc._cython.cygrpc.Channel.segregated_call
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 403, in grpc._cython.cygrpc._segregated_call
2026-01-28 23:37:55.475 |   File "src/python/grpcio/grpc/_cython/_cygrpc/channel.pyx.pxi", line 407, in grpc._cython.cygrpc._segregated_call
2026-01-28 23:37:55.475 | ValueError: Cannot invoke RPC on closed channel!
2026-01-28 23:39:40.001 | INFO:     172.18.0.8:57460 - "GET /dapr/config HTTP/1.1" 404 Not Found
2026-01-28 23:42:02.898 | INFO:     Shutting down
2026-01-28 23:42:03.001 | INFO:     Waiting for application shutdown.
2026-01-28 23:42:03.003 | INFO:     Application shutdown complete.
2026-01-28 23:42:03.006 | INFO:     Finished server process [1]
2026-01-28 23:42:03.007 | 2026-01-28 23:42:03.007 durabletask-worker INFO: Stopping gRPC worker...
2026-01-28 23:42:03.007 | 2026-01-28 23:42:03,006 - __main__ - INFO - Received signal 15, initiating shutdown...
2026-01-28 23:42:03.007 | 2026-01-28 23:42:03,007 - __main__ - INFO - Shutting down services...
2026-01-28 23:42:03.007 | 2026-01-28 23:42:03,007 - semantic_search.workflow_manager - INFO - Stopping Dapr workflow runtime
2026-01-28 23:42:03.007 | 2026-01-28 23:42:03,007 - durabletask-worker - INFO - Stopping gRPC worker...
2026-01-28 23:42:03.009 | 2026-01-28 23:42:03.009 durabletask-worker INFO: Worker shutdown completed
2026-01-28 23:42:03.009 | 2026-01-28 23:42:03,009 - durabletask-worker - INFO - Worker shutdown completed
2026-01-28 23:42:03.010 | 2026-01-28 23:42:03,009 - semantic_search.workflow_manager - INFO - Dapr workflow runtime stopped
2026-01-28 23:42:03.010 | 2026-01-28 23:42:03,009 - __main__ - INFO - All services stopped
2026-01-28 23:42:08.053 | /usr/local/lib/python3.11/site-packages/dapr/conf/helpers.py:43: UserWarning: http and https schemes are deprecated for grpc, use myhost?tls=false or myhost?tls=true instead
2026-01-28 23:42:08.053 |   warn(
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - __main__ - INFO - Starting Hello World App with Dapr Workflows
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - __main__ - INFO - Dapr HTTP Port: Not set
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - __main__ - INFO - Dapr GRPC Port: Not set
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - __main__ - INFO - Starting Dapr workflow runtime...
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - semantic_search.workflow_manager - INFO - Starting Dapr workflow runtime
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08,077 - durabletask-worker - INFO - Starting gRPC worker that connects to dns:semantic-search-dapr:50001
2026-01-28 23:42:08.078 | 2026-01-28 23:42:08.077 durabletask-worker INFO: Starting gRPC worker that connects to dns:semantic-search-dapr:50001
2026-01-28 23:42:08.087 | 2026-01-28 23:42:08,087 - durabletask-worker - INFO - Created fresh connection to dns:semantic-search-dapr:50001
2026-01-28 23:42:08.087 | 2026-01-28 23:42:08.087 durabletask-worker INFO: Created fresh connection to dns:semantic-search-dapr:50001
2026-01-28 23:42:08.088 | 2026-01-28 23:42:08,088 - durabletask-worker - INFO - Successfully connected to dns:semantic-search-dapr:50001. Waiting for work items...
2026-01-28 23:42:08.088 | 2026-01-28 23:42:08.088 durabletask-worker INFO: Successfully connected to dns:semantic-search-dapr:50001. Waiting for work items...
2026-01-28 23:42:08.102 | 2026-01-28 23:42:08,102 - semantic_search.activities.embedding_activity - WARNING - 🐌 SANDBAG MODE: Sleeping for 40 seconds before processing (artificial delay for testing)
2026-01-28 23:42:10.082 | 2026-01-28 23:42:10,081 - semantic_search.workflow_manager - INFO - Dapr workflow runtime started successfully
2026-01-28 23:42:10.082 | 2026-01-28 23:42:10,082 - __main__ - INFO - Dapr workflow runtime started
2026-01-28 23:42:10.082 | 2026-01-28 23:42:10,082 - __main__ - INFO - Starting web server...
2026-01-28 23:42:10.092 | INFO:     Started server process [1]
2026-01-28 23:42:10.092 | INFO:     Waiting for application startup.
2026-01-28 23:42:10.093 | INFO:     Application startup complete.
2026-01-28 23:42:10.093 | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2026-01-28 23:42:48.103 | 2026-01-28 23:42:48,102 - semantic_search.activities.embedding_activity - INFO - Sandbag delay complete, proceeding with embedding generation
2026-01-28 23:42:48.103 | 2026-01-28 23:42:48,103 - semantic_search.activities.embedding_activity - INFO - Generating embeddings for 1 text(s) using model: all-MiniLM-L6-v2
2026-01-28 23:42:51.104 | 2026-01-28 23:42:51,104 - semantic_search.activities.embedding_activity - INFO - Loading model on device: cpu (no GPU available)
2026-01-28 23:42:51.104 | 2026-01-28 23:42:51,104 - semantic_search.activities.embedding_activity - INFO - Loading sentence transformer model: all-MiniLM-L6-v2
2026-01-28 23:42:51.104 | 2026-01-28 23:42:51,104 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2026-01-28 23:42:54.109 | 2026-01-28 23:42:54,109 - semantic_search.activities.embedding_activity - INFO - Model all-MiniLM-L6-v2 successfully loaded on cpu
2026-01-28 23:42:54.230 | 2026-01-28 23:42:54,230 - semantic_search.activities.embedding_activity - INFO - Generated 1 embeddings using all-MiniLM-L6-v2 (dim=384) in 46128.03ms on cpu
2026-01-28 23:42:54.230 | 
2026-01-28 23:42:54.275 | 2026-01-28 23:42:54,274 - semantic_search.activities.embedding_activity - INFO - Generating embeddings for 5 text(s) using model: all-MiniLM-L6-v2
2026-01-28 23:42:54.278 | 2026-01-28 23:42:54,278 - semantic_search.activities.embedding_activity - WARNING - 🐌 SANDBAG MODE: Sleeping for 40 seconds before processing (artificial delay for testing)
2026-01-28 23:42:54.340 | 
2026-01-28 23:42:54.341 | 2026-01-28 23:42:54,341 - semantic_search.activities.embedding_activity - INFO - Generated 5 embeddings using all-MiniLM-L6-v2 (dim=384) in 66.43ms on cpu
2026-01-28 23:42:54.363 | 2026-01-28 23:42:54,363 - semantic_search.activities.embedding_activity - INFO - Generating embeddings for 5 text(s) using model: all-MiniLM-L6-v2
2026-01-28 23:42:54.365 | 2026-01-28 23:42:54,365 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.7321
2026-01-28 23:42:54.383 | 2026-01-28 23:42:54,383 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.7321
2026-01-28 23:42:54.385 | 2026-01-28 23:42:54,385 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.0983
2026-01-28 23:42:54.411 | 2026-01-28 23:42:54,411 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.0983
2026-01-28 23:42:54.413 | 2026-01-28 23:42:54,413 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.3279
2026-01-28 23:42:54.470 | 2026-01-28 23:42:54,469 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.0612
2026-01-28 23:42:54.471 | 2026-01-28 23:42:54,471 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.3279
2026-01-28 23:42:54.507 | 2026-01-28 23:42:54,506 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.0612
2026-01-28 23:42:54.519 | 2026-01-28 23:42:54,518 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.5146
2026-01-28 23:42:54.556 | 2026-01-28 23:42:54,556 - semantic_search.activities.embedding_activity - INFO - Generated 5 embeddings using all-MiniLM-L6-v2 (dim=384) in 193.18ms on cpu
2026-01-28 23:42:54.556 | 
2026-01-28 23:42:54.561 | 2026-01-28 23:42:54,561 - semantic_search.activities.similarity_activity - INFO - Computed similarity: 0.5146
2026-01-28 23:43:27.703 | Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00, 14.50it/s]
2026-01-28 23:43:27.703 | Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00,  8.43it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00,  8.42it/s]
2026-01-28 23:43:27.703 | Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00, 16.28it/s]
2026-01-28 23:43:27.703 | Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00,  5.32it/s]
2026-01-28 23:43:27.703 | Batches: 100%|██████████| 1/1 [00:00<00:00,  5.32it/s]
2026-01-28 23:43:34.281 | 2026-01-28 23:43:34,280 - semantic_search.activities.embedding_activity - INFO - Sandbag delay complete, proceeding with embedding generation
2026-01-28 23:43:34.282 | 2026-01-28 23:43:34,281 - semantic_search.activities.embedding_activity - INFO - Generating embeddings for 1 text(s) using model: all-MiniLM-L6-v2
2026-01-28 23:43:34.356 | 
2026-01-28 23:43:34.366 | Batches:   0%|          | 0/1 [00:00<?, ?it/s]
2026-01-28 23:43:34.366 | Batches: 100%|██████████| 1/1 [00:00<00:00, 16.67it/s]
2026-01-28 23:43:34.356 | 2026-01-28 23:43:34,354 - semantic_search.activities.embedding_activity - INFO - Generated 1 embeddings using all-MiniLM-L6-v2 (dim=384) in 40076.24ms on cpu
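The traceback above shows `stub.CompleteActivityTask(res)` being called on a channel that is already closed, so the finished result is lost. Below is a minimal sketch of a delivery path that asks for a stub on a live channel before each attempt. `deliver_result` and `get_stub` are hypothetical names, not durabletask APIs. The sketch also assumes that retrying a completion is acceptable to the sidecar, which is unverified.

```python
import time
from typing import Any, Callable, Optional


def deliver_result(
    result: Any,
    get_stub: Callable[[], Any],
    max_attempts: int = 5,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send an activity result, fetching a stub on a live channel per attempt.

    Returns the attempt number that succeeded; raises RuntimeError otherwise.
    """
    last_exc: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        # Hypothetical: get_stub() reconnects if the previous channel was closed,
        # instead of reusing a stub bound to a dead channel.
        stub = get_stub()
        try:
            stub.CompleteActivityTask(result)
            return attempt
        except (ValueError, ConnectionError) as exc:
            # grpc raises ValueError("Cannot invoke RPC on closed channel!") for a
            # closed channel; ConnectionError stands in for an UNAVAILABLE RpcError.
            last_exc = exc
            if attempt < max_attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    raise RuntimeError(f"result not delivered after {max_attempts} attempts") from last_exc
```

A real worker would catch `grpc.RpcError` rather than the stand-in `ConnectionError`. Catching `ValueError` broadly is also a simplification that could mask unrelated bugs.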

Release Note

RELEASE NOTE:
