
@hayotensor
Contributor

@hayotensor hayotensor commented Jan 2, 2026

What was wrong?

Issue

#1124

There is a race condition in gossipsub.py. When a peer disconnects, it is removed from peer_protocol immediately, but the heartbeat loop (which runs roughly every 0.5s) may still have that peer in its list of candidates. The code accesses self.peer_protocol[peer_id] unconditionally, assuming that every known peer has protocol info, and raises a KeyError when the entry is missing.
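
A minimal sketch of the failure mode, using plain dicts and sets in place of the real router state. The string peer IDs, the topic name, the equality check against a single protocol ID, and the helper name `select_gossipsub_peers` are all assumptions for illustration; only `peer_protocol` and the set comprehension shape come from the traceback below.

```python
# Stand-in state: string peer IDs and a single protocol ID are placeholders.
GOSSIPSUB = "/meshsub/1.0.0"

peer_topics = {"topic-a": {"peer-1", "peer-2"}}
peer_protocol = {"peer-1": GOSSIPSUB, "peer-2": GOSSIPSUB}

# Disconnect handling drops peer-2 from peer_protocol first...
del peer_protocol["peer-2"]


# ...while the topic membership still lists it when the heartbeat runs.
def select_gossipsub_peers(topic):
    """Hypothetical stand-in for the comprehension at gossipsub.py line 825."""
    return {
        peer_id
        for peer_id in peer_topics[topic]
        if peer_protocol[peer_id] == GOSSIPSUB  # direct indexing: may raise
    }


try:
    select_gossipsub_peers("topic-a")
    raised = False
    missing = None
except KeyError as exc:
    raised = True
    missing = exc.args[0]  # the peer ID that was no longer in peer_protocol
```

Because the comprehension visits every peer in the topic, the stale entry is always reached regardless of set iteration order, so one half-removed peer is enough to abort the whole heartbeat.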

Logs

Traceback (most recent call last):
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/base.py", line 334, in _run_and_manage_task
    await task.run()
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 118, in run
    await self._async_fn(*self._async_fn_args)
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 573, in heartbeat
    peers_to_graft, peers_to_prune = self.mesh_heartbeat()
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 630, in mesh_heartbeat
    selected_peers = self._get_in_topic_gossipsub_peers_from_minus(
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 825, in _get_in_topic_gossipsub_peers_from_minus
    gossipsub_peers_in_topic = {
  File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 828, in <setcomp>
    if self.peer_protocol[peer_id]
KeyError: <libp2p.peer.id.ID (12D3KooWF963f4jiFX26xDKu7BrqtVYTx4Jk8rUQQUxwiJQjVFWH)>
2026-01-02 08:55:00,460 [ERROR] [__main__] Fatal error: Exceptions from Trio nursery (1 sub-exception)
  + Exception Group Traceback (most recent call last):
  |   File "/home/bob/py-libp2p-subnet-gossip/subnet/cli/run_node.py", line 304, in main
  |     trio.run(server.run)
  |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 2549, in run
  |     raise runner.main_task_outcome.error
  |   File "/home/bob/py-libp2p-subnet-gossip/subnet/server/server_v2.py", line 315, in run
  |     # Start the peer-store cleanup task, TTL
  |   File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
  |     await self.gen.athrow(typ, value, traceback)
  |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/host/basic_host.py", line 357, in _run
  |     async with background_trio_service(network):
  |   File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
  |     await self.gen.athrow(typ, value, traceback)
  |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 456, in background_trio_service
  |     async with trio.open_nursery() as nursery:
  |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1125, in __aexit__
  |     raise combined_error_from_nursery
  | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 461, in background_trio_service
    |     yield manager
    |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/host/basic_host.py", line 374, in _run
    |     yield
    |   File "/home/bob/py-libp2p-subnet-gossip/subnet/server/server_v2.py", line 315, in run
    |     # Start the peer-store cleanup task, TTL
    |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1125, in __aexit__
    |     raise combined_error_from_nursery
    | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
    +-+---------------- 1 ----------------
      | Exception Group Traceback (most recent call last):
      |   File "/home/bob/py-libp2p-subnet-gossip/subnet/server/server_v2.py", line 342, in run
      |     subnet_info_tracker = SubnetInfoTracker(
      |   File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
      |     await self.gen.athrow(typ, value, traceback)
      |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 456, in background_trio_service
      |     async with trio.open_nursery() as nursery:
      |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1125, in __aexit__
      |     raise combined_error_from_nursery
      | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
      +-+---------------- 1 ----------------
        | Exception Group Traceback (most recent call last):
        |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 461, in background_trio_service
        |     yield manager
        |   File "/home/bob/py-libp2p-subnet-gossip/subnet/server/server_v2.py", line 357, in run
        |     async with background_trio_service(gossipsub):
        |   File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
        |     await self.gen.athrow(typ, value, traceback)
        |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 456, in background_trio_service
        |     async with trio.open_nursery() as nursery:
        |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1125, in __aexit__
        |     raise combined_error_from_nursery
        | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
        +-+---------------- 1 ----------------
          | Exception Group Traceback (most recent call last):
          |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 461, in background_trio_service
          |     yield manager
          |   File "/home/bob/py-libp2p-subnet-gossip/subnet/server/server_v2.py", line 358, in run
          |     logger.info("Pubsub and GossipSub services started.")
          |   File "/usr/lib/python3.10/contextlib.py", line 217, in __aexit__
          |     await self.gen.athrow(typ, value, traceback)
          |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 456, in background_trio_service
          |     async with trio.open_nursery() as nursery:
          |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1125, in __aexit__
          |     raise combined_error_from_nursery
          | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
          +-+---------------- 1 ----------------
            | Exception Group Traceback (most recent call last):
            |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 248, in run
            |     raise ExceptionGroup(
            | exceptiongroup.ExceptionGroup: Encountered multiple Exceptions:  (1 sub-exception)
            +-+---------------- 1 ----------------
              | Traceback (most recent call last):
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/base.py", line 334, in _run_and_manage_task
              |     await task.run()
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/tools/async_service/trio_service.py", line 118, in run
              |     await self._async_fn(*self._async_fn_args)
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 573, in heartbeat
              |     peers_to_graft, peers_to_prune = self.mesh_heartbeat()
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 630, in mesh_heartbeat
              |     selected_peers = self._get_in_topic_gossipsub_peers_from_minus(
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 825, in _get_in_topic_gossipsub_peers_from_minus
              |     gossipsub_peers_in_topic = {
              |   File "/home/bob/py-libp2p-subnet-gossip/.venv/lib/python3.10/site-packages/libp2p/pubsub/gossipsub.py", line 828, in <setcomp>
              |     if self.peer_protocol[peer_id]
              | KeyError: <libp2p.peer.id.ID (12D3KooWF963f4jiFX26xDKu7BrqtVYTx4Jk8rUQQUxwiJQjVFWH)>

How was it fixed?

In gossipsub.py, _get_in_topic_gossipsub_peers_from_minus now uses self.peer_protocol.get(peer_id) instead of direct dictionary access via self.peer_protocol[peer_id]. This safely skips peers that are partially disconnected during the heartbeat cycle.
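
The shape of the change can be sketched as follows. This is a simplified stand-in, not the actual diff: the function takes its state as arguments, compares against one protocol ID, and omits the random sampling the real method performs. The names `select_gossipsub_peers`, `minus`, and the string peer IDs are assumptions.

```python
GOSSIPSUB = "/meshsub/1.0.0"


def select_gossipsub_peers(peer_topics, peer_protocol, topic, minus=()):
    """Return gossipsub peers in `topic`, skipping excluded or unknown peers."""
    excluded = set(minus)
    return {
        peer_id
        for peer_id in peer_topics.get(topic, set())
        # .get() yields None for a peer removed mid-heartbeat, so the
        # comparison is False and the peer is skipped instead of raising.
        if peer_protocol.get(peer_id) == GOSSIPSUB and peer_id not in excluded
    }


peer_topics = {"topic-a": {"peer-1", "peer-2", "peer-3"}}
# peer-2 has already been removed; peer-3 speaks a different protocol.
peer_protocol = {"peer-1": GOSSIPSUB, "peer-3": "/floodsub/1.0.0"}

selected = select_gossipsub_peers(peer_topics, peer_protocol, "topic-a")
```

Here `selected` contains only `peer-1`: the half-disconnected peer and the non-gossipsub peer are both filtered out by the same condition.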

Summary of approach.

Use dict.get(), which returns None for a missing key, instead of indexing the dictionary directly, which raises a KeyError.

To-Do

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes


@sumanjeet0012
Contributor

@hayotensor
That is a good suggestion. We can either use the get method or a try/except block when accessing the value directly. For this scenario, using the get method should be sufficient.
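
For comparison, a rough sketch of the two options side by side. Both helper names are hypothetical and the dict contents are placeholders; the point is only that the two lookups behave the same for a missing peer.

```python
def protocol_via_get(peer_protocol, peer_id):
    """Look up a peer's protocol, returning None if the peer is unknown."""
    return peer_protocol.get(peer_id)


def protocol_via_try(peer_protocol, peer_id):
    """Same lookup written with try/except around direct indexing."""
    try:
        return peer_protocol[peer_id]
    except KeyError:
        return None


peer_protocol = {"peer-1": "/meshsub/1.0.0"}

known = (
    protocol_via_get(peer_protocol, "peer-1"),
    protocol_via_try(peer_protocol, "peer-1"),
)
unknown = (
    protocol_via_get(peer_protocol, "peer-2"),
    protocol_via_try(peer_protocol, "peer-2"),
)
```

The one behavioral difference to keep in mind is that `.get()` cannot distinguish a missing key from a key whose stored value is None. That does not matter here, since either case means the peer should be skipped.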

@seetadev
Please run the CI/CD pipeline.

@sumanjeet0012
Contributor

@hayotensor
Please create an issue for this PR and add the corresponding newsfragment file.

@hayotensor
Contributor Author

@sumanjeet0012 Added the newsfragment.

I kept the get method rather than a try/except because it makes the code more concise.
