fix: _filter_under_community_level silently drops entities without community assignments #2375

Open
hanhan761 wants to merge 1 commit into microsoft:main from hanhan761:fix-2348-nan-filter-community-level
@hanhan761

Fixes #2348

Problem

_filter_under_community_level in indexer_adapters.py filters entities using df[df.level <= community_level]. Entities without community assignments have NaN in the level column after the left join with community membership data. Since NaN <= x evaluates to False in pandas/NumPy, all unassigned entities are silently dropped with no warning.
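The dropped rows can be reproduced on toy data. The following is a minimal sketch, not the code from indexer_adapters.py: the frames, the `title` join key, and the sample values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data; the real graphrag tables have more columns
# and may use a different join key.
entities = pd.DataFrame({"title": ["A", "B", "C"]})
membership = pd.DataFrame({"title": ["A", "B"], "level": [0, 3]})

# Left join: "C" has no community row, so its level becomes NaN.
df = entities.merge(membership, on="title", how="left")

community_level = 2

# Original filter: NaN <= 2 is False, so "C" is dropped together
# with "B", which really is above the threshold.
kept = df[df.level <= community_level]
kept_titles = kept["title"].tolist()
print(kept_titles)
```

Only "A" survives here. "B" is removed for the intended reason, while "C" disappears purely because of the NaN comparison.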

This causes CLI query commands (graphrag query --method local/global/drift) to operate on a drastically reduced entity set. Small, sparse, or domain-specific datasets are especially vulnerable since isolated nodes are routinely excluded by Leiden community detection.

Fix

Changed the filter to also preserve rows where level is NaN:

df[(df.level <= community_level) | df.level.isna()]

This is semantically correct: "not in any community" is not the same as "in a community above the requested level." Entities without community assignments should pass through the filter.
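As a rough illustration of the changed predicate, the sketch below wraps it in a standalone helper. The name `filter_under_level` and the sample frame are hypothetical; the real function in indexer_adapters.py may take different arguments.

```python
import pandas as pd


def filter_under_level(df: pd.DataFrame, community_level: int) -> pd.DataFrame:
    """Stand-in for the patched filter (hypothetical helper name).

    Keeps rows at or below the requested level, plus rows whose
    level is NaN, i.e. entities with no community assignment.
    """
    return df[(df.level <= community_level) | df.level.isna()]


# Hypothetical sample: "C" has no community assignment.
df = pd.DataFrame(
    {"title": ["A", "B", "C"], "level": [0.0, 3.0, float("nan")]}
)

result = filter_under_level(df, 2)["title"].tolist()
print(result)
```

With a threshold of 2, "A" and "C" pass and "B" is still filtered out, so the change only affects unassigned entities.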

Test

Added tests/unit/query/test_indexer_adapters.py with 5 test cases covering:

  • NaN levels preserved alongside valid levels
  • Entities above the threshold correctly filtered out
  • All entities below threshold preserved
  • All NaN (no community assignments) preserved
  • Mixed NaN and above-threshold levels
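The five scenarios listed above could be exercised roughly as follows. This is a sketch under stated assumptions, not the contents of tests/unit/query/test_indexer_adapters.py: it tests a hypothetical stand-in for the patched filter and uses made-up level values.

```python
import pandas as pd


def filter_under_level(df: pd.DataFrame, community_level: int) -> pd.DataFrame:
    """Stand-in for the patched filter; not the real graphrag function."""
    return df[(df.level <= community_level) | df.level.isna()]


NAN = float("nan")

# (description, input levels, threshold, expected surviving levels)
CASES = [
    ("NaN kept alongside valid levels", [0, 1, NAN], 2, [0, 1, NAN]),
    ("above-threshold rows removed", [0, 3, 5], 2, [0]),
    ("all below threshold kept", [0, 1, 2], 2, [0, 1, 2]),
    ("all NaN kept", [NAN, NAN], 2, [NAN, NAN]),
    ("mixed NaN and above-threshold", [NAN, 4, NAN, 7], 2, [NAN, NAN]),
]


def run_cases() -> dict:
    """Run every case and report pass/fail per description."""
    results = {}
    for name, levels, threshold, expected in CASES:
        df = pd.DataFrame({"level": pd.Series(levels, dtype="float64")})
        got = filter_under_level(df, threshold)["level"].reset_index(drop=True)
        want = pd.Series(expected, dtype="float64")
        # Series.equals treats NaN values in the same position as equal.
        results[name] = got.equals(want)
    return results


outcome = run_cases()
print(outcome)
```

Comparing with `Series.equals` rather than `==` matters here, because an element-wise `==` would report NaN rows as unequal even when the filter behaves correctly.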

hanhan761 requested a review from a team as a code owner on May 30, 2026, 08:44

Development

Successfully merging this pull request may close these issues.

_filter_under_community_level silently drops all entities without community assignments due to NaN comparison
