
fix: preserve entities without community assignment in _filter_under_community_level (fixes #2348) #2370

Open
hanhan761 wants to merge 1 commit into microsoft:main from hanhan761:fix-2348-nan-community-filter

hanhan761 commented May 30, 2026

Summary

Preserves entities without community assignments in _filter_under_community_level instead of silently dropping them.

Issue

Closes #2348

Root Cause

The boolean mask in df[df.level <= community_level] is False for every row where level is NaN, because any ordering comparison against NaN is False. Entities that the Leiden algorithm does not assign to any community have level=NaN after the left join with community membership data, so the filter silently drops all of them. In some datasets that is 93% or more of the entity set, which causes CLI queries to return empty or degraded results with no warning.
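A minimal reproduction of the behavior described above, as a sketch only. The two frames and the `title` join key are hypothetical stand-ins, not the project's actual tables; only the `level` column and the `<=` filter come from this PR.

```python
import pandas as pd

# Hypothetical stand-in data: four entities, only two of which
# received a community assignment.
entities = pd.DataFrame({"title": ["A", "B", "C", "D"]})
membership = pd.DataFrame({"title": ["A", "B"], "level": [0, 3]})

# Left join: entities with no matching membership row get level=NaN.
df = entities.merge(membership, on="title", how="left")

community_level = 2

# The original filter. NaN <= 2 is False, so unassigned rows are dropped
# together with rows that really are above the requested level.
mask = df["level"] <= community_level
kept = df[mask]

print(kept["title"].tolist())
```

Here "C" and "D" are lost even though they were never above the requested level; they simply had no level to compare.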

Fix

Changed _filter_under_community_level to use a NaN-aware expression:

df[(df.level <= community_level) | df.level.isna()]

This preserves entities without community assignments while still applying the level filter to assigned entities. Entities with level=NaN are not "above the requested level"; they simply have no community and should not be excluded.
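The expression above can be wrapped in a small function to show its effect. This is a sketch under assumptions: `filter_under_community_level` is a hypothetical stand-in whose signature is chosen for illustration, and the sample frame is invented.

```python
import pandas as pd


def filter_under_community_level(df: pd.DataFrame, community_level: int) -> pd.DataFrame:
    """Hypothetical stand-in for the patched filter (signature is an assumption)."""
    # Keep rows at or below the requested level, plus rows with no level at all.
    keep = (df["level"] <= community_level) | df["level"].isna()
    return df[keep]


# Invented sample: one row within level, one above it, one unassigned.
df = pd.DataFrame(
    {"title": ["A", "B", "C"], "level": [0.0, 3.0, float("nan")]}
)
result = filter_under_community_level(df, 2)

print(result["title"].tolist())
```

The parentheses around the comparison matter: `|` binds more tightly than `<=` in Python, so omitting them would change what is compared.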

Edge Cases Not Covered

  • This does not add a warning when a large percentage of entities lack community assignments (the issue author suggested that as an additional improvement). The scope is limited to fixing the data loss.
  • The community_level=None path (which skips the filter entirely) is unaffected.

Verification

3 new unit tests:

  • NaN rows preserved alongside assigned rows within level
  • All-assigned case (regression, unchanged behavior)
  • All-NaN case (all preserved)
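The three cases listed above might look roughly like the following. This is an illustrative sketch, not the tests added in this PR: the helper is a hypothetical stand-in for the patched function, and the test names and data are assumptions.

```python
import pandas as pd

NAN = float("nan")


def filter_under_community_level(df: pd.DataFrame, community_level: int) -> pd.DataFrame:
    # Hypothetical stand-in for the patched filter.
    return df[(df["level"] <= community_level) | df["level"].isna()]


def test_nan_rows_preserved_alongside_assigned_rows():
    df = pd.DataFrame({"level": [0.0, 1.0, 5.0, NAN]})
    out = filter_under_community_level(df, 2)
    # Rows 0 and 1 are within level, row 3 is unassigned, row 2 is above level.
    assert out.index.tolist() == [0, 1, 3]


def test_all_assigned_behavior_unchanged():
    df = pd.DataFrame({"level": [0, 1, 2, 3]})
    out = filter_under_community_level(df, 2)
    assert out.index.tolist() == [0, 1, 2]


def test_all_nan_rows_preserved():
    df = pd.DataFrame({"level": [NAN, NAN]})
    out = filter_under_community_level(df, 2)
    assert len(out) == 2
```

The functions follow pytest naming conventions but use plain `assert`, so they can also be called directly.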

hanhan761 requested a review from a team as a code owner May 30, 2026 08:38
hanhan761 force-pushed the fix-2348-nan-community-filter branch 2 times, most recently from 06b21f5 to 440a8cc May 30, 2026 08:40
Closes microsoft#2348

Entities without community assignments have level=NaN after the left
join with community membership data. The filter df[df.level <= x]
evaluates to False for NaN in NumPy/pandas, silently dropping all
unassigned entities with no warning.

This causes CLI queries (graphrag query --method local/global/drift)
to operate on a drastically reduced entity set, often producing empty
or degraded results.

Fix: change the filter to df[(df.level <= community_level) | df.level.isna()]
so that entities without community assignments are preserved.
hanhan761 force-pushed the fix-2348-nan-community-filter branch from 440a8cc to cbcb854 May 30, 2026 08:42

Development

Successfully merging this pull request may close these issues.

_filter_under_community_level silently drops all entities without community assignments due to NaN comparison