Implement mdl metric #1375

Open
popchanovska wants to merge 2 commits into SkBlaz:master from popchanovska:fix/mdl_metric

@popchanovska

Contributor

Description

Implements the MDL (minimum description length) metric for community detection. MDL assigns a partition a score in bits; lower is better. A good grouping, one that is dense inside communities and sparse between them, compresses the network well.

Motivation

The mdl metric was not implemented; it returned a placeholder value of 0.0.

Changes

  • In multilayer_quality_metrics.py:
    Added mdl_score(). It scores each layer separately and sums the results.
  • In multilayer_quality_metrics.py:
    Added _mdl_single_layer(). It computes the bits for one layer: a model cost (bits to write down which community each node is in) plus a data cost (bits to describe the edges given those communities).
  • Only edges that stay inside a layer are counted. Edges crossing between layers are left out.
  • Self-loops (e.g., A to A) are skipped.
  • Directed graphs are handled. A directed graph allows an edge in each direction between a pair of nodes, so the "maximum possible edges" count is adjusted to match.
  • In autocommunity_executor.py:
    Imported mdl_score.
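
For readers who want the shape of the computation, here is a minimal sketch of a two-part code for one layer. It is an illustration under stated assumptions, not the code in this PR: the function name mdl_single_layer_sketch, its signature, and the choice of a Bernoulli (binary entropy) cost per block are all assumptions, and community labels are assumed to be mutually comparable (for example, integers).

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mdl_single_layer_sketch(partition, edges, directed=False):
    """Hypothetical illustration: model cost + data cost, in bits, for one layer.

    partition: dict mapping node -> community label
    edges: iterable of (u, v) pairs inside this layer
    """
    n = len(partition)
    if n == 0:
        return 0.0
    sizes = Counter(partition.values())
    k = len(sizes)

    # Model cost: log2(k) bits per node to name its community.
    model_bits = n * math.log2(k) if k > 1 else 0.0

    # Count edges per block, skipping self-loops and unassigned endpoints.
    block_edges = Counter()
    for u, v in edges:
        if u == v or u not in partition or v not in partition:
            continue
        r, s = partition[u], partition[v]
        if not directed and r > s:
            r, s = s, r  # undirected: (r, s) and (s, r) are the same block
        block_edges[(r, s)] += 1

    # Data cost: each block with `possible` node pairs and density p costs
    # possible * H(p) bits. Blocks without edges cost 0 bits, so only
    # blocks that actually contain edges need to be visited.
    data_bits = 0.0
    for (r, s), e in block_edges.items():
        if r == s:
            possible = sizes[r] * (sizes[r] - 1)
            if not directed:
                possible //= 2
        else:
            possible = sizes[r] * sizes[s]
        if possible == 0:
            continue
        p = min(1.0, e / possible)  # clamp, mirroring the behaviour described
        data_bits += possible * binary_entropy(p)

    return model_bits + data_bits
```

On two triangles joined by a single edge, this sketch gives the partition that keeps each triangle together a lower score than one that alternates nodes between two communities, which is the behaviour the metric is meant to capture.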

Testing

  • make test

Breaking Changes

None.

Known Limitations

  • Cross-layer edges are not handled.
  • A node that appears in several layers is scored once per layer.
  • The model cost is deliberately simple. It does not add a separate cost for the community structure itself or any more elaborate terms.

@popchanovska popchanovska requested a review from SkBlaz as a code owner June 7, 2026 16:09
@SkBlaz

SkBlaz commented Jun 7, 2026

Owner

Nice work, some polish required, @popchanovska

Needs tests for mdl_score before merge.
The PR adds ~151 lines for MDL scoring but no tests in the changed files list. At minimum, add tests for empty partitions, single-layer graphs, multi-layer graphs, directed graphs, singleton communities, missing partition nodes, and inter-layer edges.

Partial partitions can be unfairly rewarded.
mdl_score builds layer partitions only from partition.items(), and _mdl_single_layer sets n = len(layer_partition). Edges whose endpoints are missing from the partition are skipped. This means an algorithm returning only a subset of nodes may get a smaller MDL simply because unassigned nodes and their edges are ignored.
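
One possible way to close this gap, sketched under assumptions: before scoring, complete the partition so that every node of the layer is assigned, placing each unassigned node in its own singleton community. The helper name complete_partition and the tuple-shaped labels are hypothetical; raising an error on missing nodes would be an equally valid choice.

```python
def complete_partition(partition, nodes):
    """Hypothetical helper: return (completed, missing).

    `completed` is a copy of `partition` in which every node in `nodes`
    has a community; each previously unassigned node gets its own
    singleton community. `missing` lists those nodes in input order.
    """
    completed = dict(partition)  # do not mutate the caller's partition
    missing = [node for node in nodes if node not in completed]
    for node in missing:
        # A tuple label cannot collide with existing int or str labels.
        completed[node] = ("__unassigned__", node)
    return completed, missing


partial = {"a": 0, "b": 0}
completed, missing = complete_partition(partial, ["a", "b", "c", "d"])
print(missing)         # nodes the algorithm did not assign
print(len(completed))  # now covers all four nodes
```

With the completed partition, n and the edge set both reflect the whole layer, so returning fewer nodes no longer shrinks the score by itself.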

Inter-layer edges are ignored.
The code only appends edges when u_layer == v_layer, so cross-layer coupling edges do not affect MDL. That may be acceptable for an intra-layer-only score, but it is surprising for a multilayer MDL metric used by AutoCommunity.
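
If the intra-layer-only behaviour is kept, it may help to at least surface how many coupling edges are dropped. A rough sketch, assuming nodes are (node_id, layer) tuples and edges are pairs of such nodes; the helper name split_edges_by_layer is hypothetical:

```python
from collections import defaultdict


def split_edges_by_layer(edges):
    """Hypothetical helper: separate intra-layer edges from inter-layer edges.

    edges: iterable of ((u, u_layer), (v, v_layer)) pairs.
    Returns (intra, inter), where intra maps layer -> list of (u, v)
    and inter is the list of edges whose endpoints sit in different layers.
    """
    intra = defaultdict(list)
    inter = []
    for (u, u_layer), (v, v_layer) in edges:
        if u_layer == v_layer:
            intra[u_layer].append((u, v))
        else:
            inter.append(((u, u_layer), (v, v_layer)))
    return dict(intra), inter


edges = [
    (("a", "L1"), ("b", "L1")),
    (("a", "L1"), ("a", "L2")),  # coupling edge between layers
    (("a", "L2"), ("c", "L2")),
]
intra, inter = split_edges_by_layer(edges)
print(len(inter), "inter-layer edge(s) would not affect the score")
```

Reporting or logging the size of the second list would make the limitation visible to callers instead of silent.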

Parallel/weighted edges are not properly modeled.
The implementation clamps p to 1.0 when edge counts exceed simple-graph capacity. That prevents nan, but it silently turns multiedge density overflow into a perfect-density block. If py3plex can hold multigraph or weighted edges, MDL should either reject those inputs, collapse them explicitly, or use a multigraph/weighted likelihood.
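
As an illustration of the "collapse explicitly or reject" options, here is a minimal sketch. The helper name simplify_edges and its on_multiedge parameter are hypothetical, and edge weights are assumed to have been stripped already:

```python
def simplify_edges(edges, directed=False, on_multiedge="collapse"):
    """Hypothetical helper: reduce an edge list to a simple graph.

    Self-loops are dropped. Parallel edges are either collapsed to one
    edge (on_multiedge="collapse") or rejected (on_multiedge="raise").
    For undirected input, (u, v) and (v, u) count as the same edge.
    """
    if on_multiedge not in ("collapse", "raise"):
        raise ValueError("on_multiedge must be 'collapse' or 'raise'")
    seen = set()
    simple = []
    for u, v in edges:
        if u == v:
            continue  # self-loop
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:
            if on_multiedge == "raise":
                raise ValueError(f"parallel edge between {u!r} and {v!r}")
            continue  # collapse: keep only the first occurrence
        seen.add(key)
        simple.append((u, v))
    return simple


print(simplify_edges([(1, 2), (2, 1), (1, 2), (3, 3)]))
```

After a step like this, edge counts can never exceed simple-graph capacity, so the clamp on p would only guard against bugs rather than hide multigraph input.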

Performance can degrade for highly fragmented partitions.
_mdl_single_layer loops over all community pairs per layer. If a partition has many singleton communities, this becomes roughly quadratic in the number of communities per layer. Probably okay for small graphs, but worth noting or testing on expected AutoCommunity graph sizes.
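
A possible mitigation, assuming the data cost of a block is its number of possible node pairs times the binary entropy of its density: a block with no edges then costs 0 bits, so only blocks that contain at least one edge need to be visited. The sketch below (the helper name nonempty_blocks is hypothetical, and labels are assumed comparable) contrasts the two counts on a path graph with all-singleton communities:

```python
from collections import Counter


def nonempty_blocks(partition, edges):
    """Hypothetical helper: count edges per community pair (undirected).

    Only pairs that actually contain an edge appear in the result, so its
    size is bounded by the number of edges, not by the number of pairs.
    """
    blocks = Counter()
    for u, v in edges:
        if u == v or u not in partition or v not in partition:
            continue
        r, s = sorted((partition[u], partition[v]))
        blocks[(r, s)] += 1
    return blocks


n = 1000
partition = {i: i for i in range(n)}          # every node is a singleton
edges = [(i, i + 1) for i in range(n - 1)]    # path graph

all_pairs = n * (n + 1) // 2                  # pairs visited by a full loop
visited = len(nonempty_blocks(partition, edges))
print(all_pairs, visited)
```

Whether this shortcut applies depends on the exact likelihood used; it would not hold for a cost that charges for empty blocks.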

Once these are addressed, we will have a solid version.
