Skip to content

Add basic element filter#559

Draft
ElliottKasoar wants to merge 35 commits into
ddmms:mainfrom
ElliottKasoar:add-element-filter-basic
Draft

Add basic element filter#559
ElliottKasoar wants to merge 35 commits into
ddmms:mainfrom
ElliottKasoar:add-element-filter-basic

Conversation

@ElliottKasoar
Copy link
Copy Markdown
Collaborator

@ElliottKasoar ElliottKasoar commented May 19, 2026

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

Summary

This is a (somewhat) simplified version of #512, in that each test will only filter based on the superset of elements it contains, and so we will not partially update scores.

However, this still requires most of the mechanics of the full filter:

  • Introduces 'mock' calculator to ensure all test outputs are generated and saved.
  • Introduces (formally unused) 'error' calculator, to check that calculations complete when non-runtime errors are raised by calculations
  • Updates all calculations to pass with use of error calculator i.e. they should be robust to errors
  • Add info and elements attributes to app, which can store information required for element filtering
  • Add stores property to all apps, to make computed/raw/weights/thresholds globally accessible
  • Store set of original data for all tables
  • Add filter_table function to apps, which at base just returns the same data, but is the mechanism for applying the filter
  • Separate handling of None and NaN and allow None as valid score option.
  • Combines all updates to summary (category and overall) tables to ensure race conditions do not lead to incorrect calculations when multiple triggers occur
  • Adds callback to trigger table recalculation on stored updates

How this works more generally is:

  • During analysis, we must save the elements that are involved in the calculation to info.json or similar
    • Ideally, this would be done in a way that can be used for the full filter later, but this is not necessary
    • This is still to be done for most tests
  • During the app, we define a way to read the elements from info.json, and a filter_table function that uses this information, and a list of elements to be filtered, to modify the rows of the tables
    • This is still to be done for most tests, but for the simple form, this should generally be almost identical to the current examples

This is all that benchmark contributors should need to worry about. From a backend perspective, the main change is that while previously, only one table would be updated at a time, and these tables would always be on the current page, now all tables will update simultaneously, and most of these will not be rendered.

This means that we cannot rely in table.data, so must use globally accessible stored values for tables. We then also need to make sure updates to summaries are robust to multiple triggers, rather than one at a time. The current form is relatively inefficient, as it will trigger many, many times, but I think this is an ok starting point.

There are a few things left to do:

  • Update all analysis to store info.json data
  • Update all apps to convert the dict from info.json to a set of elements
  • Add filter_table to all apps to apply the filter to a table
  • Test the robustness of the callbacks
  • Try to improve callback efficiency e.g. a single global sync, which isn't trivial to set up robustly
  • Add store for filter inputs
  • Add @joehart2001's element filter input via periodic table
  • Add quick-select options for models

Linked issue

Resolves #255

Testing

@ElliottKasoar ElliottKasoar requested a review from joehart2001 May 19, 2026 11:41
@ElliottKasoar ElliottKasoar added enhancement New feature or request breaking API breaking changes labels May 19, 2026
Comment thread ml_peg/app/filter.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe best to call this file filters.py

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, filter is a keyword, so a bit dangerous

Comment thread ml_peg/calcs/conftest.py
Comment thread ml_peg/app/molecular_crystal/DMC_ICE13/app_DMC_ICE13.py
warnings.warn("Unable to read elements lists.", stacklevel=2)

@staticmethod
def filter_data(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need to generalise this in some way. im unsure we can fully generalise it, but at least different helper functions for different types of data

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely! For now, I think we'll have two main general types: one where we only have one list e.g. Li diffusion, and one where we have a list of lists as the elements, and then the filter for now should basically be the same for all apps: if there's an overlap in the superset, set all metrics to None, which should be entirely general, but as we expand this, they'll differentiate based on data

from ml_peg.app import APP_ROOT
from ml_peg.app.base_app import BaseApp
from ml_peg.app.utils.build_callbacks import (
filter_table,
Copy link
Copy Markdown
Collaborator

@joehart2001 joehart2001 May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is in base_app.py not build callbacks?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes the DMC_ICE/X23 examples are less up to date, sorry

Comment thread ml_peg/app/utils/register_callbacks.py
)

states = []
for entry in app_entries:
Copy link
Copy Markdown
Collaborator

@joehart2001 joehart2001 May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we use unsorted app entries but we use sorted above. i think this will assign the incorrect weights as we dont assingn to a specific key

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I think this should be sorted too, god catch

@ElliottKasoar ElliottKasoar force-pushed the add-element-filter-basic branch from f68f79d to ba17009 Compare May 19, 2026 16:43
@ElliottKasoar ElliottKasoar mentioned this pull request May 19, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking API breaking changes enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filter by elements

2 participants