guess_TopologyAttrs(to_guess=["elements"]) silently skips atoms with element="" (missing_value_label not set on Elements/Atomtypes) #5414

Description

@jfmoulin

Expected behavior

When a PDB file has atoms with blank element columns, MDAnalysis emits:

UserWarning: Unknown element  found for some atoms. These have been given an
empty element record. If needed they can be guessed using
universe.guess_TopologyAttrs(context='default', to_guess=['elements']).

Calling that function as instructed should fill in the missing elements for those atoms via name-based inference (DefaultGuesser.guess_atom_element).

Actual behavior

guess_TopologyAttrs(to_guess=["elements"]) does nothing for atoms that already have element="". Their element remains empty and the warning points users at a no-op call.

Downstream consequence: atom.type is also "" for those atoms (MDAnalysis populates type from the PDB element column), so downstream code that relies on a non-empty atom.type can raise ValueError for the affected atoms.

Code to reproduce

import io
import warnings
import MDAnalysis as mda

# Minimal PDB: N1 has a blank element column (cols 77-78 empty)
PDB = """\
CRYST1   30.000   30.000   30.000  90.00  90.00  90.00 P 1           1
ATOM      1  C1  MOL     1       5.000   5.000  15.000  1.00  0.00           C
ATOM      2  N1  MOL     1       6.000   5.000  15.000  1.00  0.00            
END
"""

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    u = mda.Universe(io.StringIO(PDB), format="PDB")

print("before guessing:", u.atoms.elements)
# ['C' '']

u.guess_TopologyAttrs(context="default", to_guess=["elements"])
print("after  guessing:", u.atoms.elements)
# Expected: ['C' 'N']
# Actual:   ['C' '']   ← N1 still blank

# The guesser itself works correctly in isolation:
from MDAnalysis.guesser import DefaultGuesser
print(DefaultGuesser(u).guess_atom_element("N1"))  # → 'N'  ✓

Root cause

GuesserBase.guess_attr calls TopologyAttr.are_values_missing to decide which atoms need guessing:

# GuesserBase.guess_attr (base.py)
empty_values = top_attr.are_values_missing(attr_values)

are_values_missing (added in 2.8.0, PR #3753) checks:

missing_value_label = getattr(cls, "missing_value_label", None)
# ...
return values == missing_value_label

Neither Elements nor Atomtypes defines missing_value_label, so getattr returns None. The check becomes "" == None, which is False for every atom. guess_attr concludes that nothing needs guessing and silently returns None.

Both classes use "" as the empty sentinel in _gen_initial_values, but that sentinel is never declared as missing_value_label. The mechanism was introduced in PR #3753 but these two classes were not updated at that time.
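The effect of the missing attribute can be illustrated without MDAnalysis. The following is a minimal sketch, not the library's code: StandInAttr, ElementsUnlabelled and ElementsLabelled are hypothetical stand-in classes, and the element-wise comparison is written as a plain list comprehension rather than an array operation. Only the getattr lookup with a None default is taken from the snippet quoted above.

```python
class StandInAttr:
    """Hypothetical stand-in for a topology attribute class (not MDAnalysis code)."""

    @classmethod
    def are_values_missing(cls, values):
        # Same lookup as quoted above: falls back to None when the
        # class does not declare missing_value_label.
        missing_value_label = getattr(cls, "missing_value_label", None)
        # Element-wise comparison, written here as a list comprehension.
        return [value == missing_value_label for value in values]


class ElementsUnlabelled(StandInAttr):
    """No missing_value_label declared, as in the reported behaviour."""


class ElementsLabelled(StandInAttr):
    """Declares the empty string as the missing-value sentinel."""

    missing_value_label = ""


elements = ["C", ""]
unlabelled_mask = ElementsUnlabelled.are_values_missing(elements)
labelled_mask = ElementsLabelled.are_values_missing(elements)

print(unlabelled_mask)  # [False, False]: the blank element is not flagged
print(labelled_mask)    # [False, True]: the blank element is flagged
```

With the label unset, every comparison is against None, so no value is ever reported as missing, which matches the no-op observed in the reproduction.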

Proposed fix

--- a/package/MDAnalysis/core/topologyattrs.py
+++ b/package/MDAnalysis/core/topologyattrs.py
@@ class Elements(AtomStringAttr):
     attrname = "elements"
     singular = "element"
     dtype = object
+    missing_value_label = ""

@@ class Atomtypes(AtomStringAttr):
     attrname = "types"
     singular = "type"
     per_object = "atom"
     dtype = object
+    missing_value_label = ""

After this fix, are_values_missing returns True for atoms with element="" and guess_attr passes them to DefaultGuesser.guess_elements as intended.

It may be worth auditing other AtomStringAttr subclasses (e.g. Atomnames, Resnames) to check whether any others have the same mismatch between _gen_initial_values and missing_value_label.
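One possible way to start such an audit is to walk a class hierarchy and list the subclasses that do not expose missing_value_label at all. The sketch below is an assumption about how this could be done, not an existing MDAnalysis utility: subclasses_without_label is a hypothetical helper, and the Fake* classes are stand-ins used only so the example runs on its own.

```python
def subclasses_without_label(base):
    """Return sorted names of all (direct and indirect) subclasses of
    ``base`` that neither define nor inherit ``missing_value_label``."""
    found = set()
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        # hasattr also sees inherited attributes, so a subclass of a
        # labelled class is treated as labelled.
        if not hasattr(cls, "missing_value_label"):
            found.add(cls.__name__)
        stack.extend(cls.__subclasses__())
    return sorted(found)


# Hypothetical stand-in hierarchy, for illustration only.
class FakeStringAttr:
    pass


class FakeNames(FakeStringAttr):
    pass


class FakeTypes(FakeStringAttr):
    pass


class FakeElements(FakeStringAttr):
    missing_value_label = ""


class FakeSubElements(FakeElements):
    pass


unlabelled = subclasses_without_label(FakeStringAttr)
print(unlabelled)  # ['FakeNames', 'FakeTypes']
```

Assuming AtomStringAttr can be imported from MDAnalysis.core.topologyattrs (the module named in the diff above), the same helper could be pointed at it. A class appearing in the output is only a candidate: whether it is actually affected depends on whether its _gen_initial_values uses a sentinel that the guesser is expected to fill in.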

Versions

  • MDAnalysis: 2.10.0 (bug also present on current develop)
  • Python: 3.12
  • OS: Linux
