Expected behavior
When a PDB file has atoms with blank element columns, MDAnalysis emits:
UserWarning: Unknown element found for some atoms. These have been given an
empty element record. If needed they can be guessed using
universe.guess_TopologyAttrs(context='default', to_guess=['elements']).
Calling that function as instructed should fill in the missing elements for those atoms via name-based inference (DefaultGuesser.guess_atom_element).
Actual behavior
guess_TopologyAttrs(to_guess=["elements"]) does nothing for atoms that already have element="". Their element remains empty and the warning points users at a no-op call.
Downstream consequence: atom.type is also "" for those atoms (MDAnalysis populates type from the PDB element column), so code that relies on a non-empty atom.type can raise ValueError for the affected atoms.
Code to reproduce
import io
import warnings
import MDAnalysis as mda
# Minimal PDB: N1 has a blank element column (cols 77-78 empty)
PDB = """\
CRYST1   30.000   30.000   30.000  90.00  90.00  90.00 P 1           1
ATOM      1  C1  MOL     1       5.000   5.000  15.000  1.00  0.00           C
ATOM      2  N1  MOL     1       6.000   5.000  15.000  1.00  0.00
END
"""
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    u = mda.Universe(io.StringIO(PDB), format="PDB")
print("before guessing:", u.atoms.elements)
# ['C' '']
u.guess_TopologyAttrs(context="default", to_guess=["elements"])
print("after guessing:", u.atoms.elements)
# Expected: ['C' 'N']
# Actual: ['C' ''] ← N1 still blank
# The guesser itself works correctly in isolation:
from MDAnalysis.guesser import DefaultGuesser
print(DefaultGuesser(u).guess_atom_element("N1")) # → 'N' ✓
Root cause
GuesserBase.guess_attr calls TopologyAttr.are_values_missing to decide which atoms need guessing:
# GuesserBase.guess_attr (base.py)
empty_values = top_attr.are_values_missing(attr_values)
are_values_missing (added in 2.8.0, PR #3753) checks:
missing_value_label = getattr(cls, "missing_value_label", None)
# ...
return values == missing_value_label
Neither Elements nor Atomtypes defines missing_value_label, so getattr returns None. The check becomes "" == None → False for every atom, so guess_attr concludes that nothing needs guessing and silently returns None.
Both classes use "" as the empty sentinel in _gen_initial_values, but that sentinel is never declared as missing_value_label. The mechanism was introduced in PR #3753 but these two classes were not updated at that time.
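The failing comparison can be illustrated without MDAnalysis. The following is a simplified pure-Python model, not the library's code: the class names AttrBase, ElementsLike and PatchedElementsLike are hypothetical stand-ins, and a list comprehension replaces the elementwise array comparison. Only the getattr fallback is taken from the snippet quoted above.

```python
class AttrBase:
    """Hypothetical stand-in for TopologyAttr (illustration only)."""

    @classmethod
    def are_values_missing(cls, values):
        # Same lookup as quoted above: falls back to None when undeclared.
        missing_value_label = getattr(cls, "missing_value_label", None)
        # Assumption: a list comprehension models the elementwise comparison.
        return [v == missing_value_label for v in values]


class ElementsLike(AttrBase):
    """Models the current state: "" is used as a sentinel but never declared."""


class PatchedElementsLike(AttrBase):
    """Models the proposed fix: the sentinel is declared explicitly."""

    missing_value_label = ""


elements = ["C", ""]
current = ElementsLike.are_values_missing(elements)
patched = PatchedElementsLike.are_values_missing(elements)
print(current)  # [False, False] -> nothing is considered missing
print(patched)  # [False, True]  -> the blank entry is flagged
```

In this model, declaring the sentinel is the only difference between the two classes, which is the same single-line change the proposed fix makes.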
Proposed fix
--- a/package/MDAnalysis/core/topologyattrs.py
+++ b/package/MDAnalysis/core/topologyattrs.py
@@ class Elements(AtomStringAttr):
     attrname = "elements"
     singular = "element"
     dtype = object
+    missing_value_label = ""
@@ class Atomtypes(AtomStringAttr):
     attrname = "types"
     singular = "type"
     per_object = "atom"
     dtype = object
+    missing_value_label = ""
After this fix, are_values_missing returns True for atoms with element="" and guess_attr passes them to DefaultGuesser.guess_elements as intended.
It may be worth auditing other AtomStringAttr subclasses (e.g. Atomnames, Resnames) to check whether any others have the same mismatch between _gen_initial_values and missing_value_label.
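Until a fix is released, the blank entries could in principle be filled by hand. The sketch below is a generic helper, not MDAnalysis API: fill_blank_elements and toy_guess are hypothetical names, and the guessing function is passed in as a parameter so the example runs on plain lists.

```python
def fill_blank_elements(names, elements, guess_element):
    """Return a copy of `elements` with blank entries guessed from `names`."""
    return [
        guess_element(name) if element == "" else element
        for name, element in zip(names, elements)
    ]


def toy_guess(name):
    # Toy guesser for demonstration only: keep the letters of the atom name.
    return "".join(ch for ch in name if ch.isalpha()).capitalize()


filled = fill_blank_elements(["C1", "N1"], ["C", ""], toy_guess)
print(filled)  # ['C', 'N']
```

With MDAnalysis, guess_element would presumably be DefaultGuesser(u).guess_atom_element, applied to u.atoms.names and u.atoms.elements, with the result assigned back to the universe. That last step is an assumption and has not been verified here.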
Versions
- MDAnalysis: 2.10.0 (bug also present on current develop)
- Python: 3.12
- OS: Linux