read_mclust returns zero units for .t files (extension never detected; uint64 .t misread) #4602

@bendichter

Summary

MClustSortingExtractor / read_mclust silently returns a sorting with 0 units for
plain .t (and .t32, .raw*) input. Two bugs in mclustextractors.py:

Bug 1 — extension is always detected as t64

for e in ext_list:                       # ["t64", "t32", "t", "raw64", "raw32"]
    files = Path(folder_path).glob(f"*.{e}")
    if files:                            # a generator is ALWAYS truthy
        ext = e; break                   # -> ext is ALWAYS "t64"

Path.glob returns a generator, so if files: is always true and the loop breaks on the
first extension. For a folder of .t files it then iterates the (empty) *.t64 generator,
producing 0 units. Fix: materialize the glob, e.g. files = sorted(Path(folder_path).glob(...)).
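A minimal sketch of the truthiness problem and the materialized-glob fix, run against a temporary folder. The helper name `detect_extension` and the file names are hypothetical and chosen only for illustration; they are not taken from mclustextractors.py.

```python
import tempfile
from pathlib import Path

EXT_LIST = ["t64", "t32", "t", "raw64", "raw32"]


def detect_extension(folder_path, ext_list=EXT_LIST):
    """Hypothetical helper: return (ext, files) for the first extension that matches."""
    for e in ext_list:
        # sorted() turns the generator into a list, so an empty match is falsy.
        files = sorted(Path(folder_path).glob(f"*.{e}"))
        if files:
            return e, files
    return None, []


with tempfile.TemporaryDirectory() as tmp:
    # A folder that contains only plain .t files.
    (Path(tmp) / "TT03_1.t").touch()
    (Path(tmp) / "TT03_2.t").touch()

    # The buggy check: the generator object is truthy although nothing matches *.t64.
    buggy_truthy = bool(Path(tmp).glob("*.t64"))

    # The materialized check skips the empty t64 and t32 matches and stops at "t".
    ext, files = detect_extension(tmp)
    names = [f.name for f in files]
```

Sorting also makes the unit order deterministic, which a bare `list(...)` would not guarantee.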

Bug 2 — .t files can be uint64

Real MClust 3.x writes big-endian uint64 timestamps into the plain .t suffix, but the
reader assigns >u4 to any non-64 extension. Reading uint64 data as uint32 yields
alternating zero/value words whenever the timestamps fit in 32 bits, because the high
word of each value is then zero. The community reference loader handles this by reading
as >u4 and dropping zero words (so both 32- and 64-bit .t files work).
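The misread can be seen on a few bytes in memory. This sketch uses made-up timestamp values and works on the raw timestamp payload only; it does not parse a real .t file.

```python
import numpy as np

# Three made-up timestamps stored as big-endian uint64, as described for MClust 3.x.
spikes = np.array([1000, 2500, 40000], dtype=">u8")
raw = spikes.tobytes()

# Decoding the same bytes as big-endian uint32 splits each value into (high, low) words.
as_u4 = np.frombuffer(raw, dtype=">u4")

# Dropping the zero words recovers the original values when they fit in 32 bits.
recovered = as_u4[as_u4 > 0]

# A payload that really is uint32 passes through the same filter unchanged.
native_u4 = np.frombuffer(np.array([1000, 2500, 40000], dtype=">u4").tobytes(), dtype=">u4")
native_recovered = native_u4[native_u4 > 0]
```

Here `as_u4` holds six words, with a zero before each timestamp, and both `recovered` and `native_recovered` hold the three original values.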

Reproduction

See attached reproduce_mclust_bug.py (prints num_units: 0). Also reproduces on real data:
http://datasets.datalad.org/labs/mvdm/BiconditionalOdor/M040-2020-04-28-CDOD11/M040-2020-04-28-TT03_1.t
(169,979 spikes; read_mclust returns 0).

Proposed fix

  • files = sorted(Path(folder_path).glob(f"*.{e}"))
  • For the t family, when dataformat == ">u4", drop zero words to tolerate uint64-in-.t:
    times = times[times > 0]
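One caveat with `times[times > 0]` is that it also discards a genuine timestamp of 0. A possible alternative, sketched below, is to infer the width from the data instead of filtering. This is only a heuristic under stated assumptions, not what the reference loader does: `decode_t_words` is a hypothetical name, and the input is assumed to be the timestamp payload with any file header already stripped.

```python
import numpy as np


def decode_t_words(raw: bytes) -> np.ndarray:
    """Hypothetical helper: decode a .t timestamp payload of unknown width."""
    words = np.frombuffer(raw, dtype=">u4")
    # Heuristic (assumption): if the word count is even and every even-indexed
    # word is zero, the payload is taken to be uint64 values with empty high words.
    if words.size >= 2 and words.size % 2 == 0 and not words[0::2].any():
        return np.frombuffer(raw, dtype=">u8")
    # Otherwise keep the 32-bit interpretation, including any zero timestamp.
    return words


# uint64 payload with a zero timestamp: the zero survives.
from_u8 = decode_t_words(np.array([0, 10], dtype=">u8").tobytes())

# uint32 payload with a zero timestamp: left as 32-bit words.
from_u4 = decode_t_words(np.array([0, 10, 20, 30], dtype=">u4").tobytes())
```

The heuristic can still misjudge very short payloads (for example a two-value uint32 file whose first timestamp is 0), so it trades one edge case for another.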

Reference loader (handles both widths):
https://github.com/vandermeerlab/data-formats/blob/master/loader_mclust.py
