read_mclust returns zero units for .t files (extension never detected; uint64 .t misread)
Summary
MClustSortingExtractor / read_mclust silently returns a sorting with 0 units for
plain .t (and .t32, .raw*) input. Two bugs in mclustextractors.py:
Bug 1 — extension is always detected as t64
for e in ext_list:  # ["t64", "t32", "t", "raw64", "raw32"]
    files = Path(folder_path).glob(f"*.{e}")
    if files:  # a generator is ALWAYS truthy
        ext = e  # -> ext is ALWAYS "t64"
        break
Path.glob returns a generator, so if files: is always true and the loop breaks on the
first extension. For a folder of .t files it then iterates the (empty) *.t64 generator,
producing 0 units. Fix: materialize the glob, e.g. files = sorted(Path(folder_path).glob(...)).
Bug 2 — .t files can be uint64
Real MClust 3.x writes big-endian uint64 timestamps into files with the plain .t suffix, but the
reader assigns >u4 to any non-64 extension. Reading uint64 data as uint32 yields
alternating zero/value pairs whenever the timestamps fit in 32 bits, because each high
word is zero. The community reference loader handles this by reading as
>u4 and dropping zero words (so both 32- and 64-bit .t files work).
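The width mismatch is easy to see on synthetic data. This is a minimal sketch assuming NumPy is available; the three timestamps are made up, and the buffer holds only the binary payload (the MClust text header is not modelled).

```python
import numpy as np

# Made-up timestamps, serialized the way a uint64 .t payload would be (big-endian).
true_times = np.array([1000, 2000, 3000], dtype=">u8")
payload = true_times.tobytes()

# Reading the same bytes as big-endian uint32 splits each value into
# a high word (zero here) followed by a low word.
as_u4 = np.frombuffer(payload, dtype=">u4")

# Dropping the zero words recovers the timestamps, as the reference loader does.
recovered = as_u4[as_u4 > 0]

print(as_u4.tolist())      # [0, 1000, 0, 2000, 0, 3000]
print(recovered.tolist())  # [1000, 2000, 3000]
```

Two limits of the zero-dropping approach are worth keeping in mind: a timestamp of 2^32 or more has a nonzero high word that would survive the filter as a spurious value, and a genuine timestamp of exactly 0 would be discarded.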
Reproduction
See attached reproduce_mclust_bug.py (prints num_units: 0). Also reproduces on real data:
http://datasets.datalad.org/labs/mvdm/BiconditionalOdor/M040-2020-04-28-CDOD11/M040-2020-04-28-TT03_1.t
(169,979 spikes; read_mclust returns 0).
Proposed fix
- files = sorted(Path(folder_path).glob(f"*.{e}"))
- For the t family, when dataformat == ">u4", drop zero words to tolerate uint64-in-.t:
  times = times[times > 0]
Reference loader (handles both widths):
https://github.com/vandermeerlab/data-formats/blob/master/loader_mclust.py