feat: Reading times by saphjra · Pull Request #58 · MultiplEYE-COST/multipleye-preprocessing

saphjra · 2026-02-19T09:01:45Z

minor adjustments to the bugs we found yesterday: dynamic matching of lab number and an additional eyelink constant variety

…able to parse data from same county but different labs

…the future

…_init_.py for the datacollection subfolder, further adapted the _parse_from_message method in multipleye_data_collection.py

… and page

cbueth

Hey @saphjra, thanks for the changes! I have reviewed the PR and have some suggestions and questions.
The PR adds the foundation for reading times parsing, but the actual usage of _parse_messages() with the new regex patterns hasn't been connected yet.

To my understanding, currently, the codebase parses twice: once via pymovements (for gaze data) and once manually via _parse_asc() in multipleye_data_collection.py:616. Would it be possible to use these already parsed messages? The parsed messages are in a DataFrame with the message and the timestamp.

What would be needed to get them is to add messages=True in preprocessing/io/load.py: gaze = pm.gaze.from_asc(asc_file, patterns=[...], messages=True, ...). Then gaze.messages is a polars DataFrame with 'time' and 'content' columns.

I suppose then in preprocessing/data_collection/multipleye_data_collection.pyz they could be converted to dict, if really needed. More interesting is then just to go though the messages and look for the message contents. This should be possible more efficiently than fully parsing the whole file. In that way not much processing time will be added to by this PR.

Your PR includes the preprocessing notebook .ipynb, if there are not specific changes needed for every user, it would be nice to keep it unchanged. Same goes for the configuration yaml.

While I know these changes have been coordinated with @theDebbister and already discussed in person, it would be nice to have a more descriptive PR name and an extended description with a simple change description (can be quite dry). Before the title was Feat/rm frames, same as the branch name, not mentioning the lab number bug fix nor the message parsing.

I am happy to hear back from you 👏

cbueth · 2026-03-03T15:12:22Z

preprocessing/data_collection/multipleye_data_collection.py

+                pass
+                # breaks_df = pd.DataFrame(breaks)
+                # breaks_df.to_csv(
+                #    result_folder / f"breaks_{session_identifier}.tsv",
+                #    sep="\t",
+                #    index=False,
+                # )


Why is this code commented out? Rather remove or use.

cbueth · 2026-03-03T15:13:30Z

preprocessing/data_collection/multipleye_data_collection.py

+
+        return breaks_df
+
+    def _parse_messages(self, session_identifier: str):


Is the new function used anywhere? It would be nice to have the functionality be integrated into the preprocessing pipeline.

cbueth · 2026-03-03T15:14:14Z

preprocessing/data_collection/multipleye_data_collection.py

+        )
+        return rt_df
+
+    def _create_empty_break_frame(self, session_identifier: str):


I am afraid this function is used anywhere. What is it for?

cbueth · 2026-03-03T15:14:37Z

preprocessing/constants.py

 SACCADE = "saccade"

 # Regular Expressions
+MESSAGE_REGEX_NEW = re.compile(r"(?P<message>.*)")


MESSAGE_REGEX_NEW is defined but never used.

cbueth · 2026-03-03T15:15:54Z

multipleye_settings_preprocessing.yaml

@@ -1,13 +1,13 @@
-data_collection_name: "MultiplEYE_SQ_CH_Zurich_1_2025"
+data_collection_name: "MultiplEYE_CS_CZ_Prague_2_2026"


The setting yaml should not be committed if not relevant for other users.

cbueth · 2026-03-03T15:19:09Z

preprocessing.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "multipleye._parse_messages(idf)"


It would be nice to have the message parsing integrated in the existing pipeline.

This is the only place ._parse_messages(idf) is used, right? The function name _parse_messages() with the underscore points to a private function, these should not be used by the user.

To me it looks like quite some addition to the pipeline notebook. Are these changes relevant to all. users? This notebook is the one the users are using, so please double check that this is the place we want this feature to land. If not, please undo.

cbueth · 2026-03-03T15:21:37Z

preprocessing/data_collection/__init__.py

 """Data collection submodule of the preprocessing module."""

 from .multipleye_data_collection import MultipleyeDataCollection
+from .merid_data_collection import Merid


This looks like a useful export to me, at least for Merid users 👍

saphjra added 3 commits February 19, 2026 09:36

added additional eyelink name variety to constant.py

4c8d758

made the hard coded labnum in line 234 of stimulus.py variable to be …

5b4688f

…able to parse data from same county but different labs

made the hard coded labnum in line 234 of stimulus.py variable to be …

5071b76

…able to parse data from same county but different labs

saphjra requested a review from theDebbister February 19, 2026 09:01

saphjra added 3 commits February 19, 2026 16:36

create new function _parse_messages to substitute _parse_from_Asc in …

d028176

…the future

adapted run_merid_preprocessing.py to new workflow, changed also the …

842d0e3

…_init_.py for the datacollection subfolder, further adapted the _parse_from_message method in multipleye_data_collection.py

changed _parse_messages to update dataframe by matching stimulus name…

7899586

… and page

cbueth changed the title ~~Feat/rm frames~~ feat: Reading times Mar 3, 2026

cbueth reviewed Mar 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Reading times#58

feat: Reading times#58
saphjra wants to merge 6 commits intomainfrom
feat/rm_frames

saphjra commented Feb 19, 2026

Uh oh!

cbueth left a comment

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

cbueth Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		return breaks_df

		def _parse_messages(self, session_identifier: str):

		@@ -1,13 +1,13 @@
		data_collection_name: "MultiplEYE_SQ_CH_Zurich_1_2025"
		data_collection_name: "MultiplEYE_CS_CZ_Prague_2_2026"

Conversation

saphjra commented Feb 19, 2026

Uh oh!

cbueth left a comment

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

cbueth Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants