
feat: Reading times #58

Open

saphjra wants to merge 6 commits into main from feat/rm_frames

Conversation


@saphjra commented Feb 19, 2026

Minor adjustments for the bugs we found yesterday: dynamic matching of the lab number and an additional EyeLink constant variety.

@saphjra requested a review from @theDebbister, February 19, 2026 09:01
…_init_.py for the datacollection subfolder, further adapted the _parse_from_message method in multipleye_data_collection.py
@cbueth changed the title from Feat/rm frames to feat: Reading times, Mar 3, 2026

@cbueth left a comment


Hey @saphjra, thanks for the changes! I have reviewed the PR and have some suggestions and questions.
The PR adds the foundation for reading times parsing, but the actual usage of _parse_messages() with the new regex patterns hasn't been connected yet.

To my understanding, the codebase currently parses twice: once via pymovements (for gaze data) and once manually via _parse_asc() in multipleye_data_collection.py:616. Would it be possible to reuse the already parsed messages? They are available in a DataFrame with the message and the timestamp.

What would be needed to get them is to add messages=True in preprocessing/io/load.py: gaze = pm.gaze.from_asc(asc_file, patterns=[...], messages=True, ...). Then gaze.messages is a polars DataFrame with 'time' and 'content' columns.

I suppose that in preprocessing/data_collection/multipleye_data_collection.py they could then be converted to a dict, if really needed. More interesting is to simply go through the messages and look for the relevant message contents. This should be more efficient than fully parsing the whole file again, so this PR would add barely any processing time.
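To illustrate the suggestion: once pymovements has already extracted the (time, content) message pairs, the reading times can be derived with a single pass over them rather than re-parsing the .asc file. The sketch below assumes hypothetical onset_reading/offset_reading message contents; the real MultiplEYE messages will differ.

```python
import re

# Hypothetical (time, content) message pairs, as they might come out of
# gaze.messages; the actual MultiplEYE message formats differ.
messages = [
    (1000, "SYNCTIME"),
    (1200, "onset_reading trial_1"),
    (5400, "offset_reading trial_1"),
    (6000, "onset_reading trial_2"),
    (9100, "offset_reading trial_2"),
]

# Scan the already-parsed pairs instead of re-reading the whole file.
pattern = re.compile(r"(?P<event>onset|offset)_reading (?P<trial>\w+)")
bounds = {}
for time, content in messages:
    match = pattern.fullmatch(content)
    if match:
        bounds.setdefault(match["trial"], {})[match["event"]] = time

# Reading time per trial = offset timestamp minus onset timestamp.
reading_times = {
    trial: b["offset"] - b["onset"] for trial, b in bounds.items()
}
print(reading_times)  # {'trial_1': 4200, 'trial_2': 3100}
```

This is linear in the number of messages and touches only the message list, so the gaze samples never have to be parsed a second time.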


Your PR includes the preprocessing notebook (.ipynb); if there are no changes needed by every user, it would be nice to keep it unchanged. The same goes for the configuration YAML.


While I know these changes have been coordinated with @theDebbister and already discussed in person, it would be nice to have a more descriptive PR title and an extended description with a simple summary of the changes (can be quite dry). Before, the title was Feat/rm frames, the same as the branch name, mentioning neither the lab number bug fix nor the message parsing.

I am happy to hear back from you 👏

Comment on lines +1159 to +1165
pass
# breaks_df = pd.DataFrame(breaks)
# breaks_df.to_csv(
# result_folder / f"breaks_{session_identifier}.tsv",
# sep="\t",
# index=False,
# )


Why is this code commented out? Either remove it or use it.
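If the export is wanted, the commented-out block could be re-enabled; a standard-library equivalent (a sketch — breaks being a list of dicts and result_folder a pathlib.Path are assumptions, and the real code uses pandas) would be:

```python
import csv
import tempfile
from pathlib import Path

def write_breaks(breaks: list, result_folder: Path, session_identifier: str) -> Path:
    """Write collected breaks to a tab-separated file,
    mirroring the commented-out pandas to_csv call."""
    out_path = result_folder / f"breaks_{session_identifier}.tsv"
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(breaks[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(breaks)
    return out_path

# Usage sketch with made-up break records:
breaks = [{"start": 1000, "end": 2000}, {"start": 5000, "end": 6000}]
path = write_breaks(breaks, Path(tempfile.mkdtemp()), "demo_session")
```

Keeping it as a function (rather than dead commented-out code) also makes the break export testable on its own.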


return breaks_df

def _parse_messages(self, session_identifier: str):


Is the new function used anywhere? It would be nice to have the functionality integrated into the preprocessing pipeline.

)
return rt_df

def _create_empty_break_frame(self, session_identifier: str):


I am afraid this function is not used anywhere. What is it for?

SACCADE = "saccade"

# Regular Expressions
MESSAGE_REGEX_NEW = re.compile(r"(?P<message>.*)")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MESSAGE_REGEX_NEW is defined but never used.
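For context on why the unused pattern matters: as written, the single named group (?P&lt;message&gt;.*) captures any line whole, so it only adds value once it is wired into the parsing and made more specific. An illustrative sketch (the second pattern is hypothetical, not from the codebase):

```python
import re

# Same pattern as MESSAGE_REGEX_NEW: one named group capturing the whole line.
MESSAGE_REGEX_NEW = re.compile(r"(?P<message>.*)")

match = MESSAGE_REGEX_NEW.match("MSG 1200 onset_reading trial_1")
print(match.group("message"))  # MSG 1200 onset_reading trial_1

# Because ".*" accepts anything, a more specific pattern is needed to
# actually distinguish message types (hypothetical example):
onset = re.compile(r"MSG\s+(?P<time>\d+)\s+(?P<message>onset_\w+)")
m = onset.match("MSG 1200 onset_reading")
print(m.group("time"), m.group("message"))  # 1200 onset_reading
```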

@@ -1,13 +1,13 @@
data_collection_name: "MultiplEYE_SQ_CH_Zurich_1_2025"
data_collection_name: "MultiplEYE_CS_CZ_Prague_2_2026"

The settings YAML should not be committed if it is not relevant for other users.

"metadata": {},
"outputs": [],
"source": [
"multipleye._parse_messages(idf)"


It would be nice to have the message parsing integrated in the existing pipeline.

This is the only place ._parse_messages(idf) is used, right? The name _parse_messages(), with the leading underscore, indicates a private function; these should not be called by users.

To me this looks like quite a large addition to the pipeline notebook. Are these changes relevant to all users? This notebook is the one users actually run, so please double-check that this is the place we want this feature to land. If not, please undo.
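One way to respect the underscore convention while keeping the notebook call is a thin public wrapper around the private helper. A toy sketch, not the real class (the method body here is a placeholder):

```python
class MultipleyeDataCollection:
    # ... existing attributes and methods elided ...

    def _parse_messages(self, session_identifier: str):
        """Private helper holding the actual message-parsing implementation."""
        return f"parsed messages for {session_identifier}"  # placeholder

    def parse_messages(self, session_identifier: str):
        """Public entry point; notebooks and user code call this instead."""
        return self._parse_messages(session_identifier)

multipleye = MultipleyeDataCollection()
result = multipleye.parse_messages("idf")
print(result)  # parsed messages for idf
```

The wrapper keeps the private implementation free to change while giving the notebook a stable, documented name to call.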

"""Data collection submodule of the preprocessing module."""

from .multipleye_data_collection import MultipleyeDataCollection
from .merid_data_collection import Merid


This looks like a useful export to me, at least for Merid users 👍
