Conversation
…able to parse data from same county but different labs
…able to parse data from same county but different labs
…_init_.py for the datacollection subfolder, further adapted the _parse_from_message method in multipleye_data_collection.py
cbueth
left a comment
There was a problem hiding this comment.
Hey @saphjra, thanks for the changes! I have reviewed the PR and have some suggestions and questions.
The PR adds the foundation for reading times parsing, but the actual usage of _parse_messages() with the new regex patterns hasn't been connected yet.
To my understanding, currently, the codebase parses twice: once via pymovements (for gaze data) and once manually via _parse_asc() in multipleye_data_collection.py:616. Would it be possible to use these already parsed messages? The parsed messages are in a DataFrame with the message and the timestamp.
What would be needed to get them is to add messages=True in preprocessing/io/load.py: gaze = pm.gaze.from_asc(asc_file, patterns=[...], messages=True, ...). Then gaze.messages is a polars DataFrame with 'time' and 'content' columns.
I suppose then in preprocessing/data_collection/multipleye_data_collection.pyz they could be converted to dict, if really needed. More interesting is then just to go though the messages and look for the message contents. This should be possible more efficiently than fully parsing the whole file. In that way not much processing time will be added to by this PR.
Your PR includes the preprocessing notebook .ipynb, if there are not specific changes needed for every user, it would be nice to keep it unchanged. Same goes for the configuration yaml.
While I know these changes have been coordinated with @theDebbister and already discussed in person, it would be nice to have a more descriptive PR name and an extended description with a simple change description (can be quite dry). Before the title was Feat/rm frames, same as the branch name, not mentioning the lab number bug fix nor the message parsing.
I am happy to hear back from you 👏
| pass | ||
| # breaks_df = pd.DataFrame(breaks) | ||
| # breaks_df.to_csv( | ||
| # result_folder / f"breaks_{session_identifier}.tsv", | ||
| # sep="\t", | ||
| # index=False, | ||
| # ) |
There was a problem hiding this comment.
Why is this code commented out? Rather remove or use.
|
|
||
| return breaks_df | ||
|
|
||
| def _parse_messages(self, session_identifier: str): |
There was a problem hiding this comment.
Is the new function used anywhere? It would be nice to have the functionality be integrated into the preprocessing pipeline.
| ) | ||
| return rt_df | ||
|
|
||
| def _create_empty_break_frame(self, session_identifier: str): |
There was a problem hiding this comment.
I am afraid this function is used anywhere. What is it for?
| SACCADE = "saccade" | ||
|
|
||
| # Regular Expressions | ||
| MESSAGE_REGEX_NEW = re.compile(r"(?P<message>.*)") |
There was a problem hiding this comment.
MESSAGE_REGEX_NEW is defined but never used.
| @@ -1,13 +1,13 @@ | |||
| data_collection_name: "MultiplEYE_SQ_CH_Zurich_1_2025" | |||
| data_collection_name: "MultiplEYE_CS_CZ_Prague_2_2026" | |||
There was a problem hiding this comment.
The setting yaml should not be committed if not relevant for other users.
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "multipleye._parse_messages(idf)" |
There was a problem hiding this comment.
It would be nice to have the message parsing integrated in the existing pipeline.
This is the only place ._parse_messages(idf) is used, right? The function name _parse_messages() with the underscore points to a private function, these should not be used by the user.
To me it looks like quite some addition to the pipeline notebook. Are these changes relevant to all. users? This notebook is the one the users are using, so please double check that this is the place we want this feature to land. If not, please undo.
| """Data collection submodule of the preprocessing module.""" | ||
|
|
||
| from .multipleye_data_collection import MultipleyeDataCollection | ||
| from .merid_data_collection import Merid |
There was a problem hiding this comment.
This looks like a useful export to me, at least for Merid users 👍
minor adjustments to the bugs we found yesterday: dynamic matching of lab number and an additional eyelink constant variety