
Jupyter notebooks for the "Data and Text Processing for Health and Life Sciences" book, covering Unix shell basics, text manipulation, and data processing workflows. Run them instantly in Google Colab - no local setup required. Licensed under CC BY 4.0.


lasigeBioTM/data-text-processing-notebooks


Data and Text Processing for Health and Life Sciences - Jupyter Notebooks

Interactive Jupyter notebooks accompanying the book "Data and Text Processing for Health and Life Sciences". Each notebook is a hands-on, step-by-step tutorial demonstrating how Unix shell scripting can be used to find, retrieve, and process biomedical data and text.

[Figure: visual summary of the book "Data and Text Processing for Health and Life Sciences"]

Note: The notebooks include a fix for the ChEBI 2.0 web interface, which currently lacks detailed cross-references on individual entry pages.


Contents

| Folder | Description |
|---|---|
| notebooks/ | Jupyter notebooks (one per tutorial) |
| data/ | Input and output data files used by the notebooks |
| scripts/ | Shell scripts created during the tutorials |

Tutorials

| # | Notebook | Google Colab | Topics Covered |
|---|---|---|---|
| 01 | unix-shell | Open In Colab | Unix basics: ls, pwd, head, cat, piping. Environment setup for ChEBI retrieval. |
| 02 | data-retrieval | Open In Colab | curl with EBI APIs. Download UniProt cross-references (CSV/XML). Build getdata.sh. |
| 03 | data-extraction | Open In Colab | grep filtering (HUMAN/RAT/MOUSE), cut for column selection. Build getproteins.sh. |
| 04 | task-repetition | Open In Colab | Loops, xargs, and parallel for batch processing. |
| 05 | xml-processing | Open In Colab | xmllint with XPath queries on UniProt XML. Extract PubMed IDs. |
| 06 | text-retrieval | Open In Colab | RDF publication data (UniProt/NCBI). Extract titles and abstracts. |
| 07 | text-processing | Open In Colab | Pattern matching, regular expressions, tokenization, and sentence splitting. |
| 08 | semantic-processing | Open In Colab | OWL ontologies (ChEBI, DOID), URI/label conversion, synonyms, NER with the MER tool. |
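The filtering and repetition steps in tutorials 03 and 04 follow a common shell pattern: select rows with grep, keep one column with cut, then hand each value to a command through xargs. The following is a minimal sketch, assuming a made-up two-column CSV (accession,entry name). The sample rows and the function names are hypothetical, and `echo` stands in for the real per-item work done in the notebooks.

```shell
#!/usr/bin/env bash
# Hypothetical sample of UniProt cross-references (accession,entry_name),
# made up for illustration; the real files in data/ may use other columns.
sample_xrefs() {
  cat <<'EOF'
P00001,ABC1_HUMAN
P00002,ABC1_MOUSE
P00003,XYZ2_YEAST
P00004,DEF3_RAT
EOF
}

# Tutorial 03 style: keep only human, rat and mouse rows (extended regex
# anchored at the end of the line), then select the first column with cut.
filter_accessions() {
  grep -E '_(HUMAN|RAT|MOUSE)$' | cut -d ',' -f 1
}

# Tutorial 04 style: run one command per accession with xargs.
# -n 1 passes a single argument per invocation; echo is a placeholder
# for whatever per-accession processing a real script would do.
process_each() {
  xargs -n 1 echo "processing"
}

sample_xrefs | filter_accessions | process_each
```

Running it prints one `processing <accession>` line for each human, mouse, or rat entry (P00001, P00002, P00004); the yeast row is dropped by the grep pattern.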

Running the Notebooks

Option 1 - Google Colab

You can open any notebook manually:

  1. Go to Google Colab
  2. File -> Open notebook -> GitHub tab
  3. Paste the repository URL: https://github.com/lasigeBioTM/data-text-processing-notebooks
  4. Select a notebook from the notebooks/ folder and click Open
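Colab can also open a notebook stored on GitHub directly from a URL of the form `https://colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>`, which reaches the same page as the manual steps above. A small helper sketch, assuming the default branch is `main`; the helper name `colab_url` and the notebook file name in the example are hypothetical, so check the notebooks/ folder for the real file names.

```shell
#!/usr/bin/env bash
# Build a Colab link for a notebook in this repository.
# Assumptions: the default branch is "main" (override with a 2nd argument),
# and the path passed in is relative to the repository root.
colab_url() {
  local path="$1"
  local branch="${2:-main}"   # assumed default branch
  local repo="lasigeBioTM/data-text-processing-notebooks"
  printf 'https://colab.research.google.com/github/%s/blob/%s/%s\n' \
    "$repo" "$branch" "$path"
}

# Example with a hypothetical file name.
colab_url "notebooks/01-unix-shell.ipynb"
```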

Option 2 - Local Jupyter

git clone https://github.com/lasigeBioTM/data-text-processing-notebooks
cd data-text-processing-notebooks
jupyter notebook notebooks/

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

