Data and Text Processing for Health and Life Sciences - Jupyter Notebooks

Interactive Jupyter notebooks accompanying the book "Data and Text Processing for Health and Life Sciences". Each notebook is a hands-on, step-by-step tutorial demonstrating how Unix shell scripting can be used to find, retrieve, and process biomedical data and text.

Note: The notebooks include a workaround for the ChEBI 2.0 web interface, which currently lacks detailed cross-references on individual entry pages.

Tutorials

#	Notebook	Topics Covered
01	unix-shell	Unix basics: `ls`, `pwd`, `head`, `cat`, piping. Environment setup for ChEBI retrieval.
02	data-retrieval	`curl` with EBI APIs. Download UniProt cross-references (CSV/XML). Build `getdata.sh`.
03	data-extraction	`grep` filtering (HUMAN/RAT/MOUSE), `cut` for column selection. Build `getproteins.sh`.
04	task-repetition	Loops, `xargs`, and `parallel` for batch processing.
05	xml-processing	`xmllint` with XPath queries on UniProt XML. Extract PubMed IDs.
06	text-retrieval	RDF publication data (UniProt/NCBI). Extract titles and abstracts.
07	text-processing	Pattern matching, regular expressions, tokenization, and sentence splitting.
08	semantic-processing	OWL ontologies (ChEBI, DOID), URI/label conversion, synonyms, NER with the MER tool.
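
As a taste of the style used in notebook 03, the sketch below filters rows by species with `grep` and keeps a single column with `cut`. It is a minimal illustration, not the `getproteins.sh` script itself: the sample rows, the comma-separated column layout, and the helper names `make_sample` and `filter_species` are assumptions made for this example, and the files downloaded in the tutorials may be laid out differently.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for a downloaded cross-reference file.
# Assumed columns: accession, entry name, source database.
make_sample() {
  cat <<'EOF'
P12345,AATM_RABIT,UniProt
P04637,P53_HUMAN,UniProt
P02340,P53_MOUSE,UniProt
P10361,P53_RAT,UniProt
EOF
}

# Keep rows whose entry name ends in _HUMAN, _RAT, or _MOUSE,
# then keep only the first comma-separated column (the accession).
filter_species() {
  grep -E '_(HUMAN|RAT|MOUSE),' | cut -d, -f1
}

make_sample | filter_species
```

The filter reads from standard input, so the same function can be placed after `cat file.csv` or `curl` in a longer pipeline.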

Running the Notebooks

Option 1 - Google Colab

You can open any notebook manually:

Go to Google Colab
File -> Open notebook -> GitHub tab
Paste the repository URL: https://github.com/lasigeBioTM/data-text-processing-notebooks
Select a notebook from the notebooks/ folder and click Open

Option 2 - Local Jupyter

git clone https://github.com/lasigeBioTM/data-text-processing-notebooks
cd data-text-processing-notebooks
jupyter notebook notebooks/
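
Before launching the notebooks locally, it may help to confirm that the command-line tools they rely on are installed. The sketch below is an assumption-laden convenience, not part of the repository: the tool list is taken from the Tutorials table above and may be incomplete, and `missing_tools` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the name of every requested tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool names taken from the Tutorials table; extend the list as needed.
missing_tools curl grep cut xargs parallel xmllint jupyter
```

If the command prints nothing, all listed tools were found; any name it prints needs to be installed with your system's package manager.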

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Folder	Description
`notebooks/`	Jupyter notebooks (one per tutorial)
`data/`	Input and output data files used by the notebooks
`scripts/`	Shell scripts created during the tutorials
