Single-cell sequencing technologies are powerful tools for assessing genomic, transcriptomic, and proteomic information at the single-cell level. In recent years, the application of techniques based on single-cell sequencing has become increasingly common in several areas of research, including medicine, agriculture, and other life sciences disciplines. Single-cell sequencing may be used to study many aspects of an organism’s biology, both in health and disease, and the results of these studies contribute immensely to advancing the understanding of organisms as a whole.
The establishment of collaborative scientific endeavors like the Human Cell Atlas or the LatinCells Project is a testament to the surging enthusiasm and curiosity in this domain. Yet, when we look towards Latin America, we find a gap in the necessary infrastructure, financial support, and subject matter expertise required to harness these cutting-edge technologies. Recognizing this, our workshop is designed to bridge this gap. We provide participants with hands-on experience in the laboratory and in-depth bioinformatics training, ensuring that the region advances in its capabilities with single-cell methodologies.
Our notebooks are available in multilingual versions and can be accessed in several ways:
You can run the notebooks directly in your browser using Google Colab, with no need to install anything locally.
Just follow our step-by-step multilingual tutorial to learn how to:
- Manually upload .ipynb files
- Clone the GitHub repository and open notebooks directly in Colab
Some notebooks with many embedded images may not render properly on GitHub. For full functionality, we recommend opening them directly in Colab or accessing our official website.
If you prefer to work offline or want a fully configured environment, you can run the notebooks using Docker.
Check out our Docker tutorial for detailed instructions on:
- Pulling the official image from DockerHub
- Mounting local directories to save outputs and access reference files
- Launching the Jupyter interface locally
If you prefer a simpler way to browse the notebooks without installing anything or creating an account, you can access them directly through our official website. On the site you will find:
- The same content available in the repository, organized by module and available in three languages (English, Spanish, and Portuguese)
- Code cells displayed in a copy‑ready format for easy transfer to your own environment
- Easier loading and browsing of information, with no login required
Please note: scNotebooks cannot be executed directly on the website; they are provided for browsing and copying code only.
If you prefer to see the executed results directly, you can access the notebooks in PDF format. In the PDFs you will find:
- All code cells already run, with outputs displayed alongside the code
- Figures, plots, and tables rendered in their final form
- A static, easy‑to‑browse format that does not require installation or login
Please note: PDFs are currently available only in English, but they can be easily translated using any online translation tool. They are read‑only and cannot be edited or re‑executed, serving as a resource for reviewing outputs and understanding the workflow without running the code yourself.
We value continuous improvement and collaboration. To support learners and researchers, we maintain a dedicated space in GitHub Discussions, where you can engage with us directly:
- Ask Questions: Clarify workflows, tools, or concepts presented in the notebooks
- Report Errors: Help us identify and correct mistakes to improve the material
- Request Help: Share challenges and receive guidance from contributors and the community
Our GitHub forum is linked from the official site, providing an open channel for communication and collective learning.
Jupyter Notebooks and Google Colaboratory provide interactive environments that combine code and explanatory text, supporting reproducible analysis. In this module, learners will explore their structure, including code and text cells, and gain familiarity with key public databases for single‑cell and gene expression data across humans and other organisms. Hands‑on exercises guide users through accessing, exploring, and analyzing these resources, building essential skills in biological data manipulation.
This notebook contains many embedded images and may not render properly on GitHub. For full functionality, we recommend opening it directly in Colab or viewing it on our official website.
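To make the idea of code and text cells concrete, here is a minimal sketch of how a notebook file is laid out on disk. An `.ipynb` file is JSON, and each entry in its `cells` list carries a `cell_type`. The toy notebook below is written inline for illustration and is not one of the course notebooks; the function name `summarize_notebook` is hypothetical.

```python
import json
from collections import Counter

# A tiny notebook in the nbformat 4 JSON layout, written inline so the
# example does not depend on any file from the course repository.
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {},
     "source": ["# Toy notebook\\n", "Some text."]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["x = 1 + 1\\n", "print(x)"]},
    {"cell_type": "markdown", "metadata": {},
     "source": ["Closing remarks."]}
  ]
}
"""


def summarize_notebook(text):
    """Count cells by type and collect the source of every code cell."""
    notebook = json.loads(text)
    cells = notebook.get("cells", [])
    counts = Counter(cell["cell_type"] for cell in cells)
    code_sources = []
    for cell in cells:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # "source" may be a single string or a list of lines.
            code_sources.append(source if isinstance(source, str) else "".join(source))
    return dict(counts), code_sources


counts, code_sources = summarize_notebook(NOTEBOOK_JSON)
print(counts)           # number of cells per type
print(code_sources[0])  # text of the first code cell
```

The same function should work on a real notebook by passing the contents of the file read as UTF-8 text.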
This module introduces the R programming language, widely used in data science and bioinformatics for statistical analysis and data manipulation. Learners will explore the R environment, basic syntax, and core data structures such as vectors and data frames. The module also presents the ggplot2 package, a powerful tool for creating elegant and customizable visualizations using the grammar of graphics. Through hands-on exercises, users will practice writing R code, creating plots, and interpreting biological data, building a strong foundation for future analytical tasks.
This notebook introduces essential command-line operations in Linux, covering fundamental commands that are broadly applicable across programming languages with minimal adaptations. These foundational skills will support efficient data management and analysis in computational biology. Additionally, we will explore the key steps in processing raw sequencing reads into count matrices using Cell Ranger, discussing its main outputs and role in single-cell transcriptomics.
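As a rough illustration of what a count matrix looks like once reads have been processed, the sketch below parses a small sparse matrix in Matrix Market coordinate format, the format Cell Ranger uses for its `matrix.mtx.gz` output (features in rows, barcodes in columns). The matrix, gene names, and barcodes are toy values invented for this example, and `read_mtx` is a hypothetical helper rather than part of any tool mentioned here.

```python
import io

# Toy matrix: 3 genes x 4 cells with 5 non-zero entries (1-based indices).
MTX_TEXT = """%%MatrixMarket matrix coordinate integer general
%toy example, not real data
3 4 5
1 1 2
1 3 1
2 2 4
3 1 1
3 4 7
"""
FEATURES = ["GeneA", "GeneB", "GeneC"]          # hypothetical gene names
BARCODES = ["cell1", "cell2", "cell3", "cell4"]  # hypothetical barcodes


def read_mtx(handle):
    """Read a Matrix Market coordinate file into a dense list of rows."""
    lines = [line.strip() for line in handle]
    # Lines starting with "%" are the header or comments.
    data_lines = [line for line in lines if line and not line.startswith("%")]
    n_rows, n_cols, n_entries = (int(v) for v in data_lines[0].split())
    if len(data_lines) - 1 != n_entries:
        raise ValueError("number of entries does not match the size line")
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for line in data_lines[1:]:
        row, col, value = line.split()
        matrix[int(row) - 1][int(col) - 1] = int(value)
    return matrix


matrix = read_mtx(io.StringIO(MTX_TEXT))

# Total counts per cell: sum each column across all genes.
umis_per_cell = {
    barcode: sum(matrix[g][c] for g in range(len(FEATURES)))
    for c, barcode in enumerate(BARCODES)
}
print(umis_per_cell)
```

For a real, gzip-compressed file, `gzip.open(path, "rt")` from the standard library can replace the `io.StringIO` handle. A dense list of lists is only practical for toy sizes; real matrices are kept sparse.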
Single‑cell RNA‑seq analysis requires a structured workflow to ensure data quality and biological interpretability. In this module, learners will use Seurat to import raw data, apply filtering, and perform preliminary visualization as part of quality control. Key steps include evaluating quality metrics, normalizing to reduce technical variability, and clustering cells by gene expression profiles to reveal underlying heterogeneity. Building on this foundation, users will conduct differential expression and abundance analysis, annotate cell types, and perform functional enrichment to uncover regulatory mechanisms and pathways involved in development and disease. The module also introduces practical strategies for identifying marker genes and removing ambient mRNA contamination, ensuring cleaner datasets and more reliable downstream results. Through these exercises, participants gain both conceptual understanding and hands‑on skills for comprehensive scRNA‑seq analysis.
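The quality-control and normalization steps described above can be sketched in a few lines. This is a conceptual illustration in plain Python, not the Seurat code used in the module: the counts are toy values, the thresholds `MIN_GENES` and `MAX_PCT_MITO` are arbitrary assumptions, and the normalization shown is the usual log-normalization idea (scale each cell to a fixed total, then apply log1p).

```python
import math

GENES = ["MT-CO1", "MT-ND1", "CD3E", "MS4A1", "LYZ"]
# Toy counts per cell, in the same order as GENES.
COUNTS = {
    "cell1": [1, 0, 5, 0, 4],
    "cell2": [6, 4, 0, 0, 0],  # all counts mitochondrial: likely low quality
    "cell3": [0, 1, 0, 8, 1],
    "cell4": [0, 0, 1, 0, 0],  # only one detected gene
}

MIN_GENES = 2        # assumed threshold, chosen for this toy example
MAX_PCT_MITO = 20.0  # assumed threshold, chosen for this toy example


def qc_metrics(counts, genes):
    """Return total counts, detected genes, and percent mitochondrial."""
    total = sum(counts)
    detected = sum(1 for c in counts if c > 0)
    mito = sum(c for c, g in zip(counts, genes) if g.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return total, detected, pct_mito


def log_normalize(counts, scale_factor=10_000):
    """Scale a cell to a fixed total and apply log1p."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


kept = []
normalized = {}
for cell, counts in COUNTS.items():
    total, detected, pct_mito = qc_metrics(counts, GENES)
    if detected >= MIN_GENES and pct_mito <= MAX_PCT_MITO:
        kept.append(cell)
        normalized[cell] = log_normalize(counts)

print(kept)
```

In practice, thresholds are chosen by inspecting the distribution of each metric in the dataset rather than fixed in advance.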
Data integration and batch correction are essential for reliable single‑cell analysis, ensuring that biological signals are not obscured by technical or donor‑specific variation. In this module, learners will investigate how differences in protocols, platforms, or sample origin generate batch effects, and how defining batch covariates influences integration outcomes. Practical exercises with Seurat and Harmony provide hands‑on experience in applying correction methods, tuning parameters, and evaluating integration quality. Benchmarking activities allow users to compare strategies, highlighting trade‑offs between reducing unwanted variation and preserving meaningful biological information. By combining theoretical concepts with applied workflows, participants gain the skills needed to select and implement effective integration approaches in diverse single‑cell studies.
This module explores how single‑cell RNA‑seq can be used to reconstruct cell‑state trajectories. Learners will study how gene expression changes dynamically during development and differentiation, and how computational tools infer these transitions. Hands‑on activities with Monocle3 demonstrate pseudotime ordering and trajectory inference, allowing users to trace developmental histories and interpret functional shifts in cellular states.
Cell–cell communication is fundamental for coordinating activities in multicellular systems, shaping processes such as development, immune response, and tissue homeostasis. In this module, learners will examine how signaling and interaction sustain these functions and explore strategies for inferring interactions from single‑cell gene expression data, with emphasis on curated ligand–receptor resources. Hands‑on activities with LIANA provide practical experience, applying multiple state‑of‑the‑art methods within a unified workflow to compare predictions and interpret biological relevance. Through these exercises, users gain both conceptual understanding and applied skills for studying cellular coordination in complex tissues.
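To give a feel for how interactions can be inferred from expression data, here is a minimal sketch of one very simple scoring scheme: the product of the mean ligand expression in a sender cluster and the mean receptor expression in a receiver cluster. This is an illustrative assumption, not a description of what LIANA computes. The expression values are toy numbers, and `score_interactions` is a hypothetical function.

```python
from itertools import product
from statistics import mean

# Toy normalized expression values per cluster and gene (two cells each).
EXPRESSION = {
    "Fibroblast": {
        "CXCL12": [4.0, 6.0], "CXCR4": [0.0, 0.0],
        "CCL5": [0.0, 1.0], "CCR5": [0.0, 0.0],
    },
    "T cell": {
        "CXCL12": [0.0, 0.0], "CXCR4": [3.0, 5.0],
        "CCL5": [2.0, 4.0], "CCR5": [1.0, 1.0],
    },
}
# Ligand-receptor pairs to test, as a stand-in for a curated resource.
LR_PAIRS = [("CXCL12", "CXCR4"), ("CCL5", "CCR5")]


def score_interactions(expression, lr_pairs):
    """Score sender/receiver pairs by mean ligand x mean receptor expression."""
    cluster_means = {
        cluster: {gene: mean(values) for gene, values in genes.items()}
        for cluster, genes in expression.items()
    }
    scores = []
    for sender, receiver in product(cluster_means, repeat=2):
        for ligand, receptor in lr_pairs:
            score = cluster_means[sender][ligand] * cluster_means[receiver][receptor]
            if score > 0:
                scores.append((sender, receiver, ligand, receptor, score))
    # Highest-scoring candidate interactions first.
    return sorted(scores, key=lambda record: record[-1], reverse=True)


ranked = score_interactions(EXPRESSION, LR_PAIRS)
for record in ranked:
    print(record)
```

Real methods add statistical testing, specificity measures, or handling of multi-subunit complexes, which is one reason comparing several methods on the same data is informative.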
Multimodal single‑cell analysis combines transcriptomic and protein measurements to provide a more comprehensive view of cellular states. In this module, learners will analyze umbilical cord blood mononuclear cells (CBMCs) using Seurat to explore relationships between RNA and surface protein expression. Working with RNA and antibody‑derived tag (ADT) count matrices, users investigate expression patterns and their biological implications. Hands‑on activities include downloading data from NCBI GEO and performing key analyses, reinforcing both theoretical concepts and practical skills.
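ADT counts are often normalized with a centered log-ratio (CLR) transform before they are compared with RNA expression. The sketch below shows one common textbook formulation for a single cell, with toy counts and an assumed pseudocount of 1. Packages differ in details such as how zeros are handled and which axis is normalized, so this should not be read as the exact transform applied in the module.

```python
import math

# Toy ADT counts for one cell; the values are invented for illustration.
ADT_COUNTS = {"CD3": 120, "CD19": 3, "CD14": 0, "CD56": 9}


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log value."""
    logs = {name: math.log(value + pseudocount) for name, value in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {name: value - mean_log for name, value in logs.items()}


clr_values = clr(ADT_COUNTS)
for protein, value in sorted(clr_values.items(), key=lambda item: item[1], reverse=True):
    print(f"{protein}\t{value:.3f}")
```

Because the values are centered, they sum to zero within the cell, so each protein is expressed relative to the others measured in that same cell.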
T cell receptor (TCR) profiling and CITE‑Seq are powerful techniques for single‑cell immunology, offering complementary insights into adaptive immune responses and cellular heterogeneity. In this module, learners will explore how TCR profiling reveals the diversity and specificity of T cell repertoires, while CITE‑Seq integrates transcriptomic and protein measurements to provide a multimodal view of cellular states. Core concepts are followed by hands‑on activities using computational tools, where participants apply these methods to study immune repertoires and interpret complex datasets. Through this approach, users gain both theoretical grounding and practical skills for investigating adaptive immunity in single‑cell research.
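Repertoire diversity can be summarized from clonotype frequencies. The following sketch computes clonotype proportions, Shannon entropy in bits, and a normalized clonality score for a toy set of cells. The clonotype labels are placeholders rather than real receptor sequences, and `repertoire_summary` is a hypothetical helper, not a function from the tools used in the module.

```python
import math
from collections import Counter

# One clonotype label per cell; labels are placeholders for illustration.
CLONOTYPE_PER_CELL = [
    "clonotype_A", "clonotype_A", "clonotype_A", "clonotype_A",
    "clonotype_B", "clonotype_B",
    "clonotype_C",
    "clonotype_D",
]


def repertoire_summary(labels):
    """Return clonotype frequencies, Shannon entropy (bits), and clonality."""
    counts = Counter(labels)
    n_cells = sum(counts.values())
    frequencies = {clone: n / n_cells for clone, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in frequencies.values())
    # Clonality is 1 minus normalized entropy: 0 means an even repertoire,
    # values near 1 mean a few clones dominate.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    clonality = 1.0 - entropy / max_entropy
    return frequencies, entropy, clonality


frequencies, entropy, clonality = repertoire_summary(CLONOTYPE_PER_CELL)
print(frequencies)
print(f"Shannon entropy: {entropy:.2f} bits, clonality: {clonality:.3f}")
```

An expanded clone, such as `clonotype_A` here, lowers the entropy and raises the clonality relative to a repertoire in which every clonotype is equally frequent.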
Spatial transcriptomics provides a way to map gene expression directly within tissues, revealing cell types, their distribution, and interactions in native contexts. In this module, learners will examine how spatially resolved profiles advance our understanding of development and disease. Practical activities with Seurat guide users through building an analysis pipeline, recovering gene expression across regions, and applying cell type deconvolution methods to interpret spatial organization.
Module 11 - An introduction to Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq):
scATAC‑seq enables the study of chromatin accessibility at single‑cell resolution, revealing regulatory regions such as enhancers and promoters and providing insights into epigenetic variation across cell types and disease states. In this module, learners will work with data from Kumegawa et al. (2022), which profiled over 10,000 cells from breast cancer subtypes and identified GRHL2 as a key transcription factor in endocrine resistance. Hands‑on activities with ArchR guide users through processing raw sequencing data, identifying accessible regions, analyzing transcription factor activity, and integrating scATAC‑seq with scRNA‑seq. Benchmarking exercises compare integration strategies, ensuring accurate interpretation of gene regulation mechanisms.
Alternative polyadenylation (APA) shapes transcript diversity by altering poly(A) site usage, with important implications for gene regulation. In this module, learners will use SCAPE‑APA, a specialized tool for single‑cell RNA‑seq data, to detect, quantify, and interpret APA events. The workflow introduces input formats, preprocessing steps, and strategies for visualizing APA dynamics across cell types. Guided exercises provide hands‑on experience applying SCAPE‑APA to real datasets, enabling users to extract biologically meaningful insights and develop practical skills for analyzing transcriptomic complexity at single‑cell resolution.
FAIR principles (Findable, Accessible, Interoperable, Reusable) provide the foundation for transparent and reproducible data sharing. In this module, learners will practice organizing metadata consistently and preparing submissions to public repositories such as GEO, SRA, SCEA, HCA Data Portal, and CellxGene. Emphasis is placed on understanding standard file formats and workflows, ensuring that datasets can be effectively reused and integrated across platforms.
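Consistent metadata is easier to maintain when it is checked automatically before submission. The sketch below validates a small tab-separated sample sheet for missing columns, empty values, and duplicated sample identifiers. The required field names are assumptions made for this example; each repository defines its own required fields and templates, which should take precedence.

```python
import csv
import io

# Assumed minimal field set for illustration, not a repository standard.
REQUIRED_FIELDS = ("sample_id", "organism", "tissue", "assay")

# Toy sample sheet with two deliberate problems.
METADATA_TSV = (
    "sample_id\torganism\ttissue\tassay\n"
    "S1\tHomo sapiens\tblood\tscRNA-seq\n"
    "S2\tHomo sapiens\t\tscRNA-seq\n"   # empty tissue
    "S1\tHomo sapiens\tblood\tscRNA-seq\n"  # duplicated sample_id
)


def check_metadata(text):
    """Return a list of human-readable problems found in the sample sheet."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [field for field in REQUIRED_FIELDS if field not in header]
    if missing:
        return [f"missing column: {field}" for field in missing]

    problems = []
    seen_ids = set()
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if not (row[field] or "").strip():
                problems.append(f"line {line_no}: empty {field}")
        sample_id = row["sample_id"]
        if sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id}")
        seen_ids.add(sample_id)
    return problems


problems = check_metadata(METADATA_TSV)
for problem in problems:
    print(problem)
```

A check like this could run before every submission, with `REQUIRED_FIELDS` replaced by the fields of the target repository's template.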
This comprehensive material is the result of collaborative efforts since 2021 and has been successfully employed in numerous courses organized by institutions such as the Human Cell Atlas, the LatinCells initiative, and Wellcome Connecting Sciences. We extend our heartfelt gratitude to all the individuals listed below, who have actively contributed to the development and refinement of this material over the years. Their dedication and expertise have been instrumental in making this resource valuable for the bioinformatics community.
We appreciate the continuous support and feedback from participants, mentors, and institutions that have made this endeavor possible. Together, we strive to advance the understanding and application of single-cell genomics in Latin America and the Caribbean.
List of Contributors - Listed Alphabetically:
- Adolfo Rojas-Hidalgo
- Alex K. Shalek
- Benilton S. Carvalho
- Bruno Vinagre
- Cesar A. Prada-Medina
- Cristóvão Antunes
- Daniela Russo
- Diego Pérez-Stuardo
- Emiliano Vicencio
- Erick Armingol
- Gabriela Rapozo
- Gerardo Munoz
- Joyce Karoline Silva
- Leandro Santos
- Mariana Boroni
- Natalia Tavares
- Orr Ashenberg
- Patricia Severino
- Raúl Arias-Carrasco
- Ricardo Khouri
- Sebastián Urquiza-Zurich
- Sergio Triana
- Vinicius Maracaja-Coutinho
- Yesid Cuesta-Astroz
This work is licensed under a Creative Commons Attribution 4.0 International License.
