| layout | default |
|---|---|
| title | Home |
<p>
Machine Learning Engineer with a <strong>Master’s in Visual Computing from Saarland University</strong> and
research experience at <strong>DFKI (German Research Center for Artificial Intelligence)</strong>.
I design and build <strong>end-to-end machine learning systems</strong> for real-world applications in
<strong>computer vision</strong>, <strong>medical imaging</strong>, and
<strong>multimodal generative models</strong>.
</p>
<p>
Currently seeking roles in <strong>Machine Learning Engineering</strong> and
<strong>Computer Vision</strong>.
</p>
<p>
<a href="mailto:amribrahim.amer@gmail.com">Email</a> ·
<a href="https://www.linkedin.com/in/amr-amer-2023-cs/">LinkedIn</a> ·
<a href="https://github.com/amramer">GitHub</a> ·
<a href="/assets/Amr_Amer_CV.pdf" download>Download CV</a>
|
<span style="
display: inline-flex;
align-items: center;
gap: 6px;
background: transparent;
color: #22c55e;
border: 1px solid #22c55e;
border-radius: 999px;
padding: 3px 11px;
font-size: 12px;
font-weight: 500;
vertical-align: middle;
">
<span style="
width: 7px;
height: 7px;
border-radius: 50%;
background: #22c55e;
display: inline-block;
animation: otw-pulse 2s infinite;
"></span>
Open to work
</span>
</p>
<style>
@keyframes otw-pulse {
0% { box-shadow: 0 0 0 0 rgba(34,197,94,0.45); }
70% { box-shadow: 0 0 0 7px rgba(34,197,94,0); }
100% { box-shadow: 0 0 0 0 rgba(34,197,94,0); }
}
</style>
- Design and train deep learning models for real-world computer vision problems
- Build end-to-end ML pipelines: data → training → evaluation → deployment & visualization
- Optimize models through systematic experimentation and hyperparameter tuning
- Develop interactive demos and dashboards for analysis and presentation
Multimodal generative model for generating realistic listener avatars in dyadic conversations, conditioned on personality traits.
The model predicts facial expressions, head motion, and upper-body gestures of a listener from the speaker’s audio and motion signals.
• • •
Human conversations rely heavily on non-verbal cues such as facial expressions, eye gaze, and body movement.
Most prior work does not model how these behaviors vary with personality traits.
• • •
Developed a multimodal generative architecture combining:
- Transformer encoder for temporal modeling
- VQ-VAE for quantized motion representation learning
- SMPL-X body parameters for realistic face and upper-body avatar generation
- Integration of audio, facial motion, and body gesture signals
The model generates personality-aware listener behavior sequences in response to a speaker.
• • •
Quantitative evaluation:
- Face: FID 6.15, P-FID 10.31
- Upper Body: FID 43.16, P-FID 87.73
User study:
- Participants correctly identified extroverted vs introverted avatars in 86% of cases
- 71% preference for the personality-aware model over the personality-agnostic baseline
• • •
PyTorch · Transformers · VQ-VAE · SMPL-X (PIXIE)
Librosa · OpenCV · CUDA · TensorBoard
SLURM · Enroot · Multi-GPU Training (A100)
🌐 Website
Thesis Website
💻 Repository
GitHub Repository
📄 Thesis Document
Read the Full Master’s Thesis
Badminton-VisionAI is an end-to-end computer vision pipeline that converts badminton match videos into structured performance analytics for players and coaches.
The system tracks both players and the shuttlecock, detects shot events, projects motion onto a mini-court representation, and generates a downloadable coach-style performance report.
• • •
Tracking & On-Court Intelligence
Player tracking • Shuttle tracking • Shot detection • Court mapping
Analytics Dashboard (Streamlit)
Player statistics • Heatmaps • Shot analysis • Coach-ready report
• • •
Badminton tracking presents several challenges for computer vision systems:
- Extremely small and fast shuttlecock
- Frequent player occlusions
- Rapid direction changes that break naive trackers
This project combines object detection, multi-object tracking, geometric projection, and temporal analytics to produce coach-readable match insights, not just raw detections.
• • •
- Stable multi-object player tracking
- Shuttlecock trajectory tracking with speed estimation
- Shot type classification with temporal stabilization
- Mini-court projection using homography
- Interactive analytics dashboard for match analysis
- Automated coach report with movement and shot statistics
• • •
Computer Vision: Python · OpenCV · YOLO · Multi-Object Tracking
Geometry & Tracking: Court Homography · Trajectory Projection
Event Analysis: Shot Detection · Temporal Label Stabilization
Visualization: Streamlit · Plotly
Reporting: ReportLab PDF Export
• • •
-
🌐 Live Demo (Streamlit App)
Badminton-VisionAI Live Demo -
💻 Source Code Repository
GitHub Repository – Badminton-VisionAI -
🎥 Pipeline Demo – Player Tracking & Shot Analysis
Watch the System Pipeline Video -
📊 Analytics Dashboard Demo – Match Insights & Heatmaps
Watch the Analytics Dashboard Video
• • •
This project reflects my interest in real-world computer vision systems that transform model outputs into actionable insights.
End-to-end semantic segmentation of urban street scenes for autonomous driving perception.
- Dataset: BDD100K
- Task: Multi-class semantic segmentation
- Classes: Road, Traffic Light, Traffic Sign, Vehicle, Person, Bicycle, Background
- Evaluation: mean Intersection over Union (mIoU)
- Results: Achieved mIoU ≈ 0.45, with strong performance on dominant classes
(Road: ~0.88, Vehicle: ~0.78) - Optimization: Systematic hyperparameter optimization via experiment sweeps
- Pipeline: data preparation → training → evaluation → deployment & visualization
Repository:
Github repo.
Tech Used: PyTorch · Fastai · Semantic Segmentation · Hyperparameter Optimization · Experiment Tracking
Multi-label 3D semantic segmentation of glioma sub-regions from volumetric MRI scans.
- Dataset: Medical Decathlon / BraTS (multi-modal MRI)
- Model: 3D SegResNet (MONAI)
- Input Modalities: FLAIR · T1 · T1Gd · T2
- Target Structures: Whole Tumor (WT), Tumor Core (TC), Enhancing Tumor (ET)
- Training & Evaluation: Dice Loss and Mean Dice
- Results: Mean Dice = 0.78 on validation (WT: 0.90, TC: 0.82, ET: 0.59)
- Pipeline: preprocessing → training → inference → deployment & visualization
Repository:
Github repo.
Tech Used: PyTorch · MONAI · 3D SegResNet · Multi-modal MRI
3D Medical Image Transforms · Sliding Window Inference · Experiment Tracking (W&B)
Exploratory project focused on applying vision–language models in realtime webcam-based settings.
- Realtime image captioning and visual question answering using BLIP via Hugging Face Transformers
- Image classification using ResNet-50 pretrained on ImageNet
- Realtime Gradio-based webcam application for live image captioning and classification with adjustable inference parameters
The project is implemented as a set of Jupyter notebooks, progressing from offline image understanding to a realtime vision application.
Repository:
Github repo.
Tech Used:
Python · PyTorch · Torchvision · Hugging Face Transformers · Gradio · PIL · NumPy
Conversational AI system enabling real-time voice and text interaction through a custom web-based interface.
-
Speech-to-text and text-to-speech for natural dialogue
-
LLM-powered conversational reasoning
-
Human-like conversational avatar integrated into the UI
-
Modular frontend–backend design for extensibility
Tech Used: Python · OpenAI GPT · Speech-to-Text · Text-to-Speech · HTML · CSS · JavaScript · Bootstrap · jQuery
Status: 🚧 Ongoing
| Area | Skills & Tools |
|---|---|
| Computer Vision Tasks | Object Detection · Segmentation · Tracking · Video Motion Analysis · Digital Avatar Generation |
| Models & Frameworks | PyTorch · Tensorflow · OpenCV · YOLO · SAM · CNNs · Vision Transformers (ViT) · TrackNet · Supervision |
| Training & Evaluation | Transfer Learning · Fine-tuning · Loss Design · Metric Selection · Hyperparameter Optimization · Multi-GPU Training (CUDA) |
| Data & Experimentation | Dataset Preparation · Data Augmentation · Efficient Data Loading · Weights & Biases · TensorBoard · Ablation Studies |
| Deployment & Inference | Docker · GPU Inference Pipelines · Batch & Real-time Inference · AWS (EC2 GPU) · Streamlit (Demos) |
| Compute & Infrastructure | SLURM (HPC) · GPU Job Scheduling · Multi-node Training · Large-scale GPU Experiments |
| Software Engineering Foundations | Python · C++. Object-Oriented Design · Version Control (Git) · Debugging · Logging · Unit Testing |
Outside of my professional work, I enjoy aerial photography and videography using drones. This section highlights selected visual projects created in my free time.
<iframe width="45%" src="https://player.vimeo.com/video/1152190588" frameborder="0" allowfullscreen></iframe> <iframe width="45%" src="https://player.vimeo.com/video/1152190476" frameborder="0" allowfullscreen></iframe> <iframe width="45%" src="https://player.vimeo.com/video/1152190673" frameborder="0" allowfullscreen></iframe>
View more Drone videos on Vimeo →
If you're hiring or would like to collaborate:
Email: amribrahim.amer@gmail.com
LinkedIn: https://www.linkedin.com/in/amr-amer-2023-cs/





