AWS-certified Data Scientist & ML Engineer applying statistics, experimentation, and MLOps to turn messy data into decisions across banking, healthcare, and airline products. I design and deploy fraud detection, forecasting, and analytics solutions that reduce risk, unlock growth opportunities, and give stakeholders clear, actionable insights. I enjoy working end-to-end, from exploring raw data and shaping features to shipping production-grade ML pipelines and dashboards that teams actually use.
print(RakeshSarmaKarra.masters) | Data Science & Advanced Analytics
print(RakeshSarmaKarra.university_name) | University of North Texas
print(RakeshSarmaKarra.focus) | AI/ML Engineering, Data Science, Data Analytics, MLOps, Cloud Analytics
print(RakeshSarmaKarra.domains_worked) | Banking/Finance, Airlines, Healthcare
print(RakeshSarmaKarra.experience) | 6 years
print(RakeshSarmaKarra.location) | United States
Portfolio - Data & AI Projects: https://rakeshsarmakarra.github.io/
Linear & logistic regression, regularization (Ridge, Lasso)
Tree-based models (Decision Trees, Random Forest, XGBoost, LightGBM)
Clustering (K-Means), KNN, SVM, Naive Bayes, basic neural networks
Time series (ARIMA, SARIMA), anomaly detection, A/B testing
Feature engineering, cross-validation, hyperparameter tuning (grid/random search), SHAP
Exploratory data analysis, hypothesis testing, KPI design
Root cause analysis, recommendations, competitor analysis
Prescriptive analytics, business improvements
Agile/Scrum, user stories, project planning, milestones
Work breakdown structure, risk register, risk mitigation, communication plans
30–70% rule, quality management
Analytical decision making, problem solving
Data storytelling, written & verbal communication, team collaboration
- Developed and evaluated machine learning models in Python to analyze community program data (e.g., participation, outcomes, engagement), generating predictive insights that supported data-driven decision making for nonprofit initiatives.
- Built end-to-end analytical workflows using Python, SQL, and Excel to clean raw community datasets, engineer features, train baseline models, and visualize key findings for stakeholders, documenting the work on GitHub for version control and collaboration.
- Contributing as an ML Engineer volunteer to design and prototype machine learning models in Python for Murphy Charitable Foundation’s new application supporting vulnerable communities (e.g., child sponsorship and donor engagement use cases).
- Building and iterating on data pipelines using Python and SQL (data cleaning, feature engineering, basic model training), with experiments and notebooks version-controlled through Git and regularly pushed to GitHub.
- Led discussions on BigQuery and Vertex AI in GCP, guiding students in data storage, retrieval, and ML model deployment.
- Designed and delivered Python-based lab sessions covering advanced machine learning and big data analytics.
- Conducted AI bias research, identifying and analyzing system, developer, and statistical biases in ML models.
- Designed an AI reliability survey, which revealed that 68% of participants believed AI enhanced performance, providing insights into user trust and adoption.
- Performed a comparative analysis of bias, hate speech detection, and sentiment classification across AI models from ChatGPT, Gemini, Meta AI, and Claude AI.
- Mentored students on ML workflows, responsible AI, and bias mitigation strategies, ensuring ethical and effective AI model development.
- Applied data-driven research methods to assess the performance and limitations of state-of-the-art AI tools, optimizing their usage in real-world applications.
- My research work is described in the sections below.
Industry‑sponsored capstone project in collaboration with Humana and University of North Texas.
- Built a Medicare Advantage member risk prediction model in XGBoost to classify engaged vs. unengaged beneficiaries, improving accuracy by 70% and achieving a ROC-AUC of 0.76 to support risk-adjusted outreach and plan performance.
- Used SHAP-based model interpretability and KPI analysis to surface key drivers of low engagement and applied clustering techniques to flag nearly 60% of high-risk, unengaged members for targeted interventions that can reduce medical costs.
- Designed a prescriptive analytics framework boosting engagement by 40% through targeted outreach strategies.
- Conducted competitor analysis of UHC and others to benchmark Humana's positioning and inform outreach strategy refinement.
- Recommended financially impactful engagement strategies (multilingual campaigns, telehealth access, community-based programs) aimed at improving quality metrics, STAR ratings, and revenue-linked CMS bonus payments; presented findings to stakeholders in a business-focused storyboard.
- Applied the CRISP-DM methodology (a six-step process) to initiate flight research, aligning it with strategic business objectives.
- Explored the dataset (6.5M records) in depth and extracted descriptive statistics.
- Generated visualizations with matplotlib and seaborn to understand the nature of the data points.
- Performed correlation and causality analysis, identifying the date column as the only regressor related to the target variable.
- Performed data transformations using the pandas to_datetime function to convert the date column into a weekday number.
- Built a random forest model, improved accuracy by 78%, and reduced error metrics using fixed random seeds, cross-validation, and related techniques.
- Performed congestion analysis by baggage volume and by flight time using a congestion index, identifying 3 peak intervals in the day.
- Implemented SMTP (Simple Mail Transfer Protocol) to automate alerts in Google Colab and reduced execution time by 60%.
- Detected a 50% drop in baggage counts due to rainfall and thunderstorms during the diagnostic stage by analyzing residual plots.
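The date-to-weekday transformation mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the DataFrame, the column name flight_date, and the sample dates are all hypothetical, since the original dataset is not shown here.

```python
import pandas as pd

# Hypothetical sample data; the real dataset and its column names are not shown.
flights = pd.DataFrame({"flight_date": ["2024-01-01", "2024-01-06", "2024-01-07"]})

# Parse the strings into datetimes, then take the weekday number
# (pandas convention: Monday=0 ... Sunday=6).
flights["weekday"] = pd.to_datetime(flights["flight_date"]).dt.weekday

print(flights)
```

Encoding the date as a weekday number gives a tree-based model such as a random forest a compact numeric feature that can capture weekly patterns.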
- Developed an automated feature engineering and fraud detection pipeline using Featuretools and XGBoost, improving model accuracy for high-risk transaction detection and data imputation by 15%.
- Designed and orchestrated ETL workflows in Apache Airflow to migrate data from CMR to MDM, increasing data stewardship reliability to 76% and stabilizing downstream analytics.
- Streamlined ingestion of large unstructured zip files into SAS by combining UNIX utilities (unzip, grep, awk) with SAS scripts, reducing data loading and preprocessing time by 40% compared to legacy workflows.
- Enhanced data quality audits and compliance analytics using advanced SQL (CTEs, joins, subqueries, CASE WHEN) under CCPA 2020, accelerating validation and regulatory reporting.
- Collaborated with engineering and product development teams to automate profiling and compliance reports using Python (Pandas) and Autosys/Bitbucket, reducing manual validation by ~30% and strengthening data integrity monitoring.
- Built and validated predictive risk models in SAS Enterprise Miner and Python (Scikit-learn) for loan default prediction across products such as personal loans, gold loans, and fixed deposits, improving underwriting accuracy by 12%.
- Implemented K-Means clustering in SAS and Python on 4M+ customer records to segment portfolios by behavior and holdings, enabling targeted cross-sell campaigns that increased conversion by 22%.
- Automated portfolio performance reporting using PROC REPORT, dynamic SAS macros, and PROC SQL, cutting manual reporting time by 60% and providing near real-time visibility into loans, transactions, and customer behavior.
- Optimized data pipelines with PROC SORT and macro-driven workflows, reducing query latency by 35% for fraud and compliance reports across 20+ banking products.
- Prototyped VB portfolio dashboards and delivered executive-ready views for BI, Risk, and senior leadership stakeholders.
- Received a Silver Award at Citi Bank for the California Consumer Privacy Act 2020 CMR project.
- Received a Bronze Award at Citi Bank for the CMR to MDM migration project.
- Received a Work Excellence Award at ICICI Bank.
- Completed the Texas Higher Education Coordinating Board's AI Professional Development Program (Sept–Dec 2024), gaining hands-on competency in prompt engineering, custom GPT development, AI-enhanced content creation, ethical AI considerations, and trustworthy generative AI frameworks, aligning technical skills with responsible AI deployment strategies for academic and industry applications.
- Collaborated with a cross-functional team of four graduate students to investigate bias propagation, hallucination patterns, and fairness concerns in LLM-based systems, conducting systematic prompt-response experiments, behavioral analysis, and statistical evaluation to quantify model variability and inform design recommendations for more transparent and equitable generative AI applications in educational and research contexts.
- Participated in speaker sessions and case-based workshops on business analytics, data visualization, and predictive modeling, gaining exposure to real-world applications and tools.
- Collaborated with peers in analytics challenges and networking events, strengthening problem-solving, presentation skills, and industry connections.
- Attended the UNT Data Science Talk Series featuring leading experts, rising stars, and academic leaders presenting on cutting-edge topics including machine learning, artificial intelligence, data visualization, big data analytics, and ethical considerations in data usage, broadening exposure to industry best practices and emerging research trends.
- Participated in collaborative learning sessions that fostered knowledge exchange on state-of-the-art methodologies such as deep learning frameworks (TensorFlow, Keras), cloud-based ML tools, natural language processing, and computational data science techniques, strengthening technical acumen and staying current with the evolving data science landscape.
- Participated in workshops on Hugging Face models, including practical tokenization sessions emphasizing LLMs for real-world AI applications.
- Promotes collaborative projects, research, and learning in AI/ML open to all majors, building skills in model deployment and innovation.
- Engages members in events like online workshops that explore AI usability, such as fine-tuning LLMs for tasks like natural language processing and ethical AI use.
- Awarded a Participation Certificate for the online workshop “How to Use Hugging Face AI Models?” in Feb 2026, recognizing active engagement in LLM and AI usability sessions (certificate: Link).


