I build and deploy ML systems including LLM pipelines, recommendation engines, and analytics. I also do AI research, focusing on mechanistic interpretability.
Revealed a late semantic repair pathway inside I-JEPA using causal activation patching and semantic occlusion analysis. Found a decisive bottleneck at encoder layer 29 that specifically repairs missing object-level structure, not just generic masked pixels. Demonstrated that this mechanism generalizes across datasets, object scales, and masking geometries.
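The core mechanic of causal activation patching can be sketched in a few lines. This is a toy illustration, not the I-JEPA study itself: a small linear stack stands in for the encoder, the layer count and channel split are invented, and "restoration" is just a normalized output distance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy 4-layer stack standing in for the encoder (the real study patches I-JEPA).
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

def run_with_patch(model, x, patch_layer=None, patch_act=None):
    """Run the model, caching every layer's output; optionally overwrite
    part of one layer's output with a cached "clean" activation."""
    cache, handles = {}, []
    for i, layer in enumerate(model):
        def hook(mod, inp, out, i=i):
            if i == patch_layer:
                out = out.clone()
                out[:, :4] = patch_act[:, :4]  # patch only the hypothetical "object" channels
            cache[i] = out.detach()
            return out
        handles.append(layer.register_forward_hook(hook))
    try:
        y = model(x)
    finally:
        for h in handles:
            h.remove()
    return y, cache

clean = torch.randn(1, 8)    # stand-in for the unmasked input
corrupt = torch.randn(1, 8)  # stand-in for the occluded input

y_clean, clean_cache = run_with_patch(model, clean)
y_corrupt, _ = run_with_patch(model, corrupt)

# Patch each layer's clean activation into the corrupted run; a sharp jump in
# restoration at one depth marks a causal bottleneck (layer 29 in the study).
restorations = []
for i in range(4):
    y_patched, _ = run_with_patch(model, corrupt, patch_layer=i, patch_act=clean_cache[i])
    restored = 1 - (y_patched - y_clean).norm() / (y_corrupt - y_clean).norm()
    restorations.append(restored.item())
    print(f"layer {i}: restoration = {restored.item():.2f}")
```

The same cache-then-overwrite hook pattern scales to transformer blocks; the real analysis additionally varies datasets, object scales, and masking geometries.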
Mapped how cognitive capabilities emerge across the layers of LLM architectures using cross-model probing and causal tests. Found a shared broad hierarchy—spatial and logical signals appear earlier, while executive and pattern-based signals emerge later—alongside architecture-specific late-layer dynamics. Showed that these internal representations transfer to paraphrased prompts and can causally influence outputs.
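The layer-wise probing idea reduces to fitting one linear classifier per layer and tracing accuracy against depth. A minimal sketch, with synthetic activations in place of real hidden states and a "concept" signal injected only into the later layers so the emergence curve is visible:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic (n_samples, n_layers, d) activations; in the real study these are
# hidden states extracted from the LLMs under test.
rng = np.random.default_rng(0)
n, n_layers, d = 400, 4, 32
labels = rng.integers(0, 2, n)
acts = rng.normal(size=(n, n_layers, d))
for layer in range(2, n_layers):                 # concept becomes decodable late
    acts[:, layer, 0] += 3.0 * (labels - 0.5)

# One linear probe per layer; cross-validated accuracy vs. depth shows where
# the capability emerges in the stack.
accs = []
for layer in range(n_layers):
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, acts[:, layer, :], labels, cv=5).mean()
    accs.append(acc)
    print(f"layer {layer}: probe accuracy = {acc:.2f}")
```

Probes establish decodability only; the causal claims in the project come from separate intervention tests on the probed directions.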
Developed a cross-architecture auditing method to visualize and compare how LLMs internally process empathy and social reasoning. Found that similar compassionate responses can mask sharply different internal strategies, including brittle linear processing versus adaptive self-correction. This provides a practical framework for selecting safer, better-calibrated models for human-facing applications.
This research shows that self-supervised Vision Transformers use different strategies to recover missing image information: MAE repairs pixels early, while I-JEPA and BEiT infer meaning later through semantic circuits. It uses causal patching and sparse autoencoders to reveal object-selective mechanisms in modern vision models that could improve interpretability and robustness.
Mendeley dataset; 76 engineered features; gradient boosting; 73.6% accuracy classifying menstrual cycle phases.
PyTorch model growing patterns from a single seed; demonstrates self-healing from user damage via a learned convolutional update rule.
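A single update step of such a cellular automaton is compact. This sketch assumes a 16-channel grid with channel 3 as the "alive" alpha channel, in the style of Mordvintsev et al.'s growing-NCA setup; the layer sizes are illustrative and the rule is untrained (the final layer is zero-initialized, so the seed persists unchanged until training shapes the dynamics).

```python
import torch
import torch.nn as nn

class NCAStep(nn.Module):
    """One learned update: perceive the 3x3 neighborhood, apply a per-cell rule."""
    def __init__(self, channels=16, hidden=64):
        super().__init__()
        self.perceive = nn.Conv2d(channels, hidden, 3, padding=1)  # local perception
        self.update = nn.Conv2d(hidden, channels, 1)               # per-cell update rule
        nn.init.zeros_(self.update.weight)                         # start as "do nothing"
        nn.init.zeros_(self.update.bias)

    def forward(self, state):
        delta = self.update(torch.relu(self.perceive(state)))
        state = state + delta                        # residual update
        alive = (state[:, 3:4] > 0.1).float()        # mask out dead cells
        return state * alive

# Grow from a single seed cell at the grid center.
step = NCAStep()
grid = torch.zeros(1, 16, 32, 32)
grid[:, 3:, 16, 16] = 1.0                            # the seed
for _ in range(8):
    grid = step(grid)
```

Training optimizes the two convolutions so that iterating this step grows a target pattern; self-healing falls out of applying damage during training rollouts.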
Troika PPG + accelerometer; feature engineering + RF regression; 8.8 BPM MAE @ 90% availability; validated on CAST clinical data.
Analyzed 1.48M cancer cases from the SEER21 dataset using Python/ML to identify 15 novel early-onset risk factors (up to 5.4× elevated risk), including a tumor-size paradox and racial disparities; achieved AUC = 0.65.
Real‑time price + social sentiment; continuous YouTube ingestion and scoring; trend indicators and watchlist alerts.
MovieLens 10M: baselines → regularized movie + user + release effects → matrix factorization; parameter tuning; best RMSE 0.783.
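The regularized-effects stage can be sketched on a toy ratings frame. The column names mirror MovieLens (`userId`, `movieId`, `rating`), the data is random, and the shrinkage strength `lam` is illustrative; in the project it is tuned by cross-validation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
ratings = pd.DataFrame({
    "userId": rng.integers(0, 50, 2000),
    "movieId": rng.integers(0, 100, 2000),
    "rating": rng.integers(1, 6, 2000).astype(float),
})

mu = ratings["rating"].mean()   # global mean baseline
lam = 5.0                       # shrinkage strength (tuned in the real project)

# Regularized movie effect: each movie's mean residual, shrunk toward 0 so
# rarely-rated movies don't get extreme estimates.
b_i = ratings.groupby("movieId")["rating"].apply(lambda r: (r - mu).sum() / (len(r) + lam))

# Regularized user effect, fit on what the movie effect leaves over.
resid = ratings["rating"] - mu - ratings["movieId"].map(b_i)
b_u = resid.groupby(ratings["userId"]).apply(lambda r: r.sum() / (len(r) + lam))

pred = mu + ratings["movieId"].map(b_i) + ratings["userId"].map(b_u)
rmse = np.sqrt(((ratings["rating"] - pred) ** 2).mean())
print(f"training RMSE: {rmse:.3f}")
```

Matrix factorization then models the structure these additive effects miss, which is what pushes the RMSE down to 0.783 on the real data.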
Probed DeepSeek‑R1 layer activations across tasks; compared layer groups to reveal specialization and cognitive pattern emergence.
Pygame simulation; genetic algorithm trains a neural network; fitness based on survival time; features elitism and mutation.
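The evolutionary loop itself is simple. In this sketch, flat weight vectors stand in for the network and a toy fitness function (negative distance to a target genome) replaces survival time in the simulation; population size, elite count, and mutation scale are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
GENOME, POP, ELITE, SIGMA = 20, 40, 4, 0.1
target = rng.normal(size=GENOME)

def fitness(genome):
    # Stand-in for survival time: closer to the target genome is fitter.
    return -np.linalg.norm(genome - target)

pop = rng.normal(size=(POP, GENOME))
history = []
for _ in range(50):
    scores = np.array([fitness(g) for g in pop])
    history.append(scores.max())
    order = np.argsort(scores)[::-1]
    elite = pop[order[:ELITE]]                        # elitism: best genomes survive intact
    parents = elite[rng.integers(0, ELITE, POP - ELITE)]
    children = parents + rng.normal(scale=SIGMA, size=parents.shape)  # Gaussian mutation
    pop = np.concatenate([elite, children])

best = max(pop, key=fitness)
print(f"best fitness after 50 generations: {fitness(best):.3f}")
```

Elitism makes the best score monotonically non-decreasing across generations, which is what keeps the simulated agents from regressing between runs.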
Built a full top-down autonomous driving stack in Python using Highway-Env + Stable-Baselines3 PPO; iteratively tuned reward shaping and episode length to improve speed discipline, yielding a model that consistently survives 200-step evaluations with high-speed, collision-free highway behavior.
Built production-grade retrieval-augmented generation system processing 800+ PDFs (1M+ chunks) with hybrid search (ChromaDB + BM25), GPU-accelerated cross-encoder reranking, multi-tier conversation memory, streaming responses, and query rewriting—achieving 6-8s query times with context rot prevention and research-backed optimizations.
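The merge step of a hybrid search like this is often reciprocal rank fusion (RRF). A minimal sketch with hard-coded document IDs standing in for ChromaDB and BM25 results; the `k=60` damping constant is the commonly used default, not a value from this system:

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; k damps the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc7", "doc2", "doc9", "doc1"]    # keyword (BM25) ranking
dense_hits = ["doc2", "doc5", "doc7", "doc3"]   # embedding (vector-store) ranking
fused = rrf([bm25_hits, dense_hits])
print(fused)  # documents ranked well in both lists rise to the top
```

In the full pipeline the fused candidates then go to the cross-encoder reranker, which is the expensive, GPU-accelerated stage.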
Converted FoodSeg103 dataset masks to YOLO-seg polygons and fine-tuned YOLO11x-seg on my GPU for improved food segmentation.
Built an AI pipeline that converts food photos into precise macro and micro nutrient breakdowns using automated USDA data matching. Try it out by clicking here!
University (grade)
Professional
Independent
AI
Data Science
Analysis
Finance
Health
Other
Reach out to me on LinkedIn or fill out the form below!