
Senior Data Scientist · PhD · Computational Biology
Data science, ML & drug discovery - for everyone.
I'm a Senior Data Scientist bridging computational biology and experimental research, with expertise in machine learning, bioinformatics, and tool development across cancer genomics, immunology, and drug discovery. I think a lot about how AI is changing science - and about making sure those changes reach every researcher, not just those with a machine learning background. My work tries to do that: building practical tools, leading interdisciplinary teams, and figuring out how to get the most out of modern AI in real biological research.
About me
I'm a Senior Data Scientist at Nexus BioQuest, a contract research organisation in Bristol, where I work across data science, machine learning, and analytical tool development in support of research programmes spanning pharmaceutical and biotech clients.
My PhD at the University of Bristol, funded by a competitive Cancer Research UK studentship, focused on predicting the functional impact of genetic variants in cancer genomes. I've since worked at Roche in Basel and Zürich, exploring protein language models and antibody optimisation, and have published four peer-reviewed papers across cancer genomics, variant prediction, and drug discovery.
Beyond the code, I've led interdisciplinary teams to back-to-back hackathon victories - at Cambridge and the Wellcome Collection - and co-organised Bristol's first AI in Health meeting, which led to two interdisciplinary research grants. I believe the best science happens at the edges of disciplines, and I love building the teams and environments where that becomes possible.

A perspective on AI in science
The tools exist. The clinical evidence is starting to follow. The harder question is who can actually use them.
"Some of the most powerful tools in the history of biology are sitting in research papers and GitHub repos that most bench scientists have never heard of. That feels like a problem worth working on."
AlphaFold 2 more or less solved protein structure prediction - a problem that had been open for 50 years - at the CASP14 assessment in late 2020, and the method was published in 2021. AI-designed drugs are now reaching clinical trials. The industry is reorganising around this, with pharma companies partnering with specialist AI firms and embedding NVIDIA infrastructure directly into their R&D pipelines. These are not future possibilities. They are happening now, and the pace is accelerating.
But using these tools well still requires an unusual combination of skills - enough ML to run and adapt the models, enough compute to work with them, and enough domain knowledge to ask the right questions. Most biologists have one of those things, maybe two. That gap is real, and it matters. I came into data science from the biology side, so I know what it feels like to have a question you cannot answer because the tools are out of reach. That is what drives most of what I build and write about.
The deeper analysis - the specific partnerships, what the clinical evidence actually shows, what NVIDIA's infrastructure deals mean in practice, and the honest open questions about whether this produces better drugs - is in the blog.
Foundation models reshaping biology
Projects & Hackathons
A mix of published tools, hackathon projects, and pipelines built for real research problems.
Led the winning team at GetSeen Ventures' AI × Cancer Bio Hackathon. Used transformer encoders on SMILES strings and high-content image embeddings from the RxRx3-core dataset to predict molecular pathways. Ongoing collaboration likely to result in publication.
Led the winning team at the Roche & HDR Hackathon. Encoded protein sequences with pre-trained language models (ESM, AntiBERT) and explored CNNs to model sequence-function relationships using deep mutational scanning (DMS) data from ProteinGym. Secured a Roche AI internship as a direct result.
Built a post-acquisition flow cytometry analysis pipeline with an intuitive Streamlit interface, applying unsupervised ML - clustering and dimensionality reduction - to high-dimensional cytometry data to uncover cell population patterns and accelerate downstream reporting.
Published a data mining toolkit integrating molecular annotations for SNVs, creating a centralised resource that reduces redundancy and accelerates machine learning model development for variant effect prediction.
At Roche pRED, used TensorFlow models grounded in global epistasis and pre-trained protein language models to predict binding affinity from deep mutational scanning data. Ongoing collaboration with University of Oslo, aiming for publication.
Co-organised Bristol's first interdisciplinary AI in Health Meeting in collaboration with the Elizabeth Blackwell Institute. Facilitated cross-disciplinary collaboration that resulted in two interdisciplinary grants for applied AI projects.

Skills & Tools
Picked up across academia, industry, and a few hackathons.
Publications
Four published works spanning cancer genomics, variant effect prediction, and drug discovery.
Get in touch
If something I've written resonates, if you're working on an interesting problem, or if you just want to talk about AI in biology - feel free to get in touch. I am always up for a good conversation.
Writing
AI is going to reshape how biology is done. I think scientists at every level need to understand what is actually happening - not the hype version, and not the version that assumes a computer science degree. I write here to try to bridge that gap: covering real tools, real evidence, and real implications, in language that a working scientist can use. The posts fall into four themes: opinion, industry analysis, technical walkthroughs, and practical guides.