Welcome!
A selection of projects and initiatives that sat alongside my job role, or that I pursued in my own time.
Since 2022
I administer Tree of Life’s participation in the Google Summer of Code programme. I solicit and help write project ideas across the department (and even other Sanger teams!), prepare and oversee the application, and assist with candidate and project selection.
We are very grateful to Google Summer of Code for selecting us in 2022, 2024, and 2025.
May 2025 to Sep 2025
Title: Nextflow Pipelines for Variant Analyses
This project was part of the 2025 Google Summer of Code programme. Yunjia Zhang developed scalable Nextflow pipelines for variant analysis across diverse eukaryotic genomes. The workflows call germline and somatic variants, compute key variant metrics, generate quality-control reports, and produce standardised, interoperable outputs for downstream analysis. Comprehensive documentation enables researchers from diverse backgrounds to run the pipelines confidently and explore genomic variation.
Yunjia’s contributions and knowledge earned her a place in the Nextflow Ambassador programme.
Project URL: GitHub
Jun 2024 to Dec 2024
Title: Sharing Nextflow code across organizations using the nf-core infrastructure
The BioDev network organised its very first Future Innovators’ Mentorship programme, aimed at mentoring graduates from the Global South as they take their first steps towards a successful career in computation or bioinformatics in life sciences research.
I mentored João Cavalcante in a project to expand the nf-core tooling and add support for Nextflow pipeline components held across multiple repositories. This feature was essential to Tree of Life, as it allowed us to share code between pipelines and with nf-core, and to reduce our technical debt.
João’s contributions and knowledge earned him a place in the Nextflow Ambassador programme.
Project URL: Blog post
May 2019 to Sep 2019
Title: Using deep learning techniques to enhance orthology calls
This project was part of the 2019 Google Summer of Code programme. Within the Ensembl Genomes Browser organisation, Harshit Gupta developed a machine-learning algorithm to predict orthologies, supervised by Mateus Patricio and me.
We developed a TensorFlow-based algorithm that predicts orthologies directly from sequence data, without relying on phylogenetic methods. The approach achieved high accuracy (>90% in most settings), and we are planning its deployment in Ensembl.
Mateus then joined AstraZeneca and is now Director of AI & Data Engineering.
Project URL: GitHub
Apr 2018 to Sep 2018
Title: Inclusion of pseudogenes in the Ensembl comparative genomics resources
This project was funded by the French Embassy in the United Kingdom. Guillaume Giroussens joined the Ensembl Compara team to work on methods to include pseudogenes in phylogenetic trees and homology assessments. His work was presented as a poster at the Genome Informatics conference.
May 2016 to Sep 2016
Title: Graphical editor for XML files
This project was part of the 2016 Google Summer of Code programme. Anuj Khandelwal developed a graphical workflow editor for eHive using Blockly, supervised by Leo Gordon and me.
eHive runs computational pipelines in distributed environments, and its workflows were configured in a file format that required programming skills. This project aimed to remove that barrier by creating a Blockly-based graphical editor.
We targeted XML as the file format with a Relax NG specification. The editor’s core converts a Relax NG specification to Blockly blocks and matching rules so Blockly diagrams conform to the schema. The interface can import existing XML to visualise workflows as Blockly blocks, edit them, and export back to XML.
The project is not specific to eHive; the editor can handle any specification written in Relax NG.
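To give a flavour of the schema-to-blocks idea, here is a minimal Python sketch. It is not the project's actual code: the toy schema, the helper names (`direct_children`, `element_to_blocks`) and the mapping rules are all assumptions made for illustration. Each Relax NG `<element>` becomes one block definition in Blockly's JSON format, each `<attribute>` becomes a text field, and child elements become a statement input whose type check lists the allowed children.

```python
import xml.etree.ElementTree as ET

RNG_NS = "{http://relaxng.org/ns/structure/1.0}"

# Hypothetical toy schema, invented for this sketch (not an eHive schema).
SCHEMA = """
<element name="pipeline" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="name"><text/></attribute>
  <oneOrMore>
    <element name="analysis">
      <attribute name="module"><text/></attribute>
    </element>
  </oneOrMore>
</element>
"""


def direct_children(node, tag):
    """Yield the nearest <tag> patterns under node, looking through wrappers
    such as <oneOrMore> but never inside nested <element> or <attribute>."""
    for child in node:
        if child.tag == RNG_NS + tag:
            yield child
        elif child.tag not in (RNG_NS + "element", RNG_NS + "attribute"):
            yield from direct_children(child, tag)


def element_to_blocks(element, blocks=None):
    """Turn an <element> pattern (and its descendants) into block definitions."""
    blocks = [] if blocks is None else blocks
    name = element.get("name")
    message, args = [name], []

    # One editable text field per attribute.
    for attr in direct_children(element, "attribute"):
        args.append({"type": "field_input", "name": attr.get("name"), "text": ""})
        message.append(f"{attr.get('name')} %{len(args)}")

    # One statement input that only accepts the allowed child blocks.
    children = list(direct_children(element, "element"))
    if children:
        args.append({
            "type": "input_statement",
            "name": "CHILDREN",
            "check": [child.get("name") for child in children],
        })
        message.append(f"%{len(args)}")

    blocks.append({
        "type": name,
        "message0": " ".join(message),
        "args0": args,
        "previousStatement": name,
        "nextStatement": name,
    })
    for child in children:
        element_to_blocks(child, blocks)
    return blocks


blocks = element_to_blocks(ET.fromstring(SCHEMA))
```

A real converter would also need to handle the rest of Relax NG (choices, optional content, references to named patterns, data types), which this sketch ignores.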
Jan 2013 to May 2013
The EMBL postdoc retreat is an annual EMBL event that promotes scientific exchange among postdocs and provides a platform to address issues relevant to postdoctoral researchers.
May 2011 to May 2013
I was elected to EMBL-EBI’s Postdoc Committee, whose role is to liaise with the EMBL staff representatives, and manage activities and events for the postdoc community. My role was to organise scientific seminars and invite guest speakers for “Career talks”.
2011 to 2015
I used a Raspberry Pi as a test bed to develop my system administration skills. At its peak it ran a VPN, an email server, file servers (FTP, SMB, DLNA), a personal website, a blog engine, a photo browser, and online games, and provided SSH access.
Jan 2005 to Dec 2005
The role of the Student Association is to oversee the various activities and other associations for ensIIE students. We also act as liaison with the school director’s office.
Nov 2003 to Jul 2005
My contributions were mostly on the software side of things: position-based visual servoing, AI, and FPGA programming (in C, C++, and VHDL). We participated in the French Cup of Robotics both years.
2005
Through ensIIE’s “junior enterprise”, Dièse, I designed a software tool to analyse and manage costs for electronics manufacturers (project and product visualisation, resource management).
2002 to 2003
Implemented a 3D renderer from scratch (without OpenGL or similar libraries). The program renders 3D objects to raster images, including colour, transparency, lighting, and shading. It was written in C++.
Also familiar with ray tracing, BSP trees, Bézier curves, and B-splines.
2001 to 2002
Implemented a JPEG encoder/decoder from scratch on a TI-92+, based solely on the compression algorithm documentation.
Also wrote a BMP reader/writer and a library to manipulate, transform, and apply effects to images.
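For readers unfamiliar with JPEG, the heart of the baseline encoder is a transform-and-quantise step applied to each 8×8 block of pixels. The Python sketch below illustrates that step only. It is not the TI-92+ code, and the uniform quantiser step `Q_STEP` is an assumption chosen for simplicity (real encoders use a per-frequency quantisation table).

```python
import math

N = 8        # JPEG works on 8x8 pixel blocks
Q_STEP = 16  # hypothetical uniform quantiser step, for illustration only


def dct_2d(block):
    """Forward 2D DCT-II of an 8x8 block, with the scaling used by JPEG."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def encode_block(pixels):
    """Level-shift 8-bit samples, transform them, then quantise the coefficients."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return [[round(coef / Q_STEP) for coef in row] for row in dct_2d(shifted)]


# A flat grey block: all the energy ends up in the top-left (DC) coefficient.
flat = [[138] * N for _ in range(N)]
quantised = encode_block(flat)
```

The quantised coefficients would then be reordered and entropy-coded, which is where most of the remaining complexity of the format lies.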