Welcome!
Since Feb 2021
Keywords: Strategic thinking, Technical leadership, Systems design,
Mentorship, People management, Software development management, Operations
management, Recruitment, Documentation, User training and support.
Team size: 7 people.
I lead the Informatics Infrastructure team for the Tree of Life programme. The team:
I arranged the team’s growth in two phases:
My ambition is to provide the most efficient platform for assembling and analysing genomes at an unprecedented scale. The Tree of Life projects will generate tens of thousands of high-quality genomes in the coming years — more than have ever been sequenced. This is a challenging and exciting endeavour that will shape the future of biology.
I’m especially proud to have introduced rigorous nf-core best practices (even extending the tooling) and Nextflow pipeline standards into the programme (talk). I also helped shift our teams’ focus toward compute efficiency and green computing (talk).
We act as the interface between the Tree of Life teams (assembly production and faculty research) and Sanger’s IT teams, collaborating with informatics groups across other programmes. Our work spans assembly methods, genomics, comparative genomics, cloud computing, and large-scale analyses, with a strong emphasis on metadata tracking, quality control, and event recording. The team supports the “Genome Engine” from beginning to end.
In parallel, I serve as Head of Informatics for Tree of Life. In this role I represent and support Tree of Life informatics within the Institute by ensuring sufficient IT capacity and policy compliance, and by drafting strategy and policy documents for the department, the Institute, and partner projects such as ERGA (links).
I worked in the Comparative Genomics team of the Ensembl genome browser, aka Ensembl Compara. The team performed comparative analyses, developed new methods and algorithms (including API and database schema extensions), and applied them to new datasets. Scalability was a primary concern as we processed hundreds of genomes under tight timelines.
I also contributed to the development of the eHive workflow manager, a system for creating and running workflows on distributed compute resources. eHive scheduled and executed the equivalent of over 1,000 CPU years of compute each year for Ensembl.
Oct 2019 to Jan 2021
Keywords: Development management. Technical leadership. Recruitment. Mentorship.
API development. Database design and optimisation. Workflow design and development.
User support (data, API, workflows).
Team size: 3 people.
In this role I transferred knowledge accumulated over eight years to the new Project Leader and developers, while supporting and overseeing software development (details).
We initiated a major revamp of compute workflows and data storage to handle the scale of projects such as the Darwin Tree of Life, aiming to provide comparative analyses for tens of thousands of genomes and beyond. I also advocated for and helped initiate a new Ensembl core library in Python.
I continued to maintain and contribute to the eHive workflow manager, ensuring it remained the most efficient solution for Ensembl’s Comparative Genomics workflows.
May 2014 to Sep 2019
Keywords: Project planning and management. Scientific and public communication. Technical leadership. Recruitment. Mentorship.
Development management. Reporting. Data-production planning and operational management.
API development. Database design and optimisation. Workflow design and development. Data
production under tight deadlines. Processing of large datasets. User support (data, API, workflows).
Team size: 5 people.
I managed the Comparative Genomics team for Ensembl, including development of the eHive workflow manager.
We integrated TreeFam activities into Ensembl and prioritised scaling up analyses while maintaining high data quality, introducing novel metrics to monitor performance.
During this period we published landmark papers describing comparative resources and ncRNA phylogenetic analyses.
Within eHive, I initiated support for additional programming languages, starting with Python, which enabled teams to plan a gradual phase-out of Perl. We authored an extensive user manual for eHive and provided technical support to other Ensembl teams. I also added container support (via Docker Swarm) to allow eHive to run in cloud environments.
May 2013 to May 2014
Keywords: Development management. Reporting. Technical advisor. Data-production planning and operational management.
API development. Database design and optimisation. Workflow design and development. Data
production under tight deadlines. Processing of large datasets. User support (data, API, workflows).
Team size: 2 people.
I managed part of the Comparative Genomics team at Ensembl, including development of the eHive workflow manager, while continuing developer duties. Our work focused on reconstructing phylogenetic trees and gene families, improving the software, and delivering Ensembl API workshops.
Jan 2011 to May 2013
Keywords: API development. Database design and optimisation. Workflow design and development. Data production under tight deadlines. Processing of large datasets. User support (data, API, workflows).
I focused on the pipeline that reconstructs protein phylogenetic trees, on reshaping the API, and on improving the software. I also delivered Ensembl API workshops.
Sep 2006 to Dec 2010
Title: Reconstruction of ancestral vertebrate genomes
I developed methods to infer the genome structure of ancestral species (the last common ancestors of groups of extant species — here ~50 vertebrates) at multiple scales (chromosome count, chromosome content, gene order). We also created a database and genome browser, Genomicus, to share the data. Because Genomicus uses Ensembl data, it is updated every two months after each Ensembl release.
Key publications include a review paper and a methods paper; the project also has its own website. The thesis (in French) and the presentation (in English) are available online.
| Degree | Field | Dates | Institution |
| --- | --- | --- | --- |
| PhD | Bioinformatics | Sep 2006 to Dec 2010 | École normale supérieure, Paris, France |
| MSc | Bioinformatics | Sep 2005 to Aug 2006 | Évry university, France |
| Engineer MSc | Computer science, Software development, Mathematics | Sep 2003 to Aug 2006 | ensIIE, Évry, France |
| BSc | Mathematics | Sep 2004 to Jun 2005 | Paris Diderot university, France |
| Classes préparatoires | Mathematics, Physics, Computer science | Sep 2001 to Jun 2003 | Lycée Louis-le-Grand, Paris, France |
Jun 2005 to Aug 2005
I improved the user interface of Exogean, a tool for annotating gene structures in eukaryotic genomic DNA, by developing a web interface that enabled remote use. The site presented a series of forms to guide users through data upload and configuration, then launched the analysis as a background job.
This experience was pivotal in steering my career toward bioinformatics.
Technologies used: PHP, XHTML, CSS
Jun 2004 to Aug 2004
I worked on the company's flagship product, OptikLeader, a professional software package for opticians. This was my first contact with Java, which I learned quickly after several years of practising C++ in my own time.
Technologies used: Java, Hibernate, PostgreSQL
QA, 2024
Course on programme management, building on project management fundamentals. Its principles and approaches have helped me work with senior leaders and discuss change across large areas.
Course URL: Website
scrum.org, 2020
Scrum is a leading Agile methodology. I briefly experimented with it in Ensembl Compara before adopting it more broadly in my Tree of Life team.
Course URL: Website
Association for Project Management (APM), 2017
Course on project management. I have applied its principles across Ensembl management layers to plan and manage development and other projects.
Course URL: Website
F-Secure Corporation, 2004
I learned how to configure F-Secure software in enterprise contexts.