CV

Wellcome Sanger Institute, Hinxton, United Kingdom

Informatics Infrastructure Team Lead

Since Feb 2021

Keywords: Strategic thinking, Technical leadership, Systems design, Mentorship, People management, Software development management, Operations management, Recruitment, Documentation, User training and support.
Team size: 7 people.

I lead the Informatics Infrastructure team for the Tree of Life programme. The team:

implements large-scale genome analysis pipelines for faculty teams;
maintains the production environment (application deployment and pipeline orchestration);
manages and curates metadata across our production systems.

I planned the team’s growth in two phases:

Establish the core systems infrastructure and framework, hiring DevOps, software development, and bioinformatics staff.
Expand capabilities for data curation, management, and analysis.

My ambition is to provide the most efficient platform for assembling and analysing genomes at an unprecedented scale. The Tree of Life projects will generate tens of thousands of high-quality genomes in the coming years — more than have ever been sequenced. This is a challenging and exciting endeavour that will shape the future of biology.

I’m especially proud to have introduced rigorous nf-core best practices (even extending the tooling) and Nextflow pipeline standards into the programme (talk). I also helped shift our teams’ focus toward compute efficiency and green computing (talk).

We act as the interface between the Tree of Life teams (assembly production and faculty research) and Sanger’s IT teams, collaborating with informatics groups across other programmes. Our work spans assembly methods, genomics, comparative genomics, cloud computing, and large-scale analyses, with a strong emphasis on metadata tracking, quality control, and event recording. The team supports the “Genome Engine” from beginning to end.

In parallel, I serve as Head of Informatics for Tree of Life. In this role I represent and support Tree of Life informatics within the Institute by ensuring sufficient IT capacity and policy compliance, and by drafting strategy and policy documents for the department, the Institute, and partner projects such as ERGA (links).

European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom

I worked in the Comparative Genomics team of the Ensembl genome browser, aka Ensembl Compara. The team performed comparative analyses, developed new methods and algorithms (including API and database schema extensions), and applied them to new datasets. Scalability was a primary concern as we processed hundreds of genomes under tight timelines.

I also contributed to the development of the eHive workflow manager, a system for creating and running workflows on distributed compute resources. eHive scheduled and executed the equivalent of over 1,000 CPU years of compute each year for Ensembl.

Principal Developer

Oct 2019 to Jan 2021

Keywords: Development management. Technical leadership. Recruitment. Mentorship. API development. Database design and optimisation. Workflow design and development. User support (data, API, workflows).
Team size: 3 people.

In this role I transferred knowledge accumulated over eight years to the new Project Leader and developers, while supporting and overseeing software development (details).

We initiated a major revamp of compute workflows and data storage to handle the scale of projects such as the Darwin Tree of Life, aiming to provide comparative analyses for tens of thousands of genomes and beyond. I also advocated for and helped initiate a new Ensembl core library in Python.

I continued to maintain and contribute to the eHive workflow manager, ensuring it remained the most efficient solution for Ensembl’s Comparative Genomics workflows.

Project Leader

May 2014 to Sep 2019

Keywords: Project planning and management. Scientific and public communication. Technical leadership. Recruitment. Mentorship. Development management. Reporting. Data-production planning and operational management. API development. Database design and optimisation. Workflow design and development. Data production under tight deadlines. Processing of large datasets. User support (data, API, workflows).
Team size: 5 people.

I managed the Comparative Genomics team for Ensembl, including development of the eHive workflow manager.

We integrated TreeFam activities into Ensembl and prioritised scaling up analyses while maintaining high data quality, introducing novel metrics to monitor performance.

During this period we published landmark papers describing comparative resources and ncRNA phylogenetic analyses.

Within eHive, I initiated support for additional programming languages, starting with Python, which enabled teams to plan a gradual phase-out of Perl. We authored an extensive user manual for eHive and provided technical support to other Ensembl teams. I also added container support (via Docker Swarm) to allow eHive to run in cloud environments.

Interim Manager

May 2013 to May 2014

Keywords: Development management. Reporting. Technical advisor. Data-production planning and operational management. API development. Database design and optimisation. Workflow design and development. Data production under tight deadlines. Processing of large datasets. User support (data, API, workflows).
Team size: 2 people.

I managed part of the Comparative Genomics team at Ensembl, including development of the eHive workflow manager, while continuing developer duties. Our work focused on reconstructing phylogenetic trees and gene families, improving the software, and delivering Ensembl API workshops.

Software Developer

Jan 2011 to May 2013

Keywords: API development. Database design and optimisation. Workflow design and development. Data production under tight deadlines. Processing of large datasets. User support (data, API, workflows).

I focused on the pipeline that reconstructs protein phylogenetic trees, reshaped the API, and improved the software. I also delivered Ensembl API workshops.

École normale supérieure, Paris, France

PhD student

Sep 2006 to Dec 2010

Title: Reconstruction of ancestral vertebrate genomes

I developed methods to infer the genome structure of ancestral species (the last common ancestors of groups of extant species — here ~50 vertebrates) at multiple scales (chromosome count, chromosome content, gene order). We also created a database and genome browser, Genomicus, to share the data. Because Genomicus uses Ensembl data, it is updated every two months after each Ensembl release.

Key publications include a review paper and a methods paper, alongside the project website. The thesis (French) and presentation (English) are available online.

Education

PhD	Bioinformatics	Sep 2006 to Dec 2010	École normale supérieure, Paris, France
MSc	Bioinformatics	Sep 2005 to Aug 2006	Évry university, France
Engineer MSc	Computer science, Software development, Mathematics	Sep 2003 to Aug 2006	ensIIE, Évry, France
BSc	Mathematics	Sep 2004 to Jun 2005	Paris Diderot university, France
Classes préparatoires	Mathematics, Physics, Computer science	Sep 2001 to Jun 2003	Lycée Louis-le-Grand, Paris, France

Internships

École normale supérieure, Paris, France

Jun 2005 to Aug 2005

I improved the user interface of Exogean, a tool for annotating gene structures in eukaryotic genomic DNA, by developing a web interface that enabled remote use. The site presented a series of forms to guide users through data upload and configuration, then launched the analysis as a background job.

This experience was pivotal in steering my career toward bioinformatics.

Technologies used: PHP, XHTML, CSS

Consultants Informatique Associés, Paris, France

Jun 2004 to Aug 2004

I worked on the company's flagship product, OptikLeader, a professional software package for opticians. This was my first contact with Java, which I learned quickly after several years of practising C++ in my own time.

Technologies used: Java, Hibernate, PostgreSQL

Certifications and training

Managing Successful Programmes

QA, 2024

Course on programme management, building on project management fundamentals. Its principles and approaches have helped me work with senior leaders and discuss change across large parts of the organisation.

Course URL: Website

Professional Scrum Master

scrum.org, 2020

Scrum is a leading Agile methodology. I briefly experimented with it in Ensembl Compara before adopting it more broadly in my Tree of Life team.

Course URL: Website

Project Fundamentals Qualification (PFQ) certification

Association for Project Management (APM), 2017

Course on project management. Principles from the course have been applied across Ensembl management layers to plan and manage development and other projects.

Course URL: Website

“Policy manager” and “Antivirus Client Security”

F-Secure Corporation, 2004

Learned how to configure F-Secure software in enterprise contexts.