Abstract: DNA alignment has been a killer app for the FM-index, but aligning DNA reads against a single genome can bias research results and medical diagnoses. In the past few years we have found ways to FM-index datasets of thousands of genomes, but researchers want the results expressed in terms of compact representations called pangenome graphs. Hundreds of matches in the dataset may correspond to only one or two matches in the graph. Given a read, therefore, we would like to find which parts of it match well and where they match in the graph, in time that depends on the length of the read and the number of matches in the graph but not on the number of matches in the dataset. We are now closing in on that goal; this talk will give a high-level view of the challenges and some potential solutions.
Travis Gagie is an associate professor of computer science at Dalhousie University in Canada, specializing in compact data structures for bioinformatics. He moved from Canada to Italy for three years during his PhD to study the FM-index with its co-inventor Giovanni Manzini; switched schools to graduate in genome informatics at Bielefeld University, Germany; moved back and forth from Chile to Finland for ten years to work with Gonzalo Navarro and Veli Mäkinen on indexes for pangenomic datasets; and finally returned to Canada in 2019. He has published about 150 papers with about as many coauthors and is always happy to add more of either.