How do you sequence the genomes of 70,000 species?

Wednesday 30th Oct 2024, 12.30pm

Welcome back to the new series of the Oxford Sparks Big Questions Podcast! We are here to answer weird and wonderful questions about our world, with the help of science. And we’re starting with a very big question! How do you sequence the genomes of 70,000 species?

Dr Liam Crowley, from the Department of Biology, tells us about the ground-breaking Darwin Tree of Life project, which aims to sequence the genomes of over 70,000 species in Britain and Ireland. Discover the challenges and technological advances that make this monumental task possible, and explore the potential applications in fields like conservation genetics and evolutionary biology.

Tune in to find out how this project could revolutionise our understanding of biodiversity and the future of life on Earth!


Emily Elias: A genome tells us the genetic building blocks of what makes something… something. It took over a decade to figure out the human genome, and now a group of researchers are thinking bigger. On this episode of the Oxford Sparks Big Questions Podcast, we are asking, how do you sequence the genomes of 70,000 species?

Hello, I’m Emily Elias, and this is the show where we seek out the brightest minds at the University of Oxford, and we ask them the big questions. And last time we spoke to this researcher, it was about bed bugs. So, hopefully, this time around, there will be less anxiety and creepy crawlies on your skin.

Liam Crowley: Hello. So I’m Dr Liam Crowley and I am a postdoctoral researcher in the Department of Biology at the University of Oxford. And I am working on a project called the Darwin Tree of Life Project.

Emily: What is the Darwin Tree of Life Project?

Liam: So the Darwin Tree of Life Project is a very exciting, ambitious project, which is a collaboration between lots of different institutions, including universities, but also museums and botanic gardens. And we have the ultimate aim of trying to sequence the full genome of every single species of eukaryote in Britain and Ireland.

Emily: That sounds like a lot of species.

Liam: Yeah, ambitious is definitely a good word to describe the project. So, based on our current list of what we expect there to be in Britain and Ireland, it’s more than 70,000 species of animals, plants, fungi and protists.

Emily: Maybe we should start from the basics. What exactly is the genome that you’re sequencing?

Liam: So, the genome is all of the genetic material held within the cells of these organisms. So, as well as everything inside the nucleus, with all the chromosomes, it’s also everything outside the nucleus. So, there’s some DNA held in things like mitochondria. So, we want to take every single sequence of those four bases that make up DNA – adenine, thymine, guanine, cytosine. And the order of those bases across all of the millions of base pairs that comprise that genome.

Emily: But why would you want to do this?

Liam: Well, it’s a very good question. So, the first way you could kind of answer that is actually just to say, because we can. Because actually, this is the first time in all of human history where something like this would even be feasible. And one of the first genomes that we produced was our own genome, the human genome. That took about a decade and billions of dollars to do just one genome, but actually that revolutionised all sorts of different fields of research and medicine.

So now it’s the turn of everything else, the rest of biodiversity. We want to try and eventually sequence all of the DNA on the planet, and it will give us a much greater understanding of all these different species. But also there’s loads of different applications for genomic science.

Emily: You’re into insects. How would you apply this, then, to an insect? What sort of thing would you be able to take away?

Liam: Yeah, that’s right. I’m an entomologist. I am focusing on trying to find all the different species of insects that we have at Wytham Woods so we can then prepare those specimens, extract the DNA, and sequence them. And there’s lots of different things we can do from this.

So I categorise it in two different ways, the first of which I would describe as discovery science. So we don’t know what we don’t know. So, actually, just by looking at all this data, we can start to find patterns and interesting things going on with the genomes themselves. And we can also see how these different species are evolving, how perhaps they’re related to other species, and whether there’s convergence, or there are unique mutations and adaptations arising within specific genes or gene families in these genomes.

And then the other thing that we can do is what we like to call enabling science. So we have this really kind of grandiose sentence that we can say that genomes are fast becoming an essential component of a 21st Century biology toolkit, meaning that more and more genomes are becoming a fundamental prerequisite to then allow us to do a whole range of different other scientific inquiries.

So, a really good example of this would be for conservation, and conservation genetics. So, if you want to see how related a vulnerable or isolated population is to each other or to different populations, we can do various sequencing to kind of see that genetic diversity. But before we can do any of that, we need to actually have that original reference genome so we know what part of the genome to look in to sequence, because we can’t sequence an entire genome every single time, but we can very quickly and easily sequence just very small snippets. So, it’s all about knowing where to look.
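[Editor’s illustration, not part of the recording.] The idea of sequencing only "small snippets" at a known position and comparing them between individuals can be sketched roughly as below. This is a minimal sketch under stated assumptions: the snippets are invented, already aligned to the same region, and the function name `pairwise_diversity` is hypothetical rather than anything the project itself uses.

```python
from itertools import combinations


def pairwise_diversity(snippets: list[str]) -> float:
    """Average proportion of differing sites over all pairs of snippets.

    Assumes every snippet covers the same region of the reference genome
    and has the same length (i.e. the alignment step is already done).
    """
    length = len(snippets[0])
    pairs = list(combinations(snippets, 2))
    total = 0.0
    for first, second in pairs:
        differences = sum(a != b for a, b in zip(first, second))
        total += differences / length
    return total / len(pairs)


# Hypothetical 12-base snippets from four individuals, all taken from the
# same region, which the reference genome tells us where to find.
snippets = [
    "CGTTAGCCTAGG",
    "CGTTAGCCTAGG",
    "CGTAAGCCTAGG",
    "CGTAAGCTTAGG",
]

diversity = pairwise_diversity(snippets)
print(f"average pairwise difference: {diversity:.4f}")
```

A lower value suggests the sampled individuals are more alike at that region. Real conservation genetics work uses many regions and more careful statistics; the point here is only that the comparison depends on knowing where in the genome to look.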

Emily: How long does it take to sequence something? I mean, the human genome took, what, like 13 years to do. So, like, I can’t imagine this is a quick turnaround.

Liam: Yeah, that’s right. The first few, because we were kind of figuring out the process, did take a very long time. But actually, both the time it takes and the cost it takes have decreased beyond exponentially, which is really quite impressive. We can sequence a genome very, very quickly. Because this is all happening at scale, it’s hard to say how quick one particular genome might take. But with whole batches of genomes going through this process, best case scenario, we could actually go from collecting a beetle from a log in Wytham Woods to actually publishing a full, high-quality genome within a matter of weeks, potentially.

Emily: That’s insane.

Liam: Yeah.

Emily: So we’ve gone from years down to weeks. Is that like the power of AI, or is it something else at play?

Liam: It’s to do with how we actually sequence it, so, find out the order of those bases. And then the software and the programs and the way that we put those sequences together. It’s a little bit like doing a jigsaw puzzle. If you had a jigsaw puzzle with a few large pieces, it’s a lot easier to put it back together than if you had one with lots and lots of small pieces.

So new modern sequencing technology is called long read sequencing. And that’s exactly what we’re doing, where we actually start from larger, original fragments of DNA. And that’s really helpful because DNA is actually really repetitive. So, it’s really difficult to know, actually, which bit of DNA, which cell that bit of DNA actually originally came from. So, it’s like doing a jigsaw with loads and loads of pieces where they’re all gray and they’ve all been shoved into one giant bag, and you’re trying to work out what on earth goes where.
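[Editor’s illustration, not part of the recording.] The jigsaw point about repetitive DNA can be made concrete with a toy example. The sketch below assumes an invented genome containing the same repeat twice and a hypothetical helper, `ambiguous_fraction`, that cuts every possible read of a given length and checks how many of them match more than one place. It is not how real assembly software works, only a demonstration of why longer pieces are easier to place.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_length: int) -> float:
    """Return the share of reads of a given length that match more than one place.

    Every possible read of `read_length` bases is cut from the toy genome,
    and a read counts as ambiguous if the same sequence occurs more than once.
    """
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


# Hypothetical toy genome: the same 20-base repeat appears twice,
# separated and flanked by short unique stretches.
repeat = "ACGT" * 5
genome = "TTGACCA" + repeat + "GGCATTC" + repeat + "CAAGTGA"

short_reads = ambiguous_fraction(genome, 4)   # small jigsaw pieces
long_reads = ambiguous_fraction(genome, 30)   # large jigsaw pieces

print(f"ambiguous 4-base reads:  {short_reads:.0%}")
print(f"ambiguous 30-base reads: {long_reads:.0%}")
```

In this toy case most 4-base reads could belong in several places, while every 30-base read is long enough to reach past the repeat into unique sequence and so fits in only one place.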

Emily: So I guess that you’re kind of in the process of making this giant library of genomes. What would be the hope that somebody would be able to do with it? Would it be like, oh, I’m really curious in this beetle. Let me go take out the beetle book and see what’s been happening with these guys.

Liam: Yeah, absolutely. So, one of the big pillars of this project is it’s all completely open and available at every single stage. So, all of the draft data and everything is all made available, kind of with the caveat that, yes, it’s not finished, there may be mistakes. Hopefully, the finished project will be brilliant. And so far, the quality has been unbelievably good.

It’s almost like we’re building a library and we’re putting the books onto the shelves, and then anyone around the world can come and they can take these books and they can do whatever research they want to do from that. So, we already have examples of people using our genomes, mostly in genomic science, but also in other fields as well, with some really exciting projects.

Emily: Do you have any examples of what people have done when they take those books off the shelf and what they’re using it for?

Liam: Yeah, so, there’s actually some really nice examples from the insects, particularly in conservation genetics.

Emily: Oh, you would say that you love your insects.

Liam: Yeah, not that I’m biased at all! But there’s this one insect, a butterfly. It’s called the large blue butterfly and it actually went extinct in the UK and then there was a reintroduction from some Swedish individuals. And it’s now doing really well thanks to some quite intensive conservation efforts. And it’s actually spreading. But because you have these kind of metapopulations across these areas and they’re potentially quite restricted, it’s really important that we know how related they are and actually, do we need to intervene? Perhaps we could translocate individuals or just kind of look after healthy genetic diversity for these populations. But we won’t be able to do any of that conservation genetics before we have that original reference genome.

So, yeah, we worked very hard and managed to get permits and permissions in place to take a couple of individuals to sequence, which would have no impact on the population there, taken from one of the sites where it’s doing best. And we are now producing that genome. So as soon as that’s finished, we have a direct application of people ready and raring to go to then do some really important conservation genetic work.

Emily: And how do you produce a genome? Like, do you just send it to the genome factory and then it spits it out like a box of crackers? Or how does that work?

Liam: Yeah, so, the one-word answer to this would be teamwork. So we have a huge team and process which we have been perfecting over the last four or five years. But essentially you have someone like me, who goes out and finds the species, identifies them and then preserves them. So everything is flash frozen at -80°C to preserve that high-quality DNA. That then goes to the Sanger Institute in Cambridgeshire, where they essentially break open all the cells and extract that DNA. That DNA can then be checked for quality and then, if it looks like it’s good, it can be loaded onto the sequencing machines. They then determine the order of those bases and then all of that data goes on to the assembly step, which is bioinformatic processes, which then reconstruct the jigsaw, and then there’s various kind of post-assembly checks of quality.

We have other techniques going on at the same time to make sure that all the scaffolds and some of the high-level structures of the genome are all being put together correctly. And then we’re even doing some annotation. So actually sequencing some of the RNA alongside the DNA to see where the genes are. And actually, can we label genes on the genome and try and have some work related to that? So it’s, yeah, a lot of different people as part of this big pipeline. But there seems to be a very good process where we’re kind of trying to link back to each other and make sure we keep track of specimens and everything is working all to a very high quality.

Emily: Okay, so you guys have got a goal of 70,000…. 7..0. Where are you at in that process?

Liam: Yeah. Ah, it’s a big number. And the first thing to say is, actually, at the start of the project, we didn’t even really know how many that is. So that’s kind of our best guess, because actually we’ve had hundreds of years of taxonomy and natural history in Britain and Ireland, and we still haven’t named every single species we think is here. And we’re finding there’s new cryptic species, or perhaps we got some stuff wrong in the past. So it’s a big ask to kind of complete taxonomy for a nation! But we’re doing pretty well. We have collected more than 10,000 species in the initial phases of the project. And we have been sequencing a large number of these and we’ve released more than 1,300 genomes so far. But that rate of new genomes coming out is going up all the time.

Emily: I hate to be that guy, but, like, what does this mean for the future? If you are able to sort of, like, perfect this process, get all of this information, build up this massive library of books, species books that we don’t even know how big it could be. What could this mean?

Liam: Well, at our launch meeting back in 2019, Professor Mark Blaxter, who’s one of the lead investigators for the project, kind of stood up at this internal meeting and said, this project is going to change biology. And at the time I thought, oh, that’s kind of just trying to tee everyone up and get everyone enthusiastic. But actually, as it’s gone on, the more I kind of agree, actually. This is revolutionary. So, as I alluded to before, there’s a whole range of different investigative techniques which are unlocked. We kind of can call it genome-enabled research.

So there’s all these various different things from sorting out taxonomy and resolving phylogenies to study of evolution and how genomes themselves, as well as the organisms and genes and gene families, are evolving. And then there’s the actual applied applications like the conservation genetics and even potentially biodiscovery.

Again, we don’t know what we don’t know. There could be all sorts of amazing biological compounds held within the organisms which are all encoded within the genomes. So, having that is really exciting and is a really important step. And the ultimate goal is, yeah, we want to sequence everything on the planet. Particularly in the face of the mass extinction event and unprecedented biodiversity loss, it becomes even more important. And, you know, there’s even potential science-fiction kind of applications in the future with de-extincting species. Although that’s a whole other tangent.

Emily: We’ll save that tangent for another day!

This podcast was brought to you by Oxford Sparks from the University of Oxford with music by John Lyons and a special thanks to Liam Crowley.

Tell us what you think about this podcast. We are on social media @OxfordSparks. Or you can go to our website oxfordsparks.ox.ac.uk. It’s a pretty cool website. I mean, it’s got stuff on it.

I’m Emily Elias. Bye for now.
