The base composition of sequencing reads depends on the library type (RNA, genomic, bisulfite, ChIP, etc.) and the species, and is often characteristic of a particular sequencing application. For a while we’ve been thinking about a quality control tool that checks whether a given base composition matches the expected base composition for the application. In other words, does my library look like it is supposed to? Some of the code from my hackathon project last year (Charades) could easily be adapted to put a given base composition into a wider context, but what’s missing is a collection of base compositions for a variety of sequencing libraries. The immediate task would be to work out how best to collect library base compositions and match them up with metadata about library type for a variety of published applications. I will most likely be working on a different project this year, but I’d be happy to join the discussions and provide ideas.
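To make the idea a bit more concrete, here is a minimal sketch of what such a check could look like. It is not the Charades code and not a finished method: the function names, the use of a simple Euclidean distance, and above all the reference values in `REFERENCES` are assumptions made up for illustration. Real reference profiles are exactly the collection that is still missing.

```python
from collections import Counter

BASES = "ACGT"


def base_composition(reads):
    """Return the overall fraction of A, C, G and T across all reads.

    Any other character (for example N) is ignored.
    """
    counts = Counter()
    for read in reads:
        counts.update(base for base in read.upper() if base in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found in the reads")
    return {base: counts[base] / total for base in BASES}


def composition_distance(observed, expected):
    """Euclidean distance between two base composition dicts (an assumed metric)."""
    return sum((observed[b] - expected[b]) ** 2 for b in BASES) ** 0.5


def closest_library_type(observed, references):
    """Return the name of the reference profile nearest to the observed one."""
    return min(
        references,
        key=lambda name: composition_distance(observed, references[name]),
    )


# Hypothetical placeholder profiles, NOT measured values. They only encode
# the general expectation that bisulfite libraries are depleted of C.
REFERENCES = {
    "genomic": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "bisulfite": {"A": 0.30, "C": 0.02, "G": 0.20, "T": 0.48},
}
```

Calling `closest_library_type(base_composition(reads), REFERENCES)` on a list of read sequences returns the name of the nearest profile. A real tool would probably compare per-position compositions rather than one overall average, and would need reference profiles per species as well as per library type.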
Christel obtained her PhD in Microbiology from the University of Erlangen (Germany) before she joined Wolf Reik’s lab at the Babraham Institute as a postdoc. Over the next decade she worked on several projects related to epigenetics and mammalian development, including imprinting, spatial organisation of the genome and global transcription. With the emergence of epigenomics, Christel started spending less and less time in the lab and more and more time at the computer. In 2016 she joined the Babraham Bioinformatics Core Facility, and since 2017 she has been working as a Bioinformatician for Epigenetics. She is involved in projects from several groups within the Epigenetics ISP.