Tuberculosis (TB) is one of the top 10 causes of death worldwide and the leading cause of death from a single infectious agent, Mycobacterium tuberculosis var. tuberculosis (MTB). In 2018, about 10 million people fell ill with TB and around 1.2 million died. Drug-resistant TB poses a major threat to the World Health Organization’s “End TB” strategy, which sets 2035 as its target year. In 2018, there were about 0.5 million cases of drug-resistant TB, of which 78% were resistant to multiple TB drugs. Traditional culture-based drug susceptibility testing (the gold standard) often takes multiple weeks, and the necessary laboratory facilities are not readily available in low-income countries.

Predicting drug resistance by applying Machine Learning (ML) to whole genome sequencing (WGS) data could pave the way to earlier diagnosis and more efficient treatment than is possible with the gold standard, culture-based phenotypic drug susceptibility testing.



This project aims to explore:

  1. Exploratory data analysis, to understand the various variables in the dataset.
  2. Feature engineering approaches, to understand whether Single Nucleotide Polymorphisms (SNPs) provide a good foundation for prediction.
  3. A random forest approach to Machine Learning, which combines multiple decision trees into an overall ensemble model.
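
As a rough illustration of the feature engineering step, the sketch below turns per-isolate SNP calls into a binary presence/absence matrix, which is one common way to feed variant data to a tree-based model. The isolate names, the SNP notation, and the `build_snp_matrix` helper are all hypothetical and made up for this example; they are not taken from the project's dataset or code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: isolate ID -> set of SNPs called against a reference,
# written here as "<position><ref>><alt>". Invented for illustration only.
isolate_snps: Dict[str, Set[str]] = {
    "isolate_A": {"1000A>G", "2500C>T"},
    "isolate_B": {"2500C>T"},
    "isolate_C": {"1000A>G", "4200G>A"},
}


def build_snp_matrix(
    calls: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Encode SNP calls as a binary matrix (rows: isolates, columns: SNPs)."""
    # Sort both axes so the row and column order is reproducible.
    isolates = sorted(calls)
    snps = sorted(set().union(*calls.values()))
    # 1 if the isolate carries the SNP, 0 otherwise.
    matrix = [[1 if snp in calls[iso] else 0 for snp in snps] for iso in isolates]
    return isolates, snps, matrix


isolates, snps, matrix = build_snp_matrix(isolate_snps)
print(snps)
for iso, row in zip(isolates, matrix):
    print(iso, row)
```

A matrix like this, paired with one resistance label per isolate, is the kind of input a random forest implementation such as scikit-learn's `RandomForestClassifier` accepts. Whether the project encodes SNPs this way is an assumption.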

Desired skill level

This project requires some knowledge of Python.

Beginner: If you’re curious about the topic, you can learn by reading the code and contribute by doing code reviews, helping us structure the project better, improving documentation, fixing variable names, etc. Feel free to dip your toes in, the water’s fine!

Intermediate: If you’ve experience with Data Visualization, there’s good scope for that in this project 🙂

Advanced: Some familiarity with Machine Learning and Feature Engineering would be great.

Author: Abhinav Sharma

I'm an engineer by profession and quite curious about Bioinformatics 🙂

Categories: Project


Rachida Namoune · September 21, 2020 at 8:30 pm

I am interested in joining the team, but I don’t have any experience in Python coding yet.
Thank you

    Abhinav Sharma · September 22, 2020 at 4:35 am

    Hi Rano,

    Thanks for considering this project!

    Sure, there’s still plenty you can contribute. For example, the Jupyter notebooks are in a bit of disarray right now, so you could help us give the repository a proper structure, as shown here

    Let me know if this seems interesting, and ping me in the Project team room 🙂


Morumda Daji · September 27, 2020 at 5:27 am

I would love to join this project team, but I don’t have any knowledge of Python programming.


    admin_wingett · November 3, 2020 at 9:55 pm

    Thank you for your message, but the hackathon finished before you made contact with us. Hopefully you can take part next time.

    All the best.
