Functional annotation of lncRNA based on their Cis- and Trans interactions

Background  Long non-coding RNAs (lncRNAs) are emerging transcriptional species that are gaining relevance due to their involvement in biological processes. Functionally, lncRNAs are known to perform regulatory tasks through (i) signal transduction, (ii) sponge formation with microRNA, (iii) protein translocation and guidance, (iv) scaffolding for molecule assembly and recently (v) Read more…

Compiling Life Sciences Training Datasets

Introduction  The Babraham Bioinformatics group has compiled numerous life sciences datasets with a view to using these in future training courses. These datasets have also been made publicly available to assist other researchers and students with learning data analysis. The training materials are stored in a GitHub repository at: There is Read more…

Cambridge-India openVirus

Peter Murray-Rust, (Chemistry), Gita Yadav (Plant Sciences) and interns in India

openVirus is a team of Young Indian Scientists who have built tools to mine the scientific literature for new insights into Viral Epidemics.
Solutions to COVID may be lying in the literature of previous epidemics or the vast new output of COVID papers. The project has many facets and is very suitable for anyone interested in extracting and analysing masses of scientific articles.

Our facets (X in “viral epidemics and X”) include:

  • what countries are epidemics reported in?
  • what drugs are used?
  • what comorbidities occur?
  • who funds research into viruses?
  • what viruses are involved?
  • what is the role of zoonosis (animal hosts)?
  • who reports Test and Trace strategies?
  • what non-pharma interventions are used (quarantine, social distancing, masks)?

We build “minicorpora” for all of these using EuropePMC, and ontologies using Wikidata.
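As a rough illustration of how a minicorpus search might start, the sketch below builds a query against the Europe PMC REST search service and reduces a JSON response to a short list of hits. It is not the project's own tooling: the function names, the example query string and the choice of fields are assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Europe PMC REST search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Return a Europe PMC search URL for `query` (hypothetical helper)."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return EPMC_SEARCH + "?" + urlencode(params)


def summarise_hits(payload):
    """Keep only id, title and publication year from a search response."""
    results = payload.get("resultList", {}).get("result", [])
    return [
        {"id": r.get("id"), "title": r.get("title"), "year": r.get("pubYear")}
        for r in results
    ]


def fetch_minicorpus(query, page_size=25):
    """Run the search over the network and return the summarised hits."""
    with urlopen(build_search_url(query, page_size), timeout=30) as response:
        return summarise_hits(json.load(response))


# Example call (needs network access; the query text is only an illustration):
# hits = fetch_minicorpus('"viral epidemics" AND masks', page_size=10)
```

Splitting URL construction and response parsing from the network call keeps each step easy to try out in a Jupyter notebook cell.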

Among the skills that delegates can learn without previous programming experience are:

  • repositories (EuropePMC) and searching (including REST)
  • creation of ontologies (dictionaries) using Wikidata and SPARQL
  • Dockerised containers
  • Jupyter notebooks
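To give a feel for the Wikidata and SPARQL item above, here is a minimal sketch that builds a label-to-identifier dictionary from the Wikidata Query Service. The helper names, the User-Agent string and the "instance of" (P31) pattern are assumptions for illustration; the class identifier is passed in as a parameter, and any specific QID (for example Q6256, assumed here to mean "country") should be checked on Wikidata before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_dictionary_query(class_qid, limit=50):
    """SPARQL for items that are an instance of (P31) the given class."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def bindings_to_dictionary(payload):
    """Map each English label to its Wikidata QID from a SPARQL JSON result."""
    dictionary = {}
    for row in payload["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        label = row.get("itemLabel", {}).get("value", qid)
        dictionary[label] = qid
    return dictionary


def fetch_dictionary(class_qid, limit=50):
    """Run the query over the network and return the label -> QID mapping."""
    params = {"query": build_dictionary_query(class_qid, limit), "format": "json"}
    request = Request(
        WDQS_ENDPOINT + "?" + urlencode(params),
        # Illustrative User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "openVirus-example/0.1 (illustrative sketch)"},
    )
    with urlopen(request, timeout=60) as response:
        return bindings_to_dictionary(json.load(response))


# Example call (needs network access; Q6256 is assumed to be "country"):
# countries = fetch_dictionary("Q6256", limit=20)
```

The resulting mapping can then serve as a simple dictionary of terms to look for in a minicorpus.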

A mini-review can be carried out in 2-3 hours.

If you’re interested in developing technology (probably scripting in R, Python or KNIME), we’d love contributions on:

  • text-based search (Lucene)
  • Natural Language Processing (nltk, OpenNLP)
  • data display (e.g. matplotlib, D3.js)
  • Machine Learning (Keras, word2vec)
  • multilingual documents (Hindi, Urdu, Tamil, and Portuguese / Spanish – we have a collaboration with Redalyc repository in Latin America)

There is extensive documentation, and project members will be available throughout the working day (until about 17:00 BST, which is 21:30 India Standard Time), with PMR available until later in the UK.

overview slides at: