#8 AccuroLab & Data for Good : fighting misinformation about covid-19

Data for Good
Mar 23, 2021

Article written by Fantine Monot

Finally, to conclude this series of articles dedicated to the great projects of Data For Good season 8, we are happy to present AccuroLab. Accurolab aims to help people check the accuracy of the COVID-19 information they receive, in order to stop the spread of false information.

A big thanks to Sean Anggani (AccuroLab project leader for Data For Good & Fullstack Software Engineer), Emmanuel Blondel (Data For Good volunteer & data scientist in a Parisian Hospital), Ien Tcho Ly (Data For Good volunteer & product manager with a passion for data science) and Grace Gimont Betancourt (Data For Good volunteer & data scientist) for their contribution.

1. Let’s start at the beginning: what is Accurolab?

Sean (project leader): Accurolab was launched to fight fake news about Covid-19 by giving people an easy way to verify the information they encounter on social media, through a Question & Answer system on WhatsApp. In practice, a person opens a WhatsApp conversation with our bot and asks a question. Our conversational agent answers and provides reliable sources for anyone who wants to go further. You can watch the demo by clicking right here.

This project was born during the MIT hackathon “Africa Takes on Covid-19”, with Steve Tchuenté, Tarik Fathallah and Marilyn Osei (the founders), who were worried about the “infodemic” going hand-in-hand with the pandemic. A rather striking example: the advice to drink bleach to kill the virus…

2. Why did you choose to partner with Data For Good?

Sean (project leader): Well, at the beginning our mechanism for understanding the context of a question was static and based on keyword search. For example, if you asked, “Do mosquitoes transmit covid-19?”, we would identify “mosquitoes” as a keyword and tell you, “There’s no evidence that mosquitoes transmit covid”. But if you asked, “Do mosquitoes cause malaria?”, it would give the same message, which is completely irrelevant! To upgrade this limited model into a valuable tool for the public, we needed expert data scientists to assess the viability of a Natural Language Processing (NLP) model and to create an accessible tool that understands questions and gives fact-checked answers, which is where Data For Good comes in. The objective of this 3-month challenge was to move from a static model to a dynamic NLP one.
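To make the limitation concrete, here is a minimal sketch of a static keyword lookup of the kind Sean describes. It is an illustration only, not the Accurolab code: the table `KEYWORD_ANSWERS` and the function `keyword_answer` are hypothetical names, and the single entry simply mirrors the mosquito example.

```python
import re

# Hypothetical keyword-to-answer table; the single entry mirrors the example above.
KEYWORD_ANSWERS = {
    "mosquitoes": "There's no evidence that mosquitoes transmit covid.",
}


def keyword_answer(question):
    """Return the canned answer for the first known keyword found, else None."""
    words = re.findall(r"[a-z0-9-]+", question.lower())
    for word in words:
        if word in KEYWORD_ANSWERS:
            return KEYWORD_ANSWERS[word]
    return None


covid_reply = keyword_answer("Do mosquitoes transmit covid-19?")
malaria_reply = keyword_answer("Do mosquitoes cause malaria?")

# Both questions trigger the same keyword, so they receive the same reply,
# even though only the first one is about covid.
same_reply = covid_reply == malaria_reply
```

Because the lookup only sees the word “mosquitoes”, it cannot tell the two questions apart, which is the gap a model that reads the whole question is meant to close.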

Not only does Accurolab align with Data For Good’s values of helping projects with a positive social impact, but Data For Good also has a proven track record and a community of brilliant data scientists who are eager to make a change.

Grace (DFG volunteer): For my part, I decided to embark on this adventure first because I wanted to learn more about NLP and because around me, I saw some of my family members falling into the trap of fake-news about Covid-19.

Tcho (DFG volunteer): And to complete Grace’s answer, we decided to stay involved in the project because of the really great team!

3. And so, what is the verdict? :)

Sean (project leader): Coming into Data For Good, Accurolab wanted to understand how far this solution can go. We have a vision of fighting the infodemic by:

1) Building a dynamic scientific database of accurate information about covid-19 that must be automatically updated with the latest statistics, information, articles and so on covering all topics related to the pandemic — to be able to answer all the questions people might have with relevant information.

2) Creating a dynamic NLP model capable of understanding the user’s question to provide an accurate and relevant answer. To give you an example of the level of difficulty, we once asked our conversational agent the following question: “What is the mortality rate of covid-19?” and we were quite surprised (and worried) when the answer came back as “11%”. Fortunately, that is not the case! We eventually understood why the machine had given us this figure: it was quoted from an article about the consequences of covid-19 for people with kidney problems. That is why it is important that the machine also takes the whole context into account.

To be transparent, during this season with Data For Good, we faced a major difficulty: finding scientifically valid sources of information on Covid-19 that are regularly updated. Indeed, in the face of this unprecedented situation, the available information is still finding its way: it is ambivalent, evolving and sometimes contradictory. Even within the scientific community, points of view are very diverse, and sometimes even opposed…

At the moment, our model understands the user’s question, but because our data sources are limited for the reasons explained above, it can provide an accurate answer for fewer than 10% of queries. (We compute a relevance score that measures how relevant an answer is to a given question. If the score is below 85%, we choose not to provide an answer.)
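The abstention rule described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the project's implementation: the function name `answer_or_abstain` and the `(answer, score)` pair format are hypothetical, and scores are assumed to lie between 0 and 1, so the 85% cut-off becomes 0.85.

```python
RELEVANCE_THRESHOLD = 0.85  # the 85% cut-off mentioned in the text


def answer_or_abstain(candidates, threshold=RELEVANCE_THRESHOLD):
    """Return the best-scoring answer, or None when nothing is relevant enough.

    `candidates` is a list of (answer_text, relevance_score) pairs.
    How the relevance score is computed is outside the scope of this sketch.
    """
    if not candidates:
        return None
    best_answer, best_score = max(candidates, key=lambda pair: pair[1])
    if best_score < threshold:
        # Staying silent is preferred over returning a doubtful answer.
        return None
    return best_answer


confident = answer_or_abstain([("Answer A", 0.91), ("Answer B", 0.40)])
unsure = answer_or_abstain([("Answer C", 0.60)])
```

Here `confident` holds the text of the best candidate, while `unsure` is `None` because no candidate reaches the threshold. A strict cut-off like this trades coverage for reliability, which is consistent with the low share of answered queries reported above.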

So, the whole team is currently working on the future shape of the project, taking these constraints into account. Of course, if you are interested or have an idea, everyone is welcome: you can write to us at accurolab@gmail.com !

Tcho (DFG volunteer): Actually, at the beginning our goal was to create a conversational agent, or chatbot, that would learn in an unsupervised manner: the dream. Then the project evolved toward a more realistic goal: to create the most useful conversational agent possible that would function with a minimum of human input, i.e. a machine that understands when a given question corresponds to a given stored answer. That is a non-trivial challenge in itself when the same question can be expressed in so many ways!

Grace (DFG volunteer): Yes, and in 3 months you can only go in one direction. We decided to work on an NLP model, a RoBERTa base model fine-tuned on the Stanford Question Answering Dataset (SQuAD), but in the end we realized that it was not the most suitable for our project. But we are really motivated to try a new one now! Several of us want to continue working on the project even though season 8 is over.

Emmanuel (DFG volunteer): Indeed, if there is one big lesson I learnt during this season, it is not to follow model trends blindly or be swayed by the mainstream. You really have to check whether the model addresses your specific problem. The implementation of the model, including the mathematical part, is really important. To be more precise, we worked with the following technologies: Python as the language, Haystack with Elasticsearch as the NLP toolkit, JavaScript and Mozilla's Readability library for data gathering, and AWS and Docker for deployment.

Tcho (DFG volunteer): I would also like to point out that the Google NLP model underlying our work, BERT, was published fairly recently, in 2018, so we are really dealing with state-of-the-art models, which is super interesting. NLP is a real playground and is evolving very rapidly.

Grace (DFG volunteer): We learned a lot, both technically, as Tcho and Emmanuel said, and in terms of management. First, we had to work across different time zones, as some members are based in the USA, which left us only a small window to meet; second, we did everything remotely. To conclude, it was a very pleasant experience and I’m looking forward to hearing about the projects for season 9!

[The launch of the 9th season of Data For Good will take place virtually on Saturday 27 March from 10am to 1pm. More information here]

If you are interested in Accurolab’s mission, you can contact them at accurolab@gmail.com

Because it’s Data For Good, it’s open source, here is the GitHub: https://github.com/dataforgoodfr/batch8_accurolab

A French version is coming soon!

Data for Good

Data For Good is a community of volunteer data scientists who put their skills to work solving social problems.