What it is
The DHS AI/ML Toolkit is a suite of interoperable tools for scalable, explainable AI workflows on Demographic and Health Surveys (DHS) data, from ingestion and transformation to modeling, visualization, and policy use.
Open research · Community tools
Interoperable, explainable AI workflows for Demographic and Health Surveys—built for reproducibility, open science, and researchers everywhere.
We share methods and software so graduate students, researchers, and organizations can work with DHS data in a standardized, reproducible way.
Inspired by a 2020 study by Bitew et al. on the potential of data science and ML for DHS data, we launched this initiative to promote reproducible frameworks for students and researchers.
Whether you are building child mortality risk models, spatial dashboards, or Bayesian systems, these components support data-driven development work in low-resource and research settings.
Toolkit components
Each item links to more detail in our Newsletter archive—jump in where your work begins.
A Python toolkit and CLI (storyteller-dhs) that turns DHS databases into an interactive Datasette experience, with reusable workflows for querying, exports, and full-text search. It is designed as a storytelling companion for the DHS AI/ML Toolkit.
Converts DHS datasets into clean CSV tables that can be stored in relational databases such as SQLite and PostgreSQL. It simplifies the data engineering process and makes DHS survey data ready for analysis and model training.
An Explainable AI (XAI) system for Bayesian modeling and analysis of DHS data. It supports modeling under-5 mortality risks across clusters with features like uncertainty estimation and Bayesian diagnostics.
A DHS-based machine learning model that predicts the survival of children under 5 years across five countries in Africa with more than 95% accuracy. Its predictions feed into the FLOWER dashboard for visual interpretation.
A model designed to predict the survival status of a child using 36 variables. It trains on DHS data and includes a detailed workflow for model evaluation and classification accuracy.
An R-based application that uses survival analysis techniques such as Kaplan–Meier and Cox regression to estimate under-five mortality risks across different countries and genders using DHS data.
A predictive model for estimating the survival of children under 5 using household features, spatial indicators, and survey metadata. Offers tools to explore feature relevance and country comparisons.
DHS AI Genesis was our initial step toward building a platform that applies data science and machine learning to DHS data. It originated as a space to demonstrate how machine learning algorithms can be used to extract insights from household survey data. Genesis marked the beginning of our broader work on the DHS AI/ML Toolkit and remains a reference point for researchers interested in this field.
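To give a feel for the kind of step the CSV-to-database component handles, here is a minimal sketch using only the Python standard library. It is not the toolkit's own code: the function name load_csv_into_sqlite, the table name, and the column names are hypothetical, the sample rows are made up for illustration, and every column is assumed to hold integers.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the column names are illustrative,
# not real DHS recode variables.
SAMPLE_CSV = """cluster_id,child_age_months,survived
101,14,1
101,3,0
102,27,1
"""


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from the CSV header and insert every data row.

    Assumes every column holds integers (a simplification for this sketch).
    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines

    columns = ", ".join(f'"{name}" INTEGER' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_sqlite(conn, "children", SAMPLE_CSV)

# Once the data is relational, summaries become plain SQL.
survival_by_cluster = conn.execute(
    "SELECT cluster_id, AVG(survived) FROM children "
    "GROUP BY cluster_id ORDER BY cluster_id"
).fetchall()
print(inserted, survival_by_cluster)
```

A file-backed SQLite database produced this way could then be explored with tools that read SQLite. Real DHS files have many more variables and mixed types, so a production workflow would need proper type handling and variable metadata.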