Speaker
Description
BONSAI is an automated LCI background database. It takes open data that is collected via APIs or scraped from websites, cleans and harmonizes it, and runs it through a multi-step workflow to produce an open LCI dataset. First, we build a multi-national monetary supply and use table (MSUT). Using production volumes and other auxiliary data, we disaggregate the data and add physical layers. This is then used to construct and balance a hybrid input-output model, from which we derive LCI coefficients.
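For readers less familiar with input-output modelling, the last step (deriving coefficients from a balanced model) can be illustrated with a textbook Leontief calculation. The sketch below is not BONSAI code: the two-sector table, the emission figures, and the function name `lifecycle_intensities` are invented for illustration, and the fixed-point iteration is a dependency-free stand-in for the linear solver a real pipeline would more likely use.

```python
# Toy two-sector example. All numbers are hypothetical, not BONSAI data.
Z = [[10.0, 20.0],   # Z[i][j]: input from sector i consumed by sector j
     [30.0, 10.0]]
x = [100.0, 200.0]   # total output of each sector
e = [50.0, 20.0]     # direct emissions of each sector (kg)

n = len(x)
# Technical coefficients: input from sector i per unit of output of sector j
A = [[Z[i][j] / x[j] for j in range(n)] for i in range(n)]
# Direct emission intensity per unit of output
b = [e[j] / x[j] for j in range(n)]


def lifecycle_intensities(A, b, tol=1e-12, max_iter=10_000):
    """Solve m = b + m A by fixed-point iteration.

    This sums the power series b (I + A + A^2 + ...), i.e. b (I - A)^-1,
    and converges when the spectral radius of A is below 1.
    """
    size = len(b)
    m = list(b)
    for _ in range(max_iter):
        new = [b[j] + sum(m[i] * A[i][j] for i in range(size)) for j in range(size)]
        if max(abs(u - v) for u, v in zip(new, m)) < tol:
            return new
        m = new
    raise RuntimeError("iteration did not converge")


# Cradle-to-gate emissions per unit of each sector's output
m = lifecycle_intensities(A, b)

# Final demand: output not consumed as an intermediate input
y = [x[i] - sum(Z[i]) for i in range(n)]
# Emissions embodied in final demand; should equal total direct emissions
embodied = sum(m[i] * y[i] for i in range(n))
print(m, embodied, sum(e))
```

The final check reflects the basic accounting identity of such a model: emissions attributed to final demand add up to the total direct emissions, so nothing is lost or double counted.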
All of this is open source and open data, and available at bonsai.uno. The goal is not just to produce usable LCI data for LCA practitioners, but also to provide intermediate outputs that may be relevant for other communities working with open data and sustainability.
In this talk, I want to share how we’ve approached the development of BONSAI over the past five years: what worked well, what didn’t, and what we’d do differently if we were to start again. I’ll briefly outline our tech stack (Python/Django, Airflow, PostgreSQL) and then focus on the challenges that may be relevant to others: how we approached full traceability, how we managed modeling decisions under uncertainty, and what we learned about collaboration and development workflows. Hopefully, this talk can help others avoid similar pitfalls, spark ideas, or even lead to future collaborations.
The BONSAI project is still evolving, and many parts of the pipeline could benefit from community input, whether in the form of data contributions, technical development, or modeling discussions. One of our goals is to lower the entry barrier and make it easier for new contributors to join, especially those who bring expertise from related open data or sustainability projects. We’d like to share our experience not just to inform, but to start a conversation on how to collaboratively build open and reliable infrastructure for sustainability research.
How much time do you ideally wish for your contribution? 40 minutes