Description
This work develops an end-to-end, low-code workflow that automatically converts heterogeneous documents (reports, theses and open-data PDFs) into high-quality, machine-readable Life Cycle Assessment (LCA) inventories. Built on the open-source orchestrator n8n, the pipeline:

(i) prunes non-informative pages with custom Python APIs (sketched below);
(ii) extracts text while preserving tables through the free LLMWhisperer API;
(iii) enriches content via DeepSeek LLM nodes for domain-specific entity recognition (sketched below);
(iv) normalises and annotates data with JavaScript routines to yield standards-compliant JSON-LD (sketched below); and
(v) uploads the output to a MySQL-backed Mexican LCA web platform.

A supervised-validation layer, combining rule-based checks and expert review (sketched below), assigns data-quality scores, ensuring transparent provenance before database ingestion. Tested on twenty academic theses, the system cut manual curation time by 80% and produced consistent inventories in under five minutes per document; metadata enrichment improved downstream query performance by roughly 30% compared with hand-curated entries.

By eliminating repetitive tasks, enforcing schema uniformity and providing a direct bridge from source document to live database, the workflow accelerates the population of national and international LCA repositories and supports the rapid creation of efficient datasets and databases. Key challenges include processing poorly scanned files, harmonising domain-specific nomenclature and scaling the validation module; opportunities lie in multilingual expansion, uncertainty quantification and broader integration with circular-economy datasets. Overall, coupling open-source automation, advanced language models and supervised quality assurance offers a replicable blueprint for reliable, rapid LCA data generation that lowers barriers for researchers and strengthens global open-data infrastructure.
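A minimal sketch of the page-pruning step (i), assuming a simple text-density heuristic built on the pypdf library; the 200-character threshold and the heuristic itself are illustrative, not the pipeline's actual pruning rules:

```python
from pypdf import PdfReader, PdfWriter

def prune_pages(src: str, dst: str, min_chars: int = 200) -> None:
    """Drop pages whose extracted text falls below a character threshold.

    The cutoff is a hypothetical heuristic; the abstract does not
    specify how the real pipeline decides a page is non-informative.
    """
    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        text = page.extract_text() or ""
        if len(text.strip()) >= min_chars:  # keep informative pages only
            writer.add_page(page)
    with open(dst, "wb") as f:
        writer.write(f)
```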
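For the entity-recognition step (iii), DeepSeek exposes an OpenAI-compatible API, so a call might look like the sketch below; the prompt wording and the requested JSON keys (`process_name`, `exchanges`) are assumptions, not the authors' actual node configuration:

```python
from openai import OpenAI

# DeepSeek's OpenAI-compatible endpoint; model name per DeepSeek docs.
client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

# Hypothetical prompt for LCA-specific entity recognition.
PROMPT = (
    "Extract LCA inventory entities from the text below. "
    "Return JSON with keys: process_name, exchanges "
    "(each exchange with flow, amount, unit).\n\n{text}"
)

def extract_entities(text: str) -> str:
    """Send extracted document text to the LLM and return its JSON reply."""
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output for downstream parsing
    )
    return resp.choices[0].message.content
```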
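The normalisation step (iv) runs as JavaScript routines inside n8n; a rough Python equivalent is sketched here for consistency with the other examples. The `@context` URL, the `Process`/`Exchange` types and all field names are placeholders standing in for whatever standards-compliant LCA schema (e.g. olca-schema) the platform actually targets:

```python
import json

def to_jsonld(record: dict) -> str:
    """Wrap an extracted inventory record as a JSON-LD document.

    Context URL and field names are illustrative; consult the target
    schema for the real vocabulary.
    """
    doc = {
        "@context": "http://greendelta.github.io/olca-schema/context.jsonld",
        "@type": "Process",
        "name": record.get("process_name"),
        "exchanges": [
            {
                "@type": "Exchange",
                "flow": {"@type": "Flow", "name": ex["flow"]},
                "amount": float(ex["amount"]),
                "unit": ex.get("unit", ""),
            }
            for ex in record.get("exchanges", [])
        ],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)
```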
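Finally, a toy sketch of the rule-based half of the supervised-validation layer; the individual checks and the equal weighting are invented for illustration, and the expert-review step that complements them is not shown:

```python
def quality_score(doc: dict) -> float:
    """Score an inventory record in [0, 1] from simple rule-based checks.

    Each boolean check contributes equally; the real scoring rubric is
    not described in the abstract.
    """
    exchanges = doc.get("exchanges", [])
    checks = [
        bool(doc.get("name")),                # has a process name
        len(exchanges) > 0,                   # at least one exchange
        all(ex.get("unit") for ex in exchanges),          # units present
        all(isinstance(ex.get("amount"), (int, float))    # numeric amounts
            for ex in exchanges),
    ]
    return sum(checks) / len(checks)
```

Records scoring below a configurable threshold could be routed to expert review rather than ingested directly, which matches the abstract's emphasis on transparent provenance before database ingestion.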
How much time do you ideally wish for your contribution? 15 minutes