The Web-based Systems Group at the University of Mannheim conducts research on methods for integrating data from large numbers of data sources in the context of the open Web and in corporate data lakes. Our research includes areas such as entity matching, schema matching, table annotation, information extraction, and data discovery. Our current work focuses on utilizing large language models and LLM-based agents for data integration tasks. We apply the developed methods to integrate product data from large numbers of e-shops and to construct knowledge graphs such as DBpedia. The empirical research of the group includes monitoring the adoption of schema.org annotations on the public Web by regularly extracting structured data from large Web corpora.
Web-based Systems Group @ University of Mannheim
Repositories
- Automatic-data-labeling Public
This repository contains the code and data for the paper 'Labeling Training Data for Entity Matching Using Large Language Models', which explores knowledge distillation for entity matching, i.e. using LLMs to label training pairs that are subsequently used to train a smaller student model, such as DITTO using RoBERTa.
- PyDI Public
The PyDI framework provides methods for end-to-end data integration. The framework covers all steps of the integration process, including schema matching, data translation, entity matching, and data fusion. The framework offers traditional string-based methods as well as modern LLM- and embedding-based techniques for these tasks.
- MaDI-Bench Public
The Mannheim Data Integration Benchmark (MaDI-Bench) is an end-to-end benchmark for tabular data integration. It provides integration tasks across five domains, covering schema matching, value normalization, entity matching, and data fusion. It offers difficulty variants and supports measuring both step-wise and end-to-end performance.
- automatic-data-integration Public
This repository contains the code and case study data for the paper: Automatic End-to-End Data Integration using Large Language Models.
- WebMall Public
This repository contains the code and data of the WebMall benchmark for evaluating the capability of Web agents to find and compare product offers from multiple e-shops.
- WebMall-Interfaces Public
Modern LLM agents interact with the web through various architectures, from traditional browser automation to API-based approaches. This project provides implementation and evaluation code to systematically compare their effectiveness across 91 realistic e-commerce scenarios.
- winter Public Forked from olehmberg/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
- AgentLab Public Forked from ServiceNow/AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.