Japan Construction Cost Database: An Open Dataset for LLM-Based Cost Estimation and Fraud Detection in Residential Renovation
DOI:
https://doi.org/10.31224/7007
Keywords:
construction cost estimation, fraud detection, open dataset, large language models, Japan, consumer protection, renovation pricing
Abstract
We introduce the Japan Construction Cost Database (JCCDB), an openly available structured dataset linking residential construction plan-level pricing, contractor-size-tier margin ranges, and fraud-detection patterns for the Japanese market. The dataset comprises 87 construction plans across 7 renovation categories (roof construction, termite control, water heater replacement, window renovation, bathroom renovation, kitchen renovation, and electrical work), annotated with 88 fraud-detection patterns categorized by severity. Pricing is derived from the HORIZON SHIELD (HS) Rule, a transparent cost formula grounded in 30 years of hands-on construction management experience combined with 2026 Kanto-region trade-price data. Each plan is stratified across four contractor-size tiers (from sole proprietor to major national chain) with empirically derived margin ranges drawn from Ministry of Land, Infrastructure, Transport and Tourism (MLIT) industry analysis. We further describe KIRA, an LLM-based construction cost diagnostic system built on this dataset, and discuss its architecture and alignment with HS Rule pricing. The dataset is released under CC BY 4.0 and is intended to support NLP/LLM research, consumer-protection studies, and cross-country comparative construction cost analysis.
License
Copyright (c) 2026 Toshikatsu Oga

This work is licensed under a Creative Commons Attribution 4.0 International License.