Preprint / Version 1

Chart Understanding with Large Language Model

Authors

  • John Feng, Virginia Tech

DOI:

https://doi.org/10.31224/3401

Abstract

Chart visualization plays a pivotal role in conveying information efficiently, enabling humans to rapidly discern critical insights from well-crafted visuals. Yet for complex charts, such as stock market trends, or charts intended to aid low-vision users, drawing meaningful interpretations can be particularly challenging, especially for novices. While humans may struggle to assimilate historical patterns at a glance, large multimodal models (LMMs), trained on extensive text corpora and charts, have the potential to provide enhanced guidance. Leveraging these models can empower users to understand complex chart patterns with deeper contextual insights. Although successful in general domains, open-source LMMs are less effective on chart images because chart understanding differs substantially from natural scene image understanding. In contrast to natural scene images, which primarily depict objects and their spatial relationships, chart images contain abstract elements (such as flow diagrams, trend lines, and color-coded legends) that convey specific data-related information. Their poor performance stems from a lack of the domain-specific training essential for chart understanding. In this project, we introduce a baseline multimodal model that integrates text and charts to enhance the chart comprehension capabilities of existing models, offering more pertinent insights and information about the depicted charts. The model takes chart images and human questions as input and generates answers as output.
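To make the intended input/output interface concrete, the following is a minimal sketch of chart question answering with an off-the-shelf open-source LMM (here LLaVA-1.5 served through the Hugging Face transformers library). The checkpoint name, prompt format, and file name are illustrative assumptions for this sketch, not the model introduced in this work.

    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    # Illustrative checkpoint; any chart-capable open-source LMM could be substituted.
    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(model_id)

    # Input: a chart image and a human question about it (file name is hypothetical).
    chart = Image.open("stock_trend_chart.png")
    question = "What is the overall trend shown in this chart?"
    prompt = f"USER: <image>\n{question}\nASSISTANT:"

    # Output: an answer generated by the model.
    inputs = processor(text=prompt, images=chart, return_tensors="pt")
    generated = model.generate(**inputs, max_new_tokens=128)
    print(processor.decode(generated[0], skip_special_tokens=True))

In this setup, the chart image and the question together form the model's input, and the decoded generation is the answer returned to the user, mirroring the interface described above.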


Posted

2023-12-12