Rethinking Graph Structure Learning in the Era of LLMs

Abstract

Recently, the emergence of large language models (LLMs) has prompted researchers to explore integrating language descriptions into graphs, aiming to enhance model encoding capabilities from a data-centric perspective. Graphs represented in this way are called text-attributed graphs (TAGs). A review of prior advancements shows that graph structure learning (GSL) is a pivotal technique for improving data utility, which makes it highly relevant to efficient TAG learning. However, most GSL methods are tailored to traditional graphs without textual information, underscoring the need for a new GSL paradigm. Despite the clear motivation, developing such a paradigm remains challenging: (1) How can we define a reasonable optimization objective for GSL in the era of LLMs, given the massive number of parameters in LLMs? (2) How can we design an efficient model architecture that enables seamless integration of LLMs for this optimization objective? For Question 1, we reformulate existing GSL optimization objectives as a tree optimization framework, shifting the focus from obtaining a well-trained edge predictor to obtaining a language-aware tree sampler. For Question 2, we propose decoupled and training-free model design principles for LLM integration, shifting the focus from computation-intensive fine-tuning to more efficient inference. Building on these principles, we propose the Large Language and Tree Assistant (LLaTA), which leverages tree-based LLM in-context learning to enhance the understanding of topology and text, enabling reliable inference and producing improved graph structures. Extensive experiments on 10 TAG datasets demonstrate that LLaTA offers flexibility (it can be incorporated with any backbone), scalability (it outperforms other LLM-based GSL methods in running efficiency), and effectiveness (it achieves state-of-the-art performance).