GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models

  • 2024-12-11 13:56:40
  • Shuyang Hou, Jianyuan Liang, Anqi Zhao, Huayi Wu

Abstract

As the scale and complexity of spatiotemporal data continue to grow rapidly, the use of geospatial modeling on the Google Earth Engine (GEE) platform presents dual challenges: improving the coding efficiency of domain experts and enhancing the coding capabilities of interdisciplinary users. To address these challenges and improve the performance of large language models (LLMs) in geospatial code generation tasks, we propose a framework for building a geospatial operator knowledge base tailored to the GEE JavaScript API. This framework consists of an operator syntax knowledge table, an operator relationship frequency table, an operator frequent pattern knowledge table, and an operator relationship chain knowledge table. By leveraging Abstract Syntax Tree (AST) techniques and frequent itemset mining, we systematically extract operator knowledge from 185,236 real GEE scripts and syntax documentation, forming a structured knowledge base. Experimental results demonstrate that the framework achieves over 90% accuracy, recall, and F1 score in operator knowledge extraction. When integrated with the Retrieval-Augmented Generation (RAG) strategy for LLM-based geospatial code generation tasks, the knowledge base improves performance by 20-30%. Ablation studies further quantify the necessity of each knowledge table in the knowledge base construction. This work provides robust support for the advancement and application of geospatial code modeling techniques, offering an innovative approach to constructing domain-specific knowledge bases that enhance the code generation capabilities of LLMs, and fostering the deeper integration of generative AI technologies within the field of geoinformatics.
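
As a rough illustration of the operator co-occurrence idea behind the frequency and frequent pattern tables, the following is a minimal Python sketch and not the authors' pipeline. It makes several assumptions: a simple regular expression stands in for real AST parsing, only operator pairs are counted rather than general itemsets, and the names `OPERATOR_PATTERN`, `extract_operators`, `frequent_pairs`, and the `min_support` threshold are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import combinations

# Assumption: operators are approximated as dotted names starting with "ee",
# e.g. ee.Image or ee.Filter.date. A real extractor would walk an AST instead.
OPERATOR_PATTERN = re.compile(r"\bee(?:\.[A-Za-z_]\w*)+")


def extract_operators(script: str) -> set[str]:
    """Return the distinct ee.* names found in one script (regex stand-in for AST parsing)."""
    return set(OPERATOR_PATTERN.findall(script))


def frequent_pairs(scripts: list[str], min_support: int) -> dict[tuple[str, str], int]:
    """Count operator pairs that co-occur in at least `min_support` scripts."""
    counts: Counter = Counter()
    for script in scripts:
        # Sort so each unordered pair is always counted under the same key.
        ops = sorted(extract_operators(script))
        counts.update(combinations(ops, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy scripts invented for this example; they are not taken from the paper's corpus.
scripts = [
    "var img = ee.Image('a'); var fc = ee.FeatureCollection('b');",
    "var img = ee.Image('c'); var col = ee.ImageCollection('d');",
    "var x = ee.Image('e'); var y = ee.FeatureCollection('f');",
]
result = frequent_pairs(scripts, min_support=2)
print(result)
```

With these toy inputs, only the pair of `ee.FeatureCollection` and `ee.Image` reaches the assumed support threshold of 2. A fuller treatment would replace the regex with an AST traversal and the pair counter with a general frequent itemset miner.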