Towards LLM Agents for Earth Observation

Abstract

Earth Observation (EO) provides critical planetary data for environmental monitoring, disaster management, climate science, and other scientific domains. Here we ask: Are AI systems ready for reliable Earth Observation? We introduce UnivEarth, a benchmark of 140 yes/no questions from NASA Earth Observatory articles across 13 topics and 17 satellite sensors. Using the Google Earth Engine API as a tool, LLM agents achieve an accuracy of only 33% because the code fails to run over 58% of the time. We reduce the failure rate for open models by fine-tuning on synthetic data, allowing much smaller models (Llama-3.1-8B) to achieve accuracy comparable to much larger ones (e.g., DeepSeek-R1). Taken together, our findings identify significant challenges to be solved before AI agents can automate Earth observation, and suggest paths forward. The project page is available at https://iandrover.github.io/UnivEarth.