LLM-Driven Multi-step Translation from C to Rust using Static Analysis

Abstract

Translating software written in legacy languages to modern languages, such as C to Rust, has significant benefits in improving memory safety while maintaining high performance. However, manual translation is cumbersome, error-prone, and produces unidiomatic code. Large language models (LLMs) have demonstrated promise in producing idiomatic translations, but offer no correctness guarantees, as they lack the ability to capture all the semantic differences between the source and target languages. To resolve this issue, we propose SACTOR, an LLM-driven C-to-Rust zero-shot translation tool using a two-step translation methodology: an "unidiomatic" step to translate C into Rust while preserving semantics, and an "idiomatic" step to refine the code to follow Rust's semantic standards. SACTOR utilizes information provided by static analysis of the source C program to address challenges such as pointer semantics and dependency resolution. To validate the correctness of the translated result from each step, we use end-to-end testing via the foreign function interface to embed our translated code segment into the original code. We evaluate the translation of 200 programs from two datasets and two case studies, comparing the performance of GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, Llama 3.3 70B, and DeepSeek-R1 in SACTOR. Our results demonstrate that SACTOR achieves high correctness and improved idiomaticity: the best-performing model (DeepSeek-R1) reaches 93% correctness on one dataset, and the best-performing models (GPT-4o, Claude 3.5 Sonnet, DeepSeek-R1) reach 84% on the other, while producing more natural and Rust-compliant translations compared to existing methods.
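
To make the two-step methodology concrete, the following is a minimal hand-written sketch, not output produced by SACTOR. It assumes a hypothetical C function `int sum_array(const int *data, size_t len)`; all function names here (`sum_array_unidiomatic`, `sum_array`, `sum_array_ffi`) are illustrative assumptions. The first function mirrors the C pointer semantics directly, the second is a refined version using a slice, and the third is an FFI-compatible wrapper of the kind that could let the refined code be linked back into the original C program for end-to-end testing.

```rust
use std::slice;

/// Step 1 ("unidiomatic"): a near-literal rendering of the hypothetical C
/// function `int sum_array(const int *data, size_t len)`. It keeps the raw
/// pointer and index loop, so the caller must uphold the C contract.
///
/// Safety: `data` must point to at least `len` readable `i32` values.
pub unsafe extern "C" fn sum_array_unidiomatic(data: *const i32, len: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i: usize = 0;
    while i < len {
        // Assumption: wrapping arithmetic stands in for C's signed overflow,
        // which is undefined behavior in the original language.
        total = total.wrapping_add(unsafe { *data.add(i) });
        i += 1;
    }
    total
}

/// Step 2 ("idiomatic"): the pointer/length pair becomes a borrowed slice,
/// so the function is safe and the bounds are tracked by the type.
pub fn sum_array(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Hypothetical FFI wrapper: exposes the idiomatic function under the C
/// signature so it could be called from the original C test harness.
///
/// Safety: `data` must be null or point to at least `len` readable `i32`s.
pub unsafe extern "C" fn sum_array_ffi(data: *const i32, len: usize) -> i32 {
    if data.is_null() || len == 0 {
        return 0;
    }
    // Rebuild the slice from the C-style pointer and length.
    sum_array(unsafe { slice::from_raw_parts(data, len) })
}
```

In this sketch, checking that `sum_array_unidiomatic` and `sum_array_ffi` agree on the same inputs plays the role of the per-step validation described above; a real setup would additionally mark the exported functions for linking (for example with `#[no_mangle]`) and run the original C tests against them.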