J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain

Abstract

As the scale and capabilities of Large Language Models (LLMs) increase, their applications in knowledge-intensive fields such as the legal domain have garnered widespread attention. However, it remains unclear whether these LLMs reason from domain knowledge when making judgments. If LLMs base their judgments solely on specific words or patterns rather than on the underlying logic of the language, the "LLM-as-judges" paradigm poses substantial risks in real-world applications. To address this question, we propose legal knowledge-injection attacks for robustness testing, from which we infer whether LLMs have learned legal knowledge and reasoning logic. Specifically, we propose J&H: an evaluation framework for detecting the robustness of LLMs under knowledge-injection attacks in the legal domain. The framework aims to explore whether LLMs perform deductive reasoning when accomplishing legal tasks. To this end, we attack each part of the reasoning logic underlying these tasks (major premise, minor premise, and conclusion generation). We collected mistakes that legal experts might make in real-world judicial decisions, such as typos, legal synonyms, and inaccurate retrieval of external legal statutes. In real legal practice, legal experts tend to disregard such mistakes and make judgments based on logic. LLMs, by contrast, are likely to be misled by these errors, including typographical ones, and may not rely on logic in their judgments. We conducted knowledge-injection attacks on existing general-purpose and domain-specific LLMs and found that current LLMs are not robust against the attacks employed in our experiments. In addition, we propose and compare several methods to enhance the knowledge robustness of LLMs.