Abstract
Retrieval-augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li, which shows that retrieval can also help when using a small-scale database, but that it demands more accurate and better neighbors when searching in a smaller, hence sparser, non-parametric memory. This demand can be met by using a proper semantic similarity search. We further propose, for the first time, adding a regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, which exhibits O(1) search time while introducing noise in retrieving neighbors, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.
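
To make the idea of a noisy neighbor search concrete, the following is a minimal illustrative sketch and not the Retro-li implementation. It assumes that the non-parametric memory is a list of embedding vectors, that semantic similarity is measured by cosine similarity, and that noise is modeled as zero-mean Gaussian perturbations of the stored embeddings. The function names, the toy vectors, and the noise level `sigma` are all hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, memory, k=2):
    """Return the indices of the k memory entries most similar to the query."""
    ranked = sorted(
        range(len(memory)),
        key=lambda i: cosine(query, memory[i]),
        reverse=True,
    )
    return ranked[:k]


def add_noise(vec, sigma, rng):
    """Perturb an embedding with Gaussian noise (assumed noise model)."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


# Toy memory of four 3-dimensional embeddings (hypothetical values).
memory = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

# Exact search on the clean memory.
clean = nearest_neighbors(query, memory, k=2)

# The same search on a perturbed memory, standing in for a noisy search.
rng = random.Random(0)
noisy_memory = [add_noise(v, 0.01, rng) for v in memory]
noisy = nearest_neighbors(query, noisy_memory, k=2)

print(clean)  # [0, 1]
print(noisy)
```

With a small `sigma`, the noisy search returns the same set of neighbors as the exact search, although their order may differ. Larger perturbations can replace true neighbors with less relevant ones, which is the situation the proposed regularization is meant to tolerate.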