Retrieval-Augmented Generation (RAG) is an approach to building AI systems that combines a language model with an external knowledge source. In simple terms, the AI first searches for relevant documents (like articles or webpages) related to a user’s query, and then uses those documents to generate a more accurate answer. This method has been celebrated for helping large language models (LLMs) stay factual and reduce hallucinations by grounding their responses in real data.
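The retrieve-then-generate loop can be sketched in a few lines. This is an illustrative toy, not the paper’s code: the corpus, the word-overlap scorer, and the prompt format are all stand-ins for the dense retrievers and LLM calls a real system would use.

```python
# Minimal retrieve-then-generate loop (illustrative sketch only).
# A production RAG system would use embedding-based retrieval and an
# LLM API instead of this toy word-overlap scorer and prompt string.

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank documents by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model by placing retrieved documents before the question."""
    context = "\n\n".join(f"[Document {i+1}]\n{d}" for i, d in enumerate(docs))
    return f"{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical three-document corpus for demonstration.
corpus = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
    "Paris is the capital of France.",
]
docs = retrieve("Where is the Eiffel Tower?", corpus, k=2)
prompt = build_prompt("Where is the Eiffel Tower?", docs)
# The prompt now holds the two best-matching documents followed by the
# question, ready to be sent to a language model.
```

The question the study asks is about the `k` in this loop: how many retrieved documents should actually be handed to the model?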
Intuitively, one might assume that the more documents an AI retrieves, the better informed its answer will be. However, recent research suggests a surprising twist: when it comes to feeding information to an AI, sometimes less is more.
Fewer Documents, Better Answers
A new study by researchers at the Hebrew University of Jerusalem explored how the number of documents given to a RAG system affects its performance. Crucially, they kept the total amount of text constant – meaning if fewer documents were provided, those documents were slightly expanded to fill the same length as many documents would. This way, any performance differences could be attributed to the quantity of documents rather than simply having a shorter input.
The researchers used a question-answering dataset (MuSiQue) with trivia questions, each originally paired with 20 Wikipedia paragraphs (only a few of which actually contain the answer, with the rest being distractors). By trimming the number of documents from 20 down to just the 2–4 truly relevant ones – and padding those with a bit of extra context to maintain a consistent length – they created scenarios where the AI had fewer pieces of material to consider, but still roughly the same total words to read.
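The length control described above can be sketched as follows. This is a simplified illustration of the idea, not the authors’ pipeline: the filler text here is a neutral placeholder, whereas the paper expanded the kept paragraphs with real surrounding context.

```python
# Sketch of the constant-length control (illustrative): drop the
# distractor paragraphs, then pad the kept ones with filler until the
# input's total word count matches the full 20-paragraph version.

def total_words(paragraphs: list[str]) -> int:
    return sum(len(p.split()) for p in paragraphs)

def trim_and_pad(all_paragraphs: list[str], supporting_idx: list[int],
                 filler_text: str) -> list[str]:
    """Keep only the supporting paragraphs, padded to the original length."""
    target = total_words(all_paragraphs)
    kept = [all_paragraphs[i] for i in supporting_idx]
    filler_words = filler_text.split()
    deficit = target - total_words(kept)
    share = deficit // len(kept)
    padded, pos = [], 0
    for j, p in enumerate(kept):
        # Last paragraph absorbs any rounding remainder.
        extra = share if j < len(kept) - 1 else deficit - share * (len(kept) - 1)
        padded.append(p + " " + " ".join(filler_words[pos:pos + extra]))
        pos += extra
    return padded

# Toy example: 4 paragraphs of 5 words each, 2 of them supporting.
paragraphs = [
    "alpha beta gamma delta epsilon",      # distractor
    "the answer hides in here",            # supporting
    "zeta eta theta iota kappa",           # distractor
    "second supporting fact lives here",   # supporting
]
filler = "pad " * 20  # placeholder for the extra context the paper used
padded = trim_and_pad(paragraphs, [1, 3], filler)
print(len(padded), total_words(padded))  # → 2 20
```

After this transformation the model sees only 2 documents instead of 4, but reads exactly the same number of words – which is what lets the study isolate document count from input length.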
The results were striking. Often, the AI models answered more accurately when they were given fewer documents rather than the full set. Performance improved significantly – in some instances by up to 10% in accuracy (F1 score) when the system used only the handful of supporting documents instead of a large collection. This counterintuitive boost was observed across several different open-source language models, including variants of Meta’s Llama and others, indicating that the phenomenon is not tied to a single AI model.
One model (Qwen-2) was a notable exception that handled multiple documents without a drop in score, but almost all the tested models performed better with fewer documents overall. In other words, adding more reference material beyond the key relevant pieces actually hurt their performance more often than it helped.

Source: Levy et al.
Why is this such a surprise? Typically, RAG systems are designed under the assumption that retrieving a broader swath of information can only help the AI – after all, if the answer isn’t in the first few documents, it might be in the tenth or twentieth.
This study flips that script, demonstrating that indiscriminately piling on extra documents can backfire. Even when the total text length was held constant, the mere presence of many different documents (each with its own context and quirks) made the question-answering task more challenging for the AI. It seems that beyond a certain point, each additional document introduced more noise than signal, confusing the model and impairing its ability to extract the correct answer.
Why Much less Can Be Extra in RAG
This “much less is extra” end result is smart as soon as we contemplate how AI language fashions course of data. When an AI is given solely probably the most related paperwork, the context it sees is targeted and freed from distractions, very like a scholar who has been handed simply the correct pages to check.
Within the examine, fashions carried out considerably higher when given solely the supporting paperwork, with irrelevant materials eliminated. The remaining context was not solely shorter but in addition cleaner – it contained details that straight pointed to the reply and nothing else. With fewer paperwork to juggle, the mannequin might dedicate its full consideration to the pertinent data, making it much less prone to get sidetracked or confused.
Then again, when many paperwork had been retrieved, the AI needed to sift by means of a mixture of related and irrelevant content material. Usually these additional paperwork had been “related however unrelated” – they may share a subject or key phrases with the question however not truly comprise the reply. Such content material can mislead the mannequin. The AI would possibly waste effort attempting to attach dots throughout paperwork that don’t truly result in an accurate reply, or worse, it’d merge data from a number of sources incorrectly. This will increase the chance of hallucinations – cases the place the AI generates a solution that sounds believable however will not be grounded in any single supply.
In essence, feeding too many paperwork to the mannequin can dilute the helpful data and introduce conflicting particulars, making it more durable for the AI to resolve what’s true.
Curiously, the researchers discovered that if the additional paperwork had been clearly irrelevant (for instance, random unrelated textual content), the fashions had been higher at ignoring them. The actual bother comes from distracting information that appears related: when all of the retrieved texts are on related matters, the AI assumes it ought to use all of them, and it might wrestle to inform which particulars are literally essential. This aligns with the examine’s commentary that random distractors triggered much less confusion than real looking distractors within the enter. The AI can filter out blatant nonsense, however subtly off-topic data is a slick lure – it sneaks in below the guise of relevance and derails the reply. By decreasing the variety of paperwork to solely the actually needed ones, we keep away from setting these traps within the first place.
There’s additionally a sensible profit: retrieving and processing fewer paperwork lowers the computational overhead for a RAG system. Each doc that will get pulled in needs to be analyzed (embedded, learn, and attended to by the mannequin), which makes use of time and computing assets. Eliminating superfluous paperwork makes the system extra environment friendly – it could discover solutions quicker and at decrease price. In situations the place accuracy improved by specializing in fewer sources, we get a win-win: higher solutions and a leaner, extra environment friendly course of.

Source: Levy et al.
Rethinking RAG: Future Directions
This new evidence that quality often beats quantity in retrieval has important implications for the future of AI systems that rely on external information. It suggests that designers of RAG systems should prioritize smart filtering and ranking of documents over sheer volume. Instead of fetching 100 possible passages and hoping the answer is buried in there somewhere, it may be wiser to fetch only the top few highly relevant ones.
The study’s authors emphasize the need for retrieval methods to “strike a balance between relevance and diversity” in the information they supply to a model. In other words, we want to provide enough coverage of the topic to answer the question, but not so much that the core facts are drowned in a sea of extraneous text.
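One standard way this relevance-versus-diversity balance is operationalized (not something the paper itself prescribes) is maximal marginal relevance (MMR), which picks documents one at a time, rewarding relevance to the query while penalizing similarity to documents already selected. The scores below are made-up stand-ins for what embedding similarities would produce.

```python
# Maximal Marginal Relevance (MMR): a common re-ranking heuristic that
# trades off query relevance against redundancy among selected documents.
# rel[i] is doc i's relevance to the query; sim[i][j] is doc-doc
# similarity. Both are assumed precomputed (e.g. cosine of embeddings).

def mmr_select(rel: list[float], sim: list[list[float]],
               k: int, lam: float = 0.7) -> list[int]:
    """Greedily pick k document indices maximizing the MMR criterion."""
    selected: list[int] = []
    candidates = list(range(len(rel)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-picked doc.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Doc 1 is nearly a duplicate of doc 0, so MMR skips it in favour of
# the less similar (but still somewhat relevant) doc 2.
rel = [0.9, 0.85, 0.5]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
print(mmr_select(rel, sim, k=2))  # → [0, 2]
```

The `lam` parameter sets the trade-off: `lam=1.0` reduces to pure relevance ranking, while lower values push harder for diversity among the few documents that are kept.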
Moving forward, researchers are likely to explore techniques that help AI models handle multiple documents more gracefully. One approach is to develop better retrievers or re-rankers that can identify which documents truly add value and which ones only introduce conflict. Another angle is improving the language models themselves: if one model (like Qwen-2) managed to handle many documents without losing accuracy, examining how it was trained or structured could offer clues for making other models more robust. Perhaps future large language models will incorporate mechanisms to recognize when two sources are saying the same thing (or contradicting each other) and focus accordingly. The goal would be to enable models to draw on a rich variety of sources without falling prey to confusion – effectively getting the best of both worlds (breadth of information and clarity of focus).
It’s also worth noting that as AI systems gain larger context windows (the ability to read more text at once), simply dumping more data into the prompt isn’t a silver bullet. Bigger context doesn’t automatically mean better comprehension. This study shows that even if an AI can technically read 50 pages at a time, giving it 50 pages of mixed-quality information may not yield a good result. The model still benefits from curated, relevant content to work with, rather than an indiscriminate dump. In fact, intelligent retrieval may become even more important in the era of huge context windows – to ensure the extra capacity is used for valuable information rather than noise.
The findings from “More Documents, Same Length” (the aptly titled paper) encourage a re-examination of our assumptions in AI research. Sometimes, feeding an AI all the data we have is not as effective as we think. By focusing on the most relevant pieces of information, we not only improve the accuracy of AI-generated answers but also make the systems more efficient and easier to trust. It’s a counterintuitive lesson, but one with exciting ramifications: future RAG systems might be both smarter and leaner by carefully choosing fewer, better documents to retrieve.