RAG Tools Introduction
RAGTools is an experimental module that provides a set of utilities for building Retrieval-Augmented Generation (RAG) applications, ie, applications that generate answers by combining knowledge of the underlying AI model with the information from the user's knowledge base.
It is designed to be powerful and flexible, allowing you to build RAG applications with minimal effort. Extend any step of the pipeline with your own custom code (see the RAG Interface section), or use the provided defaults to get started quickly.
Once the API stabilizes (near term), we hope to carve it out into a separate package.
Import the module as follows:
# required dependencies to load the necessary extensions!!!
using LinearAlgebra, SparseArrays, Unicode, Snowball
using PromptingTools
using PromptingTools.Experimental.RAGTools
# to access unexported functionality
const PT = PromptingTools
const RT = PromptingTools.Experimental.RAGTools
Highlights
The main functions to be aware of are:
- build_index to build a RAG index from a list of documents (type ChunkIndex)
- airag to generate answers using the RAG model on top of the index built above
  - retrieve to retrieve relevant chunks from the index for a given question
  - generate! to generate an answer from the retrieved chunks
- annotate_support to highlight which parts of the RAG answer are supported by the documents in the index vs which are generated by the model; it is applied automatically if you use pretty printing with pprint (eg, pprint(result))
- build_qa_evals to build a set of question-answer pairs for evaluation of the RAG model from your corpus
The hope is to provide a modular and easily extensible set of tools for building RAG applications in Julia. Feel free to open an issue or ask in the #generative-ai channel in the JuliaLang Slack if you have a specific need.
Examples
Let's build an index. We need to provide a starter list of documents:
sentences = [
"Find the most comprehensive guide on Julia programming language for beginners published in 2023.",
"Search for the latest advancements in quantum computing using Julia language.",
"How to implement machine learning algorithms in Julia with examples.",
"Looking for performance comparison between Julia, Python, and R for data analysis.",
"Find Julia language tutorials focusing on high-performance scientific computing.",
"Search for the top Julia language packages for data visualization and their documentation.",
"How to set up a Julia development environment on Windows 10.",
"Discover the best practices for parallel computing in Julia.",
"Search for case studies of large-scale data processing using Julia.",
"Find comprehensive resources for mastering metaprogramming in Julia.",
"Looking for articles on the advantages of using Julia for statistical modeling.",
"How to contribute to the Julia open-source community: A step-by-step guide.",
"Find the comparison of numerical accuracy between Julia and MATLAB.",
"Looking for the latest Julia language updates and their impact on AI research.",
"How to efficiently handle big data with Julia: Techniques and libraries.",
"Discover how Julia integrates with other programming languages and tools.",
"Search for Julia-based frameworks for developing web applications.",
"Find tutorials on creating interactive dashboards with Julia.",
"How to use Julia for natural language processing and text analysis.",
"Discover the role of Julia in the future of computational finance and econometrics."
]
Let's index these "documents":
index = build_index(sentences; chunker_kwargs=(; sources=map(i -> "Doc$i", 1:length(sentences))))
This would be equivalent to index = build_index(SimpleIndexer(), sentences), which dispatches to the default implementation of each step via the SimpleIndexer struct. We provide these default implementations for the main functions as an optional argument, so there is no need to provide them if you're running the default pipeline.
Notice that we have provided a chunker_kwargs argument to the build_index function. These kwargs are passed to the chunker step.
Now let's generate an answer to a question.
- Run end-to-end RAG (retrieve + generate!), return AIMessage
question = "What are the best practices for parallel computing in Julia?"
msg = airag(index; question) # short for airag(RAGConfig(), index; question)
## Output:
## [ Info: Done with RAG. Total cost: $0.0
## AIMessage("Some best practices for parallel computing in Julia include us...
- Explore what's happening under the hood by changing the return type - RAGResult contains all intermediate steps.
result = airag(index; question, return_all=true)
## RAGResult
## question: String "What are the best practices for parallel computing in Julia?"
## rephrased_questions: Array{String}((1,))
## answer: SubString{String}
## final_answer: SubString{String}
## context: Array{String}((5,))
## sources: Array{String}((5,))
## emb_candidates: CandidateChunks{Int64, Float32}
## tag_candidates: CandidateChunks{Int64, Float32}
## filtered_candidates: CandidateChunks{Int64, Float32}
## reranked_candidates: CandidateChunks{Int64, Float32}
## conversations: Dict{Symbol, Vector{<:PromptingTools.AbstractMessage}}
You can still get the message from the result, see result.conversations[:final_answer] (the dictionary keys correspond to the function names of those steps).
- If you need to customize it, break the pipeline into its sub-steps: retrieve and generate! - RAGResult serves as the intermediate result.
# Retrieve which chunks are relevant to the question
result = retrieve(index, question)
# Generate an answer
result = generate!(index, result)
You can leverage a pretty-printing system with pprint, where we automatically annotate the support of the answer by the chunks we provided to the model. It is configurable and you can select only some of its functions (eg, scores, sources).
pprint(result)
You'll see the following in the REPL, but with color highlighting in the terminal.
--------------------
QUESTION(s)
--------------------
- What are the best practices for parallel computing in Julia?
--------------------
ANSWER
--------------------
Some of the best practices for parallel computing in Julia include:[1,0.7]
- Using [3,0.4]`@threads` for simple parallelism[1,0.34]
- Utilizing `Distributed` module for more complex parallel tasks[1,0.19]
- Avoiding excessive memory allocation
- Considering task granularity for efficient workload distribution
--------------------
SOURCES
--------------------
1. Doc8
2. Doc15
3. Doc5
4. Doc2
5. Doc9
See ?print_html for the HTML version of the pretty-printing and styling system, eg, when you want to display the results in a web application based on Genie.jl/Stipple.jl.
How to read the output
Color legend:
- No color: High match with the context, can be trusted more
- Blue: Partial match against some words in the context, investigate
- Magenta (Red): No match with the context, fully generated by the model
Square brackets: The best matching context ID + Match score of the chunk (eg, [3,0.4] means the highest support for the sentence is from the context chunk number 3 with a 40% match).
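If you want to post-process the printed annotations yourself, the bracket format described above can be read back with a regular expression. The following is only a sketch on a line copied from the sample output; the pattern and the field names chunk and score are assumptions for illustration, not part of RAGTools.

```julia
# One line of the pretty-printed answer from the example output above
line = "- Using [3,0.4]`@threads` for simple parallelism[1,0.34]"

# Match "[<context id>,<match score>]" pairs
pattern = r"\[(\d+),(\d*\.?\d+)\]"

# Collect every annotation on the line as a NamedTuple (hypothetical field names)
annotations = [(chunk = parse(Int, m.captures[1]), score = parse(Float64, m.captures[2]))
               for m in eachmatch(pattern, line)]

for a in annotations
    println("context chunk ", a.chunk, " supports the text with score ", a.score)
end
```

Each entry pairs a context chunk number (its position in the SOURCES list) with its match score.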
Want more?
See examples/building_RAG.jl for one more example.
RAG Interface
System Overview
This system is designed for information retrieval and response generation, structured in three main phases:
- Preparation, when you create an instance of AbstractIndex
- Retrieval, when you surface the top most relevant chunks/items in the index and return AbstractRAGResult, which contains the references to the chunks (AbstractCandidateChunks)
- Generation, when you generate an answer based on the context built from the retrieved chunks, return either AIMessage or AbstractRAGResult
The corresponding functions are build_index, retrieve, and generate!, respectively. Here is the high-level diagram that shows the signature of the main functions:
Notice that the first argument is a custom type for multiple dispatch. In addition, observe the "kwargs" names; that's how the keyword arguments for each function are passed down from the higher-level functions (eg, build_index(...; chunker_kwargs=(; separators=...))). It's the simplest way to customize some step of the pipeline (eg, set a custom model with a model kwarg or prompt template with template kwarg).
The system is designed to be hackable and extensible at almost every entry point. If you want to customize the behavior of any step, you can do so by defining a new type and defining a new method for the step you're changing, eg,
import PromptingTools.Experimental.RAGTools: rerank
struct MyReranker <: AbstractReranker end
rerank(::MyReranker, index, candidates) = ...
And then you would set the retrieve step to use your custom MyReranker via the reranker kwarg, eg, retrieve(....; reranker = MyReranker()) (or customize the main dispatching AbstractRetriever struct).
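The dispatch pattern can be tried out in isolation, without the package or any API calls. The sketch below is a toy, not the RAGTools implementation: every name prefixed with Toy or toy_ is hypothetical and only mirrors the shape of the real abstract types and steps.

```julia
# Toy stand-ins for an abstract strategy type and two concrete strategies (hypothetical names)
abstract type ToyAbstractReranker end
struct ToyNoReranker <: ToyAbstractReranker end      # default: keep the incoming order
struct ToyScoreReranker <: ToyAbstractReranker end   # custom: sort by score

# The strategy is selected by the type of the first argument
toy_rerank(::ToyNoReranker, candidates) = candidates
toy_rerank(::ToyScoreReranker, candidates) = sort(candidates; by = c -> c.score, rev = true)

# A higher-level step exposes the strategy as a keyword argument with a default
toy_retrieve(candidates; reranker::ToyAbstractReranker = ToyNoReranker()) =
    toy_rerank(reranker, candidates)

candidates = [(id = 1, score = 0.2), (id = 2, score = 0.9), (id = 3, score = 0.5)]
default_order = [c.id for c in toy_retrieve(candidates)]
custom_order = [c.id for c in toy_retrieve(candidates; reranker = ToyScoreReranker())]
```

Swapping the behavior requires no change to toy_retrieve itself, only a new type and one new method, which is the same idea the real pipeline relies on.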
The overarching principles are:
- Always dispatch / customize the behavior by defining a new struct and the corresponding method for the existing functions (eg, rerank function for the re-ranking step).
- Custom types are provided as the first argument (the high-level functions will work without them as we provide some defaults).
- Custom types do NOT have any internal fields or DATA (with the exception of managing sub-steps of the pipeline like AbstractRetriever or RAGConfig).
- Additional data should be passed around as keyword arguments (eg, chunker_kwargs in build_index to pass data to the chunking step). The intention was to have some clearly documented default values in the docstrings of each step + to have the various options all in one place.
RAG Diagram
The main functions are:
Prepare your document index with build_index:
- signature: (indexer::AbstractIndexBuilder, files_or_docs::Vector{<:AbstractString}) -> AbstractChunkIndex
- flow: get_chunks -> get_embeddings -> get_tags -> build_tags
- dispatch types: AbstractIndexBuilder, AbstractChunker, AbstractEmbedder, AbstractTagger
Run E2E RAG with airag:
- signature: (cfg::AbstractRAGConfig, index::AbstractChunkIndex; question::AbstractString) -> AIMessage or AbstractRAGResult
- flow: retrieve -> generate!
- dispatch types: AbstractRAGConfig, AbstractRetriever, AbstractGenerator
Retrieve relevant chunks with retrieve:
- signature: (retriever::AbstractRetriever, index::AbstractChunkIndex, question::AbstractString) -> AbstractRAGResult
- flow: rephrase -> get_embeddings -> find_closest -> get_tags -> find_tags -> rerank
- dispatch types: AbstractRetriever, AbstractRephraser, AbstractEmbedder, AbstractSimilarityFinder, AbstractTagger, AbstractTagFilter, AbstractReranker
Generate an answer from relevant chunks with generate!:
- signature: (generator::AbstractGenerator, index::AbstractChunkIndex, result::AbstractRAGResult) -> AIMessage or AbstractRAGResult
- flow: build_context! -> answer! -> refine! -> postprocess!
- dispatch types: AbstractGenerator, AbstractContextBuilder, AbstractAnswerer, AbstractRefiner, AbstractPostprocessor
To discover the currently available implementations, use the subtypes function, eg, subtypes(AbstractReranker).
Passing Keyword Arguments
If you need to pass keyword arguments, use the nested kwargs corresponding to the dispatch type names (eg, the rephrase step has the rephraser dispatch type and rephraser_kwargs for its keyword arguments).
For example:
cfg = RAGConfig(; retriever = AdvancedRetriever())
# kwargs will be big and nested, let's prepare them upfront
# we specify "custom" model for each component that calls LLM
kwargs = (
    retriever = AdvancedRetriever(),
    retriever_kwargs = (;
        top_k = 100,
        top_n = 5,
        # notice that this is effectively: retriever_kwargs/rephraser_kwargs/template
        rephraser_kwargs = (;
            template = :RAGQueryHyDE,
            model = "custom")),
    generator_kwargs = (;
        # pass kwargs to `answer!` step defined by the `answerer` -> we're setting `answerer_kwargs`
        answerer_kwargs = (;
            model = "custom")),
    # api_kwargs can be shared across all components
    api_kwargs = (;
        url = "http://localhost:8080"))
# `question` is a keyword argument of `airag`
result = airag(cfg, index; question, kwargs...)
If you were one level deeper in the pipeline, working with retriever directly, you would pass:
retriever_kwargs = (;
    top_k = 100,
    top_n = 5,
    # notice that this is effectively: rephraser_kwargs/template
    rephraser_kwargs = (;
        template = :RAGQueryHyDE,
        model = "custom"),
    # api_kwargs can be shared across all components
    api_kwargs = (;
        url = "http://localhost:8080"))
result = retrieve(AdvancedRetriever(), index, question; retriever_kwargs...)
And going even deeper, you would provide the rephraser_kwargs directly to the rephrase step, eg,
rephrase(SimpleRephraser(), question; model="custom", template = :RAGQueryHyDE, api_kwargs = (; url = "http://localhost:8080"))
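To see how the nesting unwinds level by level, here is a self-contained toy that mimics the forwarding with plain Julia keyword arguments. It is a sketch only: the toy_ functions are hypothetical stand-ins and do not call RAGTools or any model.

```julia
# Innermost step: receives its own kwargs plus the shared `api_kwargs`
toy_rephrase(question; model = "default", template = :Default, api_kwargs = NamedTuple()) =
    (; step = :rephrase, model, template, url = get(api_kwargs, :url, "none"))

# Middle step: splats `rephraser_kwargs` into the rephrase call
function toy_retrieve(question; top_k = 100, top_n = 5,
        rephraser_kwargs = NamedTuple(), api_kwargs = NamedTuple())
    rephrased = toy_rephrase(question; api_kwargs, rephraser_kwargs...)
    return (; top_k, top_n, rephrased)
end

# Top-level step: splats `retriever_kwargs` into the retrieve call
toy_airag(; question, retriever_kwargs = NamedTuple(), api_kwargs = NamedTuple()) =
    toy_retrieve(question; api_kwargs, retriever_kwargs...)

kwargs = (;
    retriever_kwargs = (;
        top_k = 50,
        rephraser_kwargs = (; template = :RAGQueryHyDE, model = "custom")),
    api_kwargs = (; url = "http://localhost:8080"))

out = toy_airag(; question = "What is RAG?", kwargs...)
```

Each level only needs to know the name of the kwargs bundle for the step directly below it, while api_kwargs travels alongside to every level.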
Deepdive
Preparation Phase:
- Begins with build_index, which creates a user-defined index type from an abstract chunk index using specified models and function strategies.
- get_chunks then divides the indexed data into manageable pieces based on a chunking strategy.
- get_embeddings generates embeddings for each chunk using an embedding strategy to facilitate similarity searches.
- Finally, get_tags extracts relevant metadata from each chunk, enabling tag-based filtering (hybrid search index). If there are tags available, build_tags is called to build the corresponding sparse matrix for filtering with tags.
Retrieval Phase:
- The retrieve step is intended to find the most relevant chunks in the index.
- rephrase is called first, if we want to rephrase the query (methods like HyDE can improve retrieval quite a bit)!
- get_embeddings generates embeddings for the original + rephrased query
- find_closest looks up the most relevant candidates (CandidateChunks) using a similarity search strategy.
- get_tags extracts the potential tags (can be provided as part of the airag call, eg, when we want to use only some small part of the indexed chunks)
- find_tags filters the candidates to strictly match at least one of the tags (if provided)
- rerank is called to rerank the candidates based on the reranking strategy (ie, to improve the ordering of the chunks in context).
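The similarity lookup at the heart of this phase can be illustrated with a few lines of linear algebra. This is a minimal sketch of a cosine-similarity search over made-up embeddings, assuming one chunk per column; it is not the package's find_closest implementation.

```julia
using LinearAlgebra

# Toy data: each column is the embedding of one chunk (made-up values)
chunk_embeddings = Float32[1.0 0.0 0.7;
                           0.0 1.0 0.7]
query_embedding = Float32[1.0, 0.1]

# Cosine similarity = dot product of unit-normalized vectors
unit(v) = v ./ norm(v)
scores = [dot(unit(col), unit(query_embedding)) for col in eachcol(chunk_embeddings)]

# Keep the positions of the `top_k` best-scoring chunks, best first
top_k = 2
positions = sortperm(scores; rev = true)[1:top_k]
```

Here the first chunk points in almost the same direction as the query, the third is partially aligned, and the second is nearly orthogonal, so positions holds chunks 1 and 3.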
Generation Phase:
- The generate! step is intended to generate a response based on the retrieved chunks, provided via AbstractRAGResult (eg, RAGResult).
- build_context! constructs the context for response generation based on a context strategy and applies the necessary formatting
- answer! generates the response based on the context and the query
- refine! is called to refine the response (optional, defaults to passthrough)
- postprocess! is available for any final touches to the response or to potentially save or format the results (eg, automatically save to the disk)
Note that all generation steps are mutating the RAGResult object.
See more details and corresponding functions and types in src/Experimental/RAGTools/rag_interface.jl.
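The mutating style of the generation steps can be pictured with a toy result object that each step fills in. This is a sketch under assumptions: ToyResult and the toy_ functions are hypothetical, the real RAGResult has more fields, and a real answerer would call an LLM.

```julia
# Hypothetical stand-in for a RAG result object
Base.@kwdef mutable struct ToyResult
    question::String
    context::Vector{String} = String[]
    answer::Union{Nothing, String} = nothing
    final_answer::Union{Nothing, String} = nothing
end

# Each step fills in its own field and returns the same object
function toy_build_context!(result::ToyResult, chunks)
    result.context = ["$i. $chunk" for (i, chunk) in enumerate(chunks)]
    return result
end
function toy_answer!(result::ToyResult)
    # Placeholder for the LLM call: only reports how many chunks were used
    result.answer = "Answer based on $(length(result.context)) chunks"
    return result
end
function toy_refine!(result::ToyResult)
    result.final_answer = result.answer  # passthrough, like a no-op refiner
    return result
end

result = ToyResult(; question = "What is RAG?")
toy_build_context!(result, ["chunk A", "chunk B"])
toy_answer!(result)
toy_refine!(result)
```

Because every step returns the object it mutated, the intermediate state stays inspectable after each call.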
References
build_index(
indexer::AbstractIndexBuilder, files_or_docs::Vector{<:AbstractString};
verbose::Integer = 1,
extras::Union{Nothing, AbstractVector} = nothing,
index_id = gensym("ChunkEmbeddingsIndex"),
chunker::AbstractChunker = indexer.chunker,
chunker_kwargs::NamedTuple = NamedTuple(),
embedder::AbstractEmbedder = indexer.embedder,
embedder_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = indexer.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
api_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
Build an INDEX for RAG (Retrieval-Augmented Generation) applications from the provided file paths or documents. INDEX is an object storing the document chunks and their embeddings (and potentially other information).
The function processes each file or document (depending on chunker), splits its content into chunks, embeds these chunks, optionally extracts metadata, and then combines this information into a retrievable index.
Define your own methods via indexer and its subcomponents (chunker, embedder, tagger).
Arguments
- indexer::AbstractIndexBuilder: The indexing logic to use. Default is SimpleIndexer().
- files_or_docs: A vector of valid file paths OR string documents to be indexed (chunked and embedded). Specify which mode to use via chunker.
- verbose: An Integer specifying the verbosity of the logs. Default is 1 (high-level logging). 0 is disabled.
- extras: An optional vector of extra information to be stored with each chunk. Default is nothing.
- index_id: A unique identifier for the index. Default is a generated symbol.
- chunker: The chunker logic to use for splitting the documents. Default is TextChunker().
- chunker_kwargs: Parameters to be provided to the get_chunks function. Useful to change the separators or max_length.
  - sources: A vector of strings indicating the source of each chunk. Default is equal to files_or_docs.
- embedder: The embedder logic to use for embedding the chunks. Default is BatchEmbedder().
- embedder_kwargs: Parameters to be provided to the get_embeddings function. Useful to change the target_batch_size_length or reduce asyncmap tasks ntasks.
  - model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
- tagger: The tagger logic to use for extracting tags from the chunks. Default is NoTagger(), ie, skip tag extraction. There are also PassthroughTagger and OpenTagger.
- tagger_kwargs: Parameters to be provided to the get_tags function.
  - model: The model to use for tags extraction. Default is PT.MODEL_CHAT.
  - template: A template to be used for tags extraction. Default is :RAGExtractMetadataShort.
  - tags: A vector of vectors of strings directly providing the tags for each chunk. Applicable for tagger::PassthroughTagger.
- api_kwargs: Parameters to be provided to the API endpoint. Shared across all API calls if provided.
- cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
Returns
- ChunkEmbeddingsIndex: An object containing the compiled index of chunks, embeddings, tags, vocabulary, and sources.
See also: ChunkEmbeddingsIndex, get_chunks, get_embeddings, get_tags, CandidateChunks, find_closest, find_tags, rerank, retrieve, generate!, airag
Examples
# Default is loading a vector of strings and chunking them (`TextChunker()`)
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))
# Another example with tags extraction, splitting only sentences and verbose output
# Assuming `test_files` is a vector of file paths
indexer = SimpleIndexer(chunker=FileChunker(), tagger=OpenTagger())
index = build_index(indexer, test_files;
    chunker_kwargs = (; separators=[". "]), verbose=true)
Notes
- If you get errors about exceeding embedding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try changing the embedder_kwargs. In particular, reduce the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes (eg, Databricks).
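To build intuition for why a smaller target batch length helps, here is a toy greedy batching routine that groups chunks by total character count. It is only an illustration: toy_batches and its target_length keyword are hypothetical, and the package's own batching logic may differ.

```julia
# Greedily group chunks so that each batch stays within `target_length` characters.
# A single chunk longer than the target still gets its own batch.
function toy_batches(chunks::Vector{String}; target_length::Int)
    batches = Vector{Vector{String}}()
    current, current_len = String[], 0
    for chunk in chunks
        # Close the current batch if adding this chunk would exceed the target
        if !isempty(current) && current_len + length(chunk) > target_length
            push!(batches, current)
            current, current_len = String[], 0
        end
        push!(current, chunk)
        current_len += length(chunk)
    end
    isempty(current) || push!(batches, current)
    return batches
end

chunks = ["aaaa", "bbbb", "cc", "dddddd"]
batches = toy_batches(chunks; target_length = 8)
```

Lowering target_length produces more, smaller requests, which is the trade-off to make when a provider rejects large batches.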
build_index(
indexer::KeywordsIndexer, files_or_docs::Vector{<:AbstractString};
verbose::Integer = 1,
extras::Union{Nothing, AbstractVector} = nothing,
index_id = gensym("ChunkKeywordsIndex"),
chunker::AbstractChunker = indexer.chunker,
chunker_kwargs::NamedTuple = NamedTuple(),
processor::AbstractProcessor = indexer.processor,
processor_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = indexer.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
api_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
Builds a ChunkKeywordsIndex from the provided files or documents to support keyword-based search (BM25).
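For readers unfamiliar with BM25, the ranking idea can be sketched with the textbook formula. This is an illustration under assumptions: whitespace tokenization, the commonly used defaults k1 = 1.2 and b = 0.75, and a hypothetical helper bm25_score; it is not the package's implementation.

```julia
# Textbook BM25 score of one tokenized document for a tokenized query.
# `docs` is the whole corpus (needed for document frequencies and average length).
function bm25_score(query::Vector{String}, doc::Vector{String},
        docs::Vector{Vector{String}}; k1 = 1.2, b = 0.75)
    n_docs = length(docs)
    avg_len = sum(length, docs) / n_docs
    score = 0.0
    for term in unique(query)
        tf = count(==(term), doc)            # term frequency in this document
        tf == 0 && continue
        df = count(d -> term in d, docs)     # number of documents containing the term
        idf = log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length(doc) / avg_len))
    end
    return score
end

# Tiny made-up corpus, tokenized on whitespace
docs = [String.(split(lowercase(s))) for s in [
    "parallel computing in julia",
    "data visualization packages",
    "julia for statistical modeling"]]
query = ["parallel", "julia"]
scores = [bm25_score(query, d, docs) for d in docs]
```

The first document matches both query terms and scores highest, the third matches only the more common term, and the second matches nothing and scores zero.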
airag(cfg::AbstractRAGConfig, index::AbstractDocumentIndex;
question::AbstractString,
verbose::Integer = 1, return_all::Bool = false,
api_kwargs::NamedTuple = NamedTuple(),
retriever::AbstractRetriever = cfg.retriever,
retriever_kwargs::NamedTuple = NamedTuple(),
generator::AbstractGenerator = cfg.generator,
generator_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
High-level wrapper for Retrieval-Augmented Generation (RAG). It combines the retrieve and generate! steps, which you can customize if needed.
The simplest version first finds the relevant chunks in index for the question and then sends these chunks to the AI model to help with generating a response to the question.
To customize the components, replace the types (retriever, generator) of the corresponding step of the RAG pipeline - or go into sub-routines within the steps. Eg, use subtypes(AbstractRetriever) to find the available options.
Arguments
- cfg::AbstractRAGConfig: The configuration for the RAG pipeline. Defaults to RAGConfig(), where you can swap sub-types to customize the pipeline.
- index::AbstractDocumentIndex: The chunk index to search for relevant text.
- question::AbstractString: The question to be answered.
- return_all::Bool: If true, returns the details used for RAG along with the response.
- verbose::Integer: If >0, enables verbose logging. The higher the number, the more nested functions will log.
- api_kwargs: API parameters that will be forwarded to ALL of the API calls (aiembed, aigenerate, and aiextract).
- retriever::AbstractRetriever: The retriever to use for finding relevant chunks. Defaults to cfg.retriever, eg, SimpleRetriever (with no question rephrasing).
- retriever_kwargs::NamedTuple: API parameters that will be forwarded to the retriever call. Examples of important ones:
  - top_k::Int: Number of top candidates to retrieve based on embedding similarity.
  - top_n::Int: Number of candidates to return after reranking.
  - tagger::AbstractTagger: Tagger to use for tagging the chunks. Defaults to NoTagger().
  - tagger_kwargs::NamedTuple: API parameters that will be forwarded to the tagger call. You could provide the explicit tags directly with PassthroughTagger and tagger_kwargs = (; tags = ["tag1", "tag2"]).
- generator::AbstractGenerator: The generator to use for generating the answer. Defaults to cfg.generator, eg, SimpleGenerator.
- generator_kwargs::NamedTuple: API parameters that will be forwarded to the generator call. Examples of important ones:
  - answerer_kwargs::NamedTuple: API parameters that will be forwarded to the answerer call. Examples:
    - model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
    - template: The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
  - refiner::AbstractRefiner: The method to use for refining the answer. Defaults to generator.refiner, eg, NoRefiner.
  - refiner_kwargs::NamedTuple: API parameters that will be forwarded to the refiner call.
    - model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
    - template: The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
- cost_tracker: An atomic counter to track the total cost of the operations (if you want to track the cost of multiple pipeline runs - it is passed around in the pipeline).
Returns
- If return_all is false, returns the generated message (msg).
- If return_all is true, returns the detail of the full pipeline in RAGResult (see the docs).
See also build_index, retrieve, generate!, RAGResult, getpropertynested, setpropertynested, merge_kwargs_nested, ChunkKeywordsIndex.
Examples
Using airag to get a response for a question:
index = build_index(...) # create an index
question = "How to make a barplot in Makie.jl?"
msg = airag(index; question)
To understand the details of the RAG process, use return_all=true
result = airag(index; question, return_all = true)
# result is a RAGResult object with all the internal steps of the `airag` function
You can also pretty-print result to highlight generated text vs text that is supported by context. It also includes annotations of which context was used for each part of the response (where available).
PT.pprint(result)
Example with advanced retrieval (with question rephrasing and reranking, requires COHERE_API_KEY). We will obtain top 100 chunks from embeddings (top_k) and top 5 chunks from reranking (top_n). In addition, it will be done with a "custom" locally-hosted model.
cfg = RAGConfig(; retriever = AdvancedRetriever())
# kwargs will be big and nested, let's prepare them upfront
# we specify "custom" model for each component that calls LLM
kwargs = (
    retriever_kwargs = (;
        top_k = 100,
        top_n = 5,
        rephraser_kwargs = (;
            model = "custom"),
        embedder_kwargs = (;
            model = "custom"),
        tagger_kwargs = (;
            model = "custom")),
    generator_kwargs = (;
        answerer_kwargs = (;
            model = "custom"),
        refiner_kwargs = (;
            model = "custom")),
    api_kwargs = (;
        url = "http://localhost:8080"))
# `question` is a keyword argument of `airag`
result = airag(cfg, index; question, kwargs...)
If you want to use hybrid retrieval (embeddings + BM25), you can easily create an additional index based on keywords and pass them both into a MultiIndex.
You need to provide an explicit config, so the pipeline knows how to handle each index in the search similarity phase (finder).
# assumes you have an existing index `index`
# create the multi-index with the keywords index
index_keywords = ChunkKeywordsIndex(index)
multi_index = MultiIndex([index, index_keywords])
# define the similarity measures for the indices that you have (same order)
finder = RT.MultiFinder([RT.CosineSimilarity(), RT.BM25Similarity()])
cfg = RAGConfig(; retriever=AdvancedRetriever(; processor=RT.KeywordsProcessor(), finder))
# Run the pipeline with the new hybrid retrieval (return the `RAGResult` to see the details)
result = airag(cfg, multi_index; question, return_all=true)
# Pretty-print the result
PT.pprint(result)
For easier manipulation of nested kwargs, see utilities getpropertynested, setpropertynested, merge_kwargs_nested.
retrieve(retriever::AbstractRetriever,
index::AbstractChunkIndex,
question::AbstractString;
verbose::Integer = 1,
top_k::Integer = 100,
top_n::Integer = 5,
api_kwargs::NamedTuple = NamedTuple(),
rephraser::AbstractRephraser = retriever.rephraser,
rephraser_kwargs::NamedTuple = NamedTuple(),
embedder::AbstractEmbedder = retriever.embedder,
embedder_kwargs::NamedTuple = NamedTuple(),
processor::AbstractProcessor = retriever.processor,
processor_kwargs::NamedTuple = NamedTuple(),
finder::AbstractSimilarityFinder = retriever.finder,
finder_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = retriever.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
filter::AbstractTagFilter = retriever.filter,
filter_kwargs::NamedTuple = NamedTuple(),
reranker::AbstractReranker = retriever.reranker,
reranker_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0),
kwargs...)
Retrieves the most relevant chunks from the index for the given question and returns them in the RAGResult object.
This is the main entry point for the retrieval stage of the RAG pipeline. It is often followed by the generate! step.
Notes:
- The default flow is rephrase -> get_embeddings -> find_closest -> get_tags -> find_tags -> rerank.
The arguments correspond to the steps of the retrieval process (rephrasing, embedding, finding similar docs, tagging, filtering by tags, reranking). You can customize each step by providing a new custom type that dispatches the corresponding function, eg, create your own type struct MyReranker<:AbstractReranker end and define the custom method for it rerank(::MyReranker,...) = ... .
Note: Discover available retrieval sub-types for each step with subtypes(AbstractRephraser) and similar for other abstract types.
If you're using locally-hosted models, you can pass the api_kwargs with the url field set to the model's URL and make sure to provide corresponding model kwargs to rephraser, embedder, and tagger to use the custom models (they make AI calls).
Arguments
- retriever: The retrieval method to use. Default is SimpleRetriever but could be AdvancedRetriever for more advanced retrieval.
- index: The index that holds the chunks and sources to be retrieved from.
- question: The question to be used for the retrieval.
- verbose: If >0, it prints out verbose logging. Default is 1. If you set it to 2, it will print out logs for each sub-function.
- top_k: The TOTAL number of closest chunks to return from find_closest. Default is 100. If there are multiple rephrased questions, the number of chunks per each item will be top_k ÷ number_of_rephrased_questions.
- top_n: The TOTAL number of most relevant chunks to return for the context (from rerank step). Default is 5.
- api_kwargs: Additional keyword arguments to be passed to the API calls (shared by all ai* calls).
- rephraser: Transform the question into one or more questions. Default is retriever.rephraser.
- rephraser_kwargs: Additional keyword arguments to be passed to the rephraser.
  - model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
  - template: The rephrasing template to use. Default is :RAGQueryOptimizer or :RAGQueryHyDE (depending on the rephraser selected).
- embedder: The embedding method to use. Default is retriever.embedder.
- embedder_kwargs: Additional keyword arguments to be passed to the embedder.
- processor: The processor method to use when using Keyword-based index. Default is retriever.processor.
- processor_kwargs: Additional keyword arguments to be passed to the processor.
- finder: The similarity search method to use. Default is retriever.finder, often CosineSimilarity.
- finder_kwargs: Additional keyword arguments to be passed to the similarity finder.
- tagger: The tag generating method to use. Default is retriever.tagger.
- tagger_kwargs: Additional keyword arguments to be passed to the tagger. Noteworthy arguments:
  - tags: Directly provide the tags to use for filtering (can be String, Regex, or Vector{String}). Useful for tagger = PassthroughTagger.
- filter: The tag matching method to use. Default is retriever.filter.
- filter_kwargs: Additional keyword arguments to be passed to the tag filter.
- reranker: The reranking method to use. Default is retriever.reranker.
- reranker_kwargs: Additional keyword arguments to be passed to the reranker.
  - model: The model to use for reranking. Default is rerank-english-v2.0 if you use reranker = CohereReranker().
- cost_tracker: An atomic counter to track the cost of the retrieval. Default is Threads.Atomic{Float64}(0.0).
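As a quick worked example of the top_k split described above (plain arithmetic on made-up questions, no package calls): with the default top_k of 100 and three question variants, the integer division leaves 33 candidates per variant.

```julia
top_k = 100
# Made-up question variants: the original plus two rephrasings
rephrased_questions = ["original question", "rephrased variant A", "rephrased variant B"]

# Integer division, so any remainder is dropped
per_question = top_k ÷ length(rephrased_questions)
total_requested = per_question * length(rephrased_questions)
```

Because of the rounding, the combined candidate pool can be slightly smaller than top_k (99 here).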
See also: SimpleRetriever, AdvancedRetriever, build_index, rephrase, get_embeddings, get_keywords, find_closest, get_tags, find_tags, rerank, RAGResult.
Examples
Find the 5 most relevant chunks from the index for the given question.
# assumes you have an existing index `index`
retriever = SimpleRetriever()
result = retrieve(retriever,
    index,
    "What is the capital of France?",
    top_n = 5)
# or use the default retriever (same as above)
result = retrieve(index,
    "What is the capital of France?",
    top_n = 5)
Apply more advanced retrieval with question rephrasing and reranking (requires COHERE_API_KEY). We will obtain top 100 chunks from embeddings (top_k) and top 5 chunks from reranking (top_n).
retriever = AdvancedRetriever()
result = retrieve(retriever, index, question; top_k=100, top_n=5)
You can use the retriever to customize your retrieval strategy or directly change the strategy types in the retrieve kwargs!
Example of using locally-hosted model hosted on localhost:8080:
retriever = SimpleRetriever()
result = retrieve(retriever, index, question;
    rephraser_kwargs = (; model = "custom"),
    embedder_kwargs = (; model = "custom"),
    tagger_kwargs = (; model = "custom"),
    api_kwargs = (; url = "http://localhost:8080"))
generate!(
generator::AbstractGenerator, index::AbstractDocumentIndex, result::AbstractRAGResult;
verbose::Integer = 1,
api_kwargs::NamedTuple = NamedTuple(),
contexter::AbstractContextBuilder = generator.contexter,
contexter_kwargs::NamedTuple = NamedTuple(),
answerer::AbstractAnswerer = generator.answerer,
answerer_kwargs::NamedTuple = NamedTuple(),
refiner::AbstractRefiner = generator.refiner,
refiner_kwargs::NamedTuple = NamedTuple(),
postprocessor::AbstractPostprocessor = generator.postprocessor,
postprocessor_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0),
kwargs...)
Generate the response using the provided generator and the index and result. It is the second step in the RAG pipeline (after retrieve).
Returns the mutated result with the result.final_answer and the full conversation saved in result.conversations[:final_answer].
Notes
The default flow is build_context! -> answer! -> refine! -> postprocess!.
contexter is the method to use for building the context, eg, simply enumerate the context chunks with ContextEnumerator.
answerer is the standard answer generation step with LLMs.
refiner step allows the LLM to critique itself and refine its own answer.
postprocessor step allows for additional processing of the answer, eg, logging, saving conversations, etc.
All of its sub-routines operate by mutating the result object (and adding their part).
Discover available sub-types for each step with subtypes(AbstractRefiner) and similar for other abstract types.
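The "mutate the result and add your part" pattern from the notes above can be sketched in plain Julia. This is only an illustration under stated assumptions: ToyResult and the toy_* functions are hypothetical stand-ins, not the RAGTools types or methods, and no LLM is called.

```julia
# Hypothetical stand-in for a RAG result; the field names are illustrative only
Base.@kwdef mutable struct ToyResult
    question::String
    chunks::Vector{String}
    context::Vector{String} = String[]
    answer::Union{Nothing, String} = nothing
    final_answer::Union{Nothing, String} = nothing
end

# Each step mutates `result`, adds its own part, and returns the same object
function toy_build_context!(result::ToyResult)
    # enumerate the chunks, similar in spirit to a context enumerator
    result.context = ["$(i). $(chunk)" for (i, chunk) in enumerate(result.chunks)]
    return result
end
function toy_answer!(result::ToyResult)
    # a real answerer would call an LLM here; we only echo the first context item
    result.answer = "Based on the context: " * first(result.context)
    return result
end
function toy_refine!(result::ToyResult)
    # no-op refinement: pass the answer through unchanged
    result.final_answer = result.answer
    return result
end
toy_postprocess!(result::ToyResult) = result  # nothing to do in this sketch

result = ToyResult(; question = "What is the capital of France?",
    chunks = ["Paris is the capital of France.", "France is in Europe."])
result |> toy_build_context! |> toy_answer! |> toy_refine! |> toy_postprocess!

println(result.final_answer)
```

Because every step returns the object it received, any single step can be swapped out without touching the others.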
Arguments
generator::AbstractGenerator: The generator to use for generating the answer. Can be SimpleGenerator or AdvancedGenerator.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
verbose::Integer: If >0, enables verbose logging.
api_kwargs::NamedTuple: API parameters that will be forwarded to ALL of the API calls (aiembed, aigenerate, and aiextract).
contexter::AbstractContextBuilder: The method to use for building the context. Defaults to generator.contexter, eg, ContextEnumerator.
contexter_kwargs::NamedTuple: API parameters that will be forwarded to the contexter call.
answerer::AbstractAnswerer: The method to use for generating the answer. Defaults to generator.answerer, eg, SimpleAnswerer.
answerer_kwargs::NamedTuple: API parameters that will be forwarded to the answerer call. Examples:
  model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
  template: The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
refiner::AbstractRefiner: The method to use for refining the answer. Defaults to generator.refiner, eg, NoRefiner.
refiner_kwargs::NamedTuple: API parameters that will be forwarded to the refiner call.
  model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
  template: The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
postprocessor::AbstractPostprocessor: The method to use for postprocessing the answer. Defaults to generator.postprocessor, eg, NoPostprocessor.
postprocessor_kwargs::NamedTuple: API parameters that will be forwarded to the postprocessor call.
cost_tracker: An atomic counter to track the total cost of the operations.
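To make the cost_tracker argument more concrete, here is a small sketch of how an atomic counter of this type accumulates values safely across threads. The per-call costs are invented numbers, and how the library's sub-steps actually update the tracker is an assumption, not something confirmed here.

```julia
# Same constructor as the default value of the `cost_tracker` keyword
cost_tracker = Threads.Atomic{Float64}(0.0)

# Hypothetical per-call costs (in USD) reported by four separate API calls
call_costs = [0.002, 0.0005, 0.001, 0.0015]

# `atomic_add!` performs the read-modify-write atomically,
# so concurrent updates from several threads cannot be lost
Threads.@threads for cost in call_costs
    Threads.atomic_add!(cost_tracker, cost)
end

total_cost = cost_tracker[]  # read the accumulated value
println("Total cost: ", total_cost)
```

Passing your own tracker into a call lets you read the accumulated total afterwards with cost_tracker[].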
See also: retrieve, build_context!, ContextEnumerator, answer!, SimpleAnswerer, refine!, NoRefiner, SimpleRefiner, postprocess!, NoPostprocessor
Examples
Assume we already have `index`
question = "What are the best practices for parallel computing in Julia?"
# Retrieve the relevant chunks - returns RAGResult
result = retrieve(index, question)
# Generate the answer using the default generator, mutates the same result
result = generate!(index, result)
annotate_support(annotater::TrigramAnnotater, answer::AbstractString,
context::AbstractVector; min_score::Float64 = 0.5,
skip_trigrams::Bool = true, hashed::Bool = true,
sources::Union{Nothing, AbstractVector{<:AbstractString}} = nothing,
min_source_score::Float64 = 0.25,
add_sources::Bool = true,
add_scores::Bool = true, kwargs...)
Annotates the answer with the overlap/what's supported in context and returns the annotated tree of nodes representing the answer.
Returns a "root" node with children nodes representing the sentences/code blocks in the answer. Only the "leaf" nodes are to be printed (to avoid duplication); "leaf" nodes are those with NO children.
Default logic:
Split into sentences/code blocks, then into tokens (~words).
Then match each token (~word) exactly.
If no exact match found, count trigram-based match (include the surrounding tokens for better contextual awareness).
If the match is higher than min_score, it's recorded in the score of the node.
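The default logic above can be approximated in a few lines of plain Julia. This is a simplified sketch, not the TrigramAnnotater implementation: it skips sentence splitting, the surrounding-token window, and hashing, and the toy_* helper names are hypothetical.

```julia
# Character trigrams of a word (words shorter than 3 characters yield a single item)
function toy_trigrams(word::AbstractString)
    chars = collect(lowercase(word))
    length(chars) < 3 && return Set([String(chars)])
    return Set(String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2))
end

# Split text into lowercase word tokens
toy_tokenize(text) = [lowercase(m.match) for m in eachmatch(r"\w+", text)]

# Score a token: 1.0 on exact match, otherwise the share of its trigrams found in the context
function toy_support(token::AbstractString, context_tokens, context_trigrams)
    lowercase(token) in context_tokens && return 1.0
    tri = toy_trigrams(token)
    return count(in(context_trigrams), tri) / length(tri)
end

context = ["This is a test context.", "Another context sentence."]
context_tokens = Set(t for doc in context for t in toy_tokenize(doc))
context_trigrams = mapreduce(toy_trigrams, union, context_tokens)

min_score = 0.5
for token in toy_tokenize("Test contexts banana")
    score = toy_support(token, context_tokens, context_trigrams)
    println(token, " => ", round(score; digits = 2), score >= min_score ? " (supported)" : "")
end
```

A token such as "contexts" has no exact match but shares most of its trigrams with "context", so it still clears the min_score threshold, while an unrelated word scores zero.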
Arguments
annotater::TrigramAnnotater: Annotater to use
answer::AbstractString: Text to annotate
context::AbstractVector: Context to annotate against, ie, look for "support" in the texts in context
min_score::Float64: Minimum score to consider a match. Default: 0.5, which means that half of the trigrams of each word should match
skip_trigrams::Bool: Whether to potentially skip trigram matching if exact full match is found. Default: true
hashed::Bool: Whether to use hashed trigrams. It's harder to debug, but it's much faster for larger texts (hashed text are held in a Set to deduplicate). Default: true
sources::Union{Nothing, AbstractVector{<:AbstractString}}: Sources to add at the end of the context. Default: nothing
min_source_score::Float64: Minimum score to consider/to display a source. Default: 0.25, which means that at least a quarter of the trigrams of each word should match to some context. The threshold is lower than min_score, because it's average across ALL words in a block, so it's much harder to match fully with generated text.
add_sources::Bool: Whether to add sources at the end of each code block/sentence. Sources are added in the square brackets like "[1]". Default: true
add_scores::Bool: Whether to add source-matching scores at the end of each code block/sentence. Scores are added in the square brackets like "[0.75]". Default: true
kwargs: Additional keyword arguments to pass to trigram_support! and set_node_style!. See their documentation for more details (eg, customize the colors of the nodes based on the score)
Example
annotater = TrigramAnnotater()
context = [
"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test context. Another context sentence."
annotated_root = annotate_support(annotater, answer, context)
pprint(annotated_root) # pretty print the annotated tree
annotate_support(
annotater::TrigramAnnotater, result::AbstractRAGResult; min_score::Float64 = 0.5,
skip_trigrams::Bool = true, hashed::Bool = true,
min_source_score::Float64 = 0.25,
add_sources::Bool = true,
add_scores::Bool = true, kwargs...)
Dispatch for annotate_support for AbstractRAGResult type. It extracts the final_answer and context from the result and calls annotate_support with them.
See annotate_support for more details.
Example
# assumes `const PT = PromptingTools` is defined
annotater = TrigramAnnotater()
res = RAGResult(; question = "", final_answer = "This is a test.",
    context = ["Test context.", "Completely different"])
annotated_root = annotate_support(annotater, res)
PT.pprint(annotated_root)
build_qa_evals(doc_chunks::Vector{<:AbstractString}, sources::Vector{<:AbstractString};
model=PT.MODEL_CHAT, instructions="None.", qa_template::Symbol=:RAGCreateQAFromContext,
verbose::Bool=true, api_kwargs::NamedTuple = NamedTuple(), kwargs...) -> Vector{QAEvalItem}
Create a collection of question and answer evaluations (QAEvalItem) from document chunks and sources. This function generates Q&A pairs based on the provided document chunks, using a specified AI model and template.
Arguments
doc_chunks::Vector{<:AbstractString}: A vector of document chunks, each representing a segment of text.
sources::Vector{<:AbstractString}: A vector of source identifiers corresponding to each chunk in doc_chunks (eg, filenames or paths).
model: The AI model used for generating Q&A pairs. Default is PT.MODEL_CHAT.
instructions::String: Additional instructions or context to provide to the model generating QA sets. Defaults to "None.".
qa_template::Symbol: A template symbol that dictates the AITemplate that will be used. It must have placeholder context. Default is :RAGCreateQAFromContext.
api_kwargs::NamedTuple: Parameters that will be forwarded to the API endpoint.
verbose::Bool: If true, additional information like costs will be logged. Defaults to true.
Returns
Vector{QAEvalItem}: A vector of QAEvalItem structs, each containing a source, context, question, and answer. Invalid or empty items are filtered out.
Notes
The function internally uses aiextract to generate Q&A pairs based on the provided qa_template, so you can pass any kwargs that aiextract accepts.
Each QAEvalItem includes the context (document chunk), the generated question and answer, and the source.
The function tracks and reports the cost of AI calls if verbose is enabled.
Items where the question, answer, or context is empty are considered invalid and are filtered out.
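The filtering rule in the last note can be sketched as follows. ToyQAItem and toy_isvalid are hypothetical stand-ins written for illustration; the actual QAEvalItem struct and its validity check may differ in detail.

```julia
# Hypothetical stand-in for a Q&A evaluation item; fields follow the description above
Base.@kwdef struct ToyQAItem
    source::String = ""
    context::String = ""
    question::String = ""
    answer::String = ""
end

# An item is valid only if question, answer, and context are all non-empty
toy_isvalid(item::ToyQAItem) =
    !isempty(item.question) && !isempty(item.answer) && !isempty(item.context)

items = [
    ToyQAItem(; source = "source1", context = "Text from document 1",
        question = "What does document 1 contain?", answer = "Text."),
    # simulates a failed extraction: no question or answer was produced
    ToyQAItem(; source = "source2", context = "Text from document 2"),
]

valid_items = filter(toy_isvalid, items)
println(length(valid_items), " of ", length(items), " items kept")
```

In practice this means the returned vector can be shorter than doc_chunks, so do not assume a one-to-one mapping between chunks and evaluation items.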
Examples
Creating Q&A evaluations from a set of document chunks:
doc_chunks = ["Text from document 1", "Text from document 2"]
sources = ["source1", "source2"]
qa_evals = build_qa_evals(doc_chunks, sources)