Reference for RAGTools
Provides Retrieval-Augmented Generation (RAG) functionality.
Requires: LinearAlgebra, SparseArrays, Unicode, PromptingTools for proper functionality.
This module is experimental and may change at any time. It is intended to be moved to a separate package in the future.
Abstract type for storing candidate chunks, ie, references to items in an AbstractChunkIndex.
Return type from find_closest and find_tags.
Required Fields
: the id of the index from which the candidates are drawn
positions::Vector{Int}: the positions of the candidates in the index
scores::Vector{Float32}: the similarity scores of the candidates from the query (higher is better)
AbstractChunkIndex <: AbstractDocumentIndex
Main abstract type for storing document chunks and their embeddings. It also stores tags and sources for each chunk.
Required Fields
: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
: underlying document chunks / snippets
embeddings::Union{Nothing, Matrix{<:Real}}: for semantic search
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.
AbstractGenerator <: AbstractGenerationMethod
Abstract type for generating an answer with generate! (use to change the process / return type of generate).
Required Fields
: the context building method, dispatching build_context!
answerer::AbstractAnswerer: the answer generation method, dispatching answer!
: the answer refining method, dispatching refine!
: the postprocessing method, dispatching postprocess!
Abstract type for building an index with build_index (use to change the process / return type of build_index).
Required Fields
: the chunking method, dispatching get_chunks
: the embedding method, dispatching get_embeddings
: the tagging method, dispatching get_tags
AbstractMultiIndex <: AbstractDocumentIndex
Experimental abstract type for storing multiple document indexes. Not yet implemented.
AbstractRetriever <: AbstractRetrievalMethod
Abstract type for retrieving chunks from an index with retrieve (use to change the process / return type of retrieve).
Required Fields
: the rephrasing method, dispatching rephrase
: the similarity search method, dispatching find_closest
: the tag matching method, dispatching find_tags
: the reranking method, dispatching rerank
AdvancedGenerator <: AbstractGenerator
Implementation for generate! that enumerates context snippets, runs aigenerate, and then refines the answer (unlike SimpleGenerator, which does no refinement).
It uses ContextEnumerator, SimpleAnswerer, SimpleRefiner, and NoPostprocessor as default contexter, answerer, refiner, and postprocessor.
AdvancedRetriever <: AbstractRetriever
Dispatch for retrieve with advanced retrieval methods to improve result quality. Compared to SimpleRetriever, it adds rephrasing the query and reranking the results.
: the rephrasing method, dispatching rephrase - uses HyDERephraser
: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses BatchEmbedder
: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses NoProcessor
: the similarity search method, dispatching find_closest - uses CosineSimilarity
: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
: the tag matching method, dispatching find_tags - uses NoTagFilter
: the reranking method, dispatching rerank - uses CohereReranker
AllTagFilter <: AbstractTagFilter
Finds the chunks that have ALL OF the specified tag(s). A method for find_tags
AnnotatedNode{T} <: AbstractAnnotatedNode
A node to add annotations to the generated answer in airag.
Annotations can be: sources, scores, whether it's supported or not by the context, etc.
: Unique identifier for the same group of nodes (eg, different lines of the same code block)
parent::Union{AnnotatedNode, Nothing}: Parent node that the current node was built on
children::Vector{AnnotatedNode}: Children nodes
score::
AnyTagFilter <: AbstractTagFilter
Finds the chunks that have ANY OF the specified tag(s). A method for find_tags
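The difference between the ANY OF and ALL OF filters can be sketched on a plain Bool matrix laid out like the tags field (rows are chunks, columns follow tags_vocab). This is only an illustration: the helper names match_any, match_all and tag_columns, as well as the sample data, are hypothetical and are not part of RAGTools.

```julia
# Hypothetical tag matrix: rows are chunks, columns are entries of the vocabulary.
# chunk 1: julia + plotting, chunk 2: python, chunk 3: julia + python, chunk 4: no tags
tags_vocab = ["julia", "python", "plotting"]
tags = Bool[1 0 1;
            0 1 0;
            1 1 0;
            0 0 0]

# Column indices of the requested tags (tags missing from the vocabulary are ignored).
tag_columns(vocab, wanted) = findall(in(wanted), vocab)

# ANY OF: keep chunks that carry at least one of the requested tags.
function match_any(tags, vocab, wanted)
    cols = tag_columns(vocab, wanted)
    return [i for i in axes(tags, 1) if any(tags[i, cols])]
end

# ALL OF: keep chunks that carry every requested tag.
function match_all(tags, vocab, wanted)
    cols = tag_columns(vocab, wanted)
    return [i for i in axes(tags, 1) if all(tags[i, cols])]
end

any_hits = match_any(tags, tags_vocab, ["julia", "python"])
all_hits = match_all(tags, tags_vocab, ["julia", "python"])
```

With this data, the ANY OF variant keeps chunks 1, 2 and 3, while the ALL OF variant keeps only chunk 3.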
BM25Similarity <: AbstractSimilarityFinder
Finds the closest chunks to a query by measuring the BM25 similarity between the query keywords and the chunks' keywords. A method for find_closest.
Reference: Wikipedia: BM25. Implementation follows: The Next Generation of Lucene Relevance.
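As a rough illustration of what a BM25 score measures, here is a minimal, self-contained sketch over tokenized documents. It is not the RAGTools implementation: the function name bm25_scores, the sample corpus, the parameter values k1 = 1.2 and b = 0.75, and the non-negative IDF variant are all assumptions chosen for the example.

```julia
# Hypothetical tokenized corpus (each document is a vector of keywords).
docs = [["julia", "parallel", "computing"],
        ["python", "plotting"],
        ["julia", "plotting", "makie", "plotting"]]

# BM25 score of every document for a tokenized query.
# k1 and b are common default values, not values taken from RAGTools.
function bm25_scores(docs, query; k1 = 1.2, b = 0.75)
    n_docs = length(docs)
    avg_len = sum(length, docs) / n_docs
    scores = zeros(Float64, n_docs)
    for term in unique(query)
        # Document frequency: number of documents containing the term.
        df = count(doc -> term in doc, docs)
        df == 0 && continue
        # Non-negative IDF variant (assumption for this sketch).
        idf = log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for (i, doc) in enumerate(docs)
            tf = count(==(term), doc)
            # Length normalization: longer-than-average documents are penalized.
            len_norm = 1 - b + b * length(doc) / avg_len
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * len_norm)
        end
    end
    return scores
end

scores = bm25_scores(docs, ["julia", "plotting"])
best = argmax(scores)
```

The third document matches both query terms, so it receives the highest score in this toy corpus.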
BatchEmbedder <: AbstractEmbedder
Default embedder for get_embeddings functions. It passes the individual documents to aiembed in batches.
BinaryBatchEmbedder <: AbstractEmbedder
Same as BatchEmbedder but reduces the embeddings matrix to a binary form (eg, BitMatrix). Defines a method for get_embeddings.
Reference: HuggingFace: Embedding Quantization.
BinaryCosineSimilarity <: AbstractSimilarityFinder
Finds the closest chunks to a query embedding by measuring the Hamming distance AND cosine similarity between the query and the chunks' embeddings in binary form. A method for find_closest.
It follows the two-pass approach:
First pass: Hamming distance in binary form to get the top_k * rescore_multiplier (ie, more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.
Reference: HuggingFace: Embedding Quantization.
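The two-pass idea can be sketched with the standard library only. This is a simplified illustration under stated assumptions, not the package's code: the function names two_pass_search and binarize, the sign-based binarization, and the sample data are hypothetical, and the second pass here rescores against the float embeddings of the candidates.

```julia
using LinearAlgebra

# Hypothetical float embeddings: one column per chunk (4 dimensions, 5 chunks).
emb = Float32[ 0.9  -0.8   0.7  -0.1   0.2;
               0.1   0.6   0.2  -0.9   0.3;
              -0.3   0.5  -0.4   0.8  -0.7;
               0.4  -0.2   0.5   0.1   0.6]
query = Float32[0.8, 0.1, -0.2, 0.5]

# Binary form: keep only the sign of each dimension (assumption for this sketch).
binarize(x) = x .> 0

function two_pass_search(emb, query; top_k = 2, rescore_multiplier = 2)
    bin_emb, bin_query = binarize(emb), binarize(query)
    # First pass: Hamming distance = number of differing bits per chunk.
    hamming = [count(bin_emb[:, j] .!= bin_query) for j in axes(emb, 2)]
    n_candidates = min(top_k * rescore_multiplier, size(emb, 2))
    candidates = partialsortperm(hamming, 1:n_candidates)
    # Second pass: cosine similarity on the float embeddings of the candidates only.
    cosine = [dot(emb[:, j], query) / (norm(emb[:, j]) * norm(query)) for j in candidates]
    order = sortperm(cosine; rev = true)[1:min(top_k, n_candidates)]
    return candidates[order], cosine[order]
end

positions, scores = two_pass_search(emb, query)
```

The cheap first pass narrows five chunks down to four candidates, and only those are rescored with the more expensive float computation.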
BitPackedBatchEmbedder <: AbstractEmbedder
Same as BatchEmbedder but reduces the embeddings matrix to a binary form packed in UInt64 (eg, BitMatrix.chunks). Defines a method for get_embeddings.
See also utilities pack_bits and unpack_bits to move between packed/non-packed binary forms.
Reference: HuggingFace: Embedding Quantization.
BitPackedCosineSimilarity <: AbstractSimilarityFinder
Finds the closest chunks to a query embedding by measuring the Hamming distance AND cosine similarity between the query and the chunks' embeddings in binary form. A method for find_closest.
The difference to BinaryCosineSimilarity is that the binary values are packed into UInt64, which is more efficient.
Reference: HuggingFace: Embedding Quantization. Implementation of hamming_distance is based on TinyRAG.
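To see why the packed form is efficient, consider a minimal sketch of a Hamming distance on UInt64 words: one XOR and one bit count cover 64 dimensions at a time. The helper names pack and hamming_packed are hypothetical and only illustrate the idea. They are not the package's pack_bits or hamming_distance.

```julia
# Pack a vector of bits into UInt64 words (64 dimensions per word).
# A BitVector already stores its data this way in its `chunks` field.
pack(bits::AbstractVector{Bool}) = copy(BitVector(bits).chunks)

# Hamming distance between two packed vectors: XOR the words and count the set bits.
function hamming_packed(a::Vector{UInt64}, b::Vector{UInt64})
    return sum(count_ones(xor(x, y)) for (x, y) in zip(a, b))
end

a = pack(Bool[1, 0, 1, 1, 0, 0, 1, 0])
b = pack(Bool[1, 1, 1, 0, 0, 0, 1, 1])
distance = hamming_packed(a, b)
```

The two vectors differ in three positions, so the distance is 3, computed with a single XOR on one 64-bit word.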
A struct for storing references to chunks in the given index (identified by index_id), called positions, and scores holding the strength of similarity (=1 is the highest, most similar). It's the result of the retrieval stage of RAG.
: the id of the index from which the candidates are drawn
positions::Vector{Int}: the positions of the candidates in the index (ie, 5 refers to the 5th chunk in the index - chunks(index)[5])
: the similarity scores of the candidates from the query (higher is better)
Main struct for storing document chunks and their embeddings. It also stores tags and sources for each chunk.
Previously, this struct was called ChunkIndex.
: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
: underlying document chunks / snippets
embeddings::Union{Nothing, Matrix{<:Real}}: for semantic search
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.
Struct for storing chunks of text and associated keywords for BM25 similarity search.
: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
: underlying document chunks / snippets
chunkdata::Union{Nothing, AbstractMatrix{<:Real}}: for similarity search, assumed to be DocumentTermMatrix
tags::Union{Nothing, AbstractMatrix{<:Bool}}: for exact search, filtering, etc. This is often a sparse matrix indicating which chunks have the given tag (see tags_vocab for the position lookup)
tags_vocab::Union{Nothing, Vector{<:AbstractString}}: vocabulary for the tags matrix (each column in tags is one item in tags_vocab and rows are the chunks)
sources::Vector{<:AbstractString}: sources of the chunks
extras::Union{Nothing, AbstractVector}: additional data, eg, metadata, source code, etc.
We can easily create a keywords-based index from a standard embeddings-based index.
# Let's assume we have a standard embeddings-based index
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))
# Creating an additional index for keyword-based search (BM25), is as simple as
index_keywords = ChunkKeywordsIndex(index)
# We can immediately create a MultiIndex (a hybrid index holding both indices)
multi_index = MultiIndex([index, index_keywords])
You can also build the index via build_index
# given some sentences and sources
index_keywords = build_index(KeywordsIndexer(), sentences; chunker_kwargs=(; sources))
# Retrieve the closest chunks with
retriever = SimpleBM25Retriever()
result = retrieve(retriever, index_keywords, "What are the best practices for parallel computing in Julia?")
If you want to use airag, don't forget to specify the config to make sure keywords are processed (ie, tokenized) and that BM25 is used for searching candidates
cfg = RAGConfig(; retriever = SimpleBM25Retriever());
airag(cfg, index_keywords;
question = "What are the best practices for parallel computing in Julia?")
[processor::AbstractProcessor=KeywordsProcessor(),] index::ChunkEmbeddingsIndex; verbose::Int = 1,
index_id = gensym("ChunkKeywordsIndex"), processor_kwargs...)
Convenience method to quickly create a ChunkKeywordsIndex from an existing ChunkEmbeddingsIndex.
# Let's assume we have a standard embeddings-based index
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))
# Creating an additional index for keyword-based search (BM25), is as simple as
index_keywords = ChunkKeywordsIndex(index)
# We can immediately create a MultiIndex (a hybrid index holding both indices)
multi_index = MultiIndex([index, index_keywords])
CohereReranker <: AbstractReranker
Rerank strategy using the Cohere Rerank API. Requires an API key. A method for rerank
ContextEnumerator <: AbstractContextBuilder
Default method for build_context!. It simply enumerates the context snippets around each position in candidates. When possible, it will add surrounding chunks (from the same source).
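The idea of enumerating snippets together with their neighbours from the same source can be sketched as follows. This is a simplified illustration, not the package's build_context! method: the function enumerate_context, the margin keyword, the "1. " numbering format, and the sample chunks are assumptions made for the example.

```julia
# Hypothetical index content: chunks and the source each chunk came from.
chunks = ["A1", "A2", "A3", "B1", "B2"]
sources = ["a.txt", "a.txt", "a.txt", "b.txt", "b.txt"]

# Build one numbered context snippet per candidate position, adding up to
# `margin[1]` chunks before and `margin[2]` chunks after it, but only when
# the neighbouring chunk comes from the same source.
function enumerate_context(chunks, sources, positions; margin = (1, 1))
    context = String[]
    for (i, pos) in enumerate(positions)
        lo = max(firstindex(chunks), pos - margin[1])
        hi = min(lastindex(chunks), pos + margin[2])
        window = [chunks[j] for j in lo:hi if sources[j] == sources[pos]]
        push!(context, "$(i). " * join(window, "\n"))
    end
    return context
end

context = enumerate_context(chunks, sources, [3, 4])
```

For position 3 the window keeps chunks 2 and 3 and drops chunk 4, because it belongs to a different source. Position 4 is handled the same way in the other direction.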
CosineSimilarity <: AbstractSimilarityFinder
Finds the closest chunks to a query embedding by measuring the cosine similarity between the query and the chunks' embeddings. A method for find_closest (see the docstring for more details and usage example).
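A minimal sketch of a cosine-similarity search, assuming embeddings are stored one column per chunk, could look like this. The function closest_by_cosine and the sample data are hypothetical and only illustrate the computation. They do not reproduce the find_closest method.

```julia
using LinearAlgebra

# Hypothetical embeddings: one column per chunk (2 dimensions, 3 chunks).
emb = Float32[1.0 0.0 0.7;
              0.0 1.0 0.7]
query = Float32[1.0, 0.2]

function closest_by_cosine(emb, query; top_k = 2)
    # Normalize the columns and the query so that a dot product equals the cosine similarity.
    emb_unit = emb ./ map(norm, eachcol(emb))'
    scores = emb_unit' * normalize(query)
    # Indices of the top_k highest scores, best first.
    positions = partialsortperm(scores, 1:min(top_k, length(scores)); rev = true)
    return collect(positions), scores[positions]
end

positions, scores = closest_by_cosine(emb, query)
```

Here the first chunk points almost in the same direction as the query and the third chunk is the runner-up, so they are returned in that order.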
A sparse matrix of term frequencies and document lengths to allow calculation of BM25 similarity scores.
FileChunker <: AbstractChunker
Chunker when you provide file paths to get_chunks. Ie, the inputs will be validated first (eg, file exists, etc) and then read into memory.
Set as default chunker in get_chunks.
FlashRanker <: AbstractReranker
Rerank strategy using the package FlashRank.jl and local models. A method for rerank.
You must first import the FlashRank.jl package. To automatically download any required models, set ENV["DATADEPS_ALWAYS_ACCEPT"] = true (see DataDeps for more details).
using FlashRank
# Wrap the model to be a valid Ranker recognized by RAGTools
# It will be provided to the airag/rerank function to avoid instantiating it on every call
reranker = FlashRank.RankerModel(:mini) |> FlashRanker
# You can choose :tiny or :mini
## Apply to the pipeline configuration, eg,
cfg = RAGConfig(; retriever = AdvancedRetriever(; reranker))
# Ask a question (assumes you have some `index`)
question = "What are the best practices for parallel computing in Julia?"
result = airag(cfg, index; question, return_all = true)
Defines styling via classes (attribute class) and styles (attribute style) for HTML formatting of AbstractAnnotatedNode.
HyDERephraser <: AbstractRephraser
Rephraser implemented using the provided AI Template (eg, ...) and standard chat model. A method for rephrase.
It uses a prompt-based rephrasing method called HyDE (Hypothetical Document Embedding), where instead of looking for an embedding of the question, we look for the documents most similar to a synthetic passage that would be a good answer to our question.
Reference: Arxiv paper.
is the average of all scoring criteria. Explain the final_rating in rationale.
Provide the final_rating between 1-5. Provide the rationale for it.
KeywordsIndexer <: AbstractIndexBuilder
Keyword-based index (BM25) to be returned by build_index.
It uses TextChunker, KeywordsProcessor, and NoTagger as default chunker, processor, and tagger.
KeywordsProcessor <: AbstractProcessor
Default keywords processor for get_keywords functions. It normalizes the documents, tokenizes them and builds a DocumentTermMatrix.
A struct for storing references to multiple sets of chunks across different indices. Each set of chunks is identified by an index_id in index_ids, with corresponding positions in the index and scores indicating the strength of similarity.
This struct is useful for scenarios where candidates are drawn from multiple indices, and there is a need to keep track of which candidates came from which index.
: the ids of the indices from which the candidates are drawn
positions::Vector{TP}: the positions of the candidates in their respective indices
scores::Vector{TD}: the similarity scores of the candidates from the query
MultiFinder <: AbstractSimilarityFinder
Composite finder for MultiIndex where we want to set a separate finder for each index. A method for find_closest. Positions correspond to indexes(::MultiIndex).
Composite index that stores multiple ChunkIndex objects and their embeddings.
: unique identifier of each index (to ensure we're using the right index with CandidateChunks)
: the indexes to be combined
Use accessor indexes to access the individual indexes.
We can create a MultiIndex from a vector of AbstractChunkIndex:
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; sources))
index_keywords = ChunkKeywordsIndex(index) # same chunks as above but adds BM25 instead of embeddings
multi_index = MultiIndex([index, index_keywords])
To use airag with different types of indices, we need to specify how to find the closest items for each index:
# Cosine similarity for embeddings and BM25 for keywords, same order as indexes in MultiIndex
finder = RT.MultiFinder([RT.CosineSimilarity(), RT.BM25Similarity()])
# Notice that we add `processor` to make sure keywords are processed (ie, tokenized) as well
cfg = RAGConfig(; retriever = SimpleRetriever(; processor = RT.KeywordsProcessor(), finder))
# Ask questions
msg = airag(cfg, multi_index; question = "What are the best practices for parallel computing in Julia?")
pprint(msg) # prettify the answer
NoEmbedder <: AbstractEmbedder
No-op embedder for get_embeddings functions. It returns nothing.
NoPostprocessor <: AbstractPostprocessor
Default method for postprocess!. A passthrough option that returns the result without any changes.
Overload this method to add custom postprocessing steps, eg, logging, saving conversations to disk, etc.
NoProcessor <: AbstractProcessor
No-op processor for get_keywords functions. It returns the inputs as is.
NoRefiner <: AbstractRefiner
Default method for refine!. A passthrough option that returns the result.answer without any changes.
NoRephraser <: AbstractRephraser
No-op implementation for rephrase, which simply passes the question through.
NoReranker <: AbstractReranker
No-op implementation for rerank, which simply passes the candidate chunks through.
NoTagFilter <: AbstractTagFilter
No-op implementation for find_tags, which simply returns all chunks.
NoTagger <: AbstractTagger
No-op tagger for get_tags functions. It returns (nothing, nothing).
OpenTagger <: AbstractTagger
Tagger for get_tags functions, which generates possible tags for each chunk via aiextract. You can customize it via prompt template (default: :RAGExtractMetadataShort), but it's quite open-ended (ie, AI decides the possible tags).
PassthroughTagger <: AbstractTagger
Tagger for get_tags functions, which passes tags directly as Vector of Vectors of strings (ie, tags[i] is the tags for docs[i]).
RAGConfig <: AbstractRAGConfig
Default configuration for RAG. It uses SimpleIndexer, SimpleRetriever, and SimpleGenerator as default components. Provided as the first argument in airag.
To customize the components, replace corresponding fields for each step of the RAG pipeline (eg, use subtypes(AbstractIndexBuilder) to find the available options).
A struct for debugging RAG answers. It contains the question, answer, context, and the candidate chunks at each step of the RAG pipeline.
Think of the flow as question -> rephrased_questions -> answer -> final_answer with the context and candidate chunks helping along the way.
: the original question
rephrased_questions::Vector{<:AbstractString}: a vector of rephrased questions (eg, HyDe, Multihop, etc.)
answer::AbstractString: the generated answer
final_answer::AbstractString: the refined final answer (eg, after CorrectiveRAG), also considered the FINAL answer (it must always be available)
context::Vector{<:AbstractString}: the context used for retrieval (ie, the vector of chunks and their surrounding window if applicable)
sources::Vector{<:AbstractString}: the sources of the context (for the original matched chunks)
emb_candidates::CandidateChunks: the candidate chunks from the embedding index (from find_closest)
tag_candidates::Union{Nothing, CandidateChunks}: the candidate chunks from the tag index (from find_tags)
: the filtered candidate chunks (intersection of emb_candidates and tag_candidates)
: the reranked candidate chunks (from rerank)
: the conversation history for AI steps of the RAG pipeline, use keys that correspond to the function names, eg, :answer
See also: pprint (pretty printing), annotate_support (for annotating the answer)
RankGPTReranker <: AbstractReranker
Rerank strategy using the RankGPT algorithm (calling LLMs). A method for rerank.
[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT Github
Results from the RankGPT algorithm.
: The question that was asked.
chunks::AbstractVector{T}: The chunks that were ranked (=context).
positions::Vector{Int}: The ranking of the chunks (referring to the chunks).
: The time it took to rank the chunks.
cost::Float64: The cumulative cost of the ranking.
tokens::Int: The cumulative number of tokens used in the ranking.
SimpleAnswerer <: AbstractAnswerer
Default method for answer!. Generates an answer using the aigenerate function with the provided context and question.
SimpleBM25Retriever <: AbstractRetriever
Keyword-based implementation for retrieve. It does a simple similarity search via BM25Similarity and returns the results.
Make sure to use consistent processor and tagger with the Preparation Stage (build_index).
: the rephrasing method, dispatching rephrase - uses NoRephraser
: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses NoEmbedder
: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses KeywordsProcessor
: the similarity search method, dispatching find_closest - uses BM25Similarity
: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
: the tag matching method, dispatching find_tags - uses NoTagFilter
: the reranking method, dispatching rerank - uses NoReranker
SimpleGenerator <: AbstractGenerator
Default implementation for generate. It simply enumerates context snippets and runs aigenerate (no refinement).
It uses ContextEnumerator, SimpleAnswerer, NoRefiner, and NoPostprocessor as default contexter, answerer, refiner, and postprocessor.
SimpleIndexer <: AbstractIndexBuilder
Default implementation for build_index.
It uses TextChunker, BatchEmbedder, and NoTagger as default chunker, embedder, and tagger.
SimpleRefiner <: AbstractRefiner
Refines the answer using the same context previously provided via the provided prompt template. A method for refine!
SimpleRephraser <: AbstractRephraser
Rephraser implemented using the provided AI Template (eg, ...) and standard chat model. A method for rephrase.
SimpleRetriever <: AbstractRetriever
Default implementation for retrieve function. It does a simple similarity search via CosineSimilarity and returns the results.
Make sure to use consistent embedder and tagger with the Preparation Stage (build_index).
: the rephrasing method, dispatching rephrase - uses NoRephraser
: the embedding method, dispatching get_embeddings (see Preparation Stage for more details) - uses BatchEmbedder
: the processor method, dispatching get_keywords (see Preparation Stage for more details) - uses NoProcessor
: the similarity search method, dispatching find_closest - uses CosineSimilarity
: the tag generating method, dispatching get_tags (see Preparation Stage for more details) - uses NoTagger
: the tag matching method, dispatching find_tags - uses NoTagFilter
: the reranking method, dispatching rerank - uses NoReranker
Defines styling keywords for printstyled for each AbstractAnnotatedNode.
A view of the parent index with respect to the chunks (and chunk-aligned fields). All methods and accessors working for AbstractChunkIndex also work for SubChunkIndex. It does not yet work for MultiIndex.
: the parent index from which the chunks are drawn (always the original index, never a view)
positions::Vector{Int}: the positions of the chunks in the parent index (always refers to original PARENT index, even if we create a view of the view)
cc = CandidateChunks(, 1:10)
sub_index = @view(index[cc])
You can use SubChunkIndex to access chunks or sources (and other fields) from a parent index, eg,
RT.chunkdata(sub_index) # slice of embeddings
RT.embeddings(sub_index) # slice of embeddings
RT.tags(sub_index) # slice of tags
RT.tags_vocab(sub_index) # unchanged, identical to parent version
RT.extras(sub_index) # slice of extras
Access the parent index that the positions correspond to.
A partial view of a DocumentTermMatrix, tf is MATERIALIZED for performance and fewer allocations.
TavilySearchRefiner <: AbstractRefiner
Refines the answer by executing a web search using the Tavily API. This method aims to enhance the answer's accuracy and relevance by incorporating information retrieved from the web. A method for refine!
TextChunker <: AbstractChunker
Chunker when you provide text to get_chunks functions. Inputs are directly chunked.
Annotation method where we score answer versus each context based on word-level trigrams that match.
It's a very simple method (and it can lose some semantic meaning in longer sequences, like negation), but it works reasonably well for both text and code.
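The core of trigram matching can be sketched in a few lines: split each word into character trigrams and measure what share of them also appears in the context. The helpers trigrams and trigram_score and the sample sentence are hypothetical and simplified. This sketch looks at one word at a time and does not use hashing, so it should not be read as the package's implementation.

```julia
# Character trigrams of a single word (words shorter than 3 characters yield the word itself).
function trigrams(word::AbstractString)
    chars = collect(lowercase(word))
    length(chars) < 3 && return [String(chars)]
    return [String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2)]
end

# Share of a word's trigrams that also occur anywhere in the context (0.0 to 1.0).
function trigram_score(word::AbstractString, context_trigrams::Set{String})
    grams = trigrams(word)
    return count(in(context_trigrams), grams) / length(grams)
end

# Collect all trigrams of the context words once, then score answer words against them.
context = "Makie supports barplots and scatter plots"
context_trigrams = Set{String}()
for word in split(context)
    union!(context_trigrams, trigrams(word))
end

score_full = trigram_score("barplots", context_trigrams)
score_partial = trigram_score("barplotting", context_trigrams)
score_none = trigram_score("xyz", context_trigrams)
```

A word copied from the context scores 1.0, a related but different word scores somewhere in between, and an unrelated word scores 0.0. A threshold then decides whether the word counts as supported.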
Shortcut to LinearAlgebra.normalize. Provided in the package extension RAGToolsExperimentalExt (Requires SparseArrays, Unicode, and LinearAlgebra).
root::AnnotatedNode; add_sources::Bool = true, add_scores::Bool = true,
sources::Union{Nothing, AbstractVector{<:AbstractString}} = nothing)
Adds metadata to the children of root. Metadata includes sources and scores, if requested.
Optionally, it can add a list of sources at the end of the printed text.
The metadata is added by inserting new nodes in the root children list (with no children of its own to be printed out).
airag(cfg::AbstractRAGConfig, index::AbstractDocumentIndex;
verbose::Integer = 1, return_all::Bool = false,
api_kwargs::NamedTuple = NamedTuple(),
retriever::AbstractRetriever = cfg.retriever,
retriever_kwargs::NamedTuple = NamedTuple(),
generator::AbstractGenerator = cfg.generator,
generator_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
High-level wrapper for Retrieval-Augmented Generation (RAG). It combines the retrieve and generate! steps, which you can customize if needed.
The simplest version first finds the relevant chunks in index for the question and then sends these chunks to the AI model to help with generating a response to the question.
To customize the components, replace the types (retriever, generator) of the corresponding step of the RAG pipeline - or go into sub-routines within the steps. Eg, use subtypes(AbstractRetriever) to find the available options.
: The configuration for the RAG pipeline. Defaults to RAGConfig(), where you can swap sub-types to customize the pipeline.
index::AbstractDocumentIndex: The chunk index to search for relevant text.
question::AbstractString: The question to be answered.
return_all::Bool: If true, returns the details used for RAG along with the response.
verbose::Integer: If >0, enables verbose logging. The higher the number, the more nested functions will log.
api_kwargs: API parameters that will be forwarded to ALL of the API calls (aiembed, and aiextract).
: The retriever to use for finding relevant chunks. Defaults to cfg.retriever, eg, SimpleRetriever (with no question rephrasing).
retriever_kwargs::NamedTuple: API parameters that will be forwarded to the retriever call. Examples of important ones:
  top_k::Int: Number of top candidates to retrieve based on embedding similarity.
  top_n::Int: Number of candidates to return after reranking.
  tagger::AbstractTagger: Tagger to use for tagging the chunks. Defaults to NoTagger().
  : API parameters that will be forwarded to the tagger call. You could provide the explicit tags directly with PassthroughTagger and tagger_kwargs = (; tags = ["tag1", "tag2"]).
: The generator to use for generating the answer. Defaults to cfg.generator, eg, SimpleGenerator.
: API parameters that will be forwarded to the generator call. Examples of important ones:
  answerer_kwargs::NamedTuple: API parameters that will be forwarded to the answerer call. Examples:
    model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
    : The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
  : The method to use for refining the answer. Defaults to generator.refiner, eg, NoRefiner.
  : API parameters that will be forwarded to the refiner call.
    : The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
    : The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
: An atomic counter to track the total cost of the operations (if you want to track the cost of multiple pipeline runs - it is passed around in the pipeline).
If return_all is false, returns the generated message (msg).
If return_all is true, returns the detail of the full pipeline in RAGResult (see the docs).
See also build_index, retrieve, generate!, RAGResult, getpropertynested, setpropertynested, merge_kwargs_nested, ChunkKeywordsIndex.
Using airag to get a response for a question:
index = build_index(...) # create an index
question = "How to make a barplot in Makie.jl?"
msg = airag(index; question)
To understand the details of the RAG process, use return_all=true
result = airag(index; question, return_all = true)
# result is a RAGResult object with all the internal steps of the `airag` function
You can also pretty-print result to highlight generated text vs text that is supported by context. It also includes annotations of which context was used for each part of the response (where available).
Example with advanced retrieval (with question rephrasing and reranking, which requires COHERE_API_KEY). We will obtain top 100 chunks from embeddings (top_k) and top 5 chunks from reranking (top_n). In addition, it will be done with a "custom" locally-hosted model.
cfg = RAGConfig(; retriever = AdvancedRetriever())
# kwargs will be big and nested, let's prepare them upfront
# we specify "custom" model for each component that calls LLM
kwargs = (
    retriever_kwargs = (;
        top_k = 100,
        top_n = 5,
        rephraser_kwargs = (;
            model = "custom"),
        embedder_kwargs = (;
            model = "custom"),
        tagger_kwargs = (;
            model = "custom")),
    generator_kwargs = (;
        answerer_kwargs = (;
            model = "custom"),
        refiner_kwargs = (;
            model = "custom")),
    api_kwargs = (;
        url = "http://localhost:8080"))
# `question` is passed as a keyword argument
result = airag(cfg, index; question, kwargs...)
If you want to use hybrid retrieval (embeddings + BM25), you can easily create an additional index based on keywords and pass them both into a MultiIndex.
You need to provide an explicit config, so the pipeline knows how to handle each index in the search similarity phase (finder).
# assumes `index` is your existing index
# create the multi-index with the keywords index
index_keywords = ChunkKeywordsIndex(index)
multi_index = MultiIndex([index, index_keywords])
# define the similarity measures for the indices that you have (same order)
finder = RT.MultiFinder([RT.CosineSimilarity(), RT.BM25Similarity()])
cfg = RAGConfig(; retriever=AdvancedRetriever(; processor=RT.KeywordsProcessor(), finder))
# Run the pipeline with the new hybrid retrieval (return the `RAGResult` to see the details)
result = airag(cfg, multi_index; question, return_all=true)
# Pretty-print the result
pprint(result)
For easier manipulation of nested kwargs, see utilities getpropertynested, setpropertynested, merge_kwargs_nested.
align_node_styles!(annotater::TrigramAnnotater, nodes::AbstractVector{<:AnnotatedNode}; kwargs...)
Aligns the styles of the nodes based on the surrounding nodes ("fill-in-the-middle").
If the node has no score, but the surrounding nodes have the same style, the node will inherit the style of the surrounding nodes.
annotate_support(annotater::TrigramAnnotater, answer::AbstractString,
context::AbstractVector; min_score::Float64 = 0.5,
skip_trigrams::Bool = true, hashed::Bool = true,
sources::Union{Nothing, AbstractVector{<:AbstractString}} = nothing,
min_source_score::Float64 = 0.25,
add_sources::Bool = true,
add_scores::Bool = true, kwargs...)
Annotates the answer with the overlap/what's supported in context and returns the annotated tree of nodes representing the answer.
Returns a "root" node with children nodes representing the sentences/code blocks in the answer. Only the "leaf" nodes are to be printed (to avoid duplication), "leaf" nodes are those with NO children.
Default logic:
Split into sentences/code blocks, then into tokens (~words).
Then match each token (~word) exactly.
If no exact match found, count trigram-based match (include the surrounding tokens for better contextual awareness).
If the match is higher than min_score, it's recorded in the score of the node.
: Annotater to use
answer::AbstractString: Text to annotate
context::AbstractVector: Context to annotate against, ie, look for "support" in the texts in context
: Minimum score to consider a match. Default: 0.5, which means that half of the trigrams of each word should match
skip_trigrams::Bool: Whether to potentially skip trigram matching if exact full match is found. Default: true
hashed::Bool: Whether to use hashed trigrams. It's harder to debug, but it's much faster for larger texts (hashed text are held in a Set to deduplicate). Default: true
sources::Union{Nothing, AbstractVector{<:AbstractString}}: Sources to add at the end of the context. Default: nothing
min_source_score::Float64: Minimum score to consider/to display a source. Default: 0.25, which means that at least a quarter of the trigrams of each word should match to some context. The threshold is lower than min_score, because it's average across ALL words in a block, so it's much harder to match fully with generated text.
add_sources::Bool: Whether to add sources at the end of each code block/sentence. Sources are added in the square brackets like "[1]". Default: true
add_scores::Bool: Whether to add source-matching scores at the end of each code block/sentence. Scores are added in the square brackets like "[0.75]". Default: true
kwargs: Additional keyword arguments to pass to the underlying functions. See their documentation for more details (eg, customize the colors of the nodes based on the score)
annotater = TrigramAnnotater()
context = [
"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test context. Another context sentence."
annotated_root = annotate_support(annotater, answer, context)
pprint(annotated_root) # pretty print the annotated tree
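To make the trigram step of the default logic more concrete, here is a minimal sketch of scoring one token against a context. It is an illustration only, not the package's implementation: the helper names `char_trigrams` and `trigram_score` are hypothetical, and the sketch ignores the surrounding-token and hashing options described above.

```julia
# Character trigrams of a lowercased string (hypothetical helper, not part of RAGTools).
function char_trigrams(s::AbstractString)
    chars = collect(lowercase(s))
    length(chars) < 3 && return Set{String}()
    return Set(String(chars[i:(i + 2)]) for i in 1:(length(chars) - 2))
end

# Share of the token's trigrams that also occur somewhere in the context.
function trigram_score(token::AbstractString, context_trigrams::Set{String})
    tri = char_trigrams(token)
    isempty(tri) && return 0.0
    return count(in(context_trigrams), tri) / length(tri)
end

context = ["This is a test context.", "Another context sentence."]
context_trigrams = union(char_trigrams.(context)...)

score_hit = trigram_score("context", context_trigrams)  # all trigrams occur in the context
score_miss = trigram_score("zebra", context_trigrams)   # none of the trigrams occur
```

With a threshold such as min_score = 0.5, the first token would count as supported and the second would not.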
annotate_support(annotater::TrigramAnnotater, result::AbstractRAGResult; min_score::Float64 = 0.5,
skip_trigrams::Bool = true, hashed::Bool = true,
min_source_score::Float64 = 0.25,
add_sources::Bool = true,
add_scores::Bool = true, kwargs...)
Dispatch for annotate_support for the AbstractRAGResult type. It extracts the final_answer and context from the result and calls annotate_support with them.
See annotate_support for more details.
annotater = TrigramAnnotater()
res = RAGResult(; question = "", final_answer = "This is a test.",
context = ["Test context.", "Completely different"])
annotated_root = annotate_support(annotater, res)
answer!(answerer::SimpleAnswerer, index::AbstractDocumentIndex, result::AbstractRAGResult;
model::AbstractString = PT.MODEL_CHAT, verbose::Bool = true,
template::Symbol = :RAGAnswerFromContext,
cost_tracker = Threads.Atomic{Float64}(0.0),
kwargs...)
Generates an answer using the aigenerate function with the provided result.context and result.question.
Returns the mutated result with result.answer and the full conversation saved in result.conversations[:answer].
answerer::SimpleAnswerer: The method to use for generating the answer. Uses aigenerate.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
verbose::Bool: If true, enables verbose logging.
template::Symbol: The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
cost_tracker: An atomic counter to track the cost of the operation.
build_context(contexter::ContextEnumerator,
index::AbstractDocumentIndex, candidates::AbstractCandidateChunks;
verbose::Bool = true,
chunks_window_margin::Tuple{Int, Int} = (1, 1), kwargs...)
build_context!(contexter::ContextEnumerator,
index::AbstractDocumentIndex, result::AbstractRAGResult; kwargs...)
Build context strings for each position in candidates, considering a window margin around each position. If the mutating version is used (build_context!), it will use result.reranked_candidates to update result.context.
contexter::ContextEnumerator: The method to use for building the context. Enumerates the snippets.
index::AbstractDocumentIndex: The index containing chunks and sources.
candidates::AbstractCandidateChunks: Candidate chunks which contain positions to extract context from.
verbose::Bool: If true, enables verbose logging.
chunks_window_margin::Tuple{Int, Int}: A tuple indicating the margin (before, after) around each position to include in the context. Defaults to (1,1), which means 1 preceding and 1 succeeding chunk will be included. With (0,0), only the matching chunks will be included.
Returns a vector of context strings, each corresponding to a position in reranked_candidates.
index = ChunkIndex(...) # Assuming a proper index is defined
candidates = CandidateChunks(index.id, [2, 4], [0.1, 0.2])
context = build_context(ContextEnumerator(), index, candidates; chunks_window_margin=(0, 1)) # include only one following chunk for each matching chunk
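The effect of chunks_window_margin can be sketched without an index. The helper below is hypothetical (`window_context` is not a RAGTools function) and works on a plain vector of chunks; the real method may handle additional details, such as source boundaries, that this sketch ignores.

```julia
# Hypothetical helper: join each matched chunk with its neighbours and enumerate the results.
function window_context(chunks::Vector{String}, positions::Vector{Int};
        chunks_window_margin::Tuple{Int, Int} = (1, 1))
    before, after = chunks_window_margin
    context = String[]
    for (i, pos) in enumerate(positions)
        lo = max(firstindex(chunks), pos - before)  # clamp at the start of the index
        hi = min(lastindex(chunks), pos + after)    # clamp at the end of the index
        push!(context, "$(i). " * join(chunks[lo:hi], "\n"))
    end
    return context
end

chunks = ["chunk A", "chunk B", "chunk C", "chunk D", "chunk E"]
# (0, 1) = no preceding chunk, one following chunk
context = window_context(chunks, [2, 4]; chunks_window_margin = (0, 1))
```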
build_index(indexer::KeywordsIndexer, files_or_docs::Vector{<:AbstractString};
verbose::Integer = 1,
extras::Union{Nothing, AbstractVector} = nothing,
index_id = gensym("ChunkKeywordsIndex"),
chunker::AbstractChunker = indexer.chunker,
chunker_kwargs::NamedTuple = NamedTuple(),
processor::AbstractProcessor = indexer.processor,
processor_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = indexer.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
api_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
Builds a ChunkKeywordsIndex from the provided files or documents to support keyword-based search (BM25).
build_index(indexer::AbstractIndexBuilder, files_or_docs::Vector{<:AbstractString};
verbose::Integer = 1,
extras::Union{Nothing, AbstractVector} = nothing,
index_id = gensym("ChunkEmbeddingsIndex"),
chunker::AbstractChunker = indexer.chunker,
chunker_kwargs::NamedTuple = NamedTuple(),
embedder::AbstractEmbedder = indexer.embedder,
embedder_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = indexer.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
api_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0))
Build an INDEX for RAG (Retrieval-Augmented Generation) applications from the provided file paths. INDEX is an object storing the document chunks and their embeddings (and potentially other information).
The function processes each file or document (depending on chunker), splits its content into chunks, embeds these chunks, optionally extracts metadata, and then combines this information into a retrievable index.
Define your own methods via indexer and its subcomponents (chunker, embedder, tagger).
indexer::AbstractIndexBuilder: The indexing logic to use. Default is SimpleIndexer().
files_or_docs: A vector of valid file paths OR string documents to be indexed (chunked and embedded). Specify which mode to use via chunker.
verbose: An Integer specifying the verbosity of the logs. Default is 1 (high-level logging). 0 is disabled.
extras: An optional vector of extra information to be stored with each chunk. Default is nothing.
index_id: A unique identifier for the index. Default is a generated symbol.
chunker: The chunker logic to use for splitting the documents. Default is TextChunker().
chunker_kwargs: Parameters to be provided to the get_chunks function. Useful to change the separators or max_length.
  sources: A vector of strings indicating the source of each chunk. Default is equal to files_or_docs.
embedder: The embedder logic to use for embedding the chunks. Default is BatchEmbedder().
embedder_kwargs: Parameters to be provided to the get_embeddings function. Useful to change the target_batch_size_length or reduce asyncmap tasks ntasks.
  model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
tagger: The tagger logic to use for extracting tags from the chunks. Default is NoTagger(), ie, skip tag extraction. There are also PassthroughTagger and OpenTagger.
tagger_kwargs: Parameters to be provided to the get_tags function.
  model: The model to use for tags extraction. Default is PT.MODEL_CHAT.
  template: A template to be used for tags extraction. Default is :RAGExtractMetadataShort.
  tags: A vector of vectors of strings directly providing the tags for each chunk. Applicable for tagger::PassthroughTagger.
api_kwargs: Parameters to be provided to the API endpoint. Shared across all API calls if provided.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
ChunkEmbeddingsIndex: An object containing the compiled index of chunks, embeddings, tags, vocabulary, and sources.
See also: ChunkEmbeddingsIndex, get_chunks, get_embeddings, get_tags, CandidateChunks, find_closest, find_tags, rerank, retrieve, generate!, airag
# Default is loading a vector of strings and chunking them (`TextChunker()`)
# Assuming `texts` is a vector of strings
index = build_index(SimpleIndexer(), texts; chunker_kwargs = (; max_length=10))
# Another example with tags extraction, splitting only sentences and verbose output
# Assuming `test_files` is a vector of file paths
indexer = SimpleIndexer(chunker=FileChunker(), tagger=OpenTagger())
index = build_index(indexer, test_files;
chunker_kwargs = (; separators=[". "]), verbose=true)
- If you get errors about exceeding embedding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try changing the embedder_kwargs. In particular, reduce the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes (eg, Databricks).
build_qa_evals(doc_chunks::Vector{<:AbstractString}, sources::Vector{<:AbstractString};
model=PT.MODEL_CHAT, instructions="None.", qa_template::Symbol=:RAGCreateQAFromContext,
verbose::Bool=true, api_kwargs::NamedTuple = NamedTuple(), kwargs...) -> Vector{QAEvalItem}
Create a collection of question and answer evaluations (QAEvalItem) from document chunks and sources. This function generates Q&A pairs based on the provided document chunks, using a specified AI model and template.
doc_chunks::Vector{<:AbstractString}: A vector of document chunks, each representing a segment of text.
sources::Vector{<:AbstractString}: A vector of source identifiers corresponding to each chunk in doc_chunks (eg, filenames or paths).
model: The AI model used for generating Q&A pairs. Default is PT.MODEL_CHAT.
instructions: Additional instructions or context to provide to the model generating QA sets. Defaults to "None.".
qa_template::Symbol: A template symbol that dictates the AITemplate that will be used. It must have placeholder context. Default is :RAGCreateQAFromContext.
api_kwargs::NamedTuple: Parameters that will be forwarded to the API endpoint.
verbose::Bool: If true, additional information like costs will be logged. Defaults to true.
Vector{QAEvalItem}: A vector of QAEvalItem structs, each containing a source, context, question, and answer. Invalid or empty items are filtered out.
The function internally uses aiextract to generate Q&A pairs based on the provided qa_template, so you can pass any kwargs that it accepts.
Each QAEvalItem includes the context (document chunk), the generated question and answer, and the source.
The function tracks and reports the cost of AI calls if verbose is enabled.
Items where the question, answer, or context is empty are considered invalid and are filtered out.
Creating Q&A evaluations from a set of document chunks:
doc_chunks = ["Text from document 1", "Text from document 2"]
sources = ["source1", "source2"]
qa_evals = build_qa_evals(doc_chunks, sources)
Builds a matrix of tags and a vocabulary list. REQUIRES SparseArrays, LinearAlgebra, Unicode packages to be loaded!!
build_tags(tagger::AbstractTagger, chunk_tags::Nothing; kwargs...)
No-op that skips any tag building, returning nothing, nothing.
Otherwise, it would build the sparse matrix and the vocabulary (requires SparseArrays and LinearAlgebra packages to be loaded).
Access chunkdata for a subset of chunks; chunk_idx is a vector of chunk indices in the index.
Lightweight wrapper around the Cohere API. See the Cohere API documentation for more details.
api_key: Your Cohere API key (trial access is free).
endpoint: The Cohere endpoint to call.
url: The base URL for the Cohere API.
: Any additional keyword arguments to pass
: Any additional keyword arguments to pass to the Cohere API.
context::AbstractVector{<:AbstractString}; rank_start::Integer = 1,
rank_end::Integer = 100, max_length::Integer = 512, template::Symbol = :RAGRankGPT)
Creates rendered template with injected context
Extracts the ranking from the response into a sorted array of integers.
find_closest(finder::CosineSimilarity, emb::AbstractMatrix{<:Real},
query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
top_k::Int = 100, minimum_similarity::AbstractFloat = -1.0, kwargs...)
Finds the indices of chunks (represented by embeddings in emb) that are closest (in cosine similarity for CosineSimilarity()) to the query embedding (query_emb).
finder is the logic used for the similarity search. Default is CosineSimilarity.
If minimum_similarity is provided, only indices with similarity greater than or equal to it are returned. Similarity can be between -1 and 1 (-1 = completely opposite, 1 = exactly the same).
Returns only top_k closest indices.
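The ranking described above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `top_k_cosine` is a hypothetical name, embeddings are stored one per column, and both the embeddings and the query are assumed to be L2-normalized so that the dot product equals the cosine similarity.

```julia
using LinearAlgebra

# Hypothetical helper: rank columns of `emb` by cosine similarity to `query_emb`.
function top_k_cosine(emb::AbstractMatrix{<:Real}, query_emb::AbstractVector{<:Real};
        top_k::Int = 100, minimum_similarity::Real = -1.0)
    scores = emb' * query_emb                  # dot product == cosine for unit vectors
    order = sortperm(scores; rev = true)       # best match first
    order = first(order, min(top_k, length(order)))
    keep = filter(i -> scores[i] >= minimum_similarity, order)
    return keep, scores[keep]
end

emb = hcat(normalize([1.0, 0.0]), normalize([1.0, 1.0]), normalize([-1.0, 0.0]))
positions, scores = top_k_cosine(emb, [1.0, 0.0]; top_k = 2)
```

Raising minimum_similarity (eg, to 0.9) drops the weaker match even though top_k would allow it.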
find_closest(finder::AbstractSimilarityFinder, index::AbstractChunkIndex,
query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
top_k::Int = 100, kwargs...)
Finds the indices of chunks (represented by embeddings in index) that are closest to the query embedding (query_emb).
Returns only top_k closest indices.
find_closest(finder::BM25Similarity, dtm::AbstractDocumentTermMatrix,
query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
top_k::Int = 100, minimum_similarity::AbstractFloat = -1.0, kwargs...)
Finds the indices of chunks (represented by DocumentTermMatrix in dtm) that are closest to the query tokens (query_tokens) using BM25.
Reference: Wikipedia: BM25. Implementation follows: The Next Generation of Lucene Relevance.
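For readers unfamiliar with BM25, the following is a minimal sketch of the standard scoring formula with a non-negative IDF term. It is not the package's implementation: `bm25_scores` is a hypothetical helper that works on pre-tokenized documents instead of a document-term matrix, and the parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here for illustration.

```julia
# Hypothetical helper: BM25 score of every tokenized document for the given query tokens.
function bm25_scores(docs::Vector{Vector{String}}, query_tokens::Vector{String};
        k1::Float64 = 1.2, b::Float64 = 0.75)
    n_docs = length(docs)
    avg_len = sum(length, docs) / n_docs
    scores = zeros(Float64, n_docs)
    for term in unique(query_tokens)
        doc_freq = count(doc -> term in doc, docs)
        doc_freq == 0 && continue
        # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5))
        idf = log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        for (i, doc) in enumerate(docs)
            tf = count(==(term), doc)
            tf == 0 && continue
            len_norm = 1 - b + b * length(doc) / avg_len  # document length normalization
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * len_norm)
        end
    end
    return scores
end

docs = [["julia", "is", "fast"], ["python", "is", "popular"], ["julia", "julia", "rocks"]]
scores = bm25_scores(docs, ["julia"])
ranking = sortperm(scores; rev = true)  # best-matching document first
```

The third document mentions the query term twice and therefore ranks first; the term-frequency saturation controlled by k1 keeps its score below twice that of the first document.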
find_closest(finder::BinaryCosineSimilarity, emb::AbstractMatrix{<:Bool},
query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
top_k::Int = 100, rescore_multiplier::Int = 4, minimum_similarity::AbstractFloat = -1.0, kwargs...)
Finds the indices of chunks (represented by embeddings in emb) that are closest to the query embedding (query_emb) using binary embeddings (in the index).
This is a two-pass approach:
First pass: Hamming distance in binary form to get the top_k * rescore_multiplier (ie, more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.
Returns only top_k closest indices.
Reference: HuggingFace: Embedding Quantization.
Convert any Float embeddings to binary like this:
binary_emb = map(>(0), emb)
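The two-pass idea can be sketched as follows. This is a hedged illustration rather than the package's code: `two_pass_search` is a hypothetical name, and the sketch assumes that the second pass scores the float query against the binary columns of the shortlisted candidates.

```julia
using LinearAlgebra

# Hypothetical helper: shortlist by Hamming distance, then rescore with the float query.
function two_pass_search(binary_emb::AbstractMatrix{Bool}, query_emb::AbstractVector{<:Real};
        top_k::Int = 2, rescore_multiplier::Int = 4)
    binary_query = map(>(0), query_emb)
    # First pass: number of differing bits per column (Hamming distance)
    distances = vec(sum(binary_emb .!= binary_query; dims = 1))
    n_candidates = min(top_k * rescore_multiplier, size(binary_emb, 2))
    candidates = first(sortperm(distances), n_candidates)
    # Second pass: rescore only the shortlisted columns against the float query
    scores = [dot(view(binary_emb, :, j), query_emb) for j in candidates]
    order = first(sortperm(scores; rev = true), min(top_k, n_candidates))
    return candidates[order], scores[order]
end

emb = [0.9 -0.2 0.8; 0.1 0.7 -0.6; -0.5 0.4 -0.3]  # one embedding per column
binary_emb = map(>(0), emb)
positions, scores = two_pass_search(binary_emb, [1.0, 0.5, -1.0];
    top_k = 1, rescore_multiplier = 2)
```

The first pass is cheap because it only compares bits; the more expensive float arithmetic is limited to top_k * rescore_multiplier columns.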
find_closest(finder::BitPackedCosineSimilarity, emb::AbstractMatrix{<:Bool},
query_emb::AbstractVector{<:Real}, query_tokens::AbstractVector{<:AbstractString} = String[];
top_k::Int = 100, rescore_multiplier::Int = 4, minimum_similarity::AbstractFloat = -1.0, kwargs...)
Finds the indices of chunks (represented by embeddings in emb) that are closest to the query embedding (query_emb) using bit-packed binary embeddings (in the index).
This is a two-pass approach:
First pass: Hamming distance in bit-packed binary form to get the top_k * rescore_multiplier (ie, more than top_k) candidates.
Second pass: Rescore the candidates with float embeddings and return the top_k.
Returns only top_k closest indices.
Reference: HuggingFace: Embedding Quantization.
Convert any Float embeddings to bit-packed binary like this:
bitpacked_emb = pack_bits(emb.>0)
find_tags(method::AnyTagFilter, index::AbstractChunkIndex,
tag::Union{AbstractString, Regex}; kwargs...)
find_tags(method::AnyTagFilter, index::AbstractChunkIndex,
tags::Vector{T}; kwargs...) where {T <: Union{AbstractString, Regex}}
Finds the indices of chunks (represented by tags in index) that have ANY OF the specified tag or tags.
find_tags(method::AllTagFilter, index::AbstractChunkIndex,
tag::Union{AbstractString, Regex}; kwargs...)
find_tags(method::AllTagFilter, index::AbstractChunkIndex,
tags::Vector{T}; kwargs...) where {T <: Union{AbstractString, Regex}}
Finds the indices of chunks (represented by tags in index) that have ALL OF the specified tag or tags.
find_tags(method::NoTagFilter, index::AbstractChunkIndex,
tags::Union{T, AbstractVector{<:T}}; kwargs...) where {T <:
Union{AbstractString, Regex, Nothing}}
tags; kwargs...)
Returns all chunks in the index, ie, no filtering, so we simply return nothing (easier for dispatch).
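The difference between the ANY and ALL filters can be sketched on a plain Bool matrix where rows are chunks and columns follow tags_vocab (the layout described for the tags field). The helper `filter_by_tags` and its match_all keyword are hypothetical and only handle exact string tags, not Regex.

```julia
# Hypothetical helper: rows are chunks, columns correspond to entries of `tags_vocab`.
function filter_by_tags(tags::AbstractMatrix{Bool}, tags_vocab::Vector{String},
        wanted::Vector{String}; match_all::Bool = false)
    cols = findall(in(wanted), tags_vocab)
    isempty(cols) && return Int[]
    # ALL cannot be satisfied if some wanted tag is missing from the vocabulary
    match_all && length(cols) < length(unique(wanted)) && return Int[]
    combine = match_all ? all : any
    return findall(row -> combine(view(tags, row, cols)), 1:size(tags, 1))
end

tags_vocab = ["julia", "python", "rag"]
tags = Bool[1 0 1; 0 1 0; 1 0 0]
any_match = filter_by_tags(tags, tags_vocab, ["julia", "python"])                 # ANY OF
all_match = filter_by_tags(tags, tags_vocab, ["julia", "rag"]; match_all = true)  # ALL OF
```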
generate!(generator::AbstractGenerator, index::AbstractDocumentIndex, result::AbstractRAGResult;
verbose::Integer = 1,
api_kwargs::NamedTuple = NamedTuple(),
contexter::AbstractContextBuilder = generator.contexter,
contexter_kwargs::NamedTuple = NamedTuple(),
answerer::AbstractAnswerer = generator.answerer,
answerer_kwargs::NamedTuple = NamedTuple(),
refiner::AbstractRefiner = generator.refiner,
refiner_kwargs::NamedTuple = NamedTuple(),
postprocessor::AbstractPostprocessor = generator.postprocessor,
postprocessor_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0),
kwargs...)
Generate the response using the provided generator, the index, and the result. It is the second step in the RAG pipeline (after retrieve).
Returns the mutated result with the result.final_answer and the full conversation saved in result.conversations[:final_answer].
The default flow is build_context! -> answer! -> refine! -> postprocess!.
contexter is the method to use for building the context, eg, simply enumerate the context chunks with ContextEnumerator.
answerer is the standard answer generation step with LLMs.
refiner step allows the LLM to critique itself and refine its own answer.
postprocessor step allows for additional processing of the answer, eg, logging, saving conversations, etc.
All of its sub-routines operate by mutating the result object (and adding their part).
Discover available sub-types for each step with subtypes(AbstractRefiner) and similar for other abstract types.
generator::AbstractGenerator: The generator to use for generating the answer. Can be, eg, SimpleGenerator.
index::AbstractDocumentIndex: The index containing chunks and sources.
result::AbstractRAGResult: The result containing the context and question to generate the answer for.
verbose::Integer: If >0, enables verbose logging.
api_kwargs::NamedTuple: API parameters that will be forwarded to ALL of the API calls (aiembed, aigenerate, and aiextract).
contexter::AbstractContextBuilder: The method to use for building the context. Defaults to generator.contexter, eg, ContextEnumerator.
contexter_kwargs::NamedTuple: API parameters that will be forwarded to the contexter call.
answerer::AbstractAnswerer: The method to use for generating the answer. Defaults to generator.answerer, eg, SimpleAnswerer.
answerer_kwargs::NamedTuple: API parameters that will be forwarded to the answerer call. Examples:
  model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
  template: The template to use for the aigenerate function. Defaults to :RAGAnswerFromContext.
refiner::AbstractRefiner: The method to use for refining the answer. Defaults to generator.refiner, eg, NoRefiner.
refiner_kwargs::NamedTuple: API parameters that will be forwarded to the refiner call.
  model: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
  template: The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
postprocessor::AbstractPostprocessor: The method to use for postprocessing the answer. Defaults to generator.postprocessor, eg, NoPostprocessor.
postprocessor_kwargs::NamedTuple: API parameters that will be forwarded to the postprocessor call.
cost_tracker: An atomic counter to track the total cost of the operations.
See also: retrieve, build_context!, ContextEnumerator, answer!, SimpleAnswerer, refine!, NoRefiner, SimpleRefiner, postprocess!, NoPostprocessor
Assume we already have `index`
question = "What are the best practices for parallel computing in Julia?"
# Retrieve the relevant chunks - returns RAGResult
result = retrieve(index, question)
# Generate the answer using the default generator, mutates the same result
result = generate!(index, result)
get_chunks(chunker::AbstractChunker, files_or_docs::Vector{<:AbstractString};
sources::AbstractVector{<:AbstractString} = files_or_docs,
verbose::Bool = true,
separators = ["\n\n", ". ", "\n", " "], max_length::Int = 256)
Chunks the provided files_or_docs into chunks of maximum length max_length (if possible with provided separators).
Supports two modes of operation:
chunker = FileChunker(): The function opens each file in files_or_docs and reads its contents.
chunker = TextChunker(): The function assumes that files_or_docs is a vector of strings to be chunked; you MUST provide corresponding sources.
files_or_docs: A vector of valid file paths OR string documents to be chunked.
separators: A list of strings used as separators for splitting the text in each file into chunks. Default is ["\n\n", ". ", "\n", " "]. See recursive_splitter for more details.
max_length: The maximum length of each chunk (if possible with provided separators). Default is 256.
sources: A vector of strings indicating the source of each chunk. Default is equal to files_or_docs.
get_embeddings(embedder::BatchEmbedder, docs::AbstractVector{<:AbstractString};
verbose::Bool = true,
model::AbstractString = PT.MODEL_EMBEDDING,
truncate_dimension::Union{Int, Nothing} = nothing,
cost_tracker = Threads.Atomic{Float64}(0.0),
target_batch_size_length::Int = 80_000,
ntasks::Int = 4 * Threads.nthreads(),
kwargs...)
Embeds a vector of docs using the provided model (kwarg model) in a batched manner.
BatchEmbedder tries to batch embedding calls for roughly 80K characters per call (to avoid exceeding the API rate limit) to reduce network latency.
docs are assumed to be already chunked to reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes.
docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing, which skips truncation.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up the embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().
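The role of target_batch_size_length can be illustrated with a small batching sketch. The helper `batch_by_length` is hypothetical and only groups document indices by cumulative character count; the actual batching logic in the package may differ.

```julia
# Hypothetical helper: group document indices into batches of roughly
# `target_batch_size_length` characters each (a single long document gets its own batch).
function batch_by_length(docs::Vector{String}; target_batch_size_length::Int = 80_000)
    batches = Vector{Vector{Int}}()
    current, current_len = Int[], 0
    for (i, doc) in enumerate(docs)
        doc_len = length(doc)
        # Close the current batch if adding this document would exceed the target
        if !isempty(current) && current_len + doc_len > target_batch_size_length
            push!(batches, current)
            current, current_len = Int[], 0
        end
        push!(current, i)
        current_len += doc_len
    end
    isempty(current) || push!(batches, current)
    return batches
end

docs = ["a"^40, "b"^40, "c"^40, "d"^100]
batches = batch_by_length(docs; target_batch_size_length = 100)
```

Each batch would then be sent as one embedding call; a smaller target produces more, smaller calls, which is the knob suggested above for providers that reject large batches.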
get_embeddings(embedder::BinaryBatchEmbedder, docs::AbstractVector{<:AbstractString};
verbose::Bool = true,
model::AbstractString = PT.MODEL_EMBEDDING,
truncate_dimension::Union{Int, Nothing} = nothing,
return_type::Type = Matrix{Bool},
cost_tracker = Threads.Atomic{Float64}(0.0),
target_batch_size_length::Int = 80_000,
ntasks::Int = 4 * Threads.nthreads(),
kwargs...)
Embeds a vector of docs using the provided model (kwarg model) in a batched manner and then returns the binary embeddings matrix.
BinaryBatchEmbedder tries to batch embedding calls for roughly 80K characters per call (to avoid exceeding the API rate limit) to reduce network latency.
docs are assumed to be already chunked to reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes.
docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing.
return_type: The type of the returned embeddings matrix. Default is Matrix{Bool}. Choose BitMatrix to minimize storage requirements, Matrix{Bool} to maximize performance in elementwise-ops.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up the embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().
get_embeddings(embedder::BitPackedBatchEmbedder, docs::AbstractVector{<:AbstractString};
verbose::Bool = true,
model::AbstractString = PT.MODEL_EMBEDDING,
truncate_dimension::Union{Int, Nothing} = nothing,
cost_tracker = Threads.Atomic{Float64}(0.0),
target_batch_size_length::Int = 80_000,
ntasks::Int = 4 * Threads.nthreads(),
kwargs...)
Embeds a vector of docs using the provided model (kwarg model) in a batched manner and then returns the binary embeddings matrix represented in UInt64 (bit-packed).
BitPackedBatchEmbedder tries to batch embedding calls for roughly 80K characters per call (to avoid exceeding the API rate limit) to reduce network latency.
The best option for FAST and MEMORY-EFFICIENT storage of embeddings; for retrieval use BitPackedCosineSimilarity.
docs are assumed to be already chunked to reasonable sizes that fit within the embedding context limit.
If you get errors about exceeding input sizes, first check the max_length in your chunks. If that does NOT resolve the issue, try reducing the target_batch_size_length parameter (eg, 10_000) and the number of tasks (ntasks=1). Some providers cannot handle large batch sizes.
docs: A vector of strings to be embedded.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for embedding. Default is PT.MODEL_EMBEDDING.
truncate_dimension: The dimensionality of the embeddings to truncate to. Default is nothing.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
target_batch_size_length: The target length (in characters) of each batch of document chunks sent for embedding. Default is 80_000 characters. Speeds up the embedding process.
ntasks: The number of tasks to use for asyncmap. Default is 4 * Threads.nthreads().
See also: unpack_bits, pack_bits, BitPackedCosineSimilarity
get_tags(tagger::NoTagger, docs::AbstractVector{<:AbstractString};
kwargs...)
Simple no-op that skips any tagging of the documents.
get_tags(tagger::OpenTagger, docs::AbstractVector{<:AbstractString};
verbose::Bool = true,
cost_tracker = Threads.Atomic{Float64}(0.0),
Extracts "tags" (metadata/keywords) from a vector of docs using the provided model (kwarg model).
docs: A vector of strings to extract tags from.
verbose: A boolean flag for verbose output. Default is true.
model: The model to use for tags extraction. Default is PT.MODEL_CHAT.
template: A template to be used for tags extraction. Default is :RAGExtractMetadataShort.
cost_tracker: A Threads.Atomic{Float64} object to track the total cost of the API calls. Useful to pass the total cost to the parent call.
get_tags(tagger::PassthroughTagger, docs::AbstractVector{<:AbstractString};
Pass tags directly as Vector of Vectors of strings (ie, tags[i] is the tags for docs[i]). It then builds the vocabulary from the tags and returns both the tags in matrix form and the vocabulary.
getpropertynested(nt::NamedTuple, parent_keys::Vector{Symbol}, key::Symbol, default = nothing)
Get a property key from a nested NamedTuple nt, where the property is nested to a key in parent_keys.
Useful for nested kwargs where we want to get some property in parent_keys subset (eg, model in retriever_kwargs).
kw = (; abc = (; def = "x"))
getpropertynested(kw, [:abc], :def)
# Output: "x"
mat::AbstractMatrix{T}, query::AbstractVector{T})::Vector{Int} where {T <: Integer}
Calculates the column-wise Hamming distance between a matrix of binary vectors mat and a single binary vector query.
This is the first-pass ranking for the BinaryCosineSimilarity method.
Implementation from domluna's tinyRAG.
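As a reference point for what "column-wise Hamming distance" means, here is a naive sketch. It is not the optimized implementation referenced above; `hamming_columns` is a hypothetical name and the sketch is restricted to Bool inputs.

```julia
# Hypothetical helper: for each column of `mat`, count the positions that differ from `query`.
function hamming_columns(mat::AbstractMatrix{Bool}, query::AbstractVector{Bool})
    size(mat, 1) == length(query) ||
        throw(DimensionMismatch("rows of mat must match length of query"))
    return [count(i -> mat[i, j] != query[i], axes(mat, 1)) for j in axes(mat, 2)]
end

mat = Bool[1 0 1; 1 1 0; 0 1 0]   # three binary vectors stored as columns
query = Bool[1, 1, 0]
distances = hamming_columns(mat, query)
```

A distance of 0 means the column is identical to the query; smaller distances rank first in the first pass.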
hcat_truncate(matrices::AbstractVector{<:AbstractMatrix{T}},
truncate_dimension::Union{Nothing, Int} = nothing; verbose::Bool = false) where {T <: Real}
Horizontal concatenation of matrices, with optional truncation of the rows of each matrix to the specified dimension (reducing embedding dimensionality).
More efficient than simple splatting, as the resulting matrix is pre-allocated in one go.
Returns: a Matrix{Float32}
matrices: Vector of matrices to concatenate
truncate_dimension::Union{Nothing,Int}=nothing: Dimension to truncate to, or nothing to skip truncation. If truncated, the columns will be normalized.
verbose::Bool=false: Whether to print verbose output.
a = rand(Float32, 1000, 10)
b = rand(Float32, 1000, 20)
c = hcat_truncate([a, b])
size(c) # (1000, 30)
d = hcat_truncate([a, b], 500)
size(d) # (500, 30)
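The pre-allocation and renormalization described above can be sketched as follows. This is a hypothetical re-implementation for illustration (`hcat_truncate_sketch` is not the package function) and assumes every matrix has at least truncate_dimension rows.

```julia
using LinearAlgebra

# Hypothetical re-implementation: concatenate columns into one pre-allocated matrix.
function hcat_truncate_sketch(matrices::Vector{<:AbstractMatrix{<:Real}},
        truncate_dimension::Union{Nothing, Int} = nothing)
    rows = isnothing(truncate_dimension) ? size(first(matrices), 1) : truncate_dimension
    total_cols = sum(m -> size(m, 2), matrices)
    out = Matrix{Float32}(undef, rows, total_cols)  # allocate the result once
    col = 1
    for m in matrices
        for j in axes(m, 2)
            out[:, col] = m[1:rows, j]
            # Renormalize only when rows were dropped
            isnothing(truncate_dimension) || normalize!(view(out, :, col))
            col += 1
        end
    end
    return out
end

a = rand(Float32, 8, 2)
b = rand(Float32, 8, 3)
d = hcat_truncate_sketch([a, b], 4)
```

Renormalizing after truncation keeps the columns at unit length, so dot products between truncated embeddings can still be read as cosine similarities.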
load_text(chunker::AbstractChunker, input;
kwargs...)
Load text from input using the provided chunker. Called by get_chunks.
Available chunkers:
FileChunker: The function opens each file in input and reads its contents.
TextChunker: The function assumes that input is a vector of strings to be chunked; you MUST provide corresponding sources.
merge_kwargs_nested(nt1::NamedTuple, nt2::NamedTuple)
Merges two nested NamedTuples nt1 and nt2 recursively. The nt2 values will overwrite the nt1 values when overlapping.
kw = (; abc = (; def = "x"))
kw2 = (; abc = (; def = "x", def2 = 2), new = 1)
merge_kwargs_nested(kw, kw2)
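The recursive merge can be sketched in a few lines. This is a hypothetical re-implementation (`merge_nested_sketch` is not the package function) that follows the rule stated above: values from the second NamedTuple win, and nested NamedTuples are merged key by key.

```julia
# Hypothetical re-implementation: nt2 values win, nested NamedTuples are merged recursively.
function merge_nested_sketch(nt1::NamedTuple, nt2::NamedTuple)
    result = nt1
    for (key, value) in pairs(nt2)
        if haskey(result, key) && result[key] isa NamedTuple && value isa NamedTuple
            value = merge_nested_sketch(result[key], value)
        end
        result = merge(result, NamedTuple{(key,)}((value,)))
    end
    return result
end

kw = (; abc = (; def = "x"))
kw2 = (; abc = (; def = "y", def2 = 2), new = 1)
merged = merge_nested_sketch(kw, kw2)
```

In this sketch the nested key def is overwritten by the value from kw2, while def2 and new are added.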
pack_bits(arr::AbstractMatrix{<:Bool}) -> Matrix{UInt64}
pack_bits(vect::AbstractVector{<:Bool}) -> Vector{UInt64}
Pack a matrix or vector of boolean values into a more compact representation using UInt64.
Arguments (Input)
arr::AbstractMatrix{<:Bool}: A matrix of boolean values where the number of rows must be divisible by 64.
Returns
For arr::AbstractMatrix{<:Bool}: Returns a matrix of UInt64 where each element represents 64 boolean values from the original matrix.
For vectors:
bin = rand(Bool, 128)
binint = pack_bits(bin)
binx = unpack_bits(binint)
@assert bin == binx
For matrices:
bin = rand(Bool, 128, 10)
binint = pack_bits(bin)
binx = unpack_bits(binint)
@assert bin == binx
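To show what bit-packing does under the hood, here is a minimal sketch for vectors. It is a hypothetical re-implementation: the names `pack_bits_sketch` and `unpack_bits_sketch` are made up, and the bit order (first Bool stored in the lowest bit) is an assumption that may differ from the package.

```julia
# Hypothetical re-implementation: pack each run of 64 Bools into one UInt64
# (first Bool goes into the lowest bit; the package's bit order may differ).
function pack_bits_sketch(vect::AbstractVector{Bool})
    length(vect) % 64 == 0 || throw(ArgumentError("length must be divisible by 64"))
    packed = zeros(UInt64, length(vect) ÷ 64)
    for (i, bit) in enumerate(vect)
        bit || continue
        word, offset = divrem(i - 1, 64)
        packed[word + 1] |= UInt64(1) << offset
    end
    return packed
end

# Inverse operation: read every bit back out as a Bool.
function unpack_bits_sketch(packed::AbstractVector{UInt64})
    return [((packed[(i - 1) ÷ 64 + 1] >> ((i - 1) % 64)) & UInt64(1)) == UInt64(1)
            for i in 1:(64 * length(packed))]
end

bits = rand(Bool, 128)
packed = pack_bits_sketch(bits)        # 128 Bools -> 2 UInt64 values
restored = unpack_bits_sketch(packed)
```

Packing reduces storage from one byte per Bool (in a Vector{Bool}) to one bit, which is where the memory savings of the bit-packed index come from.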
result::RankGPTResult; rank_start::Integer = 1, rank_end::Integer = 100, kwargs...)
One sub-step of the RankGPT algorithm permutation ranking within the window of chunks defined by rank_start and rank_end.
preprocess_tokens(text::AbstractString, stemmer=nothing; stopwords::Union{Nothing,Set{String}}=nothing, min_length::Int=3)
Preprocess provided text by removing numbers, punctuation, and applying stemming for BM25 search index.
Returns a list of preprocessed tokens.
using Snowball # provides the stemmer used below
stemmer = Snowball.Stemmer("english")
stopwords = Set(["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "some", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"])
# inner quotes are escaped so that the string literal is valid
text = "This is a sample paragraph to test the functionality of your text preprocessor. It contains a mix of uppercase and lowercase letters, as well as punctuation marks such as commas, periods, and exclamation points! Let's see how your preprocessor handles quotes, like \"this one\", and also apostrophes, like in don't. Will it preserve the formatting of this paragraph, including the indentation and line breaks?"
preprocess_tokens(text, stemmer; stopwords)
print_html([io::IO,] parent_node::AbstractAnnotatedNode)
print_html([io::IO,] rag::AbstractRAGResult; add_sources::Bool = false,
add_scores::Bool = false, default_styler = HTMLStyler(),
low_styler = HTMLStyler(styles = "color:magenta", classes = ""),
medium_styler = HTMLStyler(styles = "color:blue", classes = ""),
high_styler = HTMLStyler(styles = "", classes = ""), styler_kwargs...)
Pretty-prints the annotation parent_node (or RAGResult) to the io stream (or returns the string) in HTML format (assumes node is styled with styler HTMLStyler).
It wraps each "token" into a span with requested styling (HTMLStyler's properties classes and styles). It also replaces new lines with <br> for better HTML formatting.
For any non-HTML styler, it prints the content as plain text.
Returns nothing if io is provided, or the string with HTML-formatted text (if io is not provided, we print the result out).
See also HTMLStyler, annotate_support, and set_node_style! for how the styling is applied and what the arguments mean.
Note: RT is an alias for PromptingTools.Experimental.RAGTools
Simple start directly with the RAGResult
# set up the text/RAGResult
context = [
"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
rag = RT.RAGResult(; context, final_answer=answer, question="")
# print the HTML
print_html(rag)
Low-level control by creating our AnnotatedNode
# prepare your HTML styling
styler_kwargs = (;
low_styler=RT.HTMLStyler(styles="color:magenta", classes=""),
medium_styler=RT.HTMLStyler(styles="color:blue", classes=""),
high_styler=RT.HTMLStyler(styles="", classes=""))
# annotate the text
context = [
"This is a test context.", "Another context sentence.", "Final piece of context."]
answer = "This is a test answer. It has multiple sentences."
parent_node = RT.annotate_support(
RT.TrigramAnnotater(), answer, context; add_sources=false, add_scores=false, styler_kwargs...)
# print the HTML
print_html(parent_node)
# or to accumulate more nodes
io = IOBuffer()
print_html(io, parent_node)
rank_gpt(chunks::AbstractVector{<:AbstractString}, question::AbstractString;
verbose::Int = 1, rank_start::Integer = 1, rank_end::Integer = 100,
window_size::Integer = 20, step::Integer = 10,
num_rounds::Integer = 1, model::String = "gpt4o", kwargs...)
Ranks the chunks based on their relevance for question. Returns the ranking permutation of the chunks in the order they are most relevant to the question (the first is the most relevant).
result = rank_gpt(chunks, question; rank_start=1, rank_end=25, window_size=8, step=4, num_rounds=3, model="gpt4o")
[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT Github
result::RankGPTResult; verbose::Int = 1, rank_start = 1, rank_end = 100,
window_size = 20, step = 10, model::String = "gpt4o", kwargs...)
One single pass of the RankGPT algorithm permutation ranking across all positions between rank_start and rank_end.
curr_rank::AbstractVector{<:Integer}, response::AbstractString;
rank_start::Integer = 1, rank_end::Integer = 100)
Extracts and heals the permutation to contain all ranking positions.
reciprocal_rank_fusion(args...; k::Int=60)
Merges multiple rankings and calculates the reciprocal rank score for each chunk (discounted by the inverse of the rank).
positions1 = [1, 3, 5, 7, 9]
positions2 = [2, 4, 6, 8, 10]
positions3 = [2, 4, 6, 11, 12]
merged_positions, scores = reciprocal_rank_fusion(positions1, positions2, positions3)
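To make the scoring concrete, here is a minimal sketch of reciprocal rank fusion for any number of rankings: each position receives 1 / (k + rank) from every ranking it appears in, and the contributions are summed. rrf_sketch is a hypothetical name, and the return types of the package's reciprocal_rank_fusion may differ (this sketch returns two plain vectors).

```julia
# Hypothetical re-implementation of reciprocal rank fusion for illustration only.
function rrf_sketch(rankings::AbstractVector{<:Integer}...; k::Int = 60)
    scores = Dict{Int, Float64}()
    for ranking in rankings
        for (rank, pos) in enumerate(ranking)
            # each appearance contributes the reciprocal of (k + rank)
            scores[pos] = get(scores, pos, 0.0) + 1 / (k + rank)
        end
    end
    # sort positions by their fused score, best first
    merged = sort(collect(keys(scores)); by = p -> scores[p], rev = true)
    return merged, [scores[p] for p in merged]
end

merged, fused = rrf_sketch([1, 3, 5, 7, 9], [2, 4, 6, 8, 10], [2, 4, 6, 11, 12])
```

Positions 2, 4 and 6 appear in two rankings, so they end up at the top of the merged list.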
reciprocal_rank_fusion(
positions1::AbstractVector{<:Integer}, scores1::AbstractVector{<:T},
positions2::AbstractVector{<:Integer},
scores2::AbstractVector{<:T}; k::Int = 60) where {T <: Real}
Merges two sets of rankings and their joint scores. Calculates the reciprocal rank score for each chunk (discounted by the inverse of the rank).
positions1 = [1, 3, 5, 7, 9]
scores1 = [0.9, 0.8, 0.7, 0.6, 0.5]
positions2 = [2, 4, 6, 8, 10]
scores2 = [0.5, 0.6, 0.7, 0.8, 0.9]
merged, scores = reciprocal_rank_fusion(positions1, scores1, positions2, scores2; k = 60)
refiner::NoRefiner, index::AbstractChunkIndex, result::AbstractRAGResult;
Simple no-op function for refine!. It simply copies the result.answer and result.conversations[:answer] without any changes.
refiner::SimpleRefiner, index::AbstractDocumentIndex, result::AbstractRAGResult;
verbose::Bool = true,
model::AbstractString = PT.MODEL_CHAT,
template::Symbol = :RAGAnswerRefiner,
cost_tracker = Threads.Atomic{Float64}(0.0),
Gives the model a chance to refine the answer (using the same or different context than previously provided).
This method uses the same context as the original answer; however, it can be modified to do additional retrieval and use a different context.
- Mutated result, with the refined answer in result.final_answer and the full conversation saved in result.conversations[:final_answer]
- refiner::SimpleRefiner: The method to use for refining the answer. Uses aigenerate.
- index::AbstractDocumentIndex: The index containing chunks and sources.
- result::AbstractRAGResult: The result containing the context and question to generate the answer for.
- model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
- verbose::Bool: If true, enables verbose logging.
- template::Symbol: The template to use for the aigenerate function. Defaults to :RAGAnswerRefiner.
- cost_tracker: An atomic counter to track the cost of the operation.
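The cost_tracker default, Threads.Atomic{Float64}(0.0), is a standard Julia atomic value. The sketch below shows only how such a counter can be updated safely from several threads and read back. The per-call costs are made-up numbers, and how the package updates the tracker internally is not shown here.

```julia
# Shared atomic counter, same type as the cost_tracker default.
cost_tracker = Threads.Atomic{Float64}(0.0)

# Assumed per-call costs, purely illustrative.
call_costs = [0.01, 0.02, 0.03]

Threads.@threads for cost in call_costs
    # atomic_add! avoids lost updates when several threads add at once
    Threads.atomic_add!(cost_tracker, cost)
end

total_cost = cost_tracker[]   # read the accumulated value
```

Passing the same tracker to several pipeline steps is what would let their costs accumulate in one place.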
refiner::TavilySearchRefiner, index::AbstractDocumentIndex, result::AbstractRAGResult;
verbose::Bool = true,
model::AbstractString = PT.MODEL_CHAT,
include_answer::Bool = true,
max_results::Integer = 5,
include_domains::AbstractVector{<:AbstractString} = String[],
exclude_domains::AbstractVector{<:AbstractString} = String[],
template::Symbol = :RAGWebSearchRefiner,
cost_tracker = Threads.Atomic{Float64}(0.0),
Refines the answer by executing a web search using the Tavily API. This method aims to enhance the answer's accuracy and relevance by incorporating information retrieved from the web.
Note: The web results and web answer (if requested) will be added to the context and sources!
- Mutated result, with the refined answer in result.final_answer and the full conversation saved in result.conversations[:final_answer]
- In addition, the web results and web answer (if requested) are appended to the result's context and sources for correct highlighting and verification.
- refiner::TavilySearchRefiner: The method to use for refining the answer. Uses aigenerate with a web search template.
- index::AbstractDocumentIndex: The index containing chunks and sources.
- result::AbstractRAGResult: The result containing the context and question to generate the answer for.
- model::AbstractString: The model to use for generating the answer. Defaults to PT.MODEL_CHAT.
- include_answer::Bool: If true, includes the answer from Tavily in the web search.
- max_results::Integer: The maximum number of results to return.
- include_domains::AbstractVector{<:AbstractString}: A list of domains to include in the search results. Default is an empty list.
- exclude_domains::AbstractVector{<:AbstractString}: A list of domains to exclude from the search results. Default is an empty list.
- verbose::Bool: If true, enables verbose logging.
- template::Symbol: The template to use for the aigenerate function. Defaults to :RAGWebSearchRefiner.
- cost_tracker: An atomic counter to track the cost of the operation.
refine!(TavilySearchRefiner(), index, result)
# See result.final_answer or pprint(result)
To enable this refiner in a full RAG pipeline, simply swap the component in the config:
cfg = RT.RAGConfig()
cfg.generator.refiner = RT.TavilySearchRefiner()
result = airag(cfg, index; question, return_all = true)
rephrase(rephraser::SimpleRephraser, question::AbstractString;
verbose::Bool = true,
model::String = PT.MODEL_CHAT, template::Symbol = :RAGQueryHyDE,
cost_tracker = Threads.Atomic{Float64}(0.0))
Rephrases the question using the provided rephraser template (:RAGQueryHyDE).
A special flavor of rephrasing that uses the HyDE (Hypothetical Document Embedding) method, which aims to find the documents most similar to a synthetic passage that would be a good answer to our question.
Returns both the original and the rephrased question.
- rephraser: Type that dictates the logic of the rephrasing step.
- question: The question to be rephrased.
- model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
- template: The rephrasing template to use. Default is :RAGQueryHyDE. Find more with aitemplates("rephrase").
- verbose: A boolean flag indicating whether to print verbose logging. Default is true.
rephrase(rephraser::NoRephraser, question::AbstractString; kwargs...)
No-op, simple passthrough.
rephrase(rephraser::SimpleRephraser, question::AbstractString;
verbose::Bool = true,
model::String = PT.MODEL_CHAT, template::Symbol = :RAGQueryOptimizer,
cost_tracker = Threads.Atomic{Float64}(0.0), kwargs...)
Rephrases the question using the provided rephraser template.
Returns both the original and the rephrased question.
- rephraser: Type that dictates the logic of the rephrasing step.
- question: The question to be rephrased.
- model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
- template: The rephrasing template to use. Default is :RAGQueryOptimizer. Find more with aitemplates("rephrase").
- verbose: A boolean flag indicating whether to print verbose logging. Default is true.
reranker::CohereReranker, index::AbstractDocumentIndex, question::AbstractString,
verbose::Bool = false,
api_key::AbstractString = PT.COHERE_API_KEY,
top_n::Integer = length(candidates.scores),
model::AbstractString = "rerank-english-v3.0",
return_documents::Bool = false,
cost_tracker = Threads.Atomic{Float64}(0.0),
Re-ranks a list of candidate chunks using the Cohere Rerank API. See the Cohere Rerank API documentation for more details.
- reranker: Using the Cohere API.
- index: The index that holds the underlying chunks to be re-ranked.
- question: The query to be used for the search.
- candidates: The candidate chunks to be re-ranked.
- top_n: The number of most relevant documents to return. Default is length(candidates.scores), ie, all candidates.
- model: The model to use for reranking. Default is rerank-english-v3.0.
- return_documents: A boolean flag indicating whether to return the reranked documents in the response. Default is false.
- verbose: A boolean flag indicating whether to print verbose logging. Default is false.
- cost_tracker: An atomic counter to track the cost of the retrieval. Not implemented / tracked (cost unclear). Provided for consistency.
reranker::RankGPTReranker, index::AbstractDocumentIndex, question::AbstractString,
api_key::AbstractString = PT.OPENAI_API_KEY,
model::AbstractString = PT.MODEL_CHAT,
verbose::Bool = false,
top_n::Integer = length(candidates.scores),
unique_chunks::Bool = true,
cost_tracker = Threads.Atomic{Float64}(0.0),
Re-ranks a list of candidate chunks using the RankGPT algorithm. See references [1] and [2] for more details.
It uses LLM calls to rank the candidate chunks.
- reranker: Using the RankGPT algorithm (LLM-based ranking).
- index: The index that holds the underlying chunks to be re-ranked.
- question: The query to be used for the search.
- candidates: The candidate chunks to be re-ranked.
- top_n: The number of most relevant documents to return. Default is length(candidates.scores), ie, all candidates.
- model: The model to use for reranking. Default is PT.MODEL_CHAT.
- verbose: A boolean flag indicating whether to print verbose logging. Default is false.
- unique_chunks: A boolean flag indicating whether to remove duplicates from the candidate chunks prior to reranking (saves compute time). Default is true.
# assumes you have an existing index `index`
question = "What are the best practices for parallel computing in Julia?"
cfg = RAGConfig(; retriever = SimpleRetriever(; reranker = RT.RankGPTReranker()))
msg = airag(cfg, index; question, return_all = true)
To get full verbosity of logs, set verbose = 5 (anything higher than 3).
msg = airag(cfg, index; question, return_all = true, verbose = 5)
[1] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents by W. Sun et al.
[2] RankGPT Github
verbose::Integer = 1,
top_k::Integer = 100,
top_n::Integer = 5,
api_kwargs::NamedTuple = NamedTuple(),
rephraser::AbstractRephraser = retriever.rephraser,
rephraser_kwargs::NamedTuple = NamedTuple(),
embedder::AbstractEmbedder = retriever.embedder,
embedder_kwargs::NamedTuple = NamedTuple(),
processor::AbstractProcessor = retriever.processor,
processor_kwargs::NamedTuple = NamedTuple(),
finder::AbstractSimilarityFinder = retriever.finder,
finder_kwargs::NamedTuple = NamedTuple(),
tagger::AbstractTagger = retriever.tagger,
tagger_kwargs::NamedTuple = NamedTuple(),
filter::AbstractTagFilter = retriever.filter,
filter_kwargs::NamedTuple = NamedTuple(),
reranker::AbstractReranker = retriever.reranker,
reranker_kwargs::NamedTuple = NamedTuple(),
cost_tracker = Threads.Atomic{Float64}(0.0),
Retrieves the most relevant chunks from the index for the given question and returns them in a RAGResult object.
This is the main entry point for the retrieval stage of the RAG pipeline. It is often followed by the generate! step.
- The default flow is
The arguments correspond to the steps of the retrieval process (rephrasing, embedding, finding similar docs, tagging, filtering by tags, reranking). You can customize each step by providing a new custom type that dispatches the corresponding function, eg, create your own type struct MyReranker<:AbstractReranker end and define the custom method for it rerank(::MyReranker,...) = ...
Note: Discover available retrieval sub-types for each step with subtypes(AbstractRephraser) and similar for other abstract types.
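The customization pattern described above is ordinary multiple dispatch. The following self-contained sketch mimics it with stand-in names (AbstractRerankerSketch, rerank_sketch, and both concrete types are hypothetical) so that it runs without the package. In real use you would subtype the package's AbstractReranker and add a method to its rerank function, whose exact argument list is not reproduced here.

```julia
# Stand-in for the package's abstract type (assumption, for illustration only).
abstract type AbstractRerankerSketch end

struct PassthroughRerankerSketch <: AbstractRerankerSketch end
struct KeywordBoostRerankerSketch <: AbstractRerankerSketch end

# Default strategy: keep the incoming order and truncate to top_n.
function rerank_sketch(::PassthroughRerankerSketch, question, chunks;
        top_n::Integer = length(chunks))
    return collect(1:min(top_n, length(chunks)))
end

# Custom strategy: order chunks by how many question words they contain.
function rerank_sketch(::KeywordBoostRerankerSketch, question, chunks;
        top_n::Integer = length(chunks))
    words = split(lowercase(question))
    hits = [count(w -> occursin(w, lowercase(c)), words) for c in chunks]
    order = sortperm(hits; rev = true)
    return order[1:min(top_n, length(order))]
end

chunks = ["Julia is fast", "Parallel computing in Julia uses threads", "Cooking pasta"]
question = "parallel computing julia"
default_order = rerank_sketch(PassthroughRerankerSketch(), question, chunks; top_n = 2)
custom_order = rerank_sketch(KeywordBoostRerankerSketch(), question, chunks; top_n = 2)
```

Swapping the first argument is the only change needed at the call site, which is the same idea as passing a different reranker to retrieve.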
If you're using locally-hosted models, you can pass api_kwargs with the url field set to the model's URL, and make sure to provide the corresponding model kwargs to rephraser, embedder, and tagger to use the custom models (they make AI calls).
- retriever: The retrieval method to use. Default is SimpleRetriever but could be AdvancedRetriever for more advanced retrieval.
- index: The index that holds the chunks and sources to be retrieved from.
- question: The question to be used for the retrieval.
- verbose: If >0, it prints out verbose logging. Default is 1. If you set it to 2, it will print out logs for each sub-function.
- top_k: The TOTAL number of closest chunks to return from find_closest. Default is 100. If there are multiple rephrased questions, the number of chunks per each item will be top_k ÷ number_of_rephrased_questions.
- top_n: The TOTAL number of most relevant chunks to return for the context (from the rerank step). Default is 5.
- api_kwargs: Additional keyword arguments to be passed to the API calls (shared by all ai* calls).
- rephraser: Transform the question into one or more questions. Default is retriever.rephraser.
- rephraser_kwargs: Additional keyword arguments to be passed to the rephraser.
  - model: The model to use for rephrasing. Default is PT.MODEL_CHAT.
  - template: The rephrasing template to use. Default is :RAGQueryOptimizer or :RAGQueryHyDE (depending on the rephraser).
- embedder: The embedding method to use. Default is retriever.embedder.
- embedder_kwargs: Additional keyword arguments to be passed to the embedder.
- processor: The processor method to use when using a keyword-based index. Default is retriever.processor.
- processor_kwargs: Additional keyword arguments to be passed to the processor.
- finder: The similarity search method to use. Default is retriever.finder, often CosineSimilarity.
- finder_kwargs: Additional keyword arguments to be passed to the similarity finder.
- tagger: The tag generating method to use. Default is retriever.tagger.
- tagger_kwargs: Additional keyword arguments to be passed to the tagger. Noteworthy arguments:
  - tags: Directly provide the tags to use for filtering (can be String, Regex, or Vector{String}). Useful for tagger = PassthroughTagger.
- filter: The tag matching method to use. Default is retriever.filter.
- filter_kwargs: Additional keyword arguments to be passed to the tag filter.
- reranker: The reranking method to use. Default is retriever.reranker.
- reranker_kwargs: Additional keyword arguments to be passed to the reranker.
  - model: The model to use for reranking. Default is rerank-english-v3.0 if you use reranker = CohereReranker().
- cost_tracker: An atomic counter to track the cost of the retrieval. Default is Threads.Atomic{Float64}(0.0).
See also: SimpleRetriever, AdvancedRetriever, build_index, rephrase, get_embeddings, get_keywords, find_closest, get_tags, find_tags, rerank, RAGResult
Find the 5 most relevant chunks from the index for the given question.
# assumes you have an existing index `index`
retriever = SimpleRetriever()
result = retrieve(retriever,
index,
"What is the capital of France?",
top_n = 5)
# or use the default retriever (same as above)
result = retrieve(retriever,
index,
"What is the capital of France?",
top_n = 5)
Apply more advanced retrieval with question rephrasing and reranking (requires COHERE_API_KEY). We will obtain the top 100 chunks from embeddings (top_k) and the top 5 chunks from reranking (top_n):
retriever = AdvancedRetriever()
result = retrieve(retriever, index, question; top_k=100, top_n=5)
You can use the retriever to customize your retrieval strategy or directly change the strategy types in the retrieve kwargs.
Example of using a locally-hosted model on localhost:8080:
retriever = SimpleRetriever()
result = retrieve(retriever, index, question;
rephraser_kwargs = (; model = "custom"),
embedder_kwargs = (; model = "custom"),
tagger_kwargs = (; model = "custom"), api_kwargs = (;
url = "http://localhost:8080"))
run_qa_evals(index::AbstractChunkIndex, qa_items::AbstractVector{<:QAEvalItem};
api_kwargs::NamedTuple = NamedTuple(),
airag_kwargs::NamedTuple = NamedTuple(),
qa_evals_kwargs::NamedTuple = NamedTuple(),
verbose::Bool = true, parameters_dict::Dict{Symbol, <:Any} = Dict{Symbol, Any}())
Evaluates a vector of QAEvalItems and returns a vector of QAEvalResult. This function assesses the relevance and accuracy of the answers generated in a QA evaluation context.
See ?run_qa_evals for more details.
- qa_items::AbstractVector{<:QAEvalItem}: The vector of QA evaluation items containing the questions and their answers.
- verbose::Bool: If true, enables verbose logging. Defaults to true.
- api_kwargs::NamedTuple: Parameters that will be forwarded to the API calls. See ?aiextract for details.
- airag_kwargs::NamedTuple: Parameters that will be forwarded to airag calls. See ?airag for details.
- qa_evals_kwargs::NamedTuple: Parameters that will be forwarded to run_qa_evals calls. See ?run_qa_evals for details.
- parameters_dict::Dict{Symbol, Any}: Track any parameters used for later evaluations. Keys must be Symbols.
- Vector{QAEvalResult}: Vector of evaluation results that includes various scores and metadata related to the QA evaluation.
index = "..." # Assuming a proper index is defined
qa_items = [QAEvalItem(question="What is the capital of France?", answer="Paris", context="France is a country in Europe."),
QAEvalItem(question="What is the capital of Germany?", answer="Berlin", context="Germany is a country in Europe.")]
# Let's run a test with `top_k=5`
results = run_qa_evals(index, qa_items; airag_kwargs=(;top_k=5), parameters_dict=Dict(:top_k => 5))
# Filter out the "failed" calls
results = filter(x->!isnothing(x.answer_score), results);
# See average judge score (mean comes from the Statistics standard library)
using Statistics
mean(x->x.answer_score, results)
run_qa_evals(qa_item::QAEvalItem, ctx::RAGResult; verbose::Bool = true,
parameters_dict::Dict{Symbol, <:Any}, judge_template::Symbol = :RAGJudgeAnswerFromContext,
model_judge::AbstractString, api_kwargs::NamedTuple = NamedTuple()) -> QAEvalResult
Evaluates a single QAEvalItem using RAG details (RAGResult) and returns a QAEvalResult structure. This function assesses the relevance and accuracy of the answers generated in a QA evaluation context.
- qa_item::QAEvalItem: The QA evaluation item containing the question and its answer.
- ctx::RAGResult: The RAG result used for generating the QA pair, including the original context and the answers. Comes from airag(...; return_context=true).
- verbose::Bool: If true, enables verbose logging. Defaults to true.
- parameters_dict::Dict{Symbol, Any}: Track any parameters used for later evaluations. Keys must be Symbols.
- judge_template::Symbol: The template symbol for the AI model used to judge the answer. Defaults to :RAGJudgeAnswerFromContext.
- model_judge::AbstractString: The AI model used for judging the answer's quality. Defaults to the standard chat model, but it is advisable to use a more powerful model such as GPT-4.
- api_kwargs::NamedTuple: Parameters that will be forwarded to the API endpoint.
- QAEvalResult: An evaluation result that includes various scores and metadata related to the QA evaluation.
The function computes a retrieval score and rank based on how well the context matches the QA context.
It then uses the judge_template and model_judge to score the answer's accuracy and relevance.
In case of errors during evaluation, the function logs a warning (if verbose is true) and the answer_score will be set to nothing.
Evaluating a QA pair using a specific context and model:
qa_item = QAEvalItem(question="What is the capital of France?", answer="Paris", context="France is a country in Europe.")
ctx = RAGResult(source="Wikipedia", context="France is a country in Europe.", answer="Paris")
parameters_dict = Dict(:param1 => "value1", :param2 => "value2") # keys must be Symbols
eval_result = run_qa_evals(qa_item, ctx, parameters_dict=parameters_dict, model_judge="MyAIJudgeModel")
Returns 1.0 if context overlaps or is contained within any of the candidate_context.
Returns Integer rank of the position where context overlaps or is contained within a candidate_context.
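A minimal sketch of these two retrieval scores, assuming that "overlaps or is contained within" means substring containment in either direction. The function names and the choice to return 0.0 and nothing when there is no match are assumptions made for illustration, not the package's definitions.

```julia
# Assumption: two contexts "overlap" if either one contains the other as a substring.
overlaps_sketch(a::AbstractString, b::AbstractString) = occursin(a, b) || occursin(b, a)

# 1.0 if any candidate overlaps with the reference context, else 0.0.
function retrieval_hit_sketch(context::AbstractString, candidates::AbstractVector{<:AbstractString})
    return any(c -> overlaps_sketch(context, c), candidates) ? 1.0 : 0.0
end

# Position of the first overlapping candidate, or nothing if there is none.
function retrieval_rank_sketch(context::AbstractString, candidates::AbstractVector{<:AbstractString})
    return findfirst(c -> overlaps_sketch(context, c), candidates)
end

candidates = ["Germany is a country in Europe.",
    "France is a country in Europe. Its capital is Paris."]
hit = retrieval_hit_sketch("France is a country in Europe.", candidates)
rank = retrieval_rank_sketch("France is a country in Europe.", candidates)
```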
score_to_unit_scale(x::AbstractVector{T}) where T<:Real
Shift and scale a vector of scores to the unit scale [0, 1].
x = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled_x = score_to_unit_scale(x)
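For intuition, shifting and scaling to [0, 1] is commonly done with min-max normalization. The sketch below is one way to write it. unit_scale_sketch is a hypothetical name, and returning zeros for a constant vector is an assumption of this sketch rather than documented behaviour of score_to_unit_scale.

```julia
# Min-max normalization: subtract the minimum, then divide by the range.
function unit_scale_sketch(x::AbstractVector{T}) where {T <: Real}
    lo, hi = extrema(x)
    # Assumption: a constant vector maps to all zeros to avoid dividing by zero.
    hi == lo && return zeros(Float64, length(x))
    return (x .- lo) ./ (hi - lo)
end

scaled = unit_scale_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
```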
set_node_style!(::TrigramAnnotater, node::AnnotatedNode;
low_threshold::Float64 = 0.0, medium_threshold::Float64 = 0.5, high_threshold::Float64 = 1.0,
default_styler::AbstractAnnotationStyler = Styler(),
low_styler::AbstractAnnotationStyler = Styler(color = :magenta, bold = false),
medium_styler::AbstractAnnotationStyler = Styler(color = :blue, bold = false),
high_styler::AbstractAnnotationStyler = Styler(color = :nothing, bold = false),
bold_multihits::Bool = false)
Sets the style of node based on the provided rules.
setpropertynested(nt::NamedTuple, parent_keys::Vector{Symbol},
key::Symbol, value)
Setter for a property key in a nested NamedTuple nt, where the property is nested to a key in parent_keys.
Useful for nested kwargs where we want to change some property in the parent_keys subset (eg, model in retriever_kwargs).
kw = (; abc = (; def = "x"))
setpropertynested(kw, [:abc], :def, "y")
# Output: (abc = (def = "y",),)
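Because NamedTuples are immutable, a setter like this has to rebuild the structure. The sketch below shows one way to do that recursively. set_nested_sketch is a hypothetical stand-in written for illustration and is not the package's implementation.

```julia
# Rebuild the NamedTuple, setting `key => value` inside every child whose name is in parent_keys.
function set_nested_sketch(nt::NamedTuple, parent_keys::Vector{Symbol}, key::Symbol, value)
    new_values = map(keys(nt), values(nt)) do k, v
        if v isa NamedTuple && k in parent_keys
            # matching parent: add or overwrite the property
            merge(v, NamedTuple{(key,)}((value,)))
        elseif v isa NamedTuple
            # other nested NamedTuple: keep searching deeper
            set_nested_sketch(v, parent_keys, key, value)
        else
            v
        end
    end
    return NamedTuple{keys(nt)}(new_values)
end

kw = (; abc = (; def = "x"))
updated = set_nested_sketch(kw, [:abc], :def, "y")

nested = (; retriever_kwargs = (; embedder_kwargs = (; model = "a")), top_k = 5)
nested_updated = set_nested_sketch(nested, [:embedder_kwargs], :model, "b")
```

The original kw is left untouched, and the function returns a new NamedTuple with the same key order.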
Practical example of changing all model keys in CHAT-based steps in the pipeline:
# changes :model to "gpt4t" whenever the parent key is in the below list (chat-based steps)
kwargs = setpropertynested(
kwargs,
[:rephraser_kwargs, :tagger_kwargs, :answerer_kwargs, :refiner_kwargs],
:model, "gpt4t")
Or changing an embedding model (across both indexer and retriever steps, because it's the same step name):
kwargs = setpropertynested(
kwargs, [:embedder_kwargs],
:model, "text-embedding-3-large"
)
split_into_code_and_sentences(input::Union{String, SubString{String}})
Splits a text block into code or text and sub-splits it into units.
If it is a code block, it splits by newline but keeps the group_id the same (to have the same source). If it is a text block, it splits into sentences, bullets, etc., and provides a different group_id (to have a different source).
Extracts the Tag item into a string of the form category:::value (lowercased and spaces replaced with underscores).
msg = aiextract(:RAGExtractMetadataShort; return_type=MaybeTags, text="I like package DataFrames", instructions="None.")
metadata = tags_extract(msg.content.items)
prev_token::Union{Nothing, AbstractString}, curr_token::AbstractString,
next_token::Union{Nothing, AbstractString})
Joins the three tokens together. Useful to add boundary tokens (like spaces vs brackets) to the curr_token to improve the matched context (ie, separate partial matches from exact matches).
tokenize(input::Union{String, SubString{String}})
Tokenizes the provided input by spaces, special characters or Julia symbols (eg, =>).
Unlike other tokenizers, it aims to be lossless, ie, it keeps both the separated text and the separators.
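The lossless property can be illustrated with a small regex-based sketch: every character falls into exactly one of three classes (word characters, whitespace, anything else), so joining the tokens reproduces the input. tokenize_sketch and its regex are assumptions made for illustration. The package's tokenize may split differently.

```julia
# Split into runs of word characters, whitespace, or other symbols, keeping every character.
function tokenize_sketch(input::AbstractString)
    return [m.match for m in eachmatch(r"\w+|\s+|[^\w\s]+", input)]
end

input = "x => y"
tokens = tokenize_sketch(input)
roundtrip = join(tokens)   # separators are kept, so this rebuilds the input
```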
translate_positions_to_parent(index::AbstractChunkIndex, positions::AbstractVector{<:Integer})
Translate positions to the parent index. Useful to convert between positions in a view and the original index.
Used whenever a chunkdata() is used to re-align positions in case the index is a view.
index::SubChunkIndex, pos::AbstractVector{<:Integer})
Translate positions to the parent index. Useful to convert between positions in a view and the original index.
Used whenever a chunkdata() or tags() is used to re-align positions to the "parent" index.
context_trigrams::AbstractVector, trigram_func::F1 = trigrams, token_transform::F2 = identity;
skip_trigrams::Bool = false, min_score::Float64 = 0.5,
min_source_score::Float64 = 0.25,
stop_words::AbstractVector{<:String} = STOPWORDS,
styler_kwargs...) where {F1 <: Function, F2 <: Function}
Find if the parent_node.content is supported by the provided context_trigrams.
- Split the parent_node.content into tokens
- Create an AnnotatedNode for each token
- If skip_trigrams is enabled, it looks for an exact match in the context_trigrams
- If no exact match is found, it counts the trigram-based match (including the surrounding tokens for better contextual awareness) as a score
- Then it sets the style of the node based on the score
- Lastly, it aligns the styles of neighboring nodes (eg, single character tokens)
- Then, it rolls up the scores and sources to the parent node
For diagnostics, you can use AbstractTrees.print_tree(parent_node) to see the tree structure of each token and its score.
node = AnnotatedNode(content = "xyz")
trigram_support!(node, context_trigrams) # updates node.children!
<div style='border-width:1px; border-style:solid; border-color:black; padding: 1em; border-radius: 25px;'>
trigrams(input_string::AbstractString; add_word::AbstractString = "")
Splits the provided input_string into a vector of trigrams (combinations of three consecutive characters found in the input_string).
If add_word is provided, it is added to the resulting array. Useful to add the full word itself to the resulting array for exact match.
trigrams_hashed(input_string::AbstractString; add_word::AbstractString = "")
Splits the provided input_string into a Set of hashed trigrams (combinations of three consecutive characters found in the input_string).
It is more efficient for lookups in large strings (eg, >100K characters).
If add_word is provided, it is added to the resulting array to hash. Useful to add the full word itself to the resulting array for exact match.
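As an illustration of the two descriptions above, the following sketch builds character trigrams and their hashed Set counterpart. Both functions are hypothetical stand-ins with a `_sketch` suffix. Details such as how the package handles very short strings are assumptions.

```julia
# Character trigrams: every run of three consecutive characters.
function trigrams_sketch(input_string::AbstractString; add_word::AbstractString = "")
    chars = collect(input_string)   # collect handles multi-byte characters safely
    out = String[]
    for i in 1:(length(chars) - 2)
        push!(out, String(chars[i:(i + 2)]))
    end
    # optionally append the full word to allow exact-match lookups
    isempty(add_word) || push!(out, String(add_word))
    return out
end

# Hashed variant: a Set of hashes gives constant-time membership checks.
function trigrams_hashed_sketch(input_string::AbstractString; add_word::AbstractString = "")
    return Set(hash.(trigrams_sketch(input_string; add_word)))
end

tri = trigrams_sketch("hello")
tri_word = trigrams_sketch("hello"; add_word = "hello")
hashed = trigrams_hashed_sketch("hello")
```

A lookup such as `hash("ell") in hashed` then checks membership without comparing strings one by one.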
Extracts the last message from the RAGResult. It looks for final_answer first, then answer fields in the conversations dictionary. Returns nothing if not found.
Extracts the last output (generated text answer) from the RAGResult.
io::IO, node::AbstractAnnotatedNode;
text_width::Int = displaysize(io)[2], add_newline::Bool = true)
Pretty print the node to the io stream, including all its children.
Supports only
for now.
io::IO, r::AbstractRAGResult; add_context::Bool = false,
text_width::Int = displaysize(io)[2], annotater_kwargs...)
Pretty print the RAG result r to the given io stream.
If add_context is true, the context will be printed as well. The text_width parameter can be used to control the width of the output.
You can provide additional keyword arguments to the annotater, eg, add_sources, add_scores, min_score, etc. See annotate_support for more details.