API Reference
API reference for FlashRank.
FlashRank.BertTextEncoder
FlashRank.EmbedResult
FlashRank.EmbedderModel
FlashRank.RankerModel
FlashRank.WordPiece
FlashRank.WordPiece
FlashRank.bert_cased_tokenizer
FlashRank.bert_uncased_tokenizer
FlashRank.embed
FlashRank.embed
FlashRank.encode
FlashRank.rank
FlashRank.tokenize
FlashRank.BertTextEncoder — Type
BertTextEncoder
The text encoder for the BERT model (WordPiece tokenization).
Fields
- wp::WordPiece: The WordPiece tokenizer.
- vocab::Dict{String, Int}: The vocabulary, with 0-based indexing of tokens to match the Python implementation.
- startsym::String: The start symbol.
- endsym::String: The end symbol.
- padsym::String: The pad symbol.
- trunc::Union{Nothing, Int}: The truncation length. Defaults to 512 tokens.
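For illustration, a minimal sketch of inspecting the encoder that a model carries; reading it from the model's encoder field is an assumption based on the embedder.encoder.trunc reference in the embed docstring below:
using FlashRank
embedder = EmbedderModel(:tiny_embed)
enc = embedder.encoder                 # assumed field name, holds the BertTextEncoder
enc.trunc                              # truncation length, 512 by default
enc.startsym, enc.endsym, enc.padsym   # start, end, and pad symbols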
FlashRank.EmbedResult — Type
EmbedResult{T <: Real}
The result of embedding passages.
Fields
- embeddings::AbstractArray{T}: The embeddings of the passages, stored as a column-major matrix with one passage embedding per column, i.e., of size (embedding_dimension, batch_size) (see the embed example below, where two passages yield a 312x2 matrix).
- elapsed::Float64: The time taken to embed the passages.
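For illustration, a short sketch of consuming an EmbedResult (both fields are documented above; the model alias is taken from the embed example further below):
result = embed(EmbedderModel(:tiny_embed), ["a short passage", "another passage"])
result.embeddings  # one column per passage
result.elapsed     # seconds spent on embedding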
FlashRank.EmbedderModel — Type
EmbedderModel
A model for embedding passages, including the encoder and the ONNX session for inference.
For embedding, use embed(embedder, passages) or call the model as a functor: embedder(passages).
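A minimal usage sketch of the two calling styles described above (the :tiny_embed alias is taken from the embed example further below):
embedder = EmbedderModel(:tiny_embed)
res1 = embed(embedder, ["first passage", "second passage"])  # explicit call
res2 = embedder(["first passage", "second passage"])         # functor form, equivalent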
FlashRank.RankerModel — Type
RankerModel
A model for ranking passages, including the encoder and the ONNX session for inference.
For ranking, use rank(ranker, query, passages) or call the model as a functor: ranker(query, passages).
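A minimal usage sketch of the two calling styles described above; the :tiny model alias is an assumption (see the package README for the available aliases):
ranker = RankerModel(:tiny)  # model alias assumed for illustration
query = "What is the capital of France?"
passages = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
res1 = rank(ranker, query, passages)  # explicit call
res2 = ranker(query, passages)        # functor form, equivalent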
FlashRank.WordPiece — Type
WordPiece
WordPiece is a tokenizer that splits a string into a sequence of KNOWN sub-word tokens (or token IDs). It uses a double array trie to store the vocabulary and the index of the vocabulary.
Implementation is based on: https://github.com/chengchingwen/Transformers.jl
Fields
- trie::DoubleArrayTrie: The double array trie of the vocabulary (for fast lookups of tokens).
- index::Vector{Int}: The index of the vocabulary. It is 0-based because we provide token IDs to models trained in Python.
- unki::Int: The index of the unknown token in the TRIE (i.e., this is not the token ID, but the trie index).
- max_char::Int: The maximum number of characters in a token. Default is 200.
- subword_prefix::String: The prefix of a sub-word token. Default is "##".
FlashRank.WordPiece — Method
(wp::WordPiece; token_ids::Bool = false)(x)
WordPiece functor that tokenizes a string into a sequence of tokens (or token IDs).
Arguments
- token_ids::Bool = false: If true, return the token IDs directly. Otherwise, return the tokens.
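A hedged sketch of calling the functor; obtaining a WordPiece instance from a model's encoder (the wp field of BertTextEncoder above) is an assumption, and the sample outputs in the comments are indicative only:
wp = EmbedderModel(:tiny_embed).encoder.wp  # assumed access path
tokens = wp("flashrank")                    # sub-word tokens, e.g. ["flash", "##rank"]
ids = wp("flashrank"; token_ids = true)     # 0-based token IDs instead of tokens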
FlashRank.bert_cased_tokenizer — Method
bert_cased_tokenizer(input)
Google BERT tokenizer that preserves case during tokenization. Recommended for multi-lingual data.
FlashRank.bert_uncased_tokenizer — Method
bert_uncased_tokenizer(input)
Google BERT tokenizer that lowercases the input before tokenization.
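For illustration, the two pre-tokenizers applied to the same input (the tokens in the comments are indicative only):
bert_cased_tokenizer("Hello, World!")    # keeps case, e.g. ["Hello", ",", "World", "!"]
bert_uncased_tokenizer("Hello, World!")  # lowercases first, e.g. ["hello", ",", "world", "!"]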
FlashRank.embed — Method
embed(embedder::EmbedderModel, passage::AbstractString; split_instead_trunc::Bool = false)
Embeds a single passage.
If the passage is too long for the model AND split_instead_trunc is true, the passage is split into several smaller chunks of size embedder.encoder.trunc and embedded separately.
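A minimal sketch of the single-passage method, with and without splitting (the :tiny_embed alias is taken from the example below):
embedder = EmbedderModel(:tiny_embed)
res_short = embed(embedder, "A short passage.")
long_text = repeat("A rather long passage. ", 500)
res_split = embed(embedder, long_text; split_instead_trunc = true)  # split into chunks of embedder.encoder.trunc tokens, embedded separately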
FlashRank.embed — Method
embed(embedder::EmbedderModel, passages::AbstractVector{<:AbstractString})
Embeds passages using the given embedder model.
Arguments
- embedder::EmbedderModel: The embedder model to use.
- passages::AbstractVector{<:AbstractString}: The passages to embed.
Returns
- EmbedResult: The embeddings of the passages, with property embeddings as a column-major matrix with one passage embedding per column, i.e., of size (embedding_dimension, batch_size).
Example
model = EmbedderModel(:tiny_embed)
result = embed(model, ["Hello, how are you?", "How is it going?"])
result.embeddings # 312x2 matrix of Float32
FlashRank.encode — Method
encode(enc::BertTextEncoder, text::String; add_special_tokens::Bool = true,
    max_tokens::Int = enc.trunc, split_instead_trunc::Bool = false)
Encodes the text and returns the token IDs, token type IDs, and attention mask.
We enforce max_tokens to be a concrete number here so that split_instead_trunc can be applied; split_instead_trunc splits any long sequences into several smaller ones instead of truncating them.
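A hedged sketch of calling encode; reaching the encoder through the model's encoder field is an assumption (as in embedder.encoder.trunc above), and the three outputs are assumed to be returned in the documented order:
enc = EmbedderModel(:tiny_embed).encoder  # assumed access path
token_ids, token_type_ids, attention_mask = encode(enc, "How are you?")
enc_long = encode(enc, repeat("long text ", 1000); split_instead_trunc = true)  # split rather than truncate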
FlashRank.rank — Method
rank(ranker::RankerModel, query::AbstractString, passages::AbstractVector{<:AbstractString};
    top_n = length(passages))
Ranks passages for a given query using the given ranker model. The ranking reflects how suitable each passage is for answering the query (a higher score is better).
Arguments
- ranker::RankerModel: The ranker model to use.
- query::AbstractString: The query to rank passages for.
- passages::AbstractVector{<:AbstractString}: The passages to rank.
- top_n: The number of most relevant documents to return. Default is length(passages).
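A minimal usage sketch; the :tiny model alias is an assumption, and since the return type is not documented here only the call itself is shown:
ranker = RankerModel(:tiny)  # model alias assumed for illustration
query = "How does FlashRank rank passages?"
passages = ["FlashRank ranks passages with an ONNX model.", "Bananas are yellow.", "The most relevant passages get the highest scores."]
result = rank(ranker, query, passages; top_n = 2)  # keep only the 2 most relevant passages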
FlashRank.tokenize — Method
tokenize(enc::BertTextEncoder, text::AbstractString;
    add_special_tokens::Bool = true, add_end_token::Bool = true, token_ids::Bool = false,
    max_tokens::Union{Nothing, Int} = enc.trunc)
Tokenizes the text and returns the tokens or token IDs (to skip looking up the IDs twice).
Arguments
- add_special_tokens::Bool = true: Add special tokens at the beginning and end of the text.
- add_end_token::Bool = true: Add the end token at the end of the text.
- token_ids::Bool = false: If true, return the token IDs directly. Otherwise, return the tokens.
- max_tokens::Union{Nothing, Int} = enc.trunc: The maximum number of tokens to return (usually defined by the model).
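A hedged sketch of tokenize with the documented keyword arguments; obtaining the encoder from a model is an assumption:
enc = EmbedderModel(:tiny_embed).encoder                      # assumed access path
tokens = tokenize(enc, "Hello, how are you?")                 # sub-word tokens incl. special tokens
ids = tokenize(enc, "Hello, how are you?"; token_ids = true)  # 0-based token IDs
bare = tokenize(enc, "Hello!"; add_special_tokens = false, add_end_token = false)  # no start/end symbols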