# How It Works
This is an advanced section that explains how PromptingTools.jl works under the hood. It is not necessary to understand this to use the package, but it can be helpful for debugging and for understanding the package's limitations.
We'll start with the key concepts and then walk through an example of `aigenerate` to see how it all fits together.
## Key Concepts
There are 5 key concepts (objects) to know:

- API/Model Providers -> the method that gives you access to Large Language Models (LLMs); it can be an API (eg, OpenAI) or a locally-hosted application (eg, Llama.cpp or Ollama)
- Schemas -> objects of type `AbstractPromptSchema` that determine which methods are called and, hence, which providers/APIs are used
- Prompts -> the information you want to convey to the AI model
- Messages -> the basic unit of communication between the user and the AI model (eg, `UserMessage` vs `AIMessage`)
- Prompt Templates -> re-usable "prompts" with placeholders that you can replace with your inputs at the time of making the request

When you call `aigenerate`, roughly the following happens: `render` -> `UserMessage`(s) -> `render` -> `OpenAI.create_chat` -> ... -> `AIMessage`.
### API/Model Providers
You can think of "API/Model Providers" as the method that gives you access to Large Language Models (LLMs). It can be an API (eg, OpenAI) or a locally-hosted application (eg, Llama.cpp or Ollama).

You interact with them via the `schema` object, which is a subtype of `AbstractPromptSchema`, eg, there is an `OpenAISchema` for the provider "OpenAI", and its supertype `AbstractOpenAISchema` covers all other providers that mimic the OpenAI API.
### Schemas
For your "message" to reach an AI model, it needs to be formatted and sent to the right place (-> provider!).
We leverage the multiple dispatch around the "schemas" to pick the right logic. All schemas are subtypes of AbstractPromptSchema
and there are many subtypes, eg, OpenAISchema <: AbstractOpenAISchema <:AbstractPromptSchema
.
For example, if you provide schema = OpenAISchema()
, the system knows that:
it will have to format any user inputs to OpenAI's "message specification" (a vector of dictionaries, see their API documentation). Function
render(OpenAISchema(),...)
will take care of the rendering.it will have to send the message to OpenAI's API. We will use the amazing
OpenAI.jl
package to handle the communication.
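To make the dispatch concrete, here is a minimal sketch (the `"gpt3t"` model alias is the one used later in this document; any OpenAI chat model works):

```julia
using PromptingTools
const PT = PromptingTools

# Verify the schema hierarchy that the dispatch relies on
PT.OpenAISchema <: PT.AbstractOpenAISchema      # true
PT.AbstractOpenAISchema <: PT.AbstractPromptSchema  # true

# Passing the schema explicitly; normally it is inferred from the model name
schema = PT.OpenAISchema()
msg = aigenerate(schema, "What is the capital of France?"; model="gpt3t")
```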
### Prompts
A prompt is, loosely, the information you want to convey to the AI model. It can be a question, a statement, or a command, and it can include instructions or some context, eg, a previous conversation.

You need to remember that Large Language Models (LLMs) are stateless. They don't remember the previous conversation/request, so you need to provide the whole history/context every time (similar to how REST APIs work).

Prompts that we send to the LLMs are effectively a sequence of messages (`<:AbstractMessage`).
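For example, to continue a conversation you re-send the full history yourself. A minimal sketch (the follow-up question is illustrative):

```julia
using PromptingTools
const PT = PromptingTools

# `return_all=true` returns the whole conversation (a vector of messages)
conversation = aigenerate("What is the capital of France?"; return_all=true)

# LLMs are stateless: to ask a follow-up, append it and re-send EVERYTHING
push!(conversation, PT.UserMessage("And how many people live there?"))
msg = aigenerate(conversation)
```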
### Messages
Messages are the basic unit of communication between the user and the AI model.
There are 5 main types of messages (`<:AbstractMessage`):

- `SystemMessage` - this contains information about the "system", eg, how it should behave, format its output, etc. (eg, "You're a world-class Julia programmer. You write brief and concise code.")
- `UserMessage` - the information "from the user", ie, your question/statement/task
- `UserMessageWithImages` - the same as `UserMessage`, but with images (URLs or Base64-encoded images)
- `AIMessage` - the response from the AI model, when the "output" is text
- `DataMessage` - the response from the AI model, when the "output" is data, eg, embeddings with `aiembed` or user-defined structs with `aiextract`
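A quick sketch of how these types show up in practice (the conversation content is illustrative):

```julia
using PromptingTools
const PT = PromptingTools

# A hand-built conversation: SystemMessage sets behavior, UserMessage asks
conversation = [
    PT.SystemMessage("You're a world-class Julia programmer. You write brief and concise code."),
    PT.UserMessage("Write a one-liner to sum a vector."),
]

msg = aigenerate(conversation)
msg isa PT.AIMessage           # true -> the model's text response
msg.content isa AbstractString # true
```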
### Prompt Templates
We want to have re-usable "prompts", so we provide you with a system to retrieve pre-defined prompts with placeholders (eg, `{{name}}`) that you can replace with your inputs at the time of making the request.

"AI Templates", as we call them (`AITemplate`), are usually a vector of `SystemMessage` and a `UserMessage` with a specific purpose/task.
For example, the template `:AssistantAsk` is defined loosely as:

```julia
template = [SystemMessage("You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer."),
            UserMessage("# Question\n\n{{ask}}")]
```
Notice that we have a placeholder `ask` (`{{ask}}`) that you can replace with your question without having to re-write the generic system instructions.

When you provide a Symbol (eg, `:AssistantAsk`) to ai* functions, thanks to multiple dispatch, it recognizes that it's an `AITemplate(:AssistantAsk)` and looks it up.
You can discover all available templates with `aitemplates("some keyword")` or see the details of a specific template with `aitemplates(:AssistantAsk)`.
Note: There is a newer way to create and register templates in one go with `create_template(;user=<user prompt>, system=<system prompt>, load_as=<template name>)` (it skips the serialization step where a template previously had to be saved somewhere on disk). See the FAQ for more details or directly `?create_template`.
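A short sketch of that one-go approach (the template name `:MyAssistantAsk` and the prompt texts are illustrative):

```julia
using PromptingTools

# Create & register a template in one go (no serialization to disk needed)
create_template(;
    system = "You are a helpful assistant. Answer briefly.",
    user = "# Question\n\n{{ask}}",
    load_as = :MyAssistantAsk)

# It is now discoverable and usable like any pre-defined template
aitemplates(:MyAssistantAsk)
msg = aigenerate(:MyAssistantAsk; ask = "What is the capital of France?")
```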
## ai* Functions Overview
The above steps are implemented in the ai* functions, eg, `aigenerate`, `aiembed`, `aiextract`, etc. They all have the same basic structure:

`ai*(<optional schema>,<prompt or conversation>; <optional keyword arguments>)`

but they differ in purpose:
- `aigenerate` is the general-purpose function to generate any text response with LLMs, ie, it returns an `AIMessage` with the field `:content` containing the generated text (eg, `ans.content isa AbstractString`)
- `aiembed` is designed to extract embeddings from the AI model's response, ie, it returns a `DataMessage` with the field `:content` containing the embeddings (eg, `ans.content isa AbstractArray`)
- `aiextract` is designed to extract structured data from the AI model's response and return it as a Julia struct (eg, if we provide `return_type=Food`, we get `ans.content isa Food`). You need to define the return type first and then provide it as a keyword argument.
- `aitools` is designed for agentic workflows with a mix of tool calls and user inputs. It can work with simple functions and execute them.
- `aiclassify` is designed to classify the input text into (or simply respond within) a set of discrete `choices` provided by the user. It can be very useful as an LLM judge or a router for RAG systems, as it uses the "logit bias trick" and generates exactly 1 token. It returns an `AIMessage` with the field `:content`, but the `:content` can be only one of the provided `choices` (eg, `ans.content in choices`; see the sketch after this list)
- `aiscan` is for working with images and vision-enabled models (as an input), but it returns an `AIMessage` with the field `:content` containing the generated text (eg, `ans.content isa AbstractString`), similar to `aigenerate`.
- `aiimage` is for generating images (eg, with OpenAI DALL-E 3). It returns a `DataMessage`, where the field `:content` might contain either the URL to download the image from or the Base64-encoded image, depending on the user-provided kwarg `api_kwargs.response_format`.
- `aitemplates` is a helper function to discover available templates and see their details (eg, `aitemplates("some keyword")` or `aitemplates(:AssistantAsk)`)
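To make these return contracts concrete, here is a small hedged sketch of two of them (the inputs and choices are illustrative):

```julia
using PromptingTools

# aigenerate -> AIMessage with text content
msg = aigenerate("What is the capital of France?")
msg.content isa AbstractString  # true

# aiclassify -> content restricted to the user-provided choices
choices = ["yes", "no", "unknown"]
cls = aiclassify("Is Paris the capital of France?"; choices)
cls.content in choices  # true
```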
If you're using a known `model`, you do NOT need to provide a `schema` (the first argument).
Optional keyword arguments in ai* tend to be:

- `model::String` - which model you want to use
- `verbose::Bool` - whether you want to see INFO logs around AI costs
- `return_all::Bool` - whether you want the WHOLE conversation or just the AI answer (ie, whether you want to include your inputs/prompt in the output)
- `api_kwargs::NamedTuple` - specific parameters for the model, eg, `temperature=0.0` to be NOT creative (and have more similar output in each run)
- `http_kwargs::NamedTuple` - parameters for the HTTP.jl package, eg, `readtimeout = 120` to time out in 120 seconds if no response was received
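Putting these together, a typical call might look like this (the values are illustrative; `"gpt3t"` is the alias used elsewhere in this document):

```julia
msg = aigenerate("What is the capital of France?";
    model = "gpt3t",                     # which model/alias to use
    verbose = true,                      # print INFO logs incl. cost
    return_all = false,                  # just the AIMessage, not the whole conversation
    api_kwargs = (; temperature = 0.0),  # model params: deterministic-ish output
    http_kwargs = (; readtimeout = 120)) # HTTP.jl params: give up after 120s
```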
In addition to the above list of ai* functions, you can also use the "lazy" counterparts of these functions from the experimental AgentTools module:

```julia
using PromptingTools.Experimental.AgentTools
```
For example, `AIGenerate()` will create a lazy instance of `aigenerate`. It is an instance of `AICall` with `aigenerate` as its ai function. It uses exactly the same arguments and keyword arguments as `aigenerate` (see `?aigenerate` for details).

"Lazy" refers to the fact that it does NOT generate any output when instantiated (only when `run!` is called).
Or said differently, the `AICall` struct and all its flavors (`AIGenerate`, ...) are designed to facilitate a deferred execution model (lazy evaluation) for AI functions that interact with a Large Language Model (LLM). It stores the necessary information for an AI call and executes the underlying AI function only when supplied with a `UserMessage` or when the `run!` method is applied.
This approach allows us to remember user inputs and trigger the LLM call repeatedly if needed, which enables automatic fixing (see `?airetry!`).
Example:

```julia
result = AIGenerate(:JuliaExpertAsk; ask="xyz", model="abc", api_kwargs=(; temperature=0.1))
result |> run!

# Is equivalent to
result = aigenerate(:JuliaExpertAsk; ask="xyz", model="abc", api_kwargs=(; temperature=0.1), return_all=true)
# The only difference is that we default to `return_all=true` with lazy types,
# because we have a dedicated `conversation` field, which makes it much easier
```
Lazy AI calls and self-healing mechanisms unlock much more robust and useful LLM workflows!
## Walkthrough Example for aigenerate
```julia
using PromptingTools
const PT = PromptingTools

# Let's say this is our ask
msg = aigenerate(:AssistantAsk; ask="What is the capital of France?")

# It is effectively the same as:
msg = aigenerate(PT.OpenAISchema(), PT.AITemplate(:AssistantAsk); ask="What is the capital of France?", model="gpt3t")
```
There is no `model` provided, so we use the default `PT.MODEL_CHAT` (effectively GPT3.5-Turbo). Then we look it up in `PT.MODEL_REGISTRY` and use the associated schema for it (`OpenAISchema` in this case).
The next step is to render the template: we replace the placeholders and format the messages for the OpenAI model.
```julia
# Let's remember our schema
schema = PT.OpenAISchema()
ask = "What is the capital of France?"
```
First, we obtain the template (no placeholder replacement yet) and "expand" it:

```julia
template_rendered = PT.render(schema, AITemplate(:AssistantAsk); ask)
```

```plaintext
2-element Vector{PromptingTools.AbstractChatMessage}:
 PromptingTools.SystemMessage("You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer.")
 PromptingTools.UserMessage{String}("# Question\n\n{{ask}}", [:ask], :usermessage)
```
Second, we replace the placeholders:

```julia
rendered_for_api = PT.render(schema, template_rendered; ask)
```

```plaintext
2-element Vector{Dict{String, Any}}:
 Dict("role" => "system", "content" => "You are a world-class AI assistant. Your communication is brief and concise. You're precise and answer only when you're confident in the high quality of your answer.")
 Dict("role" => "user", "content" => "# Question\n\nWhat is the capital of France?")
```
Notice that the placeholders are only replaced in the second step. The final output here is a vector of messages with "role" and "content" keys, which is the format required by the OpenAI API.
As a side note, under the hood, the second step is done in two sub-steps:

1. Replace the placeholders: `messages_rendered = PT.render(PT.NoSchema(), template_rendered; ask)` -> returns a vector of Messages!
2. Convert the messages to the format required by the provider/schema: `PT.render(schema, messages_rendered)` -> returns the OpenAI-formatted messages

A runnable sketch of both sub-steps follows below.
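Assuming the same `schema`, `template_rendered`, and `ask` as above, the two sub-steps can be run explicitly (the equality check at the end is an assumption that both paths produce identical output):

```julia
# Sub-step 1: replace placeholders; the result is still a vector of Messages
messages_rendered = PT.render(PT.NoSchema(), template_rendered; ask)

# Sub-step 2: convert Messages into the provider-specific format (OpenAI dicts)
rendered_again = PT.render(schema, messages_rendered)
rendered_again == rendered_for_api  # expected: true (same as the one-step render)
```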
Next, we send the above `rendered_for_api` to the OpenAI API and get the response back:

```julia
using OpenAI
OpenAI.create_chat(api_key, model, rendered_for_api)
```
The last step is to take the JSON response from the API and convert it to the `AIMessage` object:

```julia
# simplification for educational purposes
msg = AIMessage(; content = r.response[:choices][1][:message][:content])
```
In practice, there are more fields we extract, so we define a utility for it: `PT.response_to_message`. In particular, with the parameter `n` you can request multiple AI responses at once, so we want to re-use our response processing logic.
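To illustrate why re-usable processing matters, here is a toy sketch (NOT the actual `PT.response_to_message` implementation; `response` is a hypothetical parsed API reply) of converting multiple returned choices into messages:

```julia
# Toy sketch: with api_kwargs=(; n=2), the API returns several choices;
# each one is converted into its own AIMessage
response = Dict(:choices => [
    Dict(:message => Dict(:content => "Paris.")),
    Dict(:message => Dict(:content => "The capital of France is Paris.")),
])
msgs = [AIMessage(; content = c[:message][:content]) for c in response[:choices]]
```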
That's it! I hope you've learned something new about how PromptingTools.jl works under the hood.
## Walkthrough Example for aiextract
Whereas `aigenerate` is a general-purpose function to generate any text response with LLMs, `aiextract` is designed to extract structured data from the AI model's response and return it as a Julia struct.

It's a bit more complicated than `aigenerate` because it needs to handle the JSON schema of the return type (= our struct).

Let's define a toy example of a struct and see how `aiextract` works under the hood.
```julia
using PromptingTools
const PT = PromptingTools

"""
Extract the name of the food from the sentence. Extract any provided adjectives for the food as well.

Example: "I am eating a crunchy bread." -> Food("bread", ["crunchy"])
"""
struct Food
    name::String # required field!
    adjectives::Union{Nothing,Vector{String}} # not required because `Nothing` is allowed
end

msg = aiextract("I just ate a delicious and juicy apple."; return_type=Food)
msg.content
# Food("apple", ["delicious", "juicy"])
```
You can see that we sent a prompt to the AI model and it returned a `Food` object. We provided some light guidance as a docstring of the return type, but the AI model did the heavy lifting.

`aiextract` leverages native "function calling" (supported by OpenAI, Fireworks, Together, and many others).
We encode the user-provided `return_type` into the corresponding JSON schema and create the payload as per the specifications of the provider.

Let's see how that's done:
```julia
sig = PT.function_call_signature(Food)
## Dict{String, Any} with 3 entries:
##   "name"        => "Food_extractor"
##   "parameters"  => Dict{String, Any}("properties"=>Dict{String, Any}("name"=>Dict("type"=>"string"), "adjectives"=>Dict{String, …
##   "description" => "Extract the food from the sentence. Extract any provided adjectives for the food as well.\n\nExample: "
```
You can see that we capture the field names and types in the `parameters` key and the docstring in the `description` key.

Furthermore, if we zoom in on the "parameters" field, you can see that we encode not only the names and types but also whether the fields are required (ie, whether they allow `Nothing`). You can see below that the field `adjectives` accepts `Nothing`, so it's not required. Only the `name` field is required.
sig["parameters"]
## Dict{String, Any} with 3 entries:
## "properties" => Dict{String, Any}("name"=>Dict("type"=>"string"), "adjectives"=>Dict{String, Any}("items"=>Dict("type"=>"strin…
## "required" => ["name"]
## "type" => "object"
For `aiextract`, the signature is provided to the API provider via the `tools` parameter, eg,

```julia
api_kwargs = (; tools = [Dict(:type => "function", :function => sig)])
```

Optionally, we can also provide the `tool_choice` parameter to specify which tool to use if we provided multiple (differs across providers).
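For example, with an OpenAI-style API, forcing our single `Food` tool could look like the sketch below (the exact payload shape is an assumption based on OpenAI's tool-calling format and differs across providers):

```julia
# Hedged sketch: OpenAI-style payload that forces the model to call our tool
api_kwargs = (;
    tools = [Dict(:type => "function", :function => sig)],
    tool_choice = Dict(:type => "function",
                       :function => Dict(:name => sig["name"])))
```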
When the message is returned, we extract the JSON object in the response and decode it into a Julia object via `JSON3.read(obj, Food)`. For example:

```julia
using JSON3

# a simplified mock of the provider's response
model_response = Dict(:tool_calls => [Dict(:function => Dict(:arguments => JSON3.write(Dict("name" => "apple", "adjectives" => ["delicious", "juicy"]))))])
food = JSON3.read(model_response[:tool_calls][1][:function][:arguments], Food)
# Output: Food("apple", ["delicious", "juicy"])
```
This is why you can sometimes have errors when you use abstract types in your `return_type` -> to enable that, you would need to set the right `StructTypes` behavior for your abstract type (see the JSON3.jl documentation for more details on how to do that).

It works quite well for concrete types and "vanilla" structs, though.
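For the curious, here is a minimal sketch of the `StructTypes` machinery for abstract types (the `AbstractMeal`/`Breakfast` types are hypothetical; see the JSON3.jl docs for the authoritative version):

```julia
using JSON3, StructTypes

# Hypothetical types for illustration
abstract type AbstractMeal end
struct Breakfast <: AbstractMeal
    food::String
end

# Tell StructTypes how to resolve the concrete subtype when decoding:
# by default, JSON3 looks at the `:type` field in the JSON payload
StructTypes.StructType(::Type{AbstractMeal}) = StructTypes.AbstractType()
StructTypes.StructType(::Type{Breakfast}) = StructTypes.Struct()
StructTypes.subtypes(::Type{AbstractMeal}) = (breakfast = Breakfast,)

JSON3.read("""{"type": "breakfast", "food": "oats"}""", AbstractMeal)
# Breakfast("oats")
```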
Unfortunately, function calling is generally NOT supported by locally-hosted / open-source models, so let's try to build a workaround with `aigenerate`.

You need to pick a bigger / more powerful model, as it's NOT an easy task to output a correct JSON specification. My laptop isn't too powerful and I don't like waiting, so I'm going to use the Mixtral model hosted on Together.ai (you get $25 credit when you join)!
model = "tmixtral" # tmixtral is an alias for "mistralai/Mixtral-8x7B-Instruct-v0.1" on Together.ai and it automatically sets `schema = TogetherOpenAISchema()`
We'll add the signature to the prompt and we'll request the JSON output in two places - in the prompt and in the `api_kwargs` (to ensure that the model outputs the JSON via "grammar").

NOTE: You can write a much better and more specific prompt if you have a specific task / return type in mind + you should make sure that the prompt + struct description make sense together!
Let's define the prompt and the `return_type`. Notice that we add several placeholders (eg, `{{description}}`) to fill with user inputs later.
prompt = """
You're a world-class data extraction engine.
Your task is to extract information formatted as per the user provided schema.
You MUST response in JSON format.
**Example:**
---------
Description: "Extract the Car from the sentence. Extract the corresponding brand and model as well."
Input: "I drive a black Porsche 911 Turbo."
Schema: "{\"properties\":{\"model\":{\"type\":\"string\"},\"brand\":{\"type\":\"string\"}},\"required\":[\"brand\",\"model\"],\"type\":\"object\"}"
Output: "{\"model\":\"911 Turbo\",\"brand\":\"Porsche\"}"
---------
**User Request:**
Description: {{description}}
Input: {{input}}
Schema: {{signature}}
Output:
You MUST OUTPUT in JSON format.
"""
We need to extract the "signature" of our `return_type` and put it in the right placeholders. Let's generate now!
```julia
using JSON3

sig = PT.function_call_signature(Food)
result = aigenerate(prompt; input="I just ate a delicious and juicy apple.",
    ## `signature` fills the {{signature}} placeholder in the prompt above
    signature=JSON3.write(sig["parameters"]), description=sig["description"],
    ## We provide the JSON output requirement as per API docs: https://docs.together.ai/docs/json-mode
    model, api_kwargs=(; response_format=Dict("type" => "json_object"), temperature=0.2), return_all=true)

result[end].content
## "{\n  \"adjectives\": [\"delicious\", \"juicy\"],\n  \"food\": \"apple\"\n}"
```
We're using a smaller model, so the output is not perfect. Let's try to load it into our object:

```julia
obj = JSON3.read(result[end].content, Food)
# Output: ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type String
```

Unfortunately, we get an error because the model mixed up the key "name" for "food", so it cannot be parsed.
Fortunately, we can do better and use automatic fixing! All we need to do is to change from `aigenerate` -> `AIGenerate` (and use `airetry!`).
The signature of `AIGenerate` is identical to `aigenerate`, with the exception of the `config` field, where we can influence the future `retry` behaviour.
```julia
result = AIGenerate(prompt; input="I just ate a delicious and juicy apple.",
    signature=JSON3.write(sig["parameters"]), description=sig["description"],
    ## We provide the JSON output requirement as per API docs: https://docs.together.ai/docs/json-mode
    model, api_kwargs=(; response_format=Dict("type" => "json_object"), temperature=0.2),
    ## limit the number of retries, default is 10 rounds
    config=RetryConfig(; max_retries=3))
run!(result) # run! triggers the generation step (to have some AI output to check)
```
Let's set up a retry mechanism with some practical feedback. We'll leverage `airetry!` to automatically retry the request and provide feedback to the model. Think of `airetry!` as `@assert` on steroids:

`@assert CONDITION MESSAGE` → `airetry! CONDITION <state> MESSAGE`

The main benefits of `airetry!` are:

- It can retry automatically, not just throw an error
- It manages the "conversation" (list of messages) for you, including adding user-provided feedback to help generate better output
feedback = "The output is not in the correct format. The keys should be $(join([string("\"$f\"") for f in fieldnames(Food)],", "))."
# We use do-syntax with provide the `CONDITION` (it must return Bool)
airetry!(result, feedback) do conv
## try to convert
obj = try
JSON3.read(last_output(conv), Food)
catch e
## you could save the error and provide as feedback (eg, into a slot in the `:memory` field of the AICall object)
e
end
## Check if the conversion was successful; if it's `false`, it will retry
obj isa Food # -> Bool
end
```julia
food = JSON3.read(last_output(result), Food)
## [ Info: Condition not met. Retrying...
## Output: Food("apple", ["delicious", "juicy"])
```
It took 1 retry (see `result.config.retries`) and we have the correct output from an open-source model!
If you're interested in the `result` object, it's a struct (`AICall`) with a field `conversation`, which holds the conversation up to this point. `AIGenerate` is simply an alias for `AICall` using the `aigenerate` function. See `?AICall` (the underlying struct type) for more details on the fields and methods available.