Frequently Asked Questions
Why OpenAI
OpenAI's models are at the forefront of AI research and provide robust, state-of-the-art capabilities for many tasks.
There will be situations where you cannot or do not want to use it (eg, privacy, cost, etc.). In that case, you can use local models (eg, Ollama) or other APIs (eg, Anthropic).
Note: To get started with Ollama.ai, see the Setup Guide for Ollama section below.
What if I cannot access OpenAI?
There are many alternatives:
Other APIs: MistralAI, Anthropic, Google, Together, Fireworks, Voyager (the latter ones tend to give free credits upon joining!)
Locally-hosted models: Llama.cpp/Llama.jl, Ollama, vLLM (see the examples and the corresponding docs)
Data Privacy and OpenAI
At the time of writing, OpenAI does NOT use the API calls for training their models.
API
OpenAI does not use data submitted to and generated by our API to train OpenAI models or improve OpenAI’s service offering. In order to support the continuous improvement of our models, you can fill out this form to opt-in to share your data with us. – How your data is used to improve our models
You can always double-check the latest information on OpenAI's How we use your data page.
Resources:
Creating OpenAI API Key
You can get your API key from OpenAI by signing up for an account and accessing the API section of the OpenAI website.
Create an account with OpenAI
Go to API Key page
Click on “Create new secret key”
!!! Do not share it with anyone and do NOT save it to any files that get synced online.
Resources:
Pro tip: Always set the spending limits!
Getting an error "ArgumentError: api_key cannot be empty" despite having set OPENAI_API_KEY
? {#Getting-an-error-"ArgumentError:-apikey-cannot-be-empty"-despite-having-set-OPENAIAPI_KEY?}
Quick fix: just provide kwarg api_key
with your key to the aigenerate
function (and other ai*
functions).
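For example (a minimal sketch; the key below is just a placeholder):
using PromptingTools
# pass the key explicitly for this one call
msg = aigenerate("Say hi!"; api_key = "sk-...your-key...")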
This error is thrown when the OpenAI API key is not available in 1) local preferences or 2) environment variables (ENV["OPENAI_API_KEY"]).
First, check if you can access the key by running get(ENV, "OPENAI_API_KEY", nothing) in the Julia REPL. If it returns nothing, the key is not set.
If the key is set, but you still get the error, there was a rare bug in earlier versions where if you first precompiled PromptingTools without the API key, it would remember it and "compile away" the get(ENV,...)
function call. If you're experiencing this bug on the latest version of PromptingTools, please open an issue on GitHub.
The solution is to force a new precompilation, so you can do any of the below:
Force precompilation (run Pkg.precompile() in the Julia REPL)
Update the PromptingTools package (runs precompilation automatically)
Delete your compiled cache in the .julia DEPOT (usually ~/.julia/compiled/v1.10/PromptingTools). You can do it manually in the file explorer or via the Julia REPL: rm(expanduser("~/.julia/compiled/v1.10/PromptingTools"), recursive=true, force=true)
Getting an error "Rate limit exceeded" from OpenAI?
Have you opened a new account recently? It is quite likely that you've exceeded the free tier limits.
OpenAI rate-limits both the number of requests and the number of tokens you can send in a given period. If you exceed either of these, you will receive a "Rate limit exceeded" error. The "Free tier" (ie, before you pay the first 5 USD) has very low limits, eg, a maximum of 3 requests per minute. See the OpenAI Rate Limits for more information.
If you look at the HTTP response headers in the error, you can see the limits remaining and how long until it resets, eg, x-ratelimit-remaining-*
and x-ratelimit-reset-*
.
If you want to avoid this error, you have two options:
Put a simple sleep(x) after every request, where x is calculated so that the number of your requests stays below the limit (see the sketch after the asyncmap example below).
Use the ntasks keyword argument in asyncmap to limit the number of concurrent requests. Eg, let's assume you want to process 100x c. 10,000 tokens, but your tier limit is only 60,000 tokens per minute. If we know that one request takes c. 10 seconds, it means that with ntasks=1 we would send 6 requests per minute, which already maxes out our limit. If we set ntasks=2, we could process 12 requests per minute, so we would need our limit to be 120,000 tokens per minute.
# simple asyncmap loop with 2 concurrent requests; otherwise, same syntax as `map`
asyncmap(my_prompts; ntasks=2) do prompt
aigenerate(prompt)
end
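For the first option, here is a minimal sketch of the sleep-based throttling (the 20-second pause assumes, for illustration, a limit of 3 requests per minute):
# simple map loop with a pause after each request to stay under 3 requests per minute
results = map(my_prompts) do prompt
    msg = aigenerate(prompt)
    sleep(20) # 60s / 3 requests per minute = 20s between requests
    msg
end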
Getting the error "429 Too Many Requests"?
Assuming you have not just sent hundreds of requests, this error might be related to insufficient "credits" in your account balance.
See the error message. If it says "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors", you'll need to re-charge your account balance. Visit Billing overview.
Please note that, unlike ChatGPT, the OpenAI API is NOT free. However, individual requests are extremely cheap (eg, a tenth of a cent), so if you charge $5, it might last you up to hundreds of requests (depending on the models and prompts).
Setting OpenAI Spending Limits
OpenAI allows you to set spending limits directly on your account dashboard to prevent unexpected costs.
Go to OpenAI Billing
Set Soft Limit (you'll receive a notification) and Hard Limit (the API will stop working so you don't spend more money)
A good start might be a soft limit of c.$5 and a hard limit of c.$10 - you can always increase it later in the month.
Resources:
How much does it cost? Is it worth paying for?
If you use a local model (eg, with Ollama), it's free. If you use any commercial APIs (eg, OpenAI), you will likely pay per "token" (a sub-word unit).
For example, a simple request with a simple question and a 1-sentence response in return ("Is statement XYZ a positive comment?") will cost you ~$0.0001 (ie, one-hundredth of a cent).
Is it worth paying for?
GenAI is a way to buy time! You can pay cents to save tens of minutes every day.
Continuing the example above, imagine you have a table with 200 comments. Now, you can parse each one of them with an LLM for the features/checks you need. Assuming the price per call was $0.0001, you'd pay 2 cents for the job and save 30-60 minutes of your time!
Resources:
How to try new OpenAI models if I'm not Tier 5 customer?
As of September 2024, you cannot access the new o1 models via API unless you're a Tier 5 customer.
Fortunately, you can use OpenRouter to access these new models.
Get your API key from OpenRouter
Add some minimum Credits to the account (eg, $5).
Set it as an environment variable (or use local preferences):
ENV["OPENROUTER_API_KEY"] = "<your key>"
Use the model aliases with the "or" prefix, eg, oro1 for o1-preview or oro1m for o1-mini.
Example:
# Let's use o1-preview model hosted on OpenRouter ("or" prefix)
msg = aigenerate("What is the meaning of life?"; model="oro1")
Note: There are some quirks for the o1 models. For example, the new o1 series does NOT support SystemMessage
yet, so OpenRouter does some tricks (likely converting them to normal user messages). To be in control of this behavior and have comparable behavior to the native OpenAI API, you can use kwarg no_system_message=true
in aigenerate
to ensure OpenRouter does not do any tricks.
Example:
# Let's use o1-mini and disable adding automatic system message
msg = aigenerate("What is the meaning of life?"; model="oro1m", no_system_message=true)
Configuring the Environment Variable for API Key
This is a guide for OpenAI's API key, but it works for any other API key you might need (eg, MISTRALAI_API_KEY
for MistralAI API).
To use the OpenAI API with PromptingTools.jl, set your API key as an environment variable:
ENV["OPENAI_API_KEY"] = "your-api-key"
As a one-off, you can:
set it in the terminal before launching Julia:
export OPENAI_API_KEY=<your key>
set it in your
setup.jl
(make sure not to commit it to GitHub!)
Make sure to start Julia from the same terminal window where you set the variable. Easy check: in Julia, run ENV["OPENAI_API_KEY"] and you should see your key!
A better way:
On a Mac, add the configuration line to your terminal's configuration file (eg, ~/.zshrc). It will get automatically loaded every time you launch the terminal.
On Windows, set it as a system variable in "Environment Variables" settings (see the Resources).
Resources:
Setting the API Key via Preferences.jl
You can also set the API key in LocalPreferences.toml
, so it persists across sessions and projects.
Use: PromptingTools.set_preferences!("OPENAI_API_KEY"=>"your-api-key")
To double-check, run PromptingTools.get_preferences("OPENAI_API_KEY")
and you should see your key!
See more detail in the ?PromptingTools.PREFERENCES
docstring.
Understanding the API Keyword Arguments in aigenerate (api_kwargs)
See OpenAI API reference for more information.
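For example, a minimal sketch of passing a few common OpenAI parameters through api_kwargs (which parameters are accepted is defined by the API provider, not by PromptingTools):
using PromptingTools
# api_kwargs is a NamedTuple forwarded to the provider's API call
msg = aigenerate("Write a haiku about Julia";
    api_kwargs = (; temperature = 0.7, max_tokens = 100))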
Instant Access from Anywhere
For easy access from anywhere, add PromptingTools into your startup.jl
(can be found in ~/.julia/config/startup.jl
).
Add the following snippet:
using PromptingTools
const PT = PromptingTools # to access unexported functions and types
Now, you can just use ai"Help me do X to achieve Y"
from any REPL session!
Open Source Alternatives
The ethos of PromptingTools.jl is to allow you to use whatever model you want, which includes Open Source LLMs. The most popular and easiest to set up is Ollama.ai - see below for more information.
Setup Guide for Ollama
Ollama runs a background service hosting LLMs that you can access via a simple API. It's especially useful when you're working with some sensitive data that should not be sent anywhere.
Installation is very easy, just download the latest version here.
Once you've installed it, just launch the app and you're ready to go!
To check if it's running, go to your browser and open 127.0.0.1:11434
. You should see the message "Ollama is running". Alternatively, you can run ollama serve
in your terminal and you'll get a message that it's already running.
There are many models available in Ollama Library, including Llama2, CodeLlama, SQLCoder, or my personal favorite openhermes2.5-mistral
.
Download new models with ollama pull <model_name>
(eg, ollama pull openhermes2.5-mistral
).
Show currently available models with ollama list
.
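Once a model is pulled, you can use it from Julia by passing the Ollama schema explicitly (a minimal sketch; it assumes the Ollama service is running locally and that you have pulled openhermes2.5-mistral as shown above):
using PromptingTools
const PT = PromptingTools
# talk to the locally hosted model via the Ollama API
msg = aigenerate(PT.OllamaSchema(), "Say hi!"; model = "openhermes2.5-mistral")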
See Ollama.ai for more information.
Changing the Default Model or Schema
If you tend to use non-default options, it can get tedious to specify PT.*
every time.
There are three ways you can customize your workflows (especially when you use Ollama or other local models); see the sketch after this list:
Import the functions/types you need explicitly at the top (eg, using PromptingTools: OllamaSchema)
Register your model and its associated schema (PT.register_model!(; name="123", schema=PT.OllamaSchema())). You won't have to specify the schema anymore, only the model name. See Working with Ollama for more information.
Override your default model (PT.MODEL_CHAT) and schema (PT.PROMPT_SCHEMA). It can be done persistently with Preferences, eg, PT.set_preferences!("PROMPT_SCHEMA" => "OllamaSchema", "MODEL_CHAT"=>"llama2").
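A minimal sketch of options 2 and 3 (the model names are just examples; use whichever model you have pulled in Ollama):
using PromptingTools
const PT = PromptingTools
# Option 2: register the model and its schema once, then refer to it by name only
PT.register_model!(; name = "openhermes2.5-mistral", schema = PT.OllamaSchema())
msg = aigenerate("Say hi!"; model = "openhermes2.5-mistral")
# Option 3: change the defaults persistently (takes effect after restarting Julia)
PT.set_preferences!("PROMPT_SCHEMA" => "OllamaSchema", "MODEL_CHAT" => "llama2")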
Using Custom API Providers like Azure or Databricks
Several providers are directly supported (eg, Databricks), check the available prompt schemas (eg, subtypes(PT.AbstractOpenAISchema)
).
If you need a custom URL or a few keyword parameters, refer to the implementation of DatabricksOpenAISchema. You effectively need to create your own prompt schema (struct MySchema <: PT.AbstractOpenAISchema
) and override the OpenAI.jl behavior. The easiest way is to provide your custom method for OpenAI.create_chat
and customize the url
, api_key
, and other kwargs
fields. You can follow the implementation of create_chat
for DatabricksOpenAISchema
in src/llm_openAI.jl
.
Once your schema is ready, you can register the necessary models via PT.register_model!(; name="myschema", schema=MySchema())
. You can also add aliases for easier access (eg, PT.MODEL_ALIASES["mymodel"] = "my-model-with-really-long-name"
).
If you would like to use some heavily customized API, eg, your company's internal LLM proxy (to change headers, URL paths, etc.), refer to the example examples/adding_custom_API.jl
in the repo.
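To make the steps above more concrete, here is a rough, hypothetical sketch (the schema name, URL, and model name are made up, and the exact OpenAI.create_chat signature and provider handling should be checked against the DatabricksOpenAISchema implementation before relying on it):
using PromptingTools
const PT = PromptingTools
using OpenAI
# hypothetical schema for a custom OpenAI-compatible endpoint
struct MySchema <: PT.AbstractOpenAISchema end
# assumption: mirror the method signature used for DatabricksOpenAISchema
function OpenAI.create_chat(schema::MySchema,
        api_key::AbstractString,
        model::AbstractString,
        conversation;
        url::String = "https://my-company-llm-proxy.example.com/v1",
        kwargs...)
    provider = OpenAI.OpenAIProvider(; api_key, base_url = url)
    return OpenAI.create_chat(provider, model, conversation; kwargs...)
end
# register the model so you can call it by name, plus an optional alias
PT.register_model!(; name = "my-custom-model", schema = MySchema())
PT.MODEL_ALIASES["mymodel"] = "my-custom-model"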
How to have Multi-turn Conversations?
Let's say you would like to reply to a model's response. How do you do it?
- With ai"" macro
The simplest way, if you used the ai"" macro, is to send a reply with the ai!"" macro. It will use the last response as the conversation.
ai"Hi! I'm John"
ai!"What's my name?"
# Return: "Your name is John."
- With aigenerate function
You can use the conversation
keyword argument to pass the previous conversation (in all ai*
functions). It will prepend the past conversation
before sending the new request to the model.
To get the conversation, set return_all=true
and store the whole conversation thread (not just the last message) in a variable. Then, use it as a keyword argument in the next call.
conversation = aigenerate("Hi! I'm John"; return_all=true)
@info last(conversation) # display the response
# follow-up (notice that we provide past messages as conversation kwarg)
conversation = aigenerate("What's my name?"; return_all=true, conversation)
## [ Info: Tokens: 50 @ Cost: $0.0 in 1.0 seconds
## 5-element Vector{PromptingTools.AbstractMessage}:
## PromptingTools.SystemMessage("Act as a helpful AI assistant")
## PromptingTools.UserMessage("Hi! I'm John")
## AIMessage("Hello John! How can I assist you today?")
## PromptingTools.UserMessage("What's my name?")
## AIMessage("Your name is John.")
Notice that the last message is the response to the second request, but with return_all=true
we can see the whole conversation from the beginning.
How to have typed responses?
Our responses are always in AbstractMessage
types to ensure we can also handle downstream processing, error handling, and self-healing code (see airetry!
).
A good use case for a typed response is when you have a complicated control flow and would like to group and handle certain outcomes differently. You can easily do it as an extra step after the response is received.
Trivially, we can use aiclassify for Bool statements, eg,
# We can do either
mybool = tryparse(Bool, aiclassify("Is two plus two four?")) isa Bool # true
# or simply check equality
msg = aiclassify("Is two plus two four?") # true
mybool = msg.content == "true"
Now a more complicated example with multiple categories mapping to an enum:
choices = [("A", "any animal or creature"), ("P", "for any plant or tree"), ("O", "for everything else")]
# Set up the return types we want
@enum Categories A P O
string_to_category = Dict("A" => A, "P" => P,"O" => O)
# Run an example
input = "spider"
msg = aiclassify(:InputClassifier; choices, input)
mytype = string_to_category[msg.content] # A (for animal)
How does it work? aiclassify
guarantees to output one of our choices (and it handles some of the common quirks)!
How would we achieve the same with aigenerate
and arbitrary struct? We need to use the "lazy" AIGenerate
struct and airetry!
to ensure we get the response and then we can process it further.
AIGenerate
has two fields you should know about:
conversation - eg, the vector of "messages" in the current conversation (same as what you get from aigenerate with return_all=true)
success - a boolean flag if the request was successful AND if it passed any subsequent airetry! calls
Let's mimic a case where our "program" should return one of three types: SmallInt
, LargeInt
, FailedResponse
.
We first need to define our custom types:
# not needed, just to show a fully typed example
abstract type MyAbstractResponse end
struct SmallInt <: MyAbstractResponse
number::Int
end
struct LargeInt <: MyAbstractResponse
number::Int
end
struct FailedResponse <: MyAbstractResponse
content::String
end
Let's define our "program" as a function to be cleaner. Notice that we use AIGenerate
and airetry!
to ensure we get the response and then we can process it further.
using PromptingTools.Experimental.AgentTools
function give_me_number(prompt::String)::MyAbstractResponse
# Generate the response
response = AIGenerate(prompt; config=RetryConfig(;max_retries=2)) |> run!
# Check if it's parseable as Int, if not, send back to be fixed
# syntax: airetry!(CONDITION-TO-CHECK, <response object>, FEEDBACK-TO-MODEL)
airetry!(x->tryparse(Int,last_output(x))|>!isnothing, response, "Wrong output format! Answer with digits and nothing else. The number is:")
if response.success != true
## we failed to generate a parseable integer
return FailedResponse("I failed to get the response. Last output: $(last_output(response))")
end
number = tryparse(Int,last_output(response))
return number < 1000 ? SmallInt(number) : LargeInt(number)
end
give_me_number("How many car seats are in Porsche 911T?")
## [ Info: Condition not met. Retrying...
## [ Info: Condition not met. Retrying...
## SmallInt(2)
We ultimately received our custom type SmallInt
with the number of car seats in the Porsche 911T (I hope it's correct!).
If you want to access the full conversation history (all the attempts and feedback), simply output the response
object and explore response.conversation
.
How to quickly create a prompt template?
Many times, you will want to create a prompt template that you can reuse with different inputs (eg, to create templates for AIHelpMe or LLMTextAnalysis).
Previously, you would have to create a vector of SystemMessage
and UserMessage
objects and then save it to a disk and reload. Now, you can use the create_template
function to do it for you. It's designed for quick prototyping, so it skips the serialization step and loads it directly into the template store (ie, you can use it like any other template - try the aitemplates() search).
The syntax is simple: create_template(;user=<user prompt>, system=<system prompt>, load_as=<template name>)
When called it creates a vector of messages, which you can use directly in the ai*
functions. If you provide load_as
, it will load the template in the template store (under the load_as
name).
Let's generate a quick template for a simple conversation (only one placeholder: name):
# first system message, then user message (or use kwargs)
tpl=PT.create_template("You must speak like a pirate", "Say hi to {{name}}"; load_as="GreatingPirate")
## 2-element Vector{PromptingTools.AbstractChatMessage}:
## PromptingTools.SystemMessage("You must speak like a pirate")
## PromptingTools.UserMessage("Say hi to {{name}}")
You can immediately use this template in ai*
functions:
aigenerate(tpl; name="Jack Sparrow")
# Output: AIMessage("Arr, me hearty! Best be sending me regards to Captain Jack Sparrow on the salty seas! May his compass always point true to the nearest treasure trove. Yarrr!")
Since we provided load_as
, it's also registered in the template store:
aitemplates("pirate")
## 1-element Vector{AITemplateMetadata}:
## PromptingTools.AITemplateMetadata
## name: Symbol GreatingPirate
## description: String ""
## version: String "1.0"
## wordcount: Int64 46
## variables: Array{Symbol}((1,))
## system_preview: String "You must speak like a pirate"
## user_preview: String "Say hi to {{name}}"
## source: String ""
So you can use it like any other template:
aigenerate(:GreatingPirate; name="Jack Sparrow")
# Output: AIMessage("Arr, me hearty! Best be sending me regards to Captain Jack Sparrow on the salty seas! May his compass always point true to the nearest treasure trove. Yarrr!")
If you want to save it in your project folder:
PT.save_template("templates/GreatingPirate.json", tpl; version="1.0") # optionally, add description
It will be saved and accessed under its basename, ie, GreatingPirate
(same as load_as
keyword argument).
Note: If you make any changes to the templates on the disk/in a folder, you need to explicitly reload all templates again!
If you are using the main PromptingTools templates, you can simply call PT.load_templates!()
. If you have a project folder with your templates, you want to add it first:
PT.load_templates!("templates")
After the first run, we will remember the folder and you can simply call PT.load_templates!()
to reload all the templates in the future!
Do we have a RecursiveCharacterTextSplitter like Langchain?
Yes, we do! Look for the utility recursive_splitter
(previously known as split_by_length
). See its docstring for more information.
For reference, Langchain's RecursiveCharacterTextSplitter
uses the following setting: separators = ["\n\n", "\n", " ", ""]
.
I'd recommend using the following instead: separators = ["\n\n", ". ", "\n", " "]
(ie, it does not split words, which tends to be unnecessary and quite damaging to the chunk quality).
Example:
using PromptingTools: recursive_splitter
text = "Paragraph 1\n\nParagraph 2. Sentence 1. Sentence 2.\nParagraph 3"
separators = ["\n\n", ". ", "\n", " "] # split by paragraphs, then sentences, then lines, then words
chunks = recursive_splitter(text, separators, max_length=10)
How would I fine-tune a model?
Fine-tuning is a powerful technique to adapt a model to your specific use case (mostly the format/syntax/task). It requires a dataset of examples, which you can now easily generate with PromptingTools.jl!
You can save any conversation (vector of messages) to a file with PT.save_conversation("filename.json", conversation).
Once the finetuning time comes, create a bundle of ShareGPT-formatted conversations (a common finetuning format) in a single .jsonl file. Use PT.save_conversations("dataset.jsonl", [conversation1, conversation2, ...]) (notice the plural "conversationS" in the function name). See the sketch below.
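A minimal sketch of the two steps above (the file names are just examples):
using PromptingTools
const PT = PromptingTools
# Step 1: capture the full conversation (return_all=true keeps all messages) and save it
conversation = aigenerate("Say hi!"; return_all = true)
PT.save_conversation("conversation1.json", conversation)
# Step 2: bundle several saved conversations into one ShareGPT-formatted JSONL file
PT.save_conversations("dataset.jsonl", [conversation])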
For an example of an end-to-end finetuning process, check out our sister project JuliaLLMLeaderboard Finetuning experiment. It shows the process of finetuning for half a dollar with JarvisLabs.ai and Axolotl.
Can I see how my prompt is rendered / what is sent to the API?
Yes, there are two ways.
"dry run", where the
ai*
function will return the prompt rendered in the style of the selected API provider"partial render", for provider-agnostic purposes, you can run only the first step of the rendering pipeline to see the messages that will be sent (but formatted as
SystemMessage
andUserMessage
), which is easy to read and work withDry Run
Add kwargs dry_run and return_all to your ai* functions to see what would have been sent to the API (without return_all there is nothing to show you).
Example for OpenAI:
dry_conv = aigenerate(:BlankSystemUser; system = "I exist", user = "say hi",
model = "lngpt3t", return_all = true, dry_run = true)
2-element Vector{Dict{String, Any}}:
Dict("role" => "system", "content" => "I exist")
Dict("role" => "user", "content" => "say hi")
- Partial Render
Personally, I prefer to see the pretty formatting of PromptingTools *Messages. To see what will be sent to the model, you can render
only the first stage of the rendering pipeline with schema NoSchema()
(it merely does the variable replacements and creates the necessary messages). It's shared by all the schemas/providers.
PT.render(PT.NoSchema(), "say hi, {{name}}"; name="John")
2-element Vector{PromptingTools.AbstractMessage}:
PromptingTools.SystemMessage("Act as a helpful AI assistant")
PromptingTools.UserMessage("say hi, John")
What about the prompt templates? Prompt templates have an extra pre-rendering step that expands the symbolic :name
(understood by PromptingTools as a reference to AITemplate(:name)
) into a vector of Messages.
# expand the template into messages
tpl = PT.render(AITemplate(:BlankSystemUser))
PT.render(PT.NoSchema(), tpl; system = "I exist", user = "say hi")
# replace any variables, etc.
2-element Vector{PromptingTools.AbstractMessage}:
PromptingTools.SystemMessage("I exist")
PromptingTools.UserMessage("say hi")
For more information about the rendering pipeline and examples refer to Walkthrough Example for aigenerate.
Automatic Logging / Tracing
If you would like to automatically capture metadata about your conversations, you can use the TracerSchema
. It automatically captures the necessary metadata such as model, task (parent_id
), current thread (thread_id
), API kwargs used, and any prompt templates (and their versions).
using PromptingTools: TracerSchema, OpenAISchema
wrap_schema = TracerSchema(OpenAISchema())
msg = aigenerate(wrap_schema, "Say hi!"; model="gpt-4")
# output type should be TracerMessage
msg isa TracerMessage
You can work with the message like any other message (properties of the inner object
are overloaded). You can extract the original message with unwrap
:
unwrap(msg) isa String
You can extract the metadata with meta
:
meta(msg) isa Dict
If you would like to automatically save the conversations, you can use the SaverSchema
. It automatically serializes the conversation to a file in the directory specified by the environment variable LOG_DIR
.
using PromptingTools: SaverSchema
wrap_schema = SaverSchema(OpenAISchema())
msg = aigenerate(wrap_schema, "Say hi!"; model="gpt-4")
See LOG_DIR
location to find the serialized conversation.
You can also compose multiple tracing schemas. For example, you can capture metadata with TracerSchema
and then save everything automatically with SaverSchema
:
using PromptingTools: TracerSchema, SaverSchema, OpenAISchema
wrap_schema = OpenAISchema() |> TracerSchema |> SaverSchema
conv = aigenerate(wrap_schema,:BlankSystemUser; system="You're a French-speaking assistant!",
    user="Say hi!", model="gpt-4", api_kwargs=(;temperature=0.1), return_all=true)
conv
is a vector of tracing messages that will be saved to a JSON together with metadata about the template and api_kwargs
.
If you would like to enable this behavior automatically, you can register your favorite model (or re-register existing models) with the "wrapped" schema:
PT.register_model!(; name= "gpt-3.5-turbo", schema=OpenAISchema() |> TracerSchema |> SaverSchema)