JuliaLLMLeaderboard
Documentation for Julia LLM Leaderboard.
Introduction
Welcome to the Julia Code Generation Benchmark Repository!
This project is designed for the Julia community to compare the code generation capabilities of various AI models. Unlike academic benchmarks, our focus is on practicality and simplicity: "Generate code, run it, and see if it works(-ish)."
This repository aims to measure how well different AI models and prompting strategies generate syntactically correct Julia code, so that users can choose the best model for their needs.
Itchy fingers? Open the Results section or just run your own benchmark with `run_benchmark()` (e.g., `examples/code_gen_benchmark.jl`).
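If you want a quick sense of what such a run looks like, here is a minimal sketch. It assumes your API key (e.g., for OpenAI) is already configured for PromptingTools, and the keyword arguments shown (models, prompt_labels, num_samples, experiment) are illustrative; check `examples/code_gen_benchmark.jl` for the current, authoritative invocation.

```julia
using JuliaLLMLeaderboard

# Load the benchmark test-case definitions shipped with the repository
fn_definitions = find_definitions("code_generation/")

# Run one sample per test case for a single model and prompt template.
# Model name and keyword arguments are examples -- adjust to your setup.
evals = run_benchmark(; fn_definitions,
    models = ["gpt-3.5-turbo-1106"],
    prompt_labels = ["JuliaExpertAsk"],
    experiment = "my-first-benchmark",
    num_samples = 1,
    verbose = true);
```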
First Steps
To get started with benchmarking, see the Getting Started section, or simply continue to results:
- Results for Local LLM Models
- Results for Paid LLM APIs
- Results by Prompt Templates
- Results by Test Cases
Feedback and Improvements
We highly value community input. If you have suggestions or ideas for improvement, please open an issue. All contributions are welcome!