JuliaLLMLeaderboard
Documentation for Julia LLM Leaderboard.
Introduction
Welcome to the Julia Code Generation Benchmark Repository!
This project is designed for the Julia community to compare the code generation capabilities of various AI models. Unlike academic benchmarks, our focus is on practicality and simplicity: "Generate code, run it, and see if it works(-ish)."
This repository aims to measure how well different AI models and prompting strategies generate syntactically correct Julia code, so that users can choose the best model for their needs.
Itchy fingers? Open the Results section or just run your own benchmark with `run_benchmark()` (e.g., `examples/code_gen_benchmark.jl`).
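If you want a quick sense of what such a run looks like, here is a minimal sketch. It assumes your API key (e.g., for OpenAI) is already configured for PromptingTools, and the keyword arguments shown (models, prompt_labels, num_samples, experiment) are illustrative; check `examples/code_gen_benchmark.jl` for the current, authoritative invocation.

```julia
using JuliaLLMLeaderboard

# Load the benchmark test-case definitions shipped with the repository
fn_definitions = find_definitions("code_generation/")

# Run one sample per test case for a single model and prompt template.
# Model name and keyword arguments are examples -- adjust to your setup.
evals = run_benchmark(; fn_definitions,
    models = ["gpt-3.5-turbo-1106"],
    prompt_labels = ["JuliaExpertAsk"],
    experiment = "my-first-benchmark",
    num_samples = 1,
    verbose = true);
```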
First Steps
To get started with benchmarking, see the Getting Started section, or simply continue to results:
- Results for Local LLM Models
- Results for Paid LLM APIs
- Results by Prompt Templates
- Results by Test Cases
Feedback and Improvements
We highly value community input. If you have suggestions or ideas for improvement, please open an issue. All contributions are welcome!