Performance Benchmarks¶

Baseline measurements on Apple M-series (macOS, ReleaseSafe build).

Benchmark Programs¶

fib -- Naive Fibonacci¶

Pure recursion with fixnum arithmetic. Exercises function call overhead and JIT compilation of hot paths.

(define (fib n)
  (if (< n 2) n
      (+ (fib (- n 1)) (fib (- n 2)))))

tak -- Takeuchi Function¶

Deep recursion with arithmetic and comparisons. A classic benchmark for call overhead and argument passing.

(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))

Results¶

Benchmark	Input	Result	Kaappi	Notes
fib	35	9227465	~1.9s	Pure recursion, fixnum arithmetic
tak	33,22,11	12	~46s	Deep recursion + arithmetic + comparisons

Measured with (current-jiffy) / (jiffies-per-second). Pre-built executable (zig-out/bin/kaappi), no compilation overhead.

Micro-benchmarks¶

Informal measurements of individual operations (not from the benchmark harness above):

Operation	Input	Kaappi	Notes
`hash-table-set!`	10K entries	~1ms	Open-addressing with linear probing
`hash-table-set!`	50K entries	~5ms	Linear scaling
`iota`	100K elements	~1.4ms	List allocation + GC
`fold +`	100K elements	~2.1ms	List traversal

Comparison Context¶

Approximate results from ecraven/r7rs-benchmarks for the same benchmark programs. These numbers are from different hardware and implementation versions -- they show rough relative positioning, not exact comparisons.

Implementation	fib(35)	tak(33,22,11)	Type
Chez Scheme	~0.15s	~1.2s	Native compiler
Chicken	~0.5s	~4s	AOT compiler (C backend)
Gauche	~1.8s	~15s	Bytecode interpreter
Chibi	~3.5s	~30s	Bytecode interpreter
Kaappi	~1.9s	~46s	Bytecode interpreter + JIT

Kaappi is faster than Chibi for fib (1.9s vs 3.5s) and comparable on tak. The JIT inlines fixnum arithmetic, comparisons, car/cdr, and cons for hot functions.

Optimizations Implemented¶

JIT compiler (AArch64, x86_64): hot functions (100+ calls) compiled to native code; inline fixnum +/-/*/</>/<=/>=/=, predicates (zero?, null?, pair?, not), car/cdr, cons. On riscv64, the interpreter handles all execution (no JIT backend yet).
NativeFn fast path: call_global bypasses the full dispatch chain for native functions
Constant folding: (+ 1 2) → 3 at compile time
Inline global cache: call_global caches resolved function pointers with version invalidation
Open-addressing hash tables: O(1) lookup/insert (was O(n) linear scan — 1000x speedup)
Self-tail-call optimization: detected at compile time, reuses the current frame

Running Benchmarks¶

kaappi benchmarks/tak_direct.scm   # run tak(33,22,11)

The benchmarks/ directory contains additional benchmark scripts that read parameters from stdin via the r7rs-benchmarks harness:

Script	Input format	Example
`fib.scm`	`count input expected`	`1 35 9227465`
`tak.scm`	`count x y z expected`	`1 33 22 11 12`
`nqueens.scm`	`count n expected`	`1 10 724`
`primes.scm`	`count n expected`	`1 10000 1229`

A separate call/cc micro-benchmark is also available:

zig build bench    # call/cc capture overhead benchmark

GC Metrics¶

Use --gc-stats to see garbage collection statistics alongside benchmark results (collections, live objects, heap size):

kaappi --gc-stats benchmarks/fib.scm

The benchmark runner also outputs JSON for programmatic analysis. Compare two JSON snapshots with benchmarks/compare-benchmarks.sh to flag regressions above a configurable threshold (default 10%):

benchmarks/compare-benchmarks.sh baseline.json current.json

The script exits non-zero on regression, making it suitable for CI gates. Set THRESHOLD to adjust the sensitivity (e.g., THRESHOLD=5).

Use --profile to get per-function timing for any Scheme program:

kaappi --profile program.scm