PromptTriage

Research

Data-driven studies on prompt engineering and LLM behavior. All evaluations are scored by a multi-model jury on a 100-point scale.

AI Format Wars

Does the shape of your prompt matter?

1,080 evals · 5 models · 3-judge jury

Study C · Coming soon

Does your system prompt actually matter?

104 evals · 2 models · 3-judge jury

Analysis · Coming soon

3 critical anti-patterns found in production systems.

170 prompts analyzed

Datasets and scripts are open-source on GitHub.