VIOLIN
Level-4 Obedience · 3 Deterministic Tasks

VIOLIN Leaderboard

The VIOLIN leaderboard evaluates image generation models on Visual Instruction Obedience Level-4 EvaluatIoN, the first systematic benchmark targeting deterministic pixel-level control.

State-of-the-art generative models excel at complex scenes yet fail at trivially simple tasks: the "Paradox of Simplicity". VIOLIN exposes this failure through three zero- or low-entropy tasks: Pure Color Generation, Image Masking, and Geometric Shape Generation. The benchmark also includes Violin-Absolute-Color, a supplementary zero-entropy extension with six hex-code variations.

🎨 Pure Color: Pixel-perfect uniform color blocks via ISCC-NBS color names
🖼️ Image Masking: Binary mask application with strict spatial adherence
⭕ Geometric Shape: Precise shape generation at defined spatial coordinates

๐Ÿ† Main Benchmark Leaderboard

Three deterministic Level-4 Obedience tasks. Lower error scores indicate better obedience.

🎨 Pure Color Generation (Single Block)

Generate uniform color blocks using ISCC-NBS Level-2 natural language color names. Measures pixel-level color accuracy and image purity.

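| Rank | Model | Organization | Type | Score (higher is better) | Error (lower is better) |
|---|---|---|---|---|---|
| 1 | Seedream-5 | ByteDance | Closed | 95.1 | 0.049 |
| 2 | GPT-Image-2 | OpenAI | Closed | 94.9 | 0.051 |
| 3 | Nano-Banana-2 | Google | Closed | 94.7 | 0.053 |
| 4 | FLUX.2 | Black Forest Labs | Open | 94.3 | 0.057 |
| 5 | Qwen-Image | Alibaba | Open | 94.3 | 0.057 |
| 6 | Z-Image | Huawei | Open | 93.9 | 0.061 |
| 7 | FLUX.1 | Black Forest Labs | Open | 90.9 | 0.091 |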
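
The two numeric columns in the table above appear to carry the same information: each higher-is-better value matches 100 × (1 − error) to one decimal place. The source does not state this formula, so treat it as an observation from the published numbers rather than the benchmark's definition. A minimal Python check, where `error_to_score` is a hypothetical helper name:

```python
# Error values copied from the Pure Color Generation table.
pure_color_error = {
    "Seedream-5": 0.049,
    "GPT-Image-2": 0.051,
    "Nano-Banana-2": 0.053,
    "FLUX.2": 0.057,
    "Qwen-Image": 0.057,
    "Z-Image": 0.061,
    "FLUX.1": 0.091,
}

# Higher-is-better values copied from the same table.
reported_score = {
    "Seedream-5": 95.1,
    "GPT-Image-2": 94.9,
    "Nano-Banana-2": 94.7,
    "FLUX.2": 94.3,
    "Qwen-Image": 94.3,
    "Z-Image": 93.9,
    "FLUX.1": 90.9,
}


def error_to_score(error: float) -> float:
    """Assumed mapping (not stated in the source): 100 * (1 - error), one decimal."""
    return round(100.0 * (1.0 - error), 1)


# Recompute every score from its error value.
scores = {model: error_to_score(err) for model, err in pure_color_error.items()}

# True when each recomputed score agrees with the reported one.
all_match = all(abs(scores[m] - reported_score[m]) < 1e-9 for m in scores)
```

The same relationship holds for the Image Masking and Geometric Shape Generation rows, for example 0.158 and 84.2.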
๐Ÿ–ผ๏ธImage MaskingInpainting ยท Outpainting ยท Random

Apply binary masks (Inpainting / Outpainting / Random) to images with strict pixel-level adherence. Evaluates spatial coverage and boundary precision.

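| Rank | Model | Organization | Type | Score (higher is better) | Error (lower is better) |
|---|---|---|---|---|---|
| 1 | GPT-Image-2 | OpenAI | Closed | 84.2 | 0.158 |
| 2 | Seedream-5 | ByteDance | Closed | 72.0 | 0.280 |
| 3 | Nano-Banana-2 | Google | Closed | 63.4 | 0.366 |
| 4 | FLUX.2 | Black Forest Labs | Open | 48.8 | 0.512 |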
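
To make "spatial coverage" concrete, the sketch below scores how well a produced mask region overlaps the requested one, using 1 − IoU (intersection over union). This is an illustrative stand-in, not the metric VIOLIN actually uses, which the page does not define. The function name `mask_iou_error` and the nested-list mask format are assumptions.

```python
def mask_iou_error(target, predicted):
    """Return 1 - IoU for two equally sized binary masks (lists of rows of 0/1).

    Illustrative only: the benchmark's own masking metric is not specified here.
    """
    if len(target) != len(predicted) or any(
        len(t_row) != len(p_row) for t_row, p_row in zip(target, predicted)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for t_row, p_row in zip(target, predicted):
        for t, p in zip(t_row, p_row):
            if t and p:
                intersection += 1
            if t or p:
                union += 1

    if union == 0:
        return 0.0  # both masks are empty, so there is nothing to miss
    return 1.0 - intersection / union


# Requested mask: a 2x2 block in the centre of a 4x4 image.
target_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Produced mask: the same block shifted one pixel to the right.
shifted_mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

shift_error = mask_iou_error(target_mask, shifted_mask)  # 2 shared pixels of 6 total
```

A one-pixel shift on a small mask already costs two thirds of the overlap, which illustrates why strict pixel-level adherence is hard to satisfy.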
⭕ Geometric Shape Generation (Circle · Square · Triangle)

Generate circles, squares, and triangles at precisely specified spatial positions. Measures shape fidelity, localization accuracy, and fill purity.

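| Rank | Model | Organization | Type | Score (higher is better) | Error (lower is better) |
|---|---|---|---|---|---|
| 1 | GPT-Image-2 | OpenAI | Closed | 86.6 | 0.134 |
| 2 | Seedream-5 | ByteDance | Closed | 79.3 | 0.207 |
| 3 | Nano-Banana-2 | Google | Closed | 71.8 | 0.282 |
| 4 | FLUX.2 | Black Forest Labs | Open | 69.6 | 0.304 |
| 5 | Qwen-Image | Alibaba | Open | 66.3 | 0.337 |
| 6 | Z-Image | Huawei | Open | 62.4 | 0.376 |
| 7 | FLUX.1 | Black Forest Labs | Open | 59.3 | 0.407 |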
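
As a rough illustration of "localization accuracy", the sketch below measures how far the centroid of a generated shape lies from the requested position, normalized by the image diagonal. It is a simplified assumption about how such an error could be computed, not the benchmark's published metric. The names `centroid` and `localization_error` are hypothetical.

```python
import math


def centroid(mask):
    """Centroid (x, y) of the nonzero pixels in a binary mask, or None if empty."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


def localization_error(mask, target_xy):
    """Centroid-to-target distance divided by the image diagonal.

    Returns 1.0 when no shape pixels are present (assumed worst case).
    """
    center = centroid(mask)
    if center is None:
        return 1.0
    height, width = len(mask), len(mask[0])
    diagonal = math.hypot(width, height)
    distance = math.hypot(center[0] - target_xy[0], center[1] - target_xy[1])
    return distance / diagonal


# A 2x2 square drawn at pixels x=1..2, y=1..2 in a 5x5 image.
shape_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# The prompt asked for a shape centred at (2.5, 2.5); the drawn one sits at (1.5, 1.5).
offset_error = localization_error(shape_mask, (2.5, 2.5))
```

Shape fidelity and fill purity would need separate checks; a centroid alone cannot tell a circle from a square.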

💡 All metrics are error measures: lower is better.

📦 Supplementary: Violin-Absolute-Color

Zero-entropy color task: hex code prompts across 6 variations, 5 open-source models.
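
A hex-code prompt fixes the target color exactly, so any deviation in the output is measurable. The sketch below parses a hex code and computes a normalized Euclidean distance in RGB space. This is one plausible way to score such a task, not a confirmed definition of any column in the tables. `hex_to_rgb` and `normalized_rgb_distance` are hypothetical helper names.

```python
import math


def hex_to_rgb(code: str) -> tuple:
    """Convert a 6-digit hex color such as '#FF8800' to an (r, g, b) tuple."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a 6-digit hex code")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# Largest possible RGB distance: black (0, 0, 0) to white (255, 255, 255).
MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


def normalized_rgb_distance(target_hex: str, pixel_rgb: tuple) -> float:
    """Euclidean RGB distance to the target color, scaled to the range [0, 1]."""
    target = hex_to_rgb(target_hex)
    return math.dist(target, pixel_rgb) / MAX_RGB_DISTANCE


# A pixel that exactly matches the requested color has zero error.
exact_error = normalized_rgb_distance("#FF8800", (255, 136, 0))

# Black requested, white produced: the maximum error.
worst_error = normalized_rgb_distance("#000000", (255, 255, 255))
```

Averaging this distance over every pixel of a generated block would give a single per-image error between 0 and 1.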

Var-1: Single Color (Hex)

Single uniform color block, specified with hexadecimal code.

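| Rank | Model | Organization | rgb-ed | lab-00 | sdc | ed | hf | color-mean |
|---|---|---|---|---|---|---|---|---|
| 1 | Qwen-Image | Alibaba | 0.156 | 0.180 | 0.058 | 0.002 | 0.021 | 0.083 |
| 2 | FLUX.1 | Black Forest Labs | 0.364 | 0.387 | 0.044 | 0.001 | 0.001 | 0.159 |
| 3 | SANA | NVIDIA | 0.288 | 0.331 | 0.232 | 0.028 | 0.030 | 0.182 |
| 4 | Janus-Pro-1.5 | DeepSeek | 0.344 | 0.410 | 0.193 | 0.006 | 0.004 | 0.191 |
| 5 | OmniGen2 | PKU/VectorSpace | 0.397 | 0.402 | 0.202 | 0.070 | 0.016 | 0.217 |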
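
The color-mean column is consistent with the arithmetic mean of the five preceding metrics (rgb-ed, lab-00, sdc, ed, hf), rounded to three decimals, and the ranking follows it in ascending order. The source does not state this explicitly; the sketch below simply checks it against the published values. `color_mean` is a hypothetical helper name.

```python
# Per-model metrics in column order: rgb-ed, lab-00, sdc, ed, hf.
metrics = {
    "Qwen-Image": (0.156, 0.180, 0.058, 0.002, 0.021),
    "FLUX.1": (0.364, 0.387, 0.044, 0.001, 0.001),
    "SANA": (0.288, 0.331, 0.232, 0.028, 0.030),
    "Janus-Pro-1.5": (0.344, 0.410, 0.193, 0.006, 0.004),
    "OmniGen2": (0.397, 0.402, 0.202, 0.070, 0.016),
}

# color-mean values as reported in the table.
reported_mean = {
    "Qwen-Image": 0.083,
    "FLUX.1": 0.159,
    "SANA": 0.182,
    "Janus-Pro-1.5": 0.191,
    "OmniGen2": 0.217,
}


def color_mean(values) -> float:
    """Assumed aggregation: plain arithmetic mean, rounded to three decimals."""
    return round(sum(values) / len(values), 3)


# Recompute the aggregate for every model.
computed_mean = {model: color_mean(vals) for model, vals in metrics.items()}

# Rank models by the recomputed aggregate, lowest error first.
ranking = sorted(computed_mean, key=computed_mean.get)

for rank, model in enumerate(ranking, start=1):
    print(rank, model, computed_mean[model], reported_mean[model])
```

Because the mean weights all five metrics equally, a model with very low ed and hf values, such as FLUX.1, can still rank below one with a better rgb-ed and lab-00.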