Training model: claude-sonnet-4-20250514
Inference model: gpt-4.1-mini

Dataset | Validation accuracy | Test accuracy |
---|---|---|
espionage | 0.9 | 0.95 |
potions | 0.7 | 0.45 |
southgermancredit | 0.6923076923076923 | 0.7256637168141593 |
timetravel_insurance | 0.75 | 0.7 |
titanic | 0.8431372549019608 | 0.7254901960784313 |
wisconsin | 0.873015873015873 | 0.8484848484848485 |
espionage | 0.9 | 0.95 |
potions | 0.75 | 0.7 |
timetravel_insurance | 0.75 | 0.75 |
espionage | 0.95 | 1.0 |
potions | 0.75 | 0.6 |
timetravel_insurance | 0.0 | 0.0 |
wisconsin | 0.746031746031746 | 0.803030303030303 |