Training model: gpt-4.1

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.850 | 0.900 |
| potions | 0.650 | 0.750 |
| southgermancredit | 0.481 | 0.327 |
| timetravel_insurance | 0.600 | 0.700 |
| titanic | 0.765 | 0.706 |
| wisconsin | 0.825 | 0.682 |