Training model: gpt-4.1

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.9 | 0.9 |
| potions | 0.85 | 0.45 |
| southgermancredit | 0.615 | 0.611 |
| timetravel_insurance | 0.9 | 0.75 |
| titanic | 0.725 | 0.647 |
| wisconsin | 0.889 | 0.788 |