Training model: o1

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---:|---:|
| espionage | 0.850 | 0.950 |
| potions | 0.700 | 0.550 |
| southgermancredit | 0.529 | 0.549 |
| timetravel_insurance | 0.950 | 0.800 |
| titanic | 0.706 | 0.745 |
| wisconsin | 0.841 | 0.667 |
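The table can be summarized with a short script. The sketch below computes an unweighted (macro) average of validation and test accuracy across the six datasets, plus the per-dataset gap between test and validation accuracy. It uses the (validation, test) pairs from the table above. The table does not give dataset sizes, so this assumes every dataset counts equally. A size-weighted average cannot be computed from the reported figures alone. The helper `mean` and the variable names are illustrative only.

```python
# (validation accuracy, test accuracy) per dataset, taken from the table above.
results = {
    "espionage": (0.85, 0.95),
    "potions": (0.7, 0.55),
    "southgermancredit": (0.5288461538461539, 0.5486725663716814),
    "timetravel_insurance": (0.95, 0.8),
    "titanic": (0.7058823529411765, 0.7450980392156863),
    "wisconsin": (0.8412698412698413, 0.6666666666666666),
}


def mean(values):
    """Arithmetic mean of an iterable of floats (hypothetical helper)."""
    values = list(values)
    return sum(values) / len(values)


# Unweighted macro-average: each dataset counts equally, regardless of its size.
mean_val = mean(v for v, _ in results.values())
mean_test = mean(t for _, t in results.values())

# Per-dataset gap (test minus validation); a negative value means test accuracy is lower.
gaps = {name: t - v for name, (v, t) in results.items()}

print(f"mean validation accuracy: {mean_val:.3f}")
print(f"mean test accuracy:       {mean_test:.3f}")
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {gap:+.3f}")
```

Sorting by the gap puts the datasets where test accuracy falls furthest below validation accuracy at the top of the list.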