Training model: o1
Inference model: gpt-4o-mini
| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.9500 | 0.9000 |
| potions | 0.6000 | 0.6000 |
| southgermancredit | 0.4904 | 0.4425 |
| timetravel_insurance | 0.8000 | 0.6500 |
| titanic | 0.7843 | 0.7059 |
| wisconsin | 0.8889 | 0.7727 |
| wisconsin | 0.7143 | 0.6970 |