Training model: o1
Inference model: gpt-4o-mini
Dataset | Validation accuracy | Test accuracy |
---|---|---|
espionage | 0.95 | 0.9 |
potions | 0.6 | 0.6 |
southgermancredit | 0.4904 | 0.4425 |
timetravel_insurance | 0.8 | 0.65 |
titanic | 0.7843 | 0.7059 |
wisconsin | 0.8889 | 0.7727 |
wisconsin | 0.7143 | 0.6970 |