Training model: o3

Inference model: gpt-4o-mini

Dataset | Validation accuracy | Test accuracy |
---|---|---|
espionage | 1.0 | 0.75 |
potions | 0.8 | 0.8 |
southgermancredit | 0.5961538461538461 | 0.6017699115044248 |
timetravel_insurance | 0.85 | 0.8 |
titanic | 0.7647058823529411 | 0.7843137254901961 |
wisconsin | 0.9047619047619048 | 0.7272727272727273 |