Training model: gpt-4.1
Inference model: gpt-4o-mini
Dataset | Validation accuracy | Test accuracy |
---|---|---|
espionage | 0.8500 | 0.9000 |
potions | 0.6500 | 0.7500 |
southgermancredit | 0.4808 | 0.3274 |
timetravel_insurance | 0.6000 | 0.7000 |
titanic | 0.7647 | 0.7059 |
wisconsin | 0.8254 | 0.6818 |