Training model: o3
Inference model: gpt-4o-mini
| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.950 | 0.800 |
| potions | 0.800 | 0.750 |
| southgermancredit | 0.596 | 0.496 |
| timetravel_insurance | 0.800 | 0.700 |
| titanic | 0.804 | 0.686 |
| wisconsin | 0.937 | 0.833 |