Training model: gpt-4o-2024-11-20

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.7 | 0.8 |
| potions | 0.8 | 0.6 |
| timetravel_insurance | 0.55 | 0.55 |
| titanic | 0.6863 | 0.6078 |
| wisconsin | 0.8254 | 0.7576 |
| wisconsin | 0.8730 | 0.8636 |
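
The accuracies above are presumably the fraction of examples in each split that were predicted correctly. A minimal sketch of that calculation follows, assuming exact-match scoring; the function name `accuracy` and the sample data are hypothetical and do not come from the evaluation code. The 35-of-51 example was chosen only because it is numerically consistent with the titanic validation figure; the actual split sizes are not stated here.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that exactly match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    # Count positions where the predicted value equals the reference label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical split: 35 correct predictions out of 51 examples.
example_labels = [1] * 51
example_predictions = [1] * 35 + [0] * 16
print(f"{accuracy(example_predictions, example_labels):.4f}")
```

With splits this small, a single example moves the score by a few percentage points, so differences between validation and test accuracy should be read with that in mind.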