- Training model: `claude-3-7-sonnet-20250219`
- Inference model: `gpt-4o-mini`

| Dataset | Validation accuracy | Test accuracy |
|---|--:|--:|
| espionage | 0.850 | 0.900 |
| potions | 0.800 | 0.650 |
| southgermancredit | 0.663 | 0.558 |
| timetravel_insurance | 0.800 | 0.700 |
| titanic | 0.804 | 0.804 |
| wisconsin | 0.889 | 0.818 |
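The table does not report split sizes, so one simple summary is an unweighted (macro) average across the six datasets, together with the per-dataset gap between validation and test accuracy. The Python sketch below computes both from the full-precision accuracies. The names `RESULTS`, `macro_average`, and `generalization_gap` are hypothetical helpers and are not part of the original evaluation code. Weighting every dataset equally is an assumption.

```python
# Full-precision (validation, test) accuracies from the results table, keyed by dataset.
RESULTS = {
    "espionage": (0.85, 0.9),
    "potions": (0.8, 0.65),
    "southgermancredit": (0.6634615384615384, 0.5575221238938053),
    "timetravel_insurance": (0.8, 0.7),
    "titanic": (0.803921568627451, 0.803921568627451),
    "wisconsin": (0.8888888888888888, 0.8181818181818182),
}


def macro_average(results):
    """Unweighted mean over datasets (assumption: split sizes unknown, so each dataset counts equally)."""
    n = len(results)
    val = sum(v for v, _ in results.values()) / n
    test = sum(t for _, t in results.values()) / n
    return val, test


def generalization_gap(results):
    """Validation minus test accuracy per dataset; a positive value means test accuracy is lower."""
    return {name: v - t for name, (v, t) in results.items()}


if __name__ == "__main__":
    val, test = macro_average(RESULTS)
    print(f"mean validation accuracy: {val:.3f}")
    print(f"mean test accuracy: {test:.3f}")
    # List datasets from the largest drop to the smallest (or negative) drop.
    for name, gap in sorted(generalization_gap(RESULTS).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {gap:+.3f}")
```

In this table, `espionage` is the only dataset whose test accuracy exceeds its validation accuracy, `titanic` shows no gap, and the remaining four drop from validation to test.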