Training model: gemini-2.5-pro-exp-03-25

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.85 | 0.95 |
| potions | 0.9 | 0.75 |
| southgermancredit | 0.5192307692307693 | 0.336283185840708 |
| timetravel_insurance | 0.95 | 0.75 |
| wisconsin | 0.9047619047619048 | 0.8939393939393939 |
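
The accuracy values in the table are presumably the fraction of examples whose predicted label matches the reference label. A minimal Python sketch of that computation follows. The function name `accuracy` and the example labels are hypothetical and are not taken from the evaluation code. The split size in the usage example (52 examples, 27 correct) is only an assumption, chosen because 27/52 is consistent with the reported southgermancredit validation figure.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    # Count positions where the predicted label equals the reference label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical split (assumption): 52 examples, 27 predicted correctly.
labels = ["good"] * 52
predictions = ["good"] * 27 + ["bad"] * 25
score = accuracy(predictions, labels)
print(f"{score:.4f}")
```

Computing the ratio directly, without rounding, would explain why some entries in the table carry full floating-point precision while others (such as 0.85) are short.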