Training model: gemini-2.0-flash

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.550 | 0.550 |
| potions | 0.550 | 0.650 |
| southgermancredit | 0.529 | 0.540 |
| timetravel_insurance | 0.650 | 0.350 |
| titanic | 0.725 | 0.667 |
| wisconsin | 0.746 | 0.697 |
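
The table can be summarized with a little arithmetic. The sketch below is not part of the original evaluation: it copies the accuracies from the table and computes an unweighted mean per split plus the test-minus-validation gap for each dataset. The helper name `summarize` and the choice of an unweighted mean are assumptions made for illustration; the split sizes are not given here, so a size-weighted average cannot be computed.

```python
# Accuracies copied from the table above: (validation, test) per dataset.
results = {
    "espionage": (0.55, 0.55),
    "potions": (0.55, 0.65),
    "southgermancredit": (0.5288461538461539, 0.5398230088495575),
    "timetravel_insurance": (0.65, 0.35),
    "titanic": (0.7254901960784313, 0.6666666666666666),
    "wisconsin": (0.746031746031746, 0.696969696969697),
}


def summarize(results):
    """Hypothetical helper: unweighted mean accuracy per split and per-dataset gaps."""
    n = len(results)
    mean_val = sum(val for val, _ in results.values()) / n
    mean_test = sum(test for _, test in results.values()) / n
    # Gap is test minus validation; a negative value means test accuracy is lower.
    gaps = {name: test - val for name, (val, test) in results.items()}
    return mean_val, mean_test, gaps


mean_val, mean_test, gaps = summarize(results)
print(f"mean validation accuracy: {mean_val:.3f}")
print(f"mean test accuracy:       {mean_test:.3f}")
# List datasets from the largest drop to the largest gain.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:22s} test - validation = {gap:+.3f}")
```

Sorting by gap puts `timetravel_insurance` first, since its accuracy falls from 0.65 on validation to 0.35 on test, the largest change in the table.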