Training model: gemini-2.0-flash

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|--:|--:|
| espionage | 0.950 | 0.800 |
| potions | 0.750 | 0.600 |
| southgermancredit | 0.452 | 0.460 |
| timetravel_insurance | 0.750 | 0.800 |
| titanic | 0.667 | 0.569 |
| wisconsin | 0.714 | 0.652 |