Training model: gemini-2.0-flash
Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 0.5500 | 0.5500 |
| potions | 0.5500 | 0.6500 |
| southgermancredit | 0.5288 | 0.5398 |
| timetravel_insurance | 0.6500 | 0.3500 |
| titanic | 0.7255 | 0.6667 |
| wisconsin | 0.7460 | 0.6970 |