Training model: claude-3-5-haiku-20241022
Inference model: gpt-4o-mini

Dataset | Validation accuracy | Test accuracy |
---|---|---|
espionage | 1.0 | 0.85 |
potions | 0.9 | 0.6 |
southgermancredit | 0.5769230769230769 | 0.45132743362831856 |
timetravel_insurance | 0.8 | 0.85 |
titanic | 0.7450980392156863 | 0.6274509803921569 |
wisconsin | 0.4444444444444444 | 0.4696969696969697 |