Training model: claude-3-7-sonnet-20250219

Inference model: gpt-4o-mini

| Dataset | Validation accuracy | Test accuracy |
|---|---|---|
| espionage | 1.000 | 0.800 |
| potions | 0.850 | 0.750 |
| southgermancredit | 0.635 | 0.558 |
| timetravel_insurance | 0.800 | 0.650 |
| titanic | 0.784 | 0.667 |
| wisconsin | 0.905 | 0.758 |
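Every dataset scores lower on test than on validation. The minimal sketch below tabulates that validation-to-test gap and the per-split means from the reported accuracies. The `RESULTS` dictionary and the `gaps` and `mean` helpers are illustrative names only. They are not part of the original evaluation pipeline.

```python
# Illustrative only: reported accuracies (fractions in [0, 1]) per dataset,
# stored as (validation accuracy, test accuracy).
RESULTS = {
    "espionage": (1.0, 0.8),
    "potions": (0.85, 0.75),
    "southgermancredit": (0.6346153846153846, 0.5575221238938053),
    "timetravel_insurance": (0.8, 0.65),
    "titanic": (0.7843137254901961, 0.6666666666666666),
    "wisconsin": (0.9047619047619048, 0.7575757575757576),
}


def gaps(results):
    """Return validation accuracy minus test accuracy for each dataset."""
    return {name: val - test for name, (val, test) in results.items()}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    g = gaps(RESULTS)
    # List datasets from the largest to the smallest validation-to-test drop.
    for name, gap in sorted(g.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:22s} gap={gap:+.3f}")
    print(f"mean validation accuracy: {mean(v for v, _ in RESULTS.values()):.3f}")
    print(f"mean test accuracy:       {mean(t for _, t in RESULTS.values()):.3f}")
```

A positive gap means the dataset scored lower on the test split than on validation.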