Model sonnet40

Training model: claude-sonnet-4-20250514
Inference model: gpt-4.1-mini

Investigations

Performance

Dataset               Validation accuracy   Test accuracy
espionage             0.95                  0.95
potions               0.75                  0.7
southgermancredit     0.6634615384615384    0.5663716814159292
timetravel_insurance  0.6                   0.65
titanic               0.803921568627451     0.7450980392156863
wisconsin             0.873015873015873     0.8484848484848485
espionage             1.0                   1.0
potions               0.7                   0.65
southgermancredit     0.6346153846153846    0.6460176991150443
timetravel_insurance  0.75                  0.7
wisconsin             0.9047619047619048    0.8787878787878788
espionage             0.95                  0.95
potions               0.75                  0.7
timetravel_insurance  0.9                   0.8