Model sonnet4010

Training model: claude-sonnet-4-20250514
Inference model: gpt-4.1-mini

Investigations

Performance

Dataset               Validation accuracy   Test accuracy
espionage             0.9                   0.95
potions               0.7                   0.45
southgermancredit     0.6923076923076923    0.7256637168141593
timetravel_insurance  0.75                  0.7
titanic               0.8431372549019608    0.7254901960784313
wisconsin             0.873015873015873     0.8484848484848485

espionage             0.9                   0.95
potions               0.75                  0.7
timetravel_insurance  0.75                  0.75

espionage             0.95                  1.0
potions               0.75                  0.6
timetravel_insurance  0.0                   0.0
wisconsin             0.746031746031746     0.803030303030303