Model opus4010

Training model: claude-opus-4-20250514
Inference model: gpt-4.1-mini

Investigations

Performance

Dataset               Validation accuracy  Test accuracy
espionage             0.95                 0.9
potions               0.75                 0.6
southgermancredit     0.5961538461538461   0.6371681415929203
timetravel_insurance  0.9                  0.8
titanic               0.7254901960784313   0.7843137254901961
wisconsin             0.7619047619047619   0.7878787878787878
espionage             1.0                  0.85
potions               0.7                  0.75
timetravel_insurance  0.9                  0.8
espionage             1.0                  0.8
potions               0.85                 0.55
timetravel_insurance  0.95                 0.7