Model anthropic37

Training model: claude-3-7-sonnet-20250219
Inference model: gpt-4o-mini

Investigations

Performance

Dataset               Validation accuracy   Test accuracy
espionage             1.0                   0.8
potions               0.85                  0.75
southgermancredit     0.6346153846153846    0.5575221238938053
timetravel_insurance  0.8                   0.65
titanic               0.7843137254901961    0.6666666666666666
wisconsin             0.9047619047619048    0.7575757575757576