Model anthropic3710

Training model: claude-3-7-sonnet-20250219
Inference model: gpt-4o-mini

Investigations

Performance

Dataset               Validation accuracy   Test accuracy
espionage             0.85                  0.9
potions               0.8                   0.65
southgermancredit     0.6634615384615384    0.5575221238938053
timetravel_insurance  0.8                   0.7
titanic               0.803921568627451     0.803921568627451
wisconsin             0.8888888888888888    0.8181818181818182
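
One way to read the table above is to compare validation and test accuracy per dataset. The following is a minimal sketch, not part of the original evaluation code: the names RESULTS, mean, and summarize are hypothetical, and the numbers are copied by hand from the table. It computes the unweighted mean of each column and the validation-minus-test gap for each dataset.

```python
# Accuracies copied from the performance table: (validation, test) per dataset.
# RESULTS, mean, and summarize are illustrative names, not from the original report.
RESULTS = {
    "espionage": (0.85, 0.9),
    "potions": (0.8, 0.65),
    "southgermancredit": (0.6634615384615384, 0.5575221238938053),
    "timetravel_insurance": (0.8, 0.7),
    "titanic": (0.803921568627451, 0.803921568627451),
    "wisconsin": (0.8888888888888888, 0.8181818181818182),
}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def summarize(results):
    """Return (mean validation accuracy, mean test accuracy, per-dataset gaps).

    A gap is validation accuracy minus test accuracy, so a positive gap
    means the test score is lower than the validation score.
    """
    val_mean = mean(v for v, _ in results.values())
    test_mean = mean(t for _, t in results.values())
    gaps = {name: v - t for name, (v, t) in results.items()}
    return val_mean, test_mean, gaps


if __name__ == "__main__":
    val_mean, test_mean, gaps = summarize(RESULTS)
    print(f"mean validation accuracy: {val_mean:.3f}")
    print(f"mean test accuracy:       {test_mean:.3f}")
    # List datasets from the largest validation-to-test drop to the smallest.
    for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:22s} gap = {gap:+.3f}")
```

The means are unweighted across datasets, so they do not account for differing dataset sizes, which the table does not report.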