Model geminipro
Training model: gemini-2.0-pro-exp
Inference model: gpt-4o-mini
Investigations
Investigation 8 (espionage)
Investigation 33 (potions)
Investigation 62 (southgermancredit)
Investigation 152 (wisconsin)
Performance
Dataset            Validation accuracy  Test accuracy
espionage          0.95                 0.8
potions            0.95                 0.6
southgermancredit  0.5                  0.5221238938053098
wisconsin          0.6825396825396826   0.5454545454545454
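
The validation and test accuracies listed under Performance can be compared per dataset. The following is a minimal sketch, not part of the original evaluation: it copies the reported numbers into a dictionary and computes the validation-minus-test gap and an unweighted mean test accuracy. The names `RESULTS`, `accuracy_gaps`, and `mean_test_accuracy` are hypothetical and chosen only for illustration.

```python
# Accuracies copied from the Performance listing: (validation, test).
# The structure and function names below are illustrative assumptions.
RESULTS = {
    "espionage": (0.95, 0.8),
    "potions": (0.95, 0.6),
    "southgermancredit": (0.5, 0.5221238938053098),
    "wisconsin": (0.6825396825396826, 0.5454545454545454),
}


def accuracy_gaps(results):
    """Return validation accuracy minus test accuracy for each dataset."""
    return {name: val - test for name, (val, test) in results.items()}


def mean_test_accuracy(results):
    """Return the unweighted mean of test accuracy across datasets."""
    return sum(test for _, test in results.values()) / len(results)


if __name__ == "__main__":
    for name, gap in accuracy_gaps(RESULTS).items():
        print(f"{name}: gap = {gap:+.3f}")
    print(f"mean test accuracy = {mean_test_accuracy(RESULTS):.3f}")
```

A positive gap means validation accuracy exceeded test accuracy. The mean is unweighted because the listing does not give dataset sizes.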