Model geminipro10
Training model: gemini-2.0-pro-exp
Inference model: gpt-4o-mini
Investigations
Investigation 9 (espionage)
Investigation 34 (potions)
Investigation 88 (timetravel_insurance)
Investigation 153 (wisconsin)
Performance
Dataset               Validation accuracy  Test accuracy
espionage             0.65                 0.4
potions               0.75                 0.65
timetravel_insurance  0.95                 0.8
wisconsin             0.7142857142857143   0.5909090909090909