Ensemble

Dataset: potions

Models

Model Narratives

gemini

Round ID: 292
Prompt used:
	If ColourShift is greater than 17, then predict 'Ineffective'. Otherwise, if FizzIntensity is less than 30, then predict 'Ineffective'. Otherwise predict 'Effective'
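
For reference, the prompt above can be read as a two-threshold decision rule. The sketch below is one possible transcription into Python; the function name `gemini_rule` and the argument names are assumptions made for illustration and are not part of the original run. The entity values are the four examples listed for this round.

```python
def gemini_rule(fizz_intensity: float, colour_shift: float) -> str:
    """Hypothetical transcription of the round 292 prompt (name is illustrative)."""
    # First check: a ColourShift above 17 is always Ineffective.
    if colour_shift > 17:
        return "Ineffective"
    # Second check: a FizzIntensity below 30 is Ineffective.
    if fizz_intensity < 30:
        return "Ineffective"
    # Everything else is Effective.
    return "Effective"


# (FizzIntensity, ColourShift, prediction recorded in this report)
round_292_examples = [
    (47.878643, 10.863845, "Effective"),
    (42.903545, 19.770008, "Ineffective"),
    (31.36187, 13.327494, "Effective"),
    (31.839703, 19.288298, "Ineffective"),
]

for fizz, colour, recorded in round_292_examples:
    print(fizz, colour, gemini_rule(fizz, colour), recorded)
```

Under this reading, the function reproduces the recorded prediction for each of the four listed examples.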

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      8                      1
Actual Ineffective                    6                      5

Accuracy: 0.650
Precision: 0.571
Recall: 0.889
F1 Score: 0.696

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 47.878643
	ColourShift: 10.863845


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 42.903545
	ColourShift: 19.770008


Examples for Falsely predicted Effective when it should have been Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 31.36187
	ColourShift: 13.327494


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 31.839703
	ColourShift: 19.288298

openaio3

Round ID: 317
Prompt used:
	Task: Decide whether the entity is “Effective” or “Ineffective”.
	You are given exactly two real‑valued fields per entity:
	• FizzIntensity  (in arbitrary units)
	• ColourShift    (in arbitrary units)
	
	Apply the following rules IN ORDER and output only the single word Effective or Ineffective.
	
	Rule 0 – Definitely Ineffective
	   If FizzIntensity < 25 → Ineffective
	
	Rule 1 – Very high fizz overrides everything
	   Else, if FizzIntensity ≥ 60 → Effective
	
	Rule 2 – Solid fizz with at least a small colour shift
	   Else, if FizzIntensity ≥ 44  AND  ColourShift ≥ 9 → Effective
	
	Rule 3 – Moderate fizz but large colour shift
	   Else, if ColourShift ≥ 20  AND  FizzIntensity ≥ 38 → Effective
	
	Default
	   Else → Ineffective
	
	Remember: evaluate the rules in sequence; stop at the first rule that matches. Output nothing except the final label.
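
The ordered rules in this prompt can be sketched as a first-match chain. The code below is a minimal illustration, assuming the thresholds are applied exactly as written; the function name `o3_rule` is hypothetical. The entity values are the three examples listed for this round.

```python
def o3_rule(fizz_intensity: float, colour_shift: float) -> str:
    """Hypothetical transcription of the round 317 prompt (name is illustrative)."""
    # Rule 0: very low fizz is always Ineffective.
    if fizz_intensity < 25:
        return "Ineffective"
    # Rule 1: very high fizz overrides everything else.
    if fizz_intensity >= 60:
        return "Effective"
    # Rule 2: solid fizz with at least a small colour shift.
    if fizz_intensity >= 44 and colour_shift >= 9:
        return "Effective"
    # Rule 3: moderate fizz but a large colour shift.
    if colour_shift >= 20 and fizz_intensity >= 38:
        return "Effective"
    # Default.
    return "Ineffective"


# (FizzIntensity, ColourShift, prediction recorded in this report)
round_317_examples = [
    (45.78967, 12.371225, "Effective"),
    (37.055344, 15.48838, "Ineffective"),
    (28.113564, 20.790554, "Ineffective"),
]

for fizz, colour, recorded in round_317_examples:
    print(fizz, colour, o3_rule(fizz, colour), recorded)
```

Because every rule that yields Effective requires FizzIntensity of at least 38, an entity such as (37.055344, 15.48838) falls through to the default, which is consistent with the low recall reported for this round.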

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      4                      5
Actual Ineffective                    0                     11

Accuracy: 0.750
Precision: 1.000
Recall: 0.444
F1 Score: 0.615

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 45.78967
	ColourShift: 12.371225


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 37.055344
	ColourShift: 15.48838


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 28.113564
	ColourShift: 20.790554

opus40

Round ID: 499
Prompt used:
	Analyze the chemical reaction data and classify it as either "Effective" or "Ineffective" based on the following rules:
	
	RULE 1: If FizzIntensity is greater than or equal to 43.0, classify as "Effective"
	
	RULE 2: If FizzIntensity is less than 28.0:
	   - If ColourShift is greater than or equal to 18.0, classify as "Effective"
	   - Otherwise, classify as "Ineffective"
	
	RULE 3: If FizzIntensity is between 28.0 and 43.0 (inclusive of 28.0, exclusive of 43.0):
	   - If FizzIntensity is greater than or equal to 40.0 AND ColourShift is greater than or equal to 14.0, classify as "Effective"
	   - If FizzIntensity is less than 35.0 AND ColourShift is greater than or equal to 17.0, classify as "Effective"
	   - If ColourShift is less than 13.0, classify as "Ineffective"
	   - Otherwise, classify as "Ineffective"
	
	Apply these rules in order and use the first rule that matches. The classification should be based solely on these numerical thresholds.
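
One way to read the three rules is as a partition of FizzIntensity into three bands, each with its own ColourShift test. The sketch below assumes that reading; the function name `opus_rule` is hypothetical. Note that inside RULE 3 the "ColourShift is less than 13.0" branch and the final "Otherwise" branch both yield Ineffective, so the code folds them into a single return. The entity values are the four examples listed for this round.

```python
def opus_rule(fizz_intensity: float, colour_shift: float) -> str:
    """Hypothetical transcription of the round 499 prompt (name is illustrative)."""
    # RULE 1: high fizz is Effective regardless of colour shift.
    if fizz_intensity >= 43.0:
        return "Effective"
    # RULE 2: low fizz is Effective only with a large colour shift.
    if fizz_intensity < 28.0:
        return "Effective" if colour_shift >= 18.0 else "Ineffective"
    # RULE 3: 28.0 <= FizzIntensity < 43.0.
    if fizz_intensity >= 40.0 and colour_shift >= 14.0:
        return "Effective"
    if fizz_intensity < 35.0 and colour_shift >= 17.0:
        return "Effective"
    # Covers both the "ColourShift < 13.0" branch and the final "Otherwise".
    return "Ineffective"


# (FizzIntensity, ColourShift, prediction recorded in this report)
round_499_examples = [
    (42.903545, 19.770008, "Effective"),
    (36.28945, 11.461653, "Ineffective"),
    (28.113564, 20.790554, "Effective"),
    (37.77209, 14.7023735, "Ineffective"),
]

for fizz, colour, recorded in round_499_examples:
    print(fizz, colour, opus_rule(fizz, colour), recorded)
```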

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      6                      3
Actual Ineffective                    7                      4

Accuracy: 0.500
Precision: 0.462
Recall: 0.667
F1 Score: 0.545

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 42.903545
	ColourShift: 19.770008


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 36.28945
	ColourShift: 11.461653


Examples for Falsely predicted Effective when it should have been Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 28.113564
	ColourShift: 20.790554


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 37.77209
	ColourShift: 14.7023735

Ensemble Confusion Matrix

                    Predicted Effective  Predicted Ineffective
Actual Effective                      5                      4
Actual Ineffective                    3                      8

Accuracy: 0.650
Precision: 0.625
Recall: 0.556
F1 Score: 0.588
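
The reported scores can be rechecked from the confusion matrices alone. The sketch below assumes "Effective" is the positive class, which is consistent with the precision and recall values in this report; the helper name `metrics` and the dictionary layout are illustrative only.

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Cells in the order (TP, FN, FP, TN), read row by row from each matrix.
matrices = {
    "gemini": (8, 1, 6, 5),
    "openaio3": (4, 5, 0, 11),
    "opus40": (6, 3, 7, 4),
    "ensemble": (5, 4, 3, 8),
}

for name, cells in matrices.items():
    scores = metrics(*cells)
    print(name, {key: f"{value:.3f}" for key, value in scores.items()})
```

Rounded to three decimals, these values agree with the figures listed for each model and for the ensemble.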