Ensemble

Dataset: potions

Models

Model Narratives

anthropic3710

Round ID: 286
Prompt used:
	To determine if an entity will be effective, apply the following rules in order:
	
	RULE 1: If FizzIntensity > 50, the entity is Effective.
	
	RULE 2: If FizzIntensity < 25 AND ColourShift < 15, the entity is Ineffective.
	
	RULE 3: Calculate the FizzEfficiency Ratio = FizzIntensity / ColourShift
	   - If FizzEfficiency Ratio > 3.5, the entity is Ineffective
	   - If FizzEfficiency Ratio < 1.2, the entity is Ineffective
	
	RULE 4: Calculate the Combined Effect Score = FizzIntensity + (ColourShift * 1.5)
	   - If Combined Effect Score ≥ 65, the entity is Effective
	   - If Combined Effect Score < 55, the entity is Ineffective
	
	RULE 5: Check for the Sweet Spot Condition:
	   - If FizzIntensity is between 40 and 50 (inclusive) AND ColourShift is between 12 and 20 (inclusive), the entity is Effective.
	
	RULE 6: If none of the above rules apply, the entity is Ineffective.
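
A minimal sketch of the rule set above as a deterministic function, assuming a literal first-match reading of RULE 1 to RULE 6. The function name and the handling of ColourShift = 0 (the ratio test is skipped) are assumptions; the prompt does not cover that case.

```python
def classify_ordered_rules(fizz_intensity: float, colour_shift: float) -> str:
    """Literal first-match reading of RULE 1 to RULE 6 (illustrative only)."""
    # RULE 1: very high fizz is Effective.
    if fizz_intensity > 50:
        return "Effective"

    # RULE 2: low fizz together with low colour shift is Ineffective.
    if fizz_intensity < 25 and colour_shift < 15:
        return "Ineffective"

    # RULE 3: a FizzEfficiency Ratio above 3.5 or below 1.2 is Ineffective.
    # Assumption: the prompt does not define the ratio for ColourShift == 0,
    # so this sketch skips the ratio test in that case.
    if colour_shift != 0:
        ratio = fizz_intensity / colour_shift
        if ratio > 3.5 or ratio < 1.2:
            return "Ineffective"

    # RULE 4: Combined Effect Score with a dead band between 55 and 65.
    combined = fizz_intensity + colour_shift * 1.5
    if combined >= 65:
        return "Effective"
    if combined < 55:
        return "Ineffective"

    # RULE 5: Sweet Spot Condition (both ranges inclusive).
    if 40 <= fizz_intensity <= 50 and 12 <= colour_shift <= 20:
        return "Effective"

    # RULE 6: default.
    return "Ineffective"


if __name__ == "__main__":
    print(classify_ordered_rules(42.094933, 12.042143))
    print(classify_ordered_rules(36.28945, 11.461653))
```

Under this literal reading, the entity listed below as falsely predicted Effective (FizzIntensity 37.77209, ColourShift 14.7023735) evaluates to Ineffective, so a deterministic implementation will not necessarily reproduce every recorded prediction.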

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      6                      3
Actual Ineffective                    4                      7

Accuracy: 0.650
Precision: 0.600
Recall: 0.667
F1 Score: 0.632

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 42.094933
	ColourShift: 12.042143


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 36.28945
	ColourShift: 11.461653


Examples for Falsely predicted Effective when it should have been Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 37.77209
	ColourShift: 14.7023735


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 36.499676
	ColourShift: 18.39799


gemini

Round ID: 292
Prompt used:
	If ColourShift is greater than 17, then predict 'Ineffective'. Otherwise, if FizzIntensity is less than 30, then predict 'Ineffective'. Otherwise predict 'Effective'
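
The two-threshold rule above can be sketched as a short function. The function name is hypothetical; the thresholds and their order are taken directly from the prompt.

```python
def classify_threshold_rule(fizz_intensity: float, colour_shift: float) -> str:
    """Two-threshold rule from the prompt above (illustrative only)."""
    # A colour shift above 17 is Ineffective regardless of fizz.
    if colour_shift > 17:
        return "Ineffective"
    # Otherwise, fizz below 30 is Ineffective.
    if fizz_intensity < 30:
        return "Ineffective"
    # Everything else is Effective.
    return "Effective"


if __name__ == "__main__":
    print(classify_threshold_rule(45.78967, 12.371225))
    print(classify_threshold_rule(42.903545, 19.770008))
```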

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      8                      1
Actual Ineffective                    6                      5

Accuracy: 0.650
Precision: 0.571
Recall: 0.889
F1 Score: 0.696

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 45.78967
	ColourShift: 12.371225


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 42.903545
	ColourShift: 19.770008


Examples for Falsely predicted Effective when it should have been Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 40.698505
	ColourShift: 14.4273


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 26.284357
	ColourShift: 21.189081


openaio3

Round ID: 317
Prompt used:
	Task: Decide whether the entity is “Effective” or “Ineffective”.
	You are given exactly two real‑valued fields per entity:
	• FizzIntensity  (in arbitrary units)
	• ColourShift    (in arbitrary units)
	
	Apply the following rules IN ORDER and output only the single word Effective or Ineffective.
	
	Rule 0 – Definitely Ineffective
	   If FizzIntensity < 25 → Ineffective
	
	Rule 1 – Very high fizz overrides everything
	   Else, if FizzIntensity ≥ 60 → Effective
	
	Rule 2 – Solid fizz with at least a small colour shift
	   Else, if FizzIntensity ≥ 44  AND  ColourShift ≥ 9 → Effective
	
	Rule 3 – Moderate fizz but large colour shift
	   Else, if ColourShift ≥ 20  AND  FizzIntensity ≥ 38 → Effective
	
	Default
	   Else → Ineffective
	
	Remember: evaluate the rules in sequence; stop at the first rule that matches. Output nothing except the final label.
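
A minimal sketch of Rule 0 to Rule 3 and the default as a first-match chain. The function name is hypothetical; the thresholds are copied from the prompt.

```python
def classify_sequential_rules(fizz_intensity: float, colour_shift: float) -> str:
    """First-match evaluation of Rule 0 to Rule 3 plus the default (illustrative only)."""
    # Rule 0: definitely Ineffective.
    if fizz_intensity < 25:
        return "Ineffective"
    # Rule 1: very high fizz overrides everything.
    if fizz_intensity >= 60:
        return "Effective"
    # Rule 2: solid fizz with at least a small colour shift.
    if fizz_intensity >= 44 and colour_shift >= 9:
        return "Effective"
    # Rule 3: moderate fizz but large colour shift.
    if colour_shift >= 20 and fizz_intensity >= 38:
        return "Effective"
    # Default.
    return "Ineffective"


if __name__ == "__main__":
    print(classify_sequential_rules(66.28547, 8.929057))
    print(classify_sequential_rules(36.28945, 11.461653))
```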

Confusion Matrix:
                    Predicted Effective  Predicted Ineffective
Actual Effective                      4                      5
Actual Ineffective                    0                     11

Accuracy: 0.750
Precision: 1.000
Recall: 0.444
F1 Score: 0.615

Examples for Correctly predicted Effective: (Correct answer: Effective, What the previous set of rules predicted: Effective)
  Entity Data:
	FizzIntensity: 66.28547
	ColourShift: 8.929057


Examples for Falsely predicted Ineffective when it should have been Effective: (Correct answer: Effective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 36.28945
	ColourShift: 11.461653


Examples for Correctly predicted Ineffective: (Correct answer: Ineffective, What the previous set of rules predicted: Ineffective)
  Entity Data:
	FizzIntensity: 31.36187
	ColourShift: 13.327494


Ensemble Confusion Matrix

          Predicted +  Predicted -
Actual +            7            2
Actual -            4            7

Accuracy 0.700, Precision 0.636, Recall 0.778, F1 0.700
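
The reported metrics can be recomputed from the confusion-matrix counts. The sketch below assumes "Effective" (or "+") is the positive class and reads the ensemble matrix as TP = 7, FN = 2, FP = 4, TN = 7; the helper names are hypothetical.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    precision: float
    recall: float
    f1: float


def metrics_from_counts(tp: int, fn: int, fp: int, tn: int) -> Metrics:
    """Compute the four summary metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Counts as (TP, FN, FP, TN), read row by row from the matrices in this report.
MATRICES = {
    "anthropic3710": (6, 3, 4, 7),
    "gemini": (8, 1, 6, 5),
    "openaio3": (4, 5, 0, 11),
    "ensemble": (7, 2, 4, 7),
}

if __name__ == "__main__":
    for name, counts in MATRICES.items():
        m = metrics_from_counts(*counts)
        print(f"{name}: accuracy {m.accuracy:.3f}, precision {m.precision:.3f}, "
              f"recall {m.recall:.3f}, F1 {m.f1:.3f}")
```

With these counts, the recomputed values agree with the figures reported for each model and for the ensemble to three decimal places.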