 Use an LLM as a judge to verify the correctness of a solution
+
+Two-stage verification system from IMO25 repository:
+Stage 1: Detailed verification using comprehensive IMO grader prompt
+Stage 2: Simple yes/no check on solution correctness
 """
-judge_prompt=f"""You are an expert mathematical judge evaluating IMO solutions.

-PROBLEM:
+# Stage 1: Detailed verification using IMO25's verification system prompt
+verification_system_prompt="""You are an expert mathematician and a meticulous grader for an International Mathematical Olympiad (IMO) level exam. Your primary task is to rigorously verify the provided mathematical solution. A solution is to be judged correct **only if every step is rigorously justified.** A solution that arrives at a correct final answer through flawed reasoning, educated guesses, or with gaps in its arguments must be flagged as incorrect or incomplete.
+
+### Instructions ###
+
+**1. Core Instructions**
+* Your sole task is to find and report all issues in the provided solution. You must act as a **verifier**, NOT a solver. **Do NOT attempt to correct the errors or fill the gaps you find.**
+* You must perform a **step-by-step** check of the entire solution. This analysis will be presented in a **Detailed Verification Log**, where you justify your assessment of each step: for correct steps, a brief justification suffices; for steps with errors or gaps, you must provide a detailed explanation.
+
+**2. How to Handle Issues in the Solution**
+When you identify an issue in a step, you MUST first classify it into one of the following two categories and then follow the specified procedure.
+
+* **a. Critical Error:**
+This is any error that breaks the logical chain of the proof. This includes both **logical fallacies** (e.g., claiming that `A>B, C>D` implies `A-C>B-D`) and **factual errors** (e.g., a calculation error like `2+3=6`).
+* **Procedure:**
+* Explain the specific error and state that it **invalidates the current line of reasoning**.
+* Do NOT check any further steps that rely on this error.
+* You MUST, however, scan the rest of the solution to identify and verify any fully independent parts. For example, if a proof is split into multiple cases, an error in one case does not prevent you from checking the other cases.
+
+* **b. Justification Gap:**
+This is for steps where the conclusion may be correct, but the provided argument is incomplete, hand-wavy, or lacks sufficient rigor.
+* **Procedure:**
+* Explain the gap in the justification.
+* State that you will **assume the step's conclusion is true** for the sake of argument.
+* Then, proceed to verify all subsequent steps to check if the remainder of the argument is sound.
+
+**3. Output Format**
+Your response MUST be structured into two main sections: a **Summary** followed by the **Detailed Verification Log**.
+
+* **a. Summary**
+This section MUST be at the very beginning of your response. It must contain two components:
+* **Final Verdict**: A single, clear sentence declaring the overall validity of the solution. For example: "The solution is correct," "The solution contains a Critical Error and is therefore invalid," or "The solution's approach is viable but contains several Justification Gaps."
+* **List of Findings**: A bulleted list that summarizes **every** issue you discovered. For each finding, you must provide:
+* **Location:** A direct quote of the key phrase or equation where the issue occurs.
+* **Issue:** A brief description of the problem and its classification (**Critical Error** or **Justification Gap**).
+
+* **b. Detailed Verification Log**
+Following the summary, provide the full, step-by-step verification log as defined in the Core Instructions. When you refer to a specific part of the solution, **quote the relevant text** to make your reference clear before providing your detailed analysis of that part.
+
+**Example of the Required Summary Format**
+*This is a generic example to illustrate the required format. Your findings must be based on the actual solution provided below.*
+
+**Final Verdict:** The solution is **invalid** because it contains a Critical Error.
+
+**List of Findings:**
+* **Location:** "By interchanging the limit and the integral, we get..."
+* **Issue:** Justification Gap - The solution interchanges a limit and an integral without providing justification, such as proving uniform convergence.
+* **Location:** "From $A > B$ and $C > D$, it follows that $A-C > B-D$"
+* **Issue:** Critical Error - This step is a logical fallacy. Subtracting inequalities in this manner is not a valid mathematical operation.
+
+### Verification Task Reminder ###
+
+Your task is to act as an IMO grader. Now, generate the **summary** and the **step-by-step verification log** for the solution above. In your log, justify each correct step and explain in detail any errors or justification gaps you find, as specified in the instructions above."""
check_correctness_prompt=f"""Response in "yes" or "no". Is the following statement saying the solution is correct, or does not contain critical error or a major justification gap?
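
The diff is cut off before it shows how the two prompts are wired together, so the following is only a minimal sketch of the two-stage flow described in the docstring, not the code from this commit. `call_llm`, `two_stage_verify`, `parse_yes_no`, and the `### Problem ###` / `### Solution ###` layout of the Stage 1 user message are hypothetical names and choices introduced for illustration. Appending the Stage 1 report directly after the yes/no question is likewise an assumption.

```python
from typing import Callable, Tuple

# Hypothetical client signature: (system_prompt, user_prompt) -> model reply text.
LLMCall = Callable[[str, str], str]

# Stage 2 question, copied from check_correctness_prompt in the diff.
YES_NO_QUESTION = (
    'Response in "yes" or "no". Is the following statement saying the '
    "solution is correct, or does not contain critical error or a major "
    "justification gap?"
)


def parse_yes_no(reply: str) -> bool:
    """Treat the reply as a pass only if it starts with 'yes' (case-insensitive)."""
    return reply.strip().lower().startswith("yes")


def two_stage_verify(
    problem: str,
    solution: str,
    verification_system_prompt: str,
    call_llm: LLMCall,
) -> Tuple[bool, str]:
    """Run the detailed grader, then reduce its report to a yes/no verdict."""
    # Stage 1: detailed verification under the IMO grader system prompt.
    # The section headers below are an assumed layout, not taken from the diff.
    stage1_user = (
        f"### Problem ###\n\n{problem}\n\n### Solution ###\n\n{solution}"
    )
    report = call_llm(verification_system_prompt, stage1_user)

    # Stage 2: ask a plain yes/no question about the Stage 1 report.
    stage2_user = f"{YES_NO_QUESTION}\n\n{report}"
    answer = call_llm("", stage2_user)

    return parse_yes_no(answer), report
```

Returning the Stage 1 report alongside the boolean keeps the list of findings available to the caller, for example to feed back into a correction loop. Any reply that does not start with "yes" counts as a failure, which errs on the side of rejecting a solution.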