Recreating a Lite Edition of "Strawberry" in 150 Lines of Code, with Web Access

小火箭 • January 18, 2026, 5:27 am • 小火箭, 小火箭下载, 小火箭官网, 小火箭节点

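Let me start with a personal opinion that some will dispute:

o1 is less a model than an agent with built-in task planning and reflection.

The biggest advantage of this kind of agent is its reasoning ability: it trades time for performance and tokens for accuracy. If you are interested, here are two of my earlier posts:

《实用至上：智能体/Agent 是什么》 ("Practicality First: What Is an Agent"): in that post I explained where agents come from and the paths being explored.

《OpenAI「草莓」今秋发布，随后是「猎户座」》 ("OpenAI's 'Strawberry' Ships This Fall, Followed by 'Orion'"): in that post I predicted o1's form and behavior (an agent-based program).

I certainly think o1 is strong and useful: with progress on large models slowing down, this approach effectively raises the quality of model output. For the broad population of AI users, it makes working with models more efficient. (The even broader population does not use AI at all.)

But I am just as sure that putting o1 into head-to-head comparisons of large models is highly inappropriate, especially 0-shot comparisons.

Put another way: it makes little sense to compare the accuracy of an exam paper that was rechecked for two and a half years against one that was handed in on time.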
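
To put a rough number on that analogy, here is a minimal sketch under two assumptions that are mine, not the post's: every attempt succeeds independently with the same probability p, and the checker always recognizes a correct answer. Under those idealized assumptions, allowing k attempts lifts the success rate from p to 1 - (1 - p)^k. The function name and the example numbers are illustrative only.

```python
def success_with_retries(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts is correct.

    Assumes every attempt succeeds with the same probability p and that a
    perfect checker stops at the first correct answer (both are idealizations).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Compare a single shot with a few retry budgets for the same base accuracy.
    for attempts in (1, 3, 10):
        print(attempts, round(success_with_retries(0.6, attempts), 4))
```

With a hypothetical single-shot accuracy of 0.6, three checked attempts already give about 0.936, so a 0-shot score and a retry-until-verified score are measuring different things.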

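In this post I try to build a web-connected, lite edition of "Strawberry" in 150 lines of code.

I call it the lite edition because:

It includes only the most basic task planning and reflection.

There is no fine-tuning, and I did not even use an OpenAI model; I picked Zhipu's free glm-4-flash.

I added WebSearch, so problems with known answers can be solved faster.

Note: the original o1 cannot search the web or use any tools.

Its performance is far below Strawberry's and there is no built-in CoT. It is only a demo that imitates the behavior with crude methods.

The result looks like this (answering which is larger, 9.8 or 9.11):

Next, I will show the code first and then explain how it works.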

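The code

One note first: I am running this in Colab, hence api_key=userdata.get('Key_Zhipu').

For web access I use WebPilot's search API, hence the {watt(problem)} in the prompt.

Change both of these to suit your own setup.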
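
If you are not on Colab, one way to swap out those two pieces is sketched below. Everything named here is my assumption rather than part of the original: the environment variable ZHIPU_API_KEY, the helper get_api_key, and the stub search function. The real WebPilot call is not shown in this post, so the stub only returns a fixed string and should be replaced with your own search request.

```python
import os
from typing import Callable, Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None,
                name: str = "ZHIPU_API_KEY") -> str:
    """Read the API key from an environment mapping (hypothetical variable name)."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set {name} before running the script")
    return key


def stub_search(query: str) -> str:
    """Stand-in for watt(); replace with a real search call that returns text."""
    return f"(no search backend configured for: {query})"


def build_analysis_context(problem: str,
                           search: Callable[[str], str] = stub_search) -> str:
    """Format the problem and the search text the way the analysis prompt expects."""
    return f"Problem:\n{problem}\n\nWeb Search:\n{search(problem)}"


if __name__ == "__main__":
    print(build_analysis_context("9.8 和 9.11 谁大"))
```

Passing the search function in as a parameter keeps the prompt-building code independent of whichever search provider you end up using.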

from dataclasses import dataclass, field
import re
from typing import List, Optional

from google.colab import userdata
from IPython.display import display, Markdown
from openai import OpenAI

# Zhipu exposes an OpenAI-compatible endpoint, so the OpenAI client is reused.
# The key is read from Colab secrets; change this if you run elsewhere.
client = OpenAI(
    api_key=userdata.get('Key_Zhipu'),
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)
model = "glm-4-flash"


def watt(query: str) -> str:
    """Placeholder for the WebPilot search call, which this post does not show.

    Replace the body with your own search request that returns text.
    """
    return "(web search not configured)"


def chat(messages: List[dict]) -> str:
    """Send the conversation to the model and return the reply text."""
    completion = client.chat.completions.create(model=model, messages=messages)
    return (completion.choices[0].message.content or "").strip()


# Define data models
@dataclass
class ThoughtStep:
    step_answer: str
    is_completed: bool
    hint: str


@dataclass
class ReasoningProcess:
    initial_problem: str
    steps: List[ThoughtStep] = field(default_factory=list)
    final_answer: Optional[str] = None


def solve_problem(problem: str, max_attempts: int = 10) -> ReasoningProcess:
    """Solve a problem using multi-step reasoning, planning, and reflection."""
    reasoning_process = ReasoningProcess(initial_problem=problem)

    # Step 1: Analyze the problem and plan
    analysis_prompt = f"""You are an AI assistant that excels at solving complex STEM problems using multi-step reasoning.
When given a problem, first analyze it, think about possible solution methods, and plan the subsequent steps to solve it.

Problem:
{problem}

Web Search:
{watt(problem)}

Provide your analysis and step-by-step plan in plain text."""
    display(Markdown("**The agent is thinking...**"))
    hint = chat([{"role": "user", "content": analysis_prompt}])

    # Display AI's initial analysis
    display(Markdown(f"### AI Initial Analysis:\n{hint}\n"))
    reasoning_process.steps.append(
        ThoughtStep(step_answer="", is_completed=False, hint=hint)
    )

    messages = [
        {"role": "system", "content": "You are an AI assistant continuing the problem-solving process."},
        {"role": "user", "content": "Giving a thought about this problem: " + problem},
        {"role": "assistant", "content": hint},
        {"role": "user", "content": "Solve it with this thought, and give the final answer"},
    ]

    # Step 2: Attempt, validate, and retry with the validator's hint
    for attempt in range(1, max_attempts + 1):
        # Phase 1: Generate the step answer based on the thought
        step_answer = chat(messages)
        display(Markdown(f"### Step Answer (Attempt {attempt}):\n{step_answer}\n"))

        # Phase 2: Validate the step answer using XML format
        validation_prompt = f"""You are an AI validator. Check if the following step answer solves the problem correctly:

Problem:
{problem}

Step Answer:
{step_answer}

Respond in XML format as follows:
<response>
    <is_correct>Is this answer 100% correct? Return true or false</is_correct>
    <hint>If the answer is incorrect, provide a new thought or hint.</hint>
</response>"""
        display(Markdown(f"**AI is validating step answer (Attempt {attempt})...**"))
        response = chat([{"role": "user", "content": validation_prompt}])

        # Parse the XML-style reply. Read the <is_correct> tag itself instead of
        # searching the whole reply for "true", which would also match hint text.
        verdict = re.search(
            r"<is_correct>\s*(true|false)\s*</is_correct>", response, re.IGNORECASE
        )
        is_correct = verdict is not None and verdict.group(1).lower() == "true"
        hint_match = re.search(r"<hint>(.*?)</hint>", response, re.DOTALL)
        hint = (hint_match.group(1).strip() if hint_match else "") or "No hint provided"

        # Update reasoning process
        reasoning_process.steps.append(
            ThoughtStep(step_answer=step_answer, is_completed=is_correct, hint=hint)
        )
        messages.append({"role": "assistant", "content": step_answer})
        if is_correct:
            break  # Exit loop if the step answer is correct
        messages.append({"role": "user", "content": "Not correct, try with this: " + hint})

    # Step 3: Final answer
    messages.append({
        "role": "user",
        "content": "Based on your reasoning, provide the final answer to the problem "
                   "and return it in the same language as the following: "
                   f"{reasoning_process.initial_problem}",
    })
    reasoning_process.final_answer = chat(messages)

    # Display the final answer
    display(Markdown(f"## Final Answer:\n{reasoning_process.final_answer}"))
    return reasoning_process


def display_reasoning_process(process: ReasoningProcess) -> None:
    """Display the reasoning process details."""
    display(Markdown(f"## Problem:\n{process.initial_problem}\n"))
    for idx, step in enumerate(process.steps, 1):
        display(Markdown(
            f"### Step {idx}:\n**Hint**: {step.hint}\n\n**Is Completed**: {step.is_completed}\n"
        ))
    if process.final_answer:
        display(Markdown(f"## Final Answer:\n{process.final_answer}"))
    else:
        display(Markdown("## Final Answer: Not determined yet."))


# Example usage
if __name__ == "__main__":
    problem_text = """9.8 和 9.11 谁大"""
    # Solve the problem
    reasoning = solve_problem(problem_text)
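
The step that reads the validator's reply is easy to get wrong and worth testing without calling the API. Below is a minimal sketch, assuming the reply follows the XML format requested in the validation prompt; parse_validation is a hypothetical helper name, and treating any malformed reply as "not correct" is my choice rather than something the post specifies.

```python
import re
from typing import Tuple


def parse_validation(reply: str) -> Tuple[bool, str]:
    """Extract (is_correct, hint) from the validator's XML-style reply."""
    # Only the content of <is_correct> counts; "true" elsewhere is ignored.
    verdict = re.search(r"<is_correct>\s*(true|false)\s*</is_correct>",
                        reply, re.IGNORECASE)
    is_correct = verdict is not None and verdict.group(1).lower() == "true"
    # DOTALL lets a hint span several lines.
    hint_match = re.search(r"<hint>(.*?)</hint>", reply, re.DOTALL)
    hint = hint_match.group(1).strip() if hint_match else ""
    return is_correct, hint or "No hint provided"


if __name__ == "__main__":
    sample = ("<response><is_correct>false</is_correct>"
              "<hint>Compare 9.80 with 9.11.</hint></response>")
    print(parse_validation(sample))
```

A plain substring check for "true" would accept a reply whose hint merely contains the word, which is why the sketch looks inside the tag only.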

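How it works

First, I use glm-4-flash for one reason only: it is free.

The whole flow has three steps:

Step 1: Task planning. The agent first looks up material related to the problem online, analyzes it together with the user's question, and outputs a plan for solving the problem.

Step 2: Task attempt. With the plan in hand, the agent tries to solve the problem:

If it solves the problem (or exceeds the maximum number of retries), it moves on to step 3;

If it does not, it reflects on why the attempt fell short, browbeats itself, and retries.

Step 3: Wrap-up. It summarizes the attempts above and outputs the final answer.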
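
The three steps above can be sketched as a small control loop. This is a simplified skeleton rather than the full program: the model calls are passed in as plain callables (plan, attempt, validate, and finalize are names I chose for illustration), so the flow can be run and tested with fake functions before a real API is wired in.

```python
from typing import Callable, List, Tuple


def run_agent(problem: str,
              plan: Callable[[str], str],
              attempt: Callable[[str, str], str],
              validate: Callable[[str, str], Tuple[bool, str]],
              finalize: Callable[[str, List[str]], str],
              max_attempts: int = 10) -> str:
    """Plan once, retry until validated or out of attempts, then summarize."""
    hint = plan(problem)                      # Step 1: task planning
    answers: List[str] = []
    for _ in range(max_attempts):             # Step 2: attempt and reflect
        answer = attempt(problem, hint)
        answers.append(answer)
        ok, feedback = validate(problem, answer)
        if ok:
            break
        hint = feedback                       # retry with the validator's hint
    return finalize(problem, answers)         # Step 3: wrap-up


if __name__ == "__main__":
    # Fake "model": the first attempt is wrong, the second follows the hint.
    def fake_attempt(problem: str, hint: str) -> str:
        return "9.8" if "tenths" in hint else "9.11"

    def fake_validate(problem: str, answer: str) -> Tuple[bool, str]:
        return answer == "9.8", "compare the tenths digit first"

    result = run_agent(
        "Which is larger, 9.8 or 9.11?",
        plan=lambda p: "compare the two numbers",
        attempt=fake_attempt,
        validate=fake_validate,
        finalize=lambda p, answers: f"{answers[-1]} after {len(answers)} attempt(s)",
    )
    print(result)  # prints "9.8 after 2 attempt(s)"
```

Note that the loop also falls through to the wrap-up step when the retry budget runs out, so the final answer is not guaranteed to have passed validation.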

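In the end, for the question "which is larger, 9.8 or 9.11", it outputs this (including the thinking process):

The method behind this kind of program is to have the AI keep browbeating itself, or to bring in a second AI to browbeat the one doing the work, so that it keeps trying, checking, and improving until the job is delivered (sound familiar?).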

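What this shows

Let me look at this from a few angles:

o1 is not mysterious; you can build one too (lite edition only).

Getting to o1's level still takes work on several fronts, both in agent engineering and in training the model (internalizing CoT).

o1 will be very useful, especially for synthetic data and for solving complex tasks.

To some extent, it suggests that training of the models themselves has hit a bottleneck.

Prompt engineering will gradually fade.

Also, you are welcome to discuss this one: 《对于 AI & AGI，我有 3 个问题》 ("I Have 3 Questions About AI & AGI").

And one more thing: later on I plan to organize a proper "o1 algorithm challenge", and you are welcome to join when it happens.

(Let me go beg for some prize money first, ahhhhhh)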

Copyright notice:
Author: 小火箭
Link: https://www.xiaohuojian9.top/225.html
Source: 小火箭官网
Copyright belongs to the author. Do not repost without permission.
