AI创想

Title: Quickly Building DeepResearch, a Next-Generation Retrieval-Augmented Research Tool, with LangGraph

Author: 米落枫    Time: yesterday 23:16
Title: Quickly Building DeepResearch, a Next-Generation Retrieval-Augmented Research Tool, with LangGraph
Source: CSDN blog

Project Overview

This project is an autonomous research agent built on the LangGraph framework that generates an in-depth research report from a user query. A three-stage pipeline (plan, search, write) covers the full loop from question understanding to report generation.



Project Structure
  langgpraph_deepresearch/
  ├── .env                  # environment variable configuration
  ├── langgraph.json        # LangGraph configuration file
  ├── requirements.txt      # project dependencies
  └── graph.py              # core graph implementation
Environment Setup

Dependency list (requirements.txt)
  langgraph
  langchain-core
  langchain-deepseek
  python-dotenv
  langsmith
  pydantic
  matplotlib
  seaborn
  pandas
  IPython
  langchain-mcp-adapters
  uv
  # also imported by graph.py:
  langchain
  langchain-tavily
  langchain-openai
Environment variables (.env)
  DEEPSEEK_API_KEY='****'
  TAVILY_API_KEY='tvly-dev-******'
  LANGSMITH_API_KEY='lsv2_pt_**********'
  LANGSMITH_TRACING=true
  LANGSMITH_PROJECT='langgpraph_deepresearch'
LangGraph configuration (langgraph.json)
  {
    "dependencies": ["./"],
    "graphs": {
      "langgpraph_deepresearch": "./graph.py:graph"
    },
    "env": ".env"
  }
Core Code Architecture

1. Data model definitions
  class WebSearchItem(BaseModel):
      query: str
      reason: str

  class WebSearchPlan(BaseModel):
      searches: List[WebSearchItem]

  class ReportData(BaseModel):
      short_summary: str
      markdown_report: str
      follow_up_questions: List[str]
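The planner is expected to emit JSON matching the `WebSearchPlan` schema, and (as shown later in `planner_node`) tolerate a degenerate response where `searches` is a bare list of query strings. A stdlib-only sketch of that parsing logic, using a hypothetical helper `parse_plan` as a stand-in for the pydantic validation:

```python
import json

def parse_plan(raw: str) -> list:
    """Parse planner output into [{'query': ..., 'reason': ...}] items.

    Mirrors planner_node's fallback: if an element of 'searches' is a
    bare string instead of an object, wrap it with an empty reason.
    (Hypothetical stdlib stand-in for WebSearchPlan.model_validate.)
    """
    data = json.loads(raw)
    items = []
    for s in data["searches"]:
        if isinstance(s, str):
            items.append({"query": s, "reason": ""})
        else:
            items.append({"query": s["query"], "reason": s.get("reason", "")})
    return items

# Well-formed planner output
good = '{"searches": [{"query": "AI trends 2024", "reason": "background"}]}'
# Degenerate output: bare query strings
bare = '{"searches": ["AI trends 2024", "LLM benchmarks"]}'

print(parse_plan(good))  # → [{'query': 'AI trends 2024', 'reason': 'background'}]
print(parse_plan(bare))  # → two items with empty reasons
```

In the real pipeline this normalization is what lets `json_mode` output flow into `WebSearchPlan` even when the model ignores part of the schema.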
2. Planner
  PLANNER_INSTRUCTIONS = (
      "You are a helpful research assistant. Given a query, come up with 5-7 web searches "
      "to perform to best answer the query. "
      "Return **ONLY valid JSON** that follows this schema: "
      '{{"searches": [ {{"query": "example", "reason": "why"}} ]}}'
  )
  planner_chain = (
      planner_prompt
      | model.with_structured_output(WebSearchPlan, method="json_mode")
  )
3. Search Agent
  SEARCH_INSTRUCTIONS = (
      "You are a research assistant. Given a search term, you search the web for that term and "
      "produce a concise summary of the results. The summary must be 2-3 paragraphs and less than 300 "
      "words. Capture the main points. Write succinctly; no need to have complete sentences or good "
      "grammar. This will be consumed by someone synthesizing a report, so it's vital you capture the "
      "essence and ignore any fluff. Do not include any additional commentary other than the summary "
      "itself."
  )
  search_tool = TavilySearch(max_results=5, topic="general")
  search_agent = create_react_agent(
      model=model,
      prompt=SEARCH_INSTRUCTIONS,
      tools=[search_tool],
  )
4. Writer
  WRITER_PROMPT = (
      "You are a senior researcher tasked with writing a cohesive report for a research query. "
      "You will be provided with the original query and some initial research.\n\n"
      "① 先给出完整的大纲。\n"
      "② 然后生成正式报告。\n"
      "**写作要求**:\n"
      "· 报告使用 Markdown 格式;\n"
      "· 章节清晰,层次分明;\n"
      "· markdown_report部分至少包含2000中文字符(注意需要用中文进行回复);\n"
      "· 内容丰富,论据充分,可加入引用和数据,允许分段、添加引用、表格等;\n"
      "· 最终仅返回 JSON:\n"
      '{{"short_summary": "...", "markdown_report": "...", "follow_up_questions": ["..."]}}'
  )
  writer_chain = writer_prompt | model.with_structured_output(ReportData, method="json_mode")
LangGraph Node Implementations

1. Planner node (planner_node)
  def planner_node(state: MessagesState) -> Command:
      user_query = state["messages"][-1].content
      raw = planner_chain.invoke({"query": user_query})
      print(raw)
      try:
          plan = WebSearchPlan.model_validate(raw)
      except ValidationError:
          # Fallback: the model may return a bare list of query strings
          if isinstance(raw, dict) and isinstance(raw.get("searches"), list):
              plan = WebSearchPlan(
                  searches=[WebSearchItem(query=q, reason="") for q in raw["searches"]]
              )
          else:
              raise
      return Command(
          goto="search_node",
          update={"messages": [AIMessage(content=plan.model_dump_json())], "plan": plan},
      )
2. Search node (search_node)
  def search_node(state: MessagesState) -> Command:
      plan_json = state["messages"][-1].content
      plan = WebSearchPlan.model_validate_json(plan_json)
      summaries = []
      for item in plan.searches:  # searches are processed serially
          run = search_agent.invoke({"messages": [HumanMessage(content=item.query)]})
          msgs = run["messages"]
          readable = next(
              (m for m in reversed(msgs) if isinstance(m, (ToolMessage, AIMessage))),
              msgs[-1],
          )
          summaries.append(f"## {item.query}\n\n{readable.content}")

      combined = "\n\n".join(summaries)
      return Command(goto="writer_node", update={"messages": [AIMessage(content=combined)]})
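The search node runs each planned query serially, so latency grows linearly with the plan size. A stdlib-only sketch of running the searches concurrently with a thread pool, using a stub `fake_search` in place of `search_agent.invoke` (whether the real agent client is safe to call from multiple threads depends on the underlying model SDK, so treat this as an assumption to verify):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_search(query: str) -> str:
    """Stub stand-in for search_agent.invoke; returns a summary string."""
    return f"summary for {query}"

def run_searches(queries: list, max_workers: int = 4) -> str:
    # pool.map preserves input order, so the combined output keeps the
    # planner's ordering even though the calls overlap in time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        summaries = [
            f"## {q}\n\n{s}" for q, s in zip(queries, pool.map(fake_search, queries))
        ]
    return "\n\n".join(summaries)

print(run_searches(["AI trends 2024", "LLM benchmarks"]))
```

Swapping `fake_search` for a thin wrapper around `search_agent.invoke` would turn the node's O(n) wall-clock cost into roughly O(n / max_workers).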
3. Writer node (writer_node)
  def writer_node(state: MessagesState) -> Command:
      original_query = state["messages"][0].content
      combined_summary = state["messages"][-1].content

      writer_input = f"原始问题:{original_query}\n\n搜索摘要:\n{combined_summary}"

      report = writer_chain.invoke({"content": writer_input})
      return Command(
          goto=END,
          update={
              "messages": [
                  AIMessage(content=json.dumps(report.model_dump(), ensure_ascii=False, indent=4))
              ]
          },
      )
Graph Construction and Execution
  builder = StateGraph(MessagesState)
  builder.add_node("planner_node", planner_node)
  builder.add_node("search_node", search_node)
  builder.add_node("writer_node", writer_node)
  builder.add_edge(START, "planner_node")
  builder.add_edge("planner_node", "search_node")
  builder.add_edge("search_node", "writer_node")
  builder.add_edge("writer_node", END)
  graph = builder.compile()
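Because every edge here is unconditional, the compiled graph behaves like sequential composition of three state transformers over a shared message list. A stdlib-only sketch of that state threading, with stub functions standing in for the real nodes:

```python
def planner(state: dict) -> dict:
    # stub: turn the latest user message into a plan message
    query = state["messages"][-1]
    return {"messages": state["messages"] + [f"plan({query})"]}

def searcher(state: dict) -> dict:
    # stub: turn the plan into combined search summaries
    plan = state["messages"][-1]
    return {"messages": state["messages"] + [f"summaries({plan})"]}

def writer(state: dict) -> dict:
    # stub: turn the summaries into the final report
    summaries = state["messages"][-1]
    return {"messages": state["messages"] + [f"report({summaries})"]}

# START -> planner_node -> search_node -> writer_node -> END
state = {"messages": ["analyze AI trends"]}
for node in (planner, searcher, writer):
    state = node(state)

print(state["messages"][-1])  # → report(summaries(plan(analyze AI trends)))
```

The value of expressing this in LangGraph rather than as a plain loop is that each step is checkpointable and observable (e.g. in LangSmith), and conditional edges or retries can be added later without restructuring the code.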
Run Flow

Key Features

Performance Analysis

Complete project code (graph.py)
  # A more complete, self-contained graph implementation
  import json
  from typing import List

  from dotenv import load_dotenv
  from pydantic import BaseModel, ValidationError
  from langchain_deepseek import ChatDeepSeek
  from langchain.prompts import ChatPromptTemplate
  from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
  from langgraph.graph import StateGraph, MessagesState, START, END
  from langgraph.types import Command
  from langgraph.prebuilt import create_react_agent
  from langchain_tavily import TavilySearch
  from langchain_openai import ChatOpenAI

  load_dotenv()

  model = ChatDeepSeek(model="deepseek-chat", max_tokens=8000)
  # model = ChatOpenAI(model="gpt-4.1", max_tokens=32000)

  # -------- 1) Planner chain --------
  PLANNER_INSTRUCTIONS = (
      "You are a helpful research assistant. Given a query, come up with 5-7 web searches "
      "to perform to best answer the query. "
      "Return **ONLY valid JSON** that follows this schema: "
      '{{"searches": [ {{"query": "example", "reason": "why"}} ]}}'
  )

  class WebSearchItem(BaseModel):
      query: str
      reason: str

  class WebSearchPlan(BaseModel):
      searches: List[WebSearchItem]

  planner_prompt = ChatPromptTemplate.from_messages(
      [("system", PLANNER_INSTRUCTIONS), ("human", "{query}")]
  )
  planner_chain = (
      planner_prompt
      | model.with_structured_output(WebSearchPlan, method="json_mode")  # force JSON output
  )

  # -------- 2) Search agent --------
  SEARCH_INSTRUCTIONS = (
      "You are a research assistant. Given a search term, you search the web for that term and "
      "produce a concise summary of the results. The summary must be 2-3 paragraphs and less than 300 "
      "words. Capture the main points. Write succinctly; no need to have complete sentences or good "
      "grammar. This will be consumed by someone synthesizing a report, so it's vital you capture the "
      "essence and ignore any fluff. Do not include any additional commentary other than the summary "
      "itself."
  )
  search_tool = TavilySearch(max_results=5, topic="general")
  search_agent = create_react_agent(
      model=model,
      prompt=SEARCH_INSTRUCTIONS,
      tools=[search_tool],
  )

  # -------- 3) Writer chain --------
  WRITER_PROMPT = (
      "You are a senior researcher tasked with writing a cohesive report for a research query. "
      "You will be provided with the original query and some initial research.\n\n"
      "① 先给出完整的大纲。\n"
      "② 然后生成正式报告。\n"
      "**写作要求**:\n"
      "· 报告使用 Markdown 格式;\n"
      "· 章节清晰,层次分明;\n"
      "· markdown_report部分至少包含2000中文字符(注意需要用中文进行回复);\n"
      "· 内容丰富,论据充分,可加入引用和数据,允许分段、添加引用、表格等;\n"
      "· 最终仅返回 JSON:\n"
      '{{"short_summary": "...", "markdown_report": "...", "follow_up_questions": ["..."]}}'
  )

  class ReportData(BaseModel):
      short_summary: str
      markdown_report: str
      follow_up_questions: List[str]

  writer_prompt = ChatPromptTemplate.from_messages(
      [("system", WRITER_PROMPT), ("human", "{content}")]
  )
  writer_chain = writer_prompt | model.with_structured_output(ReportData, method="json_mode")

  # ------------- LangGraph nodes -------------
  def planner_node(state: MessagesState) -> Command:
      user_query = state["messages"][-1].content
      raw = planner_chain.invoke({"query": user_query})
      print(raw)
      try:
          plan = WebSearchPlan.model_validate(raw)
      except ValidationError:
          # Fallback: the model may return a bare list of query strings
          if isinstance(raw, dict) and isinstance(raw.get("searches"), list):
              plan = WebSearchPlan(
                  searches=[WebSearchItem(query=q, reason="") for q in raw["searches"]]
              )
          else:
              raise
      return Command(
          goto="search_node",
          update={"messages": [AIMessage(content=plan.model_dump_json())], "plan": plan},
      )

  def search_node(state: MessagesState) -> Command:
      plan_json = state["messages"][-1].content
      plan = WebSearchPlan.model_validate_json(plan_json)
      summaries = []
      for item in plan.searches:  # searches run serially
          run = search_agent.invoke({"messages": [HumanMessage(content=item.query)]})
          msgs = run["messages"]
          readable = next(
              (m for m in reversed(msgs) if isinstance(m, (ToolMessage, AIMessage))),
              msgs[-1],
          )
          summaries.append(f"## {item.query}\n\n{readable.content}")
      combined = "\n\n".join(summaries)
      return Command(goto="writer_node", update={"messages": [AIMessage(content=combined)]})

  def writer_node(state: MessagesState) -> Command:
      original_query = state["messages"][0].content
      combined_summary = state["messages"][-1].content

      writer_input = f"原始问题:{original_query}\n\n搜索摘要:\n{combined_summary}"

      report = writer_chain.invoke({"content": writer_input})
      return Command(
          goto=END,
          update={
              "messages": [
                  AIMessage(content=json.dumps(report.model_dump(), ensure_ascii=False, indent=4))
              ]
          },
      )

  # Build the graph
  builder = StateGraph(MessagesState)
  builder.add_node("planner_node", planner_node)
  builder.add_node("search_node", search_node)
  builder.add_node("writer_node", writer_node)
  # Define the edges between nodes
  builder.add_edge(START, "planner_node")
  builder.add_edge("planner_node", "search_node")
  builder.add_edge("search_node", "writer_node")
  builder.add_edge("writer_node", END)
  # Compile the graph
  graph = builder.compile()
Usage Example
  import json

  # Run the research pipeline
  result = graph.invoke(
      {"messages": [HumanMessage(content="分析2024年人工智能发展趋势")]}
  )

  # Parse the final report
  report_data = json.loads(result["messages"][-1].content)
  print("摘要:", report_data["short_summary"])
  print("报告:", report_data["markdown_report"])
Deployment

For project deployment, see: quickly deploying a project with langgraph-cli



Extension Directions

Summary

The DeepResearch project shows how to build a complete AI research agent on LangGraph: a three-stage pipeline covers the full loop from question understanding to in-depth report generation. The code is clearly structured, fully configured, and readily extensible in practice.

Reference: 赋范空间 LLM technology community – DeepResearch Application Development in Practice

Original article: https://blog.csdn.net/Galen_xia/article/details/149817138



