Reference package

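The repository now contains a small runnable code skeleton: agent_runtime_ref

Its goal is not to become a production-grade framework but to serve as the minimal code anchor for Part VII and Part VIII of this book.

The package is deliberately positioned as an implementation anchor, not a parallel product. Its value is that readers can see the runnable structure behind the book's arguments without the whole project turning back into a framework manual.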

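This page makes three commitments:

  • it will not replace the book's explanation of why these layers exist;
  • it will not become the main place where readers learn architectural trade-offs;
  • it will not try to turn the whole repository into a general-purpose agent framework.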

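This is the package's main documentation page. The README keeps only brief getting-started notes; the full command-line, configuration and structure documentation lives here. So that the first page does not read like a runtime internals manual, it can be read in layers:

  • Quick start: the "How to run" section and the first CLI commands.
  • Minimal architecture map: the "What's inside" section.
  • Config contracts: the "Example configs" section.
  • Advanced lifecycle / controls details: the later validation and lifecycle-inspection material.
  • Source links: the agent_runtime_ref file list.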

Reader-route contract

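Treat this page as a map rather than a linear chapter: Quick start for the first run, Architecture map for understanding the layers, CLI examples for reproducible commands, Config contracts for configuration review, Advanced lifecycle-controls for Part VIII scenarios, and Runtime internals only when you need to check a specific module.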

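A practical reading path:

  • Chapter 16 for the baseline runtime and capability session state,
  • Chapter 17 for the policy layer and capability contracts,
  • Evidence Spine for the end-to-end governance record from request to release decision,
  • Chapter 18 for the release gates around approvals and runtime behavior,
  • Chapter 21 for assurance response,
  • Chapter 22, read together with the lifecycle patterns, for governed artifact links, release identity, verifier contract lineage and delegated-authorization provenance,
  • Chapters 23 to 27 for interruption, expiry, reinitialization, retirement, observability, registry owners, verifier evidence obligations and delegated-authorization lifecycle controls around capability sessions.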

Support-triage runtime anchor

The built-in support-triage-ref presents the same running case in executable form: agent identity, the approved search_docs/create_ticket capabilities, approval waits, trace/session IDs, lifecycle checks and eval exports. The duplicate-ticket thread in the book is therefore not only prose; it can also be reviewed as a runnable contract surface.

Canonical case runtime scope

The reference package uses Support triage as its runnable baseline, carrying write capabilities, approvals and duplicate-ticket recovery. Internal knowledge assistant and Incident coordination remain coverage lenses over the same architecture: the former examines retrieval, memory, freshness and knowledge provenance; the latter examines traces, escalation, notification side effects, response ownership and post-incident learning. If they are later made into runnable configs as well, they should reuse the same policy, telemetry, lifecycle and registry contracts rather than become separate demos.

Recent contract updates make this surface easier to review: delegated authorization context now runs through CLI demos, sessions, eval exports and replay; trace export redaction now covers command summaries and JSONL artifacts; lifecycle inspection exposes runtime-control assumptions; and the docs guard pins the stable validation errors that define these boundaries.

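What's inside

  • runtime.py: the core AgentRuntime, which assembles the run context, retrieval, the model step, tool execution and the background-update hook.
  • policy.py: a small policy engine with structured decisions.
  • catalog.py: the capability registry, with run semantics, risk tier and egress-contract metadata.
  • identity.py: the agent's explicit identity and the inventory of approved capabilities the runtime is allowed to use.
  • config.py: the YAML loader for agent identity, the approved capability inventory, policy, the capability catalog and the rollout policy.
  • memory.py: typed memory records, provenance, revision numbers and a tenant-isolated in-memory store.
  • background.py: the background maintenance path for durable memory writes, provenance-based saves and compaction.
  • execution.py: a simple execution layer that dispatches capabilities by contract while honoring risk tier and egress policy.
  • telemetry.py: an in-memory telemetry emitter for structured events and spans.
  • rollout.py: a minimal pre-rollout readiness gate.
  • controls.py: continuous controls for the approved registry and inventory drift checks.
  • approvals.py: approval gates for high-risk actions, pause/resume semantics, a simple human review queue, and the control surface on which approval state must stay consistent with capability session state.

The same runtime control surface is also a natural home for delegated-authorization assumptions: who delegated the access, whether that authorization remains valid across a pause/resume, and how the runtime should react if delegated access is revoked before the action completes.

  • lifecycle.py: lifecycle artifacts for change records, artifact bundles, release identity records, runtime control modes, verifier contract lineage and retirement plans, plus readiness checks for these states.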

How to run

.venv/bin/python -m agent_runtime_ref

Expected output:

{"agent_id": "support-triage-ref", "request_agent_id": "support-triage-ref", "session_id": "session-demo-001", "tenant_id": "tenant-acme", "principal_id": "user-42", "authorization_mode": "platform_owned", "delegated_principal_id": "", "delegated_scope": "", "result": "Ticket request is waiting for human approval (apr-001).", "status": "success", "failure_reason": "", "trace_id": "trace-demo-001", "idempotency_keys": ["trace-demo-001"], "approval_ids": ["apr-001"], "approval_capability_names": ["create_ticket"], "approval_status_counts": {"pending": 1}, "event_types": ["run_start", "policy_precheck", "retrieval", "context_layers_built", "span", "tool_policy_decision", "approval_requested", "sandbox_profile_reviewed", "tool_execution", "memory_write_decision", "memory_persisted", "background_compaction", "background_update_scheduled", "run_complete"], "events": 14, "memory_records": 4, "memory_record_ids": ["mem-001", "mem-002", "mem-003", "mem-004"], "pending_approvals": 1, "pending_approval_ids": ["apr-001"], "pending_approval_capability_names": ["create_ticket"], "config_dir": ".../agent_runtime_ref/configs"}

Run the runtime through explicit subcommands:

.venv/bin/python -m agent_runtime_ref simulate-run
.venv/bin/python -m agent_runtime_ref simulate-run --simulate-failure tool_timeout

The second form is a deliberately small failure scenario. It lets the reference package show that a capability which would otherwise be allowed can still end in a governed failed run with explicit telemetry, instead of being folded into the success path.

simulate-run returns agent_id, request_agent_id, config_dir, trace_id, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, event_types, session_id, tenant_id, principal_id, authorization_mode, delegated_principal_id, delegated_scope, status, result, events, memory_records, memory_record_ids, pending_approvals and an optional failure_reason.

Common identity and trace overrides include --config-dir, --agent-id, --tenant-id, --principal-id, --trace-id and --session-id, so examples stay deterministic without editing the configs. More specialized selectors are --limit for memory inspection, --approval-id for approval closure, --replay-trace-id for trace replay, --trace-prefix for session commands and --session-prefix for eval dataset exports.
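The expected output above reports status "success" while an approval is still pending, so a caller probably wants to look at the pending-approval fields before treating the write as done. The following is a minimal sketch of that check; the helper name pending_write_approvals and the idea of pairing the two lists by position are assumptions of this example, not part of the package.

```python
import json

# Abbreviated copy of the summary shown under "Expected output".
SUMMARY_JSON = """
{"status": "success", "failure_reason": "", "trace_id": "trace-demo-001",
 "result": "Ticket request is waiting for human approval (apr-001).",
 "approval_status_counts": {"pending": 1},
 "pending_approval_ids": ["apr-001"],
 "pending_approval_capability_names": ["create_ticket"]}
"""


def pending_write_approvals(summary: dict) -> list:
    """Return 'approval_id (capability)' for every approval still pending.

    Hypothetical helper: pairing the two lists by position is an assumption.
    """
    ids = summary.get("pending_approval_ids", [])
    names = summary.get("pending_approval_capability_names", [])
    return [f"{approval_id} ({name})" for approval_id, name in zip(ids, names)]


summary = json.loads(SUMMARY_JSON)
waiting = pending_write_approvals(summary)
# The run finished cleanly, but the ticket write itself is still gated.
run_is_settled = summary["status"] == "success" and not waiting
print(waiting, run_is_settled)
```

Keeping "the run completed" and "the side effect happened" as separate questions mirrors the approval-wait behavior the demo output shows.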

Inspect the agent identity and the approved capability inventory:

.venv/bin/python -m agent_runtime_ref inspect-agent

inspect-agent returns agent_id, display_name, owner_team, runtime_principal, approved_capabilities, catalog_capability_names, write_capabilities, write_capability_egress, approval_required_capabilities, approval_required_capability_bindings, idempotency_required_capabilities, idempotency_required_capability_bindings and catalog_capabilities, so an inventory review can compare the configured identity against the capability catalog.

In the built-in agent.yaml, the identity uses agent_id support-triage-ref, display_name Support triage reference agent, owner_team agent_platform and runtime_principal svc-support-triage-ref, and approves only search_docs and create_ticket. The capability catalog then marks search_docs as owned by knowledge_platform and bound to svc-knowledge-reader, and create_ticket as owned by support_platform and bound to svc-ticket-writer.

Each catalog_capabilities entry also carries name, owner, mode, transport, risk_tier, network_access, tool_principal, approval_required, idempotency_key_required and allowed_egress, so reviewers see capability identity, duplicate-write posture and egress posture in one response. For the running duplicate-ticket thread, this means create_ticket is shown explicitly as support-owned, high-risk, brokered, bound to svc-ticket-writer, and requiring an idempotency key before any safe retry or reconciliation. approval_required_capability_bindings and idempotency_required_capability_bindings repeat the owner and tool-principal binding of this write capability directly, and write_capability_egress repeats its brokered tickets.internal egress target, so operators do not have to scan the full catalog list first.

The identity/catalog loaders validate shapes with these errors:

  • 'agent' must be a mapping
  • agent.id must be a string
  • agent.id is required
  • agent.display_name is required
  • agent.owner_team is required
  • agent.runtime_principal is required
  • 'approved_capabilities' must be a list
  • Agent inventory config must be a mapping
  • Agent identity config must be a mapping
  • approved_capabilities entries must be strings
  • approved_capabilities entries must not be empty
  • approved_capabilities entries must be unique
  • approved_capabilities lookup must be a string
  • 'capabilities' must be a mapping
  • Capability spec for {name!r} must be a mapping
  • Capability names must be strings
  • Capability name must not be empty
  • Capability names must be unique
  • Capability catalog entries must be CapabilitySpec
  • capabilities.{capability_name}.{key} must be a string
  • capabilities.{capability_name}.{key} is required
  • {label}.{key} must be a string
  • {label} must be a string
  • capabilities.{capability_name}.timeout_seconds must be positive
  • '{label}.{key}' must be an integer
  • '{label}.{key}' must be a boolean
  • {label}.approval must be a string
  • {label}.approval must not be empty
  • {label}.approval is not supported: {approval}
  • 'allowed_egress' must be a list
  • allowed_egress entries must be strings
  • allowed_egress entries must not be empty
  • allowed_egress entries must be unique
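To make the list-shaped validation errors concrete, here is a minimal sketch of a validator for approved_capabilities that raises the four messages quoted above. It is not the package's loader: the function name, the use of ValueError and the order of the checks are assumptions of this example.

```python
def validate_approved_capabilities(raw: object) -> list:
    """Check the list shape and return a copy of the capability names.

    Sketch only: the exception type and check order are assumptions;
    the message texts are the ones documented above.
    """
    if not isinstance(raw, list):
        raise ValueError("'approved_capabilities' must be a list")
    seen = set()
    for entry in raw:
        if not isinstance(entry, str):
            raise ValueError("approved_capabilities entries must be strings")
        if not entry.strip():
            raise ValueError("approved_capabilities entries must not be empty")
        if entry in seen:
            raise ValueError("approved_capabilities entries must be unique")
        seen.add(entry)
    return list(raw)


approved = validate_approved_capabilities(["search_docs", "create_ticket"])
print(approved)
```

Stable message strings like these let a docs guard or a test suite pin the boundary without depending on internal class names.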

Inspect the lifecycle artifacts that correspond to Part VIII, including runtime control links and release identity:

.venv/bin/python -m agent_runtime_ref inspect-lifecycle
.venv/bin/python -m agent_runtime_ref check-controls --signal policy_traces_present=false
.venv/bin/python -m agent_runtime_ref check-change --signal offline_eval_passed=false
.venv/bin/python -m agent_runtime_ref check-change --signal failed_run_drill_checked=false
.venv/bin/python -m agent_runtime_ref check-retirement --step revoke_egress=false

inspect-lifecycle also shows the sandbox_profile contract from runtime-controls.yaml, including sandbox_profile.workspace_entries and sandbox_profile_summary (workspace_paths, shell, network, secrets, snapshot), together with artifact_bundle.bundle_name, artifact_bundle.version, artifact_bundle.provenance_required, artifact_bundle.signed, artifact_bundle.review_evidence_keys, artifact_bundle.review_evidence with duplicate_ticket_guard, artifact_bundle.sandbox_profile_review_evidence, artifact_bundle.duplicate_ticket_guard_evidence, change.change_type, change.risk_level, change.rollout_strategy, change.affected_surfaces, change.required_signals, change.approval_roles, change.session_control_owner (support-ops), change.emergency_freeze_owner, artifact_bundle.session_control_owner, retirement.session_control_owner, retirement.emergency_freeze_owner, failed_run_archive_targets, controls.failed_run_control_expectations, controls.failed_run_control_domains, controls.failed_run_control_count, controls.failed_run_control_summary, controls.failed_run_control_status, controls.failed_run_control_review_required, controls.failed_run_control_owner, controls.failed_run_control_source, controls.failed_run_control_last_review, controls.failed_run_control_next_review, controls.failed_run_control_release_binding, controls.support_duplicate_control_expectations, controls.support_duplicate_control_domains, controls.support_duplicate_control_count, controls.support_duplicate_control_summary, controls.support_duplicate_control_status and controls.support_duplicate_control_release_binding. Operators therefore see ownership, freeze responsibility, retention requirements, trace/provenance controls and duplicate-ticket evidence controls in a single lifecycle summary.

The same runtime-control summary is backed by runtime-controls.yaml, which includes pause_allowed, resume_allowed, background_mode_allowed, max_wait_seconds, on_expiry, contract_version, capability_session_owner, capability_sessions, track_session_ids, resume_allowed, allow_progress_events, allow_elicitation, on_session_expiry: reinitialize_or_cancel, expiry_policy and expiry_signal_owner, so resumable capability sessions explicitly expose their progress, elicitation and expiry assumptions. The delegated_authorization defaults stay explicit as well: authorization_mode: user_delegated_or_platform_owned, delegated_principal_policy: explicit_principal_binding_required, token_reuse_policy: reuse_within_valid_paused_run_only, on_authorization_revoke: cancel_or_reapprove and subagent_inheritance: denied_by_default; resumable/reinit flows use resume_existing_session_if_valid.

The sandbox-profile loader validates runtime-control shapes with these errors:

  • runtime_controls config must be a mapping
  • runtime_controls.sandbox_profile config must be a mapping
  • runtime_controls.sandbox_profile.{key} config must be a mapping
  • runtime_controls.sandbox_profile.workspace.entries must be a list

Direct construction rejects malformed input with:

  • Sandbox profile config must be a mapping (malformed sandbox roots)
  • Sandbox profile {key} config must be a mapping (malformed sandbox sections)
  • Sandbox profile {section}.{key} must be a string (malformed sandbox evidence values)
  • Sandbox profile workspace entries must be a list (malformed workspace entries)

check-change returns change_id, ready, required_signals, approval_roles, missing_signals, failed_run_signals, missing_failed_run_signals, support_duplicate_signals, missing_support_duplicate_signals, support_duplicate_signals_ready, rollout_strategy and risk_level. Its required signals include duplicate_ticket_eval_passed, so duplicate-ticket regression evidence feeds both change readiness and rollout readiness. The lifecycle list loaders reject malformed, blank and duplicate entries with {key} must be a list, {key} entries must be strings, {key} entries must not be empty and {key} entries must be unique.

check-retirement returns system_id, ready, triggers, missing_steps, required_steps, archive_targets, failed_run_archive_targets, support_duplicate_archive_targets and replacement_mode, so operators can see which telemetry, session, approval and control-bundle records must be retained after retirement for later degraded-path and duplicate-ticket reviews.

check-controls returns healthy, required_controls, blocked_findings_expected, missing_controls, failed_run_controls, preserved_failed_run_controls, failed_run_controls_healthy, support_duplicate_controls, preserved_support_duplicate_controls, support_duplicate_controls_healthy, blocking_findings and inventory_drift. The nested inventory_drift object exposes has_drift, missing_from_catalog and missing_from_inventory, so trace/provenance gaps and capability inventory mismatches can be reviewed separately from ordinary control hygiene.

Its inputs in controls.yaml require registry_reviewed, capability_owners_confirmed, memory_provenance_enforced, policy_traces_present, duplicate_ticket_eval_passed and idempotency_keys_present, with these validation shapes:

  • 'controls' must be a mapping
  • 'controls.require' must be a list
  • 'controls.block_if' must be a list
  • {label} entries must be strings
  • {label} entries must not be empty
  • {label} entries must be unique

The controls policy normalizes the required entries into required_controls. block_if treats direct_tool_access_present and unmanaged_runtime_present as hard blockers; they are summarized as blocked_findings_expected and become blocking_findings during evaluation.
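The relationship between the require and block_if lists and the check-controls output can be sketched in a few lines. This is an illustration under stated assumptions rather than the code in controls.py: the function names are hypothetical, an absent signal is treated as not satisfied, and the direction of missing_from_catalog versus missing_from_inventory is inferred from the field names.

```python
REQUIRE = [
    "registry_reviewed", "capability_owners_confirmed", "memory_provenance_enforced",
    "policy_traces_present", "duplicate_ticket_eval_passed", "idempotency_keys_present",
]
BLOCK_IF = ["direct_tool_access_present", "unmanaged_runtime_present"]


def evaluate_controls(require, block_if, signals):
    """Assumption: a required signal that is absent or False counts as missing."""
    missing = [name for name in require if not signals.get(name, False)]
    blocking = [name for name in block_if if signals.get(name, False)]
    return {
        "healthy": not missing and not blocking,
        "required_controls": list(require),
        "blocked_findings_expected": list(block_if),
        "missing_controls": missing,
        "blocking_findings": blocking,
    }


def inventory_drift(approved, catalog):
    """Assumed reading: approved-but-uncatalogued vs. catalogued-but-unapproved."""
    missing_from_catalog = sorted(set(approved) - set(catalog))
    missing_from_inventory = sorted(set(catalog) - set(approved))
    return {
        "has_drift": bool(missing_from_catalog or missing_from_inventory),
        "missing_from_catalog": missing_from_catalog,
        "missing_from_inventory": missing_from_inventory,
    }


# Mirrors `check-controls --signal policy_traces_present=false`.
signals = {name: True for name in REQUIRE}
signals["policy_traces_present"] = False
report = evaluate_controls(REQUIRE, BLOCK_IF, signals)
drift = inventory_drift(["search_docs", "create_ticket"], ["search_docs", "create_ticket"])
print(report["healthy"], report["missing_controls"], drift["has_drift"])
```

Keeping drift in its own object matches the page's point that inventory mismatches deserve a separate review from control hygiene.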

Inspect memory records:

.venv/bin/python -m agent_runtime_ref inspect-memory --memory-class profile

inspect-memory returns config_dir, count, memory_ids and records; each record shows not only its content but also its provenance and revision. dump-events returns, in the JSON output of a degraded-path drill, trace_id, status, result, event_count, event_types, failure_reason, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, idempotency_keys and events.

Dump the structured events of a single run:

.venv/bin/python -m agent_runtime_ref dump-events --user-input "Please open a ticket for this issue."
.venv/bin/python -m agent_runtime_ref dump-events --simulate-failure tool_timeout

Export the events as JSONL for later troubleshooting and replay:

.venv/bin/python -m agent_runtime_ref export-events --output artifacts/trace-demo.jsonl
.venv/bin/python -m agent_runtime_ref export-events --simulate-failure upstream_unavailable --output artifacts/trace-demo-failed.jsonl

export-events returns output_path, trace_id, session_id, tenant_id, principal_id, agent_id, authorization_mode, delegated_principal_id, delegated_scope, status, result, event_count, event_types, redact_fields, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, idempotency_keys and an optional failure_reason, so redaction and degraded-path evidence appear directly in the command summary.

If you need to show a redacted export to external parties, you can hide sensitive fields at export time:

.venv/bin/python -m agent_runtime_ref export-events --output artifacts/trace-demo.jsonl --redact-field user_input
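Field-level redaction at export time can be pictured as masking named keys before each event is serialized to a JSONL line. The sketch below is a guess at the shape, not the package's implementation: the placeholder text, the recursive walk and the payload nesting in the sample event are all assumptions; only the error message and the user_input field name come from this page.

```python
import json

PLACEHOLDER = "[REDACTED]"  # hypothetical marker; the package may use another


def redact(value, hidden):
    """Return a copy of value with every mapping key listed in `hidden` masked."""
    if isinstance(value, dict):
        return {
            key: (PLACEHOLDER if key in hidden else redact(item, hidden))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, hidden) for item in value]
    return value


def export_jsonl(events, redact_fields=()):
    """Serialize events to JSONL text, masking the requested fields."""
    for name in redact_fields:
        if not isinstance(name, str) or not name.strip():
            raise ValueError("Telemetry redact field must not be empty")
    hidden = frozenset(redact_fields)
    return "".join(json.dumps(redact(event, hidden), sort_keys=True) + "\n" for event in events)


# Sample event; the "payload" nesting is an assumption of this sketch.
events = [{
    "event_type": "run_start",
    "trace_id": "trace-demo-001",
    "payload": {"user_input": "Please open a ticket for this issue."},
}]
jsonl_text = export_jsonl(events, ["user_input"])
print(jsonl_text, end="")
```

Redacting a copy rather than the live event keeps the in-memory telemetry intact for the run that produced it.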

Inspect a trace from a JSONL file:

.venv/bin/python -m agent_runtime_ref inspect-trace --input artifacts/trace-demo.jsonl

Replay a run from a saved trace:

.venv/bin/python -m agent_runtime_ref replay-run --input artifacts/trace-demo.jsonl

dump-events returns status, result, failure_reason, trace_id, session_id, tenant_id, principal_id, agent_id, authorization_mode, delegated_principal_id, delegated_scope, event_count, event_types, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, idempotency_keys and events.

inspect-trace returns trace_id, session_id, tenant_id, principal_id, agent_id, authorization_mode, delegated_principal_id, delegated_scope, status, output_preview, event_count, event_types, failure_reason, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, idempotency_keys and events.

export-events also summarizes approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts and idempotency_keys next to redact_fields, so approval lineage, approval capability lineage, approval status and duplicate-write lineage are visible before an operator digs into an individual payload.

replay-run returns source_trace_id, replay_trace_id, source_session_id, replay_session_id, source_tenant_id, replay_tenant_id, source_principal_id, replay_principal_id, source_agent_id, replay_agent_id, source_authorization_mode, replay_authorization_mode, source_delegated_principal_id, replay_delegated_principal_id, source_delegated_scope, replay_delegated_scope, status, result, source_status, source_output_preview, source_failure_reason, replay_status, replay_output_preview, replay_failure_reason, event_count, event_types, source_event_count, source_event_types, replay_event_count, replay_event_types, idempotency_keys, source_idempotency_keys, replay_idempotency_keys, approval_ids, source_approval_ids, replay_approval_ids, pending_approval_ids, source_pending_approval_ids, replay_pending_approval_ids, approval_capability_names, source_approval_capability_names, replay_approval_capability_names, pending_approval_capability_names, source_pending_approval_capability_names, replay_pending_approval_capability_names, approval_status_counts, source_approval_status_counts and replay_approval_status_counts. Investigation and replay therefore both keep the source and replay approval capability and status lineage, and the original write keys can be compared with the replay write keys.
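Replay starts from a saved run_start event, which means a trace has to be picked out of the JSONL file first. The sketch below shows one plausible selection routine using the trace-related error messages documented on this page; the function name, the ValueError type and the exact order of the checks are assumptions, and it is not the package's replay code.

```python
import json
from typing import Optional


def find_run_start(jsonl_text: str, requested_trace_id: Optional[str] = None) -> dict:
    """Pick the run_start event of one trace from JSONL text (illustrative only)."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    trace_ids = sorted({event["trace_id"] for event in events})
    if requested_trace_id is not None:
        if requested_trace_id not in trace_ids:
            raise ValueError(f"Trace ID not found in event file: {requested_trace_id}")
        selected = requested_trace_id
    elif len(trace_ids) > 1:
        raise ValueError("Trace file contains multiple trace IDs; pass --trace-id explicitly")
    else:
        selected = trace_ids[0] if trace_ids else None
    for event in events:
        if event["trace_id"] == selected and event["event_type"] == "run_start":
            return event
    raise ValueError("Trace file does not contain a run_start event")


SAMPLE = "\n".join([
    json.dumps({"event_type": "run_start", "trace_id": "trace-demo-001"}),
    json.dumps({"event_type": "run_complete", "trace_id": "trace-demo-001"}),
])
start_event = find_run_start(SAMPLE)
print(start_event["event_type"], start_event["trace_id"])
```

Refusing to guess when a file holds several traces keeps a replay tied to exactly one source trace, which is what makes the source_* and replay_* fields comparable.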

Check the rollout policy with a signal override:

.venv/bin/python -m agent_runtime_ref check-rollout --signal offline_eval_pass=false

The rollout check returns ready, required_checks, blocked_checks, missing_required, support_duplicate_required, missing_support_duplicate_required, support_duplicate_required_ready, blocking_signals and rollout_mode. Its required evidence includes duplicate_ticket_eval_passed, which lets automation distinguish missing duplicate-ticket regression evidence from explicitly blocking signals. Signal overrides accept boolean key=value pairs and reject unknown boolean text with Unsupported boolean value in signal: {raw_signal!r}.

Runtime CLI failure paths also keep stable operator-facing messages:

  • Config path must be a string or path-like object
  • Session output path must be a string or path-like object
  • Telemetry path must be a string or path-like object
  • CLI field must be a string: {field}
  • CLI field is required: {field}
  • CLI field is not supported: {field}={value}; expected one of: {expected}
  • CLI field must be an integer: {field}
  • CLI field must be non-negative: {field}
  • CLI field entries must be a sequence: {field}
  • CLI field entries must be unique: {field}
  • Runtime request must be RunRequest
  • Run request field must be a string: {field}
  • Run request field is required: {field}
  • Delegated authorization field is required: {field}
  • Signal must be a string
  • Signal must use key=value format: {raw_signal!r}
  • Signal key must not be empty: {raw_signal!r}
  • Background request must be RunRequest
  • Background context must be RunContext
  • Background model_output must be ModelOutput
  • Background context tool_results must be a list
  • Background context tool_results entries must be ToolResult
  • Background memory_store must be MemoryStore
  • Background policy must be PolicyEngine
  • Background telemetry must be TelemetryEmitter
  • Runtime catalog must be CapabilityCatalog
  • Runtime policy must be PolicyEngine
  • Runtime telemetry must be TelemetryEmitter
  • Runtime memory must be MemoryStore
  • Runtime approvals must be ApprovalQueue
  • Runtime sessions must be SessionStore
  • Runtime agent must be AgentIdentity
  • Runtime background must be BackgroundWorker
  • Approval queue policy must be ApprovalPolicy
  • Approval policy config must be a mapping
  • Capability catalog config must be a mapping
  • Controls policy config must be a mapping
  • Memory store config must be a mapping
  • Policy config must be a mapping
  • Rollout policy config must be a mapping
  • Telemetry event must be a mapping
  • Approval field must be a string: {field}
  • Approval field is required: {field}
  • Approval status is not supported: {status}
  • Approval decision is not supported: {decision}
  • Controls inventory must be ApprovedInventory
  • Controls catalog must be CapabilityCatalog
  • Controls policy must be ControlsPolicy
  • Controls inventory_drift must be InventoryDrift
  • Lifecycle change must be ChangeRecord
  • Lifecycle retirement plan must be RetirementPlan
  • Assessment signals must be a mapping
  • Assessment signal key must be a string
  • Assessment signal key must not be empty
  • Assessment signal keys must be unique
  • Assessment signal value must be a boolean: {field}
  • Rollout policy must be RolloutPolicy
  • Rollout readiness must be RolloutReadiness
  • Rollout readiness flag must be a boolean: {field}
  • Policy action is not supported: {action}
  • Policy field must be a string: {field}
  • Policy field is required: {field}
  • Tool capability must be CapabilitySpec
  • Tool request must be ToolRequest
  • Tool policy decision must be PolicyDecision
  • Policy precheck request must be RunRequest
  • Policy context must be RunContext
  • Policy tool request must be ToolRequest
  • Policy capability must be CapabilitySpec
  • Tool request capability name must be a string
  • Tool request capability name must not be empty
  • Tool request arguments must be a mapping
  • Tool request argument key must be a string
  • Tool request argument key must not be empty
  • Tool request argument keys must be unique
  • Tool request argument value must be a string: {argument_key}
  • Tool request capability does not match catalog entry: {capability_name} != {capability.name}
  • Tool result status must be a string
  • Tool result status must not be empty
  • Tool result payload must be a mapping
  • Tool result payload key must be a string
  • Tool result payload key must not be empty
  • Tool result payload keys must be unique
  • Tool result payload value must be a string: {payload_key}
  • Approval request not found: {approval_id}
  • Approval request is not pending: {approval_id}
  • No pending approval requests were generated for this run
  • Session field must be a string: {field}
  • Session field is required: {field}
  • Session status is not supported: {status}
  • Session tenant_id does not match existing session: {session_id}
  • Session principal_id does not match existing session: {session_id}
  • Session trace_id already exists: {trace_id}
  • Session field entries must be a sequence: {field}
  • Session field entries must be unique: {field}
  • Session field entries must be unique: session_id
  • Session runs must be a sequence
  • Session runs entries must be RunRecord
  • Session field entries must be a sequence: session_id
  • Session eval specs must be a mapping
  • Session not found: {session_id}
  • Telemetry event field must not be empty: event_type
  • Telemetry event field must not be empty: trace_id
  • Telemetry event field must not be empty: schema_version
  • Telemetry schema version is not supported: {schema_version}
  • Telemetry redact field must not be empty
  • Trace ID request must be a string
  • Trace ID not found in event file: {requested_trace_id}
  • Trace file contains multiple trace IDs; pass --trace-id explicitly
  • Trace file does not contain a run_start event
  • Model step must return ModelOutput
  • Model output text must be a string
  • Model output tool_request must be ToolRequest
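The --signal overrides described above are boolean key=value pairs with four documented rejection messages. A minimal parser that produces those messages might look like the sketch below. It is illustrative only: the function name, the ValueError type and the assumption that only the texts "true" and "false" (case-insensitive) are accepted are not confirmed by this page.

```python
# Assumption: only these two spellings count as boolean text.
BOOLEAN_TEXT = {"true": True, "false": False}


def parse_signal(raw_signal: object) -> tuple:
    """Parse one `--signal key=value` override into (key, bool)."""
    if not isinstance(raw_signal, str):
        raise ValueError("Signal must be a string")
    key, separator, value = raw_signal.partition("=")
    if not separator:
        raise ValueError(f"Signal must use key=value format: {raw_signal!r}")
    key = key.strip()
    if not key:
        raise ValueError(f"Signal key must not be empty: {raw_signal!r}")
    text = value.strip().lower()
    if text not in BOOLEAN_TEXT:
        raise ValueError(f"Unsupported boolean value in signal: {raw_signal!r}")
    return key, BOOLEAN_TEXT[text]


parsed = parse_signal("offline_eval_pass=false")
print(parsed)
```

Echoing the raw signal with !r in the message makes stray whitespace or quoting mistakes visible to the operator.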

Check continuous controls and registry drift:

.venv/bin/python -m agent_runtime_ref check-controls --signal registry_reviewed=false

Inspect and resolve the demo approval request:

.venv/bin/python -m agent_runtime_ref inspect-approvals
.venv/bin/python -m agent_runtime_ref resolve-approval --decision approved --note "manager approved demo request"
.venv/bin/python -m agent_runtime_ref inspect-session
.venv/bin/python -m agent_runtime_ref inspect-session --simulate-failure tool_timeout
.venv/bin/python -m agent_runtime_ref session-eval-summary
.venv/bin/python -m agent_runtime_ref session-eval-summary --simulate-failure tool_timeout
.venv/bin/python -m agent_runtime_ref session-replay --user-input "Please create a ticket for this onboarding issue." --user-input "What language preference do you remember?"
.venv/bin/python -m agent_runtime_ref session-replay --simulate-failure tool_timeout --user-input "Please create a ticket for this issue."
.venv/bin/python -m agent_runtime_ref export-session --output artifacts/session-demo-001.json
.venv/bin/python -m agent_runtime_ref export-session --simulate-failure tool_timeout --output artifacts/session-demo-failed.json
.venv/bin/python -m agent_runtime_ref export-eval-dataset --output artifacts/eval-dataset.json
.venv/bin/python -m agent_runtime_ref export-eval-dataset --scenario failed_run_timeout --output artifacts/eval-failed-run.json

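inspect-approvals returns trace_id, session_id, tenant_id, agent_id, count, approval_ids, pending_approval_ids, approval_capability_names, pending_approval_capability_names, approval_status_counts, idempotency_keys and approvals. The approvals include tenant_id, agent_id, the capability-session lifecycle fields (capability_session_id, capability_session_status), delegated-authorization context such as authorization_mode, delegated_principal_id and delegated_scope, as well as idempotency_key and approval_status_counts, so an approval-path review can be compared directly against session evidence and duplicate-write intent.

After a decision, resolve-approval returns approval_id, approval_ids, trace_id, session_id, tenant_id, agent_id, capability_name, approval_capability_names, pending_approval_ids, pending_approval_capability_names, requested_by, status, reviewer, resolution_note, capability_session_id, capability_session_status, the same delegated context, idempotency_key, idempotency_keys and approval_status_counts, so the capability session, acting identity, idempotency lineage and final approval status are not lost at closure.

inspect-session shows the session-level run history and the associated trace_id values. Failure drills can be injected here directly, and the summary preserves failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids and latest_failure_reason, along with output_text, failure_reason, request_agent_id, capability_session_id, capability_session_status and idempotency_key for each run.

session-eval-summary returns a compact summary of this group of runs. It counts failed runs and traceable_failed_runs explicitly instead of collapsing the results back into only success and denied. Failure drills can be injected here as well, and the summary immediately shows latest_failure_reason for quick review.

session-replay can execute several related requests within the same session_id. Failure drills can be injected here too, and the replay summary preserves failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids and latest_failure_reason together with the failure_reason and request_agent_id of each run.

export-session saves the whole session as structured JSON, which can already serve as seed data for an offline evaluation flow. It also preserves the capability-session lifecycle fields (capability_session_id, capability_session_status), delegated-authorization context such as authorization_mode, delegated_principal_id and delegated_scope, and idempotency_key and approval_id, while the CLI command summary shows the failure drill's failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids and latest_failure_reason directly.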

The session and eval commands also expose their summary fields explicitly:

  • inspect-session returns session_id, tenant_id, principal_id, trace_count, trace_ids, failed_trace_ids, latest_status, latest_failure_reason, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, summary and runs.
  • session-eval-summary returns session_id, total_runs, success_runs, approval_wait_runs, denied_runs, failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, latest_status, latest_trace_id and latest_failure_reason.
  • session-replay returns session_id, run_count, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, latest_failure_reason, summary and runs.
  • export-session returns output_path, session_id, total_runs, failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, latest_trace_id and latest_failure_reason. Outside the nested summary, the exported session JSON also carries top-level total_runs, failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, latest_failure_reason, latest_trace_id and the session identifiers.
  • export-eval-dataset returns dataset_name, output_path, session_count, session_ids, run_count, failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts, duplicate_ticket_scenarios, latest_failure_reason and sessions as a list of session IDs.

The exported eval dataset artifact carries top-level failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names, approval_status_counts and latest_failure_reason. Each exported eval session payload carries, in addition to summary.trace_ids, summary.failed_trace_ids, summary.idempotency_keys, summary.approval_ids, summary.approval_capability_names, summary.pending_approval_ids, summary.pending_approval_capability_names and summary.approval_status_counts, the top-level trace_ids, failed_trace_ids, idempotency_keys, approval_ids, approval_capability_names, pending_approval_ids, pending_approval_capability_names and approval_status_counts. Each payload also carries session and an eval block containing scenario, labels, expected_outcomes and grading_rules, and the nested run records keep the per-run request_agent_id and user_input.

By default dataset_name is agent-runtime-ref-eval-seed unless the caller passes --dataset-name. The eval export validates the internal session_prefix seed before generating session IDs, and the session commands validate the internal trace_prefix seed before generating trace IDs.

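The runtime also treats failure-class results on the tool path, such as validation failures, as first-class run outcomes. Rather than pretending the run still succeeded, it records a failed run, emits for failed or denied runs a run_failed event that explicitly carries failure_reason together with the terminal run_complete.failure_reason, and preserves both the status and the concrete failure reason through the failure_reason field in session exports, trace inspection, replay summaries and CLI output.

export-eval-dataset packages several built-in session scenarios into one JSON artifact that can be used directly for evaluation:

  • a separate failed-run drill scenario with duplicate_ticket_eval_passed, max_ticket_side_effects: 1 and a blocking duplicate_ticket_guard;
  • the profile lookup scenario profile_memory, with the labels memory_read, profile_lookup and grounded_answer;
  • the multi-run approval-plus-memory scenario mixed_session, with the labels multi_run, approval_then_memory and session_evals, and with required_run_count and approval_status_counts as expected outcomes;
  • the approval-path scenario support_ticket, with the sandbox_profile_review label, the sandbox_profile_reviewed expected outcome and a blocking sandbox_profile_review grading rule.

The command summary also shows the aggregated failed_runs, traceable_failed_runs, trace_ids, failed_trace_ids and latest_failure_reason directly.

This evaluation path should be read together with the richer verifier contract in the appendix: for long-horizon scenarios, the package should help readers see that the dataset can later carry process_score, outcome_score, failure_attribution and linked verifier evidence, not just a thin verdict.

These commands also reflect more clearly a key distinction from Chapters 16 and 17:

  • the user-visible session_id that ties several runs together;
  • the per-run trace_id used for troubleshooting and audit;
  • and the capability-side session state, which may pause, expire, resume or need reinitialization.

The reference package remains deliberately small, but it already reflects that a governed runtime sometimes has to describe these three layers of state separately instead of pressing them into one opaque object.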
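One way to picture the three layers is a session record that spans runs, with each run carrying its own trace_id and a pointer to capability-side session state. The classes below are a hypothetical sketch (the Sketch* names, the status values and the in-memory store are assumptions); only the field names and the two error messages are taken from this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchRun:
    trace_id: str                   # one per run, used for troubleshooting and audit
    capability_session_id: str      # capability-side session, may outlive a single run
    capability_session_status: str  # e.g. "paused" or "active" (values are assumptions)


@dataclass
class SketchSession:
    session_id: str                 # user-visible, ties several runs together
    tenant_id: str
    runs: list = field(default_factory=list)


class SketchSessionStore:
    """In-memory store that keeps the three identifiers separate."""

    def __init__(self) -> None:
        self._sessions = {}

    def record_run(self, session_id: str, tenant_id: str, run: SketchRun) -> SketchSession:
        session = self._sessions.setdefault(session_id, SketchSession(session_id, tenant_id))
        if session.tenant_id != tenant_id:
            raise ValueError(f"Session tenant_id does not match existing session: {session_id}")
        if any(existing.trace_id == run.trace_id for existing in session.runs):
            raise ValueError(f"Session trace_id already exists: {run.trace_id}")
        session.runs.append(run)
        return session


store = SketchSessionStore()
store.record_run("session-demo-001", "tenant-acme", SketchRun("trace-demo-001", "cap-001", "paused"))
session = store.record_run("session-demo-001", "tenant-acme", SketchRun("trace-demo-002", "cap-001", "active"))
print(session.session_id, [run.trace_id for run in session.runs])
```

In this sketch two runs share one session_id and one capability session while keeping distinct trace_id values, which is the separation the paragraph above argues for.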

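It is also a suitable anchor for absorbing the lessons of Anthropic's newer harness work: long-running application work may need explicit context resets, structured handoff artifacts and a planner/generator/evaluator role split rather than one uninterrupted agent loop. The reference package does not implement that whole harness, but it already exposes the key runtime seams, so a team can see where reset-safe handoffs, iteration contracts, evaluator review and post-resume control state should live.

It is likewise a suitable anchor for verifier-aware governance: if release or assurance depends on evaluation output, the runtime should retain enough trace, session and artifact links to explain not only what happened but also why the verifier judged the run the way it did.

That capability should extend to lifecycle handling. A governed reference runtime should be able to state which verifier contract version and which release identity were in force for a given release, which evidence must be retained after retirement to explain earlier release or assurance decisions, and how structured handoff artifacts must survive context resets or role handoffs when they determine what a system about to be retired is allowed to do.

It also reflects a fourth operational concern: under which delegated-authorization context an action was actually executed. That context now appears in run telemetry, approval records and session exports, so the runtime can explain not only what happened but also under whose delegated identity and scope it happened.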

A request that actually reads profile memory:

.venv/bin/python -m agent_runtime_ref simulate-run --user-input "What language preference do you remember?"

How to verify

uv run ruff check .
uv run ty check
uv run pytest --cov=agent_runtime_ref --cov-report=term-missing

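Example configs

The configs directory holds starter files for the runtime and the lifecycle.

They are no longer just static examples. config.py can load these YAML files into the agent identity, the approved capability inventory, the runtime, context layers, the memory store, the rollout policy, lifecycle artifacts with release identity and other lifecycle state, so the package is closer to a real runtime skeleton. The generic loaders also expose malformed YAML shapes explicitly: Config at {config_path!s} must be a mapping at the top level, {label} config must be a mapping, and {key} must be a list.

The runtime control bundle is also used to carry approval and session governance rules explicitly, including pause/resume, background processing, expiry, the reinitialization policy, the capability session owner and the contract boundary between the user run and the capability-side session.

Minimal sandbox profile

If this package is later extended to sandbox-backed execution, the right starting point is not a large new subsystem but a small profile that makes the workspace and permissions explicit: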

sandbox_profile:
  manifest_version: 1
  workspace:
    entries:
      - path: repo
        source: local_dir
        read_only: false
      - path: task.md
        source: inline_file
        read_only: true
  capabilities:
    filesystem: true
    shell: restricted
    memory: read_write
    skills: read_only
  permissions:
    network: denied
    secrets: none
    run_as: sandbox_user
  state:
    resume: allowed
    snapshot: required_on_completion
    persist_session_state: true

This example does not turn the reference runtime into a full sandbox orchestrator. It only pins the contract surface that Chapters 9 and 16 require a real sandbox-backed runtime to expose: the manifest, permissions, workspace materialization, session state and snapshot/resume policy should all be reviewable.
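A reviewer-facing summary of such a profile can be sketched as a small shape check plus a projection onto the sandbox_profile_summary keys mentioned earlier (workspace_paths, shell, network, secrets, snapshot). The code below is an assumption-laden illustration: the function name, the set of required sections and the mapping from profile fields to summary keys are guesses, and the profile is written as a Python dict so the example needs no YAML library.

```python
from collections.abc import Mapping

# The profile above, written as a dict (abbreviated to the fields the summary uses).
PROFILE = {
    "manifest_version": 1,
    "workspace": {"entries": [
        {"path": "repo", "source": "local_dir", "read_only": False},
        {"path": "task.md", "source": "inline_file", "read_only": True},
    ]},
    "capabilities": {"filesystem": True, "shell": "restricted"},
    "permissions": {"network": "denied", "secrets": "none", "run_as": "sandbox_user"},
    "state": {"resume": "allowed", "snapshot": "required_on_completion"},
}

SECTIONS = ("workspace", "capabilities", "permissions", "state")  # assumed required


def summarize_sandbox_profile(profile: object) -> dict:
    """Validate the outer shape, then project the reviewable fields."""
    if not isinstance(profile, Mapping):
        raise ValueError("Sandbox profile config must be a mapping")
    for key in SECTIONS:
        if not isinstance(profile.get(key), Mapping):
            raise ValueError(f"Sandbox profile {key} config must be a mapping")
    entries = profile["workspace"].get("entries")
    if not isinstance(entries, list):
        raise ValueError("Sandbox profile workspace entries must be a list")
    return {
        "workspace_paths": [entry["path"] for entry in entries],
        "shell": profile["capabilities"]["shell"],
        "network": profile["permissions"]["network"],
        "secrets": profile["permissions"]["secrets"],
        "snapshot": profile["state"]["snapshot"],
    }


profile_summary = summarize_sandbox_profile(PROFILE)
print(profile_summary)
```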

Durable agent actor pattern

Future reference-runtime examples should also model the durable-agent-actor boundary from Chapter 16. The runtime does not need a vendor-specific Durable Object implementation, but it should have a visible contract that expresses stable agent identity, instance-local state, resumable sessions, scheduled wake-ups and handoff to governed stores.

The state allowed to live locally should be narrow: the workflow cursor, per-instance queue position, connection/session preferences, the last processed event, schedule metadata and rebuildable cached views. Profile memory, tenant knowledge, secrets, policy, audit logs and cross-instance facts should stay in governed stores, with provenance, retention, export and access-control rules.

A minimal config surface includes:

  • agent_instance_id, tenant_id, owner_ref, schema_version;
  • state_class: ephemeral, instance_local, governed_memory_ref, external_record_ref;
  • resume_policy, hibernation_policy, state_migration_policy;
  • schedule_records, containing the owner instance, idempotency key, overlap policy, next fire time and trace linkage;
  • connection_scope, for WebSocket/streaming fan-out and approval UI visibility;
  • export_ref, delete_ref, audit_refs, to avoid hidden durable memory.
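As a rough illustration of this config surface, the sketch below models part of it as a dataclass and adds a check that flags state which should not live inside the instance. Everything here is hypothetical: the class and function names, the snake_case keys for the allowed local state, and the example policy values are inventions for illustration, since the package does not yet implement this pattern.

```python
from dataclasses import dataclass

# Local state the text allows; the snake_case key names are hypothetical.
LOCAL_STATE_ALLOWED = frozenset({
    "workflow_cursor", "queue_position", "connection_preferences",
    "last_processed_event", "schedule_metadata", "cached_views",
})


@dataclass(frozen=True)
class DurableAgentConfig:
    agent_instance_id: str
    tenant_id: str
    owner_ref: str
    schema_version: str
    resume_policy: str
    hibernation_policy: str
    state_migration_policy: str
    export_ref: str
    delete_ref: str
    audit_refs: tuple = ()


def misplaced_local_state(local_state: dict) -> list:
    """Return keys that belong in a governed store rather than in the instance."""
    return sorted(key for key in local_state if key not in LOCAL_STATE_ALLOWED)


config = DurableAgentConfig(
    agent_instance_id="agent-inst-001", tenant_id="tenant-acme", owner_ref="agent_platform",
    schema_version="1", resume_policy="resume_if_valid", hibernation_policy="after_idle",
    state_migration_policy="versioned", export_ref="export-001", delete_ref="delete-001",
    audit_refs=("audit-001",),
)
violations = misplaced_local_state({"workflow_cursor": 3, "profile_memory": {"language": "en"}})
print(config.agent_instance_id, violations)
```

A check like this turns the "narrow local state" rule into something a review or a test can enforce, rather than a convention.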

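Agent shell + durable workflow spine pattern

When the reference runtime is extended, one pattern should be kept separate: the agent does not have to own all long-running work. It can be just the interaction shell: agent_instance_id, session state, the user-facing stream, connection-scoped authorization and the approval UI. The durable workflow spine that sits beside it should own steps, retries, waits for external events, durable approval records, idempotency keys and evidence refs.

The minimal contract surface for this example includes:

  • workflow_instance_id, alongside agent_instance_id, run_id and trace_id;
  • durable_step_id, step_status, retry_policy, timeout_policy, idempotency_key;
  • waiting_for: external event, approval, timer or reconciliation;
  • approval_id and approval_decision_ref when the workflow stops at a HITL gate;
  • progress_event_id, with an explicit statement that it is not a durable step;
  • evidence_refs that connect workflow resume with audit/event export.

This pattern complements the current background updates: a background task can be simple deferred work, whereas the workflow spine is a reviewable durable procedure that survives hours of waiting, failures and human decisions.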
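The contract surface above can be sketched as a step record plus a resume rule. This is a hypothetical illustration, since the package does not implement the workflow spine: the class names, the enum values and the rule that an approval-gated step resumes only once approval_decision_ref is set are assumptions of this example.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class WaitingFor(Enum):
    EXTERNAL_EVENT = "external_event"
    APPROVAL = "approval"
    TIMER = "timer"
    RECONCILIATION = "reconciliation"


@dataclass(frozen=True)
class DurableStep:
    workflow_instance_id: str
    durable_step_id: str
    step_status: str
    idempotency_key: str
    waiting_for: Optional[WaitingFor] = None
    approval_id: Optional[str] = None
    approval_decision_ref: Optional[str] = None
    evidence_refs: tuple = ()


def can_resume(step: DurableStep) -> bool:
    """Assumed rule: a step parked at a HITL gate needs a linked decision."""
    if step.waiting_for is WaitingFor.APPROVAL:
        return step.approval_decision_ref is not None
    # Other wait kinds are out of scope here; only an unblocked step resumes.
    return step.waiting_for is None


parked = DurableStep(
    workflow_instance_id="wf-001", durable_step_id="step-create-ticket",
    step_status="waiting", idempotency_key="trace-demo-001",
    waiting_for=WaitingFor.APPROVAL, approval_id="apr-001",
)
decided = replace(parked, approval_decision_ref="decision-001", evidence_refs=("audit-001",))
print(can_resume(parked), can_resume(decided))
```

Because the step is immutable, recording the decision produces a new record, which keeps the pre-approval state available as evidence.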

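Why it is useful

The book now relies not only on written explanation but also on a real code skeleton:

  • it is easier to discuss architecture at the level of files and contracts;
  • it is easier to keep adding examples to the package;
  • it is easier to go straight from a chapter to a runnable prototype;
  • it is easier to show a config-driven path rather than only hard-coded demos;
  • it is easier to connect the reference runtime to the chapters on memory, retrieval, background updates and runtime control governance;
  • it is easier to discuss where each memory record came from, which revision it currently belongs to, and which contract/runtime-control version was in force at the time;
  • it is easier to see release identity, verifier contract lineage and retirement obligations alongside runtime controls and artifact decisions;
  • it is easier to tell approval state, runtime session state, capability session state and verifier evidence apart while keeping the governance links between them.

There are also several practical capabilities:

  • inspect-memory shows the seeded memory directly, as well as results filtered by tenant and memory_class;
  • dump-events shows the structured trace of a run without reading the source code;
  • export-events saves that trace as JSONL for out-of-process analysis;
  • export-events now includes schema_version and supports per-field redaction at export time;
  • the export-events path with an approval emits sandbox_profile_reviewed, aligning trace evidence with the lifecycle bundle and the eval grading rule;
  • inspect-trace reads and filters a saved trace;
  • replay-run replays a run from the saved run_start event.

The simplest way to read this package:

  • use the book to understand the architecture, the ordering and the operating-model arguments;
  • use the package to look at runnable structure, config surfaces and inspection examples;
  • use the appendix patterns to understand the contract boundaries the runtime wants to make explicit.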

What to do next

The runtime literal markers also include eval_gate and session_idempotency_summary, which keep the eval and idempotency evidence consistent across the documentation.