Council Briefing

Key Deliberations

V2 Beta Launch Readiness vs. Stability Debt

Engineering velocity is high (multiple UX/CLI fixes merged; issue triage closed same-day), yet the beta narrative is still “works, but unstable,” risking developer confidence during the refactor-induced turbulence.

Do we hold the V2 line until a “golden path” install/start flow is deterministic for 90%+ of new builders, even if it delays ecosystem rollouts (Spartan, marketplace, hackathons)?

Discord (rhota): "Beta phase will last about 2 weeks" and "remains somewhat unstable" after repo consolidation.
GitHub: "Fixed CLI agent command" (#4028) and "Added validation and testing for CLI commands" (#4004).

1Yes—gate V2 on a hardened golden path (install → create → start → deploy) with strict release criteria.

Maximizes trust through shipping and reduces support load, but slows downstream launches.

2No—ship on schedule and rely on rapid patch cadence plus community support to absorb instability.

Maintains momentum and narratives, but risks reputation damage if early adopters churn.

3Hybrid—ship V2 as “preview” while maintaining a clearly supported stable track (0.25.9/1.x) with explicit migration windows.

Preserves velocity without abandoning reliability, but requires disciplined comms and dual-track maintenance.

4Other / More discussion needed / None of the above.

What is the Council’s preferred strategy for dependency/version fragmentation (e.g., plugin-sql mismatches) to prevent repeated install failures across the ecosystem?

Discord: "No matching version found for @elizaos/plugin-sql@^0.25.6" reported by multiple users; suggested pinning CLI beta versions.
GitHub top issues: dependency/version mismatch issue (#4101) was opened and later closed (March window).

1Enforce monorepo-pinned versions and a single blessed package set per release (lockstep).

Reduces user confusion and support costs, but increases coordination overhead across packages.

2Adopt strict semver + compatibility matrix tooling, allowing multiple supported combinations.

Improves flexibility, but requires rigorous documentation and automated validation to avoid drift.

3De-scope optional plugins from default installs and move them behind explicit opt-in templates.

Stabilizes first-run experience, but may slow discovery of ecosystem capabilities.

4Other / More discussion needed / None of the above.

Where should we invest next to maximize ‘Execution Excellence’ impact: UI polish, CLI determinism, or runtime/core correctness (e.g., embeddings, adapters, RAG performance)?

GitHub: multiple UI/UX fixes merged (memory viewer #4027, profile UI #4021, start/create UX #4007).
Discord: repeated pain points—CLI install/start issues and knowledge/RAG reliability complaints.

1CLI determinism first (install/start, clear errors, environment management).

Improves onboarding conversion and reduces friction for every developer, reinforcing Developer First.

2Runtime/core correctness first (RAG, embeddings, adapters, DB stability).

Prevents ‘it runs but it lies’ failures that erode trust and break flagship agent credibility.

3UI polish first (builder experience, memory/knowledge tooling, multi-agent chat UX).

Boosts perceived quality and demos, but may mask unresolved underlying reliability debt.

4Other / More discussion needed / None of the above.

Knowledge/RAG Reliability as a Trust Primitive

Knowledge ingestion and retrieval are emerging as the system’s most visible reliability fault-line (directory ambiguity, PDF workflows, adapter gaps); documentation and tooling improvements are landing, but the feature still feels probabilistic to users.

Should RAG/Knowledge be elevated to a “first-class contract” with strict behavioral guarantees (tests + reference datasets), even if it constrains flexibility in directory structures and adapters?

Discord (Sabochee): knowledge feature "seems to work 10% ok" (reported sentiment).
Discord (multiple): conflicting advice on knowledge paths (e.g., `characters/knowledge/...` vs `./knowledge/...`).

1Yes—define a single canonical knowledge pipeline and enforce it with integration tests and fixtures.

Transforms RAG from folklore into a reliable product surface, aligning with Execution Excellence.

2No—keep the pipeline flexible and prioritize documentation/examples over rigid contracts.

Preserves composability but risks continued “it depends” support burden and trust erosion.

3Partial—guarantee core ingestion/retrieval semantics while allowing multiple directory layouts via explicit configuration.

Balances reliability with composability, but requires precise DX and config validation.

4Other / More discussion needed / None of the above.

What is the Council’s preferred approach to PDF knowledge ingestion: mandate preprocessing to markdown/text, or ship an official PDF pipeline that is robust by default?

Discord (Midas): "Folder2knowledge — knowledge2character" converts PDFs to plain text then to character knowledge.
Discord: PDF + RAG issues repeatedly cited as a common pain point.

1Mandate preprocessing (PDF → text/markdown) and keep core minimal; provide scripts and guardrails.

Reduces complexity in core and improves determinism, but adds steps for end users.

2Ship an official, batteries-included PDF pipeline (extraction + chunking + embedding) with sane defaults.

Improves out-of-box success and demos, but increases maintenance surface and edge-case load.

3Dual-path: official pipeline for common PDFs plus explicit advanced hooks for custom extraction.

Covers mainstream needs while retaining composability, but requires careful DX to avoid confusion.

4Other / More discussion needed / None of the above.

How should we operationalize “Taming Information” internally: do we treat docs tooling (llms.txt, mermaid maps, doc-chat) as critical infrastructure with dedicated ownership and SLAs?

GitHub: docs frontpage improvements + llms.txt added (#3991); docs versioning (#3963).
Discord (jin): improving doc navigation with mermaid flow charts; requests for clearer onboarding (incl. Chinese community).

1Yes—create a Docs & DX guild with explicit metrics (time-to-first-agent, doc freshness, issue deflection).

Turns documentation into a compounding asset and reduces operational chaos across channels.

2No—keep docs community-driven and opportunistic; prioritize code shipping.

Maximizes short-term feature velocity, but perpetuates support load and fragmented knowledge.

3Hybrid—core team owns onboarding/golden paths; community owns long-tail pages with curated review cycles.

Maintains focus while leveraging open-source energy, but needs governance and editorial processes.

4Other / More discussion needed / None of the above.

Flagship Agents, Public Trust, and Token Utility Alignment

Spartan’s functionality is tightly coupled to V2 stability, while token utility discussions are active but not yet grounded in shipped mechanisms; public information asymmetry (private holder channels vs broader community) risks undermining trust-through-shipping.

Should Spartan (and other flagship agents) be treated as release-blocking acceptance tests for V2, rather than post-launch integrations?

Discord (rhota): focus is "getting Spartan working" after merging repos; beta ~2 weeks.
Discord (Odilitime): "trying to get spartan chatting before the v2 official launch".

1Yes—flagship agents are the proof-of-reliability; if they fail, V2 is not ready.

Strengthens trust and sets quality bar, but may slow platform release cadence.

2No—ship platform first; flagship agents can lag and stabilize independently.

Increases platform velocity, but weakens narrative and real-world validation of the framework.

3Split—define a minimal flagship capability set (chat + knowledge + one client) as blocking; advanced features can follow.

Creates an achievable quality gate while preserving momentum for iterative improvements.

4Other / More discussion needed / None of the above.

What token utility direction best aligns with the North Star (developer-friendly infra) without turning ElizaOS into a commodity compute reseller?

Discord (Patt): token as commodity for paying for compute; comparisons to Venice AI staking model.
Discord (DorianD): skepticism about compute reselling; push for token integration into agent functionality.

1Compute/payment utility first (API credits, discounts, staking-to-inference power).

Simple and legible, but risks positioning ElizaOS as a wrapper/reseller rather than a protocol.

2Security/quality utility first (staking to secure plugins/agents, reputation, slashing for non-delivery).

Aligns incentives with ecosystem reliability, but requires careful governance design and legal/UX considerations.

3Identity/registry utility first (agent registration via public keys, usage metrics, attestations).

Builds composable trust fabric for a decentralized agent economy, but value accrual may be slower to surface.

4Other / More discussion needed / None of the above.

How should we balance private holder experiences with public transparency to maximize community trust and developer adoption?

Discord (Toni): concern that private channels leave the public "no way to get any info"; rhota cites Telegram as public channel.
Discord: structured social amplification proposed (Typefully drafts → ben review → main account posting).

1Open-first: move status updates, roadmaps, and technical progress to public channels; keep only product perks private.

Maximizes developer trust and reduces rumor cycles, but may reduce perceived holder exclusivity.

2Holder-first: keep core updates in private channels and publish curated summaries when ready.

Strengthens holder retention, but risks alienating builders and external partners.

3Two-lane comms: public weekly engineering briefings + private “alpha features” access, with clear boundaries.

Preserves exclusivity while maintaining credibility, but requires disciplined publishing cadence.

4Other / More discussion needed / None of the above.

North Star & Strategic Context

Key Deliberations