Council Briefing

Key Deliberations

Reliability & DX Triage (Config, DB, Install)

Operational chatter indicates recurring failures in basic onboarding paths: model selection flags are ignored, SQLite/Supabase adapters fail unpredictably (notably with the node plugin), and package install/start failures continue to spawn new issues. These failures directly conflict with the execution-excellence directive.

Which reliability defect should be declared a Priority-0 “ship-stopper” for the next release train to protect developer trust?

Discord (2025-01-20, coders): Users reported character files with "model": "small" still default to large models (configuration confusion).
Discord (2025-01-20, coders): "Database connection not open" / SQLite connection problems, especially with node plugin.

1. Fix model selection and modelClass enforcement (small/medium/large mapping) end-to-end.

Reduces surprise cost/latency and restores configuration credibility, which is critical for Cloud and enterprise adoption.

2. Stabilize database adapters and node plugin startup (SQLite + Supabase) with deterministic defaults and clearer errors.

Improves first-run success rate and lowers support load, directly increasing builder retention.

3. Resolve package installation/start failures (npm/pnpm packaging, missing modules, model download failures) via a hardened quickstart path.

Maximizes onboarding throughput, but may defer deeper runtime correctness issues that reappear later.

4. Other / More discussion needed / None of the above.
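
Illustrative sketch for option 1: one way end-to-end enforcement could look, assuming a character file carries a "model" field as described in the Discord report. The type names, the resolveModelClass helper, the "large" fallback, and the choice to fail loudly on unknown values are all hypothetical and are not taken from the elizaOS codebase.

```typescript
// Hypothetical sketch, not the elizaOS implementation.
type ModelClass = "small" | "medium" | "large";

interface CharacterConfig {
  name: string;
  model?: string; // e.g. "small", as in the reported character files
}

// Type guard: narrows an arbitrary string to a known model class.
function isModelClass(value: string): value is ModelClass {
  return value === "small" || value === "medium" || value === "large";
}

// Resolve the class a character asked for. An absent field uses the fallback;
// an unrecognized value throws instead of silently defaulting to a large model.
function resolveModelClass(
  character: CharacterConfig,
  fallback: ModelClass = "large" // assumed default, mirroring the reported behavior
): ModelClass {
  if (character.model === undefined) {
    return fallback;
  }
  const requested = character.model.trim().toLowerCase();
  if (!isModelClass(requested)) {
    throw new Error(
      `Unknown model class "${character.model}" in character "${character.name}"`
    );
  }
  return requested;
}
```

Under these assumptions, a character declaring "model": "small" always resolves to the small class, and a typo in the field surfaces as an error at startup rather than as an unexpected bill.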

Do we formalize a single blessed “golden path” repo (main eliza) and effectively deprecate eliza-starter until it meets reliability targets?

Discord (2025-01-20, coders): Community advised using main eliza repository instead of eliza-starter due to dependency issues.

1. Yes: declare the main repo the golden path; mark eliza-starter as experimental until parity is restored.

Short-term clarity and fewer broken installs; potential backlash from starter users but less fragmentation.

2. No: invest immediately to fix eliza-starter and keep it as the primary onboarding path.

Better long-term onboarding UX, but consumes bandwidth that could stabilize core runtime and Cloud launch.

3. Hybrid: the golden path is the main repo now; starter remains supported only for a narrow “hello agent” scenario with CI gates.

Balances focus and clarity, while keeping an entry ramp for non-experts without overpromising.

4. Other / More discussion needed / None of the above.

What is the Council’s minimum acceptable “first-run success rate” and what enforcement mechanism do we adopt to achieve it?

GitHub Daily Update (2025-01-21): New issues include inability to install `@elizaos/agent` (#2624) and agent start failures due to model download failures (#2623).

1. Set a hard gate: ≥90% first-run success in CI smoke tests across OS targets before release.

Strong trust signal, but may slow feature velocity and require test infra expansion.

2. Set a soft target: ≥75% success with rapid hotfix cadence and transparent known-issues ledger.

Keeps shipping momentum, but risks continued churn and reputational drag.

3. Segmented targets: 95% for Cloud path, 70% for self-host; prioritize commercial reliability first.

Optimizes for revenue and managed UX, but may alienate open-source self-hosters if neglected.

4. Other / More discussion needed / None of the above.
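
Illustrative sketch of how such a gate might be evaluated: the code below computes a first-run success rate from smoke-test results and checks it against a threshold. The SmokeResult shape and the helper names are hypothetical. Applying the threshold per OS target rather than to the aggregate is an assumption, since the options above do not specify which reading is intended.

```typescript
// Hypothetical sketch: result shape and function names are illustrative only.
interface SmokeResult {
  os: string;      // e.g. "linux", "macos", "windows"
  passed: boolean; // did the first-run smoke test succeed?
}

// Fraction of runs that passed; an empty set counts as 0 so it cannot pass a gate.
function successRate(results: SmokeResult[]): number {
  if (results.length === 0) {
    return 0;
  }
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// Assumption: the gate is applied per OS, so one healthy platform
// cannot mask a failing one in the aggregate number.
function gatePasses(results: SmokeResult[], threshold: number): boolean {
  const byOs: Record<string, SmokeResult[]> = {};
  for (const r of results) {
    (byOs[r.os] = byOs[r.os] || []).push(r);
  }
  const targets = Object.keys(byOs);
  if (targets.length === 0) {
    return false;
  }
  return targets.every((os) => successRate(byOs[os]) >= threshold);
}
```

With this per-OS reading, a release where one target passes only 8 of 10 runs would fail the 0.90 hard gate of option 1 but clear the 0.75 soft target of option 2, even if the other targets are at 100%.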

Throughput vs Coherence (Plugin Expansion & Governance of Quality)

The ecosystem is adding plugins at a high tempo (NIM, Cronos EVM, router nitro, holdstation swap, MongoDB adapter, etc.), but without stronger quality gates this growth can amplify support burden and reduce perceived reliability—contradicting “Execution Excellence.”

How should the Council govern plugin intake to preserve composability while preventing reliability debt from exploding?

GitHub Activity (Jan 20–22): "29 new pull requests (19 merged)... jump to 66 active contributors" (rapid intake).
Daily Report (2025-01-20): Multiple new plugins landed (e.g., NVIDIA NIM #2599, Holdstation swap #2596, Router Nitro #2590, Cronos EVM #2585).

1. Adopt strict plugin admission standards: tests + minimal docs + security review required before merge/registry inclusion.

Higher trust and lower breakage, but reduces contributor velocity and increases maintainer workload.

2. Two-tier system: “Core/Verified” plugins with high gates; “Community/Experimental” plugins with lightweight gates and clear labeling.

Preserves innovation while protecting newcomers; requires consistent labeling and registry tooling.

3. Max velocity: merge quickly, rely on community to surface issues; fix regressions post-merge.

Short-term expansion, but long-term support overload and perceived instability; risks North Star alignment.

4. Other / More discussion needed / None of the above.

Do we pause net-new plugins for a defined stabilization window to align with execution excellence, or keep parallel lanes?

Discord (2025-01-20): Team prioritizing V2 development over PR activities; ongoing backlog includes model selection + DB issues.

1. Pause net-new plugins for 1–2 sprints; focus on core stability, docs, and onboarding success rate.

Improves reliability quickly, but may dampen community excitement and partner integrations.

2. Parallel lanes: core team stabilizes; community plugins continue under a strict “experimental” banner.

Maintains momentum while protecting core; requires clear governance and moderation bandwidth.

3. No pause; rely on tooling (CI, linters, bots) to keep quality acceptable at scale.

Works only if automation coverage is strong; otherwise risks repeated regressions and contributor frustration.

4. Other / More discussion needed / None of the above.

Model & Provider Strategy (DeepSeek R1, NVIDIA NIM, Cost/Performance)

Community signal indicates a strategic opening: DeepSeek R1 claims near-frontier reasoning at drastically lower cost with permissive licensing, and NVIDIA NIM integration expands provider optionality. However, model selection bugs and inconsistent provider behavior undermine the ability to exploit these options safely.

Should the Council elevate DeepSeek R1 integration to a strategic priority, and if so, what role should it play (default vs optional vs Cloud-only)?

Discord (2025-01-20, partners/coders): "DeepSeek's R1... O1/Sonnet-level performance at 30x lower cost with MIT licensing."
Daily Report (2025-01-20): DeepSeek provider support and related fixes appear in the repo activity stream.

1. Make R1 a first-class, documented option and recommend it for cost-optimized deployments.

Increases competitiveness and developer delight, but increases surface area for provider-specific bugs.

2. Keep R1 experimental until model selection + provider parity issues are resolved.

Protects reliability narrative; may miss a window to capture builders seeking cheaper reasoning.

3. Offer R1 primarily via ElizaOS Cloud with curated configs and guardrails; keep self-host optional.

Turns provider advantage into managed UX and revenue leverage, but may be seen as gating capability.

4. Other / More discussion needed / None of the above.

How do we reconcile “Open & Composable” with an exploding matrix of providers (OpenAI/Anthropic/DeepSeek/NVIDIA NIM/etc.) without sacrificing reliability?

GitHub Daily Update (2025-01-21): Added NVIDIA NIM plugin (#2599) and multiple provider-related improvements.
Discord (2025-01-20): Users report provider-specific failures (e.g., Anthropic issues in Discord; switching to OpenAI resolved an error).

1. Define a provider compatibility contract (streaming, tools, vision, embeddings) and certify providers against it.

Creates a reliable composability baseline and supports future certification programs.

2. Limit official support to a small set of “Council-approved” providers; others remain community-supported.

Reduces QA load, but constrains openness and may slow ecosystem growth.

3. Embrace full provider plurality; invest in runtime adapters and robust fallback logic to smooth differences.

Most aligned with openness, but demands significant engineering investment in abstraction and testing.

4. Other / More discussion needed / None of the above.
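
Illustrative sketch of the compatibility contract in option 1: the four capabilities named there (streaming, tools, vision, embeddings) become a checklist, and a provider is certified only if it reports all of them. The ProviderReport shape, the function names, and the provider names in the usage note are hypothetical. How a provider's capabilities would actually be probed is left out of scope.

```typescript
// Hypothetical sketch: no real provider SDK or elizaOS API is referenced here.
type Capability = "streaming" | "tools" | "vision" | "embeddings";

interface ProviderReport {
  name: string;           // placeholder provider identifier
  supports: Capability[]; // capabilities the provider claims or was observed to support
}

// The contract: every capability listed in option 1 is required.
const REQUIRED_CAPABILITIES: Capability[] = [
  "streaming",
  "tools",
  "vision",
  "embeddings",
];

// Capabilities the contract requires but the provider does not report,
// returned in contract order so the output is stable.
function missingCapabilities(
  report: ProviderReport,
  required: Capability[] = REQUIRED_CAPABILITIES
): Capability[] {
  return required.filter((cap) => report.supports.indexOf(cap) === -1);
}

// A provider is certified only when nothing is missing.
function isCertified(report: ProviderReport): boolean {
  return missingCapabilities(report).length === 0;
}
```

For example, a placeholder "provider-b" that reports only streaming and tools would fail certification, with vision and embeddings listed as missing. The same list could drive the "community-supported" label in option 2.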

What is our canonical performance target: lower cost per agent, lower latency, or higher autonomy (memory/RAG/tooling), given current community pain points?

Discord (2025-01-20, coders): Need for better memory management so agents persist data between messages.
Discord (2025-01-20): Model selection confusion causing unintended use of large models (cost/latency risk).

1. Prioritize cost control (correct model selection + cheaper reasoning providers) to maximize adoption.

Boosts builder experimentation and Cloud unit economics, but may leave autonomy gaps unresolved.

2. Prioritize autonomy (memory/RAG correctness and persistence) even if cost/latency stays higher short-term.

Improves flagship-agent credibility and “agents that work,” but may reduce casual developer adoption.

3. Prioritize latency/UX (streaming, responsiveness, client stability) to make agents feel alive across platforms.

Strengthens perceived quality and retention, but without autonomy gains agents may remain shallow.

4. Other / More discussion needed / None of the above.

North Star & Strategic Context

Key Deliberations