Council Briefing

Key Deliberations

Reliability Front: Cross-Platform Stability & Test Coverage

Engineering throughput remains exceptional (dozens of PRs/day and expanding tests), yet the issue stream highlights fragility on macOS and in database adapters—directly challenging our Execution Excellence mandate and developer trust.

Which problems become “Fleet Blockers” that pause feature intake until resolved (to protect the reliability story)?

GitHub issues cited in holo-logs: macOS client startup/connectivity failures (#2360, #2471) and ARM64 Docker tokenizer module error (#2432).
Daily update: new requests for tests on the Redis/SQLite/Supabase adapters following structural changes (#2469, #2467).

1. Treat macOS client connectivity loops and ARM64 Docker breakages as immediate Fleet Blockers; freeze new features until fixed.

Maximizes developer trust and reduces support load, but temporarily slows ecosystem expansion.

2. Prioritize database adapter correctness/testing (Redis + SQLite/Supabase) as Fleet Blockers; handle macOS issues as best-effort patches.

Protects core persistence guarantees, but risks reputational damage among Mac-heavy builders.

3. No hard freeze; run parallel workstreams with a rotating strike team that closes top reliability issues weekly.

Maintains momentum, but risks recurring instability if triage discipline degrades.

4. Other / More discussion needed / None of the above.

Do we enforce a minimum test/CI standard for new clients/plugins before merge to prevent regressions at current contribution volume?

Daily report: new tests added for the GitHub client (#2407), Slack client (#2404), Instagram client (#2454), and plugin-solana (#2345).
Repo activity: 46 PRs/33 merged (Jan 16-17) and 45 PRs/37 merged (Jan 17-18), indicating high merge throughput.

1. Yes: require a baseline test harness + smoke tests for every new client/plugin before merge.

Raises merge friction now but stabilizes long-term reliability under massive contributor scale.

2. Partial: require tests only for core runtime, DB adapters, and flagship clients; allow experimental plugins with looser gates.

Balances innovation with stability, but may create a “two-tier” quality perception.

3. No: optimize for speed; rely on rapid rollback and community bug reports.

Maximizes shipping velocity, but undermines the ‘most reliable’ positioning and increases operator pain.

4. Other / More discussion needed / None of the above.
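If the Council adopts a baseline gate (option 1 or 2), the cheapest smoke test is a shape check that runs in CI before any network-dependent tests. The TypeScript sketch below assumes a plugin exports an object with `name`, `description`, and an optional `actions` array whose entries carry a `name` and a `handler`; the `PluginLike` type and `smokeCheckPlugin` function are hypothetical illustrations, not the actual ElizaOS `Plugin` interface or an existing test utility.

```typescript
// Hypothetical minimal plugin shape for a CI smoke test.
// The real ElizaOS Plugin type has more fields; this checks only a baseline.
interface PluginLike {
  name?: unknown;
  description?: unknown;
  actions?: unknown;
}

// Returns a list of problems; an empty list means the plugin passes the baseline gate.
function smokeCheckPlugin(plugin: PluginLike): string[] {
  const problems: string[] = [];
  if (typeof plugin.name !== "string" || plugin.name.trim() === "") {
    problems.push("name must be a non-empty string");
  }
  if (typeof plugin.description !== "string" || plugin.description.trim() === "") {
    problems.push("description must be a non-empty string");
  }
  if (plugin.actions !== undefined) {
    if (!Array.isArray(plugin.actions)) {
      problems.push("actions must be an array when present");
    } else {
      plugin.actions.forEach((action: unknown, i: number) => {
        // Each action is assumed to need at least a name and a handler function.
        const a = action as { name?: unknown; handler?: unknown } | null;
        if (typeof a?.name !== "string") problems.push(`actions[${i}].name must be a string`);
        if (typeof a?.handler !== "function") problems.push(`actions[${i}].handler must be a function`);
      });
    }
  }
  return problems;
}
```

A CI job could import each new plugin's default export and block the merge when the returned list is non-empty; heavier checks (for example, starting a client against a mock server) would sit on top of this gate.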

What is the Council’s preferred stabilization cadence for releases while issue volume is rising?

Holo-log monthly stats (Jan): 1039 new PRs (735 merged), 401 new issues, and 694 active contributors, indicating a high change rate.
Daily update includes multiple bug fixes and new compatibility issues appearing simultaneously.

1. Adopt a strict release train: scheduled cutoffs + stabilization week with bugfix-only merges.

Improves predictability and quality, but may frustrate fast-moving contributors.

2. Continue continuous delivery, but introduce a “stability branch” for Cloud/flagship users.

Preserves momentum while protecting production users, at the cost of branch management overhead.

3. Move to fewer, larger releases tied to Cloud milestones to reduce churn.

Reduces operational noise but delays user-visible improvements and community feedback loops.

4. Other / More discussion needed / None of the above.

Flagship Surface Area: Twitter/Telegram Client Reliability & Anti-Spam Controls

Social clients are a primary adoption funnel and public credibility layer, but authentication failures, reply formatting bugs, and deployment incompatibilities are actively impairing agent uptime and perceived competence.

What is our “minimum viable reliability bar” for social clients (Twitter/Telegram) before we position them as flagship-ready?

Issues cited: Twitter auth failures on AWS EC2 (Error 399, #2372) and unexpected JSON metadata in bot replies (#2423).
Daily update: Telegram client polling may conflict with cloud/blue-green deployments (#2466).

1. Flagship-ready only when auth is robust across common hosts, replies are clean, and deployment mode is Cloud-compatible.

Strengthens trust-through-shipping, but delays marketing/visibility for agent showcases.

2. Flagship-ready if core posting works; publish known issues + recommended hosting recipes (VPN/login steps, rate limits).

Ships faster while reducing surprises, but may normalize fragile behavior as “expected.”

3. Flagship readiness is per-agent, not per-client; allow DegenSpartan/AIXVC to proceed with guardrails even if clients are imperfect.

Maintains narrative momentum, but risks public failures being attributed to ElizaOS itself.

4. Other / More discussion needed / None of the above.

How should we reduce social-client harm (spam, scams, loops) while keeping autonomy high?

Discord holo-log: the action item “Fix Twitter client to prevent responding to scam replies” was noted, along with rate limiting and mention handling challenges across channels.
Discord Q&A: control posting frequency via env vars (ENABLE_ACTION_PROCESSING=false; POST_INTERVAL_MIN/MAX).
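As a sketch of how those variables might be read and applied, the TypeScript below assumes `POST_INTERVAL_MIN`/`POST_INTERVAL_MAX` are minutes, that the next post is scheduled at a uniformly random delay between them, and that only the literal string `"true"` enables action processing. The defaults (90/180) are placeholders, and `readPostingConfig`/`nextPostDelayMs` are hypothetical names, not the Twitter client's actual code.

```typescript
// Sketch of reading the posting controls mentioned in Discord Q&A.
// Assumptions: intervals are in minutes, the defaults are placeholders,
// and only the exact string "true" enables action processing.
type Env = Record<string, string | undefined>;

interface PostingConfig {
  actionProcessingEnabled: boolean;
  minMinutes: number;
  maxMinutes: number;
}

// Parses a positive number, falling back when the value is missing or invalid.
function parsePositive(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function readPostingConfig(env: Env): PostingConfig {
  const minMinutes = parsePositive(env.POST_INTERVAL_MIN, 90); // placeholder default
  const maxMinutes = parsePositive(env.POST_INTERVAL_MAX, 180); // placeholder default
  if (minMinutes > maxMinutes) {
    throw new Error(`POST_INTERVAL_MIN (${minMinutes}) exceeds POST_INTERVAL_MAX (${maxMinutes})`);
  }
  return {
    actionProcessingEnabled: env.ENABLE_ACTION_PROCESSING === "true",
    minMinutes,
    maxMinutes,
  };
}

// Picks the next delay uniformly between min and max minutes, in milliseconds.
// `rand` is injectable so the schedule can be tested deterministically.
function nextPostDelayMs(cfg: PostingConfig, rand: () => number = Math.random): number {
  const minutes = cfg.minMinutes + rand() * (cfg.maxMinutes - cfg.minMinutes);
  return Math.round(minutes * 60000);
}
```

In a Node process this would be called as `readPostingConfig(process.env)`; failing fast when the minimum exceeds the maximum surfaces a misconfiguration at startup rather than as an erratic posting schedule.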

1. Default-safe autonomy: conservative rate limits, target user allowlists, and scam-reply filters enabled by default.

Reduces platform bans and reputational damage, but limits viral growth and responsiveness.

2. Operator-driven autonomy: ship tooling and docs, but keep defaults permissive; builders assume responsibility.

Maximizes flexibility, but increases support burden and leads to inconsistent user experiences.

3. Introduce an optional “approval workflow” mode for high-stakes accounts (human-in-the-loop for posts).

Protects key brands/agents while preserving autonomy elsewhere, at the cost of extra UX complexity.

4. Other / More discussion needed / None of the above.
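Option 1's defaults can be illustrated with a small reply guard that applies an allowlist, a scam-pattern filter, and a sliding-window rate limit before a reply is posted. This is a hypothetical TypeScript sketch: `ReplyGuard`, its config fields, and the example pattern are assumptions, and real scam detection would need more than regular expressions.

```typescript
// Hypothetical "default-safe" reply guard: allowlist, scam-pattern filter,
// and a sliding-window rate limit, checked in that order.
interface GuardConfig {
  maxRepliesPerWindow: number;
  windowMs: number;
  allowlist?: Set<string>; // lowercase handles; when set, only these authors get replies
  scamPatterns: RegExp[];  // avoid the /g flag, which makes RegExp.test stateful
}

class ReplyGuard {
  private readonly cfg: GuardConfig;
  private sent: number[] = []; // timestamps (ms) of replies allowed within the window

  constructor(cfg: GuardConfig) {
    this.cfg = cfg;
  }

  // Returns a reason when the reply should be skipped, or null when it may be posted.
  check(author: string, text: string, now: number = Date.now()): string | null {
    if (this.cfg.allowlist && !this.cfg.allowlist.has(author.toLowerCase())) {
      return "author not on allowlist";
    }
    if (this.cfg.scamPatterns.some((re) => re.test(text))) {
      return "matched scam pattern";
    }
    // Drop timestamps that have left the window, then enforce the cap.
    this.sent = this.sent.filter((t) => now - t < this.cfg.windowMs);
    if (this.sent.length >= this.cfg.maxRepliesPerWindow) {
      return "rate limit reached";
    }
    this.sent.push(now);
    return null;
  }
}
```

Because the limiter records a timestamp only when a reply is allowed, filtered or blocked messages do not consume the rate budget.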

Do we standardize a first-party uptime/ops pattern (watchdog, cron, health checks) as part of the core framework or keep it external?

Discord action item: “Implement cron job to monitor agent uptime” (Cipher); members also suggested auto-restarting agents.
Community reports: running multiple agents and keeping them alive after logout remains confusing, and related questions go unanswered in some channels.

1. Build first-party ops primitives (health endpoints + supervised restart) into ElizaOS Cloud and recommended self-host templates.

Improves reliability and DX; increases scope and maintenance responsibility.

2. Publish official recipes (systemd, pm2, docker compose) and keep ops outside core.

Faster and simpler, but produces a fragmented operator experience across environments.

3. Make ops a plugin/adapter layer (community-maintained) with optional installation via registry.

Aligns with composability while avoiding core bloat, but quality may vary.

4. Other / More discussion needed / None of the above.
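Whichever option is chosen, the core of a watchdog is the same: probe health on a schedule, tolerate a few transient failures, and restart with exponential backoff so a crash-looping agent does not hammer the host. The TypeScript sketch below is hypothetical: `Watchdog`, `restartDelayMs`, and the thresholds are illustrative, and the probe is injected so it can wrap any health check (for example, an HTTP request to a health endpoint whose path would be deployment-specific).

```typescript
// Hypothetical watchdog sketch: not an existing ElizaOS API.
// The probe is injected, so it can wrap any health check.
type Probe = () => Promise<boolean>;

interface WatchdogOptions {
  failuresBeforeRestart: number; // consecutive failed probes that trigger a restart
  baseBackoffMs: number;         // delay before the first restart
  maxBackoffMs: number;          // upper bound for the exponential backoff
}

// Exponential backoff: base * 2^attempt, capped at maxBackoffMs.
function restartDelayMs(attempt: number, opts: WatchdogOptions): number {
  return Math.min(opts.maxBackoffMs, opts.baseBackoffMs * 2 ** attempt);
}

class Watchdog {
  private readonly probe: Probe;
  private readonly opts: WatchdogOptions;
  private consecutiveFailures = 0;
  private restartAttempts = 0;

  constructor(probe: Probe, opts: WatchdogOptions) {
    this.probe = probe;
    this.opts = opts;
  }

  // Runs one probe. Returns the delay (ms) to wait before restarting the agent,
  // or null when no restart is needed.
  async tick(): Promise<number | null> {
    let healthy = false;
    try {
      healthy = await this.probe();
    } catch {
      healthy = false; // a probe that throws counts as a failure
    }
    if (healthy) {
      this.consecutiveFailures = 0;
      this.restartAttempts = 0; // recovered, so reset the backoff
      return null;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < this.opts.failuresBeforeRestart) return null;
    this.consecutiveFailures = 0;
    const delay = restartDelayMs(this.restartAttempts, this.opts);
    this.restartAttempts += 1;
    return delay;
  }
}
```

A supervisor (cron, a systemd timer, pm2, or a Cloud control loop) would call `tick()` on an interval and, when it returns a delay, wait that long before restarting the agent process. Resetting the backoff on the first healthy probe is a simplification; a stricter policy might require a sustained healthy period first.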

Composable Future: Plugin Registry, Modularization, and V2 Secrecy vs Trust

Signals point toward a modular future (plugin registry, moving plugins out of core, dynamic plugin loading), while V2 is incubating privately—creating a strategic tension between rapid architectural evolution and community trust/coordination.

How aggressively should we move plugins out of core to reduce maintenance pain while keeping the developer experience seamless?

Completed items: “A new plugin registry has been created… move plugins out of core and add dynamic plugin loading” (Shaw, via X update in holo-logs).
Daily report shows rapid growth in features/integrations across many plugins, increasing surface area.

1. Fast migration: deprecate core-bundled plugins quickly; make registry + dynamic loading the default path.

Reduces core bloat and accelerates composability, but risks breaking changes and onboarding confusion.

2. Hybrid: keep a curated “core set” (flagship-quality) and move everything else to the registry with clear tiering.

Preserves a stable DX baseline while enabling ecosystem growth, but requires governance and curation effort.

3. Slow migration: prioritize stability; only extract plugins once APIs and docs are mature.

Minimizes churn, but prolongs maintenance load and slows scaling to many platforms/chains.

4. Other / More discussion needed / None of the above.
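Option 2's tiering can be expressed as a resolution step that runs before dynamic loading: each requested plugin is looked up in the registry and admitted only if its tier is allowed by the operator's policy. The TypeScript sketch below assumes a hypothetical three-tier scheme (`core`, `verified`, `community`) and hypothetical package names; it does not reflect the actual registry's schema.

```typescript
// Hypothetical registry model for the "hybrid" option: a curated core set
// plus tiered registry plugins. Tier names are illustrative assumptions.
type Tier = "core" | "verified" | "community";

interface RegistryEntry {
  name: string;        // short plugin name requested by a character config
  packageName: string; // package that would later be dynamically imported
  tier: Tier;
}

interface Resolution {
  load: RegistryEntry[];
  rejected: { name: string; reason: string }[];
}

// Splits requested plugins into loadable entries and rejections with reasons.
function resolvePlugins(
  requested: string[],
  registry: RegistryEntry[],
  allowedTiers: ReadonlySet<Tier>,
): Resolution {
  const byName = new Map<string, RegistryEntry>(
    registry.map((e): [string, RegistryEntry] => [e.name, e]),
  );
  const result: Resolution = { load: [], rejected: [] };
  for (const name of requested) {
    const entry = byName.get(name);
    if (!entry) {
      result.rejected.push({ name, reason: "not in registry" });
    } else if (!allowedTiers.has(entry.tier)) {
      result.rejected.push({ name, reason: `tier "${entry.tier}" not allowed` });
    } else {
      result.load.push(entry);
    }
  }
  return result;
}
```

Each admitted entry would then be loaded with a dynamic `import(entry.packageName)`, so rejection reasons can be shown to the operator before any third-party code runs.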

What level of transparency should we maintain around V2 while it remains in a private repository?

Completed items: “ElizaOS v2 is currently in a private repository with limited access… finalizing details before merging back.” (Shaw, via X update in holo-logs).
Discord logs show builders in the coders channel asking “Where is V2 being developed?”, with the question left unanswered.

1. Publish a public V2 roadmap + API intent notes now, even if code stays private temporarily.

Improves alignment and reduces rumor load, while protecting unfinished implementation details.

2. Selective access program: grant V2 repo access to high-signal contributors under guidelines, keep broader details limited.

Accelerates development with trusted builders, but may create perceived gatekeeping.

3. Keep V2 mostly opaque until a merge-ready milestone to avoid thrash and external pressure.

Reduces coordination overhead, but increases community uncertainty and speculative narratives.

4. Other / More discussion needed / None of the above.

How do we prevent knowledge/embedding confusion from becoming a chronic DX tax as we scale multi-provider support?

Discord coders: “dimension mismatches when trying to use different embedding models” and recurring RAG/knowledge management confusion (multiple users).
Suggested workaround: “use OpenAI for embedding since it uses 1536 dimensions” (Titan | Livepeer-Eliza.com).

1. Enforce strict embedding compatibility checks with clear errors and an automatic migration/reset flow.

Reduces silent failures and support time, but may require opinionated constraints.

2. Standardize on a default embedding dimension/provider for ‘happy path’ and document advanced overrides.

Improves onboarding and reliability; advanced users can still customize with informed tradeoffs.

3. Keep flexibility; focus on documentation and community recipes rather than enforcing constraints.

Maximizes configurability but risks ongoing friction and perceived instability for new developers.

4. Other / More discussion needed / None of the above.
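Option 1's compatibility check can be small: record the dimension when the knowledge store receives its first vector, then reject later vectors of a different length with an actionable message instead of failing deep inside a similarity query. The TypeScript sketch below is hypothetical (`StoreMeta`, `assertCompatible`, and `EmbeddingDimensionError` are illustrative names) and covers only the check, not the migration/reset flow.

```typescript
// Hypothetical guard: a knowledge store keeps one embedding dimension for its lifetime.
class EmbeddingDimensionError extends Error {
  constructor(expected: number, actual: number, model: string) {
    super(
      `Embedding model "${model}" returned ${actual}-dimension vectors, but this store ` +
        `was built with ${expected} dimensions. Re-embed the store or switch back to the original model.`,
    );
    this.name = "EmbeddingDimensionError";
  }
}

interface StoreMeta {
  dimension: number | null; // null until the first vector is written
  model: string | null;     // model that produced the stored vectors
}

// Validates a new vector against the store; returns the (possibly initialized) metadata.
function assertCompatible(meta: StoreMeta, model: string, vector: number[]): StoreMeta {
  if (vector.length === 0) {
    throw new Error(`Embedding model "${model}" returned an empty vector.`);
  }
  if (meta.dimension === null) {
    // The first write fixes the store's dimension.
    return { dimension: vector.length, model };
  }
  if (vector.length !== meta.dimension) {
    throw new EmbeddingDimensionError(meta.dimension, vector.length, model);
  }
  return meta;
}
```

For instance, a store first populated with 1536-dimension OpenAI embeddings would reject vectors from a 384-dimension model with this error at write time, rather than producing confusing failures later at query time.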

North Star & Strategic Context

Key Deliberations