// real corrections and patterns discovered while running Heleni in production — each one caused a bug, outage, or user frustration before it became a rule
🎯
Execution > Conversation
Execute first, report results only. Never share reasoning, intermediate steps, or "let me check" narration. Lead with one recommendation instead of listing options.
Why: The owner explicitly corrected this pattern. Showing the search process and multiple options added friction without adding value.
🔒
Credential Exposure Protocol
If credentials appear in chat, warn immediately on receipt, before running any operation. Then proceed with the task and remind the owner to revoke them after completion.
Why: GitHub tokens were shared in WhatsApp. The PA used them first and warned afterward, which was too late.
📡
Explicit Delivery Targets
Cron jobs without explicit delivery targets fall back to the last active session, which can leak messages to the wrong chat. Always pin the target explicitly.
Why: A daily AI digest intended for a work group was delivered to the owner's private DM.
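A minimal sketch of the rule. It assumes a hypothetical CronJob record with a target field, which is not openclaw's actual job schema. When no target is set, the resolver refuses to deliver instead of falling back to the last active session:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CronJob:
    # Hypothetical job record; field names are illustrative only.
    name: str
    prompt: str
    target: Optional[str] = None  # e.g. a group JID or a DM phone number


class MissingTargetError(ValueError):
    pass


def resolve_target(job: CronJob, last_active_session: str) -> str:
    """Return the pinned delivery target, never the last active session."""
    if not job.target:
        # last_active_session is deliberately ignored: falling back to it
        # is exactly how a work digest ends up in the owner's private DM.
        raise MissingTargetError(f"cron job {job.name!r} has no explicit target")
    return job.target
```

Running the same check when a job is created, not only when it sends, catches the mistake before the first delivery.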
🧠
Memory Discipline
Daily memory files should stay under 50 lines. Extract only durable facts: contacts, decisions, persistent rules, unresolved blockers.
Why: A daily notes file grew to 200+ lines with transient events, eating context window for no value.
🔍
Runtime Identity Check
Model identity must be checked via live session status, never assumed from context. Models can change mid-session without notice.
Why: The model switched from GPT to Claude mid-session, but the PA kept reporting the old model from compacted context.
💳
Proactive Billing Alerts
Billing errors should trigger automatic fallback + notification, not reactive handling. Use billing-monitor with heartbeat integration.
Why: 5+ PAs went down simultaneously from billing errors. Each required manual detection and intervention.
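The following sketches "fallback plus notification" under stated assumptions. The provider callables, the BillingError class and the notify hook are hypothetical stand-ins, because billing-monitor's real interface is not described here:

```python
from typing import Callable, List, Tuple


class BillingError(Exception):
    """Hypothetical error a provider call raises when the account is out of credit."""


def complete_with_fallback(
    prompt: str,
    providers: List[Tuple[str, Callable[[str], str]]],
    notify: Callable[[str], None],
) -> str:
    """Try providers in order; on a billing error, alert and move to the next one."""
    for name, call in providers:
        try:
            return call(prompt)
        except BillingError as exc:
            # Alert proactively instead of waiting for someone to notice the outage.
            notify(f"billing error on {name}: {exc}; falling back")
    raise RuntimeError("all providers failed with billing errors")
```

A heartbeat can call the same path on a schedule, so an exhausted account gets flagged even when no user message arrives.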
📋
Config Deduplication
SOUL.md, AGENTS.md, and MEMORY.md had 60% content overlap, which caused contradictions. Keep one source of truth per concern.
Why: Conflicting instructions caused inconsistent behavior. Fixed: SOUL = behavior, MEMORY = facts, IDENTITY = who.
📱
Connected != Working
"Connected and listening" with Messages=0 means the ingest layer is broken, not the messaging platform.
Why: Multiple PAs showed WhatsApp connected but never responded. Problem was the runtime binding, not WhatsApp.
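One way to turn this rule into a health check is sketched below. ChannelStatus and its counters are hypothetical, and the 30-minute grace window is an assumed value. The key idea is that "connected but silent" gets its own diagnosis:

```python
from dataclasses import dataclass


@dataclass
class ChannelStatus:
    # Hypothetical snapshot of a messaging channel's health counters.
    connected: bool
    messages_received: int
    minutes_since_connect: float


def diagnose(status: ChannelStatus, grace_minutes: float = 30.0) -> str:
    """Classify a channel, treating 'connected but silent' as an ingest fault."""
    if not status.connected:
        return "platform-disconnected"
    if status.messages_received == 0:
        if status.minutes_since_connect > grace_minutes:
            # The socket is up but nothing reaches the agent: suspect the
            # runtime binding / ingest layer, not the messaging platform.
            return "ingest-broken"
        return "warming-up"
    return "ok"
```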
🗂️
.context Files — Separate Agent Config from Generic Skills
Keep SKILL.md files fully generic and synced to pa-skills. Store agent-specific data (phone numbers, JIDs, workspace IDs, board IDs) in a local .context file per skill. This lets any PA adopt a skill and fill in her own context.
Why: Skills were breaking because they contained hardcoded IDs that only worked for one agent. The .context pattern makes every skill portable without sacrificing personalization.
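The sketch below shows how the split could work. The document does not define a file format, so two assumptions are made here: .context holds KEY=value lines, and the generic SKILL.md uses {{KEY}} placeholders. Both are hypothetical conventions:

```python
import re
from typing import Dict


def parse_context(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from a .context file, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def render_skill(template: str, context: Dict[str, str]) -> str:
    """Fill {{KEY}} placeholders in a generic SKILL.md with agent-specific values."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in context:
            # Fail loudly rather than shipping a skill with a blank ID.
            raise KeyError(f"missing .context value for {key}")
        return context[key]

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)
```

The shared SKILL.md never contains a real ID. Each PA only writes her own .context file.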
🪨
Terse Mode for Background Work
Crons and subagents don't need full sentences. Adding a terse-mode prefix (“No filler, no pleasantries, abbreviate freely”) cuts output tokens 50-75% without losing accuracy. Apply to cron prompts and subagent task descriptions — never to owner DMs or external messages.
Why: Background work was generating verbose responses nobody reads. Inspired by caveman. Output token cost dropped ~70% on crons.
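A minimal sketch of scoping the prefix. The context labels "cron" and "subagent" are assumed names. Any other context, including owner DMs and external messages, gets the prompt unchanged:

```python
TERSE_PREFIX = "No filler, no pleasantries, abbreviate freely.\n\n"

# Hypothetical context labels where terse mode is allowed.
TERSE_CONTEXTS = {"cron", "subagent"}


def build_prompt(task: str, context: str) -> str:
    """Prepend the terse-mode prefix only for background work."""
    if context in TERSE_CONTEXTS:
        return TERSE_PREFIX + task
    return task
```

The allowlist fails safe: a new or mislabeled context defaults to full sentences rather than terse output.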
🎓
Graduation Gate for Memory
Never auto-promote dream/nightly candidates to MEMORY.md. Require minimum score ≥ 0.70, at least 2 recalls, and a one-line rationale. Reject raw transcripts, confidence=0.00 entries, and ephemeral content automatically.
Why: MEMORY.md grew from 5K to 30K chars with low-quality dream promotions (raw chat logs, confidence=0.00 fragments). Bootstrap truncated the file at 64%, losing real rules.
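The gate is a filter with a few hard rejects followed by three thresholds. The sketch below assumes a hypothetical Candidate record. The transcript detector is only a rough heuristic (three or more "Speaker: ..." lines) standing in for whatever the real pipeline uses:

```python
import re
from dataclasses import dataclass

# Rough heuristic (an assumption): raw chat logs contain several "Speaker: ..." lines.
SPEAKER_LINE = re.compile(r"^\s*[A-Za-z][\w ]{0,30}:\s")


@dataclass
class Candidate:
    # Hypothetical shape of a dream/nightly promotion candidate.
    text: str
    score: float
    recalls: int
    rationale: str
    confidence: float
    ephemeral: bool = False


def looks_like_transcript(text: str) -> bool:
    return sum(1 for line in text.splitlines() if SPEAKER_LINE.match(line)) >= 3


def passes_gate(c: Candidate) -> bool:
    """Apply the graduation gate; only candidates returning True reach MEMORY.md."""
    # Automatic rejects: zero-confidence fragments, ephemeral content, raw transcripts.
    if c.confidence == 0.0 or c.ephemeral or looks_like_transcript(c.text):
        return False
    rationale = c.rationale.strip()
    one_line_rationale = bool(rationale) and "\n" not in rationale
    return c.score >= 0.70 and c.recalls >= 2 and one_line_rationale
```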
⚠️
Track Skill Failures
Log every skill failure with category skill_failure. After 3+ failures in 14 days, auto-flag the skill for review. Add a FLAGGED notice to the skill and notify the owner. Catch degrading skills before they become invisible tech debt.
Why: Skills silently degraded over weeks. Without failure tracking, broken skills kept getting routed to and producing bad output.
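A sliding-window counter is enough to implement "3+ failures in 14 days". This is a sketch: SkillFailureTracker is a hypothetical in-memory helper. A real PA would persist the skill_failure log and have the caller write the FLAGGED notice and message the owner:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List


class SkillFailureTracker:
    """Count skill_failure events per skill within a sliding 14-day window."""

    def __init__(self, threshold: int = 3, window_days: int = 14):
        self.threshold = threshold
        self.window = timedelta(days=window_days)
        self.failures: Dict[str, List[datetime]] = defaultdict(list)
        self.flagged: set = set()

    def record(self, skill: str, when: datetime) -> bool:
        """Log a failure; return True the first time the skill crosses the threshold."""
        # Drop failures that have aged out of the window, then add this one.
        recent = [t for t in self.failures[skill] if when - t <= self.window]
        recent.append(when)
        self.failures[skill] = recent
        if len(recent) >= self.threshold and skill not in self.flagged:
            self.flagged.add(skill)
            return True  # caller adds the FLAGGED notice and notifies the owner
        return False
```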
📚
Manifest-Based Skill Routing
Generate a lightweight _manifest.json index of all skills (name + description + triggers). Route via manifest first, full SKILL.md only on match. Cuts bootstrap tokens by 80-93% depending on skill library size.
Why: Loading full REFERENCE.md (2K+ tokens) on every routing decision wasted context. The manifest is ~300 tokens for 34 skills.
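The two-step routing is sketched below. The skill dicts and the substring trigger match are simplifying assumptions, and a production router would likely match more carefully. Only name, description and triggers go into the manifest. The full SKILL.md is loaded only after route() returns a name:

```python
import json
from typing import Dict, List, Optional


def build_manifest(skills: List[Dict]) -> str:
    """Serialize only the routing fields; full SKILL.md bodies stay on disk."""
    entries = [
        {"name": s["name"], "description": s["description"], "triggers": s["triggers"]}
        for s in skills
    ]
    return json.dumps(entries, indent=2)


def route(message: str, manifest_json: str) -> Optional[str]:
    """Return the first skill whose trigger appears in the message, else None."""
    text = message.lower()
    for entry in json.loads(manifest_json):
        if any(trigger.lower() in text for trigger in entry["triggers"]):
            return entry["name"]  # only now load skills/<name>/SKILL.md
    return None
```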
💥
Don't Modify Config During Active Runs
Running openclaw config set while an embedded agent run is active can crash the run. The gateway's config reload mutates shared state mid-iteration. Batch config changes to idle periods.
Why: A config change during an active run caused a "dictionary changed size during iteration" crash, and the raw error was sent to the owner as a WhatsApp message.
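The crash message matches the RuntimeError Python raises when a dict gains or loses keys while something iterates over it. The sketch below reproduces that error. It then shows one way to batch changes, using ConfigGate, a hypothetical helper that queues writes while runs are active and applies them when the last run ends. This is not how openclaw's reload actually works:

```python
import threading
from typing import Any, Dict, List, Tuple


def crash_demo() -> None:
    cfg = {"model": "a"}
    for _key in cfg:
        cfg["fallback"] = "b"  # adding a key mid-iteration raises RuntimeError


class ConfigGate:
    """Defer config writes while any agent run is active (hypothetical helper)."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.active_runs = 0
        self.pending: List[Tuple[str, Any]] = []
        self.lock = threading.Lock()

    def start_run(self) -> None:
        with self.lock:
            self.active_runs += 1

    def end_run(self) -> None:
        with self.lock:
            self.active_runs -= 1
            if self.active_runs == 0:
                # Idle period: apply all batched changes at once.
                for key, value in self.pending:
                    self.config[key] = value
                self.pending.clear()

    def set(self, key: str, value: Any) -> bool:
        """Apply immediately if idle, otherwise queue. Returns True if applied now."""
        with self.lock:
            if self.active_runs:
                self.pending.append((key, value))
                return False
            self.config[key] = value
            return True
```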
🔍
No Fix Without Root Cause
Never apply a fix without identifying the root cause first. Investigate → Analyze → Hypothesize → Fix → Verify. If root cause is unclear after 2 attempts, escalate with evidence. Never restart a service without checking why it died.
Why: Inspired by gstack /investigate. Symptom-patching (restarting services, clearing disk) masks real issues and causes recurring failures.
🛡️
Guard Mode for Destructive Commands
Block rm -rf, DROP TABLE, git push --force, kill -9, and chmod 777 unless explicitly verified. Subagents must never execute destructive commands; they return to the main session for approval.
Why: Inspired by gstack /guard. Autonomous agents executing destructive commands without checks is one bad path expansion away from disaster.
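The check below is a minimal pattern-based guard built from the command list above. The regexes and the check_command signature are illustrative assumptions. Pattern matching is easy to evade (variables, scripts, aliases), so treat it as a first line of defense alongside sandboxing, not a replacement for it:

```python
import re
from typing import Optional

# Patterns for the commands listed in the rule; extend per environment.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*"),           # recursive rm (-r, -rf, -fR, ...)
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"),
    re.compile(r"\bkill\s+-9\b"),
    re.compile(r"\bchmod\s+777\b"),
]


def check_command(cmd: str, is_subagent: bool, approved: bool = False) -> Optional[str]:
    """Return a block reason, or None if the command may run."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(cmd):
            if is_subagent:
                # Subagents never run these, even with approval.
                return "destructive command: return to main session for approval"
            if not approved:
                return "destructive command: needs explicit verification"
    return None
```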
📅
Weekly Retro from Git History
Every Sunday, analyze commits, learnings, daily notes, and cron history from the past week. Write a structured retro (shipped, patterns, failures, next priorities). Compare with previous week to track trends and catch recurring issues.
Why: Inspired by gstack /retro. Without structured reflection, the same issues recur week after week. Trend tracking catches degradation before the owner notices.
💡
Proactive Skill Suggestion
Don't just route skills reactively. Detect patterns and suggest the right skill: owner reports a bug → suggest investigate flow. Repeated corrections → suggest self-learning review. End of week → suggest retro. Memory bloat → suggest compaction.
Why: Inspired by gstack proactive triggers. Reactive-only routing misses opportunities to catch problems early and improve autonomously.
cat files/memory-management.md
Memory Management for AI PA Agents
// every session starts from zero — structured, tiered memory puts the right information in the right place at the right time
// the core problem
AI agents wake up fresh every session. Without deliberate memory management, they repeat mistakes, forget context, and frustrate the people they work with. The solution isn't to load everything into context — that's expensive and slow. The solution is structured, tiered memory: the right information, in the right place, loaded at the right time.
🧠
Three Memory Tiers
Tier 1 — Working Memory (in-session): System prompt (SOUL.md, AGENTS.md, USER.md), MEMORY.md, today's daily notes, the conversation itself. Keep under ~10,000 tokens — every token costs money on every message.
Tier 2 — Session Memory: Daily log (memory/YYYY-MM-DD.md). Write tasks assigned/completed, decisions made, corrections received, context needed for tomorrow. Skip casual greetings, short acks, transient state. Write once per conversation at the end.
Tier 3 — Long-term Memory: MEMORY.md — curated, not raw. Owner preferences, key contacts, cross-session rules, lessons from mistakes. Not: completed one-time tasks, outdated rules, temporary status. Promote only after 2+ repetitions or explicit correction.
🔍
Two Memory Types
[FACT] — Something the owner stated directly.
"I work until 20:00"
[DEDUCED] — A logical conclusion inferred from behavior, corrections, or patterns.
"Prefers execution over explanation — never asks me to explain my reasoning, only wants results"
Write [DEDUCED] entries proactively. Don't wait for the owner to state it explicitly. If you notice a pattern — capture it.
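Consistent tags make memory entries machine-readable. The sketch assumes entries are written one per line as "- [FACT] ..." or "- [DEDUCED] ...", which is a formatting convention this page implies but does not spell out:

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed line format: optional bullet, then [FACT] or [DEDUCED], then the text.
TAGGED = re.compile(r"^\s*[-*]?\s*\[(FACT|DEDUCED)\]\s*(.+)$")


@dataclass
class MemoryEntry:
    kind: str  # "FACT" or "DEDUCED"
    text: str


def parse_entries(markdown: str) -> List[MemoryEntry]:
    """Pull tagged entries out of a memory file, ignoring untagged lines."""
    entries = []
    for line in markdown.splitlines():
        m = TAGGED.match(line)
        if m:
            entries.append(MemoryEntry(m.group(1), m.group(2).strip()))
    return entries
```

With tags parsed, a compaction pass could treat [FACT] entries as authoritative and re-check [DEDUCED] ones against newer evidence.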
💬
WhatsApp Memory
Every DM conversation, incoming or outgoing, needs its own context file.
memory/whatsapp/dms/<PHONE>/context.md
memory/whatsapp/groups/<JID-sanitized>/context.md
Critical: After sending any proactive message, create the context file immediately. If someone replies a day later in a new session, you'll have zero context without it.
What to log: Who this person is (name, role, relationship), why you contacted them / what they asked, what was decided, current status.
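The path layout above can be computed mechanically. This sketch makes two assumptions. First, group JIDs end in "@g.us". Second, "sanitized" means mapping every character outside [A-Za-z0-9._-] to "_". The source does not define the sanitization rule, and log_after_send is a hypothetical helper:

```python
import re
from datetime import date
from pathlib import Path

MEMORY_ROOT = Path("memory/whatsapp")


def sanitize_jid(jid: str) -> str:
    """Make a group JID filesystem-safe (assumed rule: keep [A-Za-z0-9._-], map the rest to '_')."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", jid)


def context_path(chat_id: str) -> Path:
    """DMs are keyed by phone number, groups by sanitized JID."""
    if chat_id.endswith("@g.us"):
        return MEMORY_ROOT / "groups" / sanitize_jid(chat_id) / "context.md"
    phone = re.sub(r"[^\d+]", "", chat_id.split("@", 1)[0])
    return MEMORY_ROOT / "dms" / phone / "context.md"


def log_after_send(chat_id: str, who: str, reason: str, status: str, today: date) -> Path:
    """Create or append the context file right after a proactive message goes out."""
    path = context_path(chat_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {today.isoformat()}\n- Who: {who}\n- Why: {reason}\n- Status: {status}\n\n")
    return path
```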
📏
Preventing Bloat
Targets: MEMORY.md → max 175 lines. AGENTS.md / system files → max 60 lines.
Weekly compaction cron: Every Sunday, a lightweight model (Haiku) reads the memory files, removes outdated entries, merges duplicates, and pushes to git. No human involvement needed.
Signs you need compaction: MEMORY.md exceeds 200 lines, loading irrelevant context, session startup feels heavy.
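The line-count targets and the 200-line trigger are easy to check before invoking a model at all. The sketch works on in-memory file contents, and check_bloat is a hypothetical helper. One possible design is for the weekly cron to run it first and wake the compaction model only when something is over budget:

```python
from typing import Dict, List, Tuple

# Line-count targets from above; other system files could reuse the 60-line limit.
TARGETS = {"MEMORY.md": 175, "AGENTS.md": 60}
MEMORY_TRIGGER = 200  # MEMORY.md size that forces compaction before the weekly run


def check_bloat(files: Dict[str, str]) -> Tuple[List[str], bool]:
    """Return (files over their target, whether compaction should run now)."""
    over = [
        name for name, limit in TARGETS.items()
        if name in files and len(files[name].splitlines()) > limit
    ]
    memory_lines = len(files.get("MEMORY.md", "").splitlines())
    return over, memory_lines > MEMORY_TRIGGER
```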
🗂️
What to Store Where
MEMORY.md — Rules, preferences, key contacts
memory/YYYY-MM-DD.md — Daily events, task logs
memory/whatsapp/ — Conversation context
monday.com — Research, strategy, docs
Local files only — Credentials, config (never monday)
skills/<name>/.context — Skill-specific config
💡
The Compaction Mindset
Think of memory like a human's long-term memory — not a log file. A human doesn't remember every conversation word for word. They remember what mattered, what they learned, who they can trust and how they work, and what decisions were made and why. Your MEMORY.md should read the same way. Curated, dense, useful — not a transcript.
⚡
Quick Reference
Write to daily notes — After any significant conversation (5+ exchanges)
Write to MEMORY.md — After 2+ repetitions, or after correction
Write WhatsApp DM context — Immediately after any message sent or received
Run compaction — Weekly (automated) or when >200 lines
Push to git — After any memory write