Heleni — AI Personal Assistant

cat IDENTITY.md

Meet Heleni

// the AI PA that battle-tested every skill in this repo

24/7 Uptime

15+ Peer PAs

Connected Services

WhatsApp, Google Calendar, monday.com, GitHub, Gmail, YouTube, LinkedIn, Telegram, ElevenLabs, Claude Code

SOUL.md

// core execution model

const principle = "execution agent, not chatbot";

// report results, never progress

// silence = correct behavior when no value to add

// proactive > reactive — surface what matters before asked

// what is an AI PA?

An AI Personal Assistant is not a chatbot.

A chatbot waits for you to ask something. An AI PA works for you. It manages your calendar, monitors your systems, coordinates with other agents, sends your morning briefing, and handles tasks end-to-end — without needing hand-holding.

Heleni is a production AI PA built on OpenClaw, an open-source agent runtime. She runs 24/7, connected to WhatsApp, Google Calendar, and monday.com. She schedules meetings by talking to other PAs, detects billing failures before they cascade, and delivers daily briefings at 7:30 AM — all autonomously.

// why these skills exist

Every skill here was born from a real problem.

This isn't a theoretical skill library. Every SKILL.md in this repo was created because Heleni needed it in production. The billing-monitor was built after 5+ PAs went down from undetected API billing errors in a single day. The self-learning skill exists because Heleni kept repeating the same mistakes across sessions — corrections weren't persisting. The memory-tiering system was designed when daily memory files grew past 200 lines and started wasting the context window.

These skills represent months of iteration with a real AI PA serving a real person — scheduling meetings, managing a network of 20+ peer PAs, handling WhatsApp groups, and running autonomous cron jobs. The patterns are hard-won.

// how heleni thinks

Execution model, not conversation model.

Heleni's behavior is defined in a SOUL.md file — a behavioral contract that shapes how she operates.

Execute, then report. Never narrate what you're about to do. Never share intermediate steps. Do the thing, then say what happened.

Silence is correct behavior. In group chats, Heleni only speaks when directly addressed. No reactions, no echoing, no noise.

Proactive over reactive. She monitors for calendar conflicts, unanswered messages, billing issues, and system health — and alerts without being asked.

Permission boundaries. Internal and reversible = execute. External or irreversible = confirm first.
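
The permission-boundary rule can be sketched as a single predicate. This is a minimal illustration, not Heleni's actual implementation: the `Action` type, its fields, and the sample actions are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of something the PA is about to do."""
    name: str
    external: bool    # touches people or systems outside the workspace
    reversible: bool  # can be undone without lasting side effects


def needs_confirmation(action: Action) -> bool:
    """External or irreversible actions are confirmed first; the rest just run."""
    return action.external or not action.reversible


# Illustrative actions only; none of these come from the source.
update_notes = Action("append to daily notes", external=False, reversible=True)
send_message = Action("send WhatsApp message", external=True, reversible=False)
delete_branch = Action("delete local branch", external=False, reversible=False)

for action in (update_notes, send_message, delete_branch):
    verdict = "confirm first" if needs_confirmation(action) else "execute"
    print(f"{action.name}: {verdict}")
```

Only an action that is both internal and reversible runs without asking; failing either condition routes it to confirmation.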

cat RECOMMENDED.md

Must-Have Skills

// start here — these two skills define how Heleni operates and how she picks the right tool for the job

🗂️ Skill Master

ESSENTIAL

The brain's routing table. When Heleni receives a task, this skill decides which other skill to load. Contains the decision tree, trigger phrases, multi-skill workflows, and model guidance. Install this first.


⚡ Heleni Best Practices

RECOMMENDED

Daily sync of production lessons from Heleni. Fetches latest best practices, learnings, and skill updates — then applies relevant lessons to your own PA setup. The SOUL.md in skill form.


cat GUIDE.md

What Can You Ask Your AI PA?

// real examples of what to say and what you get back — works in English and Hebrew

📅 Meetings

"What's my next meeting?"

Next meeting + time, participants, summary of last meeting with them, open action items

"Summarize my meeting with Daniel"

Title, date, participants, 3-5 summary bullets, topics, action items

"Schedule a meeting with Omri on Tuesday"

Checks both calendars, finds a free slot, sends invite

"Cancel my meeting with X"

Cancels event + notifies participants

🗓️ Calendar

"What do I have tomorrow?"

Tomorrow's calendar events

"What's on my calendar this week?"

Full week view with meeting details

"Book 30 min with Ruth on Thursday"

Finds free slot in both calendars, sends invite

☀️ Morning Briefing

Auto-sent at 07:30 every day

Today's meetings, urgent emails, open tasks, meeting prep (past summary + open actions for each meeting)

"Send me a briefing"

Same as above, on demand

📡 Status

"What's the status?"

Full report: open tasks, monitored groups, pending follow-ups, system health

"What are you working on?"

Active tasks with owner, status, and next step

"Is everything OK?"

Health check: billing, calendar, crons, unanswered messages

🔍 Research & Synthesis

"Research what MCP is"

3-5 sentence synthesis + bullet points + cited sources

"Find me tools for AI scheduling"

Synthesized answer from multiple sources, with links

"What do we know about [Competitor]?"

Summary from monday board + latest web findings

✅ Task Management

"Take ownership of this task"

Task registered with status, updates when stuck or done

"Follow up with Ruth on the onboarding doc"

Sends follow-up, logs outcome, closes loop with you

"What's the status of your tasks?"

List of open tasks with status and who's waiting

📊 monday.com

"What's the status of the PA rollout project?"

Board items, statuses, deadlines, blockers

"Add a task to the board"

Creates item with owner, due date, status

"Show me all open items assigned to Daniel"

Filtered board view by person

"What's overdue?"

All items past their due date

"Create a project board for Q2 launch"

New board with groups, columns, and starter items

"Summarize what changed on the board this week"

Activity log digest — what moved, what was added

💬 WhatsApp & Communication

"Who hasn't replied to me?"

Unanswered messages in the last 30 minutes

"What's the latest with Alfred?"

Conversation history, decisions, open loops

"Send Ruth that..."

Sends message (after confirmation)

🏆 Competitive Analysis

"Run a competitive analysis on Notion"

Pulls from competitive board + cross-references web research

"Update the competitor board with this"

Adds/updates item on the competitive intelligence board

🔧 System & Health

"Is billing OK?"

API key + billing status check

"Run an eval"

Performance score 1-25, findings, recommendations

"How much did today cost?"

Token usage + estimated cost per session/day/week

"Back up the workspace to GitHub"

Push to repo + confirmation

"What skills are you missing?"

Gap report + new skill recommendations

🧠 Memory & Learning

"Remember that..."

Saves to memory file

"What did you learn this week?"

MEMORY.md update with key insights

Tips

Hebrew works — all triggers work in Hebrew too
No exact format needed — natural language works
After a meeting summary — say "push the action items to monday" and your PA will create them
Morning briefing — arrives automatically at 07:30 with prep for each of today's meetings
Status anytime — "what's the status?" works in any context, at any time


cat .learnings/LEARNINGS.md

Key Learnings from Production

// real corrections and patterns discovered while running Heleni in production — each one caused a bug, outage, or user frustration before it became a rule

🎯

Execution > Conversation

Execute first, report results only. Never share reasoning, intermediate steps, or "let me check" narration. Lead with one recommendation instead of listing options.

Why: The owner explicitly corrected this pattern. Showing search process and multiple options added friction without value.

🔒

Credential Exposure Protocol

If credentials appear in chat, warn immediately on receipt, before any operations. Then proceed with the task and remind the owner to revoke the credentials after completion.

Why: GitHub tokens were shared in WhatsApp. The PA used them first and warned after — too late.

📡

Explicit Delivery Targets

Cron jobs without explicit delivery targets fall back to the last active session, which can leak messages to the wrong chats. Always pin the target explicitly.

Why: A daily AI digest intended for a work group was delivered to the owner's private DM.
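
One way to enforce pinned targets is to fail loudly when a job has none, rather than falling back. The sketch below assumes a hypothetical `CronJob` record and a placeholder group identifier; it does not reflect OpenClaw's real cron schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CronJob:
    """Hypothetical cron job record; field names are assumptions."""
    name: str
    prompt: str
    delivery_target: Optional[str] = None  # e.g. a group JID or a phone number


class MissingDeliveryTarget(ValueError):
    """Raised instead of silently delivering to the last active session."""


def resolve_target(job: CronJob) -> str:
    """Return the pinned target, or refuse to run the job at all."""
    target = (job.delivery_target or "").strip()
    if not target:
        raise MissingDeliveryTarget(
            f"cron job {job.name!r} has no pinned delivery target"
        )
    return target


# Placeholder target, used for illustration only.
pinned = CronJob("daily-ai-digest", "Summarize AI news", "work-group@g.us")
unpinned = CronJob("daily-ai-digest", "Summarize AI news")

print(resolve_target(pinned))
try:
    resolve_target(unpinned)
except MissingDeliveryTarget as exc:
    print(f"refused: {exc}")
```

Raising an error makes a missing target visible at scheduling time instead of after a message has already reached the wrong chat.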

🧠

Memory Discipline

Daily memory files should stay under 50 lines. Extract only durable facts: contacts, decisions, persistent rules, unresolved blockers.

Why: A daily notes file grew to 200+ lines with transient events, eating context window for no value.

🔍

Runtime Identity Check

Model identity must be checked via live session status, never assumed from context. Models can change mid-session without notice.

Why: Model switched from GPT to Claude mid-session. PA kept reporting the old model from compacted context.

💳

Proactive Billing Alerts

Billing errors should trigger automatic fallback + notification, not reactive handling. Use billing-monitor with heartbeat integration.

Why: 5+ PAs went down simultaneously from billing errors. Each required manual detection and intervention.

📋

Config Deduplication

SOUL.md, AGENTS.md, and MEMORY.md had 60% content overlap causing contradictions. One source of truth per concern.

Why: Conflicting instructions caused inconsistent behavior. Fixed: SOUL = behavior, MEMORY = facts, IDENTITY = who.

📱

Connected != Working

"Connected and listening" with Messages=0 means the ingest layer is broken, not the messaging platform.

Why: Multiple PAs showed WhatsApp connected but never responded. Problem was the runtime binding, not WhatsApp.

🗂️

.context Files — Separate Agent Config from Generic Skills

Keep SKILL.md files fully generic and synced to pa-skills. Store agent-specific data (phone numbers, JIDs, workspace IDs, board IDs) in a local .context file per skill. This lets any PA adopt a skill and fill in her own context.

Why: Skills were breaking because they contained hardcoded IDs that only worked for one agent. The .context pattern makes every skill portable without sacrificing personalization.
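
A loader for the per-skill `.context` file might look like the sketch below. The source does not specify the file format, so the `KEY=VALUE` layout, the `#` comment convention, and the key names in the usage example are all assumptions.

```python
from pathlib import Path


def load_skill_context(skill_dir: Path) -> dict[str, str]:
    """Read KEY=VALUE pairs from <skill_dir>/.context.

    A missing file yields an empty dict, so a generic skill still loads
    before the adopting PA has filled in her own values.
    """
    context_file = skill_dir / ".context"
    if not context_file.exists():
        return {}
    values: dict[str, str] = {}
    for raw_line in context_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

With this split, SKILL.md can refer to a name such as `BOARD_ID` (a hypothetical key) while the actual ID lives only in the local file.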

🪨

Terse Mode for Background Work

Crons and subagents don't need full sentences. Adding a terse-mode prefix (“No filler, no pleasantries, abbreviate freely”) cuts output tokens 50-75% without losing accuracy. Apply to cron prompts and subagent task descriptions — never to owner DMs or external messages.

Why: Background work was generating verbose responses nobody reads. Inspired by caveman. Output token cost dropped ~70% on crons.
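
Applying the terse-mode prefix selectively can be sketched as follows. The prefix text is quoted from the rule above; the audience labels and the `build_prompt` helper are hypothetical.

```python
TERSE_PREFIX = "No filler, no pleasantries, abbreviate freely."

# Audiences that nobody reads in full; these labels are illustrative.
BACKGROUND_AUDIENCES = {"cron", "subagent"}


def build_prompt(task: str, audience: str) -> str:
    """Prefix background work with terse mode; leave human-facing prompts alone."""
    if audience in BACKGROUND_AUDIENCES:
        return f"{TERSE_PREFIX}\n\n{task}"
    return task


print(build_prompt("Summarize overnight alerts", "cron"))
print(build_prompt("Draft a reply to Ruth", "owner_dm"))
```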

🎓

Graduation Gate for Memory

Never auto-promote dream/nightly candidates to MEMORY.md. Require minimum score ≥ 0.70, at least 2 recalls, and a one-line rationale. Reject raw transcripts, confidence=0.00 entries, and ephemeral content automatically.

Why: MEMORY.md grew from 5K to 30K chars with low-quality dream promotions (raw chat logs, confidence=0.00 fragments). Bootstrap truncated at 64%, losing real rules.
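
The graduation gate can be expressed as a list of rejection reasons, so every refusal is explainable. This is a sketch under stated assumptions: the `Candidate` fields are hypothetical, and it treats score and confidence as two separate numbers because the rule mentions both.

```python
from dataclasses import dataclass

MIN_SCORE = 0.70
MIN_RECALLS = 2


@dataclass(frozen=True)
class Candidate:
    """Hypothetical shape of a dream/nightly promotion candidate."""
    text: str
    score: float
    confidence: float
    recalls: int
    rationale: str = ""
    is_raw_transcript: bool = False
    is_ephemeral: bool = False


def rejection_reasons(c: Candidate) -> list[str]:
    """Return every reason the candidate fails the gate (empty list = pass)."""
    reasons = []
    if c.is_raw_transcript:
        reasons.append("raw transcript")
    if c.is_ephemeral:
        reasons.append("ephemeral content")
    if c.confidence == 0.0:
        reasons.append("confidence is 0.00")
    if c.score < MIN_SCORE:
        reasons.append("score below 0.70")
    if c.recalls < MIN_RECALLS:
        reasons.append("fewer than 2 recalls")
    rationale = c.rationale.strip()
    if not rationale or "\n" in rationale:
        reasons.append("missing one-line rationale")
    return reasons


def may_graduate(c: Candidate) -> bool:
    """Only candidates with no rejection reasons reach MEMORY.md."""
    return not rejection_reasons(c)
```

Returning the reasons rather than a bare boolean lets the nightly job log why a candidate was held back.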

⚠️

Track Skill Failures

Log every skill failure with category skill_failure. After 3+ failures in 14 days, auto-flag the skill for review. Add a FLAGGED notice to the skill and notify the owner. Catch degrading skills before they become invisible tech debt.

Why: Skills silently degraded over weeks. Without failure tracking, broken skills kept getting routed to and producing bad output.
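
The flagging rule (3+ failures in 14 days) reduces to a windowed count. The sketch below assumes failures are available as `(skill_name, failed_at)` pairs; how they are stored and how the owner is notified is left out.

```python
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 3
WINDOW = timedelta(days=14)


def skills_to_flag(
    failures: list[tuple[str, datetime]], now: datetime
) -> set[str]:
    """Return skills with at least FAILURE_THRESHOLD failures inside the window.

    `failures` holds (skill_name, failed_at) pairs logged under the
    skill_failure category; this in-memory shape is an assumption.
    """
    cutoff = now - WINDOW
    counts: dict[str, int] = {}
    for skill, failed_at in failures:
        if cutoff <= failed_at <= now:
            counts[skill] = counts.get(skill, 0) + 1
    return {skill for skill, n in counts.items() if n >= FAILURE_THRESHOLD}
```

Failures older than the window drop out of the count, so a skill that was fixed stops being flagged without manual cleanup.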

📚

Manifest-Based Skill Routing

Generate a lightweight _manifest.json index of all skills (name + description + triggers). Route via manifest first, full SKILL.md only on match. Cuts bootstrap tokens by 80-93% depending on skill library size.

Why: Loading full REFERENCE.md (2K+ tokens) on every routing decision wasted context. The manifest is ~300 tokens for 34 skills.
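
Manifest-first routing might be sketched like this. The JSON layout of `_manifest.json`, the substring-based trigger matching, and the sample triggers are assumptions; only the three indexed fields (name, description, triggers) come from the rule above.

```python
import json


def build_manifest(skills: list[dict]) -> str:
    """Keep only name, description, and triggers; drop the full skill text."""
    entries = [
        {
            "name": s["name"],
            "description": s["description"],
            "triggers": s["triggers"],
        }
        for s in skills
    ]
    return json.dumps({"skills": entries}, ensure_ascii=False)


def route(manifest_json: str, message: str) -> list[str]:
    """Return the skills whose trigger phrases appear in the message.

    Only these matches would then have their full SKILL.md loaded.
    """
    text = message.lower()
    manifest = json.loads(manifest_json)
    return [
        entry["name"]
        for entry in manifest["skills"]
        if any(trigger.lower() in text for trigger in entry["triggers"])
    ]
```

Because the manifest omits skill bodies, the routing step costs roughly the same no matter how long each SKILL.md grows.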

💥

Don't Modify Config During Active Runs

Running openclaw config set while an embedded agent run is active can crash the run. The gateway's config reload mutates shared state mid-iteration. Batch config changes to idle periods.

Why: A config change during an active run caused a "dictionary changed size during iteration" crash. The raw error was sent to the owner as a WhatsApp message.

🔍

No Fix Without Root Cause

Never apply a fix without identifying the root cause first. Investigate → Analyze → Hypothesize → Fix → Verify. If root cause is unclear after 2 attempts, escalate with evidence. Never restart a service without checking why it died.

Why: Inspired by gstack /investigate. Symptom-patching (restarting services, clearing disk) masks real issues and causes recurring failures.

🛡️

Guard Mode for Destructive Commands

Block rm -rf, DROP TABLE, git push --force, kill -9, chmod 777 without explicit verification. Subagents must never execute destructive commands — return to main session for approval.

Why: Inspired by gstack /guard. Autonomous agents executing destructive commands without checks is one bad path expansion away from disaster.
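
A guard for the commands listed above can be sketched with a few regular expressions. This is illustrative and deliberately incomplete: pattern matching of this kind is easy to evade (for example `rm -r -f` is not covered), and the `gate` function with its return values is a hypothetical interface, not the gstack or OpenClaw one.

```python
import re

# One pattern per command family named in the rule.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*(rf|fr)\w*", re.IGNORECASE),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)", re.IGNORECASE),
    re.compile(r"\bkill\s+-9\b"),
    re.compile(r"\bchmod\s+(-\w+\s+)*777\b"),
]


def is_destructive(command: str) -> bool:
    """True if the command matches any known destructive pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


def gate(command: str, is_subagent: bool, verified: bool = False) -> str:
    """Decide what to do with a command: 'run', 'block', or 'escalate'."""
    if not is_destructive(command):
        return "run"
    if is_subagent:
        return "escalate"  # subagents never execute; the main session decides
    return "run" if verified else "block"


print(gate("ls -la", is_subagent=True))
print(gate("rm -rf build/", is_subagent=True))
print(gate("git push --force", is_subagent=False))
```

Keeping subagents on a separate branch of the decision means explicit verification can only ever be granted in the main session.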

📅

Weekly Retro from Git History

Every Sunday, analyze commits, learnings, daily notes, and cron history from the past week. Write a structured retro (shipped, patterns, failures, next priorities). Compare with previous week to track trends and catch recurring issues.

Why: Inspired by gstack /retro. Without structured reflection, the same issues recur week after week. Trend tracking catches degradation before the owner notices.

💡

Proactive Skill Suggestion

Don't just route skills reactively. Detect patterns and suggest the right skill: owner reports a bug → suggest investigate flow. Repeated corrections → suggest self-learning review. End of week → suggest retro. Memory bloat → suggest compaction.

Why: Inspired by gstack proactive triggers. Reactive-only routing misses opportunities to catch problems early and improve autonomously.

cat files/memory-management.md

Memory Management for AI PA Agents

// every session starts from zero — structured, tiered memory puts the right information in the right place at the right time

// the core problem

AI agents wake up fresh every session. Without deliberate memory management, they repeat mistakes, forget context, and frustrate the people they work with. The solution isn't to load everything into context — that's expensive and slow. The solution is structured, tiered memory: the right information, in the right place, loaded at the right time.

🧠

Three Memory Tiers

Tier 1 — Working Memory (in-session): System prompt (SOUL.md, AGENTS.md, USER.md), MEMORY.md, today's daily notes, the conversation itself. Keep under ~10,000 tokens — every token costs money on every message.

Tier 2 — Session Memory: Daily log (memory/YYYY-MM-DD.md). Write tasks assigned/completed, decisions made, corrections received, context needed for tomorrow. Skip casual greetings, short acks, transient state. Write once per conversation at the end.

Tier 3 — Long-term Memory: MEMORY.md — curated, not raw. Owner preferences, key contacts, cross-session rules, lessons from mistakes. Not: completed one-time tasks, outdated rules, temporary status. Promote only after 2+ repetitions or explicit correction.
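
Two of the tier rules are mechanical enough to sketch: the daily log naming scheme and the promotion threshold for MEMORY.md. The function names are hypothetical, and deciding what counts as a repetition is left to the caller.

```python
from datetime import date
from pathlib import PurePosixPath


def daily_log_path(day: date) -> PurePosixPath:
    """Tier 2: one log per calendar day, named memory/YYYY-MM-DD.md."""
    return PurePosixPath("memory") / f"{day.isoformat()}.md"


def should_promote(repetitions: int, explicit_correction: bool) -> bool:
    """Tier 3: promote after 2+ repetitions or an explicit correction."""
    return explicit_correction or repetitions >= 2


print(daily_log_path(date(2025, 1, 5)))
print(should_promote(repetitions=1, explicit_correction=False))
print(should_promote(repetitions=0, explicit_correction=True))
```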

🔍

Two Memory Types

[FACT] — Something the owner stated directly.
"I work until 20:00"

[DEDUCED] — A logical conclusion inferred from behavior, corrections, or patterns.
"Prefers execution over explanation — never asks me to explain my reasoning, only wants results"

Write [DEDUCED] entries proactively. Don't wait for the owner to state it explicitly. If you notice a pattern — capture it.

💬

WhatsApp Memory

Every DM conversation needs its own context file — incoming and outgoing.

memory/whatsapp/dms/<PHONE>/context.md
memory/whatsapp/groups/<JID-sanitized>/context.md

Critical: After sending any proactive message, create the context file immediately. If someone replies a day later in a new session, you'll have zero context without it.

What to log: Who this person is (name, role, relationship), why you contacted them / what they asked, what was decided, current status.
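
Creating the context file right after a proactive message could look like the sketch below. The directory layout follows the two paths above; the sanitization rule for group JIDs, the field labels in the file body, and the sample phone number are assumptions.

```python
import re
from pathlib import Path


def dm_context_path(root: Path, phone: str) -> Path:
    """memory/whatsapp/dms/<PHONE>/context.md under the workspace root."""
    return root / "memory" / "whatsapp" / "dms" / phone / "context.md"


def group_context_path(root: Path, jid: str) -> Path:
    """memory/whatsapp/groups/<JID-sanitized>/context.md.

    Assumed sanitization: anything outside [A-Za-z0-9._-] becomes '_'.
    """
    sanitized = re.sub(r"[^A-Za-z0-9._-]", "_", jid)
    return root / "memory" / "whatsapp" / "groups" / sanitized / "context.md"


def ensure_context(path: Path, who: str, why: str, decided: str, status: str) -> Path:
    """Create the context file if it is missing; never overwrite an existing one."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(
            f"Who: {who}\nWhy: {why}\nDecided: {decided}\nStatus: {status}\n",
            encoding="utf-8",
        )
    return path
```

Calling `ensure_context` in the same step that sends the message means a reply arriving in a later session always finds a file to read.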

📏

Preventing Bloat

Targets: MEMORY.md → max 175 lines. AGENTS.md / system files → max 60 lines.

Weekly compaction cron: Every Sunday, a lightweight model (Haiku) reads the memory files, removes outdated entries, merges duplicates, and pushes to git. No human involvement needed.

Signs you need compaction: MEMORY.md exceeds 200 lines, irrelevant context gets loaded, or session startup feels heavy.
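
The line targets lend themselves to a simple check that a compaction cron could run first. The numbers come from the text above; the function names are hypothetical, and file contents are passed in as strings to keep the sketch independent of any workspace layout.

```python
LINE_TARGETS = {"MEMORY.md": 175, "AGENTS.md": 60}
COMPACTION_TRIGGER = 200  # MEMORY.md line count that signals overdue compaction


def lines_over_target(name: str, text: str) -> int:
    """Lines by which a file exceeds its target (0 if within it or untracked)."""
    target = LINE_TARGETS.get(name)
    if target is None:
        return 0
    return max(0, len(text.splitlines()) - target)


def needs_compaction_now(memory_text: str) -> bool:
    """True once MEMORY.md has grown past the 200-line warning sign."""
    return len(memory_text.splitlines()) > COMPACTION_TRIGGER


sample = "rule\n" * 180
print(lines_over_target("MEMORY.md", sample))
print(needs_compaction_now(sample))
```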

🗂️

What to Store Where

MEMORY.md — Rules, preferences, key contacts
memory/YYYY-MM-DD.md — Daily events, task logs
memory/whatsapp/ — Conversation context
monday.com — Research, strategy, docs
Local files only — Credentials, config (never monday)
skills/<name>/.context — Skill-specific config

💡

The Compaction Mindset

Think of memory like a human's long-term memory — not a log file. A human doesn't remember every conversation word for word. They remember what mattered, what they learned, who they can trust and how they work, and what decisions were made and why. Your MEMORY.md should read the same way. Curated, dense, useful — not a transcript.

⚡

Quick Reference

Write to daily notes — After any significant conversation (5+ exchanges)
Write to MEMORY.md — After 2+ repetitions, or after correction
Write WhatsApp DM context — Immediately after any message sent or received
Run compaction — Weekly (automated) or when >200 lines
Push to git — After any memory write
