Learn from Your Email (Reduce Manual Input)
Let InboxPilot learn your reply tone, style, and business context by scanning your connected email so you need to upload less data manually.
Concept: Email-derived knowledge — everything we learn comes from scanning the user's sent emails and inbox only. We do not make things up.
Description: The product scans the user's sent emails and inbox (their connected account). From that scan only, we derive what appears in the Knowledge Center: reply tone and style, suggested templates and FAQ when we spot patterns, website (from links or signature in their emails), and the labels they actually use. Nothing is invented—only inferred from their own mail.
Why reduce manual input?
- Faster onboarding — Users can connect email first and get useful AI replies with minimal or no uploads.
- Better adoption — Less friction than “upload templates, FAQ, files, website” before testing.
- More accurate tone — The AI learns from how the user actually writes, not just from a few hand-picked templates.
What we derive only from scanning sent emails and inbox
All of the following are inferred only from the scan of the user's sent emails and inbox. We do not invent data.
| Derived data | Source (from scan) | Where it's used |
|---|---|---|
| Reply tone & style | Sent email bodies: greetings, sign-offs, formality, sentence length | Writing-style summary for the AI (e.g. in system prompt or email_derived) |
| Email templates | Sent replies: cluster similar messages → “when to use” + body | Suggested mail_source (Knowledge Center → Email Templates) when patterns exist |
| FAQ-like pairs | Inbox + sent threads: recurring question → answer | Suggested qa_source when we detect clear Q&A patterns |
| Website | Links or signature/footer in sent or received emails (e.g. company URL) | Suggested website_data so the AI can reference the same site the user uses |
| Labels | Labels applied to messages in the account (e.g. Gmail labels) | “Most appropriate” labels for categorization and organization, based on how the user actually labels mail |
| Business-related text | Company name, product names, or policies mentioned in sent/inbox content | Append to text_source or email_derived when present in the scan |
The existing AI workflow uses mail_source, qa_source, documents, text_source, and website_data. Email-derived data from the scan is stored (e.g. in email_derived or merged into these) and used as context for the AI—all of it traceable to the inbox/sent scan, with nothing made up.
What if I don't use templates or get common inquiries?
Many people don't have fixed templates or repeat the same Q&As. The scan is still useful.
- We always learn your tone and style from your sent emails. We infer how you write: formal or casual, how you greet and sign off, sentence length, use of bullets or lists. That summary comes only from the scan and is used so the AI replies in your voice.
- Templates and FAQ are suggested only when we see them in your inbox/sent. We only add suggested templates or FAQ when we detect clear patterns (e.g. similar sent replies, recurring question→answer in threads). If we don't see patterns, we don't add them—we don't make anything up.
- Website and labels come from what appears in your emails (links, signature) and from the labels you apply to messages. If we don't see a clear website or label usage, we don't invent any.
You can rely on tone and style from the scan alone; anything else (templates, FAQ, website, labels) is only suggested when it's actually present in your sent emails and inbox.
How it fits with the Knowledge Center
- Today: Knowledge Center has five tabs (Email Templates, FAQ, Files, Website, Raw Text). The app requires “uploaded sources” before the Playground and connect flow are fully enabled.
- With “Learn from your email”:
  - Option A (optional): Scanning sent emails and inbox is an optional step. If the user has no uploads but has connected email and run “Learn from my email,” the system can treat email_derived (or merged sources) as valid context and allow testing (e.g. hasUploadedSources becomes true when email-derived data exists).
  - Option B (supplement): Email-derived data only supplements manual uploads. Users still add templates/FAQ/files for accuracy, but the AI also uses tone and style (and website, labels, etc.) from the scan of their sent emails and inbox.
Recommendation: support both — allow “connect + scan only” to unlock the Playground with minimal uploads, while still encouraging Knowledge Center uploads for best results.
Implementation approach
Everything below is derived only from scanning the user's sent emails and inbox. No data is made up.
1. Scan sent emails and inbox
- Sent: Call the Gmail API users.messages.list with the SENT label (and the equivalent for Outlook/Zendesk) and fetch the N most recent sent messages.
- Inbox: List messages in INBOX (or equivalent) so we have both sides of threads and the labels the user applies.
- Labels: From message metadata (e.g. labelIds in Gmail), collect the labels the user actually uses so we can treat them as the most appropriate labels for categorization.
Reuse: getAuthenticatedGmailClient(email) and body extraction similar to extractEmailContent in src/server/actions/rules/execute-rule.ts.
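For the sent and inbox listing, the following is a minimal sketch of paging through users.messages.list until N message IDs are collected. It assumes a client whose users.messages.list accepts userId, labelIds, maxResults, and pageToken and resolves to a { data: { messages, nextPageToken } } object, which matches the googleapis Node client's Gmail surface. The GmailListApi interface and listRecentMessageIds are hypothetical, and in the app the client would presumably come from getAuthenticatedGmailClient(email). Each returned ID still needs a users.messages.get call to fetch headers, body, and labelIds.

```typescript
// Hypothetical minimal interface mirroring the parts of the Gmail API
// users.messages.list call used here (googleapis-style { data } response).
interface MessageRef {
  id?: string | null;
  threadId?: string | null;
}

interface ListResponse {
  data: {
    messages?: MessageRef[] | null;
    nextPageToken?: string | null;
  };
}

interface GmailListApi {
  users: {
    messages: {
      list(params: {
        userId: string;
        labelIds?: string[];
        maxResults?: number;
        pageToken?: string;
      }): Promise<ListResponse>;
    };
  };
}

// Collect up to `limit` message IDs carrying `label` (e.g. "SENT" or "INBOX"),
// following nextPageToken until the limit is reached or the pages run out.
async function listRecentMessageIds(
  gmail: GmailListApi,
  label: string,
  limit: number,
): Promise<string[]> {
  const ids: string[] = [];
  if (limit <= 0) return ids;
  let pageToken: string | undefined = undefined;
  do {
    const res: ListResponse = await gmail.users.messages.list({
      userId: "me",
      labelIds: [label],
      maxResults: Math.min(100, limit - ids.length),
      pageToken,
    });
    for (const m of res.data.messages ?? []) {
      if (m.id && ids.length < limit) ids.push(m.id);
    }
    pageToken = res.data.nextPageToken ?? undefined;
  } while (pageToken && ids.length < limit);
  return ids;
}
```

Asking for at most 100 IDs per page keeps each response small. The loop stops as soon as the limit is met, so a large mailbox is never listed in full.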
2. Extract and normalize content
- For each message (sent and inbox), extract From, To, Subject, Date, body, and labels. Prefer the plain-text body; strip HTML if needed.
- Optionally filter: skip very short messages, forwards, or obvious auto-replies so the style sample is representative.
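One way to implement the extraction and filtering is sketched below, under several assumptions. RawMessage is a hypothetical shape for a message whose MIME parts have already been decoded; the base64url decoding done by something like extractEmailContent is not shown. HTML is stripped with regular expressions rather than a real parser, and the 40-character threshold is illustrative. The auto-reply check uses the Auto-Submitted header defined in RFC 3834, where any value other than "no" marks machine-generated mail.

```typescript
// Hypothetical shape of a fetched message after MIME parts are decoded.
interface RawMessage {
  from: string;
  to: string;
  subject: string;
  date: string;
  textBody?: string; // decoded text/plain part, if present
  htmlBody?: string; // decoded text/html part, if present
  headers: Record<string, string>; // header names lower-cased
  labelIds: string[];
}

interface NormalizedMessage {
  from: string;
  to: string;
  subject: string;
  date: string;
  body: string;
  labelIds: string[];
}

// Crude HTML-to-text fallback: drop script/style blocks and tags, decode a few
// entities (&amp; last, so "&amp;lt;" is not double-decoded), collapse spaces.
function stripHtml(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")
    .replace(/[ \t]+/g, " ")
    .trim();
}

// Prefer the plain-text body; fall back to stripped HTML.
function normalize(msg: RawMessage): NormalizedMessage {
  const body = msg.textBody?.trim() || stripHtml(msg.htmlBody ?? "");
  const { from, to, subject, date, labelIds } = msg;
  return { from, to, subject, date, body, labelIds };
}

// Keep only messages that represent how the user writes: skip very short
// bodies, forwards, and auto-generated mail (RFC 3834 Auto-Submitted != "no").
function isRepresentative(msg: RawMessage, minChars = 40): boolean {
  if (normalize(msg).body.length < minChars) return false;
  if (/^\s*fwd?:/i.test(msg.subject)) return false;
  const autoSubmitted = (msg.headers["auto-submitted"] ?? "no").trim().toLowerCase();
  return autoSubmitted === "no";
}
```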
3. Derive tone and style
- Option A (LLM): Send a sample of 10–20 sent email bodies to an LLM with a prompt like: “Summarize this person’s email tone and style: formal/casual, typical greetings and sign-offs, sentence length, use of bullets/lists.” Store the summary (e.g. in org settings or a new sources.email_derived/writing_style field).
- Option B (heuristics): Detect common greetings (“Hi”, “Hello”, “Dear”), closings (“Best,” “Thanks,” “Regards”), and average sentence length; store them as structured “style” hints.
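Option B can be sketched as a small heuristic pass over sent bodies. Several details are illustrative assumptions: the greeting and closing word lists, the "first line" and "last three lines" positions, and the rule that bullets count as a habit when at least half the messages use them. deriveStyleHints and StyleHints are hypothetical names.

```typescript
// Hypothetical structured style hints produced without an LLM (Option B).
interface StyleHints {
  topGreetings: string[];
  topClosings: string[];
  avgSentenceWords: number;
  usesBullets: boolean;
  sampleSize: number;
}

// Illustrative word lists; real lists would need tuning (and other languages).
const GREETINGS = ["hi", "hello", "hey", "dear", "good morning"];
const CLOSINGS = ["best", "thanks", "thank you", "regards", "kind regards", "cheers"];

function bump(counts: Map<string, number>, key: string): void {
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

// Keys sorted by count (descending), ties broken alphabetically.
function ranked(counts: Map<string, number>, top: number): string[] {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, top)
    .map(([key]) => key);
}

function deriveStyleHints(sentBodies: string[]): StyleHints {
  const greetings = new Map<string, number>();
  const closings = new Map<string, number>();
  let sentences = 0;
  let words = 0;
  let bulletMessages = 0;

  for (const body of sentBodies) {
    const lines = body.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
    if (lines.length === 0) continue;

    // Greeting: the first line starts with a known greeting word.
    const head = lines[0].toLowerCase().replace(/[^a-z ]/g, " ") + " ";
    const greeting = GREETINGS.find((g) => head.startsWith(g + " "));
    if (greeting) bump(greetings, greeting);

    // Closing: one of the last three lines is exactly a known sign-off.
    for (const line of lines.slice(-3)) {
      const candidate = line.toLowerCase().replace(/[,.!]+$/, "");
      if (CLOSINGS.includes(candidate)) bump(closings, candidate);
    }

    // Bullets: any line starting with "-", "*", "•", or "1." / "1)".
    if (lines.some((l) => /^([-*•]|\d+[.)])\s/.test(l))) bulletMessages++;

    // Rough sentence split on terminal punctuation followed by whitespace.
    for (const sentence of body.split(/[.!?]+\s/)) {
      const n = sentence.split(/\s+/).filter((w) => w.length > 0).length;
      if (n > 0) {
        sentences++;
        words += n;
      }
    }
  }

  return {
    topGreetings: ranked(greetings, 2),
    topClosings: ranked(closings, 2),
    avgSentenceWords: sentences === 0 ? 0 : Math.round((words / sentences) * 10) / 10,
    usesBullets: bulletMessages > 0 && bulletMessages * 2 >= sentBodies.length,
    sampleSize: sentBodies.length,
  };
}
```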
That summary is injected into the AI system prompt when generating replies. No invented style—only what appears in the scan.
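The injection step might look like the sketch below. It assumes a hypothetical WritingStyle record that can hold an LLM summary (Option A), heuristic hints (Option B), or both. The helper returns the base prompt unchanged when nothing was learned, which makes the "no invented style" rule enforceable in code rather than only by convention.

```typescript
// Hypothetical stored style data; field names are assumptions.
interface WritingStyle {
  summary?: string; // Option A: LLM-written summary
  topGreetings?: string[]; // Option B: heuristic hints
  topClosings?: string[];
  avgSentenceWords?: number;
}

// Build the style section of the system prompt from learned data only.
// Returns "" when nothing was learned, so no style is ever invented.
function styleSection(style: WritingStyle | undefined): string {
  if (!style) return "";
  const parts: string[] = [];
  if (style.summary && style.summary.trim()) parts.push(style.summary.trim());
  if (style.topGreetings && style.topGreetings.length > 0) {
    parts.push(`Typical greetings: ${style.topGreetings.join(", ")}.`);
  }
  if (style.topClosings && style.topClosings.length > 0) {
    parts.push(`Typical sign-offs: ${style.topClosings.join(", ")}.`);
  }
  if (style.avgSentenceWords && style.avgSentenceWords > 0) {
    parts.push(`Average sentence length: about ${style.avgSentenceWords} words.`);
  }
  if (parts.length === 0) return "";
  return [
    "Write replies in the user's own voice, as learned from their sent emails:",
    ...parts.map((p) => `- ${p}`),
  ].join("\n");
}

// Append the style section to the existing system prompt when present.
function buildSystemPrompt(basePrompt: string, style?: WritingStyle): string {
  const section = styleSection(style);
  return section ? `${basePrompt}\n\n${section}` : basePrompt;
}
```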
4. Derive templates and FAQ (only when present in scan)
- Templates: Group sent replies by similarity; only suggest a template when a cluster is clear. Store in mail_source with a “from inbox/sent” flag. Do not create templates we didn't see.
- FAQ: From inbox + sent threads, detect “incoming question” → “outgoing answer” pairs. Suggest qa_source only when such pairs are clearly present. Do not invent FAQ.
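Step 4 does not specify a similarity measure. The sketch below swaps in word-set Jaccard similarity with greedy single-pass clustering as one simple option; embeddings would be another. All names, the thresholds (0.5 for templates, 0.4 for questions), and the minimum cluster sizes are assumptions. Templates come only from clusters of at least three similar sent replies, and FAQ pairs come only from questions that recur across threads, so nothing is suggested without repeated evidence in the scan.

```typescript
// Lower-cased words longer than two characters.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9']+/).filter((w) => w.length > 2));
}

// Jaccard similarity |A ∩ B| / |A ∪ B|; 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Greedy clustering: each text joins the first cluster whose seed (first
// member) is similar enough, otherwise it starts a new cluster.
function clusterTexts(texts: string[], threshold: number): number[][] {
  const sets = texts.map(wordSet);
  const clusters: number[][] = [];
  sets.forEach((s, i) => {
    const hit = clusters.find((c) => jaccard(sets[c[0]], s) >= threshold);
    if (hit) hit.push(i);
    else clusters.push([i]);
  });
  return clusters;
}

interface SuggestedTemplate {
  whenToUse: string;
  body: string;
  occurrences: number;
  source: "inbox/sent";
}

// Only clusters with at least `minSize` similar sent replies become templates.
function suggestTemplates(
  replies: { subject: string; body: string }[],
  threshold = 0.5,
  minSize = 3,
): SuggestedTemplate[] {
  return clusterTexts(replies.map((r) => r.body), threshold)
    .filter((c) => c.length >= minSize)
    .map((c) => ({
      whenToUse: `Replies like: "${replies[c[0]].subject}"`,
      body: replies[c[0]].body,
      occurrences: c.length,
      source: "inbox/sent" as const,
    }));
}

interface ThreadMessage {
  fromUser: boolean;
  body: string;
}

// Pair each incoming message containing "?" with the user's next reply in the
// same thread, then keep only questions that recur across threads.
function suggestFaq(
  threads: ThreadMessage[][],
  threshold = 0.4,
  minRepeats = 2,
): { question: string; answer: string }[] {
  const pairs: { question: string; answer: string }[] = [];
  for (const thread of threads) {
    thread.forEach((m, i) => {
      if (m.fromUser || m.body.indexOf("?") < 0) return;
      const reply = thread.slice(i + 1).find((r) => r.fromUser);
      if (reply) pairs.push({ question: m.body, answer: reply.body });
    });
  }
  return clusterTexts(pairs.map((p) => p.question), threshold)
    .filter((c) => c.length >= minRepeats)
    .map((c) => pairs[c[0]]);
}
```

Greedy clustering against each cluster's first member is order-dependent but cheap. For a scan of a few hundred sent messages, that trade-off is usually acceptable.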
5. Derive website and labels (from scan only)
- Website: Extract links or domain from signature/footer/body in sent or received emails (e.g. company URL). Suggest website_data only when we find such links; do not invent a website.
- Labels: Use the labels present on scanned messages (e.g. Gmail labelIds) as the set of “most appropriate” labels for the user. Expose them for categorization and organization so the AI can align with how the user actually labels mail.
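Both derivations are sketched below under stated assumptions. The website heuristic counts link domains per sent message. This is deliberately narrower than the text above, because received mail is full of other companies' links. It also ignores an illustrative list of common third-party domains and prefers the user's own email domain when that domain qualifies. The label count skips Gmail system labels and CATEGORY_* tabs. Because labelIds on messages are IDs, display names would come from a separate users.labels.list call. ScannedMessage, suggestWebsite, and labelUsage are hypothetical.

```typescript
// Hypothetical input: a scanned message reduced to what this step needs.
interface ScannedMessage {
  fromUser: boolean;
  body: string;
  labelIds: string[];
}

// Gmail system label IDs; user-created labels have IDs like "Label_123".
const SYSTEM_LABELS = new Set(["INBOX", "SENT", "DRAFT", "SPAM", "TRASH", "UNREAD", "STARRED", "IMPORTANT", "CHAT"]);

// Illustrative third-party domains that are rarely the user's own site.
const IGNORED_DOMAINS = new Set(["google.com", "gmail.com", "outlook.com", "microsoft.com", "linkedin.com"]);

function isIgnored(host: string): boolean {
  return Array.from(IGNORED_DOMAINS).some((d) => host === d || host.endsWith("." + d));
}

// Hostnames of http(s) links in a body, without a leading "www.".
function hostsIn(text: string): string[] {
  const hosts: string[] = [];
  const re = /https?:\/\/(?:www\.)?([a-z0-9.-]+\.[a-z]{2,})/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) hosts.push(m[1].toLowerCase());
  return hosts;
}

// Suggest a website only when one domain appears in at least `minMessages`
// sent messages (e.g. a signature link); otherwise return null.
function suggestWebsite(messages: ScannedMessage[], userEmail: string, minMessages = 3): string | null {
  const counts = new Map<string, number>();
  for (const m of messages) {
    if (!m.fromUser) continue;
    new Set(hostsIn(m.body)).forEach((h) => {
      if (!isIgnored(h)) counts.set(h, (counts.get(h) ?? 0) + 1);
    });
  }
  const userDomain = (userEmail.split("@")[1] ?? "").toLowerCase();
  const candidates = Array.from(counts.entries()).filter(([, n]) => n >= minMessages);
  if (candidates.length === 0) return null;
  const own = candidates.find(([h]) => h === userDomain);
  candidates.sort((a, b) => b[1] - a[1]);
  return `https://${(own ?? candidates[0])[0]}`;
}

// User-applied labels seen at least `minCount` times, most used first.
function labelUsage(messages: ScannedMessage[], minCount = 2): { labelId: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const m of messages) {
    for (const id of m.labelIds) {
      if (SYSTEM_LABELS.has(id) || id.startsWith("CATEGORY_")) continue;
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1])
    .map(([labelId, count]) => ({ labelId, count }));
}
```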
6. Persist and count as “sources”
- Schema: e.g. an email_derived jsonb field on sources (writing_style, suggested_templates, suggested_faq, website, labels). Merge suggested items into mail_source / qa_source / website_data with metadata where applicable.
- hasUploadedSources: Include email-derived data in the condition so “connect + scan” alone can enable the Playground and connect flow.
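The schema and the widened condition might look like the sketch below. The EmailDerived field names follow the list above, but the item shapes and the Sources row type are assumptions. hasUploadedSources here is an illustrative function, not the app's existing check. Any non-empty email-derived field counts as context, including a writing-style summary on its own.

```typescript
// Hypothetical shape of the email_derived jsonb value on `sources`.
interface EmailDerived {
  writing_style?: string | null;
  suggested_templates?: { whenToUse: string; body: string }[];
  suggested_faq?: { question: string; answer: string }[];
  website?: string | null;
  labels?: string[];
  scanned_at?: string; // ISO timestamp of the scan
}

// Hypothetical view of a `sources` row; only emptiness matters here.
interface Sources {
  mail_source?: unknown[];
  qa_source?: unknown[];
  documents?: unknown[];
  text_source?: string | null;
  website_data?: unknown;
  email_derived?: EmailDerived | null;
}

// True when the scan produced at least one non-empty field.
function hasEmailDerivedData(d: EmailDerived | null | undefined): boolean {
  if (!d) return false;
  return (
    Boolean(d.writing_style && d.writing_style.trim()) ||
    (d.suggested_templates?.length ?? 0) > 0 ||
    (d.suggested_faq?.length ?? 0) > 0 ||
    Boolean(d.website) ||
    (d.labels?.length ?? 0) > 0
  );
}

// Manual uploads OR email-derived data unlock the Playground and connect flow.
function hasUploadedSources(s: Sources): boolean {
  const manual =
    (s.mail_source?.length ?? 0) > 0 ||
    (s.qa_source?.length ?? 0) > 0 ||
    (s.documents?.length ?? 0) > 0 ||
    Boolean(s.text_source && s.text_source.trim()) ||
    Boolean(s.website_data);
  return manual || hasEmailDerivedData(s.email_derived);
}
```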
7. UI flow
- In Knowledge Center, “Learn from my email” triggers a scan of sent emails and inbox for the selected connected account.
- The backend lists sent + inbox messages, extracts content and labels, derives only what is present (tone/style, templates/FAQ when patterns exist, website when links appear, labels from metadata), then saves to sources. Nothing is invented.
- Show a short summary of what was actually found (e.g. “We’ve learned your tone; we’ve added N suggested templates / M suggested FAQs / website / labels where we found them in your inbox and sent mail. You can edit them in Knowledge Center.”).
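The summary message can be built from counts of what the scan actually produced, so any category with nothing found is simply left out. ScanResult and scanSummary are hypothetical, and the wording is adapted from the example message above.

```typescript
// Hypothetical counts of what a scan actually found.
interface ScanResult {
  learnedTone: boolean;
  templates: number;
  faqs: number;
  website: string | null;
  labels: number;
}

const suggested = (n: number, word: string): string =>
  `${n} suggested ${word}${n === 1 ? "" : "s"}`;

// Mention only what was found, so the message never claims data we did not derive.
function scanSummary(r: ScanResult): string {
  const added: string[] = [];
  if (r.templates > 0) added.push(suggested(r.templates, "template"));
  if (r.faqs > 0) added.push(suggested(r.faqs, "FAQ"));
  if (r.website) added.push(`your website (${r.website})`);
  if (r.labels > 0) added.push(`${r.labels} label${r.labels === 1 ? "" : "s"}`);

  const parts: string[] = [];
  if (r.learnedTone) parts.push("We've learned your tone");
  if (added.length > 0) parts.push(`we've added ${added.join(", ")} found in your inbox and sent mail`);
  if (parts.length === 0) return "We didn't find enough in your inbox and sent mail to learn from yet.";

  const text = parts.join("; ");
  return `${text.charAt(0).toUpperCase()}${text.slice(1)}. You can review and edit this in Knowledge Center.`;
}
```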
Summary
- Source of truth: Everything we learn comes from scanning the user’s sent emails and inbox only. We do not make things up.
- What we derive: Reply tone and style, suggested templates and FAQ when we see patterns, website when we see links/signature, and the labels the user actually uses (most appropriate labels). All of this populates the Knowledge Center (e.g. email_derived, suggested mail_source / qa_source / website_data).
- Result: Users connect email and run “Learn from my email”; the AI then uses only what was inferred from their own inbox and sent mail, with the option to add or edit data manually in the Knowledge Center.