Learn from Your Email (Reduce Manual Input)

Concept: Email-derived knowledge — everything we learn comes from scanning the user's sent emails and inbox only. We do not make things up.

Description: The product scans the user's sent emails and inbox (their connected account). From that scan only, we derive what appears in the Knowledge Center: reply tone and style, suggested templates and FAQ when we spot patterns, website (from links or signature in their emails), and the labels they actually use. Nothing is invented—only inferred from their own mail.

Why reduce manual input?

Faster onboarding — Users can connect email first and get useful AI replies with minimal or no uploads.
Better adoption — Less friction than “upload templates, FAQ, files, website” before testing.
More accurate tone — The AI learns from how the user actually writes, not just from a few hand-picked templates.

What we derive only from scanning sent emails and inbox

All of the following are inferred only from the scan of the user's sent emails and inbox. We do not invent data.

Derived data	Source (from scan)	Where it's used
Reply tone & style	Sent email bodies: greetings, sign-offs, formality, sentence length	Writing-style summary for the AI (e.g. in system prompt or `email_derived`)
Email templates	Sent replies: cluster similar messages → “when to use” + body	Suggested mail_source (Knowledge Center → Email Templates) when patterns exist
FAQ-like pairs	Inbox + sent threads: recurring question → answer	Suggested qa_source when we detect clear Q&A patterns
Website	Links or signature/footer in sent or received emails (e.g. company URL)	Suggested website_data so the AI can reference the same site the user uses
Labels	Labels applied to messages in the account (e.g. Gmail labels)	“Most appropriate” labels for categorization and organization, based on how the user actually labels mail
Business-related text	Company name, product names, or policies mentioned in sent/inbox content	Append to text_source or `email_derived` when present in the scan

The existing AI workflow uses mail_source, qa_source, documents, text_source, and website_data. Email-derived data from the scan is stored (e.g. in email_derived or merged into these) and used as context for the AI—all of it traceable to the inbox/sent scan, with nothing made up.

What if I don't use templates or get common inquiries?

Many people don't have fixed templates or repeat the same Q&As. The scan is still useful.

We always learn your tone and style from your sent emails. We infer how you write: formal or casual, how you greet and sign off, sentence length, use of bullets or lists. That summary comes only from the scan and is used so the AI replies in your voice.
Templates and FAQ are suggested only when we see them in your inbox/sent. We add them only when we detect clear patterns (e.g. similar sent replies, or a recurring question→answer in threads). If we don't see patterns, we don't add them, and we don't make anything up.
Website and labels come from what appears in your emails (links, signature) and from the labels you apply to messages. If we don't see a clear website or label usage, we don't invent any.

You can rely on tone and style from the scan alone; anything else (templates, FAQ, website, labels) is only suggested when it's actually present in your sent emails and inbox.

How it fits with the Knowledge Center

Today: Knowledge Center has five tabs (Email Templates, FAQ, Files, Website, Raw Text). The app requires “uploaded sources” before the Playground and connect flow are fully enabled.
With “Learn from your email”:
- Option A — Optional: Scanning sent emails and inbox is an optional step. If the user has no uploads but has connected email and run “Learn from my email,” the system can treat email_derived (or merged sources) as valid context and allow testing (e.g. hasUploadedSources becomes true when email-derived data exists).
- Option B — Supplement: Email-derived data only supplements manual uploads. Users still add templates/FAQ/files for accuracy, but the AI also uses tone and style (and website, labels, etc.) from the scan of their sent emails and inbox.

Recommendation: support both. Allow “connect + scan only” to unlock the Playground without any uploads, while still encouraging Knowledge Center uploads for best results.

Auto-scanning and background build

When you use “Learn from my email” for a connected inbox (or when an inbox is first scanned), InboxPilot builds that inbox’s knowledge in the background. You don’t have to stay on the page.

When it runs: The build starts when you trigger “Learn from my email” for a selected inbox, or when the system first scans that inbox (e.g. after connecting or selecting it in Knowledge Center).
How long it takes: Building an inbox’s knowledge can take from a few seconds up to about 5 minutes, depending on how much mail the inbox holds. You’ll see a “Scanning this inbox” message in Knowledge Center until the build is done.
Leaving the page: You can navigate away. When you come back to Knowledge Center and select that inbox again, the FAQ, Email Templates, Website, and other tabs will show the derived data once the build has finished. The page checks in the background and refreshes when the data is ready.
Per-inbox: Each connected inbox has its own derived knowledge. Selecting a different inbox shows that inbox’s data (or a scanning state if that inbox is still being built).

So: auto-scanning means the heavy work runs in the background; you can keep working elsewhere and return to see the results when they’re ready.
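
The background check described above could be sketched as a small polling loop. This is only an illustration: the names waitForBuild, check, and sleep are hypothetical, and the 5-second interval and 5-minute cap are assumptions based on the build times quoted above, not InboxPilot's actual settings.

```typescript
// Hypothetical status values; the real API may expose different states.
type BuildStatus = "scanning" | "ready";

// Polls `check` until the build is ready or the time budget is used up.
// `check` and `sleep` are injected so the loop can be tested without a
// network call or a real timer.
async function waitForBuild(
  check: () => Promise<BuildStatus>,
  sleep: (ms: number) => Promise<void>,
  intervalMs: number = 5000,          // assumed polling interval
  timeoutMs: number = 5 * 60 * 1000   // assumed cap, from the "about 5 minutes" estimate
): Promise<BuildStatus> {
  let waited = 0;
  for (;;) {
    const status = await check();
    // Stop when the data is ready, or give up and report the last status.
    if (status === "ready" || waited >= timeoutMs) return status;
    await sleep(intervalMs);
    waited += intervalMs;
  }
}
```

In a browser, sleep could be (ms) => new Promise((resolve) => setTimeout(resolve, ms)). If the loop returns "scanning" after the cap, the page would simply keep showing the scanning state.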

Implementation approach

Everything below is derived only from scanning the user's sent emails and inbox. No data is made up.

1. Scan sent emails and inbox

Sent: Gmail API users.messages.list filtered by the SENT label (labelIds), with equivalents for Outlook/Zendesk. Fetch the N most recent sent messages.
Inbox: List messages in INBOX (or equivalent) so we have both sides of threads and the labels the user applies.
Labels: From message metadata (e.g. labelIds in Gmail), collect the labels the user actually uses so we can treat them as most appropriate for categorization.

Reuse: getAuthenticatedGmailClient(email) and body extraction similar to extractEmailContent in src/server/actions/rules/execute-rule.ts.
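
As a rough sketch of the listing step, the function below pages through users.messages.list for one label. It assumes that getAuthenticatedGmailClient(email) returns a googleapis-style Gmail client; GmailLike is a hypothetical minimal interface covering only the call used here, and listRecentIds is an illustrative name, not an existing helper.

```typescript
// Minimal structural view of the one Gmail call used here (assumption:
// the authenticated client exposes users.messages.list like googleapis does).
interface ListParams {
  userId: string;
  labelIds: string[];
  maxResults: number;
  pageToken?: string;
}
interface ListResponse {
  data: { messages?: { id?: string | null }[]; nextPageToken?: string | null };
}
interface GmailLike {
  users: { messages: { list(params: ListParams): Promise<ListResponse> } };
}

// Collects up to `limit` message ids carrying the given label, following
// nextPageToken until the limit is reached or there are no more pages.
async function listRecentIds(
  gmail: GmailLike,
  label: "SENT" | "INBOX",
  limit: number
): Promise<string[]> {
  const ids: string[] = [];
  let pageToken: string | undefined;
  while (ids.length < limit) {
    const res = await gmail.users.messages.list({
      userId: "me",
      labelIds: [label],
      maxResults: Math.min(100, limit - ids.length),
      pageToken,
    });
    for (const m of res.data.messages || []) {
      if (m.id) ids.push(m.id);
    }
    pageToken = res.data.nextPageToken || undefined;
    if (!pageToken) break;
  }
  return ids.slice(0, limit);
}
```

The list call returns ids only; bodies and labelIds would come from a follow-up users.messages.get per id.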

2. Extract and normalize content

For each message (sent and inbox), extract From, To, Subject, Date, body, and labels. Prefer the plain-text body; strip HTML if only an HTML body is available.
Optionally filter: skip very short messages, forwards, or obvious auto-replies so the style sample is representative.
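
A minimal sketch of the normalize-and-filter step follows. The ScannedMessage shape, the function names, the subject patterns, and the 8-word minimum are all assumptions chosen for illustration; a production version would likely use a proper HTML parser rather than regular expressions.

```typescript
// Hypothetical shape for a scanned message; field names are assumptions.
interface ScannedMessage {
  from: string;
  to: string;
  subject: string;
  date: string;
  body: string; // plain text, or HTML when no plain-text part exists
  labels: string[];
}

// Rough HTML-to-text conversion: drop script/style blocks, turn <br> and
// </p> into newlines, strip remaining tags, decode a few common entities.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
    .replace(/<br\s*\/?>|<\/p>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&") // decoded last to avoid double-decoding
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Returns a copy of the message whose body is plain text.
function normalize(msg: ScannedMessage): ScannedMessage {
  const looksLikeHtml = /<[a-z][\s\S]*>/i.test(msg.body);
  return { ...msg, body: looksLikeHtml ? htmlToText(msg.body) : msg.body.trim() };
}

// Skips forwards, obvious auto-replies, and very short messages so the
// style sample reflects how the user actually writes.
function isRepresentative(msg: ScannedMessage, minWords: number = 8): boolean {
  const subject = msg.subject.trim().toLowerCase();
  if (/^fwd?:/.test(subject)) return false;
  if (/^(automatic reply|auto-reply|out of office)/.test(subject)) return false;
  const words = msg.body.split(/\s+/).filter(Boolean).length;
  return words >= minWords;
}
```

A scan would then keep messages.map(normalize).filter((m) => isRepresentative(m)) as the style sample.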

3. Derive tone and style

Option A — LLM: Send a sample of 10–20 sent email bodies to an LLM with a prompt like: “Summarize this person’s email tone and style: formal/casual, typical greetings and sign-offs, sentence length, use of bullets/lists.” Store the summary (e.g. in org settings or a new sources.email_derived / writing_style field).
Option B — Heuristics: Detect common greetings (“Hi”, “Hello”, “Dear”), closings (“Best,” “Thanks,” “Regards”), and average sentence length; store as structured “style” hints.

That summary is injected into the AI system prompt when generating replies. No invented style—only what appears in the scan.
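
Option B could look roughly like the sketch below. The word lists, the StyleHints shape, and the function names are assumptions for illustration only, and the sentence-length figure is deliberately rough because greetings and sign-offs are counted along with the message text.

```typescript
// Hypothetical structured style hints.
interface StyleHints {
  topGreeting: string | null;
  topClosing: string | null;
  avgSentenceWords: number;
}

// Assumed word lists; extend them to match the languages users write in.
const GREETINGS = ["hi", "hello", "dear", "hey"];
const CLOSINGS = ["best", "thanks", "regards", "cheers"];

// Returns the key with the highest count, or null when nothing was counted.
function mostCommon(counts: Record<string, number>): string | null {
  let best: string | null = null;
  for (const key of Object.keys(counts)) {
    if (best === null || counts[key] > counts[best]) best = key;
  }
  return best;
}

function firstWord(line: string): string {
  return line.toLowerCase().split(/[\s,!]+/)[0];
}

function deriveStyle(bodies: string[]): StyleHints {
  const greetings: Record<string, number> = {};
  const closings: Record<string, number> = {};
  let sentences = 0;
  let words = 0;

  for (const body of bodies) {
    const lines = body.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
    if (lines.length === 0) continue;

    // Greeting: first word of the first non-empty line.
    const greeting = firstWord(lines[0]);
    if (GREETINGS.indexOf(greeting) !== -1) {
      greetings[greeting] = (greetings[greeting] || 0) + 1;
    }

    // Closing: walk the last three lines from the bottom up.
    for (const line of lines.slice(-3).reverse()) {
      const closing = firstWord(line);
      if (CLOSINGS.indexOf(closing) !== -1) {
        closings[closing] = (closings[closing] || 0) + 1;
        break;
      }
    }

    // Sentence length: split on ., !, or ? followed by whitespace or end of text.
    for (const sentence of body.split(/[.!?]+(?:\s+|$)/)) {
      const n = sentence.split(/\s+/).filter(Boolean).length;
      if (n > 0) {
        sentences += 1;
        words += n;
      }
    }
  }

  return {
    topGreeting: mostCommon(greetings),
    topClosing: mostCommon(closings),
    avgSentenceWords: sentences > 0 ? words / sentences : 0,
  };
}
```

When no greeting or closing from the lists appears in the sample, the corresponding hint stays null, so nothing is reported that was not seen in the scan.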

4. Derive templates and FAQ (only when present in scan)

Templates: Group sent replies by similarity; only suggest a template when a cluster is clear. Store in mail_source with a “from inbox/sent” flag. Do not create templates we didn't see.
FAQ: From inbox + sent threads, detect “incoming question” → “outgoing answer” pairs. Suggest qa_source only when such pairs are clearly present. Do not invent FAQ.
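
One simple way to approximate the template half of this step is word-overlap clustering, sketched below. Jaccard similarity, the 0.6 threshold, the minimum cluster size of 3, and all function names are assumptions picked for illustration; the source does not specify a similarity method, and an embedding-based approach may work better in practice.

```typescript
// Lowercased word tokens longer than two characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Greedy clustering: each reply joins the first cluster whose seed message
// is similar enough, otherwise it starts a new cluster.
function clusterReplies(bodies: string[], threshold: number = 0.6): string[][] {
  const clusters: { seed: Set<string>; members: string[] }[] = [];
  for (const body of bodies) {
    const tokens = new Set(tokenize(body));
    const home = clusters.find((c) => jaccard(c.seed, tokens) >= threshold);
    if (home) {
      home.members.push(body);
    } else {
      clusters.push({ seed: tokens, members: [body] });
    }
  }
  return clusters.map((c) => c.members);
}

// Suggests one representative body per cluster, and only for clusters large
// enough to count as a clear pattern. Returns [] when no pattern exists.
function suggestTemplates(bodies: string[], minClusterSize: number = 3): string[] {
  return clusterReplies(bodies)
    .filter((members) => members.length >= minClusterSize)
    .map((members) => members[0]);
}
```

Because small clusters are dropped, a mailbox without repeated replies produces no suggestions at all, which matches the rule of never creating templates that were not seen.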

5. Derive website and labels (from scan only)

Website: Extract links or domain from signature/footer/body in sent or received emails (e.g. company URL). Suggest website_data only when we find such links; do not invent a website.
Labels: Use the labels present on scanned messages (e.g. Gmail labelIds) as the set of “most appropriate” labels for the user. Expose for categorization/organization so the AI can align with how the user actually labels mail.
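
The two derivations above might be sketched as follows. Everything here is an assumption for illustration: the function names, the rule that a host must appear at least twice before it is suggested, and the choice to leave Gmail system labels (INBOX, SENT, and so on) out of the user's label set.

```typescript
// Counts link hosts across message bodies, ignoring a leading "www.".
function countHosts(bodies: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  const re = /https?:\/\/([a-z0-9-]+(?:\.[a-z0-9-]+)+)/gi;
  for (const body of bodies) {
    re.lastIndex = 0; // reset the global regex before each body
    let m: RegExpExecArray | null;
    while ((m = re.exec(body)) !== null) {
      const host = m[1].toLowerCase().replace(/^www\./, "");
      counts[host] = (counts[host] || 0) + 1;
    }
  }
  return counts;
}

// Suggests the most frequent host, or null when no host is seen often enough.
function suggestWebsite(bodies: string[], minCount: number = 2): string | null {
  const counts = countHosts(bodies);
  let best: string | null = null;
  for (const host of Object.keys(counts)) {
    if (best === null || counts[host] > counts[best]) best = host;
  }
  return best !== null && counts[best] >= minCount ? best : null;
}

// Gmail system label ids that say nothing about the user's own labeling.
const SYSTEM_LABELS = ["INBOX", "SENT", "DRAFT", "SPAM", "TRASH", "UNREAD", "STARRED", "IMPORTANT", "CHAT"];

// Ranks the labels the user actually applies, most used first.
function labelUsage(messages: { labelIds: string[] }[]): { label: string; count: number }[] {
  const counts: Record<string, number> = {};
  for (const msg of messages) {
    for (const id of msg.labelIds) {
      if (SYSTEM_LABELS.indexOf(id) !== -1 || id.indexOf("CATEGORY_") === 0) continue;
      counts[id] = (counts[id] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((label) => ({ label, count: counts[label] }))
    .sort((a, b) => b.count - a.count);
}
```

Gmail returns label ids on messages, so showing readable names would need a lookup through the labels list for the account.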

6. Persist and count as “sources”

Schema: e.g. an email_derived jsonb column on sources with the keys writing_style, suggested_templates, suggested_faq, website, and labels. Merge suggested items into mail_source / qa_source / website_data, with metadata (such as the “from inbox/sent” flag) where applicable.
hasUploadedSources: Include email-derived data in the condition so “connect + scan” alone can enable Playground and connect flow.
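
A possible shape for the stored data and the updated check is sketched below. The key names follow the schema suggestion above, but the value types, the Sources row shape, and the helper hasEmailDerivedData are assumptions; the real columns may be typed differently.

```typescript
// Hypothetical shape of the email_derived jsonb value.
interface EmailDerived {
  writing_style?: string;
  suggested_templates?: { when_to_use: string; body: string }[];
  suggested_faq?: { question: string; answer: string }[];
  website?: string;
  labels?: string[];
}

// Assumed shape of a sources row; element types are left open on purpose.
interface Sources {
  mail_source: unknown[];
  qa_source: unknown[];
  documents: unknown[];
  text_source: string;
  website_data: unknown[];
  email_derived?: EmailDerived | null;
}

// True when the scan produced at least one non-empty item.
function hasEmailDerivedData(d: EmailDerived | null | undefined): boolean {
  if (!d) return false;
  return Boolean(
    (d.writing_style && d.writing_style.trim()) ||
      (d.suggested_templates && d.suggested_templates.length) ||
      (d.suggested_faq && d.suggested_faq.length) ||
      (d.website && d.website.trim()) ||
      (d.labels && d.labels.length)
  );
}

// Manual uploads OR email-derived data are enough to enable the Playground.
function hasUploadedSources(s: Sources): boolean {
  const manual =
    s.mail_source.length > 0 ||
    s.qa_source.length > 0 ||
    s.documents.length > 0 ||
    s.text_source.trim().length > 0 ||
    s.website_data.length > 0;
  return manual || hasEmailDerivedData(s.email_derived);
}
```

An empty email_derived object does not count, so a scan that found nothing leaves the gate closed until the user uploads something.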

7. UI flow

In Knowledge Center, “Learn from my email” triggers a scan of sent emails and inbox for the selected connected account.
The backend lists sent + inbox messages, extracts content and labels, derives only what is present (tone/style, templates/FAQ when patterns exist, website when links appear, labels from metadata), then saves to sources. Nothing is invented.
Show a short summary of what was actually found (e.g. “We’ve learned your tone; we’ve added N suggested templates / M suggested FAQs / website / labels where we found them in your inbox and sent mail. You can edit them in Knowledge Center.”).
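
The summary message could be assembled so that it mentions only what the scan actually found. The ScanResult shape, the function names, and the exact wording below are assumptions for illustration.

```typescript
// Hypothetical counts returned by the scan.
interface ScanResult {
  templates: number;
  faqs: number;
  website: string | null;
  labels: number;
}

// "1 suggested template", "2 suggested templates", and so on.
function countPhrase(n: number, noun: string): string {
  return `${n} suggested ${noun}${n === 1 ? "" : "s"}`;
}

// Builds the post-scan message, listing only items that were found.
function summarizeScan(r: ScanResult): string {
  const parts: string[] = [];
  if (r.templates > 0) parts.push(countPhrase(r.templates, "template"));
  if (r.faqs > 0) parts.push(countPhrase(r.faqs, "FAQ"));
  if (r.website) parts.push(`your website (${r.website})`);
  if (r.labels > 0) parts.push(`${r.labels} label${r.labels === 1 ? "" : "s"}`);

  const tone = "We've learned your tone.";
  if (parts.length === 0) return tone;
  return `${tone} We also found ${parts.join(", ")}. You can edit them in Knowledge Center.`;
}
```

For a scan with two templates and one FAQ, this yields a message naming just those two items, with no mention of a website or labels.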

Summary

Source of truth: Everything we learn comes from scanning the user’s sent emails and inbox only. We do not make things up.
What we derive: Reply tone and style, suggested templates and FAQ when we see patterns, website when we see links/signature, and the labels the user actually uses (most appropriate labels). All of this populates the Knowledge Center (e.g. email_derived, suggested mail_source / qa_source / website_data).
Result: Users connect email and run “Learn from my email”; the AI then uses only what was inferred from their own inbox and sent mail, with the option to add or edit data manually in the Knowledge Center.
