
Email Summary Microservice

Motivation

This project grew out of wanting to make sure I did not miss anything important in my mailbox. I wanted a system that could surface the key emails aligned with my priorities amid the constant flow of messages, without relying on generic filters or commercial tools that might compromise privacy.

The main goal is to help users avoid missing important updates, whether they are about job applications and interviews, work projects, or other critical communications.

Technical Overview

  • Automated Email Ingestion: The service connects to Gmail using OAuth2 and fetches recent messages via the Gmail API. It runs on a schedule (every 24 hours in production), so summaries are refreshed daily.
  • Customizable Categorization Engine: Emails are parsed and categorized using a YAML-driven rule system. Each category is defined by a set of keyword patterns (AND/OR logic), allowing for highly flexible, user-specific filtering. This approach is transparent and easy to audit or modify.
  • Summary Generation: The system compiles a digest email, grouping messages by category and highlighting those that match user-defined criteria. The summary is sent through the Resend API and formatted for quick scanning.
  • Privacy-First Design: All processing happens locally or on a private server. No email content is sent to third-party AI services or analytics providers.
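
To make the AND/OR keyword logic concrete, here is a minimal sketch of how such a rule engine could work. It is not the project's actual code: the rule layout (the `any` and `all` keys), the category names, and the functions `matches` and `categorize` are all assumptions made for illustration. The `RULES` dictionary stands in for what a YAML config might look like once loaded into Python.

```python
from typing import Dict, List

# Hypothetical rule layout: roughly what a YAML config could look like after
# loading. The "any"/"all" keys and the category names are illustrative only.
RULES: Dict[str, Dict[str, List[str]]] = {
    "jobs": {"any": ["interview", "application"], "all": []},
    "invoices": {"any": ["invoice", "receipt"], "all": ["payment"]},
}


def matches(text: str, rule: Dict[str, List[str]]) -> bool:
    """Return True if text has every 'all' keyword and at least one 'any' keyword.

    An empty 'any' list is treated as "no OR constraint".
    """
    lowered = text.lower()
    all_ok = all(kw in lowered for kw in rule.get("all", []))
    any_kws = rule.get("any", [])
    any_ok = not any_kws or any(kw in lowered for kw in any_kws)
    return all_ok and any_ok


def categorize(subject: str, snippet: str,
               rules: Dict[str, Dict[str, List[str]]] = RULES) -> List[str]:
    """Return the names of all categories whose rule matches the message."""
    text = f"{subject} {snippet}"
    return [name for name, rule in rules.items() if matches(text, rule)]


if __name__ == "__main__":
    print(categorize("Interview scheduled", "We would like to talk next week"))
    print(categorize("Invoice #12", "Payment due Friday"))
```

Because matching is plain substring search, every decision can be traced back to a specific keyword, which is what keeps this style of rule auditable. Note that in this sketch a rule with both lists empty matches every message.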

Design Choices & Tradeoffs

  • Keyword Rules vs. LLMs: I experimented with using large language models for categorization, which can be more accurate and context-aware. However, LLMs are expensive to run at scale and raise privacy concerns if used via API. The current design uses explicit keyword rules, which are fast, cheap, and fully auditable. If you have the resources to self-host an LLM, you could swap out the rule engine for smarter classification.
  • Config-Driven Logic: All categorization logic lives in a YAML config file. This makes it easy to adapt the service to new use cases (e.g., tracking freelance invoices, monitoring for urgent alerts) without touching the codebase.
  • Stateless, Scheduled Execution: The service is designed to run as a scheduled job (e.g., via Vercel/Serverless cron or Railway), requiring no persistent backend. This minimizes operational complexity and cost.
  • OAuth2 Security: Integrating with Gmail required careful handling of OAuth2 tokens and redirect URIs, especially to support both local development and cloud deployment. All secrets are managed via environment variables.
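
As an illustration of keeping secrets in environment variables for a stateless job, the sketch below reads all credentials once at startup and fails fast if any are missing. The variable names (`GMAIL_CLIENT_ID`, `GMAIL_CLIENT_SECRET`, `GMAIL_REFRESH_TOKEN`, `RESEND_API_KEY`), the `Settings` class, and the use of a stored refresh token are assumptions for this example, not names taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the real project may use different ones.
REQUIRED = (
    "GMAIL_CLIENT_ID",
    "GMAIL_CLIENT_SECRET",
    "GMAIL_REFRESH_TOKEN",
    "RESEND_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    gmail_client_id: str
    gmail_client_secret: str
    gmail_refresh_token: str
    resend_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, raising if anything is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        gmail_client_id=env["GMAIL_CLIENT_ID"],
        gmail_client_secret=env["GMAIL_CLIENT_SECRET"],
        gmail_refresh_token=env["GMAIL_REFRESH_TOKEN"],
        resend_api_key=env["RESEND_API_KEY"],
    )
```

Passing the environment in as a mapping keeps the function easy to test, and checking everything up front means a misconfigured scheduled run stops before it contacts any external API.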

Implementation Notes

  • Email Parsing: The service extracts subject, sender, and a snippet of the body for each message. It normalizes text for robust keyword matching, handling edge cases like HTML emails and encoding issues.
  • Extensibility: Adding new categories or changing detection logic is as simple as editing the YAML config. The codebase is structured to make it easy to swap out the categorization backend if needed.
  • Deployment: I use Vercel’s scheduled functions for automation, but the code is portable to any environment that supports Python and cron jobs.
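
The text normalization step described above could look roughly like the following. This is a sketch using only the Python standard library, not the project's implementation; the function name `normalize` and the specific steps (lenient decoding, tag stripping, NFKC normalization, whitespace collapsing, case folding) are assumptions about what "robust keyword matching" might involve.

```python
import re
import unicodedata
from html.parser import HTMLParser
from typing import List, Union


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def normalize(raw: Union[str, bytes], charset: str = "utf-8") -> str:
    """Return lowercase plain text suitable for substring keyword matching."""
    if isinstance(raw, bytes):
        try:
            raw = raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8 with replacement.
            raw = raw.decode("utf-8", errors="replace")
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = " ".join(parser.parts)
    # NFKC folds compatibility characters, e.g. non-breaking space to space.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().casefold()
```

One tradeoff in this sketch: text fragments are joined with spaces, so a word split by inline markup (such as `Inter<b>view</b>`) would no longer match as a single keyword, while words separated only by tags like `<br>` stay apart.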

Source & License