
Tracking Return Deadlines With NLP Instead of Sticky Notes

STACK: React, Node.js, Express, Gmail API, OpenAI

Return deadlines are easy to miss. Order confirmations arrive long before a return is needed, disappear into inbox noise, and resurface only after the window has closed. The relevant information exists, but it is scattered across unstructured emails and retailer policies. With 16.9% of online orders being returned, missed return windows are an avoidable waste of money.

This project explores whether modern NLP techniques can make that information easier to act on by structuring a Gmail inbox around active return windows rather than message threads.

Problem Framing

Order confirmation emails are inconsistent in format and content. Dates are ambiguous, retailer names vary, and many emails that look transactional are not actually returnable purchases. Rule-based approaches struggle here because the signal is often implicit rather than explicit.

The system takes a deliberately constrained approach: use language models only where semantic judgment is required, and rely on deterministic logic everywhere else.

System Design

The application is built as a small web system with clear boundaries.

Architecture overview:

React frontend for Gmail connection and UI
Node.js and Express backend for validation, date logic, and caching
Gmail API with OAuth 2.0 read-only access
OpenAI models for classification, policy inference, and responses
Local JSON storage for cached return policies

Processing Pipeline

Emails are analyzed using the subject line and a truncated body. A language model classifies whether the email represents a physical product order and extracts the retailer name. Food delivery, subscriptions, promotions, and digital services are explicitly excluded.
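A minimal sketch of how the deterministic parts of this step could look, assuming the model is asked to reply with a small JSON object. The names here (MAX_BODY_CHARS, buildClassifierInput, parseClassification, and the isPhysicalOrder/retailer fields) are hypothetical and the 2000-character limit is an assumed value, not the project's actual setting. The model call itself is left out.

```typescript
// Hypothetical names and limits; the project's real code may differ.
const MAX_BODY_CHARS = 2000; // assumed truncation limit

interface Classification {
  isPhysicalOrder: boolean;
  retailer: string | null;
}

// Build the text sent to the classifier: the subject plus a truncated body.
function buildClassifierInput(subject: string, body: string): string {
  const truncated =
    body.length > MAX_BODY_CHARS ? body.slice(0, MAX_BODY_CHARS) : body;
  return `Subject: ${subject}\n\nBody:\n${truncated}`;
}

// Parse the model's raw reply defensively. Anything malformed, or anything
// without a usable retailer name, is treated as "not a physical order".
function parseClassification(raw: string): Classification {
  const rejected: Classification = { isPhysicalOrder: false, retailer: null };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return rejected;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return rejected;
  }
  const record = parsed as Record<string, unknown>;
  const retailer =
    typeof record.retailer === "string" ? record.retailer.trim() : "";
  if (record.isPhysicalOrder !== true || retailer === "") {
    return rejected;
  }
  return { isPhysicalOrder: true, retailer };
}
```

Defaulting to rejection keeps a malformed or ambiguous model reply from ever becoming a tracked order, which fits the goal of using the model only for semantic judgment.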

The purchase date is taken from the email header rather than parsed from text to avoid ambiguity introduced by order updates and delivery estimates.
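As a rough illustration, the header lookup might be written as follows. This is a sketch under assumptions: it takes a message shaped like a Gmail API message resource (a payload.headers list of name/value pairs and an internalDate string in epoch milliseconds), and the function name and the fallback to internalDate are my own additions rather than something the project is confirmed to do.

```typescript
// Minimal shape of the fields used here, modeled on a Gmail API message.
interface HeaderLike {
  name: string;
  value: string;
}

interface MessageLike {
  internalDate?: string; // epoch milliseconds, as a string
  payload?: { headers?: HeaderLike[] };
}

// Hypothetical helper: read the purchase date from the Date header,
// falling back to internalDate if the header is missing or unparseable.
function purchaseDateFromHeaders(message: MessageLike): Date | null {
  const headers = message.payload?.headers ?? [];
  for (const header of headers) {
    if (header.name.toLowerCase() === "date") {
      const parsed = new Date(header.value);
      if (!isNaN(parsed.getTime())) {
        return parsed;
      }
    }
  }
  if (message.internalDate) {
    const millis = Number(message.internalDate);
    if (isFinite(millis)) {
      return new Date(millis);
    }
  }
  return null; // no usable date; the caller should skip this email
}
```

Nothing in the body is consulted, so a later "arriving Thursday" or "order updated" line cannot shift the purchase date.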

For confirmed orders, the system checks a cached return policy store. Unknown retailers trigger a one-time policy inference step, after which the result is saved locally to reduce latency and unnecessary API calls. Remaining return time is calculated using inclusive date arithmetic, and expired orders are filtered out.
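The cache lookup and the date arithmetic could be sketched like this. Several things are assumptions for illustration: the cache is shown as an in-memory object rather than a JSON file, the inference step is passed in as a synchronous callback (a real model call would be asynchronous), and "inclusive" is interpreted as counting the deadline day itself as a remaining day. All function names are hypothetical.

```typescript
// Normalized retailer name -> return window in days. Assumed shape; it
// could be written to disk with JSON.stringify to match local JSON storage.
type PolicyCache = Record<string, number>;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function normalizeRetailer(name: string): string {
  return name.trim().toLowerCase();
}

// Return the cached window, or infer it once and remember the result.
function getReturnWindow(
  cache: PolicyCache,
  retailer: string,
  inferWindowDays: (retailer: string) => number
): number {
  const key = normalizeRetailer(retailer);
  const cached = cache[key];
  if (cached !== undefined) {
    return cached;
  }
  const days = inferWindowDays(retailer);
  cache[key] = days;
  return days;
}

// Whole-day index of a date in UTC, so time of day never affects the count.
function utcDayNumber(date: Date): number {
  return Math.floor(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()) /
      MS_PER_DAY
  );
}

// Inclusive count: on the deadline day itself, one day still remains.
// A result of zero or less means the window has closed.
function daysRemainingInclusive(
  purchase: Date,
  windowDays: number,
  today: Date
): number {
  const deadlineDay = utcDayNumber(purchase) + windowDays;
  return deadlineDay - utcDayNumber(today) + 1;
}
```

For example, under these assumptions a purchase on 1 January with a 30-day window has a deadline of 31 January: the function returns 1 on 31 January and 0 on 1 February, at which point the order would be filtered out.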

A conversational layer then formats the validated results into a readable response, with strict constraints to prevent hallucinated purchases or expired items.
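One way such constraints might be enforced is to decide deterministically which orders the conversational layer is allowed to see, before any model is involved. The sketch below is hypothetical: the TrackedOrder shape, the function names, and the instruction wording are assumptions, not the project's actual prompt.

```typescript
// Assumed shape of an order that has already passed validation.
interface TrackedOrder {
  retailer: string;
  purchaseDate: string; // e.g. "2025-01-14"
  daysRemaining: number; // inclusive count; zero or less means expired
}

// Keep only orders with an open return window, most urgent first.
function selectActiveOrders(orders: TrackedOrder[]): TrackedOrder[] {
  return orders
    .filter((order) => order.daysRemaining >= 1)
    .sort((a, b) => a.daysRemaining - b.daysRemaining);
}

// Build the only context the conversational layer receives. Expired
// orders never reach the model, so it has nothing to repeat about them.
function buildResponseContext(orders: TrackedOrder[]): string {
  const lines = selectActiveOrders(orders).map(
    (order) =>
      `- ${order.retailer}: purchased ${order.purchaseDate}, ` +
      `${order.daysRemaining} day(s) left`
  );
  return [
    "Only describe the orders listed below. Do not mention any other purchase.",
    lines.length > 0 ? lines.join("\n") : "(no active return windows)",
  ].join("\n");
}
```

Filtering before formatting means the constraint does not rest on the prompt alone: the model is asked to phrase a fixed list, not to decide what belongs on it.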

Evaluation and Results

Testing on a real Gmail inbox showed reliable exclusion of non-returnable emails and consistent retailer extraction across formatting variations. Policy caching noticeably reduced latency after the initial lookup for each retailer. On a varied inbox of 14 emails, the system achieved 100% accuracy and precision, and performance remained at 100% on the 20 most recent emails from my personal inbox. Further testing on larger, more varied inboxes covering longer time windows is needed to validate these results; this was not possible due to time and resource constraints.

While the sample size was limited, the system behaved predictably under realistic conditions, suggesting that tightly scoped language model usage can be effective for inbox-scale automation.

Takeaways

Language models are used here as semantic filters. When paired with strict validation, caching, and deterministic logic, they can support practical automation without introducing unnecessary risk.

Email remains an under-explored but demanding environment for applied NLP, precisely because it forces systems to handle ambiguity rather than idealized inputs. As systems and methods develop, email continues to sit at the center of personal assistant tools. In the future, I would like to develop this project into an agentic system that not only alerts the user ahead of upcoming return deadlines, but can also process return labels and schedule returns in personal calendars.

GitHub script