
What does your email address say about you?

What Your Email Address Reveals About You: AI and Digital Footprints

LLMs are famous for having been trained on massive amounts of data. Estimates for GPT-4, for example, put the size of its training data at up to 1 petabyte. This training data comes from crawling the open internet, as well as from collections of books, articles, scientific papers, etc.

What this means is that LLMs know more than you can imagine. They know about everything in Wikipedia, and they've read every book ever written (if it's been scanned, at least). They also might know every social media post you've ever written, every product review, and every YouTube comment.

Obviously, this is a big concern! This blog post explores this topic, and also presents a fun tool for users to see what an LLM knows about their email address.

The Privacy Paradox

One of the biggest concerns in AI development is the potential for models to inadvertently memorize and disclose sensitive information from their training data. Recent research suggests that this risk increases with model size - larger models may be more prone to revealing sensitive information they've been trained on.

As a safe example, the New York Times has gathered evidence that GPT-4 stores entire news articles in its model, which can be reproduced with the right prompting.

Most AI providers have implemented safeguards against direct disclosure of personal information, but are they good enough? They have various policies that let users opt out of having the model use new information they provide, but what about old information that already lives on the open internet?

A simple test with Copilot or Cursor can reveal what the LLM knows about you. Essentially, you create a user data structure, fill in everything except the email field, and see whether the LLM autocompletes your correct email address. See the screenshot.

Copilot Screenshot
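
The probe described above can be sketched in a few lines of Python. This is only an illustration: the helper `build_probe` and the `name` and `github` fields are hypothetical choices, not something Copilot or Cursor requires. The function prints an unfinished snippet that stops right after the opening quote of the email value, which is where you would paste it into your editor and wait for a suggestion.

```python
def build_probe(name: str, handle: str) -> str:
    """Return an unfinished snippet to paste into an editor with AI completion.

    The snippet deliberately ends right after the opening quote of the
    "email" value, so the assistant's first suggestion is its guess at the
    address. The field names here are illustrative assumptions.
    """
    return (
        "user = {\n"
        f'    "name": "{name}",\n'
        f'    "github": "{handle}",\n'
        '    "email": "'
    )


# Replace the placeholder values with your own name and handle.
print(build_probe("Jane Doe", "janedoe"))
```

If the assistant fills in your real address, the model has most likely seen it somewhere in its training data, for example in a public commit or a package manifest.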

For what it's worth, at the time of writing this article, the LLM did not correctly complete my email address. However, when I tried this in the past with GitHub Copilot, it did indeed complete my email address. One can easily imagine scenarios where this could be really bad: autocompleting social security numbers, credit card numbers, API keys, etc.

The Art of Inference

Scrubbing PII from training data and adding guardrails against exposing it are active areas of work, and common cases, like the email address explored above, are often already handled. But does that solve everything?

Here's where it gets interesting: just like human psychics who make educated guesses based on subtle visual and behavioral cues, AI can make surprisingly accurate inferences about individuals based on seemingly minimal information. This isn't about revealing memorized data - it's about pattern recognition and statistical correlation.

Consider these real-world parallels:

  • Psychics observe clothing choices, speech patterns, and body language
  • Astrologers make broad statements that can apply to many people
  • Personality tests like Myers-Briggs use answers to specific questions to make broader characterizations

Your Email Address: A Digital Crystal Ball

Your email address might reveal more about you than you think. Let's break down what AI can potentially infer:

  1. Age Range: Email providers and naming conventions can suggest generational belonging. For example, a zoomer would not have an ‘@aol’ email address!
  2. Professional Background: Domain names and username structure might indicate industry or occupation
  3. Cultural Background: Language patterns in usernames can suggest cultural or linguistic heritage
  4. Interests and Hobbies: Numbers or references in email addresses often reflect personal interests
  5. Location: Domain extensions and service providers can hint at geographical location
  6. Gender: Names often appear in email addresses, and a name can suggest gender
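
To make the list above concrete, here is a minimal sketch of the raw signals an address exposes before any AI gets involved. It is an assumption-heavy illustration, not how the tool or any LLM actually works: `EmailSignals` and `extract_signals` are hypothetical names, and the two-letter check only flags a country-code domain extension, which is a weak hint at best since some country-code domains are used worldwide.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class EmailSignals:
    local_part: str               # everything before the "@"
    domain: str                   # provider or organization domain
    tld: str                      # last label of the domain
    name_tokens: List[str]        # alphabetic chunks, often names or words
    digits: List[str]             # numeric chunks, e.g. a year or a lucky number
    looks_like_country_tld: bool  # two-letter extensions are country codes


def extract_signals(email: str) -> EmailSignals:
    """Split an address into the pieces a reader (or a model) could reason about."""
    local_part, _, domain = email.strip().lower().rpartition("@")
    if not local_part or "." not in domain:
        raise ValueError(f"not a plausible email address: {email!r}")
    tld = domain.rsplit(".", 1)[1]
    return EmailSignals(
        local_part=local_part,
        domain=domain,
        tld=tld,
        name_tokens=re.findall(r"[a-z]+", local_part),
        digits=re.findall(r"\d+", local_part),
        looks_like_country_tld=len(tld) == 2,
    )


signals = extract_signals("Jane.Doe1985@example.co.uk")
print(signals.name_tokens, signals.digits, signals.tld)
```

None of these pieces is conclusive on its own. The point is that a model can weigh several weak hints together, which is exactly the kind of pattern matching LLMs are good at.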

“So what?” you might think. Well, this information can be very valuable to internet applications, especially in ad tech. If you are a privacy-sensitive person, you might think twice about how you name your email address.

Try It Yourself

Curious about what your email address might reveal about you? I've created an interactive tool that uses an LLM to analyze email addresses and generate insights. It’s meant to be a fun tool to illustrate the topic, and isn’t to be taken too seriously.

Try the Email Reading Tool

The Technical Side

For the technically curious, the tool is simple:

  • It passes the email address to the LLM, along with a prompt, asking it to infer information about the user.
  • The email is not stored, and the IP address is not used.
  • We use the Instructor library to extract a structured response, as well as a summary.
  • The frontend is built with Next.js and hosted on Vercel.
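
The backend flow described in the bullets above can be sketched roughly as follows. This is not the tool's actual code: the real tool uses Instructor for the structured-output step, while this sketch swaps in plain JSON parsing and a stubbed model call (`fake_complete`) so that it runs without an API key. The prompt wording, the `EmailReading` fields, and all function names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class EmailReading:
    # Hypothetical response schema, mirroring the categories in this post.
    age_range: str
    professional_background: str
    cultural_background: str
    interests: str
    location: str
    gender: str
    summary: str


FIELD_NAMES = [f.name for f in fields(EmailReading)]


def build_prompt(email: str) -> str:
    """Build the instruction sent to the model (the wording is an assumption)."""
    return (
        "Infer what you can about the owner of this email address. "
        "Reply with a JSON object with exactly these string keys: "
        + ", ".join(FIELD_NAMES)
        + '. Use "unknown" when there is no signal.\n'
        + f"Email: {email}"
    )


def read_email(email: str, complete: Callable[[str], str]) -> EmailReading:
    """Send the prompt to the model and parse the reply into a typed object.

    The address only lives in local variables here; nothing is written to disk.
    """
    data = json.loads(complete(build_prompt(email)))
    return EmailReading(**{name: str(data.get(name, "unknown")) for name in FIELD_NAMES})


def fake_complete(prompt: str) -> str:
    # Stand-in for a real LLM call; it returns a canned placeholder reply.
    return json.dumps({"summary": "placeholder reading"})


reading = read_email("jane.doe@example.com", fake_complete)
print(reading.summary)
```

Passing the model call in as a function keeps the parsing logic testable on its own. In a real deployment, `fake_complete` would be replaced by a call to whichever LLM provider you use.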