If I am in front of a computer, I probably have Gmail open in a browser tab. It is my daily driver, both for personal and professional emails.


Sisyphus pushing an email up a hill. Behind him a thunderbird hovers to deliver more emails if he makes too much progress. [1]

And its labeling system is a PIT🍑 that makes it hard to find old threads with new messages. I dug into why and learned how to fix my issues.

The Problem

I get a lot of emails at work. I’m at over 100k unread, and that number is only going to go up. I don’t try hard to unsubscribe from lists or automation, because sometimes an interesting item flits by and I want to read through it. But within all those emails, I need to find the ones that I do need to reply to. I might not be able to get to them right away, so I need a way to remember to respond. Then I need to keep track of those emails, follow up on conversations, and look for replies, which can arrive weeks or even months later.

Sample picture of an email

Sample of an email with the “Respond To” label highlighted in yellow.

I started by labeling all the emails as they came in. Labels are simple: I attach one to an email and then I can look it up later. If it’s unread, I know to look at it and potentially reply to it. Whenever someone replies, the thread shows an unread message again. It’s an easy system.

Once you have more than 50 labeled emails, though, you need to go to the next page to look for unread messages. As you tag more messages, there are more pages to go back through. Messages from months ago start to get lost, and it becomes tedious to click back, and back, and back. But this is Gmail, the Google product, so a simple search should pull those labeled threads with unread messages to the forefront.

Sketch of a search query not returning expected results

A drawing showing two searches, one with a label and the other with the label and is:unread. The latter is missing an unread email from Maddison P. I used an Excalidraw sketch so I don’t have to fiddle with obscuring my work emails.

The problem is, this doesn’t work. label:respond-to pulls up threads with that label. is:unread shows all the threads with unread messages in your inbox. Combining them does not show every thread with that label and with an unread message. It took a few years for me to get fed up enough to figure out why.

Anatomy of an Email

Before discussing the issue, I need to start with some terminology. Emails consist of “messages”, which are chained together to form “threads”. Labels are then applied on top of those.

----
From: John Doe <jdoe@machine.example>
Sender: Michael Jones <mjones@machine.example>
To: Mary Smith <mary@example.net>
Subject: Saying Hello
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <1234@local.machine.example>

This is a message just to say hello.
So, "Hello".
----
Example message from RFC 2822, Section A.1.1.

An instance of the Internet Message Format is a single email sent to another party. Email has been around for a long time, starting with RFC 733 in ye olde days of 1977. Since then there have been a few more RFCs updating the standard, such as 822, 2822 (this one gets referenced a lot), and 5322. Each message roughly consists of a set of headers and the body of the message itself. Headers are where data such as “from”, “subject”, “to”, and “cc” reside. One of the headers is the “message-id”, which every message should have (RFC 2822 and 5322 say SHOULD rather than MUST) and which is a globally unique id for that version of that message.
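Python’s standard-library email package parses this format directly, which makes the header/body split easy to see. A minimal sketch using the RFC 2822, Section A.1.1 message (the variant without the Sender: header); RAW_MESSAGE is just a local name for illustration:

```python
from email import message_from_string
from email.utils import parseaddr

# The canonical example message from RFC 2822, Section A.1.1.
RAW_MESSAGE = """\
From: John Doe <jdoe@machine.example>
To: Mary Smith <mary@example.net>
Subject: Saying Hello
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <1234@local.machine.example>

This is a message just to say hello.
So, "Hello".
"""

# Everything before the first blank line is headers; the rest is the body.
msg = message_from_string(RAW_MESSAGE)

# Header lookup is case-insensitive, like the RFC's field names.
print(msg["message-id"])          # <1234@local.machine.example>
print(parseaddr(msg["From"]))     # ('John Doe', 'jdoe@machine.example')
print(msg.get_payload(), end="")  # the two-line body
```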

----
From: John Doe <jdoe@machine.example>
To: Mary Smith <mary@example.net>
Subject: Saying Hello
Date: Fri, 21 Nov 1997 09:55:06 -0600
Message-ID: <1234@local.machine.example>

This is a message just to say hello.
So, "Hello".
----
----
From: Mary Smith <mary@example.net>
To: John Doe <jdoe@machine.example>
Reply-To: "Mary Smith: Personal Account" <smith@home.example>
Subject: Re: Saying Hello
Date: Fri, 21 Nov 1997 10:01:10 -0600
Message-ID: <3456@example.net>
In-Reply-To: <1234@local.machine.example>
References: <1234@local.machine.example>

This is a reply to your hello.
----
----
To: "Mary Smith: Personal Account" <smith@home.example>
From: John Doe <jdoe@machine.example>
Subject: Re: Saying Hello
Date: Fri, 21 Nov 1997 11:00:00 -0600
Message-ID: <abcd.1234@local.machine.tld>
In-Reply-To: <3456@example.net>
References: <1234@local.machine.example> <3456@example.net>

This is a reply to your reply.
----
An example thread from RFC 2822, Section A.2. Note how each message has a unique message-id, and how subsequent replies use “in-reply-to” and “references” fields.

Threads are sets of messages in an ongoing conversation, made possible by optional headers like “in-reply-to” and “references”. RFC 2822, published in April 2001, is the first of the defining RFCs to mention this concept. It’s not that threading didn’t exist before then (I found a blog[2] referencing threading in the ’90s, and RFC 733 already has the required headers); it was just hard to get right. Gmail did it best when it launched in 2004, and most others have been playing catch-up since. Conceptually, threading is fairly simple:

  • Every message has its own unique message-id.
  • Optional fields like “in-reply-to” and “references” let the sender specify which message in the thread is being replied to.
  • Other contextual clues like subject lines, dates, recipients, and the email body can be used to figure out which thread a message belongs in.

Getting this to work well requires everyone to use the optional fields properly (gg 🥳) and some heuristics to keep things working. If you’ve ever seen an email with a subject like “Re: Re: Re: Re: Check this out LOL”, that’s a symptom of a client not following the threading conventions[3], possibly an old one. One important takeaway is that threads are not a first-class part of the email standard; only messages are. Threads are a UX feature that email clients offer on top of messages so it is easy to follow a conversation.
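The header-driven part of threading can be sketched with a union-find over message IDs: any message that names another in In-Reply-To or References joins its thread. This is a simplified illustration, not Gmail’s algorithm (which isn’t public) and not a full threading algorithm like JWZ’s; it skips the subject and date heuristics entirely. The first three IDs are the RFC 2822, Section A.2 thread, and the fourth is a made-up unrelated message.

```python
# Headers of the RFC 2822, Section A.2 thread plus one hypothetical
# unrelated message, reduced to the fields threading relies on.
HEADERS = [
    {"Message-ID": "<1234@local.machine.example>"},
    {"Message-ID": "<3456@example.net>",
     "In-Reply-To": "<1234@local.machine.example>",
     "References": "<1234@local.machine.example>"},
    {"Message-ID": "<abcd.1234@local.machine.tld>",
     "In-Reply-To": "<3456@example.net>",
     "References": "<1234@local.machine.example> <3456@example.net>"},
    {"Message-ID": "<unrelated@example.org>"},
]

def thread_messages(headers: list[dict[str, str]]) -> list[set[str]]:
    """Group message IDs into threads using only In-Reply-To/References."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    own_ids = []
    for h in headers:
        mid = h["Message-ID"]
        own_ids.append(mid)
        find(mid)  # register the message even if it references nothing
        # Both headers hold whitespace-separated <msg-id> tokens.
        linked = (h.get("In-Reply-To", "") + " " + h.get("References", "")).split()
        for ref in linked:
            union(mid, ref)

    # Collect only messages we actually have, grouped by their root.
    threads: dict[str, set[str]] = {}
    for mid in own_ids:
        threads.setdefault(find(mid), set()).add(mid)
    return list(threads.values())

for t in thread_messages(HEADERS):
    print(sorted(t))
```

Because References carries the whole ancestry, the third message would still land in the right thread even if the second one never arrived.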

Picture of a gmail account showing a label being applied

Applying the “Updates” label to an email. While you’re here, you should check out wizard zines, run by the very informative b0rk.

UX features are where Labels come in. Labels, as shown in Gmail, are optional, user-visible tags that can be attached to a message or thread. You can search for a label later to find the threads it is attached to. Users can define labels for their own uses, and Gmail uses labels for its own system purposes, like marking unread emails. In the view Gmail presents to users, labels are designed to appear as though they apply to a thread. But they don’t: labels apply to individual messages. Applying a label to a thread applies it to every message currently in the thread. That is where the problem with search arises.
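A toy model of that behavior, assuming nothing beyond what’s described above: each message carries its own label set, “labeling a thread” just adds the label to the messages present at that moment, and the labels shown for a thread are the union over its messages. Message, Thread, and label_thread are hypothetical names, not Gmail API types. (The Gmail API documents threads.modify the same way: it applies to all messages in the thread.)

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    message_id: str
    labels: set[str] = field(default_factory=set)

@dataclass
class Thread:
    messages: list[Message] = field(default_factory=list)

    def visible_labels(self) -> set[str]:
        # What the UI shows for the thread: the union of every message's labels.
        return set().union(*(m.labels for m in self.messages))

def label_thread(thread: Thread, label: str) -> None:
    # "Labeling a thread" only touches the messages that exist right now.
    for m in thread.messages:
        m.labels.add(label)

thread = Thread([Message("<1234@local.machine.example>"),
                 Message("<3456@example.net>")])
label_thread(thread, "Respond To")

# A reply arrives later: in this model it is unread but has no user labels.
thread.messages.append(Message("<abcd.1234@local.machine.tld>", {"UNREAD"}))

print(sorted(thread.visible_labels()))  # ['Respond To', 'UNREAD']
print(thread.messages[-1].labels)       # {'UNREAD'}
```

The thread still looks labeled in the UI, even though the one message that matters, the unread reply, doesn’t carry the label.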

Why Doesn’t Search Work

The problem comes from the gap between what I expect and what Gmail actually does. Custom labels and a message being unread are both labels, and labels are attached to messages. When you search for label:respond-to is:unread, what you’re really asking for is the set of messages that have both of those labels. Gmail then displays the threads those messages are found in.

Sketch showing email messages and their labels

The thread from RFC 2822, Section A.2, with Gmail-style labels imposed on each message. Yellow background is used for system labels, blue for user-defined ones. Note how the read messages have a “Respond To” label and were presumably labeled before the latest “Unread” message arrived.

Because labels are not actually applied to the thread, any new message comes in without the custom label. Gmail only shows labels in the threaded view (not on each message like in my picture), and I can’t search for “Show me any thread where some messages have the respond-to label and others are unread”, so I can’t actually do the search I want.
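The gap between the two queries fits in a few lines. This is a toy model, assuming (as described above) that search terms are matched per message and a thread is returned if any single message matches all of them; gmail_style_search and wanted_search are hypothetical names, not Gmail code. The data mirrors the sketch above: two older messages labeled “Respond To” and a newer unread reply.

```python
# Each thread is a list of per-message label sets.
Thread = list[set[str]]

THREADS: dict[str, Thread] = {
    "saying-hello": [{"Respond To"}, {"Respond To"}, {"UNREAD"}],
    "newsletter": [{"UNREAD"}],
}

def gmail_style_search(threads: dict[str, Thread], *labels: str) -> list[str]:
    # A thread matches only if ONE message carries ALL requested labels.
    want = set(labels)
    return [name for name, msgs in threads.items()
            if any(want <= msg for msg in msgs)]

def wanted_search(threads: dict[str, Thread], label: str) -> list[str]:
    # What I actually want: SOME message has the label AND SOME
    # (possibly different) message is unread.
    return [name for name, msgs in threads.items()
            if any(label in m for m in msgs) and any("UNREAD" in m for m in msgs)]

print(gmail_style_search(THREADS, "Respond To"))            # ['saying-hello']
print(gmail_style_search(THREADS, "Respond To", "UNREAD"))  # []
print(wanted_search(THREADS, "Respond To"))                 # ['saying-hello']
```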

The Solution

My solution was to write a basic script. It’s available at https://github.com/er4hn/gmail-labeler/, and this post is based on commit 99685c087f546ba03238c4ac5c27d482108e7eef.

The script is fairly simple. It works by:

  • Resolving the human-readable label names into the internal label IDs via get_label_id.
  • Using check_threads to get a list of threads which have that label id on any message in the thread.
    • This makes use of pagination, since each query only returns a subset of results.
    • This uses the passed-in condition_func, and changes the thread’s labels if it returns true.
      • The two condition functions are condition_reply_to_archive and condition_archive_to_reply, whose names should explain what they do.
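For readers who haven’t used the Gmail API, the thread-level pagination step looks roughly like this. It is a sketch, not the repo’s code, assuming an authorized google-api-python-client service object (as returned by googleapiclient.discovery.build("gmail", "v1", ...)); iter_labeled_threads, swap_thread_label, and check_threads_sketch are hypothetical names. threads().list with its labelIds and pageToken parameters, the nextPageToken response field, and threads().modify are real parts of the Gmail v1 API.

```python
from typing import Any, Callable, Iterator

def iter_labeled_threads(service: Any, label_id: str) -> Iterator[str]:
    """Yield the id of every thread where any message carries label_id."""
    page_token = None
    while True:
        resp = service.users().threads().list(
            userId="me", labelIds=[label_id], pageToken=page_token
        ).execute()
        # Each page holds only a subset of results (and may hold none).
        for thread in resp.get("threads", []):
            yield thread["id"]
        page_token = resp.get("nextPageToken")
        if not page_token:
            return

def swap_thread_label(service: Any, thread_id: str,
                      remove_id: str, add_id: str) -> None:
    # threads().modify applies to every message currently in the thread.
    service.users().threads().modify(
        userId="me", id=thread_id,
        body={"removeLabelIds": [remove_id], "addLabelIds": [add_id]},
    ).execute()

def check_threads_sketch(service: Any, from_id: str, to_id: str,
                         condition: Callable[[str], bool]) -> int:
    """Move labeled threads whose condition is true; return how many moved."""
    moved = 0
    # Collect all IDs first so relabeling can't shift results between pages.
    for thread_id in list(iter_labeled_threads(service, from_id)):
        if condition(thread_id):
            swap_thread_label(service, thread_id, from_id, to_id)
            moved += 1
    return moved
```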

The script requires a config file. The schema is saved in CONFIG_SCHEMA_V1, and my settings are:

(I’ve changed my actual label names since they are silly and personal)

{
    "Version": "1.0.0",
    "idle_time_to_archive_days": 7,
    "Labels": {
        "RespondTo": "Respond To",
        "Archive": "Responded Archive"
    },
    "Secrets": {
        "project_token_path": "../gmail-labeler-secrets/gmail_labeler_client_secret.json",
        "user_token_path": "../gmail-labeler-secrets/gmail_labeler_client_token.json"
    }
}

config.json example for how I use my script.
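Given that config, one plausible shape for the two conditions is sketched below. It assumes threads are dicts shaped like the Gmail API’s threads.get response (each message has labelIds, and internalDate as milliseconds since the epoch in a string). The logic is my guess from the function names and the idle_time_to_archive_days setting, not the repo’s code, so the functions carry hypothetical _sketch names.

```python
import json
import time
from typing import Optional

# Parsed from a trimmed copy of the config above (Secrets omitted).
CONFIG = json.loads("""
{
    "Version": "1.0.0",
    "idle_time_to_archive_days": 7,
    "Labels": {"RespondTo": "Respond To", "Archive": "Responded Archive"}
}
""")

MS_PER_DAY = 24 * 60 * 60 * 1000

def _has_unread(thread: dict) -> bool:
    # Gmail's system label for unread messages has the label ID "UNREAD".
    return any("UNREAD" in m.get("labelIds", []) for m in thread["messages"])

def reply_to_archive_sketch(thread: dict, now_ms: Optional[int] = None) -> bool:
    """True if a "Respond To" thread has nothing unread and has gone idle."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if _has_unread(thread):
        return False
    # internalDate is milliseconds since the epoch, sent as a string.
    newest = max(int(m["internalDate"]) for m in thread["messages"])
    idle_days = (now_ms - newest) / MS_PER_DAY
    return idle_days >= CONFIG["idle_time_to_archive_days"]

def archive_to_reply_sketch(thread: dict) -> bool:
    """True if an archived thread has picked up an unread message."""
    return _has_unread(thread)
```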

To use the script, you will also need a Google Cloud project with the Gmail API enabled. The project token (the OAuth client secret) is tied to that project, and you will need to complete an OAuth 2.0 consent flow to grant the script access to your account.
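The OAuth step usually follows the pattern from Google’s Gmail API Python quickstart. A sketch assuming the google-auth, google-auth-oauthlib, and google-api-python-client packages and the two paths from the Secrets section of the config; get_gmail_service is a hypothetical name, and the gmail.modify scope is my assumption for the narrowest scope that lets a script read threads and change their labels. The Google imports sit inside the function so the module loads even without those packages installed.

```python
import os

# Assumed scope: gmail.modify covers listing threads and changing their labels.
SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]

def get_gmail_service(project_token_path: str, user_token_path: str):
    """Build an authorized Gmail v1 client, running the OAuth flow if needed."""
    # Imported lazily so this module loads without the Google packages.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    creds = None
    if os.path.exists(user_token_path):
        # Reuse the user token saved by an earlier run.
        creds = Credentials.from_authorized_user_file(user_token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # First run: opens a browser so you can grant access to your account.
            flow = InstalledAppFlow.from_client_secrets_file(project_token_path, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the (new or refreshed) user token for the next run.
        with open(user_token_path, "w") as token_file:
            token_file.write(creds.to_json())
    return build("gmail", "v1", credentials=creds)
```

Usage would be something like get_gmail_service(config["Secrets"]["project_token_path"], config["Secrets"]["user_token_path"]), with the returned object passed to the API calls.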

Using AI to Write the Script

I tried using some AI models to write the script, since I didn’t want to read a lot of docs up front. The results were mixed. I got ChatGPT’s GPT-4 model to give me a nix flake and the initial script. It got the API calls mostly correct, though what I ended up with was a lot of spaghetti code. What annoyed me about GPT-4 was that its implementation searched message by message and took forever to run. I asked if there was an API to search for labels by thread, and it told me no, so I optimized it by skipping messages from threads I had already seen. The moment I started digging through the API docs, I realized GPT-4 was wrong: there is an API to search by thread. I rewrote the script to use that API.

I also used Claude 3.5 Sonnet for a few softball questions about other Python libraries whose docs I didn’t feel like reading. One example was asking how to parse command-line arguments, without mentioning argparse, to see what it would give me. It handled those softball questions fine.

References

[1]: Created in Bing Image Creator on 2024-10-19. Prompt was: “Sisyphus pushing an email up a hill, pencil sketch style on parchment paper background”
[2]: https://feld.com/archives/2010/06/the-magic-of-email-conversations/
[3]: RFC 2822, Section 3.6.5: “When used in a reply, the field body MAY start with the string "Re: " (from the Latin "res", in the matter of) followed by the contents of the "Subject:" field body of the original message. If this is done, only one instance of the literal string "Re: " ought to be used since use of other strings or more than one instance can lead to undesirable consequences.”

Addendum

I write these posts in Obsidian and then use Hugo to turn them into webpages. Some fun issues I ran into this time that are worth noting:

  • Blockquotes in Hugo do not respect newlines. You have to add 2 spaces after each line for Hugo to render a newline. Special thanks to this blog post for explaining that: https://andreas.scherbaum.la/post/2024-03-01_blockquotes-in-hugo/
  • Hugo doesn’t support highlighting either. You can use shortcodes to render these as an alternative.
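If you’d rather not add the trailing spaces by hand, a small pre-processing step can do it. A minimal sketch, assuming blockquote lines are the ones starting with “>” and that every line except the last in each quote should get the two-space Markdown hard break; add_blockquote_breaks is a hypothetical helper, not part of Hugo or Obsidian.

```python
def add_blockquote_breaks(markdown: str) -> str:
    """Append two spaces to each blockquote line that is followed by another."""
    lines = markdown.split("\n")
    out = []
    for i, line in enumerate(lines):
        is_quote = line.lstrip().startswith(">")
        next_is_quote = i + 1 < len(lines) and lines[i + 1].lstrip().startswith(">")
        if is_quote and next_is_quote and not line.endswith("  "):
            # Two trailing spaces are a Markdown hard line break.
            line = line.rstrip() + "  "
        out.append(line)
    return "\n".join(out)
```

Skipping lines that already end in two spaces keeps the helper safe to run repeatedly over the same file.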