Investigating Substack/Mailgun tracking links
In a previous post, I looked at the tracking links of three major email newsletter providers and deconstructed two of them. The third, Substack/Mailgun, stubbornly resisted deciphering. Until now.
After the previous post, a reddit user helped work it out.
A tracking link like this:
http://email.substack1.exponentialview.co/c/eJxdkEuOhCAQhk_TLI1VIOqCxWz6GoZ HaZOx0WDZPc7pB9vdJARSCfU_Pm-ZpiUfZl02Fuc18LGSSfTeZmKmLPaN8hCDwU43UmsUwbTglE IRt2HMRE8bZ8N5J7Hubo7eclzSZwMbFA8DUtV9b0MgqJUf0bmuRtQBGvQtBXfZ2j1ESp4MvSgfS yIxmwfzepNfN7yXw-945qn88iyT_S3GIhqsEYpaUW7apq-gsqOuNYFVspG-gXEEX7sAnXQokWR_ U_W2u42t_4aKftbilDja-RXpXbRFNo7SxDZB-Tmd3U7Hs9pQ3ueeIh8DJetmCldrvuB9OAwTJco FahgsG9Agu06qrut1e9UsWCS2CIWlKDnCUraS-ZfjD2RriM4
looks like a bunch of gibberish but turns out to contain data as a base64url encoding of a zlib stream. In other words, data is embedded into the tracking link by (zlib) compressing it and then (base64url) encoding it.
Using CyberChef, I decoded the blob into this query-like string (pretty-formatted for readability):
category=post& email_generated_at=1613883488967& is_freemail=true& publication_id=2252& post_audience=everyone& post_id=32721653& pub_community_enabled=true& user_id=28653662& post_type=newsletter& subdomain=exponentialview& l=http://twitter.com/azeem& r=[redacted]& d=71b442& h=134099adde104cf2bb80226d152c7edb& email@example.com& t=post
Let’s break this down. Previously, I said:
I speculate you are sending to Mailgun:
- The destination URL,
- Whatever is in
- And some other unknown stuff which is taking up the rest of the 90 characters in the blob.
It turns out I was approximately correct.
The destination URL (ie.
http://twitter.com/azeem) is contained in the field
l. All of the variables in the
X-Mailgun-Variables email header are embedded.
The remaining fields:
r=[redacted]& d=71b442& h=134099adde104cf2bb80226d152c7edb& firstname.lastname@example.org& t=post
contain an email address in
r (which I’ve redacted), and some seemingly randomly generated identifiers (Probably some internal Mailgun stuff). These remaining fields account for the ‘rest of the 90 characters in the blob’.
Investigation complete. Now we know how to find out exactly what is contained in a Substack (or any other Mailgun retailer) tracking link.
Some random thoughts.
Why are Substack tracking links so bulky?
- Mailgun embeds a bunch of data in there.
- Substack embeds even more data in there (more than double). Do they really need to embed so much?
Someone should write software that intercepts these tracking links and redirects to the destination link instead. It would be a step towards better privacy.
This post tells how to do it for Mailgun-based email providers and the previous post does the same for ConvertKit.
(One day, I’ll put this feature into email-untracker but it’s hardly likely to get widespread usage there.)
Other newsletter or ‘marketing email’ providers
Mailchimp, ConvertKit and Substack aren’t the only service providers which use tracking links. They’re just the ones I’ve encountered recently.
Some other providers have been suggested to me for investigation as well. I’d like to deconstruct them but I’m a bit busy and not sure when or if I’ll have the time. We’ll see.
Base64 with 0% overhead
Previously, I said:
They managed to do base64 with no overhead??
Yes, they did.
The overhead of base64 is much maligned. In this case, you can get rid of the overhead by compressing beforehand.
Or just don’t worry about it because web browsers and servers generally use compression anyway. Base64 introduces overhead due to the redundancy in the encoding. But this very same redundancy is what allows compression to perform well. In the end, it ends up being about the same.