topic Extracting text of tracked "engagement" emails in APIs & Integrations

Extracting text of tracked "engagement" emails

robin_lord — Fri, 22 Sep 2023 14:29:00 GMT

I'm posting this partly to help people get past what I was stuck on until now, and partly to get some help / a sense check on the next bit.

What I'm trying to do:

Extract all emails tracked by the HubSpot email plugin (so not marketing emails or newsletters, but communications between us and clients that are logged in the HubSpot platform).

I went down a couple of rabbit holes with this:

1. As far as I understand, the /emails/ endpoints are for something else: they return details of marketing emails that have been tracked. The endpoint for extracting the kind of individual tracked emails I describe is /engagements/.

2. After a bit of searching I ended up at this documentation. There doesn't seem to be much about the engagements endpoint, and I thought I'd have to extract every tracked email ever, store it in a database, and handle sync myself (not an attractive prospect).

3. Then I found /engagements/v1/engagements/associated/COMPANY/{company_id}/paged, which lets you get all the engagements associated with a specific company (much more manageable).

4. The next challenge is getting the full email text: by default, that endpoint gives you bodyPreview but not the full body text (I need all of it).

5. We can use /engagements/v1/engagements/{engagement_id} to get the full email content, but only as unparsed HTML.
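For anyone following along, the two calls in steps 3 and 5 might look roughly like this in Python. It is a sketch, not a tested client. It assumes the response shape from HubSpot's legacy v1 engagements docs: a `results` list plus `hasMore`/`offset` for paging, a `limit` query parameter, email engagements typed `EMAIL` or `INCOMING_EMAIL`, and the HTML body under `metadata.html`. It also assumes Bearer-token auth, such as a private app token. `make_fetcher`, `iter_company_emails` and `get_email_html` are hypothetical helper names. The HTTP call is passed in as `fetch_json`, so the paging logic can be exercised without hitting the API.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_BASE = "https://api.hubapi.com"

# A JSON-fetching helper: (path, query params) -> parsed JSON dict.
FetchJson = Callable[[str, dict], dict]


def make_fetcher(token: str) -> FetchJson:
    """Return a fetcher that sends a Bearer token (assumed auth scheme, e.g. a private app token)."""
    def fetch_json(path: str, params: dict) -> dict:
        url = f"{API_BASE}{path}"
        if params:
            url += "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_json


def iter_company_emails(company_id: int, fetch_json: FetchJson, limit: int = 100) -> Iterator[dict]:
    """Page through engagements associated with a company, yielding only email engagements."""
    path = f"/engagements/v1/engagements/associated/COMPANY/{company_id}/paged"
    offset = 0
    while True:
        page = fetch_json(path, {"limit": limit, "offset": offset})
        for item in page.get("results", []):
            # Assumed: email engagements are typed "EMAIL" (outgoing) or "INCOMING_EMAIL".
            if item.get("engagement", {}).get("type") in ("EMAIL", "INCOMING_EMAIL"):
                yield item
        if not page.get("hasMore"):
            break
        # The response's "offset" is fed back in to request the next page.
        offset = page["offset"]


def get_email_html(engagement_id: int, fetch_json: FetchJson) -> str:
    """Fetch one engagement and return its full, unparsed HTML body (assumed to be metadata.html)."""
    data = fetch_json(f"/engagements/v1/engagements/{engagement_id}", {})
    return data.get("metadata", {}).get("html", "")
```

Usage would be something like `fetch = make_fetcher(token)`, then calling `get_email_html(item["engagement"]["id"], fetch)` for each item that `iter_company_emails(company_id, fetch)` yields. That gives you the same one-listing-per-company plus one-call-per-email pattern.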

Where I'm at now:

Whenever I need to update, I have to make a few API calls:

- At least 1 per company, to list all recent relevant emails (hopefully, after the initial bulk load, a daily update won't need pagination, but I may have to send more calls if it does)

- At least 1 per relevant email (to go from bodyPreview to full body)
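To keep the daily run close to that minimum, one option is to make the per-email call only for engagements that are newer than the last sync and not already stored. This is a minimal sketch. It assumes each listed item carries `engagement.id` and an epoch-millisecond `engagement.timestamp`, which you should check against real responses. `ids_needing_full_body` is a hypothetical helper, and `since_ms` and `already_stored` would come from your own database. The filtering is done client-side, so it saves per-email calls but not listing calls.

```python
from typing import Iterable


def ids_needing_full_body(listed: Iterable[dict], since_ms: int, already_stored: set[int]) -> list[int]:
    """Pick the email engagements that still need a per-email call for their full body.

    Assumes each listed item looks like {"engagement": {"id": ..., "timestamp": <epoch ms>}}.
    """
    to_fetch = []
    for item in listed:
        eng = item.get("engagement", {})
        eng_id = eng.get("id")
        # Skip anything no newer than the last sync, and anything already stored.
        if eng.get("timestamp", 0) <= since_ms or eng_id in already_stored:
            continue
        to_fetch.append(eng_id)
    return to_fetch
```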

I can't see any pre-parsed body content so I think I'm going to have to parse the html myself.
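If you do end up parsing the HTML yourself, Python's standard-library `html.parser` is enough for a first pass at plain text, with no extra dependencies. This is a minimal sketch. The tags treated as line breaks and the tags whose contents are skipped are my own choices, not anything HubSpot defines, and `html_to_text` is a hypothetical name.

```python
from html.parser import HTMLParser

# Tags that start or end a line in the plain-text output (a judgement call; adjust to taste).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}
# Tags whose contents are never shown to the reader.
SKIP_TAGS = {"script", "style", "head"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends in text
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Convert an email HTML body to plain text, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

BeautifulSoup with lxml, as suggested in the reply, does the same job with less code. The stdlib version is simply easier to drop in where installing packages is awkward.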

What I need to check:

1. Do I need to make all of these calls, or am I missing something stupid?

2. Do I need to parse the HTML myself, or am I missing something stupid?

Hope this helps anyone who is trying to solve the above, and hopefully someone can help me with the last couple of questions!

Thanks in advance

Re: Extracting text of tracked "engagement" emails

Jaycee_Lewis — Fri, 22 Sep 2023 19:39:29 GMT

Hey, @robin_lord 👋 Firstly, thank you very much for taking the time to document your steps. Your approach seems reasonable.

Your two questions are right on track. I don't know of a simpler way to make those calls, and there isn't a built-in parser like the one you're after.

Have you looked into BeautifulSoup, using lxml as the underlying parser? Full disclosure: I came across this while doing research for us, and I don't use Python too often. If you've already ruled this out, we can see if any of our Python-using community members have recent experience.

Best,

Jaycee

Re: Extracting text of tracked "engagement" emails

Jaycee_Lewis — Fri, 22 Sep 2023 19:40:59 GMT

Additionally, once we're done here, I'd like to get this marked as a solution ✅

Re: Extracting text of tracked "engagement" emails

robin_lord — Sat, 23 Sep 2023 19:40:15 GMT

Thanks very much for the response! OK, that's exactly what I needed: confirmation 🙂

BeautifulSoup and lxml are actually my next port of call! If anyone has tips on how to effectively split up threaded emails using those parsers, I would love to hear about it (I think HubSpot is already doing that for the interface). That said, I'm happy that your comment above resolves this question, and we can mark it as a solution 🙂
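On splitting threads, one common approach is to convert the body to plain text first and then cut it at reply headers. This is a heuristic sketch, not how HubSpot does it. It assumes English-language mail clients and only recognises two kinds of header: "On ... wrote:" lines (when they fit on a single line) and Outlook-style "From:" / "Sent:" pairs. It also drops "-----Original Message-----" style separators. `split_thread` is a hypothetical helper.

```python
import re

# Heuristic patterns for English-language mail clients (an assumption: extend or
# replace them for the clients and languages your contacts actually use).
_ON_WROTE = re.compile(r"^On .+ wrote:$")
_SEPARATOR = re.compile(r"^(-{2,}\s*Original Message\s*-{2,}|_{10,})$", re.IGNORECASE)


def _unquote(line: str) -> str:
    """Remove leading '>' quote markers and surrounding whitespace."""
    return line.lstrip("> \t").strip()


def split_thread(text: str) -> list[str]:
    """Split a plain-text email into individual messages, newest first.

    A new message starts at an "On ... wrote:" line, or at an Outlook-style
    "From:" line immediately followed by a "Sent:" or "Date:" line; that header
    line is kept at the top of the older message it introduces. Separator lines
    such as "-----Original Message-----" are dropped.
    """
    lines = [_unquote(line) for line in text.splitlines()]
    lines = [line for line in lines if not _SEPARATOR.match(line)]
    messages: list[list[str]] = [[]]
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        outlook_header = line.startswith("From:") and next_line.startswith(("Sent:", "Date:"))
        if _ON_WROTE.match(line) or outlook_header:
            messages.append([])
        messages[-1].append(line)
    joined = ("\n".join(m).strip() for m in messages)
    return [m for m in joined if m]
```

Working at the HTML level is an alternative, since many clients wrap quoted history in a `<blockquote>` element that BeautifulSoup can find and remove before you extract text. Which approach holds up better depends on the mail clients your contacts use.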