Hubspot missing records from BQ tables

Highlighted
Occasional Contributor

Hi, Hubspot! I have outlined our issue below 

For BMC/Warehouse/hubspot/hubspot_bmc2


According to the API the most recent record was created on 25th Sept and same was present in their warehouse hence confirming that the cursor is not falling behind and something went wrong in between:

image 1image 1image 2image 2

We also found out that a lot of the DELIVERED events were created on 24th Sept in bulk, and the historical sync is able to fetch all of them as seen here:

 

image 3image 3image 4image 4

Also the events were generated with createdAt timestamp very close to syncing time and I believe that Hubspot API had some discrepancy while we made the API request. Looks like the huge volume of suddenly created events wasn’t reflected in the API call.

 

*** Please CC samuel.dawit@fivetran.com in your response*** 

Reply
0 Upvotes
10 Replies 10
Highlighted
HubSpot Moderator

Hey @Fivetran,

 

Just to ensure that we are on the same page, do you mean that the number on image 3 and 4 doesn't match? 

 

If so, can I confirm the exact endpoint that you're using on image 4? Additionally, how is your system generating the data on image 3? I'm assuming that it would be the same endpoint as of image 4, is that right? 

Highlighted
Occasional Contributor

Hi Wendy,

 

For the 1st image (email_campaign table) we are hitting the endpoint as mentioned here: https://legacydocs.hubspot.com/docs/methods/email/get_campaign_data

After that to get the data as seen in 2nd image (email_event table) we use this endpoint: https://legacydocs.hubspot.com/docs/methods/email/get_events where we pass "appId" and "campaignId" params. Here the events will be returned in reverse-chronological order, and we are limiting them based on our cursor.

In the 2nd image we have filtered email_event table to show only DELIVERED event as that had some discrepancy.

Please let me know if this helps out. 

 

Reply
0 Upvotes
Highlighted
Occasional Contributor

Wendy, can you please help? 

Reply
0 Upvotes
Highlighted
HubSpot Moderator

Hey @Fivetran,

 

Apologise for the delayed in response.

 

Just to clarify, what is the portal ID and campaign ID used in this endpoint: Get campaign data for a given campaign | Email Events API?

 

I tried searching for campaign ID 9596xxxx on portal 470xxxx however I'm getting an unknown email campaign ID. Am I looking at the wrong campaign ID? 

Reply
0 Upvotes
Highlighted
Occasional Contributor

campaign ID = 95963900
portal ID = 6682575

I think you got the portal Id wrong. Could you try this? 

Reply
0 Upvotes
Highlighted
Occasional Contributor

@WendyGoh Did you get a chance to look into this? 

Reply
0 Upvotes
Highlighted
HubSpot Moderator

Hey @Fivetran,

 

On my end, when I run this endpoint: Get campaign data for a given campaign | Email Events API

GET email/public/v1/campaigns/95963900

 

I'm seeing the following counters for delivered: 2691.

 

This number matches the number I received when I use the endpoint: Get email events | Email Events API

GET email/public/v1/events?appId=113&campaignId=95963900&limit=1000&eventType=DELIVERED&offset=Ch8KFgj7wbCVwJG78YUBEMD0pOmX_8CcqQEY5qr3iMwu

 

I called the endpoint for a total of three times until the hasMore value is false:

 

The first offset value I got was: Ch8KFgiti6bfhY_wx6YBEMjpwPXqrMWZiwEYj8T3iMwu (in which there's 1000 return response)

The second offset value I got was: Ch8KFgj7wbCVwJG78YUBEMD0pOmX_8CcqQEY5qr3iMwu (in which there's 1000 return response)

There's no third offset value as the hsaMore value is false and there is a total of 691 return response.

 

Adding it up: 1000 + 1000 + 691 = 2691

 

This matches with the value returned on the Get campaign data for a given campaign | Email Events API endpoint.

 

In this case, could you run these endpoint over a API development tool - Postman to ensure that we are testing on a similar environment and see if you're still getting a total of 2692 when using the Get email events | Email Events API endpoint?

Reply
0 Upvotes
Highlighted
Occasional Contributor

We were getting these counts (2691) 

But the main problem was what will happen when the events are being generated and at the same time API request is made to fetch them as shown below. Could the API miss out on some records in which case we have to implement an in house solution to tackle such problems (probably some sort of rollback sync strategy)

 

 Hubspot- Reach-out to Hubspot support for missing email_events Image 1 .png

As seen in box 1, the event was created on 24/9 17:16:42 and we synced it using API at 24/9 17:53:19 (box 2), I don't have the screenshots but there were more results where created and _fivetran_synced overlapped. NOTE we use the same request with offset, but how the API behaves when events are still generating and we are using API with offset at that time?

Reply
0 Upvotes
Highlighted
Occasional Contributor
Reply
0 Upvotes
Highlighted
HubSpot Moderator

Hey @Fivetran,

 

That's a great question and I believe for the Get email events | Email Events API, it is sorted in descending order where the recent dates will be put first. As such, if you're using offset value that are already generated before newer records are being created, I wouldn't expect it to affect much since the offset value will be pulling older dates records. 

 

Additionally, you may also want to use the startTimestamp or endTimestamp parameter to only pull return events based on the given timestamp. For instance, now you the latest record you pulled is 5th Nov 1108 PM  > the next time you call the endpoint to retrieve newer records, you can use the startTimestamp parameter to return events which occured at or after 5th Nov 1108.

 

Hope this helps to clarify and let me know if there's any further query on this 🙂 

Reply
0 Upvotes