According to the API the most recent record was created on 25th Sept and same was present in their warehouse hence confirming that the cursor is not falling behind and something went wrong in between:
image 1image 2
We also found out that a lot of the DELIVERED events were created on 24th Sept in bulk, and the historical sync is able to fetch all of them as seen here:
image 3image 4
Also the events were generated with createdAt timestamp very close to syncing time and I believe that Hubspot API had some discrepancy while we made the API request. Looks like the huge volume of suddenly created events wasn’t reflected in the API call.
That's a great question and I believe for the Get email events | Email Events API, it is sorted in descending order where the recent dates will be put first. As such, if you're using offset value that are already generated before newer records are being created, I wouldn't expect it to affect much since the offset value will be pulling older dates records.
Additionally, you may also want to use the startTimestamp or endTimestamp parameter to only pull return events based on the given timestamp. For instance, now you the latest record you pulled is 5th Nov 1108 PM > the next time you call the endpoint to retrieve newer records, you can use the startTimestamp parameter to return events which occured at or after 5th Nov 1108.
Hope this helps to clarify and let me know if there's any further query on this 🙂
But the main problem was what will happen when the events are being generated and at the same time API request is made to fetch them as shown below. Could the API miss out on some records in which case we have to implement an in house solution to tackle such problems (probably some sort of rollback sync strategy)
As seen in box 1, the event was created on 24/9 17:16:42 and we synced it using API at 24/9 17:53:19 (box 2), I don't have the screenshots but there were more results where created and _fivetran_synced overlapped. NOTE we use the same request with offset, but how the API behaves when events are still generating and we are using API with offset at that time?
In this case, could you run these endpoint over a API development tool - Postman to ensure that we are testing on a similar environment and see if you're still getting a total of 2692 when using the Get email events | Email Events API endpoint?
Just to ensure that we are on the same page, do you mean that the number on image 3 and 4 doesn't match?
If so, can I confirm the exact endpoint that you're using on image 4? Additionally, how is your system generating the data on image 3? I'm assuming that it would be the same endpoint as of image 4, is that right?
After that to get the data as seen in 2nd image (email_event table) we use this endpoint: https://legacydocs.hubspot.com/docs/methods/email/get_events where we pass "appId" and "campaignId" params. Here the events will be returned in reverse-chronological order, and we are limiting them based on our cursor.
In the 2nd image we have filtered email_event table to show only DELIVERED event as that had some discrepancy.