Best practice (contact) property history incremental load
Hi there!
We are currently loading HubSpot data into a Postgres database. We are using the v3 API, and everything works fine so far except for the incremental load of contact history data.
For the other objects we are fine with a full load and refresh, but since we have a lot of contacts in our CRM, we are struggling with performance issues there. We already load the non-historic data incrementally via the Search API.
So, my question is: what's the most efficient way to get the contact history data that was generated after a certain timestamp?
Our thoughts and what we've tried:
- We tried utilizing the Search-API, but apparently there is no way to load properties_with_history doing that?
- Another approach that comes to mind is calling the v1 endpoint /contacts/v1/contact/vid/:vid/profile for each updated contact obtained from the v3 Search API. This isn't very promising performance-wise, though, since it would take one API call per contact.
- Also the /contacts/v1/lists/recently_updated/contacts/recent endpoint won't do the trick because within the last 30 days we'd update 90% of our contacts due to frequent synchronization with other systems.
I'd be grateful for any hints on best practices regarding this!
I think you're halfway there. My recommendation would be to continue using the search API to identify contacts meeting the create date criteria in batches of <=100. Take the returned IDs and map them in a manner that fits the object structure detailed in the docs for the batch Contact read endpoint. Fairly sure you can process 100 at a time using this endpoint. This is the only reasonable method I know of reading Contact record property history based on an additional criterion.
Also note: the search endpoint will error out if you attempt to "page" beyond 10,000 search result records for one query, i.e. once you'd be sending the parameter "after: 10,000", so plan for that if you're working with a lot of data at one time.
Unless you devise some clever way of using workflows and manipulating a property with "hasUniqueValue: true" in a manner that would allow you to derive its create date, which seems like a lot.
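To make the two-step flow above concrete, here is a minimal sketch of the request bodies involved. It only builds the payloads and does not call the API. The endpoint paths, the `lastmodifieddate` filter property, and the body field names (`filterGroups`, `inputs`, `propertiesWithHistory`) reflect my reading of the v3 docs and are assumptions to verify; the helper names are made up for illustration. The batch size of 100 follows the "fairly sure" estimate above and may need to be lower when history is requested.

```python
from typing import Iterable, Iterator, List, Optional

# Assumed v3 paths; check them against the current HubSpot docs.
SEARCH_PATH = "/crm/v3/objects/contacts/search"
BATCH_READ_PATH = "/crm/v3/objects/contacts/batch/read"


def build_search_body(since_ms: int, after: Optional[str] = None,
                      limit: int = 100) -> dict:
    """Body for a search that returns contacts modified at or after since_ms."""
    body = {
        "filterGroups": [{
            "filters": [{
                "propertyName": "lastmodifieddate",  # assumed property name
                "operator": "GTE",
                "value": str(since_ms),
            }]
        }],
        "sorts": [{"propertyName": "lastmodifieddate",
                   "direction": "ASCENDING"}],
        "properties": ["lastmodifieddate"],
        "limit": limit,
    }
    if after is not None:
        body["after"] = after  # paging cursor from the previous response
    return body


def chunked(ids: Iterable, size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most `size` IDs, converted to strings."""
    batch: List[str] = []
    for contact_id in ids:
        batch.append(str(contact_id))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_batch_read_body(ids: Iterable, history_props: Iterable[str]) -> dict:
    """Body for a batch read that asks for the history of selected properties."""
    return {
        "inputs": [{"id": str(contact_id)} for contact_id in ids],
        "properties": [],
        "propertiesWithHistory": list(history_props),
    }
```

In use, you would POST `build_search_body(...)` to the search endpoint page by page, collect the returned IDs, and then POST one `build_batch_read_body(...)` per chunk from `chunked(...)`, so the number of history calls drops from one per contact to roughly one per 100 contacts.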
Hey nikodev,
Thanks a lot for your answer! The batch read API was just the piece of the puzzle that I was missing. That makes it up to 100 times more efficient than my approach 🙂
Regarding the 10,000 search result limitation: that's not new to us, and I'm confident that we'll manage to partition our contacts in a convenient way.
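One possible way to do that partitioning, sketched under assumptions: split the modification-time range into windows until each window holds no more than 10,000 results, then run one paged search per window. `count_in` is a hypothetical callback that returns how many contacts fall in a half-open window `[start_ms, end_ms)`; in practice it could be a search with a small limit that reads the result total from the response, which is an assumption to check against the docs.

```python
from typing import Callable, List, Tuple

SEARCH_CAP = 10_000  # result cap per search query, as described in the thread


def split_window(start_ms: int, end_ms: int,
                 count_in: Callable[[int, int], int],
                 cap: int = SEARCH_CAP) -> List[Tuple[int, int]]:
    """Bisect [start_ms, end_ms) until every window has at most `cap` results.

    A window of one millisecond is returned as-is even if it still exceeds
    the cap, because it cannot be split further by time alone.
    """
    if end_ms - start_ms <= 1 or count_in(start_ms, end_ms) <= cap:
        return [(start_ms, end_ms)]
    mid = (start_ms + end_ms) // 2
    return (split_window(start_ms, mid, count_in, cap)
            + split_window(mid, end_ms, count_in, cap))


# Usage with fake data standing in for the API: 100 contacts, cap of 30.
stamps = list(range(100))
fake_count = lambda s, e: sum(s <= t < e for t in stamps)
windows = split_window(0, 100, fake_count, cap=30)
```

Each resulting window would then be searched with a lower bound (inclusive) and an upper bound (exclusive) on the modification timestamp, so no single query needs to page past the cap.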
Hi, @Harder 👋 Thanks for reaching out. I appreciate you listing what you've tried and the roadblocks you are facing. It definitely gives the community context for understanding your issue. Hey, @himanshurauthan @tjoyce @nikodev, have you tackled anything similar?