Hey all! A teammate and I have been looking into using the Search API to iterate through records in cases where the Export API would be prohibitive.
We're aware of three limits that impact iteration:
1. The API is limited to 4 requests/second
2. A single query returns at most 100 results
3. Paginating through query results fails once you ask for records past the 10,000th record
The first two are fine, but #3 is a concern for our use case.
However, we think we can circumvent this limitation if we order by a property and filter for records that come after some sentinel value.
Example:
- We have 20,000 records
- 10,000 of these were created after December 31st, 2024
- Therefore we can filter for records created after December 31st, 2024 and use the Search API to query for the remaining 10,000
This mostly works, but there's an edge case: the first records in the second half of those 20k records may share the same Created At value as the last record in the first half. In that case a strict "greater than" filter on that value skips them, and we see fewer than the remaining 10,000.
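To make the edge case concrete, here is a small Python sketch under simplified assumptions. `CAP` is a hypothetical stand-in for the 10,000-record limit, and the records are made-up `(hs_object_id, created_at)` pairs, not real CRM data.

```python
from datetime import date

CAP = 4  # hypothetical stand-in for the 10,000-record paging limit

# Made-up (hs_object_id, created_at) pairs, already ordered by created_at.
records = [
    (1, date(2024, 12, 30)),
    (2, date(2024, 12, 30)),
    (3, date(2024, 12, 31)),
    (4, date(2024, 12, 31)),  # last record reachable by paging
    (5, date(2024, 12, 31)),  # same date as the sentinel, just past the cap
    (6, date(2025, 1, 1)),
    (7, date(2025, 1, 2)),
    (8, date(2025, 1, 3)),
]

# First pass: page through results until the cap is hit.
first_pass = records[:CAP]
sentinel = first_pass[-1][1]

# Second pass: filter for records created strictly after the sentinel.
second_pass = [r for r in records if r[1] > sentinel]

seen = {r[0] for r in first_pass + second_pass}
missed = [r[0] for r in records if r[0] not in seen]
print(missed)
```

Record 5 shares its date with the sentinel, so the strict filter never returns it.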
So Created At doesn't work because those values aren't unique.
Object IDs (`hs_object_id`), however, are.
So, reusing the same example, we can order on Object ID and use the final Object ID as the sentinel value for our filter (`%{ propertyName: "hs_object_id", operator: "GT", value: id_of_the_10000th_record }`).
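A minimal sketch of this approach in Python, offered as an illustration rather than a definitive implementation. It is a slight variation: it re-applies the `hs_object_id` `GT` filter on every page instead of only at the 10,000th record, so the paging cap is never reached. The request body shape (`filterGroups`, `sorts`, `limit`) reflects our understanding of the Search API. `build_search_body`, `iterate_records`, and `fake_search` are hypothetical names, and `fake_search` is an in-memory stand-in for the real HTTP call. The sketch also assumes IDs are positive integers.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

PAGE_SIZE = 100  # per-query result limit mentioned above


def build_search_body(last_id: int) -> dict:
    """Build a search body asking for records with hs_object_id > last_id."""
    return {
        "filterGroups": [{"filters": [{
            "propertyName": "hs_object_id",
            "operator": "GT",
            "value": str(last_id),
        }]}],
        "sorts": [{"propertyName": "hs_object_id", "direction": "ASCENDING"}],
        "limit": PAGE_SIZE,
    }


def iterate_records(search: Callable[[dict], list[dict]]) -> Iterator[dict]:
    """Yield every record, using the last seen ID as the sentinel for the next page."""
    last_id = 0  # assumption: all object IDs are greater than 0
    while True:
        page = search(build_search_body(last_id))
        if not page:
            return
        yield from page
        last_id = int(page[-1]["id"])


def fake_search(ids: Iterable[int]) -> Callable[[dict], list[dict]]:
    """Hypothetical in-memory stand-in for the Search API, for illustration only."""
    store = sorted(ids)

    def search(body: dict) -> list[dict]:
        threshold = int(body["filterGroups"][0]["filters"][0]["value"])
        matches = [i for i in store if i > threshold]
        return [{"id": str(i)} for i in matches[: body["limit"]]]

    return search


all_ids = [int(r["id"]) for r in iterate_records(fake_search(range(1, 20001)))]
print(len(all_ids))
```

A real client would swap `fake_search` for an actual request and throttle itself to stay under the 4 requests/second limit.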
This brings up a question for us about new records:
If we're in the middle of iterating through these records with the Search API, do we have any guarantee that a new record will receive an `hs_object_id` greater than those of all existing records of that kind in the CRM?
I'll reframe the question a bit with an example:
1. We are iterating through the CRM's 20k records with the Search API, ordering by `hs_object_id ASCENDING`
2. We grab our final batch from the first 10k records
3. The last record has an `hs_object_id` with a value of 10000
4. Meanwhile, new records are being created in the CRM by users and/or other processes
5. A new record gets created with an `hs_object_id` of 9000
6. We use that last record's ID of 10000 to filter for records with IDs greater than this value, continuing to order by `hs_object_id ASCENDING`
7. We never iterate over the new record with `hs_object_id` 9000
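The scenario above can be simulated with a tiny in-memory Python sketch. It says nothing about how HubSpot actually allocates IDs. It only shows what would happen if a lower ID ever appeared mid-iteration. `next_page` and the ID values are hypothetical.

```python
def next_page(store, last_id, limit):
    """Return up to `limit` IDs greater than last_id, in ascending order."""
    return sorted(i for i in store if i > last_id)[:limit]


store = {10, 20, 30, 40}  # made-up existing object IDs
seen = []

# First batch: our sentinel ends up at 30.
page = next_page(store, 0, 3)
seen += page
last_id = page[-1]

# Hypothetical: a new record appears with an ID below the sentinel.
store.add(25)

# Continue iterating from the sentinel.
while True:
    page = next_page(store, last_id, 3)
    if not page:
        break
    seen += page
    last_id = page[-1]

missed = sorted(store - set(seen))
print(missed)
```

If new IDs are always greater than existing ones, `missed` stays empty and the iteration is complete.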
Is it possible for us to ever end up in this situation, given how HubSpot allocates `hs_object_id` values?
Hey, @Ocean_Lewis 👋 Welcome to our community. Your workaround is quite ingenious. Thank you very much for sharing. I took your main question to the team that owns the CRM endpoints. My gut says you are safe, but I don't have a public document we can lean on for clarity. I'll update here with any details I get for us.