Large Deal Volume, CPU Overload

Occasional Contributor

Hi,

 

We have developed a Java application that runs overnight and synchronises HubSpot with our main front-end system.

 

We currently have around 500,000 deals (and growing) in HubSpot, and to avoid duplication we need access to all of them at the very start of the Java application. Currently, we load all 500,000 deals into memory by pulling them down via the 'Get All Deals' API (GSON in streaming mode, into an array). This worked for a while, but the CPU is now working too hard and the job eventually fails once the JVM heap space has filled up.
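For anyone hitting the same heap limit, here is a minimal sketch of one way to ease it. It assumes duplicate detection only needs each deal's ID. The idea is to keep just the IDs and let each page of results be discarded once processed, instead of holding every deal object in one array. The `Page` class and the `fetchPage` function are hypothetical stand-ins for whatever wraps the paged API call. The `hasMore` and offset fields are assumptions about the response shape and should be checked against the API documentation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongFunction;

public class DealIdIndex {

    /** Hypothetical shape of one page of results: IDs plus paging info. */
    static final class Page {
        final List<Long> dealIds;
        final boolean hasMore;
        final long nextOffset;

        Page(List<Long> dealIds, boolean hasMore, long nextOffset) {
            this.dealIds = dealIds;
            this.hasMore = hasMore;
            this.nextOffset = nextOffset;
        }
    }

    /**
     * Walks every page and keeps only the deal IDs. Each Page becomes
     * eligible for garbage collection once the next one is fetched.
     */
    static Set<Long> collectIds(LongFunction<Page> fetchPage) {
        Set<Long> ids = new HashSet<>();
        long offset = 0L;
        Page page;
        do {
            page = fetchPage.apply(offset);
            ids.addAll(page.dealIds);
            offset = page.nextOffset;
        } while (page.hasMore);
        return ids;
    }

    public static void main(String[] args) {
        // Fake two-page source standing in for the real HTTP call.
        LongFunction<Page> fake = offset -> offset == 0L
                ? new Page(List.of(101L, 102L), true, 2L)
                : new Page(List.of(103L), false, 3L);

        Set<Long> ids = collectIds(fake);
        System.out.println(ids.size());         // 3
        System.out.println(ids.contains(102L)); // true
    }
}
```

A set of IDs is far smaller than the full deal objects with all their properties, so the duplicate check can stay in memory while the heavy data never accumulates.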

 

This process runs perfectly on our Citrix environment, but does not work on the server that we use to run our overnight jobs. That strongly suggests our environment is also contributing to the problem.

 

The underlying problem, however, remains the same: I need to rethink the software we run overnight and eventually have a more scalable solution as our business grows. My question to the community is: does anyone else have such a large volume of deals in HubSpot? How do you bring them into memory without exhausting the memory / CPU of the machine your integration tool runs on?

 

(We are currently developing a solution that stores the 500,000 deals in a temporary database at the start, which can then be queried from the main line of code.)

 

Thanks in advance for any advice you could give!

 

Thanks,
Sam

HubSpot Moderator

Hi @stldr,

 

So I'm not sure if I can really give you a great response myself, but I did notice that this hasn't gotten a response and I'd like to help how I can.

 

I think you're more looking for someone with experience working with so many objects at once. I thought maybe I could tag a few people in, and at the very least bump this thread to the top of the board. @Mike_Eastwood: it seems like you're pretty knowledgeable in this area, based on a few posts of yours that I've seen. Mind taking a look? And @Jsum, I've seen some extremely helpful posts from you on the CMS dev side of things. I'm not sure if you have expertise in this area, but if you'd have any advice to offer here or if you know someone who might, that would be wonderful. I also saw a post from a few weeks ago from @lord_dark_basic, where you were working on a Java app.

 

Otherwise if anyone sees this and has some thoughts or best practices, please chime in! I'll keep brainstorming in the meantime.

 

 - Leland

Leland Scanlan

HubSpot Developer Support
Occasional Contributor

Thank you for the responses @lscanlan @Jsum @Mike_Eastwood

 

Just wanted to give you a heads up that we solved this by developing an SQL stored procedure that loads the very large JSON file into our own database tables, which my main line of code now queries.

Esteemed Advisor

Hi @stldr ,

 

@lscanlan , I do not know Java, and I don't have any experience with deals, so I might not be any help here. 

 

I do know, from general API work, that caching the response can take a huge load off the CPU. If it were me, I'd look for a way to cache the response once, then check for changes and pull only those, instead of pulling the entire response each time. I will share some links based on a quick Google search, but I do not know whether they will be helpful in this situation:

 

https://stackoverflow.com/questions/43295105/how-to-cache-rest-api-response-in-java

 

https://dzone.com/refcardz/java-caching?chapter=1

 

https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers

 

https://stackoverflow.com/questions/25758458/java-help-on-an-strategy-to-cache-tons-of-gb

 

https://dzone.com/articles/applying-back-pressure-when

 

https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/investigating_high_cpu_for_java_...

 

I had a project where I had to build a proxy to make calls to a SOAP API: one call to get a list of location IDs, then another call for each ID to pull hundreds of lines of location data per ID. The result was the same as you described. We ended up creating a cache that pulled and stored the specific information needed from each location, and the proxy served that client side to our HubSpot website.
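The cache-once-then-check-for-changes idea can be sketched roughly as below. This is only an illustration under assumptions: it presumes the source can return just the records changed since a given timestamp, which needs checking against the API docs. `DealChange` and its fields are hypothetical names, not part of any HubSpot client library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DealCache {

    /** Hypothetical minimal change record: a deal ID and when it last changed. */
    static final class DealChange {
        final long dealId;
        final long modifiedAtMillis;

        DealChange(long dealId, long modifiedAtMillis) {
            this.dealId = dealId;
            this.modifiedAtMillis = modifiedAtMillis;
        }
    }

    private final Map<Long, Long> modifiedAtById = new HashMap<>();
    private long highWaterMark = 0L;

    /** Applies a batch of changes; returns how many entries were added or refreshed. */
    int apply(List<DealChange> changes) {
        int touched = 0;
        for (DealChange c : changes) {
            Long known = modifiedAtById.get(c.dealId);
            if (known == null || known < c.modifiedAtMillis) {
                modifiedAtById.put(c.dealId, c.modifiedAtMillis);
                touched++;
            }
            highWaterMark = Math.max(highWaterMark, c.modifiedAtMillis);
        }
        return touched;
    }

    boolean contains(long dealId) {
        return modifiedAtById.containsKey(dealId);
    }

    /** Timestamp to use as the "changed since" value on the next run. */
    long highWaterMark() {
        return highWaterMark;
    }

    public static void main(String[] args) {
        DealCache cache = new DealCache();
        // First run: full load.
        cache.apply(List.of(new DealChange(1L, 1000L), new DealChange(2L, 1500L)));
        // Later run: only the records changed since the high-water mark.
        int touched = cache.apply(List.of(new DealChange(2L, 2000L), new DealChange(3L, 2100L)));
        System.out.println(touched);               // 2
        System.out.println(cache.highWaterMark()); // 2100
    }
}
```

For an overnight job the map would need to be persisted between runs (for example in a database table) rather than kept only in memory.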

 

I have also run into similar situations with GoDaddy's minimum hosting plans when trying to host WordPress sites. Even the smallest sites would hit the CPU threshold because of the weight of WordPress and all of the plugins needed to secure and run even a simple site.

 

If creating and comparing a cache isn't possible, then you might just have to up your hosting power. An extreme example would be Digital Ocean's largest plan:

$960 / mo

192GB / 32 CPUs

3.75 TB SSD disk

12 TB Transfer

 

Let me know if that helps.

 

 


- Jonathan Sumner
Regular Advisor | Silver Partner

Hi @stldr thanks for the intro @lscanlan 

 

The way we handled processing issues for large numbers of *Contacts* was to get them one Smart List at a time; unfortunately, there's no way to do that with Deals. If you use a time-based parameter to filter, that'll probably make life more difficult.

 

Just checking, are you only downloading the Deal Properties you need? Or all Deal Properties?

[If you're not doing it already] You can specify the 'properties' parameter to retrieve only the properties you need to work with.

properties=x&properties=y
Used in the request URL, may be included multiple times to get multiple properties.

 

https://developers.hubspot.com/docs/methods/deals/get-all-deals
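As a rough sketch, the repeated `properties` parameter described above could be assembled like this. The base URL is a placeholder, not the real endpoint, and the property names `dealname` and `amount` are only examples. Take the actual endpoint and any other parameters from the docs linked above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class DealsUrlBuilder {

    /** Appends one properties=<name> pair per requested property. */
    static String withProperties(String baseUrl, List<String> properties) {
        if (properties.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (String name : properties) {
            query.add("properties=" + URLEncoder.encode(name, StandardCharsets.UTF_8));
        }
        // Use '&' if the base URL already carries a query string.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + query;
    }

    public static void main(String[] args) {
        // Placeholder base URL; substitute the endpoint from the API docs.
        String url = withProperties("https://example.invalid/deals",
                List.of("dealname", "amount"));
        System.out.println(url);
        // https://example.invalid/deals?properties=dealname&properties=amount
    }
}
```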

 

This could save on memory and data transfer (just a matter of time before you exceed your API limits).

 

Good luck

Mike