Duplicate Companies

taran42 — Tue, 20 Oct 2020 15:55:23 GMT

When I make a call via Python to get all my companies, I only get 250 companies. If I set the max to 500, I get 500, but HubSpot duplicates the companies, so that is still only 250 unique records. If I set the max to 1000, I get 1000, but HubSpot quadruples the values (meaning I'm still only getting 250 unique values). I should have 340 companies. Any ideas why HubSpot would do this? It seems to not be paginating properly or something.

import requests
import json
import urllib

# Maximum number of results to cycle through
max_results = 1000
hapikey = 'APIkey'
# Specify the number of companies to return in the API call (default is 100, max is 250)
limit = 250
company_list = []
# Set the properties to retrieve in the URL (&property=prop_name)
get_all_companies_url = "https://api.hubapi.com/companies/v2/companies/paged?&properties=hs_object_id&"
parameter_dict = {'hapikey': hapikey, 'limit': limit}
headers = {}

# Paginate your request using offset
has_more = True  # has_more lets the program know if there are more companies to cycle through
while has_more:
    parameters = urllib.parse.urlencode(parameter_dict)
    get_url = get_all_companies_url + parameters
    r = requests.get(url= get_url, headers = headers)
    response_dict = json.loads(r.text)
    has_more = response_dict.get('has-more') or False
    company_list.extend(response_dict.get('companies') or [])
    parameter_dict['Offset']= response_dict.get('offset')
    # Exit pagination, based on what ever value you've set your max results variable to
    if len(company_list) >= max_results:
        print('maximum number of results exceeded')
        break
print('loop finished')
list_length = len(company_list)
print("You've succesfully parsed through {} company records and added them to a list".format(list_length))

I updated the post with my current code. Maybe it'll help.
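One way to confirm that the list really holds repeated pages, rather than 1000 distinct records, is to count unique IDs. This is a minimal sketch: the helper name `summarize_duplicates` is made up for illustration, and it assumes each company dict carries a `companyId` field, which is not shown in the post.

```python
from collections import Counter


def summarize_duplicates(companies, id_key="companyId"):
    """Return (total, unique, repeated_ids) for a list of company dicts.

    `id_key` is an assumption about where the ID lives in each record.
    """
    counts = Counter(company.get(id_key) for company in companies)
    repeated = [cid for cid, n in counts.items() if n > 1]
    return len(companies), len(counts), repeated


# Hypothetical sample: the same page of two companies fetched twice.
page = [{"companyId": 1}, {"companyId": 2}]
total, unique, repeated = summarize_duplicates(page + page)
print("total:", total, "unique:", unique, "repeated ids:", repeated)
```

If `unique` stays at 250 while `total` grows with `max_results`, the loop is most likely requesting the same first page over and over.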

@dennisedson I tagged you since you seem to eventually find and reply to all of my posts anyways, regardless of your Python expertise. 😄

Re: Duplicate Companies

dennisedson — Tue, 20 Oct 2020 20:15:38 GMT

@taran42 , my old friend!

Is there a way to print out the second iteration of the loop to see what the offset is?
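A small sketch of how the query string could be printed on each iteration, so the offset sent on the second request is visible. `describe_request` is a hypothetical helper and the values are placeholders; only `urllib.parse.urlencode` and the parameter names come from the posted code.

```python
import urllib.parse


def describe_request(iteration, parameter_dict):
    """Build a one-line description of the query string for one loop pass."""
    query = urllib.parse.urlencode(parameter_dict)
    return "iteration {}: {}".format(iteration, query)


# Placeholder values, mirroring the names in the posted code.
params = {"hapikey": "APIkey", "limit": 250}
print(describe_request(1, params))

# The posted loop stores the returned offset under the key 'Offset'.
params["Offset"] = 250
print(describe_request(2, params))
```

Printing this inside the `while` loop shows exactly which parameters reach the API, including how the offset key is spelled.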

Have you tried to use something like postman to test that request?

I see what you are doing. You are forcing me to learn Python. Sneaky.

Calling my Python crew @akaiser , @khookguy , @wfong! I am beginning to think that we need to set up a Python room 🙂

Do you guys have any idea what is going wrong here?

Thanks all!!

Re: Duplicate Companies

taran42 — Tue, 20 Oct 2020 21:06:55 GMT

@dennisedson if you learned Python, would that be so bad? 😄 I'm actually fairly new to Python myself. I've only been using it for the last couple of months. The problem is I was just thrown in the deep end with it, without a chance to learn it properly. I come from a web development background, so I have some coding experience, but not with stuff like this.

I have no problem with the Contact code, and it's set up almost exactly like Companies and Deals. How many contacts you parse through, whether they're paginated or not, is shown at the top of the prompt when you run the code. So for Contacts, it lists all 2k+ I have. With Companies and Deals, it lists whatever I set my Max Results to.

So just twenty minutes or so ago, a friend helped me to figure out how to get the Companies to return properly. He noted that the offset value isn't defined. I added offset=250 under the limit and I was able to get all 340 companies. However that same fix did not work for Deals. So I'm halfway there.
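A hedged sketch of a loop that feeds the returned offset back into the next request under the lowercase key `offset`. The posted code reads `response_dict.get('offset')` but writes it to `parameter_dict['Offset']`; one possible reading is that the capitalized key is ignored by the API, so every request returns the first page. The HTTP call is replaced here by `fake_fetch`, a hypothetical stand-in that treats the offset as a simple index, which may not match how the real endpoint encodes it.

```python
def fetch_all_companies(fetch_page, limit=250, max_results=1000):
    """Collect companies page by page, sending the returned offset back as `offset`."""
    params = {"limit": limit}
    companies = []
    while True:
        page = fetch_page(params)
        companies.extend(page.get("companies") or [])
        if not page.get("has-more") or len(companies) >= max_results:
            break
        # Lowercase key: assumed to be the spelling the API expects.
        params["offset"] = page.get("offset")
    return companies


def fake_fetch(params):
    """Hypothetical stand-in for the HTTP request: serves 340 records in pages."""
    start = params.get("offset", 0)
    end = min(start + params["limit"], 340)
    return {
        "companies": [{"companyId": i} for i in range(start, end)],
        "has-more": end < 340,
        "offset": end,
    }


all_companies = fetch_all_companies(fake_fetch)
print(len(all_companies))
```

In real use, `fetch_page` would wrap the `requests.get` call from the original post and return the parsed JSON.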

I have no idea why Contacts, Companies and Deals work so differently from one another when they're basically the same thing. It's like three different designers coded the API.

I use Visual Studio Code to look for errors, which works much like Postman does. I usually code in Notepad++ though.

Re: Duplicate Companies

taran42 — Wed, 21 Oct 2020 01:49:43 GMT

The same friend helped me with Deals. It turns out that the offset needed to be set there as well (which I did), and that the has-more flag has to be read as hasMore instead of has-more. The HubSpot documentation seems to be really broken, at least when it comes to the Python API calls. Everything I was using I had pulled from the documentation, but I kept running into numerous errors. Little things, like the flag being spelled hasMore, has_more and has-more across Contacts, Companies and Deals. One would think they'd all be the same. Also, limit and count are used interchangeably. There are several other errors I struggled with while using the documentation that I got help with here and there. And being new to Python, I didn't know how to spot and fix those errors.
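Since the flag's spelling varies between endpoints, one option is a small helper that checks each spelling mentioned above. This is only a sketch; `read_has_more` and `HAS_MORE_KEYS` are hypothetical names, and which endpoint uses which spelling should be checked against the actual response.

```python
# Spellings reported in this thread for the "more pages available" flag.
HAS_MORE_KEYS = ("has-more", "hasMore", "has_more")


def read_has_more(response_dict):
    """Return the pagination flag regardless of which spelling the endpoint uses."""
    for key in HAS_MORE_KEYS:
        if key in response_dict:
            return bool(response_dict[key])
    # No recognised key: treat as "no more pages" so the loop ends.
    return False


print(read_has_more({"hasMore": True}))
print(read_has_more({"companies": []}))
```

Using it in place of `response_dict.get('has-more') or False` lets the same loop run against Contacts, Companies and Deals responses.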

I know I am using V2 and HubSpot uses V3 now, but I tried V3 and still had pagination problems. So I just stuck with what I was already familiar with.
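For comparison, a rough sketch of cursor-style paging as the V3 CRM list endpoints are understood to work: the response carries `results` plus a `paging.next.after` value that is sent back as the `after` parameter. That response shape is an assumption here, not something shown in this thread, and `fake_fetch_v3` is a hypothetical stand-in for the HTTP call.

```python
def fetch_all_v3(fetch_page, limit=100):
    """Follow `paging.next.after` cursors until no further page is advertised."""
    params = {"limit": limit}
    results = []
    while True:
        page = fetch_page(params)
        results.extend(page.get("results") or [])
        after = ((page.get("paging") or {}).get("next") or {}).get("after")
        if not after:
            break
        params["after"] = after
    return results


def fake_fetch_v3(params):
    """Hypothetical stand-in: serves 340 records using string cursors."""
    start = int(params.get("after", 0))
    end = min(start + params["limit"], 340)
    page = {"results": [{"id": str(i)} for i in range(start, end)]}
    if end < 340:
        page["paging"] = {"next": {"after": str(end)}}
    return page


v3_records = fetch_all_v3(fake_fetch_v3)
print(len(v3_records))
```

The difference from V2 is that there is no separate has-more flag to read: the absence of a next cursor ends the loop.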

Re: Duplicate Companies

dennisedson — Wed, 21 Oct 2020 14:12:52 GMT

@taran42 , Glad you got it figured out. A key benefit of the V3 API is that it brings consistency across the APIs. You are not alone in thinking that the older ones were built in silos.

Hopefully you can figure out the issue with the pagination on the newer APIs.

Regardless, glad this immediate issue is resolved!