Ability to identify duplicate contacts based on custom Contact property value
Hi,
We have a custom Contact property in HubSpot that stores the value of a unique identifier property in our database of record. These values are pushed into HubSpot contacts via one-way sync from our database on a regular basis, but we do not enforce unique values for this property in HubSpot.
I stumbled upon one edge case where we have two HubSpot contacts with the same value of this custom property, so I want to see if this is a greater issue and if we have other contacts across our HubSpot instance where multiple contacts have the same value for this custom property.
I also do not want to export contact data, especially because HubSpot does not allow me to exclude PII like 'name' when exporting a list of Contacts. If I could export only this single custom property, that would be an option for us, since we could then use an external method to identify duplicate values.
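For reference, the external check itself can be quite small once only the record ID and the custom property value are available. Here is a minimal Python sketch under that assumption; the function name, the sample IDs, and the sample identifier values are hypothetical, and how the two columns are obtained without PII is left open.

```python
from collections import defaultdict


def find_duplicate_values(rows):
    """Group contact IDs by identifier value; keep values shared by 2+ contacts.

    rows: iterable of (contact_id, identifier_value) pairs.
    Blank or missing identifier values are skipped.
    """
    groups = defaultdict(list)
    for contact_id, value in rows:
        if value is None or str(value).strip() == "":
            continue  # contacts without the identifier are not treated as duplicates
        groups[str(value).strip()].append(contact_id)
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data: (HubSpot record ID, custom identifier value)
sample = [
    ("101", "A-1"),
    ("102", "A-2"),
    ("103", "A-1"),
    ("104", ""),
    ("105", "A-2"),
    ("106", "A-3"),
]
duplicates = find_duplicate_values(sample)
print(duplicates)  # {'A-1': ['101', '103'], 'A-2': ['102', '105']}
```

A single pass with a dictionary keeps this linear in the number of contacts, which matters at a few million records.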
There are some great third-party solutions that integrate with HubSpot (Koalify, Insycle), which you're probably aware of. Nevertheless, I wanted to mention them. At 3.5 million contacts, however, some of them get prohibitively expensive.
Currently, there isn't any out-of-the-box solution for what you're trying to do. HubSpot duplicate detection is not that advanced.
Still, when I read your post I had an idea that leverages a new product update (beta). You could do the following:
Create a contact-based workflow that enrolls all contacts where the identifier is known. Then use the 'Create associations' workflow action to associate contacts based on the matching identifier, and apply the association label 'Duplicate'.
And voilà, you can now filter for contacts where a calculation property that counts associated contacts with the 'Duplicate' label is greater than or equal to 1.
I'm not sure how the beta deals with cases where there are more than two records sharing your identifier. That's something you'd have to test. Nevertheless, this should flag all contacts where there is at least one duplicate.
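One way to test the more-than-two case is to build a small set of test contacts, work out what the count should be for each, and compare that with what the workflow actually produces. A rough Python sketch, assuming the calculation property counts the other contacts associated with the 'Duplicate' label; the function name and the test data are made up for illustration.

```python
from collections import Counter


def expected_duplicate_counts(rows):
    """Return, per contact ID, how many OTHER contacts share its identifier.

    rows: list of (contact_id, identifier_value) pairs; empty values are ignored.
    """
    per_value = Counter(value for _, value in rows if value)
    return {cid: per_value[value] - 1 for cid, value in rows if value}


# Hypothetical test set: three contacts share "X", one contact has a unique "Y"
test_contacts = [("201", "X"), ("202", "X"), ("203", "X"), ("204", "Y")]
expected = expected_duplicate_counts(test_contacts)
print(expected)  # {'201': 2, '202': 2, '203': 2, '204': 0}
```

If the beta only links records pairwise, some of the three "X" contacts may show 1 instead of 2, but each should still be at least 1, so the filter would still flag them.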
Have a great weekend!
Karsten Köhler HubSpot Freelancer | RevOps & CRM Consultant | Community Hall of Famer
Thank you for the mention @karstenkoehler. I wanted to share some more context about Insycle, as we have some very advanced bulk deduplication (and automation) features that sound like they would be helpful, particularly for a database of that size.
Here is a quick breakdown of our Merge Duplicates tool:
Works for all primary record types (contacts, companies, deals, tickets, etc.) and we can add custom objects as well if need be.
Use any field in your database as a potential unique ID matching field. For example, you can use similar or exact matching on a full name, company name, email domain, mailing address, or whatever fields make the most sense for your use case.
Set rules for determining the master record that duplicates will all merge into, such as the earliest created record, highest deal amount, or most recently updated.
Set rules for retaining data down to the individual field level. For example, you could instruct Insycle to append data to specific fields during the merge so that it is not lost, or set rules defining which record to keep the data from (keep the notes from the record with a deal in an active pipeline stage, etc.).
Fully automate your deduplication templates. Run them on a set schedule or inject them into Workflows so that deduping happens automatically after your contact (or any record type) is created.
Preview your deduplication runs to see how they work and ensure accuracy before pushing the update live.
If you have any questions don't hesitate to shoot me a DM, happy to help.
Thanks for mentioning Koalify @karstenkoehler. I also love the creative approach using the association labels 💡
@JLacey9 your use case mirrors the exact scenario I encountered with a reverse ETL setup, which inspired the creation of Koalify. Our plugin is fully integrated with HubSpot, providing an almost native experience.
Here’s how it works:
Create a rule based on your unique identifier property.
Hi @karstenkoehler, thank you for your quick solution. It is great to learn more about association labels and the new beta workflow action!
I didn't mention this in my original post, but the custom Contact property that I want to detect duplicates on is a number property, and the article mentions that only single-line text, multi-line text, or phone number properties can be used when matching records to create associations. When testing this solution, I added a step that creates a new single-line text field and copies the value from the number field into it, then used that text field in the 'Create associations' action in the workflow.
Then, when I tested the workflow with the 'Create associations' action, I got a "BLOCK" server response.
I am not sure if this could have something to do with Sandbox.
If I can get past this error then I believe this solution will work for us! Thanks again for your help.
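On the number-to-text copy step: if the same values are ever compared outside HubSpot, it may help to normalize them to one canonical text form first, so that the same ID written differently still matches. A small Python sketch, assuming the identifier is a whole number and that values might arrive in mixed forms such as 12345, "12345.0", or " 12345 "; I have not checked how HubSpot itself formats the copied value, and the function name is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def number_to_match_text(raw):
    """Convert a whole-number identifier to canonical text, or None if unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None  # not numeric at all
    if not value.is_finite():
        return None  # NaN or Infinity
    if value != value.to_integral_value():
        return None  # has a fractional part, so not a valid whole-number ID
    return str(int(value))


print(number_to_match_text("12345.0"))  # 12345
print(number_to_match_text(" 12345 "))  # 12345
print(number_to_match_text("12.5"))     # None
```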