More efficient way to find and merge duplicate contacts

The Sales CRM works as intended mainly for lead capture from a web form, where a single email address is provided to create the contact record.

You're thinking of the CRM as a Sales and Marketing tool, which it is, but it's also just a CRM: connected to Gmail, it gets new records daily from people who have multiple email addresses. Moreover, simply importing addresses from other databases can create multiple Contacts for the various email addresses someone might have.

Consider myself as an example: I have 12 different email addresses. When I take on a new client or job, I'll have another. That makes 13 different records. They are all just me, and each of those email addresses should be more easily identified as belonging to the same person and merged.


What needs to improve
Today we have to go into a Contact record and use the Merge function to find the other emails and merge them. That's a manual process, and a pain for large Contact databases.

The Contact list should have a "Find and Dedup" option.  The CRM should find likely duplicates not based on the email identifier but other personal identifiers: same name, similar name in same location, etc.  Flag those in the list and make it easy to check the duplicate Contacts and "Merge" them.
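To make the request concrete, here is a minimal Python sketch of flagging likely duplicates by name and location rather than by email. The record shape (`id`, `name`, `city`), the 0.85 threshold, and the function names are assumptions made up for illustration; nothing here reflects how the CRM stores or compares contacts.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join((text or "").lower().split())


def name_similarity(a, b):
    """Similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_likely_duplicates(contacts, threshold=0.85):
    """Flag pairs with the same name, or a similar name in the same city.

    Email is deliberately ignored. Pairs are only flagged for review,
    never merged automatically.
    """
    flagged = []
    for a, b in combinations(contacts, 2):
        if not normalize(a["name"]) or not normalize(b["name"]):
            continue  # a missing name gives nothing to compare
        score = name_similarity(a["name"], b["name"])
        city_a = normalize(a.get("city"))
        same_city = bool(city_a) and city_a == normalize(b.get("city"))
        if score == 1.0 or (score >= threshold and same_city):
            flagged.append((a["id"], b["id"], round(score, 2)))
    return flagged


# Hypothetical sample data: one person with three addresses, plus one stranger.
contacts = [
    {"id": 1, "name": "Scott Miller", "email": "scott@work.example", "city": "Key West"},
    {"id": 2, "name": "Scott  Miller", "email": "scott@home.example", "city": "Key West"},
    {"id": 3, "name": "Scot Miller", "email": "s.miller@client.example", "city": "Key West"},
    {"id": 4, "name": "Stephanie Smith", "email": "steph@other.example", "city": "Boston"},
]
flagged = find_likely_duplicates(contacts)
```

The three "Miller" records are flagged against each other even though their emails differ, while the unrelated contact is left alone. A real implementation would need more identifiers (phone, company) and a review step before any merge.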

This really needs to be part of the platform, because growing companies, teams with many people using Gmail for outreach, etc. will constantly generate new Contact records that can go unnoticed as duplicates of existing records.

@gelflex-cc it would be good for finding duplicate Companies.

@BB1 in theory yes. Some of my companies share a phone number as well. As long as there is an option to choose I am good with it, but just running it automatically, no.

In general... I prefer that there be a question that says something like: 'Do you want to ...? yes or no'... You can also have a secondary question like 'Do for all...yes or no' for those who want to allow all of the records to be merged, connected or whatever en masse.


I really don't like the automatic merging and overwriting.  What works for Company A may not work for Company B.  I don't like that I find 2 contacts merged into one contact without knowing about it and then have to figure out how to split them apart.



@KeyWestScott apparently, looking at his comment, they have a strategic partnership with Dedupely to provide this feature, but of course that requires another subscription to another service to supplement the shortcomings of the "all-in-one platform" that most people are already paying a pretty penny for.


@gelflex-cc that's why each user should have control of what is searched. A simple on/off switch for each desired field would make it 10x more useful, but as it stands, it's providing little value by suggesting duplicates with little or no relevance. It's counterproductive and untrustworthy, and yes I agree, it doesn't consider the many different operations and business verticals that use it.
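As a rough illustration of the on/off switch idea, here is a small Python sketch. The `enabled` settings dictionary, the field names, and `matching_fields` are hypothetical; the tool has no such setting today, which is the point of the request.

```python
def matching_fields(a, b, enabled):
    """Return the switched-on fields where both records hold the same non-empty value."""
    hits = []
    for field, is_on in enabled.items():
        if not is_on:
            continue  # the user switched this field off
        value_a = str(a.get(field) or "").strip().lower()
        value_b = str(b.get(field) or "").strip().lower()
        if value_a and value_a == value_b:
            hits.append(field)
    return hits


# Hypothetical per-user settings: compare on name and phone, ignore email.
enabled = {"name": True, "phone": True, "email": False}

record_a = {"name": "Roger G.", "phone": "555-0100", "email": None}
record_b = {"name": "Roger G.", "phone": "555-0100", "email": None}

hits = matching_fields(record_a, record_b, enabled)
```

A team whose companies share a phone number could switch `phone` off and still get matches on name alone.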


Maybe I'll end up using a 3rd party integration but I would be much happier if they just made a few of these very useful changes to make their users happy, and their product "better" as it's part of their whole ideology right?

Hi folks -- thank you for all of the continued feedback on the duplicate management tool! We're thrilled that this has been as useful as we'd hoped in helping businesses keep their data clean and take advantage of a unified customer database for marketing, sales, and customer service teams.


We're proud to build an open platform to support our integration partners like Dedupely and Insycle. While our partner products can deliver differentiated value, the native duplicate management tool is developed entirely by HubSpot.


How we identify potential duplicates: Under the hood, this tool uses machine learning (ML) to identify contacts and companies that you are likely to merge. At its core, ML helps automate tasks by analyzing examples of a task in order to evaluate new instances of that task. In the case of duplicate management, the task is merging contacts. By analyzing past merges, ML algorithms identify other pairs of contacts that have a high likelihood of being merged based on your behavior, which are shown to you in the duplicate management tool.

Today, the ML models for contact and company merge suggestions are based on contact and company properties including name, email, phone number, company name, and company industry (determined by HubSpot Insights). We plan to expand the properties that the ML models consider, but to be transparent, we're more likely to add default contact properties (e.g. contact activity data) before custom portal properties in the near term.


Because this is a machine learning product, it doesn't rely on strict rules like exact matches on phone number or name. By not relying on rules, the duplicate suggestions can more closely mimic our own human understanding of challenges like several contacts with the same business phone number or several contacts with the same generic name, like Kevin :). As you merge and dismiss pairs, you provide feedback to the tool to help improve the accuracy of our merge recommendations based on patterns in the contact and company properties.
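For readers wondering what "learning from merges and dismissals" can look like in general, here is a toy Python sketch: a tiny logistic model updated online from feedback. It is only an illustration of the general idea. The class name, the three features, and the learning rate are assumptions, and this is not a description of HubSpot's actual models.

```python
import math

FIELDS = ("name", "email", "phone")  # assumed feature set for this sketch


def pair_features(a, b):
    """1.0 for each field where both records share the same non-empty value."""
    return [1.0 if a.get(f) and a.get(f) == b.get(f) else 0.0 for f in FIELDS]


class MergeSuggester:
    """Toy online logistic regression over pair features."""

    def __init__(self, learning_rate=0.5):
        self.weights = [0.0] * len(FIELDS)
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, a, b):
        """Estimated probability that the user would merge this pair."""
        x = pair_features(a, b)
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, a, b, merged):
        """Nudge the weights toward the user's decision (merge or dismiss)."""
        x = pair_features(a, b)
        error = (1.0 if merged else 0.0) - self.score(a, b)
        self.bias += self.learning_rate * error
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]


# Hypothetical feedback: this user merges same-name pairs and dismisses
# pairs that only share a (switchboard) phone number.
same_name = ({"name": "Scott Miller", "email": "a@x.example"},
             {"name": "Scott Miller", "email": "b@y.example"})
same_phone = ({"name": "Ann Lee", "phone": "555-0100"},
              {"name": "Bob Roy", "phone": "555-0100"})

model = MergeSuggester()
initial_score = model.score(*same_name)
for _ in range(10):
    model.feedback(*same_name, merged=True)
    model.feedback(*same_phone, merged=False)
name_score = model.score(*same_name)
phone_score = model.score(*same_phone)
```

After the feedback loop, same-name pairs score above the starting value and shared-phone pairs score below it, which is the behaviour the post describes at a much larger scale.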


This tool is still under active development, and you should expect improvements around both the accuracy of the suggested merges and the merge experience throughout the rest of the year. 

Another bad reply from the hubspot product team 😒

@_Kevin I think you need to recap the comments on this thread and reposition your response from the perspective of your users, not your partners. I don't want the advanced features of either of those companies' offerings. What I want is for HubSpot to realize they have a priority to serve their customers before their profitable partners by making the tool work better. I don't see how machine learning isn't picking up on records that have the exact same phone number when you say that's what it's supposed to do, or how machine learning thinks that "Roger G." and "Roger G." aren't duplicates when the contacts only have a name and phone number and these are exactly the same, yet thinks it's doing me a favor when it suggests "Stephanie Smith" and "Steven McDonald" as duplicates when they have completely different emails, phone numbers, creation dates, associated records etc. NOT TO MENTION COMPLETELY DIFFERENT NAMES AND GENDERS. How about you specifically address this, because after thoroughly reading others' issues and comments, I don't seem to be alone on this.

Hi Daniel -- I've reached out to schedule a call for us. Thanks again

Booked 💪

Thanks for following up. Hopefully, this call will provide clarification for your community's concerns and pain points, as well as give me some better insight into how this tool functions and how I can make better use of it at the very least.

Happy to say after speaking with the HubSpot product dev team on this, I think they are going to implement some much-needed changes to improve this tool. They set better expectations and realized that these expectations were not as clear as they could have been, which is one of the stated improvements they plan to make along with actual tool improvements. I look forward to seeing updates on this forum or in the product updates, thanks again! 


First, I want to say whatever the most recent update was did wonders for us: we saw our potential duplicate numbers go from the hundreds to 16. Which is awesome! The new duplicates are MUCH more accurate. We had a lot of junk before (which was probably just the machine getting up to speed and learning).


One suggestion I have, though, is about how often HubSpot runs the duplicate match on the backend. Right now, we have not had duplicates run since Aug 17th, over 3 weeks ago. I also still use Insycle to supplement duplicates for now, and we have gotten quite a few in between Aug 17th and today. We really need either some way to push duplicates to run, or to have them run at least every 7-10 days, for us to effectively catch dups.


Again great work on this tool, really helping us out!

Contacts, Actions, Manage Duplicates.



The dedup tool just released is painful to use. It has found over 2000 duplicated records and the tool only allows merging one pair at a time. It is obvious the records are duplicated, so I wish they allowed batch merging so that it is not so painful. Not useful at all.

When you do it one by one, you can select which should be the primary contact/company based on which should not be overwritten by the other. If you do it en-masse, you would not (yet) be able to be selective. The option for now would be to use an integration, like Insycle.
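To show why the choice of primary record matters, here is a hedged Python sketch of a merge where the primary's values are never overwritten. The record shape, the `emails` list, and `merge_contacts` are assumptions for illustration only and do not describe the product's actual merge behaviour.

```python
def merge_contacts(primary, secondary):
    """Merge two records; the primary's non-empty values always win."""
    merged = dict(primary)
    for field, value in secondary.items():
        if field == "emails":
            continue  # emails are combined below instead of overwritten
        if merged.get(field) in (None, ""):
            merged[field] = value  # only fill gaps in the primary
    # Keep every address from both records, primary's first, without repeats.
    emails = list(primary.get("emails", []))
    for email in secondary.get("emails", []):
        if email not in emails:
            emails.append(email)
    merged["emails"] = emails
    return merged


# Hypothetical records for the same person.
primary = {"name": "Scott Miller", "phone": "", "emails": ["scott@work.example"]}
secondary = {"name": "S. Miller", "phone": "555-0100",
             "emails": ["scott@home.example", "scott@work.example"]}

merged = merge_contacts(primary, secondary)
```

Swapping the two arguments would keep "S. Miller" as the name instead, which is the kind of decision a one-click batch merge cannot make for you.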

I'd love to be able to filter my contacts so I can prioritize the duplicates that I need to de-dupe. For example, if I've filtered down to a list of companies, when I click the "manage duplicates" list, I'd like to see only those companies.

@_Kevin thank you for implementing the tool, I'm excited to make use of it!

Could you please consider including in the Duplicates section an export option similar to Contact Lists?

We've built a tool for merging HS contacts and having such a list in a file would make our deduplication process much faster.

+ I'd expect it would enable other users to get a better insight into their duplicates: which characteristics duplicates share, which sources duplicates come from etc.

@mallory_gsp What types of filters would you use? Dedupely actually has some basic filtering settings.


@wojtolinho We actually have this (exporting duplicates) sitting on our roadmap in the planning stages for this year. Duplicate source diagnostics has also been on our minds forever. We would love it if you stopped by and gave us some insight into how we could provide this in our integration.

Any chance that the dedupe tool - which is a big step up! - could have a "merge all" button, so that instead of hundreds of clicks we could simply let it do its thing with one click?

Have you tried the notification function? Does it work for you guys?

