Placekey Blog

Joining Overture and NPI Datasets

by Placekey

Joining Overture and NPI datasets now takes just 6 minutes

If you are even remotely involved with healthcare data in the United States, you’re familiar with the National Provider Identifier (NPI) data: a list of nearly every health care provider in the country. It is a great (and free) dataset, but it is missing many interesting attributes (like latitude/longitude and website). The new Overture Maps Places dataset has deep information and is also free.

So we wanted to see how easy it would be to join the Overture Maps Places data with the NPI data.

We appended Placekeys to the US Overture Maps Places and National Provider Identifier (NPI) datasets to show how easy Placekey is to use and how much value Overture attributes add to a dataset like NPI. With Placekey, the join took us under 10 minutes. Below are some quick insights from the join. If you are already familiar with Placekey and these datasets and want the data yourself, feel free to skip ahead to the Download the Data section after taking a look at the insights.

Placekey is an open entity matching service for places and addresses that helps with deduping, matching, syncing, and merging physical places. If you are interested in doing this join on your own, we wrote up a tutorial in a Google Colab notebook.

Insights From Joining NPI with Overture

All files used for this join can be found in this bucket (AWS account required) or this drive folder.

We focused on joining Overture data from the US to the NPI file. We applied Placekeys to 11.7 million Overture rows and a little over 8.1 million NPI rows. 

We learned a few things quickly from joining these datasets:

  • 9% of NPI entities are in the Overture Maps file
  • but 25% of the addresses in the NPI file have data in Overture Maps
  • only 2.7% of NPI Entity One records (individual providers, e.g. Dr. Sally Smith) are in the Overture Maps file
  • but 30% of NPI Entity Two records (organizations, i.e. larger health care providers like hospitals) are in the Overture Maps file
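
Overlap figures like these come down to checking which Placekeys appear in both files. Here is a minimal sketch with pandas, using tiny made-up frames in place of the skinny files. The column names (npi, overture_id, placekey, address_placekey) and all values are assumptions for illustration, not the actual file schema, and the shares are computed per NPI row.

```python
import pandas as pd

# Toy stand-ins for the skinny files; column names and values are hypothetical.
npi = pd.DataFrame({
    "npi": [1, 2, 3, 4],
    "placekey": ["pk-a", "pk-b", "pk-c", "pk-d"],
    "address_placekey": ["ad-1", "ad-1", "ad-2", "ad-3"],
})
overture = pd.DataFrame({
    "overture_id": ["o1", "o2"],
    "placekey": ["pk-a", "pk-x"],
    "address_placekey": ["ad-1", "ad-9"],
})


def share_matched(left: pd.DataFrame, right: pd.DataFrame, key: str) -> float:
    """Fraction of rows in `left` whose `key` value also appears in `right`."""
    return float(left[key].isin(right[key].dropna()).mean())


entity_overlap = share_matched(npi, overture, "placekey")
address_overlap = share_matched(npi, overture, "address_placekey")
print(f"entities matched: {entity_overlap:.0%}")
print(f"addresses matched: {address_overlap:.0%}")
```

In this toy data the address-level share is higher than the entity-level share, which mirrors the pattern in the numbers above: several providers can sit at an address that Overture knows about without each provider being its own Overture place.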

These low overlap rates point to inconsistencies between the datasets, likely driven by the self-reported nature of the NPI file and infrequent updates to provider information. Overture is still a growing dataset and does not yet have many healthcare providers, but what this mostly highlights is the importance of thorough data validation and maintenance in healthcare data.

Enriching the NPI data with Overture led to:

  • 173,646 centroids on Entity Code One records
  • 527,652 centroids on Entity Code Two records
  • 137,279 websites attributed to Entity Code One records
  • 476,671 websites attributed to Entity Code Two records
  • 63,300 socials attributed to Entity Code One records
  • 413,882 socials attributed to Entity Code Two records
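
Counts like these can be reproduced by counting non-null Overture attributes per entity type in the joined file. The sketch below uses a toy frame; the column names (entity_type_code, latitude, website, social) are assumptions for illustration and may not match the real file.

```python
import pandas as pd

# Toy stand-in for the joined file; all column names and values are hypothetical.
joined = pd.DataFrame({
    "entity_type_code": [1, 1, 2, 2, 2],
    "latitude": [40.7, None, 34.1, 41.9, None],
    "website": ["https://a.example", None, None, "https://b.example", None],
    "social": [None, None, "https://social.example/c", None, None],
})

# count() ignores nulls, so this gives the number of enriched records
# per entity type for each Overture attribute.
enriched_counts = (
    joined.groupby("entity_type_code")[["latitude", "website", "social"]].count()
)
print(enriched_counts)
```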

The profile of each individual provider or hospital now has enriched information from Overture, making the NPI dataset more valuable. You can see how easily centroid information can be applied to the NPI dataset, which helps with use cases like analyzing healthcare service accessibility and identifying underserved areas. By associating centroids with NPI data, healthcare organizations can strategically plan the placement of facilities to better serve communities in need.

Adding websites and social profiles to the provider information is super helpful since the NPI’s contact information is sparse and often out of date. This simple join can help potential patients by giving them direct access to healthcare providers' websites for more information or appointment scheduling. With Placekey, you can just as easily add other Overture information such as polygons, confidence scores, phone numbers, additional categories, and more.

Doing the join with Placekey was SUPER easy.  It took a few minutes.  And it is free for anyone to do and replicate.

Download the Data 

We made both the Overture Maps and National Provider Identifier (NPI) files with Placekeys appended publicly available to download. The raw files with Placekeys, the skinny files (each dataset's internal ID plus its Placekey), and the joined dataset are all free to download from this bucket. Note that for Overture, only US Places were joined with NPI.

The datasets are in this bucket or this drive link if you do not have an AWS account. Each link below will also allow you to download the data:

The skinny files are best to use if you want to quickly see overlaps between these datasets or any others you want to join.

The larger NPI file (full_npi_with_placekey) has quite a few columns, but the highlights are NPI ID, legal business name, address, name of the provider, and entity type. You will notice a lot of nulls, as most of these are optional fields for providers to share with the government.

The joined file (full_npi_to_overture_joined) is the NPI file joined with Overture Placekeys and the internal ID. We included all the NPI columns in this file.

Check out the code snippet we used to join these, which you can use to join other datasets that already have Placekeys.
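
As a rough illustration (this is not the snippet referenced above), a Placekey join in pandas can be sketched as a plain merge on the key column. The frames, column names, and values below are made up for the example.

```python
import pandas as pd

# Toy stand-ins for two datasets that already have Placekeys appended.
npi = pd.DataFrame({
    "npi": [1001, 1002, 1003],
    "provider_name": ["Dr. Sally Smith", "Main St Pharmacy", "City Hospital"],
    "placekey": ["pk-a", "pk-b", "pk-c"],
})
overture = pd.DataFrame({
    "overture_id": ["o1", "o2"],
    "placekey": ["pk-b", "pk-c"],
    "website": ["https://pharmacy.example", "https://hospital.example"],
})

# Left join keeps every NPI row; indicator=True adds a `_merge` column
# showing whether a matching Overture row was found.
joined = npi.merge(overture, on="placekey", how="left", indicator=True)
matched = joined[joined["_merge"] == "both"]
print(joined[["npi", "overture_id", "website", "_merge"]])
```

If several rows share a Placekey on either side, the merge will fan out into multiple rows per match. Passing pandas' validate argument to merge (for example validate="many_to_one") is one way to catch that early.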

There is also a tutorial with the datasets highlighted in this Google Colab notebook if you would like to try it for yourself!

Overture and NPI Datasets 

Overture has some super cool data.  It has just under 12 million places in the United States.  While the data is still a work in progress, it is a good starting point for a lot of organizations because the data is free – you cannot beat that price.

The NPI data is a list of almost every health provider (doctors, pharmacies, imaging centers, etc.) in the US, about 8 million providers. It is provided by the government, and the data is free and easy to access. Like Overture, the data isn’t perfect (there are tons of errors, bad data, misspellings, out-of-date records, etc.), but it is the standard dataset for healthcare in the US and does a good job of distinguishing individual practitioners from larger providers like hospitals.

What is Placekey

Placekey is an open entity matching API for places and addresses that helps with deduping, matching, syncing, and merging physical places. You send in an address (or a location name and address) and you get back a simple unique key. If you have an address like “464 N. Main St Suite 504 …”, you will get a Placekey, and even across the 10,000 different permutations and misspellings of that address you should still get the same Placekey. So it is a really nice system to help join, dedupe, and merge data about physical places.
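
To make the dedupe idea concrete, here is a small sketch that assumes the Placekeys have already been appended. The address spellings and the placekey values are made up for illustration; real keys would come from the Placekey API and look different.

```python
import pandas as pd

# Three spellings of one address plus a different address. The `placekey`
# values are invented placeholders, not real Placekeys.
records = pd.DataFrame({
    "address": [
        "464 N. Main St Suite 504",
        "464 North Main Street Ste 504",
        "464 n main st #504",
        "12 Oak Ave",
    ],
    "placekey": ["pk-main-504", "pk-main-504", "pk-main-504", "pk-oak-12"],
})

# Rows that resolved to the same key collapse to one row per place.
deduped = records.drop_duplicates(subset="placekey", keep="first")
print(deduped)
```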

Placekey is free (forever) for up to 300,000 look-ups a month.  Beyond that there is a small fee to cover servers and core engineering work. The key itself is open and you get a perpetual license to it.   

There are a few different types of Placekeys. All of them help join places data, but they provide flexibility based on the attributes you have, your specific use case, and the level of granularity you care about. For this example we use address_placekey and placekey, and we also include building_placekey in the files linked in this post. A placekey is returned when you supply a location_name; put simply, it is for when you want an ID for a POI. An address_placekey is for when you only need to work at the address level or do not have any POI information. For example, you can use address_placekey to quickly identify all the POIs at the same address.
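
The "all the POIs at the same address" use case can be sketched as a group-by on address_placekey. The frame below is a toy example; the names and key values are made up.

```python
import pandas as pd

# Toy POI table; key values are invented placeholders.
pois = pd.DataFrame({
    "location_name": ["Smith Family Practice", "Main St Pharmacy", "Oak Imaging Center"],
    "placekey": ["pk-1", "pk-2", "pk-3"],
    "address_placekey": ["ad-1", "ad-1", "ad-2"],
})

# Collect the names of all POIs that share each address-level key.
pois_by_address = (
    pois.groupby("address_placekey")["location_name"].apply(list).to_dict()
)
print(pois_by_address)
```

Here each POI keeps its own placekey, while the two POIs that share a building address share an address_placekey.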

Conclusion

We hope these files are useful, and we encourage you to get an API key and try the API yourself. This tutorial uses the data mentioned above and is a great entry point to trying out Placekey. Please explore the files in detail and let us know what you think, including any issues you find with our matching.

This join was done quickly, but it aims to show how well Placekey scales. Whether you have 20 POIs or 20 million POIs, Placekey enables you to effectively join your datasets.
