A step-by-step guide that will teach you how to compare the proximity of two points of interest (POIs) using a distance matrix.
Reading time: 10 minutes
How do you find the distance between two points on a map? A distance matrix is a table that displays the distances between pairs of objects, in this case locations. For example, given locations A, B, and C, a distance matrix records how far each location is from every other one. A commuter could use this, for instance, to decide the order in which to visit stops on a route.
While comparing the distance between two points (objects) is straightforward, the problem grows quickly as the number of points increases: n points produce n(n - 1)/2 distinct pairs, so 100 points already mean 4,950 distances. This is one of the main business problems for logistics companies and other organizations that depend on drive-time analysis, delivery routing, real estate site planning, and more.
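To see how quickly the work grows, here is a small R sketch that counts the unique pairs with base R's `choose()` and compares that count with the number of cells in a full n x n matrix. The point counts are arbitrary illustrative values.

```r
# Unique location pairs a distance matrix must cover, for several sizes.
# A full n x n matrix has n^2 cells, but only n(n - 1)/2 are distinct pairs:
# the diagonal (a point's distance to itself) is 0, and A-to-B equals B-to-A.
n_points <- c(2, 10, 100, 1000, 10000)
unique_pairs <- choose(n_points, 2)
growth <- data.frame(points = n_points,
                     unique_pairs = unique_pairs,
                     matrix_cells = n_points^2)
print(growth)
```

Going from 100 to 10,000 points multiplies the number of pairs by roughly 10,000. That is quadratic growth, which is what makes large POI sets expensive to compare.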
Using SafeGraph’s Data Portal, we can download point of interest data and then compare proximity with R and its packages. Let’s get started!
The question is as follows: if you’re interested in how visits to locations within, say, 5 miles of each other change over time, you need to know every store within 5 miles and its SafeGraph visit counts. That is what we’re doing here.
The most common way of doing this is with a distance matrix. A distance matrix is like a spreadsheet, where each row is one location and each column holds the distance from that location to another location.
For instance, the first row may show that the distance between Store 1 and Store 2 is 1.2 miles, the distance between Store 1 and Store 3 is 2.7 miles, and the distance between Store 1 and Store 4 is 0.4 miles.
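One row of a distance matrix can be held in R as a named vector. The sketch below uses the three distances from this example, which are the only values given, and sorts Store 1's neighbors from nearest to farthest. The one-mile cutoff is an arbitrary illustrative threshold.

```r
# Store 1's row of the distance matrix, in miles (values from the example above)
store1_row <- c("Store 2" = 1.2, "Store 3" = 2.7, "Store 4" = 0.4)

# Neighbors ordered from nearest to farthest; sort() keeps the names
nearest_first <- sort(store1_row)
print(nearest_first)

# Neighbors within one mile of Store 1
close_to_store1 <- names(store1_row)[store1_row <= 1]
print(close_to_store1)
```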
In order to compare the proximity of two points of interest (POIs), there are three main steps:
First, you need to know which SafeGraph data sets you’ll need to combine.
Second, you’ll need to decide on what tool you’ll use to accomplish the task.
Third, you’ll need some straightforward code to do it.
This tutorial walks through the steps using R.
Step 1: Contact Sales for Sample Data or Preview Data at the Data Shop
To use SafeGraph’s data, start by contacting the SafeGraph sales team.
Once you have spoken with the sales team about purchasing SafeGraph data, downloaded the preview data, and/or created your new SafeGraph account, you can download point of interest data.
If you are trying out SafeGraph data, you will receive .csv or .gz files for download.
If you are a paying SafeGraph customer, you can download data through the web portal or from the command line using the Amazon S3 commands in the AWS CLI.
The data is available both as compressed .gz files and as .csv files.
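R can read the compressed .gz files directly, without unzipping them first, by wrapping the path in `gzfile()`. To keep the sketch self-contained, it first writes a tiny stand-in file to a temporary location. The two columns and their values are hypothetical; with a real download, you would pass the path of the file you received instead.

```r
# Write a tiny gzip-compressed CSV to a temp file (hypothetical stand-in data)
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(data.frame(safegraph_place_id = c("sg:a", "sg:b"),
                     city = c("Austin", "Austin")),
          con, row.names = FALSE)
close(con)

# Read the compressed file directly; no separate decompression step needed
places <- read.csv(gzfile(path))
str(places)
```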
From here, download two data sets if you are a paying customer, or ask for a sample of each if you are considering purchasing SafeGraph data.
The first is “Core Places US (Nov 2020 - Present)”.
This data set provides key details for each location (also known as a point of interest), such as its name, address, and coordinates.
The second is “Weekly Places Patterns (for data from 2020-11-30 to Present)”.
This provides the raw visit data: visit counts for each location, week by week.
Please note that if you are just trying out SafeGraph data, you will not see a full list of available data. You need to make sure you ask your sales representative for a sample of the core places and weekly patterns information that fit your geographic area of interest.
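The distance matrix code expects a single data frame, `visits_locations`, that contains both the location columns and the visit counts. As a minimal sketch of that join, the example below merges the two data sets on a shared `safegraph_place_id` column. The other column names (`location_name`, `city`, `latitude`, `longitude`, `date_range_start`, `raw_visit_counts`) and all values are illustrative stand-ins, so check them against the files you actually receive.

```r
# Hypothetical Core Places rows: one row per point of interest
core_places <- data.frame(
  safegraph_place_id = c("sg:a", "sg:b"),
  location_name = c("Store A", "Store B"),
  city = c("Austin", "Austin"),
  latitude = c(30.2672, 30.2850),
  longitude = c(-97.7431, -97.7335)
)

# Hypothetical Weekly Patterns rows: one row per place per week
weekly_patterns <- data.frame(
  safegraph_place_id = c("sg:a", "sg:a", "sg:b"),
  date_range_start = c("2020-11-30", "2020-12-07", "2020-11-30"),
  raw_visit_counts = c(120, 95, 40)
)

# Attach each place's attributes to each of its weekly visit records
visits_locations <- merge(weekly_patterns, core_places, by = "safegraph_place_id")
print(visits_locations)
```

After the merge, each place appears once per week. This is why the matrix code removes duplicate place IDs before computing distances.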
Step 2: Create the Distance Matrix
Once the Core Places and Weekly Patterns data are loaded and joined on safegraph_place_id into a single data frame (visits_locations in the code below), the next step is to create a distance matrix: a table that shows the distance between each location and every other location. We will build it with the geosphere package in R, which returns distances in meters.
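#Create the distance matrix
library(geosphere)
# Assumes visits_locations already holds Core Places joined to Weekly Patterns
#Limit to just a city
visits_locations <- subset(visits_locations, city == 'Austin')
# Keep one row per place so each POI appears once in the matrix
pois <- subset(visits_locations, !duplicated(safegraph_place_id))
# distm() expects longitude first, then latitude, and returns meters
mat <- distm(pois[, c('longitude', 'latitude')], pois[, c('longitude', 'latitude')], fun = distVincentyEllipsoid)
# Label rows and columns by place ID, then save the matrix with an ID column
dimnames(mat) <- list(pois$safegraph_place_id, pois$safegraph_place_id)
write.csv(data.frame(safegraph_place_id = pois$safegraph_place_id, mat, check.names = FALSE), "Distance_Matrix.csv", row.names = FALSE)
# Reload and attach each place's row of distances to its visit records
list1import <- read.csv("Distance_Matrix.csv", check.names = FALSE)
visits <- merge(visits_locations, list1import, by = "safegraph_place_id")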
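You can cross-check the geosphere output, or avoid the package dependency, with a great-circle (haversine) distance matrix written in a few lines of base R. Note that this sketch substitutes a spherical-Earth approximation for the ellipsoidal Vincenty formula used above, so its results can differ slightly. The radius of 6,378,137 m matches geosphere's default for `distHaversine`. The function `haversine_matrix()` and the test points are hypothetical, written only for this example.

```r
# Great-circle (haversine) distance in meters between every pair of points.
# lon/lat: numeric vectors in decimal degrees; r: sphere radius in meters.
haversine_matrix <- function(lon, lat, r = 6378137) {
  to_rad <- pi / 180
  phi <- lat * to_rad
  lambda <- lon * to_rad
  # outer() builds n x n matrices of pairwise differences
  dphi <- outer(phi, phi, "-")
  dlambda <- outer(lambda, lambda, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing a slightly above 1
  2 * r * asin(sqrt(pmin(a, 1)))
}

# Hypothetical points: the origin, one degree north, one degree east
lon <- c(0, 0, 1)
lat <- c(0, 1, 0)
m <- haversine_matrix(lon, lat)
print(round(m, 1))
```

One degree of latitude on this sphere is r * pi / 180, or about 111,319.5 m, which gives a quick way to check the output.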
Running this code produces an R matrix of distances, which is then saved as a .csv file. Each row shows the distance (in meters) between one location and every other location. The output should look as follows:
The first row of the matrix shows the distance between the first location (in this case, Pier 1 Imports in Manchester, New Hampshire) and every other location.
In the figure above, the leftmost column holds the ID, starting at ID = 0. Column L holds store ID = 1, the store we’re looking at, so the distance between store 1 and itself is 0. Column M holds the distance between store ID = 1 (far left column) and store 2: in this case, 228.6 miles.
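Because the matrix stores meters, converting it to miles makes tables like this one easier to read (one international mile is exactly 1,609.344 m). The sketch below converts a hypothetical 3 x 3 matrix with made-up values. It also spot-checks two properties every distance matrix should have: zeros on the diagonal and symmetry.

```r
# Hypothetical distances in meters between three stores
stores <- paste("Store", 1:3)
dist_m <- matrix(c(   0.000, 1609.344, 3218.688,
                   1609.344,    0.000, 4023.360,
                   3218.688, 4023.360,    0.000),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(stores, stores))

# One international mile is exactly 1609.344 meters
dist_mi <- dist_m / 1609.344
print(round(dist_mi, 2))

# Sanity checks: each store is 0 from itself, and A-to-B equals B-to-A
print(all(diag(dist_mi) == 0))
print(isSymmetric(dist_mi))
```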
That’s a wrap!
Now that we have built a distance matrix for our points of interest, possibilities abound. Whether you are researching the impact of store closures on local businesses within a given radius or the optimal route for a shipping container over a certain season, the process above will help you unearth the answers to these questions.
What have we accomplished? Suppose you’re interested in whether the closing of a Sears store affected other retail establishments within, say, a 5-mile or 10-mile radius. The distance matrix we calculated here provides a well-structured data set for answering that question.
Or suppose you’re interested in the effect that college closures have had on retail establishments within a certain distance. The distance matrix created here supports that analysis as well.
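Questions like these all come down to one operation: listing every location within a given radius of a chosen one. The minimal sketch below works on a hypothetical distance matrix in meters. The location names and distances are placeholders. `within_radius()` is a helper written for this example and is not part of geosphere or SafeGraph's tools.

```r
# Hypothetical distance matrix in meters, labeled by location
ids <- c("Closed Sears", "Shop A", "Shop B", "Shop C")
dist_m <- matrix(c(    0,  3000, 12000,  7500,
                    3000,     0, 10000,  9000,
                   12000, 10000,     0, 15000,
                    7500,  9000, 15000,     0),
                 nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Names of all other locations within radius_mi miles of location `id`
within_radius <- function(dist_m, id, radius_mi) {
  radius_m <- radius_mi * 1609.344   # miles -> meters
  row <- dist_m[id, ]
  names(row)[row <= radius_m & names(row) != id]
}

near_5 <- within_radius(dist_m, "Closed Sears", 5)    # 5 miles is about 8,047 m
near_10 <- within_radius(dist_m, "Closed Sears", 10)  # 10 miles is about 16,093 m
print(near_5)   # "Shop A" "Shop C"
print(near_10)  # "Shop A" "Shop B" "Shop C"
```

You can then join the resulting list of nearby locations to the weekly visit counts and compare visits before and after the closure.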