Deriving POIs from Nameless Permit Data

Given a POI dataset without location names, we can use Placekey to identify specific POIs. To do this, we set the location_name parameter to 'Starbucks' on every request. Most locations in the dataset are not Starbucks, so for those addresses the API has no Starbucks POI to match. Only the addresses that really are Starbucks locations come back with a full Placekey, which lets us pick out Starbucks locations from nameless permit data.

Take a look at our Google Colab notebook for this tutorial to access the code and run it yourself!

Getting Started

Before moving forward, it might be a good idea to get more familiar with Placekey. There is a growing number of resources available.

Imports and Installations

In the first code block, we install the placekey package in our Google Colab environment and import the necessary packages.

!pip install placekey

from placekey.api import PlacekeyAPI
import pandas as pd
import numpy as np
from ast import literal_eval
import json
from google.colab import drive as mountGoogleDrive 
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

Authentication

Run this code block to authenticate yourself with Google, giving you access to the datasets.

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
print("You are fully authenticated and can edit and re-run everything in the notebook. Enjoy!")

Set API key

Replace the asterisks below with your Placekey API key. If you don’t have one yet, it’s completely free.

placekey_api_key = "***************" # fill this in with your personal API key (do not share publicly)

pk_api = PlacekeyAPI(placekey_api_key)
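If you would rather not paste the key into a notebook cell, one option is to read it from an environment variable. This is a minimal sketch, not part of the original notebook: the helper name load_placekey_api_key and the variable name PLACEKEY_API_KEY are assumptions chosen for illustration.

```python
import os

def load_placekey_api_key(env_var="PLACEKEY_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice, not something Placekey requires.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Placekey API key"
        )
    return key
```

The returned string can then be passed to PlacekeyAPI in place of the hard-coded placeholder.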


Dataset

This tutorial uses permits data from DataSF.

Define functions

First, define a couple of functions to make it easier to read in the dataset. These functions read the tutorial CSV from Google Drive, so you don't have to upload your own data.

def pd_read_csv_drive(id, drive, dtype=None, converters=None, encoding=None):
  downloaded = drive.CreateFile({'id':id}) 
  downloaded.GetContentFile('Filename.csv')  
  return(pd.read_csv('Filename.csv',dtype=dtype, converters=converters, encoding=encoding))

def get_drive_id(filename):
    drive_ids = {'permits-tutorial' : '18LnJBJ3D3DslDcoZ33_FFOQ3ivFhdRVr',
                 }
    return(drive_ids[filename])

Read datasets

permits = pd_read_csv_drive(get_drive_id('permits-tutorial'), drive=drive, dtype={'Zipcode' : str})
permits['Zipcode'] = permits['Zipcode'].str[:-2]
permits['index'] = permits.index.astype(str)
permits.head()

Next, we drop duplicates based on latitude and longitude so that we make fewer redundant Placekey requests.

permits_unique = permits.drop_duplicates(['latitude','longitude']).reset_index()
print(permits_unique.shape)
permits_unique['index'] = permits_unique['index'].astype(str)
permits_unique.head()
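The deduplication idea can be sketched in plain Python to make the bookkeeping explicit. The function below keeps the first row for each coordinate pair and also records, for every row, which row stands in for it, so that a result looked up once could be shared with all permits at the same coordinates. The function name, the sample rows, and the placeholder Placekey strings are all made up for illustration, and rows with missing coordinates are not handled.

```python
def dedupe_by_coords(rows):
    """Keep the first row per (latitude, longitude) and map every row to it."""
    first_seen = {}      # (lat, lon) -> index of the first row with those coordinates
    representative = {}  # every row index -> index of the row that would be queried
    unique_rows = []
    for row in rows:
        key = (row["latitude"], row["longitude"])
        if key not in first_seen:
            first_seen[key] = row["index"]
            unique_rows.append(row)
        representative[row["index"]] = first_seen[key]
    return unique_rows, representative

# Made-up sample rows: rows "0" and "1" share coordinates
rows = [
    {"index": "0", "latitude": 37.77, "longitude": -122.41},
    {"index": "1", "latitude": 37.77, "longitude": -122.41},
    {"index": "2", "latitude": 37.80, "longitude": -122.40},
]
unique_rows, representative = dedupe_by_coords(rows)

# Placeholder results keyed by the queried row's index
placekeys = {"0": "pk_a", "2": "pk_b"}
per_row = {idx: placekeys.get(rep) for idx, rep in representative.items()}
```

Here only two requests would be needed for three permits, and per_row assigns the shared result to both rows at the first coordinate pair.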


Adding Placekey with Address

We're ready to request Placekeys for our dataset.

Map columns to appropriate fields

In this step, we create a new dataframe with just the address columns from the permits dataset. The columns are renamed and reformatted to conform to the Placekey API. Additionally, we add a location_name column set to 'Starbucks' for every row, along with city = 'San Francisco' and region = 'CA'.

def get_df_for_api(df, column_map):
  df_for_api = df.rename(columns=column_map)
  cols = list(column_map.values())
  df_for_api = df_for_api[cols].copy()  # copy so the column assignment below does not trigger a SettingWithCopyWarning
  df_for_api['iso_country_code'] = 'US'
  return(df_for_api)

permits_unique_address = permits_unique.copy()
permits_unique_address[['Street Number','Street Number Suffix','Street Name','Street Suffix', 'Unit']] = permits_unique_address[['Street Number','Street Number Suffix','Street Name','Street Suffix', 'Unit']].replace(np.nan, '')
permits_unique_address['Street Number'] = permits_unique_address['Street Number'].astype(str)
permits_unique_address['Zipcode'] = permits_unique_address['Zipcode'].astype(str)

permits_unique_address['street_address'] = permits_unique_address['Street Number'] + " " + permits_unique_address['Street Name'] + " " + permits_unique_address['Street Suffix']
df_for_api = get_df_for_api(permits_unique_address, {'index' : 'query_id', 'street_address' : 'street_address', 'Zipcode' : 'postal_code'})
df_for_api['city'] = 'San Francisco'
df_for_api['region'] = 'CA'
df_for_api['location_name'] = 'Starbucks'
df_for_api.head(3)
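To make the target shape concrete, here is a minimal sketch of building a single query record by hand, using the same field names as the dataframe above. The helper build_query and the sample address are hypothetical. Unlike the string concatenation above, this version skips empty address parts, so it does not leave stray spaces when a street suffix is missing.

```python
def build_query(query_id, street_number, street_name, street_suffix, postal_code):
    """Build one query record with the fixed Starbucks / San Francisco fields."""
    # Join only the non-empty address parts with single spaces
    parts = [str(street_number), street_name, street_suffix]
    street_address = " ".join(p.strip() for p in parts if p and p.strip())
    return {
        "query_id": str(query_id),
        "street_address": street_address,
        "postal_code": str(postal_code),
        "city": "San Francisco",
        "region": "CA",
        "location_name": "Starbucks",  # the same name is sent for every address
        "iso_country_code": "US",
    }

# Made-up address for illustration
example = build_query(0, 123, "Example", "St", "94103")
```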

Convert the dataframe to JSON

Each row will be represented by a JSON object, so that it conforms to the Placekey API.

data_jsoned = json.loads(df_for_api.to_json(orient="records"))
print("number of records: ", len(data_jsoned))
print("example record:")
data_jsoned[0]

Request Placekeys from the Placekey API

After getting the responses, we convert them to a dataframe stored in df_placekeys.

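from io import StringIO

print('Requesting Placekeys... Estimated time: 1 minute')
responses = pk_api.lookup_placekeys(data_jsoned, verbose=True)
# Wrap the JSON text in StringIO: recent pandas versions deprecate passing a raw JSON string to read_json
df_placekeys = pd.read_json(StringIO(json.dumps(responses)), dtype={'query_id':str})
df_placekeys.head()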
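Before merging, it can help to see how many requests actually returned a Placekey. The sketch below assumes each response is a dict carrying a query_id plus either a placekey or an error field, which is what the 'error' column dropped in the next step suggests. The helper name and the sample responses are made up for illustration.

```python
def split_responses(responses):
    """Separate responses that carry a Placekey from those that carry an error."""
    matched, failed = {}, {}
    for r in responses:
        if r.get("placekey"):
            matched[r["query_id"]] = r["placekey"]
        else:
            failed[r["query_id"]] = r.get("error", "unknown error")
    return matched, failed

# Illustrative responses, not real API output
sample = [
    {"query_id": "0", "placekey": "zzw-222@5vg-7gq-tvz"},
    {"query_id": "1", "placekey": "227@5vg-82n-pgk"},
    {"query_id": "2", "error": "Invalid address"},
]
matched, failed = split_responses(sample)
print(len(matched), "matched,", len(failed), "failed")
```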

Add Placekeys back to the original permits dataset

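def merge_and_format(permit_df, placekeys_df):
  # Right join keeps every permit row, including those without a Placekey response
  permit_placekey = pd.merge(placekeys_df, permit_df, left_on = 'query_id', right_on="index", how='right')
  # The 'error' column may be absent when no request failed, so ignore it if missing
  permit_placekey = permit_placekey.drop(columns='error', errors='ignore')
  return(permit_placekey)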

permit_placekey = merge_and_format(permits, df_placekeys)
print(permit_placekey.shape)
permit_placekey.head()

The full Placekeys correspond to the Starbucks locations. A Placekey is "full" if it has both the POI and address components, for a total of 15 characters, or 19 counting the three hyphens and the "@". Below, we filter the dataframe to rows with a full Placekey whose Existing Use is 'food/beverage hndlng', finding 232 permits for Starbucks across 49 unique locations.

starbucks = permit_placekey[(permit_placekey.placekey.notna()) & (permit_placekey.placekey.str.len() == 19) & (permit_placekey['Existing Use'] == 'food/beverage hndlng')]
print(starbucks.shape)
print("Unique Starbucks:", len(starbucks.placekey.unique()))
starbucks.head()
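The length check above works because a full Placekey always has the same shape. As a more explicit alternative, the sketch below checks the structure directly: two 3-character groups before the "@" and three after it. The function name is hypothetical, the sample strings are illustrative, and the check assumes the format described in this tutorial rather than validating the characters themselves.

```python
def is_full_placekey(placekey):
    """Return True if the string has the what@where shape of a full Placekey."""
    if not isinstance(placekey, str) or placekey.count("@") != 1:
        return False  # also covers missing values such as None or NaN
    what, where = placekey.split("@")
    what_parts = what.split("-")
    where_parts = where.split("-")
    return (
        len(what_parts) == 2       # address and POI components
        and len(where_parts) == 3  # location components
        and all(len(p) == 3 for p in what_parts + where_parts)
    )

print(is_full_placekey("zzw-222@5vg-7gq-tvz"))  # both components present
print(is_full_placekey("227@5vg-82n-pgk"))      # address component only
```

Applied to the merged dataframe, this could replace the length comparison with permit_placekey.placekey.apply(is_full_placekey).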


Conclusion

In this tutorial, we learned how to find specific locations in name-less POI data using the Placekey API. Specifically, given a dataset of San Francisco permits, we identified which permits corresponded to Starbucks locations. By making each request with location_name = 'Starbucks', we forced the Placekey API to only return full Placekeys for addresses corresponding to Starbucks locations.

Want to learn more?

Check out the Placekey website.
Join the SafeGraph Community, a free Slack community for geospatial data enthusiasts. Receive support, share your work, or connect with others in the GIS community.
