As part of an IBM certificate in data science I earned in 2020, I put together a project that categorizes pizza places in Boston based on their geography.
TLDR Takeaway: K-means clustering is a very cool technique for partitioning a dataset into distinct clusters, but it's difficult to apply to real-estate opportunity analysis just because there are so many variables to take into consideration.
Introduction and Problem
How can restaurants make use of data science and geographic data to identify expansion opportunities? This analysis is an exploratory step in this direction.
With Python, we can leverage Foursquare’s location data to help identify possible new opportunities for restaurants in specific geographic areas. We can help answer questions like:
- Is there already saturation in a given neighborhood or for a specific type of food?
- What kinds of establishments are already popular in a certain neighborhood?
- What kinds of geographic locations could be ripe for new development?
This small project uses geographic data for the city of Boston to identify food establishments of a particular type (in this case, pizza places) and to determine whether certain neighborhoods may already be showing some saturation. In addition, we will use the same information to check whether the existing neighborhood boundaries are adequate categories for grouping these pizza places. Are all pizza places in the Financial District the same, or is there more geographic variation between neighborhoods than within them?
Data Sources
1. City of Boston Neighborhood Data
The city of Boston has more than 20 distinct neighborhoods – each with a unique character. You can read more about each neighborhood [here](https://www.boston.gov/neighborhoods). For example, the North End was originally a neighborhood of Italian immigrants but more recently has become an upscale location for tourists with plenty of Italian-style dining options. The neighborhood of Brighton is known for being a slightly more affordable area and typically caters to young professionals.
The city of Boston posts its location data publicly on its website so that real estate developers, researchers, and city planners can access it. In this project, we will download one of the city’s more popular neighborhood datasets, simply called “Boston Neighborhoods”, available [here](https://data.boston.gov/dataset/boston-neighborhoods/resource/13ee2b65-6547-4168-b112-83995f138602).
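As a rough illustration of reading that download, here is a minimal sketch using Python's built-in `json` module. It assumes the file is GeoJSON with one feature per neighborhood and a `Name` property; both the layout and the property key are assumptions, so check them against the actual file. The sample text is a hand-written stand-in, not the city's data.

```python
import json

# Hand-written stand-in for the downloaded file (NOT the real city data).
# Assumption: GeoJSON FeatureCollection with a "Name" property per feature.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"Name": "North End"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-71.06, 42.36], [-71.05, 42.36],
                                   [-71.05, 42.37], [-71.06, 42.36]]]}},
    {"type": "Feature",
     "properties": {"Name": "Beacon Hill"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-71.07, 42.35], [-71.06, 42.35],
                                   [-71.06, 42.36], [-71.07, 42.35]]]}}
  ]
}
"""


def neighborhood_names(geojson_text, name_key="Name"):
    """Return the neighborhood name stored on each GeoJSON feature."""
    data = json.loads(geojson_text)
    return [feature["properties"][name_key] for feature in data["features"]]


names = neighborhood_names(sample_geojson)
print(names)
```

With the real file, the same function would be called on the text read from disk, after adjusting `name_key` to whatever property the dataset actually uses.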
2. Foursquare Geographic Data
In this project we will also be downloading and accessing data from Foursquare. Foursquare is a company that provides location data and intelligence to its customers - primarily web developers - for use in their applications. We will make a limited number of calls to Foursquare’s API to pull data about pizza places in areas of the city of particular interest, using the Foursquare [explore endpoint](https://developer.foursquare.com/docs/places-api/endpoints/) to get venue recommendations in the “Pizza Place” category.
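To make the API call concrete, here is a hedged sketch of how a request to the legacy v2 explore endpoint might be assembled. It only builds the URL and does not contact Foursquare. The credentials are placeholders, and the `radius`, `version`, and `query="pizza"` values are assumptions chosen for illustration rather than the settings used in the original notebook.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 explore endpoint (as it existed around 2020).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      query="pizza", radius=500, limit=50,
                      version="20200601"):
    """Build an explore request URL centered on one latitude/longitude.

    radius (meters), version date, and query are illustrative assumptions.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date in YYYYMMDD form
        "ll": f"{lat},{lng}",    # search center
        "query": query,          # free-text filter for recommendations
        "radius": radius,
        "limit": limit,          # cap on venues returned per call
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and a made-up center point.
url = build_explore_url(42.3647, -71.0542,
                        "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate makes it easy to inspect the parameters before spending API calls.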
Analysis
To begin, we will pull the top 50 Pizza Place venues around the geographic center of each of the neighborhoods in question. We will identify these geographic centers through Google.
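Once a response comes back for a neighborhood center, it needs to be flattened into rows. The sketch below assumes the legacy explore response nests venues as `response -> groups -> items -> venue`, which is my recollection of the v2 format and should be checked against a real response. The sample payload and venue names are invented for illustration.

```python
def pizza_rows(neighborhood, response_json, category="Pizza Place"):
    """Flatten one explore response into rows, keeping only one category."""
    rows = []
    groups = response_json.get("response", {}).get("groups", [])
    for group in groups:
        for item in group.get("items", []):
            venue = item["venue"]
            categories = [c["name"] for c in venue.get("categories", [])]
            if category in categories:
                rows.append({
                    "neighborhood": neighborhood,
                    "name": venue["name"],
                    "lat": venue["location"]["lat"],
                    "lng": venue["location"]["lng"],
                })
    return rows


# Invented payload in the assumed response shape (not real Foursquare data).
sample_response = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Pizza",
                           "location": {"lat": 42.3650, "lng": -71.0540},
                           "categories": [{"name": "Pizza Place"}]}},
                {"venue": {"name": "Example Cafe",
                           "location": {"lat": 42.3652, "lng": -71.0538},
                           "categories": [{"name": "Café"}]}},
            ]
        }]
    }
}

rows = pizza_rows("North End", sample_response)
print(rows)
```

Calling this once per neighborhood center and concatenating the lists gives a single table of pizza places tagged with the neighborhood whose center produced them.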
To make this assignment more practical, we will exclude several neighborhoods from our analysis. These include most of the larger, outlying neighborhoods outside of the city center (including Brighton, Allston, Dorchester, Roxbury, Mattapan, the Harbor Islands, Roslindale, West Roxbury, and others).
Dictionary and Libraries: To complete this project, we will also use a number of Python libraries. These include:
- pandas to work with tabular data
- NumPy to work with vectorized data
- json to parse JSON data from the City of Boston
- geopy to find location data for Boston
- Matplotlib to plot our data
- Folium to render our data in a map
- scikit-learn to run a k-means clustering algorithm that groups the restaurant data into clusters
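The clustering step named in the last bullet can be sketched as follows. This is a minimal example, not the project's actual code: the coordinates are made up, and it uses 2 clusters on 6 points so the result is easy to inspect, whereas the analysis itself specified 12 clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up (latitude, longitude) pairs forming two obvious groups.
coords = np.array([
    [42.3650, -71.0540],
    [42.3655, -71.0535],
    [42.3648, -71.0545],
    [42.3500, -71.0450],
    [42.3505, -71.0455],
    [42.3495, -71.0448],
])

# n_init and random_state are fixed so repeated runs give the same labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

print(labels)                   # cluster index assigned to each venue
print(kmeans.cluster_centers_)  # mean lat/lng of each cluster
```

Note that k-means here measures plain Euclidean distance on raw latitude and longitude. Over an area the size of central Boston that is a reasonable approximation, but it is not true geographic distance.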
Neighborhood Centers
Pizza Places from Foursquare
Pizza Places Categorized by Geography
Results and Discussion
As the maps above show, even though 12 clusters were specified in the k-means algorithm, these 12 clusters do NOT correspond to the neighborhoods mapped initially in this project. While the downtown core consists of 4 major neighborhoods (North End, West End, Downtown, Beacon Hill), there are 4 clusters that do not follow these neighborhood boundaries. Cluster 10 spans both the North End and Downtown, Cluster 3 spans the West End and the North End, Cluster 0 spans Chinatown and Downtown, and Cluster 2 spans Downtown and the South Boston Waterfront (Seaport).
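One way to check this kind of mismatch without relying on the maps is to cross-tabulate cluster labels against neighborhood labels. The sketch below uses a handful of invented rows, not the project's results, and the column names `neighborhood` and `cluster` are assumptions about how the venue table might be laid out.

```python
import pandas as pd

# Invented rows for illustration only; not the project's actual output.
venues = pd.DataFrame({
    "neighborhood": ["North End", "North End", "Downtown",
                     "Downtown", "West End"],
    "cluster": [10, 3, 10, 0, 3],
})

# Rows are clusters, columns are neighborhoods, cells are venue counts.
table = pd.crosstab(venues["cluster"], venues["neighborhood"])

# A cluster "spans" neighborhoods if it has venues in more than one column.
neighborhoods_per_cluster = (table > 0).sum(axis=1)
mixed_clusters = neighborhoods_per_cluster[
    neighborhoods_per_cluster > 1
].index.tolist()

print(table)
print(mixed_clusters)
```

If every cluster mapped cleanly onto one neighborhood, `mixed_clusters` would be empty; any cluster listed there crosses a neighborhood boundary.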
One large takeaway from this analysis is that the South Boston Waterfront (otherwise known as the Seaport District) has a dearth of pizza restaurants. Although this section of the city has been under extensive development recently, anyone there who wanted pizza would have to cross the Fort Point Channel to get downtown.
Another key takeaway is that neighborhoods are not a particularly useful starting point for this analysis. To get a fuller picture of the most promising development opportunities for pizza restaurants around the city, more research would be needed on zoning restrictions and on the general socioeconomic status of any development area in question. Like McDonald’s, a pizza chain might fit best along heavily trafficked corridors in sprawling suburban areas.