Ruddra.com

Find Best Neighborhood to Fight Pandemic in NYC - Methodology


Disclaimer: this article has been generated as part of IBM Data Science Professional Certificate course’s final submission.

This report consists of three parts: business problem and data preparation, methodology, and visualization and results. In this article, we describe the methodology step by step:

Step one: New York city data with latitude and longitude

We use requests to fetch the JSON data from the NYC dataset and store it in a data frame.

NYC Data
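As a minimal sketch of step one, the snippet below flattens GeoJSON-style features into a data frame. The layout of the JSON (a `features` list with `properties.borough`, `properties.name` and `geometry.coordinates`), the column names and the helper name `features_to_frame` are assumptions for illustration; the inline sample uses placeholder values, not real NYC data.

```python
import pandas as pd


def features_to_frame(geojson: dict) -> pd.DataFrame:
    """Flatten point features into one row per neighborhood (hypothetical helper)."""
    rows = []
    for feature in geojson["features"]:
        coords = feature["geometry"]["coordinates"]
        rows.append({
            "borough": feature["properties"]["borough"],
            "neighborhood": feature["properties"]["name"],
            # GeoJSON stores point coordinates as [longitude, latitude].
            "latitude": coords[1],
            "longitude": coords[0],
        })
    return pd.DataFrame(
        rows, columns=["borough", "neighborhood", "latitude", "longitude"]
    )


# In the real pipeline the dict would come from something like
# requests.get(DATA_URL).json(); a tiny placeholder sample stands in here.
sample = {
    "features": [
        {"properties": {"borough": "Borough X", "name": "Neighborhood A"},
         "geometry": {"coordinates": [-73.8, 40.9]}},
        {"properties": {"borough": "Borough Y", "name": "Neighborhood B"},
         "geometry": {"coordinates": [-73.9, 40.7]}},
    ]
}
nyc_df = features_to_frame(sample)
print(nyc_df)
```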

Step two: New York city data with population

Next, we use BeautifulSoup to scrape the boroughs table from Wikipedia and collect every link in the neighborhood column of that table. We then iterate over those links with requests, visit each Wikipedia page, and scrape the population data from the table on the right-hand side.

NYC Population Data

Step three: combine step one and step two

We can combine the data frames from the previous steps into one, based on “neighborhood” and “borough”:

NYC Combined Data
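The combination in step three could look roughly like the pandas merge below. The frame names, column names and all values are placeholders, and the inner join is an assumption (it is the pandas default); the original notebook may have used a different join type.

```python
import pandas as pd

# Placeholder stand-ins for the frames built in steps one and two.
coords_df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
    "borough": ["Borough X", "Borough Y", "Borough Y"],
    "latitude": [40.9, 40.7, 40.6],
    "longitude": [-73.8, -73.9, -74.0],
})
population_df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B"],
    "borough": ["Borough X", "Borough Y"],
    "population": [10000, 25000],
})

# Merging on both keys avoids mixing up neighborhoods that share a name
# across boroughs. An inner join keeps only rows present in both frames.
combined_df = coords_df.merge(
    population_df, on=["neighborhood", "borough"], how="inner"
)
print(combined_df)
```

With an inner join, neighborhoods whose population could not be scraped drop out of the result; `how="left"` would keep them with a missing population instead.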

Here is a box chart of “Population” per “borough”:

Population vs borough

Here is another box chart, this time of “neighborhood” per “borough”:

Neighborhood vs borough

Step four: collect hospital data from Foursquare

After collecting the population data, it is time to collect the hospital data. We can use the Foursquare API to fetch hospitals near the latitude and longitude of each neighborhood in the previous dataset.

Hospital per borough
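For illustration, a request to the legacy Foursquare v2 venue search for step four could be assembled as below. This only builds the query parameters and makes no network call. The helper name, the search radius, the result limit, the version date and the use of a free-text `query` are all assumptions; the article does not say which endpoint options were used.

```python
# Legacy Foursquare v2 search endpoint (assumed; the article only says "Foursquare API").
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def build_hospital_search_params(lat, lon, client_id, client_secret,
                                 radius_m=1000, limit=50):
    """Build query parameters for one neighborhood (hypothetical helper)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": "20200601",        # API version date, assumed value
        "ll": f"{lat},{lon}",   # latitude,longitude of the neighborhood
        "query": "hospital",    # free-text search term, assumed
        "radius": radius_m,     # search radius in meters, assumed value
        "limit": limit,         # maximum number of venues, assumed value
    }


params = build_hospital_search_params(40.7, -73.9, "YOUR_ID", "YOUR_SECRET")
print(SEARCH_URL, params)
```

In practice these parameters would be passed to `requests.get(SEARCH_URL, params=params)` once per row of the neighborhood data frame.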

Step five: collect hospital bed data from NYS Health Profile

We can also collect hospital bed data from the NYS Health Profile website, scraping it with Selenium and BeautifulSoup. We collected the IDs of hospitals in NYC manually and, based on those IDs, scraped the data from the NYS Health Profile website. The data frame looks like this:

NYS

Step six: combine step four and step five

Now we combine the data from step four and step five, using an inner join of the two data frames on “neighborhood” and “borough”.

Combine hospital data

We then clean up the data a little and sum the bed count and ICU bed count, grouped by “neighborhood” and “borough”:

Cleaned hospital data
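The join and aggregation in step six can be sketched as follows. The frame names, column names and numbers are placeholders. The sketch also assumes the Foursquare side has been reduced to one row per neighborhood, because joining on “neighborhood” and “borough” with duplicates on both sides would multiply rows.

```python
import pandas as pd

# One row per neighborhood that has at least one hospital (placeholder data).
hospital_areas_df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
    "borough": ["Borough X", "Borough Y", "Borough Y"],
})
# One row per hospital with its bed counts (placeholder data).
beds_df = pd.DataFrame({
    "hospital": ["Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4"],
    "neighborhood": ["Neighborhood A", "Neighborhood A",
                     "Neighborhood B", "Neighborhood D"],
    "borough": ["Borough X", "Borough X", "Borough Y", "Borough Z"],
    "bed_count": [100, 50, 200, 80],
    "icu_bed_count": [10, 5, 20, 8],
})

keys = ["neighborhood", "borough"]

# Inner join: keep only hospitals whose neighborhood appears in both frames.
merged_df = hospital_areas_df.merge(beds_df, on=keys, how="inner")

# Sum beds and ICU beds per neighborhood.
beds_per_area = merged_df.groupby(keys, as_index=False)[
    ["bed_count", "icu_bed_count"]
].sum()
print(beds_per_area)
```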

Here is a box chart of “bed count” per “borough”:

Bed count per borough

Here is another box chart, of “ICU bed count” per “borough”:

ICU bed count per borough

Step seven: combine data from step three and step six

Now we combine the data from step three and step six; that is, we combine the population data with the hospital bed count data by merging the two data frames on “neighborhood” and “borough”. The new data frame looks like this:

Population with bed count data

Step eight: add beds and ICU beds per hundred people to data frame

Now we calculate beds per hundred people from two columns, Population and Bed Number, and add the result to the data frame. Similarly, we add ICU beds per hundred people to the data frame:

With bed/icu per 100 people
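The per-hundred calculation in step eight is a simple ratio: beds per 100 people = bed count × 100 / population. A minimal pandas sketch, with assumed column names and placeholder numbers:

```python
import pandas as pd

# Placeholder rows standing in for the merged frame from step seven.
df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B"],
    "borough": ["Borough X", "Borough Y"],
    "population": [10000, 25000],
    "bed_count": [150, 200],
    "icu_bed_count": [15, 20],
})

# Column-wise arithmetic: each row is divided by its own population.
df["beds_per_100"] = df["bed_count"] * 100 / df["population"]
df["icu_beds_per_100"] = df["icu_bed_count"] * 100 / df["population"]
print(df[["neighborhood", "beds_per_100", "icu_beds_per_100"]])
```

Rows with a missing or zero population would produce missing or infinite values here, so they are worth filtering out before clustering.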

Step nine: K-means clustering

Now we use k-means clustering to partition the data into k groups. We use the elbow method to find the optimal value of k. The “elbow” (the point of inflection on the curve) is a good indication that the underlying model fits best at that point. In the elbow visualizer, the value of k is 3.

K-means elbow
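To show what the elbow curve measures, the sketch below fits scikit-learn's `KMeans` for several values of k on synthetic data and records the inertia (within-cluster sum of squares) for each. The synthetic blobs, the range of k and the second-difference rule for picking the elbow are assumptions for illustration; they are a simple stand-in for the visualizer used in the article, not a reproduction of it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated 2-D blobs (not the article's data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit k-means for each candidate k and record the inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Simple elbow heuristic: the k where the curve bends most sharply,
# measured by the discrete second difference of the inertia.
ks = sorted(inertias)
second_diff = {
    k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1] for k in ks[1:-1]
}
elbow_k = max(second_diff, key=second_diff.get)

for k in ks:
    print(k, round(inertias[k], 2))
print("elbow at k =", elbow_k)
```

Plotting inertia against k gives the curve shown in the elbow chart; the heuristic above is only one rough way to locate the bend numerically.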

Step ten: merge cluster labels with dataset

After that, we merge the cluster labels into the data frame. The data frame looks like this:

DF with cluster label
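A minimal sketch of attaching the labels and then splitting the data by cluster is shown below. It assumes the clustering features are the two per-hundred columns and uses k = 3 as found by the elbow method; the article does not state which columns were fed to k-means, and the values are placeholders.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder rows standing in for the frame from step eight.
df = pd.DataFrame({
    "neighborhood": [f"Neighborhood {c}" for c in "ABCDEF"],
    "beds_per_100": [0.10, 0.12, 1.00, 1.10, 5.00, 5.20],
    "icu_beds_per_100": [0.010, 0.012, 0.100, 0.110, 0.500, 0.550],
})

# Assumed feature set: the two per-hundred columns.
features = df[["beds_per_100", "icu_beds_per_100"]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

# labels_ is aligned with the rows of `features`, so it can be
# assigned directly as a new column.
df["cluster"] = kmeans.labels_

# One sub-frame per cluster label, for inspecting each group separately.
cluster_frames = {label: group for label, group in df.groupby("cluster")}
for label, group in cluster_frames.items():
    print(label, list(group["neighborhood"]))
```

Cluster numbers are arbitrary identifiers: label 0 is not "better" or "worse" than label 2 until the cluster contents are inspected.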

Step eleven: see which borough goes to which cluster

Let us see which boroughs belong to which clusters.

Here is the dataset for cluster 0:

Cluster 0

Here is the dataset for cluster 1:

Cluster 1

Here is the dataset for cluster 2:

Cluster 2

Step twelve: neighborhoods without hospital

So far, we have analyzed the dataset for neighborhoods with hospitals. Now we should look at the neighborhoods without hospital data:

neighborhood without hospital

If we compare the indexes of neighborhoods with and without hospitals, it looks like this:

Count of neighborhoods w/o hospital

We can see that there are 100 neighborhoods that do not have any hospital.
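One way to find the neighborhoods without hospitals is a left merge with pandas' `indicator=True`, which marks rows that have no match in the hospital frame. The frame names, column names and rows below are placeholders, and this is a sketch of one possible approach rather than the method used in the article.

```python
import pandas as pd

# All neighborhoods (placeholder data).
all_neighborhoods_df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
    "borough": ["Borough X", "Borough Y", "Borough Y"],
})
# Neighborhoods that appear in the hospital data (placeholder data).
with_hospital_df = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood A", "Neighborhood B"],
    "borough": ["Borough X", "Borough X", "Borough Y"],
})

keys = ["neighborhood", "borough"]

# The left merge keeps every neighborhood; the _merge column records whether
# a match was found. Dropping duplicates first avoids repeated rows.
flagged = all_neighborhoods_df.merge(
    with_hospital_df[keys].drop_duplicates(),
    on=keys, how="left", indicator=True,
)
without_hospital = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(len(without_hospital), list(without_hospital["neighborhood"]))
```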

Conclusion

In the next article, we are going to visualize the data collected in the previous steps and discuss our results.

Last updated: September 16, 2020

