Find Best Neighborhood to Fight Pandemic in NYC - Methodology

Photo by visuals on Unsplash

Disclaimer: This article was written as part of the final submission for the IBM Data Science Professional Certificate course.

This report consists of three parts: business problem and data preparation, methodology, and visualization and results. This article walks through the methodology step by step:

Step one: New York City data with latitude and longitude

We use requests to fetch the JSON data from the NYC dataset and store it in a data frame.
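As a rough illustration of this step, the sketch below downloads a JSON document with requests and flattens it into a pandas data frame. The GeoJSON-style layout (a "features" list whose items carry "properties" and "geometry"), the property names "borough" and "name", and the helper names `fetch_json` and `features_to_frame` are assumptions made for the example, not details confirmed by this article. The sample values are placeholders rather than real data.

```python
import pandas as pd
import requests


def fetch_json(url, timeout=30):
    """Download a JSON document and return it as a Python object."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def features_to_frame(data):
    """Flatten GeoJSON-style features into one row per neighborhood.

    Assumes each feature has properties "borough" and "name" and a
    point geometry stored as [longitude, latitude].
    """
    rows = []
    for feature in data["features"]:
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append(
            {
                "borough": props["borough"],
                "neighborhood": props["name"],
                "latitude": lat,
                "longitude": lon,
            }
        )
    columns = ["borough", "neighborhood", "latitude", "longitude"]
    return pd.DataFrame(rows, columns=columns)


# Hand-made sample with placeholder coordinates, so the sketch runs offline.
# With a real dataset you would call: data = fetch_json(dataset_url)
sample = {
    "features": [
        {
            "properties": {"borough": "Bronx", "name": "Wakefield"},
            "geometry": {"coordinates": [-73.85, 40.89]},
        },
        {
            "properties": {"borough": "Queens", "name": "Astoria"},
            "geometry": {"coordinates": [-73.92, 40.77]},
        },
    ]
}
neighborhoods = features_to_frame(sample)
print(neighborhoods)
```

Keeping the download and the flattening in separate functions makes the parsing logic easy to check against a small sample before pointing it at the full dataset.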

Step two: New York City data with population

Next, we use BeautifulSoup to scrape the table of boroughs from Wikipedia and collect every link in the table's neighborhood column. We then iterate over those links with requests, visit each neighborhood's Wikipedia page, and scrape the population figure from the table on the right-hand side of the page.
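The scraping logic could look roughly like the sketch below. It assumes the borough table carries the CSS class "wikitable", that the neighborhood links sit in the last column, and that each neighborhood page has a table with class "infobox" where a "Population" header is followed by the figure. These layout details and the helper names (`neighborhood_links`, `infobox_population`, `scrape_populations`) are assumptions for illustration. The inline HTML is a made-up stand-in for the real pages.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def neighborhood_links(html, column=-1):
    """Return (name, href) pairs for every link in one column of the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    links = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:
            continue  # header rows contain only <th> cells
        for anchor in cells[column].find_all("a", href=True):
            links.append((anchor.get_text(strip=True), anchor["href"]))
    return links


def infobox_population(html):
    """Return the first number that follows a "Population" header in the infobox."""
    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox")
    if infobox is None:
        return None
    in_population = False
    for row in infobox.find_all("tr"):
        header, value = row.find("th"), row.find("td")
        if header is not None and "population" in header.get_text().lower():
            in_population = True
        if in_population and value is not None:
            match = re.search(r"\d[\d,]*", value.get_text())
            if match:
                return int(match.group().replace(",", ""))
    return None


def scrape_populations(links, base_url):
    """Visit every linked page and collect its population (needs network access)."""
    populations = {}
    for name, href in links:
        response = requests.get(urljoin(base_url, href), timeout=30)
        response.raise_for_status()
        populations[name] = infobox_population(response.text)
    return populations


# Made-up HTML fragments so the parsing helpers can be tried offline.
table_html = """
<table class="wikitable sortable">
<tr><th>Borough</th><th>Neighborhoods</th></tr>
<tr><td>Bronx</td><td><a href="/wiki/Wakefield,_Bronx">Wakefield</a>,
<a href="/wiki/Co-op_City,_Bronx">Co-op City</a></td></tr>
</table>
"""
page_html = """
<table class="infobox">
<tr><th>Area</th></tr>
<tr><th>Total</th><td>1.5 sq mi</td></tr>
<tr><th>Population (2010)</th></tr>
<tr><th>Total</th><td>12,345</td></tr>
</table>
"""
links = neighborhood_links(table_html)
population = infobox_population(page_html)
```

Pages that lack an infobox or a population row yield `None`, so missing values are easy to spot and filter out before the data frames are combined.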

Step three: combine step one and step two

We then merge the data frames from the previous two steps into one, joining on the “neighborhood” and “borough” columns:
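A minimal sketch of such a merge with pandas is shown below. The column names follow the text, but the inner join (which drops neighborhoods with no population figure) is an assumption, and all values are placeholders rather than real data.

```python
import pandas as pd

# Placeholder output of step one: locations per neighborhood.
locations = pd.DataFrame(
    {
        "borough": ["Bronx", "Bronx", "Queens"],
        "neighborhood": ["Wakefield", "Co-op City", "Astoria"],
        "latitude": [40.89, 40.87, 40.77],
        "longitude": [-73.85, -73.83, -73.92],
    }
)

# Placeholder output of step two: population per neighborhood.
populations = pd.DataFrame(
    {
        "borough": ["Bronx", "Queens"],
        "neighborhood": ["Wakefield", "Astoria"],
        "population": [1000, 2000],
    }
)

# Join on both keys so same-named neighborhoods in different
# boroughs are not matched to each other by accident.
combined = locations.merge(
    populations, on=["borough", "neighborhood"], how="inner"
)
print(combined)
```

Switching to `how="left"` would keep every neighborhood from step one and leave the population empty where scraping found nothing.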

Here is a box chart of “Population” per “borough”:

And here is a box chart of “neighborhood” per “borough”: