Take Home Exercise-5

Reveal the social areas of the city of Engagement, Ohio USA.

Shachi Anirudha Raodeo , true
2022-05-29

The Task

This take-home exercise aims to reveal reveal social areas and visualizing and analyzing locations with traffic bottleneck of the city of Engagement, Ohio USA by using visualization techniques in R.

Assuming the volunteers are representative of the city’s population, characterize the distinct areas of the city that you identify. For each area you identify, provide your rationale and supporting data.

Where are the busiest areas in Engagement? Are there traffic bottlenecks that should be addressed? Explain your rationale.

Links to the dataset:

TravelJournal.csv Employers.csv Pubs.csv Restaurants.csv Buildings.csv Schools.csv Apartments.csv

Step-by-step Data Visualisation

Installing and launching R packages

Packages, namely `tidyverse, sftime, ViSiElse, tmap are required for this exercise. This code chunk installs the required packages and loads them onto RStudio environment.

packages = c('tidyverse','readr','knitr','sf','tmap','sf','clock','sftime','rmarkdown','distill','lubridate')

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Data Preparation

Data Source

The dataset used in this exercise is Participants.csv, published by the IEEE for [VAST challenge 2022] (https://vast-challenge.github.io/2022/)

Importing the dataset

The code chunk below imports Buildings.csv , TravelJournal.csv, ParticipantStatusLogs10.csv, Schools.csv, Employers.csv, Pubs.csv, Restaurants.csv and Apartments.csv from the data folder into R by using read_csv() function of readr and saves it as Tibble data frame called buildings, travel, logs, schools, employers, pubs, Restaurants and apartments

The TravelJournal.csv contains information about participants’ motivation for movement around the city.

travel <- read_csv("data/TravelJournal.csv")
summary(travel)

read_sf() of sf package is used to parse School.csv Pubs.csv, Apartments.csv, Buildings.csv, Employer.csv, and Restaurants.csv into R as sf data.frames

schools <- read_sf("data/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
logs <- read_sf("data/ParticipantStatusLogs10.csv", 
                options = "GEOM_POSSIBLE_NAMES=currentLocation")
glimpse(logs)

Data Wrangling

The Travel Journal contains travel data of a participant towards Work/Home Commute, Eating, Coming Back From Restaurant,Recreation (Social Gathering), Going Back to Home.

The other schools, buildings, restaurants, pubs, apartments and employers gives us location of distinct places in the city.

Participant logs gives us the routine log file of a participant over a certain time period.

Calculating Amount Spent

Calculating the total amount spent at the location as a difference of the starting balance and ending balance in the travel journal gives us the amount spent by the participant at a particular location.

travel$amountSpent <- travel$endingBalance -travel$startingBalance

Extract information from timestamp

We use weekdays(), day(), month(), year() functions to extract the day of the week, date, moth and year of checkin to perform time series visualizations. Extract the date from timestamp using floor_date() function.

Convert the travelEndLocationId into character format.

Calculate travel time as the difference between the travel start time and the travel end time and calculate the time spent as the difference of check in and check out times.

data_travel= travel%>%
  mutate(weekday = weekdays(checkInTime),
         day = day(checkInTime),
         month=as.character(checkInTime,"%b %y"),
         year = year(checkInTime),
         monthYear = floor_date(checkInTime, "month"),
         travelEndLocationId=as.character(travelEndLocationId),
         timeSpent = checkOutTime - checkInTime,
         travelTime = travelEndTime- travelStartTime,
         participantId=as.character(participantId),
         purpose=as.character(purpose))

data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel$travelTime <- as.numeric(as.character(data_travel$travelTime))

Filter necessary columns

filter out the columns needed and pass them to the tibblr data frame.

data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "checkInTime", "amountSpent","timeSpent","travelTime","weekday","day","month","year","monthYear")]

Save files as RDS

save file into rds using saveRDS() function.

saveRDS ( data_travel, 'data/data_travel.rds')
saveRDS ( logs, 'data/logs.rds')
data_travel <- readRDS ( 'data/data_travel.rds')
head (data_travel)
# A tibble: 6 × 13
  participantId travelStartLocationId travelEndLocationId purpose     
  <chr>                         <dbl> <chr>               <chr>       
1 23                              532 894                 Recreation …
2 876                              NA 1804                Eating      
3 902                              NA 1801                Eating      
4 919                              NA 1802                Eating      
5 154                              NA 446                 Eating      
6 509                              NA 1801                Eating      
# … with 9 more variables: checkInTime <dttm>, amountSpent <dbl>,
#   timeSpent <dbl>, travelTime <dbl>, weekday <chr>, day <int>,
#   month <chr>, year <int>, monthYear <dttm>
logs <- readRDS ( 'data/logs.rds')
head(logs)
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -3624.161 ymin: 2870.49 xmax: -1791.201 ymax: 5848.29
CRS:           NA
# A tibble: 6 × 12
  timestamp             currentLocation participantId currentMode
  <chr>                         <POINT> <chr>         <chr>      
1 2022-04-26T00:2… (-2459.039 2982.023) 314           AtHome     
2 2022-04-26T00:2… (-3624.161 5661.512) 315           AtHome     
3 2022-04-26T00:2…  (-1791.201 5848.29) 316           AtHome     
4 2022-04-26T00:2…  (-2384.221 2870.49) 317           AtHome     
5 2022-04-26T00:2… (-2459.754 3053.668) 318           AtHome     
6 2022-04-26T00:2… (-3339.739 5572.409) 319           AtHome     
# … with 8 more variables: hungerStatus <chr>, sleepStatus <chr>,
#   apartmentId <chr>, availableBalance <chr>, jobId <chr>,
#   financialStatus <chr>, dailyFoodBudget <chr>,
#   weeklyExtraBudget <chr>

Process movement Data

convert timestamp field from character data type to date-time data type by using date_time_parse() of clock package. derive a day field by using get_day() of clock package. extract records whereby currentMode field is equal to Transport class by using filter() of dplyr package.

logs_path <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone= "",
                                     format = "%Y-%m-%dT%H:%M:%S"))%>%
  mutate(day=get_day(Timestamp))%>%
  filter(currentMode == "Transport")

Data Visualization

Buildings present in Ohio, USA

plot the building polygon features by using tm_polygon() function.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1)
tmap_mode("plot")

Location of Employers, Restaurants and Pubs present in Ohio USA

We notice that employers are distributed throughout the city with a maximum number in the central part of the city which appears more connected.

It is found that the restaurants are more distributed towards the northen and central blovks of the city as compared to the southern block. The southern block only hosts three restaurants.

We also notice that most of the restaurants are present in and around the areas with employers which says that restaurant and pub owners build up restaurants near workplaces as it is a common practice for working people to visit the restaurants and pubs near office places.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "blue")+
tm_shape(restaurants)+
  tm_dots(col = "red")+
  tm_shape(pubs) +
  tm_dots(col = "green")
tmap_mode("plot")

Location of Apartments and Schools present in Ohio USA

Since we have only one school present in the city of engagement Ohio, USA. we notice that two schools are present in the north east block of the city, one in th central east and one in the southern block.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(schools) +
  tm_dots(col = "red")+
  tm_shape(apartments) +
  tm_dots(col = "yellow")
tmap_mode("plot")

There are certain residential areas and certain office suburbs in the city of engagement Ohio USA. The Restaurants and Pubs are intermixed among these areas. All schools except school with schoolId 0 are located near to the periphery of the city of Ohio.

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "blue") +
tm_shape(apartments) +
  tm_dots(col = "lightblue") +
tm_shape(pubs) +
  tm_dots(col = "red") +
tm_shape(restaurants) +
  tm_dots(col = "green") +
tm_shape(schools) +
  tm_dots(col = "yellow")
tmap_mode("plot")

Plotting movement data for participants

All th regions which act as connection nodes between two regions are the regions with most traffic. Inline regions or small lanes with dead ends are not much frequented.

tmap_mode("plot")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(logs_path) +
  tm_dots(col = "yellow", size= 0.02) 
tmap_mode("plot")
hex_buildings <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id_buil')

points_in_hex_buil <- st_join(logs_path, 
                         hex_buildings, 
                         join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id_buil)
head(points_in_hex_buil)
# A tibble: 6 × 2
  hex_id_buil pointCount
        <int>      <int>
1         169         16
2         212         49
3         225         12
4         226         93
5         227         14
6         228         36
hex_combined_buil <- hex_buildings %>%
  left_join(points_in_hex_buil, 
            by = 'hex_id_buil') %>%
  replace(is.na(.), 0)

tm_shape(hex_combined_buil %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)

hex_employers <- st_make_grid(employers, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id_emp')

points_in_hex_emp <- st_join(logs_path, 
                         hex_employers, 
                         join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id_emp)
head(points_in_hex_emp)
# A tibble: 6 × 2
  hex_id_emp pointCount
       <int>      <int>
1         25         19
2         42         18
3         68         24
4         69         33
5         70         12
6         71         16
hex_combined_emp <- hex_employers %>%
  left_join(points_in_hex_emp, 
            by = 'hex_id_emp') %>%
  replace(is.na(.), 0)

tm_shape(hex_combined_emp %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile",colorNA = 'white', palette = "-magma") +
  tm_borders(alpha = 0.1)

Conclusion

  1. Small lanes in the city of Ohio which do not connect to major points in the city or are present at the city periphery have less traffic.

  2. Areas in Ohio which are more interconnected and connect important buildings such as school to apartments or apartments or employers to restaurants and pubs have more traffic as they serve for the daily movement of the participants from school to home or from restaurant/pub or from places of recreation or social gathering to home

  3. Areas in Ohio surrounding clusters of apartments are high traffic bottleneck areas