Take-home Exercise 4: Prototyping Modules for Visual Analytics Shiny Application

Published

March 5, 2024

Modified

March 7, 2024

1 Overview

1.1 Introduction

In Singapore’s ever-changing rental market, understanding the factors influencing rental prices is crucial for tenants and landlords alike. This study focuses on three key aspects: descriptive analysis, correlation analysis, and clustering analysis.

1.2 Objectives

  • Determine the correlation between rental prices and various numerical variables, such as property size and proximity to amenities, and analyze their relationships.

  • Utilize clustering analysis to discern distinct groups within the rental market based on property characteristics, such as size, amenities proximity, and other relevant factors, aiming to uncover patterns and insights into different market segments or tenant preferences.

2 Loading R packages

The original design will then be remade using data visualization design principles and best practices using ggplot2, its extensions, and tidyverse packages.

pacman::p_load(poLCA, ggplot2, plotly, tidyverse, corrplot)

3 Dataset

The data on rental transactions was collected from the Urban Redevelopment Authority’s (URA) REALIS database.

The study utilized rental transaction data from 01 January 2021 to 31 December 2022, sourced from IRAS via URA. It includes rental prices, commencement dates, building names, addresses, and planning regions. Zoning and postal district information were omitted as they duplicated planning area data.

Rental_data <- read_csv("../../data/ResidentialRental_Final.csv")

To ensure there are no missing values in the processed data, check it as follows.

sum(is.na(Rental_data))
[1] 0

4 Descriptive Analysis

Click to view the code.
ggplot(Rental_data, aes(x=`Planning_Region`, y= `Monthly_Rent_SGD`, fill=`Property_Type`)) +
  geom_boxplot() +
  facet_wrap(~`Planning_Region`, scales = "free") +
  labs(x="Planning Region", y="Monthly Rent")

Click to view the code.
ggplot(Rental_data, aes(x=`Planning_Region`, y= `Monthly_Rent_SGD`)) +
  geom_boxplot() +
  facet_wrap(~`Property_Type`, scales = "free") +
  labs(x="Planning Region", y="Monthly Rent") +
  theme(axis.text.x = element_text(angle = 90))

5 Correlation Analysis

Plotting a correlation matrix of the various numerical variables, we observe that the correlations between monthly rent and variables measuring proximity are fairly week ranging between 0.3 to -0.3. Rental prices were most strongly correlated with the size of the unit.

Click to view the code.
numerical_data <- Rental_data[, sapply(Rental_data, is.numeric)]

numerical_data <- numerical_data[, -1]

correlation_matrix <- cor(numerical_data)
corrplot(correlation_matrix, method = "circle")

pairs(~Monthly_Rent_SGD + Floor_Area_SQM_Avg + Floor_Area_SQFT_Avg + distance_to_mrt, data = Rental_data,
      main = "Scatterplot Matrix")

6 Clustering Analysis

6.1 Data Preparation

  • Exclude columns that are either irrelevant or similar, such as retaining only “Monthly_Rent_PSF” instead of both “Monthly_Rent_PSM” and “Monthly_Rent_PSF”.

  • Bin continuous variables into categories of relatively similar sizes and assign labels to each category.

Click to view the code.
df_clustering <- Rental_data %>%
  select(-Column1, -Project_Name, -Street_Name, -Postal_District, -Monthly_Rent_PSM, -Floor_Area_SQM_Avg, -Lease_Commencement_Date, -nearest_mrt, -nearest_school, -latitude, -longitude) %>%
  mutate(Monthly_Rent_SGD = cut(Monthly_Rent_SGD,
                             breaks = c(0,2000,3000,4000,5000,Inf),
                             labels = c("0-2k", "2-3k", "3-4k", "4-5k", "5k+"))) %>%
  mutate(Monthly_Rent_PSF = cut(Monthly_Rent_PSF,
                             breaks = c(0,3,4,Inf),
                             labels = c("0-3", "3-4", "4+"))) %>%
  mutate(Floor_Area_SQFT_Avg = cut(Floor_Area_SQFT_Avg,
                             breaks = c(0,600,1000,1400,Inf),
                             labels = c("0-600", "600-1000", "1000-1400", "1400+"))) %>%
  mutate(distance_to_mrt = cut(distance_to_mrt,
                             breaks = c(0,0.3,0.6,0.9,Inf),
                             labels = c("0-0.3", "0.3-0.6", "0.6-0.9", "0.9+"))) %>%
  mutate(distance_to_school = cut(distance_to_school,
                             breaks = c(0,0.3,0.5,0.7,Inf),
                             labels = c("0-0.3", "0.3-0.5", "0.5-0.7", "0.7+")))

print(df_clustering)
# A tibble: 192,199 × 8
    Year Planning_Region Property_Type         Monthly_Rent_SGD Monthly_Rent_PSF
   <dbl> <chr>           <chr>                 <fct>            <fct>           
 1  2021 East Region     Non-landed Properties 0-2k             3-4             
 2  2021 North Region    Executive Condominium 2-3k             0-3             
 3  2021 West Region     Non-landed Properties 2-3k             0-3             
 4  2021 Central Region  Non-landed Properties 3-4k             3-4             
 5  2021 East Region     Non-landed Properties 2-3k             0-3             
 6  2021 Central Region  Non-landed Properties 4-5k             3-4             
 7  2021 Central Region  Non-landed Properties 3-4k             0-3             
 8  2021 Central Region  Non-landed Properties 3-4k             3-4             
 9  2021 East Region     Non-landed Properties 0-2k             0-3             
10  2021 Central Region  Non-landed Properties 3-4k             4+              
# ℹ 192,189 more rows
# ℹ 3 more variables: Floor_Area_SQFT_Avg <fct>, distance_to_mrt <fct>,
#   distance_to_school <fct>

After completing the data cleaning process, all variables are transformed into categorical factors and verified to ensure there are no missing values.

Click to view the code.
categorical_vars <- c("Year", "Planning_Region", "Property_Type", "Monthly_Rent_SGD", "Monthly_Rent_PSF", "Floor_Area_SQFT_Avg", "distance_to_mrt", "distance_to_school")
df_clustering[categorical_vars] <- lapply(df_clustering[categorical_vars], factor)

sapply(df_clustering, function(x) sum(is.na(x)))
               Year     Planning_Region       Property_Type    Monthly_Rent_SGD 
                  0                   0                   0                   0 
   Monthly_Rent_PSF Floor_Area_SQFT_Avg     distance_to_mrt  distance_to_school 
                  0                   0                   0                   0 

6.2 Model and Result

Run the model by specifying the desired number of classes(7) and the number of repetitions(5).

If we perform more than one repetition, it indicates that we conducted a comprehensive search to find the lowest BIC score, ensuring the model’s robustness and accuracy in determining the optimal number of classes.

Click to view the code.
set.seed(1234)

f <- as.formula(cbind(Year, Planning_Region, Property_Type, Monthly_Rent_SGD, Monthly_Rent_PSF, Floor_Area_SQFT_Avg, distance_to_mrt, distance_to_school) ~ 1)

LCA_model <- poLCA(f, df_clustering, nclass = 7, nrep = 5, maxiter = 5000)
Model 1: llik = -1547810 ... best llik = -1547810
Model 2: llik = -1548973 ... best llik = -1547810
Model 3: llik = -1546851 ... best llik = -1546851
Model 4: llik = -1547142 ... best llik = -1546851
Model 5: llik = -1546851 ... best llik = -1546851
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$Year
           Pr(1)  Pr(2)
class 1:  0.4577 0.5423
class 2:  0.3235 0.6765
class 3:  0.5960 0.4040
class 4:  0.6658 0.3342
class 5:  0.6861 0.3139
class 6:  0.5486 0.4514
class 7:  0.4127 0.5873

$Planning_Region
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)
class 1:  0.3788 0.2676 0.1338 0.0276 0.1922
class 2:  0.6706 0.1337 0.0847 0.0078 0.1032
class 3:  0.3310 0.3404 0.1773 0.0302 0.1210
class 4:  0.1401 0.3621 0.1641 0.0665 0.2673
class 5:  0.1854 0.3656 0.2176 0.0610 0.1703
class 6:  0.4173 0.2434 0.1598 0.0395 0.1400
class 7:  0.8433 0.0870 0.0168 0.0009 0.0521

$Property_Type
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)
class 1:  0.0000 0.0355 0.9607 0.0012 0.0026
class 2:  0.0000 0.0007 0.9979 0.0000 0.0014
class 3:  0.0000 0.0001 0.9995 0.0000 0.0004
class 4:  0.0002 0.0706 0.9149 0.0029 0.0115
class 5:  0.0000 0.0131 0.9831 0.0004 0.0035
class 6:  0.0991 0.0047 0.5520 0.1395 0.2047
class 7:  0.0042 0.0011 0.9742 0.0052 0.0152

$Monthly_Rent_SGD
           Pr(1)  Pr(2)  Pr(3)  Pr(4)  Pr(5)
class 1:  0.0000 0.0000 0.6513 0.3487 0.0000
class 2:  0.0000 0.1064 0.5555 0.2500 0.0881
class 3:  0.2962 0.7038 0.0000 0.0000 0.0000
class 4:  0.0240 0.5309 0.4443 0.0009 0.0000
class 5:  0.0918 0.9082 0.0000 0.0000 0.0000
class 6:  0.0018 0.0188 0.1190 0.2867 0.5738
class 7:  0.0000 0.0000 0.0000 0.0995 0.9005

$Monthly_Rent_PSF
           Pr(1)  Pr(2)  Pr(3)
class 1:  0.0000 0.9302 0.0698
class 2:  0.0000 0.0000 1.0000
class 3:  0.0047 0.1997 0.7956
class 4:  1.0000 0.0000 0.0000
class 5:  0.3909 0.5636 0.0455
class 6:  0.7863 0.1910 0.0226
class 7:  0.0000 0.2924 0.7076

$Floor_Area_SQFT_Avg
           Pr(1)  Pr(2)  Pr(3)  Pr(4)
class 1:  0.0000 0.3034 0.6966 0.0000
class 2:  0.2216 0.7784 0.0000 0.0000
class 3:  1.0000 0.0000 0.0000 0.0000
class 4:  0.0000 0.0000 0.8052 0.1948
class 5:  0.0000 1.0000 0.0000 0.0000
class 6:  0.0000 0.0000 0.0000 1.0000
class 7:  0.0000 0.0068 0.3896 0.6035

$distance_to_mrt
           Pr(1)  Pr(2)  Pr(3)  Pr(4)
class 1:  0.2058 0.3333 0.2035 0.2574
class 2:  0.3642 0.3708 0.1327 0.1324
class 3:  0.2130 0.3904 0.1556 0.2410
class 4:  0.1353 0.2572 0.2548 0.3527
class 5:  0.1407 0.3098 0.2012 0.3483
class 6:  0.0708 0.2126 0.4026 0.3140
class 7:  0.2002 0.3984 0.2240 0.1774

$distance_to_school
           Pr(1)  Pr(2)  Pr(3)  Pr(4)
class 1:  0.2121 0.3309 0.2251 0.2319
class 2:  0.1764 0.2652 0.2517 0.3067
class 3:  0.1952 0.3924 0.2139 0.1986
class 4:  0.2037 0.3781 0.1819 0.2363
class 5:  0.1916 0.4133 0.2005 0.1946
class 6:  0.1182 0.4297 0.1727 0.2794
class 7:  0.1091 0.2524 0.1973 0.4412

Estimated class population shares 
 0.1299 0.1655 0.1393 0.1574 0.1349 0.1184 0.1546 
 
Predicted class memberships (by modal posterior prob.) 
 0.1293 0.1645 0.145 0.1607 0.1341 0.1105 0.156 
 
========================================================= 
Fit for 7 latent classes: 
========================================================= 
number of observations: 192199 
number of estimated parameters: 174 
residual degrees of freedom: 47825 
maximum log-likelihood: -1546851 
 
AIC(7): 3094050
BIC(7): 3095819
G^2(7): 193628.3 (Likelihood ratio/deviance statistic) 
X^2(7): 434072.2 (Chi-square goodness of fit) 
 

6.3 Plot BIC score

Click to view the code.
LCA_model$bic
[1] 3095819

6.4 Plot AIC score

Click to view the code.
LCA_model$aic
[1] 3094050

6.5 Plot Entropy

Click to view the code.
entropy <- function(p) sum(-p*log(p))
error_prior <- entropy(LCA_model$P)
error_post <- mean(apply(LCA_model$posterior, c(1,2), entropy), na.rm=T)
LCA_entropy <- (error_prior - error_post) / error_prior
LCA_entropy
[1] 0.9846065

7 Visualising Results

The classification obtained from the model is added to the initial dataset to facilitate the comparison of variables across different classes for plotting purposes.

Click to view the code.
df_clustering$class <- LCA_model$predclass
df_clustering$class <- factor(df_clustering$class)
Click to view the code.
plot_table <- df_clustering %>%
  group_by(Planning_Region, class) %>%
  summarise(counts = n()) %>% 
  ungroup

p1 <- ggplot(plot_table, aes(fill = Planning_Region, y = counts, x = class)) + 
  geom_bar(position = "fill", stat = "identity")

ggplotly(p1)
Click to view the code.
p2 <- ggplot(plot_table, aes(fill = class, y = counts, x = Planning_Region)) + 
  geom_bar(position = "fill", stat = "identity")

ggplotly(p2)

8 UI Design

Click here to view the shinyapp of this exercise.

Click to view the code.
pacman::p_load(shiny, tidyverse, shinydashboard)

visualdata <- read_csv("data/ResidentialRental_Final.csv")

ui <- dashboardPage(
  dashboardHeader(title = 'Rental market learning and valuation', titleWidth = 400),
  dashboardSidebar(width = 400,
                   sidebarMenu(id = 'a',
                               menuItem('Historical data', tabName = 'historical', icon = icon("search")),
                               menuItem('Statistics', tabName = 'Statistics', icon = icon("line-chart"))
                   )
  ),
  dashboardBody(
    tabItems(
      #——————————————————————————————————————————————————————————————————————Historical data
      tabItem(tabName = "historical",
              fluidPage(
                titlePanel("Places to Rent"),
                selectInput("region", "Planning Region", choices = unique(visualdata$Planning_Region)),
                selectInput("type", "Property Type", choices = NULL),
                tableOutput("data")
              )
      ),
      tabItem(tabName = "Statistics",
              fluidPage(
                titlePanel("Statistics"),
                fluidRow(
                  column(width = 12,
                         tabsetPanel(
                           #——————————————————————————————————————————————————————————————————————Barchart
                           tabPanel("Bar Chart",
                                    box(
                                      radioButtons('xcol1',
                                                   label = tags$strong('Analyse Sales By:'),
                                                   choices = c('Property Type' = 'Property_Type',
                                                               'Planning Region' = 'Planning_Region'),
                                                   inline = TRUE)
                                    ),
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('barchart', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————Boxplot1
                           tabPanel("Boxplot1", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Boxplot1', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————Boxplot2
                           tabPanel("Boxplot2", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Boxplot2', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————Boxplot3
                           tabPanel("Boxplot3", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Boxplot3', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————scatterplot1
                           tabPanel("Scatterplot1", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Scatterplot1', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————scatterplot2
                           tabPanel("Scatterplot2", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Scatterplot2', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————scatterplot3
                           tabPanel("Scatterplot3", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('Scatterplot3', height = 750)
                                    )
                           ),
                           #——————————————————————————————————————————————————————————————————————scatterplot matrix
                           tabPanel("Scatterplot matrix", 
                                    box(
                                      width = 12,
                                      height = 800,
                                      solidHeader = TRUE,
                                      collapsible = FALSE,
                                      collapsed = FALSE,
                                      plotOutput('matrix', height = 750)
                                    )
                           )
                         ), #tabsetPanel(
                  ) #column(
                ) #fluidRow(
              ), #fluidPage(
      )
    ) #tabItems(
  ) #dashboardBody(
) #dashboardPage(



# Define server logic required to draw a histogram
server <- function(input, output){
  #——————————————————————————————————————————————————————————————————————historical
  region <- reactive({
    filter(visualdata, Planning_Region == input$region)
  })
  observeEvent(region(), {
    choices <- unique(region()$Property_Type)
    updateSelectInput(inputId = "type", choices = choices)
  })
  type <- reactive({
    req(input$type)
    filter(region(), Property_Type == input$type)
  })
  
  #——————————————————————————————————————————————————————————————————————Statistics
  #——————————————————————————————————————————————————————————————————————barchart
  output$barchart <- renderPlot({
    analysis <- visualdata %>%
      group_by_(.dots = input$xcol1) %>%
      summarise(basket_value = mean(`Monthly_Rent_SGD`, na.rm = T))
    
    p <- ggplot(analysis, aes_string(y = 'basket_value', x = input$xcol1)) +
      geom_bar(aes_string(fill = input$xcol1), stat = 'identity') +
      labs(title = 'Average Rental Price', subtitle = paste('by', input$xcol1), 
           x = input$xcol1, y = 'Rental Price ($)',
           fill = input$xcol1)
    return(p)
  })
  #——————————————————————————————————————————————————————————————————————Boxplot1
  output$Boxplot1 <- renderPlot({
    p1 <- ggplot(visualdata, aes(x=`Planning_Region`, y= `Monthly_Rent_SGD`, fill=`Property_Type`)) +
      geom_boxplot() +
      facet_wrap(~`Planning_Region`, scales = "free") +
      labs(x="Planning Region", y="Monthly Rent")
    return(p1)
  })
  #——————————————————————————————————————————————————————————————————————Boxplot2
  output$Boxplot2 <- renderPlot({
    p2 <- ggplot(visualdata, aes(x=`Planning_Region`, y= `Monthly_Rent_SGD`)) +
      geom_boxplot() +
      facet_wrap(~`Property_Type`, scales = "free") +
      labs(x="Planning Region", y="Monthly Rent") +
      theme(axis.text.x = element_text(angle = 90))
    return(p2)
  })
  #——————————————————————————————————————————————————————————————————————Boxplot3
  output$Boxplot3 <- renderPlot({
    p3 <- ggplot(visualdata, aes(x=`Property_Type`, y= `Monthly_Rent_SGD`)) +
      geom_boxplot() +
      facet_wrap(~`Planning_Region`, scales = "free") +
      labs(x="Property Type", y="Monthly Rent") +
      theme(axis.text.x = element_text(angle = 90))
    return(p3)
  })
  #——————————————————————————————————————————————————————————————————————Scatterplot1
  output$Scatterplot1 <- renderPlot({
    p4 <- ggplot(visualdata, aes(x=`distance_to_school`, y=`Monthly_Rent_SGD`)) +
      geom_point(size=0.5) +
      scale_x_continuous(breaks = seq(0, 2, by = 0.2)) +
      coord_cartesian(xlim = c(0, 2)) +
      facet_grid(`Planning_Region` ~ `Property_Type`, scales = "free", space ="fixed") +
      labs(x='Distance to School', y = 'Monthly Rent') +
      theme(axis.text.x = element_text(angle = 90))
    return(p4)
  })
  #——————————————————————————————————————————————————————————————————————Scatterplot2
  output$Scatterplot2 <- renderPlot({
    p5 <- ggplot(visualdata, aes(x=`distance_to_mrt`, y=`Monthly_Rent_SGD`)) +
      geom_point(size=0.5) +
      scale_x_continuous(breaks = seq(0, 2, by = 0.2)) +
      coord_cartesian(xlim = c(0, 2)) +
      facet_grid(`Planning_Region` ~ `Property_Type`, scales = "free", space ="fixed") +
      labs(x='Distance to MRT', y = 'Monthly Rent') +
      theme(axis.text.x = element_text(angle = 90))
    return(p5)
  })
  #——————————————————————————————————————————————————————————————————————Scatterplot3
  output$Scatterplot3 <- renderPlot({
    p6 <- ggplot(visualdata, aes(x=`Floor_Area_SQFT_Avg`, y=`Monthly_Rent_SGD`)) +
      geom_point(size=0.5) +
      #  scale_x_continuous(breaks = seq(0, 2, by = 0.2)) +
      #  coord_cartesian(xlim = c(0, 2)) +
      facet_grid(`Planning_Region` ~ `Property_Type`, scales = "free", space ="fixed") +
      labs(x='Floor Area SQFT', y = 'Monthly Rent') +
      theme(axis.text.x = element_text(angle = 90))
    return(p6)
  })  
  #——————————————————————————————————————————————————————————————————————matrix
  output$matrix <- renderPlot({
    pm <- pairs(~visualdata$Monthly_Rent_SGD + visualdata$Floor_Area_SQM_Avg + visualdata$Floor_Area_SQFT_Avg + visualdata$distance_to_mrt, data = visualdata,
                main = "Scatterplot Matrix")
    return(pm)
  })
}

# Run the application 
shinyApp(ui = ui, server = server)