2026 Model data update #960
Since we now have both an Excel file and a CSV file, we have to do some data manipulation in the raw file. This is especially true because neither dataset includes a year (aside from in the title). It seems to make sense to do the small amount of data transformation in that file (renaming columns and adding the year) and to remove the cleaning file. Do you have thoughts on that?
…ta-architecture into 953-model-data-2026-refresh
…-data/data-architecture into 953-model-data-2026-refresh
| ) %>% | ||
| ) | ||
|
|
||
| osm_roads <- osm_roads %>% |
R kept crashing when I tried to run this entire process as a single pipeline, so I split it into separate steps to give it a break.
| # Read privileges for this drive location are limited. | ||
| # Contact Cook County GIS if permissions need to be changed. | ||
| file_path <- "//10.122.19.14/ArchiveServices" | ||
| file_path <- "//gisemcv1.ccounty.com/ArchiveServices" |
Positron wasn't letting me connect to this data using the IP address. Writing out the hostname in the path does work.
| land_nbhd_rate_2026 <- openxlsx::read.xlsx(tmp_file_nbhd_rate_2026) %>% | ||
| set_names(snakecase::to_snake_case(names(.))) %>% | ||
| select( | ||
| town_nbhd = neighborhood_number, | ||
| `2026` = proposed_2026_class_two_rate | ||
| ) %>% | ||
| mutate( | ||
| town_nbhd = gsub("\\D", "", town_nbhd), | ||
| township_code = substr(town_nbhd, 1, 2), | ||
| township_name = ccao::town_convert(township_code) | ||
| ) %>% | ||
| relocate(c(township_code, township_name)) %>% | ||
| pivot_longer( | ||
| c(`2026`), | ||
| names_to = "year", values_to = "land_rate_per_sqft" | ||
| ) %>% | ||
| mutate( | ||
| across(c(township_code:year), as.character), | ||
| land_rate_per_sqft = parse_number(land_rate_per_sqft), | ||
| data_year = "2026" | ||
| ) %>% | ||
| expand_grid(class) | ||
|
|
||
|
|
Since this is for the south tri, we want processing '26 data to look like processing '23 data rather than '24 or '25. There are no bifurcated rates in the south.
| # This filter keeps only the multisale rows with the most non-null values | ||
| # within document number and pin | ||
| mutate(non_null_count = rowSums(!is.na(across(everything())))) %>% | ||
| filter( | ||
| non_null_count == max(non_null_count), | ||
| .by = c(document_number, line_1_primary_pin) | ||
| ) %>% | ||
| select(-non_null_count) %>% | ||
| # After the above filter, what's left are true duplicates if there are | ||
| # multiple rows within document number and pin with the same number of | ||
| # non-null values. We use distinct() to keep only one of those rows. | ||
| distinct(document_number, line_1_primary_pin, .keep_all = TRUE) %>% | ||
| relocate(year_of_sale = year, .after = last_col()) %>% | ||
| group_by(year_of_sale) %>% | ||
| group_by(year_of_sale) |
We're getting some duplicates in our MyDec sales. They are either complete duplicates, or one row has slightly more NAs than the other. I've tried to treat both cases appropriately.
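To make the two cases concrete, here is a minimal sketch that runs the same filter-then-distinct steps on toy data. The `sales` tibble and its `sale_price` and `buyer_name` columns are hypothetical and only stand in for the real sales table. It assumes dplyr 1.1.0 or later, since `filter()` is called with `.by`.

```r
library(dplyr)

# Hypothetical toy data: the two "A" rows differ only in how many NAs
# they have, and the two "B" rows are exact duplicates.
sales <- tibble(
  document_number = c("A", "A", "B", "B", "C"),
  line_1_primary_pin = c("01", "01", "02", "02", "03"),
  sale_price = c(100, 100, 200, 200, 300),
  buyer_name = c("Smith", NA, "Jones", "Jones", NA)
)

deduped <- sales %>%
  # Count the non-missing values in each row
  mutate(non_null_count = rowSums(!is.na(across(everything())))) %>%
  # Keep the most complete row(s) within document number and PIN
  filter(
    non_null_count == max(non_null_count),
    .by = c(document_number, line_1_primary_pin)
  ) %>%
  select(-non_null_count) %>%
  # Rows that are still tied are true duplicates, so keep one of them
  distinct(document_number, line_1_primary_pin, .keep_all = TRUE)

print(deduped)
```

In this toy case the more complete "A" row survives, the "B" pair collapses to a single row, and "C" is left untouched.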
| coastline_years <- parse_number( | ||
| get_bucket_df(input_bucket, prefix = "spatial/environment/coastline/")$Key | ||
| ) | ||
| walk(coastline_years, function(x) { |
All the current_year business can lead to errors. Much better to only look at raw data that actually exists.
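As a rough illustration of the idea, the sketch below derives the list of years from object keys rather than from a `current_year` counter. The `keys` vector is hypothetical and stands in for the `Key` column returned by `get_bucket_df()`; the real key layout in the bucket may differ.

```r
library(readr)

# Hypothetical object keys (assumed layout, not the real bucket contents).
# The year 2022 appears twice to show why de-duplicating is useful.
keys <- c(
  "spatial/environment/coastline/2021/coastline.geojson",
  "spatial/environment/coastline/2022/coastline.geojson",
  "spatial/environment/coastline/2022/metadata.json",
  "spatial/environment/coastline/2023/coastline.geojson"
)

# parse_number() pulls the first number out of each key, so only years
# that actually have raw data end up in the vector
coastline_years <- unique(parse_number(keys))

print(coastline_years)
```

Looping over `coastline_years` then only touches years with raw files, so a year with no upload is skipped rather than raising an error.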
| ) | ||
| flood_fema_warehouse <- file.path( | ||
| output_bucket, "flood_fema", "year=2024", "part-0.parquet" | ||
| fema_years <- parse_number( |
We have multiple years of FEMA data, but we were only looking at one.
| "ward_evanston_2019" = c("ward"), | ||
| "ward_evanston_2022" = c("ward") | ||
| "ward_evanston_2022" = c("ward"), | ||
| "ward_evanston_2025" = c("ward") |
New Evanston wards just dropped.
| "arrow": { | ||
| "Package": "arrow", | ||
| "Version": "21.0.0.1", | ||
| "Version": "15.0.1", |
You hate to see it, but we need to downgrade this to work with the deprecated version of geoarrow we depend on.