@wrridgeway
Member

No description provided.

@wrridgeway wrridgeway linked an issue Dec 30, 2025 that may be closed by this pull request
@Damonamajor
Contributor

Damonamajor commented Dec 31, 2025

Since we now have both an Excel file and a CSV file, we have to do some data manipulation in the raw script. This is especially true since neither dataset contains a year (aside from the title). It seems to make sense to do the small amount of data transformation in that script (renaming columns and adding the year) and to remove the cleaning script. Do you have thoughts on that?
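
As a rough illustration of the proposal above, the renaming and year stamping could live in one small helper inside the raw script. This is only a sketch: the column names, the `standardize_raw()` helper, and the `data_year` argument are all hypothetical, since the real files are not shown in this thread.

```r
library(dplyr)

# Hypothetical raw table; the real column names are not shown in this thread.
raw <- tibble::tibble(
  `Geography Name` = c("Tract 1", "Tract 2"),
  `Overall Score`  = c(0.4, 0.7)
)

# Hypothetical helper: tidy the column names and stamp the data year, which
# otherwise appears only in the title.
standardize_raw <- function(df, data_year) {
  df %>%
    rename_with(~ tolower(gsub("[^A-Za-z0-9]+", "_", .x))) %>%
    mutate(year = as.character(data_year))
}

clean <- standardize_raw(raw, 2025)
```

The same helper could be applied to both the Excel and the CSV input so that the two end up with matching columns before they are written out.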

@wrridgeway wrridgeway linked an issue Dec 31, 2025 that may be closed by this pull request
- ) %>%
+ )
+
+ osm_roads <- osm_roads %>%
Member Author

R was having a hard time not crashing when I tried to pass this entire process through without giving it a break.

  # Read privileges for this drive location are limited.
  # Contact Cook County GIS if permissions need to be changed.
- file_path <- "//10.122.19.14/ArchiveServices"
+ file_path <- "//gisemcv1.ccounty.com/ArchiveServices"
Member Author

Positron wasn't letting me connect to this data using the IP address. Writing out the path does work.

Comment on lines +194 to +217
land_nbhd_rate_2026 <- openxlsx::read.xlsx(tmp_file_nbhd_rate_2026) %>%
  set_names(snakecase::to_snake_case(names(.))) %>%
  select(
    town_nbhd = neighborhood_number,
    `2026` = proposed_2026_class_two_rate
  ) %>%
  mutate(
    town_nbhd = gsub("\\D", "", town_nbhd),
    township_code = substr(town_nbhd, 1, 2),
    township_name = ccao::town_convert(township_code)
  ) %>%
  relocate(c(township_code, township_name)) %>%
  pivot_longer(
    c(`2026`),
    names_to = "year", values_to = "land_rate_per_sqft"
  ) %>%
  mutate(
    across(c(township_code:year), as.character),
    land_rate_per_sqft = parse_number(land_rate_per_sqft),
    data_year = "2026"
  ) %>%
  expand_grid(class)


Member Author

Since this is for the south tri, we want processing '26 data to look like processing '23 data rather than '24 or '25. There are no bifurcated rates in the south.

Comment on lines +72 to +85
+ # This filter keeps only the multisale rows with the most non-null values
+ # within document number and pin
+ mutate(non_null_count = rowSums(!is.na(across(everything())))) %>%
+ filter(
+   non_null_count == max(non_null_count),
+   .by = c(document_number, line_1_primary_pin)
+ ) %>%
+ select(-non_null_count) %>%
+ # After the above filter, what's left are true duplicates if there are
+ # multiple rows within document number and pin with the same number of
+ # non-null values. We use distinct() to keep only one of those rows.
+ distinct(document_number, line_1_primary_pin, .keep_all = TRUE) %>%
+ relocate(year_of_sale = year, .after = last_col()) %>%
- group_by(year_of_sale) %>%
+ group_by(year_of_sale)
Member Author

We're getting some duplicates in our mydec sales: either complete duplicates, or pairs where one row has slightly more NAs than the other. I've tried to treat both cases appropriately.
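
To make the two cases concrete, here is a small sketch of the same filter-then-distinct approach on toy data. The rows and the `sale_price` and `buyer_name` columns are made up for illustration; only `document_number` and `line_1_primary_pin` come from the diff.

```r
library(dplyr)

# Hypothetical toy data. A1 has one row with an extra NA, B2 is an exact
# duplicate pair, and C3 is a single row.
sales <- tibble::tibble(
  document_number    = c("A1", "A1", "B2", "B2", "C3"),
  line_1_primary_pin = c("001", "001", "002", "002", "003"),
  sale_price         = c(100, NA, 200, 200, 300),
  buyer_name         = c("x", "x", "y", "y", NA)
)

deduped <- sales %>%
  # Count the non-missing values in each row
  mutate(non_null_count = rowSums(!is.na(across(everything())))) %>%
  # Within each document/PIN pair, keep the most complete row(s)
  filter(
    non_null_count == max(non_null_count),
    .by = c(document_number, line_1_primary_pin)
  ) %>%
  select(-non_null_count) %>%
  # Rows still tied on completeness are treated as duplicates; keep one
  distinct(document_number, line_1_primary_pin, .keep_all = TRUE)
```

In this toy case the A1 row with the missing price is dropped in favor of the complete one, the B2 pair collapses to a single row, and C3 passes through untouched even though it contains an NA.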

Comment on lines +23 to +26
coastline_years <- parse_number(
get_bucket_df(input_bucket, prefix = "spatial/environment/coastline/")$Key
)
walk(coastline_years, function(x) {
Member Author

All the current_year business can lead to errors. Much better to only look at raw data that actually exists.
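
A minimal sketch of why this matters, using made-up object keys in place of the `Key` column returned by `get_bucket_df()`. The key layout here is an assumption, and base R regex stands in for `parse_number()` so the example has no dependencies.

```r
# Hypothetical object keys; the real key layout is not shown in this thread.
keys <- c(
  "spatial/environment/coastline/2021.geojson",
  "spatial/environment/coastline/2022.geojson",
  "spatial/environment/coastline/2024.geojson"
)

# Pull the first four-digit run out of each key
years <- as.integer(regmatches(keys, regexpr("[0-9]{4}", keys)))

# A range built from a hard-coded or current year would also request 2023,
# which has no raw file behind it
requested <- 2021:2024
missing_years <- setdiff(requested, years)
```

Iterating over `years` only touches files that exist, whereas iterating over `requested` would fail on whatever ends up in `missing_years`.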

  )
- flood_fema_warehouse <- file.path(
-   output_bucket, "flood_fema", "year=2024", "part-0.parquet"
+ fema_years <- parse_number(
Member Author

@wrridgeway wrridgeway Jan 14, 2026

We have multiple years of FEMA data, but we were only looking at one.

  "ward_evanston_2019" = c("ward"),
- "ward_evanston_2022" = c("ward")
+ "ward_evanston_2022" = c("ward"),
+ "ward_evanston_2025" = c("ward")
Member Author

New Evanston wards just dropped.

  "arrow": {
    "Package": "arrow",
-   "Version": "21.0.0.1",
+   "Version": "15.0.1",
Member Author

You hate to see it, but we need to downgrade this to work with the deprecated version of geoarrow we depend on.

Development

Successfully merging this pull request may close these issues.

Update ARI ingest script to gather new data
Model data 2026 refresh