Skip to content

Conversation

@brownag
Copy link
Member

@brownag brownag commented Dec 21, 2025

This pull request significantly refactors the SQL query generation logic in the .property_weighted_average_CTE function within R/get_SDA_property.R. Reduces repetition and total number of steps which were carried over from the old temporary table approach.

Closes #431 which noted a regression in ability to query larger numbers of specific mapunits via mukeys argument. On deeper investigation, this seemed to be primarily related to repetition of complex WHERE clause constraints where IN statements contained many thousants of values, but there were several other opportunities for improvement.

With these changes we can comparatively efficiently query larger sets. Of course extremely large numbers of MUKEY will still be an issue and trigger "invalid query" response from server (as shown below), it should be quite a bit more rare.

Invalid query: The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

Example (10000 mapunits, 3 properties):

library(soilDB)

x <- SDA_query("SELECT TOP 10000 mukey FROM mapunit")
#> single result set, returning a data.frame

vars <- c("om_r", "claytotal_r", "ph1to1h2o_r")

p <- get_SDA_property(
    property = vars,
    method = "Weighted Average", 
    mukeys = x$mukey,
    top_depth = 5,
    bottom_depth = 15,
    include_minors = TRUE, 
    miscellaneous_areas = FALSE
)
#> single result set, returning a data.frame
length(unique(p$mukey))
#> [1] 10000

@brownag brownag self-assigned this Dec 21, 2025
@dylanbeaudette
Copy link
Member

Nice. The new SQL is much simpler. This looks a lot closer to the type of queries that are used to create the ISSR-800 intermediate tables.

@brownag brownag changed the title refactor: get_SDA_propery(method="weighted average") refactor: get_SDA_property(method="weighted average") Dec 21, 2025
@brownag
Copy link
Member Author

brownag commented Dec 21, 2025

The failure on macOS is related to stale CRAN metadata (see https://stat.ethz.ch/pipermail/r-sig-mac/2025-December/015293.html) and can be ignored for the time being--unrelated to this PR.

@dylanbeaudette
Copy link
Member

I just compared the main branch with this PR, running get_SDA_property(..., method = 'weighted average') for all of CONUS SSURGO, and some STATSGO. The number of rows returned and aggregated values for all mukey are identical. The following horizon columns were included c("sandtotal_r", "silttotal_r", "claytotal_r", "ph1to1h2o_r", "wthirdbar_r", "wfifteenbar_r").

@brownag
Copy link
Member Author

brownag commented Dec 22, 2025

Thanks 👍, I need to confirm the same against an SQLite backend then I believe it will be ready to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_SDA_property: server failure when number of mukey > ~1500

3 participants