Description
Describe the bug
I am trying to add new expressions to the ActivitySim model specs in order to estimate models, but estimation crashes with `NameError: name 'df' is not defined`. The error traces to Sharrow; the full traceback is below. (Possibly this is because the new expressions are not in the Sharrow flow?)
For example, adding a new interaction term between distance and household income (util_dist_income_under_50) to school_location_SPEC.csv, shown below, causes the estimation to crash. I've confirmed that income_segment is in the chooser data in the estimation data bundle (EDB). There is nothing special about the new expression; it uses the same syntax as the existing ones.
| Label | Description | Expression | highschool |
|---|---|---|---|
| # Existing expressions | | | |
| util_dist_part_time | Distance,part time | @(df['pemploy']==2) * _DIST | coef_highschool_dist_part_time |
| ... | | | |
| # New expressions | | | |
| util_dist_income_under_50 | Distance, income under 50k | @(df['income_segment']==1) * _DIST | coef_highschool_dist_income_under_50 |
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
NameError: name 'df' is not defined
During: Pass nopython_type_inference
During: resolving callee type: type(CPUDispatcher(<function __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P at 0x000001D72A5C7EB0>))
During: typing of call at C:\Users\swang\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py (772)
File "..\..\..\..\..\..\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py", line 772:
def __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P_dim3_filler(
<source elided>
for j1 in range(result.shape[1]):
result[j0, j1, col_num] = __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P(j0, result[j0, j1, :], __main__school_segment)
^
During: Pass nopython_type_inference
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
Cell In[47], [line 11](vscode-notebook-cell:?execution_count=47&line=11)
3 from activitysim.estimation.larch import component_model
4 model, data = component_model(
5 modelname,
6 edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_alpha_w_preschool_segment",
7 # edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_blind",
8 return_data=True
9 )
---> 11 model.estimate(method="BHHH", options={"maxiter": 1000})
12 model.estimation_statistics()
15 from activitysim.estimation.larch import update_coefficients, update_size_spec
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1155, in Model.estimate(self, *args, **kwargs)
1141 def estimate(self, *args, **kwargs):
1142 """
1143 Maximize loglike, and then calculate parameter covariance.
1144
(...)
1153 dictx
1154 """
-> 1155 result = self.maximize_loglike(*args, **kwargs)
1156 self.calculate_parameter_covariance()
1157 return result
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1139, in Model.maximize_loglike(self, *args, **kwargs)
1137 return self.jax_maximize_loglike(*args, **kwargs)
1138 else:
-> 1139 return super().maximize_loglike(*args, **kwargs)
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:2484, in NumbaModel.maximize_loglike(self, *args, **kwargs)
2455 """
2456 Maximize the log likelihood.
2457
(...)
2480
2481 """
2482 from .optimization import maximize_loglike
-> 2484 return maximize_loglike(self, *args, **kwargs)
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\optimization.py:125, in maximize_loglike(model, method, method2, quiet, screen_update_throttle, final_screen_update, check_for_overspecification, return_tags, reuse_tags, iteration_number, iteration_number_tail, options, maxiter, bhhh_start, jumpstart, jumpstart_split, return_dashboard, dashboard, prior_result, stderr, **kwargs)
120 if isinstance(model, NumbaModel):
121 if (
122 getattr(model, "data_as_loaded", None) is None
123 and getattr(model, "datatree", None) is not None
124 ):
--> 125 model.unmangle(force=True)
126 if (
127 getattr(model, "data_as_loaded", None) is None
128 and not model.use_streaming
129 ):
130 raise MissingDataError("no data attached to model")
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:165, in Model.unmangle(self, force, structure_only)
163 try:
164 setattr(self, marker, True)
--> 165 super().unmangle(force=force, structure_only=structure_only)
166 for mix in self.mixtures:
167 mix.prep(self._parameter_bucket)
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1201, in NumbaModel.unmangle(self, force, structure_only)
1199 if not structure_only:
1200 if self._dataset is None or force:
-> 1201 self.reflow_data_arrays()
1202 if self._fixed_arrays is None or force:
1203 self._rebuild_fixed_arrays()
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:177, in Model.reflow_data_arrays(self)
175 """Reload the internal data_arrays so they are consistent with the datatree."""
176 if self.compute_engine != "jax":
--> 177 return super().reflow_data_arrays()
179 if self.graph is None:
180 self._data_arrays = None
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1067, in NumbaModel.reflow_data_arrays(self)
1064 from .data_arrays import prepare_data
1066 logger.debug(f"Model.datatree.cache_dir = {datatree.cache_dir}")
-> 1067 self.dataset, self.dataflows = prepare_data(
1068 datasource=datatree,
1069 request=self,
1070 float_dtype=self.float_dtype,
1071 cache_dir=datatree.cache_dir,
1072 flows=self.dataflows,
1073 make_unused_flows=self.use_streaming,
1074 )
1075 if self.use_streaming:
1076 # when streaming the dataset created above is a vestigial
1077 # one-case dataset, really we just want the flows, so we
1078 # get rid of the dataset now
1079 self._dataset = None
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:172, in prepare_data(datasource, request, float_dtype, cache_dir, flows, make_unused_flows)
170 casealt_dim = datatree.root_dataset.attrs.get(_CASEALT)
171 if casealt_dim is None:
--> 172 model_dataset, flows["ca"] = _prep_ca(
173 model_dataset,
174 datatree,
175 request["ca"],
176 tag="ca",
177 dtype=float_dtype,
178 cache_dir=cache_dir,
179 flow=flows.get("ca"),
180 )
181 else:
182 model_dataset, flows["ce"] = _prep_ce(
183 model_dataset,
184 datatree,
(...)
188 flow=flows.get("ce"),
189 )
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:488, in _prep_ca(model_dataset, shared_data_ca, vars_ca, tag, preserve_vars, dtype, cache_dir, flow, force_flow, use_array_maker)
485 except NameError:
486 # the original resolution of the flow failed, try again with a fresh flow
487 flow = shared_data_ca.setup_flow(vars_ca, cache_dir=cache_dir, hashing_level=2)
--> 488 arr = flow.load(
489 shared_data_ca,
490 dtype=dtype,
491 use_array_maker=use_array_maker,
492 )
494 caseid_dim = shared_data_ca.CASEID
495 altid_dim = shared_data_ca.ALTID
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2667, in Flow.load(self, source, dtype, compile_watch, mask, use_array_maker)
2665 if use_array_maker:
2666 runner = self._module.array_maker
-> 2667 return self._load(
2668 source=source,
2669 dtype=dtype,
2670 compile_watch=compile_watch,
2671 mask=mask,
2672 runner=runner,
2673 )
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2508, in Flow._load(self, source, as_dataframe, as_dataarray, as_table, runner, dtype, dot, logit_draws, pick_counted, compile_watch, logsums, nesting, mask)
2506 if source.relationships_are_digitized:
2507 if logit_draws is None:
-> 2508 result = self._iload_raw(
2509 source,
2510 runner=runner,
2511 dtype=dtype,
2512 dot=dot,
2513 mask=mask,
2514 compile_watch=compile_watch,
2515 )
2516 else:
2517 result, result_p, pick_count, out_logsum = self._iload_raw(
2518 source,
2519 runner=runner,
(...)
2527 compile_watch=compile_watch,
2528 )
File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2212, in Flow._iload_raw(self, rg, runner, dtype, dot, mnl, pick_counted, logsums, nesting, mask, compile_watch)
2210 problem = re.search("NameError: (.*)\x1b", err.args[0])
2211 if problem:
-> 2212 raise NameError(problem.group(1)) from err
2213 problem = re.search("NameError: (.*)\n", err.args[0])
2214 if problem:
NameError: name 'df' is not defined
To Reproduce
Steps to reproduce the behavior:
- Go to '...'
- Click on '....'
- Scroll down to '....'
- See error
Expected behavior
Estimation should not crash if the variable is in the EDB.
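One diagnostic that might help narrow this down: since the compiled function name in the traceback still contains `df`, the `df[...]` wrapper appears to reach Sharrow unmodified. The sketch below rewrites `df['column']` references to bare column names so the same term can be retried without the wrapper. The `strip_df_wrapper` helper is hypothetical, and whether Sharrow resolves the bare name in this setup is an untested assumption, not a confirmed fix.

```python
import re

# Matches df['name'] or df["name"] where name is a plain identifier.
_DF_ITEM = re.compile(r"""\bdf\[\s*(['"])([A-Za-z_]\w*)\1\s*\]""")


def strip_df_wrapper(expression):
    """Replace df['name'] with the bare identifier name.

    Hypothetical helper for experimenting with spec expressions;
    it is not part of ActivitySim, Larch, or Sharrow.
    """
    return _DF_ITEM.sub(r"\2", expression)


rewritten = strip_df_wrapper("@(df['income_segment']==1) * _DIST")
print(rewritten)
```

If the rewritten term estimates cleanly while the original does not, that would point to the handling of `df` in the flow setup rather than to the data in the EDB.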