Estimation fails on new expressions with a "df is not defined" error #1024

@i-am-sijia

Describe the bug
I am adding new expressions to ActivitySim model specs in order to estimate models, but estimation crashes with NameError: name 'df' is not defined. The error is raised in Sharrow; see the full traceback below. (Possibly this is because the new expressions are not in the Sharrow flow?)

For example, adding a new distance interaction term with household income (util_dist_income_under_50) to school_location_SPEC.csv, excerpted below, causes the estimation to crash. I've confirmed that income_segment is in the chooser data in the estimation data bundle (EDB). There is nothing special about the new expression; it uses the same syntax as the existing ones.

| Label | Description | Expression | highschool |
|---|---|---|---|
| # Existing expressions | | | |
| util_dist_part_time | Distance,part time | @(df['pemploy']==2) * _DIST | coef_highschool_dist_part_time |
| ... | | | |
| # New expressions | | | |
| util_dist_income_under_50 | Distance, income under 50k | @(df['income_segment']==1) * _DIST | coef_highschool_dist_income_under_50 |

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
NameError: name 'df' is not defined
During: Pass nopython_type_inference
During: resolving callee type: type(CPUDispatcher(<function __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P at 0x000001D72A5C7EB0>))
During: typing of call at C:\Users\swang\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py (772)

File "..\..\..\..\..\..\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py", line 772:
def __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P_dim3_filler(
    <source elided>
        for j1 in range(result.shape[1]):
            result[j0, j1, col_num] = __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P(j0,  result[j0, j1, :], __main__school_segment)
            ^

During: Pass nopython_type_inference

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
Cell In[47], line 11
      3 from activitysim.estimation.larch import component_model
      4 model, data = component_model(
      5     modelname, 
      6     edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_alpha_w_preschool_segment",
      7     # edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_blind",
      8     return_data=True
      9 )
---> 11 model.estimate(method="BHHH", options={"maxiter": 1000})
     12 model.estimation_statistics()
     15 from activitysim.estimation.larch import update_coefficients, update_size_spec

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1155, in Model.estimate(self, *args, **kwargs)
   1141 def estimate(self, *args, **kwargs):
   1142     """
   1143     Maximize loglike, and then calculate parameter covariance.
   1144 
   (...)
   1153     dictx
   1154     """
-> 1155     result = self.maximize_loglike(*args, **kwargs)
   1156     self.calculate_parameter_covariance()
   1157     return result

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1139, in Model.maximize_loglike(self, *args, **kwargs)
   1137     return self.jax_maximize_loglike(*args, **kwargs)
   1138 else:
-> 1139     return super().maximize_loglike(*args, **kwargs)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:2484, in NumbaModel.maximize_loglike(self, *args, **kwargs)
   2455 """
   2456 Maximize the log likelihood.
   2457 
   (...)
   2480 
   2481 """
   2482 from .optimization import maximize_loglike
-> 2484 return maximize_loglike(self, *args, **kwargs)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\optimization.py:125, in maximize_loglike(model, method, method2, quiet, screen_update_throttle, final_screen_update, check_for_overspecification, return_tags, reuse_tags, iteration_number, iteration_number_tail, options, maxiter, bhhh_start, jumpstart, jumpstart_split, return_dashboard, dashboard, prior_result, stderr, **kwargs)
    120 if isinstance(model, NumbaModel):
    121     if (
    122         getattr(model, "data_as_loaded", None) is None
    123         and getattr(model, "datatree", None) is not None
    124     ):
--> 125         model.unmangle(force=True)
    126     if (
    127         getattr(model, "data_as_loaded", None) is None
    128         and not model.use_streaming
    129     ):
    130         raise MissingDataError("no data attached to model")

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:165, in Model.unmangle(self, force, structure_only)
    163 try:
    164     setattr(self, marker, True)
--> 165     super().unmangle(force=force, structure_only=structure_only)
    166     for mix in self.mixtures:
    167         mix.prep(self._parameter_bucket)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1201, in NumbaModel.unmangle(self, force, structure_only)
   1199 if not structure_only:
   1200     if self._dataset is None or force:
-> 1201         self.reflow_data_arrays()
   1202     if self._fixed_arrays is None or force:
   1203         self._rebuild_fixed_arrays()

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:177, in Model.reflow_data_arrays(self)
    175 """Reload the internal data_arrays so they are consistent with the datatree."""
    176 if self.compute_engine != "jax":
--> 177     return super().reflow_data_arrays()
    179 if self.graph is None:
    180     self._data_arrays = None

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1067, in NumbaModel.reflow_data_arrays(self)
   1064 from .data_arrays import prepare_data
   1066 logger.debug(f"Model.datatree.cache_dir = {datatree.cache_dir}")
-> 1067 self.dataset, self.dataflows = prepare_data(
   1068     datasource=datatree,
   1069     request=self,
   1070     float_dtype=self.float_dtype,
   1071     cache_dir=datatree.cache_dir,
   1072     flows=self.dataflows,
   1073     make_unused_flows=self.use_streaming,
   1074 )
   1075 if self.use_streaming:
   1076     # when streaming the dataset created above is a vestigial
   1077     # one-case dataset, really we just want the flows, so we
   1078     # get rid of the dataset now
   1079     self._dataset = None

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:172, in prepare_data(datasource, request, float_dtype, cache_dir, flows, make_unused_flows)
    170 casealt_dim = datatree.root_dataset.attrs.get(_CASEALT)
    171 if casealt_dim is None:
--> 172     model_dataset, flows["ca"] = _prep_ca(
    173         model_dataset,
    174         datatree,
    175         request["ca"],
    176         tag="ca",
    177         dtype=float_dtype,
    178         cache_dir=cache_dir,
    179         flow=flows.get("ca"),
    180     )
    181 else:
    182     model_dataset, flows["ce"] = _prep_ce(
    183         model_dataset,
    184         datatree,
   (...)
    188         flow=flows.get("ce"),
    189     )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:488, in _prep_ca(model_dataset, shared_data_ca, vars_ca, tag, preserve_vars, dtype, cache_dir, flow, force_flow, use_array_maker)
    485 except NameError:
    486     # the original resolution of the flow failed, try again with a fresh flow
    487     flow = shared_data_ca.setup_flow(vars_ca, cache_dir=cache_dir, hashing_level=2)
--> 488     arr = flow.load(
    489         shared_data_ca,
    490         dtype=dtype,
    491         use_array_maker=use_array_maker,
    492     )
    494 caseid_dim = shared_data_ca.CASEID
    495 altid_dim = shared_data_ca.ALTID

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2667, in Flow.load(self, source, dtype, compile_watch, mask, use_array_maker)
   2665 if use_array_maker:
   2666     runner = self._module.array_maker
-> 2667 return self._load(
   2668     source=source,
   2669     dtype=dtype,
   2670     compile_watch=compile_watch,
   2671     mask=mask,
   2672     runner=runner,
   2673 )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2508, in Flow._load(self, source, as_dataframe, as_dataarray, as_table, runner, dtype, dot, logit_draws, pick_counted, compile_watch, logsums, nesting, mask)
   2506 if source.relationships_are_digitized:
   2507     if logit_draws is None:
-> 2508         result = self._iload_raw(
   2509             source,
   2510             runner=runner,
   2511             dtype=dtype,
   2512             dot=dot,
   2513             mask=mask,
   2514             compile_watch=compile_watch,
   2515         )
   2516     else:
   2517         result, result_p, pick_count, out_logsum = self._iload_raw(
   2518             source,
   2519             runner=runner,
   (...)
   2527             compile_watch=compile_watch,
   2528         )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2212, in Flow._iload_raw(self, rg, runner, dtype, dot, mnl, pick_counted, logsums, nesting, mask, compile_watch)
   2210 problem = re.search("NameError: (.*)\x1b", err.args[0])
   2211 if problem:
-> 2212     raise NameError(problem.group(1)) from err
   2213 problem = re.search("NameError: (.*)\n", err.args[0])
   2214 if problem:

NameError: name 'df' is not defined
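
The last frame shows why the notebook reports a plain NameError rather than the numba TypingError: the compiler message is searched for a "NameError: ..." line and that text is re-raised. Below is a minimal sketch of that pattern, based only on the source lines visible in the traceback. The function name, the stand-in exception class, and the fallback behavior are assumptions for illustration, not Sharrow's actual code.

```python
import re


class FakeTypingError(Exception):
    """Hypothetical stand-in for the compiler's TypingError."""


def reraise_embedded_name_error(err):
    """Re-raise a NameError embedded in a compiler error message.

    Mirrors the two regex searches visible in the traceback: first a
    NameError line terminated by an ANSI escape, then one terminated
    by a newline. What happens when neither matches is an assumption.
    """
    message = str(err.args[0]) if err.args else ""
    for pattern in ("NameError: (.*)\x1b", "NameError: (.*)\n"):
        problem = re.search(pattern, message)
        if problem:
            raise NameError(problem.group(1)) from err
    raise err  # assumed fallback: propagate the original error


message = (
    "Failed in nopython mode pipeline (step: nopython frontend)\n"
    "NameError: name 'df' is not defined\n"
    "During: Pass nopython_type_inference\n"
)
try:
    reraise_embedded_name_error(FakeTypingError(message))
except NameError as exc:
    print(f"NameError: {exc}")
```

So the NameError seen at the top level is a summary of the compile failure: the name df was not available inside the compiled expression function, which says nothing about whether the column exists in the data.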

Expected behavior
Estimation should not crash if the variable is in the EDB.
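
To double-check the precondition ("the variable is in the EDB") independently of Larch and Sharrow, one option is to pull the df['...'] column references out of a spec expression and compare them with the chooser columns. This is a rough sketch using only the standard library. The helper names and the example column list are hypothetical, and the regex only covers the simple df['name'] form used in the spec excerpt.

```python
import re

# Matches df['name'] or df["name"] references inside a spec expression.
_DF_COLUMN = re.compile(r"""df\[\s*['"]([^'"]+)['"]\s*\]""")


def referenced_columns(expression):
    """Return the set of column names referenced as df['...']."""
    return set(_DF_COLUMN.findall(expression))


def missing_columns(expression, available_columns):
    """Return referenced columns that are absent from the chooser data."""
    return sorted(referenced_columns(expression) - set(available_columns))


# Hypothetical chooser columns; in practice, read the header of the
# chooser table in the estimation data bundle.
chooser_columns = ["person_id", "pemploy", "income_segment"]
new_expression = "@(df['income_segment']==1) * _DIST"

print(missing_columns(new_expression, chooser_columns))
```

An empty list here means every df[...] column in the expression is present in the chooser data, which is consistent with the crash coming from how the expression is compiled rather than from missing data.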
