Estimation fails on new expressions with a `df is not defined` error

**Describe the bug**
I am trying to add new expressions to ActivitySim model specs to estimate models, but estimation crashes with `NameError: name 'df' is not defined`. The error originates in Sharrow; the full traceback is below. (_This seems to be because the new expressions are not in the Sharrow flow?_)

For example, adding a new distance interaction term with household income (`util_dist_income_under_50`) to `school_location_SPEC.csv`, as shown below, causes the estimation to crash. I've confirmed that `income_segment` is in the chooser data in the estimation data bundle (EDB). There is nothing special about the new expression; it uses the same syntax as the existing ones.

| Label | Description | Expression | highschool |
|--|--|--|--|
| # Existing expressions | | | |
| util_dist_part_time | Distance,part time | @(df['pemploy']==2) * _DIST | coef_highschool_dist_part_time |
| ... | | | |
| # New expressions | | | |
| util_dist_income_under_50 | Distance, income under 50k | @(df['income_segment']==1) * _DIST | coef_highschool_dist_income_under_50 |
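
As a rough illustration of the failure mode I suspect, here is a plain-Python sketch. It uses `eval`, not Sharrow's actual numba compilation, and the row values and the `evaluate` helper are made up for the example. The same expression text resolves when a `df` name is in scope and raises the same `NameError` message when it is not.

```python
# Illustration only: plain-Python eval, not Sharrow's numba compilation path.
# Spec expression from the table, without the leading '@'.
EXPRESSION = "(df['income_segment']==1) * _DIST"


def evaluate(expression, namespace):
    """Evaluate an expression string against an explicit namespace (hypothetical helper)."""
    # Builtins are disabled so only the names passed in can be resolved.
    return eval(expression, {"__builtins__": {}}, dict(namespace))


# Hypothetical single-row stand-ins for the chooser data and the distance term.
row = {"income_segment": 1, "pemploy": 3}
with_df = evaluate(EXPRESSION, {"df": row, "_DIST": 2.5})

try:
    evaluate(EXPRESSION, {"_DIST": 2.5})  # 'df' deliberately left out of the namespace
    missing_df_message = None
except NameError as err:
    missing_df_message = str(err)

print(with_df)
print(missing_df_message)
```

This only shows that the message depends on whether `df` is in scope when the expression is evaluated; it says nothing about where Sharrow or Larch would need to provide it.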

```python
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
NameError: name 'df' is not defined
During: Pass nopython_type_inference
During: resolving callee type: type(CPUDispatcher(<function __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P at 0x000001D72A5C7EB0>))
During: typing of call at C:\Users\swang\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py (772)

File "..\..\..\..\..\..\AppData\Local\Temp\2\tmpqj8f8t_o\flow_EXAVYU5ZZQI3B33FK2KRVE25DR4RIWIN\__init__.py", line 772:
def __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P_dim3_filler(
    <source elided>
        for j1 in range(result.shape[1]):
            result[j0, j1, col_num] = __df__income_segment____1_____DIST___school_segment__1__Z6OPND5R6KSVEQXYNGZZGC4P(j0,  result[j0, j1, :], __main__school_segment)
            ^

During: Pass nopython_type_inference

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
Cell In[47], line 11
      3 from activitysim.estimation.larch import component_model
      4 model, data = component_model(
      5     modelname, 
      6     edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_alpha_w_preschool_segment",
      7     # edb_directory=r"P:\Dev\vMVP\Estimation\school_location\v02_alpha\edb_blind",
      8     return_data=True
      9 )
---> 11 model.estimate(method="BHHH", options={"maxiter": 1000})
     12 model.estimation_statistics()
     15 from activitysim.estimation.larch import update_coefficients, update_size_spec

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1155, in Model.estimate(self, *args, **kwargs)
   1141 def estimate(self, *args, **kwargs):
   1142     """
   1143     Maximize loglike, and then calculate parameter covariance.
   1144 
   (...)
   1153     dictx
   1154     """
-> 1155     result = self.maximize_loglike(*args, **kwargs)
   1156     self.calculate_parameter_covariance()
   1157     return result

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:1139, in Model.maximize_loglike(self, *args, **kwargs)
   1137     return self.jax_maximize_loglike(*args, **kwargs)
   1138 else:
-> 1139     return super().maximize_loglike(*args, **kwargs)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:2484, in NumbaModel.maximize_loglike(self, *args, **kwargs)
   2455 """
   2456 Maximize the log likelihood.
   2457 
   (...)
   2480 
   2481 """
   2482 from .optimization import maximize_loglike
-> 2484 return maximize_loglike(self, *args, **kwargs)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\optimization.py:125, in maximize_loglike(model, method, method2, quiet, screen_update_throttle, final_screen_update, check_for_overspecification, return_tags, reuse_tags, iteration_number, iteration_number_tail, options, maxiter, bhhh_start, jumpstart, jumpstart_split, return_dashboard, dashboard, prior_result, stderr, **kwargs)
    120 if isinstance(model, NumbaModel):
    121     if (
    122         getattr(model, "data_as_loaded", None) is None
    123         and getattr(model, "datatree", None) is not None
    124     ):
--> 125         model.unmangle(force=True)
    126     if (
    127         getattr(model, "data_as_loaded", None) is None
    128         and not model.use_streaming
    129     ):
    130         raise MissingDataError("no data attached to model")

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:165, in Model.unmangle(self, force, structure_only)
    163 try:
    164     setattr(self, marker, True)
--> 165     super().unmangle(force=force, structure_only=structure_only)
    166     for mix in self.mixtures:
    167         mix.prep(self._parameter_bucket)

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1201, in NumbaModel.unmangle(self, force, structure_only)
   1199 if not structure_only:
   1200     if self._dataset is None or force:
-> 1201         self.reflow_data_arrays()
   1202     if self._fixed_arrays is None or force:
   1203         self._rebuild_fixed_arrays()

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\jaxmodel.py:177, in Model.reflow_data_arrays(self)
    175 """Reload the internal data_arrays so they are consistent with the datatree."""
    176 if self.compute_engine != "jax":
--> 177     return super().reflow_data_arrays()
    179 if self.graph is None:
    180     self._data_arrays = None

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\numbamodel.py:1067, in NumbaModel.reflow_data_arrays(self)
   1064 from .data_arrays import prepare_data
   1066 logger.debug(f"Model.datatree.cache_dir = {datatree.cache_dir}")
-> 1067 self.dataset, self.dataflows = prepare_data(
   1068     datasource=datatree,
   1069     request=self,
   1070     float_dtype=self.float_dtype,
   1071     cache_dir=datatree.cache_dir,
   1072     flows=self.dataflows,
   1073     make_unused_flows=self.use_streaming,
   1074 )
   1075 if self.use_streaming:
   1076     # when streaming the dataset created above is a vestigial
   1077     # one-case dataset, really we just want the flows, so we
   1078     # get rid of the dataset now
   1079     self._dataset = None

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:172, in prepare_data(datasource, request, float_dtype, cache_dir, flows, make_unused_flows)
    170 casealt_dim = datatree.root_dataset.attrs.get(_CASEALT)
    171 if casealt_dim is None:
--> 172     model_dataset, flows["ca"] = _prep_ca(
    173         model_dataset,
    174         datatree,
    175         request["ca"],
    176         tag="ca",
    177         dtype=float_dtype,
    178         cache_dir=cache_dir,
    179         flow=flows.get("ca"),
    180     )
    181 else:
    182     model_dataset, flows["ce"] = _prep_ce(
    183         model_dataset,
    184         datatree,
   (...)
    188         flow=flows.get("ce"),
    189     )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\larch\model\data_arrays.py:488, in _prep_ca(model_dataset, shared_data_ca, vars_ca, tag, preserve_vars, dtype, cache_dir, flow, force_flow, use_array_maker)
    485 except NameError:
    486     # the original resolution of the flow failed, try again with a fresh flow
    487     flow = shared_data_ca.setup_flow(vars_ca, cache_dir=cache_dir, hashing_level=2)
--> 488     arr = flow.load(
    489         shared_data_ca,
    490         dtype=dtype,
    491         use_array_maker=use_array_maker,
    492     )
    494 caseid_dim = shared_data_ca.CASEID
    495 altid_dim = shared_data_ca.ALTID

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2667, in Flow.load(self, source, dtype, compile_watch, mask, use_array_maker)
   2665 if use_array_maker:
   2666     runner = self._module.array_maker
-> 2667 return self._load(
   2668     source=source,
   2669     dtype=dtype,
   2670     compile_watch=compile_watch,
   2671     mask=mask,
   2672     runner=runner,
   2673 )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2508, in Flow._load(self, source, as_dataframe, as_dataarray, as_table, runner, dtype, dot, logit_draws, pick_counted, compile_watch, logsums, nesting, mask)
   2506 if source.relationships_are_digitized:
   2507     if logit_draws is None:
-> 2508         result = self._iload_raw(
   2509             source,
   2510             runner=runner,
   2511             dtype=dtype,
   2512             dot=dot,
   2513             mask=mask,
   2514             compile_watch=compile_watch,
   2515         )
   2516     else:
   2517         result, result_p, pick_count, out_logsum = self._iload_raw(
   2518             source,
   2519             runner=runner,
   (...)
   2527             compile_watch=compile_watch,
   2528         )

File c:\Users\swang\Documents\DevOps\SAM\venvs\asim_estimation\.venv\lib\site-packages\sharrow\flows.py:2212, in Flow._iload_raw(self, rg, runner, dtype, dot, mnl, pick_counted, logsums, nesting, mask, compile_watch)
   2210 problem = re.search("NameError: (.*)\x1b", err.args[0])
   2211 if problem:
-> 2212     raise NameError(problem.group(1)) from err
   2213 problem = re.search("NameError: (.*)\n", err.args[0])
   2214 if problem:

NameError: name 'df' is not defined
```

**To Reproduce**
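Steps to reproduce the behavior:
1. Add the new `util_dist_income_under_50` row shown above to `school_location_SPEC.csv`.
2. Load the model with `activitysim.estimation.larch.component_model`, pointing `edb_directory` at the EDB and passing `return_data=True`.
3. Call `model.estimate(method="BHHH", options={"maxiter": 1000})`.
4. See the `NameError: name 'df' is not defined` error.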

**Expected behavior**
Estimation should not crash if the variable is in the EDB.
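
A check that every `df[...]` column referenced by a spec expression is present in the chooser data could look roughly like the sketch below. The helper names and the in-memory CSV are hypothetical stand-ins, not ActivitySim or Larch APIs, and the regex only covers the simple `df['name']` form used in the expressions above.

```python
import csv
import io
import re

# Matches column references of the form df['name'] or df["name"].
DF_COLUMN = re.compile(r"""df\[\s*['"]([^'"]+)['"]\s*\]""")


def referenced_columns(expression):
    """Return the column names an expression reads from df (hypothetical helper)."""
    return DF_COLUMN.findall(expression)


def missing_columns(expression, available):
    """Return referenced columns that are not among the available column names."""
    available = set(available)
    return [col for col in referenced_columns(expression) if col not in available]


# Hypothetical stand-in for the chooser file in the EDB; only the header matters here.
chooser_csv = io.StringIO("person_id,pemploy,income_segment\n1,2,1\n2,3,4\n")
columns = csv.DictReader(chooser_csv).fieldnames

new_expr = "@(df['income_segment']==1) * _DIST"
print(missing_columns(new_expr, columns))                    # columns missing for the new expression
print(missing_columns("@df['not_there'] * _DIST", columns))  # a deliberately absent column
```

With the real chooser file, an empty result for the new expression would match what I described: the variable is present, yet estimation still fails.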



Estimation fails on new expressions with an `df is not defined` error #1024
