Hello,
After executing a new test suite using Trevas-VTL 1.10.0, we encountered several behaviors related to the keep function that seem to contradict the VTL 2.0 specification. This issue describes three cases with examples and expected vs actual behavior.
✅ Specification Reference
According to the VTL 2.0 standard implemented in Trevas (the wording is the same in VTL 2.1):
The operator takes as input a Data Set (op) and some Component names of such a Data Set (comp). These Components can be Measures or Attributes of op but not Identifiers. The operator maintains the specified Components, drops all the other dependent Components of the Data Set (Measures and Attributes), and maintains the independent Components (Identifiers) unchanged.
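To make our reading of this paragraph concrete, here is a minimal Python sketch of the behavior we expected from keep. It is only a model of the specification text, not Trevas code: the `keep` function, the role constants, and the dictionary representation of a dataset structure are all hypothetical, and we assume col_C1 and col_D1 are measures since the examples below do not state their roles.

```python
# Hypothetical model of the keep semantics quoted above; not Trevas code.
IDENTIFIER, MEASURE, ATTRIBUTE = "identifier", "measure", "attribute"


def keep(structure, comps):
    """Apply keep to a dataset structure.

    structure: dict mapping component name -> role, in column order.
    comps: component names passed to keep.
    Returns the structure of the resulting dataset.
    """
    for name in comps:
        if name not in structure:  # exact, case-sensitive match
            raise ValueError(f"unknown component: {name}")
        if structure[name] == IDENTIFIER:
            raise ValueError(f"identifier not allowed in keep: {name}")
    wanted = set(comps)
    # Identifiers are always maintained; other components only if listed.
    return {
        name: role
        for name, role in structure.items()
        if role == IDENTIFIER or name in wanted
    }


# Assumption: col_C1 and col_D1 are measures.
ds1 = {
    "col_A": IDENTIFIER,
    "col_B": IDENTIFIER,
    "col_C1": MEASURE,
    "col_D1": MEASURE,
}
print(list(keep(ds1, ["col_C1"])))  # ['col_A', 'col_B', 'col_C1']
```

Under this reading, `keep(ds1, ["col_A", "col_C1"])` raises an error because col_A is an identifier, and `keep(ds1, ["COL_C1"])` raises an error because no component has that exact name. The three cases below compare this expectation with what the engine does.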
Case 1 – keep drops identifiers from the original dataset
Input:
ds1:
col_A (identifier) | col_B (identifier) | col_C1 | col_D1
ds2:
col_A (identifier) | col_B (identifier) | col_C2 | col_D2
VTL:
ds1_aux := ds1[keep col_C1, col_D1]
[rename col_C1 to C1, col_D1 to D1];
ds2_aux := ds2[keep col_C2, col_D2]
[rename col_C2 to C2, col_D2 to D2];
ds_result := left_join(ds1_aux, ds2_aux using col_A, col_B);
Expected:
Identifiers col_A and col_B should be preserved after keep, per VTL 2.0 specification.
Actual:
The following runtime error is raised:
fr.insee.vtl.engine.exceptions.InvalidArgumentException: using component col_A is not present in all datasets
This suggests that identifiers are not retained as specified.
Case 2 – keep allows explicitly selecting identifiers (should be invalid)
Input:
ds1:
col_A (identifier) | col_B (identifier) | col_C1 | col_D1
ds2:
col_A (identifier) | col_B (identifier) | col_C2 | col_D2
VTL:
ds1_aux := ds1[keep col_A, col_B, col_C1, col_D1]
[rename col_C1 to C1, col_D1 to D1];
ds2_aux := ds2[keep col_A, col_B, col_C2, col_D2]
[rename col_C2 to C2, col_D2 to D2];
ds_result := left_join(ds1_aux, ds2_aux using col_A, col_B);
Observation:
Although the VTL 2.0 specification states that identifiers should not be explicitly selected in keep, the engine accepts this syntax and runs successfully. This might lead to ambiguity or unintended side effects.
Case 3 – keep + rename silently fail on invalid (case-sensitive) column names
Input:
ds1:
col_A (identifier) | col_B (identifier) | col_C1 (identifier) | col_D1
ds2:
col_A (identifier) | col_B (identifier) | col_C2 (identifier) | col_D2
VTL:
ds1_aux := ds1[keep col_A, col_B, COL_C1, col_D1]
[rename COL_C1 to C, col_D1 to D1];
ds2_aux := ds2[keep col_A, col_B, COL_C2, col_D2]
[rename COL_C2 to C, col_D2 to D2];
ds_result := left_join(ds1_aux, ds2_aux using col_A, col_B, C);
Expected:
An error should be thrown in keep or rename because COL_C1 and COL_C2 are not valid column names (Trevas is case-sensitive).
Actual:
No error occurs until the left_join, which fails with:
InvalidArgumentException: using component C is not present in all datasets
This raises two concerns:
- keep and rename do not check that the referenced components exist: a name that differs only in case (COL_C1 vs col_C1) is accepted without error.
- If the left_join is removed, the script runs without any runtime error, despite the invalid components.
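The kind of check we had in mind for these two concerns could look like the following Python sketch. It is purely illustrative: `check_components` is a hypothetical helper, not part of Trevas, and the "did you mean" hint is our own suggestion rather than anything the specification requires.

```python
# Hypothetical validation helper; not part of Trevas.
def check_components(available, requested):
    """Return one error message per requested name that is not an exact match."""
    # Index the real names by lowercase form so a near-miss can be suggested.
    by_lower = {name.lower(): name for name in available}
    errors = []
    for name in requested:
        if name in available:  # exact, case-sensitive match
            continue
        hint = by_lower.get(name.lower())
        if hint is not None:
            errors.append(f"unknown component {name} (did you mean {hint}?)")
        else:
            errors.append(f"unknown component {name}")
    return errors


ds1_columns = ["col_A", "col_B", "col_C1", "col_D1"]
print(check_components(ds1_columns, ["col_A", "col_B", "COL_C1", "col_D1"]))
# ['unknown component COL_C1 (did you mean col_C1?)']
```

Running such a check when keep or rename is evaluated would surface the problem at the statement that contains the typo, instead of at a later left_join or not at all.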
Questions
- Is Trevas-VTL intended to be case-sensitive regarding component names?
- Should the keep and rename functions validate the existence of components (with case-sensitivity)?
- Could the keep behavior be aligned more strictly with the VTL 2.0 specification regarding identifier preservation and selection?
Thank you in advance :)
Best regards,
Miguel