Open
Description
The schema of the 2020 Blob Dataset presents AnonFunctionInvocationId and AnonAppName as unique IDs.
However, some invocation IDs appear under more than one application name. For example:
full_df[full_df['AnonFunctionInvocationId'] == 1967128581]
| Timestamp | AnonRegion | AnonUserId | AnonAppName | AnonFunctionInvocationId | AnonBlobName | BlobType | AnonBlobETag | BlobBytes | Read | Write | Datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1606814873193 | q2d | 1209884869 | 01qqaww4 | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | f1x5p2nqh6 | 28.0 | True | False | 2020-12-01 09:27:53.193 |
| 1607004493391 | q2d | 1209884869 | j2alqt8s | 1967128581 | 1wx5dgohq1kiwjum | BlockBlob/text/plain; charset=utf-8 | w5mohi6523 | 28.0 | True | False | 2020-12-03 14:08:13.391 |
This seems to be a recurring pattern for this user; see, for example, AnonFunctionInvocationId values 830734703, 440926898, and 900464655.
Because it recurs, I doubt the cause is an unlucky collision between prefixes of hashed IDs. Is there any way to explain this discrepancy, other than the data potentially being unclean?
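To gauge how widespread this is, here is a minimal pandas sketch that lists every invocation ID appearing under more than one app name. The column names come from the table above; the helper name `find_multi_app_invocations`, the `sample_df` frame, and its third row are made up for illustration, so substitute the real `full_df`.

```python
import pandas as pd


def find_multi_app_invocations(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per invocation ID that occurs under more than one app name."""
    per_id = df.groupby("AnonFunctionInvocationId").agg(
        n_apps=("AnonAppName", "nunique"),    # distinct app names per invocation ID
        n_users=("AnonUserId", "nunique"),    # distinct users per invocation ID
        first_seen=("Timestamp", "min"),      # earliest access (epoch milliseconds)
        last_seen=("Timestamp", "max"),       # latest access (epoch milliseconds)
    )
    return per_id[per_id["n_apps"] > 1].reset_index()


# Toy data: the first two rows are copied from the table above;
# the third row is hypothetical and only serves as a non-conflicting case.
sample_df = pd.DataFrame(
    {
        "Timestamp": [1606814873193, 1607004493391, 1606814900000],
        "AnonUserId": [1209884869, 1209884869, 1209884869],
        "AnonAppName": ["01qqaww4", "j2alqt8s", "01qqaww4"],
        "AnonFunctionInvocationId": [1967128581, 1967128581, 123456789],
    }
)

conflicts = find_multi_app_invocations(sample_df)
print(conflicts)
```

If `n_users` is 1 for most of the flagged IDs, the duplicates are confined to single users. That would be consistent with the pattern described above, although it would not by itself explain the cause.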