generating official database prototype #29

@Louise-Seamster

Some decisions made today:

  1. The "primary key" will be the timestamp. Even though it is not a fully unique ID, it best matches the most likely query (i.e., people are more likely to want to sort by date than by page of the PDF).
  2. Type A duplicate column: if a timestamp/sender combo appears in another textfile with a 100% match of timestamps/senders (a type A duplicate), this cell will hold the textfile name of the "preferred" iteration of the duplicate (see this issue for the process of generating preferred versions).
  3. Type B duplicate column: if a timestamp/sender combo appears in any other textfile (it does not need to be a total match), this cell will hold the textfile name of the "preferred" iteration of the duplicate.
  4. Queries will be able to pull only the preferred version by suppressing any row in which the textfile name in the duplicate column (A or B) does not match the row's own textfile name.
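
A rough sketch of how the suppression rule in point 4 might look as a query, using Python's built-in sqlite3 module. The table name `messages`, the column names (`textfile`, `dup_a_preferred`, `dup_b_preferred`), and the sample rows are all made up for illustration and are not taken from the sample spreadsheet. It also assumes that a row with no duplicate has an empty (NULL) duplicate cell and should be kept. Because the timestamp is not fully unique, the sketch indexes it for date sorting rather than declaring it a strict PRIMARY KEY.

```python
import sqlite3

# Hypothetical column names; only these two may be used as the duplicate filter.
DUP_COLUMNS = ("dup_a_preferred", "dup_b_preferred")


def build_demo_db():
    """Create an in-memory table with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE messages (
            timestamp       TEXT NOT NULL,  -- sort key, not guaranteed unique
            sender          TEXT NOT NULL,
            textfile        TEXT NOT NULL,  -- file this row was extracted from
            dup_a_preferred TEXT,           -- preferred file for a type A duplicate
            dup_b_preferred TEXT            -- preferred file for a type B duplicate
        )
        """
    )
    # Index instead of PRIMARY KEY, since timestamps can repeat.
    conn.execute("CREATE INDEX idx_messages_timestamp ON messages (timestamp)")
    rows = [
        ("2016-01-05 09:14", "alice", "file_01.txt", None, None),
        ("2016-01-06 10:30", "bob", "file_01.txt", "file_02.txt", "file_02.txt"),
        ("2016-01-06 10:30", "bob", "file_02.txt", "file_02.txt", "file_02.txt"),
        ("2016-01-07 16:45", "carol", "file_03.txt", None, "file_03.txt"),
        ("2016-01-07 16:45", "carol", "file_04.txt", None, "file_03.txt"),
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def preferred_rows(conn, dup_column="dup_b_preferred"):
    """Return rows sorted by date, suppressing non-preferred duplicates.

    A row is kept when its duplicate cell is empty (assumed to mean
    "no duplicate") or names the row's own textfile.
    """
    if dup_column not in DUP_COLUMNS:
        raise ValueError(f"unknown duplicate column: {dup_column}")
    query = f"""
        SELECT timestamp, sender, textfile
        FROM messages
        WHERE {dup_column} IS NULL OR {dup_column} = textfile
        ORDER BY timestamp, textfile
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in preferred_rows(conn, "dup_b_preferred"):
        print(row)
```

Filtering on the type B column is the stricter choice here, since any overlap with another textfile suppresses the non-preferred copy. Filtering on the type A column only suppresses rows from textfiles that fully match another one.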

sample database format 072920.xlsx
