Decide on metadata header (or line format) for ABIF #6

@robla

A couple of weeks ago (in May 2021), @cpsolver wrote a message to the EM-list that I'm only now getting around to responding to (indirectly). See "Re: [EM] Ballot Data Format" by VoteFair on 2021-06-06 for more.

In the email, he suggests the following:

A case number allows the ballot data to be processed through separate
vote-counting software while the metadata -- such as precinct number,
political-party affiliations, etc. -- can follow a different path and be
re-joined to produce the published results.

In particular, my vote-counting software focuses on the numbers/counts,
and I use different software (written in my Dashrep programming
language) to process the text info.

The use of a case number also has other benefits.

I think it's inevitable that we're going to need to figure out how to allow for custom metadata outside of comments. One thing that I love about the old email standards (and in particular, RFC 822) is how simple the rules were for distinguishing between the header (with the metadata about the email) and the body (which contained the message, which could be pretty much ANYTHING).

The following message is vaguely compatible with RFC 822:

Hyphen-separate-field-1: Random-ish characters, terminated by CRLF
Hyphen-separate-field-3: Even more random-ish characters, terminated by CRLF
Hyphen-separate-field-2: More random-ish characters, terminated by another CRLF
From: Random name with random characters <email-address@example.com>
Subject: Does anyone remember RFC 822?
To: The world <world@example.com>
Date: Today-ish
Hyphen-separate-field-4: Oh, yeah, here's another header, terminated by another CRLF

This is my email ode to RFC 822!  業業業業whee業業業業wheeee!!!!!!!

Did I mention this: whee!  Oh, yeah, and 業!  ña, ña, ña!

I suspect my example above has a few problems of non-compliance with RFC 822, and probably also has problems with the updated specs (RFC 5322 and RFC 6854). Still, the format hasn't changed much; in fact, it still uses US-ASCII rather than UTF-8, and most developers who have done much with email will recognize the example as something vaguely compatible with RFC 822.

Note that there are many arbitrary headers in the top portion of the example, and that the order seems a bit random. My hope for ABIF is that we would do something very similar. I realize now that my proposed headers on some of the test cases for ABIF (as I write this on June 13) don't seem to allow a lot of room for expansion.
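To make the header/body idea concrete, here is a minimal sketch using the `email` package from Python's standard library, which already implements the "headers, blank line, anything" rule. The field names (`Precinct`, `Case-number`) and the ballot-line syntax in the body are placeholders invented for illustration, not anything ABIF has settled on.

```python
from email import message_from_string

# Hypothetical file: two made-up header fields, a blank line, then a body.
raw = (
    "Precinct: 12\n"
    "Case-number: 7\n"
    "\n"
    "3: SY > AB\n"
    "2: AB > SY\n"
)


def split_header_body(text):
    """Split RFC 822-style text into a dict of headers and the raw body."""
    msg = message_from_string(text)
    headers = dict(msg.items())   # header order and repeats are lost in a dict
    body = msg.get_payload()      # everything after the first blank line
    return headers, body


headers, body = split_header_body(raw)
print(headers)
print(body.splitlines())
```

One thing the sketch makes visible: to this parser, a line like `3: SY > AB` is itself a plausible header field, so a file with no header would presumably still need to start with the blank delimiter line.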

I can see several ways to solve this problem:

  • a. Create a way of having a mandatory body and an optional header in all ABIF files
    • a1. Create a way of expressing the header as valid JSON (allowing for newlines), and a way of delimiting between JSON and an ABIF-body section
    • a2. Create a way of expressing the header as valid YAML (allowing for newlines and following YAML whitespace rules), and create a way of delimiting between YAML and an ABIF-body section
    • a3. Create a way of attaching a valid RFC 5322 header to the top of the file, with a blank newline as the delimiter between the RFC-5322-formatted header and the ABIF body
    • a4. Create some other header format
  • b. Create rules for having a variety of line types in ABIF which can be recognized and routed according to their first character. The following sub-options are NOT mutually exclusive:
    • b1. Have [0-9] as the first line character correspond to a ballot grouping
    • b2. Have "#" as the first line character correspond to a comment
    • b3. Have open square bracket ([) correspond to an ABIF mapping line (like "[Sue Ye (蘇業)]: SY")
    • b4. Have open squirrelly bracket ({) correspond to a valid NDJSON line. Arbitrary metadata can be placed inside of JSON dictionaries, which most parsers MAY ignore.
    • b5. Allow all b1 through b4 to occur in any order in a valid ABIF file
  • c. Some combination of the "a" and "b" above
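As a rough sketch of what option "b" could look like in practice, the function below routes each line by its first character. The function name, the returned labels, and the sample lines are all hypothetical, and stripping leading whitespace before looking at the first character is an assumption rather than a settled rule.

```python
import json


def classify_line(line):
    """Return a (kind, value) pair based on the first non-blank character."""
    text = line.strip()
    if not text:
        return ("blank", None)
    first = text[0]
    if first in "0123456789":   # b1: ballot grouping
        return ("ballots", text)
    if first == "#":            # b2: comment
        return ("comment", text[1:].strip())
    if first == "[":            # b3: candidate mapping line
        return ("mapping", text)
    if first == "{":            # b4: one NDJSON object of arbitrary metadata
        return ("metadata", json.loads(text))
    return ("unknown", text)


# Hypothetical sample; the ballot line is a placeholder, not real ABIF syntax.
sample = """\
# hypothetical sample
{"precinct": 12}
[Sue Ye (蘇業)]: SY
3: SY
"""
kinds = [classify_line(line)[0] for line in sample.splitlines()]
print(kinds)
```

Because each line is classified on its own, the four line types can appear in any order (b5), and a parser that does not care about metadata can simply skip the `"metadata"` lines.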

My current preference is option "c", because I think writing parsers will be easier if all of the metadata is declared at the top of the file, but I also want to keep the option to have metadata and comments down in the body of the document. I also think that it should be safe for authors to add spaces and tabs at the beginning of the line, and have those stripped out by parsers. I'd also like to make it reasonably easy to write a single-pass parser for ABIF files, which becomes much easier if the candidate mappings (described in "b3." above) are handled as part of "header" handling, so that there are no surprise candidate token declarations in the body.
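Here is a minimal single-pass sketch of that combination, under a few assumptions of my own: the header phase ends at the first line starting with a digit, a candidate mapping after that point is an error, and comments and JSON metadata are allowed anywhere. The function name, the regular expression for mapping lines, and the error messages are hypothetical.

```python
import json
import re

# Matches lines like "[Sue Ye (蘇業)]: SY" (assumed shape of a mapping line).
MAPPING_RE = re.compile(r"^\[(?P<name>.+)\]:\s*(?P<token>\S+)$")


def parse_abif(lines):
    """Parse lines in one pass; return (candidates, metadata, ballots)."""
    candidates = {}   # token -> full candidate name
    metadata = []     # decoded JSON objects, in file order
    ballots = []      # raw ballot-grouping lines, left unparsed here
    in_body = False   # becomes True at the first ballot line
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()              # tolerate leading spaces and tabs
        if not text or text.startswith("#"):
            continue                    # blank lines and comments: anywhere
        if text.startswith("{"):
            metadata.append(json.loads(text))
        elif text.startswith("["):
            if in_body:
                raise ValueError(f"line {lineno}: mapping after first ballot line")
            match = MAPPING_RE.match(text)
            if match is None:
                raise ValueError(f"line {lineno}: malformed mapping line")
            candidates[match["token"]] = match["name"]
        elif text[0] in "0123456789":
            in_body = True
            ballots.append(text)
        else:
            raise ValueError(f"line {lineno}: unrecognized line type")
    return candidates, metadata, ballots


doc = [
    "  [Sue Ye (蘇業)]: SY",
    '\t{"precinct": 12}',
    "3: SY",
    "# trailing comment",
    '{"note": "late metadata"}',
]
candidates, metadata, ballots = parse_abif(doc)
print(candidates, metadata, ballots)
```

With this rule the full token table is known before the first ballot line is read, so ballot lines could be validated against it as they stream past.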

Thoughts?
