Skip to content

Deprecate NFData Doc. It is a footgun. #71

@andreasabel

Description

@andreasabel

The instance NFData Doc (added in #13) must be considered harmful, as the size of Doc normal form baloons even for small documents.

I shot myself badly in the foot when I refactored a DeepSeq.force on a rendered Doc (so, a String) to run on the Doc itself. (The purpose of the deep normalization is to bring exceptions to the surface.) The memory consumption exploded to the point of even making my terminal crash (on a 48GB machine!). The story can he read here and I link to the fatal loc:

What is going on? The problem is the Union constructor in Doc that holds two documents that render to the same text (with maybe different layout). Relying on laziness, usually the second alternative is only used when the first does not work out well. However, NFData's rnf will normalize both alternatives.
Also, the way that the pretty DSL is designed, common Doc constructors are implemented shallowly ("smart constructors"), so they distribute over Union. For instance, take beside which is ubiquitous because it is invoked whenever you concatenate documents horizontally (such as in <>, <+>, hcat, hsep...).

beside (p1 `Union` p2) g q = beside p1 g q `union_` beside p2 g q
You can see that the arguments g and q are duplicated when beside distributes itself over a Union. And unions are also ubiquitous as they are introduced by seps:
-- Specification: sep ps = oneLiner (hsep ps)
-- `union`
-- vcat ps
sepNB g Empty k ys
= oneLiner (nilBeside g (reduceDoc rest)) `mkUnion`
-- XXX: TODO: PRETTY: Used to use True here (but GHC used False...)
nilAboveNest False k (reduceDoc (vcat ys))
So the normal forms of Doc are quadratic and laziness is essential to make the pretty DSL work.
Adding a normalizer like NFData is contrary to the whole design of this library.

The addition of NFData was innocuous:

Contributor: #12 (comment)

Would it be feasible to add an NFData instance for the Doc type, etc.? This would probably bee needed for #2.

Maintainer: #12 (comment)

Yes, this would be fine.

Don't get me wrong; both the Contributor and the Maintainer could have been me. One usually is just smarter in hindsight, when the calamity has struck.
This is the usual plight of software engineering: the original authors of the software leave, taking their deep understanding and unspoken dos and don'ts with them, and the second generation has take over without having the expertise.

What is the way forward here?
I think the NFData Doc instance should be deprecated, i.e., equipped with a warning pointing out its hazards (e.g. pointing to this issue).

The original motivation for the introduction of NFData Doc also seems questionable in the light of this issue:

This would probably bee needed for #2.

Since NFData degrades performance, it is probably not a tool to measure performance.

Related:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions