sed performance issues #187

@sjvudp

After noticing that the sed process consumes "100% CPU" for several minutes during "Updates...", I investigated it a bit (see also https://stackoverflow.com/q/77818891/6607497). Eventually I was able to reduce the initial runtime from more than six minutes to less than half a second.

Note that similar sed code is used in other places, too.

Some comments from the question cited above:
(...) And it is obvious that at least 9 of the s/// can be trivially refactored into just one.
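The original sed script is not quoted in this issue, so the following is only a minimal sketch of the kind of refactoring that comment describes. The keywords (password, secret, token), the replacement text, and the function names are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical keywords; the real script's patterns are not shown in this issue.

# Three separate substitutions: each one is tried against every line.
sanitize_separate() {
  sed -e 's/password.*/password hidden/' \
      -e 's/secret.*/secret hidden/' \
      -e 's/token.*/token hidden/'
}

# One substitution using alternation (-E selects extended regular expressions).
sanitize_combined() {
  sed -E 's/(password|secret|token).*/\1 hidden/'
}

sample='user = alice
password = foo bar
api token = abc123'

printf '%s\n' "$sample" | sanitize_separate
printf '%s\n' "$sample" | sanitize_combined
```

Both functions produce the same output for this sample input. Whether the combined form is measurably faster depends on the sed implementation and the real patterns.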

It also appears to rewrite the same line multiple times. For example, if I change the /g to /gp on each line, it prints three (identical) lines for the input password = foo bar

not relevant to the problem but [P|p] probably doesn't match what is intended.
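A small illustration of the [P|p] remark: inside a bracket expression, | is an ordinary character rather than alternation, so [P|p] matches P, p, or a literal |. The sample lines and function names below are made up for the demonstration.

```shell
#!/bin/sh
# Inside brackets, '|' is a literal character, not alternation.
match_with_pipe() { sed -n '/^[P|p]assword/p'; }

# Probably the intended form: match either 'P' or 'p' only.
match_without_pipe() { sed -n '/^[Pp]assword/p'; }

# Prints all three lines, including the one starting with '|'.
printf '%s\n' 'Password = a' 'password = b' '|assword = c' | match_with_pipe

# Prints only the first two lines.
printf '%s\n' 'Password = a' 'password = b' '|assword = c' | match_without_pipe
```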

Honestly, here, in most of the replacement cases, a regexp is not even needed: having multiple regexps .*word.* is fundamentally inefficient (by several orders of magnitude). Indeed, one can just search for a bag of words in each line and replace the whole line. The latter is far more efficient.
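One way to approximate that suggestion while staying in sed is to select lines with an address that names the words directly (no leading or trailing .*) and then replace the whole selected line. This is a sketch under assumptions: the word list, the [redacted] replacement text, and the function name are hypothetical, and no timing was measured for it.

```shell
#!/bin/sh
# Hypothetical word list and replacement text.
# The address picks lines containing any of the words; the substitution
# then replaces the entire line.
redact_lines() {
  sed -E '/password|secret|token/s/.*/[redacted]/'
}

printf '%s\n' 'user = alice' 'my secret = 42' | redact_lines
```

Lines that contain none of the words pass through unchanged.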

Technically, .*foo.* is not the same as foo.* but that seems unlikely to be an issue here. As an aside, any s/// that matches foo.* does not need /g since there can be no other matches, but removing it won't improve performance here.

.*foo.* matches the final occurrence of "foo"; foo.* matches the first occurrence. If the input ever contains two occurrences (e.g. wanted1 foo secret wanted2 foo secret), the original code would fail to sanitize fully (wanted1 foo secret wanted2 foo hidden), while with the change it would (wanted1 foo hidden).
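The difference can be reproduced with the example line from that comment. The two substitutions below are assumed stand-ins for the "original" and "changed" forms, since the actual script is not quoted here; the function names are hypothetical.

```shell
#!/bin/sh
line='wanted1 foo secret wanted2 foo secret'

# Greedy prefix: '.*foo' extends to the LAST "foo" on the line,
# so everything before it (including the first secret) is kept.
keep_up_to_last() { sed 's/\(.*foo\).*/\1 hidden/'; }

# No prefix: the match starts at the FIRST "foo". No /g is needed,
# because '.*' already consumes the rest of the line.
keep_up_to_first() { sed 's/foo.*/foo hidden/'; }

printf '%s\n' "$line" | keep_up_to_last   # wanted1 foo secret wanted2 foo hidden
printf '%s\n' "$line" | keep_up_to_first  # wanted1 foo hidden
```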
