sed performance issues #187

After noticing that the `sed` process consumes "100% CPU" for several minutes during "Updates...", I investigated a bit (see also https://stackoverflow.com/q/77818891/6607497). Eventually I could reduce the initial runtime from more than six minutes to less than half a second.

Note that similar `sed` code is used in other places, too.

Some comments from the question cited above:
(...) And it is obvious that at least 9 of the `s///` can be trivially refactored into just one.
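
As a rough illustration of that comment, the sketch below merges three substitutions into one using alternation. The keywords `password`, `secret`, and `token`, the `key=value` layout, and both function names are hypothetical stand-ins, since the original script is not reproduced in this issue. It also assumes a `sed` that accepts `-E` for extended regular expressions (GNU and BSD `sed` both do).

```shell
# Hypothetical "before": one s/// per keyword, each scanning every line.
redact_separate() {
    sed -e 's/password=.*/password=hidden/' \
        -e 's/secret=.*/secret=hidden/' \
        -e 's/token=.*/token=hidden/'
}

# Hypothetical "after": a single s/// with alternation; \1 keeps the keyword.
redact_merged() {
    sed -E 's/(password|secret|token)=.*/\1=hidden/'
}

# Both variants should print the same sanitized lines.
printf '%s\n' 'user=alice' 'password=foo bar' 'token=abc' | redact_separate
printf '%s\n' 'user=alice' 'password=foo bar' 'token=abc' | redact_merged
```

Whether the merged form is faster depends on the `sed` implementation and the real patterns, so it is worth timing against actual input.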

It also appears to rewrite the same line multiple times. For example, if I change the `/g` to `/gp` on each line, it prints three (identical) lines for the input `password = foo bar`.

Not relevant to the problem, but `[P|p]` probably doesn't match what is intended.
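
A small sketch of why `[P|p]` is suspicious: inside a bracket expression `|` is an ordinary character, so the set matches `P`, `p`, or a literal `|`. The helper name `match_bracket` is made up for this example.

```shell
# Hypothetical helper: print only the input lines that match [P|p].
match_bracket() {
    sed -n '/[P|p]/p'
}

# 'a|b' contains neither P nor p, yet it is printed because of the literal '|'.
printf '%s\n' 'Password' 'password' 'a|b' 'nothing' | match_bracket
```

If the intent was simply "upper- or lower-case p", `[Pp]` expresses that without the stray `|`.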

Honestly, here, in most of the replacement cases, a regexp is not even needed: having multiple regexps `.*word.*` is fundamentally inefficient (by several orders of magnitude). Indeed, one can just search for a bag of words in each line and replace the whole line. The latter is far more efficient.
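
One possible reading of the "bag of words" idea, sketched with `awk` rather than `sed` so that plain substring search (`index()`) can replace the regular expressions entirely. The word list, the `[redacted]` placeholder, and the function name are assumptions made for illustration and are not taken from the original script.

```shell
# Hypothetical helper: replace any line that contains one of the listed words.
redact_lines() {
    awk '
        # Build the word list once, before any input is read.
        BEGIN { n = split("password secret token", words, " ") }
        {
            for (i = 1; i <= n; i++) {
                # index() is a plain substring search, no regex involved.
                if (index($0, words[i]) > 0) { print "[redacted]"; next }
            }
            print   # no word found: pass the line through unchanged
        }
    '
}

printf '%s\n' 'user = alice' 'password = foo bar' | redact_lines
```

This replaces the whole line, which is only equivalent to the original substitutions if they also discarded the entire line.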

Technically, `.*foo.*` is not the same as `foo.*`, but that seems unlikely to be an issue here. As an aside, any `s///` that matches `foo.*` does not need `/g` since there can be no other matches, but removing it won't improve performance here.

(`.*foo.*` matches the final occurrence of "foo"; `foo.*` matches the first occurrence. If the input ever contains two occurrences, e.g. `wanted1 foo secret wanted2 foo secret`, the original code would fail to sanitize fully, giving `wanted1 foo secret wanted2 foo hidden`, while with the change it would, giving `wanted1 foo hidden`.)
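
The difference can be reproduced with two simplified patterns. These are hypothetical reconstructions that use `foo` as the keyword, not the substitutions from the original script.

```shell
# Greedy leading .* pushes the match to the LAST "foo" on the line.
mask_after_last() { sed 's/\(.*foo\).*/\1 hidden/'; }

# Without the leading .* the leftmost (FIRST) "foo" is matched.
mask_after_first() { sed 's/\(foo\).*/\1 hidden/'; }

line='wanted1 foo secret wanted2 foo secret'
printf '%s\n' "$line" | mask_after_last    # wanted1 foo secret wanted2 foo hidden
printf '%s\n' "$line" | mask_after_first   # wanted1 foo hidden
```

With the first form, the text after the first `foo` leaks into the output. The second form hides everything after the first occurrence.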
