handleConnectionChange() in dockerd kills containerd 'silently' / without notification to user #314

@rh-ulrich-o

Description

The handleConnectionChange() function in dockerd monitors the health of containerd by periodically sending gRPC 'HealthCheckRequest' messages. If containerd does not respond to these messages for a certain amount of time, dockerd restarts containerd. That amount of time is determined by two hard-coded constants: containerdHealthCheckTimeout and maxConnectionRetryCount. A simple experiment (sending a STOP signal to containerd) shows that dockerd kills containerd after only a few seconds of unresponsiveness.
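
To illustrate the mechanism described above, here is a minimal, self-contained sketch of a retry-counting health monitor. It is not the dockerd implementation: the constant values, pollInterval, checkFunc, monitor() and stallAfter() are all assumptions made for the example, and the gRPC call is replaced by a plain function. Resetting the counter after a successful check is also an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Assumed values for illustration only; the constants in dockerd may differ.
const (
	containerdHealthCheckTimeout = 20 * time.Millisecond
	maxConnectionRetryCount      = 3
	pollInterval                 = time.Millisecond // hypothetical, not from dockerd
)

// checkFunc is a hypothetical stand-in for the gRPC health check call.
type checkFunc func(ctx context.Context) error

// monitor runs health checks until maxConnectionRetryCount consecutive
// checks have failed, then returns the total number of checks performed.
// A real daemon would kill and restart containerd at that point.
func monitor(check checkFunc) int {
	failures, total := 0, 0
	for failures < maxConnectionRetryCount {
		ctx, cancel := context.WithTimeout(context.Background(), containerdHealthCheckTimeout)
		err := check(ctx)
		cancel()
		total++
		if err != nil {
			failures++
		} else {
			failures = 0 // assumption: a successful check resets the counter
		}
		time.Sleep(pollInterval)
	}
	return total
}

// stallAfter simulates a daemon that answers `healthy` checks and then
// hangs (as a stopped process would) until each check times out.
func stallAfter(healthy int) checkFunc {
	calls := 0
	return func(ctx context.Context) error {
		calls++
		if calls <= healthy {
			return nil
		}
		<-ctx.Done()
		return errors.New("health check timed out")
	}
}

func main() {
	n := monitor(stallAfter(2))
	fmt.Printf("restart would be triggered after %d checks\n", n)
}
```

With a scheme like this, the time before a restart is roughly the per-check timeout multiplied by the retry count, which is why a stopped containerd is killed within seconds when both constants are small.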

The issue is that handleConnectionChange() kills containerd 'silently', i.e. without any notification, which leaves the user with no clue as to what happened. The monitorConnection() function in upstream moby code includes two useful improvements in this regard:

  • It logs an informative message "killing and restarting containerd".
  • It tries to obtain a goroutine stack dump of containerd via SIGUSR1.

Related snippet of code from monitorConnection():

if system.IsProcessAlive(r.daemonPid) {
        r.logger.WithField("pid", r.daemonPid).Info("killing and restarting containerd")
        // Try to get a stack trace
        syscall.Kill(r.daemonPid, syscall.SIGUSR1)
        <-time.After(100 * time.Millisecond)
        system.KillProcess(r.daemonPid)
}

Please consider a back-port of these improvements.
