File descriptor leak in plane proxy? #870

@brianwebb11

Observation

We are seeing the number of file descriptors held by the plane proxy process grow steadily over time. Eventually it hits a tipping point, presumably when the number of file descriptors used by the plane proxy exceeds the soft ulimit set on the host. At that point the host goes into an accelerated decline: CPU usage and disk writes spike (presumably because the plane proxy is logging error messages).
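To confirm the growth, and to see how close the proxy is to its limit, the descriptor count and the soft limit can be read from procfs. This is a minimal Linux-only sketch, assuming the proxy runs on Linux; pass the plane proxy's PID (or "self") as pid. The function names are hypothetical and not part of the proxy, and reading another process's entries requires running as the same user or as root.

```rust
use std::fs;
use std::io;

/// Counts the open file descriptors of a process by listing /proc/<pid>/fd.
/// When pid is "self", the directory handle opened for the listing may
/// itself be included in the count.
fn open_fd_count(pid: &str) -> io::Result<usize> {
    Ok(fs::read_dir(format!("/proc/{pid}/fd"))?.count())
}

/// Reads the soft "Max open files" limit of a process from /proc/<pid>/limits.
fn soft_nofile_limit(pid: &str) -> io::Result<Option<u64>> {
    let limits = fs::read_to_string(format!("/proc/{pid}/limits"))?;
    Ok(parse_soft_nofile(&limits))
}

/// Extracts the soft limit from the text of a limits file. Returns None if
/// the line is missing or the limit is "unlimited" (which does not parse).
fn parse_soft_nofile(limits: &str) -> Option<u64> {
    limits
        .lines()
        .find_map(|line| line.strip_prefix("Max open files"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|soft| soft.parse::<u64>().ok())
}
```

Sampling open_fd_count periodically (for example once a minute) and comparing it with soft_nofile_limit would show whether the count climbs steadily toward the limit or plateaus.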

We are seeing two specific error messages once the host gets into this bad state:

  1. "Failed to accept connection." with the error Os { code: 24, kind: Uncategorized, message: "Too many open files" }
  2. "Upstream request failed" with the error RequestFailed(hyper_util::client::legacy::Error(Connect, Boxed(ConnectError("tcp open error", Os { code: 24, kind: Uncategorized, message: "Too many open files" }))))
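Both errors carry Os { code: 24 }, which on Linux is EMFILE: the process has hit its own descriptor limit. Because Rust reports it as kind: Uncategorized, matching on ErrorKind will not catch it, but raw_os_error() will. If the proxy's accept loop retries immediately after such a failure (the logs above do not confirm this), every retry fails at once and logs again, which would be consistent with the CPU and disk-write spikes. Below is a minimal std-only sketch of pausing instead, assuming Linux errno values; the function names and the 100 ms pause are hypothetical choices, not the proxy's code.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

// Linux errno values (assumption: the proxy runs on Linux, where
// "Too many open files" is errno 24).
const EMFILE: i32 = 24; // per-process descriptor limit reached
const ENFILE: i32 = 23; // system-wide descriptor limit reached

/// Returns how long to pause after an accept error, or None if the error
/// is not caused by descriptor exhaustion. ErrorKind is Uncategorized for
/// these errors, so the raw errno is checked instead.
fn fd_exhaustion_backoff(err: &io::Error) -> Option<Duration> {
    match err.raw_os_error() {
        Some(EMFILE) | Some(ENFILE) => Some(Duration::from_millis(100)),
        _ => None,
    }
}

/// Accepts the next connection, sleeping instead of spinning while the
/// process is out of descriptors, so the loop logs at most one line per
/// pause rather than one per failed iteration.
fn accept_with_backoff(listener: &TcpListener) -> io::Result<TcpStream> {
    loop {
        match listener.accept() {
            Ok((stream, _peer)) => return Ok(stream),
            Err(err) => match fd_exhaustion_backoff(&err) {
                Some(pause) => {
                    eprintln!("accept failed ({err}); backing off {pause:?}");
                    thread::sleep(pause);
                }
                None => return Err(err),
            },
        }
    }
}
```

This only limits the damage once the limit is reached; it does not release any leaked descriptors.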

Theory

After some investigation, our working theory is that in some cases connections in the connection pool (and their underlying file descriptors) are not being cleaned up, possibly when the drone does not close the connection cleanly. Once the process exhausts its file descriptor limit, the proxy can no longer accept new incoming requests or make outbound requests to the upstream drone service.

Potential Changes

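When creating the client, we can configure the connection pool to manage connections better and clean up stale ones. Note that in hyper_util the pool settings are methods on the client builder (Client::builder), not on HttpConnector; the connector only owns socket options such as TCP keepalive. The pool idle timeout also needs a timer (pool_timer) to take effect. The body type below assumes the proxy forwards hyper's Incoming body.

use std::time::Duration;
use hyper_util::{client::legacy::{connect::HttpConnector, Client}, rt::{TokioExecutor, TokioTimer}};
let mut connector = HttpConnector::new();
connector.set_keepalive(Some(Duration::from_secs(60))); // TCP keepalive probes on each socket
let client: Client<HttpConnector, hyper::body::Incoming> = Client::builder(TokioExecutor::new())
    .pool_idle_timeout(Duration::from_secs(90)) // close pooled connections idle this long
    .pool_max_idle_per_host(10).pool_timer(TokioTimer::new()) // cap idle conns; timer drives eviction
    .build(connector);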
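To make the intent of the two pool settings concrete, here is a toy model of an idle pool that enforces both limits. It is a hypothetical illustration, not hyper's implementation: IdlePool and its methods are made up for this sketch, and C stands in for a pooled connection (a socket in the real proxy).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy idle pool (hypothetical): idle connections older than idle_timeout
/// are evicted, and at most max_idle_per_host are kept for each upstream.
struct IdlePool<C> {
    idle_timeout: Duration,
    max_idle_per_host: usize,
    idle: HashMap<String, Vec<(C, Instant)>>,
}

impl<C> IdlePool<C> {
    fn new(idle_timeout: Duration, max_idle_per_host: usize) -> Self {
        Self { idle_timeout, max_idle_per_host, idle: HashMap::new() }
    }

    /// Returns a connection to the pool. If the host already has the maximum
    /// number of idle connections, the connection is dropped (closing it).
    fn put(&mut self, host: &str, conn: C, now: Instant) {
        let entries = self.idle.entry(host.to_string()).or_default();
        if entries.len() < self.max_idle_per_host {
            entries.push((conn, now));
        }
    }

    /// Evicts expired entries, then takes the most recently returned
    /// idle connection for the host, if any.
    fn take(&mut self, host: &str, now: Instant) -> Option<C> {
        self.evict_expired(now);
        self.idle.get_mut(host).and_then(|entries| entries.pop()).map(|(conn, _)| conn)
    }

    /// Drops every idle connection unused for longer than the timeout.
    /// Something must call this periodically (a timer in a real pool);
    /// otherwise sockets to peers that vanished stay open indefinitely.
    fn evict_expired(&mut self, now: Instant) {
        let timeout = self.idle_timeout;
        for entries in self.idle.values_mut() {
            entries.retain(|(_, since)| now.duration_since(*since) <= timeout);
        }
        self.idle.retain(|_, entries| !entries.is_empty());
    }

    /// Total number of idle connections across all hosts.
    fn idle_count(&self) -> usize {
        self.idle.values().map(Vec::len).sum()
    }
}
```

Neither limit touches connections that are checked out and in use, so if descriptors leak through connections that never return to the pool, these settings alone would not stop the growth.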
