Description
Problem description
After several hours of uptime, all new unary gRPC calls (Subscribe) from our Node client stop transmitting over TCP.
At the application layer, client.Subscribe() appears to execute normally and the debug log shows "write() called", but tcpdump shows no new TCP traffic for the call, only the periodic HTTP/2 keepalive ping/pong frames.
Restarting the Node process or explicitly closing and recreating the client fixes it immediately (new TCP SYN, new subchannel, traffic resumes).
This strongly suggests that the internal HTTP/2 subchannel or transport remains "READY" but is stuck and non-functional, effectively a ghost connection.
Environment
Library: @grpc/grpc-js
Version: 1.14.0
Node.js: v20.x
OS: Rocky Linux 9
Connection: direct TCP (no proxy / load balancer)
Server: C++ gRPC v1.62.0
RPCs used:
Unary: Subscribe, Unsubscribe
Server streaming: MvrStream, BckPypStream (always open)
Reproduction steps
Reproduction pattern
Start client and server.
Client opens two persistent streaming RPCs (MvrStream, BckPypStream) and periodically issues unary RPCs Subscribe(account_id, study_type) every few minutes.
Everything works fine for a few hours.
After several hours of uptime (typically 3–5h), all new Subscribe calls silently hang: the callback never fires and no error is reported.
Verbose gRPC logs still show write() called, halfClose called.
tcpdump shows only small packets with a 17-byte TCP payload every 5s (keepalive ping/pong). No new HEADERS/DATA frames leave the client.
Restarting the client (new channel) immediately restores functionality.
Expected behavior
When client.Subscribe() is called, gRPC should open a new HTTP/2 stream and send the unary request.
Actual behavior
The call remains pending indefinitely.
No outbound network traffic occurs (only keepalive).
Channel state remains READY and no reconnect is triggered.
Manual client.close() + recreate fixes it instantly.
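As a stopgap, the manual close-and-recreate step can be automated. The following is a minimal sketch, not part of @grpc/grpc-js: `StallGuard`, `ClosableClient`, and the threshold parameters are hypothetical names introduced here for illustration. It uses a plain timer so that it stays independent of the library; with grpc-js, passing a per-call `deadline` option would probably be the more natural way to bound each call, assuming deadlines still fire while the subchannel is in this state.

```typescript
// Hypothetical minimal shape of a client; a generated grpc-js client has close().
interface ClosableClient {
  close(): void;
}

// Sentinel used to tell a timeout apart from a real call result.
const TIMED_OUT: unique symbol = Symbol("timed out");

// Recreates the client after `maxTimeouts` consecutive calls exceed `timeoutMs`.
class StallGuard<C extends ClosableClient> {
  private client: C;
  private consecutiveTimeouts = 0;
  private readonly createClient: () => C;
  private readonly timeoutMs: number;
  private readonly maxTimeouts: number;

  constructor(createClient: () => C, timeoutMs: number, maxTimeouts: number) {
    this.createClient = createClient;
    this.timeoutMs = timeoutMs;
    this.maxTimeouts = maxTimeouts;
    this.client = createClient();
  }

  async call<T>(invoke: (client: C) => Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
      timer = setTimeout(() => resolve(TIMED_OUT), this.timeoutMs);
    });
    try {
      const outcome = await Promise.race([invoke(this.client), timeout]);
      if (outcome === TIMED_OUT) {
        this.consecutiveTimeouts += 1;
        if (this.consecutiveTimeouts >= this.maxTimeouts) {
          // Same remedy as the manual fix: drop the old client, build a new one.
          this.client.close();
          this.client = this.createClient();
          this.consecutiveTimeouts = 0;
        }
        throw new Error(`call did not complete within ${this.timeoutMs} ms`);
      }
      // Any completed call resets the stall counter.
      this.consecutiveTimeouts = 0;
      return outcome as T;
    } finally {
      if (timer !== undefined) clearTimeout(timer);
    }
  }
}
```

In use, `invoke` would wrap the callback-style unary call (for example Subscribe) in a Promise. Long-lived streams opened on the old client would need to be reopened after a recreate, which this sketch does not handle.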
tcpdump evidence
(Port 50051; 10.18.35.30 = server)
Only ping/pong frames observed:
05:11:25.084 IP 10.18.35.20.52840 > 10.18.35.30.50051: Flags [P.], seq 620:637, ack 264, win 125, length 17
05:11:25.085 IP 10.18.35.30.50051 > 10.18.35.20.52840: Flags [P.], seq 264:281, ack 637, win 501, length 17
No new HTTP/2 HEADERS or DATA frames appear on the wire when Subscribe() is invoked.
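The 17-byte length is consistent with an HTTP/2 PING frame: a 9-byte frame header followed by an 8-byte opaque payload (RFC 9113). The sketch below decodes a frame header to show the arithmetic. It assumes the connection is plaintext (no TLS), so that the TCP payload length equals the frame length; `parseFrameHeader` and the sample bytes are illustrative and were not taken from the capture.

```typescript
// HTTP/2 frame types relevant to this report (RFC 9113, section 6).
const FRAME_TYPES: Record<number, string> = {
  0x0: "DATA",
  0x1: "HEADERS",
  0x6: "PING",
};

interface FrameHeader {
  length: number;     // payload length from the 24-bit length field
  type: string;       // frame type name, or "UNKNOWN(n)"
  flags: number;
  isAck: boolean;     // only meaningful for PING: flag 0x1 marks the reply
  streamId: number;   // 31-bit stream identifier (0 for connection-level frames)
  totalBytes: number; // 9-byte header plus payload
}

function parseFrameHeader(bytes: Uint8Array): FrameHeader {
  if (bytes.length < 9) {
    throw new Error("an HTTP/2 frame header is 9 bytes long");
  }
  const length = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
  const typeCode = bytes[3];
  const flags = bytes[4];
  // The top bit of the stream identifier is reserved and must be masked off.
  const streamId =
    ((bytes[5] & 0x7f) << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8];
  const type = FRAME_TYPES[typeCode] ?? `UNKNOWN(${typeCode})`;
  return {
    length,
    type,
    flags,
    isAck: type === "PING" && (flags & 0x1) === 0x1,
    streamId,
    totalBytes: 9 + length,
  };
}

// Illustrative PING frame: length 8, type 0x6, no flags, stream 0, 8 opaque bytes.
const samplePing = new Uint8Array([
  0x00, 0x00, 0x08, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00,
  1, 2, 3, 4, 5, 6, 7, 8,
]);
const decodedPing = parseFrameHeader(samplePing);
console.log(decodedPing.type, decodedPing.totalBytes);
```

A unary request would instead show up as a HEADERS frame followed by a DATA frame on a new odd-numbered stream, which is what is missing from the capture.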
gRPC debug logs around the stall
D | resolving_call | [76330] write() called with message of length 11
D | resolving_call | [76330] halfClose called
D | load_balancing_call | [76333] Pick result: COMPLETE subchannel: (2) 10.18.35.30:50051 status: undefined undefined
D | subchannel_call | [9] sending data chunk of length 11
After several hours, the same call produces only:
D | resolving_call | [0] write() called with message of length 16
No subchannel_call send or receive entries follow.
After this point, pings continue, but no new outbound streams appear.
Analysis
It looks like the subchannel’s Http2Session remains open and ping/pong-responsive, but stops issuing new stream IDs.
grpc-js continues to route new calls to this “READY” subchannel, which never transmits.
This is effectively a zombie subchannel:
TCP connection alive (ping ACKs).
gRPC channel stuck in READY.
New calls never written to the wire.