Realtime Media Capabilities Expansion (v1) #106
…al camera and microphone handle v4l2loopback and pulseaudio devices, manage start/pause/resume/stop and scale-to-zero
refactor virtualinputs: new ctor with resolution/framerate, reset last config on stop, rebuild ffmpeg arg construction to handle video/audio indexing, paused/shared sources and unified input arg builder
…te virtual inputs manager into apiservice lifecycle
… error vars, add test helper
…ice; add mock virtualinputs manager and assert stop called on shutdown
install v4l2loopback, set pulseaudio default sink/source, switch to npm for openapi-down-convert
switch makefile to pnpm for openapi-down-convert and update generated code metadata
avoid err shadowing, normalize numeric types/enums, cast status and add virtual input defaults
…virtual inputs tests
…test for virtual input env vars
…ack unavailable prepare fifo video/audio files, set mode and file paths on manager/status and log fallback warning
…al inputs reset manager mode/files on stop, handle fake video/audio files in ffmpeg args, add fifo preparation and improved device checks and flag file merging/restart logic
…nput mode to device expose mode, video_file and audio_file in api status types
…g to auto-overwrite outputs
…peg process does not exit
… paths/descriptions increase ffmpeg stop timeout to 7s and remove processAlive check
rename fake-file to virtual-file, update comments and add configure/pause/resume/get/stop methods with json body alias
preserve cmd/state instead of clearing if the process remains alive
…ar manager state when process already exited
use stored pid and processAlive(pid) to avoid referencing m.cmd.Process.Pid when cmd may be nil
…ag and ffmpeg liveness check
add curl examples to exercise endpoints under /input/devices/virtual: status, configure, resume, pause, stop and ffmpeg verification
…earing manager state revise virtual input quick curl flow docs wording and expectations
… and update docs ensure lingering ffmpeg processes are terminated during virtual input shutdown
…l neko image clarify use_example curl flow and tweak expected outputs
add killAllFFmpeg helper using pkill (TERM then KILL) with short delay; change log to use virtual capture files
…dependently fix chromium fake-device flags, prepare capture dir and safe file cleanup; stop treating webrtc disconnected as failure
…rtual device config example
clarify virtual input audio/video format hints (socket vs webrtc) in oapi models
…ify ws chunk sender defaults and update readme
store up to 512kb of early stream data and replay to new connections; reset intro on format changes
…nt fifo blocking close keepalives if ffmpeg fails to start; apply to configure, pause and resume
… type instead of first-frame heuristic
…or correct keyframe detection
…et and webrtc sources prevent use-file-for-fake-video-capture for realtime feeds
…am restarts add findIvfHeader and update processing to discard stale bytes when a header is found mid-buffer and reset decoder state on stream restarts.
only build ffmpeg inputs/outputs for non-realtime audio/video, avoid mapping errors when paused and simplify pulse routing logic
…tual feed broadcaster, audio to pulseaudio via ffmpeg; remove fifo pipe usage and adjust manager/webrtc
…ing ffmpeg for realtime sources start ffmpeg only when args are present in pause/resume, adjust webrtc test error message
…ter for client adds, intro sends and broadcasts
… mpeg1 requirement set sample_video_mpeg1.ts as default ingest video in ws_chunk_ingest.js
…vf preamble, remove intro buffering openapi: add virtual input audio destination enum microphone,speaker
route ingest audio to microphone or speaker and wire destination through api, webrtc and ffmpeg
…nation handling handle virtual input audio destination (default microphone), update webrtc Configure calls/tests to new signature
… and docs add destination field and enum for virtual input audio; update README with examples for microphone vs speaker routing and realtime feed notes
…o shared schema use $ref for VirtualInputAudio.destination and add description/default
add destination field to openapi schema for ingest audio
…ts and expose destination on ingest endpoint simplify textevent stream handlers by replacing manual flush loops with io.Copy
…ation flush text/event-stream responses with buffered reads and http.Flusher, falling back to io.Copy
adjust virtual inputs, webrtc and socket ingest to place pulse as output placeholder
function assertEnv() {
  if (!REMOTE_RTMP_URL) throw new Error('REMOTE_RTMP_URL is required');
  if (!ELEVENLABS_API_KEY) throw new Error('ELEVENLABS_API_KEY is required');
}
Bug: Unused environment variable required by validation
The assertEnv() function requires REMOTE_RTMP_URL to be set, but ENABLE_REMOTE_LIVESTREAM is hardcoded to false on line 14, meaning the remote RTMP feature is disabled and the URL is never actually used. This causes users to receive an unnecessary "REMOTE_RTMP_URL is required" error when running the sample, even though the variable isn't needed. The validation in assertEnv() and sequence.sh should be conditional on whether ENABLE_REMOTE_LIVESTREAM is true.
Additional Locations (1)
async run(meetingUrl) {
  this.meetingUrl = meetingUrl;
  this.feedUrl = `${KERNEL_API_BASE}/input/devices/virtual/feed?fit=cover`;
  const feedPage = await this.agent.newPage(); // new tab
Bug: BrowserFlow agent property never initialized
The BrowserFlow class initializes this.agent to null in the constructor (line 613) but never assigns it a value. When run() is called via the /browser/join endpoint, calling this.agent.newPage() will throw a null reference error ("Cannot read properties of null"). The browser automation agent is expected to exist but is never created or passed into the BrowserFlow instance.
Aims
Rationale:
Media-first projects are difficult to test and automate remotely due to infrastructure limitations and a lack of tooling.
Use cases for realtime audiovisual AI agents are highly restricted by automation and API limitations in realtime contexts, and by how difficult they are to set up. This constrains current-generation meeting AI agents to passive, low-level features such as transcriptions and summaries. A remote browser makes for a perfect environment to solve these restrictions.
The features in this PR open the door to a wider range of use cases.
GUI-first apps, API limitations|unavailability → remote browser automations to operate, but for virtual agents.
The new capabilities enable:
Limitations / Current Workarounds
v4l2loopback, which enables virtual video inputs, is unusable due to container kernel limitations (see this issue).
({ "type":"stream"|"file","url":"..." }) but isn't truly realtime, as it is meant to be a Chrome test feature that replays the source from the start on any load/refresh event.
http://localhost:444/input/devices/virtual/feed?fit=cover to simplify consuming it.
The limitations do not apply in the case of audio.
Writing to / exposing audio devices works properly without additional methods.
Next Steps
SDKs to wrap all virtual inputs & output livestreams, managing + sync media in realtime with simple function calls.
You can check samples/agent-live-demo for a prototype that demonstrates everything at once.
Fork + new build for Chromium to resolve the virtual input device limitations by extending the --use-fake-device-for-media-stream --use-file-for-fake-video-capture=source.y4m relayer, which was designed for mock playback rather than live, to enable consuming in realtime (or add alternative methods).
References
Examples
1.1 Virtual Video Input - WebSocket Feed
Real-time video chunks via WebSocket.
Uses MPEG-1 video in MPEG-TS container for JSMpeg playback.
Configure WebSocket Video Input
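A minimal sketch of this configure call, assuming the endpoint is `POST /input/devices/virtual/configure`, that it accepts a JSON body whose `video` object mirrors the fields echoed in the expected response, and that the API is reachable at `http://localhost:444`. The base URL, the body shape, and the helper name `build_configure_request` are illustrative assumptions, not confirmed by the PR.

```python
import json
import urllib.request

# Assumption: API base URL; adjust to wherever the kernel API is exposed.
BASE_URL = "http://localhost:444"


def build_configure_request(video=None, audio=None, base_url=BASE_URL):
    """Build (but do not send) a configure request for virtual inputs.

    The body layout is an assumption inferred from the expected response.
    """
    body = {}
    if video is not None:
        body["video"] = video
    if audio is not None:
        body["audio"] = audio
    return urllib.request.Request(
        f"{base_url}/input/devices/virtual/configure",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_configure_request(
    video={"type": "socket", "format": "mpegts",
           "width": 1280, "height": 720, "frame_rate": 30}
)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect before it touches a running container.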
Expected Response:
{
  "state": "running",
  "video": {
    "type": "socket",
    "format": "mpegts",
    "width": 1280,
    "height": 720,
    "frame_rate": 30
  },
  "ingest": {
    "video": {
      "protocol": "socket",
      "format": "mpegts",
      "url": "ws://localhost:10001/input/devices/virtual/socket/video"
    }
  }
}
Encode Source Video to MPEG-1
# Convert any video to MPEG-1 (required for JSMpeg)
ffmpeg -i input.mp4 -c:v mpeg1video -b:v 1500k -r 25 -f mpegts output.ts
Feed Video Chunks (Node.js)
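The pacing idea behind a chunk sender can be sketched in Python as follows. This is an illustration only: the chunk size, the use of the video bitrate to approximate the send interval, and the `send` callable (standing in for a WebSocket client's binary send) are all assumptions rather than the behavior of the PR's Node.js sample.

```python
import time
from typing import Callable, Iterator


def iter_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def chunk_interval(chunk_size: int, bitrate_bps: int) -> float:
    """Seconds one chunk represents at the given bitrate (approximate)."""
    return chunk_size * 8 / bitrate_bps


def feed(data: bytes, send: Callable[[bytes], None], chunk_size: int = 4096,
         bitrate_bps: int = 1_500_000, sleep=time.sleep) -> int:
    """Send data chunk by chunk, sleeping between chunks to mimic a live source."""
    interval = chunk_interval(chunk_size, bitrate_bps)
    sent = 0
    for chunk in iter_chunks(data, chunk_size):
        send(chunk)  # hypothetical: e.g. a WebSocket binary send
        sent += 1
        sleep(interval)
    return sent
```

Pacing matters because the feed shows frames only while chunks are arriving; sending a whole file at once would not behave like a live source.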
Real-time Behavior
Preview Feed
Open in browser:
http://localhost:444/input/devices/virtual/feed?fit=cover
1.2 Virtual Video Input - WebRTC Feed
Real-time video via WebRTC (VP8/VP9 in IVF format internally).
Configure WebRTC Video Input
Expected Response:
{
  "state": "running",
  "video": {
    "type": "webrtc"
  },
  "ingest": {
    "video": {
      "protocol": "webrtc",
      "format": "ivf",
      "url": "http://localhost:10001/input/devices/virtual/webrtc/offer"
    }
  }
}
Send Video via WebRTC (Python)
Real-time Factor
1.3 Virtual Audio Input - WebSocket Feed
Real-time audio chunks via WebSocket (MP3 format).
Configure WebSocket Audio Input (to Virtual Mic)
Expected Response:
{
  "state": "running",
  "audio": {
    "type": "socket",
    "format": "mp3",
    "destination": "microphone"
  },
  "ingest": {
    "audio": {
      "protocol": "socket",
      "format": "mp3",
      "destination": "microphone",
      "url": "ws://localhost:10001/input/devices/virtual/socket/audio"
    }
  }
}
Visual Examples
Feed Page States
Streaming State - Video chunks actively being received:
Shows test pattern video being streamed via WebSocket. The feed displays video frames only when chunks are actively arriving.
Idle State - No video configured or chunks stopped:
After stopping or refreshing with no active stream, the feed shows "No virtual video feed configured" message. No cached data is displayed.
Audio Destinations
microphone (default): audio_input
speaker: audio_output
Feed Audio Chunks (Node.js)
Example Logs (Real-time Audio Ingest)
1.4 Virtual Audio Input - WebRTC Feed
Real-time audio via WebRTC (Opus codec).
Configure WebRTC Audio Input
Route to Speaker Instead
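A small sketch of how the `destination` field could be set and validated on the client side before configuring. The two allowed values and the `microphone` default come from the PR; the helper name `audio_input_config` and the exact body layout (an `audio` object with `type` and `destination`) are assumptions inferred from the example responses.

```python
# Allowed values per the PR's destination enum; "microphone" is the default.
AUDIO_DESTINATIONS = ("microphone", "speaker")


def audio_input_config(source_type, destination="microphone", audio_format=None):
    """Return an `audio` config object, rejecting unknown destinations early."""
    if destination not in AUDIO_DESTINATIONS:
        raise ValueError(
            f"destination must be one of {AUDIO_DESTINATIONS}, got {destination!r}"
        )
    config = {"type": source_type, "destination": destination}
    if audio_format is not None:
        config["format"] = audio_format
    return config


# Hypothetical body for routing WebRTC audio to the speaker instead of the mic.
speaker_body = {"audio": audio_input_config("webrtc", destination="speaker")}
```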
Send Audio via WebRTC (Python)
1.5 Combined Virtual Input (Video + Audio)
WebSocket Video + Audio
Then feed both sockets simultaneously:
ws://localhost:444/input/devices/virtual/socket/video
ws://localhost:444/input/devices/virtual/socket/audio
WebRTC Video + Audio
Both tracks use the same WebRTC peer connection.
1.6 Livestream - WebRTC Playback
Expose container display as WebRTC stream for browser consumption.
Start WebRTC Livestream
Expected Response:
{
  "id": "webrtc-live",
  "mode": "webrtc",
  "ingest_url": "",
  "webrtc_offer_url": "http://localhost:10001/stream/webrtc/offer",
  "is_streaming": true,
  "started_at": "2024-01-15T10:30:00Z"
}
Connect from Browser (JavaScript)
Real-time Factor
1.7 Livestream - WebSocket Audio
Stream container audio output as MP3 chunks over WebSocket.
Start Socket Audio Livestream
Expected Response:
{
  "id": "audio-live",
  "mode": "socket",
  "ingest_url": "",
  "websocket_url": "ws://localhost:10001/stream/socket/audio-live",
  "is_streaming": true
}
Consume Audio Stream (Node.js)
Example Logs (Audio Livestream)
1.8 Livestream - RTMP (Local & Remote)
Internal RTMP Server
Expected Response:
{
  "id": "default",
  "mode": "internal",
  "ingest_url": "rtmp://localhost:1935/live/default",
  "playback_url": "rtmp://localhost:1935/live/default",
  "is_streaming": true
}
Play with ffplay
Push to Remote RTMP
1.9 Control Commands
Pause (Black Frames/Silence)
Resume
Stop All
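The virtual input control calls (pause, resume, stop) share one URL pattern, so they can be sketched with a single helper. The `POST` method and the paths come from the PR; the base URL, the empty request body, and the helper name `control_request` are assumptions.

```python
import urllib.request

# Assumption: API base URL; adjust to wherever the kernel API is exposed.
BASE_URL = "http://localhost:444"
CONTROL_ACTIONS = ("pause", "resume", "stop")


def control_request(action, base_url=BASE_URL):
    """Build (but do not send) a POST request for a virtual input control action."""
    if action not in CONTROL_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {CONTROL_ACTIONS}")
    return urllib.request.Request(
        f"{base_url}/input/devices/virtual/{action}",
        data=b"",  # assumption: no body is required for control actions
        method="POST",
    )


pause_req = control_request("pause")
# To actually send it: urllib.request.urlopen(pause_req)
```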
Stop Livestream
API Reference
Virtual Inputs
/input/devices/virtual/configure
/input/devices/virtual/status
/input/devices/virtual/pause
/input/devices/virtual/resume
/input/devices/virtual/stop
/input/devices/virtual/feed
/input/devices/virtual/feed/socket/info
/input/devices/virtual/webrtc/offer
/input/devices/virtual/socket/video
/input/devices/virtual/socket/audio
Livestream
/stream/start
/stream/stop
/stream/list
/stream/webrtc/offer
/stream/socket/{id}
Request/Response Schemas
VirtualInputsRequest
VirtualInputsStatus
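As an illustration of consuming a status payload, the sketch below reads the ingest URLs out of a response shaped like the example responses in this description. The field names (`state`, `ingest`, `url`) are taken from those examples; the helper names `ingest_urls` and `is_running` are hypothetical, and the full schema may contain fields not shown here.

```python
import json


def ingest_urls(status: dict) -> dict:
    """Map each media kind ('video', 'audio') to its ingest URL, if present."""
    ingest = status.get("ingest") or {}
    return {
        kind: info["url"]
        for kind, info in ingest.items()
        if isinstance(info, dict) and "url" in info
    }


def is_running(status: dict) -> bool:
    """True when the status reports the 'running' state."""
    return status.get("state") == "running"


# Example payload copied from the WebSocket video configure response.
example = json.loads("""
{
  "state": "running",
  "video": {"type": "socket", "format": "mpegts",
            "width": 1280, "height": 720, "frame_rate": 30},
  "ingest": {"video": {"protocol": "socket", "format": "mpegts",
             "url": "ws://localhost:10001/input/devices/virtual/socket/video"}}
}
""")
urls = ingest_urls(example)
```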
StartStreamRequest
StreamInfo
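The example livestream responses each expose the consumer-facing URL under a different field depending on `mode`. A sketch of picking the right one, assuming only the three modes and field names that appear in those examples (`webrtc`, `socket`, `internal`); the helper name `consumer_url` is hypothetical and other modes, such as a remote RTMP push, are not covered.

```python
from typing import Optional

# Field carrying the consumer URL for each mode seen in the example responses.
URL_FIELD_BY_MODE = {
    "webrtc": "webrtc_offer_url",
    "socket": "websocket_url",
    "internal": "playback_url",
}


def consumer_url(info: dict) -> Optional[str]:
    """Return the URL a client should connect to, or None if unknown or empty."""
    field = URL_FIELD_BY_MODE.get(info.get("mode"))
    if field is None:
        return None
    return info.get(field) or None


audio_live = {
    "id": "audio-live",
    "mode": "socket",
    "ingest_url": "",
    "websocket_url": "ws://localhost:10001/stream/socket/audio-live",
    "is_streaming": True,
}
url = consumer_url(audio_live)
```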
Video Encoding Notes
The feed page uses JSMpeg for WebSocket video playback, which requires MPEG-1 video codec.
Encoding Command
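For scripted pipelines, the same ffmpeg invocation shown earlier in this description can be assembled as an argument list. This is a sketch: the helper name `mpeg1_ts_command` is hypothetical, and the defaults simply repeat the flags from the documented command.

```python
import shlex


def mpeg1_ts_command(src, dst, bitrate="1500k", fps=25):
    """Return the ffmpeg argv that encodes src as MPEG-1 video in MPEG-TS."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "mpeg1video",  # MPEG-1 video codec, required by JSMpeg
        "-b:v", bitrate,       # target video bitrate
        "-r", str(fps),        # output frame rate
        "-f", "mpegts",        # MPEG-TS container
        dst,
    ]


cmd = mpeg1_ts_command("input.mp4", "output.ts")
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when file names contain spaces.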
Parameters
-c:v mpeg1video
-b:v 1500k
-r 25
-f mpegts
Audio Format Notes
WebSocket Audio Ingest
WebRTC Audio
Real-time Behavior Summary
Key Principle: No caching of past data. When chunks stop, output shows idle state. When chunks resume, output resumes from current data.
Real-time Factor
The virtual inputs and livestream features are designed for true real-time behavior:
No Replay on Refresh: Unlike buffered video players, refreshing the feed page does not replay cached content. Each refresh shows the current state.
Immediate State Reflection:
Connection Independence: Opening multiple tabs shows the same real-time stream, not separate cached copies.
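The no-replay and connection-independence properties above can be illustrated with a toy broadcaster that forwards each chunk only to clients connected at that moment and keeps no history. This is a conceptual sketch, not the PR's Go implementation; the class name `LiveBroadcaster` and its methods are hypothetical.

```python
class LiveBroadcaster:
    """Fan out live chunks to current clients without buffering past data."""

    def __init__(self):
        self._clients = []  # callables that accept one bytes chunk

    def add_client(self, send):
        """Register a client; it receives only chunks broadcast from now on."""
        self._clients.append(send)

    def remove_client(self, send):
        """Unregister a client if it is still registered."""
        if send in self._clients:
            self._clients.remove(send)

    def broadcast(self, chunk):
        """Forward the chunk to every current client and return how many got it."""
        for send in list(self._clients):
            send(chunk)
        return len(self._clients)


early, late = [], []
hub = LiveBroadcaster()
hub.add_client(early.append)
hub.broadcast(b"chunk-1")
hub.add_client(late.append)   # joins after chunk-1; nothing is replayed
hub.broadcast(b"chunk-2")
```

Here the late client sees only `chunk-2`, which mirrors a refreshed or newly opened tab showing the current state rather than cached content.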
Real-time Video | WebRTC
WebSocket Real-time Characteristics
Verified Real-time Behavior
The following behaviors have been tested and verified:
Audio Real-time Routing
Audio chunks can be routed to two destinations:
[ @rgarcia @juecd ]
Note
Introduce real-time livestreaming (RTMP/RTMPS/WebRTC/WebSocket) and virtual audio/video inputs with WebSocket/WebRTC ingest, preview feed, and full server/image plumbing, plus samples and minor UI tweaks.
FFmpeg streamers (RTMP/RTMPS internal server, WebRTC, WebSocket), manager, and endpoints: POST /stream/start, POST /stream/stop, GET /stream/list, POST /stream/webrtc/offer; WS playback at /stream/socket/{id}.
stream|file|socket|webrtc; endpoints: POST /input/devices/virtual/configure|pause|resume|stop, GET /input/devices/virtual/status, preview page GET /input/devices/virtual/feed, feed socket info, and WS ingest at /input/devices/virtual/socket/{video|audio}.
v4l2loopback, rtkit, open perms/groups, start PulseAudio via supervisord; expose ports 1935/1936; set env for PulseAudio.
samples/agent-live-demo, samples/livestream, samples/virtual-inputs (scripts, docs, WS/WebRTC senders, feed capture).
ws, etc.).
Written by Cursor Bugbot for commit bb79584.