Realtime Media Capabilities Expansion (v1) #106

raiden-staging · 2025-12-15T19:28:04Z

Aims

Rationale:

Media-first projects are difficult to test or automate remotely because of infrastructure limitations and a lack of tooling.
Use cases for realtime audiovisual AI agents are heavily restricted by automation and API limitations in realtime contexts, and by how difficult they are to set up. This confines current-generation meeting AI agents to passive, low-level features such as transcriptions and summaries. A remote browser is a good environment for lifting these restrictions.
The features in this PR open the door to a wider range of use cases.
- GUI-first apps and limited or unavailable APIs → remote browser automation lets virtual agents operate them.

The new capabilities enable:

livestreams (audio|video) and virtual input sources (microphone| ~camera)
for different sources (file | stream url | chunks)
across different protocols (rtmp | webrtc | websockets) for write/consumption
websockets in/out tested extensively for realtime capabilities

Limitations / Current Workarounds

v4l2loopback, which enables virtual video inputs, is unusable due to container kernel limitations (see this issue).
- A source stream or file can be used as a fake camera source in Chrome ({ "type":"stream"|"file","url":"..." }), but it isn't truly realtime: it is meant as a Chrome test feature and replays the source from the start on every load/refresh event.
- The current workaround is a pipe for the virtual feed that can be consumed from the browser, which enables, for example, screensharing the virtual video input. The live feed is set up automatically at http://localhost:444/input/devices/virtual/feed?fit=cover to simplify consuming it.
These limitations do not apply to audio.
Writing to and exposing audio devices works properly without additional methods.

Next Steps

SDKs to wrap all virtual inputs and output livestreams, managing and syncing media in realtime with simple function calls.
You can check samples/agent-live-demo for a prototype that demonstrates everything at once.
A Chromium fork and new build to resolve the virtual input device limitations by extending the --use-fake-device-for-media-stream --use-file-for-fake-video-capture=source.y4m relayer, which was designed for mock playback rather than live input, so that it can be consumed in realtime (or adding alternative methods).

References

Examples

1.1 Virtual Video Input - WebSocket Feed

Real-time video chunks via WebSocket.
Uses MPEG-1 video in MPEG-TS container for JSMpeg playback.

Configure WebSocket Video Input

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "video": {
      "type": "socket",
      "format": "mpegts",
      "width": 1280,
      "height": 720,
      "frame_rate": 30
    }
  }' | jq

Expected Response:

{
  "state": "running",
  "video": {
    "type": "socket",
    "format": "mpegts",
    "width": 1280,
    "height": 720,
    "frame_rate": 30
  },
  "ingest": {
    "video": {
      "protocol": "socket",
      "format": "mpegts",
      "url": "ws://localhost:10001/input/devices/virtual/socket/video"
    }
  }
}
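
The `ingest.video.url` field in this response is where a sender is expected to push chunks. As a rough sketch (Python, standard library only), the URL could be pulled out of the response like this. `ingest_url` is a hypothetical helper name, not something the PR provides, and the response shape is assumed to match the example above.

```python
import json


def ingest_url(status: dict, kind: str) -> str:
    """Return the ingest URL for "video" or "audio" from a configure response."""
    try:
        return status["ingest"][kind]["url"]
    except KeyError as exc:
        raise KeyError(f"no {kind} ingest endpoint in response") from exc


# Trimmed copy of the expected response shown above.
response = json.loads("""
{
  "state": "running",
  "video": {"type": "socket", "format": "mpegts"},
  "ingest": {
    "video": {
      "protocol": "socket",
      "format": "mpegts",
      "url": "ws://localhost:10001/input/devices/virtual/socket/video"
    }
  }
}
""")

print(ingest_url(response, "video"))
```

The example response reports port 10001 while the sender samples in this description connect through port 444, so the returned URL may need its host and port adjusted for the caller's network.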

Encode Source Video to MPEG-1

# Convert any video to MPEG-1 (required for JSMpeg)
ffmpeg -i input.mp4 -c:v mpeg1video -b:v 1500k -r 25 -f mpegts output.ts

Feed Video Chunks (Node.js)

import { createReadStream } from 'node:fs';
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:444/input/devices/virtual/socket/video');
const delay = ms => new Promise(r => setTimeout(r, ms));

ws.on('open', async () => {
  // Read the MPEG-TS file produced by the ffmpeg command above in 64 KiB chunks.
  for await (const chunk of createReadStream('output.ts', { highWaterMark: 64*1024 })) {
    ws.send(chunk);
    await delay(35); // ~realtime pacing
  }
  // The socket stays open so more chunks can be sent later.
  console.log('File sent; socket left open for more chunks');
});

ws.on('error', err => console.error('WebSocket error:', err));
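
The fixed 35 ms delay above is a rough pacing value. If a sender should instead track the encoded bitrate, the delay per chunk can be derived from the chunk size and the bitrate. The Python sketch below assumes a constant stream bitrate; `pacing_delay_s` is a hypothetical helper. The 1500k in the ffmpeg command is the video bitrate only and MPEG-TS muxing adds overhead, so the result is better read as an upper estimate of the delay.

```python
def pacing_delay_s(chunk_bytes: int, bitrate_bps: int) -> float:
    """Seconds of media carried by one chunk at a constant bitrate."""
    if chunk_bytes <= 0 or bitrate_bps <= 0:
        raise ValueError("chunk_bytes and bitrate_bps must be positive")
    return chunk_bytes * 8 / bitrate_bps


# 64 KiB chunks of a 1500 kbit/s stream.
delay = pacing_delay_s(64 * 1024, 1_500_000)
print(f"{delay * 1000:.0f} ms per 64 KiB chunk")  # prints: 350 ms per 64 KiB chunk
```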

Real-time Behavior

Feed page shows video only when chunks arrive
Refresh = no cached replay; shows "Loading..." until new chunks
Stop sending = black screen; resume = video resumes
This is true real-time: no buffering of past data

Preview Feed

Open in browser: http://localhost:444/input/devices/virtual/feed?fit=cover

1.2 Virtual Video Input - WebRTC Feed

Real-time video via WebRTC (VP8/VP9 in IVF format internally).

Configure WebRTC Video Input

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{"video": {"type": "webrtc"}}' | jq

Expected Response:

{
  "state": "running",
  "video": {"type": "webrtc"},
  "ingest": {
    "video": {
      "protocol": "webrtc",
      "format": "ivf",
      "url": "http://localhost:10001/input/devices/virtual/webrtc/offer"
    }
  }
}

Send Video via WebRTC (Python)

import asyncio, aiohttp
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer

async def main():
    pc = RTCPeerConnection()
    player = MediaPlayer("video.mp4")
    if player.video:
        pc.addTrack(player.video)

    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)

    async with aiohttp.ClientSession() as s:
        resp = await s.post(
            "http://localhost:444/input/devices/virtual/webrtc/offer",
            json={"sdp": pc.localDescription.sdp}
        )
        answer = await resp.json()

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=answer["sdp"], type="answer")
    )
    print("Streaming...")
    await asyncio.Future()

asyncio.run(main())

Real-time Factor

WebRTC provides lowest latency (~100-300ms typical)
Feed page refreshes show current frame, not cached history
Track stops = black screen; track resumes = video resumes

1.3 Virtual Audio Input - WebSocket Feed

Real-time audio chunks via WebSocket (MP3 format).

Configure WebSocket Audio Input (to Virtual Mic)

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "audio": {
      "type": "socket",
      "format": "mp3",
      "destination": "microphone"
    }
  }' | jq

Expected Response:

{
  "state": "running",
  "audio": {
    "type": "socket",
    "format": "mp3",
    "destination": "microphone"
  },
  "ingest": {
    "audio": {
      "protocol": "socket",
      "format": "mp3",
      "destination": "microphone",
      "url": "ws://localhost:10001/input/devices/virtual/socket/audio"
    }
  }
}

Visual Examples

Feed Page States

Streaming State - Video chunks actively being received:

Shows test pattern video being streamed via WebSocket. The feed displays video frames only when chunks are actively arriving.

Idle State - No video configured or chunks stopped:

After stopping or refreshing with no active stream, the feed shows "No virtual video feed configured" message. No cached data is displayed.

Audio Destinations

| Destination | PulseAudio Sink | Use Case |
| --- | --- | --- |
| `microphone` (default) | `audio_input` | Virtual mic for apps reading mic input |
| `speaker` | `audio_output` | Monitor/playback through container audio |
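
The destination-to-sink mapping can be expressed as a small lookup. This is only an illustrative Python sketch of the table above, not the server's routing code; `sink_for` and the constant names are hypothetical.

```python
from typing import Optional

# Destination -> PulseAudio sink, as listed in the table above.
SINK_BY_DESTINATION = {
    "microphone": "audio_input",
    "speaker": "audio_output",
}
DEFAULT_DESTINATION = "microphone"


def sink_for(destination: Optional[str] = None) -> str:
    """Resolve the PulseAudio sink name for an audio destination."""
    dest = destination or DEFAULT_DESTINATION
    try:
        return SINK_BY_DESTINATION[dest]
    except KeyError:
        raise ValueError(f"unknown destination: {dest!r}") from None


print(sink_for())           # audio_input
print(sink_for("speaker"))  # audio_output
```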

Feed Audio Chunks (Node.js)

import { createReadStream } from 'node:fs';
import WebSocket from 'ws';

const ws = new WebSocket('ws://localhost:444/input/devices/virtual/socket/audio');
const delay = ms => new Promise(r => setTimeout(r, ms));

ws.on('open', async () => {
  for await (const chunk of createReadStream('audio.mp3', { highWaterMark: 16*1024 })) {
    ws.send(chunk);
    await delay(50);
  }
  console.log('Audio streaming... socket open for more');
});

Example Logs (Real-time Audio Ingest)

[virtual-input] audio socket connected
[virtual-input] audio chunk received: 16384 bytes
[virtual-input] routing to microphone (audio_input sink)
[virtual-input] audio chunk received: 16384 bytes
[virtual-input] audio chunk received: 8192 bytes
[virtual-input] audio ingest idle, waiting for chunks...
[virtual-input] audio chunk received: 16384 bytes
[virtual-input] routing resumed to microphone

1.4 Virtual Audio Input - WebRTC Feed

Real-time audio via WebRTC (Opus codec).

Configure WebRTC Audio Input

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "audio": {"type": "webrtc", "destination": "microphone"}
  }' | jq

Route to Speaker Instead

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "audio": {"type": "webrtc", "destination": "speaker"}
  }' | jq

Send Audio via WebRTC (Python)

import asyncio, aiohttp
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer

async def main():
    pc = RTCPeerConnection()
    player = MediaPlayer("audio.mp3")
    if player.audio:
        pc.addTrack(player.audio)

    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)

    async with aiohttp.ClientSession() as s:
        resp = await s.post(
            "http://localhost:444/input/devices/virtual/webrtc/offer",
            json={"sdp": pc.localDescription.sdp}
        )
        answer = await resp.json()

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=answer["sdp"], type="answer")
    )
    await asyncio.Future()

asyncio.run(main())

1.5 Combined Virtual Input (Video + Audio)

WebSocket Video + Audio

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "video": {"type": "socket", "format": "mpegts", "width": 1280, "height": 720},
    "audio": {"type": "socket", "format": "mp3"}
  }' | jq

Then feed both sockets simultaneously:

Video: ws://localhost:444/input/devices/virtual/socket/video
Audio: ws://localhost:444/input/devices/virtual/socket/audio
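
Feeding both sockets at once comes down to running two paced send loops concurrently. The Python sketch below shows only that concurrency pattern with `asyncio`. The `send` callables are placeholders for whatever WebSocket client is in use, and `feed` / `feed_both` are hypothetical names. The default delays are the ones used in the Node.js samples in this description.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

Send = Callable[[bytes], Awaitable[None]]


async def feed(chunks: Iterable[bytes], send: Send, delay_s: float) -> int:
    """Send chunks one by one with a fixed pause; return total bytes sent."""
    sent = 0
    for chunk in chunks:
        await send(chunk)
        sent += len(chunk)
        await asyncio.sleep(delay_s)
    return sent


async def feed_both(
    video: Iterable[bytes],
    audio: Iterable[bytes],
    send_video: Send,
    send_audio: Send,
    video_delay_s: float = 0.035,
    audio_delay_s: float = 0.05,
) -> List[int]:
    """Run the video and audio loops concurrently; return [video, audio] byte counts."""
    return list(
        await asyncio.gather(
            feed(video, send_video, video_delay_s),
            feed(audio, send_audio, audio_delay_s),
        )
    )


if __name__ == "__main__":
    async def fake_send(chunk: bytes) -> None:
        # Stand-in for a real WebSocket send call.
        pass

    totals = asyncio.run(feed_both([b"v" * 4], [b"a" * 2], fake_send, fake_send, 0, 0))
    print(totals)
```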

WebRTC Video + Audio

curl -s http://localhost:444/input/devices/virtual/configure \
  -H "Content-Type: application/json" \
  -d '{
    "video": {"type": "webrtc"},
    "audio": {"type": "webrtc"}
  }' | jq

Both tracks use the same WebRTC peer connection.

1.6 Livestream - WebRTC Playback

Expose container display as WebRTC stream for browser consumption.

Start WebRTC Livestream

curl -s http://localhost:444/stream/start \
  -H "Content-Type: application/json" \
  -d '{"mode": "webrtc", "id": "webrtc-live"}' | jq

Expected Response:

{
  "id": "webrtc-live",
  "mode": "webrtc",
  "ingest_url": "",
  "webrtc_offer_url": "http://localhost:10001/stream/webrtc/offer",
  "is_streaming": true,
  "started_at": "2024-01-15T10:30:00Z"
}

Connect from Browser (JavaScript)

const pc = new RTCPeerConnection();
pc.ontrack = e => {
  document.getElementById('video').srcObject = e.streams[0];
};

const offer = await pc.createOffer({ offerToReceiveVideo: true, offerToReceiveAudio: true });
await pc.setLocalDescription(offer);

const resp = await fetch('http://localhost:444/stream/webrtc/offer', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id: 'webrtc-live', sdp: pc.localDescription.sdp })
});
const answer = await resp.json();
await pc.setRemoteDescription({ type: 'answer', sdp: answer.sdp });

Real-time Factor

Sub-second latency typical
No buffering; frame drops on slow connections

1.7 Livestream - WebSocket Audio

Stream container audio output as MP3 chunks over WebSocket.

Start Socket Audio Livestream

curl -s http://localhost:444/stream/start \
  -H "Content-Type: application/json" \
  -d '{"mode": "socket", "id": "audio-live"}' | jq

Expected Response:

{
  "id": "audio-live",
  "mode": "socket",
  "ingest_url": "",
  "websocket_url": "ws://localhost:10001/stream/socket/audio-live",
  "is_streaming": true
}

Consume Audio Stream (Node.js)

import WebSocket from 'ws';
import fs from 'node:fs';

const ws = new WebSocket('ws://localhost:444/stream/socket/audio-live');
const out = fs.createWriteStream('captured_audio.ts');

ws.on('message', chunk => out.write(chunk));
ws.on('close', () => out.end());

Example Logs (Audio Livestream)

[livestream] starting socket mode stream: audio-live
[livestream] capturing audio from pulse audio_output
[livestream] websocket client connected to audio-live
[livestream] streaming audio chunk: 4096 bytes
[livestream] streaming audio chunk: 4096 bytes
[livestream] client disconnected from audio-live

1.8 Livestream - RTMP (Local & Remote)

Internal RTMP Server

curl -s http://localhost:444/stream/start \
  -H "Content-Type: application/json" \
  -d '{"mode": "internal"}' | jq

Expected Response:

{
  "id": "default",
  "mode": "internal",
  "ingest_url": "rtmp://localhost:1935/live/default",
  "playback_url": "rtmp://localhost:1935/live/default",
  "is_streaming": true
}

Play with ffplay

ffplay -fflags nobuffer -i rtmp://localhost:1935/live/default

Push to Remote RTMP

curl -s http://localhost:444/stream/start \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "remote",
    "target_url": "rtmp://live.example.com/app/stream-key"
  }' | jq

1.9 Control Commands

Pause (Black Frames/Silence)

curl -X POST http://localhost:444/input/devices/virtual/pause

Resume

curl -X POST http://localhost:444/input/devices/virtual/resume

Stop All

curl -X POST http://localhost:444/input/devices/virtual/stop

Stop Livestream

curl -X POST http://localhost:444/stream/stop
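
The same control calls can be issued from a script. Below is a minimal Python sketch using only the standard library; it builds the request objects without sending them. The `control_request` helper, the action names, and the base URL default are assumptions for illustration; the paths are the ones used in the curl commands above.

```python
import urllib.request

BASE_URL = "http://localhost:444"

# Action name -> endpoint path, taken from the curl commands above.
CONTROL_PATHS = {
    "pause": "/input/devices/virtual/pause",
    "resume": "/input/devices/virtual/resume",
    "stop": "/input/devices/virtual/stop",
    "stream-stop": "/stream/stop",
}


def control_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for a control action."""
    if action not in CONTROL_PATHS:
        raise ValueError(f"unknown action: {action!r}")
    return urllib.request.Request(base_url + CONTROL_PATHS[action], method="POST")


req = control_request("pause")
print(req.get_method(), req.full_url)
# To send it against a running instance: urllib.request.urlopen(req)
```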

API Reference

Virtual Inputs

| Endpoint | Method | Description |
| --- | --- | --- |
| `/input/devices/virtual/configure` | POST | Configure video/audio virtual inputs |
| `/input/devices/virtual/status` | GET | Get current virtual input status |
| `/input/devices/virtual/pause` | POST | Pause with black frames/silence |
| `/input/devices/virtual/resume` | POST | Resume live media |
| `/input/devices/virtual/stop` | POST | Stop and release resources |
| `/input/devices/virtual/feed` | GET | HTML page for live preview |
| `/input/devices/virtual/feed/socket/info` | GET | WebSocket URL info for feed |
| `/input/devices/virtual/webrtc/offer` | POST | WebRTC SDP negotiation |
| `/input/devices/virtual/socket/video` | WS | WebSocket video ingest |
| `/input/devices/virtual/socket/audio` | WS | WebSocket audio ingest |

Livestream

| Endpoint | Method | Description |
| --- | --- | --- |
| `/stream/start` | POST | Start livestream (internal/remote/webrtc/socket) |
| `/stream/stop` | POST | Stop livestream |
| `/stream/list` | GET | List active streams |
| `/stream/webrtc/offer` | POST | WebRTC SDP for livestream playback |
| `/stream/socket/{id}` | WS | WebSocket MPEG-TS stream |

Request/Response Schemas

VirtualInputsRequest

interface VirtualInputsRequest {
  video?: {
    type: "stream" | "file" | "socket" | "webrtc";
    url?: string;           // For stream/file types
    format?: string;        // "mpegts" for socket, "ivf" for webrtc
    width?: number;
    height?: number;
    frame_rate?: number;
  };
  audio?: {
    type: "stream" | "file" | "socket" | "webrtc";
    url?: string;
    format?: string;        // "mp3" for socket
    destination?: "microphone" | "speaker";  // Default: microphone
  };
  start_paused?: boolean;
}
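
A client may want to check a request against this schema before sending it. The following Python sketch is one way to do that. It is not the server's validation logic, and `validate_virtual_inputs` is a hypothetical helper. Treating `url` as required for `stream` and `file` sources is an assumption drawn from the schema comment.

```python
from typing import List

SOURCE_TYPES = {"stream", "file", "socket", "webrtc"}
DESTINATIONS = {"microphone", "speaker"}


def validate_virtual_inputs(req: dict) -> List[str]:
    """Return a list of problems found in a VirtualInputsRequest-shaped dict."""
    problems = []
    for kind in ("video", "audio"):
        cfg = req.get(kind)
        if cfg is None:
            continue
        source_type = cfg.get("type")
        if source_type not in SOURCE_TYPES:
            problems.append(f"{kind}.type must be one of {sorted(SOURCE_TYPES)}")
        # Assumption: stream/file sources need a url.
        if source_type in ("stream", "file") and not cfg.get("url"):
            problems.append(f"{kind}.url is required for type {source_type!r}")
    dest = (req.get("audio") or {}).get("destination")
    if dest is not None and dest not in DESTINATIONS:
        problems.append(f"audio.destination must be one of {sorted(DESTINATIONS)}")
    return problems


print(validate_virtual_inputs({"video": {"type": "socket", "format": "mpegts"}}))
print(validate_virtual_inputs({"audio": {"type": "file", "destination": "headphones"}}))
```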

VirtualInputsStatus

interface VirtualInputsStatus {
  state: "idle" | "running" | "paused";
  mode: "device" | "virtual-file";
  video_device: string;
  audio_sink: string;
  microphone_source: string;
  video?: VirtualInputVideo;
  audio?: VirtualInputAudio;
  ingest?: {
    video?: { protocol: string; format: string; url: string; };
    audio?: { protocol: string; format: string; destination: string; url: string; };
  };
  started_at?: string;
  last_error?: string;
}
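
A caller that configures inputs and then needs to wait for them to come up could poll the status endpoint for the `state` field. The Python sketch below keeps the HTTP call abstract: `fetch_status` is any callable returning a `VirtualInputsStatus`-shaped dict, and `wait_for_state` is a hypothetical helper. The timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_state(
    fetch_status: Callable[[], dict],
    target: str = "running",
    timeout_s: float = 5.0,
    interval_s: float = 0.2,
) -> dict:
    """Poll fetch_status() until status["state"] equals target or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("state") == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state is {status.get('state')!r}, wanted {target!r}")
        time.sleep(interval_s)


# Demo with a canned sequence of statuses instead of real HTTP calls.
canned = iter([{"state": "idle"}, {"state": "running"}])
print(wait_for_state(lambda: next(canned), interval_s=0)["state"])
```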

StartStreamRequest

interface StartStreamRequest {
  id?: string;
  mode: "internal" | "remote" | "webrtc" | "socket";
  target_url?: string;      // Required for "remote" mode
  framerate?: number;       // 1-20 fps
}
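
The two constraints noted in the comments (`target_url` required for `remote`, `framerate` between 1 and 20) can be enforced client-side when building the request body. A possible Python sketch follows; `start_stream_body` is a hypothetical helper and the server may apply different or additional checks.

```python
import json
from typing import Optional

MODES = {"internal", "remote", "webrtc", "socket"}


def start_stream_body(
    mode: str,
    stream_id: Optional[str] = None,
    target_url: Optional[str] = None,
    framerate: Optional[int] = None,
) -> dict:
    """Build a StartStreamRequest body, checking the documented constraints."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if mode == "remote" and not target_url:
        raise ValueError("target_url is required for remote mode")
    if framerate is not None and not 1 <= framerate <= 20:
        raise ValueError("framerate must be between 1 and 20")
    body = {"mode": mode}
    if stream_id is not None:
        body["id"] = stream_id
    if target_url is not None:
        body["target_url"] = target_url
    if framerate is not None:
        body["framerate"] = framerate
    return body


print(json.dumps(start_stream_body("webrtc", stream_id="webrtc-live")))
```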

StreamInfo

interface StreamInfo {
  id: string;
  mode: "internal" | "remote" | "webrtc" | "socket";
  ingest_url: string;
  playback_url?: string;
  websocket_url?: string;
  webrtc_offer_url?: string;
  is_streaming: boolean;
  started_at: string;
}
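
Which URL field a consumer should use depends on the stream mode. Based on the example responses earlier in this description, a client could pick it as in the Python sketch below. `consumer_url` is a hypothetical helper, and returning `None` for `remote` mode is an assumption, since that mode pushes to an external target.

```python
from typing import Optional

# Mode -> StreamInfo field holding the URL a consumer connects to.
URL_FIELD_BY_MODE = {
    "internal": "playback_url",
    "socket": "websocket_url",
    "webrtc": "webrtc_offer_url",
}


def consumer_url(info: dict) -> Optional[str]:
    """Return the URL a consumer should use for this stream, if any."""
    field = URL_FIELD_BY_MODE.get(info.get("mode"))
    if field is None:
        return None
    return info.get(field) or None


info = {
    "id": "audio-live",
    "mode": "socket",
    "ingest_url": "",
    "websocket_url": "ws://localhost:10001/stream/socket/audio-live",
    "is_streaming": True,
}
print(consumer_url(info))
```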

Video Encoding Notes

The feed page uses JSMpeg for WebSocket video playback, which requires the MPEG-1 video codec.

Encoding Command

ffmpeg -i source.mp4 -c:v mpeg1video -b:v 1500k -r 25 -f mpegts output.ts

Parameters

| Parameter | Value | Notes |
| --- | --- | --- |
| `-c:v mpeg1video` | MPEG-1 | Required for JSMpeg |
| `-b:v 1500k` | 1.5 Mbps | Adjust for quality/bandwidth |
| `-r 25` | 25 fps | Match source or reduce |
| `-f mpegts` | MPEG-TS | Container format for streaming |
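
When the encode step is driven from a script, the same parameters can be assembled as an argument list. The Python sketch below only builds and prints the command. `mpeg1_ts_args` is a hypothetical helper, and the defaults mirror the values in the table above.

```python
import shlex
from typing import List


def mpeg1_ts_args(src: str, dst: str, bitrate: str = "1500k", fps: int = 25) -> List[str]:
    """Build the ffmpeg argument list for MPEG-1 video in an MPEG-TS container."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "mpeg1video",  # codec JSMpeg can decode
        "-b:v", bitrate,       # video bitrate
        "-r", str(fps),        # output frame rate
        "-f", "mpegts",        # container format
        dst,
    ]


print(shlex.join(mpeg1_ts_args("source.mp4", "output.ts")))
# To run it: subprocess.run(mpeg1_ts_args("source.mp4", "output.ts"), check=True)
```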

Audio Format Notes

WebSocket Audio Ingest

Format: MP3 chunks
Chunk size: 16-64 KB typical
Pacing: ~50ms between chunks for real-time

WebRTC Audio

Codec: Opus
Handled automatically by WebRTC stack

Real-time Behavior Summary

| Feature | Latency | Buffer | Refresh Behavior |
| --- | --- | --- | --- |
| WebSocket Video | ~100-500ms | None | Shows "Loading..." until chunks arrive |
| WebRTC Video | ~100-300ms | Minimal | Current frame only |
| WebSocket Audio | ~50-200ms | None | Silence when idle |
| WebRTC Audio | ~50-150ms | Minimal | Silence when idle |
| RTMP Internal | ~1-3s | Some | Standard RTMP behavior |

Key Principle: No caching of past data. When chunks stop, output shows idle state. When chunks resume, output resumes from current data.

Real-time Factor

The virtual inputs and livestream features are designed for true real-time behavior:

No Replay on Refresh: Unlike buffered video players, refreshing the feed page does not replay cached content. Each refresh shows the current state.
Immediate State Reflection:
- Chunks arriving → video/audio plays
- Chunks stop → idle state shown
- Chunks resume → playback resumes from current data
Connection Independence: Opening multiple tabs shows the same real-time stream, not separate cached copies.
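
The no-replay behavior described above amounts to a fan-out with no history: each chunk goes to whoever is connected at that moment and is then dropped. The Python sketch below illustrates that idea only. It is not the PR's broadcaster implementation, and `LiveBroadcaster` is a hypothetical name.

```python
from typing import Callable, List

Listener = Callable[[bytes], None]


class LiveBroadcaster:
    """Fan out chunks to current subscribers without keeping any history."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> Callable[[], None]:
        """Register a listener; returns a function that unsubscribes it."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def publish(self, chunk: bytes) -> None:
        """Deliver a chunk to everyone connected right now, then forget it."""
        for listener in list(self._listeners):
            listener(chunk)


feed = LiveBroadcaster()
tab_a: List[bytes] = []
tab_b: List[bytes] = []

feed.subscribe(tab_a.append)
feed.publish(b"frame-1")
feed.subscribe(tab_b.append)  # joins late, like a refreshed page
feed.publish(b"frame-2")

print(tab_a)  # [b'frame-1', b'frame-2']
print(tab_b)  # [b'frame-2']
```

A late subscriber sees only data published after it joined, which mirrors a page refresh showing the current stream rather than cached content.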

Real-time Video | WebRTC

┌─────────────┐     WebRTC      ┌─────────────┐
│   Source    │ ──────────────► │    Feed     │
│  (aiortc)   │   ~100-300ms    │   (page)    │
└─────────────┘    latency      └─────────────┘

Codec: VP8/VP9 (video), Opus (audio)
Latency: 100-300ms typical

WebSocket Real-time Characteristics

┌─────────────┐    WebSocket    ┌─────────────┐
│   Source    │ ──────────────► │   JSMpeg    │
│  (chunks)   │   ~100-500ms    │  (decoder)  │
└─────────────┘    latency      └─────────────┘

Codec: MPEG-1 video (JSMpeg), MP3 (audio)
Latency: 100-500ms typical

Verified Real-time Behavior

The following behaviors have been tested and verified:

| Scenario | Expected | Actual |
| --- | --- | --- |
| Page refresh while streaming | Shows current frame | ✓ |
| Page refresh after stop | Shows idle message | ✓ |
| Start streaming on idle page | Video appears immediately | ✓ |
| Stop streaming while viewing | Shows idle/blank | ✓ |
| Multiple tabs same stream | All show same real-time content | ✓ |

Audio Real-time Routing

Audio chunks can be routed to two destinations:

                     ┌──────────────────┐
                     │  Virtual Input   │
                     │    (chunks)      │
                     └────────┬─────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌─────────────────┐             ┌─────────────────┐
    │   audio_input   │             │  audio_output   │
    │  (virtual mic)  │             │   (speaker)     │
    └────────┬────────┘             └────────┬────────┘
             │                               │
             ▼                               ▼
    Apps read from mic              Playback/monitor

destination: "microphone" (default): Routes to virtual mic input. Apps reading from microphone receive this audio.
destination: "speaker": Routes to audio output. Useful for monitoring/playback.

[ @rgarcia @juecd ]

Note

Introduce real-time livestreaming (RTMP/RTMPS/WebRTC/WebSocket) and virtual audio/video inputs with WebSocket/WebRTC ingest, preview feed, and full server/image plumbing, plus samples and minor UI tweaks.

Backend/API:
- Livestreaming: Implement FFmpeg streamers (RTMP/RTMPS internal server, WebRTC, WebSocket), manager, and endpoints: POST /stream/start, POST /stream/stop, GET /stream/list, POST /stream/webrtc/offer; WS playback at /stream/socket/{id}.
- Virtual Inputs: Add manager for virtual mic/cam with sources stream|file|socket|webrtc; endpoints: POST /input/devices/virtual/configure|pause|resume|stop, GET /input/devices/virtual/status, preview page GET /input/devices/virtual/feed, feed socket info, and WS ingest at /input/devices/virtual/socket/{video|audio}.
- Spec/Tests: Expand OpenAPI and add extensive unit tests for stream/virtual-inputs.
Image/Runtime:
- Enable PulseAudio (configs, dbus), add v4l2loopback, rtkit, open perms/groups, start PulseAudio via supervisord; expose ports 1935/1936; set env for PulseAudio.
Samples:
- Add samples/agent-live-demo, samples/livestream, samples/virtual-inputs (scripts, docs, WS/WebRTC senders, feed capture).
Frontend/UI:
- New real-time feed page (socket/WebRTC); minor client theme color change; loader and mute-indicator/auto-unmute UX.
Tooling:
- Update Makefile (use npm down-convert), add deps (ws, etc.).

Written by Cursor Bugbot for commit bb79584.

… mpeg1 requirement set sample_video_mpeg1.ts as default ingest video in ws_chunk_ingest.js

…vf preamble, remove intro buffering openapi: add virtual input audio destination enum microphone,speaker

route ingest audio to microphone or speaker and wire destination through api, webrtc and ffmpeg

…nation handling handle virtual input audio destination (default microphone), update webrtc Configure calls/tests to new signature

… and docs add destination field and enum for virtual input audio; update README with examples for microphone vs speaker routing and realtime feed notes

…o shared schema use $ref for VirtualInputAudio.destination and add description/default

add destination field to openapi schema for ingest audio

…ts and expose destination on ingest endpoint simplify textevent stream handlers by replacing manual flush loops with io.Copy

…ation flush text/event-stream responses with buffered reads and http.Flusher, falling back to io.Copy

adjust virtual inputs, webrtc and socket ingest to place pulse as output placeholder

cursor · 2025-12-15T19:32:24Z

samples/agent-live-demo/conductor.js

+function assertEnv() {
+  if (!REMOTE_RTMP_URL) throw new Error('REMOTE_RTMP_URL is required');
+  if (!ELEVENLABS_API_KEY) throw new Error('ELEVENLABS_API_KEY is required');
+}


Bug: Unused environment variable required by validation

The assertEnv() function requires REMOTE_RTMP_URL to be set, but ENABLE_REMOTE_LIVESTREAM is hardcoded to false on line 14, meaning the remote RTMP feature is disabled and the URL is never actually used. This causes users to receive an unnecessary "REMOTE_RTMP_URL is required" error when running the sample, even though the variable isn't needed. The validation in assertEnv() and sequence.sh should be conditional on whether ENABLE_REMOTE_LIVESTREAM is true.

Additional Locations (1)

samples/agent-live-demo/sequence.sh#L45-L46

cursor · 2025-12-15T19:32:24Z

samples/agent-live-demo/conductor.js

+  async run(meetingUrl) {
+    this.meetingUrl = meetingUrl;
+    this.feedUrl = `${KERNEL_API_BASE}/input/devices/virtual/feed?fit=cover`;
+    const feedPage = await this.agent.newPage(); // new tab


Bug: BrowserFlow agent property never initialized

The BrowserFlow class initializes this.agent to null in the constructor (line 613) but never assigns it a value. When run() is called via the /browser/join endpoint, calling this.agent.newPage() will throw a null reference error ("Cannot read properties of null"). The browser automation agent is expected to exist but is never created or passed into the BrowserFlow instance.

raiden-staging added 30 commits December 2, 2025 13:36

[audio] audio support once again

c62cc7f

[liveinputs] implement virtual inputs manager for ffmpeg-backed virtu…

3988cdc

…al camera and microphone handle v4l2loopback and pulseaudio devices, manage start/pause/resume/stop and scale-to-zero

[liveinputs] add virtual input manager and configuration options

38c8519

refactor virtualinputs: new ctor with resolution/framerate, reset last config on stop, rebuild ffmpeg arg construction to handle video/audio indexing, paused/shared sources and unified input arg builder

[liveinputs] introduce virtual inputs api and openapi schema; integra…

51a251b

…te virtual inputs manager into apiservice lifecycle

[liveinputs] implement virtual inputs api handlers, introduce manager…

e986e84

… error vars, add test helper

[liveinputs] tests(api): replace direct New calls with newTestApiServ…

0ee66cb

…ice; add mock virtualinputs manager and assert stop called on shutdown

[liveinputs] add virtual inputs api tests

3bc9364

install v4l2loopback, set pulseaudio default sink/source, switch to npm for openapi-down-convert

[liveinputs] regenerate oapi client to add virtual inputs endpoints

f2dc1d9

switch makefile to pnpm for openapi-down-convert and update generated code metadata

[liveinputs] fix api tests and virtual inputs handling

26dbb8e

avoid err shadowing, normalize numeric types/enums, cast status and add virtual input defaults

[liveinputs] replace magic state strings with oapi enum constants in …

c50242b

…virtual inputs tests

[liveinputs] document virtual media inputs and endpoints; add config …

13ffe5c

…test for virtual input env vars

[liveinputs] add fake-file fallback for virtual inputs when v4l2loopb…

085f659

…ack unavailable prepare fifo video/audio files, set mode and file paths on manager/status and log fallback warning

[liveinputs] apply chromium capture flags and support fake-file virtu…

534b33f

…al inputs reset manager mode/files on stop, handle fake video/audio files in ffmpeg args, add fifo preparation and improved device checks and flag file merging/restart logic

[liveinputs] add chromium fake-capture fallback and default virtual i…

2f0310d

…nput mode to device expose mode, video_file and audio_file in api status types

[liveinputs] allow writable chromium flags mount and pass -y to ffmpe…

4326e06

…g to auto-overwrite outputs

[liveinputs] drop explicit rw option from chromium flags bind mount

d62bb51

[liveinputs] virtualinputs: enforce shutdown timeout and error if ffm…

6b0227f

…peg process does not exit

[liveinputs] rename fake-file mode to virtual-file and update openapi…

9a82e3e

… paths/descriptions increase ffmpeg stop timeout to 7s and remove processAlive check

[liveinputs] implement virtual inputs client endpoints

a7fbf72

rename fake-file to virtual-file, update comments and add configure/pause/resume/get/stop methods with json body alias

[liveinputs] return error when ffmpeg process fails to exit

cdcd879

preserve cmd/state instead of clearing if the process remains alive

[liveinputs] wait for process to exit after kill with 2s timeout

93cc0a9

[liveinputs] check process liveness before killing process group; cle…

870fc40

…ar manager state when process already exited

[liveinputs] only error if ffmpeg process still alive

151c40f

use stored pid and processAlive(pid) to avoid referencing m.cmd.Process.Pid when cmd may be nil

[liveinputs] virtualinputs: simplify stop logic by removing exited fl…

a1ba3b7

…ag and ffmpeg liveness check

[liveinputs] add virtual input api quick test docs

f7e7712

add curl examples to exercise endpoints under /input/devices/virtual: status, configure, resume, pause, stop and ffmpeg verification

[liveinputs] apply chromium capture flags for virtual-file mode

d97ef98

[liveinputs] return error when ffmpeg process fails to exit before cl…

a82906e

…earing manager state revise virtual input quick curl flow docs wording and expectations

[liveinputs] rename virtual inputs api path to /input/devices/virtual…

10d4f36

… and update docs ensure lingering ffmpeg processes are terminated during virtual input shutdown

[liveinputs] enable webcam and microphone capture for chromium-headfu…

2f33d9b

…l neko image clarify use_example curl flow and tweak expected outputs

[liveinputs] kill lingering ffmpeg processes on configure/pause/resume

f49a77d

add killAllFFmpeg helper using pkill (TERM then KILL) with short delay; change log to use virtual capture files

raiden-staging added 29 commits December 9, 2025 12:43

[mediav1@rev2] virtual inputs: handle virtual-file video and audio in…

27491bb

…dependently fix chromium fake-device flags, prepare capture dir and safe file cleanup; stop treating webrtc disconnected as failure

[mediav1@rev2] improve virtual feed websocket detection and update vi…

d03e811

…rtual device config example

[mediav1@rev2] treat unsupported audio/video as bad request

ff6f885

clarify virtual input audio/video format hints (socket vs webrtc) in oapi models

[mediav1@rev2] require mpeg-ts video for virtual socket ingest; simpl…

9957255

…ify ws chunk sender defaults and update readme

[mediav1@rev2] cache and send initial feed intro to websocket clients

2129700

store up to 512kb of early stream data and replay to new connections; reset intro on format changes
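The intro cache described in this commit (keep the first bytes of a stream, replay them to late joiners, reset when the format changes) can be sketched roughly as below. The type and method names are hypothetical and the 512 KB cap comes from the commit text. The PR's real broadcaster is not shown here and may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// introBuffer keeps up to max bytes from the start of a stream so that
// clients connecting later can receive the stream's opening bytes.
type introBuffer struct {
	mu     sync.Mutex
	format string
	data   []byte
	max    int
}

func newIntroBuffer(max int) *introBuffer { return &introBuffer{max: max} }

// Write records a chunk. A change of format discards the intro collected
// so far, since the old bytes would not match the new stream.
func (b *introBuffer) Write(format string, chunk []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if format != b.format {
		b.format = format
		b.data = b.data[:0]
	}
	if room := b.max - len(b.data); room > 0 {
		if len(chunk) > room {
			chunk = chunk[:room]
		}
		b.data = append(b.data, chunk...)
	}
}

// Snapshot returns a copy of the cached intro, to be sent to a newly
// connected client before it starts receiving live chunks.
func (b *introBuffer) Snapshot() []byte {
	b.mu.Lock()
	defer b.mu.Unlock()
	return append([]byte(nil), b.data...)
}

func main() {
	intro := newIntroBuffer(512 * 1024)
	intro.Write("mpegts", make([]byte, 300*1024))
	intro.Write("mpegts", make([]byte, 300*1024)) // truncated at the cap
	fmt.Println("cached bytes:", len(intro.Snapshot()))
}
```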

[mediav1@rev2.1] open pipe keepalives before starting ffmpeg to prevent fifo blocking

03d2da4

close keepalives if ffmpeg fails to start; apply to configure, pause and resume

[mediav1@rev2.1] detect vp8/vp9 keyframes and set encoded video chunk type instead of first-frame heuristic

c6a17f8
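For context on the keyframe detection mentioned in this commit: both codecs expose the frame type in the first byte of a frame. The sketch below is a hedged illustration based on the VP8 frame tag and the VP9 uncompressed header layout. The function names are hypothetical and the PR's own detection code (likely in the feed page's JavaScript) is not shown here.

```go
package main

import "fmt"

// isVP8Keyframe checks bit 0 of the VP8 frame tag: 0 means key frame.
func isVP8Keyframe(frame []byte) bool {
	return len(frame) > 0 && frame[0]&0x01 == 0
}

// isVP9Keyframe reads the start of the VP9 uncompressed header
// (bits are MSB first): frame_marker(2) profile_low(1) profile_high(1)
// [reserved(1) if profile 3] show_existing_frame(1) frame_type(1).
// frame_type 0 means key frame.
func isVP9Keyframe(frame []byte) bool {
	if len(frame) == 0 {
		return false
	}
	b := frame[0]
	if b>>6 != 2 { // frame_marker must be binary 10
		return false
	}
	profile := (b>>5)&1 | ((b>>4)&1)<<1
	shift := uint(3) // bit position of show_existing_frame
	if profile == 3 {
		shift = 2 // profile 3 inserts one reserved bit first
	}
	showExisting := (b >> shift) & 1
	frameType := (b >> (shift - 1)) & 1
	return showExisting == 0 && frameType == 0
}

func main() {
	fmt.Println(isVP8Keyframe([]byte{0x10}), isVP9Keyframe([]byte{0x82}))
}
```

A receiver would use the result to label each chunk as a key or delta frame rather than assuming only the first frame is a keyframe.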

[mediav1@rev2.1] virtual feed: disable loop and track current codec for correct keyframe detection

4c36fe9

[mediav1@rev2.1] skip assigning virtual audio/video files for websocket and webrtc sources

1fbaed0

prevent use-file-for-fake-video-capture for realtime feeds

[mediav1@rev2.1] detect ivf header anywhere in buffer and handle stream restarts

8fbc2e9

add findIvfHeader and update processing to discard stale bytes when a header is found mid-buffer and reset decoder state on stream restarts.
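The commit names `findIvfHeader` without showing it. A minimal sketch of the idea follows, assuming the standard IVF file header: a 32-byte block starting with the signature `DKIF` and carrying the header length as a little-endian uint16 at offset 6. The signature and return convention below are assumptions, not the PR's code.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

const ivfHeaderLen = 32

var ivfSignature = []byte("DKIF")

// findIvfHeader returns the offset of the first complete, plausible IVF
// header in buf, or -1 if none is fully buffered yet.
func findIvfHeader(buf []byte) int {
	offset := 0
	for {
		i := bytes.Index(buf[offset:], ivfSignature)
		if i < 0 {
			return -1
		}
		start := offset + i
		if len(buf)-start < ivfHeaderLen {
			return -1 // header not fully received; wait for more bytes
		}
		// The header-length field guards against "DKIF" appearing by
		// chance inside frame payload.
		if binary.LittleEndian.Uint16(buf[start+6:start+8]) == ivfHeaderLen {
			return start
		}
		offset = start + 1
	}
}

func main() {
	header := make([]byte, ivfHeaderLen)
	copy(header, "DKIF")
	header[6] = ivfHeaderLen
	copy(header[8:], "VP80")
	// Three stale bytes from a previous stream precede the new header.
	buf := append([]byte{0xAA, 0xBB, 0xCC}, header...)
	if at := findIvfHeader(buf); at >= 0 {
		buf = buf[at:] // discard stale bytes; decoder state would be reset here
	}
	fmt.Println("buffer now starts with:", string(buf[:4]))
}
```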

[mediav1@rev2.1] remove stray node_modules symlink from samples/virtual-inputs

ff963a0

[mediav1@rev2.1] skip ffmpeg for realtime (socket/webrtc) sources

92f7d8e

only build ffmpeg inputs/outputs for non-realtime audio/video, avoid mapping errors when paused and simplify pulse routing logic

[mediav1@rev2.1] route realtime virtual inputs directly: video to virtual feed broadcaster, audio to pulseaudio via ffmpeg; remove fifo pipe usage and adjust manager/webrtc

8e3f758

[mediav1@rev2.1] switch socket ingest logging to slog and avoid starting ffmpeg for realtime sources

86f8de2

start ffmpeg only when args are present in pause/resume, adjust webrtc test error message

[mediav1@rev2.1] change log level for received video websocket chunk to info

7220f52

[mediav1@rev2.1] add structured slog logging to virtual feed broadcaster for client adds, intro sends and broadcasts

d3cf76e

[mediav1@rev2.1] remove slog info logs and unused vars in virtual_feed_broadcaster

c3415ec

[mediav1@rev2.1] virtual-inputs: add mpeg1 sample and document jsmpeg mpeg1 requirement

f1b8be1

set sample_video_mpeg1.ts as default ingest video in ws_chunk_ingest.js

[mediav1@rev2.1] make virtual feed broadcaster real-time: keep only ivf preamble, remove intro buffering

d3bba03

openapi: add virtual input audio destination enum microphone,speaker

[mediav1@rev2.1] support audio destination for virtual inputs

b37f81b

route ingest audio to microphone or speaker and wire destination through api, webrtc and ffmpeg

[mediav1@rev2.1] use npm for openapi-down-convert and add audio destination handling

41fecb7

handle virtual input audio destination (default microphone), update webrtc Configure calls/tests to new signature

[mediav1@rev2.1] virtual inputs: add audio destination routing to api and docs

c504078

add destination field and enum for virtual input audio; update README with examples for microphone vs speaker routing and realtime feed notes

[mediav1@rev2.1] openapi: extract virtual input audio destination into shared schema

a9c302c

use $ref for VirtualInputAudio.destination and add description/default

[mediav1@rev2.1] include audio destination in virtual inputs status

dffa91a

add destination field to openapi schema for ingest audio

[mediav1@rev2.1] add virtual input audio destination type and constants and expose destination on ingest endpoint

d38ee09

simplify textevent stream handlers by replacing manual flush loops with io.Copy

[mediav1@rev2.1] use oapi.speaker enum for virtual input audio destination

ea9dec5

flush text/event-stream responses with buffered reads and http.Flusher, falling back to io.Copy
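The flushing strategy this commit describes (buffered reads, a flush after each write, plain `io.Copy` when the writer cannot flush) might look roughly like the sketch below. `streamEvents` and `demo` are hypothetical names and the 4 KB buffer size is an assumption. Only `http.Flusher`, `io.Copy` and `httptest.ResponseRecorder` are real standard-library APIs.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamEvents copies src to w, flushing after every read so that
// server-sent events reach the client as soon as they are produced.
func streamEvents(w http.ResponseWriter, src io.Reader) error {
	w.Header().Set("Content-Type", "text/event-stream")
	flusher, ok := w.(http.Flusher)
	if !ok {
		// Writer cannot flush: fall back to a plain copy.
		_, err := io.Copy(w, src)
		return err
	}
	buf := make([]byte, 4096)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return werr
			}
			flusher.Flush()
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// demo runs streamEvents against an in-memory recorder and reports the
// body written, whether a flush happened, and any error.
func demo() (string, bool, error) {
	rec := httptest.NewRecorder()
	err := streamEvents(rec, strings.NewReader("data: hello\n\n"))
	return rec.Body.String(), rec.Flushed, err
}

func main() {
	body, flushed, err := demo()
	fmt.Printf("%q flushed=%v err=%v\n", body, flushed, err)
}
```

Without the explicit flush, small events can sit in the response buffer until it fills or the handler returns, which defeats the purpose of an event stream.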

[mediav1@rev2.2] use -device to specify pulseaudio sink in ffmpeg args

f6417b8

adjust virtual inputs, webrtc and socket ingest to place pulse as output placeholder

[media@v1@rev2.2] live video+audio agent demo project sample

ce3af8c

[media@v1@rev2.2] live video+audio agent demo project sample*

bb79584

cursor bot reviewed Dec 15, 2025

raiden-staging commented Dec 15, 2025 •

edited by cursor bot

Aims

Limitations / Current Workarounds

Next Steps

References

Examples

1.1 Virtual Video Input - WebSocket Feed

Configure WebSocket Video Input

Encode Source Video to MPEG-1

Feed Video Chunks (Node.js)

Real-time Behavior

Preview Feed

1.2 Virtual Video Input - WebRTC Feed

Configure WebRTC Video Input

Send Video via WebRTC (Python)

Real-time Factor

1.3 Virtual Audio Input - WebSocket Feed

Configure WebSocket Audio Input (to Virtual Mic)

Visual Examples

Feed Page States

Audio Destinations

Feed Audio Chunks (Node.js)

Example Logs (Real-time Audio Ingest)

1.4 Virtual Audio Input - WebRTC Feed

Configure WebRTC Audio Input

Route to Speaker Instead

Send Audio via WebRTC (Python)

1.5 Combined Virtual Input (Video + Audio)

WebSocket Video + Audio

WebRTC Video + Audio

1.6 Livestream - WebRTC Playback

Start WebRTC Livestream

Connect from Browser (JavaScript)

Real-time Factor

1.7 Livestream - WebSocket Audio

Start Socket Audio Livestream

Consume Audio Stream (Node.js)

Example Logs (Audio Livestream)

1.8 Livestream - RTMP (Local & Remote)

Internal RTMP Server

Play with ffplay

Push to Remote RTMP

1.9 Control Commands

Pause (Black Frames/Silence)

Resume

Stop All

Stop Livestream

API Reference

Virtual Inputs

Livestream

Request/Response Schemas

VirtualInputsRequest

VirtualInputsStatus

StartStreamRequest

StreamInfo

Video Encoding Notes

Encoding Command

Parameters

Audio Format Notes

WebSocket Audio Ingest

WebRTC Audio

Real-time Behavior Summary

Real-time Factor

Real-time Video | WebRTC

WebSocket Real-time Characteristics

Verified Real-time Behavior

Audio Real-time Routing

cursor bot Dec 15, 2025

Bug: Unused environment variable required by validation

cursor bot Dec 15, 2025

Bug: BrowserFlow agent property never initialized

raiden-staging commented Dec 15, 2025 •

edited by cursor bot
