
Conversation

@almarklein (Member) commented Nov 11, 2025

Ref #66

PRs that led up to this PR

First, we moved the context classes to rendercanvas, allowing rendercanvas to implement custom behavior.

Then we applied several changes, allowing wgpu and rendercanvas to interoperate for their async work, and to make this efficient and fast using threading:

And some PRs related to present-method selection:

What this PR contributes

General:

  • The bitmap present method is now asynchronous, which significantly improves performance, making it a viable method for cases where the 'screen' method is fragile (see the sketch after this list).
  • The scheduling logic has undergone major refactoring to support this.
  • Improved support for forced drawing.
  • On Qt with the bitmap method, the canvas does not draw when minimized.
  • Numpy was added as a dependency, to allow faster array handling for bitmap-present.
  • Lays the foundation for more sophisticated bitmap-present submethods, like jpeg, jpeg encoding on the GPU, mpeg, etc.
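
To illustrate the core idea, here is a minimal sketch (not the actual rendercanvas scheduling code): consuming the bitmap of frame N overlaps with rendering frame N+1, with at most one bitmap in flight. The functions render_frame() and present_bitmap() are hypothetical stand-ins for the real rendering and presenting steps.

import queue
import threading
import time

def render_frame(i):
    # Stand-in for GPU rendering plus downloading the bitmap for one frame.
    time.sleep(0.005)
    return f"bitmap {i}"

def present_bitmap(bitmap):
    # Stand-in for consuming the bitmap (blit to a widget, send over a network, ...).
    time.sleep(0.005)

def presenter(q):
    while True:
        bitmap = q.get()
        if bitmap is None:
            break
        present_bitmap(bitmap)

def render_loop(n_frames):
    q = queue.Queue(maxsize=1)  # at most one in-flight bitmap -> one frame of delay
    worker = threading.Thread(target=presenter, args=(q,), daemon=True)
    worker.start()
    for i in range(n_frames):
        q.put(render_frame(i))  # rendering frame i+1 overlaps with presenting frame i
    q.put(None)
    worker.join()

if __name__ == "__main__":
    n = 200
    t0 = time.perf_counter()
    render_loop(n)
    print(f"{n / (time.perf_counter() - t0):.0f} fps")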

Internal API changes:

  • BaseRenderCanvas._draw_frame_and_present() is removed.
  • BaseRenderCanvas._rc_request_draw() is split into _rc_request_draw() and _rc_request_paint() (see the sketch after this list).
  • BaseRenderCanvas._rc_request_draw() should call _time_to_draw(), directly or eventually, when the canvas is ready to receive another frame.
  • BaseRenderCanvas._rc_request_paint() should eventually call _time_to_paint(),
    inside the paint-event if applicable.
  • BaseRenderCanvas._set_visible() can be used by subclasses to disable drawing
    while the canvas is invisible (e.g. minimized).
  • BaseRenderCanvas._rc_force_draw() is renamed to _rc_force_paint().
  • Context._rc_present() becomes Context._rc_present(*, force_sync: bool = False).
    • The method may return an async result when force_sync is not set.
    • The new parameter disallows async behavior in cases like forced draws and the manual offscreen canvas.
  • Add Context._rc_set_present_params(**present_params).
    • This allows backends to influence the presentation details.
    • The plan is to build on this later for the bitmap present, e.g. encoding to jpeg.
    • This logic is made part of the context, because some more sophisticated methods may use extra GPU steps before downloading the result, and any encoding can be done on the mapped data, avoiding one data copy.
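
To show how the canvas-side hooks are intended to fit together, below is a hypothetical backend sketch, not the actual Qt or glfw backend. The import path, the schedule_on_event_loop() helper, and the _on_paint_event / _on_window_state_changed methods are assumptions, standing in for whatever the GUI toolkit provides (e.g. a zero-delay timer and its paint event).

from rendercanvas.base import BaseRenderCanvas  # assumed import path

def schedule_on_event_loop(callback):
    # Stand-in for e.g. scheduling via the toolkit's event loop; here we call directly.
    callback()

class SketchCanvas(BaseRenderCanvas):

    def _rc_request_draw(self):
        # The scheduler asks for a new frame; call _time_to_draw() (directly or
        # later via the event loop) when the canvas is ready to receive it.
        schedule_on_event_loop(self._time_to_draw)

    def _rc_request_paint(self):
        # Ask the toolkit for a paint event; _time_to_paint() is then called
        # from inside that paint event (emulated by _on_paint_event here).
        schedule_on_event_loop(self._on_paint_event)

    def _on_paint_event(self):
        # In a real backend this would be the toolkit's paint-event handler.
        self._time_to_paint()

    def _rc_force_paint(self):
        # Forced (synchronous) paint, used for forced draws.
        self._on_paint_event()

    def _on_window_state_changed(self, minimized):
        # Disable drawing while the canvas is not visible (e.g. minimized).
        self._set_visible(not minimized)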

Timings

All numbers are in FPS, on a full-screen window on a Retina display (physical size 5120x2774).
Cocoa is a WIP native backend for macOS that uses Metal to display a texture stored in RAM.
The Null backend has _rc_present_bitmap as a no-op, so the bitmap is downloaded to CPU and then discarded.

Cube example

  • Screen:
    - Glfw: 180-200
    - Qt: 120-190
  • Sync bitmap:
    - Qt: ~51
    - Cocoa: 65-70
    - Null: 70-80
  • Async bitmap:
    - Qt: ~70
    - Cocoa: 110
    - Null: 140

Heavy example

  • Screen:
    - Glfw: 48-51
    - Qt: 48-51
  • Sync bitmap:
    - Qt: 21-23
    - Cocoa: 25-27
    - Null: 27-30
  • Async bitmap:
    - Qt: 46-51
    - Cocoa: 46-51
    - Null: 46-51

Interpretation

  • By doing the presentation asynchronously, performance can be significantly increased.
  • For light visualizations, where presenting to screen yields 100+ FPS:
    • With a delay of one frame, the async bitmap result is faster than sync, but not as fast as screen.
    • TODO: what if we have larger delays?
  • For heavier visualizations, where presenting to screen yields about 50-60 FPS:
    • The sync bitmap present is about twice as slow.
    • The async present with a one-frame delay is nearly as fast as screen.

@hmaarrfk (Contributor)

wow!

@hmaarrfk (Contributor)

I would love to help benchmark on some unique system configurations I have:

I was having a hard time getting PySide6 to show fps numbers above 60 fps on my (Linux) machine.

I feel like I must set:

  1. An environment variable to select which GPU
  2. Is there an environment variable to select the "null" backend??
  3. Would it print the FPS on the terminal for me?

Happy to set up a development environment for this; I can readily test on:

  1. Linux + Intel integrated GPU
  2. Linux + Nvidia dedicated GPU
  3. Linux + AMD integrated GPU
  4. Linux Laptop + Intel Integrated GPU.

@almarklein (Member, Author) commented Nov 12, 2025

Create a canvas like this (e.g. taking our cube.py example):

from rendercanvas.auto import RenderCanvas  # or a specific backend, e.g. rendercanvas.qt

canvas = RenderCanvas(
    title="$backend - $fps fps",
    update_mode="fastest",
    vsync=False,
    present_method="bitmap",  # "bitmap" or "screen"
)

Then the fps is shown in the title bar.

There is no actual null backend. I just temporarily made _rc_present_bitmap (the method that consumes the bitmap) return immediately. You can do this with any backend I guess. The idea is that it measures how fast bitmap rendering could be if the consumption of the bitmap were infinitely fast :)
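
For illustration, a minimal sketch of that trick, assuming you subclass the Qt backend's canvas class; the exact class to subclass, and whether overriding _rc_present_bitmap like this is all that's needed, are assumptions here.

from rendercanvas.qt import RenderCanvas  # assumed: the Qt backend's canvas class

class NullPresentCanvas(RenderCanvas):
    """Canvas whose bitmap consumer is a no-op, to measure the upper bound."""

    def _rc_present_bitmap(self, **kwargs):
        # The bitmap is still rendered and downloaded to CPU, but then discarded.
        pass

# Use it like the RenderCanvas in the snippet above, with present_method="bitmap".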

My first benchmarks were on my Mac M1. I will add some tests with Intel and Nvidia GPUs later.

@almarklein (Member, Author)

This piece of work was brutal, but it's nearly done now.

Apart from higher fps, this also reduces the delay between processing events and drawing. Below are two movies with an artificially low fps (10 fps). The first shows the previous behaviour:

Screen.Recording.2026-01-23.at.16.42.41.mov

In the new situation, even though the fps is low, the delay is small, which is really important for a 'smooth' experience (maybe more so than high fps):

Screen.Recording.2026-01-23.at.16.45.46.mov

@hmaarrfk (Contributor)

does your example mean that you've somehow found a way to shed 1 frame without sacrificing usability?

@almarklein (Member, Author)

does your example mean that you've somehow found a way to shed 1 frame without sacrificing usability?

It's more a question of timing. In remote rendering, you're pushing frames (over the network), and you want a mechanism for the downstream system to throttle the fps. Jupyter-rfb already has such a feedback mechanism based on the number of in-flight frames.

The naive way; by the time you send the frame, it may already be 'old':

process events -> render frame -> request draw -> ready -> send 

In current main we do this, but the frame can still be 'outdated':

process events -> request draw -> ready -> render frame -> send

In this PR, we have request_draw in addition to request_paint. Previously these were the same, which meant we could not process events in between (because you don't want to do that in a paint event). Now that they're separated, we can:

request draw -> ready -> process events -> render frame -> send
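
A simplified, runnable sketch of that ordering follows (not the actual scheduler; all functions are illustrative stand-ins, and "ready" would in practice be driven by _rc_request_draw/_time_to_draw and downstream feedback such as jupyter_rfb's in-flight frame count).

import time

pending_events = []

def wait_until_ready():
    # Stand-in for: request draw -> ready (throttled by backend/downstream feedback).
    time.sleep(0.01)

def process_events():
    # Handle whatever events have accumulated; done as late as possible so that
    # the frame that goes out reflects the most recent input.
    handled, pending_events[:] = list(pending_events), []
    return handled

def render_frame(events):
    return f"frame rendered with {len(events)} fresh events"

def send(frame):
    # Stand-in for presenting/painting, e.g. sending the bitmap downstream.
    print(frame)

for _ in range(3):
    wait_until_ready()           # request draw -> ready
    events = process_events()    # process events
    send(render_frame(events))   # render frame -> send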

@hmaarrfk (Contributor)

I see, so placing process-events between ready and render-frame explains why it feels like you won "one frame".

almarklein marked this pull request as ready for review January 27, 2026 12:26
almarklein requested a review from Korijn January 27, 2026 12:26
@almarklein (Member, Author)

Ready; added notes to top post.

Co-authored-by: Jan <Vipitis@users.noreply.github.com>
@Korijn (Contributor) commented Jan 28, 2026

I don't have time to carefully review, go ahead :)
