- HUB75 DMA-Based Driver
- Documentation and References
- Achievements of the Revised Driver
- Motivation
- Evolution of Pico HUB75 Drivers
- Eliminating hub75_wait_tx_stall
- DMA Chains and PIO State Machines in the Revised HUB75 Driver
- Conclusion for DMA and PIO based Approach
- Improved Colour Perception
- Brightness Control
- Demo Effects
- How to Use This Project in VSCode
- Next Steps
- Prerequisites for the Hub75 Driver
hub75_demo.mp4
Demo video: Colours are much brighter and more brilliant in reality
This project is based on:
To understand how RGB matrix panels work, refer to the article Everything You Didn't Want to Know About RGB Matrix Panels. For details on Binary Coded Modulation (BCM), see LED Dimming Using Binary Code Modulation.
The modifications to the Pimoroni HUB75 driver result in the following improvements:
- Offloading Work: Moves processing from the CPU to DMA and PIO co-processors.
- Performance Boost: Implements self-paced, interlinked DMA and PIO processes.
- Eliminates Synchronization Delays: No need for hub75_wait_tx_stall, removing blocking synchronization.
- Optimized Interrupt Handling: Reduces code complexity in the interrupt handler.
These enhancements lead to significant performance improvements. In tests up to a 250 MHz system clock, no ghost images were observed.
As part of a private project, I sought to gain deeper knowledge of the Raspberry Pi Pico microcontroller. I highly recommend Raspberry Pi Pico Lectures 2022 by Hunter Adams—they provide excellent insights!
If you are specifically interested in PIO (Programmable Input/Output), start with Lecture 14: Introducing PIO and Lecture 15: PIO Overview and Examples.
Inspired by Adams' discussion on DMA and PIO co-processors, I optimized the HUB75 driver as a self-assigned challenge.
😊 Raspberry Pi Pico Lectures 2025 by Hunter Adams is available now!
The Pico HUB75 example demonstrates connecting an HUB75 LED matrix panel using PIO. This educational example prioritizes clarity and ease of understanding.
- The color palette is generated by modulating the Output Enable (OE) signal.
- Binary Coded Modulation (BCM) is applied row-by-row, modulating all color bits before advancing to the next row.
- Synchronization depends on hub75_wait_tx_stall.
- No DMA is used, leading to lower performance.
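To make the Binary Coded Modulation idea above concrete, here is a minimal host-side sketch. It assumes the common BCM convention that bitplane `n` is shown for `2^n` time units; the function names are made up for this illustration and are not part of either driver.

```cpp
#include <cassert>
#include <cstdint>

// On-time weight of one bitplane, in units of the shortest OE pulse.
// Bitplane 0 is the least significant bit of the colour value.
inline uint32_t bcm_plane_weight(unsigned bitplane) {
    assert(bitplane < 32);
    return 1u << bitplane;
}

// Total on-time of one LED over a full BCM cycle: the weights of all
// bitplanes whose bit is set in the colour value are added up.
inline uint32_t bcm_total_on_time(uint32_t value, unsigned bit_depth) {
    assert(bit_depth <= 32);
    uint32_t total = 0;
    for (unsigned bit = 0; bit < bit_depth; ++bit) {
        if (value & (1u << bit)) {
            total += bcm_plane_weight(bit);
        }
    }
    return total;
}
```

With these weights the total on-time equals the colour value itself, which is why only `bit_depth` pulses per row are needed instead of one PWM step per brightness level.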
The Pimoroni HUB75 driver improves performance by:
- Switching from row-wise to plane-wise modulation handling.
- Using DMA to transfer pixel data to the PIO state machine.
- Still relying on hub75_wait_tx_stall for synchronization.
Picture 1: Pimoroni's Hub75 Driver DMA Section
Both the Raspberry Pi and Pimoroni implementations use hub75_wait_tx_stall, which ensures:
- The state machine stalls on an empty TX FIFO.
- The system waits until the OEn pulse has finished.
However, this blocking method prevents an efficient DMA-based approach.
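static inline void hub75_wait_tx_stall(PIO pio, uint sm) {
    uint32_t txstall_mask = 1u << (PIO_FDEBUG_TXSTALL_LSB + sm);
    // Writing 1 clears the sticky TXSTALL flag for this state machine.
    pio->fdebug = txstall_mask;
    // Busy-wait until the state machine stalls on an empty TX FIFO again.
    while (!(pio->fdebug & txstall_mask)) {
        tight_loop_contents();
    }
}
Instead of waiting for TX FIFO stalling, we can: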
- Modify the PIO program to emit a signal once the OEn pulse completes.
- Set up a DMA channel to listen for this signal.
- Establish an interrupt handler to trigger once the signal is received.
This approach allows fully chained DMA execution without CPU intervention.
Picture 2: Modified hub75_row Program
The following diagram illustrates the interactions between DMA channels and PIO state machines:
[ Pixel Data DMA ] -> [ hub75_data_rgb888 PIO ]
|
|--> [ Dummy Pixel Data DMA ] -> [ hub75_data_rgb888 PIO ]
|
|--> [ OEn Data DMA ] -> [ hub75_row PIO ]
|
|--> [ OEn Finished DMA ] (Triggers interrupt)
- Pixel Data Transfer
  - Pixel data is streamed via DMA to the hub75_data_rgb888 PIO state machine.
  - This handles shifting pixel data into the LED matrix.
- Dummy Pixel Handling
  - A secondary dummy pixel DMA channel adds additional pixel data.
  - This ensures correct clocking of the final piece of genuine data.
- OEn Pulse Generation
  - The OEn data DMA channel sends 32-bit words to the hub75_row PIO state machine. Each word carries 5 bits of address information (row select) and a 27-bit pulse width.
  - For the specified number of cycles, the output enable signal switches on those LEDs in the current row whose bit is set in the current bitplane.
- Interrupt-Driven Synchronization
  - A final OEn finished DMA channel listens for the end of the pulse.
  - An interrupt handler (oen_finished_handler) resets DMA for the next cycle.
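The 32-bit OEn word described above can be sketched as follows. The bit layout is an assumption borrowed from the Pico HUB75 example, where the row address sits in the low 5 bits and the pulse width in the upper 27 bits; the helper names are hypothetical and the driver's actual layout may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kRowBits = 5;                        // row-select bits per word
constexpr uint32_t kRowMask = (1u << kRowBits) - 1u;    // 0x1F
constexpr uint32_t kMaxPulse = (1u << 27) - 1u;         // largest 27-bit pulse width

// Pack row address (assumed low 5 bits) and pulse width (assumed upper 27 bits).
inline uint32_t pack_oen_word(uint32_t row, uint32_t pulse_width) {
    assert(row <= kRowMask);
    assert(pulse_width <= kMaxPulse);
    return (pulse_width << kRowBits) | row;
}

// Inverse helpers, handy when inspecting a prepared OEn buffer in a debugger.
inline uint32_t oen_word_row(uint32_t word) { return word & kRowMask; }
inline uint32_t oen_word_pulse(uint32_t word) { return word >> kRowBits; }
```

Preparing one such word per row and bitplane in advance is what lets a DMA channel feed the hub75_row state machine without CPU involvement.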
Picture 3: Chained DMA Channels and assigned PIOs
With a bit-depth of 10, the HUB75 driver achieves the following refresh rates for a 64 x 64 matrix depending on the system clock:
| System Clock | Refresh Rate |
|---|---|
| 100 MHz | 179 Hz |
| 150 MHz | 268 Hz |
| 200 MHz | 358 Hz |
| 250 MHz | 448 Hz |
These results demonstrate stable operation and high-performance display rendering across a wide range of system clocks.
✅ Fully automated data transfer using chained DMA channels.
✅ Eliminates CPU-intensive busy-waiting (hub75_wait_tx_stall).
✅ Ensures precise timing without unnecessary stalling.
By offloading tasks to DMA and PIO, the revised HUB75 driver achieves higher performance, simpler interrupt handling, and better synchronization. This approach significantly reduces CPU overhead while eliminating artifacts like ghosting at high clock speeds.
If you're interested in optimizing RGB matrix panel drivers, this implementation serves as a valuable reference for efficient DMA-based rendering.
The graphics system for the demo effects operates in RGB888 format (8 bits per channel, 24 bits per pixel). To better match human vision, colours are mapped using the CIE 1931 lightness curve. This mapping effectively expands the usable range to 10 bits per channel.
The HUB75 driver takes advantage of this: its PIO/DMA pipeline packs each pixel as a 32-bit word with 10 bits for red, 10 bits for green, and 10 bits for blue.
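As a rough illustration of the two steps above, the sketch below maps an 8-bit channel value to 10 bits with the standard CIE 1931 lightness-to-luminance formula and then packs three 10-bit channels into one 32-bit word. The function names and the bit order (red in bits 0–9, green in 10–19, blue in 20–29) are assumptions for this example; the driver may use a lookup table and a different order.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Map an 8-bit input (treated as lightness L* scaled to 0..255) to a
// 10-bit linear output using the CIE 1931 lightness curve.
inline uint16_t cie1931_8_to_10(uint8_t value) {
    const double lightness = value * 100.0 / 255.0;  // L* in [0, 100]
    const double luminance = (lightness <= 8.0)
        ? lightness / 903.3
        : std::pow((lightness + 16.0) / 116.0, 3.0);
    return static_cast<uint16_t>(std::lround(luminance * 1023.0));
}

// Pack three 10-bit channels into one 32-bit word (assumed layout: R low).
inline uint32_t pack_rgb101010(uint16_t r, uint16_t g, uint16_t b) {
    assert(r <= 1023 && g <= 1023 && b <= 1023);
    return static_cast<uint32_t>(r)
         | (static_cast<uint32_t>(g) << 10)
         | (static_cast<uint32_t>(b) << 20);
}
```

Because the curve is steep at the top and flat at the bottom, a mid-grey input of 128 lands well below half of the 10-bit range, which is where the extra two bits of output resolution pay off.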
To go beyond native 10-bit precision without changing the data format, the driver employs temporal dithering (an accumulator-based technique):
- Each pixel maintains a high-precision accumulator (e.g. 16 bits).
- On every refresh, the top 10 bits are sent to the panel, while the lower bits remain stored.
- Over successive frames, these residuals accumulate, averaging out to produce smoother gradients.
This results in a perceived colour depth equivalent to 12–14 bits per channel.
- Noticeable improvement in gradients and subtle colour transitions.
- Minimal CPU overhead (shifts and adds only).
- Requires additional RAM for accumulators. For a 64×64 panel: 64 × 64 × 3 × sizeof(uint32_t) ≈ 48 KB.
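The accumulator technique described above can be sketched for a single colour channel like this. It assumes a 16-bit target value with 10 integer and 6 fractional bits and a 32-bit accumulator per channel; the struct and function names are hypothetical, not the driver's.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kFractionBits = 6;                           // 16-bit target = 10.6 fixed point
constexpr uint32_t kFractionMask = (1u << kFractionBits) - 1u;  // residual bits kept between frames
constexpr uint32_t kMaxOutput = 1023;                           // largest 10-bit panel value

struct DitherChannel {
    uint32_t accumulator = 0;  // holds only the residual between frames
};

// One refresh: add the target, emit the top 10 bits, keep the residual.
inline uint16_t dither_step(DitherChannel& channel, uint16_t target16) {
    assert(channel.accumulator <= kFractionMask);
    channel.accumulator += target16;
    uint32_t output = channel.accumulator >> kFractionBits;
    channel.accumulator &= kFractionMask;
    if (output > kMaxOutput) {
        output = kMaxOutput;   // residual carry can push a full-scale target past 10 bits
    }
    return static_cast<uint16_t>(output);
}
```

For a target of 513.25 (raw value 32848) the output alternates 513, 513, 513, 514, so the average over four frames matches the target even though each frame carries only 10 bits.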
In addition to bitplane modulation, the driver supports software-based brightness regulation. This allows easy adjustment of overall panel brightness without hardware changes.
// Set the baseline brightness scaling factor (default = 6, range 1–255).
// Larger values increase brightness but also raise OEn frequency.
void setBasisBrightness(uint8_t factor);
// Set fine-grained brightness intensity as a fraction [0.0 – 1.0].
void setIntensity(float intensity);
- setBasisBrightness(basis): Defines the top brightness. Example: setBasisBrightness(6) → default brightness range for typical 64×64 panels. Larger factors give more headroom for brightness but consume more Binary Coded Modulation (BCM) time slices.
- setIntensity(intensity): Fine-grained adjustment from 0.0 (dark/off) to 1.0 (full brightness). This function scales the effective duty cycle within the current baseline brightness range.
// Example: brighten the panel, then dim at runtime
setBasisBrightness(8); // Start with baseline factor 8 for a brighter panel
setIntensity(0.5f); // Show at 50% of that baseline
Defaults:
- basis_factor = 6u
- intensity = 1.0f (full brightness within the baseline)
This corresponds to the same brightness as earlier driver revisions without adjustment.
- Increasing the basis factor may increase peak current consumption.
- For indoor use, values between 4–8 are usually sufficient.
- For dimmer environments, you can keep the baseline factor low (e.g. 4) and rely on setIntensity() for smooth runtime control.
- Both functions are non-blocking and can be called during normal operation.
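One plausible way the two settings could combine into an OEn pulse width per bitplane is sketched below. This is purely an assumption for illustration: the formula, the function name and the clamping are not taken from the driver source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the basis factor sets the pulse width of bitplane 0,
// each higher bitplane doubles it (BCM), and intensity scales the result.
inline uint32_t oen_pulse_width(uint8_t basis_factor, float intensity, unsigned bitplane) {
    assert(bitplane < 16);
    if (intensity < 0.0f) intensity = 0.0f;   // clamp to the documented range
    if (intensity > 1.0f) intensity = 1.0f;
    const uint32_t full = static_cast<uint32_t>(basis_factor) << bitplane;
    return static_cast<uint32_t>(full * intensity);
}
```

Under this model, `setBasisBrightness(8)` followed by `setIntensity(0.5f)` would yield pulses about two thirds as long as the defaults (factor 6 at full intensity).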
You can easily use this project with VSCode, especially with the Raspberry Pi Pico plugin installed. Follow these steps:
- Open VSCode and start a new window.
- Clone the repository:
- Project Import Prompt:
- Configure Pico SDK Settings:
- Wait for Setup Completion:
  - VSCode will download required tools, the Pico SDK, and any plugins.
- Connect the Hardware:
  - Make sure the HUB75 LED matrix is properly connected to the Raspberry Pi Pico.
  - Attach the Raspberry Pi Pico USB cable to your computer.
- Build and Upload:
💡 If everything is set up correctly, your matrix should come to life with the updated HUB75 DMA driver.
- Add another chained DMA channel to further reduce calls to the oen_finished_handler, trading memory for reduced CPU load.
- Investigate removing the hub75_data_rgb888_set_shift method, potentially achieving a completely DMA- and PIO-based solution with no CPU involvement.
For any questions or discussions, feel free to contribute or open an issue!
This driver is designed for a 64×64 LED matrix panel. It can be adapted for 64×32, 32×32, or other HUB75-compatible panels.
The PIO implementation requires the data pins (colours) and the row-select pins to each occupy a consecutive block of GPIOs.
The default implementation looks like this (see hub75.cpp). An example of a valid alternative pin definition is shown in Allowed Deviations.
// Default wiring of HUB75 matrix to RP2350
#define DATA_BASE_PIN 0 // first color data pin
#define DATA_N_PINS 6 // number of color data pins (R0,G0,B0,R1,G1,B1)
#define ROWSEL_BASE_PIN 6 // first row-select (address) pin
#define ROWSEL_N_PINS 5 // number of row-select pins (A0–A4)
#define CLK_PIN 11 // clock
#define STROBE_PIN 12 // latch (LAT)
#define OEN_PIN 13 // output enable (OE)
- DATA_BASE_PIN = GPIO 0 (first in a consecutive block)
- DATA_N_PINS = 6 (for R0, G0, B0, R1, G1, B1)
| Hub75 Colour Bit | Pico GPIO |
|---|---|
| R0 | 0 |
| G0 | 1 |
| B0 | 2 |
| R1 | 3 |
| G1 | 4 |
| B1 | 5 |
- ROWSEL_BASE_PIN = GPIO 6
- ROWSEL_N_PINS = 5 (A0–A4)
- Consecutiveness is required by the PIO program.
| Address bit | Pico GPIO |
|---|---|
| A0 | 6 |
| A1 | 7 |
| A2 | 8 |
| A3 | 9 |
| A4 | 10 |
- CLK (clock): GPIO 11
- LAT (strobe/latch): GPIO 12
- OE (output enable): GPIO 13
The diagram shows the default mapping as defined in the hub75.cpp file.
The only strict requirement is that data pins and row-select pins must be in consecutive GPIO blocks. Clock, Latch, and OE pins may be freely chosen.
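The rule above can be checked before flashing. The following sketch validates a pin layout at compile time or at runtime; the type and function names are hypothetical, and the limit of 30 user GPIOs (GPIO 0–29, as on RP2040 and RP2350A packages) is an assumption you may need to adjust.

```cpp
#include <cassert>

struct PinBlock {
    unsigned base;   // first GPIO of the consecutive block
    unsigned count;  // number of pins in the block
};

constexpr unsigned kGpioCount = 30;  // assumed: GPIO 0..29 available

constexpr bool fits(PinBlock b) {
    return b.count > 0 && b.base + b.count <= kGpioCount;
}
constexpr bool overlaps(PinBlock a, PinBlock b) {
    return a.base < b.base + b.count && b.base < a.base + a.count;
}
constexpr bool in_block(unsigned pin, PinBlock b) {
    return pin >= b.base && pin < b.base + b.count;
}

// True if both blocks fit, do not overlap, and the three freely chosen
// control pins are distinct and outside both blocks.
constexpr bool layout_ok(PinBlock data, PinBlock rowsel,
                         unsigned clk, unsigned strobe, unsigned oen) {
    return fits(data) && fits(rowsel) && !overlaps(data, rowsel)
        && clk < kGpioCount && strobe < kGpioCount && oen < kGpioCount
        && clk != strobe && clk != oen && strobe != oen
        && !in_block(clk, data) && !in_block(strobe, data) && !in_block(oen, data)
        && !in_block(clk, rowsel) && !in_block(strobe, rowsel) && !in_block(oen, rowsel);
}

// Compile-time check of the default wiring from hub75.cpp.
static_assert(layout_ok(PinBlock{0, 6}, PinBlock{6, 5}, 11, 12, 13),
              "default wiring must be valid");

// Runtime variant for layouts that are only known at startup.
inline void require_valid_layout(PinBlock data, PinBlock rowsel,
                                 unsigned clk, unsigned strobe, unsigned oen) {
    assert(layout_ok(data, rowsel, clk, strobe, oen));
}
```

A layout with row-select pins at GPIO 15–19, data pins at GPIO 3–8 and control pins on GPIO 0–2 passes the same check.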
#define ROWSEL_BASE_PIN 15 // Row select pins moved to GPIO 15–19
#define ROWSEL_N_PINS 5 // number of row-select pins (A0–A4)
#define DATA_BASE_PIN 3 // Color data pins starting at GPIO 3
#define DATA_N_PINS 6 // number of color data pins (R0,G0,B0,R1,G1,B1)
// Control pins assigned to arbitrary GPIO pins
#define CLK_PIN 0
#define STROBE_PIN 1
#define OEN_PIN 2
The Hub75 driver is designed for 64×64 panels, but it also supports 64×32 panels (half the height) and other panel dimensions, e.g. 128×64.
The electrical connections for 64x32 panels are nearly identical to 64x64 panels — the difference lies in how rows are addressed and how the frame buffer is filled.
In file hub75_driver.cpp use the same pin definitions as for a 64×64 panel, except for the ROWSEL_N_PINS definition which must be changed to 4 (A0–A3). Address line A4 is not connected.
// Default wiring of HUB75 matrix to RP2350
#define DATA_BASE_PIN 0 // first color data pin
#define DATA_N_PINS 6 // number of color data pins (R0,G0,B0,R1,G1,B1)
#define ROWSEL_BASE_PIN 6 // first row-select (address) pin
#define ROWSEL_N_PINS 4 // number of row-select pins (A0–A3)
#define CLK_PIN 11 // clock
#define STROBE_PIN 12 // latch (LAT)
#define OEN_PIN 13 // output enable (OE)
At the top of file hub75_driver.cpp, set MATRIX_PANEL_HEIGHT to the height of the panel (32 in this example).
#define MATRIX_PANEL_WIDTH 64
#define MATRIX_PANEL_HEIGHT 32
#define OFFSET MATRIX_PANEL_WIDTH * (MATRIX_PANEL_HEIGHT >> 1)
- A 64×32 panel uses half the memory of a 64×64 panel.
- Each refresh cycle addresses 16 row pairs (instead of 32).
- The driver automatically adjusts its addressing logic based on the height you specify.
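To show what the OFFSET define and the row-pair count amount to, here is a small sketch. It assumes a row-major linear frame buffer in which the upper half of the panel is stored first and each scan step drives one row from the upper half together with the matching row from the lower half; the names are hypothetical and the driver's internal layout may differ.

```cpp
#include <cassert>
#include <cstddef>

struct PanelGeometry {
    std::size_t width;
    std::size_t height;
};

// Number of row pairs addressed per refresh (two rows lit at once).
inline std::size_t row_pairs(PanelGeometry p) { return p.height / 2; }

// Same value as OFFSET: index of the first pixel of the lower half.
inline std::size_t lower_half_offset(PanelGeometry p) {
    return p.width * (p.height >> 1);
}

// Index of pixel x in the upper row of a given row pair.
inline std::size_t upper_index(PanelGeometry p, std::size_t x, std::size_t pair) {
    assert(x < p.width && pair < row_pairs(p));
    return pair * p.width + x;
}

// Index of pixel x in the lower row of the same row pair.
inline std::size_t lower_index(PanelGeometry p, std::size_t x, std::size_t pair) {
    return lower_half_offset(p) + upper_index(p, x, pair);
}
```

For a 64×32 panel this gives 16 row pairs and an offset of 1024 pixels, compared with 32 row pairs and 2048 pixels for 64×64.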
Not all of the demo effects display correctly on a 64×32 matrix panel. The first two demo effects use image data laid out for 64×64, so you will see some output, but it will look distorted.
The bouncing balls effect will not show the complete text because its position is hard-coded. The fire_effect and the rotator will probably look as intended.
Have fun with adapting the source code or with implementing your own effects.
Do not hesitate to contact me - I will gladly answer your questions!
Many HUB75 panel datasheets describe the scan rate using terms like 1:16 or 1:32.
This can be confusing, because the actual driver behavior is about how many rows are lit simultaneously.
To make things explicit, this driver uses multiplexing defines:
| Multiplexing Mode | Rows Lit at Once | Typical Datasheet Scan Rate | Example Panels |
|---|---|---|---|
| #define HUB75_MULTIPLEX_2_ROWS | 2 rows | 1:32 | 64×64 (1:32), 64×32 (1:16) |
| #define HUB75_MULTIPLEX_4_ROWS | 4 rows | 1:16 or 1:8 | 64×64 (1:16), 64×32 (1:8) |
- Datasheet says 1:32 (one out of 32 row groups active at a time).
- This means 2 rows lit simultaneously.
- In code, use:
  #define HUB75_MULTIPLEX_2_ROWS
In your build, define the scan rate that matches your panel:
// Example for 64×64 panels (1/32 scan) - two rows lit simultaneously
#define HUB75_MULTIPLEX_2_ROWS
// Set the number of address lines - 2 rows lit simultaneously leaves 32 rows to be addressed via row select.
// That is 32 = 2 to the power of 5 - we need 5 row select pins
#define ROWSEL_N_PINS 5
// Example for 64×32 panels (1/8 scan) - four rows lit simultaneously
#define HUB75_MULTIPLEX_4_ROWS
// Set the number of address lines - 4 rows lit simultaneously leaves 8 rows to be addressed via row select.
// That is 8 = 2 to the power of 3 - we need 3 row select pins
#define ROWSEL_N_PINS 3
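The arithmetic in the comments above generalises to other panel sizes. A minimal sketch, with a hypothetical helper name, that derives the ROWSEL_N_PINS value from the panel height and the number of rows lit at once:

```cpp
#include <cassert>

// Number of row-select (address) pins: log2(panel_height / rows_lit).
// Assumes the number of row groups is a power of two, as on typical panels.
inline unsigned rowsel_pins(unsigned panel_height, unsigned rows_lit) {
    assert(rows_lit > 0 && panel_height % rows_lit == 0);
    const unsigned groups = panel_height / rows_lit;
    assert(groups > 0 && (groups & (groups - 1)) == 0);  // power of two
    unsigned pins = 0;
    while ((1u << pins) < groups) {
        ++pins;
    }
    return pins;
}
```

For example, a 64×64 panel with two rows lit needs 5 pins, while a 64×32 panel with four rows lit needs 3, matching the two configurations shown above.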








