Conversation

@astro-stan (Contributor)

Description

I've been using Multus for a while, but I've grown frustrated with the hassle of maintaining it due to the lack of a Helm chart.

This chart makes deploying and managing Multus easy!

Notable features:

  • Sensible defaults: the chart should work "out of the box" for most setups
  • Easy Multus configuration for more advanced use cases
  • Talos integration - allows installing extra reference CNIs that do not ship with a standard Talos install (macvlan, ipvlan, etc.)
  • An "uninstall" mode that cleans up changes made to the host's filesystem, so the chart can be uninstalled cleanly (see the sketch below)
  • Docs page
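
For illustration, installing and later cleanly removing the chart could look something like this (the repository name/URL and the uninstall-mode value key are placeholders, not the chart's confirmed interface):

  # Hypothetical sketch -- <repo-url> and <uninstallModeKey> are placeholders
  helm repo add mycharts <repo-url>
  helm install multus mycharts/multus-cni --namespace kube-system

  # Before removal, enable the cleanup/"uninstall" mode so changes made to the
  # host's filesystem are reverted, then remove the release:
  helm upgrade multus mycharts/multus-cni --namespace kube-system \
    --set <uninstallModeKey>=true
  helm uninstall multus --namespace kube-system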

⚒️ Fixes #

⚙️ Type of change

  • ⚙️ Feature/App addition
  • 🪛 Bugfix
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 🔃 Refactor of current code
  • 📜 Documentation Changes

🧪 How Has This Been Tested?

I've been testing it as I have been developing the chart. I have tested the different configuration options, the uninstall mode, and the examples/tutorials given on the docs page.

📃 Notes:

✔️ Checklist:

  • ⚖️ My code follows the style guidelines of this project
  • 👀 I have performed a self-review of my own code
  • #️⃣ I have commented my code, particularly in hard-to-understand areas
  • 📄 I have made changes to the documentation
  • 🧪 I have added tests to this description that prove my fix is effective or that my feature works
  • ⬆️ I increased versions for any altered app according to semantic versioning
  • I made sure the title starts with feat(chart-name):, fix(chart-name):, chore(chart-name):, docs(chart-name): or fix(docs):

➕ App addition

If this PR is an app addition please make sure you have done the following.

  • 🖼️ I have added an icon in the Chart's root directory called icon.png

Please don't blindly check all the boxes. Read them and only check those that apply.
Those checkboxes are there for the reviewer to see what this is all about and
the status of this PR at a quick glance.

@alfi0812 (Collaborator) commented Jan 3, 2026

This is not passing testing, btw.

@astro-stan (Contributor, author)

Should work now

astro-stan requested a review from alfi0812 on January 3, 2026, 21:56
@alfi0812 (Collaborator) commented Jan 3, 2026

Does not seem like it

@astro-stan (Contributor, author)

> Does not seem like it

The chart is rendered successfully and the DaemonSet is scheduled; however, I believe the init container is killed before any logs are displayed.

I've tested the chart on my Talos cluster with default values and it works.

Could this be a problem with CI? Perhaps I should merge in master?

@astro-stan (Contributor, author) commented Jan 4, 2026

@alfi0812 I've looked at the CI logs again. I think CI might be timing out while pulling the image, as it is quite big:

$ docker image ls
IMAGE                                                  ID             DISK USAGE   CONTENT SIZE
ghcr.io/k8snetworkplumbingwg/multus-cni:v4.2.3-thick   8307ee29fc82   482MB        0B

@alfi0812 (Collaborator) commented Jan 5, 2026

> I think CI might be timing out while pulling the image, as it is quite big

400 MB is quite small. The problem is most likely your probes: they finish and complete before the init container even runs, and so the CI counts it as a pass.

@astro-stan (Contributor, author)

> The problem is most likely your probes: they finish and complete before the init container even runs, and so the CI counts it as a pass.

Are you referring to the readiness, liveness, and startup probes? If so, I might be misinterpreting the CI logs, but I don't think that's the case, because:

  • The init container has no probes and is not started at all (no logs)
  • The probes of the main container are defined with a 10-12s initial delay and also require several failures, multiple seconds apart, before the container is considered failed. See the CI logs (and the pod-spec sketch after this list):
    Liveness:   exec [sh -c cat "/host/etc/cni/net.d"/00-multus.conf*] delay=12s timeout=5s period=15s #success=1 #failure=5
    Readiness:  exec [sh -c cat "/host/etc/cni/net.d"/00-multus.conf*] delay=10s timeout=5s period=12s #success=2 #failure=4
    Startup:    exec [sh -c cat "/host/etc/cni/net.d"/00-multus.conf*] delay=10s timeout=3s period=5s #success=1 #failure=60
  • The CI shows the container is not ready, just scheduled:
Conditions:
  Type                        Status
  PodReadyToStartContainers   False 
  Initialized                 False 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True
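
For reference, those probe settings correspond to a pod spec along these lines (reconstructed from the describe output above; not copied from the chart's actual template):

  # Reconstructed pod-spec equivalent of the probes shown in the CI logs
  livenessProbe:
    exec:
      command: ["sh", "-c", 'cat "/host/etc/cni/net.d"/00-multus.conf*']
    initialDelaySeconds: 12
    timeoutSeconds: 5
    periodSeconds: 15
    successThreshold: 1
    failureThreshold: 5
  readinessProbe:
    exec:
      command: ["sh", "-c", 'cat "/host/etc/cni/net.d"/00-multus.conf*']
    initialDelaySeconds: 10
    timeoutSeconds: 5
    periodSeconds: 12
    successThreshold: 2
    failureThreshold: 4
  startupProbe:
    exec:
      command: ["sh", "-c", 'cat "/host/etc/cni/net.d"/00-multus.conf*']
    initialDelaySeconds: 10
    timeoutSeconds: 3
    periodSeconds: 5
    successThreshold: 1
    failureThreshold: 60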

I said that I believe the image pull is timing out because of the events shown:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  3s    default-scheduler  Successfully assigned multus-cni-j5ms4frd8z/multus-cni-j5ms4frd8z-whs9k to k3d-k3s-default-server-0
  Normal  Pulling    3s    kubelet            Pulling image "ghcr.io/k8snetworkplumbingwg/multus-cni:v4.2.3-thick"

There are no events after these, and I believe there should be an event about the image pull succeeding when the pull finishes.

@alfi0812 (Collaborator) commented Jan 5, 2026

> Are you referring to the readiness, liveness, and startup probes? If so, I might be misinterpreting the CI logs, but I don't think that's the case [...]
>
> There are no events after these, and I believe there should be an event about the image pull succeeding when the pull finishes.

Those probes most likely pass instantly, as you just check for the existence of a file. So the CI is finished before the rest of the containers can even start.

Yeah, there should be more. But for CI, @PrivatePuffin is the guy. The only thing I can say is that without proper testing done, this cannot be merged.

@alfi0812 (Collaborator) commented Jan 5, 2026

Also, this will definitely need more testing in general (multiple ci-values.yaml files in the /ci folder) to test the important options.

@astro-stan (Contributor, author) commented Jan 5, 2026

> Those probes most likely pass instantly, as you just check for the existence of a file.

No, that couldn't be happening, because that file is generated by Multus on start and removed on exit, so those probes will fail until Multus starts (see the illustration below).
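
The probe command exits non-zero while that config file is absent, so a pod where Multus has not yet started cannot pass it. A quick shell illustration (assumed behavior, not CI output):

  # While no 00-multus.conf* file exists, the glob does not expand and cat fails:
  $ sh -c 'cat "/host/etc/cni/net.d"/00-multus.conf*'; echo "exit=$?"
  cat: '/host/etc/cni/net.d/00-multus.conf*': No such file or directory
  exit=1
  # Once Multus starts and generates the file, the same command prints it and exits 0.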

@astro-stan (Contributor, author)

> Also, this will definitely need more testing in general (multiple ci-values.yaml files in the /ci folder) to test the important options.

That's fine. I can add more once we resolve the issue with CI (assuming it's CI); something like the sketch below.
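
As a sketch, one such file could exercise the uninstall mode (key names here are hypothetical placeholders; the chart's real values may differ):

  # ci/uninstall-values.yaml -- hypothetical example; the key names are placeholders
  uninstall:
    enabled: true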

@alfi0812 (Collaborator) commented Jan 5, 2026

> That's fine. I can add more once we resolve the issue with CI (assuming it's CI).

Can you try a StatefulSet instead of a DaemonSet?

@astro-stan (Contributor, author)

> Can you try a StatefulSet instead of a DaemonSet?

Yes, I can change it temporarily in order to test tonight. Out of curiosity, why do you think that would make a difference?

@alfi0812 (Collaborator) commented Jan 5, 2026

> Out of curiosity, why do you think that would make a difference?

Just a random guess, as our default is StatefulSets, and most charts use it.

@astro-stan (Contributor, author) commented Jan 6, 2026

@alfi0812 Looks like your hunch was right: changing the workload to a StatefulSet made CI actually start the containers.

Note that the failure is because Multus did not find a primary CNI, which is required for it to work.

I will leave it as a StatefulSet until CI is fixed or we are about to merge this.

@astro-stan (Contributor, author) commented Jan 13, 2026

@alfi0812 CI now passes; however, note that the chart is still set to a StatefulSet. Has CI been fixed? Should I revert it back to a DaemonSet?

@alfi0812 (Collaborator)

> CI now passes; however, note that the chart is still set to a StatefulSet. Has CI been fixed? Should I revert it back to a DaemonSet?

Did you see any merges regarding that? So obviously not. Is there a specific reason why you want this to be a DaemonSet rather than keeping it as a StatefulSet?

@astro-stan (Contributor, author) commented Jan 13, 2026

> Is there a specific reason why you want this to be a DaemonSet rather than keeping it as a StatefulSet?

Yes, since it is a CNI it must run a pod on every node; thus, it must be a DaemonSet (see the sketch below).
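
For reference, the change is only in the rendered workload kind; a minimal sketch of the DaemonSet form (abbreviated, not the chart's actual template output):

  # Abbreviated sketch -- not the chart's real rendered output
  apiVersion: apps/v1
  kind: DaemonSet              # one pod per node, as a CNI requires
  metadata:
    name: multus-cni
  spec:
    selector:
      matchLabels:
        app: multus-cni
    template:
      metadata:
        labels:
          app: multus-cni
      spec:
        hostNetwork: true      # typical for CNI plugins; an assumption here
        containers:
          - name: multus
            image: ghcr.io/k8snetworkplumbingwg/multus-cni:v4.2.3-thick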

@alfi0812 (Collaborator)

Then you will have to wait or propose a fix for the CI.
