From c0a861280e3c9eab84e9c41c6644e1aea48b1af7 Mon Sep 17 00:00:00 2001
From: nvazquez
Date: Mon, 13 Apr 2020 15:05:23 -0300
Subject: [PATCH 1/4] KVM Rolling Maintenance

---
 .../images/kvm-rolling-maintenance.png        | Bin 0 -> 3529 bytes
 source/adminguide/hosts.rst                   | 121 ++++++++++++++++++
 2 files changed, 121 insertions(+)
 create mode 100644 source/_static/images/kvm-rolling-maintenance.png

diff --git a/source/_static/images/kvm-rolling-maintenance.png b/source/_static/images/kvm-rolling-maintenance.png
new file mode 100644
index 0000000000000000000000000000000000000000..ce05eb22d37faba1c157cd47dfdac1e98bac667e
GIT binary patch
literal 3529
[binary image data omitted]

literal 0
HcmV?d00001

diff --git a/source/adminguide/hosts.rst b/source/adminguide/hosts.rst
index 9693f5599b..431bb99b66 100644
--- a/source/adminguide/hosts.rst
+++ b/source/adminguide/hosts.rst
@@ -845,3 +845,124 @@ Custom Script Execution Configuration
 .. _`hooks`: https://libvirt.org/hooks.html
 .. _`qemu`: https://libvirt.org/hooks.html#qemu
 .. _`arguments`: https://libvirt.org/hooks.html#arguments
+
+
+KVM Rolling Maintenance
+-----------------------
+
+Overview
+~~~~~~~~
+
+CloudStack provides a flexible framework for automating the upgrade or patch process of KVM hosts within a zone, pod or cluster by executing custom scripts. These scripts are executed in the context of a stage. Each stage defines only one custom script to be executed.
+
+There are four stages in the KVM rolling maintenance process:
+
+#. Pre-Flight stage: Pre-flight script runs on hosts before commencing the rolling maintenance. If pre-flight check scripts return an error from any host, then rolling maintenance will be cancelled with no actions taken, and an error returned. If there are no pre-flight scripts defined, then no checks will be done from the hosts.
+
+#. Pre-Maintenace stage: Pre-maintenance script runs before a specific host is put into maintenance. If no pre-maintenance script is defined, or if the pre-flight script on a given host determines no pre-maintenance is required on that host, then no pre-maintenance actions will be taken, and the management server will move straight to putting the host in maintenance followed by requesting that the agent runs the maintenance script.
+
+#. Maintenance stage: Maintenance script runs after a host has been put into maintenance. If no maintenance script is defined, or if the pre-flight or pre-maintenance scripts determine that no maintenance is required, then the host will not be put into maintenance, and the completion of the pre-maintenance scripts will signal the end of all maintenance tasks and the KVM agent will hand the host back to the management server. Once the maintenance scripts have signalled that it has completed, the host agent will signal to the management server that the maintenance tasks have completed, and therefore the host is ready to exit maintenance mode and any 'information' which was collected (such as processing times) will be returned to the management server.
+
+#. Post-Maintenance stage: Post-maintenance script is expected to perform validation after the host exits maintenance. These scripts will help to detect any problem during the maintenance process, including reboots or restarts within scripts.
+
+.. note::
+   Pre-flight and pre-maintenance scripts’ execution can determine if the maintenance stage is not required for a host. The special exit code = 70 on a pre-flight or pre-maintenance script will let CloudStack know that the maintenance stage is not required for a host.
+
+Administrators must define only one script per stage. In case a stage does not contain a script, it is skipped, continuing with the next stage. Administrators are responsible for defining and copying scripts into the hosts
+
+.. note::
+   The administrator will be responsible for the maintenance and copying of the hook scripts across all KVM hosts.
+
+On all the KVM hosts to undergo rolling maintenance, there are two types of script execution approaches:
+
+· Systemd service executor: This approach uses a systemd service to invoke a script execution. Once a script finishes its execution, it will write content to a file, which the agent reads and sends back the result to the management server.
+
+· Agent executor: The CloudStack agent invokes a script execution within the JVM. In case the agent is stopped or restarted, the management server will assume the stage was completed when the agent reconnects. This approach does not keep the state in a file.
+
+Configuration
+~~~~~~~~~~~~~
+
+The rolling maintenance process can be configured through the following global settings in the management server:
+
+· ``kvm.rolling.maintenance.stage.timeout``: Defines the timeout (in seconds) for rolling maintenance stage update from hosts to the management servers. The default value is 1800. This timeout is observed per stage.
+
+· ``kvm.rolling.maintenance.ping.interval``: Defines the ping interval (in seconds) between management server and hosts performing stages during rolling maintenance. The management server checks for updates from the hosts every ‘ping interval’ seconds. The default value is 10.
+
+· ``kvm.rolling.maintenance.wait.maintenance.timeout``: Defines the timeout (in seconds) to wait for a host preparing to enter maintenance mode as part of a rolling maintenance process. The default value is 1800.
+
+On each KVM host, the administrator must indicate the directory in which the scripts have been defined, be editing the ``agent.properties`` file, adding the property:
+
+- ``rolling.maintenance.hooks.dir=``
+
+Optionally, the administrator can decide to use a systemd executor for the rolling maintenance scripts on each host (enabled by default) or disabling it, allowing to invoke the scripts through the agent execution. Systemd service execution can be disabled by adding this property on ``agent.properties``:
+
+- ``rolling.maintenance.service.executor.disabled=true``
+
+Usage
+~~~~~
+
+An administrator can invoke a rolling maintenance process by the ``startRollingMaintenance`` API or through the UI, by selecting one or more zones, pods, clusters or hosts.
+
+The ``startRollingMaintenance`` API accepts the following parameters:
+
+- ``hostids``, ``clusterids``, ``podids`` and ``zoneids`` are mutually exclusive, and only one of them must be passed. Each of the mentioned parameters expects a list of ids of the entity that it defines.
+
+· ``forced``: optional boolean parameter, false by default. When enabled, does not stop iterating through hosts in case of any error in the rolling maintenance process.
+
+· ``timeout``: optional parameter, defines a timeout in seconds for a stage to be completed in a host. This parameter takes precedence over the timeout defined in the global setting ``kvm.rolling.maintenance.stage.timeout``.
+
+.. note::
+   The timeout (defined by the API parameter or by the global setting) must be greater or equal than the ping interval defined by the global setting ‘kvm.rolling.maintenance.ping.interval’. In case the timeout is lower than the ping interval, the API does not start any maintenance actions and fails fast with a descriptive message.
+
+· ``payload``: optional string parameter, adds extra arguments to be passed to the scripts on each stage. The string set as parameter is used to invoke each of the scripts involved in the rolling maintenance process for each stage, by appending the payload at the end of the script invocation.
+
+.. note::
+   The payload parameter is appended at the end of each stage script execution. This allows the administrator to define scripts that can accept parameters and pass them through the payload parameter to each stage execution. For example: defining the payload parameter to “param1=val1 param2=val2” will pass both parameter to each stage execution, similar to execute: ‘./script param1=val1 param2=val2’.
+
+
+In the UI, the administrator must select one or multiple zones, pods, clusters or hosts and click the button: |kvm-rolling-maintenance.png|
+
+
+.. |kvm-rolling-maintenance.png| image:: /_static/images/kvm-rolling-maintenance.png
+
+Process
+~~~~~~~
+
+Before attempting any maintenance actions, pre-flight checks are performed on every host:
+
+#. The management server performs capacity checks to ensure that every host in the specified scope can be set into maintenance. These checks include host tags, affinity groups and compute checks
+
+#. The pre-flight scripts are executed on each host. If any of these scripts fail, then no action is performed unless the ‘forced’ parameter is enabled.
+
+The pre-flight script may signal that no maintenance is needed on the host. In that case, the host is skipped from the rolling maintenance hosts iteration.
+
+Once pre-flight checks pass, then the management server iterates through each host in the selected scope and sends a command to execute each of the rest of the stages in order. The hosts in the selected scope are grouped by clusters, therefore all the hosts in a cluster are processed before processing the hosts of a different cluster. The management server iterates through hosts in each cluster on the selected scope and does the following:
+
+- Disables the cluster (if it has not been disabled previously) · The existence of the maintenance script on the host is checked (this check is performed only for the maintenance script, not for the rest of the stages)
+
+  - If the host does not contain a maintenance script, then the host is skipped and the iteration continues with the next host in the cluster.
+
+- Execute pre maintenance script (if any) before entering maintenance mode.
+
+  - The pre-maintenance script may signal that no maintenance is needed on the host. In that case, the host is skipped and the iteration continues with the next host in the cluster.
+
+  - In case the pre-maintenance script fails and the ‘forced’ parameter is not set, then the rolling maintenance process fails and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster
+
+- Capacity checks are recalculated, to verify that the host can enter maintenance mode.
+
+  .. note::
+     Before recalculating the capacity, the capacity is updated, similar to performing a listCapacity API execution, setting the ‘fetchLatest’ parameter to true
+
+· The host enters maintenance mode (throwing an error if the host does not enter maintenance after ‘kvm.rolling.maintenance.wait.maintenance.timeout’ seconds)
+
+· Execute maintenance script (if any) while the host is in maintenance.
+
+  - In case the maintenance script fails and the ‘forced’ parameter is not set, the rolling maintenance process fails, maintenance mode is cancelled and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster
+
+· Cancel maintenance mode
+
+· Execute post maintenace script (if any) after cancelling maintenance mode.
+
+  - In case the post-maintenance script fails and the ‘forced’ parameter is not set, then the rolling maintenance process fails and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster
+
+· Enable the cluster that has been disabled, after all the hosts in the cluster have been processed, or in case an error has occurred.
\ No newline at end of file

From 35c002020794352dbf0f438675c8d06986ac29c6 Mon Sep 17 00:00:00 2001
From: Nicolas Vazquez
Date: Tue, 12 May 2020 10:39:25 -0300
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
---
 source/adminguide/hosts.rst | 55 ++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/source/adminguide/hosts.rst b/source/adminguide/hosts.rst
index 431bb99b66..2085723625 100644
--- a/source/adminguide/hosts.rst
+++ b/source/adminguide/hosts.rst
@@ -857,13 +857,13 @@ CloudStack provides a flexible framework for automating the upgrade or patch pro

 There are four stages in the KVM rolling maintenance process:

-#. Pre-Flight stage: Pre-flight script runs on hosts before commencing the rolling maintenance. If pre-flight check scripts return an error from any host, then rolling maintenance will be cancelled with no actions taken, and an error returned. If there are no pre-flight scripts defined, then no checks will be done from the hosts.
+#. Pre-Flight stage: The pre-flight script (``PreFlight``, ``PreFlight.sh`` or ``PreFlight.py``) runs on the hosts before the rolling maintenance commences. If the pre-flight script returns an error on any host, the rolling maintenance is cancelled with no actions taken, and an error is returned. If no pre-flight script is defined, no checks are run on the hosts.

-#. Pre-Maintenace stage: Pre-maintenance script runs before a specific host is put into maintenance. If no pre-maintenance script is defined, or if the pre-flight script on a given host determines no pre-maintenance is required on that host, then no pre-maintenance actions will be taken, and the management server will move straight to putting the host in maintenance followed by requesting that the agent runs the maintenance script.
+#. Pre-Maintenance stage: The pre-maintenance script (``PreMaintenance``, ``PreMaintenance.sh`` or ``PreMaintenance.py``) runs before a specific host is put into maintenance. If no pre-maintenance script is defined, no pre-maintenance actions are taken, and the management server moves straight to putting the host into maintenance and then requests that the agent run the maintenance script.

-#. Maintenance stage: Maintenance script runs after a host has been put into maintenance. If no maintenance script is defined, or if the pre-flight or pre-maintenance scripts determine that no maintenance is required, then the host will not be put into maintenance, and the completion of the pre-maintenance scripts will signal the end of all maintenance tasks and the KVM agent will hand the host back to the management server. Once the maintenance scripts have signalled that it has completed, the host agent will signal to the management server that the maintenance tasks have completed, and therefore the host is ready to exit maintenance mode and any 'information' which was collected (such as processing times) will be returned to the management server.
+#. Maintenance stage: The maintenance script (``Maintenance``, ``Maintenance.sh`` or ``Maintenance.py``) runs after a host has been put into maintenance. If no maintenance script is defined, or if the pre-flight or pre-maintenance script determines that no maintenance is required, the host is not put into maintenance; the completion of the pre-maintenance script signals the end of all maintenance tasks, and the KVM agent hands the host back to the management server. Once the maintenance script has signalled that it has completed, the host agent signals to the management server that the maintenance tasks have completed and that the host is ready to exit maintenance mode. Any information that was collected (such as processing times) is returned to the management server.

-#. Post-Maintenance stage: Post-maintenance script is expected to perform validation after the host exits maintenance. These scripts will help to detect any problem during the maintenance process, including reboots or restarts within scripts.
+#. Post-Maintenance stage: The post-maintenance script (``PostMaintenance``, ``PostMaintenance.sh`` or ``PostMaintenance.py``) is expected to perform validation after the host exits maintenance. This script helps to detect any problem introduced during the maintenance process, including reboots or restarts within scripts.

 .. note::
    Pre-flight and pre-maintenance scripts’ execution can determine if the maintenance stage is not required for a host. The special exit code = 70 on a pre-flight or pre-maintenance script will let CloudStack know that the maintenance stage is not required for a host.
@@ -875,26 +875,26 @@ Administrators must define only one script per stage. In case a stage does not c

 On all the KVM hosts to undergo rolling maintenance, there are two types of script execution approaches:

-· Systemd service executor: This approach uses a systemd service to invoke a script execution. Once a script finishes its execution, it will write content to a file, which the agent reads and sends back the result to the management server.
+- Systemd service executor: This approach uses a systemd service to invoke a script execution. Once a script finishes its execution, it writes content to a file; the agent reads that file and sends the result back to the management server.

-· Agent executor: The CloudStack agent invokes a script execution within the JVM. In case the agent is stopped or restarted, the management server will assume the stage was completed when the agent reconnects. This approach does not keep the state in a file.
+- Agent executor: The CloudStack agent invokes a script execution within the JVM. If the agent is stopped or restarted, the management server assumes the stage was completed when the agent reconnects. This approach does not keep the state in a file.

 Configuration
 ~~~~~~~~~~~~~

 The rolling maintenance process can be configured through the following global settings in the management server:

-· ``kvm.rolling.maintenance.stage.timeout``: Defines the timeout (in seconds) for rolling maintenance stage update from hosts to the management servers. The default value is 1800. This timeout is observed per stage.
+- ``kvm.rolling.maintenance.stage.timeout``: Defines the timeout (in seconds) for a rolling maintenance stage update from the hosts to the management server. The default value is 1800. This timeout is observed per stage.

-· ``kvm.rolling.maintenance.ping.interval``: Defines the ping interval (in seconds) between management server and hosts performing stages during rolling maintenance. The management server checks for updates from the hosts every ‘ping interval’ seconds. The default value is 10.
+- ``kvm.rolling.maintenance.ping.interval``: Defines the ping interval (in seconds) between the management server and the hosts performing stages during rolling maintenance. The management server checks for updates from the hosts every ‘ping interval’ seconds. The default value is 10.

-· ``kvm.rolling.maintenance.wait.maintenance.timeout``: Defines the timeout (in seconds) to wait for a host preparing to enter maintenance mode as part of a rolling maintenance process. The default value is 1800.
+- ``kvm.rolling.maintenance.wait.maintenance.timeout``: Defines the timeout (in seconds) to wait for a host preparing to enter maintenance mode as part of a rolling maintenance process. The default value is 1800.

 On each KVM host, the administrator must indicate the directory in which the scripts have been defined, be editing the ``agent.properties`` file, adding the property:

 - ``rolling.maintenance.hooks.dir=``

-Optionally, the administrator can decide to use a systemd executor for the rolling maintenance scripts on each host (enabled by default) or disabling it, allowing to invoke the scripts through the agent execution. Systemd service execution can be disabled by adding this property on ``agent.properties``:
+Optionally, the administrator can disable the systemd executor for the rolling maintenance scripts on a host (it is enabled by default), so that the agent invokes the scripts itself. This can be done by editing the ``agent.properties`` file and adding the property:

 - ``rolling.maintenance.service.executor.disabled=true``

@@ -905,16 +905,16 @@ An administrator can invoke a rolling maintenance process by the ``startRollingM

 The ``startRollingMaintenance`` API accepts the following parameters:

-- ``hostids``, ``clusterids``, ``podids`` and ``zoneids`` are mutually exclusive, and only one of them must be passed. Each of the mentioned parameters expects a list of ids of the entity that it defines.
+- ``hostids``, ``clusterids``, ``podids`` and ``zoneids`` are mutually exclusive, and exactly one of them must be passed. Each of these parameters expects a comma-separated list of ids of the entity that it defines.

-· ``forced``: optional boolean parameter, false by default. When enabled, does not stop iterating through hosts in case of any error in the rolling maintenance process.
+- ``forced``: optional boolean parameter, false by default. When enabled, the iteration through hosts does not stop in case of any error in the rolling maintenance process.

-· ``timeout``: optional parameter, defines a timeout in seconds for a stage to be completed in a host. This parameter takes precedence over the timeout defined in the global setting ``kvm.rolling.maintenance.stage.timeout``.
+- ``timeout``: optional parameter, defines a timeout in seconds for a stage to be completed on a host. This parameter takes precedence over the timeout defined in the global setting ``kvm.rolling.maintenance.stage.timeout``.

 .. note::
    The timeout (defined by the API parameter or by the global setting) must be greater or equal than the ping interval defined by the global setting ‘kvm.rolling.maintenance.ping.interval’. In case the timeout is lower than the ping interval, the API does not start any maintenance actions and fails fast with a descriptive message.

-· ``payload``: optional string parameter, adds extra arguments to be passed to the scripts on each stage. The string set as parameter is used to invoke each of the scripts involved in the rolling maintenance process for each stage, by appending the payload at the end of the script invocation.
+- ``payload``: optional string parameter, adds extra arguments to be passed to the scripts on each stage. The string is appended to the end of the invocation of each script involved in the rolling maintenance process, for each stage.

 .. note::
    The payload parameter is appended at the end of each stage script execution. This allows the administrator to define scripts that can accept parameters and pass them through the payload parameter to each stage execution. For example: defining the payload parameter to “param1=val1 param2=val2” will pass both parameter to each stage execution, similar to execute: ‘./script param1=val1 param2=val2’.
@@ -922,13 +922,15 @@ The ``startRollingMaintenance`` API accepts the following parameters:

 In the UI, the administrator must select one or multiple zones, pods, clusters or hosts and click the button: |kvm-rolling-maintenance.png|

+.. note::
+   Keep in mind that the rolling maintenance job results are not shown in the UI. To see the job output, use the API or a CLI (e.g. CloudMonkey).

 .. |kvm-rolling-maintenance.png| image:: /_static/images/kvm-rolling-maintenance.png

 Process
 ~~~~~~~

-Before attempting any maintenance actions, pre-flight checks are performed on every host:
+Before attempting any maintenance actions, pre-flight and capacity checks are performed on every host:

 #. The management server performs capacity checks to ensure that every host in the specified scope can be set into maintenance. These checks include host tags, affinity groups and compute checks

@@ -936,13 +938,16 @@ Before attempting any maintenance actions, pre-flight checks are performed on ev

 The pre-flight script may signal that no maintenance is needed on the host. In that case, the host is skipped from the rolling maintenance hosts iteration.

-Once pre-flight checks pass, then the management server iterates through each host in the selected scope and sends a command to execute each of the rest of the stages in order. The hosts in the selected scope are grouped by clusters, therefore all the hosts in a cluster are processed before processing the hosts of a different cluster. The management server iterates through hosts in each cluster on the selected scope and does the following:
+Once pre-flight checks pass, then the management server iterates through each host in the selected scope and sends a command to execute each of the rest of the stages in order. The hosts in the selected scope are grouped by clusters, therefore all the hosts in a cluster are processed before processing the hosts of a different cluster.
+
+The management server iterates through hosts in each cluster on the selected scope and for each of the hosts does the following:

-- Disables the cluster (if it has not been disabled previously) · The existence of the maintenance script on the host is checked (this check is performed only for the maintenance script, not for the rest of the stages)
+- Disables the cluster (if it has not been disabled previously)
+- The existence of the maintenance script on the host is checked (this check is performed only for the maintenance script, not for the rest of the stages)

-  - If the host does not contain a maintenance script, then the host is skipped and the iteration continues with the next host in the cluster.
+   - If the host does not contain a maintenance script, then the host is skipped and the iteration continues with the next host in the cluster.

-- Execute pre maintenance script (if any) before entering maintenance mode.
+- Execute pre-maintenance script (if any) before entering maintenance mode.

   - The pre-maintenance script may signal that no maintenance is needed on the host. In that case, the host is skipped and the iteration continues with the next host in the cluster.

@@ -953,16 +958,16 @@ Once pre-flight checks pass, then the management server iterates through each ho
   .. note::
      Before recalculating the capacity, the capacity is updated, similar to performing a listCapacity API execution, setting the ‘fetchLatest’ parameter to true

-· The host enters maintenance mode (throwing an error if the host does not enter maintenance after ‘kvm.rolling.maintenance.wait.maintenance.timeout’ seconds)
+- The host is instructed to enter the maintenance mode. If the host doesn't enter the maintenance mode after ‘kvm.rolling.maintenance.wait.maintenance.timeout’ seconds an exception is thrown and the API will stop executing, but the host may eventually reach the maintenance mode as this is out of the control of the rolling maintenance API/code.

 · Execute maintenance script (if any) while the host is in maintenance.

-  - In case the maintenance script fails and the ‘forced’ parameter is not set, the rolling maintenance process fails, maintenance mode is cancelled and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster
+   - In case the maintenance script fails and the ‘forced’ parameter is not set, the rolling maintenance process fails, maintenance mode is cancelled and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster

-· Cancel maintenance mode
+- Cancel maintenance mode

-· Execute post maintenace script (if any) after cancelling maintenance mode.
+- Execute post-maintenance script (if any) after cancelling maintenance mode.

-  - In case the post-maintenance script fails and the ‘forced’ parameter is not set, then the rolling maintenance process fails and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster
+   - In case the post-maintenance script fails and the ‘forced’ parameter is not set, then the rolling maintenance process fails and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster

-· Enable the cluster that has been disabled, after all the hosts in the cluster have been processed, or in case an error has occurred.
\ No newline at end of file
+- Enable the cluster that has been disabled, after all the hosts in the cluster have been processed, or in case an error has occurred.

From e636856ffda11173b8fdc4f1e814891902997fdb Mon Sep 17 00:00:00 2001
From: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Date: Tue, 12 May 2020 15:48:38 +0200
Subject: [PATCH 3/4] Update source/adminguide/hosts.rst

Co-authored-by: Nicolas Vazquez
---
 source/adminguide/hosts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source/adminguide/hosts.rst b/source/adminguide/hosts.rst
index 2085723625..f2aaa73f76 100644
--- a/source/adminguide/hosts.rst
+++ b/source/adminguide/hosts.rst
@@ -871,7 +871,7 @@ There are four stages in the KVM rolling maintenance process:
 Administrators must define only one script per stage. In case a stage does not contain a script, it is skipped, continuing with the next stage. Administrators are responsible for defining and copying scripts into the hosts

 .. note::
-   The administrator will be responsible for the maintenance and copying of the hook scripts across all KVM hosts.
+   The administrator will be responsible for the maintenance and copying of the scripts across all KVM hosts.

 On all the KVM hosts to undergo rolling maintenance, there are two types of script execution approaches:


From 2afd85d507ef9699fae2809ea9e44b6acd0f7532 Mon Sep 17 00:00:00 2001
From: Andrija Panic <45762285+andrijapanicsb@users.noreply.github.com>
Date: Tue, 12 May 2020 18:02:12 +0200
Subject: [PATCH 4/4] Update source/adminguide/hosts.rst

---
 source/adminguide/hosts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source/adminguide/hosts.rst b/source/adminguide/hosts.rst
index f2aaa73f76..92623792f5 100644
--- a/source/adminguide/hosts.rst
+++ b/source/adminguide/hosts.rst
@@ -960,7 +960,7 @@ The management server iterates through hosts in each cluster on the selected sco

 - The host is instructed to enter the maintenance mode. If the host doesn't enter the maintenance mode after ‘kvm.rolling.maintenance.wait.maintenance.timeout’ seconds an exception is thrown and the API will stop executing, but the host may eventually reach the maintenance mode as this is out of the control of the rolling maintenance API/code.

-· Execute maintenance script (if any) while the host is in maintenance.
+- Execute maintenance script (if any) while the host is in maintenance.

    - In case the maintenance script fails and the ‘forced’ parameter is not set, the rolling maintenance process fails, maintenance mode is cancelled and an error is reported. If the ‘forced’ parameter is set, the host is skipped and the iteration continues with the next host in the cluster