Skip to content

Conversation

@ustcweizhou
Copy link
Contributor

Description

This PR fixes an issue when collect vm network or disk statistics from hosts.

If cloudstack fails to get vm statistics from a host, the whole process will be terminated, and it will not get vm statistics from remaining hosts.

With this pr, the host with problem will be ignored, and cloudstack will continue on remaining hosts.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

@ustcweizhou ustcweizhou changed the title CLSTACK-3660: Get vm network/disk statistics and update database per host Get vm network/disk statistics and update database per host Jan 20, 2021
@ustcweizhou ustcweizhou changed the title Get vm network/disk statistics and update database per host server: Get vm network/disk statistics and update database per host Jan 20, 2021
Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nitpickings 👍

s_logger.warn("Error while collecting vm disk stats from hosts", e);
});
} catch (Exception e) {
s_logger.warn("Error while collecting vm disk stats from hosts", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add host details in this message

s_logger.warn("Error while collecting vm network stats from hosts", e);
});
} catch (Exception e) {
s_logger.warn("Error while collecting vm network stats from hosts", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, we can now add host details to the message

@shwstppr
Copy link
Contributor

@ustcweizhou do you plan to address above comments?

@blueorangutan package

@blueorangutan
Copy link

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2587

@weizhouapache
Copy link
Member

@DaanHoogland thanks for review. modified some debug message

@shwstppr code is changed.

@shwstppr
Copy link
Contributor

@blueorangutan package

@apache apache deleted a comment from blueorangutan Jan 26, 2021
@blueorangutan
Copy link

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2591

@shwstppr
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@shwstppr a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr
Copy link
Member

yadvr commented Jan 26, 2021

This may require testing on different hypervisors too
@blueorangutan test centos7 vmware-67u3

@blueorangutan
Copy link

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3412)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39607 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4601-t3412-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 82 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_07_deploy_kubernetes_ha_cluster Failure 3615.86 test_kubernetes_clusters.py
test_08_deploy_and_upgrade_kubernetes_ha_cluster Failure 0.04 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 0.04 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 103.09 test_kubernetes_clusters.py

@blueorangutan
Copy link

Trillian test result (tid-3414)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42587 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4601-t3414-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_templates.py
Smoke tests completed. 81 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_03_deploy_and_upgrade_kubernetes_cluster Failure 252.62 test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster Failure 3612.82 test_kubernetes_clusters.py
test_08_deploy_and_upgrade_kubernetes_ha_cluster Failure 0.05 test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster Failure 0.05 test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown Error 120.22 test_kubernetes_clusters.py
test_02_create_template_with_checksum_sha1 Error 5.11 test_templates.py
test_03_create_template_with_checksum_sha256 Error 5.11 test_templates.py
test_04_create_template_with_checksum_md5 Error 5.10 test_templates.py

@yadvr yadvr requested a review from shwstppr January 27, 2021 13:59
@ustcweizhou ustcweizhou reopened this Jan 27, 2021
@DaanHoogland
Copy link
Contributor

3 checksum checks that failed on vmware. not related but very disturbing!?!

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes are good, but the contained code is not modularised enough.

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested changes with a KVM env, stats working fine.

@yadvr yadvr merged commit 4de6ac3 into apache:4.14 Feb 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants