Indirect agent connection improvements #13028

Draft
sureshanaparti wants to merge 2 commits into apache:main from shapeblue:indirect-agent-connection-improvements

@sureshanaparti
Contributor

Description

This PR improves the handling of indirect agent connections with the following changes:

  • Enhances the Host connecting logic to avoid a connection storm (where the Agent opens multiple sockets against the Management Server).
  • Implements a HostConnectProcess task in which the Host, upon connecting, checks whether the lock is available and tracks the connection progress, status and timeout.
  • Introduces AgentConnectStatusCommand, with which the Host checks whether the lock for the Host is available (i.e. whether the "previous" connect process has finished).
  • Implements logic to check whether a Management Server holds the lock for a Host (the presence of the MySQL DB lock is exposed via an API).
  • Removes synchronization on the Host disconnect process and the double-disconnect logic in clustered Management Server environments, and adds early removal from the ping map (previously, a ping timeout delay combined with the synchronized disconnect process caused the Agent Manager to submit additional disconnect requests).
  • Introduces parameterized connection and status check timeouts.
  • Implements a backoff algorithm abstraction: either a constant backoff timeout or an exponential backoff with jitter can be used to wait between the Host's connection attempts to the Management Server.
  • Implements ServerAttache for use on the Agent side of the communication (similar to Attache on the Management Server side).
  • Significantly enhances and adds logging in the Host Agent and Agent Manager to trace the Host connecting and disconnecting process, including IDs, names, context UUIDs and timings (how long the overall initialization/deinitialization took).
  • Adds logging for communication between Management Servers (PDU requests).
  • Adds DB indexes to improve search performance and uses IDEMPOTENT_ADD_INDEX for safer DB schema updates.
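
The connection-storm fix can be illustrated with a minimal single-flight guard. This is a sketch under assumptions, not the PR's code: the class `SingleFlightConnector` and its methods are hypothetical, and the counter stands in for opening a real socket. The idea is that only one connect attempt may be in flight at a time, so repeated reconnect triggers do not open extra sockets against the Management Server.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not the PR's implementation): at most one connect attempt
// is in flight, so repeated triggers (timers, reconnect callbacks) cannot open extra sockets.
class SingleFlightConnector {
    private final AtomicBoolean connecting = new AtomicBoolean(false);
    private final AtomicInteger socketsOpened = new AtomicInteger();

    /** Returns true if this call started a connect attempt, false if one was already running. */
    boolean tryConnect() {
        // Only the caller that flips false -> true proceeds; concurrent callers return immediately.
        if (!connecting.compareAndSet(false, true)) {
            return false;
        }
        socketsOpened.incrementAndGet(); // stand-in for opening the real socket
        return true;
    }

    /** Called when the attempt completes (success or failure) so the next attempt is allowed. */
    void connectFinished() {
        connecting.set(false);
    }

    int socketsOpened() {
        return socketsOpened.get();
    }
}
```

Releasing the flag only in `connectFinished()` means a slow handshake blocks further attempts instead of stacking new sockets on top of it.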
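
The HostConnectProcess / AgentConnectStatusCommand flow (wait until the lock for the Host is free, but give up after a timeout) might look roughly like the following. The `ConnectStatusPoller` class is hypothetical, and the `BooleanSupplier` is an assumed stand-in for whatever status check the Agent actually sends to the Management Server; the timeout and poll interval play the role of the parameterized timeouts only by assumption.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: the Host repeatedly asks whether its lock is free
// (i.e. the previous connect process has finished) until a timeout expires.
class ConnectStatusPoller {
    /** Returns true once lockAvailable reports true, false on timeout or interruption. */
    static boolean waitForLock(BooleanSupplier lockAvailable, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (lockAvailable.getAsBoolean()) {
                return true; // previous connect process finished; safe to proceed
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // timed out; the caller can log the status and retry later
            }
            // Sleep for the poll interval, but never much past the deadline.
            long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1, remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```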
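
Early removal from the ping map can be sketched with `ConcurrentHashMap.remove(key, value)`, which succeeds for only one caller. In this hypothetical `PingTracker` (not the Agent Manager's real data structures), a host is taken out of the map before any disconnect work runs, so a later ping-timeout scan cannot submit a second disconnect for the same host.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: remove the host from the ping map *before* the (possibly slow)
// disconnect runs, so repeated timeout scans do not queue duplicate disconnects.
class PingTracker {
    private final ConcurrentHashMap<Long, Long> lastPingMillis = new ConcurrentHashMap<>();
    final AtomicInteger disconnectsSubmitted = new AtomicInteger();

    void recordPing(long hostId, long nowMillis) {
        lastPingMillis.put(hostId, nowMillis);
    }

    /** Submits one disconnect per host whose last ping is older than timeoutMillis. */
    void checkTimeouts(long nowMillis, long timeoutMillis) {
        // ConcurrentHashMap iterators are weakly consistent, so removing during iteration is safe.
        for (Long hostId : lastPingMillis.keySet()) {
            Long last = lastPingMillis.get(hostId);
            if (last != null && nowMillis - last > timeoutMillis) {
                // remove(key, value) succeeds for exactly one caller, so a concurrent or
                // repeated scan cannot resubmit a disconnect for this host.
                if (lastPingMillis.remove(hostId, last)) {
                    disconnectsSubmitted.incrementAndGet(); // stand-in for queueing a disconnect task
                }
            }
        }
    }
}
```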
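
A backoff abstraction with a constant strategy and an exponential-with-jitter strategy could be sketched as below. The names `Backoff`, `ConstantBackoff` and `ExponentialJitterBackoff` are hypothetical and are not the PR's `ExponentialWithJitterBackoff`. The jitter variant shown is "full jitter" (a uniform random delay up to a capped exponential bound), which is one common choice and an assumption here.

```java
import java.util.Random;

// Hypothetical sketch of a backoff abstraction; names are illustrative, not the PR's classes.
interface Backoff {
    /** Delay in milliseconds before connection attempt number {@code attempt} (0-based). */
    long delayMillis(int attempt);
}

class ConstantBackoff implements Backoff {
    private final long delayMillis;

    ConstantBackoff(long delayMillis) { this.delayMillis = delayMillis; }

    @Override
    public long delayMillis(int attempt) { return delayMillis; }
}

// "Full jitter": pick uniformly in [0, min(cap, base * 2^attempt)] so agents that
// disconnected at the same moment spread out instead of retrying in lockstep.
class ExponentialJitterBackoff implements Backoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    ExponentialJitterBackoff(long baseMillis, long capMillis, Random random) {
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    @Override
    public long delayMillis(int attempt) {
        // Clamp the shift so base << shift stays within long range for realistic base values.
        int shift = Math.min(Math.max(attempt, 0), 30);
        long bound = Math.min(capMillis, baseMillis << shift);
        // nextDouble() is in [0, 1), so the result lies in [0, bound].
        return (long) (random.nextDouble() * (bound + 1));
    }
}
```

A reconnect loop would sleep for `backoff.delayMillis(attempt)` before each attempt. Swapping the strategy only changes the object passed in, which is the point of having the abstraction.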

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@sureshanaparti sureshanaparti requested a review from nvazquez April 15, 2026 05:17
@sureshanaparti sureshanaparti added this to the 4.23.0 milestone Apr 15, 2026
@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 20.24892% with 1666 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.97%. Comparing base (82bfa9f) to head (d7a044d).

Files with missing lines | Patch % | Lines
agent/src/main/java/com/cloud/agent/Agent.java | 9.83% | 320 Missing and 1 partial ⚠️
...java/com/cloud/agent/manager/AgentManagerImpl.java | 39.30% | 260 Missing and 21 partials ⚠️
...t/src/main/java/com/cloud/agent/ServerAttache.java | 8.19% | 266 Missing and 3 partials ⚠️
.../main/java/com/cloud/agent/HostConnectProcess.java | 22.83% | 168 Missing and 1 partial ⚠️
...c/main/java/com/cloud/utils/nio/NioConnection.java | 27.06% | 85 Missing and 12 partials ⚠️
...s/src/main/java/com/cloud/utils/nio/NioClient.java | 18.36% | 79 Missing and 1 partial ⚠️
...ils/backoff/impl/ExponentialWithJitterBackoff.java | 0.00% | 70 Missing ⚠️
...b/src/main/java/com/cloud/utils/db/GlobalLock.java | 19.27% | 49 Missing and 18 partials ⚠️
...main/java/com/cloud/agent/SynchronousListener.java | 0.00% | 47 Missing ⚠️
...rk/db/src/main/java/com/cloud/utils/db/DbUtil.java | 20.51% | 31 Missing ⚠️
... and 20 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13028      +/-   ##
============================================
+ Coverage     17.95%   17.97%   +0.01%     
- Complexity    16522    16567      +45     
============================================
  Files          6022     6032      +10     
  Lines        541387   542879    +1492     
  Branches      66346    66487     +141     
============================================
+ Hits          97211    97582     +371     
- Misses       433210   434283    +1073     
- Partials      10966    11014      +48     
Flag | Coverage Δ
uitests | 3.53% <ø> (ø)
unittests | 19.12% <20.24%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17497

@sureshanaparti
Contributor Author

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-15880)

3 participants