net, tests, hot-plug: W/A guest-agent deadlock after hot plug #5466
Anatw wants to merge 2 commits into
hot_plug_interface() already returns the full interface status including interfaceName. Pass it through to set_secondary_static_ip_address() instead of re-querying VMI status for the same value. Signed-off-by: Anat Wax <awax@redhat.com> Assisted-by: Claude <noreply@anthropic.com>
CNV-77961 causes guest-agent to stop reporting newly hot-plugged interfaces, failing tests that depend on interface data from VMI status. Work around the bug so tests continue to run until the fix is released. Signed-off-by: Anat Wax <awax@redhat.com> Assisted-by: Claude <noreply@anthropic.com>
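A minimal sketch of the guard pattern this commit message describes: prefer guest-agent data, and use the console only while the bug is open. The function name, its three parameters, and the use of the built-in `TimeoutError` are all hypothetical stand-ins. The real code may raise a different timeout exception and is gated by `is_jira_open("CNV-77961")`.

```python
from typing import Callable


def resolve_guest_iface_name(
    lookup_via_guest_agent: Callable[[], str],
    lookup_via_console: Callable[[], str],
    workaround_enabled: bool,
) -> str:
    """Return the guest interface name, preferring guest-agent data.

    All parameters are hypothetical: the two callables wrap the real lookups,
    and workaround_enabled stands in for is_jira_open("CNV-77961").
    """
    try:
        return lookup_via_guest_agent()
    except TimeoutError:
        if not workaround_enabled:
            # W/A disabled (bug fixed): surface the original failure.
            raise
        # W/A enabled: fall back to looking inside the guest via console.
        return lookup_via_console()
```

With the guard written this way, the fallback branch can be deleted in one place once the bug is fixed, and the healthy path never touches the console.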
For testing purposes only. Kill guest-agent before hot-plug and omit guest_device_name to guarantee all three W/A fallback paths are exercised. Not for review or merge. Needed for PR RedHatQE#5466 Signed-off-by: Anat Wax <awax@redhat.com> Assisted-by: Claude <noreply@anthropic.com>
/build-and-push-container
/verified Both natural run + forced W/A (using #5467)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
gh pr view 5466 --repo RedHatQE/openshift-virtualization-tests --json files,title,number -q '.files[].path' | head -50
echo "---diff---"
gh pr diff 5466 --repo RedHatQE/openshift-virtualization-tests | head -400
Length of output: 9633
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "--- markers in test file ---"
rg -n '`@pytest.mark`' tests/network/l2_bridge/test_bridge_nic_hot_plug.py
echo "--- callers of functions across repo ---"
rg -n 'hot_plug_interface\(|set_secondary_static_ip_address\(|get_guest_vm_interface_name_by_vmi_interface_name\(|hot_plug_interface_and_set_address\(' --type=py -g '!libl2bridge.py'
echo "--- smoke/gating files under l2_bridge ---"
rg -n '`@pytest.mark.smoke`|`@pytest.mark.gating`' tests/network/l2_bridge/
echo "--- conftest in dir hierarchy ---"
fd conftest.py tests/network/l2_bridge tests/network
echo "--- first changed line in first file for inline comment context ---"
sed -n '1,10p' tests/network/l2_bridge/libl2bridge.py
Length of output: 2516
🏁 Script executed:
#!/bin/bash
echo "--- callers of functions across repo (retry) ---"
rg -n 'hot_plug_interface\(|set_secondary_static_ip_address\(|get_guest_vm_interface_name_by_vmi_interface_name\(|hot_plug_interface_and_set_address\(' --type py | grep -v 'tests/network/l2_bridge/libl2bridge.py'
echo "--- done callers ---"
echo "--- smoke/gating markers repo-wide referencing l2_bridge ---"
rg -n '`@pytest.mark.smoke`|`@pytest.mark.gating`' tests/network/l2_bridge/ || echo "none found"
echo "--- conftest files ---"
fd conftest.py tests/network/l2_bridge tests/network || true
echo "--- first changed line ---"
sed -n '1,5p' tests/network/l2_bridge/libl2bridge.py
Length of output: 1004
🏁 Script executed:
#!/bin/bash
echo "=== callers repo-wide ==="
rg -n 'hot_plug_interface\(|set_secondary_static_ip_address\(|get_guest_vm_interface_name_by_vmi_interface_name\(|hot_plug_interface_and_set_address\(' --type py
echo "=== smoke/gating markers in l2_bridge dir ==="
rg -n '`@pytest.mark.smoke`|`@pytest.mark.gating`' tests/network/l2_bridge/
echo "=== conftest files ==="
fd conftest.py tests/network/l2_bridge tests/network
echo "=== first lines of file ==="
sed -n '1,5p' tests/network/l2_bridge/libl2bridge.py
Length of output: 836
🏁 Script executed:
#!/bin/bash
rg -n 'hot_plug_interface\(' --type py || true
echo "---"
rg -n 'set_secondary_static_ip_address\(' --type py || true
echo "---"
rg -n 'get_guest_vm_interface_name_by_vmi_interface_name\(' --type py || true
echo "---"
rg -n 'hot_plug_interface_and_set_address\(' --type py || true
Length of output: 431
🏁 Script executed:
#!/bin/bash
pwd
ls
fd -e py | wc -l
rg -n "def hot_plug_interface" .
Length of output: 654
🏁 Script executed:
#!/bin/bash
echo "=== callers of hot_plug_interface( ==="
rg -n 'hot_plug_interface\(' tests utilities libs
echo "=== callers of set_secondary_static_ip_address( ==="
rg -n 'set_secondary_static_ip_address\(' tests utilities libs
echo "=== callers of get_guest_vm_interface_name_by_vmi_interface_name( ==="
rg -n 'get_guest_vm_interface_name_by_vmi_interface_name\(' tests utilities libs
echo "=== callers of hot_plug_interface_and_set_address( ==="
rg -n 'hot_plug_interface_and_set_address\(' tests utilities libs
echo "=== smoke/gating in l2_bridge ==="
rg -n '`@pytest.mark.smoke`|`@pytest.mark.gating`' tests/network/l2_bridge
echo "=== conftest files ==="
fd conftest.py tests/network/l2_bridge tests/network
Length of output: 4477
🏁 Script executed:
#!/bin/bash
set -euo pipefail
REPO=RedHatQE/openshift-virtualization-tests
PR=5466
HEAD_SHA=$(gh pr view $PR --repo $REPO --json headRefOid -q .headRefOid)
echo "HEAD_SHA=$HEAD_SHA"
BODY=$(cat <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — No `@pytest.mark.smoke` tests exist under `tests/network/l2_bridge/`.
- **Run gating tests: False** — No `@pytest.mark.gating` tests exist under `tests/network/l2_bridge/`; all callers of the changed functions (`hot_plug_interface`, `hot_plug_interface_and_set_address`, `set_secondary_static_ip_address`, `get_guest_vm_interface_name_by_vmi_interface_name`) are confined to `tests/network/l2_bridge/test_bridge_nic_hot_plug.py` and no conftest in the `tests/network/l2_bridge/` hierarchy wraps them with an autouse fixture.
- **Affected tests to run:**
- `tests/network/l2_bridge/test_bridge_nic_hot_plug.py` — all fixtures/tests use `hot_plug_interface`, `hot_plug_interface_and_set_address`, or `set_secondary_static_ip_address` (via `hot_plugged_interface_with_address`, `hot_plugged_second_interface_with_address`, and the `TestBridgeNicHotPlug`/secondary-interface test classes), so the whole file needs verification.
- Prioritize: `tests/network/l2_bridge/test_bridge_nic_hot_plug.py::test_connectivity_of_hot_plugged_jumbo_interface` (jumbo/MTU path, `special_infra`) and the secondary-static-IP tests (`hot_plugged_second_interface_with_address` consumers), since these exercise all three new console-fallback paths guarded by `is_jira_open("CNV-77961")`.
**Real tests (cluster required)**
Fix path (W/A, requires forcing guest-agent failure per PR notes, e.g. via `systemctl mask --now qemu-guest-agent`):
`pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py::TestBridgeNicHotPlug -k jumbo`
Expected: hot-plug succeeds via console MAC lookup, IP set verified via console, warnings logged referencing `CNV-77961`.
Happy path (regression, guest-agent healthy):
`pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py -m ipv4`
Expected: guest-agent reports interfaces/IPs normally; console fallback code paths are not triggered.
EOF
)
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
"repos/$REPO/pulls/$PR/reviews" \
-f commit_id="$HEAD_SHA" \
-f event="COMMENT" \
-f body=" " \
-F "comments[][path]=tests/network/l2_bridge/libl2bridge.py" \
-F "comments[][line]=2" \
-F "comments[][side]=RIGHT" \
-f "comments[][body]=$BODY"
Length of output: 3027
🏁 Script executed:
#!/bin/bash
set -euo pipefail
REPO=RedHatQE/openshift-virtualization-tests
PR=5466
BODY=$(cat <<'EOF'
**Test Execution Plan** (inline comment on `tests/network/l2_bridge/libl2bridge.py` line 2 was attempted but rejected by GitHub with `403 Resource not accessible by integration`; posting as a regular comment instead)
- **Run smoke tests: False** — No `@pytest.mark.smoke` tests exist under `tests/network/l2_bridge/`.
- **Run gating tests: False** — No `@pytest.mark.gating` tests exist under `tests/network/l2_bridge/`; all callers of the changed functions (`hot_plug_interface`, `hot_plug_interface_and_set_address`, `set_secondary_static_ip_address`, `get_guest_vm_interface_name_by_vmi_interface_name`) are confined to `tests/network/l2_bridge/test_bridge_nic_hot_plug.py`, and no conftest in the `tests/network/l2_bridge/` hierarchy wraps them with an autouse fixture.
- **Affected tests to run:**
- `tests/network/l2_bridge/test_bridge_nic_hot_plug.py` — all fixtures/tests use `hot_plug_interface`, `hot_plug_interface_and_set_address`, or `set_secondary_static_ip_address` (via `hot_plugged_interface_with_address`, `hot_plugged_second_interface_with_address`, and the `TestBridgeNicHotPlug`/secondary-interface test classes), so the whole file needs verification.
- Prioritize: `tests/network/l2_bridge/test_bridge_nic_hot_plug.py::test_connectivity_of_hot_plugged_jumbo_interface` (`special_infra`, jumbo/MTU path) and the secondary-static-IP tests (`hot_plugged_second_interface_with_address` consumers), since these exercise all three new console-fallback paths guarded by `is_jira_open("CNV-77961")`.
**Real tests (cluster required)**
Fix path (W/A, requires forcing guest-agent failure per PR notes, e.g. `systemctl mask --now qemu-guest-agent`):
`pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py::TestBridgeNicHotPlug -k jumbo`
Expected: hot-plug succeeds via console MAC lookup, IP set verified via console, warnings logged referencing `CNV-77961`.
Happy path (regression, guest-agent healthy):
`pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py -m ipv4`
Expected: guest-agent reports interfaces/IPs normally; console fallback code paths are not triggered.
EOF
)
gh pr comment $PR --repo $REPO --body "$BODY"
Length of output: 2623
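Test Execution Plan
Real tests (cluster required)
Fix path (W/A, force guest-agent failure per PR notes, e.g. `systemctl mask --now qemu-guest-agent`): `pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py::TestBridgeNicHotPlug -k jumbo`
Happy path (regression, guest-agent healthy): `pytest tests/network/l2_bridge/test_bridge_nic_hot_plug.py -m ipv4`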
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5466 published |
What this PR does / why we need it:
CNV-77961 causes guest-agent to deadlock after hot-plugging an interface on VMs that start with only a primary (masquerade) interface. When this happens, lookup_iface_status() never sees the new interface in VMI status, and tests fail.
This PR adds a console-based fallback: when guest-agent fails to report the hot-plugged interface, we look it up directly inside the VM via `ip -j addr show`, matching by MAC address. Three call sites are covered:
- `hot_plug_interface()` - falls back to console when VMI status times out
- `set_secondary_static_ip_address()` - verifies IP via console when guest-agent doesn't report it
- `get_guest_vm_interface_name_by_vmi_interface_name()` - resolves interface name via console
All fallback paths are guarded by `is_jira_open("CNV-77961")` and will be removed once the bug is fixed.
The first commit avoids a redundant VMI lookup by passing `guest_device_name` through from `hot_plug_interface()` to `set_secondary_static_ip_address()`.
Which issue(s) this PR fixes:
CNV-77961
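The MAC-based console lookup described under "What this PR does" can be sketched roughly as below. This is an illustration, not the PR's implementation: `find_iface_name_by_mac` is a hypothetical helper, and the sample data is taken from the interface list that appears in this PR's test logs. It assumes the JSON produced by `ip -j addr show`, where each entry carries `ifname` and `address` keys.

```python
import json
from typing import Optional


def find_iface_name_by_mac(ip_addr_json: str, mac_address: str) -> Optional[str]:
    """Return the ifname whose link-layer address equals mac_address, if any.

    Hypothetical helper: ip_addr_json is the raw stdout of `ip -j addr show`.
    """
    wanted = mac_address.lower()
    for iface in json.loads(ip_addr_json):
        # MAC comparison is case-insensitive; not every entry has an address.
        if iface.get("address", "").lower() == wanted:
            return iface.get("ifname")
    return None


# Interface list as it appears in the test logs of this PR.
sample_output = json.dumps(
    [
        {"ifname": "lo", "address": "00:00:00:00:00:00"},
        {"ifname": "eth0", "address": "02:24:15:4d:78:3c"},
        {"ifname": "eth1", "address": "02:24:15:4d:78:3d"},
    ]
)
print(find_iface_name_by_mac(sample_output, "02:24:15:4d:78:3d"))  # eth1
```

Matching on MAC rather than on position or name avoids guessing which `ethN` the guest assigned to the hot-plugged device.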
Special notes for reviewer:
I wasn't able to reproduce the bug in our envs - I ran it dozens of times on both the tlv2 env (bm02-tlv2) and the rdu2 env (bm04-cnvqe-rdu2) and the bug didn't show. For this reason, I did the verification in two directions:
`systemctl mask --now qemu-guest-agent`) before hot-plug to guarantee the W/A path fires. After the W/A completed, the agent was restored (`systemctl unmask` + `start`) so subsequent test steps that depend on guest-agent could proceed normally. All three fallback paths were verified this way.
Our plan is to get this W/A merged as is and follow the logs to see it in action when the bug appears. If we see an issue with the implementation, we will revert this immediately.
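The forced-W/A verification above (mask the agent, run the hot-plug, then unmask and start it again) might be wrapped in a context manager along these lines. This is only a sketch: `guest_agent_masked` and the `run_in_guest` callable are hypothetical names, and how commands actually reach the guest (console, SSH) is left to the caller.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

SERVICE = "qemu-guest-agent"


@contextmanager
def guest_agent_masked(run_in_guest: Callable[[str], None]) -> Iterator[None]:
    """Stop and mask the guest agent for the duration of the block.

    run_in_guest is a hypothetical callable that executes one shell command
    inside the VM.
    """
    run_in_guest(f"systemctl mask --now {SERVICE}")
    try:
        yield
    finally:
        # Always restore the agent so later steps that depend on it can run.
        run_in_guest(f"systemctl unmask {SERVICE}")
        run_in_guest(f"systemctl start {SERVICE}")


# Usage with a recording stub instead of a real guest connection.
executed = []
with guest_agent_masked(executed.append):
    executed.append("<hot-plug happens here>")
print(executed)
```

Putting the restore in `finally` means a failing hot-plug step does not leave the agent masked for the rest of the test.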
Example for the W/A path in action (by forcing it using #5467):
test_connectivity_of_hot_plugged_jumbo_interface — PASSED (forced W/A, all 3 paths exercised)
Actual test logs:
2026-07-02T10:18:53.052667+00:00 timeout_sampler INFO Waiting for 300 seconds [0:05:00], retry every 10 seconds. (Function: utilities.console.connect Args: (<utilities.console.Console object at 0x7fb33673f9b0>,))
2026-07-02T10:18:53.053362+00:00 timeout_sampler INFO Waiting for 300 seconds [0:05:00], retry every 5 seconds. (Function: pexpect.pty_spawn.spawn Kwargs: {'command': 'virtctl console jumbo-hot-plug-test-vm-1782987280-6881244 -n l2-bridge-test-bridge-nic-hot-plug', 'timeout': 30, 'encoding': 'utf-8'})
2026-07-02T13:18:53.052230 tests.network.l2_bridge.libl2bridge WARNING CNV-77961: Guest agent did not report interface hot-plug-jumbo-iface on VM jumbo-hot-plug-test-vm-1782987280-6881244, falling back to console lookup by MAC 02:24:15:4d:78:3d.
2026-07-02T13:18:53.053109 utilities.console INFO Connect to jumbo-hot-plug-test-vm-1782987280-6881244 console
...
2026-07-02T13:18:54.276254 utilities.virt INFO Execute ip -j addr show on jumbo-hot-plug-test-vm-1782987280-6881244
2026-07-02T10:18:55.980250+00:00 timeout_sampler INFO Waiting for 300 seconds [0:05:00], retry every 10 seconds. (Function: utilities.console.connect Args: (<utilities.console.Console object at 0x7fb33673fce0>,))
2026-07-02T10:18:55.981499+00:00 timeout_sampler INFO Waiting for 300 seconds [0:05:00], retry every 5 seconds. (Function: pexpect.pty_spawn.spawn Kwargs: {'command': 'virtctl console jumbo-hot-plug-test-vm-1782987280-6881244 -n l2-bridge-test-bridge-nic-hot-plug', 'timeout': 30, 'encoding': 'utf-8'})
2026-07-02T13:18:55.978475 tests.network.l2_bridge.libl2bridge INFO CNV-77961: looking for MAC 02:24:15:4d:78:3d in guest jumbo-hot-plug-test-vm-1782987280-6881244, visible interfaces: [{'ifname': 'lo', 'address': '00:00:00:00:00:00'}, {'ifname': 'eth0', 'address': '02:24:15:4d:78:3c'}, {'ifname': 'eth1', 'address': '02:24:15:4d:78:3d'}]
2026-07-02T13:18:55.978900 tests.network.l2_bridge.libl2bridge INFO Console fallback found interface eth1 for hot-plug-jumbo-iface on VM jumbo-hot-plug-test-vm-1782987280-6881244.
...
2026-07-02T13:21:00.556203 utilities.console INFO jumbo-hot-plug-test-vm-1782987280-6881244: Got prompt \$
2026-07-02T13:21:00.557031 utilities.virt INFO Execute ip -j -4 addr show eth1 on jumbo-hot-plug-test-vm-1782987280-6881244
2026-07-02T13:21:02.298828 tests.network.l2_bridge.libl2bridge WARNING CNV-77961: Verified IP 172.16.241.3 on eth1 via console (guest-agent not reporting on VM jumbo-hot-plug-test-vm-1782987280-6881244).
2026-07-02T13:21:02.299476 tests.network.l2_bridge.libl2bridge INFO jumbo-hot-plug-test-vm-1782987280-6881244/hot-plug-jumbo-iface set with IP address 172.16.241.3
2026-07-02T13:21:02.300469 conftest INFO Executing function fixture: hot_plugged_jumbo_interface_in_utility_vm
...
PASSED
2026-07-02T13:22:44.884117 utilities.network INFO ping returned PING 172.16.241.4 (172.16.241.4) 8972(9000) bytes of data.
jira-ticket:
https://issues.redhat.com/browse/CNV-77961
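The IP verification step visible in the logs (`ip -j -4 addr show eth1`) could be checked along these lines. This is a hedged sketch, not the PR's code: `iface_has_ipv4` is a hypothetical helper, it assumes the `addr_info` / `family` / `local` keys of iproute2's JSON output, and the `prefixlen` in the sample is illustrative only.

```python
import json


def iface_has_ipv4(ip_addr_json: str, ip_address: str) -> bool:
    """Return True if any interface in the JSON output carries ip_address.

    Hypothetical helper: ip_addr_json is the raw stdout of
    `ip -j -4 addr show <iface>`.
    """
    for iface in json.loads(ip_addr_json):
        for addr in iface.get("addr_info", []):
            if addr.get("family") == "inet" and addr.get("local") == ip_address:
                return True
    return False


# Sample shaped like iproute2 JSON; the prefix length is an assumption.
sample_output = json.dumps(
    [
        {
            "ifname": "eth1",
            "addr_info": [
                {"family": "inet", "local": "172.16.241.3", "prefixlen": 24}
            ],
        }
    ]
)
print(iface_has_ipv4(sample_output, "172.16.241.3"))  # True
```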