[Failover] NoExecute taint is not added to the unhealthy cluster after a period of time #6951

@ryanwuer

Description

What happened:
The NoExecute taint is not added to the unhealthy cluster after a period of time.

What you expected to happen:
The NoExecute taint should be added to the unhealthy cluster after a period of time.

How to reproduce it (as minimally and precisely as possible):
The doc we find relevant to failover is here: https://karmada.io/docs/v1.14/userguide/failover/failover-analysis

We have two clusters named gy1 and gy2, and a workload is propagated to both with a 1:1 weight.

We injected a network fault to check Karmada's failover logic. The fault makes gy2 unreachable. After a short while, the NoSchedule taint is added to gy2, just as the doc describes. But after 5 min (which can be set by the --failover-eviction-timeout flag), no NoExecute taint is added to gy2.
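
For reference, the taints we observe on the Cluster object look roughly like this (a sketch assuming the standard Cluster API fields; the timeAdded value is a placeholder):

apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: gy2
spec:
  taints:
  # Observed: only the NoSchedule taint is present
  - effect: NoSchedule
    key: cluster.karmada.io/unreachable
    timeAdded: "2025-01-01T00:00:00Z"  # placeholder
  # Expected after --failover-eviction-timeout (5 min), but never added:
  # - effect: NoExecute
  #   key: cluster.karmada.io/unreachable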

From the doc we know that the default tolerations below are added to the PropagationPolicy by karmada-webhook:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
  namespace: default
spec:
  placement:
    clusterTolerations:
    - effect: NoExecute
      key: cluster.karmada.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: cluster.karmada.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
    namespace: default

These tolerations do not match the NoSchedule taint, since they only tolerate the NoExecute effect. As a result, when we roll out a new deploy, all replicas are scheduled to gy1, which doubles the resource consumption there. This is not what we expect: the new version of the workload should be deployed to gy1 with the same number of replicas as before the network fault was injected. gy2 may be unreachable, but only its control plane is; the workloads inside it may still be running healthily as usual. We cannot double the replicas in gy1, as that could exhaust its resources.
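
For illustration, an explicit toleration for the NoSchedule effect would let new rollouts keep targeting gy2 during the fault; a minimal sketch, assuming the scheduler honors clusterTolerations for NoSchedule taints:

spec:
  placement:
    clusterTolerations:
    # Tolerate the NoSchedule taint so new replicas can still be scheduled to gy2
    - effect: NoSchedule
      key: cluster.karmada.io/unreachable
      operator: Exists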

Anything else we need to know?:

Environment:

  • Karmada version: v1.14.5
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): kubectl karmada version: version.Info{GitVersion:"v1.12.2", GitCommit:"0e82ce1823fdff2859053e48eebce189d78dc9a1", GitTreeState:"clean", BuildDate:"2025-01-02T12:19:54Z", GoVersion:"go1.22.9", Compiler:"gc", Platform:"darwin/arm64"}
  • Others: K8s version: v1.19.3
