@@ -0,0 +1,37 @@
@etcdDiscoveryHighAvailability @HighAvailability @Functional
Feature: Scheduler cluster membership changes
In order to ensure scheduler and control plane high availability
As a Core 2 platform operator
I want to be able to change the cluster membership of scheduler when needed

@etcdDiscoveryRuleFromOneNodeScaleUp
Scenario: The scheduler cluster size is 1 and the Core 2 platform operator increases it by two nodes
Given the control plane is deployed with 1 scheduler replicas
And the scheduler uses a highly available data store
When I increase the scheduler cluster size by "2"
Then a new leader should be elected within "5s"
And exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the new leader
And the scheduler Service should route traffic to the new leader

@etcdDiscoveryRuleFromThreeNodeScaleDownToOne
Scenario: The scheduler cluster size is 3 and the Core 2 platform operator decreases it by two nodes
Given the control plane is deployed with 3 scheduler replicas
And the scheduler uses a highly available data store
When I decrease the scheduler cluster size by "2"
Then exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the new leader
And the scheduler Service should route traffic to the new leader

@etcdDiscoveryRuleFromThreeNodeScaleUpToFive
Scenario: The scheduler cluster size is 3 and the Core 2 platform operator increases it by two nodes
Given the control plane is deployed with 3 scheduler replicas
And the scheduler uses a highly available data store
When I increase the scheduler cluster size by "2"
Then a new leader should be elected within "5s"
And exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the new leader
And the scheduler Service should route traffic to the new leader
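The timing steps in these scenarios are written as `within "5s"` here and as `within "5" seconds` in the scheduler high-availability feature. The sketch below shows one way a step implementation could accept both spellings and poll until the deadline. It is a minimal illustration using only the standard library; `parseTimeout` and `eventually` are hypothetical names that do not appear in this PR, and the polled condition is a stand-in for a real cluster query.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTimeout is a hypothetical helper: it accepts a Go duration string
// such as "5s" as well as a bare number of seconds such as "5".
func parseTimeout(s string) (time.Duration, error) {
	if d, err := time.ParseDuration(s); err == nil {
		return d, nil
	}
	secs, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid timeout %q: %w", s, err)
	}
	return time.Duration(secs) * time.Second, nil
}

// eventually polls cond every interval until it reports true, returns an
// error, or the timeout elapses. cond is always evaluated at least once.
func eventually(timeout, interval time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s", timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	timeout, err := parseTimeout("5s")
	if err != nil {
		panic(err)
	}
	attempts := 0
	// Stand-in condition: a real step would ask the cluster who the leader is.
	err = eventually(timeout, 10*time.Millisecond, func() (bool, error) {
		attempts++
		return attempts >= 3, nil
	})
	fmt.Println("leader observed:", err == nil, "after", attempts, "attempts")
}
```

Accepting both forms in one helper would let a single step definition serve both feature files, although settling on one spelling in the Gherkin itself may be the simpler choice.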
@@ -0,0 +1,53 @@
@SchedulerHighAvailability @HighAvailability @Functional
Feature: Scheduler High Availability
In order to ensure reliable model scheduling and orchestration
As a Core 2 platform operator
I want the control plane to continue functioning even if one or more scheduler replicas fail

Background:
Given the control plane is deployed with at least 3 scheduler replicas

Scenario: Scheduler elects a new leader and exposes it via the service when the current leader fails
Given exactly 1 scheduler pod is Ready
And the Ready scheduler pod is the leader
When I terminate the scheduler leader pod
Then a new leader should be elected within "5" seconds
And exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the new leader
And the scheduler Service should route traffic to the new leader

Scenario: Followers remain unroutable and do not become leaders on restart
Given exactly 1 scheduler pod is Ready
And the Ready scheduler pod is the leader
When I terminate a scheduler follower pod
Then the scheduler cluster should remain Ready
And a new follower pod should be running within "10" seconds
And exactly 1 scheduler pod should still be Ready
And the Ready scheduler pod should still be the leader
And no follower pod should be Ready or receive traffic

Scenario: Only the leader scheduler pod is Ready and routable
When I inspect the scheduler pods
Then exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the leader
And all follower scheduler pods should be NotReady
And the scheduler Service should route traffic to the leader pod

Scenario: Followers do not accept scheduling requests directly
Given exactly 1 scheduler pod is Ready
And the Ready scheduler pod is the leader
When I send a scheduling request directly to a follower pod
Then the request should be rejected or not routable
And the follower should not make scheduling decisions


Scenario: There is only a leader when the scheduler cluster is restarted
When the scheduler cluster is restarted
Then a new leader should be elected within "5" seconds
And exactly 1 scheduler pod should be Ready
And the Ready scheduler pod should be the new leader
And the scheduler Service should route traffic to the new leader

# Scenario: Scheduler cluster data is the same when there is a leadership change
# This test case might be difficult to implement, since data in the scheduler can easily change without intervention,
# e.g. a server restart while the operation is in progress.
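Most scenarios above assert the same invariant: exactly one scheduler pod is Ready, and that pod is the leader. A minimal sketch of that check follows. The `schedulerPod` type and its `Leader` field are assumptions made for illustration; this PR does not show how leadership is actually detected, so a real step would have to populate these values from the cluster.

```go
package main

import "fmt"

// schedulerPod is a hypothetical, simplified view of a scheduler replica.
type schedulerPod struct {
	Name   string
	Ready  bool
	Leader bool
}

// checkSingleReadyLeader returns nil only when exactly one pod is Ready
// and that pod is also the leader.
func checkSingleReadyLeader(pods []schedulerPod) error {
	var ready []schedulerPod
	for _, p := range pods {
		if p.Ready {
			ready = append(ready, p)
		}
	}
	if len(ready) != 1 {
		return fmt.Errorf("expected exactly 1 Ready scheduler pod, found %d", len(ready))
	}
	if !ready[0].Leader {
		return fmt.Errorf("pod %q is Ready but is not the leader", ready[0].Name)
	}
	return nil
}

func main() {
	pods := []schedulerPod{
		{Name: "scheduler-0", Ready: true, Leader: true},
		{Name: "scheduler-1"},
		{Name: "scheduler-2"},
	}
	if err := checkSingleReadyLeader(pods); err != nil {
		fmt.Println("invariant violated:", err)
		return
	}
	fmt.Println("invariant holds")
}
```

Keeping the invariant in one function would let the `Given` and `Then` variants of the step share the same logic.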
17 changes: 17 additions & 0 deletions tests/integration/godog/go.mod
@@ -16,19 +16,28 @@ require (
sigs.k8s.io/controller-runtime v0.22.4
)

replace (
github.com/seldonio/seldon-core/apis/go/v2 => ../../../apis/go
github.com/seldonio/seldon-core/operator/v2 => ../../../operator
)

require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cucumber/gherkin/go/v26 v26.2.0 // indirect
github.com/cucumber/messages/go/v21 v21.0.1 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.12.2 // indirect
github.com/evanphx/json-patch/v5 v5.9.11 // indirect
github.com/fsnotify/fsnotify v1.9.0 // indirect
github.com/fxamacker/cbor/v2 v2.9.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-openapi/jsonpointer v0.21.1 // indirect
github.com/go-openapi/jsonreference v0.21.0 // indirect
github.com/go-openapi/swag v0.23.1 // indirect
github.com/gofrs/uuid v4.3.1+incompatible // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/google/btree v1.1.3 // indirect
github.com/google/gnostic-models v0.7.0 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/uuid v1.6.0 // indirect
@@ -41,20 +50,28 @@ require (
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/prometheus/client_golang v1.22.0 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.65.0 // indirect
github.com/prometheus/procfs v0.17.0 // indirect
github.com/x448/float16 v0.8.4 // indirect
go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/net v0.41.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sync v0.15.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/term v0.32.0 // indirect
golang.org/x/text v0.26.0 // indirect
golang.org/x/time v0.12.0 // indirect
gomodules.xyz/jsonpatch/v2 v2.5.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250603155806-513f23925822 // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
k8s.io/apiextensions-apiserver v0.34.1 // indirect
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/kube-openapi v0.0.0-20250710124328-f3f2b991d03b // indirect
k8s.io/utils v0.0.0-20250604170112-4c0f3b243397 // indirect
14 changes: 10 additions & 4 deletions tests/integration/godog/go.sum
@@ -20,6 +20,8 @@ github.com/evanphx/json-patch v5.9.0+incompatible h1:fBXyNpNMuTTDdquAq/uisOr2lSh
github.com/evanphx/json-patch v5.9.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
github.com/evanphx/json-patch/v5 v5.9.11 h1:/8HVnzMq13/3x9TPvjG08wUGqBTmZBsCWzjTM0wiaDU=
github.com/evanphx/json-patch/v5 v5.9.11/go.mod h1:3j+LviiESTElxA4p3EMKAB9HXj3/XEtnUf6OZxqIQTM=
github.com/fsnotify/fsnotify v1.9.0 h1:2Ml+OJNzbYCTzsxtv8vKSFD9PbJjmhYF14k/jKC7S9k=
github.com/fsnotify/fsnotify v1.9.0/go.mod h1:8jBTzvmWwFyi3Pb8djgCCO5IBqzKJ/Jwo8TRcHyHii0=
github.com/fxamacker/cbor/v2 v2.9.0 h1:NpKPmjDBgUfBms6tr6JZkTHtfFGcMKsw3eGcmD/sapM=
github.com/fxamacker/cbor/v2 v2.9.0/go.mod h1:vM4b+DJCtHn+zz7h3FFp/hDAI9WNWCsZj23V5ytsSxQ=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
@@ -43,11 +45,15 @@ github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek=
github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps=
github.com/google/btree v1.1.3 h1:CVpQJjYgC4VbzxeGVHfvZrv1ctoYCAI8vbl07Fcxlyg=
github.com/google/btree v1.1.3/go.mod h1:qOPhT0dTNdNzV6Z/lhRX0YXUafgPLFUh+gZMl761Gm4=
github.com/google/gnostic-models v0.7.0 h1:qwTtogB15McXDaNqTZdzPJRHvaVJlAl+HVQnLmJEJxo=
github.com/google/gnostic-models v0.7.0/go.mod h1:whL5G0m6dmc5cPxKc5bdKdEN3UjI7OUGxBlw57miDrQ=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0=
github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/pprof v0.0.0-20241029153458-d1b30febd7db h1:097atOisP2aRj7vFgYQBbFN4U4JNXUNYpxael3UzMyo=
github.com/google/pprof v0.0.0-20241029153458-d1b30febd7db/go.mod h1:vavhavw2zAxS5dIdcRluK6cSGGPlZynqzFM8NdvU144=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
@@ -73,13 +79,17 @@ github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnr
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo=
github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ=
github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/mailru/easyjson v0.9.0 h1:PrnmzHw7262yW8sTBwxi1PdJA3Iw/EKBa8psRf7d9a4=
github.com/mailru/easyjson v0.9.0/go.mod h1:1+xMtQp2MRNVL/V1bOzuP3aP8VNwRW55fQUto+XFtTU=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@@ -111,10 +121,6 @@ github.com/prometheus/procfs v0.17.0/go.mod h1:oPQLaDAMRbA+u8H5Pbfq+dl3VDAvHxMUO
github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII=
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
-github.com/seldonio/seldon-core/apis/go/v2 v2.9.1 h1:9UcxnTFRuCDApZqy7cy3Rm6B/aaW2+3bjNXUKYVvlRY=
-github.com/seldonio/seldon-core/apis/go/v2 v2.9.1/go.mod h1:ptbV8xxTT6DI5hWGcOx74bizYhms/LhXBJ/04RD41jk=
-github.com/seldonio/seldon-core/operator/v2 v2.10.1 h1:Btn8xcFt5rPd4+xCMFAKwcuXGHAq4/nzE5EuYuNg0uI=
-github.com/seldonio/seldon-core/operator/v2 v2.10.1/go.mod h1:WMy17S3Q6QZTR2IP1OaIgRdh36RiNboT8jqCajJ6X9A=
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRMrb0=
16 changes: 11 additions & 5 deletions tests/integration/godog/k8sclient/client.go
@@ -33,10 +33,14 @@ type K8sClient struct {
 	KubeClient client.WithWatch
 }

-var CRDLabels = map[string]string{
+var DefaultCRDLabelMap = map[string]string{
 	"test-suite": "godog",
 }

+const (
+	DefaultCRDLabel = "test-suite=godog"
+)
+
 // New todo: separate k8s client init and pass to new
 func New(namespace string) (*K8sClient, error) {
 	k8sScheme := runtime.NewScheme()
@@ -78,8 +82,10 @@ func (k8s *K8sClient) ApplyModel(model *mlopsv1alpha1.Model) error {
 		model.Labels = map[string]string{}
 	}

-	// Add label
-	model.Labels = CRDLabels
+	// add labels
+	for k, v := range DefaultCRDLabelMap {
+		model.Labels[k] = v
+	}

 	existing := &mlopsv1alpha1.Model{}
 	key := client.ObjectKey{
@@ -102,12 +108,12 @@ func (k8s *K8sClient) ApplyModel(model *mlopsv1alpha1.Model) error {
 	return k8s.KubeClient.Update(ctx, model)
 }

-func (k8s *K8sClient) DeleteGodogTestModels(ctx context.Context) error {
+func (k8s *K8sClient) DeleteScenarioResources(ctx context.Context, labels client.MatchingLabels) error {

 	list := &mlopsv1alpha1.ModelList{}
 	err := k8s.KubeClient.List(ctx, list,
 		client.InNamespace(k8s.namespace),
-		client.MatchingLabels{"test-suite": "godog"},
+		labels,
 	)
 	if err != nil {
 		return err
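The change in `ApplyModel` from assigning the shared label map to copying its entries matters because Go maps are reference types: assigning the package-level map to `model.Labels` would discard the model's own labels and let later writes leak into the shared default. The sketch below illustrates the difference with plain maps. `defaultLabels` and `withDefaultLabels` are hypothetical stand-ins for `DefaultCRDLabelMap` and the loop in `ApplyModel`, not code from this PR.

```go
package main

import "fmt"

// defaultLabels is a stand-in for the package-level default label map.
var defaultLabels = map[string]string{"test-suite": "godog"}

// withDefaultLabels returns a fresh map holding the existing labels plus the
// defaults. Neither input map is modified, and defaults win on key conflicts.
func withDefaultLabels(existing map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+len(defaultLabels))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range defaultLabels {
		merged[k] = v
	}
	return merged
}

func main() {
	// Aliasing: both names refer to the same underlying map.
	aliased := defaultLabels
	aliased["scenario"] = "leader-failover"
	fmt.Println("shared map mutated:", len(defaultLabels) == 2)
	delete(defaultLabels, "scenario") // restore the default for the next step

	// Merging: the model keeps its own labels and the default stays intact.
	labels := withDefaultLabels(map[string]string{"app": "iris"})
	labels["scenario"] = "leader-failover"
	fmt.Println("merged label count:", len(labels))
	fmt.Println("default label count:", len(defaultLabels))
}
```

The PR's loop writes into `model.Labels` in place, which is equally safe for the shared map; returning a new map, as here, is just one alternative that also leaves the caller's input untouched.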
20 changes: 8 additions & 12 deletions tests/integration/godog/k8sclient/watcher.go
@@ -14,11 +14,11 @@ import (
 	"fmt"
 	"sync"

-	mlopsv1alpha1 "github.com/seldonio/seldon-core/operator/v2/apis/mlops/v1alpha1"
+	"github.com/seldonio/seldon-core/operator/v2/pkg/generated/clientset/versioned/typed/mlops/v1alpha1"
 	"k8s.io/apimachinery/pkg/api/meta"
+	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/apimachinery/pkg/watch"
-	"sigs.k8s.io/controller-runtime/pkg/client"
 )

 type WatcherStorage interface {
@@ -31,9 +31,9 @@ type WatcherStorage interface {
 }

 type WatcherStore struct {
-	namespace string
-	label     map[string]string
-
+	namespace    string
+	label        string
+	mlopsClient  v1alpha1.MlopsV1alpha1Interface
 	modelWatcher watch.Interface

 	mu sync.RWMutex
@@ -52,20 +52,16 @@ type waiter struct {
 type ConditionFunc func(obj runtime.Object) (done bool, err error)

 // NewWatcherStore receives events that match on a particular object list and creates a database store to query crd state
-func NewWatcherStore(namespace string, label map[string]string, w client.WithWatch) (*WatcherStore, error) {
-	modelWatcher, err := w.Watch(
-		context.Background(),
-		&mlopsv1alpha1.ModelList{},
-		client.InNamespace(namespace),
-		client.MatchingLabels(label),
-	)
+func NewWatcherStore(namespace string, label string, mlopsClient v1alpha1.MlopsV1alpha1Interface) (*WatcherStore, error) {
+	// Use the caller-supplied selector rather than a hard-coded "test-suite=godog".
+	modelWatcher, err := mlopsClient.Models(namespace).Watch(context.Background(), v1.ListOptions{LabelSelector: label})
 	if err != nil {
 		return nil, fmt.Errorf("failed to create model watcher: %w", err)
 	}

 	return &WatcherStore{
 		namespace:    namespace,
 		label:        label,
+		mlopsClient:  mlopsClient,
 		modelWatcher: modelWatcher,
 		store:        make(map[string]runtime.Object),
 		doneChan:     make(chan struct{}),
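With this change the same label exists in two forms: the `DefaultCRDLabelMap` map used when applying and deleting models, and the `DefaultCRDLabel` string used as the watcher's selector. One way to keep them from drifting apart is to derive the selector string from the map. The sketch below does this with the standard library only; `selectorFromLabels` is a hypothetical helper, not part of this PR. In real code, `labels.SelectorFromSet` from `k8s.io/apimachinery/pkg/labels` is likely the better choice, since it also validates and sorts the requirements.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorFromLabels renders a label map as an equality-based selector
// string such as "a=1,b=2". Keys are sorted so the output is deterministic.
func selectorFromLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Stand-in for the default label map defined in client.go.
	defaults := map[string]string{"test-suite": "godog"}
	fmt.Println(selectorFromLabels(defaults))
}
```

This sketch performs no validation of keys or values, so it is only suitable for labels that are already known to be well formed.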