litmuschaos / chaos-operator
chaos engineering via kubernetes operator
License: Apache License 2.0
FEATURE REQUEST
What happened:
After updating our OpenShift clusters, mounting the CRI socket for CRI-O fails:
MountVolume.SetUp failed for volume "cri-socket" : hostPath type check failed: /var/run/crio/crio.sock is not a file
The new requirement is that the hostPath socket is mounted using type Socket.
When the path is mounted using Socket, it works correctly.
What is the desired result:
hostFileVolumes has a key type which can be used to change the hostPath type on the underlying pod object.
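A minimal sketch of the requested behaviour, assuming a hostFileVolumes entry gains an optional type key. The HostFile and HostPathVolume types below are simplified local stand-ins written for illustration, not the operator's or Kubernetes' actual Go types. Only the string values "File" and "Socket" are real hostPath types.

```go
package main

import "fmt"

// HostPathType mirrors the string values Kubernetes accepts for hostPath.type.
type HostPathType string

const (
	HostPathUnset  HostPathType = "" // no type check is performed
	HostPathFile   HostPathType = "File"
	HostPathSocket HostPathType = "Socket"
)

// HostFile is a hypothetical stand-in for one hostFileVolumes entry.
type HostFile struct {
	Name      string
	MountPath string
	NodePath  string
	Type      HostPathType // the proposed new key
}

// HostPathVolume is a simplified stand-in for a pod's hostPath volume source.
type HostPathVolume struct {
	Name string
	Path string
	Type *HostPathType // nil means "leave the type unset"
}

// buildHostPathVolume copies the optional type onto the volume so the
// kubelet checks for the kind of node path the user asked for.
func buildHostPathVolume(h HostFile) HostPathVolume {
	v := HostPathVolume{Name: h.Name, Path: h.NodePath}
	if h.Type != HostPathUnset {
		t := h.Type
		v.Type = &t
	}
	return v
}

func main() {
	v := buildHostPathVolume(HostFile{
		Name:      "cri-socket",
		MountPath: "/var/run/crio/crio.sock",
		NodePath:  "/var/run/crio/crio.sock",
		Type:      HostPathSocket,
	})
	fmt.Println(v.Name, v.Path, *v.Type)
}
```

Leaving the pointer nil when no type is given keeps existing ChaosEngines behaving as before in this sketch.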
What happened:
It would be useful if we could fall back to the operator's namespace when appns is not set in the ChaosEngine. This is particularly important for the use case where the operator watches only one namespace.
Another option is to assume that the ChaosEngine affects the namespace in which it is created, instead of having it point to a second namespace. I think that is more consistent with the rest of the Kubernetes resources.
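The second option could be sketched roughly as follows. The function name resolveAppNamespace and its parameters are hypothetical and are not taken from the operator's code. It only illustrates the fallback rule: use appns when it is set, otherwise use the namespace the ChaosEngine was created in.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveAppNamespace returns the namespace chaos should target.
// appns is the value from the ChaosEngine spec (may be empty);
// engineNamespace is the namespace the ChaosEngine object lives in.
func resolveAppNamespace(appns, engineNamespace string) string {
	if strings.TrimSpace(appns) != "" {
		return appns
	}
	// Fall back to the engine's own namespace, as most namespaced
	// Kubernetes resources implicitly do.
	return engineNamespace
}

func main() {
	fmt.Println(resolveAppNamespace("", "litmus"))
	fmt.Println(resolveAppNamespace("shop", "litmus"))
}
```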
What you expected to happen:
When I created
I expected that the ChaosEngine will be created on my current namespace
How to reproduce it (as minimally and precisely as possible):
Refactor: Related to issue #276
To use golangci-lint in CI, fix the issues discovered by golangci-lint
@ksatchit
Just as users work with Kubernetes resources such as Deployment, Service, etc.,
users may prefer a more explicit declaration: if I want to inject chaos into a pod, I must use the ChaosPod resource; if I want to inject network chaos, I should change the resource kind.
Originally posted by @jjmengze in litmuschaos/litmus#1388 (comment)
Choose one: BUG REPORT or FEATURE REQUEST
What happened:
{"level":"error","ts":1587629175.5282156,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"chaosengine-controller","request":"default/nginx-chaos","error":"Unable to remove Finalizer from chaosEngine Resource, due to error: Operation cannot be fulfilled on chaosengines.litmuschaos.io \"nginx-chaos\": StorageError: invalid object, Code: 4, Key: /registry/litmuschaos.io/chaosengines/default/nginx-chaos, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: dd17dc89-8538-11ea-a20f-42010a800064, UID in object meta: ","stacktrace":"github.com/litmuschaos/chaos-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1587629176.5290353,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"default","Request.Name":"nginx-chaos"}
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
What happened:
Optimize the total enabled chaos count for the enabled application.
Reference comment: #90 (comment)
Anything else we need to know?:
Choose one: BUG REPORT or FEATURE REQUEST
BUG REPORT
What happened:
After applying the cpu-hog experiment, we got a security error:
AnsibleError: Unable to create local directories(/.ansible/tmp): [Errno 13] Permission denied: '/.ansible.
This was running in an OpenShift environment.
What you expected to happen:
cpu-hog experiment
How to reproduce it (as minimally and precisely as possible):
openshift v3.11.51
kubernetes v1.11.0+d4cacc0
Anything else we need to know?:
Choose one: BUG REPORT or FEATURE REQUEST
BUG REPORT
What happened:
For the generic pod kill experiment I need to set CPU/memory resource limits on the runner AND the experiment pod, because of a platform namespace resource quota requirement. Currently I can't set any limit on the runner.
What you expected to happen:
components:
  runner:
    image: nginx
    imagePullPolicy: Always
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
      limits:
        memory: 200Mi
This doesn't work, but it seems the code is almost there? I could be totally wrong here.
Refactor the ChaosEngine types.go file
What happened:
What you expected to happen:
Delete the schedule from ChaosEngine Spec, i.e: https://github.com/litmuschaos/chaos-operator/blob/master/pkg/apis/litmuschaos/v1alpha1/chaosengine_types.go#L49
Delete the Schedule from the Experiment Spec, i.e: https://github.com/litmuschaos/chaos-operator/blob/master/pkg/apis/litmuschaos/v1alpha1/chaosengine_types.go#L87
The reconcile function today doesn't satisfy the Better Code Hub (BCH) standards for modularity and the interfaces used. This issue will track the refactor in this direction.
Can we do this function's operation inside the initEngineState function, since both operate at initialization time?
Originally posted by @chandankumar4 in https://github.com/litmuschaos/chaos-operator/pull/156/files
BUG REPORT
<Reported originally by @cpistick-argo >
This causes chaos to be executed irrespective of annotation check failure (lack of annotation, annotation set to false)
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Can we use the name of the chaos-runner to identify it? A label with the key app is a little too generic, and we run the risk of deleting retained jobs as well. Alternatively, a unique label would help.
Originally posted by @ksatchit in https://github.com/litmuschaos/chaos-operator/pull/156/files
The function name should be something like: initializeApplicationInfo
Originally posted by @chandankumar4 in #29
A static kubeconfig path would only work if you have a cluster available, right?
But this should not be the case; unit tests should be independent of hard-coded values.
So, just take a look at how to initialize a fake clientset: https://godoc.org/k8s.io/client-go/kubernetes/fake
Ref: https://github.com/openebs/maya/blob/a61c18e3eeabc18befedb3e9890e5c6ef292e912/pkg/webhook/webhook_test.go#L28
Originally posted by @rahulchheda in https://github.com/_render_node/MDI0OlB1bGxSZXF1ZXN0UmV2aWV3Q29tbWVudDQ0MTQyNTQzMA==/comments/review_comment
BUG REPORT:
ref: https://kubernetes.slack.com/archives/CNXNB0ZTN/p1602018149055400
Jobs are not deleted after the chaos operator is deleted
BUG REPORT
What happened:
While running a predefined chaos workflow from the Litmus portal UI for the sock-shop application, I deleted all the chaos engines and all workflows from the litmus namespace, which caused the workflow to fail. Even so, I got a non-zero resilience score. On debugging, it seems the experiment with awaited status had a probe success percentage of 100.
What you expected to happen:
Ideally, as per the docs, the probe success percentage of an awaited experiment result should be 0 unless a concrete experiment verdict is available, i.e. either pass or fail.
How to reproduce it (as minimally and precisely as possible):
Run a workflow from the Litmus portal and, while it is running and after the experiment job has been created, delete the chaos engines and the Argo workflow created on the agent.
Anything else we need to know?:
API version and code reference: https://github.com/litmuschaos/litmus/blob/master/litmus-portal/cluster-agents/subscriber/pkg/cluster/events/util.go
Here is the screenshot of the debugging.
Can we use this variable dynamicClient instead of clientSet?
Originally posted by @rahulchheda in #199
Feature request
What happened: When creating a ChaosEngine you cannot specify more than one 'appLabel' for the targeted resource.
What you expected to happen: We have several resources that share a label, but only some of them need to be processed by Litmus. Adding a "third" custom label only to these resources is not an option, because a limitation prevents us from modifying these specific resources.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Add a nil check for *unstructured.UnstructuredList before iterating into items
Originally posted by @rahulchheda in #199
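The review comment above could be addressed along these lines. This is a sketch only: Unstructured and UnstructuredList are local stand-ins shaped like the apimachinery types (an Object map and an Items slice), and resourceNames is a hypothetical helper, not a function from the operator.

```go
package main

import (
	"errors"
	"fmt"
)

// Unstructured and UnstructuredList are simplified stand-ins for the
// apimachinery types of the same name.
type Unstructured struct {
	Object map[string]interface{}
}

type UnstructuredList struct {
	Items []Unstructured
}

// resourceNames returns metadata.name of every item, guarding against a
// nil list before ranging over Items.
func resourceNames(list *UnstructuredList) ([]string, error) {
	if list == nil {
		return nil, errors.New("resource list is nil")
	}
	names := make([]string, 0, len(list.Items))
	for _, item := range list.Items {
		meta, ok := item.Object["metadata"].(map[string]interface{})
		if !ok {
			continue // skip items without a metadata map
		}
		if name, ok := meta["name"].(string); ok {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	_, err := resourceNames(nil)
	fmt.Println("nil list:", err)
}
```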
FEATURE REQUEST
When I manually apply the same label to two deployments and create a ChaosEngine with that label, I receive the error below from chaos-operator:
"too many deployments with specified label are annotated for chaos, either provide unique labels or annotate only desired app for chaos"
What is the meaning of "annotate only desired app for chaos"? In my use case, I would like to trigger delete-pod on part of the deployment and control it with an annotation or label. How should I do that? Will Litmus support such a scenario?
And if I would like to mix different tests across some of the k8s resources in the same namespace, how should I do that?
Would it be good to merge checkRunnerPodCompleted & checkRunnerPodForCompletion?
Originally posted by @ksatchit in https://github.com/litmuschaos/chaos-operator/pull/156/files
Error return value of r.client.Get is not checked (from errcheck)
Originally posted by @golangcibot in #160
FEATURE REQUEST
golangci-lint instead, which lets us use many linting plugins such as whitespaces, goimports, and many more, and these can be configured using config.
I've selected for refactoring 15 lines of code which are duplicated in 3 file(s) (1, 2, 3). Addressing this will make our codebase more maintainable and improve Better Code Hub's Write Code Once guideline rating!
Here's the gist of this guideline:
You can find more info about this guideline in Building Maintainable Software.
ℹ️ To know how many other refactoring candidates need addressing to get a guideline compliant, select some by clicking on the 🔲 next to them. The risk profile below the candidates signals (✅) when it's enough!
Good luck and happy coding! ✨ 💯
Issue generated by PR: #93
Can we validate that the GVR of *unstructured.UnstructuredList matches the deploymentConfig GVR, and then start to validate?
Originally posted by @rahulchheda in #199
BUG REPORT
chaos-operator bug fixes, and refactoring needed.
litmuschaos/litmus, or litmuschaos/chaos-operator issue to track them here.
What happened:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Why are we doing this?
Before:
We hard-coded the runner and exporter images inside the builder functions of the runner and exporter.
After:
We are creating additional fields in the engine spec. We fetch the images of the runner and monitor pods from the engine spec.
Benefits:
We don't need to change the builder function every time the runner or exporter image changes. We can put the images directly in engine.spec.
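The "after" state could look roughly like this sketch. ComponentSpec, runnerImage and the default image string are hypothetical names and values chosen for illustration. The fallback to a default when the spec field is empty is an added assumption, not something the description above states.

```go
package main

import "fmt"

// defaultRunnerImage is a placeholder value, not the project's real image.
const defaultRunnerImage = "example.invalid/chaos-runner:latest"

// ComponentSpec is a hypothetical stand-in for the per-component fields
// added to the engine spec.
type ComponentSpec struct {
	Image string
}

// runnerImage reads the image from the spec instead of hard-coding it in
// the builder, falling back to a default when the field is empty.
func runnerImage(spec ComponentSpec) string {
	if spec.Image != "" {
		return spec.Image
	}
	return defaultRunnerImage
}

func main() {
	fmt.Println(runnerImage(ComponentSpec{Image: "nginx"}))
	fmt.Println(runnerImage(ComponentSpec{}))
}
```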
This is an umbrella tracker for increasing test coverage on the operator. Can consist of several sub-issues for individual fns/functionalities.
Are we switching back to log instead of logrus, cc @ksatchit ?
Originally posted by @prateekpandey14 in #26
REFACTOR
Anything else we need to know?:
Choose one: BUG REPORT or FEATURE REQUEST
What happened:
Getting the ChaosResult through chaos-operator works, but .status.experimentStatus.verdict is always empty:
{
  "metadata": {
    "name": "c0a4f8888cefbbb6-pod-autoscaler",
    "namespace": "default",
    "uid": "8acb3d1a-8888-8888-8888-2e2579ba585a",
    "resourceVersion": "617941",
    "generation": 2,
    "creationTimestamp": "2021-06-10T03:30:20Z",
    "labels": {
      "app.kubernetes.io/component": "experiment-job",
      "app.kubernetes.io/part-of": "litmus",
      "app.kubernetes.io/version": "1.13.0",
      "chaosUID": "ad477de0-8888-8888-8888-d1db6aa90ac6",
      "controller-uid": "5d2b3d3a-8888-8888-8888-e8a91f976400",
      "job-name": "pod-autoscaler-6qgdau",
      "name": "c0a4f8888cefbbb6-pod-autoscaler"
    },
    "managedFields": Array[1]
  },
  "spec": {
    "engine": "c0a4f8888cefbbb6",
    "experiment": "pod-autoscaler"
  },
  "status": {
    "experimentStatus": {
      "phase": "",
      "verdict": ""
    },
    "history": {
      "passedRuns": 0,
      "failedRuns": 1,
      "stoppedRuns": 0
    }
  }
}
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Choose one: FEATURE REQUEST
Hey @ksatchit, just curious: as per this chaos operator code, the custom annotation is honored only if the annotation check is enabled, as this code snippet shows:
if engine.Instance.Spec.AnnotationCheck == "true" {
	// Determine whether apps with matching labels have chaos annotation set to true
	engine, err = resource.CheckChaosAnnotation(engine, clientSet, *dynamicClient)
	if err != nil {
		r.recorder.Eventf(engine.Instance, corev1.EventTypeWarning, "ChaosResourcesOperationFailed", "Unable to get chaosengine")
		chaosTypes.Log.Info("Annotation check failed with", "error:", err)
		return err
	}
}
I was thinking: even if it's set to false or not set, if the CUSTOM_ANNOTATION environment variable is set, then we should check the deployment and skip the chaos experiment on the pod. Just a thought; what do you think?
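The proposed condition could be sketched as below. shouldCheckAnnotation is a hypothetical helper, and the environment lookup is passed in as a function so the rule can be exercised without touching the real environment. This illustrates the suggestion only and is not the operator's current behaviour.

```go
package main

import (
	"fmt"
	"os"
)

// shouldCheckAnnotation reports whether the annotation check should run.
// It runs when annotationCheck is "true" (the existing rule) or when the
// CUSTOM_ANNOTATION environment variable is set to a non-empty value
// (the proposed addition).
func shouldCheckAnnotation(annotationCheck string, getenv func(string) string) bool {
	if annotationCheck == "true" {
		return true
	}
	return getenv("CUSTOM_ANNOTATION") != ""
}

func main() {
	// With the real environment, the result depends on CUSTOM_ANNOTATION.
	fmt.Println(shouldCheckAnnotation("false", os.Getenv))
}
```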
Rather than just checking a boolean error, we can map error strings to their respective errors
Originally posted by @rahulchheda in #220
Feature request
What happened: ChaosEngine and ChaosResult do not contain information about the specific target of the chaos.
For instance, if I leave the 'appLabel' field empty, how will I know exactly which pod the chaos happened on when the engine executes? Even if appLabel is filled, it would still be helpful to know the targets of the chaos. I can only see the targeted pod if I look at the logs of the helper pod.
What you expected to happen: I expect to be able to see the names of the pods that the ChaosEngine is injecting chaos into, as well as the deployment they belong to, either in the ChaosEngine or the ChaosResult.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: A 'nice to have' would be a verdict (from ChaosResult) for each unique deployment. For example, if one ChaosEngine does pod-kill on 3 pods, each from a different deployment, and the verdict is Fail, does this mean that one of them failed or all 3?
Can you name this a little better? deploymentConfigGVR
Originally posted by @rahulchheda in #199