litmuschaos / chaos-operator

chaos engineering via kubernetes operator

License: Apache License 2.0

Makefile 2.44% Dockerfile 0.52% Shell 0.89% Go 96.14%
chaos-operator chaos-engineering kubernetes-operator hacktoberfest kubernetes operator cloud-native


chaos-operator's Issues

Does the Litmus demo environment support AKS?


Allow the specification of a mount type on ChaosExperiment hostFileVolumes

FEATURE REQUEST

What happened:

After updating OpenShift clusters, mounting the CRI-O socket is failing:

MountVolume.SetUp failed for volume "cri-socket" : hostPath type check failed: /var/run/crio/crio.sock is not a file

The new requirement is that the hostPath socket be mounted using type Socket.

When mounting the path using type Socket, it works correctly.

What is the desired result:

hostFileVolumes should have a `type` key which can be used to set the hostPath type on the underlying pod object.
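A rough sketch of what this could look like, with the `type` key as the proposed addition and the mapping onto the standard Kubernetes hostPath types (field names are illustrative, not the current API):

// Proposed: let a ChaosExperiment hostFileVolume carry an explicit hostPath
// type so that sockets such as /var/run/crio/crio.sock mount correctly.
type HostFileVolume struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
	NodePath  string `json:"nodePath"`
	Type      string `json:"type,omitempty"` // e.g. "Socket", "File", "Directory"
}

// buildCRISocketVolume shows how such a type would map onto the standard
// hostPath volume source (corev1 = k8s.io/api/core/v1).
func buildCRISocketVolume() corev1.Volume {
	socketType := corev1.HostPathSocket
	return corev1.Volume{
		Name: "cri-socket",
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: "/var/run/crio/crio.sock",
				Type: &socketType,
			},
		},
	}
}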

FEATURE REQUEST: Fallback to the current namespace when `appns` is not set in the ChaosEngine

What happened:
It would be useful if we could fall back to the operator's namespace when appns is not set in the ChaosEngine. This is particularly important for the use case where the operator is watching only one namespace.

Another option is to assume that the ChaosEngine affects the namespace it is created in, instead of having it point to a second namespace. I think that is more consistent with the rest of the Kubernetes resources.
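A minimal sketch of the proposed fallback, reusing the engine wrapper seen elsewhere in the operator code (treat the field names as illustrative, not the actual reconciler code):

// If spec.appinfo.appns is empty, fall back to the namespace the
// ChaosEngine itself was created in.
appNS := engine.Instance.Spec.Appinfo.Appns
if appNS == "" {
	appNS = engine.Instance.Namespace
}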

What you expected to happen:

When I created the ChaosEngine, I expected it to be created in (and to target) my current namespace.

How to reproduce it (as minimally and precisely as possible):

  1. Install the Chaos Operator in administrator mode
  2. Install the Pod Delete Experiment
  3. Create the ChaosEngine using the attached manifest.

no-ns.yaml.txt

Improve the ChaosEngine Status to hold experiment & chaos resource names

Is this a BUG REPORT or FEATURE REQUEST?

  • FEATURE REQUEST


What happened:

  • The engine status schema today is not very useful for external tools to parse or extract data from. It doesn't explicitly hold the experiment name or the names of the chaos resources (pods) executing the chaos. It does provide the job name, but tools trying to parse the status do not get readily usable artifacts (they have to derive the info).

What you expected to happen:

  • Enhance the schema to hold experiment and chaos resource names associated with the experiment run
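One possible shape for such a status entry, sketched with illustrative field names (not the current API):

// Per-experiment status carrying the experiment name plus the names of the
// chaos resources created for the run.
type ExperimentStatuses struct {
	Name           string      `json:"name"`          // experiment name, e.g. pod-delete
	Runner         string      `json:"runner"`        // chaos-runner pod name
	ExperimentPod  string      `json:"experimentPod"` // experiment job/pod name
	Status         string      `json:"status"`
	Verdict        string      `json:"verdict"`
	LastUpdateTime metav1.Time `json:"lastUpdateTime"` // metav1 = k8s.io/apimachinery/pkg/apis/meta/v1
}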

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Observing this error on deleting chaosengine

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST

What happened:

{"level":"error","ts":1587629175.5282156,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"chaosengine-controller","request":"default/nginx-chaos","error":"Unable to remove Finalizer from chaosEngine Resource, due to error: Operation cannot be fulfilled on chaosengines.litmuschaos.io \"nginx-chaos\": StorageError: invalid object, Code: 4, Key: /registry/litmuschaos.io/chaosengines/default/nginx-chaos, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: dd17dc89-8538-11ea-a20f-42010a800064, UID in object meta: ","stacktrace":"github.com/litmuschaos/chaos-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\ngithub.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\ngithub.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\tsrc/github.com/litmuschaos/chaos-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1587629176.5290353,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"default","Request.Name":"nginx-chaos"}

What you expected to happen:

  • Finalizer removal is smooth upon chaosengine deletion and logs don't show any errors

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Add Directly invokable Namespace Chaos


OpenShift support

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST
BUG REPORT

What happened:
After applying the cpu-hog experiment, we got a security error:
AnsibleError: Unable to create local directories(/.ansible/tmp): [Errno 13] Permission denied: '/.ansible.
This is running in an OpenShift environment.

What you expected to happen:
The cpu-hog experiment to run without permission errors.

How to reproduce it (as minimally and precisely as possible):
openshift v3.11.51
kubernetes v1.11.0+d4cacc0

Anything else we need to know?:

Add dependent resource removal events to chaosengine


Can't set cpu/memory resource limits for runner

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST
BUG REPORT

What happened:
For the generic pod-kill experiment I need to set CPU/memory resource limits on the runner AND the experiment pod, due to a platform namespace resource-quota requirement. Currently I can't set any limits on the runner.
What you expected to happen:

  components:
    runner:
      image: nginx
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 10m
          memory: 100Mi
        limits:
          memory: 200Mi

This doesn't work, but it seems the code is almost there? I could be totally wrong here:

containerForRunner.WithResourceRequirements(engine.Instance.Spec.Components.Runner.Resources)

How to reproduce it (as minimally and precisely as possible):
Run the demo with the above runner spec in the engine YAML, then check the runner pod's spec.resources.
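For reference, the resources block above corresponds to the following value of the standard Kubernetes ResourceRequirements type, which is what the runner pod's container spec would be expected to carry (a sketch using corev1 and the resource package, not operator code):

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Expected spec.containers[0].resources on the runner pod for the engine above.
var runnerResources = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("10m"),
		corev1.ResourceMemory: resource.MustParse("100Mi"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("200Mi"),
	},
}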
Anything else we need to know?:

(refactor): Refactor the ChaosEngine types.go

Is this a BUG REPORT or FEATURE REQUEST?

Refactor the ChaosEngine types.go file

What happened:

  • ChaosEngine doesn't support the schedules
  • The schedule inside the experiments is not used anywhere

What you expected to happen:

Chaos executed irrespective of annotation check failures

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

<Reported originally by @cpistick-argo >

This causes chaos to be executed irrespective of annotation check failure (lack of annotation, annotation set to false)

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Adding fakeclient in the unit tests

A static kubeconfig path would only work if you have a cluster available, right?
But this should not be the case; unit tests should be independent of such hard-coded paths.
So, take a look at how to initialize a fake clientset: https://godoc.org/k8s.io/client-go/kubernetes/fake
Ref: https://github.com/openebs/maya/blob/a61c18e3eeabc18befedb3e9890e5c6ef292e912/pkg/webhook/webhook_test.go#L28
Originally posted by @rahulchheda in a pull request review comment.
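For example, a self-contained unit test that needs no cluster could look roughly like this (using a recent client-go; older vendored versions take no context argument in Get):

package controller_test

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
)

func TestWithFakeClientset(t *testing.T) {
	// In-memory clientset pre-loaded with a pod; no kubeconfig or cluster needed.
	client := fake.NewSimpleClientset(&corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "nginx", Namespace: "default"},
	})

	pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "nginx", metav1.GetOptions{})
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if pod.Name != "nginx" {
		t.Errorf("got pod %q, want %q", pod.Name, "nginx")
	}
}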

Wrong default value of probe success percentage for experiments with `Awaited` status. Value is 100 instead of 0

BUG REPORT

What happened:

While running a predefined chaos workflow for the sock-shop application from the Litmus portal UI, I deleted all the chaos engines and all workflows from the litmus namespace, causing the workflow to fail. Somehow I still got a non-zero resilience score; on debugging, it turned out the experiment with Awaited status had a probe success percentage of 100.

What you expected to happen:

Ideally, as per the docs, the probe success percentage of an Awaited experiment result should be 0 unless a concrete experiment verdict (Pass or Fail) is available.

How to reproduce it (as minimally and precisely as possible):

Run a workflow from the Litmus portal and, after the experiment job is created, delete the chaos engines and the Argo workflow created on the agent while it is still running.

Anything else we need to know?:

API version and code reference: https://github.com/litmuschaos/litmus/blob/master/litmus-portal/cluster-agents/subscriber/pkg/cluster/events/util.go
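A sketch of the expected defaulting (not the actual subscriber code): an experiment that is still Awaited, or has any other non-terminal verdict, should contribute 0 to the resilience score.

// probeSuccessPercentage returns the value that should feed the resilience
// score: only a concrete Pass/Fail verdict carries the reported percentage.
func probeSuccessPercentage(verdict string, reported float64) float64 {
	if verdict != "Pass" && verdict != "Fail" {
		return 0 // Awaited, Stopped, or empty verdicts count as 0
	}
	return reported
}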

Here is the screenshot of the debugging.

Screenshot 2021-03-16 at 1 06 49 PM

Feature Request: ChaosEngine to support multiple app labels


Feature request
What happened: When creating a ChaosEngine you cannot specify more than one appLabel for the targeted resources.
What you expected to happen: We have several resources that share a label, but only some of them need to be processed by Litmus. Adding a third, custom label only to these resources is not an option, as a limitation on our side does not allow us to modify these specific resources.
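One possible shape for such a spec change, sketched against the appinfo block of the ChaosEngine (field names are illustrative, not the current API):

// Today appinfo carries a single applabel string; a list would let one
// engine match any of several labels without relabelling the workloads.
type ApplicationParams struct {
	Appns     string   `json:"appns,omitempty"`
	Applabel  string   `json:"applabel,omitempty"`  // existing single label
	Applabels []string `json:"applabels,omitempty"` // proposed: match any of these
	AppKind   string   `json:"appkind,omitempty"`
}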
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:

Monitoring Feature Needed

FEATURE REQUEST

annotate only desired app for chaos

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

When I manually apply the same label to two deployments and create a ChaosEngine with that label, I receive the error below from the chaos-operator:
"too many deployments with specified label are annotated for chaos, either provide unique labels or annotate only desired app for chaos"
What is the meaning of "annotate only desired app for chaos"? In my use case, I would like to trigger pod-delete on only part of the deployments and control that with an annotation or label. How should I do that? Will Litmus support such a scenario?
And if I would like to mix different experiments across subsets of Kubernetes resources in the same namespace, how should I do that?

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Add golangci-lint integration in CI

FEATURE REQUEST

  • In the current CI we have lint checks, but they do not seem to detect all linting issues. We could use golangci-lint instead, which supports many additional linters such as whitespace and goimports, and these can be configured via a config file.
    ref: https://golangci-lint.run/usage/install/#ci-installation

Choose one: BUG REPORT or FEATURE REQUEST

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Ignore the check from BCH for the license text

I've selected for refactoring 15 lines of code which are duplicated in 3 file(s) (1, 2, 3). Addressing this will make our codebase more maintainable and improve Better Code Hub's Write Code Once guideline rating!

Here's the gist of this guideline:

  • Definition
    Do not copy code.
  • Why?
    When code is copied, bugs need to be fixed in multiple places. This is both inefficient and a source of regression bugs.
  • How
    Avoid duplication by never copy/pasting blocks of code, and reduce duplication by extracting shared code, either to a new unit or by introducing a superclass if the language permits.

You can find more info about this guideline in Building Maintainable Software.


ℹ️ To know how many other refactoring candidates need addressing to make the guideline compliant, select some by clicking on the checkbox next to them. The risk profile below the candidates signals when it's enough!


Good luck and happy coding! :shipit:

Issue generated by PR: #93

Refactor codebase for easier understanding

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

  • This is an umbrella issue tracking chaos-operator bug fixes and needed refactoring.
  • Please comment on this issue with the relevant litmuschaos/litmus or litmuschaos/chaos-operator issues so they can be tracked here.

What happened:

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

(feat): Get runner and monitor image from engine spec

FEATURE REQUEST

Why are we doing this?

Before:
The runner and exporter images were hard-coded inside the runner and exporter builder functions.

After:
We add fields to the engine spec and fetch the runner and monitor pod images from it.

Benefits:
We no longer need to change the builder functions every time the runner or exporter image changes; the images can be set directly in engine.spec.
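A sketch of the additional engine spec fields (approximate names, not a verbatim copy of types.go):

// Components of the engine spec that the operator reads instead of using
// hard-coded images in the builder functions.
type ComponentParams struct {
	Runner  RunnerInfo  `json:"runner,omitempty"`
	Monitor MonitorInfo `json:"monitor,omitempty"`
}

type RunnerInfo struct {
	Image           string `json:"image,omitempty"`           // e.g. litmuschaos/chaos-runner:latest
	ImagePullPolicy string `json:"imagePullPolicy,omitempty"`
}

type MonitorInfo struct {
	Image string `json:"image,omitempty"` // e.g. litmuschaos/chaos-exporter:latest
}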

ChaosResult created by the chaos-operator has an empty .status.experimentStatus.verdict

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST

What happened:

I get the ChaosResult created by the chaos-operator, but .status.experimentStatus.verdict is always empty:

{
    "metadata": {
        "name": "c0a4f8888cefbbb6-pod-autoscaler",
        "namespace": "default",
        "uid": "8acb3d1a-8888-8888-8888-2e2579ba585a",
        "resourceVersion": "617941",
        "generation": 2,
        "creationTimestamp": "2021-06-10T03:30:20Z",
        "labels": {
            "app.kubernetes.io/component": "experiment-job",
            "app.kubernetes.io/part-of": "litmus",
            "app.kubernetes.io/version": "1.13.0",
            "chaosUID": "ad477de0-8888-8888-8888-d1db6aa90ac6",
            "controller-uid": "5d2b3d3a-8888-8888-8888-e8a91f976400",
            "job-name": "pod-autoscaler-6qgdau",
            "name": "c0a4f8888cefbbb6-pod-autoscaler"
        },
        "managedFields": [ ... ]
    },
    "spec": {
        "engine": "c0a4f8888cefbbb6",
        "experiment": "pod-autoscaler"
    },
    "status": {
        "experimentStatus": {
            "phase": "",
            "verdict": ""
        },
        "history": {
            "passedRuns": 0,
            "failedRuns": 1,
            "stoppedRuns": 0
        }
    }
}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

If CUSTOM_ANNOTATION is set in the environment and the annotation check is set to false, the experiment need not be triggered

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: FEATURE REQUEST

Hey @ksatchit, just curious: as per the chaos-operator code below, CUSTOM_ANNOTATION is honored only when the annotation check is enabled:

if engine.Instance.Spec.AnnotationCheck == "true" {
		// Determine whether apps with matching labels have chaos annotation set to true
		engine, err = resource.CheckChaosAnnotation(engine, clientSet, *dynamicClient)
		if err != nil {
			r.recorder.Eventf(engine.Instance, corev1.EventTypeWarning, "ChaosResourcesOperationFailed", "Unable to get chaosengine")
			chaosTypes.Log.Info("Annotation check failed with", "error:", err)
			return err
		}
	} 

I was thinking: even if annotationCheck is set to false or not set, if the CUSTOM_ANNOTATION environment variable is set, then we should still check the deployment and skip the chaos experiment on the pod. Just a thought, what do you think?
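Roughly, the proposed behaviour could look like this (a sketch against the snippet above, assuming os is imported; not the actual operator code):

customAnnotation := os.Getenv("CUSTOM_ANNOTATION")
if engine.Instance.Spec.AnnotationCheck == "true" || customAnnotation != "" {
	// Run the annotation check whenever it is explicitly enabled OR a
	// custom annotation key is configured via the environment.
	engine, err = resource.CheckChaosAnnotation(engine, clientSet, *dynamicClient)
	if err != nil {
		r.recorder.Eventf(engine.Instance, corev1.EventTypeWarning, "ChaosResourcesOperationFailed", "Unable to get chaosengine")
		chaosTypes.Log.Info("Annotation check failed with", "error:", err)
		return err
	}
}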

Feature Request: Add info about the resources under chaos


Feature request
What happened: ChaosEngine and ChaosResult do not contain information about the specific target of the chaos. For instance, if I leave the 'appLabel' field empty, how will I know, when the engine executes, exactly which pod the chaos happened on? Even if appLabel is filled, it would still be helpful to know the targets of the chaos. I can only see the targeted pod if I look at the logs of the helper pod.
What you expected to happen: I expect to be able to see the names of the pods that the ChaosEngine is injecting chaos into, as well as the deployments they belong to, either in the ChaosEngine or the ChaosResult.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: A nice-to-have would be a verdict (from ChaosResult) for each unique deployment. For example, if one ChaosEngine does pod-kill on 3 pods, each from a different deployment, and the verdict is Fail, does that mean one of them failed or all 3?
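One possible way to surface this, sketched as an addition to the ChaosResult status (field names are hypothetical, not the current API):

// TargetDetails records what the experiment actually acted on.
type TargetDetails struct {
	Name      string `json:"name"`      // e.g. the pod that was deleted
	Kind      string `json:"kind"`      // e.g. Pod
	Namespace string `json:"namespace"`
	Workload  string `json:"workload"`  // owning deployment, if resolvable
}

// ExperimentStatus (in the ChaosResult) extended with the chaos targets.
type ExperimentStatus struct {
	Phase   string          `json:"phase"`
	Verdict string          `json:"verdict"`
	Targets []TargetDetails `json:"targets,omitempty"` // proposed addition
}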
