chaostoolkit / chaostoolkit-lib
The Chaos Toolkit core library
Home Page: https://chaostoolkit.org/
License: Apache License 2.0
As of chaoslib 0.12, hvac is no longer installed by default, yet loading secrets fails when it is missing.
According to the documentation, I can only get responses from an HTTP provider as a string.
But a tolerance expressed as [int, int] only accepts integers, so the steady-state validation fails.
Example:
{
"type": "probe",
"name": "CCCCC",
"tolerance": [1, 10000],
"provider": {
"type": "http",
"url": "http://localhost:3000/metrics/query",
"method": "POST",
"arguments": {
"query": "any service returning a string",
"datasource": 161
}
}
}
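The range check above could accept string bodies if the tolerance coerced numeric strings before comparing. A minimal sketch of that idea; `within_range` is a hypothetical helper, not chaoslib's API:

```python
from typing import Any, Sequence


def within_range(value: Any, tolerance: Sequence[float]) -> bool:
    """Hypothetical range tolerance that also accepts numeric strings.

    A response body such as "42" is parsed as a number before being
    compared against the inclusive [low, high] range.
    """
    low, high = tolerance
    if isinstance(value, str):
        try:
            value = float(value.strip())
        except ValueError:
            # Not a number at all: it cannot satisfy a numeric range.
            return False
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return low <= value <= high
```

With this, `within_range("42", [1, 10000])` passes while a non-numeric body simply fails the tolerance instead of breaking validation.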
Currently we don't load YAML using the safe loader of the yaml package. We should, since we read from untrusted sources.
Hi,
I just noticed that probes are allowed inside the method, as opposed to only in the steady-state-hypothesis, but their tolerance is not validated when used there.
I find it useful to use probes inside the method. The first use case I have is when I don't want to run a probe both before and after some actions; instead, the probe can validate some conditions in the middle of the experiment. For example, I stop a random instance in an ASG and, if it's not marked unhealthy (or is not replaced quickly enough), I don't want to continue with the experiment.
Please advise whether this behavior is intentional or accidental.
Original content from docs: # Adding Time Constraints to an Experiment
It is a common requirement to execute a chaos experiment for a certain period of
time, ending the experiment if it goes on indefinitely.
We've been very careful to rely on other tools for these types of concerns, and
so timing constraints are not a built-in feature of the Chaos Toolkit's
experiments.
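Since the toolkit leaves timing to other tools, one option is to wrap the `chaos run` invocation in a process with a deadline. A minimal sketch using the standard library's `subprocess.run(timeout=...)`; `run_with_deadline` is a hypothetical helper, not part of the toolkit:

```python
import subprocess
from typing import List, Optional


def run_with_deadline(cmd: List[str], seconds: float) -> Optional[int]:
    """Run a command, killing it if it exceeds `seconds`.

    Returns the command's exit code, or None when the deadline was hit.
    """
    try:
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None
    return completed.returncode
```

For example, `run_with_deadline(["chaos", "run", "experiment.json"], 600)` would bound an experiment to ten minutes. Note that a killed run may not get the chance to execute its rollbacks.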
Currently, only the active context (steady-state, method, activity...) is passed down to a control function. We should also pass the experiment, as it provides a larger context that may be useful as well.
Ideally this should go into 1.0.0rc2
In order to support #33, it will be necessary to store settings for the toolkit.
The control level used to determine which Python function to call is overridden, but it should be preserved.
We can't rely on ModuleNotFoundError: it was introduced in Python 3.6 and does not exist in 3.5.
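Because ModuleNotFoundError is a subclass of ImportError, catching ImportError covers both versions. A minimal sketch; `try_import` is a hypothetical helper name:

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module if it is installed, returning None otherwise.

    ImportError is caught instead of ModuleNotFoundError, which only
    exists from Python 3.6 onwards (it subclasses ImportError).
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

The same pattern fits optional dependencies such as hvac: `hvac = try_import("hvac")`, then raise a clear error only when Vault secrets are actually requested.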
In the validation function, the name variable is undeclared.
Hi,
I've been testing chaostoolkit and stumbled upon the scenario below.
During a successful experiment run, the rollback failed, changing the system and basically bringing the app down, yet the experiment was still reported as successful. Here, app-must-be-healthy is a probe ref from the steady-state-hypothesis:
"rollbacks": [
{
"type": "action",
"name": "restart-app",
"provider": {
"type": "process",
....
....
},
"pauses": {
"after": 5
}
},
{
"ref": "app-must-be-healthy"
}
]
chaostoolkit_1 | [2019-04-04 13:48:13 INFO] Steady state hypothesis is met!
chaostoolkit_1 | [2019-04-04 13:48:13 INFO] Let's rollback...
chaostoolkit_1 | [2019-04-04 13:48:13 INFO] Rollback: restart-app
chaostoolkit_1 | [2019-04-04 13:48:13 INFO] Action: restart-app
chaostoolkit_1 | [2019-04-04 13:48:13 INFO] Pausing after activity for 5s...
chaostoolkit_1 | [2019-04-04 13:48:18 INFO] Rollback: None
chaostoolkit_1 | [2019-04-04 13:48:18 INFO] Probe: app-must-be-healthy
chaostoolkit_1 | [2019-04-04 13:48:18 ERROR] => failed: failed to connect to http://nginx:80/health: HTTPConnectionPool(host='nginx', port=80): Max retries exceeded with url: /health (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91218d54e0>: Failed to establish a new connection: [Errno -2] Name does not resolve',))
chaostoolkit_1 | [2019-04-04 13:48:18 INFO] Experiment ended with status: completed
Would it make sense to re-evaluate the steady-state-hypothesis and the experiment result after the rollback?
P.S. I hope I opened this issue correctly here and not in https://github.com/chaostoolkit :)
The toolkit does its best to avoid global state, and so far there was never really a need to feed the output of one activity into another. But there are cases when this is useful (when an operation returns an ID, for instance).
Let's see how we can add this.
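One possible shape, purely as a sketch: keep each activity's output in a dictionary keyed by (hypothetical) activity name and substitute `${name}` placeholders in later arguments. This uses the standard library's `string.Template`; the placeholder syntax and `substitute_arguments` are assumptions, not an existing toolkit feature:

```python
from string import Template
from typing import Any, Dict


def substitute_arguments(arguments: Dict[str, Any],
                         outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Replace ${name} placeholders in string arguments with earlier outputs.

    `outputs` maps a hypothetical activity name to the value it returned.
    Unknown placeholders are left untouched by safe_substitute.
    """
    resolved = {}
    for key, value in arguments.items():
        if isinstance(value, str):
            value = Template(value).safe_substitute(outputs)
        resolved[key] = value
    return resolved
```

Note that `string.Template` identifiers only allow letters, digits and underscores, so names like `created_id` work while hyphenated activity names would need another placeholder syntax.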
Currently, loading experiments over HTTP forces the Accept header to the static value:
"application/json, application/x-yaml"
In some cases, the operator should be able to amend this.
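A minimal sketch of an operator override, using the standard library's urllib for illustration (chaoslib may use a different HTTP client); the `accept` parameter and `build_experiment_request` are hypothetical:

```python
import urllib.request
from typing import Optional

DEFAULT_ACCEPT = "application/json, application/x-yaml"


def build_experiment_request(url: str,
                             accept: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request used to fetch an experiment over HTTP.

    `accept` is a hypothetical operator override; without it, the
    current static value is kept.
    """
    return urllib.request.Request(
        url, headers={"Accept": accept or DEFAULT_ACCEPT})
```

The override could come from a CLI flag or the settings file; only the default value is taken from the current behavior.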
While chaos engineering may degrade the system, we should be careful not to cause too much harm. A mechanism to mark an action as "dangerous" could help in those cases.
On the CLI side, could this translate into asking the user for confirmation before running an experiment?
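A minimal sketch of that idea, assuming a hypothetical boolean `"dangerous"` key on activities (not part of the current experiment format); `dangerous_activities` and `confirm_run` are illustrative names:

```python
from typing import Any, Callable, Dict, List


def dangerous_activities(experiment: Dict[str, Any]) -> List[str]:
    """Names of method and rollback activities flagged as dangerous."""
    activities = experiment.get("method", []) + experiment.get("rollbacks", [])
    return [a.get("name", "<unnamed>") for a in activities
            if a.get("dangerous", False)]


def confirm_run(experiment: Dict[str, Any],
                ask: Callable[[str], str] = input) -> bool:
    """Ask the operator before running an experiment with dangerous activities."""
    flagged = dangerous_activities(experiment)
    if not flagged:
        return True
    answer = ask("Dangerous activities: {}. Continue? [y/N] ".format(
        ", ".join(flagged)))
    return answer.strip().lower() in ("y", "yes")
```

Injecting `ask` keeps the prompt testable and would let a non-interactive run (CI) answer automatically.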
Right now the discovery mechanism expects the module to have an __all__ attribute. Do not fail when it is missing.
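A minimal sketch of a tolerant lookup: use `__all__` when present, otherwise fall back to the public functions defined in the module itself. `exported_functions` is a hypothetical helper, not chaoslib's discovery code:

```python
import inspect
from types import ModuleType
from typing import List


def exported_functions(module: ModuleType) -> List[str]:
    """Names of the functions a module exposes.

    Uses __all__ when present; otherwise falls back to public
    (non-underscore) functions defined in the module itself, ignoring
    functions it merely imported from elsewhere.
    """
    names = getattr(module, "__all__", None)
    if names is not None:
        return list(names)
    return [
        name for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_") and obj.__module__ == module.__name__
    ]
```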
When a process returns non-UTF-8 data, the activity fails quite poorly. We should handle this more gracefully.
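A stdlib-only sketch of a more forgiving decode: try UTF-8 strictly, then fall back to replacing invalid bytes so the activity still returns readable output. Encoding-detection libraries are another option; `decode_output` is a hypothetical helper:

```python
from typing import Tuple


def decode_output(data: bytes) -> Tuple[str, str]:
    """Decode process output without failing on non-UTF-8 bytes.

    Returns (text, encoding_used). Invalid sequences are replaced by
    U+FFFD in the fallback, so the result is readable but lossy.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("utf-8", errors="replace"), "utf-8 (replaced)"
```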
I got tricked into thinking that I could call client.secrets.kv.read_secret(path), as per the documentation, but it seems the documentation is quite out of sync with the code.
It appears the toolkit doesn't tell you when a key couldn't be found in the environment.
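A minimal sketch of reporting missing keys instead of skipping them silently; `load_env_values` and its mapping of configuration names to environment variable names are hypothetical:

```python
import logging
import os
from typing import Dict, Mapping, Optional

logger = logging.getLogger("chaostoolkit")


def load_env_values(keys: Mapping[str, str],
                    environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve configuration keys from environment variables.

    `keys` maps a configuration name to an environment variable name.
    Missing variables are logged as warnings rather than silently ignored.
    """
    env = os.environ if environ is None else environ
    values = {}
    for name, var in keys.items():
        if var in env:
            values[name] = env[var]
        else:
            logger.warning(
                "Environment variable '%s' (for '%s') was not found", var, name)
    return values
```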
In the readme, the Chaos Toolkit model link (http://chaostoolkit.org/overview/concepts/) is broken (404)
By the way, your 404 page contains weird content: "Cloud bread lo-fi woke echo park cronut plaid banjo hammock fingerstache ennui gentrify fashion axe poke. ... ". Is this intended?
When performing local tests, a user may rely on a self-signed certificate for their server, so the HTTP probe must accept a parameter to disable TLS verification.
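A minimal sketch with the standard library's `ssl` module; the `verify_tls` parameter name is hypothetical here, and chaoslib's HTTP provider may name or implement it differently:

```python
import ssl


def make_ssl_context(verify_tls: bool = True) -> ssl.SSLContext:
    """Build a TLS context, optionally skipping certificate checks.

    Disabling verification is only meant for local tests against
    self-signed certificates.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

The context can then be passed to `urllib.request.urlopen(url, context=make_ssl_context(False))`.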
For debugging purposes, it could be handy to log where a particular activity was loaded from.
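A minimal sketch for Python providers, using `inspect.getsourcefile`; `load_python_function` is a hypothetical helper, not chaoslib's loader:

```python
import importlib
import inspect
import logging

logger = logging.getLogger(__name__)


def load_python_function(module_name: str, func_name: str):
    """Import `module_name.func_name` and log which file it came from."""
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    try:
        source = inspect.getsourcefile(func)
    except TypeError:
        # Built-in (C) functions have no Python source file.
        source = getattr(module, "__file__", None)
    logger.debug("Loaded %s.%s from %s", module_name, func_name,
                 source or "<unknown location>")
    return func
```

This is mostly useful when several virtual environments or extension versions could shadow one another.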
For process calls, anything other than a 0 return code should result in a warning. For HTTP, a status code greater than 399 should trigger a warning message.
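A minimal sketch of those two rules, assuming the activity result is a dict with a `status` key (as in the process result logged further down this page); `warn_on_failure` is hypothetical:

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def warn_on_failure(provider_type: str, result: Dict[str, Any]) -> Optional[str]:
    """Log a warning when a process or HTTP activity looks unsuccessful.

    Returns the warning message, or None when nothing was flagged.
    """
    message = None
    status = result.get("status", 0)
    if provider_type == "process" and status != 0:
        message = "Process returned non-zero exit code {}".format(status)
    elif provider_type == "http" and status > 399:
        message = "HTTP call returned status code {}".format(status)
    if message:
        logger.warning(message)
    return message
```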
It would be nice to be able to perform actions before and after certain points in an experiment.
Some ideas/examples:
Small change, just better grammatically.
Most HTTP APIs require authorization, so it should be straightforward to provide credentials to experiments when needed.
I got the following unfriendly output when there was a collision with an existing integration extension:
chaos discover chaostoolkit-kubernetes
[2018-01-30 15:35:15 INFO] Attempting to download and install package 'chaostoolkit-kubernetes'
[2018-01-30 15:35:19 INFO] Package downloaded and installed in current environment
Traceback (most recent call last):
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaoslib/discovery/package.py", line 85, in get_importname_from_package
name = dist.get_metadata('top_level.txt').split("\n)", 1)[0]
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1493, in get_metadata
value = self._get(self._fn(self.egg_info, name))
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1605, in _get
with open(path, 'rb') as stream:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaostoolkit_kubernetes-0.8.0.dist-info/top_level.txt'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/russellmiles/.venvs/chaostk/bin/chaos", line 11, in <module>
sys.exit(cli())
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaosiq/cli.py", line 140, in discover
download_and_install=not no_install)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaoslib/discovery/discover.py", line 30, in discover
package = load_package(package_name)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaoslib/discovery/package.py", line 45, in load_package
name = get_importname_from_package(package_name)
File "/Users/russellmiles/.venvs/chaostk/lib/python3.6/site-packages/chaoslib/discovery/package.py", line 89, in get_importname_from_package
"Was the package installed properly?".format(p=package_name))
chaoslib.exceptions.DiscoveryFailed: failed to load package 'chaostoolkit-kubernetes' metadata. Was the package installed properly?
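The traceback shows `pkg_resources` raising FileNotFoundError because the installed wheel has no top_level.txt. A friendlier lookup could treat that file as optional. This sketch swaps in `importlib.metadata` (Python 3.8+), which is not what the chaoslib code above uses; `get_import_name` is hypothetical:

```python
from importlib import metadata
from typing import Optional


def get_import_name(package_name: str) -> Optional[str]:
    """Return the top-level import name of an installed distribution.

    Returns None instead of raising when the distribution is missing or
    ships no top_level.txt, so the caller can print a clear message.
    """
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    top_level = dist.read_text("top_level.txt")  # None if the file is absent
    if not top_level:
        return None
    names = [line.strip() for line in top_level.splitlines() if line.strip()]
    return names[0] if names else None
```

The import name cannot be reliably guessed from the distribution name, so returning None and asking the user beats inventing one.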
Sometimes we want to validate the experiment more shallowly, and in those cases we can't load the Python providers. Add a flag to support that case.
The current interface of chaostoolkit supports running experiments. However, as part of the goal for chaostoolkit, we have always wanted to make it simpler to get into chaos engineering.
The new discover command aims to collect information about a specific target and offer suggestions for potential chaos engineering experiments.
discover aims to let users look up what an extension is capable of doing, along with a summary of the platform/application the extension targets (if available) and a list of suggested chaos experiments.
As not all extensions are Python packages, discover should eventually be able to load a spec file that describes an extension made of process calls or HTTP calls. This may be done in a second iteration of the command.
Controls seem to be duplicated when they are applied.
When the vault secret id of the app role has expired, this blows up the whole process. Catch and fail gracefully.
Currently there is a bug when a controls block is applied at the activity level (i.e. not top level) and there are no top-level controls applied either.
A fix such as the following needs to be applied to chaoslib from line 201 onwards:
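for c in controls.copy():
    if "ref" in c:
        # Referenced control: copy the matching top-level control in.
        for top_level_control in top_level_controls:
            if c["ref"] == top_level_control["name"]:
                controls.append(deepcopy(top_level_control))
                break
    else:
        # Bind tc up front: when top_level_controls is empty the loop body
        # never runs and the else clause would reference an unbound name.
        tc = None
        for tc in top_level_controls:
            if c.get("name") == tc.get("name"):
                break
        else:
            if tc and tc.get("automatic", True):
                controls.append(deepcopy(tc))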
My hypothesis probe defines a curl request, which writes its total time to stdout.
The probe defines a range tolerance of [0, 1], intended to check that the total time is within one second. Here's the probe for reference:
{
...
"steady-state-hypothesis": {
"title": "cURL www.google.com",
"probes": [
{
"type": "probe",
"name": "http google",
"tolerance": [0,1],
"provider": {
"type" : "process",
"path" : "curl",
"arguments": "-o /dev/null -w \"%{time_total}\" -s https://www.google.com"
}
}
]
},
...
}
What appears to happen is that the [0, 1] tolerance range checks the process probe's status value rather than its stdout. This means that even if the output is 10.234, the hypothesis is still met.
For example, I would expect this to succeed, as stdout is between 0 and 1:
[2019-03-19 13:04:23 DEBUG] [process:54] Running: /usr/bin/curl -o /dev/null -w "%{time_total}" -s https://www.google.com
[2019-03-19 13:04:23 DEBUG] [__init__:115] Data encoding detected as 'ascii' with a confidence of 1.0
[2019-03-19 13:04:23 DEBUG] [activity:179] => succeeded with '{'stderr': '', 'stdout': '0.420', 'status': 0}'
[2019-03-19 13:04:23 DEBUG] [hypothesis:177] allowed tolerance is [0, 1]
[2019-03-19 13:04:23 INFO] [hypothesis:184] Steady state hypothesis is met!
and I would expect the following to fail, as stdout is greater than 1, but it passes because status is 0:
[2019-03-19 13:04:23 DEBUG] [process:54] Running: /usr/bin/curl -o /dev/null -w "%{time_total}" -s https://www.google.com
[2019-03-19 13:04:27 DEBUG] [__init__:115] Data encoding detected as 'ascii' with a confidence of 1.0
[2019-03-19 13:04:27 DEBUG] [activity:179] => succeeded with '{'stderr': '', 'stdout': '3.397', 'status': 0}'
[2019-03-19 13:04:27 DEBUG] [hypothesis:177] allowed tolerance is [0, 1]
[2019-03-19 13:04:27 INFO] [hypothesis:184] Steady state hypothesis is met!
Is there some way to tell the process probe tolerance which value to check, rather than just using 'status'? Looking at the HTTP probe type, there is no response time property either.
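The intended check can be written against the result shape shown in the logs above (`{'stderr': '', 'stdout': '0.420', 'status': 0}`). A hypothetical standalone helper, not an existing chaoslib tolerance type:

```python
from typing import Any, Dict, Sequence


def stdout_within_range(result: Dict[str, Any],
                        tolerance: Sequence[float]) -> bool:
    """Check a process probe's stdout (not its exit status) against [low, high].

    A non-zero exit status or non-numeric stdout fails the check.
    """
    if result.get("status") != 0:
        return False
    try:
        value = float((result.get("stdout") or "").strip())
    except ValueError:
        return False
    low, high = tolerance
    return low <= value <= high
```

With the two runs logged above, the 0.420 result would pass and the 3.397 result would fail.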
Reading Vault secrets does not work as you'd expect.
Currently, the whole Vault payload of a secret is read into the chaostoolkit secrets section (including the Vault secret metadata), which is not what you'd expect. Also, it's not intuitive that the key argument refers to the path.
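A minimal sketch of keeping only the secret values, assuming the KV version 2 engine, whose read response nests values under `data.data` and version metadata under `data.metadata` (KV version 1 puts values directly under `data`). `extract_kv2_secret` is hypothetical:

```python
from typing import Any, Dict, Optional


def extract_kv2_secret(payload: Dict[str, Any],
                       key: Optional[str] = None) -> Any:
    """Keep only the secret values from a Vault KV v2 read response.

    If `key` is given, return that single value instead of the whole dict.
    """
    values = payload.get("data", {}).get("data", {})
    if key is None:
        return values
    return values.get(key)
```

Separating the Vault path from the key within the secret would also address the naming confusion.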
This issue's goal is for the community to discuss interest and solutions to support extension providers implemented in languages other than Python.
As a reminder, the toolkit currently supports three extension providers:
While Python is considered a good choice for the core and most extensions, we have always cared about reaching more than a single community. @dastergon asked about this topic on the community Slack and suggested I kick things off with a high-level view of what would need to be done.
Generally speaking, the simplest integration for calling native code from Python seems to be building a native library that exports its symbols (much like a C library). Python can then call those symbols through ctypes.
This is what people seem to generally do:
Alternatives to ctypes are CFFI and Cython. The latter is quite interesting because you provide a C-like wrapper around your native extensions, and the generated Python code makes it look fairly native. It is popular but requires more work.
There could be two paths:
{
"type": "probe",
"name": "my-go-blah",
"provider": {
"type": "go",
"lib_name": "my-go-lib.so",
"func": "func_name_in_lib",
"arguments": { ... }
}
}
This mirrors what is done for Python, except that here the provider would simply expect a native library.
I think both are valuable but I wonder what communities would prefer.
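A sketch of how such a native provider could be dispatched with ctypes. It reuses the `lib_name` and `func` keys from the proposal above; `run_native_provider` and its `argtypes`/`restype` parameters are assumptions. A Go library would need to be built with `go build -buildmode=c-shared` and export its functions with `//export`:

```python
import ctypes
import ctypes.util
from typing import Any, Dict, List, Optional


def run_native_provider(provider: Dict[str, Any], *args: Any,
                        argtypes: Optional[List[Any]] = None,
                        restype: Any = ctypes.c_int) -> Any:
    """Call provider["func"] from the shared library provider["lib_name"]."""
    lib_name = provider["lib_name"]
    if lib_name.endswith(".so") or "/" in lib_name:
        path = lib_name
    else:
        # Resolve a short name such as "c" to e.g. "libc.so.6". find_library
        # may return None; on POSIX, CDLL(None) then exposes the symbols
        # already loaded into the current process.
        path = ctypes.util.find_library(lib_name)
    lib = ctypes.CDLL(path)
    func = getattr(lib, provider["func"])
    # ctypes cannot infer C signatures, so they must be declared explicitly.
    func.argtypes = argtypes
    func.restype = restype
    return func(*args)
```

For example, calling libc's `strlen` through this path returns 5 for `b"chaos"`. Mapping the JSON `arguments` object onto positional C arguments is left open here.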
While the current validation does an okay job, it can't handle some important cases in the data being passed in.
It might be useful to rely on schema validation, e.g. https://github.com/keleshev/schema
At the moment there is a load_settings function, but no function to save settings back when they have been changed.
It is not recommended to use pyyaml.load so let's use safe_load instead.
HTTP-based notifications aren't logged to chaostoolkit.log (except on error), so it's hard to know whether they worked.
When a before_activity control is executed and the activity references another activity, the reference is not resolved beforehand, so the control has no real context.
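A minimal sketch of resolving a `{"ref": ...}` entry before handing it to the control, looking through the steady-state probes, method and rollbacks. `iter_activities` and `resolve_activity` are hypothetical helpers:

```python
from typing import Any, Dict, Iterator, Optional


def iter_activities(experiment: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every activity declared in the steady-state, method and rollbacks."""
    ssh = experiment.get("steady-state-hypothesis") or {}
    yield from ssh.get("probes", [])
    yield from experiment.get("method", [])
    yield from experiment.get("rollbacks", [])


def resolve_activity(activity: Dict[str, Any],
                     experiment: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the full activity a {"ref": name} entry points to.

    Activities without a "ref" are returned unchanged; an unknown ref
    yields None.
    """
    ref = activity.get("ref")
    if ref is None:
        return activity
    for candidate in iter_activities(experiment):
        if candidate.get("name") == ref and "ref" not in candidate:
            return candidate
    return None
```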
Fail validation when the jsonpath is empty.
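A minimal sketch of that check, assuming a jsonpath tolerance shaped like `{"type": "jsonpath", "path": "..."}`; `validate_jsonpath_tolerance` is hypothetical:

```python
from typing import Any, Dict, List


def validate_jsonpath_tolerance(tolerance: Dict[str, Any]) -> List[str]:
    """Return validation errors for a jsonpath tolerance (empty if valid)."""
    errors = []
    if tolerance.get("type") != "jsonpath":
        return errors
    path = tolerance.get("path")
    if not isinstance(path, str) or not path.strip():
        errors.append("jsonpath tolerance requires a non-empty 'path'")
    return errors
```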
I was trying to script chaos to run continuously as long as the experiment was successful.
The script was simple:
chaos run ./experiments/consensus-recovery.json
while [ $? -eq 0 ]; do
chaos run ./experiments/consensus-recovery.json
done
Eventually, the experiment stopped being successful, but the loop kept running.
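Until the exit code reflects the outcome, a loop could inspect the journal that `chaos run` writes (journal.json by default) instead of `$?`. A sketch assuming the journal exposes top-level `"status"` and `"deviated"` fields; both helpers are hypothetical:

```python
import json
from typing import Any, Dict


def journal_succeeded(journal: Dict[str, Any]) -> bool:
    """True only if the run completed and the steady state did not deviate."""
    return (journal.get("status") == "completed"
            and not journal.get("deviated", False))


def last_run_succeeded(path: str = "journal.json") -> bool:
    """Read the journal written by `chaos run` and apply journal_succeeded."""
    with open(path, encoding="utf-8") as f:
        return journal_succeeded(json.load(f))
```

The shell loop would then call `chaos run` and stop as soon as `last_run_succeeded()` returns False.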