shakenfist's People

Contributors

bradh, dependabot[bot], fifieldt, github-actions[bot], jackadamson, ludvikgalois, mandoonandy, mcarden, mikalstill, shakenfist-bot, tristangoode, veenarm

shakenfist's Issues

Events lookup fails

Jun 17 14:43:02 sf-1 sf[20328]: Traceback (most recent call last):
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1950, in full_dispatch_request
Jun 17 14:43:02 sf-1 sf[20328]:     rv = self.dispatch_request()
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1936, in dispatch_request
Jun 17 14:43:02 sf-1 sf[20328]:     return self.view_functions[rule.endpoint](**req.view_args)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/local/lib/python3.6/dist-packages/flask_restful/__init__.py", line 472, in wrapper
Jun 17 14:43:02 sf-1 sf[20328]:     return self.make_response(data, code, headers=headers)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/local/lib/python3.6/dist-packages/flask_restful/__init__.py", line 501, in make_response
Jun 17 14:43:02 sf-1 sf[20328]:     resp = self.representations[mediatype](data, *args, **kwargs)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/local/lib/python3.6/dist-packages/flask_restful/representations/json.py", line 21, in output_json
Jun 17 14:43:02 sf-1 sf[20328]:     dumped = dumps(data, **settings) + "\n"
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
Jun 17 14:43:02 sf-1 sf[20328]:     return _default_encoder.encode(obj)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
Jun 17 14:43:02 sf-1 sf[20328]:     chunks = self.iterencode(o, _one_shot=True)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
Jun 17 14:43:02 sf-1 sf[20328]:     return _iterencode(o, 0)
Jun 17 14:43:02 sf-1 sf[20328]:   File "/usr/lib/python3.6/json/encoder.py", line 180, in default
Jun 17 14:43:02 sf-1 sf[20328]:     o.__class__.__name__)
Jun 17 14:43:02 sf-1 sf[20328]: TypeError: Object of type 'generator' is not JSON serializable
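
The last frame shows json.dumps being handed a generator. A minimal sketch of one way around that, assuming the events come from a generator function: build a concrete list before the view returns it. The get_events name and the event fields here are hypothetical stand-ins, not the actual Shaken Fist code.

```python
import json


def get_events(object_uuid):
    """Hypothetical stand-in for an events lookup that yields rows lazily."""
    for phase in ('start', 'finish'):
        yield {'uuid': object_uuid, 'operation': 'api', 'phase': phase}


def events_for_response(object_uuid):
    # json.dumps() cannot encode a generator, so build a concrete list first.
    return list(get_events(object_uuid))


encoded = json.dumps(events_for_response('example-uuid'))
```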

Creating an instance with no network causes a crash

# sf-client instance create cirros 1 1 -d 8@cirros
Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/srv/shakenfist/src/shakenfist/client/main.py", line 405, in instance_create
    network, disk, sshkey_content, userdata_content))
  File "/srv/shakenfist/src/shakenfist/client/apiclient.py", line 77, in create_instance
    'user_data': userdata
  File "/srv/shakenfist/src/shakenfist/client/apiclient.py", line 45, in _request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.APIException: ('API request failed', 'POST', 'http://localhost:13000/instances', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 56, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 245, in post\\n    for network in args[\'network\']:\\nTypeError: \'NoneType\' object is not iterable\\n"}')
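
The server-side failure is a loop over args['network'] when no network was supplied. A small sketch of a defensive default, assuming the parsed arguments behave like a dict in which an omitted 'network' is stored as None (the requested_networks helper is hypothetical):

```python
def requested_networks(args):
    """Return the list of networks from parsed request arguments.

    `args` is assumed to be a dict-like object in which an omitted
    'network' argument is stored as None.
    """
    return args.get('network') or []


for network in requested_networks({'network': None}):
    print(network)  # Never reached: there is nothing to iterate over.
```

Whether an instance with no network should be allowed at all, or rejected with a 400, is a separate decision this sketch does not make.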

Namespaces write directly to etcd from the API code

Everything else uses the db.py abstraction, which I think has advantages in terms of keeping the format of what is in etcd consistent. We should decide if we value db.py and if we do ensure that all etcd access flows through it.

Snapshots do not handle new CDROM tweaks

# sf-client instance snapshot 4dae894a-3f89-4c7d-b8eb-04d890cf0d5d
Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 10, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/srv/shakenfist/src/shakenfist/client/main.py", line 473, in instance_snapshot
    uuid = CLIENT.snapshot_instance(instance_uuid, all)
  File "/srv/shakenfist/src/shakenfist/client/apiclient.py", line 83, in snapshot_instance
    '/snapshot', data={'all': all})
  File "/srv/shakenfist/src/shakenfist/client/apiclient.py", line 45, in _request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.APIException: ('API request failed', 'POST', 'http://localhost:13000/instances/4dae894a-3f89-4c7d-b8eb-04d890cf0d5d/snapshot', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 56, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 89, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 113, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 319, in post\\n    snap_uuid = instance_from_db_virt.snapshot(all=args[\'all\'])\\n  File \\"/srv/shakenfist/src/shakenfist/virt.py\\", line 425, in snapshot\\n    d[\'path\'], os.path.join(snappath, d[\'device\']))\\n  File \\"/srv/shakenfist/src/shakenfist/virt.py\\", line 402, in _snapshot_device\\n    images.snapshot(source, destination)\\n  File \\"/srv/shakenfist/src/shakenfist/images.py\\", line 212, in snapshot\\n    shell=True)\\n  File \\"/usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py\\", line 424, in execute\\n    cmd=sanitized_cmd)\\noslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.\\nCommand: qemu-img convert --force-share -O qcow2 /srv/shakenfist/instances/4dae894a-3f89-4c7d-b8eb-04d890cf0d5d/vdc.qcow2 /srv/shakenfist/snapshots/dc498e79-974a-49f8-a776-d0cc7f5b2f31/vdc\\nExit code: 1\\nStdout: \'\'\\nStderr: \\"qemu-img: Could not open \'/srv/shakenfist/instances/4dae894a-3f89-4c7d-b8eb-04d890cf0d5d/vdc.qcow2\': Could not open \'/srv/shakenfist/instances/4dae894a-3f89-4c7d-b8eb-04d890cf0d5d/vdc.qcow2\': No such file or directory\\\\n\\"\\n"}')

I shouldn't be trying to snapshot CDROMs anyway.
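
A rough sketch of skipping CDROMs before the snapshot loop. The 'device' and 'path' keys appear in the traceback above; the 'type' key and the snapshottable_disks helper are assumptions made for illustration, since the real disk layout is not shown here.

```python
def snapshottable_disks(disks):
    """Yield only the disks worth snapshotting.

    Each disk is assumed to be a dict with 'device' and 'path' keys (as in
    the traceback above) plus a hypothetical 'type' key such as 'disk' or
    'cdrom'.
    """
    for d in disks:
        if d.get('type') == 'cdrom':
            continue  # Skip CDROM devices entirely.
        yield d


disks = [
    {'device': 'vda', 'path': '/example/vda.qcow2', 'type': 'disk'},
    {'device': 'vdc', 'path': '/example/vdc.qcow2', 'type': 'cdrom'},
]
devices = [d['device'] for d in snapshottable_disks(disks)]
```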

Instance delete is failing

$ for uuid in `sf-client --simple instance list | grep -v uuid | cut -f 1 -d "," | head -1`; do   sf-client instance delete $uuid; done
Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/main.py", line 533, in instance_delete
    CLIENT.delete_instance(instance_uuid)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 186, in delete_instance
    '/instances/' + instance_uuid)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 109, in _request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.APIException: ('API request failed', 'DELETE', 'http://localhost:13000/instances/23000e33-25cd-47f2-a857-6ca6721cc86f', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 91, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py\\", line 107, in wrapper\\n    verify_jwt_in_request()\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py\\", line 32, in verify_jwt_in_request\\n    jwt_data, jwt_header = _decode_jwt_from_request(request_type=\'access\')\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py\\", line 294, in _decode_jwt_from_request\\n    decoded_token = decode_token(encoded_token, csrf_token)\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/utils.py\\", line 118, in decode_token\\n    allow_expired=allow_expired\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/tokens.py\\", line 140, in decode_jwt\\n    leeway=leeway, options=options, issuer=issuer)\\n  File \\"/usr/local/lib/python3.6/dist-packages/jwt/api_jwt.py\\", line 92, in decode\\n    jwt, key=key, algorithms=algorithms, options=options, **kwargs\\n  File \\"/usr/local/lib/python3.6/dist-packages/jwt/api_jws.py\\", line 156, in decode\\n    key, algorithms)\\n  File \\"/usr/local/lib/python3.6/dist-packages/jwt/api_jws.py\\", line 223, in _verify_signature\\n    raise InvalidSignatureError(\'Signature verification failed\')\\njwt.exceptions.InvalidSignatureError: Signature verification failed\\n"}')

Longer lived clusters have etcd compaction issues

All the CI testing is done with short-lived clusters, so of course I missed this...

Jul  4 10:14:53 sau-3f41e-or Failed to write /sf/node/sf-3, attempt 2: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "etcdserver: mvcc: database space exceeded"
	debug_error_string = "{"created":"@1593857693.847666930","description":"Error received from peer ipv4:127.0.0.1:2379","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"etcdserver: mvcc: database space exceeded","grpc_status":8}"
>
Jul  4 10:14:53 sau-3f41e-or Failed to collect resource statistics: Cannot write "/sf/node/sf-3"

It seems I need to compact, defragment, and expire old events regularly.
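
For the "expire old events" part, a minimal sketch of picking which event keys are past a retention window. The key layout, the seven day window, and the expired_keys helper are all hypothetical. Deleting keys does not by itself shrink the etcd database; compaction and defragmentation would still be needed afterwards and are not shown.

```python
import datetime


def expired_keys(events, now, max_age):
    """Return keys of events older than max_age.

    `events` is assumed to map a key to the datetime the event was
    recorded; this layout is hypothetical.
    """
    cutoff = now - max_age
    return sorted(key for key, recorded in events.items() if recorded < cutoff)


now = datetime.datetime(2020, 7, 4, 10, 14, 53)
events = {
    '/example/event/old': now - datetime.timedelta(days=30),
    '/example/event/new': now - datetime.timedelta(hours=1),
}
stale = expired_keys(events, now, datetime.timedelta(days=7))
```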

While launching many VMs on lots of networks, I got this error

It's weird because I only got it once out of hundreds of networks. A less-used code path, perhaps?

++ sf-client --simple network create 192.168.0.0/24 cybertaipan-20
++ grep uuid:
++ cut -f 2 -d :
Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/main.py", line 373, in network_create
    netblock, dhcp, nat, name, namespace))
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 275, in allocate_network
    'namespace': namespace
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 140, in _request_url
    return self._actual_request_url(method, url, data=data)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 119, in _actual_request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.APIException: ('API request failed', 'POST', 'http://localhost:13000/networks', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 104, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py\\", line 108, in wrapper\\n    return fn(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 1074, in post\\n    n.create()\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/net.py\\", line 172, in create\\n    self.deploy_nat()\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/net.py\\", line 198, in deploy_nat\\n    self.persist_floating_gateway()\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/net.py\\", line 97, in persist_floating_gateway\\n    db.persist_floating_gateway(self.uuid, self.floating_gateway)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/db.py\\", line 165, in persist_floating_gateway\\n    etcd.put(\'network\', None, network_uuid, n)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/etcd.py\\", line 56, in put\\n    encoded = json.dumps(data, indent=4, sort_keys=True)\\n  File \\"/usr/lib/python3.6/json/__init__.py\\", line 238, in dumps\\n    **kw).encode(obj)\\n  File \\"/usr/lib/python3.6/json/encoder.py\\", line 201, in encode\\n    chunks = list(chunks)\\n  File \\"/usr/lib/python3.6/json/encoder.py\\", line 430, in _iterencode\\n    yield from _iterencode_dict(o, _current_indent_level)\\n  File \\"/usr/lib/python3.6/json/encoder.py\\", line 404, in _iterencode_dict\\n    yield from chunks\\n  File \\"/usr/lib/python3.6/json/encoder.py\\", line 437, in _iterencode\\n    o = _default(o)\\n  File \\"/usr/lib/python3.6/json/encoder.py\\", line 180, in default\\n    
o.__class__.__name__)\\nTypeError: Object of type \'IPv4Address\' is not JSON serializable\\n"}')
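
The traceback ends in json.dumps meeting an IPv4Address. One possible fix, sketched under the assumption that storing addresses as strings is acceptable: pass a default= fallback that stringifies ipaddress objects. The encode_default helper and the shape of the dict are illustrative; only the floating_gateway name comes from the traceback.

```python
import ipaddress
import json


def encode_default(o):
    """Fallback for json.dumps(): render ipaddress objects as strings."""
    if isinstance(o, (ipaddress.IPv4Address, ipaddress.IPv6Address,
                      ipaddress.IPv4Network, ipaddress.IPv6Network)):
        return str(o)
    raise TypeError(
        'Object of type %r is not JSON serializable' % o.__class__.__name__)


n = {'floating_gateway': ipaddress.IPv4Address('192.168.0.1')}
encoded = json.dumps(n, indent=4, sort_keys=True, default=encode_default)
```

The alternative is to convert the address to a string at the point it is assigned, which keeps the encoder untouched.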

Creating networks is slow

$ hyperfine "sf-client network create 192.168.1.0/24 mynet"
Benchmark #1: sf-client network create 192.168.1.0/24 mynet
  Time (mean ± σ):     18.615 s ±  0.550 s    [User: 361.9 ms, System: 40.3 ms]
  Range (min … max):   17.725 s … 19.362 s    10 runs

It's interesting, because the individual events aren't super slow:

$ sf-client network events ee744264-6d0c-423e-9823-dc36f7ca5aa5
+----------------------------+------+------------------------+------------+---------------------+---------+
|         timestamp          | node |       operation        |   phase    |       duration      | message |
+----------------------------+------+------------------------+------------+---------------------+---------+
| 2020-06-17 19:01:22.428386 | sf-1 |          api           |   create   |         None        |   None  |
| 2020-06-17 19:01:24.115698 | sf-1 | create vxlan interface |   start    |         None        |   None  |
| 2020-06-17 19:01:24.479220 | sf-1 | create vxlan interface |   finish   |  0.3634636402130127 |   None  |
| 2020-06-17 19:01:24.829187 | sf-1 |  create vxlan bridge   |   start    |         None        |   None  |
| 2020-06-17 19:01:25.263628 | sf-1 |  create vxlan bridge   |   finish   | 0.43483662605285645 |   None  |
| 2020-06-17 19:01:25.598291 | sf-1 |      create netns      |   start    |         None        |   None  |
| 2020-06-17 19:01:26.039656 | sf-1 |      create netns      |   finish   | 0.43940114974975586 |   None  |
| 2020-06-17 19:01:26.391639 | sf-1 |   create router veth   |   start    |         None        |   None  |
| 2020-06-17 19:01:26.978238 | sf-1 |   create router veth   |   finish   |  0.5865623950958252 |   None  |
| 2020-06-17 19:01:27.326934 | sf-1 |  create physical veth  |   start    |         None        |   None  |
| 2020-06-17 19:01:27.770758 | sf-1 |  create physical veth  |   finish   | 0.44352030754089355 |   None  |
| 2020-06-17 19:01:30.158449 | sf-1 | enable virtual routing |   start    |         None        |   None  |
| 2020-06-17 19:01:30.698722 | sf-1 | enable virtual routing |   finish   |  0.5399155616760254 |   None  |
| 2020-06-17 19:01:31.046316 | sf-1 |       enable nat       |   start    |         None        |   None  |
| 2020-06-17 19:01:31.962038 | sf-1 |       enable nat       |   finish   |  0.9159204959869385 |   None  |
| 2020-06-17 19:01:32.296439 | sf-1 |      ensure mesh       |   start    |         None        |   None  |
| 2020-06-17 19:01:32.952852 | sf-1 |     discover mesh      |   start    |         None        |   None  |
| 2020-06-17 19:01:33.302200 | sf-1 |     discover mesh      |   finish   | 0.34851765632629395 |   None  |
| 2020-06-17 19:01:33.636606 | sf-1 |      ensure mesh       |   finish   |  1.3521108627319336 |   None  |
| 2020-06-17 19:01:33.944221 | sf-1 |      update dhcp       |   start    |         None        |   None  |
| 2020-06-17 19:01:36.005281 | sf-1 |      update dhcp       |   finish   |  2.0472848415374756 |   None  |
| 2020-06-17 19:01:36.338392 | sf-1 |      ensure mesh       |   start    |         None        |   None  |
| 2020-06-17 19:01:37.006036 | sf-1 |     discover mesh      |   start    |         None        |   None  |
| 2020-06-17 19:01:37.354087 | sf-1 |     discover mesh      |   finish   | 0.34702134132385254 |   None  |
| 2020-06-17 19:01:37.701589 | sf-1 |      ensure mesh       |   finish   |  1.3621313571929932 |   None  |
| 2020-06-17 19:02:24.758767 | sf-1 |          api           | get events |         None        |   None  |
+----------------------------+------+------------------------+------------+---------------------+---------+
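
Since the per-step durations are small, it may be worth looking at the time between one step finishing and the next one starting. A quick sketch using a few rows copied from the table above; the gaps helper is just for illustration.

```python
from datetime import datetime

# A few rows copied from the table above: (timestamp, operation, phase).
rows = [
    ('2020-06-17 19:01:24.115698', 'create vxlan interface', 'start'),
    ('2020-06-17 19:01:24.479220', 'create vxlan interface', 'finish'),
    ('2020-06-17 19:01:24.829187', 'create vxlan bridge', 'start'),
    ('2020-06-17 19:01:25.263628', 'create vxlan bridge', 'finish'),
    ('2020-06-17 19:01:25.598291', 'create netns', 'start'),
]


def gaps(rows):
    """Yield (seconds, description) between each finish and the next start."""
    fmt = '%Y-%m-%d %H:%M:%S.%f'
    for (t1, op1, ph1), (t2, op2, ph2) in zip(rows, rows[1:]):
        if ph1 == 'finish' and ph2 == 'start':
            delta = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
            yield delta.total_seconds(), '%s -> %s' % (op1, op2)


idle = list(gaps(rows))
```

For these rows the two gaps come out at roughly 0.35 s and 0.33 s, which is comparable to the duration of the steps themselves.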

Creating an instance fails but returns 200

Creating an instance intermittently returns an empty data structure with a 200 OK.
The instance is nevertheless created afterwards.

**************************
*** Create an instance ***
**************************
UUID: 
Name: 
CPUs: 0
Memory (MB): 0
Disks:
SSHKey: 
Node: 
ConsolePort: 0
VDIPort: 0
UserData: 
State: 
StateUpdated: 1970-01-01 10:00:00 +1000 AEST

API Endpoints do not conform to POST/PUT standardisation

Is the endpoint API the best design?

Currently:
POST auth/namespace/<namespace>/metadata/ creates a key

Would this be more appropriate?
POST auth/namespace/<namespace>/metadata creates a key
PUT auth/namespace/<namespace>/metadata/<key> creates or updates a key

This doesn't add functionality, only polish.
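
To make the proposed split concrete, here is a sketch of the two verbs against an in-memory dict. The status codes, and the choice to have POST refuse an existing key, are assumptions for illustration and not part of the proposal above.

```python
def post_metadata(store, key, value):
    """POST to the collection: create a key, refusing to overwrite."""
    if key in store:
        return 409  # Conflict: the key already exists.
    store[key] = value
    return 201  # Created.


def put_metadata(store, key, value):
    """PUT to a specific key: create or update, safe to repeat."""
    status = 200 if key in store else 201
    store[key] = value
    return status


store = {}
created = post_metadata(store, 'colour', 'red')
updated = put_metadata(store, 'colour', 'blue')
```

The practical difference is that repeating the PUT leaves the store in the same state, so a client can retry it safely.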

Error on GET /networks/uuid

The network UUID does exist. The request produces this error:

Jun 23 08:04:18 andy-200623-fiu9eiqu-sf-1 External API request: <bound method arg_is_network_uuid.<locals>.wrapper of <shakenfist.external_api.app.Network object at 0x7f18763a1da0>> () {'network_uuid': '00997cbc-1e97-4a3d-b55b-3720117fb71a'}
Jun 23 08:04:18 andy-200623-fiu9eiqu-sf-1 Returning API error: 500, server error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py", line 108, in wrapper
        return fn(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py", line 173, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py", line 654, in get
        if network_from_db is not None:
    KeyError: 'ipmanager'

Use of reserved mac address

virsh start sf:edfbb915-704c-49ce-9ed3-4ad53795c364
error: Failed to start domain sf:edfbb915-704c-49ce-9ed3-4ad53795c364
error: unsupported configuration: Unable to use MAC address starting with reserved value 0xFE - 'fe:94:0c:39:75:c6' - 
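
A sketch of generating MAC addresses that cannot hit this case, assuming addresses are picked at random. The random_mac helper is hypothetical: it clears the multicast bit, sets the locally administered bit, and retries if the first octet would be 0xFE.

```python
import random


def random_mac(rng=random):
    """Return a random unicast, locally administered MAC address.

    The first octet has the multicast bit cleared and the locally
    administered bit set, and is never 0xFE, which libvirt rejected in the
    error above.
    """
    while True:
        first = (rng.randrange(256) & 0xFC) | 0x02
        if first != 0xFE:
            break
    rest = [rng.randrange(256) for _ in range(5)]
    return ':'.join('%02x' % octet for octet in [first] + rest)


mac = random_mac()
```

A simpler option would be a fixed prefix such as 52:54:00, which is the conventional one for QEMU/KVM guests, with only the last three octets randomised.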

Handling of deleting already deleted networks

root@sf-1:/srv/shakenfist# sf-client network delete 2f40b085-cdfc-4a2c-9031-a17f7b48273a 
Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/main.py", line 412, in network_delete
    CLIENT.delete_network(network_uuid)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 249, in delete_network
    '/networks/' + network_uuid)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 131, in _request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.APIException: ('API request failed', 'DELETE', 'http://localhost:13000/networks/2f40b085-cdfc-4a2c-9031-a17f7b48273a', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 104, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/flask_jwt_extended/view_decorators.py\\", line 108, in wrapper\\n    return fn(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 213, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 259, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 242, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/external_api/app.py\\", line 999, in delete\\n    n = net.from_db(network_uuid)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/net.py\\", line 37, in from_db\\n    namespace=dbnet[\'namespace\'])\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/net.py\\", line 55, in __init__\\n    ipm = db.get_ipmanager(self.uuid)\\n  File \\"/usr/local/lib/python3.6/dist-packages/shakenfist/db.py\\", line 151, in get_ipmanager\\n    raise Exception(\'IP Manager not found for network %s\' % network_uuid)\\nException: IP Manager not found for network 2f40b085-cdfc-4a2c-9031-a17f7b48273a\\n"}')
root@sf-1:/srv/shakenfist#
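
One way the API could treat this, sketched with an in-memory dict standing in for the database: report a missing network as a 404 instead of letting the exception surface as a 500. The delete_network function and the response bodies are hypothetical.

```python
def delete_network(networks, network_uuid):
    """Delete a network from an in-memory dict standing in for the database.

    Returns an (HTTP status, body) pair. A missing network is reported as a
    404 rather than surfacing an unhandled exception as a 500.
    """
    if network_uuid not in networks:
        return 404, {'error': 'network not found', 'status': 404}
    del networks[network_uuid]
    return 200, {}


networks = {'2f40b085-cdfc-4a2c-9031-a17f7b48273a': {'name': 'example'}}
first = delete_network(networks, '2f40b085-cdfc-4a2c-9031-a17f7b48273a')
second = delete_network(networks, '2f40b085-cdfc-4a2c-9031-a17f7b48273a')
```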

Creating instances is slow

Creating instances is slower than I expected too:

$ hyperfine "sf-client instance create myinst 1 1 -d 8@cirros -n e67e2be9-2736-44c2-a56c-2a0575a882c6"
Benchmark #1: sf-client instance create myinst 1 1 -d 8@cirros -n e67e2be9-2736-44c2-a56c-2a0575a882c6
  Time (mean ± σ):     60.729 s ± 11.505 s    [User: 366.3 ms, System: 47.4 ms]
  Range (min … max):   48.186 s … 79.898 s    10 runs

With these sorts of events:

$ sf-client instance events 416dd877-6604-4130-a5f6-a8a553682ce2
+----------------------------+------+----------------------------+-----------------------------+---------------------+--------------------------+
|         timestamp          | node |         operation          |            phase            |       duration      |         message          |
+----------------------------+------+----------------------------+-----------------------------+---------------------+--------------------------+
| 2020-06-18 07:08:11.256338 | sf-1 |       uuid allocated       |             None            |         None        |           None           |
| 2020-06-18 07:08:13.595873 | sf-1 |          schedule          |            start            |         None        |           None           |
| 2020-06-18 07:08:15.261052 | sf-1 |          schedule          |      Initial candidates     |         None        | ['sf-1', 'sf-2', 'sf-3'] |
| 2020-06-18 07:08:15.594456 | sf-1 |          schedule          |    Have enough actual CPU   |         None        | ['sf-1', 'sf-2', 'sf-3'] |
| 2020-06-18 07:08:15.927853 | sf-1 |          schedule          |     Have enough idle CPU    |         None        | ['sf-1', 'sf-2', 'sf-3'] |
| 2020-06-18 07:08:16.261326 | sf-1 |          schedule          |     Have enough idle RAM    |         None        | ['sf-1', 'sf-2', 'sf-3'] |
| 2020-06-18 07:08:16.594862 | sf-1 |          schedule          |    Have enough idle disk    |         None        | ['sf-1', 'sf-2', 'sf-3'] |
| 2020-06-18 07:08:19.280864 | sf-1 |          schedule          | Have most matching networks |         None        |         ['sf-3']         |
| 2020-06-18 07:08:19.947462 | sf-1 |          schedule          |  Have most matching images  |         None        |         ['sf-3']         |
| 2020-06-18 07:08:20.280687 | sf-1 |          schedule          |            finish           |  6.684841871261597  |           None           |
| 2020-06-18 07:08:21.280062 | sf-1 |         placement          |             None            |         None        |           sf-3           |
| 2020-06-18 07:08:21.956380 | sf-3 |          schedule          |            start            |         None        |           None           |
| 2020-06-18 07:08:23.084216 | sf-3 |          schedule          |      Forced candidates      |         None        |         ['sf-3']         |
| 2020-06-18 07:08:23.276393 | sf-3 |          schedule          |      Initial candidates     |         None        |         ['sf-3']         |
| 2020-06-18 07:08:23.468592 | sf-3 |          schedule          |    Have enough actual CPU   |         None        |         ['sf-3']         |
| 2020-06-18 07:08:23.718679 | sf-3 |          schedule          |     Have enough idle CPU    |         None        |         ['sf-3']         |
| 2020-06-18 07:08:24.026666 | sf-3 |          schedule          |     Have enough idle RAM    |         None        |         ['sf-3']         |
| 2020-06-18 07:08:24.207056 | sf-3 |          schedule          |    Have enough idle disk    |         None        |         ['sf-3']         |
| 2020-06-18 07:08:25.343608 | sf-3 |          schedule          | Have most matching networks |         None        |         ['sf-3']         |
| 2020-06-18 07:08:25.902487 | sf-3 |          schedule          |  Have most matching images  |         None        |         ['sf-3']         |
| 2020-06-18 07:08:26.152653 | sf-3 |          schedule          |            finish           |  4.278729200363159  |           None           |
| 2020-06-18 07:08:40.168090 | sf-3 |   ensure networks exist    |            start            |         None        |           None           |
| 2020-06-18 07:08:51.337100 | sf-3 |   ensure networks exist    |            finish           |  11.136167287826538 |           None           |
| 2020-06-18 07:08:51.670927 | sf-3 |     instance creation      |            start            |         None        |           None           |
| 2020-06-18 07:08:52.004922 | sf-3 |     make config drive      |            start            |         None        |           None           |
| 2020-06-18 07:08:53.204224 | sf-3 |     make config drive      |            finish           |  1.2850000858306885 |           None           |
| 2020-06-18 07:08:53.366705 | sf-3 |        fetch image         |            start            |         None        |           None           |
| 2020-06-18 07:08:55.677333 | sf-3 |        fetch image         |            finish           |  2.224187135696411  |           None           |
| 2020-06-18 07:08:56.010746 | sf-3 |      transcode image       |            start            |         None        |           None           |
| 2020-06-18 07:08:56.344267 | sf-3 |      transcode image       |            finish           | 0.33343958854675293 |           None           |
| 2020-06-18 07:08:56.677983 | sf-3 |        resize image        |            start            |         None        |           None           |
| 2020-06-18 07:08:57.011498 | sf-3 |        resize image        |            finish           |  0.3337991237640381 |           None           |
| 2020-06-18 07:08:57.200808 | sf-3 | create copy on write layer |            start            |         None        |           None           |
| 2020-06-18 07:08:57.387142 | sf-3 | create copy on write layer |            finish           | 0.18468737602233887 |           None           |
| 2020-06-18 07:08:57.874770 | sf-3 |     create domain XML      |            start            |         None        |           None           |
| 2020-06-18 07:08:59.019109 | sf-3 |     create domain XML      |            finish           |  1.0588061809539795 |           None           |
| 2020-06-18 07:08:59.207091 | sf-3 |       create domain        |            start            |         None        |           None           |
| 2020-06-18 07:09:01.591925 | sf-3 |       create domain        |            finish           |  2.300135374069214  |           None           |
| 2020-06-18 07:09:02.321802 | sf-3 |     instance creation      |            finish           |  10.73669147491455  |           None           |
| 2020-06-18 07:09:03.871757 | sf-1 |            api             |        get interfaces       |         None        |           None           |
| 2020-06-18 07:22:55.862299 | sf-1 |            api             |          get events         |         None        |           None           |
+----------------------------+------+----------------------------+-----------------------------+---------------------+--------------------------+

gunicorn has a request timeout

So... gunicorn has a request timeout. The default is 30 seconds, although I am currently changing that to 300 seconds. Why? Well, fetching a large image for an instance start can take a long time. The file might be hundreds of gigabytes! Long term I think we might want to move to a queue system for instance starts, but I see that as a v0.3 thing, not a v0.2 thing.
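As a rough sketch of what that change might look like, assuming gunicorn is started with a Python config file (the file name below and the way Shaken Fist actually launches gunicorn are assumptions), the `timeout` setting is the one that defaults to 30 seconds:

```python
# gunicorn.conf.py (hypothetical file name, loaded with `gunicorn -c gunicorn.conf.py ...`)

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# gunicorn's default is 30, which is too short for fetching very large images.
timeout = 300
```

The same value can be passed on the command line as `--timeout 300`.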

Better support stock Windows instances

Windows lacks virtio drivers by default. A user of Shaken Fist should be able to request Windows-supported devices for an instance entirely via the API. Specifically, network cards need to have their model exposed for configuration.
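To make the request concrete, here is a hedged sketch of an instance creation body that names device models explicitly. The field names (`model` on a network entry, `bus` on a disk entry) are taken from the API request logs elsewhere on this page. The values `e1000` and `ide` are common emulated devices that Windows supports without extra drivers, but whether the API accepts them is an assumption, and the image name is hypothetical.

```python
import json


def build_instance_request(name, network_uuid, nic_model, disk_bus):
    """Build a POST /instances body that asks for specific device models."""
    return {
        'name': name,
        'cpus': 1,
        'memory': 1,
        'network': [{
            'network_uuid': network_uuid,
            'address': '',
            'macaddress': '',
            'model': nic_model,   # assumed value, e.g. 'e1000' instead of virtio
        }],
        'disk': [{
            'base': 'windows-image',  # hypothetical image name
            'size': 8,
            'bus': disk_bus,          # assumed value, e.g. 'ide' instead of virtio
            'type': 'disk',
        }],
        'ssh_key': '',
        'user_data': '',
    }


body = build_instance_request(
    'win-test', '7a32b742-a7b0-434f-aa86-4e37fe27b5d1', 'e1000', 'ide')
print(json.dumps(body, indent=2))
```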

sf-client should gracefully handle common exceptions

sf-client should not let exceptions go unhandled, especially "resource not found" errors from the API.

Traceback (most recent call last):
  File "/usr/local/bin/sf-client", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/main.py", line 230, in network_show
    _show_network(ctx, CLIENT.get_network(network_uuid))
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 206, in get_network
    '/networks/' + network_uuid)
  File "/usr/local/lib/python3.6/dist-packages/shakenfist/client/apiclient.py", line 105, in _request_url
    'API request failed', method, url, r.status_code, r.text)
shakenfist.client.apiclient.ResourceNotFoundException: ('API request failed', 'GET', 'http://localhost:13000/networks/021adbe5-af42-4223-b6f9-6054f8c63649', 404, '{"error": "network not found", "status": 404}')
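One way this could be handled is a small decorator around each CLI command that turns the exception into a one-line error and a non-zero exit code. This is only a sketch: the exception class below is a stand-in for `shakenfist.client.apiclient.ResourceNotFoundException`, and `handle_api_errors` and the body of `network_show` are hypothetical.

```python
import functools
import sys


class ResourceNotFoundException(Exception):
    """Stand-in for shakenfist.client.apiclient.ResourceNotFoundException."""


def handle_api_errors(func):
    """Print a short error and exit 1 instead of dumping a traceback."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ResourceNotFoundException as e:
            # In the traceback above the exception args are
            # (message, method, url, status_code, response_text).
            detail = e.args[-1] if e.args else 'resource not found'
            print('Error: %s' % detail, file=sys.stderr)
            sys.exit(1)
    return wrapper


@handle_api_errors
def network_show(network_uuid):
    # Hypothetical body: the real command calls CLIENT.get_network().
    raise ResourceNotFoundException(
        'API request failed', 'GET',
        'http://localhost:13000/networks/' + network_uuid,
        404, '{"error": "network not found", "status": 404}')
```

With click, the decorator would sit below the `@click.command()` style decorators so that it wraps the callback itself.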

Namespace auth not propagating to other servers?

New errors with latest commits.

  1. Attempt to create instance returns empty data structure.
    Namespace: testspace
    Call was to sf-1. Instantiated on sf-3. Success but empty data structure returned.

  2. Subsequent call to /instances, returned 404 - no URL.

Jun 28 10:53:24 andy-200628-amai3ieg-sf-1 API request: POST http://localhost:13000/auth
    Headers:
        ('Host', 'localhost:13000')
        ('User-Agent', 'Go-http-client/1.1')
        ('Content-Length', '41')
        ('Authorization', 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE1OTMzNDE2MDEsIm5iZiI6MTU5MzM0MTYwMSwianRpIjoiODQ2ZDRkOWItMjE3MS00ZmNmLTgzZDYtYjk4OTFlMzQwOTZkIiwiZXhwIjoxNTkzMzQyNTAxLCJpZGVudGl0eSI6InRlc3RzcGFjZSIsImZyZXNoIjpmYWxzZSwidHlwZSI6ImFjY2VzcyJ9.BwhPucz5hW7lsAeNPLSk8aEWJYznc9VtQ9wGmxnr_cU')
        ('Content-Type', 'application/json')
        ('Accept-Encoding', 'gzip')
    Args: ()
    KWargs: {'namespace': 'testspace', 'key': 'testkey'}
Jun 28 10:53:24 andy-200628-amai3ieg-sf-1 systemd-networkd[4970]: vxlan-7: Gained IPv6LL
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 systemd-networkd[4970]: phy-7-o: Gained IPv6LL
Jun 28 10:53:25 localhost gunicorn.sf.access: [14349] 127.0.0.1 - - [28/Jun/2020:10:53:25 +0000] "POST /auth HTTP/1.1" 200 302 "-" "Go-http-client/1.1"
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 API request: POST http://localhost:13000/instances
    Headers:
        ('Host', 'localhost:13000')
        ('User-Agent', 'Go-http-client/1.1')
        ('Content-Length', '232')
        ('Authorization', 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE1OTMzNDE2MDUsIm5iZiI6MTU5MzM0MTYwNSwianRpIjoiOGZiZjRhM2UtMjhjNC00ZGE2LTk5NzktNGQ5OTcyNjM2N2JiIiwiZXhwIjoxNTkzMzQyNTA1LCJpZGVudGl0eSI6InRlc3RzcGFjZSIsImZyZXNoIjpmYWxzZSwidHlwZSI6ImFjY2VzcyJ9.sBbNFOEO_XEHJpc07m1I3hlCuxPlgobXF6MSDEAUFpg')
        ('Content-Type', 'application/json')
        ('Accept-Encoding', 'gzip')
    Args: ()
    KWargs: {'name': 'golang', 'cpus': 1, 'memory': 1, 'network': [{'network_uuid': '7a32b742-a7b0-434f-aa86-4e37fe27b5d1', 'address': '', 'macaddress': '', 'model': ''}], 'disk': [{'base': 'cirros', 'size': 8, 'bus': '', 'type': 'disk'}], 'ssh_key': '', 'user_data': ''}




Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] start as candidates
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] have enough actual CPU
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] have enough idle CPU
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] have enough idle RAM
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] have enough idle disk
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-1', 'sf-2', 'sf-3'] have most matching networks
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Scheduling instance(43521ba9-47fb-464c-b445-e9228ffa6421), ['sf-3'] have most matching images
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 instance(43521ba9-47fb-464c-b445-e9228ffa6421): Finish schedule, duration 0.25 seconds
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 systemd-networkd[4970]: br-vxlan-7: Gained IPv6LL
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Fetching testspace auth token from http://sf-3:13000/auth
Jun 28 10:53:25 andy-200628-amai3ieg-sf-3 API request: POST http://sf-3:13000/auth
    Headers:
        ('Host', 'sf-3:13000')
        ('User-Agent', 'Mozilla/5.0 (Ubuntu; Linux x86_64) Shaken Fist/0.1.2')
        ('Accept-Encoding', 'gzip, deflate')
        ('Accept', '*/*')
        ('Connection', 'keep-alive')
        ('Content-Type', 'application/json')
        ('Content-Length', '87')
    Args: ()
    KWargs: {'namespace': 'testspace', 'key': 'fyhhzcrbvzqbgtgfjbmjzmmhcczkclifwirgckrjykfkrswtkq'}
Jun 28 10:53:25 andy-200628-amai3ieg-sf-3 API request: POST http://sf-3:13000/instances
    Headers:
        ('Host', 'sf-3:13000')
        ('User-Agent', 'Mozilla/5.0 (Ubuntu; Linux x86_64) Shaken Fist/0.1.2')
        ('Accept-Encoding', 'gzip, deflate')
        ('Accept', '*/*')
        ('Connection', 'keep-alive')
        ('Authorization', 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE1OTMzNDE2MDUsIm5iZiI6MTU5MzM0MTYwNSwianRpIjoiZDRiM2RjYjItMDIwNC00MWY1LThhMjMtM2JhNGNiMmQ0ODFlIiwiZXhwIjoxNTkzMzQyNTA1LCJpZGVudGl0eSI6InRlc3RzcGFjZSIsImZyZXNoIjpmYWxzZSwidHlwZSI6ImFjY2VzcyJ9.YvYm5jtrlEhglcgTnas-1_e6W-5tNzZib5h4bzSsNfg')
        ('Content-Length', '363')
    Args: ()
    KWargs: {'name': 'golang', 'cpus': 1, 'memory': 1, 'network': [{'network_uuid': '7a32b742-a7b0-434f-aa86-4e37fe27b5d1', 'address': '', 'macaddress': '', 'model': ''}], 'disk': [{'base': 'cirros', 'size': 8, 'bus': '', 'type': 'disk'}], 'ssh_key': '', 'user_data': '', 'placed_on': 'sf-3', 'instance_uuid': '43521ba9-47fb-464c-b445-e9228ffa6421', 'namespace': 'testspace'}
Jun 28 10:53:25 andy-200628-amai3ieg-sf-3 Returning API error: 401, only admins can create resources in a different namespace
Jun 28 10:53:25 andy-200628-amai3ieg-sf-1 Returning proxied request: 401, {"error": "only admins can create resources in a different namespace", "status": 401}
Jun 28 10:53:25 localhost gunicorn.sf.access: [14349] 127.0.0.1 - - [28/Jun/2020:10:53:25 +0000] "POST /instances HTTP/1.1" 401 85 "-" "Go-http-client/1.1"
Jun 28 10:53:26 localhost gunicorn.sf.access: [14358] 127.0.0.1 - - [28/Jun/2020:10:53:26 +0000] "GET /instances/ HTTP/1.1" 404 232 "-" "Go-http-client/1.1"
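When debugging a log like this, it can help to look inside the bearer tokens to see which namespace each request was authenticated as. The helper below is a debugging sketch only: it decodes the claims of a JWT without verifying the signature, and `jwt_claims` is a hypothetical name, not part of Shaken Fist.

```python
import base64
import json


def jwt_claims(token):
    """Return the claims of a JWT as a dict, WITHOUT verifying its signature."""
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError('expected header.payload.signature')
    payload = parts[1]
    # JWTs strip base64 padding, so restore it before decoding.
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling `jwt_claims()` on the token from an `Authorization: Bearer ...` header in the log shows its claims, which can then be compared against the `namespace` value in the proxied request's KWargs.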

We should check that an image fits into the disk size specified

And report a nice error instead of crashing...

shakenfist.client.apiclient.APIException: ('API request failed', 'POST', 'http://localhost:13000/instances', 500, '{"error": "server error", "status": 500, "traceback": "Traceback (most recent call last):\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 83, in wrapper\\n    return func(*args, **kwargs)\\n  File \\"/srv/shakenfist/src/shakenfist/daemons/external_api.py\\", line 371, in post\\n    instance.create()\\n  File \\"/srv/shakenfist/src/shakenfist/virt.py\\", line 172, in create\\n    hashed_image_path, str(disk[\'size\']) + \'G\')\\n  File \\"/srv/shakenfist/src/shakenfist/images.py\\", line 190, in resize_image\\n    shell=True)\\n  File \\"/usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py\\", line 424, in execute\\n    cmd=sanitized_cmd)\\noslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.\\nCommand: qemu-img resize /srv/shakenfist/image_cache/579708eb995f49304bb123b16851e3e45a7b95d31ddd0e5dbac6451bb28994a5.v001.qcow2.2G 2G\\nExit code: 1\\nStdout: \'\'\\nStderr: \\"qemu-img: warning: Shrinking an image will delete all data beyond the shrunken image\'s end. Before performing such an operation, make sure there is no important data there.\\\\nqemu-img: Use the --shrink option to perform a shrink operation.\\\\n\\"\\n"}')
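The traceback shows `qemu-img resize` refusing to shrink an image to 2G. A minimal sketch of a pre-flight check follows, assuming the image's virtual size is read with `qemu-img info --output=json` (its JSON output includes a `virtual-size` field in bytes). The function names are hypothetical and this is not Shaken Fist code.

```python
import json
import subprocess

GIB = 1024 ** 3  # qemu-img treats the 'G' suffix as GiB


def image_virtual_size(path):
    """Return the virtual size in bytes of the image at path."""
    out = subprocess.check_output(
        ['qemu-img', 'info', '--output=json', path])
    return json.loads(out)['virtual-size']


def check_image_fits(virtual_size_bytes, requested_gb):
    """Raise ValueError if the image is larger than the requested disk."""
    requested_bytes = requested_gb * GIB
    if virtual_size_bytes > requested_bytes:
        raise ValueError(
            'image is %d bytes but the requested disk is only %dG (%d bytes)'
            % (virtual_size_bytes, requested_gb, requested_bytes))
```

Calling `check_image_fits(image_virtual_size(path), disk['size'])` before the resize would let the API return a clear client error rather than a 500.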
