clusterhq / flocker-ceph-vagrant
A local vagrant environment for flocker and ceph using virtualbox.
License: MIT License
Due to a recent incompatibility between 'ceph-ansible' and the latest Debian/Ubuntu ceph release (see ceph/ceph-ansible#788),
the deployment only works after you apply the fix in ceph/ceph-ansible@85fb03f.
Without that fix the deployment fails with these errors:
TASK [ceph.ceph-common : configure cluster name] *******************************
fatal: [ceph2]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpM8FvRb to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
fatal: [ceph3]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpdhnk_5 to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
fatal: [ceph4]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpFoO0Vc to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
...
PLAY RECAP *********************************************************************
ceph2 : ok=51 changed=7 unreachable=0 failed=1
ceph3 : ok=50 changed=7 unreachable=0 failed=1
ceph4 : ok=50 changed=7 unreachable=0 failed=1
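For anyone hitting the same failure: "[Errno 20] Not a directory" generally means that some component of the target path exists but is not a directory. A minimal sketch for finding that component follows. The function name first_non_directory is made up for illustration, and the guess that /etc/default/ceph is a plain file on the affected nodes is an assumption, not something confirmed in this thread.

```python
import os


def first_non_directory(path):
    """Return the first existing ancestor of `path` that is not a directory, or None."""
    ancestors = []
    head = os.path.abspath(path)
    # Walk upwards and collect every ancestor of the target path.
    while True:
        parent = os.path.dirname(head)
        if parent == head:
            break
        ancestors.append(parent)
        head = parent
    # Check from the filesystem root downwards.
    for ancestor in reversed(ancestors):
        if os.path.exists(ancestor) and not os.path.isdir(ancestor):
            return ancestor
    return None


if __name__ == "__main__":
    # Path taken from the error message above; run this on one of the ceph nodes.
    print(first_non_directory("/etc/default/ceph/ceph"))
```

If this prints a path, that path is a file where the failing task expected a directory. If it prints None, the cause is something else.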
After ceph -s returned health HEALTH_OK, I ran the command proposed in the tutorial:
vagrant ssh ceph3 -c 'sudo docker volume create -d flocker --name test2g -o size=2G'
I waited a while, but no flocker volume was created: vagrant ssh ceph3 -c "sudo df -h | grep flocker"
returns nothing (I double-checked without the grep).
I checked the logs and searched for similar issues, but I cannot see what is going wrong.
vagrant@ceph2:/var/log/flocker$ ls -lart
...
-rw-r--r-- 1 root root 2323 May 23 12:56 flocker-docker-plugin.log
-rw-r--r-- 1 root root 15492 May 23 12:57 flocker-container-agent.log
-rw-r--r-- 1 root root 90136 May 23 12:58 flocker-dataset-agent.log
The first two log files contain no errors, but the dataset agent log contains many exceptions. The same two or three exceptions repeat, so it looks like flocker keeps retrying and fails every time.
Here is an extract containing two occurrences of that group of exceptions, plus the surrounding log lines:
{"exception": "subprocess.CalledProcessError", "task_level": [2, 2, 3], "action_type": "flocker:agent:discovery", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009448.802584, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "failed"}
{"exception": "subprocess.CalledProcessError", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009448.804318, "traceback": "Traceback: <class 'subprocess.CalledProcessError'>: Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_logging.py:102:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:534:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:592:output\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/_loop.py:529:output_CONVERGE\n--- <exception caught here> ---\n/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py:150:maybeDeferred\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1798:discover_state\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1345:_count_calls\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1744:_discover_raw_state\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:259:list_volumes\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:121:_list_maps\n/usr/lib/python2.7/subprocess.py:573:check_output\n", "message_type": "eliot:traceback", "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "task_level": [2, 3]}
{"task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "log_level": "INFO", "timestamp": 1464009448.804492, "message": "Intentionally delaying the next iteration of the convergence loop to avoid RequestLimitExceeded.", "message_type": "flocker:node:_loop:delay", "current_wait": 10, "task_level": [2, 4]}
{"delay": 10, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "log_level": "INFO", "timestamp": 1464009448.804658, "message": "Delaying until next convergence loop.", "message_type": "flocker:node:_loop:CONVERGE:delay", "task_level": [2, 5]}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=SLEEP>", "timestamp": 1464009448.804929, "fsm_rich_input": "<_Sleep>", "action_status": "started", "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=CONVERGING>", "task_level": [2, 6, 1]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [2, 6, 2], "action_type": "fsm:transition", "timestamp": 1464009448.805181, "fsm_output": ["<ConvergenceLoopOutputs=SCHEDULE_WAKEUP>"], "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "succeeded"}
{"timestamp": 1464009448.805348, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_type": "flocker:agent:converge", "action_status": "succeeded", "task_level": [2, 7]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [3], "action_type": "fsm:transition", "timestamp": 1464009448.805503, "fsm_output": ["<ConvergenceLoopOutputs=CLEAR_WAKEUP>", "<ConvergenceLoopOutputs=CONVERGE>"], "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "succeeded"}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=WAKEUP>", "timestamp": 1464009458.810326, "fsm_rich_input": null, "action_status": "started", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [1]}
{"timestamp": 1464009458.811234, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:converge", "action_status": "started", "task_level": [2, 1]}
{"timestamp": 1464009458.811765, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:discovery", "action_status": "started", "task_level": [2, 2, 1]}
{"count": 134, "function": "_discover_raw_state", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "timestamp": 1464009458.812132, "message_type": "flocker:node:agents:blockdevice:list_volumes", "task_level": [2, 2, 2]}
{"exception": "subprocess.CalledProcessError", "task_level": [2, 2, 3], "action_type": "flocker:agent:discovery", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009458.840989, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "failed"}
{"exception": "subprocess.CalledProcessError", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009458.841359, "traceback": "Traceback: <class 'subprocess.CalledProcessError'>: Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_logging.py:102:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:534:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:592:output\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/_loop.py:529:output_CONVERGE\n--- <exception caught here> ---\n/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py:150:maybeDeferred\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1798:discover_state\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1345:_count_calls\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1744:_discover_raw_state\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:259:list_volumes\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:121:_list_maps\n/usr/lib/python2.7/subprocess.py:573:check_output\n", "message_type": "eliot:traceback", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "task_level": [2, 3]}
{"task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "log_level": "INFO", "timestamp": 1464009458.841584, "message": "Intentionally delaying the next iteration of the convergence loop to avoid RequestLimitExceeded.", "message_type": "flocker:node:_loop:delay", "current_wait": 10, "task_level": [2, 4]}
{"delay": 10, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "log_level": "INFO", "timestamp": 1464009458.84175, "message": "Delaying until next convergence loop.", "message_type": "flocker:node:_loop:CONVERGE:delay", "task_level": [2, 5]}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=SLEEP>", "timestamp": 1464009458.842023, "fsm_rich_input": "<_Sleep>", "action_status": "started", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=CONVERGING>", "task_level": [2, 6, 1]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [2, 6, 2], "action_type": "fsm:transition", "timestamp": 1464009458.842266, "fsm_output": ["<ConvergenceLoopOutputs=SCHEDULE_WAKEUP>"], "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "succeeded"}
{"timestamp": 1464009458.842427, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:converge", "action_status": "succeeded", "task_level": [2, 7]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [3], "action_type": "fsm:transition", "timestamp": 1464009458.84258, "fsm_output": ["<ConvergenceLoopOutputs=CLEAR_WAKEUP>", "<ConvergenceLoopOutputs=CONVERGE>"], "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "succeeded"}
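Since the log is one JSON object per line, the repeated failures can be tallied rather than read by eye. This is a minimal sketch, assuming every record of interest carries the action_status, action_type and reason fields seen in the extract above. The function name tally_failures and the trimmed sample records are for illustration only.

```python
import json
from collections import Counter


def tally_failures(lines):
    """Count failed actions per (action_type, reason) in JSON-per-line log text."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if record.get("action_status") == "failed":
            counts[(record.get("action_type"), record.get("reason"))] += 1
    return counts


if __name__ == "__main__":
    # Two records from the extract above, trimmed to the fields used here.
    reason = ("Command '['rbd', '-p', 'rbd', 'showmapped']' "
              "returned non-zero exit status 1")
    sample = [
        json.dumps({"action_type": "flocker:agent:discovery",
                    "action_status": "failed", "reason": reason}),
        json.dumps({"action_type": "flocker:agent:discovery",
                    "action_status": "failed", "reason": reason}),
    ]
    for (action, why), count in tally_failures(sample).items():
        print("{} x {}: {}".format(count, action, why))
```

To run it against the real file, pass an open handle on /var/log/flocker/flocker-dataset-agent.log instead of the sample list.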
Where else should I look to understand what is broken?
Are there other logs to check?
Which logs should I provide?
I can provide the tar produced by sudo flocker-diagnostics.
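One thing worth trying: the traceback ends in subprocess.check_output, which only reports the exit status, so the actual error message from rbd never reaches the log. A minimal sketch that reruns the same command and keeps stderr is below. The helper name run_and_capture is made up for illustration, and it assumes you run it on the failing node with the same privileges as the dataset agent (probably via sudo).

```python
import subprocess


def run_and_capture(command):
    """Run `command` and return (exit_code, stdout, stderr) without raising on failure."""
    try:
        process = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as error:
        # The binary is missing or not executable.
        return None, "", str(error)
    stdout, stderr = process.communicate()
    return (process.returncode,
            stdout.decode("utf-8", "replace"),
            stderr.decode("utf-8", "replace"))


if __name__ == "__main__":
    # Same command as in the traceback above.
    code, out, err = run_and_capture(["rbd", "-p", "rbd", "showmapped"])
    print("exit status: {}".format(code))
    print("stdout: {}".format(out.strip()))
    print("stderr: {}".format(err.strip()))
```

Whatever ends up on stderr is the message the agent log is hiding, and it is probably the most useful thing to attach here.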
Some info about my dev environment:
Mac OS X 10.9.5 (yes, I know I finally need to upgrade) with 16 GB RAM (Apple, please give us 32 GB in the coming MacBooks!)
Virtualbox v5.0.14
Vagrant v1.8.1
ansible 2.0.1
Hi, I have an issue similar to #6, but in my case I am able to create the volume. The failure comes when I try to use it in a container (I am on bare metal with the same number of nodes, not Vagrant):
ssh ceph3 "sudo docker run --volume-driver flocker \
-v test:/data --name test-container -itd busybox"
docker: Error response from daemon: Timed out waiting for dataset to mount..
All of the API endpoints return results:
[thbeh@ceph3 ~]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/state/datasets | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 170 0 170 0 0 982 0 --:--:-- --:--:-- --:--:-- 988
[
{
"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
"maximum_size": 10737418240
},
{
"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
"maximum_size": 10737418240
}
]
[thbeh@ceph3 ~]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/configuration/datasets | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 431 0 431 0 0 3970 0 --:--:-- --:--:-- --:--:-- 3954
[
{
"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
"deleted": false,
"maximum_size": 10737418240,
"metadata": {
"maximum_size": "10737418240",
"name": "test1"
},
"primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
},
{
"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
"deleted": false,
"maximum_size": 10737418240,
"metadata": {
"maximum_size": "10737418240",
"name": "test"
},
"primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
}
]
[thbeh@ceph2 ceph-ansible]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/state/nodes | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 225 0 225 0 0 710 0 --:--:-- --:--:-- --:--:-- 712
[
{
"host": "192.168.20.14",
"uuid": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
},
{
"host": "192.168.20.13",
"uuid": "db84d992-5c81-418a-8aa3-29ce0b6aefcf"
},
{
"host": "192.168.20.15",
"uuid": "25ba98c3-f4ed-4956-91b2-6ec2a3688b62"
}
]
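The three responses above can be cross-referenced to see which host each dataset is configured to live on and what the cluster state actually reports for it. This is a minimal sketch using only values copied from the output above. The function name locate_datasets is made up for illustration. Whether a mounted dataset would report extra state fields, such as a mount path, is an assumption about the Flocker API that this thread does not confirm.

```python
def locate_datasets(configured, state, nodes):
    """Join configured datasets with node hosts and the fields reported in state."""
    hosts = {node["uuid"]: node["host"] for node in nodes}
    reported = {entry["dataset_id"]: entry for entry in state}
    rows = []
    for dataset in configured:
        entry = reported.get(dataset["dataset_id"], {})
        rows.append({
            "name": dataset.get("metadata", {}).get("name"),
            "dataset_id": dataset["dataset_id"],
            "primary_host": hosts.get(dataset.get("primary")),
            "state_fields": sorted(entry),
        })
    return rows


if __name__ == "__main__":
    # Values copied from the curl output above, trimmed to the fields used here.
    configured = [
        {"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
         "metadata": {"name": "test1"},
         "primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"},
        {"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
         "metadata": {"name": "test"},
         "primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"},
    ]
    state = [
        {"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
         "maximum_size": 10737418240},
        {"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
         "maximum_size": 10737418240},
    ]
    nodes = [
        {"host": "192.168.20.14", "uuid": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"},
        {"host": "192.168.20.13", "uuid": "db84d992-5c81-418a-8aa3-29ce0b6aefcf"},
        {"host": "192.168.20.15", "uuid": "25ba98c3-f4ed-4956-91b2-6ec2a3688b62"},
    ]
    for row in locate_datasets(configured, state, nodes):
        print(row)
```

With this data both datasets resolve to primary host 192.168.20.14, while the state entries carry only dataset_id and maximum_size, so the dataset agent log on that host is probably the next place to look.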
This issue is being fixed here: https://github.com/ClusterHQ/ansible-role-flocker/pull/3/files