
flocker-ceph-vagrant's People

Contributors

sarum90, wallnerryan


flocker-ceph-vagrant's Issues

Deployment does not work with latest ceph-ansible (and how to fix it)

Due to a recent incompatibility between ceph-ansible and the latest Debian/Ubuntu Ceph release (see ceph/ceph-ansible#788), the deployment only works if you apply the following change: ceph/ceph-ansible@85fb03f

Without this fix, the deployment fails with the following errors:

TASK [ceph.ceph-common : configure cluster name] *******************************
fatal: [ceph2]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpM8FvRb to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
fatal: [ceph3]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpdhnk_5 to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
fatal: [ceph4]: FAILED! => {"changed": false, "failed": true, "msg": "Could not replace file: /tmp/tmpFoO0Vc to /etc/default/ceph/ceph: [Errno 20] Not a directory"}
...

PLAY RECAP *********************************************************************
ceph2                      : ok=51   changed=7    unreachable=0    failed=1   
ceph3                      : ok=50   changed=7    unreachable=0    failed=1   
ceph4                      : ok=50   changed=7    unreachable=0    failed=1   
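The `[Errno 20] Not a directory` message is the operating system's ENOTDIR error: some component of the destination path exists but is not a directory (here, most plausibly `/etc/default/ceph` being a regular file). A minimal Python sketch for checking this on an affected node follows. The helper name `first_non_directory` is made up for illustration and is not part of Ansible or ceph-ansible.

```python
import os


def first_non_directory(path):
    """Return the first parent component of `path` that exists but is
    not a directory, or None if every existing parent is a directory."""
    parent = os.path.dirname(os.path.abspath(path))
    current = os.sep
    for part in parent.strip(os.sep).split(os.sep):
        current = os.path.join(current, part)
        if os.path.exists(current) and not os.path.isdir(current):
            return current
    return None


if __name__ == "__main__":
    # Destination path copied from the Ansible error message above.
    print(first_non_directory("/etc/default/ceph/ceph"))
```

If this prints a path rather than `None`, that path is the file standing where a directory is expected.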

`docker volume create -d flocker` fails

After

  • running the full deployment (as documented in the README and Tutorial), with two changes to make it work:
    • the change from #4
    • the change referenced in #5
  • and verifying that ceph -s reports HEALTH_OK,

I cannot get a volume created with the flocker driver.

As proposed in the tutorial, I ran:
vagrant ssh ceph3 -c 'sudo docker volume create -d flocker --name test2g -o size=2G'
and waited for some time, but no flocker volume was created: vagrant ssh ceph3 -c "sudo df -h | grep flocker" returns nothing (I double-checked without the grep).

I checked the logs (and googled a bit for similar issues) but cannot see what's going wrong.

vagrant@ceph2:/var/log/flocker$ ls -lart
...
-rw-r--r--  1 root root    2323 May 23 12:56 flocker-docker-plugin.log
-rw-r--r--  1 root root   15492 May 23 12:57 flocker-container-agent.log
-rw-r--r--  1 root root   90136 May 23 12:58 flocker-dataset-agent.log

I found no errors in the first two log files, but many exceptions in the dataset agent log. The same two or three exceptions are repeated; it looks like flocker keeps retrying and always fails.
Here is an extract containing two repetitions of that group of exceptions, plus the surrounding log lines:

{"exception": "subprocess.CalledProcessError", "task_level": [2, 2, 3], "action_type": "flocker:agent:discovery", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009448.802584, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "failed"}
{"exception": "subprocess.CalledProcessError", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009448.804318, "traceback": "Traceback: <class 'subprocess.CalledProcessError'>: Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_logging.py:102:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:534:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:592:output\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/_loop.py:529:output_CONVERGE\n--- <exception caught here> ---\n/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py:150:maybeDeferred\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1798:discover_state\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1345:_count_calls\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1744:_discover_raw_state\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:259:list_volumes\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:121:_list_maps\n/usr/lib/python2.7/subprocess.py:573:check_output\n", "message_type": "eliot:traceback", "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "task_level": [2, 3]}
{"task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "log_level": "INFO", "timestamp": 1464009448.804492, "message": "Intentionally delaying the next iteration of the convergence loop to avoid RequestLimitExceeded.", "message_type": "flocker:node:_loop:delay", "current_wait": 10, "task_level": [2, 4]}
{"delay": 10, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "log_level": "INFO", "timestamp": 1464009448.804658, "message": "Delaying until next convergence loop.", "message_type": "flocker:node:_loop:CONVERGE:delay", "task_level": [2, 5]}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=SLEEP>", "timestamp": 1464009448.804929, "fsm_rich_input": "<_Sleep>", "action_status": "started", "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=CONVERGING>", "task_level": [2, 6, 1]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [2, 6, 2], "action_type": "fsm:transition", "timestamp": 1464009448.805181, "fsm_output": ["<ConvergenceLoopOutputs=SCHEDULE_WAKEUP>"], "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "succeeded"}
{"timestamp": 1464009448.805348, "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_type": "flocker:agent:converge", "action_status": "succeeded", "task_level": [2, 7]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [3], "action_type": "fsm:transition", "timestamp": 1464009448.805503, "fsm_output": ["<ConvergenceLoopOutputs=CLEAR_WAKEUP>", "<ConvergenceLoopOutputs=CONVERGE>"], "task_uuid": "ed2347d1-f4d4-45ab-ae2c-17bc61385a96", "action_status": "succeeded"}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=WAKEUP>", "timestamp": 1464009458.810326, "fsm_rich_input": null, "action_status": "started", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [1]}
{"timestamp": 1464009458.811234, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:converge", "action_status": "started", "task_level": [2, 1]}
{"timestamp": 1464009458.811765, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:discovery", "action_status": "started", "task_level": [2, 2, 1]}
{"count": 134, "function": "_discover_raw_state", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "timestamp": 1464009458.812132, "message_type": "flocker:node:agents:blockdevice:list_volumes", "task_level": [2, 2, 2]}
{"exception": "subprocess.CalledProcessError", "task_level": [2, 2, 3], "action_type": "flocker:agent:discovery", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009458.840989, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "failed"}
{"exception": "subprocess.CalledProcessError", "reason": "Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1", "timestamp": 1464009458.841359, "traceback": "Traceback: <class 'subprocess.CalledProcessError'>: Command '['rbd', '-p', 'rbd', 'showmapped']' returned non-zero exit status 1\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_logging.py:102:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:534:receive\n/opt/flocker/local/lib/python2.7/site-packages/machinist/_fsm.py:592:output\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/_loop.py:529:output_CONVERGE\n--- <exception caught here> ---\n/opt/flocker/local/lib/python2.7/site-packages/twisted/internet/defer.py:150:maybeDeferred\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1798:discover_state\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1345:_count_calls\n/opt/flocker/local/lib/python2.7/site-packages/flocker/node/agents/blockdevice.py:1744:_discover_raw_state\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:259:list_volumes\n/opt/flocker/local/lib/python2.7/site-packages/ceph_flocker_driver/ceph_rbd.py:121:_list_maps\n/usr/lib/python2.7/subprocess.py:573:check_output\n", "message_type": "eliot:traceback", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "task_level": [2, 3]}
{"task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "log_level": "INFO", "timestamp": 1464009458.841584, "message": "Intentionally delaying the next iteration of the convergence loop to avoid RequestLimitExceeded.", "message_type": "flocker:node:_loop:delay", "current_wait": 10, "task_level": [2, 4]}
{"delay": 10, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "log_level": "INFO", "timestamp": 1464009458.84175, "message": "Delaying until next convergence loop.", "message_type": "flocker:node:_loop:CONVERGE:delay", "task_level": [2, 5]}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fae760e9c50>", "fsm_input": "<ConvergenceLoopInputs=SLEEP>", "timestamp": 1464009458.842023, "fsm_rich_input": "<_Sleep>", "action_status": "started", "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=CONVERGING>", "task_level": [2, 6, 1]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [2, 6, 2], "action_type": "fsm:transition", "timestamp": 1464009458.842266, "fsm_output": ["<ConvergenceLoopOutputs=SCHEDULE_WAKEUP>"], "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "succeeded"}
{"timestamp": 1464009458.842427, "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_type": "flocker:agent:converge", "action_status": "succeeded", "task_level": [2, 7]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [3], "action_type": "fsm:transition", "timestamp": 1464009458.84258, "fsm_output": ["<ConvergenceLoopOutputs=CLEAR_WAKEUP>", "<ConvergenceLoopOutputs=CONVERGE>"], "task_uuid": "de55d127-f2e6-4e5c-b752-26859e1b5431", "action_status": "succeeded"}
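Since each log line above is a standalone JSON object, the repeated failures can be condensed with a short script. The following is a minimal Python sketch, not a Flocker tool: the function name `summarize_failures` is hypothetical, and it relies only on the `action_status`, `action_type` and `reason` fields visible in the extract.

```python
import json
import sys
from collections import Counter


def summarize_failures(lines):
    """Count (action_type, reason) pairs for records whose
    action_status is "failed" in JSON-lines log text."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if record.get("action_status") == "failed":
            counts[(record.get("action_type"), record.get("reason"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example argument (path taken from the directory listing above):
    #   /var/log/flocker/flocker-dataset-agent.log
    with open(sys.argv[1]) as handle:
        for (action, reason), n in summarize_failures(handle).most_common():
            print(n, action, reason)
```

For the extract above, this would reduce the output to a single repeated failure: the `flocker:agent:discovery` action failing because `rbd -p rbd showmapped` exits with status 1.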

Where else should I look to understand what is broken?
Are there any other logs to check?
What logs should I provide?
I can provide the tar archive produced by sudo flocker-diagnostics.
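One further place to look: the traceback shows the driver calling `subprocess.check_output`, and the resulting log entry records only the exit status of `rbd -p rbd showmapped`, not what the command wrote to stderr. Running the same command by hand on the node and capturing stderr should show the underlying message. A minimal Python sketch, assuming it is run as root on the affected node; `run_and_capture` is a hypothetical helper, not part of Flocker or the Ceph driver.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run `command` (a list of arguments) and return
    (exit_code, stdout, stderr), with the output decoded as text."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


if __name__ == "__main__":
    try:
        # The exact command reported in the dataset agent log.
        code, out, err = run_and_capture(["rbd", "-p", "rbd", "showmapped"])
    except OSError as error:
        sys.stderr.write("could not start rbd: %s\n" % error)
    else:
        print("exit status:", code)
        print("stdout:", out)
        print("stderr:", err)
```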

Some information about my dev environment:
Mac OS X 10.9.5 (yes, I know I finally need to upgrade) with 16 GB RAM
Virtualbox v5.0.14
Vagrant v1.8.1
ansible 2.0.1

docker run with flocker failed

Hi, I have an issue similar to #6. In my case I am able to create the volume, but running a container that uses it fails. (I am using bare metal instead of Vagrant, with the same number of nodes.) The command:
ssh ceph3 "sudo docker run --volume-driver flocker \
-v test:/data --name test-container -itd busybox"

fails with:
docker: Error response from daemon: Timed out waiting for dataset to mount..

All API endpoints return results:
[thbeh@ceph3 ~]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/state/datasets | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 170 0 170 0 0 982 0 --:--:-- --:--:-- --:--:-- 988
[
    {
        "dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
        "maximum_size": 10737418240
    },
    {
        "dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
        "maximum_size": 10737418240
    }
]

[thbeh@ceph3 ~]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/configuration/datasets | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 431 0 431 0 0 3970 0 --:--:-- --:--:-- --:--:-- 3954
[
    {
        "dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
        "deleted": false,
        "maximum_size": 10737418240,
        "metadata": {
            "maximum_size": "10737418240",
            "name": "test1"
        },
        "primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
    },
    {
        "dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
        "deleted": false,
        "maximum_size": 10737418240,
        "metadata": {
            "maximum_size": "10737418240",
            "name": "test"
        },
        "primary": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
    }
]

[thbeh@ceph2 ceph-ansible]$ ssh ceph1 "sudo curl --cacert /etc/flocker/cluster.crt \
--cert /etc/flocker/plugin.crt \
--key /etc/flocker/plugin.key \
--header 'Content-type: application/json' \
https://ceph1:4523/v1/state/nodes | python -m json.tool"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 225 0 225 0 0 710 0 --:--:-- --:--:-- --:--:-- 712
[
    {
        "host": "192.168.20.14",
        "uuid": "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
    },
    {
        "host": "192.168.20.13",
        "uuid": "db84d992-5c81-418a-8aa3-29ce0b6aefcf"
    },
    {
        "host": "192.168.20.15",
        "uuid": "25ba98c3-f4ed-4956-91b2-6ec2a3688b62"
    }
]
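The three responses above can be cross-checked against each other. The sketch below lists every configured dataset whose state entry has no `path` field, together with the host of its configured primary node. It rests on an assumption that should be verified against the Flocker REST API documentation: that a dataset mounted on a node is reported by `/v1/state/datasets` with a `path` (and `primary`) field. The function name `unmounted_datasets` is hypothetical.

```python
def unmounted_datasets(configured, state, nodes):
    """Return (name, primary_host) for each configured, non-deleted
    dataset whose state entry lacks a "path" field.

    Assumption: a mounted dataset carries "path" in /v1/state/datasets.
    """
    hosts = {node["uuid"]: node["host"] for node in nodes}
    state_by_id = {entry["dataset_id"]: entry for entry in state}
    report = []
    for dataset in configured:
        if dataset.get("deleted"):
            continue
        observed = state_by_id.get(dataset["dataset_id"], {})
        if "path" not in observed:
            name = dataset.get("metadata", {}).get("name")
            report.append((name, hosts.get(dataset.get("primary"))))
    return report


if __name__ == "__main__":
    # Data abbreviated from the API responses above.
    primary = "1333fb7a-7d92-4d52-a78e-1739acc34bd4"
    configured = [
        {"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f",
         "deleted": False, "metadata": {"name": "test1"}, "primary": primary},
        {"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613",
         "deleted": False, "metadata": {"name": "test"}, "primary": primary},
    ]
    state = [
        {"dataset_id": "9a7e11c6-bcdd-48ad-8909-d6f3e06acd5f"},
        {"dataset_id": "031f903e-feee-46a7-94e6-156d6c650613"},
    ]
    nodes = [{"host": "192.168.20.14", "uuid": primary}]
    for name, host in unmounted_datasets(configured, state, nodes):
        print(name, "configured on", host, "but no mount path in state")
```

Under that assumption, both `test1` and `test` are configured on 192.168.20.14 but not reported as mounted anywhere, which would be consistent with the mount timeout.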
