Comments (14)

andybouts commented on July 4, 2024

FYI, my CAS cluster would not start because of this ...

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ kk describe pod sas-cas-server-default-controller 
<<<>>> 
Events:
  Type     Reason       Age                     From                                  Message
  ----     ------       ----                    ----                                  -------
  Warning  FailedMount  3m40s (x156 over 109m)  kubelet, aks-cas-20262631-vmss000000  (combined from similar events): MountVolume.SetUp failed for volume "nfs-homes" : mount failed: exit status 32

from viya4-deployment.

thpang commented on July 4, 2024

Did you verify that the directories and mount points were created correctly on the NFS/Jump servers? You can create the locations; however, you need to ensure that the permissions are correct. On the Jump server you'll find the /viya-share directory not the /export directory. That directory is created on the NFS server.
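
A minimal sketch of that check, assuming the four sub-directory names that appear later in this thread (astores, bin, data, homes). The function name `verify_share_dirs` is hypothetical and not part of viya4-deployment; it only reports what it finds and changes nothing.

```shell
#!/usr/bin/env bash
# Report whether the expected sub-directories exist under a base path and
# show their mode, owner, and group so they can be compared by eye.
# The sub-directory names are taken from this thread, not from the project docs.
verify_share_dirs() {
  local base="$1" missing=0 name
  for name in astores bin data homes; do
    if [ -d "$base/$name" ]; then
      # ls -ld prints mode, owner, and group for the directory itself
      echo "ok      $(ls -ld "$base/$name")"
    else
      echo "missing $base/$name"
      missing=1
    fi
  done
  # Non-zero exit status when at least one directory is absent
  return "$missing"
}
```

For example, `verify_share_dirs /viya-share/pvs/mtes-tt` on the Jump server lists each directory or flags it as missing.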

andybouts commented on July 4, 2024

It seems that Ansible on the jump-vm may not have been able to SSH back to itself to create the directories for the mounts. Still investigating.

andybouts commented on July 4, 2024

Since I am running the deployment through the Docker command instead of invoking Ansible directly, Ansible is not installed on the host; it only runs inside the deployment container. I am struggling to figure out how to configure that containerized Ansible to SSH back to the jump host. Maybe sleep and coffee will help.

andybouts commented on July 4, 2024

I modified the Ansible configuration file at ~/viya4-deployment/ansible.cfg and uncommented the line so that the Ansible logs would be written out to ./ansible.log, but that did not work. I suspect that the Docker container has its own ansible.cfg file which still has the logging commented out. Unless someone has a better idea, I'll retry and redirect the output into a file.

andybouts commented on July 4, 2024

The 4 CAS pods are still not starting up ... I am going to try a clean / fresh deployment with different ansible-vars.yaml values for the jump-host.

Yesterday, I re-ran the deployment from a clean / fresh IAC with the following command, redirecting the output to a file. Note that I also pre-created the directories in question:

########
#deploy everything with Viya
########

# prereqs
sudo mkdir -p /viya-share/pvs/mtes-tt/astores
sudo mkdir -p /viya-share/pvs/mtes-tt/bin
sudo mkdir -p /viya-share/pvs/mtes-tt/data
sudo mkdir -p /viya-share/pvs/mtes-tt/homes
sudo chmod -R 777 /viya-share/pvs/mtes-tt
sudo chown -R nobody:nogroup /viya-share/pvs/mtes-tt

# deploy it
docker run --rm \
  --group-add root \
  --user $(id -u):$(id -g) \
  --volume $HOME/viya4-deployment:/data \
  --volume $HOME/viya4-deployment/deployments/MTES-TT-cluster/ansible-vars.yaml:/config/config \
  --volume $HOME/viya4-deployment/deployments/MTES-TT-cluster/IAC_files/terraform.tfstate:/config/tfstate \
  --volume $HOME/viya4-deployment/deployments/MTES-TT-cluster/MTES-TT-ns/site-config/sitedefault.yaml:/config/v4_cfg_sitedefault \
  viya4-deployment --tags "baseline,viya,cluster-logging,cluster-monitoring,install" > docker.ansible.log

The expected tasks for the jump-host in the logs are:

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ grep -ir "jump-server" .
<truncated>
./roles/jump-server/tasks/main.yml:- name: jump-server - add host
./roles/jump-server/tasks/main.yml:- name: jump-server - lookup groups
./roles/jump-server/tasks/main.yml:- name: jump-server - create folders
<truncated>

All 3 tasks are missing from the log:

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ grep -i "jump-server" docker.ansible.log
jumpuser@mTES-TT-jump-vm:~/viya4-deployment$
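
The same check can be sketched as a small function that reports each expected task separately. This assumes the `TASK [jump-server : ...]` header format that Ansible prints elsewhere in this thread; the function name `check_jump_tasks` is hypothetical.

```shell
#!/usr/bin/env bash
# Report which of the three jump-server tasks found in roles/jump-server
# appear as TASK headers in a captured Ansible log.
check_jump_tasks() {
  local log="$1" missing=0 task
  for task in \
    "jump-server - add host" \
    "jump-server - lookup groups" \
    "jump-server - create folders"; do
    # -F matches the brackets literally; -- guards against odd patterns
    if grep -qF -- "TASK [jump-server : $task]" "$log"; then
      echo "found   $task"
    else
      echo "missing $task"
      missing=1
    fi
  done
  # Non-zero exit status when at least one task never ran
  return "$missing"
}
```

Running `check_jump_tasks docker.ansible.log` against the log above would report all three tasks as missing.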

I have the following configuration in ansible-vars.yaml. I suppose commenting these out is breaking something, but the value should be parsed from the IAC tfstate, so leaving them commented out seems like a valid configuration:

<truncated>
## Jump Server: https://github.com/sassoftware/viya4-deployment/blob/main/docs/CONFIG-VARS.md#jump-server
# JUMP_SVR_HOST: # automatically parsed and pulled from the IAC tfstate

## NFS
#JUMP_SVR_RWX_FILESTORE_PATH # not used since the IAC created `/viya-share/pvs`
<truncated>

andybouts commented on July 4, 2024

I believe running the deployment from the jump host that the IAC provisions should be supported, and I will keep reviewing how this can be done.

hahewlet commented on July 4, 2024

Did you pass in the ssh key information for your jump box when you ran docker? I've seen this behavior when that information was missing. e.g.
--volume $HOME/.ssh/id_rsa:/config/jump_svr_private_key \
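
As a sketch, here is the docker run command from earlier in this thread with that key mount added. The function only prints the command after checking that the key file is readable; `print_deploy_cmd` is a hypothetical helper, and the paths are the ones already posted above.

```shell
#!/usr/bin/env bash
# Print the deployment command with the jump-server private key mounted.
# Nothing is executed; the key path defaults to ~/.ssh/id_rsa.
print_deploy_cmd() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  local base="$HOME/viya4-deployment"
  local cfg="$base/deployments/MTES-TT-cluster"

  # Fail early: a missing key would otherwise surface much later
  if [ ! -r "$key" ]; then
    echo "private key not readable: $key" >&2
    return 1
  fi

  cat <<EOF
docker run --rm \\
  --group-add root \\
  --user $(id -u):$(id -g) \\
  --volume $base:/data \\
  --volume $cfg/ansible-vars.yaml:/config/config \\
  --volume $cfg/IAC_files/terraform.tfstate:/config/tfstate \\
  --volume $cfg/MTES-TT-ns/site-config/sitedefault.yaml:/config/v4_cfg_sitedefault \\
  --volume $key:/config/jump_svr_private_key \\
  viya4-deployment --tags "baseline,viya,cluster-logging,cluster-monitoring,install"
EOF
}
```

Calling `print_deploy_cmd` with no argument uses `$HOME/.ssh/id_rsa`; pass another path if the key lives elsewhere.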

andybouts commented on July 4, 2024

Thanks @hahewlet, that makes sense and I will try that next (I was about to build it again).

I am used to telling Ansible which private key to use, but could not find it documented in this project how to pass the key information.

andybouts commented on July 4, 2024

This looks promising, it's running some of the jump-server tasks:

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ tail -f docker.ansible.log
Wednesday 05 May 2021  18:03:58 +0000 (0:00:00.038)       0:00:02.874 *********

TASK [common : Set DEPLOY_DIR] *************************************************
ok: [localhost]
Wednesday 05 May 2021  18:03:58 +0000 (0:00:00.041)       0:00:02.916 *********
Wednesday 05 May 2021  18:03:58 +0000 (0:00:00.071)       0:00:02.987 *********

TASK [jump-server : jump-server - add host] ************************************
changed: [localhost]
Wednesday 05 May 2021  18:03:58 +0000 (0:00:00.062)       0:00:03.049 *********

thpang commented on July 4, 2024

One thing you have here is a chicken and egg scenario. The Jump server does not exist until after the IAC code base has completed. If you're using the Jump server as your box for Deployment why would you not use the box used to create the IAC in the first place? And if you're using a 3rd VM you've stood up in the cloud provider that would be the CIDR value you'd add to the cidr block in your tfvars file. Asking for clarity.

andybouts commented on July 4, 2024

I believe this may be resolved, but it should stay open a tad longer while I verify.

In the event that another admin does not have all the prerequisites (e.g. Docker) on the host that they run the IAC from, and that admin performs the remaining deployment steps from the newly created jump server, there are two things you need to account for:

  1. You will need to add a network security rule allowing the jump server's public IP to SSH back to itself (you can do this programmatically with the Az CLI, with TF, or manually in the portal).

  2. You need to copy (scp / SFTP / etc.) the private key over to the jump server and then add the line @hahewlet noted above to the Docker command for the deployment.
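
The two steps above can be sketched with the Az CLI and scp as follows. This is an assumption-laden outline, not the commands actually used here: the rule name and priority are placeholders, and the resource group, NSG name, user, and IP address must come from your own environment. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Dry-run by default: print each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: allow the jump server's public IP to reach its own SSH port.
# Rule name and priority are placeholders chosen for this sketch.
allow_jump_ssh_to_self() {
  local rg="$1" nsg="$2" jump_ip="$3"
  run az network nsg rule create \
    --resource-group "$rg" \
    --nsg-name "$nsg" \
    --name allow-jump-ssh-self \
    --priority 200 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$jump_ip/32" \
    --destination-port-ranges 22
}

# Step 2: copy the private key to the jump server (-p keeps its file mode).
copy_key_to_jump() {
  local key="$1" user="$2" jump_ip="$3"
  run scp -p "$key" "$user@$jump_ip:.ssh/id_rsa"
}
```

For example, `allow_jump_ssh_to_self my-rg my-nsg 203.0.113.10` followed by `copy_key_to_jump ~/.ssh/id_rsa jumpuser 203.0.113.10`, where every argument is a stand-in value.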

With these two things in place, I seem to have the directory structure that Viya needs to start up correctly:

# for example:
jumpuser@mTES-TT-jump-vm:/viya-share/mtes-tt$ ll
total 24
drwxrwxrwx 6 nobody nogroup 4096 May  5 19:57 ./
drwxrwxrwx 5 nobody nogroup 4096 May  5 19:57 ../
drwxrwxrwx 2 nobody nogroup 4096 May  5 19:57 astores/
drwxrwxrwx 2 nobody nogroup 4096 May  5 19:57 bin/
drwxrwxrwx 2 nobody nogroup 4096 May  5 19:57 data/
drwxrwxrwx 2 nobody nogroup 4096 May  5 19:57 homes/

andybouts commented on July 4, 2024

Also, confirmation from the Docker / Ansible log:

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ grep jump-server docker.ansible.log
TASK [jump-server : jump-server - add host] ************************************
TASK [jump-server : jump-server - lookup groups] *******************************
TASK [jump-server : jumps-server - group nogroup] ******************************
TASK [jump-server : jump-server - create folders] ******************************
jump-server : jump-server - lookup groups ------------------------------- 0.97s
jump-server : jump-server - create folders ------------------------------ 0.83s
jumpuser@mTES-TT-jump-vm:~/viya4-deployment$

andybouts commented on July 4, 2024

The CAS pods are started and healthy. Closing this issue for today; I can revisit tomorrow if something else appears awry:

jumpuser@mTES-TT-jump-vm:~/viya4-deployment$ kk get pods | grep cas-s
sas-cas-server-default-controller                               3/3     Running                 0          35m
sas-cas-server-default-worker-0                                 3/3     Running                 0          35m
sas-cas-server-default-worker-1                                 3/3     Running                 0          35m
sas-cas-server-default-worker-2                                 3/3     Running                 0          35m
jumpuser@mTES-TT-jump-vm:~/viya4-deployment$
