Comments (56)

rkomandu avatar rkomandu commented on September 27, 2024 1

Yes, it is reported exactly like that.

I was deleting the 2k accounts in a loop with a 10-second sleep interval and observed the "Failed to execute command" error multiple times:


Account s3user-12394 deleted successfully
Account s3user-12395 deleted successfully
Account s3user-12396 deleted successfully
Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request
Account s3user-12402 deleted successfully
Account s3user-12403 deleted successfully


Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request

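For reference, a minimal sketch of the kind of deletion loop used here (the account range and the exact invocation below are assumptions based on the description above, not the original script):

#!/bin/sh
# Hypothetical reconstruction: delete the ~2k accounts one by one with a 10-second pause.
for i in `seq 12000 13999`; do
    mms3 account delete s3user-$i
    sleep 10
done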
Now let us look at noobaa.log for the 12397 and 12399 deletions; for 12398, the failure appears as an "NSFS Manage command" error.

In noobaa_event.log:

{"timestamp":"2024-03-22T00:26:20.271279-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Accou
nt deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3us
er-12397\"},\"pid\":3271588}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[3271588]:","source":"node"}

Now, if we check around the above timestamp in the noobaa.log excerpt below for the s3user-12397, s3user-12398 and s3user-12399 account deletions:


2024-03-22T00:26:20.258310-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:20.259054-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.259709-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs: true
2024-03-22T00:26:20.259808-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs= true
2024-03-22T00:26:20.260664-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 20 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json File {} 21
2024-03-22T00:26:20.264526-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.464058-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:20.464601-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
...
...
2024-03-22T00:26:20.867639-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867906-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867943-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867968-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:20.956759-07:00 rkomandu-ip-cls-x-worker1 node[3271531]: [/3271531]   [L1] core.cmd.health:: Error: Config file path should be a valid path /mnt/ces-shared-root/ces/s3-config/buckets/bucket-12890.json [Error: No such file or directory] { code: 'ENOENT' }
2024-03-22T00:26:21.309702-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948]   [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.310118-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310274-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310301-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310324-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:21.493171-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959]   [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.493778-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0

..

2024-03-22T00:26:28.675289-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]  [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-03-22T00:26:28.675459-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]  [LOG] CONSOLE:: read_rand_seed: closing fd ...
2024-03-22T00:26:28.675954-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]  [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-03-22T00:26:28.677312-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]   [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/buckets
2024-03-22T00:26:28.678024-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]   [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:28.678507-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]   [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/access_keys
**2024-03-22T00:26:28.944651-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188]   [L1] core.cmd.manage_nsfs:: NSFS Manage command: exit on error Error: No such file or directory**
2024-03-22T00:26:30.464455-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:30.464796-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0



2024-03-22T00:26:37.530367-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]  [LOG] CONSOLE:: generate_entropy: adding entropy: dd if=/dev/vda bs=1048576 count=32 skip=255731 | md5sum
2024-03-22T00:26:37.538563-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:37.539654-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:37.540691-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs: true
2024-03-22T00:26:37.540797-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs= true
2024-03-22T00:26:37.541637-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 21 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json File {} 23
2024-03-22T00:26:37.552608-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625]   [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:40.464133-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:40.464511-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464650-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464676-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940]   [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464700-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/30629

whereas the account is still there:

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12398
s3user-12398            /mnt/fs1/s3user-12398-dir                       12398   12398

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12399

[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config
s3-config/                s3-config-backup.tar.bz2
[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
-rw------- 1 root root 376 Mar 21 09:12 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# less /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# cat /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
{"name":"s3user-12398","creation_date":"2024-03-21T16:12:04.264Z","access_keys":[{"access_key":"EdOlObT16fevKyqcDIY0","secret_key":"73Iako95fyfdaMvFySHxq//BfimhW3+D6LLG+4rv"}],"nsfs_account_config":{"uid":12398,"gid":12398,"new_buckets_path":"/mnt/fs1/s3user-12398-dir","fs_backend":"GPFS"},"email":"s3user-12398","allow_bucket_creation":true,"_id":"65fc5c54a19a460e1e094ab4"}

noobaa.log is about 700 MB:

[root@rkomandu-ip-cls-x-worker1 log]# ls -lh /var/log/noobaa.log
-rw-r--r-- 1 root root 733M Mar 22 01:37 /var/log/noobaa.log

For now, you can check the logs above, @naveenpaul1. Uploading to GitHub is not possible, and Box has the same restriction. I need to delete the older noobaa.log content and keep only the current day to see if that reduces the size. However, I think you can continue from the log snippets above.
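If it helps, a minimal sketch for trimming the log to the current day before sharing (assuming the timestamp prefix format shown in the snippets above; adjust the date as needed):

grep '^2024-03-22' /var/log/noobaa.log > /tmp/noobaa-2024-03-22.log
gzip /tmp/noobaa-2024-03-22.log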


shirady avatar shirady commented on September 27, 2024 1

Thank you @rkomandu. From the attached error we can understand that the reason for the internal error is related to encryption (the master key id is missing in master_keys_by_id). The full edited message:

"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
async Promise.all (index 9)
async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"

This internal error printing came from the account list.

I would verify that when manually deleting the account we see the same details in the error (please run account delete on that account).
Please attach the output if you can, so we have the full information.

cc: @romayalon


romayalon avatar romayalon commented on September 27, 2024 1

@rkomandu @shirady I posted a fix for the encryption issue yesterday; according to @ramya-c3 it is working now. @rkomandu, can you try to reproduce with the new code and share whether you still see the original internal error?


rkomandu avatar rkomandu commented on September 27, 2024 1

@shirady that is what is mentioned in the Slack thread, with the noobaa-cli status and list output shown above.


naveenpaul1 avatar naveenpaul1 commented on September 27, 2024

@rkomandu Please run the test after updating the log level, and share the actual error log.
You can search for the string NSFS Manage command: exit on error.
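For example, something like this should locate the relevant failures (a sketch, assuming the default log path shown earlier):

grep -n 'NSFS Manage command: exit on error' /var/log/noobaa.log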


shirady avatar shirady commented on September 27, 2024

Hi @rkomandu,

  1. From what I understand you are creating accounts in a loop and then deleting the accounts in a loop.
    In the details of the issue we can see the example of the account s3user-12398 that was not deleted:
    The server encountered an internal error. Please retry the request
    Since it was not deleted, its config file is still present.

  2. I would like the full printout of the InternalError to be added to this issue; it might look like this:
    stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":

  3. I was not able to reproduce this; I created and deleted over 1,000 accounts with a sleep of 1,000 milliseconds.
    You can see that they were all created (and the accounts config directory is empty of any config after this).

➜  grep -i AccountCreated run_test_2048.txt | wc -l
    2048
➜  grep -i AccountDeleted run_test_2048.txt | wc -l
    2048


rkomandu avatar rkomandu commented on September 27, 2024

It is a recent noobaa 5.15.3 d/s build from 0514.


[root@c83f1-app2 ~]# mms3 account delete s3user5001
Failed to execute command for Account s3user5001: The server encountered an internal error. Please retry the request
[root@c83f1-app2 ~]# mms3 account list s3user5001
Failed to list Account s3user5001: Account does not exist
[root@c83f1-app2 ~]# rpm -qa |grep mms3
gpfs.mms3-5.2.1-0.240521.104312.el9.x86_64

[root@c83f1-app2 ~]# mmhealth node show CES S3 |grep s3user5001
S3                 DEGRADED      16 hours ago      s3_access_denied(s3user5001), s3_storage_not_exist(newbucket-5001)
  s3user5001       DEGRADED      16 hours ago      s3_access_denied(s3user5001)
s3_access_denied      s3user5001      WARNING     16 hours ago         Account does not have access to the storage path mentioned in schema.

mmhealth, though, shows that the account still exists.


rkomandu avatar rkomandu commented on September 27, 2024

@shirady

I am deleting all the accounts and buckets via the mms3 CLI and came across the same problem as shown above.

The bucket delete passed, but the account delete failed with an internal error:

Bucket bucket-10335 deleted successfully
Failed to execute command for Account s3user-10335: The server encountered an internal error. Please retry the request
10335 is done

The noobaa-cli status and list output for the account is shown below:

noobaa-cli account status --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:13.218253 [PID-3055247/TID-3055247] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:13.218355 [PID-3055247/TID-3055247] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:13.548 [/3055247]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055247) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:13.579 [/3055247]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
  "response": {
    "code": "AccountStatus",
    "reply": {
      "_id": "664e0aa633201f0668d9ac6b",
      "name": "s3user-10335",
      "email": "s3user-10335",
      "creation_date": "2024-05-22T15:09:26.881Z",
      "nsfs_account_config": {
        "uid": 10335,
        "gid": 10335,
        "new_buckets_path": "/gpfs/remote_fvt_fs/s3user-10335-dir",
        "fs_backend": "GPFS"
      },
      "allow_bucket_creation": true,
      "master_key_id": "664e04d7f8bda475703e483b"
    }
  }
}

]# noobaa-cli account list --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:23.493482 [PID-3055620/TID-3055620] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:23.493593 [PID-3055620/TID-3055620] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:23.825 [/3055620]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055620) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:23.856 [/3055620]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:23.862 [/3055620]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 8:55:24.452 [/3055620]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=12
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58\n    at async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)\n    at async Promise.all (index 9)\n    at async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"
  }
}


rkomandu avatar rkomandu commented on September 27, 2024
# noobaa-cli account delete --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 10:19:24.584673 [PID-3183486/TID-3183486] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 10:19:24.584759 [PID-3183486/TID-3183486] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 10:19:24.904 [/3183486]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3183486) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 10:19:24.933 [/3183486]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 10:19:25.531 [/3183486]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=15
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async fetch_existing_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:416:30)\n    at async fetch_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:385:16)\n    at async account_management (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:336:18)\n    at async main (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:112:13)"
  }
}

In fact, I would say that all account deletes have the same problem. This is a blocker and a high priority for us now.



rkomandu avatar rkomandu commented on September 27, 2024

Every account delete has an issue now

Bucket bucket-10001 deleted successfully
Failed to execute command for Account s3user-10001: The server encountered an internal error. Please retry the request
10001 is done
Bucket bucket-10002 deleted successfully
Failed to execute command for Account s3user-10002: The server encountered an internal error. Please retry the request
10002 is done
Bucket bucket-10003 deleted successfully
Failed to execute command for Account s3user-10003: The server encountered an internal error. Please retry the request
10003 is done
Bucket bucket-10004 deleted successfully
Failed to execute command for Account s3user-10004: The server encountered an internal error. Please retry the request
10004 is done
Bucket bucket-10005 deleted successfully
Failed to execute command for Account s3user-10005: The server encountered an internal error. Please retry the request
10005 is done
Bucket bucket-10006 deleted successfully
Failed to execute command for Account s3user-10006: The server encountered an internal error. Please retry the request
10006 is done
Bucket bucket-10007 deleted successfully
Failed to execute command for Account s3user-10007: The server encountered an internal error. Please retry the request
10007 is done
Bucket bucket-10008 deleted successfully
Failed to execute command for Account s3user-10008: The server encountered an internal error. Please retry the request
10008 is done
Bucket bucket-10009 deleted successfully
Failed to execute command for Account s3user-10009: The server encountered an internal error. Please retry the request
10009 is done
Bucket bucket-10010 deleted successfully
Failed to execute command for Account s3user-10010: The server encountered an internal error. Please retry the request
10010 is done
Bucket bucket-10011 deleted successfully
Failed to execute command for Account s3user-10011: The server encountered an internal error. Please retry the request
10011 is done
Bucket bucket-10012 deleted successfully
Failed to execute command for Account s3user-10012: The server encountered an internal error. Please retry the request
10012 is done


rkomandu avatar rkomandu commented on September 27, 2024

Thank you @rkomandu, so we can understand from the error attached that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id), full edited message:

"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
async Promise.all (index 9)
async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"

This internal error printing came from the account list.

I would verify that when manually deleting the account we see the same details in the error (please run account delete on the account that account). If you can attach this please so we will have the full information.

cc: @romayalon

@shirady , this "internal server" is the latest problem w/r/t master_key as posted in above comments. However when the defect was opened in Mar 3rd week, there is no Enc enabled in d/s ODF 4.15.0. At that time the error is same as now, but don't have the noobaa cli command output


rkomandu avatar rkomandu commented on September 27, 2024

This encryption issue is a new problem with ODF 4.15.3; however, the original problem could still be there, as deleting in a loop is what was being done in all of my first few updates.


rkomandu avatar rkomandu commented on September 27, 2024

For the latest problem, @romayalon, it is the master_key problem in the CCR. I have updated the RTC defect https://jazz07.rchland.ibm.com:21443/jazz/web/projects/GPFS#action=com.ibm.team.workitem.viewWorkItem&id=330894 as comment 2.

@shirady
The master_key problem is resolved manually as shown below.

For example, the CCR master_key is:

  1  gpfs.ganesha.statdargs.conf
        1  idmapd.conf
       18  _ces_s3.master_keys

# mmccr fget _ces_s3.master_keys /tmp/_ces_s3.master_keys
fget:18

# cat /tmp/_ces_s3.master_keys
{"timestemp":1716479808994,"active_master_key":"664f6740471f94033f84c97b","master_keys_by_id":{"664f6740471f94033f84c97b":{"id":"664f6740471f94033f84c97b","cipher_key":"7uboJzYGucsCimII30BbFCUdgS3zBv/oobwg9TXG0V8=","cipher_iv":"uipRG4v0jVQg8jnR2pcxsA==","encryption_type":"aes-256-gcm"}}}[root@c83f1-app3 s3]#


 cp /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json /tmp/s3user-10001.json

less /tmp/s3user-10001.json

{"_id":"664dffe41b9d01307a23e986","name":"s3user-10001","email":"s3user-10001","creation_date":"2024-05-22T14:23:32.560Z","access_keys":[{"access_key":"8ca0bWSRFCScwdd4s8JJ","encrypted_secret_key":"k07Bhd5oQ2tphVljeV9Y6pUIjyqQLFoEG9nHqFTWTHmwagrgEh/fBg=="}],"nsfs_account_config":{"uid":10001,"gid":10001,"new_buckets_path":"/gpfs/remote_fvt_fs/s3user-10001-dir","fs_backend":"GPFS"},"allow_bucket_creation":true,"master_key_id":"664dd5d4b9b23ffb8378c6da"}




Edit the s3user-10001.json file and set its master_key_id to the CCR master key from /tmp/_ces_s3.master_keys:

 vi /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json  (update master_key_id to the CCR value)

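The same manual workaround could be scripted roughly like this (a sketch only, assuming jq is available; the paths are the ones shown above):

# Read the active master key id from the CCR copy fetched above.
ACTIVE_KEY=$(jq -r '.active_master_key' /tmp/_ces_s3.master_keys)
# Rewrite the account config so its master_key_id points at the active key.
jq --arg k "$ACTIVE_KEY" '.master_key_id = $k' /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json > /tmp/s3user-10001.json.fixed
cp /tmp/s3user-10001.json.fixed /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json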
 mms3 account delete s3user-10001 --debug
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-10001 2>/dev/null
Execute command failed
info:
{
  "response": {
    "code": "AccountDeleted"
  }
}

error: <nil>
Account s3user-10001 deleted successfully
As shown, the delete command now completes successfully.

We need to get this fixed ASAP; otherwise the 5.15.3-0415 build with encryption enabled is not going to work for the basic account delete functionality.


romayalon avatar romayalon commented on September 27, 2024

@rkomandu
According to the timestamps, this account (creation_date "2024-05-22T14:23:32.560Z") is older than the master key ("timestemp": 1716479808994 = Thursday, May 23, 2024 3:56:48.994 PM). This is not a valid situation: you had an old account in this config_dir, which is why you see the missing master key error.


rkomandu avatar rkomandu commented on September 27, 2024

@romayalon ,
The updates made in the comments referenced below occurred on the 5.15.3 d/s build, where 2K accounts and 2K buckets were created. The master_key_id is different in the *.json files of many accounts; why they differ is not yet clear, and Ramya is analyzing why the master keys are regenerated when we run in a loop.
Bottom line: the accounts cannot be deleted, and this is happening because the master_key value is different.

#7920 (comment)
#7920 (comment)
#7920 (comment)


romayalon avatar romayalon commented on September 27, 2024

@rkomandu @shirady any news about this one?


rkomandu avatar rkomandu commented on September 27, 2024

@romayalon ,

On the physical machine (BM), with the RPM you provided on 30th May, we have recreated the account delete problem: 47 deletes hit the error when executing concurrently from 3 nodes with 1K accounts each.

https://ibmandredhatguests.slack.com/archives/C015Z7SDWQ0/p1717130490257419?thread_ts=1716826449.194409&cid=C015Z7SDWQ0

@Ramya-c has taken this up further since Friday; she discussed it with Guy and is running experiments.

Unless that is sorted out first, this error, even though related, would be masked IMO. If you can understand and address it from the code flow, then that is the next move; otherwise we need to wait and retry when we have extra cycles. The priority is to sort out that problem.


romayalon avatar romayalon commented on September 27, 2024

Thanks @rkomandu for the update. We are waiting for the logs of the concurrency tests; please keep us updated with the information you capture about this issue.


shirady avatar shirady commented on September 27, 2024

Hi @romayalon @rkomandu
I will share that I tried to run a concurrency test using a bash loop on my machine, and I will attach the steps (so you can comment on them).

The steps are described for a few accounts so you can test it, and then change the number of iterations - for example, I changed it from for i in {1501..1503} to for i in {1501..1599}.

Requirements:

  1. (optional) Create the config root (so I'm sure there are no accounts from the previous run):
sudo mkdir -p /tmp/my-config
sudo chmod 777 /tmp/my-config
  2. Create and give permission to the new-buckets-path:
for i in {1501..1503}
do
   mkdir -p /tmp/nsfs_root_s3user/s3user-$i; chmod 777 /tmp/nsfs_root_s3user/s3user-$i;
done

 Steps:
3. Create accounts:

for i in {1501..1503}
do
   sudo node src/cmd/manage_nsfs account add --name s3user-$i --uid $i --gid $i --new_buckets_path /tmp/nsfs_root_s3user/s3user-$i --config_root /tmp/my-config
done

You can check that you can see the accounts config: sudo ls -al /tmp/my-config/accounts

  4. Delete accounts:
for i in {1501..1503}
do
   sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config
done

You can check that no account config remains: sudo ls -al /tmp/my-config/accounts
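Since the failures above were hit when running concurrently from three nodes, a possible variation is to run several delete loops in parallel from one shell (a sketch only; the ranges are illustrative and reuse the same manage_nsfs invocation as above):

for start in 1501 1534 1567; do
  (
    for i in $(seq $start $((start + 32))); do
      sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config
    done
  ) &
done
wait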


romayalon avatar romayalon commented on September 27, 2024

Thanks @shirady for checking it. @rkomandu, please keep us updated on whether the root cause of the reproduction Ramya sees is related to encryption or not (it might be a reproduction of this issue).


rkomandu avatar rkomandu commented on September 27, 2024

@romayalon, as per Ramya's posting in the channel here https://ibmandredhatguests.slack.com/archives/C015Z7SDWQ0/p1717495547911289?thread_ts=1716826449.194409&cid=C015Z7SDWQ0 (from there through the next 4 comments), the noobaa-cli status and list output did not show any master_key problem, as was the case previously (i.e. before the 30th May build). So this delete error is still occurring and can now be related to this defect.


romayalon avatar romayalon commented on September 27, 2024

@rkomandu I agree; we just need to validate it using a print of the full error object/stderr. Thank you.


shirady avatar shirady commented on September 27, 2024

@rkomandu we need the details of the error to investigate.
stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>

Did you try to delete one of the accounts (not in the loop) and see the error?
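One way to keep the full error (the mms3 wrapper shown earlier redirects stderr to /dev/null) is to run the CLI directly and capture everything (a sketch; the account name is illustrative):

noobaa-cli account delete --name s3user-12398 2>&1 | tee /tmp/account-delete-error.json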


ramya-c3 avatar ramya-c3 commented on September 27, 2024

Name New Buckets Path Uid Gid User


s3user-31407 /gpfs/remote_fvt_fs/s3user-31407-dir 31407 31407 None
s3user-31038 /gpfs/remote_fvt_fs/s3user-31038-dir 31038 31038 None
s3user-31484 /gpfs/remote_fvt_fs/s3user-31484-dir 31484 31484 None
s3user-31037 /gpfs/remote_fvt_fs/s3user-31037-dir 31037 31037 None
s3user-32316 /gpfs/remote_fvt_fs/s3user-32316-dir 32316 32316 None
s3user-31402 /gpfs/remote_fvt_fs/s3user-31402-dir 31402 31402 None
s3user-32466 /gpfs/remote_fvt_fs/s3user-32466-dir 32466 32466 None
s3user-32171 /gpfs/remote_fvt_fs/s3user-32171-dir 32171 32171 None
s3user-31320 /gpfs/remote_fvt_fs/s3user-31320-dir 31320 31320 None
s3user-31475 /gpfs/remote_fvt_fs/s3user-31475-dir 31475 31475 None
s3user-70749 /gpfs/remote_fvt_fs/s3user-70749-dir 70749 70749 None
s3user-31576 /gpfs/remote_fvt_fs/s3user-31576-dir 31576 31576 None
s3user-31312 /gpfs/remote_fvt_fs/s3user-31312-dir 31312 31312 None
s3user-32180 /gpfs/remote_fvt_fs/s3user-32180-dir 32180 32180 None
s3user-32111 /gpfs/remote_fvt_fs/s3user-32111-dir 32111 32111 None
s3user-31371 /gpfs/remote_fvt_fs/s3user-31371-dir 31371 31371 None
s3user-32250 /gpfs/remote_fvt_fs/s3user-32250-dir 32250 32250 None
s3user-32003 /gpfs/remote_fvt_fs/s3user-32003-dir 32003 32003 None
s3user-31372 /gpfs/remote_fvt_fs/s3user-31372-dir 31372 31372 None
s3user-31041 /gpfs/remote_fvt_fs/s3user-31041-dir 31041 31041 None
s3user-31040 /gpfs/remote_fvt_fs/s3user-31040-dir 31040 31040 None
s3user-70698 /gpfs/remote_fvt_fs/s3user-70698-dir 70698 70698 None
s3user-32120 /gpfs/remote_fvt_fs/s3user-32120-dir 32120 32120 None
s3user-31207 /gpfs/remote_fvt_fs/s3user-31207-dir 31207 31207 None
[root@c83f1-app2 ~]# noobaa-cli account delete --name s3user-31407
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-06-05 01:07:41.693221 [PID-2987141/TID-2987141] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-06-05 01:07:41.693315 [PID-2987141/TID-2987141] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
Jun-5 1:07:42.024 [/2987141] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:2987141) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use node --trace-warnings ... to show where the warning was created)
Jun-5 1:07:42.054 [/2987141] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: closing fd ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
"response": {
"code": "AccountDeleted"
}
}


shirady avatar shirady commented on September 27, 2024

@ramya-c3 would you please add an explanation?
The list above is accounts that were not deleted in a loop and then you successfully deleted one of them?

Edit:
As I understand from Ramya and the comment above:

  1. They tried to delete the accounts in a loop - the list is of accounts that hit InternalError and were not deleted.
    One of them is s3user-31407.
  2. They tried to delete one of the accounts (not in a loop), s3user-31407 - it was successful:
    noobaa-cli account delete --name s3user-31407
{
  "response": {
    "code": "AccountDeleted"
  }
}
  3. So the next step is to send us the output of the internal error during the loop.
    The structure is mentioned in the comment above:

stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>

@ramya-c3 @rkomandu @romayalon
WDYT?


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, what you mentioned in your comment above as steps 1 and 2 is correct.


shirady avatar shirady commented on September 27, 2024

@rkomandu, as I understand it, in the loop you are using the mms3 command, and you need to change its code in order to see the error details, right?

I would also suggest running the same loop with noobaa-cli and seeing if there are any issues (as mentioned in the comment above, I was not able to reproduce it).


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, the dev team will have to change that in mms3, and only then can this be tried.

As for noobaa-cli, I don't think we will use it here.


romayalon avatar romayalon commented on September 27, 2024

@rkomandu any news?


rkomandu avatar rkomandu commented on September 27, 2024

@romayalon, for mms3 the dev team has to make the code change and try it out.


rkomandu avatar rkomandu commented on September 27, 2024

@romayalon @shirady

Ran with the (-d) option of the mms3 CLI that was introduced in the 521 build and could see the following debug output in the error log.

Around 46 accounts could not be deleted, and all of them show the same error:

Bucket bucket-32277 deleted successfully
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-32277 2>/dev/null
Execute command failed
info:
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: No such file or directory"
  }
}

error: exit status 1
Failed to execute command for Account s3user-32277: The server encountered an internal error. Please retry the request
Error Code: InternalError


We have come across this error on a 1K run concurrently from all 3 nodes.

@ramya-c3 FYI


shirady avatar shirady commented on September 27, 2024

Hi @rkomandu ,
From what I understand, you are trying to delete 1,000 accounts in a loop, and in some cases you see an InternalError whose detail is Error: No such file or directory.

In the flow of delete_account we delete the config file (and unlink the access key symbolic link):

async function delete_account(data) {
    await validate_account_args(data, ACTIONS.DELETE);
    await verify_delete_account(config_root_backend, buckets_dir_path, data.name);
    const fs_context = native_fs_utils.get_process_fs_context(config_root_backend);
    const account_config_path = get_config_file_path(accounts_dir_path, data.name);
    await native_fs_utils.delete_config_file(fs_context, accounts_dir_path, account_config_path);
    if (!has_access_keys(data.access_keys)) {
        const access_key_config_path = get_symlink_config_file_path(access_keys_dir_path, data.access_keys[0].access_key);
        await nb_native().fs.unlink(fs_context, access_key_config_path);
    }
    write_stdout_response(ManageCLIResponse.AccountDeleted, '', {account: data.name});
}

I want to make sure that this is still true: when you run account delete --name s3user-32277 again, is the account deleted successfully?

If I add dbg.error or console.log printings for debugging in the manage_nsfs file, will you be able to see them? That way we will better understand the exact line that causes the error.


ramya-c3 avatar ramya-c3 commented on September 27, 2024

@shirady Add the required debug statements in a single RPM and share it with Ravi; he will validate and get back with the results. It should not be a recurring log addition, since it takes too much time to reproduce this issue.


shirady avatar shirady commented on September 27, 2024

@ramya-c3 @rkomandu, thanks.

  • I added dbg.error printings in the flow of account deletion in a branch (link if you want to see it - here)
  • Which RHEL version are you using with the RPM? (I mean which base image: centos:9 / centos:8).


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, we are using the RH 9.x RPM on the BM cluster. Secondly, HA capability functionality is being tried on the physical machine; once that is done, we can retry with your fix.


shirady avatar shirady commented on September 27, 2024

@rkomandu thanks.
I don't have a fix; it is the code with dbg.error printings, to help us understand the place in the flow where we get the error.


rkomandu avatar rkomandu commented on September 27, 2024

Could you try in your environment, with these dbg messages, whether this can be recreated?


rkomandu avatar rkomandu commented on September 27, 2024

@shirady has provided the 5.17 master rpm and noobaa is not able to start:
noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm

systemctl status noobaa |grep -i Active
c83f1-app4-hs200:       Active: activating (auto-restart) (Result: exit-code) since Wed 2024-06-26 06:54:21 EDT; 195ms ago
c83f1-app3-hs200:       Active: activating (start-pre) since Wed 2024-06-26 06:54:21 EDT; 672ms ago
c83f1-app2-hs200:       Active: activating (auto-restart) (Result: exit-code) since Wed 2024-06-26 06:54:20 EDT; 1s ago
systemctl status noobaa
● noobaa.service - The NooBaa service.
     Loaded: loaded (/usr/lib/systemd/system/noobaa.service; enabled; preset: disabled)
     Active: active (running) since Wed 2024-06-26 06:01:21 EDT; 465ms ago
    Process: 1522261 ExecStartPre=/usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/upgrade/upgrade_manager.js --nsfs true --upgrade_scripts_dir /us>
   Main PID: 1522272 (node)
      Tasks: 7 (limit: 3297520)
     Memory: 36.3M
        CPU: 615ms
     CGroup: /system.slice/noobaa.service
             └─1522272 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js

Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261]  [WARN] CONSOLE:: config.load_nsfs_nc_config could not find config.json... ski>
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261]  [WARN] core.util.json_utils:: could not find json file /ibm/cesSharedRoot/ces/s3-config/system.j>
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261]  [WARN] core.util.json_utils:: could not find json file /ibm/cesSharedRoot/ces>
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261]   [LOG] UPGRADE:: system does not exist. no need for an upgrade
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261]   [LOG] UPGRADE:: system does not exist. no need for an upgrade
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261]    [L0] UPGRADE:: upgrade completed successfully!
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.944 [Upgrade/1522261]    [L0] UPGRADE:: upgrade completed successfully!
Jun 26 06:01:21 c83f1-app2 systemd[1]: Started The NooBaa service..

It has to create the s3-config directory, and then system.json should be created in the cesSharedRoot, which is not happening.

Please talk to @romayalon or @naveenpaul1 and take their help to fix the noobaa start issue.
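If it helps them, the full startup failure could be captured with standard systemd tooling (a sketch; the output path is illustrative):

journalctl -u noobaa -b --no-pager > /tmp/noobaa-start-failure.log
systemctl status noobaa -l --no-pager >> /tmp/noobaa-start-failure.log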


rkomandu avatar rkomandu commented on September 27, 2024

Hi @shirady, ran from 3 concurrent nodes (1K accounts and buckets each) and couldn't recreate the problem on the physical machine.

noobaa rpm
rpm -qi noobaa-core-5.17.0-20240613.el9.x86_64
Name : noobaa-core
Version : 5.17.0
Release : 20240613.el9
Architecture: x86_64
Install Date: Thu 27 Jun 2024 04:26:17 AM EDT
Group : Unspecified
Size : 431654776
License : Apache-2.0
Signature : (none)
Source RPM : noobaa-core-5.17.0-20240613.el9.src.rpm

rpm: noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm


ramya-c3 avatar ramya-c3 commented on September 27, 2024

@shirady It may be because the debug statements take time to print, which delays the deletion slightly. Remove the debug statements and try; it should be reproducible again.


shirady avatar shirady commented on September 27, 2024

Hi @rkomandu @ramya-c3
As you can see I added the printings in the RPM of June 13, so you can use the RPM from the same date and try it out.
Please run: aws s3 ls s3://noobaa-core-rpms | sort | grep 20240613.


ramya-c3 avatar ramya-c3 commented on September 27, 2024

@shirady We used the same-date RPM and tried, but we are unable to simulate the issue. With the non-logging/old build we are able to, because the log prints provide a little extra time for the delete operation to get file access.


shirady avatar shirady commented on September 27, 2024

@ramya-c3 @rkomandu
I want to make sure I understand:

  • You used RPM noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm and the issue was not reproduced.
  • You used RPM of the same date noobaa-core-5.17.0-20240613-master.el9.x86_64.rpm and you reproduced the issue?


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, as mentioned (#7920 (comment)), we have tried with "noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm" and did not recreate the problem.

We haven't tried with the master branch for now. The request was to try with the self-built RPM, and the issue was recreated with the 5.15.3 RPMs as per the defect.


shirady avatar shirady commented on September 27, 2024

@rkomandu I suggest either testing it on master or branch 5.15.


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, you had given the 5.17 build with debug statements as posted, which was tried. Secondly, the issue has been recreated in the 5.15.3 builds, as you can check above.


shirady avatar shirady commented on September 27, 2024

@rkomandu I understand what you wrote in the comment below:

  1. You used RPM noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm and the issue was not reproduced.
  2. You used RPM of 5.15.3 and you still saw the issue - I suggested that you try to reproduce it on a relevant branch, either master or 5.15.
    5.15.3 is an older version: stage_5.15.4 was merged into 5.15 branch (Merge PR - 8143 as @romayalon posted.)


rkomandu avatar rkomandu commented on September 27, 2024

@shirady

Ran with the master branch build on 3 nodes concurrently with 2K, 2K and 1K accounts and buckets (creation and deletion) and didn't come across any problem. Now I am not sure whether the 5.17 version itself fixed it or whether your fix in the current build is what is really working.

rpm -qa|grep noobaa
noobaa-core-5.17.0-20240704.el9.x86_64

@ramya-c3 @romayalon


romayalon avatar romayalon commented on September 27, 2024

@rkomandu @shirady We will need to know whether this fix needs to be backported or not; please check if this happens on 5.15 + Shira's fix.


rkomandu avatar rkomandu commented on September 27, 2024

@shirady , please try it on 5.15


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, did you manage to recreate it on the physical machine with the 5.15 branch?


shirady avatar shirady commented on September 27, 2024

@rkomandu I tested it on 5.15.4 + fix and did not see the issue reproduced.
I ran the existing scripts and saw the accounts and buckets created and then deleted.


shirady avatar shirady commented on September 27, 2024

Issue Details Summary

The issue was that with a loop that created accounts and buckets, followed by a loop that deleted the buckets and their accounts, a couple of accounts were not deleted in the loop and we saw the error InternalError.

@romayalon @rkomandu
I will try to summarize what we did on this issue:

  1. In the master branch (version 5.17) we merged the fix (#8183 ) - the issue was not reproduced.
  2. In branch 5.15.4 + fix (cherry-pick) - the issue was not reproduced.
  3. In branch 5.17 + printings - the issue was not reproduced (due to probably changing in timing).
  4. In branch 5.15.4 without any change (noobaa-core-5.15.4-20240710-5.15.el9.x86_64.rpm) - the issue was reproduced (twice):
  • using a script with mms3 commands.
  • using a script with noobaa-cli commands.

According to the logs, the issue appeared because, before deleting an account, we check whether the account has buckets; during this check we read the bucket config entries (in a loop using nb_native().fs.readFile), and if one of the entries was deleted concurrently in the meantime we hit ENOENT (No such file or directory). With the fix we now continue in that case (the error is caught).

Additional Information

The script is running from 3 different nodes.

The script that was run using mms3 commands:

#!/bin/sh
for i in `seq 10000 10999`; do mms3 account create s3user-$i --uid $i --gid $i --newBucketsPath "/gpfs/remote_fvt_fs/s3user-$i-dir"; echo $i is done; mms3 bucket create bucket-$i --accountName s3user-$i --filesystemPath /gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i ; done
sleep 60
for i in `seq 10000 10999`; do mms3  bucket delete bucket-$i ; mms3 account delete s3user-$i -d; echo $i is done; done

Example of a failure:

Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-15620 2>/dev/null
Execution of command failed:
Info: {
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "cause": "Error: No such file or directory"
  }
}

Error: exit status 1
Failed to execute command for Account s3user-15620: The server encountered an internal error. Please retry the request
Error Code: InternalError
  Error Details: <nil>

The script that was run using noobaa-cli commands:

#!/bin/sh
for i in `seq 21000 21999`; do mkdir "/gpfs/remote_fvt_fs/s3user-$i-dir" ; chmod 777 "/gpfs/remote_fvt_fs/s3user-$i-dir" ; noobaa-cli account add --name s3user-$i --uid $i --gid $i --new_buckets_path "/gpfs/remote_fvt_fs/s3user-$i-dir" ; echo $i is done;  mkdir "/gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i" ; chmod 777 "/gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i" ; noobaa-cli bucket add --name bucket-$i --owner s3user-$i --path /gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i ; done
sleep 60
for i in `seq 21000 21999`; do noobaa-cli bucket delete --name bucket-$i ; noobaa-cli account delete --name s3user-$i ; echo $i is done; done

Example of a readfile error at a timestamp when the deletion loop had started:

2024-07-14 07:30:10.983078 [PID-1555502/TID-1555502] [L1] FS::FSWorker::OnError: Readfile _path=/ibm/cesSharedRoot/ces/s3-config/buckets/bucket-31060.json  error.Message()=2024-07-14 07:30:10.983094 [PID-1555502/TID-1555523] 2024-07-14 07:30:10.983100 [PID-1555502/TID-1555524] _uid=No such file or directory


rkomandu avatar rkomandu commented on September 27, 2024

Thank you for the detailed analysis, @shirady, in your comment #7920 (comment).

So will you fix this in 5.15.5? I am not sure why the "Validation" label is now added and the defect is closed.


shirady avatar shirady commented on September 27, 2024

@rkomandu I closed the issue because we merged the fix.
I added the "Validation" label because you might want to check it in the branch that this fix will be backported to (planned to 5.16.1).


rkomandu avatar rkomandu commented on September 27, 2024

@shirady, I couldn't recreate it in 5.17 (master branch), while in 5.15.4 you could recreate it as well. Since we are going with this as the release for our product, it would be good to have the fix in the upcoming 5.15.5.

In branch 5.15.4 without any change

Now, coming to your point about the fix in 5.16.1: I am not sure whether we would take that version for our future release.

Adding @madhuthorat as concerned.
We need to see which versions are currently outlined for our future releases - are we taking 4.16 or 4.17, etc.? Let us discuss internally.

