Comments (56)
Yes, it is reporting exactly that.
I am deleting the 2K accounts in a loop with a 10-second sleep interval, and observed the "Failed to execute command" error multiple times:
Account s3user-12394 deleted successfully
Account s3user-12395 deleted successfully
Account s3user-12396 deleted successfully
Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request
Account s3user-12402 deleted successfully
Account s3user-12403 deleted successfully
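For reference, the deletion loop can be sketched roughly as below. The exact script was not shared, so the loop bounds, the error handling, and the `mms3 account delete` invocation are assumptions; `mms3` is stubbed with a shell function here so the sketch runs standalone (drop the stub on a real CES node, and use `sleep 10` as in the original run).

```shell
# Sketch of the bulk account-deletion loop (assumed form; the real script was not shared).
# Stub mms3 so the sketch is self-contained; remove this stub on a real CES node.
mms3() { echo "Account $3 deleted successfully"; }

for i in $(seq 12394 12397); do
  if out=$(mms3 account delete "s3user-$i" 2>&1); then
    echo "$out"
  else
    echo "Failed to execute command for Account s3user-$i: $out"
  fi
  sleep 0.1   # the original run slept 10 seconds between deletions
done
```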
Now let us look at noobaa.log for the 12397 and 12399 deletions; for 12398, the failure was logged as an "NSFS Manage command" error.
In noobaa_event.log:
{"timestamp":"2024-03-22T00:26:20.271279-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Accou
nt deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3us
er-12397\"},\"pid\":3271588}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[3271588]:","source":"node"}
Now, checking noobaa.log around the above timestamp for the s3user-12397, s3user-12398, and s3user-12399 account deletions:
2024-03-22T00:26:20.258310-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:20.259054-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.259709-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs: true
2024-03-22T00:26:20.259808-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs= true
2024-03-22T00:26:20.260664-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 20 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json File {} 21
2024-03-22T00:26:20.264526-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.464058-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:20.464601-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
...
...
2024-03-22T00:26:20.867639-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867906-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867943-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:20.867968-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:20.956759-07:00 rkomandu-ip-cls-x-worker1 node[3271531]: [/3271531] [L1] core.cmd.health:: Error: Config file path should be a valid path /mnt/ces-shared-root/ces/s3-config/buckets/bucket-12890.json [Error: No such file or directory] { code: 'ENOENT' }
2024-03-22T00:26:21.309702-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.310118-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310274-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310301-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:21.310324-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 2097152 waiting_value 0 buffers length 0
2024-03-22T00:26:21.493171-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.493778-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
..
2024-03-22T00:26:28.675289-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-03-22T00:26:28.675459-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: read_rand_seed: closing fd ...
2024-03-22T00:26:28.675954-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-03-22T00:26:28.677312-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/buckets
2024-03-22T00:26:28.678024-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:28.678507-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/access_keys
**2024-03-22T00:26:28.944651-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] [L1] core.cmd.manage_nsfs:: NSFS Manage command: exit on error Error: No such file or directory**
2024-03-22T00:26:30.464455-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:30.464796-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:37.530367-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [LOG] CONSOLE:: generate_entropy: adding entropy: dd if=/dev/vda bs=1048576 count=32 skip=255731 | md5sum
2024-03-22T00:26:37.538563-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:37.539654-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:37.540691-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs: true
2024-03-22T00:26:37.540797-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs= true
2024-03-22T00:26:37.541637-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} 21 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json File {} 23
2024-03-22T00:26:37.552608-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] [L1] core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:40.464133-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:40.464511-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 184549376 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464650-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 20971520 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464676-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] [L1] core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value 33554432 waiting_value 0 buffers length 0
2024-03-22T00:26:40.464700-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/30629
whereas the account is still there:
[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12398
s3user-12398 /mnt/fs1/s3user-12398-dir 12398 12398
[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12399
[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config
s3-config/ s3-config-backup.tar.bz2
[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
-rw------- 1 root root 376 Mar 21 09:12 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# less /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# cat /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
{"name":"s3user-12398","creation_date":"2024-03-21T16:12:04.264Z","access_keys":[{"access_key":"EdOlObT16fevKyqcDIY0","secret_key":"73Iako95fyfdaMvFySHxq//BfimhW3+D6LLG+4rv"}],"nsfs_account_config":{"uid":12398,"gid":12398,"new_buckets_path":"/mnt/fs1/s3user-12398-dir","fs_backend":"GPFS"},"email":"s3user-12398","allow_bucket_creation":true,"_id":"65fc5c54a19a460e1e094ab4"}
noobaa.log is about 700 MB:
[root@rkomandu-ip-cls-x-worker1 log]# ls -lh /var/log/noobaa.log
-rw-r--r-- 1 root root 733M Mar 22 01:37 /var/log/noobaa.log
For now, you can check from the above logs, @naveenpaul1. Uploading the file to GitHub is not possible, and Box also has this size restriction. I need to delete the older noobaa.log content and keep only the current day's, to see if that reduces the size. However, you can continue from the above log snippets, I think.
from noobaa-core.
Thank you @rkomandu. From the attached error we can understand that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id). Full edited message:
"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
at async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
at async Promise.all (index 9)
at async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"
This internal error came from the account list.
I would verify that when manually deleting the account we see the same details in the error (please run account delete on that account).
If you can attach that output, we will have the full information.
cc: @romayalon
from noobaa-core.
@rkomandu @shirady I posted yesterday a fix for the encryption issue, according to @ramya-c3 it's working now, @rkomandu can you try to reproduce with the new code and share if you still see the original internal error now?
from noobaa-core.
@shirady that is what is mentioned in the Slack thread, with the noobaa-cli status and list output as shown above.
from noobaa-core.
@rkomandu Please run the test after updating the log level, and share the actual error log.
You can search for the string NSFS Manage command: exit on error.
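For example, a search like the following (a sample log file is written under `/tmp` so the snippet runs standalone; on a CES node grep `/var/log/noobaa.log` directly):

```shell
# Count occurrences of the failure marker printed by manage_nsfs on error.
# A one-line sample log is created here so the demo is self-contained.
cat > /tmp/noobaa-sample.log <<'EOF'
2024-03-22T00:26:28.944651-07:00 host node[1]: [/1] [L1] core.cmd.manage_nsfs:: NSFS Manage command: exit on error Error: No such file or directory
EOF
grep -c "NSFS Manage command: exit on error" /tmp/noobaa-sample.log
```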
from noobaa-core.
Hi @rkomandu,
- From what I understand, you are creating accounts in a loop and then deleting the accounts in a loop. In the details of the issue we can see the example of the account s3user-12398 that was not deleted: The server encountered an internal error. Please retry the request. Since it was not deleted, we can still see its config file.
- I would like to add the full printing of the Internal Error in this issue; it might look like this:
stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":
- I was not able to reproduce this: I created and deleted over 1,000 accounts with a sleep of 1,000 milliseconds. You can see that they were all created (and the accounts config directory contains no configs after this):
➜ grep -i AccountCreated run_test_2048.txt | wc -l
2048
➜ grep -i AccountDeleted run_test_2048.txt | wc -l
2048
from noobaa-core.
It is a recent noobaa 5.15.3 d/s build of 0514:
[root@c83f1-app2 ~]# mms3 account delete s3user5001
Failed to execute command for Account s3user5001: The server encountered an internal error. Please retry the request
[root@c83f1-app2 ~]# mms3 account list s3user5001
Failed to list Account s3user5001: Account does not exist
[root@c83f1-app2 ~]# rpm -qa |grep mms3
gpfs.mms3-5.2.1-0.240521.104312.el9.x86_64
[root@c83f1-app2 ~]# mmhealth node show CES S3 |grep s3user5001
S3 DEGRADED 16 hours ago s3_access_denied(s3user5001), s3_storage_not_exist(newbucket-5001)
s3user5001 DEGRADED 16 hours ago s3_access_denied(s3user5001)
s3_access_denied s3user5001 WARNING 16 hours ago Account does not have access to the storage path mentioned in schema.
mmhealth, though, shows that the account still exists.
from noobaa-core.
I am deleting all the accounts and buckets via the mms3 CLI and came across a similar problem, as shown above:
bucket delete passed, but account delete failed with an internal error.
Bucket bucket-10335 deleted successfully
Failed to execute command for Account s3user-10335: The server encountered an internal error. Please retry the request
10335 is done
The noobaa-cli status and list output for the account is shown below:
noobaa-cli account status --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:13.218253 [PID-3055247/TID-3055247] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:13.218355 [PID-3055247/TID-3055247] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:13.548 [/3055247] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055247) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:13.579 [/3055247] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:13.583 [/3055247] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:13.583 [/3055247] [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:13.583 [/3055247] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
"response": {
"code": "AccountStatus",
"reply": {
"_id": "664e0aa633201f0668d9ac6b",
"name": "s3user-10335",
"email": "s3user-10335",
"creation_date": "2024-05-22T15:09:26.881Z",
"nsfs_account_config": {
"uid": 10335,
"gid": 10335,
"new_buckets_path": "/gpfs/remote_fvt_fs/s3user-10335-dir",
"fs_backend": "GPFS"
},
"allow_bucket_creation": true,
"master_key_id": "664e04d7f8bda475703e483b"
}
}
}
# noobaa-cli account list --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:23.493482 [PID-3055620/TID-3055620] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:23.493593 [PID-3055620/TID-3055620] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:23.825 [/3055620] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055620) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:23.856 [/3055620] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:23.861 [/3055620] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:23.861 [/3055620] [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:23.862 [/3055620] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 8:55:24.452 [/3055620] [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=12
{
"error": {
"code": "InternalError",
"message": "The server encountered an internal error. Please retry the request",
"detail": "Error: master key id is missing in master_keys_by_id\n at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n at async Promise.all (index 0)\n at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58\n at async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)\n at async Promise.all (index 9)\n at async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"
}
}
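When capturing these failures in bulk, the interesting fields are `error.code` and the leading `Error:` text inside `detail`. A rough grep-based way to pull them out (a sample payload is written to `/tmp` so this runs standalone; on a real node save the CLI's JSON output to a file first):

```shell
# Extract the error code and the first "Error:" text from the CLI's JSON error output.
# Sample payload (abbreviated) so the snippet is self-contained.
cat > /tmp/cli-error.json <<'EOF'
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (...)"
  }
}
EOF
grep -o '"code": "[^"]*"' /tmp/cli-error.json
grep -o '"detail": "Error: [^\\"]*' /tmp/cli-error.json
```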
from noobaa-core.
# noobaa-cli account delete --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 10:19:24.584673 [PID-3183486/TID-3183486] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 10:19:24.584759 [PID-3183486/TID-3183486] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 10:19:24.904 [/3183486] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3183486) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 10:19:24.933 [/3183486] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 10:19:24.938 [/3183486] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 10:19:24.938 [/3183486] [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 10:19:24.938 [/3183486] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 10:19:25.531 [/3183486] [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=15
{
"error": {
"code": "InternalError",
"message": "The server encountered an internal error. Please retry the request",
"detail": "Error: master key id is missing in master_keys_by_id\n at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n at async Promise.all (index 0)\n at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n at async fetch_existing_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:416:30)\n at async fetch_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:385:16)\n at async account_management (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:336:18)\n at async main (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:112:13)"
}
}
In fact, I would say that all account deletes have the same problem. This is a blocker and a high-priority issue for us now.
from noobaa-core.
Every account delete has this issue now:
Bucket bucket-10001 deleted successfully
Failed to execute command for Account s3user-10001: The server encountered an internal error. Please retry the request
10001 is done
Bucket bucket-10002 deleted successfully
Failed to execute command for Account s3user-10002: The server encountered an internal error. Please retry the request
10002 is done
Bucket bucket-10003 deleted successfully
Failed to execute command for Account s3user-10003: The server encountered an internal error. Please retry the request
10003 is done
Bucket bucket-10004 deleted successfully
Failed to execute command for Account s3user-10004: The server encountered an internal error. Please retry the request
10004 is done
Bucket bucket-10005 deleted successfully
Failed to execute command for Account s3user-10005: The server encountered an internal error. Please retry the request
10005 is done
Bucket bucket-10006 deleted successfully
Failed to execute command for Account s3user-10006: The server encountered an internal error. Please retry the request
10006 is done
Bucket bucket-10007 deleted successfully
Failed to execute command for Account s3user-10007: The server encountered an internal error. Please retry the request
10007 is done
Bucket bucket-10008 deleted successfully
Failed to execute command for Account s3user-10008: The server encountered an internal error. Please retry the request
10008 is done
Bucket bucket-10009 deleted successfully
Failed to execute command for Account s3user-10009: The server encountered an internal error. Please retry the request
10009 is done
Bucket bucket-10010 deleted successfully
Failed to execute command for Account s3user-10010: The server encountered an internal error. Please retry the request
10010 is done
Bucket bucket-10011 deleted successfully
Failed to execute command for Account s3user-10011: The server encountered an internal error. Please retry the request
10011 is done
Bucket bucket-10012 deleted successfully
Failed to execute command for Account s3user-10012: The server encountered an internal error. Please retry the request
10012 is done
from noobaa-core.
> Thank you @rkomandu, so we can understand from the error attached that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id) ... I would verify that when manually deleting the account we see the same details in the error. If you can attach this please so we will have the full information.
> cc: @romayalon
@shirady, this "internal error" is the latest problem with respect to master_key, as posted in the above comments. However, when the defect was opened in the 3rd week of March, encryption was not enabled in the d/s ODF 4.15.0 build. At that time the error was the same as it is now, but we don't have the noobaa CLI command output from then.
from noobaa-core.
This encryption issue is a new problem with ODF 4.15.3; however, the original problem could still exist, as deleting in a loop is what was reported in all of my first few updates.
from noobaa-core.
For the latest problem, @romayalon: it is the master_key problem in the CCR. I have updated the RTC defect https://jazz07.rchland.ibm.com:21443/jazz/web/projects/GPFS#action=com.ibm.team.workitem.viewWorkItem&id=330894 as comment 2.
@shirady
The master_key problem can be worked around as shown below.
For example, the CCR master_key file is:
1 gpfs.ganesha.statdargs.conf
1 idmapd.conf
18 _ces_s3.master_keys
# mmccr fget _ces_s3.master_keys /tmp/_ces_s3.master_keys
fget:18
# cat /tmp/_ces_s3.master_keys
{"timestemp":1716479808994,"active_master_key":"664f6740471f94033f84c97b","master_keys_by_id":{"664f6740471f94033f84c97b":{"id":"664f6740471f94033f84c97b","cipher_key":"7uboJzYGucsCimII30BbFCUdgS3zBv/oobwg9TXG0V8=","cipher_iv":"uipRG4v0jVQg8jnR2pcxsA==","encryption_type":"aes-256-gcm"}}}[root@c83f1-app3 s3]#
cp /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json /tmp/s3user-10001.json
less /tmp/s3user-10001.json
{"_id":"664dffe41b9d01307a23e986","name":"s3user-10001","email":"s3user-10001","creation_date":"2024-05-22T14:23:32.560Z","access_keys":[{"access_key":"8ca0bWSRFCScwdd4s8JJ","encrypted_secret_key":"k07Bhd5oQ2tphVljeV9Y6pUIjyqQLFoEG9nHqFTWTHmwagrgEh/fBg=="}],"nsfs_account_config":{"uid":10001,"gid":10001,"new_buckets_path":"/gpfs/remote_fvt_fs/s3user-10001-dir","fs_backend":"GPFS"},"allow_bucket_creation":true,"master_key_id":"664dd5d4b9b23ffb8378c6da"}
Edit the s3user-10001.json file, replacing its master_key_id with the CCR master key from /tmp/_ces_s3.master_keys:
vi /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json (update the master_key_id to the CCR value)
mms3 account delete s3user-10001 --debug
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-10001 2>/dev/null
Execute command failed
info:
{
"response": {
"code": "AccountDeleted"
}
}
error: <nil>
Account s3user-10001 deleted successfully
The command then reported the deletion as successful.
We need to get this fixed ASAP; otherwise, the 5.15.3-0415 build with encryption is not going to work for the basic functionality of account delete.
from noobaa-core.
@rkomandu
According to the timestamps, this account (creation_date "2024-05-22T14:23:32.560Z") is older than the master key ("timestemp": 1716479808994 = Thursday, May 23, 2024, 3:56:48.994 PM UTC). This is not a valid situation: you had an old account in this config_dir, which is why you see the missing master key error.
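The comparison can be reproduced directly from the two values (the epoch-milliseconds "timestemp" from the master-keys file and the account's creation_date). A sketch assuming GNU date and bash:

```shell
# Compare the master-key timestamp (epoch ms) with the account creation date.
KEY_TS_MS=1716479808994                   # "timestemp" from _ces_s3.master_keys
ACCT_CREATED="2024-05-22T14:23:32.560Z"   # creation_date from the account JSON

key_date=$(date -u -d "@$((KEY_TS_MS / 1000))" +%Y-%m-%dT%H:%M:%SZ)  # GNU date
echo "master key created: $key_date"
echo "account created:    $ACCT_CREATED"
# Lexicographic comparison is valid for ISO-8601 UTC timestamps (bash [[ ]]).
if [[ "$ACCT_CREATED" < "$key_date" ]]; then
  echo "account predates the active master key, so its master_key_id cannot be resolved"
fi
```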
from noobaa-core.
@romayalon,
The updates in the comments below were made on the 5.15.3 d/s build, where 2K accounts and 2K buckets were created. The master_key values are different in the *.json files for many accounts; why they differ is not yet clear. Ramya is analyzing why the master keys are regenerated when we run in a loop.
Bottom line: the accounts cannot be deleted, and this is happening because the master_key value is different.
#7920 (comment)
#7920 (comment)
#7920 (comment)
from noobaa-core.
@rkomandu @shirady any news about this one?
from noobaa-core.
On the physical machine (BM), with the RPM you provided on May 30th, we recreated the account-delete problem: 47 deletions failed when executing concurrently from 3 nodes with 1K accounts each.
@Ramya-c has taken this up since Friday; she discussed it with Guy and is running experiments.
Unless that is sorted out first, this error, even though related, would be masked IMO. If you can understand and address it from the code flow, that is the next move; otherwise we need to wait until we have spare cycles. The priority is to sort out that problem.
from noobaa-core.
thanks @rkomandu for the update, waiting for the logs of the concurrency tests, please keep us updated with the information you capture about this issue.
from noobaa-core.
Hi @romayalon @rkomandu
I will share that I tried to run a concurrency test using a bash loop on my machine, and I will attach the steps (so you can comment on them).
The steps are described for a few accounts so you can test them, and then change the number of iterations - for example, I changed it from for i in {1501..1503} to for i in {1501..1599}.
Requirements:
- (optional) Create the config root (so I'm sure there are no accounts from the previous run):
sudo mkdir -p /tmp/my-config
sudo chmod 777 /tmp/my-config
- Create and give permission to the new-buckets-path:
for i in {1501..1503}
do
mkdir -p /tmp/nsfs_root_s3user/s3user-$i; chmod 777 /tmp/nsfs_root_s3user/s3user-$i;
done
Steps:
- Create accounts:
for i in {1501..1503}
do
sudo node src/cmd/manage_nsfs account add --name s3user-$i --uid $i --gid $i --new_buckets_path /tmp/nsfs_root_s3user/s3user-$i --config_root /tmp/my-config
done
You can verify that the account configs exist: sudo ls -al /tmp/my-config/accounts
- Delete accounts:
for i in {1501..1503}
do
sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config
done
You can verify that no account configs remain: sudo ls -al /tmp/my-config/accounts
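To mimic the multi-node concurrency from a single machine, the same delete loop can be driven from several background subshells. This is only a sketch: run_range is an illustrative helper name, and the echo stands in for the real manage_nsfs delete command.

```shell
# Run several delete loops in parallel to approximate the 3-node test.
# Replace the echo with the real command, e.g.:
#   sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config
run_range() {
  for i in $(seq "$1" "$2"); do
    echo "delete s3user-$i"
  done
}
run_range 1501 1533 &
run_range 1534 1566 &
run_range 1567 1599 &
wait
```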
from noobaa-core.
Thanks @shirady for checking it, @rkomandu please keep us updated if the root cause of the reproduction Ramya sees is related to encryption or not (and might be a reproduction of this issue)
from noobaa-core.
@romayalon, as per Ramya's posting in the channel here https://ibmandredhatguests.slack.com/archives/C015Z7SDWQ0/p1717495547911289?thread_ts=1716826449.194409&cid=C015Z7SDWQ0 (from there through the next 4 comments), "noobaa-cli with status and list" didn't show any master_key problem as was the case previously (i.e., before the May 30th build). So this delete error is still occurring and can now be related to this defect.
from noobaa-core.
@rkomandu I agree, we just need to validate it using a print of the full error object/ stderr, thank you
from noobaa-core.
@rkomandu we need the details of the error to investigate.
stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>
Did you try to delete one of the accounts (not in the loop) and see the error?
from noobaa-core.
Name New Buckets Path Uid Gid User
s3user-31407 /gpfs/remote_fvt_fs/s3user-31407-dir 31407 31407 None
s3user-31038 /gpfs/remote_fvt_fs/s3user-31038-dir 31038 31038 None
s3user-31484 /gpfs/remote_fvt_fs/s3user-31484-dir 31484 31484 None
s3user-31037 /gpfs/remote_fvt_fs/s3user-31037-dir 31037 31037 None
s3user-32316 /gpfs/remote_fvt_fs/s3user-32316-dir 32316 32316 None
s3user-31402 /gpfs/remote_fvt_fs/s3user-31402-dir 31402 31402 None
s3user-32466 /gpfs/remote_fvt_fs/s3user-32466-dir 32466 32466 None
s3user-32171 /gpfs/remote_fvt_fs/s3user-32171-dir 32171 32171 None
s3user-31320 /gpfs/remote_fvt_fs/s3user-31320-dir 31320 31320 None
s3user-31475 /gpfs/remote_fvt_fs/s3user-31475-dir 31475 31475 None
s3user-70749 /gpfs/remote_fvt_fs/s3user-70749-dir 70749 70749 None
s3user-31576 /gpfs/remote_fvt_fs/s3user-31576-dir 31576 31576 None
s3user-31312 /gpfs/remote_fvt_fs/s3user-31312-dir 31312 31312 None
s3user-32180 /gpfs/remote_fvt_fs/s3user-32180-dir 32180 32180 None
s3user-32111 /gpfs/remote_fvt_fs/s3user-32111-dir 32111 32111 None
s3user-31371 /gpfs/remote_fvt_fs/s3user-31371-dir 31371 31371 None
s3user-32250 /gpfs/remote_fvt_fs/s3user-32250-dir 32250 32250 None
s3user-32003 /gpfs/remote_fvt_fs/s3user-32003-dir 32003 32003 None
s3user-31372 /gpfs/remote_fvt_fs/s3user-31372-dir 31372 31372 None
s3user-31041 /gpfs/remote_fvt_fs/s3user-31041-dir 31041 31041 None
s3user-31040 /gpfs/remote_fvt_fs/s3user-31040-dir 31040 31040 None
s3user-70698 /gpfs/remote_fvt_fs/s3user-70698-dir 70698 70698 None
s3user-32120 /gpfs/remote_fvt_fs/s3user-32120-dir 32120 32120 None
s3user-31207 /gpfs/remote_fvt_fs/s3user-31207-dir 31207 31207 None
[root@c83f1-app2 ~]# noobaa-cli account delete --name s3user-31407
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-06-05 01:07:41.693221 [PID-2987141/TID-2987141] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-06-05 01:07:41.693315 [PID-2987141/TID-2987141] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
Jun-5 1:07:42.024 [/2987141] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:2987141) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use node --trace-warnings ... to show where the warning was created)
Jun-5 1:07:42.054 [/2987141] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: closing fd ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
"response": {
"code": "AccountDeleted"
}
}
from noobaa-core.
@ramya-c3 would you please add an explanation?
The list above is of accounts that were not deleted in the loop, and then you successfully deleted one of them?
Edit:
As I understand from Ramya and the comment above:
- They tried to delete the accounts in a loop; the list is of accounts that hit InternalError and were not deleted. One of them is s3user-31407.
- They tried to delete one of those accounts (s3user-31407) outside the loop, and it was successful:
noobaa-cli account delete --name s3user-31407
{
"response": {
"code": "AccountDeleted"
}
}
- So the next step is to capture the output of the internal error during the loop.
The structure is mentioned in the comment above:
stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>
@ramya-c3 @rkomandu @romayalon
WDYT?
from noobaa-core.
@shirady, what you mentioned in your comment above as steps 1 and 2 is correct.
from noobaa-core.
@rkomandu as I understand, in the loop you are using the mms3 command, and you need to change the code in order to see the error details, right?
I would also suggest running the same loop with noobaa-cli and seeing if there are any issues (as mentioned in the comment above, I was not able to reproduce it).
from noobaa-core.
@shirady, the dev team will have to change that in mms3, and only then can this be tried.
As for noobaa-cli, I don't think we will use it here.
from noobaa-core.
@rkomandu any news?
from noobaa-core.
@romayalon, for mms3 the dev team has to make the code change and try it out.
from noobaa-core.
Ran with the (-d) option of the mms3 CLI that was introduced in 521 and could see the following debug message in the error log.
Around 46 accounts couldn't be deleted, and all of them show the same error:
Bucket bucket-32277 deleted successfully
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-32277 2>/dev/null
Execute command failed
info:
{
"error": {
"code": "InternalError",
"message": "The server encountered an internal error. Please retry the request",
"detail": "Error: No such file or directory"
}
}
error: exit status 1
Failed to execute command for Account s3user-32277: The server encountered an internal error. Please retry the request
Error Code: InternalError
We have come across this error on a 1K run concurrently from all 3 nodes.
@ramya-c3 FYI
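Since the error message itself says "Please retry the request", a simple retry wrapper around the delete can confirm whether the failure is transient. This is a hedged sketch: retry_cmd is a hypothetical helper, not part of mms3 or noobaa-cli.

```shell
# Retry a command up to 3 times with a short pause between attempts;
# returns 0 on the first success, 1 if all attempts fail.
retry_cmd() {
  attempt=1
  while [ "$attempt" -le 3 ]; do
    "$@" && return 0
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}
# Usage (illustrative): retry_cmd noobaa-cli account delete --name s3user-32277
```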
from noobaa-core.
Hi @rkomandu ,
From what I understand, you are trying to delete 1,000 accounts in a loop, and in some cases you see the InternalError whose details are Error: No such file or directory.
In the flow of delete_account we delete the config file (and unlink the access key symbolic link): see noobaa-core/src/cmd/manage_nsfs.js, lines 516 to 528 at commit 2fa2ba1.
I want to make sure that this is still true: when you run account delete --name s3user-32277 again, is the account deleted successfully?
If I add dbg.error or console.log printouts for debugging in the manage_nsfs file, will you be able to see them? That way we can better understand the exact line that causes it.
from noobaa-core.
@shirady Please add the required debug statements in a single RPM and share it with Ravi; he will validate and get back with the results. It should not be a recurring round of log additions, since it takes too much time to reproduce this issue.
from noobaa-core.
- I added dbg.error printouts in the flow of account deletion in a branch (link here if you want to see it).
- Which RHEL version are you using with the RPM? (I mean which base image: centos:9 / centos:8.)
from noobaa-core.
@shirady, we are using the RHEL 9.x RPM on the BM cluster. Secondly, HA functionality is being tried on the physical machine; once that is done, we can retry with your fix.
from noobaa-core.
@rkomandu thanks.
I don't have a fix; it's the code with dbg.error printouts, to help us understand where in the flow we get the error.
from noobaa-core.
Could you try in your env, with these dbg messages, to see if this can be recreated?
from noobaa-core.
@shirady has provided the 5.17 master RPM, and noobaa is not able to start:
noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm
systemctl status noobaa | grep -i Active
c83f1-app4-hs200: Active: activating (auto-restart) (Result: exit-code) since Wed 2024-06-26 06:54:21 EDT; 195ms ago
c83f1-app3-hs200: Active: activating (start-pre) since Wed 2024-06-26 06:54:21 EDT; 672ms ago
c83f1-app2-hs200: Active: activating (auto-restart) (Result: exit-code) since Wed 2024-06-26 06:54:20 EDT; 1s ago
systemctl status noobaa
● noobaa.service - The NooBaa service.
Loaded: loaded (/usr/lib/systemd/system/noobaa.service; enabled; preset: disabled)
Active: active (running) since Wed 2024-06-26 06:01:21 EDT; 465ms ago
Process: 1522261 ExecStartPre=/usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/upgrade/upgrade_manager.js --nsfs true --upgrade_scripts_dir /us>
Main PID: 1522272 (node)
Tasks: 7 (limit: 3297520)
Memory: 36.3M
CPU: 615ms
CGroup: /system.slice/noobaa.service
└─1522272 /usr/local/noobaa-core/bin/node /usr/local/noobaa-core/src/cmd/nsfs.js
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261] [WARN] CONSOLE:: config.load_nsfs_nc_config could not find config.json... ski>
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261] [WARN] core.util.json_utils:: could not find json file /ibm/cesSharedRoot/ces/s3-config/system.j>
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261] [WARN] core.util.json_utils:: could not find json file /ibm/cesSharedRoot/ces>
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261] [LOG] UPGRADE:: system does not exist. no need for an upgrade
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.943 [Upgrade/1522261] [LOG] UPGRADE:: system does not exist. no need for an upgrade
Jun 26 06:01:21 c83f1-app2 node[1522261]: [Upgrade/1522261] [L0] UPGRADE:: upgrade completed successfully!
Jun 26 06:01:21 c83f1-app2 node[1522261]: Jun-26 6:01:21.944 [Upgrade/1522261] [L0] UPGRADE:: upgrade completed successfully!
Jun 26 06:01:21 c83f1-app2 systemd[1]: Started The NooBaa service..
It has to create the s3-config, after which system.json would be created in cesSharedRoot; that is not happening.
Please talk to @romayalon or @naveenpaul1 and get their help to fix the noobaa start issue.
from noobaa-core.
Hi @shirady, I ran from 3 concurrent nodes (1K accounts and buckets each) and couldn't recreate the problem on the physical machine.
noobaa rpm
rpm -qi noobaa-core-5.17.0-20240613.el9.x86_64
Name : noobaa-core
Version : 5.17.0
Release : 20240613.el9
Architecture: x86_64
Install Date: Thu 27 Jun 2024 04:26:17 AM EDT
Group : Unspecified
Size : 431654776
License : Apache-2.0
Signature : (none)
Source RPM : noobaa-core-5.17.0-20240613.el9.src.rpm
rpm: noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm
from noobaa-core.
@shirady It may be because the debug statements take time to print, delaying the deletion. Remove the debug statements and try again; it will likely be reproduced.
from noobaa-core.
Hi @rkomandu @ramya-c3
As you can see, I added the printings in the RPM of June 13, so you can use the RPM from the same date and try it out.
Please run: aws s3 ls s3://noobaa-core-rpms | sort | grep 20240613
from noobaa-core.
@shirady We used the same-date RPM and were unable to reproduce the issue, but with the non-logging/old build we can, because the log prints give a little extra time for the delete operation to get file access.
from noobaa-core.
@ramya-c3 @rkomandu
I want to make sure I understand:
- You used RPM noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm and the issue was not reproduced.
- You used the RPM of the same date, noobaa-core-5.17.0-20240613-master.el9.x86_64.rpm, and you reproduced the issue?
from noobaa-core.
@shirady, as mentioned (#7920 (comment)), we have tried with "noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm" and didn't recreate the problem.
We haven't tried the master branch for now. The request was to try with the self-built RPM, and we created it from the 5.15.3 RPMs as per the defect.
from noobaa-core.
@rkomandu I suggest either testing it on master or branch 5.15.
from noobaa-core.
@shirady, you had given the 5.17 build with debug statements as posted below, which was tried. Secondly, the issue has been recreated on 5.15.3 builds, as you can check above.
from noobaa-core.
@rkomandu I understand what you wrote in the comment below:
- You used RPM noobaa-core-5.17-nsfs-nc-delete-accounts-with-printings.0-20240613.el9.x86_64.rpm and the issue was not reproduced.
- You used an RPM of 5.15.3 and you still saw the issue.
I suggested that you try to reproduce it on a relevant branch, either master or 5.15.
5.15.3 is an older version: stage_5.15.4 was merged into the 5.15 branch (Merge PR 8143, as @romayalon posted).
from noobaa-core.
Ran the master branch build on 3 nodes concurrently with 2K, 2K, and 1K accounts and buckets (creation and deletion) and didn't come across any problem. Now I am not sure whether the 5.17 version fixed it or whether your fix in the current build is what is really working.
rpm -qa|grep noobaa
noobaa-core-5.17.0-20240704.el9.x86_64
from noobaa-core.
@rkomandu @shirady We will need to know whether this fix needs to be backported or not; please check if this happens on 5.15 + Shira's fix.
from noobaa-core.
@shirady , please try it on 5.15
from noobaa-core.
@shirady, did you get any recreate on the physical machine with the 5.15 master branch?
from noobaa-core.
@rkomandu I tested it on 5.15.4 + fix and the issue was not reproduced.
I ran the existing scripts and saw the accounts and buckets created and then deleted.
from noobaa-core.
Issue Details Summary
The issue was that in a loop that created accounts and buckets, followed by a loop deleting the buckets and their accounts, a couple of accounts were not deleted and we saw the error InternalError.
@romayalon @rkomandu
I will try to summarize what we did on this issue:
- In the master branch (version 5.17) we merged the fix (#8183) - the issue was not reproduced.
- In branch 5.15.4 + fix (cherry-pick) - the issue was not reproduced.
- In branch 5.17 + printings - the issue was not reproduced (probably due to the change in timing).
- In branch 5.15.4 without any change (noobaa-core-5.15.4-20240710-5.15.el9.x86_64.rpm) - the issue was reproduced (twice):
  - using a script with mms3 commands.
  - using a script with noobaa-cli commands.
According to the logs, the issue appeared because, before we delete an account, we check whether the account has buckets; during this check we read the entries in a loop using nb_native().fs.readFile. If one of the entries was deleted concurrently (ENOENT, No such file or directory), the read failed. Now we will continue (we catch this error).
Additional Information
The script is running from 3 different nodes.
The script that was run using mms3
commands:
#!/bin/sh
for i in `seq 10000 10999`; do mms3 account create s3user-$i --uid $i --gid $i --newBucketsPath "/gpfs/remote_fvt_fs/s3user-$i-dir"; echo $i is done; mms3 bucket create bucket-$i --accountName s3user-$i --filesystemPath /gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i ; done
sleep 60
for i in `seq 10000 10999`; do mms3 bucket delete bucket-$i ; mms3 account delete s3user-$i -d; echo $i is done; done
Example of a failure:
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-15620 2>/dev/null
Execution of command failed:
Info: {
"error": {
"code": "InternalError",
"message": "The server encountered an internal error. Please retry the request",
"cause": "Error: No such file or directory"
}
}
Error: exit status 1
Failed to execute command for Account s3user-15620: The server encountered an internal error. Please retry the request
Error Code: InternalError
Error Details: <nil>
The script that was run using noobaa-cli
commands:
#!/bin/sh
for i in `seq 21000 21999`; do mkdir "/gpfs/remote_fvt_fs/s3user-$i-dir" ; chmod 777 "/gpfs/remote_fvt_fs/s3user-$i-dir" ; noobaa-cli account add --name s3user-$i --uid $i --gid $i --new_buckets_path "/gpfs/remote_fvt_fs/s3user-$i-dir" ; echo $i is done; mkdir "/gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i" ; chmod 777 "/gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i" ; noobaa-cli bucket add --name bucket-$i --owner s3user-$i --path /gpfs/remote_fvt_fs/s3user-$i-dir/bucket-dir-$i ; done
sleep 60
for i in `seq 21000 21999`; do noobaa-cli bucket delete --name bucket-$i ; noobaa-cli account delete --name s3user-$i ; echo $i is done; done
Example of a read-file error at a timestamp when the deletion loop started:
2024-07-14 07:30:10.983078 [PID-1555502/TID-1555502] [L1] FS::FSWorker::OnError: Readfile _path=/ibm/cesSharedRoot/ces/s3-config/buckets/bucket-31060.json error.Message()=2024-07-14 07:30:10.983094 [PID-1555502/TID-1555523] 2024-07-14 07:30:10.983100 [PID-1555502/TID-1555524] _uid=No such file or directory
from noobaa-core.
Thank you for the detailed analysis @shirady in your comment #7920 (comment).
So you will fix this in 5.15.5? I'm not sure why the "Validation" label is now added and the defect is closed.
from noobaa-core.
@rkomandu I closed the issue because we merged the fix.
I added the "Validation" label because you might want to check it in the branch that this fix will be backported to (planned for 5.16.1).
from noobaa-core.
@shirady, I couldn't recreate it on 5.17 (master branch); you could recreate it on 5.15.4 ("in branch 5.15.4 without any change"). Since we are going with 5.15 as the release for our product, it would be good to have the fix in the upcoming 5.15.5.
Now coming to your point about the 5.16.1 fix: I am not sure whether we would take that version for our future release.
Adding concerned @madhuthorat.
We need to see which versions are currently outlined for our future releases - which one are we taking, 4.16 or 4.17, etc.? Let us discuss internally.
from noobaa-core.