redhat-performance / badfish

Vendor-agnostic tool for managing bare-metal systems via the Redfish API

Home Page: https://quads.dev

License: GNU General Public License v3.0

Python 99.15% Dockerfile 0.11% Makefile 0.29% Shell 0.06% Smarty 0.39%
automation baremetal bmc dell idrac ipmi oob redfish redfish-api supermicro systems vendor

badfish's People

Contributors

abondvt89, dominikvagner, dustinblack, grafuls, kambiz-aghaiepour, naoryaa, pablomh, quantum-anomaly, quantumposix, rajpratik71, sadsfae, sarahbx, wereii

badfish's Issues

idrac_interfaces.yml doesn't parse

When I try to adjust the boot order using syntax given in README.md, it fails with a parse error:

# ./badfish.py -H mgmt-e24-h33-740xd -u quads -p 492728 -i config/idrac_interfaces.yml -t director --verbose
- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- DEBUG    - Getting bios boot mode.
Traceback (most recent call last):
  File "./badfish.py", line 806, in <module>
    sys.exit(main())
  File "./badfish.py", line 800, in main
    execute_badfish(host, args, logger)
  File "./badfish.py", line 741, in execute_badfish
    badfish.change_boot(host_type, interfaces_path, pxe)
  File "./badfish.py", line 301, in change_boot
    _type = self.get_host_type(interfaces_path)
  File "./badfish.py", line 210, in get_host_type
    interfaces[_host] = definitions["%s_%s_interfaces" % (_host, host_model)].split(",")
KeyError: 'foreman_740xd_interfaces'

I think it's not parsing idrac_interfaces.yml but not sure yet.
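A minimal sketch of how the lookup from the traceback could fail with a more useful message than a bare KeyError. Only the key format `%s_%s_interfaces` comes from the traceback; the function name `lookup_interfaces` and the sample `definitions` dict are hypothetical and stand in for the parsed YAML file.

```python
def lookup_interfaces(definitions, host_type, host_model):
    """Return the interface list for a host type and model, or raise a descriptive error."""
    key = "%s_%s_interfaces" % (host_type, host_model)
    value = definitions.get(key)
    if value is None:
        known = ", ".join(sorted(definitions)) or "none"
        raise KeyError("%s not found in interfaces file (available keys: %s)" % (key, known))
    return [item.strip() for item in value.split(",")]


# Hypothetical parsed YAML content, for illustration only.
definitions = {"director_740xd_interfaces": "NIC.Integrated.1-1-1,NIC.Integrated.1-3-1"}
print(lookup_interfaces(definitions, "director", "740xd"))
```

With an error like this it is easier to tell a missing key apart from a file that did not parse at all.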

[BUG] passing `--host` argument gets interpreted as `--host-list`

Your System Details

  • Python Version: Python 3.9.0
  • Operating System: Fedora release 33
  • Target System Type: ANY
  • IPMI / Out-of-band Firmware Version: ANY

Describe the bug
Badfish only accepts -H for passing the host name, although some people might guess that the long form --host would also be accepted. Instead, BF interprets --host as an incomplete --host-list and tries to open a file in the current directory named after the hostname you passed, which results in an error trying to read from {HOSTNAME}.

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. badfish.py --host mgmt-hostname.com -u root -p {PASS} --check-boot
  2. See error:
    [main] - ERROR - There was something wrong reading from mgmt-hostname.com

Expected Behavior
BF executes the desired action

Additional Details
We might want to look into adding the --host definition for the -H argument as well as preventing argparse from "auto" completing the argument name.
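A minimal sketch of both suggestions using the standard `argparse` module: `allow_abbrev=False` disables prefix matching of long options, and `--host` is registered as a long form of `-H`. The parser below is a cut-down illustration, not the real badfish parser; the `dest` names and help strings are assumptions.

```python
import argparse


def build_parser():
    # allow_abbrev=False stops argparse from expanding a prefix such as
    # "--host" into the longer "--host-list".
    parser = argparse.ArgumentParser(prog="badfish", allow_abbrev=False)
    parser.add_argument("-H", "--host", help="hostname of the out-of-band interface")
    parser.add_argument("--host-list", help="file with one hostname per line")
    parser.add_argument("-u", dest="user")
    parser.add_argument("-p", dest="password")
    parser.add_argument("--check-boot", action="store_true")
    return parser


args = build_parser().parse_args(["--host", "mgmt-hostname.com", "--check-boot"])
print(args.host, args.host_list)
```

With abbreviations disabled, a partial option such as `--host-l` is rejected with a usage error instead of being silently completed.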

expand --pxe one shot boot functionality

We need to do some further testing / fixes on the --pxe option. It shouldn't necessarily need a corresponding -t foreman/director argument unless we want to tell it to one-shot boot off that particular interface.

This may be useful however to others. This issue covers the following changes:

  • make --pxe work without needing -t (default to foreman interface)
  • if -t director or -t foreman is passed along with the config file, one-shot boot off that interface instead

Issue with Dell r730xd

I'm seeing an issue using badfish on the Dell r730xd running 2.60.60.60 firmware.

FAIL: POST command failed to create BIOS config job, status code is 503

Add a --check mechanism (only report what things are set as)

We should also employ a --check or -c option that simply reports what the setting is. Output should be usable in a validation format (e.g. check if interface order matches a YAML config key:value pair).

Example:

(Note passing -i isn't needed for config for check)

badfish.py -H mgmt-yourserver.com -u root -p password --check
  • This should just tell you what the boot order is only

(Comparing current interface order to an existing key:value pair entry)

badfish.py -H mgmt-yourserver.com -u root -p password --check -i config/idrac_interfaces.yml -t director
  • This should tell you whether the boot order matches the key:value pair in the config. The exit status or some other reasonable method (even stdout would be ok) could be queried so it can be used as a validation check.

badfish run inside podman within VM cannot resolve iDRAC hostname

Your System Details

Fedora 31 laptop, Centos 8.2 VM, podman-1.6.4-10, docker.io/quads/badfish

  • Python Version: whatever is in docker.io/quads/badfish

  • Operating System: RHEL8.2

  • Target System Type: Dell 740xd BIOS 2.8.1

  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 9 4.22.00.00)

Describe the bug

badfish run from within podman from within a VM on my laptop cannot talk to the iDRAC, but badfish run direct from the VM can.
Why am I doing this? I'm running from a VM because JetSki is having problems with python versions when run direct from my laptop (separate issue). I worked around it by creating a Centos 8 VM on my laptop and then giving it a route to the outside world through my laptop, copying /etc/resolv.conf from the laptop to the VM. However, then JetSki run from inside the VM attempts to run badfish in a container and gets this error.

Expected Behavior

badfish run from inside the container should have same behavior as badfish run outside a container on the same host.

Additional Details

[root@localhost JetSki]# podman run -it --rm \
  docker.io/quads/badfish -vv -u quads -p password -i config/idrac_interfaces.yml \
  -H mgmt-e25-h25-740xd.alias.bos.scalelab.redhat.com --check-boot

  • DEBUG - Cannot connect to host mgmt-e25-h25-740xd.alias.bos.scalelab.redhat.com:443 ssl:False [Name does not resolve]
  • ERROR - Failed to communicate with server.
  • DEBUG -
  • ERROR - There was something wrong executing Badfish.

The same command WORKS when an /etc/hosts entry is added with --add-host:

[root@localhost JetSki]# podman run -it --rm \
  --add-host=mgmt-e25-h25-740xd.alias.bos.scalelab.redhat.com:10.19.96.137 \
  docker.io/quads/badfish -vv -u quads -p password -i config/idrac_interfaces.yml \
  -H mgmt-e25-h25-740xd.alias.bos.scalelab.redhat.com --check-boot

  • DEBUG - Systems service: /redfish/v1/Systems/System.Embedded.1.
  • DEBUG - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
  • DEBUG - Getting bios boot mode.
  • WARNING - Current boot order does not match any of the given.
  • INFO - Current boot order:
  • INFO - 1: NIC.Integrated.1-1-1
    -...

[root@localhost JetSki]# ~/badfish/src/badfish/badfish.py -u quads -p password -i ~/badfish/config/idrac_interfaces.yml -H mgmt-e25-h25-740xd.alias.bos.scalelab.redhat.com --check-boot

  • WARNING - Current boot order does not match any of the given.
  • INFO - Current boot order:
  • INFO - 1: NIC.Integrated.1-1-1
    ...

RFE: Add logging option to badfish

This is to request/track implementing optional Python logging.

Probably an additional argparse value like -l /path/to/log would be ok, or perhaps log by default? The former would allow us to utilize variables / hostnames for separated per-host logging for routine automation which would be nice.
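A minimal sketch of the optional `-l /path/to/log` idea using the standard `logging` module. The logger name, the `--log` long form and the format string are assumptions for illustration; console output stays on and a file handler is added only when a path is given.

```python
import argparse
import logging


def build_logger(log_path=None, verbose=False):
    """Log to the console, and additionally to log_path when one is given."""
    logger = logging.getLogger("badfish-sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.handlers.clear()
    formatter = logging.Formatter("%(asctime)s - %(levelname)-8s - %(message)s")
    stream = logging.StreamHandler()
    stream.setFormatter(formatter)
    logger.addHandler(stream)
    if log_path:
        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


parser = argparse.ArgumentParser(prog="badfish", allow_abbrev=False)
parser.add_argument("-l", "--log", help="optional path of a log file")

logger = build_logger()
logger.info("console only, no log file requested")
```

Because the path is just an argument, automation can build it from the hostname to get one log file per host.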

Provide better logging verbosity when -i idrac_interfaces.yml not passed

Your System Details

  • Python Version: 3.7.6
  • Operating System: Fedora 30
  • Target System Type: Dell r630
  • IPMI / Out-of-band Firmware Version: iDrac8 63.60.60.60

Describe the bug
Getting a traceback running latest master branch badfish when running --boot-to-type foreman

I am running this via python3 badfish.py -H mgmt-$host -u user -p pass --boot-to-type foreman

I also get this when running from a container.

Additionally, it seems that for the r630 systems --boot-to is not issuing a reboot.

- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
Traceback (most recent call last):
  File "badfish.py", line 940, in <module>
    sys.exit(main())
  File "badfish.py", line 934, in main
    execute_badfish(host, args, logger)
  File "badfish.py", line 853, in execute_badfish
    badfish.boot_to_type(boot_to_type, interfaces_path)
  File "badfish.py", line 632, in boot_to_type
    self.boot_to(device)
  File "badfish.py", line 608, in boot_to
    if self.check_device(device):
  File "badfish.py", line 701, in check_device
    if device.lower() in boot_devices:
AttributeError: 'NoneType' object has no attribute 'lower'

Seems to be from the use of .lower here:

https://github.com/redhat-performance/badfish/blob/master/badfish.py#L701

UPDATE

It appears I was not passing -i idrac_interfaces.yml and this is user error. We'll leave this open anyway and implement more verbose logging to tell people like me to stop being boneheads and use the interfaces file so it can consult the key/value pairs it needs to complete.
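One way to tell the user up front, sketched with `argparse`: reject `--boot-to-type` when no interfaces file was passed, instead of letting a `None` device reach `.lower()`. The parser here is a reduced illustration, and the `dest` name `interfaces` and the error wording are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(prog="badfish", allow_abbrev=False)
parser.add_argument("-i", dest="interfaces", help="path to idrac_interfaces.yml")
parser.add_argument("--boot-to-type", help="host type to boot to, e.g. foreman")


def parse(argv):
    args = parser.parse_args(argv)
    if args.boot_to_type and not args.interfaces:
        # parser.error() prints the usage line plus this message and exits with status 2.
        parser.error("--boot-to-type requires -i with the path to the interfaces file")
    return args


print(parse(["--boot-to-type", "foreman", "-i", "config/idrac_interfaces.yml"]).interfaces)
```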

[RFE] Provide more accurate return/exit codes

Is your feature request related to a problem? Please describe.
Right now, badfish always returns successfully with an exit code of 0 even when the actual task/command failed. We want to leverage badfish exit codes to do things like:

  1. Retry if a badfish command failed
  2. Check if a system is ready to accept commands after clear jobs by doing a --check-boot, for example. Currently, even if the system is unresponsive and badfish failed to check the boot order, the exit code is 0.

What System / IPMI Platform?
Dells

Describe the Possible Solution
Provide error codes when actions failed.
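A minimal sketch of the pattern, assuming failures are signalled by an exception: the entry point maps failures to a non-zero status, which is what makes a retry wrapper possible. `BadfishError`, `run_action` and `run_with_retries` are hypothetical names, not existing badfish code.

```python
import sys


class BadfishError(Exception):
    """Hypothetical exception raised when an action against the BMC fails."""


def run_action(action):
    """Run one action and translate the outcome into a process exit code."""
    try:
        action()
    except BadfishError as error:
        print("ERROR - %s" % error, file=sys.stderr)
        return 1
    return 0


def run_with_retries(action, attempts=3):
    """Repeat the action until it returns 0 or the attempts are used up."""
    code = 1
    for _ in range(attempts):
        code = run_action(action)
        if code == 0:
            break
    return code


# A real entry point would finish with: sys.exit(run_action(some_action))
print(run_action(lambda: None))
```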

[Bug]GracefulRestart not a valid option for G14 Dells

Python Version:
Operating System: Fedora 30
Target System Type: Generation 14 Dell's
IPMI / Out-of-band Firmware Version: 3.34.34.34

Currently a GracefulRestart is called during quads move and rebuild. This is not a valid option on these systems, as the allowable reset types show:
curl -k --user "username:password" -H "Content-Type: application/json" https://mgmt-host/redfish/v1/Systems/System.Embedded.1 | python -m json.tool | grep -A7 ResetType
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7607 100 7607 0 0 22307 0 --:--:-- --:--:-- --:--:-- 22242
"ResetType@Redfish.AllowableValues": [
"On",
"ForceOff",
"ForceRestart",
"GracefulShutdown",
"PushPowerButton",
"Nmi"
],
Need to adjust for new generation redfish calls.
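A small sketch of one possible adjustment: choose the reset type from what the system reports as allowable rather than hard-coding GracefulRestart. The function name and the preference order are assumptions; the allowable list is the one from the curl output above.

```python
def pick_reset_type(allowable, preferred=("GracefulRestart", "ForceRestart")):
    """Return the first preferred reset type the system reports as allowable, else None."""
    for reset_type in preferred:
        if reset_type in allowable:
            return reset_type
    return None


allowable = ["On", "ForceOff", "ForceRestart", "GracefulShutdown", "PushPowerButton", "Nmi"]
print(pick_reset_type(allowable))  # ForceRestart
```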

[RFE] Replace BasicAuth with Session-based authentication (JWT)

Is your feature request related to a problem? Please describe.
Token-based authentication is going to reduce the number of times we must send credentials to the server. Servers that use tokens can also perform better, because they do not need to continuously look through all the session details to authorize the user's requests.

What System / IPMI Platform?
ALL

Describe the Possible Solution
Create an X-Auth session via POST with payload {"UserName":$U,"Password":$P} to /redfish/v1/SessionService/Sessions. Retrieve the token from the X-Auth-Token header of the response, store it in memory, then pass it on every subsequent request through a header like {'X-Auth-Token': $TOKEN}.
Close the session via DELETE to /redfish/v1/SessionService/Sessions/$SESSION_ID.
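A minimal sketch of the session flow using only the standard library, assuming the Sessions collection at /redfish/v1/SessionService/Sessions (the standard Redfish location for creating sessions) and that the token and session URI come back in the `X-Auth-Token` and `Location` response headers. The helper names are hypothetical, and the requests are only built here, not sent.

```python
import json
import urllib.request

SESSIONS_PATH = "/redfish/v1/SessionService/Sessions"


def build_login_request(host, username, password):
    """Build the POST that creates a session; the caller sends it with urlopen()."""
    body = json.dumps({"UserName": username, "Password": password}).encode()
    return urllib.request.Request(
        "https://%s%s" % (host, SESSIONS_PATH),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_session(headers):
    """Extract the token and the session URI from the login response headers."""
    token = headers.get("X-Auth-Token")
    location = headers.get("Location")
    if not token or not location:
        raise ValueError("login response did not include a session token and location")
    return token, location


def build_logout_request(host, session_uri, token):
    """Build the DELETE that closes the session, authenticated with the token itself."""
    if session_uri.startswith("/"):
        session_uri = "https://%s%s" % (host, session_uri)
    return urllib.request.Request(session_uri, headers={"X-Auth-Token": token}, method="DELETE")


request = build_login_request("mgmt-host", "user", "pass")
print(request.get_method(), request.full_url)
```

Keeping the token only in memory and always sending the DELETE at the end avoids leaving stale sessions on the BMC, which typically allows only a limited number of them.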

idrac_interfaces.yml has wrong boot order for director style

Your System Details

  • Python Version:3
  • Operating System: RHEL8
  • Target System Type: (e.g. Dell, SuperMicro) Dell 740xd
  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)

Describe the bug
When deploying Openstack with Ceph on an Alias allocation, the overcloud deployment was failing: the nodes were getting stuck at clean_wait and then moved to clean_failed while waiting for the ramdisk to boot. For an Openstack deployment the host machine has to PXE boot on the interface specified in the instackenv.json (http://quads11.example.com/cloud/cloud06_instackenv.json).
After investigating, I noticed that the boot order set by badfish for the director style is wrong. It should be:

director_740xd_interfaces: NIC.Integrated.1-2-1,NIC.Integrated.1-1-1,NIC.Slot.7-2-1,NIC.Slot.7-1-1,NIC.Integrated.1-3-1,HardDisk.List.1-1

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. Deploy Openstack using Jetpack

Expected Behavior
Openstack Deployment successful

SuperMicro Support

Test and implement logic to cover supermicro hosts.

Currently badfish has been tested and proven against Dell systems but we need to make sure that we can use this against SuperMicro hosts as well.
This might require different endpoints.

Provide Badfish in a Container

It'd be useful to provide Badfish in a docker container for an easier way to consume without having to worry about Python library dependencies and to be more portable.

[RFE] Enable badfish to work with UEFI mode

Is your feature request related to a problem? Please describe.
Currently, badfish doesn't seem to be able to check/manipulate the boot order when the host is in UEFI mode. It would be good to have badfish support UEFI as more of our testing moves to that.

What System / IPMI Platform?
Dell/UEFI

After clear-jobs, power-cycle fails for 7525

Your System Details

  • Python Version: Python 3.6.8
  • Operating System: RHEL 7.7
  • Target System Type: (e.g. Dell, SuperMicro) dell

Describe the bug
After using badfish to clear jobs on e20-h29-7525, running --power-cycle gives the following error:

(.venv) [root@e20-h31-7525 badfish]# python3 badfish.py -H mgmt-e20-h29-7525  -u quads -p  PASSWORD --power-state
- INFO     - Power state for mgmt-e20-h29-7525.example.com: On
(.venv) [root@e20-h31-7525 badfish]# python3 badfish.py -H mgmt-e20-h29-7525  -u quads -p  PASSWORD --clear-jobs --force
- INFO     - Job queue for iDRAC mgmt-e20-h29-7525 successfully cleared.
(.venv) [root@e20-h31-7525 badfish]# python3 badfish.py -H mgmt-e20-h29-7525  -u quads -p  PASSWORD --power-cycle -v
- DEBUG    - Systems service: /redfish/v1/Systems/System.Embedded.1.
- DEBUG    - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- DEBUG    - Getting allowable reset types.
- DEBUG    - Rebooting server: mgmt-e20-h29-7525
- DEBUG    - url: https://mgmt-e20-h29-7525/redfish/v1/Systems/System.Embedded.1
- DEBUG    - Current server power state is: On.
- ERROR    - Command failed to ForceOff server, status code is: 500.
- WARNING  - internal error, 
- DEBUG    - 
- ERROR    - There was something wrong executing Badfish

To Reproduce / What were you Doing?
Steps to reproduce the behavior:
python3 badfish.py -H mgmt-{{ hostname}} -u quads -p password --clear-jobs --force
python3 badfish.py -H mgmt-{{ hostname}} -u quads -p password --power-cycle

Expected Behavior
python3 badfish.py -H mgmt-{{ hostname}} -u quads -p password --power-cycle

- DEBUG    - Systems service: /redfish/v1/Systems/System.Embedded.1.
- DEBUG    - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- DEBUG    - Getting allowable reset types.
- DEBUG    - Rebooting server: mgmt-e20-h29-7525.example.com.
- DEBUG    - url: https://mgmt-e20-h29-7525.example.com/redfish/v1/Systems/System.Embedded.1
- DEBUG    - Current server power state is: On.
- INFO     - Command passed to ForceOff server, code return is 204.
- INFO     - Polling for host state: Not Down
- DEBUG    - url: https://mgmt-e20-h29-7525.example.com/redfish/v1/Systems/System.Embedded.1
- DEBUG    - Current server power state is: On.
- POLLING: [------------------->] 100% - Host state: On  
- INFO     - Command passed to On server, code return is 204.

This command should be executed successfully

Note: I have manually applied the changes from #149

idrac_interfaces.yml has wrong boot order

Your System Details
alias cloud02

  • Python Version: 3
  • Operating System: RHEL8/CoreOS
  • Target System Type: (e.g. Dell, SuperMicro) Dell 740xd
  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)

Describe the bug
I tried to run Dustin Black's playbook and it fails (log here). While troubleshooting, I realized that the "director" interface that I was supposed to use was "eno1", but I was using ens7f0. But this means I can't use the 25-GbE interface for the provisioning network, and I can't use both 25-GbE NIC ports. @dustinblack thought this was wrong, that "director_740xd_interfaces" should have the Foreman interface "eno3" last, so that any of the other NIC ports could PXE boot, right Dustin? This would give us maximum flexibility. But what the director_740xd_interfaces record does is make the Foreman interface 2nd, after the eno1 interface. When PXE boot gets to the Foreman interface, Foreman will tell it to boot from the local drive, so subsequent interfaces will not get used. So the only other interface that we can use for PXE boot is eno1, which is the 10-GbE interface in Alias. If my understanding of the PXE boot sequence is correct, can we please change this?

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. See Dustin Black's IPI directions here
  2. edit inventory/hosts as shown here

[RFE]Set One time Boot / VCD-DVD

Is your feature request related to a problem? Please describe.

Add the capability, which already exists in Dell's published methods, to set the next one-time boot to the virtual media OEM device:

https://github.com/dell/iDRAC-Redfish-Scripting/blob/master/Redfish%20Python/SetNextOneTimeBootVirtualMediaDeviceOemREDFISH.py

add capability to set the next one time boot device using the following for integration:
https://github.com/dell/iDRAC-Redfish-Scripting/blob/master/Redfish%20Python/SetNextOneTimeBootDeviceREDFISH.py

Incorporate the ability to attach a remote image through Dell Redfish:
https://github.com/dell/iDRAC-Redfish-Scripting/blob/master/Redfish%20Python/BootToNetworkIsoOsdREDFISH.py

What System / IPMI Platform?
Dell Redfish implementation in firmware > 4.22

Describe the Possible Solution

See above

Additional Info
Get rid of clunky Ansible scripts calling outdated methods.

[RFE] Implement BIOS Password Management

This is an RFE to implement the following functionality in badfish using the Redfish API:

Standalone / Upstream Badfish

  • --check-bios-password Checks whether or not a BIOS password is set, should support --host-list also and present a digestible logger summary.
  • --set-bios-password Set the BIOS password based on a setting in config/badfish.yml or passed on the command line with --set-bios-password XXYY
  • --remove-bios-password Strips out the BIOS password among a singular host, or a batch via --host-list $file_of_hosts.

QUADS Badfish Library Considerations

  • Same as above but utilize a variable in conf/quads.yml instead for the value of bios_default_password with the default being empty or no password and do not run this unless it's defined.
  • This should be called as part of quads/tools/move_and_rebuild_host.py

[RFE] Bootorder settings in UEFI mode

Is your feature request related to a problem? Please describe.
We need badfish to be able to set director style boot order in UEFI mode.

What System / IPMI Platform?
Redfish supported Dells and Supermicros

Describe the Possible Solution
We would need code to

  1. Detect current bios mode
  2. Change bios mode to requested (UEFI/Legacy)
  3. Set boot order to director/foreman for that mode using a file similar to idrac_interfaces.yml but for UEFI

Describe alternatives you've considered
Manually doing this in BIOS

changing boot order mysteriously fails in badfish in Alias run from JetSki

Your System Details

Alias cloud 10
e24-h19-740xd is deployer
e24-h{29,31,33}-740xd are masters

  • Python Version:

[kni@e24-h19-740xd ~]$ python3 --version
Python 3.6.8

  • Operating System:

RHEL 8.1 Linux e24-h19-740xd.alias.bos.scalelab.redhat.com 4.18.0-147.el8.x86_64

  • Target System Type: (e.g. Dell, SuperMicro)

Dell

  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)

I dunno

Describe the bug
badfish fails to do "-t director" boot order, see jetski playbook log here, task is: Set nodes to director boot order:

http://perf1.perf.lab.eng.bos.redhat.com/pub/bengland/tmp/ocp4/jetski/jetski.08101636.log

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. run JetSki in Alias cluster as documented, see config files in above URL's directory:
  2. See error described above.

However, when I run -t director command as follows, it works:

https://pastebin.com/hrpDXm6M

Expected Behavior

If badfish fails, we should get a better indication of what's wrong than:

  • ERROR - Failed to communicate with mgmt-e24-h29-740xd.alias.bos.scalelab.redhat.com
  • ERROR - There was something wrong executing Badfish.

It appears that the problems JetSki has in Alias lab may stem from this. @smalleni ?

--check-boot with -i config/idrac_interfaces.yml does not always report correct boot-order entry

When the current boot order does not match foreman (or any entry in idrac_interfaces.yml), the output mistakenly reports that it does match "foreman". See below, where the actual boot order is nothing like the foreman boot order.

[root@e25-h21-740xd badfish]# ./badfish.py -u quads -p $password -i config/idrac_interfaces.yml -H mgmt-e25-h25-740xd --check-boot

  • INFO - Systems service: /redfish/v1/Systems/System.Embedded.1.
  • INFO - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
  • WARNING - Current boot order is set to: foreman.

[root@e25-h21-740xd badfish]# grep foreman_740 config/idrac_interfaces.yml
foreman_740xd_interfaces: NIC.Integrated.1-3-1,HardDisk.List.1-1,NIC.Integrated.1-1-1

[root@e25-h21-740xd badfish]# ./badfish.py -u quads -p $password -H mgmt-e25-h25-740xd --check-boot

  • INFO - Systems service: /redfish/v1/Systems/System.Embedded.1.
  • INFO - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
  • INFO - Current boot order:
  • INFO - 1: NIC.Integrated.1-3-1
  • INFO - 2: NIC.Integrated.1-1-1
  • INFO - 3: NIC.Slot.7-2-1
  • INFO - 4: NIC.Slot.7-1-1
  • INFO - 5: HardDisk.List.1-1
  • INFO - 6: NIC.Integrated.1-2-1
  • INFO - 7: NIC.Integrated.1-4-1
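A minimal sketch of a stricter comparison, using the values from the output above. It assumes a configured entry should match only when its devices equal the head of the current boot order, position by position; that rule and the function name `match_boot_order` are assumptions, not a description of the existing badfish logic.

```python
def match_boot_order(current, definitions):
    """Return the name whose devices equal the head of the current order, else None."""
    for name, devices in definitions.items():
        if devices and current[: len(devices)] == devices:
            return name
    return None


foreman = "NIC.Integrated.1-3-1,HardDisk.List.1-1,NIC.Integrated.1-1-1".split(",")
current = [
    "NIC.Integrated.1-3-1",
    "NIC.Integrated.1-1-1",
    "NIC.Slot.7-2-1",
    "NIC.Slot.7-1-1",
    "HardDisk.List.1-1",
    "NIC.Integrated.1-2-1",
    "NIC.Integrated.1-4-1",
]
print(match_boot_order(current, {"foreman": foreman}))  # None
```

Under this rule the order shown above is reported as matching nothing, because only the first device agrees with the foreman entry.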

no support for AMD servers in Alias lab

Your System Details

Alias cloud12 consisting of e20-{h25,h27}-7425 and e20-{h29,h31}-7525

  • Python Version: Python 3.6.8
  • Operating System: Centos 8.2.2004
  • Target System Type: (e.g. Dell, SuperMicro) Dell AMD 7425, AMD 7525
  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)

IDRAC 4.22.00.00
BIOS 1.13.4

Describe the bug

need AMD server support for Alias lab

To Reproduce / What were you Doing?

run command below.

Expected Behavior

badfish works out of the box with AMD servers

Logs / Screenshots

[root@localhost badfish]# src/badfish/badfish.py -H mgmt-e20-h27-7425.alias.bos.scalelab.redhat.com -u quads -p bos@253 -i config/idrac_interfaces.yml --check-boot

  • ERROR - Couldn't find a valid key defined on the interfaces yaml: director_7425_interfaces
  • ERROR - There was something wrong executing Badfish

Additional Details

I'm having trouble figuring out what idrac_interfaces.yml should be, because the 7525 in particular looks a bit different than the 740xd.

Dell r730xd and r620 may not still be quite right

Having some issues with Dell r730xd r620 and --boot-to so we should comb through them a little closer, it doesn't seem like it's all of them but just a handful.

Worst case these should be counting against self.retries() and we can handle --boot-to manually.

2019-06-05 14:26:41,462 There was something wrong with your request.
2019-06-05 14:26:49,283 POST command failed to create BIOS config job, status code is 400.
2019-06-05 14:26:49,283 Pending configuration values are already committed, unable to perform another set operation.
2019-06-05 14:26:49,283 Could not set boot order via Badfish.
Traceback (most recent call last):
  File "/opt/quads/quads/tools/move_and_rebuild_hosts.py", line 151, in move_and_rebuild
    "../../conf/idrac_interfaces.yml"
  File "/opt/quads/quads/tools/badfish.py", line 315, in change_boot
    job_id = self.create_bios_config_job(self.bios_uri)
  File "/opt/quads/quads/tools/badfish.py", line 440, in create_bios_config_job
    return self.create_job(_url, _payload, _headers)
  File "/opt/quads/quads/tools/badfish.py", line 428, in create_job
    self.error_handler(_response)
  File "/opt/quads/quads/tools/badfish.py", line 59, in error_handler
    sys.exit(1)
SystemExit: 1

Same thing with a few r620's:

2019-06-08 14:25:31,477 Job queue already cleared for iDRAC mgmt-f20-h01-000-r620.example.com, DELETE command will not execute.
2019-06-08 14:25:31,477 Waiting for host to be up.
2019-06-08 14:25:31,477 Polling for host state: On
  Polling: [------------------->] 100% - Host state: On  
2019-06-08 14:26:30,155 There was something wrong with your request.
2019-06-08 14:26:36,077 POST command failed to create BIOS config job, status code is 400.
2019-06-08 14:26:36,078 Pending configuration values are already committed, unable to perform another set operation.
2019-06-08 14:26:36,078 Could not set boot order via Badfish.
Traceback (most recent call last):
  File "/opt/quads/quads/tools/move_and_rebuild_hosts.py", line 151, in move_and_rebuild
    "../../conf/idrac_interfaces.yml"
  File "/opt/quads/quads/tools/badfish.py", line 315, in change_boot
    job_id = self.create_bios_config_job(self.bios_uri)
  File "/opt/quads/quads/tools/badfish.py", line 440, in create_bios_config_job
    return self.create_job(_url, _payload, _headers)
  File "/opt/quads/quads/tools/badfish.py", line 428, in create_job
    self.error_handler(_response)
  File "/opt/quads/quads/tools/badfish.py", line 59, in error_handler
    sys.exit(1)
SystemExit: 1
2019-06-08 14:26:36,079 Rebooting via IPMI for next run
Chassis Power Control: Down/Off

These are the current troublemaker r620

f20-h01-000-r620.rdu2.example.com
f20-h14-000-r620.rdu2.example.com

badfish.py doesn't clear job queue

I can't seem to clear the job queue with either badfish or DRAC GUI, this is preventing me from changing the boot order.

[bengland@bene-laptop badfish]$ ./badfish.py -H mgmt-e23-h23-740xd -u quads -p 494292 -i config/idrac_interfaces.yml --clear-jobs
- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- WARNING  - iDRAC version installed does not support DellJobService
- WARNING  - Clearing job queue for job IDs: ['JID_640133055279'].
- ERROR    - Job queue not cleared, current job queue contains jobs: ['JID_640133055279'].

Dell Job Queue Clear does not Always clear - needs JID_CLEARALL_FORCE

It seems that occasionally Dell systems (thus far r620 but probably others) don't always get their job queues cleared with badfish. These cannot be cleared either via the iDRAC UI.

This causes JID's to pile up and never complete.

2019-06-08 12:49:38,875 Clearing job queue for job IDs: ['RID_843181825116', 'RID_843175376755', 'RID_843175362071', 'RID_843201688905', 'RID_843208253120', 'RID_843158310720', 'RID_843158246559', 'RID_843208272388', 'RID_843053267400', 'RID_843188446456', 'RID_843181842184', 'RID_843195047909', 'RID_843188431875', 'RID_843201674287', 'RID_843195032975'].
2019-06-08 12:51:31,438 Job queue not cleared, current job queue contains jobs: ['RID_843181825116', 'RID_843175376755', 'RID_843175362071', 'RID_843201688905', 'RID_843208253120', 'RID_843158310720', 'RID_843158246559', 'RID_843208272388', 'RID_843053267400', 'RID_843188446456', 'RID_843181842184', 'RID_843195047909', 'RID_843188431875', 'RID_843201674287', 'RID_843195032975'].
2019-06-08 12:51:31,438 Could not set boot order via Badfish.
Traceback (most recent call last):
  File "/opt/quads/quads/tools/move_and_rebuild_hosts.py", line 151, in move_and_rebuild
    "../../conf/idrac_interfaces.yml"
  File "/opt/quads/quads/tools/badfish.py", line 306, in change_boot
    self.clear_job_queue()
  File "/opt/quads/quads/tools/badfish.py", line 412, in clear_job_queue
    sys.exit(1)
SystemExit: 1

Equivalent UI screenshot

Screenshot_2019-06-08_14-15-30

As a result, the system is stuck in this state and holds up the systems that are queued up behind it to go through move_and_rebuild_hosts.py

The only way to clear these is via racadm

e.g.

ssh [email protected] "racadm jobqueue delete -i JID_CLEARALL_FORCE"

We should implement the redfish equivalent of jobqueue delete -i JID_CLEARALL_FORCE here. In QUADS 1.0 we were calling racadm directly via Ansible and this was used to good effect so surely there's an API / redfish way we can implement.
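A hedged sketch of what the Redfish call might look like. It assumes the Dell OEM `DellJobService.DeleteJobQueue` action with a `JobID` of `JID_CLEARALL_FORCE`, at the path used in Dell's public iDRAC Redfish scripts. The path differs between firmware versions and not every iDRAC supports DellJobService, so treat both the URL and the payload as assumptions to verify against the target firmware. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: OEM action path; confirm it for the iDRAC firmware in use.
DEFAULT_ACTION = (
    "/redfish/v1/Dell/Managers/iDRAC.Embedded.1/DellJobService"
    "/Actions/DellJobService.DeleteJobQueue"
)


def build_clear_jobs_request(host, force=False, action_path=DEFAULT_ACTION):
    """Build the POST that asks the iDRAC to delete its whole job queue."""
    job_id = "JID_CLEARALL_FORCE" if force else "JID_CLEARALL"
    return urllib.request.Request(
        "https://%s%s" % (host, action_path),
        data=json.dumps({"JobID": job_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_clear_jobs_request("mgmt-host.example.com", force=True)
print(request.get_method(), request.full_url)
print(request.data.decode())
```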

iDRAC gets into not ready state

Your System Details

  • Python Version: python3
  • Operating System: fedora 32
  • Target System Type: Dell FC640s
  • IPMI / Out-of-band Firmware Version: 4.00.00.00

Describe the bug
Badfish command fails to communicate to the server intermittently, it happens often during a large scale deployment - 100 node JetSki.

Badfish Command - badfish -u quads -p password -H mgmt-e22-h18-b02-fc640.rdu2.scalelab.redhat.com --clear-jobs --force --debug

Response:
- ERROR    - Failed to communicate with mgmt-e22-h18-b02-fc640.rdu2.scalelab.redhat.com
- DEBUG    - 
- ERROR    - There was something wrong executing Badfish.

When Badfish tries to validate (here) the supplied host credentials, it hits the Redfish API below and gets a 400 response.

RedFish API - https://mgmt-e22-h18-b02-fc640.rdu2.scalelab.redhat.com/redfish/v1/Systems

Response:
  • Code: 400
  • Body:
{
    "error": {
        "@Message.ExtendedInfo": [
            {
                "Message": "iDRAC is not ready. The configuration values cannot be accessed. Please retry after a few minutes.",
                "MessageArgs": [],
                "MessageArgs@odata.count": 0,
                "MessageId": "IDRAC.1.6.SWC0700",
                "RelatedProperties": [],
                "RelatedProperties@odata.count": 0,
                "Resolution": "Turn off the system, Remove AC. Wait for 5 seconds. Connect AC. Turn it on.",
                "Severity": "Critical"
            }
        ],
        "code": "Base.1.2.GeneralError",
        "message": "A general error has occurred. See ExtendedInfo for more information"
    }
}

Expected Behavior

Response:
  • Code: 200
  • Body:
{
    "@odata.context": "/redfish/v1/$metadata#ComputerSystemCollection.ComputerSystemCollection",
    "@odata.id": "/redfish/v1/Systems",
    "@odata.type": "#ComputerSystemCollection.ComputerSystemCollection",
    "Description": "Collection of Computer Systems",
    "Members": [
        {
            "@odata.id": "/redfish/v1/Systems/System.Embedded.1"
        }
    ],
    "Members@odata.count": 1,
    "Name": "Computer System Collection"
}

Additional Details
As an interim workaround, resetting the iDRAC or running racadm racreset soft helps.
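A caller could also treat this specific 400 as transient and retry rather than fail outright. The sketch below is only an illustration, not Badfish's own code: `is_idrac_not_ready` and `call_with_retry` are hypothetical helpers, and the retry count and delay are assumed values. It keys off the `SWC0700` suffix of the `MessageId` shown in the response above.

```python
import json
import time


def is_idrac_not_ready(status, body):
    """Return True if a Redfish reply looks like the 'iDRAC is not ready' error."""
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if not isinstance(payload, dict):
        return False
    infos = payload.get("error", {}).get("@Message.ExtendedInfo", [])
    return any(
        isinstance(info, dict) and str(info.get("MessageId", "")).endswith("SWC0700")
        for info in infos
    )


def call_with_retry(fetch, retries=5, delay=30.0, sleep=time.sleep):
    """Call fetch() again while the BMC reports 'not ready'.

    fetch is any callable returning (status_code, body_text).
    Makes one initial call plus at most `retries` further calls.
    """
    status, body = fetch()
    for _ in range(retries):
        if not is_idrac_not_ready(status, body):
            break
        sleep(delay)
        status, body = fetch()
    return status, body
```

Passing `sleep` in as a parameter keeps the helper easy to exercise without real delays.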

--host-list not working

Your System Details

Lenovo ThinkPad T470
[bengland@localhost upi]$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/quads/badfish latest 6c3ff7c5d8f3 12 days ago 142 MB

  • Python Version:
    whatever the container image is using

  • Operating System:
    Fedora 31

  • Target System Type: (e.g. Dell, SuperMicro)

  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)

Describe the bug
The --host-list parameter does not work; it always produces this error:

[bengland@localhost upi]$ podman run -it --rm quads/badfish \
  --host-list /home/bengland/openshift/upi/masters.list \
  -u quads -p 502328 -i config/idrac_interfaces.yml --check-boot
- ERROR    - There was something wrong reading from /home/bengland/openshift/upi/masters.list
[bengland@localhost upi]$ cat /home/bengland/openshift/upi/masters.list
mgmt-e26-h03-740xd
mgmt-e26-h05-740xd
mgmt-e26-h07-740xd

However, if I drop --host-list masters.list and substitute a -H hostname, it works.

[bengland@localhost upi]$ podman run -it --rm quads/badfish -H mgmt-e26-h03-740xd -u quads -p 502328 -i config/idrac_interfaces.yml --check-boot
- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- WARNING  - Current boot order does not match any of the given.
- INFO     - Current boot order:
- INFO     - 1: NIC.Integrated.1-1-1
- INFO     - 2: NIC.Integrated.1-3-1
- INFO     - 3: NIC.Slot.7-1-1
- INFO     - 4: NIC.Slot.7-2-1
- INFO     - 5: NIC.Integrated.1-2-1
- INFO     - 6: NIC.Integrated.1-4-1
- INFO     - 7: HardDisk.List.1-1

If this is not the correct format for the list, what is? Documentation doesn't say.

Expected Behavior
It should perform the requested action for each host in the input host list, or say why it couldn't.

I can work around this by doing one host at a time, but it would be nice to have this feature.
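For illustration, a host-list reader that says why it failed might look like the sketch below. It assumes one hostname per line, as in masters.list above; `read_host_list` is a hypothetical helper, not Badfish's function. One thing worth checking in the container case, though the report does not confirm it as the cause: a path on the host file system is only visible inside a container if it has been mounted into it.

```python
from pathlib import Path


def read_host_list(path):
    """Return hostnames from a file with one host per line.

    Blank lines are skipped. Raises ValueError with a specific reason
    instead of a generic failure message.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise ValueError(f"host list not found or not a regular file: {path}")
    try:
        text = file_path.read_text()
    except OSError as exc:
        raise ValueError(f"cannot read host list {path}: {exc}") from exc
    hosts = [line.strip() for line in text.splitlines() if line.strip()]
    if not hosts:
        raise ValueError(f"host list is empty: {path}")
    return hosts
```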

[BUG] Badfish failure with verify_ssl on Dell R730xd / iDRAC8

We have received reports that current upstream badfish fails on Dell R730xd systems. I have been able to reproduce this as well by pulling the latest upstream (via the setuptools method).

This seems to stem from

- DEBUG - _request() got an unexpected keyword argument 'verify_ssl'

This could potentially be from this code here:

async def get_request(self, uri, _continue=False):

This seems to work on R730XD as of 158161b

(user report)

[stack@f13-h26-b08-5039ms ~]$ ~/.local/bin/badfish --host-list ./mgmt-interfaces -u quads -p rdu2@1425 -i ./badfish/config/idrac_interfaces.yml --boot-to-type foreman
[mgmt-f25-h11-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h19-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h13-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h21-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h07-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h23-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h15-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h17-000-r730xd] - ERROR    - Failed to communicate with server.
[mgmt-f25-h11-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h11-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h19-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h19-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h13-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h13-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h21-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h21-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h07-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h07-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h23-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h23-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h15-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h15-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h17-000-r730xd] - ERROR    - There was something wrong executing Badfish
[mgmt-f25-h17-000-r730xd] - INFO     - ************************************************
[badfish.badfish] - INFO     - RESULTS:
[badfish.badfish] - INFO     - mgmt-f25-h07-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h11-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h13-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h15-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h17-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h19-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h21-000-r730xd.rdu2.scalelab.example.com: FAILED
[badfish.badfish] - INFO     - mgmt-f25-h23-000-r730xd.rdu2.scalelab.example.com: FAILED
  • This works with a previous git checkout against an older commit
[stack@f13-h26-b08-5039ms badfish]$ git checkout 158161b3c721c6d52d68692f6def0c7a9058826c

[stack@f13-h26-b08-5039ms badfish]$ python3 setup.py build
running build
running build_py
copying src/badfish/badfish.py -> build/lib/badfish
[stack@f13-h26-b08-5039ms badfish]$ python3 setup.py install --prefix ~/.local


[stack@f13-h26-b08-5039ms ~]$ ~/.local/bin/badfish --host-list ./mgmt-interfaces -u quads -p rdu2@1425 -i ./badfish/config/idrac_interfaces.yml --boot-to-type foreman
[mgmt-f25-h23-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h23-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h23-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h23-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327126037398'].
[mgmt-f25-h23-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h23-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h11-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h11-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h21-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h21-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h23-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h15-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h15-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h13-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h13-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h07-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h07-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h19-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h19-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h17-000-r730xd] - INFO     - Executing actions on host: mgmt-f25-h17-000-r730xd.rdu2.scalelab.example.com
[mgmt-f25-h23-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h23-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h15-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h11-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h19-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h21-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h07-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h13-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h11-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125909328'].
[mgmt-f25-h21-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327155680004'].
[mgmt-f25-h17-000-r730xd] - WARNING  - iDRAC version installed does not support DellJobService
[mgmt-f25-h15-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125845896'].
[mgmt-f25-h07-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125927729'].
[mgmt-f25-h13-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125924071'].
[mgmt-f25-h19-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125904899'].
[mgmt-f25-h11-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h11-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h21-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h21-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h15-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h15-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h17-000-r730xd] - WARNING  - Clearing job queue for job IDs: ['JID_327125959350'].
[mgmt-f25-h07-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h07-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h13-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h13-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h19-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h19-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h17-000-r730xd] - INFO     - Job queue for iDRAC mgmt-f25-h17-000-r730xd.rdu2.scalelab.example.com successfully cleared.
[mgmt-f25-h15-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h11-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h07-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h13-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h19-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h21-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h17-000-r730xd] - INFO     - Command passed to set BIOS attribute pending values.
[mgmt-f25-h11-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h11-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h15-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h15-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h19-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h19-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h21-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h21-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h13-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h13-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h07-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h07-000-r730xd] - INFO     - ************************************************
[mgmt-f25-h17-000-r730xd] - INFO     - POST command passed to create target config job.
[mgmt-f25-h17-000-r730xd] - INFO     - ************************************************
[badfish.badfish] - INFO     - RESULTS:
[badfish.badfish] - INFO     - mgmt-f25-h07-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h11-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h13-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h15-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h17-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h19-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h21-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL
[badfish.badfish] - INFO     - mgmt-f25-h23-000-r730xd.rdu2.scalelab.example.com: SUCCESSFUL

Provide a way to input a list of hosts on which to operate

This is an RFE to allow feeding badfish a list of hosts to operate on.

e.g.

./badfish.py -l hosts.txt  -u root -p yourpass -i config/idrac_interfaces.yml -t foreman --pxe

This will be useful for scale/perf tenants to operate on their hosts en-masse without having to use a for loop and sequential operation. The async library would be pretty sweet for this as Gonza stated.
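As a rough sketch of the async approach mentioned above, the following runs one action against many hosts concurrently with `asyncio.gather`. The names `run_on_hosts` and `action`, and the concurrency limit of 10, are assumptions for illustration and do not describe how badfish implements it.

```python
import asyncio


async def run_on_hosts(hosts, action, limit=10):
    """Run the coroutine function action(host) on every host concurrently.

    At most `limit` actions run at once. Returns {host: True or False}.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(host):
        async with semaphore:
            try:
                await action(host)
                return host, True
            except Exception:
                # One failing host must not abort the others.
                return host, False

    results = await asyncio.gather(*(guarded(host) for host in hosts))
    return dict(results)
```

A caller would wrap this in `asyncio.run(run_on_hosts(hosts, some_action))` and print a per-host summary from the returned mapping.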

Add a retry mode for command execution

This is an RFE for providing a retry mode argparse option. What this would look like is a -r or --retry flag that leverages the exit status of the command/action and retries until it succeeds.

By default badfish should just run once, but if -r or --retry is specified it will try either 10 times or until it succeeds then quit.

badfish.py -H mgmt-yourhost.com -u root -p password -i config/idrac_interfaces.yml -t foreman --pxe --retry
  • There should be a sane default for retries (10?) before failing completely (we don't want it running forever)
  • There should be a sane default for timeout between tries (10 seconds?)
  • Number of retries should be configurable e.g. --retry 20 with a reasonable default substituted if not specified.
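The requested behavior could be sketched as below. This is a proposal-level illustration only: the parser name, `run_with_retry`, and the 10-second wait are assumptions taken from the suggestions in this RFE, and `action` stands in for whatever badfish operation is being retried.

```python
import argparse
import time


def build_parser():
    parser = argparse.ArgumentParser(prog="badfish-retry-sketch")
    # Bare --retry means 10 attempts, --retry 20 overrides it,
    # and leaving the flag out means a single run.
    parser.add_argument("-r", "--retry", nargs="?", type=int, const=10, default=1)
    return parser


def run_with_retry(action, attempts, wait=10.0, sleep=time.sleep):
    """Call action() until it returns exit status 0 or attempts run out."""
    status = 1
    for attempt in range(1, attempts + 1):
        status = action()
        if status == 0:
            break
        if attempt < attempts:
            sleep(wait)
    return status
```

Because `nargs="?"` consumes a following value, a bare `--retry` is safest as the last argument, as in the example command above.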

Why Is this Needed?

We want to replace our legacy Ansible playbooks that manage Dell interface boot order with badfish. This change is recommended for this. It will also let people use badfish in a more automated/utility fashion.

badfish not working reliably with default

I'm probably missing something, because it worked once and then never again.

It's a fresh clone of badfish with the default config, trying to configure boot_order on a Dell R630, which should be perfectly possible. Maybe something needs to be retried several times before it succeeds? (blind idea)

[root@c10-h30-r630 badfish]# python2 badfish.py -H mgmt-c10-h33-r630.fqdn -u quads -p password -t director -i config/idrac_interfaces.yml
- PASS: PATCH command passed to update boot order
- FAIL: POST command failed to create BIOS config job, status code is 400
{'cookies': <<class 'requests.cookies.RequestsCookieJar'>[]>, '_content': '', 'headers': {'Content-Length': '0', 'Accept-Ranges': 'bytes', 'Keep-Alive': 'timeout=60, max=199', 'Server': 'iDRAC/8', 'Connection': 'Keep-Alive', 'Cache-Control': 'no-cache', 'Date': 'Tue, 18 Dec 2018 21:51:44 GMT', 'OData-Version': '4.0', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/json;odata.metadata=minimal;charset=utf-8'}, 'url': u'https://mgmt-c10-h33-r630.fqdn/redfish/v1/Managers/iDRAC.Embedded.1/Jobs', 'status_code': 400, '_content_consumed': True, 'encoding': 'utf-8', 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7fd03465ffd0>, 'elapsed': datetime.timedelta(0, 5, 165930), 'raw': <urllib3.response.HTTPResponse object at 0x7fd034676850>, 'reason': 'Bad Request', '_next': None, 'history': []}
[root@c10-h30-r630 badfish]# curl -k -u quads:password https://mgmt-c10-h31-r630.fqdn/redfish/v1/Managers/iDRAC.Embedded.1/Jobs
{"@odata.context":"/redfish/v1/$metadata#DellJobCollection.DellJobCollection","@odata.id":"/redfish/v1/Managers/iDRAC.Embedded.1/Jobs","@odata.type":"#DellJobCollection.DellJobCollection","Description":"Collection of Job Instances","Id":"JobQueue","Members":[],"Members@odata.count":0,"Name":"JobQueue"}
[root@c10-h30-r630 badfish]# curl -k -u quads:password https://mgmt-c10-h31-r630.fqdn/redfish/v1/Managers/iDRAC.Embedded.1/Jobs | python -m json.tool

{
    "@odata.context": "/redfish/v1/$metadata#DellJobCollection.DellJobCollection",
    "@odata.id": "/redfish/v1/Managers/iDRAC.Embedded.1/Jobs",
    "@odata.type": "#DellJobCollection.DellJobCollection",
    "Description": "Collection of Job Instances",
    "Id": "JobQueue",
    "Members": [],
    "Members@odata.count": 0,
    "Name": "JobQueue"
}

TypeError: shield() got an unexpected keyword argument 'loop' in latest quay.io/quads/badfish

Attempting to run badfish from podman in rdu2 lab.

Describe the bug

magnolia :: /tmp » /bin/python --version
Python 3.9.7

magnolia :: /tmp » podman pull quay.io/quads/badfish

Trying to pull quay.io/quads/badfish:latest...
Getting image source signatures
Copying blob ba51967de001 done
Copying blob a0d0a0d46f8b done
Copying blob 5429bbf61688 done
Copying blob e57f56899fb2 done
Copying blob c60d7396f087 done
Copying blob 469e8efa9283 done
Copying blob 15c3741996eb done
Copying blob 401188f48cba done
Copying blob 726b4596459a done
Copying blob e3c3d190eb70 done
Copying blob 96be150d610d done
Copying config 87c2ddc5bb done
Writing manifest to image destination
Storing signatures
87c2ddc5bb28c2635752014d29a97d887bed56a1945c1d195d18ed5f2d070aee
magnolia :: /tmp » ansible-ipi-install]# podman run -it --rm --dns xx quay.io/quads/badfish -u XX -p XX -i config/idrac_interfaces.yml -H mgmt-fxx.com
^C
magnolia :: /tmp » podman run -it --rm --dns XX quay.io/quads/badfish -u quads -p XX -i config/idrac_interfaces.yml -H mgmt-fxxcom --power-state
Traceback (most recent call last):
  File "/usr/local/bin/badfish", line 11, in <module>
    load_entry_point('badfish==1.0.0', 'console_scripts', 'badfish')()
  File "/usr/local/lib/python3.10/site-packages/badfish-1.0.0-py3.10.egg/badfish/badfish.py", line 2175, in main
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/badfish-1.0.0-py3.10.egg/badfish/badfish.py", line 1830, in execute_badfish
  File "/usr/local/lib/python3.10/site-packages/badfish-1.0.0-py3.10.egg/badfish/badfish.py", line 40, in badfish_factory
  File "/usr/local/lib/python3.10/site-packages/badfish-1.0.0-py3.10.egg/badfish/badfish.py", line 70, in __init__
  File "/usr/local/lib/python3.10/site-packages/badfish-1.0.0-py3.10.egg/badfish/badfish.py", line 431, in validate_credentials
  File "/usr/local/lib/python3.10/site-packages/async_lru.py", line 237, in wrapped
    return (yield from asyncio.shield(fut, loop=_loop))
TypeError: shield() got an unexpected keyword argument 'loop'

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. Pull latest badfish from quay.io
  2. Attempt a power status command via podman using latest image

Expected Behavior
Badfish commands to work.

Logs / Screenshots

Additional Details
Seems the image on quay was recently updated
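For context, the traceback shows the image running Python 3.10, and the `loop` parameter of `asyncio.shield` was removed in that release, so any library still passing `loop=` fails this way. A minimal sketch of the call without that argument follows; `fetch_value` is a hypothetical stand-in for the cached coroutine, not code from badfish or async_lru.

```python
import asyncio


async def fetch_value():
    """Hypothetical stand-in for the coroutine being cached."""
    await asyncio.sleep(0)
    return "ok"


async def main():
    task = asyncio.ensure_future(fetch_value())
    # No loop= argument: shield() picks up the running event loop itself.
    return await asyncio.shield(task)


result = asyncio.run(main())
```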

[BUG] Display proper error warning on supermicros unsupported actions

Describe the bug
When running --check-boot against a supermicro with no license, the following is displayed:
- WARNING  - Could not retrieve Bios Attributes. Assuming Bios.
Traceback (most recent call last):
  File "./badfish.py", line 844, in <module>
    sys.exit(main())
  File "./badfish.py", line 838, in main
    execute_badfish(host, args, logger)
  File "./badfish.py", line 770, in execute_badfish
    badfish.check_boot(interfaces_path)
  File "./badfish.py", line 602, in check_boot
    boot_devices = self.get_boot_devices()
  File "./badfish.py", line 147, in get_boot_devices
    data = _response.json()
  File "/home/grafuls/VirtualEnvs/badfish/lib/python3.7/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
%config Application.verbose_crash=True

To Reproduce
Steps to reproduce the behavior:

  1. Run badfish with --check-boot option against a 1029p
  2. See trace on stdout

Expected behavior
A clear and concise error description should be displayed on stdout instead of a traceback.
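One way to get there, sketched under assumptions: wrap the JSON decoding so that an empty or non-JSON body becomes a single readable error. `BadfishSketchError` and `parse_json_body` are hypothetical names, and the wording of the message is only a suggestion.

```python
import json


class BadfishSketchError(Exception):
    """Hypothetical error type carrying a user-facing message."""


def parse_json_body(status, text, uri):
    """Decode a response body, turning empty or non-JSON replies into a clear error."""
    try:
        return json.loads(text)
    except ValueError:
        raise BadfishSketchError(
            f"{uri} returned status {status} with a non-JSON body; "
            "the action may be unsupported or unlicensed on this system"
        ) from None
```

The caller can then log the exception message at ERROR level and exit, instead of letting `JSONDecodeError` propagate.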

[RFE] Implement unmounting of ISO images

There may be interest in both mounting and unmounting ISO images via the Redfish API. At a minimum, removing any mounted ISO images is already warranted in our own R&D usage.

For now let's implement checking and unmounting any virtual media / ISOs as a baseline check for validation and this can be expanded upon later to support mounting actual virtual media / ISO if there is demand.

Something like the following might be what we want initially:

badfish --check-iso
badfish --unmount-iso

We can extend this further to provide an ISO host and image file if it's useful.

This seems to be supported on SuperMicro here:

https://www.supermicro.com/manuals/other/RedfishRefGuide.pdf

/redfish/v1/Managers/1/VM1/CfgCD/Actions/IsoConfig.UnMount
7.24.2 Verify whether the ISO is mounted from Redfish command
URL: ${BMC_IP}/redfish/v1/Managers/1/VirtualMedia/CD[mounted_dev_num]
Method supported: Get
Body: LEAVE_IT_EMPTY
7.24.3 Unmount the ISO
URL: ${BMC_IP}/redfish/v1/Managers/1/VirtualMedia/CD[mounted_dev_num]/Actions/VirtualMedia.EjectMedia
Method supported: Post
Body: {}

I'm still poking around for Dell redfish endpoint references but we should do the same thing there.
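As a starting point for the SuperMicro side, the two calls quoted above could be described like this. It is a sketch only: the `https://` scheme, the reading of `[mounted_dev_num]` as a plain number appended to `CD`, and both helper names are assumptions, and nothing here has been run against real hardware.

```python
def virtual_media_uri(bmc_ip, device_num):
    """URI to GET when checking whether an ISO is mounted (empty body)."""
    return f"https://{bmc_ip}/redfish/v1/Managers/1/VirtualMedia/CD{device_num}"


def eject_request(bmc_ip, device_num):
    """Describe the unmount call: a POST with an empty JSON object as body."""
    return {
        "method": "POST",
        "url": virtual_media_uri(bmc_ip, device_num) + "/Actions/VirtualMedia.EjectMedia",
        "json": {},
    }
```

The returned dictionary is deliberately client-agnostic, so it can be handed to whichever HTTP library badfish already uses.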

--reboot-only reboots, twice

I tried this while watching the DRAC console:

[bengland@bene-laptop badfish]$ ./badfish.py -H mgmt-e23-h31-740xd -u quads -p 492728 -i config/idrac_interfaces.yml --reboot-only
- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- INFO     - Command passed to GracefulRestart server, code return is 204.
- INFO     - Polling for host state: Off
  Polling: [>                   ] 0% - Host state: On  
  Polling: [>                   ] 7% - Host state: On  
  Polling: [-->                 ] 13% - Host state: On  
  Polling: [--->                ] 20% - Host state: On  
  Polling: [---->               ] 27% - Host state: On  
  Polling: [------>             ] 33% - Host state: On  
  Polling: [------->            ] 40% - Host state: On  
  Polling: [-------->           ] 47% - Host state: On  
  Polling: [---------->         ] 53% - Host state: On  
  Polling: [----------->        ] 60% - Host state: On  
  Polling: [------------>       ] 67% - Host state: On  
  Polling: [-------------->     ] 73% - Host state: On  
  Polling: [--------------->    ] 80% - Host state: On  
  Polling: [---------------->   ] 87% - Host state: On  
  Polling: [------------------> ] 93% - Host state: On  
- WARNING  - Unable to graceful shutdown the server, will perform forced shutdown now.

<at this point the server had already rebooted, and this forced a 2nd reboot>

- INFO     - Command passed to ForceOff server, code return is 204.
- INFO     - Polling for host state: Not Down
  Polling: [------------------->] 100% - Host state: Off
- INFO     - Command passed to On server, code return is 204.

Why did badfish.py conclude that the server hadn't rebooted?
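One possible explanation, which the log alone does not confirm, is that a graceful restart passes through the Off state faster than the polling interval, so every sample still reads On. The toy model below only illustrates that effect; the timeline and intervals are made-up values and none of this is badfish code.

```python
def sample_states(timeline, interval):
    """Return the states a poller sees when it samples every `interval` ticks."""
    return timeline[::interval]


def observed(timeline, interval, target):
    """True if the poller ever sees `target` at that sampling interval."""
    return target in sample_states(timeline, interval)


# Hypothetical timeline, one entry per second, with a brief Off window.
timeline = ["On"] * 3 + ["Off"] * 2 + ["On"] * 10
```

With this timeline, sampling every second catches the Off window, while sampling every five seconds misses it entirely.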

Enhancements to One-Shot Boot

This is an RFE for some of the things we've discussed recently:

  • Allow a one-shot boot option that does not require the interface string itself like --boot-to does, where you can specify something like foreman or director that would map to an entry in idrac_interfaces.yml

e.g.

badfish.py -H mgmt-your-server.example.com -u root -p yourpass --boot-to foreman
  • If an IPMI FQDN is used it will use this over the default entries for either foreman or director

e.g. if I had the following in idrac_interfaces.yml it would use that custom entry instead

foreman_mgmt-server.example.com_interfaces: NIC.Integrated.1-3-1,NIC.Integrated.1-2-1

This would help in cases where vendor boot interface orders may not always match across a common system type (hasn't happened to us yet but could be useful in other environments) or generally lets people set their own as needed.
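The lookup order described above could be sketched as follows. `resolve_interfaces` is a hypothetical helper, and the model-level key format (for example `foreman_740xd_interfaces`) is assumed from the key names that appear elsewhere in these reports.

```python
def resolve_interfaces(config, host_type, fqdn, model):
    """Return the interface list for a host, preferring a host-specific entry.

    config maps keys such as 'foreman_<fqdn>_interfaces' or
    'foreman_<model>_interfaces' to comma-separated interface strings.
    """
    for name in (fqdn, model):
        value = config.get(f"{host_type}_{name}_interfaces")
        if value:
            return [item.strip() for item in value.split(",")]
    raise KeyError(f"no {host_type} entry for {fqdn} or {model}")
```

The first interface in the returned list would then be the one-shot boot target.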

Setting director-style interfaces fails

Your System Details
not relevant

  • Python Version:
    Python 3.10.2

  • Operating System:
    Fedora 35

  • Target System Type: (e.g. Dell, SuperMicro)
    Dell fc640

  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 2.60.60.60)
    Version 4.22.00.53(Build 03)

Describe the bug
Setting the director-style interface order on an FC640 returns an error:

- ERROR    - Couldn't find a valid key defined on the interfaces yaml: uefi_fc640_b04_interfaces

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. Clone the badfish repo
  2. Run the command src/badfish/badfish.py -u quads -p <redacted> -i config/idrac_interfaces.yml -H mgmt-e22-h12-b04-fc640.rdu2.scalelab.redhat.com -t director
  3. Check the output: - ERROR - Couldn't find a valid key defined on the interfaces yaml: uefi_fc640_b04_interfaces

Expected Behavior
No error message and the interface order set to director-style

Logs / Screenshots

t14:/tmp/badfish$ git clone https://github.com/redhat-performance/badfish.git
Cloning into 'badfish'...
remote: Enumerating objects: 1698, done.
remote: Counting objects: 100% (627/627), done.
remote: Compressing objects: 100% (422/422), done.
remote: Total 1698 (delta 341), reused 387 (delta 173), pack-reused 1071
Receiving objects: 100% (1698/1698), 487.17 KiB | 1.32 MiB/s, done.
Resolving deltas: 100% (892/892), done.
t14:/tmp/badfish$ cd badfish/
t14:/tmp/badfish/badfish (master)$ ls
config  CONTRIBUTING.md  Dockerfile  Dockerfile_dev  image  LICENSE  README.md  requirements.txt  rpm  setup.py  src  tests  tox.ini
t14:/tmp/badfish/badfish (master)$ src/badfish/badfish.py -u quads -p <redacted> -i config/idrac_interfaces.yml -H mgmt-e22-h12-b04-fc640.rdu2.scalelab.redhat.com -t director
- ERROR    - Couldn't find a valid key defined on the interfaces yaml: uefi_fc640_b04_interfaces

[Bug] get_job_status endless loop

Describe the bug
When enforcing a boot order, after creating the config job we poll the job status until it is completed. The job may get stuck, and badfish then ends up in an endless loop.

To Reproduce
Steps to reproduce the behavior:

  1. Enforce a new boot order via badfish on a host with an active bios config job.
  2. See endless loop

Expected behavior
Timeout after several retries.
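A bounded version of the polling loop might look like the sketch below. `wait_for_job` and its defaults are hypothetical, `get_status` stands in for the real job-status query, and "Completed" is assumed to be the terminal state being waited for.

```python
import time


def wait_for_job(get_status, retries=20, delay=10.0, sleep=time.sleep):
    """Poll get_status() until it returns 'Completed' or retries are exhausted.

    Returns True on completion and False on timeout, so the caller
    never loops forever on a stuck job.
    """
    for attempt in range(retries):
        if get_status() == "Completed":
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```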

unable to set boot mode to uefi on 740xd's

I tried setting the boot mode to UEFI, but it fails with a "Bootmode not found" error:
(.venv) [asyedham@perfc-360g8-04 badfish]$ python3 badfish.py -H mgmt-e24-h37-740xd -u $user -p $password --set-bios-attribute --attribute Bootmode --value Uefi

- WARNING  - Could not retrieve Bios Attributes.
- ERROR    - Bootmode not found. Please check attribute name.
- ERROR    - Attribute not found

Boot order changing ends with exception.

I'm trying to change the boot order on a Dell R620, but it ends with an exception.

[root@centos7 badfish]# ~/git/badfish/badfish.py -u quads -p password -H mgmt-b08-h03-r620.fqdn --check-boot
Current boot order:
1: NIC.Integrated.1-3-1
2: HardDisk.List.1-1
3: NIC.Slot.2-4
4: NIC.Slot.2-1
5: NIC.Slot.2-2
6: NIC.Slot.2-3
[root@centos7 badfish]# ~/git/badfish/badfish.py -u quads -p password -H mgmt-b08-h03-r620.fqdn --boot-to NIC.Slot.2-4
- PASS: Command passed to set BIOS attribute pending values
- PASS: status code 204 returned for POST command to reset iDRAC
- WARNING, iDRAC will now reset and be back online within a few minutes.
- FAIL: POST command failed to create BIOS config job, status code is 401
{'cookies': <<class 'requests.cookies.RequestsCookieJar'>[]>, '_content': '', 'headers': {'transfer-encoding': 'chunked', 'accept-ranges': 'bytes', 'keep-alive': 'timeout=60, max=199', 'connection': 'Keep-Alive', 'date': 'Tue, 29 Jan 2019 18:01:08 GMT', 'www-authenticate': 'Basic realm="RedfishService"'}, 'url': u'https://mgmt-b08-h03-r620.fqdn/redfish/v1/Managers/iDRAC.Embedded.1/Jobs', 'status_code': 401, '_content_consumed': True, 'encoding': None, 'request': <PreparedRequest [POST]>, 'connection': <requests.adapters.HTTPAdapter object at 0x7fcd2300cf90>, 'elapsed': datetime.timedelta(0, 9, 36103), 'raw': <requests.packages.urllib3.response.HTTPResponse object at 0x7fcd2301fd10>, 'reason': 'Unauthorized', 'history': []}
Traceback (most recent call last):
  File "/root/git/badfish/badfish.py", line 449, in <module>
    sys.exit(main())
  File "/root/git/badfish/badfish.py", line 419, in main
    badfish.reboot_server()
  File "/root/git/badfish/badfish.py", line 204, in reboot_server
    data = _response.json()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 802, in json
    return json.loads(self.text, **kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

It seems to reboot the DRAC, but the boot order is unchanged once it comes back.

Originally posted by @misacek007 in #8 (comment)

NIC.Slot.7-1-1 and NIC.Slot.7-2-1 are not valid boot devices

Your System Details
Alias e24-h17-740xd managing mgmt-e23-h35-740xd with badfish

  • Python Version: python36-3.6.8-2.module+el8.1.0+3334+5cb623d7.x86_64
  • Operating System: RHEL8.1
  • Target System Type: Dell 740xd
  • IPMI / Out-of-band Firmware Version: (_e.g. iDRAC 8 3.34.34.34)

badfish now forbids using a 25-GbE interface as a boot device. It always allowed this before (I only noticed the change a few days ago). Being allowed to do this is what lets me get both 25-GbE ports into use by Ceph. I have to do it this way AFAICT.

Worse yet, it proceeds anyway even though it cannot possibly fulfill my request. This ties my system up in the BIOS for 20 minutes.

I will create a branch of badfish that allows this and submit a PR. That way I can continue to use ocp4_upi_baremetal procedure to install OCS on baremetal.

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. clone badfish from github
  2. edit in the idrac_interfaces.yml record below and take out the other "director" records (not needed).
  3. Run it:

[root@e24-h17-740xd config]# /usr/bin/python3 ./badfish/badfish.py -u quads -p 503399 -i ./badfish/config/idrac_interfaces.yml -H mgmt-e23-h35-740xd.alias.bos.scalelab.redhat.com -t director

- INFO     - Systems service: /redfish/v1/Systems/System.Embedded.1.
- INFO     - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
- WARNING  - Job queue already cleared for iDRAC mgmt-e23-h35-740xd.alias.bos.scalelab.redhat.com, DELETE command will not execute.
- WARNING  - Waiting for host to be up.
- INFO     - Polling for host state: On
  Polling: [------------------->] 100% - Host state: On
- WARNING  - Some interfaces are not valid boot devices. Ignoring: NIC.Slot.7-2-1, NIC.Slot.7-1-1
- INFO     - PATCH command passed to update boot order.
- INFO     - POST command passed to create target config job.
- INFO     - JID_845692066060 job ID successfully created.
- INFO     - Command passed to check job status, code 200 returned.
- WARNING  - JobStatus not scheduled, current status is: New.
- INFO     - Command passed to check job status, code 200 returned.
- INFO     - Job id JID_845692066060 successfully scheduled.
- INFO     - Command passed to ForceOff server, code return is 204.
- INFO     - Polling for host state: Not Down
  Polling: [------------------->] 100% - Host state: Off
- INFO     - Command passed to On server, code return is 204.

Expected Behavior

First, immediately reject the request if you have no plan to implement it as specified in idrac_interfaces.yml.

I want to be able to use this record in my idrac_interfaces.yml:

+director_740xd_interfaces: NIC.Slot.7-2-1,NIC.Integrated.1-1-1,NIC.Integrated.1-3-1,NIC.Slot.7-1-1,HardDisk.List.1-1,NIC.Integrated.1-2-1
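
The "reject immediately" behavior asked for above could be sketched roughly as follows. This is not badfish's actual code: the function names and the `valid` list are hypothetical, and the only things taken from this report are the interface record and the two device names from the WARNING line. The idea is to compare the requested order against the devices the BMC reports as bootable and fail before any PATCH or reboot is issued.

```python
def split_boot_devices(requested, valid):
    """Partition requested devices into (accepted, rejected), keeping order."""
    valid_set = set(valid)
    accepted = [dev for dev in requested if dev in valid_set]
    rejected = [dev for dev in requested if dev not in valid_set]
    return accepted, rejected


def check_boot_request(requested, valid):
    """Raise before anything is sent if the request cannot be met in full."""
    _, rejected = split_boot_devices(requested, valid)
    if rejected:
        raise ValueError(
            "Requested boot devices are not valid on this host: %s"
            % ", ".join(rejected)
        )
    return list(requested)


# Record from this report.
record = (
    "NIC.Slot.7-2-1,NIC.Integrated.1-1-1,NIC.Integrated.1-3-1,"
    "NIC.Slot.7-1-1,HardDisk.List.1-1,NIC.Integrated.1-2-1"
)
requested = record.split(",")

# Hypothetical: what the BMC might report as bootable on the affected host.
valid = [
    "NIC.Integrated.1-1-1",
    "NIC.Integrated.1-2-1",
    "NIC.Integrated.1-3-1",
    "HardDisk.List.1-1",
]

try:
    check_boot_request(requested, valid)
except ValueError as err:
    print(err)
```

With a check like this the run would stop at the validation step instead of scheduling a job and power-cycling the host.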

`helpers` module error

Your System Details

  • Python Version: 3.9.0
  • Operating System: Fedora 33 (also happens in RHEL 8.4)
  • Target System Type: Dell 740xd
  • IPMI / Out-of-band Firmware Version: (e.g. iDRAC 8 2.60.60.60)

Describe the bug
Running any badfish command fails with a missing-module error: the helpers module is not found.

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. Download f068aa (current master branch) source code - git clone https://github.com/redhat-performance/badfish
  2. Ensure all required pip packages are installed - pip3 install -r requirements.txt
  3. Run any Badfish command - ./src/badfish/badfish.py -u quads -p password -i config/idrac_interfaces.yml -H mgmt-e26-h25-740xd.alias.bos.scalelab.redhat.com --clear-jobs --force
  4. It failed with this error
Traceback (most recent call last):
  File "/root/badfish/./src/badfish/badfish.py", line 23, in <module>
    from helpers.async_lru import alru_cache
ModuleNotFoundError: No module named 'helpers'
  5. Manually installed the helpers module - pip3 install helpers
  6. Ran the same badfish command, got another error

Expected Behavior
To work as normal :) (It was working last Wednesday)
Logs / Screenshots

Traceback (most recent call last):
  File "/root/badfish/./src/badfish/badfish.py", line 23, in <module>
    from helpers.async_lru import alru_cache
ModuleNotFoundError: No module named 'helpers.async_lru'
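
The second traceback finds a `helpers` package but no `helpers.async_lru` inside it, which suggests that `pip3 install helpers` pulled in an unrelated package of the same name. A quick way to narrow this down, assuming (this is a guess, not confirmed in this report) that `helpers` is meant to be a directory shipped next to badfish.py, is to check what is actually on disk. The function name below is made up for illustration.

```python
from pathlib import Path


def local_package_status(script_path, package="helpers", module="async_lru"):
    """Check whether a package directory sits next to the given script.

    Assumption: the import in badfish.py is meant to resolve to a directory
    beside the script, not to a package installed from PyPI.
    """
    script_dir = Path(script_path).resolve().parent
    pkg_dir = script_dir / package
    return {
        "package_dir": pkg_dir.is_dir(),
        "init_file": (pkg_dir / "__init__.py").is_file(),
        "module_file": (pkg_dir / (module + ".py")).is_file(),
    }


# Path taken from the traceback; run this from the repository root.
print(local_package_status("./src/badfish/badfish.py"))
```

If `package_dir` or `module_file` comes back False, the checkout itself is missing the files, and installing anything with pip will not help.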

Additional Details

cc: @mkarg75

--reboot-only flag not working for 740xd

When trying to reboot a target host (a 740xd in Alias), I get the error below. The same command worked on Dell R630s.

(.venv) [root@e23-h15-740xd badfish]# python3 badfish.py -H mgmt-e23-h21-740xd.alias -u quads -p $password --reboot-only
- ERROR    - Command failed to GracefulRestart server, status code is: 400.
- WARNING  - Unable to complete the operation because the value GracefulRestart entered for the property ResetType is not in the list of acceptable values.
- ERROR    - There was something wrong executing Badfish.
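
The 400 response says GracefulRestart is not an acceptable ResetType on this host. In Redfish, a system resource normally advertises the values it accepts under `Actions` -> `#ComputerSystem.Reset` -> `ResetType@Redfish.AllowableValues`, so one possible approach is to pick the reset type from that list. The sketch below is only an illustration: the function name, the fallback order, and the sample payload are assumptions, not badfish code or real output from a 740xd.

```python
def choose_reset_type(system, preferred=("GracefulRestart", "ForceRestart")):
    """Return the first preferred ResetType the system advertises, or None."""
    action = system.get("Actions", {}).get("#ComputerSystem.Reset", {})
    allowed = action.get("ResetType@Redfish.AllowableValues", [])
    for candidate in preferred:
        if candidate in allowed:
            return candidate
    return None


# Illustrative payload only: a system that does not list GracefulRestart.
sample_system = {
    "Actions": {
        "#ComputerSystem.Reset": {
            "target": "/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset",
            "ResetType@Redfish.AllowableValues": [
                "On",
                "ForceOff",
                "ForceRestart",
                "GracefulShutdown",
            ],
        }
    }
}

print(choose_reset_type(sample_system))
```

Whether falling back to a forced restart is acceptable depends on the workload, so a caller may prefer to report the allowable values and stop when the graceful option is missing.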

--check-boot aborts with KeyError: 'foreman_740xd_h15_interfaces'

Your System Details

  • Python Version: 3
  • Operating System: RHEL8.0
  • Target System Type: (e.g. Dell, SuperMicro)
    Dell 740xd in Alias Lab
  • IPMI / Out-of-band Firmware Version: (e.g. iDRAC 8 2.60.60.60)
    iDRAC version 3.34.34.34

Describe the bug

To Reproduce / What were you Doing?
Steps to reproduce the behavior:

  1. Go to e24-h13-740xd.alias.bos.scalelab.redhat.com

  2. git clone https://github.com/redhat-performance/badfish

  3. run this command:
    /usr/bin/python3 ./badfish/badfish.py -u quads -p 502901 -i ./badfish/config/idrac_interfaces.yml -H mgmt-e24-h15-740xd.alias.bos.scalelab.redhat.com --check-boot

  4. See error

  • INFO - Systems service: /redfish/v1/Systems/System.Embedded.1.
  • INFO - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
    Traceback (most recent call last):
      File "./badfish/badfish.py", line 949, in <module>
        sys.exit(main())
      File "./badfish/badfish.py", line 943, in main
        execute_badfish(host, args, logger)
      File "./badfish/badfish.py", line 866, in execute_badfish
        badfish.check_boot(interfaces_path)
      File "./badfish/badfish.py", line 683, in check_boot
        _host_type = self.get_host_type(interfaces_path)
      File "./badfish/badfish.py", line 229, in get_host_type
        interfaces = definitions["%s_%s_interfaces" % (_host, host_model)].split(",")
    KeyError: 'foreman_740xd_h15_interfaces'
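
The traceback shows the key being built as `"%s_%s_interfaces" % (_host, host_model)` and looked up directly, so any host model missing from idrac_interfaces.yml aborts with a bare KeyError. A more forgiving lookup might look like the sketch below. The function name and the sample `definitions` entries are hypothetical; only the key format and the failing key come from the traceback.

```python
def lookup_interfaces(definitions, host_type, host_model):
    """Return the interface list for a host, or fail with a readable message."""
    key = "%s_%s_interfaces" % (host_type, host_model)
    value = definitions.get(key)
    if value is None:
        known = ", ".join(sorted(definitions)) or "(none)"
        raise LookupError(
            "%s is not defined in the interfaces file; known keys: %s"
            % (key, known)
        )
    return value.split(",")


# Hypothetical contents of a parsed idrac_interfaces.yml.
definitions = {
    "foreman_740xd_interfaces": "NIC.Integrated.1-3-1,NIC.Slot.7-2-1",
    "director_740xd_interfaces": "NIC.Slot.7-2-1,NIC.Integrated.1-3-1",
}

try:
    lookup_interfaces(definitions, "foreman", "740xd_h15")
except LookupError as err:
    print(err)
```

This does not explain why the model resolved to `740xd_h15` rather than `740xd`; it only turns the abort into a message that names the missing key and the keys that do exist.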

Expected Behavior

[root@e24-h13-740xd ~]# cd badfish
[root@e24-h13-740xd badfish]# git checkout -b March4
f9598fc
M config/idrac_interfaces.yml
Switched to a new branch 'March4'

[root@e24-h13-740xd badfish]# /usr/bin/python3 ~/badfish/badfish.py -u quads -p 502901 -i ~/badfish/config/idrac_interfaces.yml -H mgmt-e24-h15-740xd.alias.bos.scalelab.redhat.com --check-boot

  • INFO - Systems service: /redfish/v1/Systems/System.Embedded.1.
  • INFO - Managers service: /redfish/v1/Managers/iDRAC.Embedded.1.
  • WARNING - Current boot order is set to: foreman.
