theforeman / foreman_maintain

The Foreman/Satellite maintenance tool

Home Page: https://projects.theforeman.org/projects/foreman-maintain/issues

License: GNU General Public License v3.0

Ruby 99.38% Shell 0.12% HTML 0.01% Python 0.49%

hacktoberfest

foreman_maintain's Introduction

Foreman Maintain

foreman_maintain aims to provide various features that help keep Foreman/Satellite up and running. It supports multiple versions and subparts of the Foreman infrastructure, including the server and smart proxies, and is smart enough to provide the right tools for the specific version.

Usage

Subcommands:
    health                        Health related commands
      list                          List the checks based on criteria
      list-tags                     List the tags to use for filtering checks
      check                         Run the health checks against the system
        --label label               Run only a specific check with a label
        --tags tags                 Run only those with all specific set of tags

    upgrade                       Upgrade related commands
      check --target-version TARGET_VERSION   Run pre-upgrade checks for upgrading to specified version
            --disable-self-upgrade            Disable automatic self upgrade (default: false)
      run --target-version TARGET_VERSION     Run the full upgrade
          [--phase=phase TARGET_VERSION]      Run just a specific phase of the upgrade
          --disable-self-upgrade              Disable automatic self upgrade (default: false)

    advanced                      Advanced tools for server maintenance
      procedure                     Run maintain procedures manually
        run                           Run maintain procedures manually
        by-tag                        Run maintain procedures in bulks

    service                       Control applicable services
      start                         Start applicable services
      stop                          Stop applicable services
      restart                       Restart applicable services
      status                        Get statuses of applicable services
      list                          List applicable services
      enable                        Enable applicable services
      disable                       Disable applicable services

    backup                        Backup server
      online                        Keep services online during backup
      offline                       Shut down services to preserve consistent backup

    restore                       Restore a backup

    maintenance-mode              Control maintenance-mode for application
      start                         Start maintenance-mode
      stop                          Stop maintenance-mode
      status                        Get maintenance-mode status
      is-enabled                    Get maintenance-mode status code

Upgrades

Foreman-maintain implements upgrade tooling that helps the administrator to go through the upgrade process.

The foreman-maintain tool is intended to upgrade itself to the next major version of the project. This is needed before upgrading the system. To do so, run:

foreman-maintain self-upgrade

To perform just the pre-upgrade checks for the system, run:

foreman-maintain upgrade check --target-version TARGET_VERSION

The upgrade tooling is able to handle the full end-to-end upgrade via:

foreman-maintain upgrade run --target-version TARGET_VERSION

The upgrade is split into several phases, each with a different level of impact on the running system:

pre-upgrade check - this phase performs the checks to ensure that the system is in a ready state before the upgrade. The system should still be operational at the current version while this phase runs.
pre-migrations - these steps perform changes on the system before the actual upgrade starts. Examples are disabling access to the system from external sources (a.k.a. maintenance mode) or disabling Katello sync plans during the run.

After this phase ends, the system is still running the old version, and it's possible to revert the changes by running the post-migrations steps.
migrations - this phase performs the actual migrations, starting with configuring new repositories, updating the packages and running the installer.

At the end of this phase, the system should be fully migrated to the new version. However, the system is not fully operational yet, as the post-migrations steps need to revert the pre-migrations steps.
post-migrations - these steps revert the changes made in the pre-migrations phase, making the system fully operational again.
post-upgrade checks - these steps perform a sanity check of the system to ensure that it is valid and ready to be used again.

The state of the upgrade is kept between runs, which allows re-running the upgrade in case of failure. The tool should start at the appropriate point. For example, if the upgrade is already in the migrations phase, there is no point in running the pre-upgrade check phase. If the upgrade failed before the migrations phase made any modifying changes, the tool tries to roll back to the previous state of the system.
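
The resume behaviour described above can be pictured with a small sketch. It is not the tool's actual implementation: the `PHASES` constant and the `remaining_phases` helper are hypothetical names, and neither the state storage nor the rollback logic is modelled.

```ruby
# Hypothetical model of resuming an upgrade; all names are illustrative.
PHASES = %i[pre_upgrade_checks pre_migrations migrations
            post_migrations post_upgrade_checks].freeze

# Returns the phases still to be run, given the phase that was in
# progress when the previous run stopped (nil means a fresh start).
def remaining_phases(current_phase = nil)
  return PHASES.dup if current_phase.nil?

  index = PHASES.index(current_phase)
  raise ArgumentError, "unknown phase: #{current_phase}" if index.nil?

  PHASES.drop(index)
end
```

With this model, a run that stopped in `:migrations` would skip the pre-upgrade checks and pre-migrations and continue from the migrations phase.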

Self-upgrade for rubygem-foreman_maintain package

When a user runs any foreman-maintain upgrade subcommand (e.g. foreman-maintain upgrade check or foreman-maintain upgrade run):

If an update is available for the rubygem-foreman_maintain package, the subcommand tries to update this package. After a successful package update, it returns exit code 75 and asks the user to re-run the command with the updated source code.

Exit code 75 indicates that the command cannot continue and needs to be re-run, e.g.:
```
# foreman-maintain upgrade check --target-version TARGET_VERSION
Checking for new version of foreman-maintain...
rubygem-foreman_maintain.noarch   repository

Updating foreman-maintain package.

The foreman-maintain package successfully updated.

Re-run foreman-maintain with required options!

# echo $?
75
```
If no update is available for the rubygem-foreman_maintain package, the subcommand simply continues with the further steps.
To skip the self-update mechanism, use the --disable-self-upgrade flag with the upgrade subcommands, e.g.:

# foreman-maintain upgrade check --target-version TARGET_VERSION --disable-self-upgrade
# foreman-maintain upgrade run --target-version TARGET_VERSION --disable-self-upgrade
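
For automation, the re-run request can be handled in a wrapper. The following is only a sketch under assumptions: `run_with_rerun` and `SELF_UPGRADED` are hypothetical names, and the block passed in is expected to run the command and return its exit status.

```ruby
SELF_UPGRADED = 75 # exit code documented above: re-run required

# Calls the given block and re-runs it (at most max_reruns times) while
# it keeps returning the "needs re-run" exit code.
def run_with_rerun(max_reruns = 1)
  status = yield
  reruns = 0
  while status == SELF_UPGRADED && reruns < max_reruns
    reruns += 1
    status = yield
  end
  status
end

# Possible usage against the real CLI (not executed here):
#   run_with_rerun do
#     system('foreman-maintain', 'upgrade', 'check',
#            '--target-version', 'TARGET_VERSION')
#     $?.exitstatus
#   end
```

Limiting the number of re-runs avoids looping forever if the command keeps returning 75 for some other temporary failure.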

Satellite notes

To use a custom organization/activation key for configuring repositories during the upgrade, set the following environment variables:

export EXTERNAL_SAT_ORG='Sat6-CI'
export EXTERNAL_SAT_ACTIVATION_KEY='Satellite QA RHEL7'

Implementation

foreman_maintain maps the CLI commands to definitions. This keeps the set of commands the user needs to know stable across version-specific changes. The mapping between the CLI commands and definitions is made by defining various metadata.

Definitions

There are various kinds of definitions possible:

Features - aspects that can be present on the system. It can be a service (foreman, foreman-proxy), a feature (some Foreman plugin), a link to external systems (e.g. a registered foreman proxy or compute resource) or another aspect that can be the subject of health checks and maintenance procedures.
Checks - definitions of health checks that indicate the health of the system against the present features
Procedures - steps for performing specific operations on the system
Scenarios - combinations of checks and procedures to achieve some goal

The definitions for these components are present in the definitions folder.

Features

Before foreman_maintain starts, it takes the set of feature definitions and determines their presence by running their confine blocks against the system.

The confine block can run an external command to check if the feature is there, or it can check the presence of other features.

A feature can define additional methods that can be used across other definitions.

class Features::Foreman < ForemanMaintain::Feature
  metadata do
    label :foreman

    confine do
      check_min_version('foreman', '1.7')
    end
  end

  # helper method that can be used in other definitions like this:
  #
  #   feature(:foreman).running?
  def running?
    execute?('systemctl status foreman')
  end
end

Features can inherit from each other, which allows overriding methods for older versions when a newer version of the feature is present in the system. This way, we shield the other definitions (checks, procedures, scenarios) from version-specific nuances.
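
The inheritance idea can be sketched in isolation. The classes below are stand-ins, not real foreman_maintain features: `SketchForeman`, `SketchForemanLegacy`, `status_command` and `describe_check` are hypothetical, and the `service foreman status` command is only an illustrative version-specific difference.

```ruby
# Stand-in for a feature; the real base class is ForemanMaintain::Feature.
class SketchForeman
  def status_command
    'systemctl status foreman'
  end
end

# Hypothetical variant for an older version: only the version-specific
# detail is overridden, everything else is inherited.
class SketchForemanLegacy < SketchForeman
  def status_command
    'service foreman status'
  end
end

# A caller (check, procedure, scenario) uses the same interface
# regardless of which variant was detected on the system.
def describe_check(feature)
  "checking with: #{feature.status_command}"
end
```

The caller never branches on the version; it only sees whichever feature class was detected.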

Checks

Checks define assertions to determine the status of the system.

class Checks::ForemanIsRunning < ForemanMaintain::Check
  metadata do
    for_feature :foreman
    description 'check foreman service is running'
    tags :default
  end

  def run
    # we are using methods of a feature.
    # we can define additional steps to be executed as a follow-up
    # of assertion failure
    assert(feature(:foreman).running?,
           'The foreman service is not running',
           :next_steps => Procedures::ForemanStart.new)
  end
end

Like features, checks (and in fact all definitions) can use the label, description, confine and tags keywords to describe themselves.

Every definition has a label (if not stated explicitly, it's determined from the class name).

If an operation takes more time, it's possible to enable a spinner and update it continuously with the with_spinner method.

def run
  with_spinner do |spinner|
    spinner.update 'checking foreman is running'
    if feature(:foreman).running?
      spinner.update 'foreman is started, restarting'
    else
      spinner.update 'foreman is not started, starting'
      feature(:foreman).start
    end
  end
end

Procedures

A procedure defines an operation that can be performed against the system. It can be part of a scenario or be linked from a check as a remediation step.

class Procedures::ForemanStart < ForemanMaintain::Procedure
  metadata do
    for_feature :foreman
    description 'start foreman service'
  end

  def run
    feature(:foreman).start
  end
end

Preparation steps

Some steps require additional steps to be performed before they can proceed. A typical example is installing additional dependencies. A preparation step is usually a procedure.

class Procedures::InstallPackage < ForemanMaintain::Procedure
  metadata do
    # definitions of parameters of the procedure
    param :packages, 'List of packages to install', :array => true
  end

  def run
    packages_action(:install, @packages)
  end

  # if false, the step will be considered as done: it will not be executed
  def necessary?
    @packages.any? { |package| package_version(package).nil? }
  end

  def description
    "Install package(s) #{@packages.join(', ')}"
  end
end

class Checks::DiskIO < ForemanMaintain::Check
  metadata do
    description 'check for recommended disk speed'
    preparation_steps { Procedures::InstallPackage.new(:packages => %w[fio]) }
  end

  def run
    execute!('fio ...')
  end
end

When running a scenario, all the preparation steps in that scenario are collected and run if necessary (that is, if the necessary? method returns true). The preparation steps are run as a separate scenario.
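
How the collection could work is sketched below with stand-in objects. `SketchStep`, `SketchPreparation` and `preparation_scenario` are hypothetical names, and removing duplicate preparation steps by name is an assumption rather than documented behaviour.

```ruby
# Minimal stand-ins for scenario steps and their preparation steps.
SketchStep = Struct.new(:name, :preparation_steps)
SketchPreparation = Struct.new(:name, :needed) do
  # Mirrors the necessary? method of a procedure.
  def necessary?
    needed
  end
end

# Collects the preparation steps of all steps in a scenario, drops
# duplicates (assumption) and keeps only those that are necessary.
def preparation_scenario(steps)
  steps.flat_map(&:preparation_steps).uniq(&:name).select(&:necessary?)
end
```

A preparation step shared by several checks would then run once, and one whose `necessary?` returns false (e.g. the package is already installed) would be skipped.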

Scenarios

Scenarios represent a composition of various steps (checks and procedures) to achieve some complex maintenance operation in the system (such as upgrade).

class Scenarios::PreUpgradeCheckForeman_1_14 < ForemanMaintain::Scenario
  metadata do
    description 'checks before upgrading to Foreman 1.14'
    confine do
      feature(:upstream)
    end
    tags :pre_upgrade_check
  end

  # Method to be called when composing the steps of the scenario
  def compose
    # we can search for the checks by metadata
    steps.concat(find_checks(:default))
  end
end

Hammer

In some cases, it's useful to be able to use hammer as part of check/fix procedures. It is as simple as:

def run
  feature(:hammer).run('task resume')
end

Before executing the command, the feature checks whether it has a valid hammer configuration to run the command. Foreman maintain always uses the 'admin' account to run the commands. The password is taken from the hammer config, from the installer answer files, or asked from the user interactively (in this order). The valid credentials are stored and reused next time if still valid.

Usually we want to do the user interaction at the beginning of our scenario. The easiest way to achieve this is to include the ForemanMaintain::Concerns::Hammer module:

include ForemanMaintain::Concerns::Hammer

which adds Procedures::HammerSetup as a preparation step to your metadata. We are adding this to all procedures and checks automatically.

Metadata

A set of data that describes and gives information about any definition.

You can describe a definition using the following methods available in metadata:

label - specifies a unique name per definition
tags - comma-separated labels attached for the purpose of creating groups of definitions
description - specifies a short description of the definition
param - declares parameters for a definition
for_feature - specifies the feature name for a definition. It implicitly confines the definition to the presence of that feature.
preparation_steps - takes a block that defines additional steps to perform before executing the actual definition
confine - takes a block as an argument to restrict execution of the definition
advanced_run - takes a boolean value for a procedure definition and restricts execution of the procedure from the advanced procedure run subcommand
before, after - methods used to define the order of a particular check. Specify the label of another check.

Implementation components

In order to process the definitions, there are other components present in the lib directory.

Detector - searches the checks/procedures/scenarios based on metadata & available features
Runner - executes the scenario
Reporter - reports the results of the run. It's possible to define multiple reporters, based on the current use case (CLI, reporting to monitoring tool)
Cli - Clamp-based command line infrastructure, mapping the definitions to user commands.
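
A rough sketch of the kind of lookup the Detector performs is shown below. It models only tag filtering, following the "all specific set of tags" wording of the --tags option; `SketchCheck` and `sketch_find_checks` are hypothetical names, and feature confinement is left out.

```ruby
# Stand-in for a check definition with its metadata.
SketchCheck = Struct.new(:label, :tags)

# Returns the checks that carry every one of the requested tags.
def sketch_find_checks(checks, *tags)
  checks.select { |check| (tags - check.tags).empty? }
end

CHECKS = [
  SketchCheck.new(:disk_io, [:default]),
  SketchCheck.new(:foreman_tasks_not_paused, [:default]),
  SketchCheck.new(:foreman_tasks_not_running, %i[default pre_upgrade])
].freeze
```

Here `sketch_find_checks(CHECKS, :default, :pre_upgrade)` would select only the last check, while `sketch_find_checks(CHECKS, :default)` would select all three.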

Testing

Since a single version of foreman_maintain is meant to be used against multiple versions and component combinations, testing is a crucial part of the process.

There are multiple kinds of tests in foreman_maintain:

unit tests for implementation components - can be found in test/lib
- these tests are independent of the real-world definitions and are focused on the internal implementation (metadata definitions, features detection)
unit tests for definitions - can be found in test/definitions
- these tests focus on testing the code in the definitions directory. There is an infrastructure to simulate various combinations of features without needing to actually have them present for development
bats test - TBD
- to achieve stability, we also want to include bats tests as part of the infrastructure, perhaps in combination with ansible playbooks to make the testing against real-world instances as easy as possible.

Execute rake to run the tests.

Bash completion

The completion suggests possible command-line subcommands and their options as usual. It can also suggest values for options and params where a file or directory path is expected.

Bash completion is automatically installed by the RPM. To use it in a development setup, run cp ./config/foreman-maintain.completion /etc/bash_completion.d/foreman-maintain and load it into the current shell with source /etc/bash_completion.d/foreman-maintain. Make sure that $PWD/bin is in PATH or that the full path to the foreman-maintain-complete executable is specified in /etc/bash_completion.d/foreman-maintain.

Bash completion for foreman-maintain needs a pre-built cache that holds the description of all subcommands and their parameters. The cache is located by default in ~/.cache/foreman_maintain_completion.yml. The location can be changed in foreman-maintain's config file. The cache can be built manually with foreman-maintain advanced prebuild-bash-completion, or it is built automatically when completion is used and the cache is missing (this may cause a slight delay). The cache expires after the installer scenario answer file changes (this indicates that the features on the instance may have changed, which affects the foreman-maintain CLI options and subcommands).

Available value types

Completion of values depends on the CLI option and parameter settings, e.g.:

  parameter 'BACKUP_DIR', 'Path to backup dir', :completion => { :type => :directory }

Possible options for the :completion attribute are:

{ :type => :flag } option has no value; default for flags
{ :type => :value } option has a value of unknown type, no suggestions for the value; default
{ :type => :directory } value is a directory, suggestions follow the directory structure
{ :type => :file, :filter => '\.txt$' } value is a file, suggestions follow the directory structure; the optional :filter is a regexp to filter the results.
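
The effect of these settings can be illustrated with a simplified model. This is not the completion code shipped with the tool: `suggest` is a hypothetical helper, `entries` stands in for a directory listing in which directories end with `/`, and keeping directories in `:file` suggestions (so that the path can be descended) is an assumption.

```ruby
# Suggests values for a given :completion setting and typed prefix.
def suggest(entries, completion, prefix = '')
  matching = entries.select { |entry| entry.start_with?(prefix) }
  case completion[:type]
  when :directory
    matching.select { |entry| entry.end_with?('/') }
  when :file
    filter = completion[:filter] && Regexp.new(completion[:filter])
    matching.select do |entry|
      entry.end_with?('/') || filter.nil? || entry.match?(filter)
    end
  else
    [] # :flag and :value offer no value suggestions
  end
end

ENTRIES = ['backup/', 'notes.txt', 'data.yml'].freeze
```

For example, `suggest(ENTRIES, { :type => :file, :filter => '\.txt$' })` would offer the `backup/` directory and `notes.txt`, but not `data.yml`.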

Difference between maintenance-mode status and is-enabled:

maintenance-mode status gives a brief output with an On/Off message. This includes the status of each step.
maintenance-mode is-enabled returns 0 or 1 depending on the maintenance-mode status. Here, 0=ON & 1=OFF.

To check from an external script whether maintenance-mode is ON or OFF on the system, use the subcommand foreman-maintain maintenance-mode is-enabled.
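
An external script could interpret the result along these lines. The sketch assumes that the 0/1 value is delivered as the process exit status, which the description above suggests but does not state explicitly; `maintenance_mode_state` is a hypothetical helper name.

```ruby
# Maps the result of `foreman-maintain maintenance-mode is-enabled`
# to a symbol, using the 0=ON / 1=OFF convention described above.
def maintenance_mode_state(exit_status)
  case exit_status
  when 0 then :on
  when 1 then :off
  else :unknown
  end
end

# Possible usage in a script (not executed here):
#   system('foreman-maintain', 'maintenance-mode', 'is-enabled')
#   puts maintenance_mode_state($?.exitstatus)
```

Note that 0 meaning ON follows the shell convention where exit status 0 is "true", so the command can also be used directly in a shell `if` condition.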

Exit codes with special meanings

Every command returns an exit code. Any exit status other than 0 indicates a failure of some kind. Foreman Maintain uses the following exit codes with special meaning.

Exit Code	Description
75	Temporary failure and needs re-run
78	Command executed with warning(s)

How to contribute?

Generally, follow the Foreman guidelines. For code-related contributions, fork this project and send a pull request with all changes. Some things to keep in mind:

Follow the rules about commit message style and create a Redmine issue. Doing this right will help reviewers to get your contribution merged faster.
We have a development handbook to help developers understand how Foreman developers code.
All of our pull requests run the full test suite in our Travis CI system. Please include tests in your pull requests for any additions or changes in functionality.

License

This project is licensed under the GPLv3+

foreman_maintain's People

Contributors

swapab san7ket gnurag kgaikwad lzap mbacovsky inecas mccun934 amitkarsale ntkathole ahumbe upadhyeammit johnpmitsch ares saper karli-sjoberg xprazak2 sean797 cfouant adamruzicka treydock ogajduse alda519 pondrejk abradshaw atix-ag pjgunst pmoravec patilsuraj767 jameerpathan111 jturel wbclark xbytez deathowl ofedoren rohan21lobo ianballou waldirio ehelms akshay196 jeremylenz martijndegouw swadeley riffraff169 jcotton1123 jlsherrill anandkumaragrawal sufanek1 amarmh gauravtalreja1 jjeffers pipopopo evgeni ekohl ezr-ondrej maccelf shubhamsg199 sjha4 jcpunk sayan3296 stejskalleos jpasqualetto majamassarini odilhao lpramuk martin-schlossarek archanaserver griffin-sullivan

foreman_maintain's Issues

Users should only be presented with whitelisted steps when the step has been explicitly marked as skippable.

Migrated from https://projects.theforeman.org/issues/36264

Clean all QPID queues to zero

Introduce configuration file for foreman-maintain

put it in config/foreman_maintain.yml (should be in .gitignore)
have config/foreman_maintain.yml.example in the repository
expose all configuration options (as stated in https://github.com/iNecas/foreman_maintain/blob/ce490caa16cfe2525a8406ddff73fdaf869ec4de/lib/foreman_maintain/config.rb) there (especially configuration of log directory).

How to: persistence test - test SQL queries

Take inspiration from dF? https://github.com/Dynflow/dynflow/blob/master/test/persistence_test.rb

Or maybe something simpler where we can mimic katello/foreman database into the test environment, pump in data and query accordingly.

Cannot setup foreman_maintain, complains about foreman-maintain-hammer.yml not found

Looking at https://github.com/iNecas/foreman_maintain/tree/master/config , I can see just two config files: foreman_maintain.yml.example and hammer.yml.example.
When I run foreman-maintain with the settings configured, I get:


./foreman-maintain health check
 
Running checks with tags [default]
================================================================================
check for paused tasks:                                               [FAIL]
There are currently 2 paused tasks in the system
--------------------------------------------------------------------------------
There are multiple steps to proceed:
1) resume paused tasks
2) investigate the tasks via UI
Select step to continue, [n(next), q(quit)] 1
resume paused tasks:                                                  [OK]
Error: Custom configuration file /root/foreman_maintain/config/foreman-maintain-hammer.yml does not exist.
--------------------------------------------------------------------------------
Rerunning the check after fix procedure
check for paused tasks:                                               [FAIL]
There are currently 2 paused tasks in the system

Am I missing something?

Command: Wipe capsule and start over

Pulp does not handle broken repositories well, it just reports errors like "Will not create a symlink to a non-existent source" but a repo will never start working again. We need a command that will wipe all capsule package data (and mongo) and starts again:

https://bugzilla.redhat.com/show_bug.cgi?id=1276911#c7

allow to enter multiple tags for check filtering

foreman-maintain health check --help command gives two options:

--label label                 Limit only for a specific label. (Use "list" command to see available labels)
--tags tags                   Limit only for specific set of labels. (Use list-tags command to see available

Let's say tags = [default, pre-upgrade, post-upgrade].
I have tried the commands below to apply the default & pre-upgrade tags for filtering checks, but they do not work.

foreman-maintain health list --tags default --tags pre-upgrade -> it overrides the default tag value with pre-upgrade.
foreman-maintain health list --tags default,pre-upgrade -> no output at all
foreman-maintain health list --tags default, pre-upgrade -> ERROR too many arguments.

Is there any way to apply multiple tags to filter checks?

Offer to delete the tasks that are not valid any more

Offer to install required packages

Inspired by: https://github.com/iNecas/foreman_maintain/issues/11#issuecomment-284317669

This could be a procedure, executed as a check before every scenario?

Check: Verify self registration

It would be good to create a check that searches for katello-ca-consumer-*.rpm installed and compares the hostname in the RPM package with hostname of the system.

If they are the same, I'd issue WARNING: This system is self-registered.

A server can sometimes be self-registered, therefore this is not an ERROR but a WARNING. We've encountered an issue where a Capsule was self-registered and we did not notice it; it took a lot of time to figure out. A warning would show this immediately.

Tool: Content view filter repoclosure

This command finds all dependencies of a package for a particular repo; it might be a useful tool for all Satellite versions:

PACKAGE=foreman
REPO=rhel-server-rhscl-7-rpms
repoquery --requires --resolve --recursive --qf "%{repoid}:%{name}" $PACKAGE | grep $REPO | cut -d: -f2 | sort -u

devtoolset-3-elfutils-libelf
devtoolset-4-elfutils-libelf
devtoolset-4-runtime
devtoolset-6-elfutils-libelf
devtoolset-6-runtime
python27-python
python27-python-libs
python27-runtime
ruby193-ruby
ruby193-rubygem-actionmailer
ruby193-rubygem-actionpack
ruby193-rubygem-activemodel
ruby193-rubygem-activerecord
ruby193-rubygem-activeresource
ruby193-rubygem-activesupport
ruby193-rubygem-arel
ruby193-rubygem-bigdecimal
ruby193-rubygem-builder
ruby193-rubygem-bundler
ruby193-rubygem-diff-lcs
ruby193-rubygem-erubis
ruby193-rubygem-hike
ruby193-rubygem-i18n
ruby193-rubygem-io-console
ruby193-rubygem-journey
ruby193-rubygem-json
ruby193-rubygem-mail
ruby193-rubygem-mime-types
ruby193-rubygem-minitest
ruby193-rubygem-multi_json
ruby193-rubygem-net-http-persistent
ruby193-rubygem-polyglot
ruby193-rubygem-rack
ruby193-rubygem-rack-cache
ruby193-rubygem-rack-ssl
ruby193-rubygem-rack-test
ruby193-rubygem-rails
ruby193-rubygem-railties
ruby193-rubygem-rake
ruby193-rubygem-rdoc
ruby193-rubygem-ref
ruby193-rubygems
ruby193-rubygem-sprockets
ruby193-rubygem-therubyracer
ruby193-rubygem-thor
ruby193-rubygem-tilt
ruby193-rubygem-treetop
ruby193-rubygem-tzinfo
ruby193-ruby-irb
ruby193-ruby-libs
ruby193-runtime
v8314-runtime
v8314-v8

Option to 'skip' or 'continue' the current check

Change 'basic' tag to 'default' : command `health check`

@iNecas If I remember, we had a discussion about changing the default tag for health check command from basic to pre_upgrade.
If this holds true, I'll send a PR against this issue.

Following is health check list:

sat62 :: ~/foreman_maintain ‹master*› » ./bin/foreman-maintain health list
[disk-io] Check for recommended disk speed of pulp, mongodb, pgsql dir. [basic]
[foreman-tasks-not-paused] check for paused tasks                       [basic]
[foreman-tasks-not-running] check for running tasks                     [pre-upgrade]

Right now there is only 1 check with the pre-upgrade tag. Should we assign the rest of the checks to the pre-upgrade tag?

Command: dhcp list/add/remove reservation

Hello,

I wrote and tested these commands on both Server and Capsule versions 6.2 and 6.3. It would be a nice idea to write a small script/command to allow users to add/list/delete DHCP reservations:

foreman-maintain dhcp list-reservations [subnet]
foreman-maintain dhcp add-reservation [subnet] [ip] [mac] [name]
foreman-maintain dhcp remove-reservation [subnet] [ip]

LIST OF RESERVATIONS FOR NETWORK 192.168.220.0

curl -ks --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem --cacert /etc/foreman/proxy_ca.pem https://$(hostname):9090/dhcp/192.168.220.0 | json_reformat

ADD RESERVATION FOR 52:51:00:aa:bb:cc IP 192.168.220.201 AND NAME one.nested.lan

curl -ks --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem --cacert /etc/foreman/proxy_ca.pem -X POST -d '' "https://$(hostname):9090/dhcp/192.168.220.0?ip=192.168.220.201&mac=52:51:00:aa:bb:cc&name=one.nested.lan"

DELETE RESERVATION IP 192.168.220.201

curl -ks --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem --cacert /etc/foreman/proxy_ca.pem -X DELETE "https://$(hostname):9090/dhcp/192.168.220.0/192.168.220.201"

On a Capsule server the path to the certificate files is different; the rest is the same:

curl -ks --cert /etc/foreman-proxy/foreman_ssl_cert.pem --key /etc/foreman-proxy/foreman_ssl_key.pem --cacert /etc/foreman-proxy/foreman_ssl_ca.pem https://$(hostname):9090/dhcp/192.168.220.0 | json_reformat

The script needs to check which PEM files exist so that it works on both Capsule and Server.

Using a Ruby REST HTTP library would be preferred. Note that this DHCP API is very stable and does not change a lot.
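
The PEM check mentioned above might look like the following sketch. The paths are the ones from the curl commands in this issue; `SERVER_CERTS`, `CAPSULE_CERTS` and `dhcp_api_certs` are hypothetical names, and the `exists` argument is only there so that the lookup can be exercised without the files being present.

```ruby
SERVER_CERTS = {
  :cert => '/etc/foreman/client_cert.pem',
  :key => '/etc/foreman/client_key.pem',
  :cacert => '/etc/foreman/proxy_ca.pem'
}.freeze

CAPSULE_CERTS = {
  :cert => '/etc/foreman-proxy/foreman_ssl_cert.pem',
  :key => '/etc/foreman-proxy/foreman_ssl_key.pem',
  :cacert => '/etc/foreman-proxy/foreman_ssl_ca.pem'
}.freeze

# Returns the first certificate set whose files all exist, or nil if
# neither the Server nor the Capsule set is complete.
def dhcp_api_certs(exists = File.method(:exist?))
  [SERVER_CERTS, CAPSULE_CERTS].find do |set|
    set.values.all? { |path| exists.call(path) }
  end
end
```

The returned hash could then be passed to whichever HTTP client is used for the DHCP API calls.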

Get rid of type nil host statuses

Migrated from https://projects.theforeman.org/issues/37166

Sometimes this happens during deleting a host:

PG::ForeignKeyViolation: ERROR: update or delete on table "hosts" violates foreign key constraint "host_status_hosts_host_id_fk" on table
"host_status" DETAIL: Key (id)=(HOST_ID) is still referenced from table "host_status".

'foreman-maintain service stop' causes Pulpcore core dump when database connection lost

Migrated from https://projects.theforeman.org/issues/37182

Submitted based on community 36512: https://community.theforeman.org/t/pulpcore-coredumps-during-stopping-services/36512

A user is doing some sort of "offline backup" operation which shuts down the database. Afterwards, the user calls 'foreman-maintain service stop', which expects the db endpoint to still be online; this causes a core dump in Pulpcore. Ideally, 'service stop' should be able to handle this edge case without core dumping, whether that is through handling a bad connection or via other means.

Excerpt from messages.log file on community post:

Jan 16 04:00:18 hostname pulpcore-content[210923]:  File "/usr/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 275, in get_new_connection
Jan 16 04:00:18 hostname pulpcore-content[210923]:    connection = self.Database.connect(**conn_params)
Jan 16 04:00:18 hostname pulpcore-content[210923]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 16 04:00:18 hostname pulpcore-content[210923]:  File "/usr/lib/python3.11/site-packages/psycopg/connection.py", line 728, in connect
Jan 16 04:00:18 hostname pulpcore-content[210923]:    raise ex.with_traceback(None)
Jan 16 04:00:18 hostname pulpcore-content[210923]: django.db.utils.OperationalError: connection failed: server closed the connection unexpectedly
Jan 16 04:00:18 hostname pulpcore-content[210923]: #011This probably means the server terminated abnormally
Jan 16 04:00:18 hostname pulpcore-content[210923]: #011before or while processing the request.

Command: Storage Planning

As a user, I'd like the ability to plan how much storage I'd need for repositories that I plan to synchronize, taking into account that Pulp only stores the RPM once on disk. Basically, I'd want a tool like pulp-planner, but with the capability of gathering this information for both Red Hat and non Red Hat repositories.

enforce better variable name than `e` for exceptions via rubocop

          Looks like that cop is designed to help enforce a standard across your project, and defaults to `e`. So we could set it to `error` in the Rubocop config (https://www.rubydoc.info/gems/rubocop/0.75.0/RuboCop/Cop/Naming/RescuedExceptionsVariableName)

Granted, looks like we do not see this error currently because we are using => e everywhere already. Ok - stick with our current standard - file me a Github issue?

Originally posted by @ehelms in #856 (comment)

Can the cells grow dynamically?

Currently, the cells are restricted to 1 and 2 rows.
STDOUT greater than 2 rows fits in but without the cell decoration.

sat62 :: ~/foreman_maintain ‹disk_utility*› » ./bin/foreman-maintain health check
Running checks with tags [basic]
--------------------------------------------------------------------------------
| Check for recommended disk speed of pulp, mongodb, pgsql dir.:    [FAIL]     |
| Slow disk detected /var/lib/pulp mounted on /dev/mapper/rhel-root.
             Actual disk speed: 1113 MB/sec
             Expected disk speed: 80 MB/sec.|
--------------------------------------------------------------------------------
| check for paused tasks:                                           [FAIL]     |
| There are currently paused tasks in the system                               |
--------------------------------------------------------------------------------
Continue with step [resume paused tasks]?, [yN]y
| resume paused tasks:                                              [OK]       |
--------------------------------------------------------------------------------

Procedure: isc dhcp configuration check

Manual edits in ISC DHCP often lead to a "Cannot add Subnet" error or other parsing issues, as our parser is very limited. This procedure (it should not really be a check but a "command" you can run when needed) would do this (tested on Satellite 6.1+):

#!/bin/bash
curl -ks --cert /etc/foreman/client_cert.pem --key /etc/foreman/client_key.pem --cacert /etc/foreman/proxy_ca.pem https://$(hostname):9090/dhcp
dhcpd -t -cf /etc/dhcp/dhcpd.conf

The first command returns JSON with the subnet data, or an error message otherwise. The second is a syntax check of the ISC configuration; it always prints some text, so we need to check the return value instead.

[foreman-tasks-delete] Improve Error message for incorrect state

# ./bin/foreman-maintain advanced procedure run foreman-tasks-delete --state stopped

Running ForemanMaintain::Scenario
================================================================================
delete tasks: 
/ Deleting stopped task                                               [FAIL]    
Invalid State
--------------------------------------------------------------------------------

Specify valid state options in error message
Mention valid states in command description
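As a rough illustration of the first point, the error message could be built from the list of accepted states. Both the helper name and the state names below are placeholders; the real list lives in the procedure itself.

```ruby
# Hypothetical helper: builds an error message that names the accepted values.
def invalid_state_message(given, valid_states)
  "Invalid state '#{given}'. Valid states: #{valid_states.join(', ')}"
end

# Placeholder values for illustration only, not the procedure's real states.
EXAMPLE_STATES = %w[state_a state_b].freeze

puts invalid_state_message('bogus', EXAMPLE_STATES)
```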

Installation guide

As $subject suggests, can we have a document covering installation and configuration of this tool?

Checks with confine block should explain why it had failed

A confine block such as the following fails if any of the packages is not installed, and no explanation is given in that case. IMO the check should either FAIL with a message or at least state the reason for the failure.

  confine do
    execute?('which hdparm') && execute?('which fio')
  end
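One way to produce such an explanation, sketched in plain Ruby outside the confine DSL: look up each required tool on PATH and report the ones that are missing. missing_tools is a hypothetical helper, not an existing foreman_maintain method.

```ruby
# Hypothetical helper: returns the tools that are not executable
# in any directory listed in `path`.
def missing_tools(tools, path = ENV.fetch('PATH', ''))
  dirs = path.split(File::PATH_SEPARATOR)
  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_tools(%w[hdparm fio])
unless missing.empty?
  puts "Check skipped: required tools not installed: #{missing.join(', ')}"
end
```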

Check: Number of interfaces

If a host has more than 100 interfaces, it is likely hitting an issue we have in Satellite 6.0-6.3: changing interfaces get created and are never deleted.

Host.find(27).interfaces.count
=> 1426

Check: If there are unread mail for root issue warning

We have many cron jobs, and their output ends up in root's mail folder (technically, a file).

service restart exits OK, even if there were issues

Migrated from https://projects.theforeman.org/issues/34431

[root@centos8-stream-katello-nightly ~]# foreman-maintain service restart
Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 

Stopping the following service(s):
redis, postgresql, pulpcore-api, pulpcore-content, [email protected], [email protected], tomcat, dynflow-sidekiq@orchestrator, foreman, httpd, dynflow-sidekiq@worker-1, dynflow-sidekiq@worker-hosts-queue-1, foreman-proxy
- stopping httpd                                                                
Warning: Stopping foreman.service, but it can still be activated by:
  foreman.socket
\ stopping pulpcore-content                                                     
Warning: Stopping pulpcore-api.service, but it can still be activated by:
  pulpcore-api.socket

Warning: Stopping pulpcore-content.service, but it can still be activated by:
  pulpcore-content.socket
| All services stopped                                                          

Starting the following service(s):
redis, postgresql, pulpcore-api, pulpcore-content, [email protected], [email protected], tomcat, dynflow-sidekiq@orchestrator, foreman, httpd, dynflow-sidekiq@worker-1, dynflow-sidekiq@worker-hosts-queue-1, foreman-proxy
\ starting httpd                                                                
Job for dynflow-sidekiq@orchestrator.service failed because the control process exited with error code.
See "systemctl status dynflow-sidekiq@orchestrator.service" and "journalctl -xe" for details.
/ All services started                                                [OK]      
--------------------------------------------------------------------------------
[root@centos8-stream-katello-nightly ~]# foreman-maintain service status
Running Status Services
================================================================================
Get status of applicable services: 
…
Some services are not running (dynflow-sidekiq@orchestrator)
--------------------------------------------------------------------------------
Scenario [Status Services] failed.

The following steps ended up in failing state:

  [service-status]

Resolve the failed steps and rerun
the command. In case the failures are false positives,
use --whitelist="service-status"

Yes, my dynflow is really borked, no it's not F-M's fault, but it should detect that?

[root@centos8-stream-katello-nightly ~]# systemctl start dynflow-sidekiq@orchestrator
Job for dynflow-sidekiq@orchestrator.service failed because the control process exited with error code.
See "systemctl status dynflow-sidekiq@orchestrator.service" and "journalctl -xe" for details.
[root@centos8-stream-katello-nightly ~]# echo $?
1
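Since systemctl itself exits non-zero here, the restart step could collect per-service failures and fail as a whole. A minimal sketch, assuming a callable that returns true on success; start_services and systemctl_start are hypothetical names, not the tool's actual code.

```ruby
# Hypothetical helper: starts each service via `runner` (a callable returning
# true on success) and returns the names of the services that failed.
def start_services(services, runner)
  services.reject { |name| runner.call(name) }
end

# Assumed runner: Kernel#system returns true only when the command exits 0.
systemctl_start = ->(name) { system('systemctl', 'start', name) == true }

# Intended use:
# failed = start_services(%w[foreman httpd], systemctl_start)
# abort("Failed to start: #{failed.join(', ')}") unless failed.empty?
```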

bin directory is missing from installed gem

bin directory is missing from the list of directories to be built into the gem.

Add a progress-bar

There are a lot of progress bars out there. We can use one of them or write our own intuitive one.
It should be something similar to the opt-in spinner: iNecas@9786c4a

Check: Issue warning if there are errors in logs

We could grep all known logs for (ERROR|error|fatal) or similar, and if there are any matches, put the check into the WARNING state. I would not use the ERROR state because it would often raise false alarms.
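A rough sketch of that idea in Ruby, using the pattern from the description. The function name and the log path in the usage comment are examples only.

```ruby
ERROR_PATTERN = /ERROR|error|fatal/.freeze

# Hypothetical helper: returns :warning if any line matches the pattern,
# :ok otherwise. It never returns an error state, as suggested above.
def log_status(lines)
  lines.any? { |line| line.match?(ERROR_PATTERN) } ? :warning : :ok
end

# Intended use with a real file (the path is an example only):
# status = File.open('/var/log/foreman/production.log') { |f| log_status(f.each_line) }
```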

Running foreman-maintain on Debian leaves behind files

This may only happen when running from git, but whenever a packages update is run on Debian, it creates two stray files, depending on whether assumeyes is present:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	false}
	true}

Check: Verify if EPEL is enabled

We had a case where EPEL was enabled on a Capsule and qpidd was upgraded from it to a more recent version. We should report this to users as an ERROR: EPEL must not be enabled.

Check: Incorrect /etc/hosts entry

I very often see this:

IP_ADDRESS ALIAS FQDN

Which leads to incorrect reverse resolution (e.g. hostname -f). It must be:

IP_ADDRESS FQDN ALIAS

IP_ADDRESS FQDN ALIAS1 ALIAS2 ...

But not the other way around.
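The ordering rule above could be checked roughly as follows. This sketch assumes that a name containing a dot is an FQDN and a name without one is an alias, which is a heuristic; misordered_host_lines is a hypothetical helper.

```ruby
# Hypothetical helper: returns the lines of hosts-file `text` where a short
# alias comes before a fully qualified name (heuristic: an FQDN has a dot).
def misordered_host_lines(text)
  text.each_line.map(&:strip).select do |line|
    next false if line.empty? || line.start_with?('#')

    _ip, first, *rest = line.sub(/#.*/, '').split
    first && !first.include?('.') && rest.any? { |name| name.include?('.') }
  end
end

# Intended use:
# misordered_host_lines(File.read('/etc/hosts')).each { |l| puts "Suspicious: #{l}" }
```

Note that stock loopback entries such as "localhost localhost.localdomain" would also be flagged, so a real check would probably skip loopback addresses.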

Clean errored tasks in 6.2.z, if they are from previous sat version 6.0 or 6.1

During the upgrade, there are two tasks that have been pending since the 6.0 install. If we resume them, they error out. It would be great if such tasks could be self-healed, or if the user could remove them (with a prompt), or, if they do not block the upgrade, if the failure could at least be turned into a warning.
The paused task on the system is named "Actions::Katello::ContentView::NodeMetadataGenerate".

Enhance logging

Currently, the logging is very simple, but we can do better. Things to improve:

log into file in debug mode
include timestamps in log file names
keep history of X log files, delete the older log files
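The three points above can be sketched with Ruby's standard Logger. The foreman-maintain- file name prefix and all helper names are assumptions made for illustration.

```ruby
require 'logger'

# Builds a timestamped name, e.g. foreman-maintain-20200102-030405.log
# (the prefix is an assumption, not the tool's actual naming).
def log_file_name(time = Time.now)
  "foreman-maintain-#{time.strftime('%Y%m%d-%H%M%S')}.log"
end

# Opens a debug-level logger writing to a new timestamped file in `dir`.
def debug_logger(dir, time = Time.now)
  Logger.new(File.join(dir, log_file_name(time)), level: Logger::DEBUG)
end

# Deletes all but the newest `keep` log files in `dir`; returns deleted paths.
# Sorting by name works because the timestamp format sorts chronologically.
def prune_logs(dir, keep)
  files = Dir.glob(File.join(dir, 'foreman-maintain-*.log')).sort
  stale = files.first([files.size - keep, 0].max)
  stale.each { |path| File.delete(path) }
end
```

Pruning by file name rather than by modification time keeps the result stable even if old files are touched later.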

Let user to fix failing check if possible

We should not only check the state; where it is possible and implemented, we should also allow users to fix the inconsistencies or errors interactively. This should be an opt-in feature.

Add disk read speed check as part of preupgrade checks

Disk read speed should be at least 80 MB/sec for mongodb, pgsql, pulp directories present under /var/lib.

Define order for checks

Define which check executes in what order.
Maybe add extra metadata or a configuration file that lists the checks in numerical order. This would mostly benefit users/customers.

Pros:

Categorize the type of checks.
Can be customized by users.
Easy to go with an offline checklist.

Cons:

Extra configuration on top of the existing foreman configs (not sure how many there are).
Allowing user intervention means we have to educate the users/customers.

This could be a nice-to-have feature.

Offer to disable sync plans before upgrade

Check: Verify if iSCSI is mounted before services

Experience from a field engineering customer: when data folders are mounted over NFS or iSCSI, we need to query systemd to see whether RequiresMountsFor is set. The preferred way of configuring it is via a .d directory, so it won't get overwritten by the installer:

# cat /etc/systemd/system/postgresql.service.d/mounts.conf
[Unit]
RequiresMountsFor=/var/lib/pgsql

Do not grep the file directly; use systemctl show postgresql.service | grep RequiresMountsFor to find out whether it is set, in case the directory appears in the mount output.

Typical services affected:

postgresql
mongod
pulp data
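A minimal sketch of the systemd query described above. It parses the text printed by systemctl show and checks the RequiresMountsFor property for a given path; the helper name is hypothetical.

```ruby
# Hypothetical helper: returns true if `show_output` (text printed by
# `systemctl show <unit>`) lists `path` in its RequiresMountsFor property.
def requires_mounts_for?(show_output, path)
  line = show_output.each_line.find { |l| l.start_with?('RequiresMountsFor=') }
  return false unless line

  # The property value is a space-separated list of paths.
  line.chomp.split('=', 2).last.split.include?(path)
end

# Intended use:
# output = IO.popen(%w[systemctl show postgresql.service], &:read)
# puts 'mount dependency missing' unless requires_mounts_for?(output, '/var/lib/pgsql')
```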

Check: Compare repositories in Katello-Pulp

We need a check that compares the repositories on both sides, something like:

    echo "katello repositories:"
    su - postgres -c "psql foreman -c \"select * from katello_repositories;\""

    echo
    echo "pulp repositories:"
    mongo pulp_database --eval "DBQuery.shellBatchSize = 10000000; db.repos.find().shellPrint()"
    echo; echo; echo