I have already written an article on fixing "No Valid Host Was Found. Not Enough Hosts Available", which is one of the most common problems seen during overcloud deployment. The discovery and introspection process must run to completion. However, ironic's discovery daemon (ironic-discoverd) times out after a default period of 1 hour if the discovery ramdisk provides no response. You may hit different issues during introspection depending on your environment and setup, so you should know which files to check when you troubleshoot OpenStack ironic introspection.
Ideally, introspection should not take more than about 10-15 minutes, depending on how long your node takes to power off and power on. If "Waiting for introspection to finish" has been displayed for longer than that, log in to the console of the target node using its iLO IP and check for any error message on the screen.
Common errors: Troubleshoot OpenStack ironic introspection
Error: Invalid provision state for introspection
Analysis:
You may get this error "Invalid provision state for introspection" as soon as you trigger or start the introspection.
Started Mistral Workflow. Execution ID: 7b9d7b9e-e1e5-4a80-92ce-d0b65471fcb9
Waiting for introspection to finish...
Introspection completed with errors: Failed to run action [action_ex_id=bc6b453b-efb7-478c-afe9-67ff75bfb049, action_cls='<class 'mistral.actions.action_factory.BaremetalIntrospectionAction'>', attributes='{u'client_method_name': u'introspect'}', params='{u'uuid': u'83513ac9-f7bb-48d8-a360-25142e8b89e5', u'new_ipmi_username': None, u'new_ipmi_password': None}']
BaremetalIntrospectionAction.introspect failed: <class 'ironic_inspector_client.common.http.ClientError'>: Invalid provision state for introspection: "available", valid states are "['manageable', 'inspectfail', 'enroll', 'inspecting']"
Solution:
Introspection can only be performed when the ironic node is in the manageable provisioning state, so you need to manually change the provisioning state of the respective ironic node to manageable using the below command:
# openstack baremetal node manage <node UUID>
Before introspection, every node should meet the following:
- Power State should be power off
- Provision State should be manageable
- Maintenance should be False
- Instance UUID should be None
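The provision state check can be scripted so that a node is moved to manageable only when it needs to be. The following is a minimal sketch, not a definitive implementation: the function names `introspection_ready` and `ensure_manageable` are hypothetical, the list of accepted states is copied from the error message shown earlier, and it assumes the `openstack` client is already authenticated (for example after sourcing stackrc).

```shell
# Hypothetical helper: succeeds if the given provision state is one of the
# states listed as valid in the "Invalid provision state" error message.
introspection_ready() {
  case "$1" in
    manageable|inspectfail|enroll|inspecting) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reads the provision state of a node and moves the
# node to manageable only if the current state is not accepted.
ensure_manageable() {
  node="$1"
  state=$(openstack baremetal node show "$node" -f value -c provision_state) || return 1
  if introspection_ready "$state"; then
    echo "$node: provision state '$state' is accepted for introspection"
  else
    echo "$node: provision state '$state' is not accepted, moving to manageable"
    openstack baremetal node manage "$node"
  fi
}

# Usage (assumption: run on the undercloud with credentials loaded):
#   ensure_manageable <node UUID>
```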
Error: Look up error: Could not find a node for attributes
Analysis:
This will most likely happen if your input JSON file and the actual target host do not match, or if you have re-run the import multiple times for the same node. Whenever an import fails partially, delete the existing ironic node before you re-attempt the import.
For example, I forgot to delete the existing ironic node before performing a re-import, so I ended up with the error below:
2018-09-30 07:08:43.647 23412 ERROR ironic_inspector.utils [-] [node: MAC 52:54:00:f7:14:10] The following failures happened during running pre-processing hooks:
Look up error: Could not find a node for attributes {'bmc_address': u'', 'mac': [u'52:54:00:f7:14:10']}
Solution:
To fix such issues first delete the respective ironic node from the registry
$ openstack baremetal node delete <node UUID>
For example:
$ openstack baremetal node delete 83513ac9-f7bb-48d8-a360-25142e8b89e5
and then correct your JSON file and re-run the import
$ openstack baremetal import --json instackenv-controller.json
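Before re-importing, it can help to confirm that the MAC address reported in the lookup error is actually present in the JSON file. This is a rough sketch under the assumption that MAC addresses appear literally in the file; `mac_in_instackenv` is a hypothetical helper name and the comparison is a plain case-insensitive text match, not a JSON parse.

```shell
# Hypothetical helper: succeeds if the MAC address ($1) appears anywhere in
# the node definition file ($2). Matching is literal and case-insensitive.
mac_in_instackenv() {
  grep -qiF -- "$1" "$2"
}

# Usage, with the MAC from the error above:
#   mac_in_instackenv 52:54:00:f7:14:10 instackenv-controller.json \
#     && echo "MAC found in JSON file" \
#     || echo "MAC missing from JSON file"
```

A missing MAC suggests the JSON file and the target host do not match, which is the first cause described in the analysis above.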
How to check the current progress status of the introspection?
You can check the progress of the introspection using the below command from a different terminal
$ sudo journalctl -l -u openstack-ironic-inspector -u openstack-ironic-inspector-dnsmasq -u openstack-ironic-conductor -f
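Besides following the journal, you can poll the introspection status and stop waiting after the 10-15 minutes mentioned earlier instead of the full timeout. The sketch below is an illustration only: `wait_for_introspection` is a hypothetical function, the 900 second and 30 second defaults are my own choices, and it assumes that `openstack baremetal introspection status` in your release exposes a `finished` column that prints `True` once the run has ended.

```shell
# Hypothetical helper: poll the introspection status of one node until it is
# reported as finished or until the timeout (in seconds) is reached.
#   $1 = node UUID, $2 = timeout (default 900), $3 = poll interval (default 30)
wait_for_introspection() {
  uuid="$1"
  timeout="${2:-900}"
  interval="${3:-30}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    finished=$(openstack baremetal introspection status "$uuid" -f value -c finished) || return 2
    if [ "$finished" = "True" ]; then
      echo "introspection finished for $uuid"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "introspection still running for $uuid after ${timeout}s; check the node console" >&2
  return 1
}

# Usage:
#   wait_for_introspection <node UUID> 900 30
```

Note that "finished" only means the run has ended. Check the error field of the same status command to see whether it ended successfully.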
Which log file to check for introspection logs?
The introspection logs (from ironic-inspector) are located in /var/log/ironic-inspector. If something fails during the introspection ramdisk run, ironic-inspector stores the ramdisk logs in /var/log/ironic-inspector/ramdisk/ as gz-compressed tar files.
# ls -l /var/log/ironic-inspector/
total 1012
-rw-r--r--. 1 ironic-inspector ironic-inspector 1027065 Sep 30 10:33 ironic-inspector.log
drwxr-x---. 2 ironic-inspector ironic-inspector 4096 Sep 30 07:08 ramdisk
Here ironic-inspector.log contains the current progress status of the introspection, and the ramdisk directory contains the information collected during the introspection stage.
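Since ironic-inspector.log grows quickly, filtering it for error entries is usually faster than reading it end to end. The following is a small sketch that assumes the log uses the same layout as the sample error line shown earlier, with the level as a separate word; `inspector_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recent ERROR or CRITICAL entries.
#   $1 = log file (defaults to the path named in this article)
#   $2 = number of entries to show (default 20)
inspector_errors() {
  logfile="${1:-/var/log/ironic-inspector/ironic-inspector.log}"
  grep -E ' (ERROR|CRITICAL) ' "$logfile" | tail -n "${2:-20}"
}

# Usage (may need sudo, depending on the file permissions on your system):
#   inspector_errors
#   inspector_errors /var/log/ironic-inspector/ironic-inspector.log 5
```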
Here I have extracted one of the archives created at the introspection stage
# ls -l
total 216
-rw-r--r--. 1 root root 1814 Sep 30 06:48 df
-rw-r--r--. 1 root root 840 Sep 30 06:48 ip_addr
-rw-r--r--. 1 root root 275 Sep 30 06:48 iptables
-rw-r--r--. 1 root root 122681 Sep 30 06:48 journal
-rw-r--r--. 1 root root 172 Sep 30 06:48 ps
-rw-r--r--. 1 ironic-inspector ironic-inspector 19572 Sep 30 06:17 unknown_20180930-101719.297007.tar.gz
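A listing like the one above can be produced by unpacking one of the archives into a scratch directory. This is a minimal sketch: `extract_ramdisk_logs` is a hypothetical helper and the default destination `./ramdisk-logs` is an arbitrary choice, not something ironic-inspector defines.

```shell
# Hypothetical helper: unpack a ramdisk log archive and list its contents.
#   $1 = path to the .tar.gz archive
#   $2 = destination directory (default ./ramdisk-logs)
extract_ramdisk_logs() {
  archive="$1"
  dest="${2:-./ramdisk-logs}"
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest" && ls -l "$dest"
}

# Usage:
#   extract_ramdisk_logs /var/log/ironic-inspector/ramdisk/<archive>.tar.gz /tmp/ramdisk-logs
```

The journal file inside the archive is usually the best place to start, since it holds the log of the ramdisk run itself.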
To collect introspection logs on success as well, set always_store_ramdisk_logs = true in /etc/ironic-inspector/inspector.conf, restart the openstack-ironic-inspector service and retry the introspection.
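Before retrying, you may want to verify that the option is really active and not commented out. Here is a sketch under some assumptions: `ramdisk_logs_always_stored` is a hypothetical helper, it only looks for an uncommented `always_store_ramdisk_logs = true` line, and it does not check which section the line is in (to my knowledge the option belongs to the [processing] section, but verify this against your release).

```shell
# Hypothetical helper: succeeds if the config file contains an uncommented
# "always_store_ramdisk_logs = true" line.
#   $1 = config file (defaults to the path named in this article)
ramdisk_logs_always_stored() {
  conf="${1:-/etc/ironic-inspector/inspector.conf}"
  grep -Eq '^[[:space:]]*always_store_ramdisk_logs[[:space:]]*=[[:space:]]*[Tt]rue[[:space:]]*$' "$conf"
}

# Usage: restart the service only once the option is confirmed.
#   ramdisk_logs_always_stored && sudo systemctl restart openstack-ironic-inspector
```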
I hope the steps in this article to troubleshoot OpenStack ironic introspection were helpful. Let me know your suggestions and feedback in the comment section.
There needs to be a way to cancel an introspection in progress. If it is obvious it will not be able to work, I don’t want to wait an hour for it to finally decide it’s going to fail completely. I’d like to kill the process, potentially fix the problem, then restart without having to wipe out the entire undercloud deploy and start from scratch.
You can always monitor the introspection by checking the console. If you do not see any activity on the node’s console for more than 5-10 minutes, assume there is a problem. Check the log files to be sure, and then you can kill the process or send an interrupt. I have done it many times and have not faced any issue. However, you should never stop or kill an overcloud deployment; wait for it to finish, or else your stack will become useless.