Looking at the reported issues with tests/responses,
I wondered how it was missed, and it was because there
was no test. So now that we are fixing it, add the test.
Change-Id: Ia1f5dee01d31880aedd1e78710054ef079708d10
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
During review for the commit that introduced this microversion, it was
pointed out that a single method was more readable. This is the promised
follow-up to that request.
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Id7fed978e7e6c5d739906576de1ed339fe5b28d9
The port API endpoints were not returning the vendor and category fields
when they should have been. In a number of places the api-ref was
incorrect: it was missing information about the vendor and category
fields, and other fields were omitted as well.
Change-Id: Iaa19c384556b4c7453141c14ea76700c6ecae05d
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
When we create major release mappings, we should not forget to
include the alias mapping so users don't have to look it up
separately.
Change-Id: If1970f0d63be220c45bec272187502954ced2adf
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
In Ie86ec57e428e2bb2efd099a839105e51a94824ab this code was added but it
appears to have been targeting an earlier version of the spec index
Ib93e62076207e3e25960111bd0b46b83fe481c69. Up to version 8 of the spec
there is mention of a 'registry' DB field which would have been added
and then parsed in a get_registry_fields() helper method. But it was
ultimately dropped. This bit of code still contained those references
resulting in the endpoint returning errors.
Closes-Bug: 2137596
Change-Id: I79ed016edd2ea6bfb94bf303f1e815b4d9b16dfd
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This reduces logging when NetworkAdapters are missing from a redfish bmc
from warning level to debug level. This resolves an issue where loud
logging was reporting on hardware without redfish NetworkAdapters
support.
Generated-by: Claude-code 2.0
Closes-bug: #2133727
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: If48757c6ec4a1f7978bd973830020161c55922e4
When validate-interfaces runs and there is no interfaces key, we
would just get a KeyError exception that would be logged to the node.
This provides a clearer message back as to why this happened.
Change-Id: I307848a9a1733ecff534ae37541e59465b4e96b7
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
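As a rough illustration of the validate-interfaces fix described above, the sketch below turns a bare KeyError into a descriptive error. The names ``InterfaceValidationError`` and ``get_interfaces`` are hypothetical and are not taken from the Ironic code base; only the missing ``interfaces`` key comes from the commit message.

```python
class InterfaceValidationError(Exception):
    """Hypothetical error type used only for this sketch."""


def get_interfaces(inventory):
    """Return the 'interfaces' entry or fail with a descriptive message."""
    try:
        return inventory["interfaces"]
    except KeyError:
        # Replace the opaque KeyError with a message that says what is missing.
        raise InterfaceValidationError(
            "Inventory has no 'interfaces' key, so interfaces cannot be "
            "validated"
        ) from None
```

The point of the design is that whatever gets recorded on the node explains the cause rather than only naming the missing key.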
sdmitriev1 reported issues using the service verb to perform firmware
updates utilizing the management interface. Turns out we had a bug
which was not detected previously as the tested paths went through
firmware interface updates and the dell idrac variation interface.
In any event, this change should fix these issues. Also adds deploywait,
because operators can choose different firmware for deploy templates as
well, and just to be on the safe side while also matching the other
step handling logic.
Furthermore, we needed to properly call resumption of the next
step, which was further pointed out by the reporter.
Claude was kind enough to write some unit tests, and is noted as
assisted-by for its assistance.
Closes-Bug: 2136895
Assisted-By: Claude Code/Claude Sonnet 4
Change-Id: I786c90effda9892bfef3a90700d1fc2a3372cfde
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Co-Authored-By: Stanislav Dmitriev <sdmitriev1@gmail.com>
A similar function is provided by oslo_utils.netutils.
Change-Id: I6b24ab9deedf9e9802ef1bb5701ddeea91caed69
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
It seems http:// is quite unstable while git:// is stable.
Change-Id: Iedd7356eb3777ce829961ee6392fbb677f26ae22
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
... instead of only documenting the valid choices.
Change-Id: I5f73da7f69242ec21dd60da36e7b176213888db1
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
... instead of duplicating similar logic.
Note that this introduces the X-OpenStack-Request-Id response header,
along with the existing OpenStack-Request-Id, because X-... is
globally used in multiple OpenStack services.
Change-Id: Ieeb9ed606af615f97373a4d37c7d3954b2845bd5
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This was somehow missed during initial implementation. Adding ability to
filter portgroup by shard. This was mostly vibe coded with Claude, with
me interrupting to suggest better implementations when it did something
silly. Tested manually by a human using fake drivers :).
Closes-bug: #2134566
Generated-by: Claude code (claude)
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ic67c02763c2d832f616dc4526e4be891d639b976
The neutron network interface's add_ports_to_network() function only
checked for 'pxe_boot' capability when determining PXE capability,
but iPXE is also a form of PXE booting and should be treated the
same way. This caused inconsistent behavior for boot interfaces like
'http-ipxe' that have 'ipxe_boot' capability but not 'pxe_boot'
capability.
Without this fix, iPXE boot interfaces were incorrectly treated as
non-PXE capable, causing the neutron interface to create ports for
all baremetal ports with local_link_connection info during cleaning
operations, regardless of their pxe_enabled setting.
This change adds 'pxe_boot' capability to both iPXEBoot and
iPXEHttpBoot classes, ensuring that iPXE boot interfaces are
correctly recognized as PXE-capable.
Additionally, this adds the missing pxe_boot capability check to
the remove_ports_from_network() function, which was previously
missing this logic entirely. This ensures consistent port creation
and deletion behavior, preventing orphaned neutron ports after
cleaning operations.
Change-Id: I7721f917fb723e8a4cef69e0f7be1ece0238d7ed
Signed-off-by: Milan Fencik <milan.fencik@rackspace.co.uk>
The built-in inspection rules cannot be loaded because the jsonschema
validates them against the expected API; however, the built-in rules had
a 'built-in' key that is not part of the schema and included the 'scope'
key, which was ultimately dropped before inspection rules support landed.
The built-in rules also did not validate that the data was a list of
rules before attempting to utilize it, giving an incorrect error.
Closes-Bug: 2136776
Change-Id: I36c290c9f92189281e11633e9a587918b0699ae3
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
These inspection rules actions are implemented but were not documented
so add some documentation for them. The redfish inspection interface
also supports inspection rules.
Change-Id: I65894191affd9171bf68dc9b15725ed34a9724f9
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
When size_gb='MAX' is specified, Ironic was calculating the maximum
volume size and including it in ``CapacityBytes``. This calculated size
doesn't account for controller metadata overhead, causing iLO and other
controllers to reject the request with UnsupportedOperation.
Fix by omitting ``CapacityBytes`` from the Redfish payload when
size_gb='MAX', allowing the controller to calculate the optimal size
automatically. The actual volume size is queried and stored after
creation via ``update_raid_config()``.
Unit tests generated by AI.
Closes-Bug: #2132936
Assisted-by: Claude Sonnet 4.5
Change-Id: Ica2e31783b18fc2306369b0ee0d467aca17d4975
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
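The size_gb='MAX' handling described above can be sketched as follows. This is a minimal illustration, not the driver code: ``build_volume_payload`` is a hypothetical helper, and the payload contains only the fields needed to show the idea.

```python
GIB = 1024 ** 3


def build_volume_payload(size_gb, raid_type):
    """Build an illustrative Redfish volume creation payload.

    When size_gb is 'MAX', CapacityBytes is left out so the controller
    can pick the largest size it is able to create.
    """
    payload = {"RAIDType": raid_type}
    if size_gb != "MAX":
        payload["CapacityBytes"] = int(size_gb) * GIB
    return payload
```

Leaving the field out avoids sending a computed size that ignores controller metadata overhead, which is the failure mode the commit describes.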
The intention of this code is to read the system product name which is
stored in the model field per the Redfish spec and not in the name field
which will always store the name of the object we are working with.
Reading the name field results in the value always being 'System'.
Closes-Bug: 2136233
Change-Id: I375fbe27253d7965e458be7b147d5b72cffa4e89
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Updated the docs and code to match the same order and to break out each
operation like it is in the code. Fixed incorrect indent of an example.
This is just mechanical to make visual inspection of the docs to the
code easier.
Change-Id: Ic96c5a1993d20347968c23c60393a4cde2de9a0c
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Implements the generic-switch wrapper around the
networking-generic-switch implementation.
- GenericSwitchDriver implementation integrating with the
networking-generic-switch library for managing physical network
switches from multiple vendors
- Switchport configuration operations (update_port, reset_port) with
support for access/trunk modes and VLAN management
- LAG method stubs for future implementation
- Configuration translation using driver_translators from the
framework to convert between ironic and driver formats
This driver enables the networking service to manage physical switch
port configurations for bare metal node network connectivity. The
driver is loaded and managed through the driver factory and adapter
infrastructure from earlier commits.
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: I7f5d3c9996e1641ce3b309dbd67ba6a0c6c47d78
Signed-off-by: Allain Legacy <alegacy@redhat.com>
This change establishes the base framework for switch drivers in the
networking service by introducing the SwitchDriverBase abstract class
that defines the required interface for all switch driver implementations.
It also restructures how the adapter and translators are wired
together. This is now done implicitly based on the loaded drivers
rather than explicitly using hard coded driver names.
Key changes:
- Add SwitchDriverBase abstract class defining the switch driver interface
- Add NoOpSwitchDriver implementation for testing and development
- Move BaseTranslator from driver_translators module to base module
- Refactor driver adapter to accept drivers and register their translators
- Update networking manager initialization order
- Add entry point for noop switch driver
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: I2feaad3058717e11948fa476debe038ef5b9caf8
Signed-off-by: Allain Legacy <alegacy@redhat.com>
This extends the base driver factory to allow for a 2 phase
initialization of drivers for some use cases. This allows
loading the classes separately from initializing the
instances of the drivers in cases where the class needs to
be examined or a class method needs to be invoked prior
to initialization of the driver.
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: Ifc505abd86c7b425c12de2090f2bc0262ff17527
Signed-off-by: Allain Legacy <alegacy@redhat.com>
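The two-phase initialization described above might look roughly like the sketch below. ``TwoPhaseFactory`` and ``FakeDriver`` are hypothetical names, and a plain dict stands in for real plugin discovery; the actual factory in Ironic may be structured differently.

```python
class TwoPhaseFactory:
    """Load driver classes first, create instances later (illustrative)."""

    def __init__(self, entry_points):
        # Mapping of driver name -> class; a stand-in for plugin discovery.
        self._entry_points = dict(entry_points)
        self._classes = {}
        self._instances = {}

    def load_classes(self):
        """Phase 1: make the classes available without instantiating them."""
        self._classes = dict(self._entry_points)
        return sorted(self._classes)

    def get_class(self, name):
        """Allow callers to examine a class or call its class methods."""
        return self._classes[name]

    def initialize(self, name, *args, **kwargs):
        """Phase 2: create (and cache) the driver instance."""
        if name not in self._instances:
            self._instances[name] = self._classes[name](*args, **kwargs)
        return self._instances[name]


class FakeDriver:
    created = 0

    @classmethod
    def supported_features(cls):
        return ("vlan",)

    def __init__(self):
        type(self).created += 1
```

With this split, a caller can invoke ``get_class("fake").supported_features()`` before any instance exists, which is the use case the commit message names.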
In order to support the deployment of OCI images utilizing bifrost,
we also need to explicitly support basic authentication.
This has been extended to support inclusion of the authentication
credentials into the pull secret string, and fallback on the deploy
interface settings for static configuration. While this will likely
never be perfect, it is at least a forward step to better supporting
a variety of use cases.
Additionally, doing some of this highlighted some extraneous guard
rail style checks which exist elsewhere in the overall image handling
flow. An exception check was added to prevent tag based deployments
from failing because the existing code structure of the guard rail
checks cannot gain the context around basic auth.
This is in part because the check directly uses the image_source as well,
as opposed to any other state data which is presently available
in deploy_utils when the image information is identified.
That situation highlights why additional fixes are needed,
but bug 2133885 was opened for that separately.
Claude also helped me out with the unit tests.
Assisted-By: Claude Code - Claude Sonnet 4.5
Change-Id: I13f4d5cd8b98ad88e7b6088c79c7b014b6461668
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
While working on trying to get OCI support in CI, I realized that the
default pattern setup with Bifrost was to set up a registry *without*
HTTPS.
This is different from the common practice and expectation of operational
OCI registries always utilizing HTTPS as the underlying transport mechanism.
The net result is an idea of offering the ability to "fall back" to HTTP
automatically, and make it a configuration option which needs to be
chosen by an operator.
The code pattern is such that the invocation of the client code paths
automatically identifies the SSLError, and then attempts to fall back
to HTTP, while also saving the fallback on the class instance so the
additional URL generation calls for the underlying HTTP(S) client
get an appropriate URL.
By default, this new option is disabled.
Claude helped with the tests, which was nice of it.
Assisted-By: Claude Code - Claude Sonnet 4.5
Change-Id: I3f28c8d6debe25b63ca836d488bc9fd8541b04d9
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
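The HTTPS-to-HTTP fallback described above can be sketched like this. It is only an illustration under stated assumptions: ``RegistryClient``, its ``fetch`` callable, and the ``allow_http_fallback`` argument are hypothetical, and the standard library ``ssl.SSLError`` stands in for whatever SSL error the real HTTP client raises.

```python
import ssl


class RegistryClient:
    """Illustrative client that remembers an opt-in fallback to HTTP."""

    def __init__(self, host, fetch, allow_http_fallback=False):
        self.host = host
        self._fetch = fetch  # callable taking a URL; injected for the sketch
        self._allow_http_fallback = allow_http_fallback
        self._scheme = "https"

    def url(self, path):
        """Build a URL using whichever scheme is currently in effect."""
        return f"{self._scheme}://{self.host}/{path.lstrip('/')}"

    def get(self, path):
        try:
            return self._fetch(self.url(path))
        except ssl.SSLError:
            if not self._allow_http_fallback or self._scheme == "http":
                raise
            # Save the fallback on the instance so later URLs use HTTP too.
            self._scheme = "http"
            return self._fetch(self.url(path))
```

Keeping the scheme on the instance means every later call to ``url()`` agrees with the transport that actually worked, and the fallback stays off unless the operator asks for it.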
The from_environ method provides the native interface (by keyword
arguments) to pass additional arguments to build a RequestContext
instance.
Also fix the ignored kwargs.
Change-Id: Id02e2212e1877c7913218d87188ba8b359ce2757
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
When using the agent inspector interface and an exception happens
during execution of the hook, the node is not cleaned up resulting in
stale Neutron ports and ramdisk files.
Closes-Bug: 2135265
Change-Id: I69ceec12fc0beea586176a768d864a22261cdb93
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
- _collect_lldp_data(): Collects LLDP data from Redfish NetworkAdapter Ports via Sushy library, walking the Chassis/NetworkAdapter/Port hierarchy
- Integration with inspect_hardware(): LLDP collection is called during hardware inspection and results are stored in plugin_data['parsed_lldp']
The implementation supports standard Redfish LLDP data from Port.Ethernet.LLDPReceive fields and can be extended by vendor-specific implementations (such as Dell DRAC OEM endpoints) through method overriding.
Change-Id: I25889b2a2eb8f6a2d796dfbeb598875a7c07b22c
Signed-off-by: Nidhi Rai <nidhi.rai94@gmail.com>
It was deprecated long ago in favor of policy.yaml and is being removed
soon.
Change-Id: I1a5804cd15e1bc79ad1dc9900e61584902ef4468
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Ironic now depends on a minimum version of sushy that has integrated
sushy_oem_idrac into the code base so there is no point in falling back
to pulling in sushy_oem_idrac so remove the path.
Change-Id: I17217e0fe07b4819863706f473af12d87da46429
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Ports can be listed by conductor group since Flamingo, but due to an
error in the api-ref update, we weren't properly documenting it.
Change-Id: I98b329897946ef05ff82df5f1683075f17ecd3c0
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Retry failures report only the last error which could be misleading,
so include all relevant errors in the final exception.
Closes-Bug: #2098977
Change-Id: I8c0fb0328a6b3ee084813961d9a959af996a6dcb
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
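The idea of reporting every retry failure rather than only the last one can be sketched as below. ``call_with_retries`` is a hypothetical helper written for this illustration; the real change may use a retry library and a different exception type.

```python
def call_with_retries(func, attempts=3):
    """Call func up to `attempts` times, collecting every failure."""
    errors = []
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Remember each failure instead of overwriting the previous one.
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError(
        f"all {attempts} attempts failed: " + "; ".join(errors)
    )
```

If the first attempt fails with a timeout and later ones fail with a refused connection, the final message shows both, which is less misleading than the last error alone.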
Pass the ``is-empty`` rule check when checking fields that don't
exist in the inventory.
Closes-Bug: #2132346
Change-Id: I177740dd3a8558ed357af22c581e5cbf1c3e862a
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
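A minimal sketch of an ``is-empty`` style check that passes for fields missing from the inventory is shown below. The helper names, the dotted-path lookup, and the exact definition of "empty" are assumptions made for illustration, not the inspection rules implementation.

```python
_MISSING = object()


def get_nested(data, path):
    """Follow a dotted path through nested dicts; _MISSING if absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def is_empty(inventory, path):
    """Treat a missing field the same as an empty one."""
    value = get_nested(inventory, path)
    if value is _MISSING or value is None:
        return True
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    return False
```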
Move the image size check earlier in the deploy flow so it runs only
when it matters and reuses already-fetched image info.
Closes-Bug: #2133885
Change-Id: I40518762e3032bbdcfe1d8e7e929147a761a95f8
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The intention was not to allow nested paths here but instead to only use
base paths to files we will serve up.
Change-Id: I877a7da4ed41bceb9f6f4ee229e8e9dc938d9e5b
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Currently, the nova virt driver for ironic has a file containing nothing
but our states as constants. A recent bug was caused, in part, by these
not being properly updated. The goal here is to move ironic state
machine code and constants into separate files -- once merged, I will
update the nova driver to use a copy of this file (and add a comment to
the file here saying it's synced over there). This should help prevent
this kinda issue in the future and in the long run cause less duplicated
work.
Assisted-by: Claude Code (claude)
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ief4533b69899c893f150ef3a7006fb99f7e42964
As discussed during the PTG, VTEP support for OVN is being
removed in order to eliminate confusion. The Ironic
community is also working on a suitable solution to these problems
which integrates with Neutron.
Closes-Bug: 2106460
Change-Id: I4147371c28cf786edb6f29ea83e3b7103f268347
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Signed-off-by: Jay Faulkner <jay@jvf.cc>
This patch adds support for extracting PCIe function identification
fields (device_class, device_id, vendor_id, subsystem_id,
subsystem_vendor_id, revision_id) during redfish hardware inspection.
The fields are extracted from PCIe functions and stored in a flat
structure in the inspection inventory, making them available for
inspection rules and hardware identification.
Also adds test coverage including edge cases like
missing PCIe devices, empty collections, and partial data scenarios.
Additionally adds system.model field extraction with proper None
handling and test coverage.
Depends-On: I1ec49e35a53abb8efdae639629cd819ccabbe620
Change-Id: I218c3b3865c07cc2c7fffc21a766cdef36759cd8
Signed-off-by: Nidhi Rai <nidhi.rai94@gmail.com>
Adds a tool that will ingest a TBN configuration file along with test
network/port-like objects in order to simulate how TBN would plan the
network for a node.
Change-Id: Ia7fbb9b651e4ed4c63a105484856de7f38bc541c
Signed-off-by: Clif Houck <me@clifhouck.com>
Implements the core business logic for the networking service manager,
replacing stub implementations with fully functional network operations:
- Port configuration methods (update_port, reset_port) with VLAN
validation and switch driver integration
- Port channel methods (update_portchannel, delete_portchannel) with
placeholder implementations for future support
- Switch discovery and information retrieval (get_switches)
- Decorator-based serialization for thread-safe switch operations
- Decorator-based VLAN validation against allowed/denied lists
- Dynamic switch driver selection and management using the driver
framework from the previous commit
This implementation uses the driver factory and adapter infrastructure
to dynamically load and manage switch drivers, enabling support for
multiple switch vendors. The manager coordinates network operations
across distributed switches while maintaining thread safety through
optional per-switch operation serialization.
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: I0722b116d29cddae02a4a79a4ea4b767709ecad2
Signed-off-by: Allain Legacy <alegacy@redhat.com>
A new ``ironic.console.container`` provider is added called
``kubernetes`` which allows Ironic conductor to manage console
containers as Kubernetes pods. The kubernetes resources are defined in
the template file configured by ``[vnc]kubernetes_container_template``
and the default template creates one secret to store the app info, and
one pod to run the console container.
It is expected that Ironic conductor is deployed inside the kubernetes
cluster. The associated service account will need roles and bindings
which allow it to manage the required resources (with the default
template this will be secrets and pods).
This provider holds the assumption that ironic-novnc will be deployed in
the same kubernetes cluster, and so can connect to the VNC servers via
the pod's ``status.hostIP``.
Assisted-By: gemini
Change-Id: Ib91f7d7c15be51d68ebf886e44efaf191a14437b
Signed-off-by: Steve Baker <sbaker@redhat.com>
The centos Containerfile still exists and the launch scripts have been
adapted to work on both distros.
The ubuntu container has been tested with noble. The container built
in the CI jobs is bound to the version of ubuntu which the host is
running, which will provide functional testing validation when jobs are
moved to newer releases.
Change-Id: I1954e418543acf939bf65189121484e038f3737c
Signed-off-by: Steve Baker <sbaker@redhat.com>
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
When the per-node external_http_url feature was introduced by
c197a2d8b2, it only applied to a config
floppy. This fix ensures that it is also used for a configdrive ISO. The
previous patch (0d59e25cf8) started using
it for boot ISOs.
Change-Id: I0e1e8dbba5a62a6196a5e6a8a9773fa89db6bc76
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
The functionality has been implemented in the change:
https://review.opendev.org/c/openstack/ironic-python-agent/+/963200
This change documents the possibility to specify the is_root_volume
parameter of the target_raid_config of a node, which either makes
it the root volume, or prevents it from becoming a root volume.
Change-Id: I72fab180eaf361f9bedb04a1a24dfb9bcdf230cf
Signed-off-by: Jakub Jelinek <jakub.jelinek@cern.ch>
Implements the foundational driver framework for the networking service,
providing abstraction and loading mechanisms for network switch drivers:
- Driver factory for loading and managing switch driver plugins using
stevedore, with support for multiple concurrent drivers
- Driver adapter for preprocessing switch configuration files and
managing driver lifecycle
- Driver translators for converting between ironic network data formats
and driver-specific configuration formats
- Utility functions for network configuration validation, VLAN range
parsing, and RPC transport detection
This framework provides the foundation for integrating various network
switch drivers (e.g., networking-generic-switch) with the ironic
networking service. The framework is used by the manager implementation
added in the subsequent commit.
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: Ifb6e662ef59f9e12aad7c34356d2e78c3ebb4143
Signed-off-by: Allain Legacy <alegacy@redhat.com>
This script runs a liveness check on the configured conductor hostname
and will fail if the conductor is not online. Its intended purpose is to
be used as a kubernetes pod startup or liveness probe for the conductor
container.
Change-Id: I88288e0d7a1da4ec99f31c20771299cce2499bf0
Signed-off-by: Steve Baker <sbaker@redhat.com>
We only use DIB based ipa ramdisks and changed bifrost job
names.
Depends-On: I569a766826405513f7beab5d45a52a8bbf42ddfd
Change-Id: I8dc17087d595872d660c9a90c8dbafef268ad02a
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
These options have been unused since wsman driver interfaces were
removed[1].
[1] 578f24bf18
Change-Id: Id080693495dde60330a05fb960b5f10a155a3b3c
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Implements the foundational infrastructure for a new standalone
networking service that can operate independently of the main ironic
conductor. This commit establishes the service skeleton with:
- RPC API layer with oslo.messaging integration for remote calls
- Public API interface for conductor/API to interact with the service
- RPC service implementation for handling network requests
- Stub networking manager with method signatures (implementation
added in subsequent commit)
- Service entry point (ironic-networking command) for deployment
- Configuration options for service behavior and networking backend
- Infrastructure and packaging changes for the new service
The manager includes stub implementations that raise NetworkError;
the full implementation of network operations, the driver framework,
and switch drivers are added in subsequent commits.
Related-Bug: 2113769
Assisted-by: Claude/sonnet-4.5
Change-Id: I351c7afe96cbcebd6b2e2bb5f0b4f17b5d804ceb
Signed-off-by: Allain Legacy <alegacy@redhat.com>
... which were somehow overlooked in the previous attempt.
Change-Id: I242baa622079a3a4facde4cf19fb1818593fb668
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Remove a few remaining references to ironic-inspector, which were not
covered by 32dd5ec596 which removed
integration with ironic-inspector.
Change-Id: Ib391c0f697c7f90f99660cf333deb0cd25a3bc05
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Ensure that the path where we are copying the bootloader into exists
before we copy the file in. Unfortunately the existing code wouldn't
work in the intended case because os.path.split("snponly.efi") returns
("", "snponly.efi") so we would never pass the first check so we would
not create the path. This keeps the existing behavior of allowing a
nested path structure so that the fix can be backported, while
switching to pathlib to provide a safer interface going forward.
Change-Id: Iaf756f634832310431020abd758b59c749aecb21
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
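The ``os.path.split`` behavior mentioned above, and a pathlib-based way to make sure the destination directory exists, can be illustrated as follows. ``old_style_parent`` and ``ensure_destination`` are hypothetical helpers for this sketch, not functions from the Ironic tree.

```python
import os.path
from pathlib import Path


def old_style_parent(name):
    """Return the directory part as os.path.split sees it."""
    head, _tail = os.path.split(name)
    return head  # empty string for a bare file name such as "snponly.efi"


def ensure_destination(root, name):
    """Create the parent directory of root/name and return the full path."""
    dest = Path(root) / name
    # Works for both "snponly.efi" and nested names like "sub/snponly.efi".
    dest.parent.mkdir(parents=True, exist_ok=True)
    return dest
```

Because the parent is derived from the joined path rather than from the bare name, the directory gets created even when the name has no directory component.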
When deep image inspection is disabled, the incorrect path was used to
determine the image format of the file resulting in a no such file or
directory exception which bubbled up and made it appear as if the cache
was missing.
Change-Id: Ibaf1486da9510fdad479523159797815e783e5f6
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
To justify a new major release, there were some significant
changes since 32.0, even though not breaking:
Removed inspector inspect interface (was already deprecated)
Removed sushy-oem-idrac from requirements
API & RPC Updates:
API version: 1.101 → 1.104
RPC version: 1.61 → 1.62
New Features:
API 1.104: instance_name field on nodes
API 1.103: category field on portgroups
API 1.102: physical_network field on portgroups
Deprecations:
Final deprecation of irmc hardware type
ilo driver deprecation warning added
Deprecation of ironic.api.wsgi:initialize_wsgi_app
Change-Id: I205e8968a4e74746cab59cf2e737f9c8f3779327
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
Split up the error message to wrap the download and conversion to the
actual file system linking steps.
Co-Authored-By: Marek Skrobacki <skrobul@skrobul.com>
Change-Id: I644fc0ff7d56de43189f572ce3f901ceffc1ffd5
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Marek Skrobacki <skrobul@skrobul.com>
Moves configdrive utility functions to the more appropriate configdrive
dedicated utils module to improve code organization and separation of
concerns.
Closes-Bug: #2113892
Change-Id: I5851e84fe8f15de05dcddca773b1f28f639dc617
Signed-off-by: David Nwosu <nwosudavid13@gmail.com>
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The json-rpc debug logging can be... very... verbose. And that
verbosity when including all of the data crossing the RPC bus
provides limited value when you're just focused on addressing
a performance issue.
The key is much more "when did I send a request" and what
was the ID, and similarly "when did I get a response".
By default, if there is a request ID when we're in debug
logging mode, we will trim the entire result down to a brief
result.
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: Ib6e4db0e8689ed2081f29b1d1d22a7f01a0e1221
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Bumps to the latest versions of the tools we use in pre-commit except
for codespell which will come in a follow up.
Change-Id: I61f69d914b28bb13a1183315d6181db872cca638
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
We are running pycodestyle and pyflakes checks via ruff so we can
disable the double run via hacking using flake8. Remove ignores that
were not used or that covered files that did not exist.
Change-Id: I342ef72e0ad007fa6f5b72f634ee90ef30137446
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
... because the project was retired some time ago.
Change-Id: I41d7656f6c87a340afedcdbf67c582d68a08744d
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Deprecation of ironic-inspector was announced long ago (during 2023.2
cycle) and ironic-inspector was retired this cycle. So it's time to
drop all the remaining code to force migration to the built-in
agent interface.
Change-Id: I14a87599f9f47b167f8f1a84704982301d033381
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
The code did not initialize a client so it resulted in an exception
always occurring. It also used the image attribute on the client
attribute of the service but the client attribute of the service is
already the image attribute. Create a wrapper method to use the API
correctly and prevent similar issues.
Closes-Bug: #2099276
Change-Id: Ib803c066ca28d1c05a345b7a982a0daabbd7d52e
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Adds a configuration file class for Trait Based Networking.
The class can read, validate, and parse a YAML config file conforming to
the expected structure of a TBN configuration file.
Parsing renders the configuration to TBN objects.
Change-Id: I69802006274d2373e73ba3d2779c29e365caea85
Signed-off-by: Clif Houck <me@clifhouck.com>
Base models for the majority of the Trait Based Networking feature.
Adds a lark-based parser for filter expressions found in Trait Based
Networking configuration files.
Change-Id: I4414463c70d37a7c6b5a957941a2607b5c15ab9e
Signed-off-by: Clif Houck <me@clifhouck.com>
The combined Ironic service was passing the no_fork parameter
to ServiceLauncher, which was wrongly mapped to ProcessLauncher [1]
Switch to ProcessLauncher which properly supports no_fork since
oslo.service 4.2.0. This ensures VNC signal handling works correctly
and matches the pattern used by other Ironic services.
[1] 0dfdf810ac
Change-Id: Iea150a5c3f147b7e4f8a778510bfc061a14f289a
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
I took a quick look and felt like it wasn't clearly demonstrating
current state, so decided to revise the text. It should be more
clear now as to the state of reality.
Change-Id: I1b3c808f6d75e1e7fa532d18df82418a4747071a
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Add validation check that raises NetworkError if the bound port list
is empty after attempting to bind all ports.
Closes-Bug: #2131962
Change-Id: I1d533b942144680c2622fec63caa092f96e481f5
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Adds support for comma-separated, non-consecutive port ranges
in ``[console]port_range``.
E.g., ``'1000:1100'`` and ``'1000:1100,2000:2500,3000:3100'`` are now
both valid.
Some of the unit tests were generated with AI.
Closes-Bug: #2131055
Assisted-by: Claude Sonnet 4.5
Change-Id: Ie35cfb6f431a58857f50b9ceda0daf601c8a6737
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
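Parsing the comma-separated ``start:stop`` format shown above might look like the sketch below. ``parse_port_ranges`` is a hypothetical helper; it returns the bounds as tuples and deliberately does not decide whether the stop value is inclusive, since the commit message does not say.

```python
def parse_port_ranges(value):
    """Parse '1000:1100,2000:2500' into [(1000, 1100), (2000, 2500)]."""
    ranges = []
    for chunk in value.split(","):
        start_text, sep, stop_text = chunk.strip().partition(":")
        if not sep:
            raise ValueError(f"expected <start>:<stop>, got {chunk!r}")
        # int() raises ValueError for non-numeric bounds.
        start, stop = int(start_text), int(stop_text)
        if start > stop:
            raise ValueError(f"start is greater than stop in {chunk!r}")
        ranges.append((start, stop))
    return ranges
```

A single range and a list of non-consecutive ranges go through the same code path, so the older single-range form stays valid.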
When nodes use out-of-band management interfaces (Redfish, iDRAC
Redfish, iLO, iRMC), the BMC address is already known and configured
in Ironic. This change adds an 'agent_skip_bmc_detect' flag to the
lookup API response config that tells the agent to skip BMC address
detection via ipmitool.
This reduces deployment time and avoids unnecessary ipmitool calls
during hardware inventory collection.
The flag is automatically set based on the node's management_interface
and is included in the config section of the lookup response.
Assisted-By: Claude Sonnet 4.5
Change-Id: I6a432db3eb238894e0ed2676243ce69ec300a9eb
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
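How the 'agent_skip_bmc_detect' flag could be derived from the management interface is sketched below. The set of interface names and the ``lookup_config`` helper are assumptions for illustration; the commit message only lists the interface families (Redfish, iDRAC Redfish, iLO, iRMC).

```python
# Assumed interface names; the real list lives in the Ironic code base.
OUT_OF_BAND_MANAGEMENT = frozenset({"redfish", "idrac-redfish", "ilo", "irmc"})


def lookup_config(management_interface):
    """Build the illustrative 'config' section of a lookup response."""
    return {
        # The BMC address is already known for out-of-band interfaces,
        # so the agent does not need to detect it via ipmitool.
        "agent_skip_bmc_detect": (
            management_interface in OUT_OF_BAND_MANAGEMENT
        ),
    }
```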
... because ironic-inspector has been retired.
Change-Id: Id5568cbac8f559821dffd004ab9b6db3e4f4bca6
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Clear cached instance images during deployment failure state
transitions, guarded by a new configuration option
[conductor]clear_image_cache_on_deploy_failure, defaulting to ``False``.
The default preserves cached images for retry attempts, which benefits
users with small image sets. Images are cleaned up eventually via
periodic cache cleanup (TTL-based), and hash validation failures
already prevent corrupted images from being cached. Operators can
enable immediate cleanup by setting the option to ``True``.
Closes-Bug: #2076124
Change-Id: I440841d171c3f616e0944ea580bce4677cf85ada
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Our most basic images require 2500mb of ram, minimum, for fake nodes.
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: I0939f6fbb8dfd91c4e3f20b3a785e6acc9feb9bb
There is no way to migrate inspection data automatically.
This script documents one way to copy data
between two Swift buckets.
Change-Id: I4a86faab5e7abef17064e3c716dc17b6a2f21f39
Signed-off-by: Jakub Jelinek <jakub.jelinek@cern.ch>
Assisted-by: Claude 4.5 Sonnet (Anthropic AI)
The iRMC hardware type for Fujitsu PRIMERGY servers has been
unmaintained for an extended period.
This change marks the iRMC hardware type and all associated
interfaces (bios, boot, inspect, management, power, raid, and
vendor) as unsupported. All configuration options in the [irmc]
section have been marked as deprecated for removal. The
documentation has been updated with a prominent warning about
the deprecation.
Users of the iRMC hardware type should begin planning migration to
alternative hardware types. The driver and all associated code will
be removed in a future Ironic release.
Change-Id: I78b822e5fe3bd1ce4d7ea410c4569d6b830dc214
Signed-off-by: Jacob Anders <janders@redhat.com>
Assisted-by: Claude 4.5 Sonnet (Anthropic AI)
This adds node.instance_name as a top-level field.
Additionally, to provide forward compatibility for nova clients,
we will automatically set node.instance_name if
node.instance_info.display_name is being set.
Tested the following in devstack, using manual CURL api calls:
- Viewing an instance_name via GET /v1/nodes/node-name
- Adding an instance_name
- Clearing instance name on undeploy
- Setting an instance_name via PATCH /v1/nodes/node-name
- Setting an instance_info/display_name and validating it sets instance_name
- Setting an instance_info/display_name when instance_name already exists and
validating it DOES NOT OVERRIDE existing instance_name
- node.instance_name not returned for API version < our micro version
- querying /v1/nodes with ?instance_name=somename
- (cid) Sort of fully tested integrated with nova by observing CI logs
- (jayf) Added missing comments around API versions, and added a reminder
comment
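
The display_name behaviour tested above can be sketched roughly as
follows. This uses a plain dict as a stand-in for the node object, and
sync_instance_name() is a hypothetical helper, not the function added
by this change:

```python
def sync_instance_name(node, instance_info_update):
    """Apply an instance_info update, filling instance_name if unset."""
    display_name = instance_info_update.get("display_name")
    # Only fill in instance_name; never override a value already present.
    if display_name and not node.get("instance_name"):
        node["instance_name"] = display_name
    node.setdefault("instance_info", {}).update(instance_info_update)
    return node
```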
Generated-By: Claude code
Change-Id: Ic24b2e8dbe88c59f0df52a0f5581d48492ba8cd7
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Signed-off-by: Jay Faulkner <jay@jvf.cc>
When plugging a baremetal port in using the 'neutron' interface, send
the 'physical_network' value of the baremetal port to Neutron as part of the
binding_profile for the port. This can be useful for VXLAN underlay
connected machines where the networks in Neutron are VXLAN networks
which then have segments on them that are VLAN based segments which bind
the VNI to a VLAN for attachment for the node to connect to the VNI.
Ref: https://bugs.launchpad.net/ovn-bgp-agent/+bug/2017890
Ref: https://bugs.launchpad.net/neutron/+bug/2114451
Ref: https://review.opendev.org/c/openstack/neutron-specs/+/952166
Partial-Bug: #2105855
Assisted-by: Claude Code 2.0
Change-Id: I6e0185e203489676d530e6955929997f4871b8fa
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This change adds a NoDeploy class to allow for a truly minimal
deployment interface with no-op implementations for all required
methods.
Closes-Bug: #2106550
Change-Id: Ic6faf34860efef9165ad868d57972cd5007eacd4
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
We likely need to begin stripping out the undionly.kpxe bits as well,
but first let's see what people think of this, and then we can go in
that direction.
Change-Id: I09f15e87372390219193c93ba3b5d309f29df900
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
The tests create the portgroup and then create ports in the DB while
setting the portgroup_id on the created port. The result is that the
physical_network field is not kept in sync on the portgroup with the
port's value as is expected since I5a9d9c19182b232bc1b8446644cab0bf6d68d139
resulting in inconsistent data for the tests. The tests only confirm
that the portgroup physical_network has the expected value, which is
empty, so they pass.
Change-Id: Ied1f5c884652ff4e7ddeb748199dbf20ebc879bd
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The code was intended to skip handling of a node when its Port object
was missing, rather than fail. Instead, there was an unhandled
exception because the get_by_address() method raises PortNotFound
instead of returning None when a port cannot be found.
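
A minimal sketch of the pattern, using stand-in definitions. The
PortNotFound class, get_by_address() and ports_for_addresses() below
are simplified illustrations, not the Ironic objects themselves:

```python
class PortNotFound(Exception):
    """Stand-in for the exception raised when no port matches."""


def get_by_address(ports, address):
    """Stand-in lookup that raises instead of returning None."""
    try:
        return ports[address]
    except KeyError:
        raise PortNotFound(address) from None


def ports_for_addresses(ports, addresses):
    """Collect known ports, skipping addresses with no Port object."""
    found = []
    for address in addresses:
        try:
            found.append(get_by_address(ports, address))
        except PortNotFound:
            # Skip this address rather than letting the exception escape.
            continue
    return found
```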
Change-Id: I04dfa09ada7e6a9d22ba16051cb5737daf3bc668
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The data has not been stored as JSON which caused issues.
Closes-bug: #2130790
Change-Id: I1d13fb227e6c3ba713dac58c6e02a199f589209f
Signed-off-by: Jakub Jelinek <jakub.jelinek@cern.ch>
... because usage of a string value is deprecated since oslo.middleware
3.0.0[1].
[1] 40135b76a92cef4197e2f68be46fd129d41630c6
Change-Id: I3d6b67c221f9411cad59b7e6b9b3abf89c5508a8
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Validate ``ipmi_address`` field for valid IP addresses or hostnames
during node validation.
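
As an illustration only, a check of this kind could look like the
sketch below. The is_valid_address() helper and its hostname rules are
assumptions made for the example; the validation in the change may be
implemented differently:

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, not starting or ending
# with a hyphen, at most 63 characters.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_address(value):
    """Return True if value is an IP address or a plausible hostname."""
    if not isinstance(value, str) or not value:
        return False
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        pass  # not an IP literal, fall through to the hostname check
    hostname = value[:-1] if value.endswith(".") else value
    if not hostname or len(hostname) > 253:
        return False
    return all(_LABEL.match(label) for label in hostname.split("."))
```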
Closes-Bug: #1666223
Change-Id: Ie33e7aed7521b552efcd851228072f43ebfec620
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The current containerised graphical console approach has a Selenium
script managing a Chrome browser session. This change replaces that with
firefox and a custom extension to perform the required actions to login
and load the BMC console. This supports the same vendors as the previous
approach (iDRAC, iLO, Supermicro).
This change is required by Red Hat as Chrome is not packaged in RHEL.
However switching to firefox has allowed a more robust and featureful
implementation so it is presented here on its own merits.
This is implemented with bash, calling out to dedicated python scripts
for these specific tasks:
- Detecting which vendor specific javascript to use for the
redfish-graphical driver
- Building the required certificate fingerprint when app_info.verify_ca
is false, which is written to the profile's cert_override.txt
- Building a custom policy.json which is specific to the BMC and vendor
implementation.
Functional differences with the chrome/selenium version
- Firefox kiosk mode has a more locked-down environment, including
disabling context menus. This means the brittle workaround to disable
them is no longer required.
- Firefox global policy allows the environment to be locked down
further, including limiting access to all URLs except the BMC.
- There is now a dedicated loading page which can show status updates
until the first BMC page loads. This page shows error messages if any
of the early redfish calls fail.
- VNC client sessions are now shared with multiple clients, and firefox
will be started on the first connection, and stopped when the last
connection ends.
- Starting Xvfb is now deferred until the first VNC client connection.
This results in a never-connected container using 5MB vs 30MB
once Xvfb is started. Starting Xvfb has a ~1sec time penalty on first
connection.
- The browser now runs in a dedicated non-root user
- All redfish consoles now hide toolbar elements with a CSS overlay rather than
simulating other methods such as clicking the "Full Screen" button.
- ilo6/ilo5 detection is now done by a redfish call and the ilo5 path
has fewer moving parts.
Change-Id: Ib42704a016dc891833a0ddbeae8054cac2c57d4d
Signed-off-by: Steve Baker <sbaker@redhat.com>
Assisted-By: gemini
Log the unit status and journal log when systemctl start fails for a
console container.
Also before the console is stopped, call journalctl for that unit. If
debug is enabled then the journal output will be logged.
Additionally, after a container is started, attempts are made to open a
socket to the VNC port and read some data. There is a delay between the
container starting and x11vnc actually listening and this race can be
triggered in automated tests, so this delays changing the console
enabled state until it is *really* ready.
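
A rough sketch of such a readiness probe, assuming a hypothetical
wait_for_vnc() helper with made-up timeout values. It relies on the
fact that an RFB server sends a 12-byte version banner as soon as a
client connects:

```python
import socket
import time


def wait_for_vnc(host, port, timeout=10.0, interval=0.5):
    """Return True once the port accepts a connection and sends data."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(
                (host, port), timeout=interval
            ) as sock:
                sock.settimeout(interval)
                if sock.recv(12):  # RFB version banner, if the server is up
                    return True
        except OSError:
            pass  # not listening yet, or no data before the timeout
        time.sleep(interval)
    return False
```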
Change-Id: I2c4867b6773f4f4eaa8b98e50a63881f0f4d08b0
Signed-off-by: Steve Baker <sbaker@redhat.com>
Adds a new category field to the portgroup object. Foundational
work for first milestone of trait based port scheduling.
Depends-On: https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/955799
Change-Id: I5100144a330602996c27ed18d2bbde09be6e9571
Signed-off-by: Clif Houck <me@clifhouck.com>
Treat HTTP 400 and 409 errors as success when the node is already
in the target power state, preventing deployment failures from
race conditions between power state change completion and state
verification timeout.
Also refresh system state to get current power state from BMC
instead of using potentially stale cached data.
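
The idea can be sketched as follows. BMCRequestError and
set_power_state() are hypothetical stand-ins; the real change works
against the Redfish library's own exceptions and system object:

```python
class BMCRequestError(Exception):
    """Stand-in for an HTTP error returned by the BMC."""

    def __init__(self, status_code):
        super().__init__(f"BMC returned HTTP {status_code}")
        self.status_code = status_code


def set_power_state(reset, get_power_state, target):
    """Request a power state, tolerating 400/409 if already reached."""
    try:
        reset(target)
    except BMCRequestError as exc:
        if exc.status_code not in (400, 409):
            raise
        # Re-read the state from the BMC instead of trusting cached data.
        if get_power_state() != target:
            raise
    return target
```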
Assisted-By: Claude Sonnet 4.5
Change-Id: Id66ff9c70a9dd6969e3ac7fc74328dfc6e0431bd
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
Change- heading was outdated and confusing. This patch updates
the title to accurately reflect the content of the installation guide.
This resolves: bug/2072349
Change-Id: Ibfdbbc40ad2ae1cda46de1d1937c15e5926d4308
Signed-off-by: Abhijith P C <abhijithpc25@gmail.com>
The sushy-oem-idrac code has been integrated in sushy version
5.6.0 so we don't need the sushy-oem-idrac package anymore.
Change-Id: Ief49300f6ac0c49686b9469be98b8680e018a618
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
Add openstack release schedule with reference to ironic
projects and when to release them during the release
cycle.
Also remove all old references to unmaintained projects and
move all links to the bottom of the document as best practice.
Assisted-By: Claude Sonnet 4
Change-Id: Ie01bb22e6391ea22c8983ebc7bf5b90c688f8afd
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
Adds an upgrade check that warns operators if they have nodes using
the ilo or ilo5 hardware types or any ilo-specific interfaces
(ilo-pxe, ilo-ipxe, ilo-virtual-media, ilo-uefi-https, etc.).
Change-Id: I1e90cbb08d5268e54132e4c3dba510d211e11007
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
It's complicated, but basically because we run full size VMs which
take a while to boot, the multinode tests need a bit more than 600
seconds to deploy a node. They can get there in just about that time
and even sometimes beat the time window, but sometimes the job
times out internally and kills the test run.
This changes the time to be 2000 seconds, which is more consistent
across other jobs. Independently, the defaults in the tempest plugin
will need to be made sane.
Change-Id: I890d551122489e5a0b3162f08dbc10270968fb00
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
As discussed during the PTG, minimally we need to update some of the
text on the framing of Redfish to help users understand what might
be happening or why it might not work.
Change-Id: I3aae292158e9c8fb6b67524b809310130b18e452
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Include a generic vendor variable for triggering
``requires_full_boot_request`` during boot. This is required because
some vendors, i.e. Lenovo, offer hardware with a variety of BMCs, some
of which require a full boot request.
Therefore, there is a need for a general vendor property variable for
these cases where it is not possible to add a vendor in order to avoid
duplicate entries.
Change-Id: Ic1745ff8eb6744d5a21ca8e0d5580bdb7e466e83
Signed-off-by: Massimiliano Favaro-Bedford <max@stackhpc.com>
Use extended timeout (by default 300 seconds) for BMC firmware
updates to handle BMC transitional states during firmware update process,
unless a different timeout is specified by the operator.
Assisted-By: Claude Code Sonnet 4
Change-Id: I2125ff4cdcbd07a89b364968dda4bb60e059121c
Signed-off-by: Jacob Anders <janders@redhat.com>
The fix for UEFI boot on ubuntu noble in metal3-dev-env [1]
has merged, we should be able to run it now with no
changes.
[1] https://github.com/metal3-io/metal3-dev-env/pull/1497
Change-Id: I5563d79540ce0ab1e299161a1fc9f484ba7cdf7f
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
A recent image update has caused CI to begin failing against anaconda.
This change is required to unblock it, and must be backported to
unblock ironic-tempest-plugin merges.
Change-Id: I6a8a7baf54f7c0718b897f490671e8c3ac946e45
Signed-off-by: Jay Faulkner <jay@jvf.cc>
This change introduces a new configuration option 'force_inspection_dhcp'
in the inspector group that enables DHCP on all available network
interfaces during managed inspection. When enabled, it:
- Skips injection of static network configuration into virtual media ISO
- Forces the inspection ramdisk to use DHCP on all interfaces
- Automatically enables LLDP collection across all interfaces
- Helps ensure comprehensive network discovery during inspection
This is helpful in scenarios where the inspection network is separate
from the provisioning network. Since the static network configuration
data is typically intended for the provisioning network it may not be
applicable/valid on the inspection network.
Related-Bug: 2113769
Change-Id: I200730069d4177e8c2960e5f3ce8b1bbcca0f062
Assisted-by: Claude Code/claude-sonnet-4
Signed-off-by: Allain Legacy <alegacy@redhat.com>
This change refactors the JSON-RPC client and server implementation to
support multiple configuration groups, enabling different JSON-RPC
services to use separate configuration sections within the same file.
Key changes:
- Modified Client and WSGIService to accept conf_group parameter
- Updated session management to cache sessions per configuration group
- Made configuration option registration reusable for different groups
- Added comprehensive test coverage for multi-group functionality
This enables services like ironic-networking (future) to use their own
JSON-RPC configuration section while sharing the same underlying
implementation.
Related-Bug: 2113769
Change-Id: I8ce46331878f852c9c6a3e6fc9c08c3b9d789fad
Assisted-by: Claude Code/claude-sonnet-4
Signed-off-by: Allain Legacy <alegacy@redhat.com>
Add API documentation noting that instance_uuid does not follow
RFC 6902 behavior in that the "add" operator cannot replace existing
values, preventing race conditions between Nova compute agents.
Closes-Bug: #1310843
Change-Id: I1caaf5b6133d756cf9484d3e5b56f7b8280525db
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Console containers are run as systemctl --user units with the stack
user. Unlike in the locally running case, in a job there may be no
active user session running to allow these units to run. This change
ensures there is a stack user service running, and "loginctl
enable-linger" will start one again at boot time. These actions are only
taken when ir-novnc is enabled.
This change also installs the package slirp4netns for the required
user-mode networking, and adds fake-graphical to the list of enabled
console interfaces when ir-novnc is enabled. enabled_console_interfaces
is passed to tempest.conf so that tempest can run tests or not based on
whether fake-graphical is enabled.
Additionally the console container will bind to a high port on localhost
instead of a high port on the host IP. This still allows
ironic-novncproxy to connect to the vnc endpoint while avoiding iptables
rules.
Change-Id: Ibcd5b7b05c466d898ba69bff35a1e767be3699a3
Signed-off-by: Steve Baker <sbaker@redhat.com>
Making the fake-graphical driver available for all hardware types is
proposed for the following scenarios:
- functional testing with fake-hardware
- integration testing with ipmi or redfish hardware types, but when the
target server is virtual and does not emulate BMC remote console
- cloud operators who want to verify working ironic-novncproxy with
existing real nodes
Change-Id: I9ea46d76ff80bf27571144f1f671acdb06a0fcf0
Signed-off-by: Steve Baker <sbaker@redhat.com>
This logic is broken in a few places when dealing with real world
redirect cases, such as Debian Cloud images redirecting to mirrors.
It seems that the code was written with an incorrect assumption that
requests does not limit redirects by default. It does, the default
max_redirects seems to be 30. We can change it on Session if we need.
The anaconda case is now handled by checking the url field of the
response.
Closes-Bug: #2127154
Change-Id: I200d631e166075ceab80dcd4b0ff596d1860aa3b
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Adds a new physical_network field to the portgroup object.
Adds logic to forbid changing Port.physical_network when said Port is
already part of a Portgroup. Adds logic to Portgroup to cascade
Portgroup.physical_network changes/updates to member Ports of the
Portgroup.
Adds RPC call to update physical_network on Portgroup.
Foundational work for first milestone of trait based port scheduling.
Depends-On: https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/955799
Change-Id: I5a9d9c19182b232bc1b8446644cab0bf6d68d139
Signed-off-by: Clif Houck <me@clifhouck.com>
They are permafailing, the Neutron fix is not ready yet.
Change-Id: Ie5d9f76c97fb08edcd295fdfa82bd0b4539ff410
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
When an `external_callback_url` is configured, we were overriding
only the base IPA API URL and not the inspection callback.
Closes-Bug: #2101173
Change-Id: I5a84907e65ec1282805fa04f0dff75a848e1b09c
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Prior to this change, as reported in bug 2126991, a generated vmedia ISO
would not be created properly using information in the
[conductor]/*_kernel_by_arch and [conductor]/*_ramdisk_by_arch.
This change restores the documented behavior of checking driver_info
first, then checking *_by_arch, then checking the global default.
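
The documented precedence can be sketched as below. The
resolve_deploy_kernel() helper, the "deploy_kernel" key and the
argument names are assumptions for illustration:

```python
def resolve_deploy_kernel(driver_info, kernel_by_arch, default_kernel,
                          cpu_arch):
    """Pick a kernel: driver_info, then per-arch option, then default."""
    return (
        driver_info.get("deploy_kernel")
        or kernel_by_arch.get(cpu_arch)
        or default_kernel
    )
```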
Assisted-by: Claude Code 2.0
Closes-Bug: #2126991
Signed-Off-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I63197791b4b54072310dfd8525b40044e514ff7f
Despite what the api-ref says [1], the 'bios.[*].value' field included
in responses to 'GET /v1/nodes/{node_ident}/bios' can be null. The
BIOSSetting database model in 'ironic.db.sqlalchemy.models' confirms as
much. Update the schema and api-ref to reflect this. This leaves only
the 'name' and 'created_at' fields as non-nullable.
Change-Id: Idcc03e6ce377ecf6b9db511e3283fb6f2496b037
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-bug: #2127079
The 'SchemaValidator.validate' method (from
'ironic.api.validation.validators') already catches the
'jsonschema.ValidationError' exception and raises Ironic's own
'InvalidParameterValue' exception in its place. Thus, we need to catch
the latter, not the former.
Change-Id: Ic8668afe5a2ff85a5c089c7adae9c8af541f7e84
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-bug: #2127079
Removing the snmp CI job, as it doesn't make sense to execute as we're
going to remove it.
Change-Id: I3da676da959fde5d4c858538888ccd7b0682cb3b
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
There have been reports of firmware upgrades failing on Gen11 iLO
machines with GET NetworkAdapters returning 400 responses. This change
attempts to resolve this by catching the exception relevant to the fault.
Change-Id: I62095c2b61d14688d2dcbcdcfd29e9391af2c0ba
Signed-off-by: Jacob Anders <janders@redhat.com>
Resolves a bug where firmware updates fail intermittently on some
hardware models due to invalid or unstable BMC responses immediately
after firmware update completion. The BMC may return inconsistent
responses for a period after firmware updates, causing the update
process to fail prematurely.
This change adds comprehensive BMC state validation that requires
multiple consecutive successful responses from System, Manager, and
NetworkAdapters resources before considering the firmware update
complete. This ensures the BMC has fully stabilized before proceeding.
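
A minimal sketch of requiring consecutive successes. The
wait_until_stable() helper and its default values are hypothetical;
each entry in checks stands in for a call that reads one of the
resources and raises on an invalid response:

```python
import time


def wait_until_stable(checks, required_successes=3, max_attempts=30,
                      interval=0.0):
    """Return True after enough consecutive rounds of passing checks."""
    streak = 0
    for _ in range(max_attempts):
        try:
            for check in checks:
                check()
            streak += 1
        except Exception:
            streak = 0  # any failure resets the consecutive count
        if streak >= required_successes:
            return True
        time.sleep(interval)
    return False
```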
Generated-By: Claude Code Sonnet 4
Change-Id: I5cb72f62d3fc62c3ad750c62924842cef59e79b8
Signed-off-by: Jacob Anders <janders@redhat.com>
Added handling of multiple redirection responses to the
HttpImageService.validate_href() function in
ironic/common/image_service.py:
- 301: MOVED_PERMANENTLY
- 302: FOUND
- 307: TEMPORARY_REDIRECT
- 308: PERMANENT_REDIRECT
For all of these responses, the HTTP server should generate a Location
header field containing a URI as the new reference.
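
A rough sketch of resolving the next URL from such a response. The
redirect_target() helper is hypothetical and is not the code in
validate_href():

```python
from urllib.parse import urljoin

REDIRECT_CODES = frozenset({301, 302, 307, 308})


def redirect_target(current_url, status_code, headers):
    """Return the URL to follow, or None if this is not a redirect."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if not location:
        raise ValueError("redirect response without a Location header")
    # Location may be relative, so resolve it against the current URL.
    return urljoin(current_url, location)
```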
Closes-Bug: #2126069
Change-Id: I985b3587984ba78570c3a163c08af58cf8a5d0c1
Signed-off-by: Ettore Simone <ettore.simone@gmail.com>
The default values defined in code are automatically picked up by
oslo-config-generator and added to the config file generated by
the tool.
Change-Id: I4a8db8905baea9d10b49e24a85bb506102cc00ee
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
We've just discovered that this field does not do much. It basically
only affects the create_root_volume/create_nonroot_volumes arguments.
No RAID interfaces use it to populate root device hints.
Change-Id: I2ed780e13c59713127bd7f4ca30269e0c0865440
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Fixes the redfish-virtual-media boot interface logic to provide more
clarity to a user when an error occurs as Dell iDRAC10s do not work
with the present virtual media code, and users should instead use
the idrac interface variant.
Change-Id: I96642a5e9b65eb08c3c42da3e35f376d5e264fbc
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
* Pattern for nova-compute which we don't really recommend
at this point, as nova wants folks to move away from it.
* and cleaned up the drivers page which had older release
references which don't make sense in the current day.
Change-Id: I9c74ae757e2479fc44b86dc9dc4e5f6d8e146b8c
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
So much outdated... Now fixed. Also found a molteniron reference which
should just be deleted.
Change-Id: Icc1069f8465ddcec887f3ba2a8f7338c6f2cba82
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Our docs still talked about tinyipa and some other outdated details
and my AI agent spotted them, so this commit fixes them.
Some of this was also just outdated references, bad links.
Change-Id: I270255854caa917d141573638ab3ba7c5fc4f473
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
While playing with AZ documentation, I realized we were really lacking
on our SEO game. So, this is an attempt to augment our core docs in
a simple way, add metadata tags.
Specifically I asked claude to review and propose tags based upon
the content in each file it considered core documentation based
upon the structure and content.
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: I3efe42c50dd22e2c03d4e4bbf9746da18c9c6abb
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
A recent ?Linked-In? or ?Reddit? thread I saw bemoaned Ironic's
documentation as stale. This perception largely seems to be due
to backwards looking notes in the text.
For example, it doesn't make sense for current docs to talk about
Juno, Kilo, or upgrade to Queens.
And so, after getting some context loaded into Claude, I was able
to ask it to cleanup the docs for versions older than 2023.1.
Mostly, it looks like just cleaning up some sentences of our prior
context so it seems more up to date.
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: Ic86db90db441909848f7ec566c94e4018e322faf
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
A few recent questions regarding availability zones raised a
question which really needed to be augmented into Ironic's
documentation. This is an AI assistant based attempt to
revise the documentation after forcing it to read all our docs,
and some of nova's and neutron's docs, and then try to fill
in the details in a way which can also be easily found via
searching.
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: Id9ae61bd6e4a9a65f919c1560eadba1ee42d935c
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
We need to make sure that all artifacts are built in case
ipa-builder is branched after ipa.
Change-Id: Ia6c5f318a9df7cc6179f2b00e6c8a64d7b1a5ba9
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
iDRAC10 requires using Virtual Media Slot1. The problem is that some
BMCs don't return the resources in order, which causes provisioning
to fail because we inserted the ISO in the wrong slot.
This patch adds support for detecting the iDRAC version, based on
Redfish information accessible via manager.model. With the information
about the iDRAC version, we can force Ironic to use a specific Virtual
Media slot.
Closes-Bug: #2125571
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: I38c8286c644d93a6f16137bd73f6e267948642b1
Signed-off-by: Iury Gregory Melo Ferreira <imelofer@redhat.com>
60s as a default is reasonable with perfect hardware, but very little
of the hardware we deal with is perfect. A 2 minute default is
unlikely to have negative operator side effects and will decrease
conductor and node BMC load.
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: I921171b14a61cded4ba728dfd044746965a02e74
When caching an image between different file systems, the hard link
operation would fail. This is fixed by falling back to a copy
operation.
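
A minimal sketch of the fallback, assuming a hypothetical
link_or_copy() helper. Only the cross-device error (EXDEV) triggers
the copy here; that choice is an assumption of the example:

```python
import errno
import os
import shutil


def link_or_copy(src, dest):
    """Hard link src to dest, copying instead across file systems."""
    try:
        os.link(src, dest)
        return "link"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Hard links cannot span file systems, so fall back to a copy.
        shutil.copyfile(src, dest)
        return "copy"
```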
Assisted-By: claude-4-sonnet
Change-Id: Id1eced2e0a30044b0da7dd5f4f2dedc50a5297b6
Signed-off-by: Riccardo Pittau <elfosardo@gmail.com>
get_ports_by_portgroup_id accepts id instead of uuid.
Change-Id: Ia21beb5675d2c4383a734a5a434d623aa628db6c
Signed-off-by: Kaifeng Wang <kaifeng.w@gmail.com>
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Currently, while IPA still reports the client-id correctly,
existing ports are not getting updated with the client id.
Closes-Bug: 2118579
Change-Id: I0124e9df57783326fcb2e95a38cb2f205d3f64c0
Signed-off-by: John Garbutt <john.garbutt@stackhpc.com>
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
The only thing of note is the use of data files: while pbr allowed you
to include directories in a glob, setuptools only allows individual
files. This necessitates expanding out the list of files we wish to
copy.
Change-Id: I65156249c3494708d79789be23afb2d69c194848
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
When using ORAS to upload a file to Quay, the tag points directly to
the manifest, not to the index of manifests. Currently, Ironic is not
capable of handling the former.
First, GET API to a manifest does not accept
application/vnd.oci.image.index.v1+json, only
application/vnd.oci.image.manifest.v1+json, so indicate that we
understand both.
Second, the logic in the OCI image service needs to be adjusted to this
case. If the index is a manifest, it is now treated the same as an index
with only that manifest.
Third, the manifest's body does not contain its own digest. Instead, it
can be fetched from the docker-content-digest header, so get it and
store as a virtual field dockerContentDigest.
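
The three points above can be sketched roughly as follows. The
normalize_to_index() helper is hypothetical, and it assumes the
response headers are available as a dict with lower-case keys:

```python
INDEX_TYPE = "application/vnd.oci.image.index.v1+json"
MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"

# Advertise both media types so either kind of document can be returned.
ACCEPT_HEADER = ", ".join([INDEX_TYPE, MANIFEST_TYPE])


def normalize_to_index(body, headers):
    """Wrap a bare manifest so it looks like an index with one entry."""
    if "manifests" in body:
        return body  # already an index
    manifest = dict(body)
    # The manifest body lacks its own digest; take it from the header.
    manifest["dockerContentDigest"] = headers.get("docker-content-digest")
    return {"mediaType": INDEX_TYPE, "manifests": [manifest]}
```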
As part of the change, I had to refactor identify_specific_image since
it exceeded the allowed complexity with my changes.
Also fix get_blob_url to work with image URLs without oci://. This is
not possible to trigger through the API but ensures internal
consistency.
Also provide more specific error messages instead of piling everything
under the very generic ImageNotFound and provide more logging.
Change-Id: Iba84bbe5da541700d20a445818a4a0d584f1eca8
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
This is the OpenStack standard place for it, apparently. Even though
Ironic only has one WSGI server, we still need to add the extra module
layer.
Assisted-by: Claude code
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Change-Id: Ia046124ede28991be6e1fe43b29819f2f28c8c9a
Collect SKU (Service Tag) from Redfish ComputerSystem
and include it in system_vendor inventory data.
This enables downstream tools like Nautobot to access
the Dell Service Tag or other vendor SKU information
during hardware inspection.
The change:
- Safely checks for SKU attribute existence
- Adds SKU to system_vendor dictionary
- Handles missing SKU gracefully
- Includes unit test coverage
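
A minimal sketch of the safe attribute check. The
collect_system_vendor() helper and the dictionary keys other than the
SKU are assumptions for illustration, and system stands in for the
Redfish ComputerSystem object:

```python
def collect_system_vendor(system):
    """Build system_vendor data, adding the SKU only when present."""
    vendor = {
        "manufacturer": getattr(system, "manufacturer", None),
        "product_name": getattr(system, "model", None),
    }
    # getattr with a default tolerates objects lacking the attribute.
    sku = getattr(system, "sku", None)
    if sku:
        vendor["sku"] = sku
    return vendor
```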
Change-Id: I6623fd33b356d6149001c43a7179297a7c8568d8
Signed-off-by: Nidhi Rai <nidhi.rai94@gmail.com>
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html
Change-Id: I717f33d9eaa4cd365a73c72e15ef5a2466835076
Signed-off-by: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Generated-By: openstack/openstack-zuul-jobs:roles/prepare-zanata-client/files/common_translation_update.sh
Add file to the reno documentation build to show release notes for
stable/2025.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.
Sem-Ver: feature
Change-Id: I30d4a763778faa8f2e701b5dd78683d168e5a845
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
This is a good canary for schema-related issues. It seems reasonable to
only run on direct API-related changes. Anything that breaks on the SDK
side should be caught by the SDK job.
Change-Id: I2d6ba3666e569f867fd13b695d16d13e44e3fd44
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The `stack` script has been consistently failing with:
```
Error: Nexthop has invalid gateway.
```
when it comes time to add a route to $PUBLIC_SUBNET_IP [1].
One solution to this is to add the `onlink` flag [2] to the link
mentioned above:
```
onlink pretend that the nexthop is directly attached to
this link, even if it does not match any interface
prefix.
```
[1] daf856cd2d/devstack/lib/ironic (L2583)
[2] https://man7.org/linux/man-pages/man8/ip-route.8.html
Change-Id: Ia6363e6b68de344dd82106077efff86143e63d39
Signed-off-by: Luan Utimura <luan.utimura@luizalabs.com>
Much of the validation already exists here. We're just shuffling it
about. We do end up with some validation, particularly with regards to
the code in ironic.common.inspection_rules.validation, but we can clean
that up in future changes once we've decided whether we're okay with
error message content changing or not (hopefully yes).
Change-Id: Id97b6d524913188cec557e0df64ebc1a2fc3eccd
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This is defined in Nova and is intended to prevent issues like those
seen in change I08e6bef63478df7c69a3f1f9864859a95c2e755e.
Change-Id: I57407ee5db0da88a7c76bf007dc803dd963bbf66
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Using ironic.api.wsgi:initialize_wsgi_app to provide a custom config
file to ironic prevents the oslo_backend from being set to threading
and bypasses the RLock on threading, leading to a race condition when
threads start.
Instead, document a way to give config dir/file from environment
variables.
Change-Id: I2aabe72f4d28ec55727e06788d3a9d2976dda4bb
Signed-off-by: Arnaud M <arnaud.morin@gmail.com>
When starting the wsgi application (e.g. using uWSGI), the app will
start using the oslo_service eventlet backend.
We explicitly need to force the threading backend until it becomes
the default.
Change-Id: I5190cf2bb68f62448cce3659ee7f42951a304558
Signed-off-by: Arnaud M <arnaud.morin@gmail.com>
Unlike the existing Metal3 job, this one covers a large number of
different Ironic configurations and is also sensitive to performance
regressions on the API layer.
Claude Code was used for the initial pass of converting the existing
Github workflow to Ansible.
Assisted-By: Claude Code
Change-Id: I80490c4ca89ab40d3cdc4ced7964d3dc06cd9a05
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Otherwise hacking can complain about things like unused imports
immediately before ruff goes and removes them.
Change-Id: Ie8809cca3d947a5ecab3de99a8c926581aea212b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Ironic has an issue: we don't run the pxe filter in its CI, so when
we removed the wait for start logic, which is available and possible
with eventlet, we broke the pxe filter.
Change-Id: I8ee7ed7167362438da396aed6980a027ceaaaa72
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
In threading mode the options implemented in oslo.service are not used,
in favor of the options imported from cotyledon.
Change-Id: I8a94bfea5fe9f9f54077e5d958198ede09f78903
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This reverts commit 907df2c40a
as the centos stream maintainers have indicated they fixed the
build issues as it relates to mirroring the images with yesterday's
push to the CDN.
Change-Id: I8cfcbe83267c3fc28bec117248b7cb3caa42197f
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
This change adds support for collecting and exposing the node's
hardware model, manufacturer and system UUID via redfish sensors data
interface. This allows better insight into node identity and vendor
information through sensor metrics.
Closes-Bug: #2121630
Change-Id: I62a2e58a8fd7c1c6d3f45e52ec12c4338d8711b4
Signed-off-by: ikoliveira <igor.oliveira@ccc.ufcg.edu.br>
The initial concern was the multi-process model that cotyledon uses.
Commit 71dd34a7bd moved API to the
conductor process, so this no longer applies.
This reverts commit 3831464751.
Generated-By: Claude Code
Change-Id: Iaeabcfb8f6558220a10060ccca788f1f4b959f0e
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Also, fix the json in the example config.json document.
Change-Id: I0c0ad427afdeba6740e1c4ef812f1c7552b32a00
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Ironic has fully removed its usage of eventlet. This change
introduces a new hacking check to ban any new eventlet imports
from being added to the codebase.
This follows the post-migration guidance from the eventlet
removal documentation:
https://removal.eventlet.org/guide/post-migration-guidance/#hacking-checks
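For illustration, a check of this kind can be sketched as a small
generator over logical lines, which is the general shape flake8 local
checks take. The function name and the "X999" code below are
hypothetical and are not the ones used in this change.

```python
import re

# Matches "import eventlet", "import eventlet.green" and
# "from eventlet import x". It deliberately does not match unrelated
# names such as "eventlet_shim". Imports of the form "import os, eventlet"
# are not caught by this simplified pattern.
_EVENTLET_IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+eventlet(?:\.|\s|$)")


def check_no_eventlet_imports(logical_line):
    """Yield an (offset, message) pair for each eventlet import found."""
    if _EVENTLET_IMPORT_RE.match(logical_line):
        # "X999" is a placeholder code used only for this sketch.
        yield 0, "X999: eventlet must not be imported"
```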
Generated-By: gemini-cli and gemini-2.5-pro
Change-Id: I48ac535325f68ba0b2a9f50aec4ce19ac6265e77
Signed-off-by: Hervé Beraud <hberaud@redhat.com>
Ironic, like most openstack services these days, needs to have its wsgi
module referenced as a module, and used under something like uwsgi or
gunicorn instead of with Apache mod_wsgi.
Assisted-by: claude code
Change-Id: I6a2c2688f73b71f94103622a9e821cab67be053e
Signed-off-by: Jay Faulkner <jay@jvf.cc>
It appears we have an address conflict as it relates to the
ir-novnc service, but we don't need it for these jobs.
So, disable the extra service.
Change-Id: I28fc766f62d9dda93f2d3469eaaec73e63057415
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Indirection proved to impose very high performance costs. Even though
we've identified and fixed a few major contributors to the regression,
using RPC for a process to access itself has always been a stopgap
measure rather than a proper architecture.
This change moves the API from a separate service under the Launcher
to a separate thread in the conductor, similarly to how the JSON RPC
server is started. This paves the way to remove the local RPC entirely
in the next patch.
Change-Id: I0f3d854336bcc7ea1062f9a995e6d8979cb0cc22
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
I don't know why it does not fail in the CI, it fails consistently
locally with:
ironic.common.exception.ConsoleContainerError: Console container error with
provider 'fake', reason: No 'ironic.console.container' driver found,
looking for 'fake'
Generated-By: Claude Code
Change-Id: Ia051a603b2eab17310bc21027666f556dc9a4fc1
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Include the execution of the inspection rules in the redfish inspector
to behave more like the agent inspector behaves and allow for feature
parity between the two.
Change-Id: Ib0e69d361a7336a3f978d948043d651021ba1061
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
First, create_node already returns a complete TraitsList object as
node.traits, there is no need to fetch it again over RPC.
Second, there is no need to make get_trait_names an RPC call since it
operates on the local TraitsList object and does not access the DB.
Change-Id: Ic5563e5d1c36e7a4a8aa4aeeca4bf66255d55e7a
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
On Fedora and Gentoo kernels, this value is too small. Given
the worst case scenario is 19.2MB of extra wasted RAM for this being
raised, it's sensible to pick a default that'll work everywhere.
Change-Id: I548546e24bef90a27bff70906880f7779c29fcea
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Return 400 Bad Request instead of 403 Forbidden for client-side
URI structure errors.
Closes-Bug: #1673877
Change-Id: I941c867c7a9400f2577de8d489a96628556f5b54
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The function should either return a valid inventory data dict or raise an
exception that there is no inventory data. While not likely, the type
checker does correctly state it is possible for an error to happen, so
remove the possibility.
Change-Id: Iefe5b5e1df034514e4ab29761ff1adf95f2dc2a6
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Move the validation of the inventory and node.properties outside of the
node lock context manager. This does not need to happen inside of the
context manager and allows us to refresh the node object from the DB and
validate that the data was actually saved.
Change-Id: Ie3c1f8415a2e58553e24ce26e06d01468c34e4c5
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
A node in power failure state indicates that Ironic does not
have the ability to control the BMC, so we can return directly
instead of attempting further async vmedia attach/detach actions.
Closes-Bug: 2121118
Change-Id: I9a5a2bfd9f4b538cc7217aefb7333df9ccdb9095
Signed-off-by: Kaifeng Wang <kaifeng.w@gmail.com>
Fixes bug 2121058 by adding RequestLogMiddleware to log API request
details including method, path, status code, and duration. This
addresses the loss of access logging after switching to CheRoot web
server, which doesn't have built-in access log functionality.
The middleware intercepts all requests, captures timing information,
and logs request details after completion using oslo.log at INFO level.
This helps with debugging by providing visibility into request patterns,
response codes, and performance metrics.
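As a rough sketch of the idea (not the code in this change), a WSGI
middleware that records method, path, status code and duration could
look like the following. It uses the standard library logging module as
a stand-in for oslo.log, and it logs once the wrapped application
returns rather than after the response body has been fully sent.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class RequestLogMiddleware:
    """Log one line per request handled by the wrapped WSGI app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        start = time.monotonic()
        captured = {"status": None}

        def _start_response(status, headers, exc_info=None):
            # The status line looks like "200 OK"; keep only the code.
            captured["status"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, _start_response)
        finally:
            duration = time.monotonic() - start
            LOG.info("%s %s status=%s duration=%.3fs",
                     environ.get("REQUEST_METHOD"),
                     environ.get("PATH_INFO"),
                     captured["status"],
                     duration)
```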
Initial prompt: fix https://bugs.launchpad.net/ironic/+bug/2121058
solely in Ironic. The change needs to be in a single commit and unit
tested. Use tox to run unit tests.
The bulk of this change was created by claude-code, reviewed by
claude-code, then heavily edited by me. This also includes a snippet
of code from code review that was created by Google Gemini.
Closes-Bug: #2121058
Assisted-By: claude-code
Assisted-By: gemini
Change-Id: Ifd3b60bb5d773460469414fd0dda65f4a7f000ed
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Adds instructions on how to use abort verb to exit from service states.
Change-Id: Ibf654dc277eb5bc4c38ed1e804bbb6df42d43617
Signed-off-by: Jacob Anders <janders@redhat.com>
This intentionally high CPU overhead function is called for every API
and JSON-RPC request when Basic HTTP authentication is enabled. With the
recent indirection enablement this is causing a performance regression
for Metal3 due to the extra JSON-RPC calls. This change would improve
the performance of all branches of Metal3 if backported.
Change-Id: I2740035d2882aacddca9c541362d6e533140650f
Closes-Bug: #2121105
Signed-off-by: Steve Baker <sbaker@redhat.com>
threading.stack_size() doesn't do implicit type conversion to int, we
need to do it explicitly.
This will hard-crash if this environment var is set to something that's
not an integer. I consider this a feature and not a bug.
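A minimal sketch of the behaviour described above. The environment
variable name used here is hypothetical and is not the one Ironic
reads.

```python
import os
import threading


def parse_stack_size(raw):
    """Convert the raw environment value to int.

    int() raises ValueError for non-integer input, which is the
    intentional hard crash described above.
    """
    return int(raw)


def apply_stack_size(env_var="EXAMPLE_THREAD_STACK_SIZE"):
    """Apply the stack size from the environment, if the variable is set."""
    raw = os.environ.get(env_var)
    if raw is None:
        return None
    size = parse_stack_size(raw)
    # threading.stack_size() requires an int and rejects a str.
    threading.stack_size(size)
    return size
```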
Change-Id: Ibf976fb9b290d971525bc0b5488bc7029d6fad8a
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Otherwise, the same message appears twice: on the server and on the
client sides.
Change-Id: I69f2bc9024f2b399ae73d665b971202398f4d2e4
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
Currently, Ironic codebase allows aborting servicing state regardless
of whether a servicing step has abortable flag set or not. This patch
fixes this by adding handling of service wait states to abort code paths
and adding the missing state machine transition.
Generated-By: Claude Code Sonnet 3.5
Change-Id: Ie07490bdb9c6461bd6ac7a6315773dcfb13592f9
Signed-off-by: Jacob Anders <janders@redhat.com>
Back when we developed service, we expected operators to
iterate to fix their issues, but we also put in abort code.
We just never wired in the abort code to the abort verb.
It seems like we really should have done that, and
this change updates the API and Conductor code paths to make this
happen.
Closes-Bug: 2119989
Assisted-By: Claude Code - Claude Sonnet 4
Change-Id: Ic02ba87485a676e77563057427ab94953bea2cc2
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
While trying to create some fake data, I realized the randomizer
code was not running, nor were changes being committed, and largely
the code was still patterned on ipmi, when really we should be
patterning on fake. Also drops the number of nodes to create to 5000,
instead of 10,000, as we're ultimately going to create a fairly unhappy
fake ironic database with this model.
Using the fake interface *and* fake config, provides us an easy
path to begin to benchmark drastic changes to the conductor model
as part of removing eventlet.
Change-Id: I179c842d369eb9a3a60878556559746cca27bcaa
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Until we define these API schemas more strictly, we should allow JSON
patch values to be any type.
Change-Id: If1f3de04d6f5c6948df504f382309bae16ee2b5e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Adds a new category field to the port object. This is foundational work
for the first milestone of trait based port scheduling.
Change-Id: Ica76ae3da08bdf743a495781fe958cb71493a2e7
Signed-off-by: Clif Houck <me@clifhouck.com>
Signed-off-by: Jay Faulkner <jay@jvf.cc>
Also does some minor doc revisions to provide improved clarity,
and some minor follow-up fixes related to prior changes.
Change-Id: I0409da2ad45df06f2dbd1c5cd3c2afd83ec10c32
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
This reverts commit bd69ef1c57
which was a temporary change in order to facilitate the switch-over
from using eventlet to using threading.
Change-Id: I41efb4ed7c63d67fc1f709055727e624717c91eb
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Adds the capability to utilize the indirection API model.
The vast majority of this change is the removal of legacy
comments and the swap of decorators on object methods to
allow the Versioned Objects Indirection API to route the
calls upon objects through the RPC layer to the conductor.
In order to make these calls occur, we need to also send along
a context, which is required to call the RPC layer, which is
fine. Original tests still work in that API surface calls still
ultimately trigger DB API calls in the backend, thus test changes
are actually minimal, similar to very slight changes in trait
retrieval as well. In that we really can't nest objects across
the indirection layer.
Change-Id: Ia161fef67b8a116fdf7cad9b2e559ba75263e196
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
In order to complete the eventlet migration, we need to
be able to use the indirection_api surface to be able to
disjoint the API surface from the message bus, so we're
not fighting between processes for database locks.
Previously, the calls were routed directly
to the database layer to streamline performance.
This attachment allows the calls to run through
the indirection layer to the object, and return
the data to the caller.
As a result, API RPC versions are incremented.
Change-Id: I7358ac2a70198c78a0a9ba48511fc1289c64294f
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
The VNCProxy does some things which must be done in the parent
process, i.e. signal handling. The only compromise to keep it
in the parent process, is to utilize no_fork.
This is a compromise as the new process model works nicely with
systemd, so only tries no_fork if VNC is enabled with the single
process mode.
Change-Id: I45aa9f4ffad946c59ca3f551c971b979fbc64efd
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
This change removes eventlet usage and explicitly invokes
threading instead of eventlet.
To do this, it also revises signal handling because this
change also moves from an all in one process to a multi-process
runtime model. The side effect of this is an increased memory
footprint, depending on the process launch model.
Change-Id: I184245d67edb1a2543aa24654836392f38777d71
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
As part of the broader effort to remove eventlet support from Ironic
this patch updates the conductor’s worker pools to use
`futurist.DynamicThreadPoolExecutor` in place of
`futurist.GreenThreadPoolExecutor`
Although we are now explicitly configuring DynamicThreadPoolExecutor,
eventlet monkey-patching remains in effect, meaning these threads are
still cooperative green threads underneath.
One or more follow-up patches will be required to remove monkey-patching
entirely, switch backends and service launchers to native thread-based
implementations.
The goal here is to decouple from `GreenThreadPoolExecutor`,
which is deprecated, and start validating behavior and tuning under the
real thread interface. To this end, we also change the rejector out
from the futurist provided default rejector, to one which better models
the threaded behavior Ironic will exhibit once Ironic is launching
in threading mode.
Note: This does *not* complete the migration, but is one of the last
major steps before we explicitly begin to remove the invocation of
eventlet.
Depends-On: https://review.opendev.org/c/openstack/futurist/+/955217
Change-Id: I433b0b51f80d7c238e8109878b5c8bc15f9f5849
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
In order to merge eventlet removal patches and fixes related to
metal3, we need to merge four separate changes in series to get
the metal3 CI job back into a happy state. This is because:
* Eventlet removal changes the process model to sub-processes
* Metal3 Integration uses sqlite, which we've learned in the past
can have locking issues between processes.
* The fix requires removal of direct database calls from the
API surface, and instead for calls to be routed through the RPC
layer.
Once the changes are merged together, the metal3 job has been shown
to work in other test runs, so we have high confidence overall. We
just unfortunately need to give ourselves the window where the job is
not passing to merge the sequence of changes which will be stacked
after this change.
Change-Id: I3e6cb1c25b04ff965fa40ff6dbac9bd1bb53c44b
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Building docs locally may build with stale extension content without
this change.
This change also adds the required bindep.txt packages for the pdf-docs
tox target.
Change-Id: If2264241d20ab0286b1b5ea723ef370c4d772693
Signed-off-by: Steve Baker <sbaker@redhat.com>
A new periodic task to automatically remove conductor
records that have been offline for longer than a configured timeout
period. This addresses the issue where deleted or decommissioned
conductors would remain in the database indefinitely.
Closes-Bug: #2069771
Assisted-by: Claude Sonnet 4.0
Change-Id: I90eb159abad94d8369b8792fa17c20d80201569a
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Add missing state machine transitions from SERVICEFAIL to SERVICEWAIT
and SERVICEHOLD for reserved wait/hold steps.
This fixes the edge-case where nodes in service failed state would
incorrectly transition directly to active state when wait/hold steps
were executed, bypassing expected intermediate states.
Closes-Bug: #2119990
Change-Id: I0a55ad45138c4d033570014bf45956dacaf11e72
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
Adds a new vendor field to the port object. This is foundational work
for trait based port scheduling.
Depends-On: https://review.opendev.org/c/openstack/ironic/+/957166
Change-Id: Ifce7da0a123e9f36a83f1a6a34759b25c9b2e416
Signed-off-by: Clif Houck <me@clifhouck.com>
Recently networking-baremetal got a patch to leverage a relatively new
rpc call against neutron called report_state. However... in late 2024
neutron's RPC executions were split from the API, and grenade never
learned to restart the new service.. and that may also be intentional.
Until there is clarity on that front, the path is clear... to restart
the service.
This should clear up the overall grenade job execution failure.
Closes-Bug: 2118780
Related-Bug: 2117227
Change-Id: I9f942df892783a85387e6feb3a8bdb5396103e15
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
To avoid auto-clearing admin-set maintenance during recovery,
only set maintenance/fault on max retries when not already
in maintenance mode.
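The guard can be sketched as follows. This assumes a dict stand-in for
the node object and a hypothetical helper name; it is not the
conductor code itself.

```python
def mark_fault_on_max_retries(node, fault):
    """Set maintenance and fault unless the node is already in maintenance.

    Returns True when the node was changed. Leaving an existing
    maintenance flag untouched means recovery logic keyed on the fault
    will not later clear maintenance that an admin set by hand.
    """
    if node.get("maintenance"):
        return False
    node["maintenance"] = True
    node["fault"] = fault
    return True
```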
Closes-Bug: #2119618
Change-Id: Iac00adc1c5335a426c5411fdce2fa7911a42ad14
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The metal3 job has operated under very tight memory
conditions for quite some time. Specifically, running minikube, ironic,
and two VMs in 8 GB of memory also influenced the job design.
However, with the removal of eventlet, this memory footprint and
process model swells a little bit, creating conditions where the
job is essentially guaranteed to fail. Further discussion with
Riccardo, one of the other Ironic/Metal3 contributors, yielded
that they were already thinking of increasing the size of the
VM because they had been encountering memory issues.
Given that this job already runs lean, albeit with no swap,
which is intentional to prevent disk IO generated noisy neighbor
conditions, the only real choice is to just increase the VM size
to the next realistic size in OpenDev CI.
Change-Id: I87f9c94e6585347d8a35a1d04dd7d101a9e68261
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Fix race condition where Redfish power commands fail with
BadRequestError when hardware reaches desired state between Ironic's
state check and command execution. Now verifies actual power state on
exception and treats as success if already in target state.
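The handling can be sketched roughly like this. BadRequestError below
is a local stand-in class, and the callables are hypothetical
placeholders for the real Redfish calls.

```python
class BadRequestError(Exception):
    """Stand-in for the error raised when the BMC rejects the command."""


def set_power_state_tolerating_race(send_command, get_power_state, target):
    """Send a power command, tolerating a rejection if already at target."""
    try:
        send_command(target)
    except BadRequestError:
        # The hardware may have reached the target state between the
        # earlier state check and this command. Re-read the actual state
        # and only propagate the error if it still differs.
        if get_power_state() != target:
            raise
```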
Closes-Bug: #2119423
Change-Id: I46f84318ae28498901000283bef49c46260f80ea
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The local RPC code previously checked for IPv6 availability by examining
the existence of /proc/sys/net/ipv6/conf/lo/disable_ipv6 and reading its
contents. This approach was unreliable as it depended on filesystem
checks rather than testing actual IPv6 functionality.
The fix replaces the file-based check with an actual socket binding test
to ::1 using a context manager for proper resource management. Socket
reuse is enabled to prevent port conflicts. Debug logging is added when
IPv6 is unavailable. Unit tests are updated to provide comprehensive
coverage of the new implementation.
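A minimal sketch of such a check using only the standard library. The
function name is illustrative, and the debug logging mentioned above is
omitted.

```python
import socket


def ipv6_loopback_available():
    """Return True if a TCP socket can actually be bound to ::1."""
    try:
        # The context manager closes the socket even if bind() fails.
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            # Port 0 lets the kernel pick any free port.
            sock.bind(("::1", 0))
        return True
    except OSError:
        # Raised when IPv6 is disabled or unsupported on this host.
        return False
```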
Change-Id: I1e3afabc78f1382ff5248707ff2ca8114d10dd90
Generated-By: Claude Code Sonnet 4
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
The variable rpc_node is assigned in a try block but used
outside of it. If the try block fails, rpc_node will not be
assigned and an error will occur.
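One common way to avoid this class of bug is to bind the name before
the try block, sketched below with hypothetical names. The actual fix
in this change may be structured differently.

```python
class NodeNotFound(Exception):
    """Stand-in for the lookup failure."""


def describe_node(get_node, ident):
    # Bind the name up front so it exists even when the lookup fails.
    rpc_node = None
    try:
        rpc_node = get_node(ident)
    except NodeNotFound:
        pass
    if rpc_node is None:
        return "node %s not found" % ident
    return "node %s found" % ident
```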
Change-Id: Ie02c724c3165dd66aedaf14cec80a3c760d62dfe
Signed-off-by: Tatiana Kholkina <t.kholkina@maxima-int.com>
The logging for verify_steps used the deploy-steps attribute,
leading to incorrect logs. Correct the logging statement to use
instance.verify_steps to accurately reflect the verification steps.
Change-Id: I6c9d415913e468b8e529ff88066fda81c4c6b456
Signed-off-by: Tatiana Kholkina <t.kholkina@maxima-int.com>
We maintain the different HTTP error codes for GET operations (HTTP 404
(Not Found)) versus other operations (HTTP 405 (Method Not Allowed))
when requesting an earlier microversion.
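The rule can be summarised with a small sketch. The helper name is
hypothetical.

```python
from http import HTTPStatus


def status_for_unsupported_microversion(method):
    """Pick the error status for a request below the required microversion."""
    if method.upper() == "GET":
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.METHOD_NOT_ALLOWED
```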
Change-Id: I8e215b483f0c2d7b25657fa413296415629fe96e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Added the redfish inspection interface to be able to run inspection
hooks in the conductor on the inspection data in the same way the agent
inspection interface operates. Removed the special casing of the
processor and the PXE interface since we now use the standard hooks to
populate this data. Fixed up some tests on node properties to now show
that the data matches correctly.
Change-Id: Ia8db39b4818a981fe0ff76f5c9441c98d1b442ed
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The processor inspection tests actually are looking at the instruction
set value and expecting it to be x86_64 while the processor architecture
value is x86 which we map to i686. The current tests confuse this by
mixing the two different return values from the get_members() mock and
the summary mock so it doesn't break things but later usage of redfish
inspection breaks the existing hook tests so fix up the mocks to be
consistent. The _get_processor_info() function additionally operates on
an out-parameter which is a bit awkward in Python so use the more
idiomatic return value to set the expected data. Lastly the cpus field
on the node properties is not a typical field populated by other
inspection methods so drop it entirely. Then match the behavior of
ironic-inspector and ironic-python-agent by ensuring the architecture
and count fields are always set in the cpu inventory data.
Change-Id: Id99d6948f8bef73302281e84411b2263716a278f
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Adds a release note for the fix in patch 955432.
Related-Change: https://review.opendev.org/c/openstack/ironic/+/955432
Change-Id: I2703e91ee4a3bdb65a17d5ad8511fdc909ee2262
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
It's easier to configure NGS switches via their hostname.
LLDP is usually reporting the hostname. Putting that into
switch_info certainly helps where you don't know the switch mac.
Change-Id: Ibcc1604c2792936a51ad2ef9fe6b8b9c1dd18289
Signed-off-by: John Garbutt <john.garbutt@stackhpc.com>
The mock data for redfish inspection was not correct after the change in
Ide947f410c3d0d0f67a735c18b30f8cb56caa6b3 which wasn't detected in the
existing tests but follow on code shows it.
Change-Id: I5eb876b35437e0adc7310814b64c48b103a52294
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This updates admin documentation for the Redfish-based manual
clean step `set_bmc_clock` and automated `verify_bmc_clock`.
- Updates the cleaning.rst with an example of using `set_bmc_clock`.
- Adds a section to the steps.rst for the `verify_bmc_clock`.
- Explains how to enable the verify step using the
`enable_verify_bmc_clock` config option under `[redfish]`.
Change-Id: Ied7fa289178510ccb0ceaa790fe8d89ed7e481b6
Signed-off-by: Queensly Kyerewaa Acheampongmaa <qacheampong@gmail.com>
via Redfish Manager
This patch adds two new capabilities to the Redfish management
interface in Ironic for setting the BMC clock:
1. A manual cleaning step (`set_bmc_clock`) that allows operators
to set the BMC clock explicitly by providing datetime and
timezone offset.
2. An automated verify step (`verify_bmc_clock`) that, if enabled
via configuration, sets the BMC clock during node verification
using the current UTC time.
These steps aim to prevent certificate validation failures caused by
incorrect BMC time, particularly when dealing with TLS certificates.
A new configuration option `redfish.enable_verify_bmc_clock` has been
added to control the automated verify behavior.
The minimum version of `sushy` has also been updated to 5.7.0
to support these features.
Related patches:
- https://review.opendev.org/c/openstack/sushy/+/950539
(Add support for Manager DateTime fields in sushy)
- https://review.opendev.org/c/openstack/sushy-tools/+/950925
(Fix Manager DateTime field handling in sushy-tools)
Partial-Bug: #2041904
Change-Id: I75cbd39a60f8470224dc5a2fe0a4f17c22acd1cd
Signed-off-by: Queensly Kyerewaa Acheampongmaa <qacheampong@gmail.com>
The fail_on_port_binding_failure option is actually registered so we
can safely assume the attribute is always present.
Change-Id: I4157a9dcf1f94904b5b09efcd6fc78d0b2983fda
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Fixes the issue that when accelerator devices were
removed from node, a re-introspection does not remove
previous accelerator device information from the node.
Closes-Bug: 2118958
Change-Id: Ia57902a9095b78e0128a728693c03e0a01c6421b
Signed-off-by: Kaifeng Wang <kaifeng.w@gmail.com>
Log steps performed during step-based flows in Node History
at the beginning and at completion (or abort).
Closes-Bug: #2106758
Change-Id: Ieffacf174180036d6a2418a8faf72a94eea74fb8
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
The snmp job has started failing because we don't have
an event reconciliation loop for asyncio in the main service.
It seems somehow the asyncio version of pysnmp is slipping into
the CI job which ultimately is breaking us at this point, although
I see no actual state of it in the logs. It's just weird.
In any event, it is a known issue.
The other issue is the grenade jobs are failing on neutron upgrades.
This appears to be due to legacy names being used in the CI job
configuration.
We should fix those, but we've got an open bug for that now:
https://bugs.launchpad.net/ironic/+bug/2118780
Change-Id: I1fbe4b0c519b5911db6f92e2963df99a882fa317
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
While looking at the threading issues related to the removal
of eventlet, it dawned on me that it wouldn't be a bad idea for
Ironic to log a warning suggesting corrective actions an operator
could take for non-ideal performance behavior.
For example, if we're not launching enough power sync workers, or
enough sensor data collection workers, then the check interval
begins to become the minimum, and the task just re-launches
after the sweep.
That, itself, is not a huge issue except it can begin to reduce
the meaningfulness and reliability of the status data
in larger clusters, where operators should likely take actions
such as increasing the workers or adding conductors.
Change-Id: Ic9277c5389c7e8f2d68e72bf6338a4f509989e75
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
This change enables Ironic to skip initial reboot to IPA when
performing out-of-band firmware updates.
Change-Id: Id055a4ddbde3dbe336717e5f06ca6eb024b90c9f
Signed-off-by: Jacob Anders <janders@redhat.com>
Fix the package name for grub image (now we should use one of -ia32
and -x64). Also the image file paths are different in RHEL.
Change-Id: Id3972c28bb781b97d8a0070edbbc00fc63734aa8
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
The xinetd package is no longer available since CentOS Stream 9, and
the pre-set service to serve TFTP via dnsmasq is now provided
by a sub package in RDO.
Change-Id: Ia62c3bba5bc83a262e1e6bec692dfcf8ec63f56b
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Installation guide for openSUSE and SLES was removed due to removal of
their OpenStack packages.
Also drop explicit mention of RHEL/CentOS versions from the description
common for at least CentOS Stream 9.
Change-Id: I8555b8ea5ac5dcc07112773c2d4e9668e038d859
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This commit adds documentation about the support for NIC
Firmware Update via Firmware Interface
Change-Id: I0514d6122a38e00ac4150a5693c1f4ffa788ffda
Signed-off-by: Iury Gregory Melo Ferreira <imelofer@redhat.com>
When a node recovers from a BMC failure via power state sync, the
`last_error` field may continue to persist stale errors on an otherwise
healthy, operational node, which gives a misleading impression of failure.
If `[conductor]node_history` is enabled, error traces are preserved,
so it's safe to reset `last_error`.
Change-Id: Ia42bdc00a5ab3191c64d223eeae978994fb96868
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
We no longer have to use the generic execute since rootwrap was
removed.
Change-Id: I2e700c9ba5b5740aaf3b89172423d23f51177f39
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Ironic no longer uses rootwrap since iscsi deploy was removed. See [1]
for details.
Remove the config files and the command for rootwrap because these are
useless. We can remove these from the repository first to force distros
get rid of these.
[1] be09717be2
Change-Id: I0a8e26e8990eae8108537541159f7810d35b70f1
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
The validate-interfaces inspection hook expects each interface
entry in the inventory to include a 'name' field.
The 'identity' attribute from the Redfish EthernetInterface resource is
used to populate this field because it provides a unique identifier for
each interface.
Change-Id: Ide947f410c3d0d0f67a735c18b30f8cb56caa6b3
Signed-off-by: haseeb <syedhaseebahmed12@gmail.com>
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html
Change-Id: I773fafe045897fb56fe03cb9c4c6e6fd1fecaac9
Signed-off-by: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Generated-By: openstack/openstack-zuul-jobs:roles/prepare-zanata-client/files/common_translation_update.sh
While looking at issue reports, I noticed we are likely stressing
the ramdisk too much and running it out of space.
Anyhow, one of our CI jobs fails more than others, and it needs to
be swapped around so the image is not downloaded, then converted in
the ramdisk. It is the only job which does it, and we should keep that
behavior, but we need to get CI in a happier place first.
Related-bug: 2116135
Change-Id: I77c30c370cf5288703663e495ab9e60f3e8a7b2e
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
This change modifies the API response schemas for firmware to make the
'created_at' and 'updated_at' fields mandatory, matching the actual
behavior.
Generated-By: Cursor
Change-Id: I777dff47e43bd3c8e307fe1461f1b2172513c682
Signed-off-by: Sharpz7 <adam.mcarthur62@gmail.com>
It used to be an optimization for single-process Ironics with RPC
enabled. Unfortunately, this approach won't work after the migration
from eventlet.
I don't expect this change to be user-visible, although purely
theoretically there may be edge cases with misconfigured RPC that get
broken.
Change-Id: If4be34178f920ab6b2f6318b15a490672bd6ec3d
Signed-off-by: Dmitry Tantsur <dtantsur@protonmail.com>
This commit aims to add support for NIC firmware updates.
Since vendors don't have a standard for naming NICs, we
will be adding a prefix "nic:" to the NIC name.
Closes-Bug: #2107998
Change-Id: I7090534ad66a77ca103f2ca154f9b6feea818e88
Signed-off-by: Iury Gregory Melo Ferreira <imelofer@redhat.com>
In preparing to remove eventlet, we discovered eventlet was hiding
three distinct issues in our tests.
1) Eventlet was hiding some issues, like... some tests explicitly
trying to stop a fake service.
2) Our tests at times crossed the boundary of a thread. This is not
awful, and provides us a higher level of assurance.
3) Some of the bad patterning sort of amplified in some of the code
which we'll eventually need to sort out.
In any event, the cleanest path was to make a test centric start
method which does the basic needful for the tests, *and* which also
explicitly launches Synchronous Executors as opposed to threads.
This pattern can be leveraged to start to sub divide the actual
service launches at some point, while also keeping explicit
synchronous executor behavior which will be absolutely required
for use of threads in the post-eventlet world.
Change-Id: Ib75406a3e386b78197ea78cce291c3bd295d20cf
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
When doing firmware updates for BMC, we saw cases where Ironic wouldn't
be able to contact the BMC, marking the node in a failed state because
of it.
This patch adds a configuration option that tells how long ironic
should wait before proceeding with the reboot to finish the update.
We will attempt to improve the waiting time in a follow-up, trying
to identify when the bmc was unresponsive and when it was back.
Closes-Bug: #2092398
Change-Id: I53ffc8a06d5af8b0751553c3d4a9bb1c000027ae
Signed-off-by: Iury Gregory Melo Ferreira <imelofer@redhat.com>
The value for some property in the sensor data is a dictionary
e.g.: 'state':{'_value_': 'Enabled',
'_name_': 'ENABLED',
'__objclass__': "<enum 'State'>"}
Which doesn't add useful information, this commit attempts to
prioritize sending only the `_value_` when possible.
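As an illustration of that prioritisation (the helper name is
hypothetical), a serialized enum dictionary can be collapsed to its
`_value_` while other values pass through unchanged:

```python
def simplify_sensor_value(value):
    """Return the '_value_' of a serialized enum dict, else the value itself."""
    if isinstance(value, dict) and "_value_" in value:
        return value["_value_"]
    return value
```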
Closes-Bug: #2113877
Change-Id: I44d07b637357040a5b9d1a975a50e6cbccfb43db
Signed-off-by: Iury Gregory Melo Ferreira <imelofer@redhat.com>
Removing the eventlet invocation in the unit tests which matches
the base change. Given the slight syntax change, it's not really
a big deal, just cleanup.
Change-Id: Ia007e08dff44e4530f22ad344aaf2f8889000763
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Took a look at the green subprocess invocation in ipmitool.py
and it became clear we can just remove it at the same time as
the monkey patching.
Change-Id: Ifd1c296fc61a86dcf1f3d32b3f5f166bcec8f74b
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Updated the change_actions method in inspection rules to account for
unexpected 'loop' values being added to the actions field by the API.
Although the API is not expected to include 'loop', it's still
appearing in some responses, causing issues.
This update ensures such cases are handled gracefully.
Related-Bug: 2105478
Change-Id: I9e03cd8d7640037cb98bc2586ddc03ed264ce2ac
While exploring removing the invocation of eventlet.hubs.use_hub()
from the novncproxy service code, I discovered that we needed to
go ahead and optionally invoke the eventlet.hubs.use_hub()
depending on if the service launch has occurred such that the runtime
has been monkey patched.
This is because the service does not operate properly when monkey
patched without the eventlet.hubs.use_hub() invocation. Similarly
if the eventlet monkey patch is removed entirely then the service
works as expected.
Change-Id: I976086d1645a3a8cc8b169adbeafdb2522452153
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Add option in ironic.conf's conductor section:
`[conductor]/graceful_shutdown_timeout`.
Set `deprecated_group=DEFAULT` and add `deprecated_reason`.
Change-Id: I7dcc66802d0dc2baef7a932458ae255bfa0f9514
This reverts commit 9406b44657.
Reason for revert:
There is still ongoing discussion about the new license expression
and it may be changed in the near future. Until we reach a more stable
conclusion, let's stick with the older format. See [1] for details.
[1] https://review.opendev.org/c/openstack/nova/+/951226
Change-Id: I7e0ec1df01b9fea9e86d961427a1edc6342f224c
We're seeing some CI jobs fail due to the ramdisk running out of
storage. This slightly increases the memory allocation, which should
hopefully allow CI jobs to pass cleanly without issues.
Change-Id: Iec639cfc029065e378eb69f09200bf92d2313ee0
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
The various *_by_arch configurations were being applied inconsistently
across boot drivers. This unifies the logic across the drivers and
ensures consistent setting of kernel/ramdisk.
An inconsistent setting of kernel/ramdisk is any case where we can't get
*both* the kernel and ramdisk to use from the same configuration type.
For instance, a node-level override in driver_info that specifies a
deploy_kernel but no deploy_ramdisk. These represent a misconfiguration
at a base level, since kernels and ramdisks must be tightly coupled.
This adds validation at startup such that if any X_ramdisk_by_arch and
X_kernel_by_arch configurations exist that are not mirrored -- e.g. a
kernel exists for aarch64 but no ramdisk, Ironic will print a warning
log on start.
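The startup check described above can be sketched as a comparison of
the architecture keys on each side. This is an illustration only; the
function name and the example values are hypothetical:

```python
def find_unmirrored_arches(kernel_by_arch, ramdisk_by_arch):
    """Return architectures present in only one of the two mappings."""
    # Symmetric difference: keys that appear on exactly one side.
    return sorted(set(kernel_by_arch) ^ set(ramdisk_by_arch))


kernels = {'x86_64': 'kernel-x86_64', 'aarch64': 'kernel-aarch64'}
ramdisks = {'x86_64': 'ramdisk-x86_64'}
unmirrored = find_unmirrored_arches(kernels, ramdisks)
```

A non-empty result corresponds to the case where a warning would be
logged on start.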
We've also unified behavior when inconsistencies occur at runtime:
- If CONF.conductor.error_on_ramdisk_config_inconsistency is True, an
inconsistency will cause an exception to be raised immediately,
failing whatever operation needed to boot a ramdisk.
- If CONF.conductor.error_on_ramdisk_config_inconsistency is False,
Ironic will fall back to a less specific configuration -- for
instance, if driver_info[deploy_ramdisk] is set but not
driver_info[deploy_kernel], Ironic would fall back to the
next-less-specific option, the deploy_*_by_arch config options. If
those are inconsistent, we'd fall back to deploy_kernel/deploy_ramdisk
-- the global default.
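The two runtime behaviors above can be sketched as a walk over
configuration levels, most specific first. The names here are
hypothetical, and the behavior when no level yields a complete pair is
an assumption rather than something this message specifies:

```python
class RamdiskConfigInconsistency(Exception):
    """Hypothetical exception for a kernel/ramdisk mismatch."""


def pick_kernel_ramdisk(levels, error_on_inconsistency=False):
    """Return the first (kernel, ramdisk) pair found at a single level.

    ``levels`` is ordered most specific first: driver_info, then the
    *_by_arch options, then the global default. Each entry is a
    (name, kernel, ramdisk) tuple where unset values are None.
    """
    for name, kernel, ramdisk in levels:
        if kernel and ramdisk:
            return kernel, ramdisk
        if (kernel or ramdisk) and error_on_inconsistency:
            raise RamdiskConfigInconsistency(
                'only one of kernel/ramdisk is set in %s' % name)
        # Otherwise skip this level and try the next, less specific one.
    # Assumption: with no consistent pair anywhere, nothing can be booted.
    raise RamdiskConfigInconsistency('no consistent kernel/ramdisk pair')


levels = [
    ('driver_info', None, 'node-ramdisk'),
    ('by_arch', 'arch-kernel', 'arch-ramdisk'),
    ('global', 'default-kernel', 'default-ramdisk'),
]
chosen = pick_kernel_ramdisk(levels)
```

The key property is that the kernel and ramdisk always come from the
same level, never one from each.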
Previous behavior varied by driver, but in the worst cases would combine
a deploy_kernel from one level (e.g. driver_info) and deploy_ramdisk
from another (e.g. global default) or vice versa. This behavior is
considered a bug as kernels are matched up with ramdisks and generally
are not interchangeable.
We expect at a future Ironic release to enable strict validation of
ramdisk/kernel consistency.
Closes-bug: 2097798
Change-Id: I429a651894be4b31a6faa5dfac0f58dd75ce8f79
A bug was observed in nova behavior of interacting with ironic where
metadata could be missing in the payload to Ironic. While not great
it is also not awful. In any event, if there happens to be no DHCP
the lack of an MTU can be very problematic and result in a mismatch
between environment configuration and node state.
Closes-Bug: 2110322
Change-Id: Iea85ac4789d646dc85d0d8b22aa8e596b246234b
Serve Ironic REST API & JSON-RPC via Cheroot instead of eventlet
Claude code used to find and fix minor issues when running the service.
Generated-By: claude-code
Change-Id: I5440c5898abfe525807434447b69ec4a32e56f2d
By putting CI job descriptions into the place they are defined, it will
be much more difficult to forget to update the documentation.
Change-Id: I7836fa3d2f6adf6a97762a6cd13b92177a2cd12e
This adds documentation for the new automated cleaning by runbook
feature.
LLM tooling created an extremely basic draft which I enhanced.
Generated-by: claude-code
Change-Id: I504f5e04c46190b09013875c5d99eb1f0298f2a0
This enables an operator to override Ironic's autogenerated cleaning
step functionality, instead providing a runbook to be used for
automated cleaning.
Operators will be able to configure runbooks globally, by resource
class, and as a node override. Configuration exists to enable/disable
this functionality at the will of the deployer, and defaults to
maintaining existing behavior. Runbooks are also validated, by default,
against node traits and will fail cleaning on a mismatch; this
behavior is also configurable.
Unit tests generated and fixed by the various different AI agents I've
been trying out through the lifetime of this change, then heavily edited.
Generated-By: Cursor, Jetbrains Junie, claude-code
Closes-bug: #2100545
Change-Id: I7c312885793ee72b1ca8c415354b9e73a3dac9d7
Add opts for inspector related groups auto_discovery,
inspection_rules and pxe_filter to ironic/conf/opts.py so
that these groups/options are included in sample config
and documentation.
Change-Id: I02da059af77b984e9075568d17aecc033e565b45
In If6125cf7af84dd1b4fa869f13932c43ac013d443 we identified a few different
minor issues, some syntax, and one concern over metadata and the data
length. In this patch, we address final pass reviewer comments *and*
shorten the interface ID values to more generic values which
fulfill the named label/id value in the configuration drive
schema, and are also values one could potentially use at some point
down the road with vlans.
Instead of "eth" as a prepended field name, we use "iface", since
we can't have any value which really suggests a direct interface
name mapping either.
Change-Id: I637412591ae57a8591a1a948f54be56785a8a1e3
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
A KeyError for 'built_in' could occur in `ironic.api.method`
when processing node inspection rules if an invalid priority
(e.g., -1) was provided in a POST request. This resulted in a
500 server error.
This issue was identified while writing tests for inspection rule
priority as part of the work for LP#2105478. The
server-side error was observed in logs during POST requests to
`/baremetal/v1/nodes`, notably triggered by the
`tempest-TestInspectionRules` test, as seen in the log entry:
"ERROR ironic.api.method [...] Server-side error: "'built_in'""
This patch ensures the 'built_in' key is correctly accessed or
handled within the inspection rule processing logic, particularly
when validating or processing rule priorities, to prevent this error.
Related-Bug: 2105478
Change-Id: I3ecf95d316687bc6b82d28cdd945eaba8115aedf
The only thing to watch for is that GET calls fail with HTTP 404 (Not
Found) before this microversion, while POST, PATCH and DELETE calls fail
with HTTP 405 (Method Not Allowed). Also, PATCH was introduced in a
different, later microversion to the others.
Change-Id: Ie9a6f2d282bfc97a27e868cb66f56f840b9c6a0d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
It doesn't make sense to return a HTTP 500 in production and after
everything has actually been processed just because schema validation
failed. Instead, we should only do this in test environments. Make it
so via a new config option which is now set by default in unit and
integration tests.
Change-Id: I3cac41e024d569dfe05f21767d90d585f54e3eac
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Adds an advanced operations standalone test which utilizes a new proposed
tempest test to execute against the API which exercises the dhcp-less
virtual media path AND passes the node through the rebuild scenario case
which has been identified as problematic in the past.
With this, we know it works, so \o/.
Presently as non-voting.
Change-Id: Ibb6f9228672966c3708227e37bead6a45648e177
Functionally, the behavior is a little different from Nova's metadata
generation. Specifically nova exposes internal interface names in the
form of the name of taps to the instance, and in the context
of baremetal, that is much more noise than signal. Furthermore, Nova's
network_data generation is a bit simplified in that it also just refers
to networks as network[0,1,n]. Ironic uses the actual IDs instead.
It also looks like Ironic is properly injecting MTUs... which is likely
an entirely separate bug which we can fix later. Filed as
https://bugs.launchpad.net/ironic/+bug/2110322
Where Nova would generate something like:
{"links": [
{"id": "tap397cd29c-f2",
"vif_id": "397cd29c-f267-4925-bd3f-58d39bfd685a",
"type": "phy",
"mtu": null,
"ethernet_mac_address": "52:54:00:b1:25:62"},
{"id": "tap88fd06bc-c0",
"vif_id": "88fd06bc-c006-46f6-bc9a-ff04a7ee0779",
"type": "phy",
"mtu": null,
"ethernet_mac_address": "52:54:00:a8:f0:64"}],
"networks": [
{"id": "network0",
"type": "ipv6_slaac",
"link": "tap397cd29c-f2",
"ip_address": "fd34:4b51:21c8:0:5054:ff:fe53:5cb4",
"netmask": "ffff:ffff:ffff:ffff::",
"routes": [{"network": "::", "netmask": "::", "gateway": "fd34:4b51:21c8::1"}],
"network_id": "fef900f0-fb86-4f6e-9d24-d379b6a57f9c", "services": []},
{"id": "network1",
"type": "ipv4_dhcp",
"link": "tap88fd06bc-c0",
"network_id": "a73bc031-3984-4f40-80d1-a4d9d62a4caa"}],
"services": []}
Ironic neutron integrated VIF code generates:
{"links": [
{"id": "privateport",
"type": "phy",
"ethernet_mac_address": "52:54:00:b1:25:62",
"vif_id": "397cd29c-f267-4925-bd3f-58d39bfd685a",
"mtu": 1380},
{"id": "privateport2",
"type": "phy",
"ethernet_mac_address": "52:54:00:a8:f0:64",
"vif_id": "88fd06bc-c006-46f6-bc9a-ff04a7ee0779",
"mtu": 1430}],
"networks": [
{"id": "e070764c-7641-4b35-80db-6056a8937193",
"network_id": "fef900f0-fb86-4f6e-9d24-d379b6a57f9c",
"type": "ipv6",
"link": "privateport",
"ip_address": "fd34:4b51:21c8:0:5054:ff:feb1:2562",
"netmask": "ffff:ffff:ffff:ffff::",
"routes": [{"network": "::0", "netmask": "::0", "gateway": "fd34:4b51:21c8::1"}]},
{"id": "1923957a-94f2-4169-a1bb-b133b342c2b6",
"network_id": "a73bc031-3984-4f40-80d1-a4d9d62a4caa",
"type": "ipv4",
"link": "privateport2",
"ip_address": "192.0.2.96",
"netmask": "255.255.255.0",
"routes": [{"network": "0.0.0.0", "netmask": "0.0.0.0", "gateway": "192.0.2.1"}]}],
"services": []}
Related-Bug: 2110322
Closes-Bug: 2106073
Change-Id: If6125cf7af84dd1b4fa869f13932c43ac013d443
Adds a new configuration option ``bootloader_by_arch`` to support
architecture-specific ESP images for virtual media boot, similar to
how ``pxe_bootfile_name_by_arch`` works for PXE.
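A rough sketch of a per-architecture lookup, assuming the option is a
mapping of CPU architecture to ESP image and that a single default
bootloader is used when no entry matches. The function name, the
fallback and the image names are assumptions made for illustration:

```python
def pick_bootloader(cpu_arch, bootloader_by_arch, default_bootloader=None):
    """Return the ESP image for ``cpu_arch``, else the default."""
    if cpu_arch and cpu_arch in bootloader_by_arch:
        return bootloader_by_arch[cpu_arch]
    # Assumption: fall back to one architecture-independent setting.
    return default_bootloader


by_arch = {'aarch64': 'esp-aarch64.img'}
arm_image = pick_bootloader('aarch64', by_arch, 'esp.img')
x86_image = pick_bootloader('x86_64', by_arch, 'esp.img')
```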
Closes-Bug: #2110132
Change-Id: I54fb4b2f379c2d06a7c49402d32403aa2ee67e70
Signed-off-by: Afonne-CID <afonnepaulc@gmail.com>
We've long ago pruned out the rest of the explicit partition image
testing, so explicitly remove build process to save CI time and
resources while also making the logs a bit more straight forward.
Change-Id: I93dcce21bffe473fa2d708d7dc6439db8a042e50
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
- Make DHCP setup compatible with q-dhcp or neutron-dhcp
- Fix restacking when dnsmasq package has already been installed
(without the explicit removal of dnsmasq; the dnsmasq-base removal
errors due to dependencies)
Change-Id: I7eafc34ff71b84a0ae5199db95cb89bdd7abbf29
This change moves multinode jobs to be leveraged across multiple
"compute" nodes with an increased amount of memory, which increases
the overall test resources available and limits controller node
hot spotting for deployment operations.
This effectively changes multinode jobs from being a single
compute node with a single controller node, to two compute
nodes and a single controller node. The number of virtual machines
hosted on the controller node is also dialed back.
This was done to eliminate usage of tinyipa in favor of a more
realistic CentOS based IPA ramdisk, and also removes fallback
logic to use tinyipa on more limited resource nodes.
Change-Id: Ib52f7039072901ce72ac96e660d35a10cca59737
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
We long ago moved past use of baremetal_admin and baremetal_observer
roles in our interactions. At this point, they just make sense to
remove from devstack since otherwise the code path is outdated.
Change-Id: I14caae1fc81512bebbc150da8a965449fcae63cf
The [inspector]update_pxe_enabled configuration option controls
whether the pxe_enabled field of a Port is updated during node
inspection. This patch adds logic to honor that setting.
Change-Id: I3e28e439b386c9f73e377b62513346bcadbd56b2
... according to the following warning from setuptools.
SetuptoolsDeprecationWarning: License classifiers are deprecated.
********************************************************************************
Please consider removing the following classifiers in favor of a SPDX license expression:
License :: OSI Approved :: Apache Software License
See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
********************************************************************************
Change-Id: Ie5d16633d3094a98a4b20215683a58976c96fe65
Drop the last eventlet usage in pxe filter script for the native
threading equivalent. Behavior "should" be identical.
Change-Id: I2b490a78288ad477131dbe60ea64d7ea905953ec
The inspector inspect interface seems to only need a
fire-and-forget background call to ironic-inspector.
Replace `eventlet.spawn_n` with native threading,
excising the last runtime dependency on `eventlet` in this file.
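For illustration, a fire-and-forget call with the standard library
might look like the sketch below. The helper name is hypothetical, and
the choice of a daemon thread is an assumption, not necessarily what
this change does:

```python
import threading


def spawn_background(func, *args, **kwargs):
    """Run ``func`` in a background thread without waiting for it."""
    thread = threading.Thread(target=func, args=args, kwargs=kwargs,
                              daemon=True)
    thread.start()
    # The thread object is returned only so callers (or tests) can join.
    return thread


results = []
worker = spawn_background(results.append, 'inspection-started')
worker.join()
```

Unlike `eventlet.spawn_n`, this creates a real OS thread, so the target
function must be safe to run concurrently with the caller.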
Change-Id: I07172e48207e09c0858298e34eea038c776d3c74
This reverts commit 5f7c7dcd04.
Reason for revert: seems to be causing an authentication loop between sushy and sushy-tools
Change-Id: Ic6142bcfa7ea2746d704625548edbef3bb57cad1
Our CI broke this morning with an error around start_neutron_api
function not existing. That function was removed from devstack
in Nov 2022 in a52041cd3f067156e478e355f5712a60e12ce649.
Upon further research, is_service_enabled neutron-api appears to have
been returning false, even for neutron-enabled jobs, for a long time. A
recent fix to this behavior in devstack exposed this dead code and the
lurking breakage which we've now experienced.
I'm making the assumption that since our CI has been fine for 2.5 years
without this code block running, it'll be fine for now too. We likely
want to follow up on whether missing these calls has a side effect.
This change also disables voting for the
ironic-tempest-ovn-uefi-ipxe-ipv6 CI job which is in a similar state
as the other CI jobs which use networking-generic-switch.
This is a side-effect due to a lack of the uwsgi launched neutron having
the configuration files to load plugins. That issue is being worked
separately and once networking-generic-switch is fixed, the job will
be returned to voting status.
Change-Id: If47e74751ba66a1296f16d9c43433033c04beffb
This change involves:
- Moves ironic-standalone jobs to use 32GB nodes which is a
relatively simple change.
- Changes other jobs excluding multinode jobs to use DIB image
builds by default.
- Changes one of the job names to remove tinyipa from the name.
- Also notes a job which can be removed, but removal will be in
a later change... and adds a release note in case anyone looks.
Change-Id: If9110c8f5041428df3e59f40fe0cb71bcf8580a8
A system object in redfish with no processors in the wild has been
observed causing system.processors to return an empty list -- not the
None we were previously checking for. Now, instead, we check truthiness
which will fail on an empty list as well.
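The difference between the two checks, as a small sketch (the function
names are hypothetical):

```python
def has_processors_old(processors):
    """Previous style of check: only None counts as missing."""
    return processors is not None


def has_processors_new(processors):
    """Truthiness check: None and an empty list both count as missing."""
    return bool(processors)


empty = []
old_result = has_processors_old(empty)
new_result = has_processors_new(empty)
```

With an empty list, the old check reports that processors exist while
the new one does not.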
Change-Id: I9aa4eba54a66643b20fa6b493e63eab60aecb3d8
While investigating https://bugs.launchpad.net/ironic/+bug/2110698
I noticed you could end up with an additional exception raised which
would be considered a hard failure. Overall, the code path looks okay
for it, it's just not in the doc strings, so I figured easy enough to
fix it.
Change-Id: I528b6f327fac95cdb8b620843c073fc9cdec1833
Fixes an error where a node does not move to a failed state when
removing a VIF fails due to unexpected errors during tear down.
Closes-Bug: #2110917
Change-Id: I362aa42ff30696e9ec18239b316d03e65a1a65d1
The 'args.validate' decorator currently performs two functions:
transformation of serialised parameters into their unserialised formats
(i.e. a CSV string into a list of strings) and validation of the
parameters. While we are replacing use of the function for the latter,
we are retaining it for the former (at least for now).
The JSON Schema schemas rely on the transformed values, so we need to
ensure this transformation happens _before_ we run schema validation.
Ensure this is the case by looking for side-effects of the schema
validation decorators and erroring out if they're found (since that
implies they've already run). See [1] for more details.
[1] https://review.opendev.org/c/openstack/ironic/+/945943/2/ironic/api/controllers/v1/firmware.py#76
Change-Id: I642d582030eeacb0e060b3d375406e011abfb76a
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
It is possible that an agent is booting in an environment with firewalls
doing evil things like not closing sockets, or where a FIN-ACK never makes
it to the conductor, or whatever.
This can result in the client hanging and eventually timing out.
Ironic's agent client code automatically retries. Which is cool.
The agent records it got a command from the first attempt, and then again
from the retry. Everything goes swimmingly until Ironic goes to assess
if the agent is a "freshly booted" agent, or not. At which point, the
check logic would see the multiple "get_xxx_steps" calls in the agent
logs, and declare the agent not to be freshly booted due to the retry
attempts.
So instead, we now explicitly evaluate the results of the command in
whole to account for retries. This commit also adds additional tests
as the helper was previously only really being exercised with empty
lists in unit tests.
Closes-Bug: 2110698
Change-Id: I460751b761462dbb630368e474e207fed90f289a
One of the challenges with eventlet is it has helped make some of
our tests run nice and quick because time.sleep ends up getting replaced
deep under the hood when we monkey patch.
In order for us to get to a point where we can begin to consider removing
the patch, the tests need to begin to change. To that end, we have some
test reliance upon that patching and also calling helpers and then testing
the helper in the driver tests as well which increases the test time, and
really is sort of "excess" complexity.
So the goal here is to see if folks are good with such minor changes,
since we can do them now and should try to avoid making changes to these
tests when we begin to really remove eventlet.
As a note, this is the first of two, or possibly three patches to do this
overall cleanup.
Change-Id: I28a36ed1facdf4fd6ced2efceacb39d12de72e8f
When ``redfish_address`` contains trailing slashes, the Redfish driver
would preserve them in the ``root_prefix``, causing sushy to error
attempting to instantiate the client using a potentially malformed URL
which results in a "Resource not found" error.
This may or may not be a breaking change since we are now stripping trailing
slash(es) from custom paths, as well as explicitly setting a default
``root_prefix`` value if path is/becomes empty (inline comment suggested
that was the case but it was never implemented).
E.g: /test/redfish/v0/ -> /test/redfish/v0
/ -> /redfish/v1/
/// -> /redfish/v1/
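The mapping above can be reproduced with a short sketch. The function
and constant names are hypothetical; only the input/output pairs come
from this message:

```python
DEFAULT_ROOT_PREFIX = '/redfish/v1/'


def normalize_root_prefix(path):
    """Strip trailing slashes, falling back to the default when empty."""
    stripped = path.rstrip('/')
    # A path made up only of slashes (or nothing) becomes empty here.
    return stripped or DEFAULT_ROOT_PREFIX


custom = normalize_root_prefix('/test/redfish/v0/')
single = normalize_root_prefix('/')
triple = normalize_root_prefix('///')
```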
Closes-Bug: #2070791
Change-Id: I03f794350c6c0efd68131e6ede80c8a698983228
Cases exist where you can request a port to be created, in such
a way where neutron has to defer assignment of addressing.
In Neutron's world, it wants to map the host so it can be ensured
that *correct* physical network is available and mapped to for the
address range. Networking-baremetal helps backfill physical connection
context so Neutron can make the appropriate address assignment.
In order to ensure we do the right thing while also ensuring
appropriate security around the state of the port and the bind,
we need to go ahead and facilitate an *initial* context
to neutron so address assignment can occur, but explicitly limit
the provided information to a highly limited scope to prevent
actual binding to a physical port.
With the assignment of addresses, it becomes possible to begin to
generate, or eventually *fix* network configuration metadata
being provided to a baremetal node.
In this *entire* process, we also identified you can create neutron
ports with binding information "out of the gate" to be immediately
bound. Thanks metalsmith! So the code pattern now checks the port,
and unbinds the VIF before rebinding it.
Related-Bug: #2106073
Change-Id: Ic53c626afe641ce63d71a7858e65df1fb250e3c0
Turns out some of the standalone jobs, anaconda in particular,
can reference some artifacts on disk in such a way which causes
the security logic to block the request. This is an easy fix.
Change-Id: I79204117cdbffab1f619981767471475870b4571
Before this change, Ironic did not filter file:// paths when used as an
image source except to ensure they were a file (and not, e.g. a
character device). This is problematic from a security perspective
because you could end up with config files from well-known paths being
written to disk on a node.
Now, we forbid any path that provides access to system configuration,
including /dev, /sys, /proc, /boot, /run, and /etc. Additionally, we've
added an allowlist configuration item which limits the acceptable paths
under which images will be pulled to a list provided by the operator.
The allowlist default list is huge, but it includes all known usages of
file:// URLs across Bifrost, Ironic, Metal3, and OpenShift in both CI
and default configuration.
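A simplified sketch of the two-stage check, assuming a denylist of
system paths followed by an operator-provided allowlist. The names are
hypothetical, and this version only normalizes the path textually; it
does not resolve symlinks:

```python
import posixpath

FORBIDDEN_PREFIXES = ('/dev', '/sys', '/proc', '/boot', '/run', '/etc')


def _is_under(path, parent):
    """True when ``path`` is ``parent`` itself or lives beneath it."""
    return path == parent or path.startswith(parent.rstrip('/') + '/')


def _normalize(path):
    # normpath collapses '..' and duplicate slashes, but POSIX allows it
    # to keep exactly two leading slashes, so force a single one.
    return '/' + posixpath.normpath(path).lstrip('/')


def is_file_path_allowed(path, allowlist):
    """Reject system paths, then require a match in ``allowlist``."""
    normalized = _normalize(path)
    if any(_is_under(normalized, bad) for bad in FORBIDDEN_PREFIXES):
        return False
    return any(_is_under(normalized, _normalize(ok)) for ok in allowlist)


allowed = is_file_path_allowed('/var/lib/ironic/images/a.img',
                               ['/var/lib/ironic'])
escaped = is_file_path_allowed('/var/lib/ironic/../../../etc/passwd',
                               ['/var/lib/ironic'])
```

Comparing whole path components (rather than raw string prefixes) keeps
a path such as /devices from being mistaken for something under /dev.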
Generated-by: Jetbrains Junie
Closes-bug: 2107847
Change-Id: I2fa995439ee500f9dd82ec8ccfa1a25ee8e1179c
This reverts commit a5750a4322.
Reason for revert: Further testing and investigation led me+cid to believe this change is a noop; the previous index provided by CONSTRAINT is likely enough. We were unable to find a benchmark where this improved things.
Change-Id: I7c81b7d798d2ced39d4e6bbd493c3756d3326023
This patch improves the contributor documentation by:
- Listing all documentation sources used in Ironic
- Adding cross-platform instructions to preview built documentation
These updates make the documentation-contributing guide more accurate
and accessible for contributors.
Partial-Bug: #2072364
Change-Id: Icd0bf1188cf0b276ee278a78c1c58dd70c55f232
Allow detaching a VIF in the available state to avoid a MAC
conflict during node rescheduling
Change-Id: I9b55054a97256cecb9d41029f03b1dfdf69c253c
Closes-Bug: 2109300
This job runs Ironic under python 3.9, which is no longer supported by
OpenStack. It can be made voting again once it is revised to use a
supported python version.
Change-Id: I8e5e9de27faaed0315b3846d1eb04561cc4b192f
There is a misalignment between service/rescue operations in state transitions:
currently there is no route from service* to deleting. When a node is
in service* states, a request to destroy the instance will clean up part of the
resources and leave the Ironic node in an unexpected state, without proper cleanup.
Unprovision from service failed is addressed by https://review.opendev.org/c/openstack/ironic/+/944966
This patch fixes unprovisioning from the service wait state.
Change-Id: Ib2d2e7ad9deb1e7011f60bc9b55224ebd7c153e3
Closes-Bug: 2087523
This should improve JOIN operations when retrieving ports
filtered by node conductor groups or other node attributes.
Change-Id: Ie1d4ef243e33c1edaa36111cad3979e0fdcf2cfd
The content from architecture.rst was moved to install/get_started.rst
and merged under a new heading "Overview of Ironic".
To avoid breaking links and search engine results pointing to the old location,
this patch restores architecture.rst with a redirect notice to
the new location.
Partial-Bug: #2072353
Change-Id: I950713434132acc5b225a39c0e4cdd724691f9e6
The SNMP driver uses pysnmp-lextudio, we'll be lucky if it works through
the deprecation period. If by the start of the 2026.2 cycle it's not
been migrated to a different library, it will need to be removed.
In order to keep CI running on an unsupported driver, I've added an
option to the devstack plugin to skip failing on upgrade check.
Related-Bug: 2106674
Generated-By: Jetbrains Junie
Change-Id: Ibe5576d04fc3ca1cc102f126853ed3d1e8c404d2
- If image availability is shared and conductor_project_id is in the
image shared member list, allows access
Closes-Bug: #2099276
Change-Id: I6b8a10fe82b41aa37b4f14bca9d3c0c498882bd1
When using drivers like `networking-generic-switch`, there can be issues
with changing the VLAN the server is attached to.
Currently, the instance will still continue to attempt to boot, in some
cases the instance can go into ACTIVE (if the server happens to be on
the provisioning network already), but the server is not attached to the
correct VLAN, generally causing the network to fail to come up.
If we look at the ports, they are marked as binding failed. This is
something that Nova checks for after doing the port update to attach the
binding information. Let's add that check into Ironic, such that the
above build will fail with a good hint at where to look for problems
(namely Neutron).
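A minimal sketch of such a check, assuming ports are available as
dictionaries in the shape Neutron returns them, where a failed binding
is reported as a ``binding:vif_type`` of ``binding_failed``. The
exception and function names are hypothetical:

```python
class PortBindingFailed(Exception):
    """Hypothetical exception raised when Neutron could not bind a port."""


def ensure_ports_bound(ports):
    """Raise if any port reports a failed binding after the update."""
    failed = [port['id'] for port in ports
              if port.get('binding:vif_type') == 'binding_failed']
    if failed:
        raise PortBindingFailed(
            'Binding failed for port(s) %s; check the Neutron logs'
            % ', '.join(failed))


ports = [
    {'id': 'port-a', 'binding:vif_type': 'other'},
    {'id': 'port-b', 'binding:vif_type': 'binding_failed'},
]
try:
    ensure_ports_bound(ports)
    message = None
except PortBindingFailed as exc:
    message = str(exc)
```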
Change-Id: I3811941a3dff0a9f968258d05cc020e1f52e3e40
Allow special characters in patch field keys, as long as they
are correctly encoded per JSON pointer and patch specs RFC 6901
and RFC 6902.
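For reference, RFC 6901 encodes "~" as "~0" and "/" as "~1" inside a
reference token. A small sketch of both directions follows; the helper
names are hypothetical and this is not the code from this change:

```python
def escape_pointer_token(token):
    """Encode one JSON pointer reference token per RFC 6901."""
    # '~' must be escaped first so the '~' in '~1' is not re-escaped.
    return token.replace('~', '~0').replace('/', '~1')


def unescape_pointer_token(token):
    """Decode one reference token; '~1' is handled before '~0'."""
    return token.replace('~1', '/').replace('~0', '~')


key = 'rack/slot~3'
path = '/extra/' + escape_pointer_token(key)
decoded = unescape_pointer_token(path.split('/')[2])
```

A client patching a key that contains a slash would therefore send the
escaped form in the patch "path", and the server decodes it back to the
original key.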
Closes-Bug: #1604148
Change-Id: I7eeb52b51a0e8ba96103e0863819653021c79271
Release notes currently have a title with a relatively meaningless
version number. Replace this with a title indicating it's in development
and unreleased.
Change-Id: I08c8fe3b20bb9c73fa8e06afd99052257f3cb334
- Add 'community' visibility alongside 'public'
- Make the auth token check configurable with a new option
'allow_image_access_via_auth_token' (default: OFF in
master, ON in stable)
- Add testing for is_image_available
Partial-Bug: #2099276
Change-Id: I10df3e8fd0091e70f3fb1bc19524aada296c13c1
followup to Iff2be28c64a0469a3796003f3b8ed28d70631761
such images currently fail to be provisioned by ironic 29/Epoxy
(in a ramdisk deploy scenario) even with all image inspection disabled.
This patch adapts the code from nova's patch
https://review.opendev.org/c/openstack/nova/+/931833
Related-Bug: #2091611
Change-Id: I5a1333b0dae941269a397ef4e6bc5b40ccfceefc
We'll likely want to revert this one, but overall the multinode job
has been failing quite a bit because it is trying to run *all* of
the scenario jobs. While that is not a big deal under normal
conditions, at the end of the development cycle, CI loads are much higher
and the failure rate can also be much higher as a result.
So for now, dial back the test to the most beneficial/needful
test which could be executed on the node.
Change-Id: Id4d050e1abd67f431026a2d813f97ffacf24407e
- Kept the "abort" term in the relevant documentation as
it's the actual verb in the API per reviewer feedback.
- Updated sentence "CentOS 7 development images are no longer updated" by removing "development".
Partial-Bug: #2072365
Change-Id: Ibc09693f26a0c4e26bbd3ba2fc7918383c4d81ec
We tested and successfully built Rocky Linux and Alma Linux Ironic images using Diskimage-builder. Add the method.
Change-Id: I22758dd59c54038e7ed72e513c61fcddba0ef473
This module being 'cmd' means that when using unittest native test
discovery, it tries to load our cmd module instead of the built-in. This
obviously does not impact test running in CI or via tox, but by renaming
this we'll make our ironic tests compatible with vscode (and I presume
any other unittest-discover based IDE testing setups).
Change fully generated by cursor IDE with review and minimal editing by
me.
Generated-By: Cursor
Change-Id: I6c9b92e6b0bee366ff40795c722bd70d16cf0e4f
Allow deploying a virtual environment with port groups by using the
following environment variables:
IRONIC_USE_PORT_GROUP: boolean to enable port group usage
IRONIC_PORT_GROUP_MODE: the mode for port group, by default is
balance-rr.
Related-Bug: #1718481
Change-Id: I9cc8e54cf94ecc65ac93d01671f8778be2f6dc78
We've had a couple asks about enabling ipmitool-socat console usage
with the redfish hardware type.
This doesn't pickup parameters from driver_info to figure out the IPMI
username, password, and hostname/address, which turns out to be a rather
invasive change to ipmi handling code and test code as well.
Change-Id: I69dfeeec59317871c9f6a206966dd6a9d2ca9e3e
Adds basic support to spin up a SONiC VM instance as a switch to have
wired to the switch VM to enable command behavior verification.
Also fixes some related issues due to an earlier rebase need on
the switch test VMs where the interface name was changed and
ultimately a different field needed to be extracted for the
later commands to execute properly in order to provide data
to later callers for actions such as creating ports in ironic.
Change-Id: Ie4a2ac4da08359d20b5aa35faf741c5307bef6e0
These got missed in a rework of the initial patch, likely because they
weren't being used yet.
Also correct a small issue that looked like invalid Python but clearly
isn't.
Change-Id: Ie6cf882d2eca55f2ce01c893d18cb1ca1bbe4a01
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This call can fail for a lot of very different reasons (e.g. right now
I'm looking at secure boot operations getting HTTP 403 from iLO 6, but
it could just as well be something coming from IPA).
Change-Id: I4a884e3ccc0e1bf12aa9287cf521e1295f8a2476
Set up Ironic lookup endpoint (api/controller/v1/ramdisk.py) to send container configuration to IPA.
Partial-Bug: #2100556
Change-Id: I5fd593e58b0d33541a63ebb817ed8f3c0a62071c
When TLS is handled by a reverse proxy, use_ssl may be set to False
while clients are still supposed to use TLS. Add a new option for that.
Change-Id: Ie1be180ce36bbeb81427ea1ed4a2654c880aff2c
The one in json_rpc was actually designed as a new base class for all
WSGI services in Ironic. Make it such.
Change-Id: I3f63088b21e09f88476e23b7c9ff3099daab781f
A port-like object can be passed to the smartnic checker while it's not aware
of portgroups, giving a confusing log about unknown port data.
Change-Id: Ia9d1ef48ebcf0cc1f1a987fd02f2dd7e74b96271
Adds additional basic simulator support to stand up a virtual
Cisco Nexus 9000 switch for testing in devstack to facilitate
development and testing.
Change-Id: Id66e6bcc646a6d35a2caa5ecbc6b8cd881adb7aa
Adds documentation around support for switch simulators
to the contributor documentation to appropriately scope aspects
as to why the support has been added for the devstack plugin.
Change-Id: Ic141804e6adc8d08957875e1b169dca52c99e448
Adds necessary logic to support spinning up a local network simulator
for Dell Force10 OS10 switches which is a Linux based operating system
image as opposed to the former force10 OS 9 switches.
This change takes a *very* similar approach to OS9 support, but there
are several differences between OS9 and OS10, mainly in configuration
formatting, commands, access control, and even the overall virtual
machine installation process which leverages ONIE and multiple
"disk" artifacts.
Change-Id: Iab3c69031eeff1f612e254d099539c8fc146b553
In order to test NGS compatibility and generally move the state forward
we need to be able to wire in switch simulators.
This is *not* intended to be run in CI, due to known performance issues.
This first pass hooks up Dell Force10 switches with OS version 9.13, and
does so in a way that lets us configure the switch as part of the setup.
This makes it possible for configure-vm.py and the VM templates to
behave as they did before I0ef1ad1b2e50cb26839c618a1367704d51ed8a4d,
in order to enable the simulator attachments. We can't exercise network
switch simulators with dynamic post-VM-start network attachments, because
the attachment to the switch simulator must be done in advance of the
switch VM launch.
Change-Id: I4addd71adea0b3f6e56b967db848546b5c56561e
Create an Ironic support resources page with a table of the community's platforms
Closes-Bug: #2072357
Change-Id: I0904f09fe72cf7b08da21b0b251fe9c6e2d148f5
The job usually takes 30-40 minutes to run if successful, so it
does not make sense to have a 3 hour timeout value.
Also there is no metal3 user so just using zuul for remote logs
collection, also fixing the ip of the remote node.
Change-Id: Ie738195f38a547cf03d94e7cc5d78f7c2b8d4539
Add file to the reno documentation build to show release notes for
stable/2025.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.
Sem-Ver: feature
Change-Id: Ia5b45f166e7754a282f939969fda4f11735c9c69
Migrates Inspector rules over to Ironic
This change defers the addition of the configuration options;
``supported_interfaces``, ``[auto_discovery]`` ``inspection_scope`` and
the ``default_scope`` as otherwise specified.
And, while the ``scope`` field may remain in the database for easier
migration, it's also outside of the 'scope' of this change as well,
and there's a chance we use the existing `traits` field in nodes as an
alternative for node-to-rule association instead.
A future follow-up should address these excluded implementations.
Change-Id: I6baf00273e63bb96e133f0cf5da6d8953f97af5a
Improve documentation on how to create, associate and use runbooks
with nodes via traits for cleaning and servicing operations.
Change-Id: If99866ef8bfc200b430f17cff784cf96e916536d
Delete the deploy kernel, ramdisk, and ISO files during cleanup
to trigger rebuild on the subsequent stack.sh run.
Closes-Bug: #2076358
Change-Id: I6600b67c9b3455d8191126b24a1941ae7c384e36
Add a new config variable to ignore project_id checks in administrative tasks
Partial-Bug: #2099276
Change-Id: I3c6ba8f995a2781229c07c047f66e6737109cdc9
In Rackspace env a bigger partition is available and configured
as /dev/xvde1 but not mounted.
We should use that as working dir for the metal3 job as the
standard one is too small and the job fails regularly when
running there.
This allows us to make the metal3 job voting again.
Change-Id: I4d50b9c07367a5b1ad25887d87cf2e29ac6b4257
When the metal3 job fails the libvirt logs can't be collected
causing the entire log collection process to fail, de facto
making any troubleshooting process very difficult if not
impossible.
We need to ignore the error from the libvirt logs collection
task to allow the logs collection to finish correctly.
Change-Id: Ibbbb86457cbcf6c28392baad52478dc232d160bd
The default total value is 2400 seconds, so 40 minutes, which
has proven to be too long, especially in case of failures.
We reduce it to 1800 seconds, 30 minutes, for the time being.
Change-Id: I533e6411fb55fa6e4a609c50bb385279dddcc3d3
The [ssl] options from oslo.services are deprecated, because these have
been used to enable ssl in eventlet-based wsgi server, which is now
deprecated.
Introduce the dedicated options to [vnc] section to replace these
deprecated options.
Because the vnc console feature is not yet released, no release note
is added by this change.
Change-Id: If9d0e1413b46d7157bcd085280b80bf1e0e6355b
The code here builds a dictionary from a glance v2 image object that
roughly resembles the glance v2 image object. The current behavior
nests the 'properties' set on the image under its own 'properties' key
resulting in the image properties never being seen by the Ironic code.
The breakage results from the change from glanceclient to the SDK which
changed the shape of the returned object. The glanceclient object shape
was that of FakeImage while the SDK returns a Resource based object
which includes attributes for all possible fields which are defined at
the top-level of the object. Since the tests run against a different
value they did not capture the failure. Just changing to the SDK object
results in us copying all these new values to the properties dict which
is definitely not the intention. For maximum compatibility to backport
this filters any value not set from being set into properties and sets
the rest. The broken path is any user of get_image_properties()
(both copies from common/images and deploy_utils) checking for user
supplied properties.
Closes-Bug: 2099953
Change-Id: I1842e2651fd2bd8455646db9a3a80c3b9ece5c97
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This change cleans up and elaborates on existing graphical console
documentation, and also adds an overview document describing how it all
works together.
Closes-Bug: 2086715
Change-Id: I16b7ffb993e1ca5148b5205f0a35a74db85337d5
We still need a custom dnsmasq, but due to recent ubuntu upgrades and
devstack changes we always fail the version check and skip the install
of newer dnsmasq. Instead, now we use a sentinel file.
Change-Id: Iefde1721d4ab24521dc2b8f1fe46bf8bd4519f6f
Metal3 job is intermittently failing running out of disk space.
Until this can be resolved, we need to go ahead and disable this job
from voting.
Change-Id: I60bf98d8a529406ca3c97b3d5953d4b2d01e93d0
It seems dnsmasq is back to crashing.
neutron-dhcp-agent[62018]: ERROR neutron.agent.linux.external_process [-]
dnsmasq for dhcp with uuid 1fc184a0-e8c6-47a6-86fe-0291bc22017f not found.
The process should not have died
Instead of routing around by messing with the dnsmasq version now,
just give OVN a spin.
Change-Id: I222ffffacfc0968e23b4ae7f4c33ec5808136694
Updated the documentation for the anaconda deploy interface based on how
it is functioning. Rewrote some sections in an attempt to add clarity
around the behaviors and operations of the interface.
Change-Id: Id4a9b0033d356446b4fe392a20faa63625b4c20d
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The files in tools/vnc-container allow a container image to be built
which supports Ironic's graphical console functionality.
For each node with an enabled graphical console, the service ironic-novncproxy
(or nova-novncproxy) will connect to a VNC server exposed by a container
running this image.
If the devstack ir-novnc service is enabled then this container image
will be built locally and ironic configured to use it for the systemd
console container provider.
This makes a devstack environment functional in accessing graphical
consoles for Dell, HPE and Supermicro.
Related-Bug: 2086715
Change-Id: I0842570cca22ac0e67d358c30225e8e08561f459
New ``console`` drivers ``redfish-graphical`` and ``fake-graphical``
have been added. This allows the graphical console to be accessed for
Dell iDRAC, HPE iLO, and Supermicro hosts. The ``fake-graphical`` driver
is useful for demonstrating the full integration of
``ironic-novncproxy`` and the ``systemd`` provider of
``ironic.console.container``.
Related-Bug: 2086715
Change-Id: If1899aedbcda606895bab120e301a006818b85a5
A new entry point ``ironic.console.container`` is added to determine how
console containers are orchestrated when ``ironic.conf``
``[vnc]enabled=True``. By default the ``fake`` provider is specified by
``[vnc]container_provider`` which performs no orchestration. The only
functional implementation included is ``systemd`` which manages
containers as Systemd Quadlet containers. These containers run as user
services and rootless podman containers. Having ``podman`` installed is
also a dependency for this provider. See ``ironic.conf`` ``[vnc]``
options to see how this provider can be configured.
The ``systemd`` provider is opinionated and will not be appropriate for
some Ironic deployment methods, especially those which run Ironic inside
containers. External implementations of ``ironic.console.container`` are
encouraged to integrate with other deployment / management methods.
Related-Bug: 2086715
Change-Id: Ib890c3c7be91ddd78a43b9c5261dd1d8c1054c04
The method was deprecated in jsonschema 4.14.0[1] and now triggers
the following warning.
DeprecationWarning: FormatChecker.cls_checks is deprecated. Call
FormatChecker.checks on a specific FormatChecker instance instead.
Also drop redundant override of FormatChecker. The overridden check
method is identical with the same method in the parent class.
[1] cd8f0592b9
Closes-Bug: #2089051
Change-Id: Ibdf450ccf204f1ad39fa4dbc8ff10f6fd0bb17a7
Some of the DB api methods are defined as classmethods but these are
expected to be usual methods by logic calling these.
Change-Id: I772f0f2e52bd5b16adea5dd8e47a99651ece0634
iso8601.iso8601.UTC has been equivalent to datetime.timezone.utc in
Python 3, so the current usage is quite redundant.
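As a minimal illustration of the equivalence described above, the
standard library alone is enough to produce timezone-aware UTC
timestamps. The variable names below are only for illustration and are
not taken from the Ironic tree.

```python
import datetime

# datetime.timezone.utc is the standard library's fixed UTC tzinfo, so no
# third-party UTC object is needed to build an aware timestamp.
now = datetime.datetime.now(datetime.timezone.utc)

# Parsing an ISO 8601 string that carries a +00:00 offset also yields an
# aware datetime whose offset is zero.
parsed = datetime.datetime.fromisoformat("2025-02-20T12:00:00+00:00")
```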
Change-Id: I86f21e0555b02da11284b788fbee75cd6ca97f3f
User mdfr reported an issue where a user with ironic, who
had member privileges of the node's owner project, reported
they would get an error about ironic being unable to validate
the cleaning network when trying to bind a baremetal port to
a portgroup.
This is rooted in checks to provide early feedback of ironic
configuration issues, which just work if a user is an admin
scoped user... However the networking client utilizes the
credentials from the task, meaning the credentials of the
user with member access.
That being said, we only need to do the additional checks
if the user is an "admin". Modifies the existing code
and test to test/assert the admin role.
Closes-Bug: 2100520
Change-Id: Idfbf0f58c9976bedb60e1eca1dd282875c89977f
Some vendors insist that floppy images need to be exactly 1440 KiB in
size and have a suffix of ".img". Let's adapt to this and assume that
this doesn't break other vendors.
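A rough sketch of the adaptation described above, assuming the image is
padded with zero bytes and renamed. The helper name pad_floppy_image is
hypothetical and this is not the code from the change itself.

```python
import os
import tempfile

FLOPPY_SIZE = 1440 * 1024  # 1440 KiB, the size named in the message above


def pad_floppy_image(path):
    """Zero-pad the file to exactly 1440 KiB and give it an .img suffix.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    if size > FLOPPY_SIZE:
        raise ValueError("image is larger than 1440 KiB: %d bytes" % size)
    with open(path, "ab") as image:
        image.write(b"\0" * (FLOPPY_SIZE - size))
    if path.endswith(".img"):
        return path
    new_path = path + ".img"
    os.rename(path, new_path)
    return new_path


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "config")
    with open(source, "wb") as handle:
        handle.write(b"example payload")
    padded = pad_floppy_image(source)
    padded_size = os.path.getsize(padded)
```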
Closes-Bug: 2100276
Change-Id: I5be6380e8c8c3eac5bea1c189b205b05a9fae625
Recently, logic was changed around network testing, but now we're seeing
test timeouts when the test is still running. In other words, we pushed
the time around and we started hitting the overall job timeouts.
As such, extend the window slightly on the default job so the test runs
can hopefully wrap up before the timeout trigger occurs.
Change-Id: Ie9d4fa6d747b93b2570f8abc13ef48c68e265644
- Refactored the import logic for `sushy_oem_idrac` to use a direct import from
`sushy.oem.dell` if available, falling back to `importutils.try_import`
only if the module is not found. This improves clarity and ensures that the
correct version of the library is preferred.
- Adjusted corresponding tests in `test_raid.py` to reflect the changes in
the import logic.
Depends-On: https://review.opendev.org/c/openstack/sushy/+/940557
Change-Id: I0dbb0ad341059969b86a508a5ccd1e3654cf613b
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
In order to stabilize CI we need to have clear understanding
on what is happening on the nodes. Right now we are blind cats
trying to make predictions based on logs, which is hard.
This patch enables atop installation on all jobs by default
which will provide a clear picture of what is happening
on the environment during tests, how many VMs are booted
concurrently etc.
Depends-On: https://review.opendev.org/c/openstack/devstack/+/939578
Change-Id: I79c9c91e81c6a94f4e17191e05d78902dbb3b1f8
This job was previously attempting to run all of our scenario tests,
which is nice, but the reality is that so many test steps also increases
chances for build history.
As such, dialing the job so we're performing the basic needful and
not trying to perform every test possible.
Change-Id: Ie4845fb5810a379bf6209179693eed27301b24a3
Wraps `wget` commands with sleep and multiple retry support
for resilient network downloads.
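The change itself wraps shell `wget` calls. The same retry-with-sleep
idea can be sketched in Python as follows; the retry() helper, its
parameters and the flaky_download() stand-in are all hypothetical and
only illustrate the pattern.

```python
import time


def retry(func, attempts=3, delay=1.0):
    """Call func until it succeeds or the given number of attempts failed.

    Only OSError is treated as retryable in this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Stand-in for a download that fails twice before succeeding.
calls = []


def flaky_download():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary network failure")
    return "downloaded"


result = retry(flaky_download, attempts=5, delay=0.0)
```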
Partial-Bug: #2098417
Change-Id: Id3e083cc97b71211e5080ad21e2c09d04d8559fa
By default, nova's tempest code checks every second for the status of
an instance being built. But for baremetal, this can take longer. Much
longer, because of the many steps in the sequence of actions that
facilitate deployment.
As such, changing the timer to 10 seconds will reduce the amount
of logging generated by CI test jobs, which presently can fail
with too much data to be logged causing subunit to fail and rendering
logging lost.
Change-Id: I1f7e0198b61717ffaaeb471dfcb200a5ab58c506
This is a forklift of the nova novncproxy service to act as the noVNC
front-end to graphical consoles.
The service does the following:
- serves noVNC web assets for the browser based VNC client
- creates a websocket to proxy VNC traffic to an actual VNC server
- decouples authentication traffic so that the source server can have
a different authentication method than the browser client
The forklifted code has been adapted to Ironic conventions, including:
- [vnc] config options following Ironic conventions and using existing
config options where appropriate
- Removing the unnecessary authentication method VeNCrypt, leaving only
the None auth method.
- Adapting the ironic-novncproxy command to use Ironic's service launch
approach, allowing it to be started as part of the all-in-one ironic
- Replace Nova's approach of looking up the instance via the token.
Instead the node UUID is included in the websocket querystring
alongside the token
- Removing cookie fallback when token is missing from querystring
- Removing expected protocol validation in the websocket handshake
- Removing internal access path support
- Removing enforce_session_timeout as this will be done at the
container level
Related-Bug: 2086715
Change-Id: I575a8671e2262408ba1d690cfceabe992c2d4fef
So... the neutron common code is modeled around PXE, and when you
go to do something with vmedia, you can get errors logged about not
having PXE enabled ports which is misleading.
The reality is the common code needs to base the decision on if all
ports need to be added or if just PXE enabled ports need to be added
based upon the loaded driver via configuration.
As such, now we consult the boot interface's capabilities field
and alternatively pull in the additional interfaces as appropriate
for virtual media users.
Which then also fixes the misleading error!
Closes-Bug: 2098791
Change-Id: I8c9d07e2ded75f138897ece6a67016a6f0020ce6
When running the devstack plugin on Centos, the default
libvirt artifact permissions on the filesystem prevents
libvirt from launching UEFI VMs.
This allows for the VM to be able to launch.
Change-Id: I04fcc86175e90e6ca024a44841f4f05bcb5b1f63
Shellinabox hasn't received an update in 7 years.
Debian recently asked for maintainership to be handed over due to
open issue counts and lack of responses.
All sorts of open issues exist. It appears branches were deleted
in late 2024, forks still have them though.
Basically, looks like shellinabox is abandoned, and we should
treat it as such and abandon support in Ironic.
Change-Id: I5704e1a6a6a816e1cca3b5d0c791eed030cfc563
The emulator *and* the EFI binary paths are different
when using Centos/Fedora, and Fedora/Centos are distinctly
different with EFI folder paths.
Change-Id: I2c6ba884735f22cc9153de0a24282758ffbdc496
Allow for multiple inspection interfaces to load hooks for future
expansion of hook execution. Added some wording to the conf option to
make it clear which inspection interface these options are for.
Change-Id: I182bc8537927cb3565a07dcebe813e22926ecdc8
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
If an Ironic Node is connected to multiple physnets, and wants to
connect to a Neutron VIF whose network has segments in multiple
physnets it may be unclear which one to connect to.
If the VIF is already allocated to a subnet with a segment mapping,
this can be used to resolve the process to a physnet.
When a network uses Neutron routed segments, this patch first
checks the port's fixed_ips to see if a segment/physnet can be
determined from the subnet it is attached to. If this cannot be
identified (or fixed_ips is unset), the behaviour falls back to
returning all physnets available in the segmented network.
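The lookup order described above can be sketched roughly as follows.
The function name and the plain-dict shapes for the port, subnets and
segments are assumptions made for illustration; they are not the
structures used by the actual patch.

```python
def physnets_for_port(port, subnets, segments):
    """Return candidate physnets for a port on a routed network.

    port:     dict with an optional "fixed_ips" list
    subnets:  mapping of subnet id to a dict with "segment_id"
    segments: mapping of segment id to a dict with "physical_network"
    """
    resolved = set()
    for fixed_ip in port.get("fixed_ips") or []:
        subnet = subnets.get(fixed_ip.get("subnet_id"), {})
        segment = segments.get(subnet.get("segment_id"), {})
        physnet = segment.get("physical_network")
        if physnet:
            resolved.add(physnet)
    if resolved:
        return resolved
    # Fall back to every physnet available in the segmented network.
    return {
        segment["physical_network"]
        for segment in segments.values()
        if segment.get("physical_network")
    }


segments = {
    "seg-a": {"physical_network": "physnet1"},
    "seg-b": {"physical_network": "physnet2"},
}
subnets = {"subnet-a": {"segment_id": "seg-a"}}

bound = physnets_for_port(
    {"fixed_ips": [{"subnet_id": "subnet-a"}]}, subnets, segments)
unbound = physnets_for_port({"fixed_ips": []}, subnets, segments)
```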
A similar change in Neutron was made in
I56b22820d29b2d57faf28a2f9b685ab0b2c924b4
Change-Id: Ie693257cdb8c44eeaf49cec9678de047f35d5221
While doing some work on a fips-enabled machine, using centos,
I noticed the check is looking for a ubuntu package version.
Realistically, that is wrong, since 2.90 in general is what
we're seeking.
Change-Id: I02179f10a360a5dd83f4efe28c1ecbb51afb57ab
IPv6 job using UEFI and OVN with dhcpv6-stateful address mode.
Updates the devstack plugin to ensure CentOS DIB ironic-python-agent is
always used for dhcpv6-stateful, udhcpc in tinycore does not support
DHCPv6.
Ensure mtu on the ironic-provision network matches PUBLIC_BRIDGE_MTU
when Ironic IPv6 is used. This ensures we do not get packet drops from
over-mtu.
Devstack plugin will ignore any HOST_IPV6 address discovered, always
using the magigv6 interface and 'fc00::1' as IRONIC_HOST_IPV6.
Change-Id: Iab97d78d7a075eaef3bdcfc08fc4f184a5ea490a
These are functions used by both the novnc-proxy and the graphical
console drivers related to session management. They are added in this
position in the series for ease of reviewing, and to keep the
novnc-proxy change specific to code which has been forklifted and
adapted from Nova.
Change-Id: I72aa2205f92c153809300fd304558427141cda78
This change takes the identified authorization header and sends it
in the command to IPA as an argument. This enables a future IPA
patch to recognize an authorization rejection, and to leverage the
header to authenticate to the remote image service.
Also addresses a case where we neglect to preserve the auth token
in the case of a container URL reference with digest value and adds
a corresponding test which didn't exist either.
Change-Id: I8346eb56e90a5a3e2bc68a9e5cd345121f734245
When testing, I guess I didn't actually test loading the token
from config, and relied upon mocking. However, turns out the code
used the wrong load command (loads, versus load), which passed
unit testing, but didn't work when I gave the config a try.
Fixes the call and the testing so it properly passes now.
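The message above does not name the module involved. Assuming it refers
to Python's json module, the difference between the two calls looks
like this; the registry name and token value are made up for the
example.

```python
import io
import json

# Hypothetical credential payload, used only to have something to parse.
raw = '{"registry.example.com": {"auth": "c2VjcmV0"}}'

# json.loads() parses a string that is already in memory ...
from_string = json.loads(raw)

# ... while json.load() reads from a file-like object.
from_file = json.load(io.StringIO(raw))

# Handing a file-like object to json.loads() raises TypeError at runtime,
# which a test that mocks the call away would not notice.
try:
    json.loads(io.StringIO(raw))
    loads_rejected_file = False
except TypeError:
    loads_rejected_file = True
```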
Change-Id: I4750a82ea07bc803600fddebd16f14a201ae406e
While doing some additional testing, I've started to get 429 errors
from Quay which were causing my requests to hang. This was because
of the built-in retry logic that comes with adapter use. As such, I removed the
adapter use and I now get a 429 error as expected and logged properly.
This was not caught with existing testing because it was getting
captured and held inside of urllib3 with the adapter usage.
Change-Id: I68a532a9765fbf90870ef4372b93738940eabd9e
To allow other inspect interfaces to execute hooks in a common way, move
the execution code into a common inspect_utils module.
Change-Id: Idfe0a36443969347cff41fdb6900a3bc79209823
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Adds a ``bootc`` deployment interface which can be enabled to
perform deployment of bootable containers. This enables a streamlined
workflow where an operator/user can push container updates and does not
need to build intermediate disk images and then post those disk images
to facilitate the deployment of a bare metal node.
Closes-Bug: 2085801
Change-Id: Iedb93fe47162abe0bd9391921792203301bfc456
So the prime driver behind pinning the MTU down on our interfaces is so
traffic can cross multinode vxlan tunnels between nodes where the devstack
plugin is executing to support more complex tests.
But the reality is that doesn't always make sense, and when Neutron
has a default mtu override based upon "upstream" traffic constraints,
that is likely okay as well.
Part of the CI configuration auto-pins the MTU down, which is fine
for single node testing, however with multinode we need to pin the
MTU further down to try and prevent packets from being dropped on
the internal interfaces used to wire up test VMs.
Change-Id: Idc145f4eea87a8db69202b8d7953975d7d5cba2c
Give an overview of the metal3 integration job and its workflow,
and add useful links to become familiar with the metal3 project
and reach the metal3 community.
Change-Id: I94bd6a90f813af7323a7c3363577953a69e62ade
Just split apart the deploy info generation method so glance
handling is isolated to its own method as it is quite complex
for a limited portion of Ironic's user base.
Change-Id: I036b384263e57b3345101a8f5376962faa5c1d2b
... for conductor downloads.
The issue is we don't have the underlying library for requests to do
Zstandard decompression, but userspace tools are common in linux
distributions, and opportunistically we will try to detect and
decompress artifacts.
Zstandard is popular for compression of artifacts in container
registries.
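A minimal sketch of the opportunistic detection described above,
assuming detection is done on the file's leading bytes and that the
userspace tool is the `zstd` command. The helper names are hypothetical
and this is not the conductor's actual code.

```python
import shutil

# A Zstandard frame starts with the magic number 0xFD2FB528, stored
# little-endian, so these are the first four bytes on disk.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd(header):
    """Return True if the given leading bytes look like a Zstandard frame."""
    return header[:4] == ZSTD_MAGIC


def zstd_command(path):
    """Return an argv list to decompress path, or None without the tool."""
    tool = shutil.which("zstd")
    if tool is None:
        return None
    return [tool, "-d", path]


looks_compressed = is_zstd(ZSTD_MAGIC + b"payload")
looks_gzip = is_zstd(b"\x1f\x8b\x08\x00")
command = zstd_command("/tmp/example.img.zst")
```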
Change-Id: I0f6b3b7a8685bb2724505836c770e080bc0e0632
Adds support to use an OCI url for references to artifacts
to deploy.
- Identification of a disk image from the container registry.
- Determination of an image_url which IPA leverages to download
content. Content cannot be compressed at this point in time.
- Ironic can download the file locally, and with a patch to
detect and enable Zstandard compression, can extract the
URL, starting from either an OCI container URL with a Tag,
or a specific manifest digest, which will then download the
file for use.
- User driven auth seems to be good.
- Tags work with Quay, however not OpenShift due to what appears
to be additional request validation.
Closes-Bug: 2085565
Change-Id: I17f7ba57e0ec1a5451890838f153746f5f8e5182
The _format_doc function filters out fields from the docstring when generating Sphinx documentation but does not account for a case where there may be blank lines between fields. As a result, only the last group of fields may be filtered out. This fix filters out all lines which are fields.
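The idea of the fix can be sketched as below, assuming a field is any
line whose first non-blank character is a colon. The helper name
strip_field_lines is hypothetical, and continuation lines of a
multi-line field are not handled in this sketch.

```python
def strip_field_lines(docstring):
    """Drop every line that is a field, wherever it sits in the docstring."""
    kept = [
        line for line in docstring.splitlines()
        if not line.strip().startswith(":")
    ]
    return "\n".join(kept)


doc = (
    "Erase the devices.\n"
    "\n"
    ":param node: the node to act on\n"
    "\n"
    ":returns: None\n"
    "\n"
    ":raises: ValueError\n"
)
cleaned = strip_field_lines(doc)
```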
Closes-Bug: #2097310
Change-Id: I7e702b82b4d2ce20520479d8a8210be36bfbdd5e
Each run of devstack results in the dnsmasq version being restored to
the distro package version before being replaced by this override. This
means that a second run of stack.sh fails because the dnsmasq checkout
directory already exists.
This change moves the checkout to a tmp dir. This also stops git
complaining about nested git repos from the devstack repo.
Change-Id: Ida3892f2e706fa5a791a048f26440d84876be125
This command is invalid and produces errors. Based on context, assuming
it was intended to be a delete.
Change-Id: I8d0d693d757edeb16f7781a09b01a487a170d08d
In order to implement port group testing on CI we need to make
sure that we can change mac addresses of interfaces which is
possible with tap interfaces, but not supported for direct mode.
This patch updates VM setup to use taps for interfaces.
Related-Bug: #1718481
Change-Id: I0ef1ad1b2e50cb26839c618a1367704d51ed8a4d
There was a hanging ` but at the same time having two blocks of fixed
width text next to each other it led to some user confusion so reflowed
it to be a bit more clear.
Change-Id: I1f326bf8f6807c8ce1967d954d1b643f31fbaba3
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The redfish_password option is optional, make sure that the SessionCache
does not throw an error when it is not set.
Closes-Bug: 2097019
Change-Id: Idf792c982a883a4c07ae1dad72e3c54bc73b96a1
Added some documentation to define the shape of inspection inventory
data so that hooks can be standardized between inspection interfaces.
Change-Id: I8c04e0b96edd6fc86b038f72edbaa2952bd645f6
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
dot is used to generate some graphics for the docs so it needs to be
allowed to be run.
Change-Id: I1d88b652d8698014da1d0e4ade95c34c1528382e
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
When mixing in-band out-of-band steps, the out-of-band status polling
flag was not being cleared, and was being left to remain in the node
driver_internal_info field, thus preventing future heartbeat operations
from the baremetal node from being processed to check the actual
completion status of a step.
We now always clear the field based upon the workflow in-progress
before starting a new step and should asynchronous steps also
be recorded as a result of any step's actions such as if a reboot
is required.
Special thanks goes to keekz for promptly providing upstream with
the information necessary for us to identify the root cause.
Closes-Bug: 2096938
Change-Id: I5198d9169cff8474c7a990332639b2d0758e6e1a
Rather than masking individual fields, driver_internal_info really
should be masked using the same method as driver_info. This change is
mainly for masking values in future changes, but this change will also
mask `agent_secret_token_pregenerated`.
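As a loose illustration of masking a whole dictionary by one rule
instead of field by field, consider the sketch below. The marker list
and the mask_secrets helper are assumptions made for the example and
are not the method Ironic uses for driver_info.

```python
# Hypothetical substrings that mark a key as sensitive.
SECRET_MARKERS = ("password", "secret", "token")


def mask_secrets(info, mask="******"):
    """Return a copy of info with sensitive-looking values replaced."""
    masked = {}
    for key, value in info.items():
        if isinstance(value, dict):
            masked[key] = mask_secrets(value, mask)
        elif any(marker in key.lower() for marker in SECRET_MARKERS):
            masked[key] = mask
        else:
            masked[key] = value
    return masked


driver_internal_info = {
    "agent_secret_token": "abc123",
    "agent_secret_token_pregenerated": True,
    "clean_step_index": 2,
}
safe_view = mask_secrets(driver_internal_info)
```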
Change-Id: I7096532ff7615f1390db092bb1659d5a7c909d10
Moved the initialization of inspection hooks into the inspect_utils
module for future work of having multiple inspect interfaces utilize
this code. The hooks/base module defines the abstract interface for
hooks and provides code documentation. The rest of the directory has
other hooks so this shouldn't blend internal servers for executing the
hooks.
Change-Id: I756067000867f3eaa3a005f88e571a3666bea784
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
When Ironic's CI went to Ubuntu Noble, IPv6 testing broke.
Ultimately, that was because the job was changed from BIOS
boot mode to UEFI.
Which in theory was a good thing.
But in reviewing the EDK2 known bugs[0], it is clear IPv6 testing
moving forward needs to be manual and cannot be done automatically.
For the reader's context, EDK2 is the source of the firmware used
for Libvirt UEFI VMs.
In attempting to work through the issues, we discovered:
- EDK2 doesn't care about stateless (auto discovery) mode, and
thus doesn't work when that is set. It attempts to solicit
an address, which is an open bug [1] from 2019, opened by
an Ironic contributor.
- EDK2 *always* attempts PXE v4 first, which takes 60 seconds to
timeout, before attempting PXE v6.
- PXE v6 *might* work before it times out, which takes 5 minutes to
occur, if the dhcp server has cleared leases, but there is no
guarantee. In most cases the result is the DHCP server views the
address as handed out and that there are no free addresses available
to supply. This is also rooted in bug [2], also opened in 2019.
- EDK2 then switches gears and tries HTTPBoot... similarly, but
the way it does so, noted in bug [3], is also incompatible
with dnsmasq.
There are additional bugs, but one sort of gets the idea. Some of this
is compounded by aspects like dnsmasq attempting to be strict about
responding to requests in the DHCPv6 model. A different DHCP server
*might* demonstrate a little differently, but fundamentally the same
underlying issues in EDK2 will make testing difficult.
In attempting to fix this issue, we also attempted to revert back to
BIOS mode. This mode uses iPXE ROM images built for QEMU, yet we
quickly discovered these pre-built ROM images lacked IPv6 support
in Ubuntu Noble. This is likely a regression of Ubuntu, but bug tracking
points directly to Upstream iPXE which is not valid as it is a compile
time option.
Testing the ROMs showed only DHCPv4 being attempted and IPv6 router
advertisements being entirely ignored. In a sense, if it did work,
it would still kind of be cheating as the iPXE ROM is able to skip
the first part of the complexity related to PXE in general.
In other words, it is not an entirely realistic test
when compared to Bare Metal.
As such, we don't have a forward path to "fix" this CI job as is.
We know the code works. We know vendor firmware sometimes has quirks like
needing stateless or stateful operation. We know Ironic does the
right thing... within its capabilities. We just can't test this in
CI.
[0]: https://github.com/tianocore/edk2/issues?q=is%3Aissue%20state%3Aopen%20%20ipv6
[1]: https://github.com/tianocore/edk2/issues/9832
[2]: https://github.com/tianocore/edk2/issues/9828
[3]: https://github.com/tianocore/edk2/issues/9689
Change-Id: Ifc25bc1e1abb949892a1297a313d63f74937c9a1
Somehow... the hold and wait steps were dropped or lost at some point
after the hold/wait step logic was developed. This fixes it and
adds them to a test which exercises the validation logic.
Also takes into account the unhold verb call from Dmitry's change
in https://review.opendev.org/c/openstack/ironic/+/913707 and
adds a test accordingly.
Change-Id: I8c23db46b4a5772d907f6c73ed5b975fdaaf80c8
Recently metal3-dev-env moved from bios to UEFI boot mode [1]
but the integration job on the metal3 CI still runs on
ubuntu jammy.
The location of the libvirt loader is different between
jammy and noble so we temporarily use this workaround until
we support UEFI on ubuntu noble in metal3-dev-env.
[1] https://github.com/metal3-io/metal3-dev-env/pull/1326
Change-Id: I89caf341ac7078182e836f0fdd9bf08376837b20
This is a continuation of the efforts to ensure that the documentation is free from typos and grammatical mistakes so that the reader is not confused. Includes fixes for some of the documentation in doc/source/admin/*
Change-Id: Idc29be9815fb53def3482ac0b290a237a0a1b3da
This migrates ironic-lib code and usages to code in ironic.common.
Relevant unit tests were migrated as well.
Also removes support for ironic-lib from CI and devstack.
Change-Id: Ic96a09735f04ff98c6fec23d782566da3061c409
I thought I deleted this in the past, but apparently it is still
around. Removing the xclarity/__init__.py file.
Change-Id: I4f74f68b19fc6340bc06310689fbccf08bf80883
This commit:
- Disables image format checks and safety checks for ISO disk image caching
After some discussion in the community it has been decided that instead of
changing the format detection and safety check logic, disabling the format and
safety checks has a smaller maintenance footprint.
Related-Bug: 2091611
Change-Id: Iff2be28c64a0469a3796003f3b8ed28d70631761
Signed-off-by: Adam Rozman <adam.rozman@est.tech>
The fix for CVE-2024-47211 results in image checksum being required in
all cases. However there is no requirement for checksums in
file:// based images.
This change checks for this situation. When checksum is missing for
file:// based image_source it is now calculated on-the-fly.
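Calculating a checksum on the fly for a file:// source could look
roughly like the sketch below. The function name, the sha256 default
and the chunk size are assumptions for illustration; the commit does
not state which algorithm is used.

```python
import hashlib
import os
import tempfile
from urllib.parse import urlparse


def checksum_for_file_url(image_source, algorithm="sha256",
                          chunk_size=1024 * 1024):
    """Hash the local file behind a file:// URL in fixed-size chunks."""
    parsed = urlparse(image_source)
    if parsed.scheme != "file":
        raise ValueError("not a file:// URL: %s" % image_source)
    digest = hashlib.new(algorithm)
    with open(parsed.path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration against a throwaway file with known content.
with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "image.raw")
    with open(image_path, "wb") as handle:
        handle.write(b"ironic")
    computed = checksum_for_file_url("file://" + image_path)
```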
Change-Id: Ib2fd5ddcbee9a9d1c7e32770ec3d9b6cb20a2e2a
The crypt module was removed in Python 3.13. Replace the module with
new methods from oslo_utils.secretutils.
Closes-Bug: #2083955
Change-Id: Ib574fc1b0f267e9b42f899f02a7ef84188ef9e85
Used pycodestyle, pyflakes, flake8-logging-format, and flake8-logging to
bring ruff to the ironic tree.
Change-Id: I4e355b0d2cf065f8844794b14474c34b65e7562b
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
During the PTG, we discussed one of the challenges was keeping up on
removal of jobs *and* openly giving ourselves permission to remove
jobs which are *not* required after a certain point in time.
The critical aspect to this is noting when/what we can clean up without
risk so we don't feel the need to keep things going forever when
there is not as much value to the overall health of the project.
Change-Id: I64f8f09c087d94376cbc32ef678a5da6595a805a
This is a continuation of the efforts to ensure that the documentation is free from typos and grammatical mistakes so that the reader is not confused. Includes fixes for some of the documentation in doc/source/admin/*
Change-Id: Ibbce369ff5fcccaf0f3aea90f2780a7a700698a1
We've got code here and no reason it shouldn't be included in our sanity
lints.
Change-Id: I0d7d668bc8e2d214799f7f876a795d4af7346105
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This function is used in this file but never imported.
Change-Id: Ic9e2de7505a11c4fa9267fbde2989db31e14dd34
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
... and a typo fix from a prior change.
This change clarifies that local link information is, by default,
collected through introspection, but also clarifies when that might
not work as expected, or when configuration may need to be updated,
and details the command to use to fix that information.
Change-Id: I4ab2d870858279892a0222b89e31cec80a72fa6b
Cleaned up lint warnings for the automated_steps sphinx doc plugin.
Removed unused imports. Fixed using .format() in a logging message.
Fixed using an ambiguous variable name. Fix line length.
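For readers unfamiliar with the logging lint mentioned above, this is a minimal sketch of the pattern. The function name ``report_step`` and the message text are made up for the example and do not come from the plugin.

```python
import logging

LOG = logging.getLogger(__name__)


def report_step(step_name):
    # Flagged by the lint: the string is formatted eagerly, even when the
    # message would be filtered out by the log level.
    # LOG.info("Documenting step {}".format(step_name))

    # Preferred: pass the arguments and let logging interpolate lazily.
    LOG.info("Documenting step %s", step_name)
```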
Change-Id: Ic171103f7e7d21a08e330552f2588bf69ada4837
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
This commit makes sure we call update_node_cache
after we finish a successful cleaning/servicing.
Change-Id: I62403120c758caac38a4d2b3912a9c43f65161cc
One caveat around dhcpv6 and network interfaces is that the binding logic
in the ``neutron`` network interface is able to generate multiple
dhcp addresses for use, whereas the flat interface is not able to.
Change-Id: If89c618ec951f75b9b09d7218b8500fd43a0d381
In Ubuntu Noble, OVN is at version 24.03 and Openvswitch at 3.3.0.
Both versions are new enough to be used instead of
recompiling from source.
Change-Id: I0d0a75944759e97d135341c18a3be9cb09202ddb
This is effectively a carbon copy of the code from Nova, Manila, Cinder et
al but modified to work with pecan instead of Routes. We do not use all of
the new code yet, but we will in a future change.
Related-Bug: 2086121
Change-Id: I76c1600036c82ead436cd0fb7e7dee1e34e21907
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The change https://github.com/metal3-io/metal3-dev-env/pull/1470
adds support for ubuntu 24.04 to metal3-dev-env.
We can now migrate the metal3 integration job to noble based node.
Change-Id: I7734c0bb2402a48f2247f5c8890c79c5e11a1e97
Update the docs so port creation/setting commands and their
flags are explicitly denoted instead of part of a weird setup/use
state the docs were in.
Change-Id: I32e010efaa34f90d3023a25e7900b048467e62fb
Apparently we had most of our networking context in the multitenancy
document, leading to confusion and also a lacking background context
which is needed for users to understand mechanics.
These changes have been broken into several different changes
because the file is going to be drastically different once we
are done clarifying the documentation.
Change-Id: I7fd76a00d9bb344453fc1b214f170113d73fe9bc
As alternative to docker registry image.
This is to avoid issues related to dockerhub rate limits.
Change-Id: Ia0b543aaedc03d4030e0335236b9b336ef0ce355
So the multitenancy docs are, in a weird sense, the primary
reference point for ironic networking documentation.
And in order for it to be digestible, we need to set some
appropriate context so the reader will understand what needs
to occur for use.
Change-Id: I0f8067d9b4db2bb057e60d723dff913afbc16027
...and apply it to our first controller, the controller for the
'/shards' API.
Rather than having a check inside the method or overriding
RestController._route, indicate the minimum (and potentially maximum) API
version supported by a decorator. This is similar to what Nova et al do,
but without the descriptor protocol-derived hijinks. Doing things this
way allows us to attach metadata to the controller which can later be
inspected by a schema generator.
Later changes can add more API versions.
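The decorator approach described above can be sketched roughly as follows. This is a simplified, framework-free illustration: the names ``api_version`` and ``VersionNotSupported`` are hypothetical, the requested version is passed explicitly rather than read from the request, and the version number is only an example value.

```python
import functools


class VersionNotSupported(Exception):
    """Raised when the requested API version is outside the allowed range."""


def api_version(min_version, max_version=None):
    """Record the supported version range on a handler and enforce it.

    Hypothetical decorator; versions are (major, minor) tuples.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, requested_version, *args, **kwargs):
            too_old = requested_version < min_version
            too_new = (max_version is not None
                       and requested_version > max_version)
            if too_old or too_new:
                raise VersionNotSupported(requested_version)
            return func(self, *args, **kwargs)

        # Metadata that a schema generator could inspect later.
        wrapper.min_version = min_version
        wrapper.max_version = max_version
        return wrapper
    return decorator


class ShardController:
    @api_version((1, 82))  # example value only
    def get_all(self):
        return {"shards": []}
```

Because the range is stored as plain attributes on the wrapped method, tooling can read it without invoking the handler.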
Related-Bug: 2086121
Change-Id: I9ccfe8240860d6300bbec5ae7d06f1dfc47f788c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
I have requested a new release from dnsmasq here:
https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2024q4/017828.html
but until they perform one, we should at least checkout and build
a version of dnsmasq with this fix, instead of downgrading to one that
is slightly less broken.
Related-Bug: 2026757
Change-Id: I8abac5fa729035341c90d7881cb35aff751da101
In order to troubleshoot some of the locking issues, we need to
understand if we're getting held up on the start of the lock or
on the far end of the lock upon completing the actual needful
work.
This change adds additional logging on the front end if it is
on the front end, and also sets the rpc object to None explicitly
before returning the update_node result, since it appears a stream
of them can cause issues for sqlite's release of the database from
one thread to another.
Change-Id: I8ef68ff62e8f93f7bc5bd9ee7c6ac8d96328f929
For oslo-config-generator, you have to list the external libraries that
provide configuration options so it knows to pull them in. Several of
the listed ironic_lib modules are no longer in that project causing
failures.
Change-Id: If270ac0701769a6ce8131816b1cb4921120bd7ab
Take into account that safety_check() now raises instead of returning
False. This also allows us to have reasonable log messages.
Account for the fact that the resulting format for raw conversion of
whole-disk images is "gpt". Add this value to the default permitted
formats.
Change-Id: I72fb4b94a2d3ce9dc8e66142e4e0fa2dd8c25845
The validator in oslo.utils fails on certain kernel images because it
considers them GPT images with an invalid boot partition. In the end,
these images are raw from our perspective, and should not be checked.
I'm marking the whole TFTPImageCache as not using validation.
Change-Id: Idac3a270e2a294a0a5df08ffe66817e05cb0bb76
In the runbooks change, I43555ef72cb882adcada2ed875fda40eed0dd034,
new policies were added for a user sending a list of service steps
or clean steps to the API.
This was done with the generic check_policy helper, however the helper
does not understand how to populate the ``node`` mapping data to enable
RBAC rule value matching. Doing so requires a special node policy
checker method.
As such, the policy checker was changed, and additional tests were added.
One final note: structurally, the new policies were being checked *after*
we started to do state verification of the request. RBAC checks should be
performed upfront... which also eases the burden of testing the RBAC
model. Accordingly, the policy checks were moved together
in the provision state logic.
Closes-Bug: 2086823
Change-Id: I18c56cb4becf9e6181689ddc0f1c7433327a3aa6
As noticed on If6e4432b000996789346a1f7449410cfc8497fe1
libpq is likely not needed in the jobs. As such, removing.
Change-Id: I16cdd1f84f8fe1bdb8fe08536ae2a7d7ef6a70a9
* In tear_down_agent: lock down the agent and keep the node powered on.
* In boot_instance: reboot the node instead of powering it on.
* In tear_down_inband_cleaning/service: avoid powering off the node.
* In tear_down_service: lock down the agent before tearing down.
* In tear_down and clean_up: do not power off, reboot after clean-up.
* In rescue/unrescue: use one reboot instead of power off/on.
Not locking down in tear_down_inband_cleaning because the node will not
end up on a tenant network (and lockdown may disrupt fast-track).
Depends-On: https://review.opendev.org/c/openstack/ironic-python-agent/+/934234
Partial-Bug: #2077432
Change-Id: Ib2297ec5b69df6b2ecd11942fb8fdc9d640de6bc
Disallow using disable_power_off with the neutron network interface by
default and add an option to allow it.
Change-Id: Ie5e3f82422514a13610fd15025b52c35f55b2eac
Split away the soft power off code to make the function simpler for
future additions related to disable_power_off.
Also fix incorrect interpolation in the adjacent log message.
Change-Id: Ifd091b10dd42f68ad8951a5fce79aee691fc77f8
Changes the logic when starting and finishing inspection to avoid using
the power off call (reboot is used instead).
Change-Id: I03134b30c819a62f7a289fbcc62dda49540e9d9f
Ironic has maintained a CI job for years after postgresql support was
deprecated in order to prevent unintentional breakage of that support.
Now, we have confirmed evidence that other openstack components, such as
keystone, required for testing this postgresql support no longer
function in this job.
As a result, ironic can no longer test postgresql support. Operators
utilizing postgresql who have not yet migrated must migrate now.
Change-Id: If6e4432b000996789346a1f7449410cfc8497fe1
So, there are cases where say you may have multiple DPUs in a
physical server, each card when fully operating can consume 100-150
watts. In some cases, these cards can have external power supplies,
but need the physical host in a running power-on state in order for
the device to be powered on.
Conversely, we also now power off the child nodes if the parent
node has been requested to be powered off, since we *really* don't
want to cause inadvertent harm to the child node.
This is realistically a fix we should backport once we sort
through the details, if we agree this makes sense to do, as is.
Change-Id: Ib2bfe04cdaa82264ba8bb1e71477899bb6268179
oslo.policy 4.5.0[1] changed the config option policy_file
default value to 'policy.yaml', which means it is changed
for all the OpenStack services and they do not need to
override the default anymore.
NOTE: There is no change in behaviour here, oslo.policy provides
the same configuration that services have overridden till now.
[1] https://review.opendev.org/c/openstack/releases/+/934012
[2] https://review.opendev.org/c/openstack/requirements/+/934295
Change-Id: I98be6739dcdc3203effb2c21f13c6f71332f1813
Removed default config options while bumping the versions of some
pre-commit hooks. Moved the configuration of doc8 to pyproject.toml to
hopefully consolidate everything in one place. Enable codespell hook to
correct the spelling for users.
Change-Id: I76933b52ed8009f5e97c382b82dd786adf3a5444
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Deduplicate the way to flag that a reboot is requested after a BIOS
reset or BIOS settings change. This makes it easier to have a periodic
timer poll the status in the future.
Change-Id: I44a4008b75139494aa36d8b01e8c9c86bbcdf494
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The EFI handover protocol has been deprecated for a while
and recently moved to be optional and enabled by default [1].
As a consequence, the linuxefi and initrdefi binaries that
were specifically compiled to use that option, are
also deprecated and they have been removed in most of
the recent linux distributions in favor of the generic
linux and initrd that are now compatible with UEFI boot.
This patch changes linuxefi to linux and initrdefi to
initrd in all the grub templates, using the generic
entries for all the platform architectures.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc3fdda2876e58a7e83e558ab51853cf106afb6a
Closes-Bug: #2081305
Change-Id: Ie5b2265d7afc8b71fabfca6ca6687e0e34ce3b5b
The unittests want qemu-img available now so add that as a dependency
that users need to install before running the tests.
Change-Id: I2169988c653088115c7b388113b5e76e721e2429
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
The following error occurs when using SQLite:
oslo_config.cfg.NoSuchOptError: no such option hostname in group [DEFAULT]
`hostname` is not defined in [DEFAULT] group, use `host` instead.
Closes-Bug: #2086682
Change-Id: Ic93b5d7c6ea27a3c47daa6b2c6671aaa401b5427
The option's help says that the option is ignored when fast track is on,
but in reality it only happens with the legacy inspection.
The clean-up logic diverges between the two inspection implementations,
so change the new implementation to reuse the old clean-up code.
Add a few missing unit tests.
Change-Id: I2e84aa285b5673bcc911d35439ba80a739460f59
pip 23.1 removed the "setup.py install" fallback for projects
that do not have pyproject.toml and now uses a pyproject.toml
which is vendored in pip.
To address that, this change adds the minimal pyproject.toml
to enable pbr to be properly used to build editable wheels.
This is required to support installing devstack on
centos stream 9 and related distros with GLOBAL_VENV=True
Without this change the wsgi scripts are not generated in
editable mode. i.e. pip install -e /opt/stack/keystone
See https://pip.pypa.io/en/stable/news/#v23-1
and https://github.com/pypa/pip/issues/8368 for more
details on the removal of the fallback support.
setuptools v64.0.0 is used to support editable installs
via its PEP-660 implementation
https://github.com/pypa/setuptools/pull/3488
This patch was taken nearly verbatim from the equivalent nova change.
Co-Authored-By: Sean Mooney <work@seanmooney.info>
Change-Id: I34888e8f87b4a3ab09546ba58ef5f2cf495bc7e3
By default, the decision whether to clean is a "system" decision,
and not necessarily a "user" or "operator" decision. However
some operators may choose to have custom policies to enable
specific tenants to have additional rights without granting
special system scoped users.
This change just changes the labeling on the default rule to
permit it to match a project scoped user while leaving the
default rule in place. This slightly changes the resulting
error, but doesn't change the error code, and enables operators
to run with custom rules for this entry.
Change-Id: Ie963abcbff079664b8407499c3e943ad3fd8f315
Migrate all existing linters to pre-commit. This consolidates our bandit
and codespell job into the general pep8 job.
Change-Id: I6b40a3338d98fab500e22918b6bd5b8bff2106fd
The doc8 linter found several syntax problems in our docs; primarily a
large number of places we used single-backticks to surround something
when we should've used double-backticks.
This is frontrunning a change that will add these checks to CI.
Change-Id: Ib23b5728c072f2008cb3b19e9fb7192ee5d82413
Trailing whitespace is soon to be caught by the global pre-commit
linter changes. This fixes this issue in anticipation of that lint.
Change-Id: I48597afde4c55775ccca56f927c30ca4f3465523
There were several duplicated entries. Where they appeared
actually-different, I changed the name. Where they appeared identical, I
deleted the duplicate.
These will be linted in the future once we switch to pre-commit.
Change-Id: I880dc6b6de593c12e5ac026edfbe95258e87bcde
As part of the migration to pre-commit for CI, we will begin linting
JSON files in the repo. These were all invalid JSON and are being
updated in anticipation of that update.
Change-Id: Ib6c7581fb20211d2b7134f506286c73e5c2cd6bb
This adds a wsgi entrypoint module which can be used with a wsgi runner,
such as uwsgi, to launch Ironic API processes without the need of a
separate script.
The legacy WSGI script is currently being installed by PBR, and as part
of the migration to a pyproject.toml-compatible PBR, we cannot use the
wsgi-scripts plugin anymore, and will be removing the script installed
by it in a future Ironic release.
The new WSGI script, because it has statements at the module top-level,
cannot be autodocumented; we now exclude it.
Also we don't treat all warnings as errors in pdf docs builds to allow
the use of mock autosummary, starting with including the wsgi module.
Co-Authored-By: Doug Goldstein <cardoe@cardoe.com>
Change-Id: I584ac6a25c4e6cd9744a609b50d12b434a930dc6
An interesting, and frustrating aspect of 4k block devices is that the math begins
to be impacted across the whole of the usage of the device.
Specifically the LVM block spacing also begins to be thrown
"out of alignment" which changes user calculations.
For most users doing smaller allocations this likely won't matter, but things
also break for users doing thin volumes or filling a percentage of the
remaining usable volume.
So realistically, the best path to ensure we have appropriate 4k device testing,
and our dependent tooling in diskimage-builder is also getting tested, is to run
the more complex case in our CI job.
This change is dependent upon two other changes which are under review.
Change-Id: I5b23403c783fa84b4158708741524c3dc9a92722
Python 3.8 was removed from the tested runtimes for 2024.2[1] and has
not been tested since then.
Also add Python 3.12 which is part of the tested runtimes for 2025.1.
The unit tests job with Python 3.12 is now voting.
[1] https://governance.openstack.org/tc/reference/runtimes/2024.2.html
Change-Id: I706c8b22fbf29e057942990a1004a42763594746
Added some documentation that details how to change the ironic localdev microversion for testing purposes.
Rendered View: https://files.mcaq.me/944ch.png
Change-Id: I1e21a12ad1413046a41f856ddf229e399f82523a
grenade by default enables GLOBAL_VENV, which means it
installs and runs everything from a virtual env
- https://review.opendev.org/c/openstack/grenade/+/930507
We faced an error in the ironic grenade scripts in the virtual env,
so GLOBAL_VENV was disabled explicitly. This fixes the scripts
and enables GLOBAL_VENV in ironic jobs as well.
Change-Id: I48ee1dd4adc2e5bcc18c5f116d979e7524248495
No jobs are setting this, nor have any set it in some time. Remove it.
Change-Id: I38a092de125e382607d89d8e5a3b85db809a6d61
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Nothing is setting this anymore, making this a layer of indirection
we do not need. Remove it.
Change-Id: Iba3674536ee98ba4d2d0cb5ffb0ec52e5286b7e7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
CentOS Stream and ultimately RHEL have switched to asynchronous
device initialization, which impacts root device hints and their
usability on those systems, in large part because context which
people have traditionally had, no longer holds true on those newer
kernels.
This doc update attempts to provide the needful context to guide
operators to the best possible outcome given the distribution changes.
Change-Id: I541086cfe235b10f1f1dba95fad95022a22f9ce7
While working another issue, we discovered that support added to
the ironic-conductor process, combining the image_download_source
option of "local" with the "force_raw" option, resulted in a case
where Ironic had no concept to checksum the files *before* the
conductor process triggered an image format conversion and
then recorded new checksum values.
In essence, this opened the user requested image file to be
susceptible to a theoretical man-in-the-middle attack OR
the remote server replacing the content with an unknown file,
such as a new major version.
This is at odds with Ironic's security model where we do want to
ensure the end user of ironic is asserting a known checksum for
the image artifact they are deploying, so they are aware of the
present state. Due to the risk, we chose to raise this as a CVE,
as infrastructure operators should likely apply this patch.
As a note, if you're *not* forcing all images to be raw format
through the conductor, then this issue is likely not a major
issue for you, but you should still apply the patch.
This is being tracked as CVE-2024-47211.
Closes-Bug: 2076289
Change-Id: Id6185b317aa6e4f4363ee49f77e688701995323a
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
When we were fixing the qemu-img related CVE, in our rush we didn't
realize that the logic for storage sizing, which only falls back to
actual size didn't match the prior interface exactly. Instead of
disk_size, we have actual_size on the format inspector.
This was not discovered because all of the code handling that side
of the unit tests was mocked.
Anyhow, easy fix.
Closes-Bug: 2083520
Change-Id: Ic4390d578f564f245d7fb4013f2ba5531aee9ea9
Add a CI job to leverage a 4k logical block disk image which is
deployed to the remote system to ensure the build pipeline and
code to navigate 4k disk images is in working order.
Change-Id: If7aee654f9282b33ea489558f45f45cfed86e9d1
Checks are added to three places:
1) Power state change API
2) Power sync loop in the conductor
3) The common node_power_action call
Partial-Bug: #2077432
Change-Id: Ifcc539b32022870bf8e96aa17fdeb2d111d2a393
According to the driver-requirements.txt, ironic requires pysnmp >= 5
now, so this logic is just useless.
Change-Id: Iea843689ebf04fa0539c0ff2c783c18131646dff
Recently we became aware that some operators might need a larger
block size, but our CI testing doesn't represent any ability to
assert a different block size.
We can now assert a block size override in the scripting which
allows us to create a CI job.
Change-Id: I8470fb5b2827226dc155938a94c3a2cbe98912b5
This patch adds some initial documentation
for the update step available via
the redfish firmware interface.
Change-Id: I4a70e2e78d725fd96a2ddd116c6d6e0d9c3b9639
Both the driver and the conductor code try to transition the node to
INSPECTFAIL, with the 2nd attempt failing. Rework the driver code to
only do implementation-specific clean-up. Also safeguard the conductor
code against this case.
Change-Id: Ie1c64b4807ecf29fa0da54501798d363675977c8
Its existence is probably a legacy of the iSCSI deploy times. Currently,
we have 4 different base classes/mixins in agent_base, which is
confusing even for a long-term contributor like me. AgentDeployMixin is
only used in CustomAgentDeploy, so it makes sense to get rid of it to
simplify the code navigation.
All deploy steps are moved to CustomAgentDeploy. The two helper methods,
prepare_instance_to_boot and configure_local_boot are only used in
AgentDeploy, so moving them there.
Change-Id: Ib670571eb511d2f2e724ecfab1d2abb1ab471346
Add file to the reno documentation build to show release notes for
stable/2024.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.2.
Sem-Ver: feature
Change-Id: I96466f75eab2af275b1c903f1b5f3151f539e34e
The feature is no longer considered experimental. Remove a few warnings
and wordings to reflect the current state.
Change-Id: I1a3dcb54e3a948aa9fc6f59d67f72de51df20e84
This environment was used by SQLAlchemy 1.4 and is no longer necessary
since SQLAlchemy was bumped to 2.0.
Change-Id: I0e01f61529b633251f99d5a1a3e00ffca6c8837f
This is a continuation of the efforts to ensure that the documentation is free from typos and grammatical mistakes so that the reader is not confused. Includes fixes for some of the documentation in doc/source/admin/*
Change-Id: I9ff40f1982ffad86a41e44395b6bee3a8dbfe43a
Adds microversion headers to the root endpoint so the '/' and '/v1'
endpoints consistently include microversion headers.
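A small sketch of the idea, assuming the standard Ironic version header names. The helper ``add_version_headers`` and the version values used with it are hypothetical and only illustrate applying the same headers to both the '/' and '/v1' responses.

```python
MIN_VERSION_HEADER = "X-OpenStack-Ironic-API-Minimum-Version"
MAX_VERSION_HEADER = "X-OpenStack-Ironic-API-Maximum-Version"


def add_version_headers(headers, min_version, max_version):
    """Return a copy of ``headers`` with the supported version range added.

    Hypothetical helper for illustration; not the code from this change.
    """
    updated = dict(headers)
    updated[MIN_VERSION_HEADER] = min_version
    updated[MAX_VERSION_HEADER] = max_version
    return updated
```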
Closes-Bug: #2079023
Change-Id: Iea78b33e04e256c1139dd46a25f6d6a2be8e1ccc
The qemu-img command is required not only in the Red Hat family but also
in other families such as Ubuntu, Debian or OpenSUSE.
Ensure the command is installed by bindep.
Change-Id: I94960fc644e2b8524d14633960a88a71437f0618
ZeroMQ support by oslo.messaging was removed during Stein cycle so
the description is no longer useful.
Change-Id: I7f3fddc49d97195fc18fd2df41a9c505745e43db
It relies on risky stuff like nested read transactions, which are known
to be problematic on SQLite.
Change-Id: I61a885c0cb7555919279b3e21e872752dcffc64b
It's useful, but nobody is looking at it anyway. While I'm going to try
fixing it, running it every time is a waste of resources.
Change-Id: I794e1975f27f4a643b56dc81db8358700a71b8bd
RBAC config options enforce_scope and enforce_new_defaults
were disabled by default in oslo.policy and Ironic had to override
the default value to enable those by default. Now oslo.policy
(4.4.0 onwards[1]) changed the default values[2][3] and enabled them
by default for all the services. OpenStack services do not need
to override the default anymore.
NOTE: There is no change in behaviour here, oslo.policy provides the
same configuration that Ironic has overridden till now.
[1] https://review.opendev.org/c/openstack/releases/+/925032
[2] https://review.opendev.org/c/openstack/oslo.policy/+/924283
[3] https://review.opendev.org/c/openstack/requirements/+/925464
Change-Id: I280ae374048b16f1d27a55b09a4d7729de43f469
The default has changed for oslo.policy, no need for us to do
explicit enabled testing overall. As such removing.
Change-Id: I2d91a0c219bd3a2d59cad2775cde5aab46130921
It was recently learned by the OpenStack community that running qemu-img
on untrusted images without a format pre-specified can present a
security risk. Furthermore, some of these specific image formats have
inherently unsafe features. This is rooted in how qemu-img operates
where all image drivers are loaded and attempt to evaluate the input data.
This can result in several different vectors which this patch works to
close.
This change imports the qemu-img handling code from Ironic-Lib into
Ironic, and image format inspection code, which has been developed by
the wider community to validate general safety of images before converting
them for use in a deployment.
This patch contains functional changes related to the hardening of these
calls including how images are handled, and updates documentation to
provide context and guidance to operators.
Closes-Bug: 2071740
Change-Id: I7fac5c64f89aec39e9755f0930ee47ff8f7aed47
Signed-off-by: Julia Kreger <juliaashleykreger@gmail.com>
Make the node fast-trackable as soon as
inspection finishes. In addition, add a wait for the
agent to call back should it not be available when
fast track is attempted.
Closes-Bug: #2078820
Change-Id: I8a95fc08cf355b7b745a565e3a05c9dc0875a63e
Ironic already has support for automatically setting a lessee on
deployment, but it is only supported for direct deployments with Ironic,
as it uses request context which is not preserved in the Nova driver.
Now, when combined with the related Nova change, Ironic can support this
behavior for fully integrated installations. On deploy time, Nova will
set several fields -- including project_id -- in instance info. If
enabled, Ironic will then use that project_id as the automatic lessee.
The previous behavior of using the project_id from the request context
is still supported as a fallback.
This is being tracked in nova as blueprint ironic-guest-metadata.
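The lookup order described above can be sketched as follows. The helper name ``pick_automatic_lessee`` is hypothetical, and the real change may structure this differently.

```python
def pick_automatic_lessee(instance_info, context_project_id):
    """Prefer project_id from instance_info, else fall back to the context.

    Hypothetical helper for illustration. Returns None when neither source
    provides a project.
    """
    return instance_info.get("project_id") or context_project_id
```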
Closes-Bug: #2063352
Change-Id: Id381a3d201c2f1b137279decc0e32096d4d95012
Many testing scenarios, including testing "full size" DIB ramdisks
instead of using tinyipa, require adjustment of our prescribed values
for Ironic VM size. Document this in the devstack guide.
Change-Id: I58823fa19d65c12ea2f9229394080f83d1d397f4
With the removal of the wsman interfaces in the idrac driver and only
redfish being supported, the idrac driver should inherit from the
redfish driver to ensure that it properly supports all the redfish
supported interfaces. Furthermore with several of the interfaces being
no-op passthru to the redfish implementation there is no reason to not
let the user select those interfaces as well. With an eye towards not
having to support these in the future, direct users to use the stock
redfish versions in the docs as well.
Change-Id: I79ab44f31660e6d5311db46223e8bd60d2b3f213
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
It only passes because the boot interface handling is broken in
ironic-tempest-plugin. Once something like
https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/925981
merges, it will start failing with a timeout.
Temporarily remove it from the list to allow progress on other changes.
Change-Id: I155f520be9b5815f390364c4de12528920b7594a
Lots of references to deprecated ways of doing things, as well as two
entire separate sections dedicated to how disk erasure works.
Also ensured we reference new valid config options surrounding disk
erasure.
Additional improvements could include adding documentation around how to
skip disks per node (or linking to any preexisting docs around it).
Change-Id: Ifa029e26eff0637b443d094d85e773b885d0979b
This patch updates network_data to include dns nameservers. This
is especially important when booting virtual media in a dhcp-less
environment.
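As an illustration, a network_data document with DNS nameservers might be built like this. The sketch assumes nameservers are expressed as entries in a top-level ``services`` list, as in the OpenStack network_data.json format. The helper ``add_dns_services`` and all addresses are made up for the example.

```python
import copy


def add_dns_services(network_data, nameservers):
    """Return a copy of ``network_data`` with one DNS service per nameserver.

    Hypothetical helper for illustration; not the code from this patch.
    """
    updated = copy.deepcopy(network_data)
    services = updated.setdefault("services", [])
    for address in nameservers:
        entry = {"type": "dns", "address": address}
        # Avoid listing the same nameserver twice.
        if entry not in services:
            services.append(entry)
    return updated


network_data = {
    "links": [{"id": "port0", "type": "phy",
               "ethernet_mac_address": "52:54:00:12:34:56"}],
    "networks": [{"id": "network0", "type": "ipv4", "link": "port0",
                  "ip_address": "192.0.2.10", "netmask": "255.255.255.0"}],
}
with_dns = add_dns_services(network_data, ["192.0.2.53"])
```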
Change-Id: Icf0d9b5781edd193b2313441e8586b745574edbb
Since we're a plugin, the TARGET_BRANCH instructions in the normal
devstack guide are not enough. We should specifically instruct people to
avoid this pitfall.
Change-Id: I7c9fd98c582984036e0b19714b8f387a31e8715f
Currently, if the inspection network is not provided, neutron-based
network interfaces fail with something like:
Driver redfish does not support inspection (disabled or not implemented)
This is utterly misleading. Use a hand-crafted error message instead.
Same for the PXE boot interface. Also add missing documentation.
Change-Id: I79086db1c270e02a6c74b870acc336e8da54dea3
The documentation contains a significant number of grammar mistakes.
This could cause confusion in certain scenarios and hinder correct understanding of the
context. Starting to go through the documentation and pushing this commit
as a start.
Change-Id: If2c18909a83ba501b5ffae494934fb631b009e54
Addresses the inline TODO within the Ironic codebase,
to set the ``kernel_append_params`` to the same value as
in the [pxe] configuration after the Xena release.
Current Release: Dalmatian
Change-Id: I1ce3ab560ab04979b7f31393a9877c4d1314925c
Update api-ref, documentation to reflect the new
endpoints and the new way to set node provision state.
Related-Bug: #2027690
Change-Id: I2106691c08eb04d1001ccf97e6e08fc811356874
Removes reference to the deprecated and removed config
option, ``[pxe]ipxe_enabled`` mentioned as a valid
configuration option in error message.
Change-Id: I7747a52f74513645b0dce48781e6ad5dd08fd1e2
Implement cross-referencing to configuration options
throughout the Ironic documentation.
Closes-Bug: #2076111
Change-Id: I28712a3a92eb7e7d9875e49ea3ed8800168262fe
ironic-tempest-partition-uefi-redfish-vmedia was renamed to
ironic-tempest-uefi-redfish-vmedia a long time ago
Change-Id: Iaa63e9cf12d47667955973033586fa65dd18e6b7
Ironic docs improvements. Addressing one of the issues from
the Ironic documentation audit. Using gerunds in titles and
including *Ironic* in the title to improve SEO.
Closes-Bug: #2072351
Related-Bug: #2072349
Change-Id: I9f9c47654386df416b51e8a0cd48f5a89f55e799
This change makes it possible to test the new "agent" implementation.
The PXE environment is not migrated so far, so managed inspection is
assumed by default.
Change-Id: I60a11454aefc01333e3f788e2b09ec6e47423223
Adds runbooks; the new API feature that makes it possible for
project members to self-serve maintenance tasks through curated step
lists associated with target nodes via traits.
In addition to basic CRUD support, runbook extends current API flow for
performing manual cleaning and servicing to support runbooks in lieu of
an explicit/arbitrary ``clean_steps`` and ``service_steps`` user-defined
lists.
Demo Video: https://youtu.be/00PJS4SXFYQ
Closes-Bug: #2027690
Change-Id: I43555ef72cb882adcada2ed875fda40eed0dd034
Right now, when restacking to get new code checked out, we fail due to
the dnsmasq directory already existing. Now, skip the downgrade if we
detect the correct version -- as we would on a second run.
Change-Id: I5c3d28f75b66d14540cbafa03bff8b7def688da5
To utilize the idrac-redfish interfaces, you need the sushy-oem-idrac
package to be installed alongside sushy itself.
Change-Id: I3376cd0b40fce49345121ad84d35749241e9dbe8
Allow operators to provide a list of disabled boot modes for
new deployments ``disallowed_deployment_boot_modes`` and/or
enrollments ``disallowed_enrollment_boot_modes``.
Defaults are an empty list, [], indicating all modes are
allowed.
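The semantics of the empty default can be sketched as below. The helper name ``is_boot_mode_allowed`` is hypothetical; only the option names above come from this change.

```python
def is_boot_mode_allowed(boot_mode, disallowed_boot_modes):
    """Return True unless ``boot_mode`` is explicitly disallowed.

    An empty list (the default) therefore allows every boot mode.
    Hypothetical helper for illustration.
    """
    return boot_mode not in disallowed_boot_modes
```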
Closes-Bug: #2068530
Change-Id: I1404c81718cd6bb2977e6f298d9b7d11664226d0
Currently, Ironic doesn't add the Targets parameter to the SimpleUpdate call
when updating firmware. This patch makes Ironic aware of multi-System
BMCs and sends the Targets parameter if this condition is detected. This is a
prerequisite for using sushy-tools simulated firmware upgrades for
testing.
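A rough sketch of the payload logic, assuming that "multi-System" means the BMC exposes more than one System resource. The helper name and the URIs used with it are hypothetical, while ``ImageURI`` and ``Targets`` are the Redfish SimpleUpdate parameter names.

```python
def build_simple_update_payload(image_uri, system_uris, this_system_uri):
    """Build a SimpleUpdate request body, adding Targets only when needed.

    Hypothetical helper for illustration; the detection logic in the actual
    patch may differ.
    """
    payload = {"ImageURI": image_uri}
    # Only narrow the update when the BMC manages more than one System.
    if len(system_uris) > 1:
        payload["Targets"] = [this_system_uri]
    return payload
```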
Change-Id: I5fd0228200afc28b24d90595244d3961b05acc52
It turns out the `controllers` and `storage_controllers` fields
will always simply exist in sushy.
So, a change of approach.
Change-Id: Ia67531178c33bbb7fc79a6385a043f6fd682116c
In trying to chase down why the raw tftp boot of grub is not
happy, I determined that the tftp folder being created had the
wrong permissions out of the box. Ironic has an optional knob for
this, so we're going to set it by default.
Change-Id: If2a0e5e47163a3525ecd245e8b54cacea9a615de
Change I45ee1c8a73ed13511bc47a69130105f16d34be1e inadvertently broke
the anaconda deploy interface because it sends an empty callback url.
Seems valid enough in that case, it is now handled.
Change-Id: Ife6fa3469ee6eb0663b4af63197deab96ed6aa1e
While troubleshooting grub network boot issues,
I did some reading and found out our model of config
was technically wrong to use a menuentry to load config
in another menuentry which may or may not be loaded.
I mean it worked, but it is simpler to just say
"go source this content into our state".
Change-Id: I5e2ec2dc5110fa0a4f9e11478502a199354454f5
Long story short, in some circles the EFI grub network boot
over http with VMs is regarded as unstable. What appears to
be happening, with service logs at least, is we get a HEAD
request (grub code always issues a HEAD request first to minimize
memory usage), and then re-requests the file contents.
So what we end up seeing on the grub side is:
error: Fail to receive a response! status=104
error: Fail to send a request! status=0x800000000000000f.
What appears to happen is things get gummed up in firmware, and
generally that means we can't run this test in CI.
Change-Id: I1471c9429b742abb250b9a3a910108f1711ad574
We added this option, and advertised its default would change several
years ago. This completes the migration.
Change-Id: I64f80fa2f971a223156cc5bf4231b59da0189885
This configuration directive is completely untested, undocumented,
and requires deployers to manually configure significant
infrastructure for it. It also bypasses several deploy-time sanity
checks around whether or not we expect the server to boot.
Deprecating it for removal in 2025.2 release.
Related-bug: #2071741
Change-Id: Id73d9097e9e4152c7b635a4269b548c9dbdda0a6
This is a quick and easy way to get Ironic up for testing; updated to be
even quicker and easier if you don't need multiprocess or mysql.
Co-Authored-By: CID <cid@gr-oss.io>
Change-Id: Ibef8a24868fd1f507e69e6d615d6327031d11495
Issues an error on removed items still used in the configuration.
Issues a warning on deprecated items or nodes that use removed drivers
or interfaces.
Change-Id: Iebb4cd611f7111cde20acf9ba3d4c9127925b6cf
Closes-Bug: #2051954
Replaces deprecated field ``storage.storage_controllers`` with the
new ``storage.controllers`` in the Redfish driver.
Closes-Bug: #2070485
Change-Id: Ibe66c73c8d2e402fabaa7a3a2fbc2f3c44e47dbd
This commit makes changes necessary for redfish.firmware.update to work
as both clean_step and service_step. This is done by adding a service
step decorator, adding conditional code to pass the execution to the
appropriate subsequent functions and modifying the periodics used to
handle async tasks.
Change-Id: I20a40127f66f734005a03365b806310a155dc237
This change adds missing RPC calls and handlers to bring service steps
to parity with deploy and clean steps, allowing service steps to run
asynchronously.
Change-Id: I82f95236797f24b84798be92b53deb7ec4f46dce
Adds a pretty straightforward Sphinx plugin that reads the JSON profile
file and renders it nicely in a document that is then included from
the Redfish page.
Change-Id: Ic2da61cb510897eac8a2e162816cfd05cc22994c
* Remove @Redfish.Copyright, it's not an allowed field.
* Remove unused fields such as AssetTag.
* Mark fields used in inspection and other optional features as
optional.
* Add missing VirtualMedia on System and VirtualMedia fields.
* Add missing Thermal information and its subresources.
* Clarify which fields on System are links.
* Add Purpose field to all fields and actions.
* Expand the BIOS object with the actual requirements.
* Add ServiceRoot and services we use.
* Add missing resources related to RAID.
Change-Id: I35f8f49e4c70e736b685c3eeebf79326592b6314
This is largely inspired by the excellent feedback we got from David
Welsch, although this patch is only a very early first step towards
where we want to be with the documentation.
First, I'm splitting the large administrator guide into several large
sections: features, operation, architecture. Some of their topics might
actually find a better home outside of the administrator guide, but I
don't go that far in this change.
Second, I'm grouping several separate things together with the larger
topics:
- API topics are relevant for users and are grouped with the user guide
- Configuration guide and release notes are grouped with the
administrator guide.
- The command reference is renamed for clarity and also grouped with the
administrator guide since these are not user-visible commands.
- I'm dropping the "Advanced topics" subsection. While I like its
intention (and I think it was me who added it in the first place),
it's clear that such separation makes these topics much less
discoverable.
Third, I'm playing with :maxdepth: here to make the sub-pages more
informative.
Change-Id: Icd0a35b252136b7da107c6346c48473cf1b99bcb
The goal here is to give newcomers an easier overview of the contributor
guide. Currently, the index page only points at a couple of sections in
the contributor index, which may be confusing. So:
1) Expand the contributor reference from the index page one more level.
2) Update headings in the contributor guide to match the toctrees and
their expected level.
3) Expand toctrees in the contributor index one more level.
4) Move references to the development environment to a higher level
toctree to make them visible in the index.
5) Apply consistent upper case heading.
Change-Id: Ifb9fdc96b368095437771217090120e83eaa0fa7
Instead of only file-based persistence which leaves files
with credentials on the conductor disk for the duration of
the session.
User can now pass ``True`` to the ``store_cred_in_env`` parameter
which instead stores IPMI password as an environment variable, still
for the duration of the session, but limiting exposure to just the
user session of ironic and anyone that has access to it.
Defaults to ``False``.
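One way the two modes could differ when building an ipmitool call is
sketched below. The -E flag (read the password from IPMI_PASSWORD) and
-f flag (read it from a file) are ipmitool options; the function name,
its arguments, and the placeholder file path are hypothetical, and
nothing is executed.

```python
import os


def build_ipmitool_invocation(address, username, password,
                              store_cred_in_env=False):
    """Return (argv, env) for an ipmitool call without running it."""
    argv = ["ipmitool", "-I", "lanplus", "-H", address, "-U", username]
    env = dict(os.environ)
    if store_cred_in_env:
        # The password lives only in the child process environment.
        env["IPMI_PASSWORD"] = password
        argv.append("-E")
    else:
        # File-based persistence; the path here is only a placeholder.
        argv += ["-f", "/path/to/password-file"]
    argv += ["power", "status"]
    return argv, env


argv, env = build_ipmitool_invocation(
    "192.0.2.20", "admin", "secret", store_cred_in_env=True)
```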
Closes-Bug: #2058749
Change-Id: Icd91e969e5c58bf42fc50958c3cd1acabd36ccdf
The syntax we're using there is not valid, change to a definition list,
add double ticks and change a mention of an option to a link.
Change-Id: Idf37436d034fe8bb65bff92eddadfd82d7431df0
Redfish hardware usually will support better methods -- e.g.
redfish-https or redfish-virtual-media, however we've had some user
requests for the http boot interfaces on Redfish.
Since we can generally expect the generic boot interfaces to work on
redfish, this enables all of them.
Related-Bug: #2032380
Change-Id: I9c36072f6165baaa985862113b283f34bed7bee4
This commit introduces support for provisioning ARM (aarch64)
fake-bare-metal VMs in Ironic for the purpose of eventually supporting
CI testing on ARM64 architecture-based hardware.
Change-Id: Ie4bff8892228275ad0fb940c30e8071f7f4c423f
There has been no testing of this hardware type in quite some time,
and the last we heard the vendor was moving towards redfish.
Change-Id: Ib32db463981ec54430884ac760956b7c7b40b17f
Implement execute_service_step() in AgentBaseMixin that will
asynchronously execute service step on the agent. Without it, Ironic
will try to find <step_name> attribute on the object that implements
interface specified by the servicing step.
Example:
Step: [{"interface": "deploy", "step": "burnin_cpu"}]
Error: AttributeError: 'AgentDeploy' object has no attribute 'burnin_cpu'
Closes-Bug: #2069430
Change-Id: Idb1d5b50656c3765ea5c9e21b7844946ae4cfc67
Signed-off-by: Przemyslaw Szczerbik <przemyslaw.szczerbik@intel.com>
When [pxe]enable_netboot_fallback option is enabled, it's necessary to
build PXE config for nodes in SERVICING provisioning state. Otherwise
node servicing tear down will fail and node will be placed into
servicing failed state.
Closes-Bug: #2069413
Change-Id: Ib00504563f9fa7bed99a0fa1949ac99ea6870875
Signed-off-by: Przemyslaw Szczerbik <przemyslaw.szczerbik@intel.com>
Provides a complete documentation for metrics that the Redfish
management interface can collect.
The Power payload refers to InputRanges in a broken way: this field is a
list, but the code treats it as a singular resource. No hardware I have
access to provides it this way. Since input ranges are constants and
thus arguably don't qualify as runtime metrics, removing them instead of
fixing.
Change-Id: Ida1be1341346df917073e649a23a2f116b262e66
This page is huge and keeps growing. So:
* Move additional topics to sub-documents.
* Move ESP creation to the install guide (it's not even
Redfish-specific).
* Create a generic firmware updates document.
Provide a feature listing at the top for easier navigation.
Change-Id: Ic58c139da5e1e60f5ce4d2cec18972ebee9e2485
Currently, Ironic creates a pxe link file for every port,
even when a port's pxe_enabled property is set to false,
which means it can still boot from this port when it shouldn't.
With this commit, unless explicitly configured otherwise, only
pxe_enabled ports (pxe_enabled=True) will have the pxe link file.
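A minimal sketch of the selection rule, using plain dictionaries in
place of port objects. The function name and the ``link_all_ports``
override flag are hypothetical stand-ins for the "explicitly configured
otherwise" case.

```python
def ports_needing_pxe_link(ports, link_all_ports=False):
    """Return the MAC addresses that should get a PXE link file.

    By default only ports with pxe_enabled=True are included.
    """
    return [
        port["address"]
        for port in ports
        if link_all_ports or port["pxe_enabled"]
    ]


ports = [
    {"address": "52:54:00:aa:bb:01", "pxe_enabled": True},
    {"address": "52:54:00:aa:bb:02", "pxe_enabled": False},
]
print(ports_needing_pxe_link(ports))
```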
Closes-Bug: #1741422
Change-Id: I013861dd5b9a7525058606f8dc8b05502a28af1e
The [conductor] graceful_timeout option does not exist, and
the [conductor] heartbeat_timeout option is actually used instead.
Change-Id: I689fcf8c392eecbcf8ee12b2f67f78f9f22d17aa
A recent evaluation of the lookup code yielded an awareness that
while we're sort of following the overall community pattern of
testing what we expect in terms of patterns and behavior, we really
needed just a little bit more in the way of testing.
While ultimately, these tests are really just exercising front
end validation, it is still important to check to have increased
assurance of a secure codebase.
Change-Id: Iaa917191e0f118f8828161174ea1fe8c55c8f4ee
Apparently, this has been around for ages, but the error was likely
not exactly right as a result of this. Anyway, quick fix.
Change-Id: Idee3c1edfdd65928eaa5f8d30b62474d85dec277
While looking at the overall heartbeat/agent workflow, it seemed
like the [agent]require_tls setting should likely be True by
default, as we are well past the initial phase where operators
might not have the TLS capability when upgrading.
Change-Id: Id526e948e6c5ed032d7542232b1c1a31cb285b26
While agent_url is software generated, it is still a public endpoint
and at least needs some upfront filtering applied. To do this, we
can leverage urllib in the standard library to disassemble the
url, and reconstruct it based upon the standards. The plus of this
approach is that it will remove some invalid formatting for us, and
if things are too out of line, an exception is raised as ValueError.
An important note, this is *not* explicitly urlparsing security[0] as
denoted in the Python urllib documentation, but that the application
should operate defensively.
[0]: https://docs.python.org/3/library/urllib.parse.html#url-parsing-security
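A minimal sketch of the disassemble-and-reconstruct idea using
urllib.parse. The function name and the scheme/host checks are
assumptions added for illustration; the actual filtering in the patch
may differ.

```python
from urllib.parse import urlparse, urlunparse


def normalize_agent_url(url):
    """Parse a URL and rebuild it, raising ValueError when it is malformed."""
    parsed = urlparse(url)  # raises ValueError for e.g. a broken IPv6 literal
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("unsupported or incomplete URL: %r" % url)
    # Accessing .port raises ValueError for non-numeric or out-of-range ports.
    _ = parsed.port
    return urlunparse(parsed)


print(normalize_agent_url("http://192.0.2.10:9999"))
```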
Change-Id: I45ee1c8a73ed13511bc47a69130105f16d34be1e
Determine the appropriate GRUB commands during UEFI boot
based on the node's CPU architecture.
Closes-Bug: #2050054
Change-Id: I0c5f513cdc8f4112f8dfdeb4ccaf566d3424a2ca
Replace all instances of `datetime.datetime.utcnow()`,
which is deprecated, with the timezone-aware oslo's
`timeutils.utcnow()` method, across the Ironic project.
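For comparison, the standard library's own timezone-aware replacement
for the deprecated call looks like this. The patch itself uses the oslo
helper named above; it is not imported here so that the snippet has no
third-party dependency, and the wrapper name is illustrative only.

```python
import datetime


def utcnow_aware():
    """Return the current UTC time as a timezone-aware datetime."""
    return datetime.datetime.now(datetime.timezone.utc)


now = utcnow_aware()
print(now.tzinfo)
```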
Closes-Bug: #2067740
Change-Id: I998681c14f945846f58e723b9be2202dbe8ea12c
Just changing "deployment or cleaning operations" to
"inband operations", since the agent can run in many
different inband operational steps.
Change-Id: Iaaa03ebc3dab724eb7afb0ee686bd22c8a2879be
Some of the configuration knobs require more specific details
regarding security or denial of service related possibilities
if tuned to inappropriate values.
Overall, just some minor improvements for clarity
Change-Id: I008d6e00a528bddba0f843f34968155a9da3ff36
A quick review of the security documentation yielded a need
to revise and clarify the security documentation a little,
which includes a couple security related features.
And also fix the syntax on the example policy entries while
adding a missing third, ! results in a "default false" response.
Change-Id: I3d10ca4631703051109c443d5591a7e86f858c66
This highlights:
- The dev-quickstart guide, which is more up to date than most
contributor docs.
- Common docs (OpenDev / OpenStack contributor guides)
- Bug information (also very up to date)
This removes:
- Top-level link to BfV and multitenant devstacks; these configs are
linked from devstack-guide
- Hilariously out of date information around branch support
Change-Id: If47d9776c65c91b972a3fab8364eacc50a29b2bb
Depends-On: https://review.opendev.org/c/openstack/ironic/+/920365
Codespell upgrade caused failures, fixed spelling where
appropriate, added ignores where appropriate.
Some new package release broke pep8 runs; fixed by no
longer pinning Pygments version.
Change-Id: I670bbb170823d6a0ace8eeb9d9e486e8e9bf7404
Python regexes are deprecated in Zuul, so this commit updates the
configuration to use RE2-compatible syntax.
Change-Id: If4973be103076f5a3879dc630e104d129377f7da
... instead of using the whole url string. The url string may include
the pattern and the current logic is not robust enough to ignore
weird naming.
Change-Id: I80c59c67773f868b45f1ff3b34877c1bab73b225
The state diagram is constrained into a frame of the page rendering
and the prior configuration set it to be a maximum of 660 pixels,
however we should allow the image to be aligned to page size which
can result in a larger image, but still constrained slightly so
Sphinx includes a link to the image.
Change-Id: I19350fc010bd5aac798b2d57ea3d2eb98239a457
Good news, pydot (original) is maintained again and pydot2 apparently is
not. By switching to pydot instead of the fork, svg generation works
now.
This adds states for servicing, and swaps us back to svg for the
regenerated diagram.
Change-Id: I410182ee04293434d889747ddec229870c908d91
This makes all the image upload commands in the devstack plugin use
--file instead of stdin redirection, and also uses an absolute path.
One of the commands was already doing it this way. By doing the upload
like this, it makes the devstack plugin usable with the OCaaS devstack
mode (for faster openstack client ops) since we can't pass the image
stream via stdin. Most people will be using --file for uploading
anyway, so this is probably more realistic anyway.
Change-Id: I8d97ed731133d02aed46a078c50769692ad7ba04
Cirros partition images have some underlying limitations,
meaning it is not ideal for any step which requires the image
to have commands executed in it to perform operations, such as
mounting additional filesystems in UEFI mode, or installing
grub in BIOS mode.
This is because cirros images are an unpacked ramdisk, in other
words, the posted disk image *has no* contents on the root
filesystem of the image. While we attempt to unpack[0] this as well,
this can also fail creating false failures resulting in check
jobs failing and then working on recheck.
As the constraint is the same as the BIOS mode check, and there
is no realistic fix, this change removes the boot mode check and
thus always disables partition image testing with tempest *when*
cirros is in use.
note 0: We presently unpack using a virtual machine launch so it
takes place with the same process as when cirros starts, however
linux doesn't always boot, and the tools don't really determine
if that is the case or not, and if we retool it, we should just
move to a direct extraction and image re-pack.
Change-Id: I7687ff1eddb14d22b981860d4c4c9b172bae45b7
The bugs these work around have been fixed for a long time, and we
require modern eventlet for Ironic. Let's remove the workaround.
Change-Id: Idecb3c5a774aecc6b65d0abd0262fe4b8625c6b7
pre-commit is a git hooks framework which does lots of useful things
before you commit, like validating lint and codespell -- easy things to
forget, especially in a post-codespell world.
Related-bug: 2047654
Change-Id: I22738f9dceebe194e5aedff8815cd786013de456
Had someone try to boot the tinycore ISO on a UEFI machine, and they
got a nice error. Just turns out we needed to update our docs a little
bit to provide appropriate clarity.
Change-Id: I1adfb62ea22d0b58740ceadc8c338fc04d9b78de
These are detected as errors since the clean up was done[1] in
the requirements repository.
[1] 314734e938f107cbd5ebcc7af4d9167c11347406
Also remove the note about old pip's behavior because the resolver
in recent pip no longer requires specific order.
Change-Id: I742ea0192398b9e9b78b969fa81f65621d9490de
The boot from volume feature has an iPXE template render step, which needs
to generate iSCSI URLs for booting the volume. However, it does not work: in
the function, the lun field should be hexadecimal instead of decimal,
according to the SAN URI description at https://ipxe.org/sanuri. So we
need to fix it.
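A minimal sketch of the formatting difference, following the
``iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>`` layout
documented on the iPXE page. The function name and example values are
illustrative, not the code from this patch.

```python
def build_iscsi_san_uri(portal_host, portal_port, lun, target_iqn):
    """Render an iPXE iSCSI SAN URI with the LUN in hexadecimal."""
    # The protocol field is left empty, so two colons follow the host.
    return "iscsi:{host}::{port}:{lun:x}:{iqn}".format(
        host=portal_host, port=portal_port, lun=lun, iqn=target_iqn)


# LUN 10 must be rendered as "a", not "10".
print(build_iscsi_san_uri("192.0.2.5", 3260, 10, "iqn.2024-01.org.example:vol"))
```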
Closes-Bug: #2055355
Change-Id: I080ca42c9ba05f2a4e0752312b79a32bef825752
Signed-off-by: frankming <chen27508959@outlook.com>
To serve as a mechanism to allow an interlocking device identification
this patch injects a publisher id value into ISO images *and* the kernel
command line for any software running from the ISO image to match
the ISO in use to the location of data housed locally from within the
image.
Related-Bug: 2032377
Change-Id: I9b74ec977fabc0a7f8ed6f113595a3f1624f6ee6
- /v1/nodes/test.json will now only mean node with the name
"test.json"
- /v1/nodes/test.json.json will mean a node with the name
"test.json.json" and,
- /v1/nodes/test will mean a node with the name "test".
So /v1/nodes/test.json will no longer default to "test" and
will return HTTP 404 unless a node with the name "test.json" actually exists.
This also removes the backward compatibility with the
guess_content_type_from_ext feature
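The resulting lookup rule can be sketched as an exact-match lookup with
no suffix stripping. The dictionary and function below are hypothetical
stand-ins for the API's node lookup.

```python
def lookup_node_by_name(nodes_by_name, ident):
    """Return the node whose name equals ident exactly, else raise LookupError.

    No ".json" suffix is stripped before matching.
    """
    try:
        return nodes_by_name[ident]
    except KeyError:
        raise LookupError("node %r not found (HTTP 404)" % ident)


nodes_by_name = {"test": {"uuid": "hypothetical-uuid-1"}}
print(lookup_node_by_name(nodes_by_name, "test"))
```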
Closes-Bug: #1748224
Change-Id: If4b3a23e2a09065f5e063e66cff66b96af4d3393
The [molds] password option accepts a secret value apparently. So its
value should not appear in debug logs.
Change-Id: If8a54c1d4f74516f1c24f7286e76955b2e424f5c
oslo.config provides the URIOpt class which enforces valid URI(URL)
format. Use this built-in feature to detect any malformed values for
better feedback.
Change-Id: I0d846f78f8132a2d63266b7b3331ec7118cea1b4
This may be the most overridden default of Ironic, which means
we need to change the default value.
The default for ``[redfish]use_swift`` was historically ``true``,
however we've generally found that BMCs are particularly sensitive
to extra characters in the URL as the characters may signify a
dynamically generated file, which would be problematic as virtual
media webservers also generally require range retrieval support.
This change makes the default ``false`` which should lead to one
less override for operators being necessary in practical operation.
Change-Id: Iad57b3c6423bced0e3cb6fb4e31aad6d805f26fa
The functional test job needs neutron for neutron dependent
tests to be able to execute. As such RabbitMQ and Neutron
are enabled for the functional test job at this time.
Change-Id: I04f76ca1570b969136c0922eb07bc85360369920
Adds a release note, unit test, and documentation update as a follow-up to the
`microversion-parse change <https://review.opendev.org/c/openstack/ironic/+/913793>`_
Change-Id: I535af988125a511e4f54c9d81acd47c327413774
Currently, service steps are only supported as a user-requested action.
This change removes references to overriding priorities for service
steps as these overrides will have no effect and may cause confusion.
Change-Id: I35a8b59b17fdff3161df835903acec529e732c4f
Currently, _heartbeat_service_wait() is causing an error due to a
missing parameter in the call. This change resolves this issue by
removing the reference to the missing parameter.
Change-Id: I03faa67953daf282ae1b576a2a949c94a2efa973
Currently, service steps may fail to start in scenarios dependent on IPA
fasttrack. This change attempts to resolve this by incorporating
servicing states in the fast track allowed states whitelist while also
making _FASTTRACK_HEARTBEAT_ALLOWED a superset of _HEARTBEAT_ALLOWED
instead of duplicating values in the two constants.
Change-Id: I47984469c1432e7fc7b4f1494b9f6c551c34672f
A small bugfix - you now need to ensure that the user env var is set before running any openstack commands (At least the ones that devstack uses)
Change-Id: I4afad7ea588cf6505a7b1186c749d13827b24290
Resolve the following RemovedIn20Warning warning:
The Engine.execute() method is considered legacy as of the 1.x
series of SQLAlchemy and will be removed in 2.0.
Closes-Bug: #2061345
Co-Authored-By: Stephen Finucane <stephenfin@redhat.com>
Change-Id: Ib0519af8a15ca02e351f8d739d52f4e658f7615a
We ended up using two names for the same flag (and forgot it in one
place completely). To not just fix the issue but also prevent it in the
future, refactor asynchronous steps handling into a new helper module
with constants and helper functions.
I've settled on servicing_reboot as opposed to service_reboot because
that's the value we currently set (but not read), so it provides
better compatibility when backporting.
Remove excessive mocking in the Redfish unit tests.
Change-Id: I32b5f860b5d10864ce68f8d5f1dac3f76cd158d6
Serious issues:
- Nothing powers on nodes after servicing, so they end up active and
powered off in the end.
- Restoring power state was done three times.
Minor issues:
- Function _tear_down_node_servicing is called twice causing a traceback.
- Furthermore, process_event('done') is also called in another place
in deploy utils.
- Make sure nodes are never considered for fast-track when servicing, it
prevents clean-up of virtual media devices.
Change-Id: I92fd7a0009a816e93e316e4674c7509b61a474d4
Unlike clean, deploy and verify steps, service steps cannot run
automatically and thus do not have a usable notion of priority. It's not
possible to provide a priority through the API but our validation code
still requires it. This change gets rid of most priority handling for
service steps, leaving only some foundation for future enhancements.
Change-Id: I82aefc03a5c062b67e0f457612fe568399226dc8
Currently, service steps do not work with virtual media deployments
because states.SERVICING and states.SERVICEWAIT are missing from the whitelist
of valid provision_states. This change resolves this issue.
Change-Id: I5e3ec08d128b35385f2d90c9c852140b757b8dbf
Fixes usage of the redfish detach virtual media feature to conform to
the general implementation.
Previously, the detach virtual media API call using the redfish driver
was not working as intended and caused the operation to fail.
The method implementation allowed only a single device_type,
while it should accept multiple devices to match the conductor manager
implementation.
Change-Id: I9edd3b77eeb3ec1b0484d4e6f0c6dea53e83f9ad
The generated path does not contain the node UUID, causing conflicts.
Also make sure to always clean up any existing files first.
Change-Id: I30f948d64e7b87f33841dc22828db60338a62dd8
Cirros, by default, as part of its initialization, copies the initial
ramdisk contents over the filesystem on disk. This changes the partition
image creation job so we do it upfront so the partition image looks like
and matches what we generally expect from a partition image as opposed
to just a kernel, ramdisk, and bootloader.
Change-Id: Idde30e33e9453f8564a7c3b9109c4e567146dee7
For compatibility with pysnmp-lextudio and pyasn1 we increase the
minimum required version of python-scciclient to latest available.
Also capping proliantutils to avoid breaking changes.
Change-Id: I64587d24383dc05927135d7e7e3a2a6975a58558
Currently the online database column is not considered when displaying
the "baremetal conductor list" Alive status. This means that when a
conductor is stopped gracefully it will be shown as (inaccurately)
alive for the duration of [conductor]graceful_timeout.
This change adds the online field to the alive evaluation, so the
conductor must be online *and* have a recent heartbeat.
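A minimal sketch of the combined check. The function name, argument
names, and the 60-second timeout in the example are assumptions, not
the actual API code.

```python
import datetime


def conductor_alive(online, updated_at, heartbeat_timeout, now=None):
    """Alive means flagged online *and* heartbeated within the timeout."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    recent = (now - updated_at) <= datetime.timedelta(seconds=heartbeat_timeout)
    return bool(online and recent)


now = datetime.datetime(2024, 3, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
last_beat = now - datetime.timedelta(seconds=10)
# A gracefully stopped conductor: recent heartbeat, but online is False.
print(conductor_alive(False, last_beat, 60, now=now))
```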
Change-Id: Ic5a8d56ec236faca1b9797bd0d3e42c956469fab
Add file to the reno documentation build to show release notes for
stable/2024.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.1.
Sem-Ver: feature
Change-Id: Ie669c480f361e7df0ba0566931685a3bf38045a2
This commit increases the length of the 'user' column to
accommodate longer UUIDs, ensuring that the full user UUIDs are stored
without exceeding the column limit.
Closes-Bug: #2054594
Change-Id: I59b435ca2bb5850bb2338228b64868c2003bfea3
Changing the ironic-tempest-uefi-redfish-vmedia and
ironic-tempest-ovn-uefi-ipmi-pxe jobs to only run
tempest test_baremetal_server_ops_wholedisk_image.
We saw failures on the partition tests for these jobs.
Related-Bug: #2057972
Change-Id: I2e26d7955ade11046bf89b6f4c9c2c4f16da1574
Configuration was fine for the gate, but I've updated it to ignore
directories and files that can be found in a well-used development
checkout.
Fixed any new spelling issues.
Change-Id: Icef5addba938b27911c26e841a37a2e9ba3fbe43
The latest pyasn1-lextudio patch has been yanked [1] and
all the working versions have been removed from pypi, it
looks like lextudio won't maintain pyasn1 anymore.
We should move back again to plain pyasn1 [2].
Also bump proliantutils min required version as it
has already switched back to working pyasn1 and pysnmp.
[1] https://pypi.org/project/pyasn1-lextudio/
[2] https://pypi.org/project/pyasn1/
Change-Id: Id2e7f75690c42fe0785b4ab0fb0a885261a44aef
DMTF has changed the Virtual Media URI to support Systems. The Redfish driver now
supports this resource for booting, so Ironic needs a way to use it.
Closes-Bug: #2039458
Change-Id: I66e8edb847e93f96374072525222f05e7561fb07
This patch adds implementation of attach/detach generic virtual
media device to the Redfish driver.
Also the redfish vendor eject vmedia action is now deprecated
and it will be removed during the next cycle in favor of the
generic API.
Change-Id: I9daff59128f537a3da2e882adf5c97be9c4ab8d9
Ironic team decided to switch back the project to launchpad, so
the reference in README should be updated accordingly.
Change-Id: I4a05e998614aed7ebdd62bf2bc3e28a7fa17a444
Some of the endpoints are *highly* restricted in ironic's newer
more stringently enforced RBAC world. Some of these endpoints would
emit 500s by default, when realistically it was the policy definition
saying "only system scope could be used" for the endpoint, but the
reality is that 403 is what should have been returned for a client to
properly understand what is going on.
Change-Id: If5e13764dad886ba3ee1a848f3ff9f3279f4d7f6
Adds a job which utilizes the redfish-https boot mechanism
code which recently landed in ironic, which operates similar
to virtual media
Change-Id: Iad55a263ed34e6b121495b72a3c79449d7471901
Reopening the web console may occasionally result in a duplicated
SOL session: the get_console action opens
one console process while another SOL session remains.
This patch adds a "sol deactivate" action before get
console, to make sure the new connection always succeeds.
Change-Id: Ie5d9c94a3e9e3561b6aa1a52462d6739662d4eb0
Currently, if the image download fails, there are no traces of the
error. This change adds logging and populates last_error.
Change-Id: I73ea2f94fb910daf21a5d4f52d6839aac3bad579
``redfish_system_id`` is being passed multiple times to the node at
creation as ``node_options`` never defaults back to its initial state
throughout the iteration of the while loop.
Though it is surprisingly functional, it's fragile and this change aims
to fix that.
Closes-Bug: #2054597
Change-Id: I2c151afafb86191f047985ac00075a791639646d
A temporary path forward to increase CI stability, by pinning
to what appears to be a "good working version" of upstream dnsmasq
which does not crash on us.
Change-Id: I3295c92fd7b7871ad351b94f4c6cf0f554279db0
Currently, arguments like "fields", "shared" or the new "device_types"
only accept comma-separated strings. While there is no single standard,
the most common approach is to repeat the arguments, i.e.
NOT /nodes?fields=uuid,name
BUT /nodes?fields=uuid&fields=name
Unfortunately, at least GopherCloud already relies [1] on the more common
(but not currently working in Ironic) behavior. Let's make it work.
[1] 8455d01343/openstack/baremetal/v1/nodes/testing/requests_test.go (L87)
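Accepting both styles can be sketched with urllib.parse. The helper
name is hypothetical and this is not the API's actual argument
handling.

```python
from urllib.parse import parse_qs, urlsplit


def get_list_arg(url, name):
    """Collect a list argument from repeated and/or comma-separated values."""
    values = parse_qs(urlsplit(url).query).get(name, [])
    result = []
    for value in values:
        # Each occurrence may itself still be a comma-separated string.
        result.extend(item for item in value.split(",") if item)
    return result


print(get_list_arg("/nodes?fields=uuid,name", "fields"))
print(get_list_arg("/nodes?fields=uuid&fields=name", "fields"))
```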
Change-Id: Ia780b10986929d79dc4f334d278bcb00a9984fd0
* Adds httpboot enabled CI
Sets the interfaces to be enabled on the standalone job
in anticipation for standalone job support for these
interfaces[0] and switches the boot from volume
job over to utilize httpboot by default.
* Phases out now-redundant ironic-standalone job
The ironic-standalone job does all the same work as the
ironic-standalone-redfish job, and IPMI CI jobs are covering
IPMI cases. So we don't need this anymore.
Also makes a note on another redfish job which is likely
redundant as well, although does not yet remove it.
* Fix IPMI Partition Job name
This is more of a test of OVN than of partition specifically,
in fact, it runs both wholedisk and partition jobs already. Make
the name more sensible.
[0]: https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/902171
Co-Authored-By: Julia Kreger <juliaashleykreger@gmail.com>
Change-Id: I6c41b8f124e2e1fbc314243bf821153d79e2e09b
DNSMasq is not cooperating with us; it's respawning frequently.
Temporarily make these non-voting while we troubleshoot.
Change-Id: I0dcb09f31254d81d3a5ec9a52304bab93901e8f6
The logic to handle dnsmasq hostfiles is moved from ironic-inspector
with only cosmetic changes. The logic to purge the hostsdir is not
copied since it relies on running commands with root privileges.
A documentation example is added instead.
The change is missing the RPC call to notify the filter about changes.
It will be done in a follow-up.
Change-Id: Ie32018c760c39873ead1da54cfaeae87eaaaf043
dnsmasq-2.86 shipped in Ubuntu jammy has a
known issue[1] which is fixed in dnsmasq-2.87
but it's not yet released with Ubuntu jammy.
Until fixed version is available in Ubuntu
jammy let's use source install instead of
using an older version from Ubuntu focal.
[1] https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2022q3/016562.html
Update from Julia:
Pushing forward the source fix again as ubuntu removed the
prior path we were using as a focal package and replaced
it with a package which is demonstrating the same basic issue.
Related-Bug: #2026757
Change-Id: I7ffcd167fc1e3a8c1192d766743bb5620d85ef35
doc/source/admin/drivers/redfish.rst has an ESP configuration script.
However, it does not work. In the script, DEST is the destination ESP image
path, so mkdir and cp cannot operate on it. We need to fix it and use mtools
instead of sudo+mount.
Story: 2011051
Task: 49615
Change-Id: Ibdd60bea2f742e8e698d9dbcfef7059a9a71242f
Signed-off-by: frankming <chen27508959@outlook.com>
The current implementation in common has a lot of assumptions that the
manager is a conductor manager. To be able to reuse the same base RPC
service for the PXE filter, split the conductor part away from the
common one.
Change-Id: I4d24cf82d62cb034840435ef15b5373748b65f09
While looking at fixing a bug around enforcement of
the presence of specific ironic.conf parameters which
don't need to be present in all cases, I noticed we
had translated indicator message strings (which of course
are not actually translated), but really should be the actual
configuration parameter name in ironic.conf *OR* driver_info.
So ultimately, I decided to fix the text to be accurate
and appropriately verbose for the current ways it can be
configured, as opposed to the singular way it was available
when the capability was first added to Ironic.
Change-Id: If402814791554ef3143e25426fdc7e49a5b04810
In the early days of the neutron network interface, we had a hard
launch failure added to prevent ironic.conf from having a neutron
network configuration which was not valid when the neutron network
interface was in use.
But as time has moved on, these settings became node-settable,
and ironic configuration largely became mutable as well, so they
can always be added after the process has been launched.
But we kept the error being returned. Which doesn't make sense
now that it can always be back-filled into a working state
or just entirely be "user supplied" via the API by an appropriate
user.
Closes-Bug: 2054728
Change-Id: I33e76929ca9bf7869b3b4ef4d6501e692cf0a922
Starting from jsonschema v4.21.0 the message for empty objects
validation changes from "is too short" to "should be non-empty" [1].
Handle the two cases so we don't break in case of upper-constraints
change.
[1] a1d4cb3b94
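Handling both wordings can be as simple as the check below. The two
message fragments are the ones quoted above; the helper name and the
sample messages are illustrative.

```python
# Old and new jsonschema wordings for the same empty-object failure.
_EMPTY_OBJECT_MESSAGES = ("is too short", "should be non-empty")


def is_empty_object_error(message):
    """Return True if a validation message reports an empty object."""
    return any(fragment in message for fragment in _EMPTY_OBJECT_MESSAGES)


print(is_empty_object_error("{} is too short"))
print(is_empty_object_error("{} should be non-empty"))
```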
Change-Id: I01336d2966bdad8f8e2aec7a522644cff1d5c341
Special cases boot/uefi record setup to focus on UEFI
nvram updates instead of attempting nvram updates *and*
setting the boot device to disk.
Closes-Bug: 2053064
Change-Id: Ic6584479a47146577052d17fa3f697eef64ac73c
Extension to extend the default service project name
value, which if set can be overridden in Ironic's policy
configuration.
Change-Id: I60cc53a34c7062261703492e720989efedca4f2b
If a user runs `ironic node-create -d <driver>` with a driver that is not
enabled or activated, or when no registered conductor service which
supports the given driver in a specified conductor group is found,
the current exception may be unclear or confusing for users.
This change prompts the user to ensure that the input driver is valid
and enabled, and guides them to run the `driver-list` subcommand to
show the currently supported drivers, adding context for the reason for
the exception.
Now, if the input driver is not valid or enabled, we get the message below:
"No conductor service registered which supports driver <driver> for
conductor group "<group>". Ensure the driver is valid and enabled. (HTTP 400)"
Implements: clarity of exception message
Closes-Bug: #1398286
Change-Id: I592f3ce278d1b536ed91c3340b7f270985e309ac
Signed-off-by: Afonne-CID <delightinbusiness@gmail.com>
When I thought change I2b4bcc748b6e43e4215dc45137becce301349032
was going to fix everything, that was with the mental model that
it was going to be enabled by default. That didn't happen in
review as part of the service, but the reality is we still have
some adjacent CI jobs which need it to operate properly.
Given CI, it is just invoked when scope enforcement is enabled
for CI purposes.
Change-Id: I60074504742d8b09017acbb42d2706215b0169af
Previously, we updated node_periodic so we understood from the
logs when a periodic task was completed, so we could understand
where things were at in our hunt for database lock races.
In any event, we now explicitly log in the _sync_power_state
method of the conductor, because it is not a node_periodic.
Change-Id: Iaec9926fe031e65de4732ff0bc7988c5604d4755
This is the first of potentially many commits to update away from openrc auth towards RBAC auth in devstack.
This commit focuses on the documentation changes only, with future commits dealing with shell script changes and anything else that needs to change.
Change-Id: Ibb13993a0fd251f8948c7a813ed67cae701dab01
Third in a series of commits to add Codespell to Ironic Repos. This commit adds the Tox Target to CI.
A future commit could potentially add a git-blame-ignore-revs file if there are lots of spelling mistakes that could clutter git blame.
Change-Id: I82239bd5ca1b184e36c63d08413362c76fa8d4b4
Second in a series of commits to add Codespell to Ironic Repos. This one adds the command that was used to fix the spelling errors.
Future Commits will add CI support and potentially a git-blame-ignore-revs file if there are lots of spelling mistakes that could clutter git blame.
Change-Id: I206f51f277d19bbcec450ed5312cd30d6fba8432
This is the first in a series of commits to add support for codespell. This is continuing the process completed in ironic-python-agent.
Future Commits will add a Tox Target, CI support and potentially a git-blame-ignore-revs file if there are lots of spelling mistakes that could clutter git blame.
Change-Id: Id328ff64c352e85b58181e9d9e35973a8706ab7a
The tox deps option grants installation of single dependencies and
requirements, optionally pinned using constraints, before installing
a package, therefore not granting installation of the correct
constraint during the package installation.
To fix that tox 4.4.0 has introduced the constrain_package_deps
option [1]
[1] https://tox.wiki/en/4.12.1/faq.html#using-constraint-files
Change-Id: I94f02c99d1301e9dcdecb8b5565ef6a24204dc69
Related Bug: https://bugs.launchpad.net/ironic/+bug/1628422
This change makes sure that the caught error is passed through to node_history_record()
Change-Id: I9b78ec37f37024d04928403bbf0b85ed96906441
This change adds two network boot interfaces, ``http`` and
``http-ipxe``. These interfaces are based upon the underlying PXE
boot interface code in ironic, and where this differs is it signals
to Ironic that we must do the boot loader needful in terms of telling
DHCP to send a URL instead of a filename and IP address for PXE
as a starting point.
The naming of the interfaces focuses more on the transport mechanism
and then specific style. Very similar to existing ``pxe`` and ``ipxe``
interface modeling, except in the ``ipxe`` case, it is more a specific
loader and mechanism to be utilized.
Related-Bug: #2032380
Change-Id: Ie7ace88b62b9179f640ef2a732dd228e12bd320d
We added servicing and did not update release mappings, nor did we
update release mappings for final Ironic release of 2023.2.
Change-Id: If4c43e353eb4bba7ae62def84d74877039b170b0
The VTEP switch support patch merged with a constraint of jsonschema
version 4.19 or above.
Except Debian only currently has 4.10, Centos 9 Stream only has 4.16,
and at present launchpad and the ubuntu mirror list is non-functional.
So in the interest of packagers, we'll lower the version.
Note: I was able to successfully execute the unit tests with jsonschema
4.0.0 installed in the py3 virtualenv.
Change-Id: Ic3667a7663b7bd5dfad4665321d9c82cc08cc885
A long time ago, Mario filed a BZ. But nobody fixed it.
It was an easy fix, and I've done it here.
Closes-Bug: 1662326
Change-Id: I89d4fd9dd93950ff59419c913fe292de17b112e7
tox now always recreates an env although the env is shared using envdir
options.
~~~
$ tox -e genpolicy
genpolicy: recreate env because env type changed from
{'name': 'genconfig', 'type': 'VirtualEnvRunner'} to
{'name': 'genpolicy', 'type': 'VirtualEnvRunner'}
~~~
According to the maintainer of tox, this functionality is not intended
to be supported.
https://github.com/tox-dev/tox/issues/425#issuecomment-1011944293
Change-Id: I71faa02e183ab29511768bca5f84461bcd3b1fe3
This is an MVP of auto-discovery with no extra customization and no new
auto_discovered field from the spec.
Change-Id: I1528096aa08da6af4ac3c45b71d00e86947ed556
Turns out the service role support doesn't quite work,
because you could not enumerate nodes regardless of node
owner or lessee in order to enable services like Nova to
enumerate nodes to be able to schedule upon them, or
networking-baremetal to enumerate ports in update mapping
in Neutron.
So this change enables permissions to be modified to allow
service project users with the service role to enumerate the
list of resources, and grants rights similar to "system scoped
members" to the service project's users with the "service" role
which aligns with update actions to provision/unprovision nodes.
Adds some additional rbac testing to ensure we appropriately
covered these access rights.
Closes-Bug: 2051592
Change-Id: I2b4bcc748b6e43e4215dc45137becce301349032
pytz will be removed from RHEL/CentOS 10 because of the built-in
zoneinfo[1].
Because the current usage of pytz can be very easily replaced, this
removes the dependency on pytz.
[1] https://issues.redhat.com/browse/RHEL-219
Change-Id: Ia72c528eadeccf6075894ff58477fecade65ad71
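A minimal sketch of a pytz-free replacement, under the assumption that pytz was only used to obtain timezone-aware UTC timestamps (the commit message does not say which calls were replaced). The function name `utc_now` is hypothetical.

```python
import datetime


def utc_now():
    """Return the current time as a timezone-aware UTC datetime."""
    # Assumption: datetime.timezone.utc stands in for pytz.utc here, so no
    # third-party package is needed.
    return datetime.datetime.now(datetime.timezone.utc)
```

For named zones other than UTC, the standard library `zoneinfo` module mentioned above would be the natural counterpart.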
Remove the sphinxcontrib-seqdiag dependency as the Pillow upgrade to
version 10.x (from OpenStack upper constraints) breaks its usage.
In the ironic source docs, reference the svg files in the rst files,
and keep the .diag files in the doc/source/images/ directory as backup.
Closes-Bug: #2026345
Change-Id: I54cea22e963441b729d4201ad9f8a055a65b54f8
I totally missed Julia's comment in the review, this commit
adds unit tests for the RedfishFirmwareInterface and also more
logs when a specific component is missing.
Change-Id: Ice2c946726103d9957518c2d30ddad3310ee145d
I spotted a typo in the api configuration section for a knob which
was added a while back. Super quick fix.
Change-Id: Ie488143b8383c350de1e36aa0b5cb9b9424ebb3f
The following services were removed some time ago.
- nova-objectstore
- nova-consoleauth
- glance-registry
Change-Id: I3a577c44abe46fc6cb146f3540bded1c5cb4a511
Change the default RBAC policy in ironic such that the new RBAC
policy is enforced by default and the legacy policy is not usable
unless explicitly re-enabled.
Depends-On: https://review.opendev.org/c/openstack/metalsmith/+/905012
Change-Id: Id559f1d8b9a76c8a570b598585c2d58c56d08837
This value is a way for an operator to signal Ironic whether they have
an infrastructure for unmanaged inspection. Previously, unmanaged
inspection was considered to be always supported. With this change,
the inspector-based inspection works as previously, while the new
built-in inspection defaults to only managed inspection.
Change-Id: I4a9125881dc5822656efde1346807c3dd749973e
Currently, the code expects a given hostname to be used by only one node.
This is not necessarily the case for Redfish where several Systems can
co-exist under the same BMC. Use MAC addresses to distinguish them.
Add more inline comments to explain the process.
Change-Id: Ifc5a18bffc7cbcdd8bbbd660aba61fa11403e7e8
The host currently hard-coded is not functioning. This replaces
the hard-coded mirror by the local CI mirror detected. In case
mirror info is not available then upstream centos mirror is used.
Change-Id: I96a8cb45154c9dbb50efecc22d34c4ff75c6722a
Adds basic support for passing OVN VTEP switch metadata to
neutron via Ironic's port.local_link_connection field.
Adds microversion 1.90 to Ironic's API, adding support for
new schema in port.local_link_connection
Bump version of the jsonschema library to ensure consistent
behavior with new schema configurations.
Add documentation warning: This has not been tested as no
Ironic developers have access to the hardware in question.
Closes-bug: #2034953
Co-Authored-By: Austin Cormier <acormier@juniper.net>
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: Ie98dc4552ec2ea16db1e2d382aed54ce9dfef41b
While os.link is supposed to follow symlinks, it's actually broken [1]
on Linux. As a result, Ironic may end up creating a hard link to
a symlink. If the symlink is relative, chances are high accessing the
resulting file will cause a FileNotFoundError.
[1] https://github.com/python/cpython/issues/81793
Change-Id: Ic52f0ddb0c94410dd854ee525e3c57b2e78ea84d
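A minimal sketch of the workaround idea, assuming the fix is to resolve the symlink explicitly before linking. The helper name `safe_link` is hypothetical and not necessarily what the change itself uses.

```python
import os


def safe_link(source, link_name):
    """Hard-link link_name to the file that source ultimately points at."""
    # Resolve symlinks ourselves instead of relying on os.link to follow
    # them, so the hard link never points at a (possibly relative) symlink.
    real_source = os.path.realpath(source)
    os.link(real_source, link_name)
```

With this, linking through a relative symlink produces a regular file entry rather than a second symlink that may dangle from its new location.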
Apparently when I refactored this document, I omitted the actual
section which instructed you how to create the machine. This corrects
that omission.
Whoops.
Change-Id: Ibee5afa33920647a457d3c5182c28bbd92d55ef6
In the backports to fix the policy of the original change, Dmitry
noted that it was actually wrong, because we should have instead
raised NotAuthorized. Dmitry was absolutely correct, because in
hindsight I made the change trying to keep exactly the same behavior,
but the reality is this is a case where we should be explicit,
and tell the user they have done something forbidden.
This revert of the revert fixes that change.
Original Change: https://review.opendev.org/c/openstack/ironic/+/905038
Dmitry's Review Feedback: https://review.opendev.org/c/openstack/ironic/+/905088
Change-Id: I5727df00b8c4ae9495ed14b5cea1c0734b5f688d
The [deploy] image_server_auth_strategy option effectively supports
only noauth or http-basic. However in case an operator set this to
an unsupported value, it results in incomplete behavior without any
explicit error. This is quite confusing and can cause difficulty in
troubleshooting.
This adds validation of the configuration to make operators aware of
any invalid value to improve operator's experience.
Change-Id: I2acf12f81bb64027eeb50924c2ac215c4270ca87
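A minimal sketch of the kind of validation described above, written in plain Python rather than with oslo.config. The names `SUPPORTED_AUTH_STRATEGIES` and `validate_auth_strategy` are hypothetical; only the two supported values come from the commit message.

```python
# The two values the commit message names as supported.
SUPPORTED_AUTH_STRATEGIES = ("noauth", "http-basic")


def validate_auth_strategy(value):
    """Raise ValueError if the strategy is set to an unsupported value."""
    if value not in SUPPORTED_AUTH_STRATEGIES:
        raise ValueError(
            "Unsupported image_server_auth_strategy %r; expected one of: %s"
            % (value, ", ".join(SUPPORTED_AUTH_STRATEGIES)))
    return value
```

Failing loudly at validation time replaces the silent, incomplete behavior an operator would otherwise have to debug.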
The standalone job has not passed on the master branch in a
very long time. I stopped looking at the history as of August 2023,
but it was only passing on Xena and Wallaby stable branches.
A few things to note:
* It appears to be intended to exercise inspector as a WSGI app,
and loads quite a bit of different configuration.
* Overall, it is failing due to the image it is trying to use
(cirros) which doesn't support setting a bootloader up, but
that is sort of required.
* Job is configured to netboot a node for an OS on the disk,
but that is not something ironic does really anymore. Quite
possibly, that is why the job is failing.
* Inspector, as a non-integrated thing, is on the path of
deprecation at this point, so we don't really need to focus
on this specific case anymore.
Overall, it seems like a job we should have just removed some
time ago. So... doing so.
Change-Id: I3aca63c183af863d9db1f27a4cfe0d6495bb03c2
The tl;dr, is when the allocation owner support was added,
it was done so to try and use the same exception
being raised for two distinct cases. An invalid request,
and a mismatch. The reality is, we should be raising them
separately because there are two different cases we need
to guard against.
This was discovered when changing Ironic's default RBAC
policy enforcement so that the legacy policy is no longer
enabled, which meant the default path on the owner logic was
thus triggered, resulting in the failure and need to fix it.
Closes-Bug: #2048698
Change-Id: I0feefc273a2d18e7812139f59df3f43aba7d7936
Before this change, if a user requested a node to be cleaned
or "managed" with cleaning enabled when the user is in the
system scope, Ironic would attempt to use the user's token to
make the request to Neutron.
This, unfortunately, does not work, as the neutron client explicitly
requires a project ID to make the request to Neutron. As a result,
Ironic now falls back to its internal credential configuration to make
the forward request, which matches the behavior if a node has been
unprovisioned and the cleaning has been started automatically.
Closes-Bug: 2048416
Change-Id: Id91ec6afcf89642fb3069918e768016b8b657a31
After removing the iSCSI deploy and changing ISO parsing code to use
a corresponding library, Ironic no longer executes any commands as root
and it should stay this way.
Change-Id: I47d2bab9b94345fbcf89a2a80028853050a041ea
To a large extent, copy-paste from the Inspector docs. Adjusted some
wording to be more generic. Migrated the (un)managed docs since they
apply to both implementations and are important for understanding.
Change-Id: I7d6cdb34f1ffce53b3cac48c8e2df09f8a861422
Adds a redfish-https boot interface, based upon the
redfish-virtual-media boot interface, however substantially copies
some base methods because of simplification offered to us by
putting "attach/detach" logic into how the sushy library handles
the application and reset of a URL as a boot setting.
This feature also increases the requirement for the Sushy library
to version 4.7.0 which includes support to set the HttpBootUri
field in the BMC and automatically unset it as well.
Closes-Bug: #2032380
Change-Id: I991611cd67cb91aea21fc30bbae7cd24409dbbfa
Bandit 1.7.5 introduced some changes which broke the bandit job,
which caused the job to fail ages ago. We've since fixed those.
But, moving forward we need to fix these relatively quickly when
they occur, as such changing the test to voting improves our overall
security posture by forcing us to address these as they occur.
Change-Id: I4a7954bfd20eafdb578896e1f61204edc7f9ec7e
We have some drivers, such as SNMP, which do not support metrics.
Environments with these nodes should not get "N" messages for "N" nodes
that can't generate sensor data.
Closes-bug: 2047709
Change-Id: Ibc1f3feb055521214512c8b350d67933491c2550
The tag_svn_revision option was already removed[1]. The values set to
the other two options are effectively same as their defaults.
[1] https://github.com/pypa/setuptools/issues/619
Change-Id: I6e0cbebac7cd33ad970d921ae761444db7f89a81
Two jobs are changed to test a reduced Redfish implementation:
one PXE job uses the minimum version (only boot/power management)
one vmedia job uses the reduced version (+ NICs, virtual media)
Change-Id: Ib3afdb26b9cd36c0e4f3d736b9c69a5bf508fc0e
Automated cleaning is not guaranteed to be enabled, and in any case it's
too late to cache the components at that point: firmware upgrades may
happen before the transition to "available".
Change-Id: I6b74970fffcc150c167830bef195f284a8c6f197
This commit removes 'VolumeType' which param has long been
deprecated in DMTF Redfish schema, also removes 'Encrypted'
param as per discussion, and places 'Drives' inside 'Links'
as per the new DMTF schema.
Closes-Bug: 2045645
Change-Id: I91d2decab19e352ca3560227d17acfaa1a1dca94
First, it tries to create components even if the current version is not
known and fails with a database constraint error (because the initial
version cannot be NULL). Can be reproduced with sushy-tools before
37f118237a
Second, unexpected exceptions are not handled in the caching code, so
any of them will cause the node to get stuck in cleaning forever.
On top of that, the caching code is missing a metrics decorator.
This change does not update any unit tests because none currently exist.
Change-Id: Iaa242ca6aa6138fcdaaf63b763708e2f1e559cb0
I've seen a situation where heartbeats managed to completely saturate
the conductor workers, so that no API requests could come through that
required interaction with the conductor (i.e. everything other than
reads). Add periodic tasks for a large (thousands) number of nodes, and
you get a completely locked up Ironic.
This change reserves 5% (configurable) of the threads for API requests.
This is done by splitting one executor into two, of which the latter is
only used by normal _spawn_worker calls and only when the former is
exhausted. This allows an operator to apply a remediation, e.g. abort
some deployments or outright power off some nodes.
Partial-Bug: #2038438
Change-Id: Iacc62d33ffccfc11694167ee2a7bc6aad82c1f2f
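A minimal sketch of the reservation idea, using plain slot counting instead of the two executors the change describes. The class `TwoTierSlots` and its methods are hypothetical; only the 5% default comes from the commit message.

```python
import threading


class TwoTierSlots:
    """Track worker slots split into a general and a reserved tier."""

    def __init__(self, total, reserved_percent=5):
        reserved = max(1, total * reserved_percent // 100)
        self._free = {"general": total - reserved, "reserved": reserved}
        self._lock = threading.Lock()

    def acquire(self, allow_reserved=False):
        """Return the tier a slot was taken from, or None if none is free."""
        with self._lock:
            tiers = ["general"]
            if allow_reserved:
                # Only callers such as API-initiated work may dip into the
                # reserved tier, and only once the general tier is empty.
                tiers.append("reserved")
            for tier in tiers:
                if self._free[tier] > 0:
                    self._free[tier] -= 1
                    return tier
            return None

    def release(self, tier):
        """Give a slot back to the tier it was taken from."""
        with self._lock:
            self._free[tier] += 1
```

In this sketch heartbeats and periodic tasks would call `acquire()` and API-driven work would call `acquire(allow_reserved=True)`, so a flood of the former cannot starve the latter.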
Our validation does not expect a host-port pair because it does not allow
colons in the hostname. We don't need to verify all possible cases: we
will return 404 for a conductor that does not exist.
Change-Id: Iea65575f540a89a0de280fb730e430647c5733dc
Use the 'volume_name' field from the logical_disk in the
target_raid_config field of a node, instead of just 'name' (which is
incorrect as per the Ironic API expectation), to create the RAID volume
Change-Id: Ib8b2589d91be67a848411ab6be852bcb4de58bc7
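A minimal sketch of reading the field named above from a node's target_raid_config, assuming the usual layout with a `logical_disks` list. The helper name `raid_volume_names` is hypothetical.

```python
def raid_volume_names(target_raid_config):
    """Collect the volume_name of each logical disk that defines one."""
    return [disk["volume_name"]
            for disk in target_raid_config.get("logical_disks", [])
            if "volume_name" in disk]
```

A logical disk that only sets `name` is ignored here, which mirrors the point that `name` is not the field the Ironic API expects.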
Reorganize the existing docs to give space to more information.
Cover the most critical topics, except for installation.
Change-Id: If0f185e0303d6f8071306edbc64b9c5704f58d16
At least on some Dell machines, the Redfish SecureBoot resource is
unavailable during configuration, GET requests return HTTP 503.
Sushy does retry these, but not for long enough (the error message
suggests at least 30 seconds, which would be too much to just integrate
in Sushy). This change treats internal errors the same way as
mismatching "enabled" value, i.e. just waits.
Change-Id: I676f48de6b6195a69ea76b4e8b45a034220db2fa
This commit:
- fixes a few nits that were pointed out after the feature
was merged
- doesn't affect the functionality of the feature
Closes-Bug: #2021947
Change-Id: I1dd024b9994df2b367f61cea75eb71fabe57abfd
tl;dr, devstack no longer supports focal, and now errors which
results in the job failing.
Also changes the snmp job to utilize the test_ramdisk_iso
feature, as *opposed* to a full deployment because iPXE
shipped with Ubuntu no longer likes to chain boot in UEFI
mode to a block device. The easiest path, is just to run
a ramdisk in that case, which also sort of mirrors what
users of the SNMP power interface *tend* to do.
Related-Bug: #2034588
Change-Id: I276885b8f0492ead8cea38fe13826123131984ea
Object create/delete operations translate clearly from swiftclient to
the SDK. Switching the temp URL handling is a little more disruptive but
the result is slightly more centralized and enables key rotation.
Change-Id: I8df2f032224bd5e540139a798a7ab76a1aeebb06
Closes-Bug: #2042493
We fixed a bug in wsgi_service around cleaning up unix sockets; we
should document the fix in a release note.
Change-Id: I6ecb489ea1a9e6490c5ddca5c7467b0c4324dfd1
Currently, it logs a lot of entries without context, which won't be
readable when several nodes are deployed at the same time, nor when
someone greps for a node UUID.
Make fewer log entries and add node UUID for context.
While here, modernize the code a bit.
Change-Id: I3a840e47a09e77a9f8d35a7cf400c4bdd4111f91
Per the consensus during the 2024.1 PTG, configuration molds
are being deprecated in favor of a to be developed in the future
step templating mechanism.
Change-Id: Ieab94972e89ca9cded7fae225191bd63d9311581
First, the *_by_arch options are not a replacement for plain options:
the cpu_arch property is neither required nor standardized. This is why
older options with *_by_arch equivalents are not deprecated.
Second, the example in the documentation is wrong: oslo.config does not
use Python dictionaries. Which makes me suspect that the feature has
never been properly tested (indeed, it's not used in the devstack CI,
and Bifrost uses the older options).
Change-Id: If1e633930909ce9d80e14f3ec3daa0bf8d48b7f0
When the per-node external_http_url feature was introduced by
c197a2d8b2, it only applied to a config
floppy. This fix ensures that it is also used for the boot ISO, both
when it is generated locally (by _prepare_iso_image()) or just cached
locally (by prepare_remote_image()).
Change-Id: Ic241da6845b4d97fd29888e28cc1d9ee34e182c1
Closes-Bug: #2044314
This patch allows to attach or detach a generic image as
virtual media device after a node has been provisioned.
Closes-Bug: #2033288
Change-Id: I97b68047d769f6fb686c53e89084b5874e02b8c7
An outcome of the Ironic 2024.1 PTG was that we would go ahead and
deprecate the ibmc, xclarity, and the wsman interfaces of the idrac
hardware type.
The forward path is Redfish, as evidenced by the idrac hardware
type having both wsman and redfish based interfaces available
for users to choose from.
These changes are being made by the Ironic team due to a lack of
recent upstream contact with any of the related driver maintainers.
Change-Id: Ia4aa99f4987570426bb155af8f437c9ba6013837
Sending signal ``SIGUSR2`` to a conductor process will now trigger a
drain shutdown. This is similar to a ``SIGTERM`` graceful shutdown but
the timeout is determined by ``[DEFAULT]drain_shutdown_timeout`` which
defaults to ``1800`` seconds. This is enough time for running tasks on
existing reserved nodes to either complete or reach their own failure
timeout.
During the drain period the conductor needs to be removed from the hash
ring to prevent new tasks from starting. Other conductors also need to
not fail reserved nodes on the draining conductor which would appear to
be orphaned. This is achieved by running the conductor keepalive
heartbeat for this period, but setting the ``online`` state to
``False``.
When this feature was proposed, SIGINT was suggested as the signal to
use to trigger a drain shutdown. However this is already used by
oslo_service fast exit[1] so using this for drain would be a change in
existing behaviour.
[1] https://opendev.org/openstack/oslo.service/src/branch/master/oslo_service/service.py#L340
Change-Id: I777898f5a14844c9ac9967168f33d55c4f97dfb9
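A minimal sketch of wiring ``SIGUSR2`` to a drain flag, assuming a simple module-level state object. The names `ShutdownState`, `handle_sigusr2` and `install_drain_handler` are hypothetical and do not reflect how the conductor or oslo_service actually implement this; only the signal and the 1800 second default come from the commit message.

```python
import signal

# Default named in the commit message for [DEFAULT]drain_shutdown_timeout.
DRAIN_SHUTDOWN_TIMEOUT = 1800


class ShutdownState:
    """Hypothetical holder for what the service should do on exit."""

    def __init__(self):
        self.draining = False
        self.timeout = None


STATE = ShutdownState()


def handle_sigusr2(signum, frame):
    """Mark the service as draining; the main loop acts on the flag."""
    STATE.draining = True
    STATE.timeout = DRAIN_SHUTDOWN_TIMEOUT


def install_drain_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGUSR2, handle_sigusr2)
```

Keeping the handler down to setting a flag leaves the slow work (waiting for reserved nodes, keeping the heartbeat alive with ``online`` set to ``False``) to the service's normal shutdown path.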
Some of the object unit tests grab a Mock object unintentionally, and
that results in failure during initializing a versioned object,
because the Mock object does not present its version correctly.
This fixes that problem.
The sqlalchemy-2x job is made non-voting because this job requires
oslo.utils 6.3.0 which is blocked by this problem.
Closes-Bug: #2043116
Related-Bug: #2042886
Change-Id: I1a622ab9c766d46b7eb4442848e91f25b26f6c61
Without the ipxe config, adopted nodes that would have needed
the fallback ipxe config to boot from disk will fail (as they
continuously attempt to network boot) and instead boot into
the discovery image.
Story: #2009259
Task: #43471
Change-Id: I42e555a1a01eb4124e3152669578f3403db83801
It's important for consistent behavior to monkey_patch eventlet
before importing anything. While it makes an attempt to green any
existing created objects or locks, that code is buggy and fails
in some cases -- especially around rlocks.
It's not my belief that this resolves any specific bugs, but this
does reflect a better overall practice.
Change-Id: I57b2c91f9853287a08ee79ac87ae6e1767ddfb6f
We don't use pysnmp anymore, and we likely won't update this doc
in the future if libraries change again.
Change-Id: If3a3bf02167f187a0e4f8f0d20a77621b5def3eb
DEFAULT_IMAGE_NAME no longer exists in devstack; also our config
now always leads to multiple networks being created, so remove
the fork in the instructions.
Change-Id: I1b3e134c4d5a9633028af367e89ecc44699561cb
Adds the 'local-link-connection' and 'parse-lldp' inspection hooks in
the agent inspect interface for processing data received from the
ramdisk at the /v1/continue_inspection endpoint.
Change-Id: I540f03b961b858e8fc00cd4abbc905faa8f0c6c5
Story: #2010275
In-process token cache is deprecated since 4.2.0 release
and may be removed. Add the setting of memcache for
the auth_token token cache.
Change-Id: I23ad1d9fb1b33160452ab353972fa1274cde363d
It's possible to use virtual media based provisioning on
servers that only support DVD MediaTypes and do not support CD
MediaTypes. The problem in this scenario is that Ironic will keep
the media attached since it will only eject the ones matching the
CD device, now we check if there is any DVD device with media inserted
when looking for CD devices.
Closes-Bug: 2039042
Change-Id: I7a5e871133300fea8a77ad5bfd9a0b045c24c201
The new call prepare_remote_image contains the logic around
image_download_source, fetching images using a cache and publishing them
for further consumption. The code was extracted from _prepare_boot_iso,
which is now more straightforward.
Change-Id: I8567a10b77cdc3785686b79defcdafd75af53df0
So, I got myself nice and confused with testing parent_node logic
when I used a name, but the ironic internals are modeled around
queries involving UUID matching.
We now identify names, and reset the values to be a UUID.
Change-Id: I46ece586c254c58b80723bc905cad3144691fc5d
This change takes the first step in decomposing image_utils into more
manageable and reusable parts. The publishing bits there are coupled
with nodes and drivers. The new code is node-agnostic and features
a clean separation between Swift and local webserver code.
Change-Id: If95b46272abaeea314fd61bb50d2c40200386f98
Long story short, we auto-clamp down everything to 1400 bytes
due to VXLAN tunneling for multinode testing. But there are other
reasons to clamp it smaller, and we will need to clamp that further
for multinode should we mix it with OVN.
Anyway, this should make things cleaner and we should rely upon the
gate calculated MTU as a starting place, not the guess based upon
interface list. i.e. test VM could be wrong but gate could know better.
Change-Id: I385679fe30d1447f1ed94cdf5a419e6acefbc595
Grenade is currently failing not finding the neutron command, we should
likely not be using it anyway since the deprecation message says it may
disappear after Z.
Change-Id: Ic24d59379bafcc5a630fe5c074fcc13303902965
This adds an `online` argument to the conductor touch methods so that
touch can be called with `online=False`. When called periodically this
allows the conductor `updated_at` to be within the threshold to avoid
locked nodes being failed as orphans by another conductor.
This will be used by drain shutdown (and graceful shutdown) so that
tasks can complete on existing locked nodes within the shutdown timeout,
while the conductor is also removed from the hash ring so new tasks are
not started on that conductor.
This change introduces the api but the existing behaviour won't change
until BaseConductorManager.del_host() no longer calls keepalive_halt().
Change-Id: Iedd62193fac1009137b9ee47a6ef5a9a8576f261
Especially in a single-conductor environment, the number of threads
should be larger than max_concurrent_deploy, otherwise the latter cannot
be reached in practice or will cause issues with heartbeats.
On the other hand, this change fixes an issue with how we use futurist.
Due to a misunderstanding, we ended up setting the workers pool size to
100 and then also allowing 100 more requests to be queued.
To put it shortly, this change moves from 100 threads + 100 queued to
300 threads and no queue.
Partial-Bug: #2038438
Change-Id: I1aeeda89a8925fbbc2dae752742f0be4bc23bee0
Adds basic testing for PXE/iPXE boot scenarios where the OVN
DHCP service is used instead of dnsmasq.
Also adds a release note and documentation to cover the details
and caveats of using ovn as we have discovered through this process.
Change-Id: I28cd20a7f271220d8ca335895ca9e302452fd069
They are huge, may expose sensitive data and are normally stored in
local files instead. Match the inspector behavior and drop logs.
Change-Id: I569ef8c7f9d78a7a65c48b6b46c12493c5c571c3
The kickstart unit tests were written in such a way that if
the tests are run on a system with kickstart validator present,
then the test behavior is different (and fails) than if it runs
without. Specifically, when it is present, an error is generated:
TypeError: write() argument must be str, not MagicMock
This is because we pass in a mock value for unit testing.
Removes the alternative path taken when the validator is present
during unit testing, and locks the test into the validator-absent
case, which simplifies the validation path for the kickstart interface.
Change-Id: Idfb6b4f3b49901aa1a222c6fedc4367ef3bfd2a2
Adds the 'raid-device' and 'root-device' inspection hooks in the agent
inspect interface for processing data received from the ramdisk at
the /v1/continue_inspection endpoint.
Story: #2010275
Change-Id: I075ccd93a312b8bb17a36527e6c5d56386bb5c23
Adds the 'memory', 'pci-devices', and 'physical-network' inspection
hooks in the agent inspect interface for processing data received from
the ramdisk at the /v1/continue_inspection endpoint.
Change-Id: I67631ec5b94d1b29afcdc9a971b1052cf35bda1f
Story: #2010275
Adds these inspection hooks in the agent inspect interface for
processing data received from the ramdisk at the
/v1/continue_inspection endpoint: 'accelerators', 'boot-mode',
'cpu-capabilities', and 'extra-hardware'.
Change-Id: I63a528eba15391292c841693d6a0cc2f3b683720
Story: #2010275
Add file to the reno documentation build to show release notes for
stable/2023.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.
Sem-Ver: feature
Change-Id: Ia3d5f975e26f33b9f43610dd46246b5da03bc10e
If redfish_address is in brackets, unwrap it and check
that it is a valid IPv6 address. If that is the case use
the unwrapped address to avoid "Name or service not known".
Also add a unit test for normal_ipv6_as_url.
Closes-Bug: #2036455
Change-Id: I8df20e85e40d8321bd5f88c09fae33b6015bcf51
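A minimal sketch of the unwrap-and-check step described above, using the standard library `ipaddress` module. The helper name `unwrap_ipv6` is hypothetical and is not the function added by the change.

```python
import ipaddress


def unwrap_ipv6(address):
    """Return the bare IPv6 address if address is a bracketed IPv6 literal."""
    if address.startswith("[") and address.endswith("]"):
        candidate = address[1:-1]
        try:
            ipaddress.IPv6Address(candidate)
        except ValueError:
            # Not IPv6 inside the brackets: leave the input untouched.
            return address
        return candidate
    return address
```

Anything that is not a bracketed, valid IPv6 literal is returned unchanged, so hostnames and IPv4 addresses keep working as before.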
When parsing redfish driver info wrap IPv6 address in brackets
before appending default scheme/authority.
Updated common.utils.wrap_ipv6() to ignore ValueError, e.g
simply return the string if ip is not an ipv6 address string.
Related: RHBZ#2239356
Closes-Bug: #2036454
Change-Id: Icefd96d6873474b4cfb7fbf3d8337cd42fd63ca6
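A minimal sketch of the behaviour described for wrap_ipv6(), written against the standard library only. It is named `wrap_ipv6_sketch` to make clear that it is an illustration and not the actual body of common.utils.wrap_ipv6().

```python
import ipaddress


def wrap_ipv6_sketch(ip):
    """Wrap an IPv6 address in brackets; return anything else unchanged."""
    try:
        if ipaddress.ip_address(ip).version == 6:
            return "[%s]" % ip
    except ValueError:
        # Hostnames and other non-IP strings are passed through as-is.
        pass
    return ip
```

Swallowing the ValueError is what lets a caller pass hostnames, IPv4 and IPv6 addresses through the same code path before a scheme or port is appended.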
Generally, print should not be used for unit tests, it may pollute
the output stream. Right now, our internal build system is facing
BlockingIOError: [Errno 11] write could not complete without blocking
on prints. Especially ACL tests seem to be a big offender because they
are very numerous and each may print several times. Many prints in other
tests are cryptic and probably just leftover from debugging.
I only leave the API unit tests where the output is arguably useful.
But I reduce it to one print per call since the input is already known.
Change-Id: Ic5aaf9624f86b39609e2db6157c98cf8e35712fc
We have discovered hardware that only applies boot mode / secure boot
changes during a reboot. Furthermore, the same hardware cannot update
both at the same time. To err on the safe side, reboot and wait for
the value to change if it's not changed immediately.
Co-Authored-By: Jacob Anders <janders@redhat.com>
Change-Id: I318940a76be531f453f0f5cf31a59cba16febf57
Ubuntu Focal was in the testing runtime on a best-effort
basis in the 2023.1 cycle. In 2023.2, we do not need to
test Focal as such. Remove its testing to focus more
on making Jammy testing more stable.
[0] https://review.opendev.org/c/openstack/tempest/+/884952
Change-Id: Ia3a9bfb6287fd283c3eeb49b43d2c0d12420596d
Two issues have occurred:
1) Zuul has decided some syntax is deprecated and generates an error.
The exclusionary nature of the syntax is just not supported by RE2,
which is the new requirement, so we now explicitly match "^master$"
as opposed to "not stable branches".
2) The snmp job is marked as non-voting; the root issue appears to be
ipxe or the VMs, unknown as of yet.
Change-Id: I68aa95eb1ed80a0fde1c29d708ebd606393481aa
For neutron to write the correct dnsmasq configuration
for IPv6 network boot, the `dnsmasq_enable_addr6_list`
option for neutron dhcp agent must be enabled.
Change-Id: I8ef978ad689f0d1c822bb724f1af58f6fd4c2d8b
Dbapi method _reserve_node_place_lock is a bit of a special
method. It has both a decorator to retry sqlite "database is locked"
issues, and an outer synchronized process fair lock
(from oslo.concurrency.lockutils), which ensures only *one* thread
is working on locks at a time.
Thing is, we can build contention when a stack of heartbeats
come in, because they are forced to execute in serialized fashion.
And while investigating some metal3 logs, we could see some lock
interactions are basically instant, and when things begin to
get backed up, we start seeing 10+ second gaps where we are
trying to get ahold of the database, and can't lock the node.
And looking at the code for the method, I realized we were *always*
re-querying the node, but never returning it after updating the node.
Apparently, so we can just log *if* there was an issue.
Instead, we now just consult the result set, and only re-query
to determine *who* holds the lock *if* we are operating without
SQLite, because with SQLite we can safely assume the lock came
from another thread.
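A rough illustration of "consult the result set instead of re-querying",
using the standard sqlite3 module. The table layout and the reserve_node
name are assumptions for the example; the real method goes through
SQLAlchemy and carries the retry decorator and process lock described above.

```python
import sqlite3


def reserve_node(conn, node_id, host):
    """Try to take the node lock; return True on success."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE nodes SET reservation = ? "
            "WHERE id = ? AND reservation IS NULL",
            (host, node_id))
    # rowcount tells us whether the UPDATE matched, no re-query needed.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, reservation TEXT)")
conn.execute("INSERT INTO nodes (id, reservation) VALUES (1, NULL)")
conn.commit()

first = reserve_node(conn, 1, "conductor-a")   # lock is free
second = reserve_node(conn, 1, "conductor-b")  # lock already held
```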
Change-Id: Ie606439670be21cf267eb541ce864711d2097207
Adds service steps on a variety of internal interfaces,
and begins to tie documentation together to provide clarity
on the use and purpose of service steps.
Change-Id: Ifd7241f06648c8d73c1b97fcf08673496f049f45
So it seems we can have weirdness here which
can get lost in the mix because heartbeat operations
get called with spawn_after on the task, and without
an error handler. The net effect seems to be we can
eat errors, which maybe we should log.
Logs a warning now and returns to the caller which will
exit the task.
Change-Id: I8d052e5d26396737bc7d807cbafdc317cfd2f21f
Julia, who has been looking at Metal3 container logs a lot, is tired of
not being able to easily figure out where on the timeline of the git repo
the container's source code came from. So instead of guessing
and having to figure it out based upon behavior/logs, just log the version
the software believes itself to be, using the existing version info,
similar to how we do it with IPA.
Change-Id: I3c76ddfb89b92d3d4bc29b7ccab4362604775568
Ages ago we supported pxelinux. Now, not really since it is long EOL.
And while troubleshooting bug # 2033430, we discovered we had option
210 in the DHCP payload from the server, which ended up being the
folder base path for a tftp client to self reference the structure,
but only with OVN.
Further troubleshooting with the neutron-dhcp-agent and dnsmasq
revealed we never actually sent that option to clients.
In other words, it was always redundant. Since excess
information could be part of the problem with grub, we're removing
it.
Change-Id: Iaa2f174b6082fadcab6635ca874fc5fae2fb4842
Eventlet expects to have green versions of every module loaded, including OS. The reasons we originally did not patch os are lost
to time, but there have been many releases of eventlet since, and we should return to a reasonable baseline.
Change-Id: Ia4113124b415bd647e3c984e587828eb5b612eee
The Redfish driver currently uses the SimpleStorage resource
to collect disk information.
The Storage resource is defined to provide more sophisticated
functionality than the SimpleStorage resource. Some Redfish
implementations may transition from SimpleStorage to
Storage.
So the Redfish driver's disk logic should be provided
through the Storage resource, with SimpleStorage compatibility.
This commit does 2 things:
* Use Storage resource instead of SimpleStorage, if possible
* Fix wrong disk indicator LED logic (SimpleStorage doesn't
support indicator LED operation)
Related-bug: 2032590
Change-Id: I28abd75a41059c23028717db6a9766a5164313c7
Debian bookworm is where our unit tests run for py3.11. This change
permits them to configure and run.
Change-Id: Ic51ca8df82552b9a8b6765cc0237f84e468e2fd8
Adds storage of the json-rpc port number to the conductor hostname
to enable rpc clients to understand which rpc services they need to
connect to.
Depends-On: https://review.opendev.org/c/openstack/ironic-lib/+/879211
Change-Id: I6021152c83ab5025a9a9e6d8d24c64278c4c1053
Remove the $ in the condition so that we don't attempt to
execute the output from ping (i.e. PING - unknown command)
Change-Id: Ic90f7c93d9a7b86fbf3f2cdef46bc1b2bbea489d
While the prior service steps patch had a huge portion of the
needed code already due to copy-pasta, this change finishes
wiring in the ability for the agent to be launched for service
steps and heartbeat to occur, combined with support to retrieve
service steps from the running agent, ultimately to enable
operators to take a deployed node, and ask Ironic to make changes,
or my more favorite use case, go benchmark it for a while.
Also edits the service steps release note to remove the outstanding
issue, and makes some minor corrections in the code which was copied
but didn't quite have testing wired up yet.
Change-Id: Ibfe42037b520a76539234cf1a5e19afd335ce8a8
Introduce config to allow setting default ramdisks per-architecture.
The hierarchy of the parameters is:
Node config > config by architecture > general config
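The lookup order can be sketched as below. The function name, the
'deploy_ramdisk' key and the dict-based per-architecture mapping are
assumptions for illustration, not the option names added by this change.

```python
def pick_ramdisk(node_driver_info, ramdisk_by_arch, default_ramdisk, arch):
    """Resolve the ramdisk using node > per-architecture > general order."""
    return (node_driver_info.get('deploy_ramdisk')
            or ramdisk_by_arch.get(arch)
            or default_ramdisk)


by_arch = {'aarch64': 'arm.ramdisk'}
from_node = pick_ramdisk({'deploy_ramdisk': 'node.ramdisk'}, by_arch,
                         'generic.ramdisk', 'aarch64')
from_arch = pick_ramdisk({}, by_arch, 'generic.ramdisk', 'aarch64')
from_default = pick_ramdisk({}, by_arch, 'generic.ramdisk', 'x86_64')
```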
Change-Id: I95dfece3e8f7bcd3121ac808985cb61997877a51
Update version requirement of python-scciclient
to greater than or equal to 0.15.0.
Since this version 0.15.0, python-scciclient uses
pysnmp lextudio.
Change-Id: I736707027002578ab87577c94b5e6c45ba1c5f72
Adds inspection hooks in the agent inspect interface for processing
data received from the ramdisk at the /v1/continue_inspection
endpoint. The four default configuration hooks 'ramdisk-error',
'validate-interfaces', 'ports' and 'architecture' are added.
(The remaining inspection hooks will be added in further patches.)
Change-Id: I2cf1be465ba7a93fd66881b14972e960acd4dd4e
Story: #2010275
This is a significant improvement and update to Ironic contributor
documentation, as an attempt to make it easier for new Ironic
contributors to onboard.
It is not perfect, but it's significantly better than the existing
documentation.
What this change does:
- Improve dev-quickstart guide, make it easier to find
devstack configurations.
- Removes information that can bit-rot over time and replaces with
more generic information.
- Provides an actual working, tested, Ironic+Nova devstack conf
What hasn't been done:
- Testing of Ironic BFV or Multitenant networking devstack confs
- Validation that the local development method still works
- There is a ton more information about how to use these testing
envs (both bifrost and devstack) which could be added.
- System prerequisites and Python prerequisites under the unit
tests section have bitrotted considerably; they have not been
significantly modified and will be fixed in a future commit.
Change-Id: I0cdfe50042fabb6b65633961fc418aff5d6ebfe3
In the agent token mechanism, restrictions exist on when an agent
token can be generated, and unfortunately this has to be done on
the conductor side involving a lock and a task because we need to
save the state of the node.
As such, we were in a situation where we were waiting on DB node
locking, which would prevent the agent from getting a node, and
potentially causing the lookup operation to fail, eventually.
We now quickly return NodeLocked which shouldn't cause the agent
any issues, although we need to improve error handling there as
well.
Change-Id: Ice335eed82b936753be99eedb16ceccf8a9a86a8
Load the database instance only once for all RPC
interactions, as opposed to upon each time we try
to find a random conductor.
Change-Id: I917537f041e794fd3c94e0f2c48d25eb4351d4c8
... And only execute the loader once.
We now use a cached dbapi instance, and then set the object
on the request to None, disconnecting the request from
the database, which doesn't need to be held
open any longer than absolutely necessary.
Interestingly... locally, my total unit test time dropped with this
change from an aggregate around 200 seconds to 130-140 seconds. It
seems there was quite a bit of overhead in that call to begin with.
Change-Id: Ibffbc0b14d3b908f772b2cac8c6fedae432b267a
... To not try instantly, but also not to wait forever to retry.
Also, the maximum delay is also now the proper setting to cause
the attempt to exit, and is only set to 10 seconds, with a fairly
tight interval for retries to occur within.
This change also doesn't abort retries for releasing a node lock
and updating a node, both actions if they halt due to the close
out of a task, can be catastrophic to the underlying operation
and state, because internal actions around locking can't be retried
with a long interval, otherwise things break in very bad ways.
Change-Id: I2041e90bb0f7f522bde4338eceda97f0ae8b2c35
Currently, it is not possible to use virtual media based provisioning on
servers that only support DVD MediaTypes and do not support CD
MediaTypes. This change adds support for DVD-only virtual media. In
addition to this, it adds handling of BadRequestException on attempt to
insert virtual media, so that instead of giving up on the first failure
encountered, the code continues to iterate through the list of other
virtual media devices available on the system and attempts to insert
media into the next suitable device till it either succeeds or runs out of
devices to try.
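The "keep trying the next suitable device" loop can be sketched as follows.
Everything here is a stand-in: devices are plain dicts and BadRequestError
is a local class, not the exception raised by the Redfish client library.

```python
class BadRequestError(Exception):
    """Stand-in for the BMC rejecting an insert request."""


def insert_media(devices, image_url, wanted=("CD", "DVD")):
    """Insert `image_url` into the first suitable device that accepts it."""
    for device in devices:
        if not set(wanted) & set(device["media_types"]):
            continue  # device supports neither CD nor DVD
        try:
            device["insert"](image_url)
        except BadRequestError:
            continue  # try the next suitable device instead of giving up
        return device["name"]
    raise RuntimeError("no suitable virtual media device accepted the image")


def _reject(url):
    raise BadRequestError(url)


devices = [
    {"name": "floppy", "media_types": ["Floppy"], "insert": lambda url: None},
    {"name": "vm1", "media_types": ["DVD"], "insert": _reject},
    {"name": "vm2", "media_types": ["DVD"], "insert": lambda url: None},
]
chosen = insert_media(devices, "http://example.com/boot.iso")
```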
Change-Id: I0bb0ad5df613df86cd8a7686f9c32e902826cd20
Closes-Bug: #2031595
A huge list of initial work for service steps
* Adds service_step verb
* Adds service_step db/object/API field on the node object for the
status.
* Increments the API version to 1.87 for both changes.
* Increments the RPC API version to 1.57.
* Adds initial testing to help ensure that supplied steps
are passed through and executed upon.
Does not:
* Have tests for starting the agent ramdisk, although this is
relatively boiler plate.
* Have a collection of pre-decorated steps available for immediate
consumption.
Change-Id: I5b9dd928f24dff7877a4ab8dc7b743058cace994
This ensures the options for oslo.versionedobjects library are
included in the file generated by oslo-config-generator.
Change-Id: Ib63c4dd1c14905ec200e67a8fe9ba5f20b160b08
On some hardware, supportedApplyTime attribute may not be listed
under Redfish BIOS settings URL. This patch adds handling of this case
to prevent failure on attempt of updating BIOS settings.
Change-Id: I40359973fd832146cb2b179bfa447a308078e83d
Adds support for SHA256 and SHA512 checksums to be passed
to firmware upgrade steps for the ilo hardware type.
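For illustration only, checksum verification with either algorithm can be
done with hashlib as below. The checksum_matches helper is hypothetical and
is not the ilo driver's code.

```python
import hashlib

SUPPORTED = ("sha256", "sha512")


def checksum_matches(data, expected_hex, algorithm):
    """Compare the digest of `data` against an expected hex checksum."""
    if algorithm not in SUPPORTED:
        raise ValueError("unsupported checksum algorithm: %s" % algorithm)
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.lower()


good = checksum_matches(b"fw", hashlib.sha512(b"fw").hexdigest(), "sha512")
bad = checksum_matches(b"fw", "00", "sha256")
```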
Change-Id: I5455c4bfa4741a35b0ddada37298c897887e6cea
We've discovered we can deadlock on allocations, and reviewing
the code of both the test and the underlying db, it is sort of a
"multiple things contribute scenario", but first up here is to
streamline the allocations update process so we re-query after
closing out the transaction.
Change-Id: I46e78813787703819a61f69d4243271ec07e0983
Partial-Bug: #2028866
Somehow, we missed adding a test for this early on. While working on
validating the API client changes, I discovered that we seemed to
be missing unit tests for the method, because I couldn't get it
to work for the client, but then I realized I was just mixing
the nature of use up. C'est la vie!
Change-Id: I382ad3a011f3151abedb896f58c75e8b76b357fa
Adds a wait step to allow for finer grained workflows
and forcing interruptions which may be needed in some
cases with specialized hardware.
Change-Id: Idc338b761ebe35a4635022a324ca5acbf29fc462
Ramdisk and anaconda deploys share image processing by
ironic.common.pxe_utils.get_instance_image_info(). This method
unnecessarily processes ks_template property when ramdisk
deploy is in use, and by itself calls _get_image_properties()
which requires image_source property to exist.
For ramdisk deploy it is enough to have kernel and ramdisk.
This patch adds conditions to process ks_template property only
for anaconda deploy.
Change-Id: I5f88d3b1da1c17bc26d49370cc6ce74644d13679
logging has a method debug, and a static variable DEBUG, which are entirely
different. Tenacity requires the integer value which is passed in for logging
actions.
This is rooted in the examples published on tenacity logging, since the lower
level caller doesn't log in many cases, and our testing didn't catch this precise
case because we were only validating that logging was called, not that logging
worked. We also didn't see any of the errors related to CI resource
contention being lessened when the patch was running, and many of the chances
of lock conflicts being reduced in other fixes.
Duplicates the retry test *without* the logging mock, as not mocking the logging
would have yielded the break.
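The difference between the two names is easy to show with the standard
library alone. The logger name below is arbitrary.

```python
import logging

# logging.DEBUG is the integer level; logging.debug is a function.
level = logging.DEBUG
assert isinstance(level, int) and level == 10
assert callable(logging.debug)

# Logger.log() expects the integer level as its first argument.
LOG = logging.getLogger("retry-demo")
LOG.log(logging.DEBUG, "retrying database operation")
```

Passing the function where the integer is expected only fails once a log
call is actually made, which is why a mocked logger hid the problem.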
Change-Id: I4a65a044e90aff3cffae24f191e425bc75b5fb74
Adds a database retry decorator to capture and retry exceptions
rooted in SQLite locking. These locking errors are rooted in
the fact that essentially, we can only have one distinct writer
at a time. This writer becomes transaction oriented as well.
Unfortunately with our green threads and API surface, we run into
cases where we have background operations (mainly, periodic tasks...)
and API surface transactions which need to operate against the DB
as well. Because we can't say one task or another (realistically
speaking) can have exclusive control and access, then we run into
database locking errors.
So when we encounter a lock error, we retry.
Adds two additional configuration parameters to the database
configuration section, to allow this capability to be further
tuned, as file IO performance is *surely* a contributing factor
to our locking issues as we mostly see them with a loaded CI
system where other issues begin to crop up.
The new parameters are as follows:
* sqlite_retries, a boolean value allowing the retry logic
to be disabled. This can largely be ignored, but is available
as it was logical to include.
* sqlite_max_wait_for_retry, an integer value, default 30 seconds
as to how long to wait for retrying SQLite database operations
which are failing due to a "database is locked" error.
The retry logic uses the tenacity library, and performs an
exponential backoff. Setting the amount of time to a very large
number is not advisable, as such the default of 30 seconds was
deemed reasonable.
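The actual decorator is built on tenacity. The sketch below swaps that for
a plain standard-library loop so it stays self-contained, and only
illustrates the "retry on database is locked with exponential backoff, up
to a maximum wait" behaviour. The names and default delays are assumptions.

```python
import functools
import sqlite3
import time


def retry_on_locked(max_wait=30.0, first_delay=0.1):
    """Retry a function while SQLite reports "database is locked"."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = first_delay
            deadline = time.monotonic() + max_wait
            while True:
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    if "database is locked" not in str(exc):
                        raise  # unrelated error, do not retry
                    if time.monotonic() + delay > deadline:
                        raise  # out of time, surface the lock error
                    time.sleep(delay)
                    delay *= 2  # exponential backoff
        return wrapper
    return decorator


attempts = []


@retry_on_locked(max_wait=1.0, first_delay=0.01)
def flaky_write():
    """Fail twice with a lock error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise sqlite3.OperationalError("database is locked")
    return "written"


result = flaky_write()
```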
Change-Id: Ifeb92e9f23a94f2d96bb495fe63a71df9865fef3
When debugging the logs, it is sort of difficult to know when
actions complete for a specific periodic task.
Example: Power sync, which includes an explicit sleep(0) to
yield control. The explicit sleep *is* important as it is
a low priority task. Where this quickly mixes things up
is when we're hunting database locking issues, and we
*must* verify that we *have exited* part of the code
completely.
Change-Id: Ie9eddedf6bca603845ff14e1ccbda665e3b9e5bd
Disables internal heartbeat mechanism when ironic has been
configured to utilize a SQLite database backend.
This is done to lessen the possibility of a
"database is locked" error, which can occur when two
distinct threads attempt to write to the database
at the same time with open writers.
The process keepalive heartbeat process was identified as
a major source of these write operations as it was writing
every ten seconds by default, which would also collide with
periodic tasks.
Change-Id: I7b6d7a78ba2910f22673ad8e72e255f321d3fdff
The versions only differ in the first paragraph, and the supposedly
common parts actually have different code paths for different distros.
Also be realistic about which distros we support.
Change-Id: Ifcc19a20d42f384300cadf442951739be8682047
Adds the logic and testing to handle vendor interfaces to be able
to be called as steps, as well as adds the ipmitool send_raw
vendor passthru method to be able to be called as a step.
Change-Id: I741a4173f1d150298008d3190e4c3998402a8b86
Currently, there's no information printed about the style of failure
when we cannot properly change the power status on an iLO server. Now,
we ensure the actual server state and the expected server state at the
BMC level is logged, helping with troubleshooting edge cases.
Related-bug: 2021995
Change-Id: I77dc69ef4dd42e5ad674f5c00a4500027ef030ec
Only port creation/updating/deletion logic has been replicated from
ironic-inspector, as well as the add_ports and keep_ports options.
In the future patches, the added code will become a part of processing
hooks.
Change-Id: I69d6a1a53c5bf9e0f41d1a5bce7215edeea54b22
No real inspection is done: it only accepts data and returns success.
Common code has been extracted from the existing inspector-based
implementation.
Change-Id: I7462bb2e0449fb1098fe59e394b5c583fea89bac
An issue previously existed where periodics would cause an open
transaction to exist with the database which would cause issues
when attempting to write to the database.
This issue has been fixed by assembling the data to return to
the calling method, such that an open transaction does not
remain, by copying the data retrieved from the database,
thus disjointing it from the transaction.
Closes-Bug: #2027405
Change-Id: I6401193b04fd3be78c37433bfdd0ccbd92aac8da
FirmwareInterface base
New Config options [default]
- enabled_firmware_interfaces
- default_firmware_interface
New FirmwareInterface base with update method
Implementations of FirmwareInterface
- FakeFirmware (fake)
- NoFirmware (no-firmware)
New entrypoint ironic.hardware.interfaces.firmware
* fake and no-firmware
Api Controllers
- Updated: driver/node/utils/versions
- Created: firmware
Unit tests
api-ref for Node Firmware
Fake and Noop implementation for FirmwareInterface
Change-Id: Ib3b9cb22099819f97d5eab1e3f1b670cb91cbb25
Investigation of our standalone test job issues, where jobs would
fail, hosts not get DHCP updates, and ultimately IPXE would
fail prior to getting a valid or the expected response,
revealed the discovery that dnsmasq was crashing often when
the port updates were going through, ultimately preventing
the multi-scenario test jobs from running as the standalone
jobs represent a number of different scenarios which are
executed across a pool of test machines.
In this case, the path forward appears to be to downgrade
dnsmasq to stabilize our CI and allow us to otherwise upgrade.
This patch adds the focal updates as a package source,
and installs the dnsmasq package.
Related-Bug: #2026757
Change-Id: Iacfd1ab677c612525601afcaeee5e5b067206ff3
Cleanup if images.fetch fails as in some cases, we might get a stale
.part file that is incomplete and corrupted (ie: full disk due to image
conversion) and this prevents future deployment from working.
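The cleanup pattern can be sketched as below, assuming a fetch callable
that writes to a given path. fetch_to_cache and the failing fetch are
hypothetical; the real code wraps images.fetch.

```python
import os
import tempfile


def fetch_to_cache(fetch, dest_path):
    """Download to `dest_path`.part and only keep the file on success."""
    part_path = dest_path + ".part"
    try:
        fetch(part_path)
    except Exception:
        # Remove the incomplete file so later attempts start clean.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise
    os.rename(part_path, dest_path)


def _failing_fetch(path):
    """Simulate a download that dies after writing partial data."""
    with open(path, "w") as f:
        f.write("partial")
    raise OSError("No space left on device")


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "image")
try:
    fetch_to_cache(_failing_fetch, target)
except OSError:
    pass
leftover = os.path.exists(target + ".part")
```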
Co-Authored-By: Julia Kreger <juliaashleykreger@gmail.com>
Change-Id: I53bfd0dcc6289e51316795fbe352c70d608e4f31
So, I've long wondered if we still have some spanning tree behavior
going on in CI. Turns out we might, but we just rely upon the defaults
which creates a variable.
Anyway, regardless, I found some details in the ovs-vsctl manual[0], and
well, let's set the options!
[0]: http://www.openvswitch.org/support/dist-docs/ovs-vsctl.8.html
Change-Id: I8f229fa6e738a69a668d4b891723431b2da362fa
Major:
- Result's first() does not raise NoResultFound, it returns None.
- Avoid opening a read transaction while a write transaction is active.
Minor:
- Do not handle NoResultFound where it cannot happen (or is already
handled).
- Only load the fields we use when fetching intermediate results.
- Do not load the node the 2nd time when releasing: the detection of
the exact error is racy anyway.
Change-Id: I099c7b782e0a633908383958e7aed2b17c8c3864
* Updates API version to 1.85 to permit an ``unhold`` verb
* Adds the ``deploy hold`` and ``clean hold`` provision states
to the internal state machine.
* Adds on documentation on steps to help provide greater clarity
to Ironic's users on how to utilize steps. It should be noted
this documentation also includes the power state reserved step
names from the DPU functionality patch.
* Fixes the state machine diagram. Changes type to PNG as SVG
rendering is broken due to python libraries utilized for SVG
generation which do not work on more recent Python versions.
Change-Id: I34f58f4e77e7757b89247fd64f5fcde26f679453
Disable irmc virtual media test_prepare_instance_with_secure_boot
and test_clean_up_instance_with_secure_boot
Related-Bug: #2025424
Explicitly close out test connection
When creating a test database, we should follow the same
pattern we know to be good, where an orphaned handler is not
left in memory to close out a connection.
Change connection style in dbTestBase to mirror how we do
database connections so they close out when we are done
with them.
Change migrations timeout to be >60 seconds
In local testing, I found the migrations tended to take an
average of 70 seconds. Granted, my test machine is old, and slow
but the performance is very similar to a busy cloud provider.
As such, increase the timeout to a larger value so we can enable
the double migration test again.
Also use BASE_TEST_TIMEOUT as time limit for unit tests, failing
hard if that's passed.
Co-Authored-By: Julia Kreger <juliaashleykreger@gmail.com>
Change-Id: I84802be2e75751fe44ba2e1b60e60563cd276483
All database migration testing in openstack is done through
an opportunistic worker model, where if the database is available
and correctly configured for testing, i.e. openstack-citest user
and access appropriately granted, then the tests will create and
test migrations.
However, this has been problematic with mysql as of recent, as we
have seen a long standing migration issue boil to the surface often
with tests.
As a result, we're isolating that test down to its own job so we
can limit the blast damage. This also helps us determine whether the
issue affects all of the tests, or is solely isolated to the mysql
test run class, which is an additional data point.
By default, we continue to run Postgres migration tests in the
main jobs, as they haven't been impacted by this issue.
Change-Id: Iefc044c31ef029e400a7dad294504175a4462638
We should try to run coverage tests in the same conditions as
unit tests using the same tox environment.
Change-Id: I07341a5e16264964f4d165b9b5ef80dc35293a17
* Correct the policy name for /continue_inspection ("node" vs "driver")
* Use the right version constant (83 was used before a rebase).
Also add missing unit tests (that would have found the issues).
Change-Id: Ib3eee48eb3ca4a52bfc6ec70491b01636516d6a9
We have both Invalid and BadRequest which result in HTTP 400 and 500
respectively. The latter is clearly incorrect.
Then we have NotFound and HTTPNotFound which mean the same thing.
Alias exceptions in both cases with the intention to drop one copy.
Finally, NotAuthorized and Unauthorized result in HTTP 403 and 500
again. Fortunately, the latter is not used and can be removed.
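Aliasing in this sense just means binding both names to one class, so they
share the same HTTP code. The classes below are simplified stand-ins and
not Ironic's exception hierarchy.

```python
class Invalid(Exception):
    """Simplified stand-in carrying the HTTP status code."""
    code = 400


# Alias: both names now refer to the same class and the same HTTP code.
BadRequest = Invalid

try:
    raise BadRequest("bad input")
except Invalid as exc:
    caught_code = exc.code
```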
Change-Id: If9d571792a8617dd6ecf17e163dea252cb0f7fae
The default test runner behavior is to distribute various classes
across many runners, which is fine. But when you have lots of setup
to ensure an environment is correct for tests in a class, you
generally want to execute that together instead of separately.
As such, set the --parallel-class execution option for stestr.
Change-Id: I5a65accfb7e2033690b2934d874141db7f4bf383
The PXE Anaconda dhcp cleanup test triggers the dhcp_factory clean
up code by default. Which is good! Problem is, if you don't have
dnsmasq installed, things blow up.
Specifically because it was called in such a way where it was
trying to clean up dhcp records for nodes. Example:
ironic.common.exception.InstanceDeployFailure: An error occurred
after deployment, while preparing to reboot the node
1be26c0b-03f2-4d2e-ae87-c02d7f33c123: [Errno 2] No such file
or directory:
'/etc/dnsmasq.d/hostsdir.d/ironic-52:54:00:cf:2d:31.conf'
Instead of executing that far, we just now check that we did, indeed
call for dhcp cleanup.
This was discovered while trying to fix unit test race conditions
and random failures in CI.
Change-Id: Id7b1e2e9ca97aeff786e9df06f35eca67dd36b58
Adds the following methods to DB API:
* create_firmware_component
* update_firmware_component
* get_firmware_component
* get_firmware_component_list
FirmwareComponent
* create | save | get
FirmwareComponentList
* get_by_node id | sync_firmware_components
Adds two exceptions:
* FirmwareComponentAlreadyExists
* FirmwareComponentNotFound
Tests for db and objects
Changes were required in models, the class name should match the
object name we will create
Story: 2010659
Task: 47977
Change-Id: Ie1e2a4150d4ee4521290737612780c02506f4a9e
In local testing, I found the migrations tended to take an
average of 70 seconds. Granted, my test machine is old, and slow
but the performance is very similar to a busy cloud provider.
As such, increase the timeout to a larger value so we can enable
the double migration test again.
Also use BASE_TEST_TIMEOUT as time limit for unit tests, failing
hard if that's passed.
Change-Id: Ida495d59bde977a20590c34e282c884a65ceab43
We have started to notice an SAWarning from sqlalchemy indicating:
SAWarning: Cannot correctly sort tables; there are unresolvable
cycles between tables "allocations, nodes", which is usually
caused by mutually dependent foreign key constraints.
Foreign key constraints involving these tables will not be
considered; this warning may raise an error in a future release.
Hunting this down, it appears to be the two data consistency Foreign
Key constraints in the "allocations" table where an allocation would
try to have a conductor_affinity value mapped to conductors.id
and also have a direct association to a node, which *also* had the
same constraint.
And then similarly, mapping in reverse, asserting a fk constraint,
when nodes also had its own constraint back on allocations.
Sort of a circular loop.
Anyhow, removes it, and adds a db migration to remove the two
constraints.
Change-Id: I5596008e4971a29c635c45b24cb85db2d0d13ed3
In all the concern about the WAL pragma change with unit test issues,
I've decided that maybe the path is to just turn it off for unit testing.
Change-Id: I67bbdb158ad1c8350f1e613ac0afb861ccea00a0
With counts declared outside the context manager, it means values
in counts that come from the sqlalchemy results object are still in
scope when the context manager exits! Oh no!
Co-Authored-By: Clif Houck <me@clifhouck.com>
Related-Bug: https://bugs.launchpad.net/ironic/+bug/2023316
Change-Id: Ifd66444f73355d912c52905a0f04748284b25c1b
Another of our continuing attempts to find the magic stop sign plaguing
our CI runs.
Related-bug: https://bugs.launchpad.net/ironic/+bug/2024941
Change-Id: I3626a12bad3299ace2991550cafd92f4f0142434
Two issues existed in our test migrations:
1) We took a sqlalchemy orm-ey object and returned it. The handler closed
but the connection stays open.
2) In the test we mocked data, saved data, checked data, but there is a
constraint that would prevent nodes from being deleted before the
firmware_information data.
Also fixes some additional assignment of object value
pattern of usage elsewhere in the test migrations as pointed
out in code review.
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: I91429f33ce01f40ecd665b2f76b27a162354bc6f
The MIGRATIONS_TIMEOUT value is the timeout for the migrations tests
in seconds.
It can now be set as an environment variable and defaults to 60 seconds.
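Reading such a setting typically looks like the sketch below. The helper
name is hypothetical; only the MIGRATIONS_TIMEOUT variable name and the 60
second default come from this change.

```python
import os


def migrations_timeout(environ=os.environ):
    """Return the migrations test timeout in seconds (default 60)."""
    return int(environ.get("MIGRATIONS_TIMEOUT", 60))


default_value = migrations_timeout({})
overridden = migrations_timeout({"MIGRATIONS_TIMEOUT": "120"})
```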
Change-Id: I20045a7c0c08a37038b4707791f6b67f715c4877
Fails a test after OS_TEST_TIMEOUT seconds, default to 30 seconds.
Based on OS_TEST_TIMEOUT from oslotest
See https://github.com/openstack/oslotest/blob/master/oslotest/base.py#L35-L45
for more info
Also increase log output for improved troubleshooting.
Change-Id: Ibcdca2c449970b5a2ecdf2e3bb3bb900881b6d7c
This test has been taking an inordinate amount of time to complete. We
should figure out the root cause and fix it, but in the meantime it only
causes us harm to be unable to land patches.
Related-Bug: 2023316
Change-Id: I604369e000c80914cf0c584c9deab7245c66b1b4
This change fixes the worst offenders, potentially reducing the test
runs by more than 10 seconds.
I could not fix iLO tests though because they heavily rely on time.sleep
being precise.
Change-Id: I10d7845700275d9d03b98ebadd0f12540f1e7656
When a node is inspected more than one time and the database is
configured as a storage backend, a new entry is made in the database
for each inspection result (node inventory). This patch handles this
behaviour as follows:
By deleting previous inventory entries for the same node before adding
a new entry in the database.
By retrieving the most recent node inventory from the database when the
database is queried.
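Both behaviours can be illustrated with the standard sqlite3 module. The
table and function names are assumptions for the example, not the schema
or DB API methods used by Ironic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_inventory ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, node_id INTEGER, data TEXT)")


def store_inventory(conn, node_id, data):
    """Replace any previous inventory for the node with the new one."""
    with conn:  # delete and insert happen in one transaction
        conn.execute("DELETE FROM node_inventory WHERE node_id = ?",
                     (node_id,))
        conn.execute(
            "INSERT INTO node_inventory (node_id, data) VALUES (?, ?)",
            (node_id, data))


def latest_inventory(conn, node_id):
    """Return the most recent inventory for the node, or None."""
    row = conn.execute(
        "SELECT data FROM node_inventory WHERE node_id = ? "
        "ORDER BY id DESC LIMIT 1", (node_id,)).fetchone()
    return row[0] if row else None


store_inventory(conn, 1, "first inspection")
store_inventory(conn, 1, "second inspection")
count = conn.execute("SELECT COUNT(*) FROM node_inventory").fetchone()[0]
```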
Change-Id: Ic3df86f395601742d2fea2bcde62f7547067d8e4
This change creates all necessary parts to processing inspection data:
* New API /v1/continue_inspection
Depending on the API version, either behaves like the inspector's API
or (new version) adds the lookup functionality on top.
The lookup process is migrated from ironic-inspector with minor changes.
It takes MAC addresses, BMC addresses and (optionally) a node UUID and
tries to find a single node in INSPECTWAIT state that satisfies all
of these. Any failure results in HTTP 404.
To make lookup faster, the resolved BMC addresses are cached in advance.
* New RPC continue_inspection
Essentially, checks the provision state again and delegates to the
inspect interface.
* New inspect interface call continue_inspection
The base version does nothing. Since we don't yet have in-band
inspection in Ironic proper, the only actual implementation is added
to the existing "inspector" interface that works by doing a call
to ironic-inspector.
Story: #2010275
Task: #46208
Change-Id: Ia3f5bb9d1845d6b8fab30232a72b5a360a5a56d2
Leave the snmp job on focal for the time being as it's failing on jammy
and we need to move forward with the migration.
Change-Id: I0b9b600c3eb10761054abdb9c13d7107269001b9
Add to the information collected by Redfish hardware inspection from
sushy, and store it in the documented hardware inventory format
Change-Id: I651599b84e6b8901647960b719626489b000b65f
Now that we have autocommit disabled, we need to make the metal3 job
voting to ensure we don't accidentally break sqlite support in the
future.
Change-Id: I4915dc23b1101b9b799f392434f237e5ccb323e4
Allows steps to be executed on child nodes, and adds
the reserved power_on, power_off, and reboot step names.
Change-Id: I4673214d2ed066aa8b95a35513b144668ade3e2b
It appears nova's policies have changed, or to be more precise,
they have turned on new policy enforcement[0] and our plugin
was wrong.
+++ /opt/stack/ironic/devstack/lib/ironic:
ironic_configure_tempest:3205 :
oscwrap --os-cloud devstack-system-admin flavor show baremetal -f value -c id
ForbiddenException: 403: Client Error for url:
https://173.231.254.232/compute/v2.1/flavors/
e4312534-b349-4f70-9a1b-5806debff275/os-extra_specs,
Policy doesn't allow os_compute_api:os-flavor-extra-specs:index to be performed.
[0]: dfd7aeaf6c
Change-Id: I8070852fbe9346e346c50088537797f353753d02
It appears the push to Cirros 0.6.1 has reoccurred, and we now
have things failing as a result.
Specifically ironic-grenade is trying to run with Cirros 0.5.2,
yet the file is not found later on.
Anyhow, an explicit pin should resolve this.
Change-Id: I97a1403820c8dbe633cf1d529adc79e8af463e80
The get_not_version method was put in, but never used, the
underlying upgrade check code just explicitly passes the
appropriate query arguments and doesn't use the helper.
Why now? Because it used model_query.
Change-Id: I697f13486c2e11cd04b4f381f3066eeb154ea2a3
Removing model_query usage as it returns an object which
does not get cleaned up and release any held transactions
until it is garbage collected.... which is likely okay for
most, but not okay with newer python and our support of
sqlite.
Change-Id: I792f8d3069959e6801a40d18b3cba258c46fc5ef
The db field value version check, which is a preflight to
major upgrades (to detect if a prior upgrade was not completed)
was using model_query, which could orphan an open transaction in
the same process until the python interpreter went and took out
the proverbial trash.
We now use an explicit session which structurally ensures we close
any open transactions which allows a metadata lock to be obtained
to perform a schema update.
Change-Id: Id51419bc50af5a756bb7b0ca451df1936dd6f904
Adds the parent node support and tests in one change
including all DB/Model/API changes along with RBAC and
basic API tests.
* Updates the API version to 1.83
* Adds parent_node and related index to the nodes table.
* Adds new API parameters to list by parent node relationship.
Depends-On: https://review.opendev.org/c/openstack/ironic/+/883967
Change-Id: I8d64fee7105718199986db4994e13352d639f04f
This patch deals with an overlooked situation in patch
d23f72ee50, in which
`irmc_ipmi_succeed` flag is added to deal with iRMC
firmware's IPMI incompatibility introduced at iRMC
firmware version S6 2.00 and later.
That flag is set and updated by `irmc` power_interface
code and the rest of the iRMC driver code uses that flag.
When `ipmitool` is set as power_interface, that flag
is neither set nor updated and the rest of the iRMC driver code
fails to handle the IPMI incompatibility correctly.
This patch adds logic to check power_interface to
make iRMC driver properly deal with iRMC firmware's IPMI
incompatibility even when `ipmitool` power_interface
is used.
Change-Id: Id353c4f5260a7c469779b50ad302f442223df5a0
Previously, we did query object handlers with model_query
where we would then return the handler, however that was problematic
because we returned ORM objects.
Now we have selects, but the data has to be extracted to a set before
we close out the reader session with the database. Returning the
overall object means that the call seems to sometimes not completely
execute on newer python versions in unit tests, which creates unit
test failures.
We now hold off the database query and operation to a variable,
exit the reader, and return the variable.
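A minimal sketch of that pattern, using the standard library sqlite3
module rather than Ironic's actual SQLAlchemy session handling. The
"nodes" table and the list_node_names helper are hypothetical; the point
is only that the rows are extracted before the reader is closed.

```python
import sqlite3
from contextlib import closing


def list_node_names(db_path):
    """Return node names as a plain list, never a live cursor."""
    # Hypothetical schema: a "nodes" table with a "name" column.
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute("SELECT name FROM nodes ORDER BY name")
        # Extract the data while the connection is still open ...
        names = [row[0] for row in cursor]
    # ... and only return the result once the reader has exited.
    return names
```

Returning the cursor itself would leave the caller iterating over a
connection that has already been closed.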
Change-Id: I6ae9e4442bcb473ab5ccff6e167bf61f3daa0f7e
We've been able to observe one of the scenario test jobs failing
due to tinycorelinux being inaccessible. Possibly on an IPv6 only
test VM. Turns out tinycorelinux's main page is only accessible via
IPv4.
As such, I've changed the mirror to a mirror which is accessible via
IPv6 and which I've verified works for me.
Change-Id: I2b4ccd16189038ce2f054d7403775b012796aea3
Disabling the performance counters as we suspect it is causing
database interaction to freeze on the grenade CI job.
Change-Id: Id951815ab9bfd1ca16aa66fa4c87c0e1b3e788f6
One of our tests is super-friendly to race because it relies on changing
an object, but that object not being saved yet to verify the difference.
However, that object and even similar changes can occur in the database
and trigger the test to fail.
So instead of using the shared resource created on test startup, create an
isolated, dedicated resource for it to use inside of the test, which should
eliminate the entire issue. I hope.
Change-Id: Ia0df1c211d5510777c6055f1341335f9bc5c5006
In the recent change to cinder, to address CVE-2023-2088,
cinder changed the policy rules and behavior for unbinding,
or "detaching" a volume. This was because of a vulnerability
in compute nodes where a volume which was in use by a VM
could be detached outside of Nova, and nova wouldn't become
aware the volume was detached, and the volume could be accessible
to the next VM.
This vulnerability doesn't apply to bare metal operations as
volumes are attached to whole baremetal nodes with Ironic.
We now generate and use a service token when interacting with
Cinder which allows cinder to recognize "this request is
coming from a fellow OpenStack service", and by-pass
checking with Nova if the "instance" is managed by Nova,
or Not. This allows the volumes to be attached, and detached
as needed as part of the power operation flow and overall
set of lifecycle operations.
Related-Bug: 2004555
Closes-Bug: 2019892
Change-Id: Ib258bc9650496da989fc93b759b112d279c8b217
Until we're able to get the BFV job sorted, we need to unblock
the gate, and as such moving the BFV job to non-voting to allow
other contributors to make progress.
Change-Id: I045d58afe195f08823af3b1a2fa6eabb6efb63ca
The pysnmp library has not been maintained for 4 years now and it's
incompatible with recent libraries like pyasn1.
Its fork pysnmp-lextudio is regularly maintained, we should move
to that.
Change-Id: I3b7b5c0cf2f7d669b265669d27e0eaca0dd1fc2a
Seems we can now get a failure on jammy where the commit/rollback
has *already* occurred in some cases, causing test failures.
This seems to be a result of the pattern of returning the in-flight
db operation, instead of returning the result later.
Since we should only *really* be doing so where we explicitly
know we need to do so, I've gone ahead and changed the rest of
the returns, since we recently had to visit some of them for
the same basic reason.
Change-Id: I3c2033b05f8005891d96c3ca141a80838a8699a4
When enabling scope enforcement, the self_owned_node check could
generate a failure because the check internally can be touched
by both a project scoped and system scoped endpoint.
This change changes the tag in the policy so it doesn't prematurely
return an error to the API consumer.
Change-Id: I49e2f7f29eb98e5bb4e18614cea0aca726703f55
GET /v1/nodes/{node_ident}/management/indicators/{component}
as documented does not work. This calls ironic.api.
controllers.v1.node.IndicatorController.get_one and it will
raise InvalidParameterValue in ironic.api.controllers.v1.node.
IndicatorAtComponent unless `{indicator}@{component}` is given.
Change-Id: I5e7edb36b5f9dacf990215c05a84739d003a09b7
When doing a GET or PUT on an indicator the indicator
specified must use `indicator@component` syntax. For
example:
indicators/led@system or indicators/led-0@chassis
Ref: ironic.api.controllers.v1.node.IndicatorAtComponent
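A small sketch of how such an identifier can be split. This is not the
code in IndicatorAtComponent; split_indicator is a hypothetical helper
used only to illustrate the `indicator@component` syntax.

```python
def split_indicator(ident):
    """Split 'indicator@component' into (indicator, component)."""
    indicator, sep, component = ident.partition("@")
    if not sep or not indicator or not component:
        # Mirrors the documented behaviour: a bare name is rejected.
        raise ValueError(
            "expected <indicator>@<component>, got %r" % ident)
    return indicator, component
```

With this helper, "led@system" and "led-0@chassis" are accepted, while a
bare "led" is rejected.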
Change-Id: I6908544b52be88cbddf537c954dd5e097a0d83e6
Turns out more than one test was relying upon the object
change determination test. Modifies this test to use the
same pattern of behavior so we are avoiding racing.
Change-Id: I29ee6cab7320d13fcc2eeda27dae08aeb2d98b00
The test, periodically under certain CI race conditions, may
be handled as if there was not a change, which breaks the
test as it does not save a modified port, it uses the in-flight
list of changes to determine the correct path.
The challenge is, the list of changes may not recognize there has
been a change with the underlying object/db layer. So instead
of re-testing the library code, we just force the behavior by
replacing the method on the object in the test, as the
underlying method being tested is tested as part of the
oslo versioned objects code base.
Change-Id: Ic8f9b2384ab2f8f76299afce9806fbe93e350f0e
hashring uses time.time() to calculate intervals; replace it with
monotonic so it will not be affected by system time jump.
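A minimal sketch of interval tracking with time.monotonic(). The
IntervalTimer class is hypothetical and is not the hash ring code; it
only shows why a monotonic clock is the safer choice for measuring
elapsed time.

```python
import time


class IntervalTimer:
    """Tracks whether a refresh interval has elapsed."""

    def __init__(self, interval, clock=time.monotonic):
        self._interval = interval
        # time.monotonic() never goes backwards, so NTP corrections or
        # manual changes to the wall clock cannot skew the interval.
        self._clock = clock
        self._last = clock()

    def expired(self):
        return self._clock() - self._last >= self._interval

    def reset(self):
        self._last = self._clock()
```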
Change-Id: I17569359f4d2c0f2f24ca8b50773c4d210ed8deb
Anaconda stage2 location is not verified nor loaded when kernel
and ramdisk instance info on a node is already populated, due
to a condition removed by commit 4d653ac225.
This change implements a check based on stage2 label presence
that again allows stage2 to be processed separately from kernel
and ramdisk.
Also changes the incorrect parameter check_on_exit passed to
oslo_concurrency.processutils.execute() to the correct one,
check_exit_code.
Change-Id: If7bfc99f200fd0bfc9fbe2a4c8d039ba6ae8d0c5
Currently, if an attempt is made to fetch MAC address information using
OOB inspection on a Redfish-managed node and EthernetInterfaces
attribute is missing on the node, inspection fails due to a
MissingAttributeError exception being raised by sushy. This change adds
catching and handling this exception.
Change-Id: I6f16da05e19c7efc966128fdf79f13546f51b5a6
Patch Ic8b1d964f7be5784e01c89bfb6c0277ea82eec2d was developed
without the autocommit change in place, which should allow
Ironic to operate properly with sqlite, in a single process
standalone mode.
As such, we shouldn't need to keep autocommit turned on, but
we may need to put it back if we identify yet another issue,
which is entirely possible with major database refactors.
Change-Id: Icde231e9db3b7a9f59205505cd51a4064e41d746
Prior to this fix, we have been unable to run the Metal3 CI job
with SQLAlchemy's internal autocommit setting enabled. However
that setting is deprecated and needs to be removed.
Investigating our DB queries and request patterns, we were able
to identify some queries which generally resulted in the
underlying task and lock being held longer because the output
was not actually returned, which is something we've generally
had to fix in some places previously. Doing some of these
changes did drastically reduce the number of errors encountered
with the Metal3 CI job, however it did not eliminate them
entirely.
Upon further investigation, we were able to determine that the underlying
issue we were encountering was when we had an external semi-random
reader, such as Metal3 polling endpoints, we could reach a situation
where we would be blocked from updating the database as to open a
write lock, we need the active readers not to be interacting with
the database, and with a random reader of sorts, the only realistic
option we have is to enable the Write Ahead Log[0]. We didn't have
to do this with SQLAlchemy previously because autocommit behavior
hid the complexities from us, but in order to move to SQLAlchemy
2.0, we do need to remove autocommit.
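For illustration, this is how the Write Ahead Log can be switched on for
an on-disk SQLite database with the standard library sqlite3 module. It
is a sketch only: enable_wal and the example path are hypothetical, and
this is not how Ironic itself applies the setting.

```python
import os
import sqlite3
import tempfile


def enable_wal(db_path):
    """Switch an on-disk SQLite database to WAL journaling."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode that is now in effect.
        (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    finally:
        conn.close()
    return mode


# Hypothetical database location, used only for this example.
db_path = os.path.join(tempfile.mkdtemp(), "ironic-example.sqlite")
journal_mode = enable_wal(db_path)
```

In WAL mode readers do not block the writer and the writer does not
block readers, which is the behaviour needed with an external poller.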
Additionally, adds two unit tests for get_node_with_token rpc
method, which apparently we missed or lost somewhere along the
way. Also, adds notes to two Database interactions to suggest
we look at them in the future as they may not be the most
efficient path forward.
[0]: https://www.sqlite.org/wal.html
Change-Id: Iebcc15fe202910b942b58fc004d077740ec61912
Direct deploy interface with http source never removes instance
image after deployment, which keeps the image in the image cache
referenced and unable to be cleaned up even after it exceeds its TTL.
Change-Id: I3d8ee090007dc90b8868026ba24509719f81e4d7
The troubleshooting kernel command line option nomodeset
unfortunately changes the way framebuffer interactions work
with graphics devices which in some cases can result in kernel
memory to be used for graphics updates. When this happens on
some specific hardware common in rack mount servers with baseboard
management controllers, this can cause the memory bus to become
locked for a brief time while the graphics update is occurring.
This locked memory bus means disk IO can become blocked,
and network cards can overflow their buffers resulting in
packet loss on top of the latency incurred by the graphics
update executing.
As such, we've removed the nomodeset option from default usage and
added a note describing its removal to the documentation along
with a release note.
Change-Id: I9084d88c3ec6f13bd64b8707892758fa87dd7f86
This patch fixes a condition where iRMC driver interfaces would have
the FIPS enforcement logic check applied if the SNMP version was not
set to SNMP v3, even if the interfaces did not use SNMP.
With this patch, if FIPS is enabled, iRMC driver enforces SNMP
version to be version 3 only when any xxx_interface of iRMC
driver actually uses SNMP.
Story: 2010713
Task: 47879
Change-Id: I774c459a5e11b7cd01f7a65754d5a2c7cc573476
We have seen duplicate ip issues when leaving clean failed nodes
powered on. This patch allows operators to power down nodes that
enter clean failed state.
Change-Id: Iecb402227485fe0ba787a262121c9d6a048b0e13
Fix the warnings that oslo.versionedobjects has been emitting for years
now.
Change-Id: I53bd78d8b70f276d2ea8569f0ab1e7ce04f52fea
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Resolve the following SAWarning warning:
SELECT statement has a cartesian product between FROM element(s)
"foo" and FROM element "bar". Apply join condition(s) between each
element to resolve.
This was happening because we were filtering instances of
ConductorHardwareInterfaces by the state of the Conductor referenced by
the 'conductor_id' field *without* joining the Conductor table. By
adding the join, we can avoid this cartesian product.
Change-Id: I2c20d7a7c1de41d4d0057fabc1d953b5bfb5b216
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The metal3 integration job now builds the ironic container image
using the ironic code in the tested patch and runs
metal3-dev-env with it.
Change-Id: I0f339ad6931264875b32e11dff79c02a252d04b1
Launching test VMs can take a while, and grenade can fail
if the VM's networking is not quite online in under sixty
seconds. As such, it is reasonable to use a larger window
so the failure rate of ironic-grenade will hopefully decline.
Depends-On: https://review.opendev.org/c/openstack/grenade/+/879674
Change-Id: I07aead4b09ccb7e427a0d3d04e7a580cf4b00a95
Bandit 1.7.5 dropped with logic to check requests invocations.
Specifically if a timeout is not explicitly set, then it results
in an error.
This should cause our bandit job to go green.
Closes-Bug: 2015284
Change-Id: I1dcb3075de63aae97bb22012a54736c293393185
The current check is insufficient: it passes for Kubernetes shared
volumes, although hard-linking between them is not possible.
This patch changes the approach to trying a hard link and falling
back to copyfile instead.
The patch relies on optimizations in Python 3.8 and thus should not
be backported beyond the Zed series to avoid performance regression.
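The approach can be sketched as follows. link_or_copy is a hypothetical
helper, not the function changed by this patch; it only shows the "try
the hard link, fall back to copyfile" idea.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst, falling back to a copy.

    Returns True if a hard link was created, False if the file
    had to be copied.
    """
    try:
        os.link(src, dst)
        return True
    except OSError:
        # Covers cross-device links as well as filesystems that
        # refuse hard links outright.
        shutil.copyfile(src, dst)
        return False
```

Attempting the link and handling the failure avoids guessing in advance
whether two paths can be hard-linked.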
Change-Id: I929944685b3ac61b2f63d2549198a2d8a1c8fe35
Template databases are not designed to run random SQL code. They should
only be accessed to modify the template itself. Use postgres instead.
Change-Id: Id7d38895d8d04964557447ecbc6ca29f39f626c9
Some test tools and IDEs may create a .local directory inside the repo
with virtualenvs for dependencies, other tools may create . directories
or files for temporary reports.
While they can be removed later, or configured differently,
it's advisable to just exclude all files starting with . from the
flake8 tests to avoid confusion and possible unexpected errors.
Change-Id: I5dc2fd2dec9690c77babe0c2b1dc5d0991413b32
Unused by Nova and unlike memory_mb/local_gb also by Ironic (actually,
our usage of local_gb is worth double-checking as well, but at the very
least it's referenced by inspection implementations).
Change-Id: Ie8b0d9f58f4dcd102c183c30ae7f5acf68a5e4c3
Lookup returns generic 404 errors for security reasons. Logging is
the only way of debugging any issues during it.
Change-Id: I860ed6b90468a403f0f6cdec9c3d84bc872fda06
Currently, we silently fall back to ironic-inspector managing boot if
the boot interface cannot do it. What ironic-inspector does is set
the boot device to PXE and issue a reboot request. This was done
to keep backward compatibility with how inspection worked before managed
boot was introduced.
With in-band inspection migrating to Ironic proper, this "unmanaged"
mode becomes a more exotic case since it requires additional PXE
infrastructure. Additionally, the popularity of Redfish is rapidly
growing, and we support pre-populating ports when Redfish is used.
As such, the "unmanaged" mode should no longer be allowed by default.
This change prepares for the future flip of the default value by
issuing a deprecation warning if no explicit value is set for the option.
Depends-On: https://review.opendev.org/c/openstack/bifrost/+/877469
Change-Id: I6a13cf62b427c9e5c7d7d9ddc447d60f94592c9a
* Avoid using the term "introspection". We need to settle on either
"inspection" or "introspection", and the Ironic API already uses
the former.
* Accept (and return) inventory and plugin data separately to reflect
the Ironic API (single JSON blobs are an Inspector legacy).
* Make sure to mention the container name in error logging.
* Use more readable formatting syntax for building Swift names.
* Do not mock objects with dicts (in unit tests).
* Simplify inventory API tests.
Change-Id: Id8c4bc6d35b9634f5a5ac2b345a8fd7f1dba13c0
... to reduce the already frightening size of ironic.conductor.manager
and make space for more inspection additions.
While here, fix up log messages for clarity and brevity.
Change-Id: I5196d58016ae094f17e0aad187a11d9cceaab04b
When creating nodes, previously there was no way to set a
default conductor group to create nodes with, thus forcing
a two step process, a dedicated conductor without a conductor
group to serve requests for it.
With this change, an operator can set specific conductor_group
settings by API, allowing increased delineation with reduced
risk of misconfiguration or mis-step.
Story: 2010267
Task: 46183
Change-Id: I21d58750504b2eecf3368d2e03eaca050065c3d7
Add file to the reno documentation build to show release notes for
stable/2023.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.
Sem-Ver: feature
Change-Id: Iabfa1f9b492c85be185c6993a016a6dd88ed9f9a
This mapping allows object version upgrades to be navigated
and needs to be updated pre-release otherwise we break the
inherent upgrade job to the latest state of the development
branch.
Also, had to backfill the records for the bugfix branch since,
while not required for that version to run, it is required in order
to upgrade from that version.
Also, lists antelope and 2023.1 as "named" releases, due to the
ambiguity and configuration, it just seemed better to be on the
safe side.
Change-Id: I633275caf8c3dc750023fbb27bd8a3f4d23e9fa5
... And tags, but nobody uses tags since it is not available
via the API.
Anyhow, the online upgrade code was written under the assumption
that *all* tables had an "id" column. This is not always true
in the ironic data model for tables which started as pure extensions
of the Nodes table, and fails in particular when:
1) A database row has data stored in an earlier version of the object
2) That same object gets a version upgrade.
In the case which discovered this, BIOSSetting was added at version
1.0, and later updated to include additional fields which incremented
the version to 1.1. When the upgrade went to evaluate and iterate
through the fields, the command failed because the table was designed
around "node_id" instead of "id".
Story: 2010632
Task: 47590
Change-Id: I7bec6cfacb9d1558bc514c07386583436759f4df
Even if a glance image is raw, we still recalculate the checksum after
"converting" it to raw. This process may take exceptionally long.
Change-Id: Id93d518b8d2b8064ff901f1a0452abd825e366c0
While investigating a very curious report, I discovered that
if somehow the power was *already* turned off to a node, say
through an incorrect BMC *or* human action, and Ironic were
to pick it up (as it does by default, because it checks before
applying the power state), then it would not wipe the token
information, preventing the agent from connecting on the next
action/attempt/operation.
We now remove the token on all calls to conductor
utilities node_power_action method when appropriate, even
if no other work is required.
Change-Id: Ie89e8be9ad2887467f277772445d4bef79fa5ea1
In a relatively odd turn of events, should cleaning
have started, but then timed out due to lost communications
or a hard failure of the machine, an agent token could
previously be orphaned preventing re-cleaning.
We now explicitly remove the token in this case.
Change-Id: I236cdf6ddb040284e9fd1fa10136ad17ef665638
SNMP driver was using the wrong dictionary key to retrieve auth_protocol
and priv_protocol from driver info. As a result, the SNMP client was
created with empty strings for both those fields. Any nodes configured
to use SNMP v3 with those fields failed because the SNMP driver was
unable to perform power related operations due to authentication error.
- Use correct keys for snmp auth_protocol and priv_protocol when
creating SNMP client
- Sanitize snmp auth_key and priv_key in API results
Story: 2010613
Task: 47535
Change-Id: I5efd3c9f79a021f1a8e613c3d13b6596a7972672
Converts ironic.drivers.modules.inspector into a package with
two subpackages: client and interface, the latter containing most
of the current content.
Change-Id: Idbfd275c60a873e3de2e0a34db793619f8c99d85
When cleaning fails, we power off the node, unless it has been running
a clean step already. This happens when aborting cleaning or on a boot
failure. This change makes sure that the power action does not wipe
the last_error field, resulting in a node with provision_state=CLEANFAIL
and last_error=None for several seconds. I've hit this in Metal3.
Also when aborting cleaning, make sure last_error is set during
the transition to CLEANFAIL, not when the clean up thread starts
running.
While here, make sure to log the current step in all cases, not only
when aborting a non-abortable step.
Change-Id: Id21dd7eb44dad149661ebe2d75a9b030aa70526f
Story: #2010603
Task: #47476
Instead of clearing existing reservations at the beginning of
del_host, wait for the tasks holding them to go to completion. This
check continues indefinitely until the conductor process exits due to
one of:
- All reservations for this conductor are released
- CONF.graceful_shutdown_timeout has elapsed
- The process manager (systemd, kubernetes) sends SIGKILL after the
configured graceful period
Because the default values of [DEFAULT]graceful_shutdown_timeout and
[conductor]heartbeat_timeout are the same (60s) no other conductor
will claim a node as an orphan until this conductor exits.
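A rough sketch of such a wait loop. wait_for_reservations and its
parameters are hypothetical and are not the del_host implementation;
count_reserved stands in for whatever query reports the reservations
still held by this conductor.

```python
import time


def wait_for_reservations(count_reserved, timeout, poll_interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Block until no reservations remain or the timeout elapses.

    Returns True if all reservations were released in time.
    """
    deadline = clock() + timeout
    while count_reserved() > 0:
        if clock() >= deadline:
            # Give up; the graceful shutdown timeout has elapsed.
            return False
        sleep(poll_interval)
    return True
```

A SIGKILL from the process manager needs no handling here, since it ends
the process regardless of what the loop is doing.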
Change-Id: Ib8db915746228cd87272740825aaaea1fdf953c7
Currently when a conductor is stopped, the rpc service stops
responding to requests as soon as self.manager.del_host returns. This
means that until the hash ring is reset on the whole cluster, requests
can be sent to a service which is stopped.
This change waits for the remaining seconds to delay stopping until
CONF.hash_ring_reset_interval has elapsed. This will improve the
reliability of the cluster when scaling down or rolling out updates.
This delay only occurs when there is more than one online conductor,
to allow fast restarts on single-node ironic installs (bifrost,
metal3).
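The delay calculation can be pictured like this. remaining_stop_delay is
a hypothetical helper, and whether the conductor count includes the one
being stopped is an assumption made for the example.

```python
def remaining_stop_delay(elapsed, reset_interval, online_conductors):
    """Seconds left to wait before the RPC service may stop.

    elapsed: seconds already spent shutting down (e.g. in del_host).
    reset_interval: the hash ring reset interval, in seconds.
    online_conductors: assumed to include the conductor being stopped.
    """
    if online_conductors <= 1:
        # Single-conductor installs restart without any delay.
        return 0.0
    return max(0.0, reset_interval - elapsed)
```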
Change-Id: I643eb34f9605532c5c12dd2a42f4ea67bf3e0b40
This change adds the capability for the ironic-conductor
and standalone service process to transmit timer and counter
metrics to the message bus notifier which may be consumed by
a ceilometer, ironic-prometheus-exporter, or other consumer of
metrics event data on to the message bus.
This functionality is not presently supported on dedicated API
services such as those running as an ``ironic-api`` application
process, or Ironic WSGI application. This is due to the lack of
an internal trigger mechanism to transmit the data in a metrics
update to the message bus and/or notifier plugin.
This change requires ironic-lib 5.4.0 to collect and ship metrics via
the message bus.
Depends-On: https://review.opendev.org/c/openstack/ironic-lib/+/865311
Change-Id: If6941f970241a22d96e06d88365f76edc4683364
While developing some internal metrics collection capability,
and upon realizing that a lock was needed, we also realized that
the lock activity itself would be a bit noisy. And image actions
also get lock logging, and it is just really noisy, but not super
helpful for troubleshooting.
So, set it to WARNING instead.
Discussion wise, see:
https://review.opendev.org/c/openstack/ironic-lib/+/865311
Change-Id: I3ab14ee5b5cc063784d26e3c760f1422c692060d
Follow-up to I6b830e5cc30f1fa1f1900e7c45e6f246fa1ec51c
Original change introduced some errors such as mismatched
arguments for exceptions
Story: 2010275
Task: 46204
Change-Id: I550e048ab22a6cd25502b41d1c579819df369249
Follow-up to I74b19f7a42c1326d7ec04e6320176e81639ebfb4
Mention need of the maintenance mode to orphan swift
objects during node clean up
Story: 2010275
Task: 46204
Change-Id: Ie95a5bd333b0dab3e97254dfb4eb532bdbfd2650
The tl;dr is that we changed ``inspecting`` to include an
``inspect wait`` state. Unfortunately we never spotted the logic
inside of the db API. We never spotted it because our testing in
inspection code uses a mocked task manager... and we *really* don't
have intense db testing because we expect the objects and higher
level interactions to validate the lowest db level.
Unfortunately, because of the out of band inspection workflow,
we have to cover both cases in terms of what the starting state
and ending state could be, but we've added tests to
validate this is handled as we expect.
Change-Id: Icccbc6d65531e460c55555e021bf81d362f5fc8b
The dynamically allocated console port for a node is saved
into database and reused on subsequent console operations.
In certain code paths the port record can't be trusted and
we should do a re-allocation.
This patch fixes the issue by ignoring the previous allocation
record. The extra cleanup in the takeover is not required
anymore and is removed as well.
Change-Id: I1a07ea9b30a2c760af7a6a4e39f3ff227df28fff
Story: 2010489
Task: 47061
Recently we hit an issue where the pid file was missing. The current logic
simply removes the pid file if the corresponding process is not found,
but if the pid file is lost then the console could never be stopped
and, furthermore, be restarted, regardless of whether the process is
there or not.
This patch adds FileNotFound to the exception handling to allow
console recovery.
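A minimal sketch of tolerating a missing pid file. read_console_pid is a
hypothetical helper, not the console utility code touched by this patch.

```python
def read_console_pid(pid_file):
    """Return the PID stored in pid_file, or None if the file is gone."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # A lost pid file is treated as "no known console process"
        # rather than as an unrecoverable error, so the console can
        # still be stopped and started again.
        return None
```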
Change-Id: I1a0b8347e960c6cff8aca10a22c67b710f7d617e
Follow-up to Ie174904420691be64ce6ca10bca3231f45a5bc58
which enables storage of inventory in Swift, but does not delete
the Swift entry when the node whose inventory is stored is deleted
Story: 2010275
Task: 46204
Change-Id: I74b19f7a42c1326d7ec04e6320176e81639ebfb4
- Basic support and testing for CRUD for node.shard.
- Policy checking for update node.shard.
- New API endpoint: GET /v1/shards
- Policy checking for GET /v1/shards
- Support for querying for nodes in a list of shards
Story: 2010378
Task: 46624
Change-Id: I385594339028c20cfc83fdcc4cbbec107efdacff
This request parameter will allow an operator to ask the question
"Do I need to assign shards to any of my nodes?".
Change-Id: I26b745e5ef2b320a8d8a0667ac61c080fcdcd576
The format string is expecting a dictionary with keys matching
those used in the format string. Any unused parameters will
cause a "not all arguments converted during string formatting"
exception.
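A short illustration of the two formatting styles in plain Python. The
message text and values are made up for the example and are not the
logging statement fixed here.

```python
params = {"node": "node-1", "error": "timeout"}

# Named placeholders take a single mapping argument.
ok = "Node %(node)s failed: %(error)s" % params

# Positional arguments left over after formatting raise TypeError.
try:
    "Node %s failed" % ("node-1", "timeout")
    leftover_error = None
except TypeError as exc:
    leftover_error = str(exc)
```

Passing one dictionary whose keys match the placeholders avoids the
leftover argument problem.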
The quote style is also changed from double to single quotes to
match the other logging statements in the code.
Change-Id: Ic9dea4f51d82866be8ac16242a79237c789b9745
Ports cannot be filtered by node, node_uuid, or portgroup at the same
time as other potential filters. Explicitly document this.
Change-Id: Ia875a6543eb8871ce70028c055de2f1832c3ecdb
Per discussion in IRC, the retirement documentation sets forth
an understanding that sensitive data will be removed from the
baremetal node, however this is performed through cleaning which
inherently sets forth a requirement in automated cleaning.
Explicitly note, and provide options should an operator wish
to utilize the feature.
Change-Id: I6755433b97cacd6ebf6a8f7eb5b404697e0a4349
In its current place, reno config changes will not cause
build-openstack-releasenotes job to run, which means changes can land to
that config without being tested. Yikes!
Also fixes error in regexp which was preventing this from actually
fixing the build-openstack-releasenotes job.
Change-Id: I4d46ba06ada1afb5fd1c63db5850a1983e502a6c
The anaconda job is failing as we're getting a redirect issued back
upon attempting to validate URLs. The servers are now directing us
to use HTTPS instead.
Change-Id: Iac8e6e58653ac616250f4ce3ab3ae7f5164e5b03
Reno was assuming all tags ending in -eol represented an old, EOL'd
stable branch. That's not true for Ironic projects which have bugfix
branches. Update the regexp to exclude those branches.
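As an illustration only, a pattern of this shape separates the two kinds
of tags. The regexp and the tag names below are assumptions made for the
example; they are not the expression added to the reno config.

```python
import re

# Hypothetical pattern: match "<series>-eol" unless the tag starts
# with "bugfix-".
EOL_TAG_RE = re.compile(r"^(?!bugfix-)(?P<series>.+)-eol$")


def closed_series(tag):
    """Return the series name for an EOL tag, or None otherwise."""
    match = EOL_TAG_RE.match(tag)
    return match.group("series") if match else None
```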
Co-Authored-By: Adam McArthur <adam@mcaq.me>
Change-Id: I568b14097cd46d4d7d365ff894ef5cd29edd1e3a
Move functions storing and obtaining introspection data
from drivers/modules/inspector.py and api/controllers/v1/node.py
to driver/modules/inspect_utils.py
Follow-up to change If50f665da5fbb16f7646f3d6195a6e14e7325b0a
Story: 2010275
Task: 46204
Change-Id: I2b206670aff6ad3a9f9cc76236453abf42663cad
I got pinged with some questions by an operator who had
issues attempting to exit cleaning. In the discussion,
it was realized we lack basic troubleshooting guidance,
which led them to try everything but the command they needed.
As such, adding some guidance in an attempt to help operators
navigate these sorts of issues moving forward.
Change-Id: Ia563f5e50bbcc789ccc768bef5800a64b38ff3d7
In the networking code stack, one of the methods
looks to identify if a change has occurred, except
some of the other tests utilize the same value that
was previously asserted for the same base object.
Because of this, just use a unique value so we
don't risk the possibility of the test failing
erroneously.
Change-Id: Ide2b205ade67a4090a0b9bfe1282d01f7605ceb9
Some of these metrics decorators were unlabeled without a class which
would result in semi-confusing structures for the metrics counters.
Now, we should be semi-consistent.
Change-Id: Ie2795419991dc941f2a2b2bc0c6116b92d285041
This change adds support for the ``service`` role, which is intended
largely for service to service communication, such as if one wanted to
utilize a "nova" project, and have an ironic service user within it,
and then configure the ``nova-compute`` service utilizing those credentials.
Or vice versa, an "ironic" project, with a nova user.
In this case, access is exceptionally similar to the rights afforded to
a "project scoped manager" or an "owner-admin".
Change-Id: Ifd098a4567d60c90550afe5236ae2af143b6bac2
Create [inventory] to hold CONF parameters for storage of introspection data
Story: 2010275
Task: 46204
Change-Id: I06fa4f69160206dd350856e264cbb0842e34fd2a
Since iRMC S6 2.00, iRMC firmware disables IPMI over LAN
with default iRMC firmware configuration.
To deal with this firmware incompatibility, this commit
modifies driver's methods which use IPMI to first try
IPMI and, if IPMI fails, try to use Redfish API.
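The "try IPMI first, then Redfish" flow can be sketched as below.
IPMIError and with_redfish_fallback are hypothetical names used for
illustration; the real driver methods and exception types differ.

```python
class IPMIError(Exception):
    """Stand-in for whatever error the IPMI code path raises."""


def with_redfish_fallback(ipmi_call, redfish_call):
    """Run ipmi_call, falling back to redfish_call if IPMI fails."""
    try:
        return ipmi_call()
    except IPMIError:
        # IPMI over LAN may be disabled by default on newer firmware,
        # so retry the same operation through the Redfish API.
        return redfish_call()
```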
Story: 2010396
Task: 46746
Change-Id: I1730279d2225f1248ecf7fe403a5e503b6c3ff87
Since iRMC S6 2.00, iRMC firmware doesn't support HTTP
connection to REST API.
To deal with this firmware incompatibility, this commit
adds verify step to check connection to REST API and adds
node vendor passthru to fetch&cache version of iRMC firmware.
Story: 2010396
Task: 46745
Change-Id: Ib04b66b0c7b1ef1c4175841689c16a7fbc0b1e54
If the published image is a hardlink, the source selinux context is
preserved. This could cause access denied when retrieving the image
using its URL.
Change-Id: I550dac9d055ec30ec11530f18a675cf9e16063b5
Grub2 looks for files in different paths depending on the boot mode
of the binary. Previously the grub_config_path setting was defaulted
to the path used exclusively for BIOS booting, which meant anyone
using it had to override the setting. Now, we've set the default
to the default for UEFI booting, and the world should be a happier,
and less override filled place.
Change-Id: Id6723e92efb62f8ca03099f15c90580cec887ddd
Adding an entry to the troubleshooting documentation to cover the
very complex topic of cleaning + RAID + disk protocols + device
behavior/capabilities.
Change-Id: I8d322dd901634c59950a6a458b265111282d0494
This was merely obscuring a bug in pbr. setuptools doesn't do the
auto-discovery when pbr is in use. Remove it.
Change-Id: I40500ed7bf9d9fb30381c7539548544152cea85e
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Depends-on: https://review.opendev.org/c/openstack/pbr/+/869082
This commit partially reverts change set
I0bfef09a5312a17be54ce5c09805f06b7c349026
where the amount of memory for test VMs was
increased to 4GB. This was because excess
junk was getting stuck in the staged ramdisk
images used by CI.
Change-Id: Ia0c74cbeecdb9febf9f7a4e76db84e0f378a97fc
It appears we are getting an opcode error when attempting to boot
Centos 9-stream utilizing the EFI artifacts from Ubuntu.
Technically this should work, however further artifacts in the boot
chain may be signed with other key credentials that Ubuntu's
grub does not know about, because the chain of trust is
MSFT -> Vendor shim (slow change rate) -> Vendor GRUB -> Kernel
Where vendor differences should never work, is if Secure Boot
is enforcing.
Exception on launch:
X64 Exception Type - 06(#UD - Invalid Opcode) CPU Apic ID - 00000000 !!!!
A similar Debian bug is open for a very similar issue:
https://groups.google.com/g/linux.debian.bugs.dist/c/BOiLLeROrmo
However, no additional comments or information have been posted in follow
up to that reported issue. So in the meantime, we're going to try
and do what those smarter than I recommend, use the vendor's
binaries for their distribution.
There is one further, potentially far more depressing possibility,
that centos9's kernel doesn't support the type of hardware
we're getting. This is suggested by the precise opcode error, UD,
https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce-212.html
But again, easiest possibility first.
Change-Id: Id9bd30bc3c2f1076555317e4a3f277725fa7c1f4
The IRONIC_VM_MACS_CSV_FILE is generated only if we execute the
ironic basic ops, so when IRONIC_BAREMETAL_BASIC_OPS is True.
In some jobs we set IRONIC_BAREMETAL_BASIC_OPS to False but we
still look for that file causing a "file not found" error which
does not trigger a trap until focal, but it does in jammy.
Let's create the file if it does not exist.
Change-Id: Ib938abe0723072419f336159cbffff33e46ea39b
The RC_DIR does not exist (and it never existed, it was SRC_DIR).
Change that to TOP_DIR which is what we use commonly in other
sections.
Change-Id: I4a400fd434a20938cd38c0bb876da21fec7473a1
We've been building tinyipa using tinycore 13.x for a while, so we should use
the same version for the base ramdisk image.
Change-Id: I9d144f122c20f717ff946282ef7ffa16d82812f5
- Remove skipsdist, which was never supported and causes breakage
when used with usedevelop.
- add script to allowlist for pep8 test
- disable setuptools autodiscovery
- Increase base VM memory according to new requirements for CS9
based IPA
Change-Id: I0bfef09a5312a17be54ce5c09805f06b7c349026
In [1] we finally got rid of the unfinished lib/neutron module and kept
only lib/neutron-legacy. It's renamed to lib/neutron now and it's the
only neutron related module in Devstack.
So this patch removes leftovers related to the old lib/neutron-legacy.
[1] https://review.opendev.org/c/openstack/devstack/+/865014
Change-Id: Id938deab7188743e754d028dee8e0b2591ab6f7b
Do not update `raid_configs` if operation is synchronous.
First, it is not needed, second, it will not be cleaned
up by async periodics. As a result, the data remains
on the node and causes errors the next time node is in
cleaning state.
Story: 2010476
Task: 47037
Change-Id: Ib1850c58d1670c3555ac9b02eb7958a1b440a339
This change adds 'node_uuid' to:
ironic.objects.portgroup.Portgroup
'node_uuid' is a relationship using association_proxy in
models.Portgroup. Using the association_proxy removes the
need to do the node lookup to populate node uuid for
portgroups in the api controller.
NOTE:
On portgroup create a read is added to read the port from
the database, this ensures node_uuid is loaded and solves
the DetachedInstanceError which is otherwise raised.
The test test_list_with_deleted_port_group was deleted: if
the portgroup does not exist, portgroup_uuid on the port will
be None, so there is no need for extra handling of that case.
Bumps Portgroup object version to 1.5
Change-Id: I4317d034b6661da4248935cb0b9cb095982cc052
This change adds 'node_uuid' to ironic.objects.port.Port
and adds a relationship using association_proxy in
models.Port. Using the association_proxy removes the need
to do the node lookup to populate node uuid for ports in
the api controller.
NOTE:
On port create a read is added to read the port from the
database, this ensures node_uuid is loaded and solves the
DetachedInstanceError which is otherwise raised.
Bumps Port object version to 1.11
With patch:
1. Returned 20000 ports in python 2.7768702507019043
seconds from the DB.
2. Took 0.433107852935791 seconds to iterate through
20000 port objects.
Ports table is roughly 12800000 bytes of JSON.
3. Took 5.662816762924194 seconds to return all 20000
ports via ports API call pattern.
Without patch:
1. Returned 20000 ports in python 1.0273635387420654
seconds from the DB.
2. Took 0.4772777557373047 seconds to iterate through
20000 port objects.
Ports table is roughly 12800000 bytes of JSON.
3. Took 147.8800814151764 seconds to return all 20000
ports via ports API call pattern.
Conclusion:
Test #1 plain dbapi.get_port_list() test is ~3 times
slower, but Test #3 doing the API call pattern test
is ~2500% better.
Story: 2007789
Task: 40035
Change-Id: Iff204b3056f3058f795f05dc1d240f494d60672a
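The ratios quoted in the conclusion above can be re-derived from the reported timings. The snippet below is just that arithmetic; the variable names are illustrative.

```python
# Timings in seconds, copied from the measurements in the commit message.
db_with, db_without = 2.7768702507019043, 1.0273635387420654
api_with, api_without = 5.662816762924194, 147.8800814151764

# Test #1: how many times slower the plain DB read is with the patch.
db_slowdown = db_with / db_without
# Test #3: how many times faster the API call pattern is with the patch.
api_speedup = api_without / api_with
# The same improvement expressed as a percentage of the patched time.
api_gain_pct = (api_without - api_with) / api_with * 100

print(f"DB read: {db_slowdown:.1f}x slower")
print(f"API pattern: {api_speedup:.1f}x faster ({api_gain_pct:.0f}% better)")
```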
Given the assistance we rendered last week where it ended up being
rooted in an image being misconfigured, it makes sense to at least
publish a troubleshooting note to aid in discoverability for
operators who encounter this issue in the future.
Change-Id: I8bc35571cc944ad20f413d53a47f94920dd1e928
No exception is used to communicate back, the exceptions
are used to catch failures, and if we don't catch other
possible exceptions leaving cleaning states, we may not clean
up state properly.
So instead of specific exceptions, we just catch any exception
like is used earlier in the same method.
Inspired by https://review.opendev.org/c/openstack/ironic/+/866856
and investigation through the code base as a result of inability
to clean the node.
Change-Id: I2a6bca3550819b98adbaffe315f77427b8a43d62
The Ironic inspector allows users to choose between SQL, Swift
and NoStore. Ironic should offer similar functionality.
Story: 2010275
Task: 46204
Change-Id: Ie174904420691be64ce6ca10bca3231f45a5bc58
Follow-up to change I058ceadab33f6969157b89aca5ba34ebd0be2a93
to mark some properties recommended, move documentation
and update contact information.
Co-Authored-By: Mike Raineri <michael.raineri@gmail.com>
Change-Id: I493f9402e15fa78bc5dae9d9bcbb124146f0d026
Add documentation for 'agent_burnin_fio_disk_smart_test' option
for disk burn-in.
Story: #2007523
Task: #43383
Change-Id: I686acddeb353839b045d5c0ad944114cb938f414
Prepare the ironic database to accommodate node inventory received from
the inspector once the API is implemented.
Story: 2010275
Task: 46204
Change-Id: I6b830e5cc30f1fa1f1900e7c45e6f246fa1ec51c
This commit modifies iRMC driver to use ironic.conf [deploy]
default_boot_mode as default value of boot_mode.
Before this commit, iRMC driver assumes Legacy BIOS as default
boot_mode and value of default_boot_mode doesn't have any effect
on iRMC driver's behavior.
Story: 2010381
Task: 46643
Change-Id: Ic5a235785a1a2bb37fef38bd3a86f40125acb3d9
This change aligns the boot interface order of the irmc
hardware type to match the other hardware type interface
order lists.
This change is a result of an operator reporting inconsistent
behavior of ironic when they are adding nodes using the irmc
hardware type, where they would default to use the "irmc-pxe" boot
interface, whereas the other interfaces would end up defaulting
to "ipxe".
Change-Id: I017c6560f9de884eefb2c1925321380cc1c721e2
It relied on mocking tenacity.retry, but it's executed on class
initialization. Depending on the ordering, it may do nothing or
it may replace ImageService.call with a mock.
Instead, add a new tenacity helper that loads an option in runtime.
As a nice side effect, [glance]num_retries is now mutable.
Change-Id: I2e02231d294997e824db77c998ef8d352fa69075
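The difference between reading a retry option at class initialization and reading it at call time can be sketched without tenacity. The following is a stdlib-only illustration, not the helper added by this change; ``CONF`` and ``retry_from_option`` are hypothetical names.

```python
import functools

# Hypothetical stand-in for the [glance] configuration group.
CONF = {"num_retries": 0}


def retry_from_option(option):
    """Retry a call, reading the retry count when the call is made."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Looked up on every call rather than at decoration time, so
            # the option can be changed after the class has been created.
            attempts = CONF[option] + 1
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except ConnectionError:
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator


calls = []


@retry_from_option("num_retries")
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "ok"


CONF["num_retries"] = 2  # changed after decoration, still honoured
outcome = flaky()
```

Because nothing is captured at decoration time, a test can adjust the option directly instead of mocking the retry machinery.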
The model_query call results in a nested read transaction, that does not
seem to play well with SQLite support. Since it's inherently relying on
the query style deprecated in SQLAlchemy 2.0, we need to migrate away
from this call. As an intermediate step, change instances of model_query
to session.query, making sure every call creates a session that lives
as long as is needed to fetch the results.
Removes a unit test which was built around creating
a fake deadlock condition to test that oslo_db was working
as expected. Its interaction was totally mocked, and in
retooling the base method there was no easy way to keep
the same test logic around.
Co-Authored-By: Dmitry Tantsur <dtantsur@protonmail.com>
Change-Id: Ic8b1d964f7be5784e01c89bfb6c0277ea82eec2d
Some dbapi calls are still relying on it (e.g. touch_conductor
apparently), causing sqlite databases to get locked.
Change-Id: If17d49ef434cf60876a81dae8e5ddaa6dc45e707
This patch rewrites portions of the database migration
testing to the style required to not break when SQLAlchemy 2.0
is released.
The *Major* difference is a transition from using Dictionary key
styles towards using object names. This is because to retrieve
a dictionary form, or access a row object as a dictionary, requires
it to be cast as a dictionary, but in SQLAlchemy 2.0 row result
attribute .keys is no longer present, which ultimately prevents
casting and loading as such. Ultimately this just meant changing
the tests to use the object model field labels.
One other change is we now query just the columns needed to get
an ORM object. This is a result of the unification of the select
interface and us being unable to instantiate a current full DB
object (as in models.Node in current code) against an older
database version in order to perform migration validation.
One last item, there appears to be a minor trivial difference
in the behavior in the return of a dictionary/json response
object with Postgres. Ultimately, it seems trivial, we just
needed the test to be aware of the difference as it is a
very low level test.
Change-Id: I4d7213488ce90176893459087fe2f0491a6a61fc
All services except Ironic (and keystone, to support ironic
with system scope deployment) will not have system scope in
their API policy; instead they default to project scope.
Change-Id: Id13a359086f9b24dbfcd2b565a42c50d0dab7736
* Changed common exception imports from SQLAlchemy for ORM
query types which are now originated from the main exception
definition set.
* Changed base join option usage to use objects instead of labels,
and defaulted all multi-row result sets to return data using
"selectinload" as opposed to operating with a join query to
avoid need to de-duplicate all result sets.
* Changed DeployTemplates to utilize objects instead of
field names for queries, and updated the associated join ORM
model's relationship record between DeployTemplate and
DeployTemplateSteps.
* Changed Ports, Chassis, Conductor, Volume Target/Connector
queries to lean towards use of select/update/delete queries as
opposed to ORM queries. Most of these changes revolved around
references of field names as opposed to model objects.
* This change also labels a few lines as "noqa", which is a result
of the style check rules getting triggered on statements as
needed for SQLAlchemy.
Change-Id: I651ec4b50c79be6aa8c798ee27957ed720a578d8
Mocking time.sleep is known to be problematic in general,
especially when eventlet is involved. Since this can generate
false failures, change the unit test to just check that it was
called, as opposed to trying to count the number of times, so
we don't accidentally count other tests calling time.sleep.
Change-Id: I4323e74d7af008a651719fb972df667eb823e314
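A rough sketch of the more tolerant assertion style described above, using only ``unittest.mock``. The ``wait_for`` helper is hypothetical and exists only to give the mock something to observe.

```python
import time
from unittest import mock


def wait_for(check, interval=0.1, attempts=5):
    """Poll check() until it returns True, sleeping between attempts."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(interval)
    return False


with mock.patch.object(time, "sleep") as mock_sleep:
    results = iter([False, False, True])
    found = wait_for(lambda: next(results))
    # Robust: only require that sleeping happened at all.
    mock_sleep.assert_called()
    # Fragile alternative (avoided): asserting an exact call_count breaks
    # if anything else calls time.sleep while the patch is active.
    sleep_was_called = mock_sleep.called
```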
One of the major changes in SQLAlchemy 2.0 is the removal
of autocommit support. It turns out Ironic was using this quite
aggressively without even really being aware of it.
* Moved the declarative_base to ORM, as noted in the SQLAlchemy 2.0
changes[0].
* Console testing caused us to become aware of issues around locking
where session synchronization, when autocommit was enabled, was
defaulted to False. The result of this is that you could have two
sessions with different results, which could occur on different
threads, and where one could still attempt to lock based upon prior
information. Inherently, while this basically worked, it was
also sort of broken behavior. This resulted in locking being
rewritten to use the style mandated in SQLAlchemy 2.0 migration
documentation. This ultimately is due to locking, which is *heavily*
relied upon in Ironic, and in unit testing with sqlite, there are
no transactions, which means we can get some data inconsistency
in unit testing as well if we're reliant upon the database to
precisely and exactly return what we committed.[1]
* Begins changing the query.one()/query.all() style to use explicit
select statements as part of the new style mandated for migration
to SQLAlchemy 2.0.
* Instead of using field label strings for joined queries, use the
object format, which makes much more sense now, and is part of
the items required for eventual migration to 2.0.
* DB queries involving Traits are now loaded using SelectInLoad
as opposed to Joins. The now deprecated ORM queries were quietly
and silently de-duplicating rows and providing consistent sets
from the resulting joined table responses, however putting much
higher CPU load on the processing of results on the client.
Prior performance testing has informed us this should be a minimal
overhead impact, however these queries should no longer be in
transactions with the Database Servers which should offset the
shift in load pattern. The reason we cannot continue to deduplicate
locally in our code is because we carry Dict data sets which cannot
be hashed for deduplication. Most projects have handled this by
treating them as Text and then converting, but without a massive
rewrite, this seems to be the viable middle ground.
* Adds an explicit mapping for traits and tags on the Node object
to point directly to the NodeTrait and NodeTag classes. This
supersedes the prior usage of a backref to make the association.
* Splits SQLAlchemy class model Node into Node and NodeBase, which
allows for high performance queries to skip querying for ``tags``
and ``traits``. Otherwise the aforementioned lookups would
always execute, as they are now properties on the Node class
as well. This is a more common SQLAlchemy model pattern, but Ironic's
model has been a bit more rigid to date.
* Adds a ``start_consoles`` and ``start_allocations`` option to the
conductor ``init_host`` method. This allows unit tests to be
executed and launched with the service context, while *not* also
creating race conditions which resulted in failed tests.
* The db API ``_paginate_query`` wrapper now contains additional
logic to handle traditional ORM query responses and the newer style
of unified query responses. This is due to differences in queries and
handling, which was also part of the driver for the creation of ``NodeBase``,
as SQLAlchemy will only create an object if a base object is referenced.
Also, by default, everything returned is a tuple in 1.4 with the
unified interface.
* Also modified one unit test which counted time.sleep calls, which is
a known pattern which can create failures which are ultimately noise.
Ultimately, I have labelled the remaining places where SQLAlchemy
warnings are raised for deprecation/removal of functionality,
which need to be addressed.
[0] https://docs.sqlalchemy.org/en/14/changelog/migration_20.html
[1] https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#transaction-isolation-level-autocommit
Change-Id: Ie0f4b8a814eaef1e852088d12d33ce1eab408e23
In trying to figure out why I was unable to run
all of the test_migrations tests, I realized we need
to fix and clean up our unicode declarations.
Specifically, the way I found this was my local mysql
install was defaulted to using 4 Byte Unicode characters,
however some of our fields are 255 characters, which do not
fit inside of InnoDB tables.
They do, however fit with the "utf8" storage alias, which is
presently short for UTF8MB3, as opposed to UTF8MB4 which is
what my local database server was configured for. Because this
was in opportunistic tests, I wasn't able to really sort out
what was going on and thought we needed to shorten the fields.
In reality, it turns out we never defined the allocations
table to use UTF8 and InnoDB for storage.
Storage engine wise, this is not a big deal, but may mean a
DBA will one day need to dump and reload the allocation table
of a deployment.
Character set wise... It is not great, but there is not a good
way for us to do this programmatically. In my opinion, an operator
is unlikely to encounter an issue, and that small chance is
outweighed by the risk and impact of dumping the entire table,
deleting the table, recreating the table with the updated schema
and then repopulating the entries. Of course, if operators are not
using allocations, then it really doesn't matter for them.
Along the way, I discovered we had used the "UTF8" type alias,
which may change one day, which would break Ironic. As such,
I've also updated the definitions used to create databases
and updated our documentation.
Recommended reading:
https://docs.sqlalchemy.org/en/14/dialects/mysql.html#unicode
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb4.html
Story: 2010348
Task: 46492
Change-Id: I4103152489bf61e2d614eaa297da858f7b2112a3
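One plausible reading of why 255-character fields fit with the 3-byte "utf8" alias but not with UTF8MB4 is a byte limit on indexed columns. The arithmetic below assumes a 767-byte limit, which is not stated in the commit message and depends on the InnoDB row format in use.

```python
# Assumption: a 767-byte limit on an indexed column prefix.
INDEX_KEY_LIMIT_BYTES = 767
FIELD_LENGTH_CHARS = 255

worst_case = {}
for charset, bytes_per_char in (("utf8mb3", 3), ("utf8mb4", 4)):
    # Worst case: every character uses the maximum width for the charset.
    worst_case[charset] = FIELD_LENGTH_CHARS * bytes_per_char
    fits = worst_case[charset] <= INDEX_KEY_LIMIT_BYTES
    print(f"{charset}: {worst_case[charset]} bytes, fits={fits}")
```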
Adding an upgrade check to provide awareness of the state of
the database: whether an unexpected engine is in use or
whether the character set encoding is not UTF8.
These will raise non-fatal warnings on the upgrade status
check.
Change-Id: Ide0eb4690a056be557e5ea7d5ba5f6be37b50d0a
Story: 2010384
Any version of testtools lower than 2.5.0 fails if the
attribute 'result_supports_subtests' is present on any
test object because of old unittest support.
See also https://github.com/testing-cabal/testtools/issues/235
and 38fc9a9e30
for more info.
Change-Id: I23a1ca789f4ea4ffd981e1071f3c2c6c333ac330
Simulating workloads with the fake driver currently misses the reality
that some operations take time to complete, rather than occurring
instantly. This makes it difficult to mock real workloads for
performance and functional testing of ironic itself.
This change adds configurable random wait times for fake drivers in a
new ironic.conf [fake] section. Each supported driver has one
configuration option controlling the delay. These delays are applied
to operations which typically block in other drivers.
The default value of zero continues the existing behaviour of no
delay. A single integer value will result in a constant delay in
seconds. Two values separated by a comma will result in a triangular
distribution weighted by the first value, specifically in python[1]:
random.triangular(a, b, a)
Change-Id: I7cb1b50d035939e6c4538b3373002a309bfedea4
[1] https://docs.python.org/3/library/random.html#random.triangular
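The three value forms described above (zero, a single integer, or two comma-separated integers) could be interpreted along these lines. ``parse_delay`` is a hypothetical helper written for illustration, not the function used by the fake drivers.

```python
import random


def parse_delay(value):
    """Turn an option string such as "0", "5" or "1,10" into seconds."""
    parts = [int(part) for part in value.split(",")]
    if len(parts) == 1:
        # A single value is a constant delay; zero means no delay.
        return float(parts[0])
    low, high = parts
    # Mode equal to the lower bound, i.e. random.triangular(a, b, a):
    # short waits are the most likely, long waits increasingly rare.
    return random.triangular(low, high, low)


for setting in ("0", "5", "1,10"):
    print(setting, "->", parse_delay(setting))
```

A real implementation would then sleep for the returned number of seconds in operations that normally block.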
The 'all-plugin' tox environment was deprecated by this patch [1].
Instead of the 'all-plugin' it is recommended to use the 'all' tox
environment.
This patch removes any reference to 'all-plugin' tox environment and
updates the documentation so that the installation steps work with
the 'all' venv.
[1] https://review.opendev.org/c/openstack/tempest/+/543974
Change-Id: Id3451147d172002d67b4557680560a59b026ed77
Fixes the anaconda deploy(URL based) and adds
anaconda_boot entry to pxe_grub_config.template so
that ProLiants can be also deployed in PXE mode.
Story: 2010347
Task: 46490
Change-Id: I4b9e3a2060d9d73de5cab31cc08d3a764dc56e90
This patch adds new SNMPv3 auth protocols to iRMC which are supported
from iRMC S6.
Change-Id: Id2fca59bebb0745e6b16caaaa7838d1f1a2717e1
Story: 2010309
Task: 46353
This is an automatically generated patch to ensure unit testing
is in place for all of the tested runtimes for antelope. Also,
updating the template name to a generic one.
See also the PTI in governance [1].
[1]: https://governance.openstack.org/tc/reference/project-testing-interface.html
Change-Id: Ie1e2138b16929c204235e459df5f9c26885140ab
Add file to the reno documentation build to show release notes for
stable/zed.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.
Sem-Ver: feature
Change-Id: Iaf4d6982a78509af2ba5de44916a6e17d684e786
The Zed cycle is coming to a close, and we need a release notes
prelude.
Contributors, please edit, I just put this together so we wouldn't
forget.
Change-Id: I3a5ca31bf3648c9f8a956f4592c305d2b23f419e
This is a pre-release commit for the Yoga release following our docs [1]
[1] https://docs.openstack.org/ironic/latest/contributor/releasing.html
We will clean up the releasenotes and include the prelude in another patch.
Change-Id: I3b8df0dce64c4ee3b20b7a714b6647d6e1ec0330
Ironic has fake drivers for development use. Document that they are
not suitable for production.
Story: 1326269
Task: 9877
Change-Id: Ibe6d43e1740a95b1cb3886394afaf8545de00e54
Ironic validates the network interface before the cleaning process.
Currently an invalid parameter error is caught, but other errors are not.
There is a chance that a node could get stuck in the cleaning
state on networking issues or a temporary outage of the neutron
service.
This patch adds NetworkError to the exception handling to cover
such cases.
Change-Id: If20de2ad4ae4177dea10b7ebfc9a91ca6fbabdb9
Provide the ability to limit resource intensive or potentially
wide scale operations which could be a symptom of a highly
destructive and unplanned operation in progress.
The idea behind this change is to help guard the overall deployment
to prevent an overall resource exhaustion situation, or prevent an
attacker with valid credentials from putting an entire deployment
into a potentially disastrous cleaning situation since ironic only
otherwise limits concurrency based upon running tasks by conductor.
Story: 2010007
Task: 45140
Change-Id: I642452cd480e7674ff720b65ca32bce59a4a834a
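The guard described above amounts to counting nodes already in the relevant states and refusing new work past a threshold. The sketch below is a simplified illustration; the exception name, function name and state list are assumptions, not Ironic's implementation.

```python
class ConcurrencyLimitReached(Exception):
    """Hypothetical error raised when too many nodes are already busy."""


def check_concurrency(node_states, busy_states, limit):
    """Refuse new work once `limit` nodes are already in busy states."""
    busy = sum(1 for state in node_states if state in busy_states)
    if busy >= limit:
        raise ConcurrencyLimitReached(
            f"{busy} nodes already in {sorted(busy_states)}; limit is {limit}")
    return busy


# Assumed state names, used only for this example.
CLEANING_STATES = {"cleaning", "clean wait"}
states = ["active", "cleaning", "clean wait", "available"]
in_progress = check_concurrency(states, CLEANING_STATES, limit=5)
```

Counting across the whole deployment, rather than per conductor, is what lets the limit act as a brake on a runaway or malicious bulk operation.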
PERC 9 and PERC 10 might not be in RAID mode with no or limited RAID
support. This fix converts any eligible controllers to RAID mode
during the delete_configuration clean step or deploy step.
Story: 2010272
Task: 46199
Change-Id: I5e85df95a66aed9772ae0660b2c85ca3a39b96c7
Sushy 4 includes enhancements including support for hardware Ironic
should work with in Zed.
Story: #2009865
Task: #44548
Change-Id: Ib82bd4d1442bf7d9b135d1c1553c39cfef87548a
* Resolved PEP8 issues
* Trimmed comments to remove extraneous information
* Changed rfc1902.Integer() calls to the correct snmp.Integer() calls
* Fixed power state logic checking for new PDUs that don't have transitional states (e.g., 'pendingOn')
* Removed redundant warning messages
* Added unit tests for Raritan PD2, ServerTech Sentry 3/4, and Vertiv Geist drivers
* Updated documentation to list tested PDUs for the new drivers
* Updated release notes
Change-Id: I9da7b9042b817c346f75a44cd8287e1f63efcb56
This commit adds new clean steps create_csr and add_https_certificate
to allow users to create a certificate signing request and add an
HTTPS certificate to the iLO.
Story: 2009118
Task: 43016
Change-Id: I1e2da0e0da5e397b6e519e817e0bf60a02bbf007
We will use this in the future to prepare for SQLAlchemy 2.0. For now,
we're simply using it to filter out some of the more annoying warnings
and to highlight general SQLAlchemy issues we need to address.
Change-Id: I7c26c20e4b36c4f3b98873939677b966ec6186a5
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Introduces additional job configuration to enable automated
integration testing via tempest of the anaconda deployment
interface.
Also, configures a private subnet with DNS, which is required
by anaconda executing, in order to facilitate processing of URLs.
Change-Id: I61b5205cf2c9f83dfcabf4314247c76fb6a56acd
oslo.db 12.1.0 has changed the default value for the 'autocommit'
parameter of 'LegacyEngineFacade' from 'True' to 'False'. This is a
necessary step to ensure compatibility with SQLAlchemy 2.0. However, we
are currently relying on the autocommit behavior and need changes to
explicitly manage sessions. Until that happens, we need to override the
default.
Change-Id: I9e095d810ff5398920e8ffd4f2f089d9b8d29335
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The ``[dhcp]dhcp_provider`` configuration option can now be set to
``dnsmasq`` as an alternative to ``none`` for standalone deployments.
This enables the same node-specific DHCP capabilities as the
``neutron`` provider. See the ``[dnsmasq]`` section for configuration
options.
Change-Id: I3ab86ed68c6597d4fb4b0f2ae6d4fc34b1d59f11
Story: 2010203
Task: 45922
Before version 4.8, jsonschema did some wild guessing when producing
error messages for schemas with several equivalent subschemas. In
version 4.8 it is no longer done, causing error messages that are more
correct but also more generic.
This change restores guessing the potential root cause without claiming
that it's the only possible root cause. Also the traits schema is
simplified to make it less ambiguous.
See https://github.com/python-jsonschema/jsonschema/issues/991 for details.
Change-Id: Ia75cecd2bfbc602b8b2b85bdda20fdc04c5eadf4
default_boot_mode is effective regardless of the management
interface capabilities as it also sets the default for the boot
image used.
Change-Id: I012aa4067f8fa54eab7a2b860259d1aea5b94955
Previously, when a password change occurred in ironic,
the session would not be invalidated, and this, in theory,
could lead to all sorts of issues with the old password
still being re-used for authentication.
In a large environment where credentials for BMCs may not
be centralized, this can quickly lead to repeated account
lockout experiences for the BMC service account.
Anyhow, now we consider it in tracking the sessions, so
when the saved password is changed, a new session is
established, and the old session is eventually expired out
of the cache.
Change-Id: I49e1907b89a9096aa043424b205e7bd390ed1a2f
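One way to make a session cache notice a password change is to fold the password into the cache key. The following is a minimal sketch of that idea under that assumption; the class and function names are hypothetical, and expiry of the old entry is left out.

```python
import hashlib


def session_key(address, username, password):
    """Build a cache key that changes whenever any credential changes."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return (address, username, digest)


class SessionCache:
    def __init__(self):
        self._sessions = {}

    def get(self, address, username, password, connect):
        key = session_key(address, username, password)
        if key not in self._sessions:
            # New or changed credentials: establish a fresh session.
            self._sessions[key] = connect(address, username, password)
        return self._sessions[key]


def connect(address, username, password):
    """Stand-in for opening a real BMC session."""
    return object()


cache = SessionCache()
first = cache.get("bmc.example.com", "svc", "old-secret", connect)
again = cache.get("bmc.example.com", "svc", "old-secret", connect)
rotated = cache.get("bmc.example.com", "svc", "new-secret", connect)
```

Hashing the password keeps the plaintext out of the key while still producing a different key after rotation.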
The stock anaconda template previously lacked any ability
to indicate "don't validate the tls certificate".
The capability for the installation to operate *without*
requiring this to be the case is necessary for efficient
and simple CI testing as injecting CA certificates is
an overly complex interaction for CI testing.
Also updates the overall anaconda documentation to indicate
the constraint exists, but does not indicate explicitly how
to disable the setting via ironic.conf.
Change-Id: Ia8e4320cbedb205ab183af121da53562792a8faa
To use a source as a path with the anaconda deployment interface,
the kickstart template needs to utilize a 'url' command as opposed
to a second stage ramdisk.
This allows a seamless automatic switch without a customized
kickstart template to just use a URL.
Change-Id: I31febd4e131ed0cc1b37adb9318be8cb17136a68
Adds capabilities for a project scoped admin to
create and delete nodes in Ironic's API.
These nodes are automatically associated with the
project of the requestor.
Effectively, this does allow anyone with sufficient
privileges, i.e. admin, in an OpenStack deployment
to be able to create new baremetal nodes and delete
those baremetal nodes. In this case, the user has
the "owner" level of rights in the RBAC model.
Change-Id: I3fd9ce5de0bc600275b5c4b7a95b0f9405342688
This change aligns the boot interface order of the ilo
hardware type to match the other hardware type interface
order lists.
This change is a result of an operator reporting inconsistent
behavior of ironic when they are adding nodes using the ilo
hardware type, where they would default to use the "pxe" boot
interface, whereas the other interfaces would end up defaulting
to "ipxe".
Change-Id: I3d848af284545a7a1fb1e065f09fe2df6a9114ac
The image lookup process, when handed a path attempts
to issue a HEAD request against the path and gets a
response which is devoid of details like a content length
or any properties. This is expected behavior, however if
we have a path, we also know we don't need to explicitly
attempt to make an HTTP HEAD request in an attempt to
match the glance ``kernel_id`` -> ``kernel`` and similar
value population behavior.
Also removes an invalid test which was written before the
overall method was fully understood.
And fixes the default fallback for kickstart template
configuration, so that it uses a URL instead of a
direct file path.
And fix logic in the handling of image property result
set, where the code previously assumed a ``stage2``
ramdisk was always required, and based other cleanup
upon that.
Change-Id: I589e9586d1279604a743746952aeabbc483825df
When we added concurrent disk erasures, we kept the concurrency
to 1 so as not to risk any different operator behavior, at the cost
of slower erasure times.
That being said, we have had the setting in place for some time
and we have received no reports of issues, so we are incrementing
it to four as that should be still quite relatively safe from a
concurrency standpoint for disk controllers in systems.
Change-Id: I6326422d60ec024a739ca596f46552bbd91b0419
2022-05-26 09:49:15 -07:00
1362 changed files with 113149 additions and 30283 deletions
:description:Implement availability zones with Ironic using conductor groups and shards. Multi-datacenter deployments, fault tolerance, and resource partitioning strategies.
:keywords:availability zones, conductor groups, shards, fault tolerance, multi-datacenter, resource partitioning, high availability, geographic distribution
:author:OpenStack Ironic Team
:robots:index, follow
:audience:cloud architects, system administrators
==========================================
Availability Zones and Resource Isolation
==========================================
Overview
========
While Ironic does not implement traditional OpenStack Availability Zones like
Nova and Neutron, it provides a **three-tier approach** for resource
partitioning and isolation that achieves comprehensive availability
zone functionality:
* **Multiple Ironic Deployments**: Completely separate Ironic services
targeted by different Nova compute nodes
* **Conductor Groups**: Physical/geographical resource partitioning within
a deployment
* **Shards**: Logical grouping for operational scaling within a deployment
This document explains how these mechanisms work together and how to achieve
sophisticated availability zone functionality across your infrastructure.
This document does **not** cover the similar effect which can be achieved
through the use of API level Role Based Access Control through the