Virtio 0.9 implementation was limited to the maximum virtqueue size of
MAX_QUEUE_NUM and the virtio-net driver would fail to initialize on hosts
exceeding this limit.
This commit lifts the restriction by allocating the queue memory based on
the actual queue size instead of using a fixed maximum. Note that virtio
1.0 still uses the MAX_QUEUE_NUM constant to cap the size (unfortunately
this functionality is not available in virtio 0.9).
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
vpm_find_vqs incorrectly accepted the host provided queue size with no
regard to iPXE's internal limitations. Virtio 1.0 makes it possible for
the driver to override the queue size to reduce memory requirements and
iPXE is a great use case for this feature.
Also removing the extra vq->vring.num assignment which is already
handled in vring_init.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EFI provides no clean way for device drivers to shut down in
preparation for handover to a booted operating system. The platform
firmware simply doesn't bother to call the drivers' Stop() methods.
Instead, drivers must register an EVT_SIGNAL_EXIT_BOOT_SERVICES event
to be signalled when ExitBootServices() is called, and clean up
without any reference to the EFI driver model.
Unfortunately, all timers silently stop working when ExitBootServices()
is called. Even more unfortunately, and for no discernible reason,
this happens before any EVT_SIGNAL_EXIT_BOOT_SERVICES events are
signalled. The net effect of this entertaining design choice is that
any timeout loops on the shutdown path (e.g. for gracefully closing
outstanding TCP connections) may wait indefinitely.
There is no way to report failure from currticks(), since the API
lazily assumes that the host system continues to travel through time
in the usual direction. Work around EFI's violation of this
assumption by falling back to a simple free-running monotonic counter.
Debugged-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
AppleNetBoot.h is not taken from the EDK2 codebase and so cannot be
imported using include/ipxe/efi/import.pl. Mark as a native iPXE
header (by changing the include guard) to avoid breaking the import
process.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow certificates to be marked as having been added explicitly at run
time. Such certificates will not be discarded via the certificate
store cache discarder.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Enable IMAGE_PNG (but not IMAGE_PNM) by default, and drag in the
relevant objects only when image_pixbuf() is present in the binary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add PEM-encoded ASN.1 as an image format. We accept as PEM any image
containing a line starting with a "-----BEGIN" boundary marker.
We allow for PEM files containing multiple ASN.1 objects, such as a
certificate chain produced by concatenating individual certificate
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add DER-encoded ASN.1 as an image format. There is no fixed signature
for DER files. We treat an image as DER if it comprises a single
valid SEQUENCE object covering the entire length of the image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code to create a partial ASN.1 cursor containing only the type
and length bytes, so that asn1_start() may be used to determine the
length of a large ASN.1 blob without first allocating memory to hold
the entire blob.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Windows drivers for VMBus devices are enumerated using the
instance UUID rather than the channel number. Include the instance
UUID within the iPXE device name to allow an iPXE network device to be
more easily associated with the corresponding Windows network device
when debugging.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Select the IPv6 source address and corresponding router (if any) using
a very simplified version of the algorithm from RFC6724:
- Ignore any source address that has a smaller scope than the
destination address. For example, do not use a link-local source
address when sending to a global destination address.
- If we have a source address which is on the same link as the
destination address, then use that source address.
- If we are left with multiple possible source addresses, then choose
the address with the smallest scope. For example, if we are sending
to a site-local destination address and we have both a global source
address and a site-local source address, then use the site-local
source address.
- If we are still left with multiple possible source addresses, then
choose the address with the longest matching prefix.
For the purposes of this algorithm, we treat RFC4193 Unique Local
Addresses as having organisation-local scope. Since we use only
link-local scope for our multicast transmissions, this approximation
should remain valid in all practical situations.
Originally-implemented-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the IPv6 settings to construct the routing table, in a matter
analogous to the construction of the IPv4 routing table.
This allows for manual assignment of IPv6 addresses via e.g.
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0/len6 64
set net0/gateway6 fe80::226:bff:fedd:d3c0
The prefix length ("len6") may be omitted, in which case a default
prefix length of 64 will be assumed.
Multiple IPv6 addresses may be assigned manually by implicitly
creating child settings blocks. For example:
set net0/ip6 2001:ba8:0:1d4::6950:5845
set net0.ula/ip6 fda4:2496:e992::6950:5845
Signed-off-by: Michael Brown <mcb30@ipxe.org>
A reasonable user expectation is that ${net0/ip6} should show the
"highest-priority" of the IPv6 addresses, even when multiple IPv6
addresses are active. The expected order of priority is likely to be
manually-assigned addresses first, then stateful DHCPv6 addresses,
then SLAAC addresses, and lastly link-local addresses.
Using ${priority} to enforce an ordering is undesirable since that
would affect the priority assigned to each of the net<N> blocks as a
whole, so use the sibling ordering capability instead.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow settings blocks to provide an explicit default ordering between
siblings, with lower precedence than the existing ${priority} setting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the IPv6 address (or prefix) as ${ip6}, the prefix length as
${len6}, and the router address as ${gateway6}.
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Originally-implemented-by: Marin Hannache <git@mareo.fr>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The settings scope ipv6_scope refers specifically to IPv6 settings
that have a corresponding DHCPv6 option. Rename to dhcpv6_scope to
more accurately reflect this purpose.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In edk2, there are several drivers that associate HII forms (and
corresponding config access protocol instances) with each individual
network device. (In this context, "network device" means the EFI
handle on which the SNP protocol is installed, and on which the device
path ending with the MAC() node is installed also.) Such edk2 drivers
are, for example: Ip4Dxe, HttpBootDxe, VlanConfigDxe.
In UEFI, any given handle can carry at most one instance of a specific
protocol (see e.g. the specification of the InstallProtocolInterface()
boot service). This implies that the class of drivers mentioned above
can't install their EFI_HII_CONFIG_ACCESS_PROTOCOL instances on the
SNP handle directly -- they would conflict with each other.
Accordingly, each of those edk2 drivers creates a "private" child
handle under the SNP handle, and installs its config access protocol
(and corresponding HII package list) on its child handle.
The device path for the child handle is traditionally derived by
appending a Hardware Vendor Device Path node after the MAC() node.
The VenHw() nodes in question consist of a GUID (by definition), and
no trailing data (by choice). The purpose of these VenHw() nodes is
only that all the child nodes can be uniquely identified by device
path.
At the moment iPXE does not follow this pattern. It doesn't run into
a conflict when it installs its EFI_HII_CONFIG_ACCESS_PROTOCOL
directly on the SNP handle, but that's only because iPXE is the sole
driver not following the pattern. This behavior seems risky (one
might call it a "latent bug"); better align iPXE with the edk2 custom.
Cc: Michael Brown <mcb30@ipxe.org>
Cc: Gary Lin <glin@suse.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Ref: http://thread.gmane.org/gmane.comp.bios.edk2.devel/13494/focus=13532
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As with assertions, profiling is enabled for objects built with any
debug level (including an explicit debug level of zero).
Allow profiling to be globally enabled or disabled by adding PROFILE=1
or PROFILE=0 respectively to the build command line.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Assertions are enabled for objects built with any debug level
(including an explicit debug level of zero). It is sometimes useful
to be able to enable assertions across all objects; this currently
requires manually hacking include/assert.h.
Allow assertions to be globally enabled by adding ASSERT=1 to the
build command line. For example:
make bin/8086100e.mrom ASSERT=1
Similarly, allow assertions to be globally disabled by adding ASSERT=0
to the build command line. If no ASSERT=... is specified on the
build command line, then only objects mentioned in DEBUG=... will have
assertions enabled (as is currently the case).
Note than globally enabling assertions imposes a relatively heavy
runtime penalty, primarily due to the various sanity checks performed
by list_add(), list_for_each_entry(), etc.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the DEBUG=... syntax to allow debug messages to be compiled in
but disabled by default. For example:
make bin/undionly.kpxe DEBUG=netdevice:3:1
would compile in the messages as for DEBUG=netdevice:3, but would set
the debug level mask so that only the DEBUG=netdevice:1 messages would
be displayed.
This allows for external code to selectively enable the additional
debug messages at runtime, without being overwhelmed by unwanted
initial noise. For example, a developer of a new protocol may want to
temporarily enable tracing of all packets received: this can be done
by building with DEBUG=netdevice:3:1 and using
// temporarily enable per-packet messages
DBG_ENABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
...
// disable per-packet messages
DBG_DISABLE_OBJECT ( netdevice, DBGLVL_EXTRA );
Note that unlike the usual DBG_ENABLE() and DBG_DISABLE() macros,
DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() will not be removed via
dead code elimination if debugging is disabled in the specified
object. In particular, this means that using either of these macros
will always result in a symbol reference to the specified object.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DBG_ENABLE() and DBG_DISABLE() macros currently affect the debug
level of all objects that were built with debugging enabled. This is
undesirable, since it is common to use different debug levels in each
object.
Make the debug level mask a per-object variable. DBG_ENABLE() and
DBG_DISABLE() now control only the debug level for the containing
object (which is consistent with the intended usage across the
existing codebase). DBG_ENABLE_OBJECT() and DBG_DISABLE_OBJECT() may
be used to control the debug level for a specified object. For
example:
// Enable DBG() messages from tcpip.c
DBG_ENABLE_OBJECT ( tcpip, DBGLVL_LOG );
Note that the existence of debug messages continues to be gated by the
DEBUG=... list specified on the build command line. If an object was
built without the relevant debug level, then DBG_ENABLE_OBJECT() will
have no effect on that object at runtime (other than to explicitly
drag in the object via a symbol reference).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The vendor class identifier strings in DHCP_ARCH_VENDOR_CLASS_ID are
out of sync with the (correct) client architecture values in
DHCP_ARCH_CLIENT_ARCHITECTURE.
Fix by removing all definitions of DHCP_ARCH_VENDOR_CLASS_ID, and
instead generating the vendor class identifier string automatically
based on DHCP_ARCH_CLIENT_ARCHITECTURE and DHCP_ARCH_CLIENT_NDI.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC3315 defines DHCPv6 option 16 (vendor class identifier) but does
not define any direct relationship with the roughly equivalent DHCPv4
option 60.
The PXE specification predates IPv6, and the UEFI specification is
expectedly vague on the subject. Examination of the reference EDK2
codebase suggests that the DHCPv6 vendor class identifier will be
formatted in accordance with RFC3315, using a single vendor-class-data
item in which the opaque-data field is the string as would appear in
DHCPv4 option 60.
RFC3315 requires the vendor class identifier to specify an IANA
enterprise number, as a way of disambiguating the vendor-class-data
namespace. The EDK2 code uses the value 343, described as:
// TODO: IANA TBD: temporarily using Intel's
Since this "TODO" has been present since at least 2010, it is probably
safe to assume that it has now become a de facto standard.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC5970 defines DHCPv6 options 61 (client system architecture type)
and 62 (client network interface identifier), with contents equivalent
to DHCPv4 options 93 and 94 respectively.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some of the regions may end up being unmapped, either because they are
optional or because the attempt to map them has failed. Region types
starting at 0 didn't make it easy to test for this condition.
This commit bumps all valid region types up by 1 with 0 having the
implicit 'unmapped' meaning.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a mechanism to allow an arbitrary adjustment to be applied to
all subsequent calls to time().
Note that the underlying clock source (e.g. the RTC clock) will not be
changed; only the time as reported within iPXE will be affected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer. For example, a firewall may drop a NAT forwarding table
entry.
Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions. The temporary loss of connectivity can therefore
effectively become permanent.
Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.
TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer. Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.
Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the 16-bit PCI bus:dev.fn address to a 32-bit seg🚌dev.fn
address, assuming a segment value of zero in contexts where multiple
segments are unsupported by the underlying data structures (e.g. in
the iBFT or BOFM tables).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mac OS X uses non-standard EFI protocols to obtain the DHCP packets
from the UEFI firmware.
Originally-implemented-by: Michael Kuron <m.kuron@gmx.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There has been a longstanding disagreement between RFC4578 and the
IANA "Processor Architecture Types" registry. RFC4578 section 2.1
defines type 7 as "EFI BC" and type 9 as "EFI x86-64"; the IANA
registry quotes RFC4578 as its source but has these values erroneously
swapped. The EDK2 codebase uses the IANA values.
As of March 2016, RFC4578 has been modified by an errata to match the
values as recorded in the IANA registry.
Fix our definitions to match the consensus values.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently use the EFI_CPU_ARCH_PROTOCOL's GetTimerValue() method to
generate the currticks() timer, calibrated against a 1ms delay from
the boot services Stall() method.
This does not work on ARM platforms, where GetTimerValue() is an empty
stub which just returns EFI_UNSUPPORTED.
Fix by instead creating a periodic timer event, and using this event
to increment a current tick counter.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Require architecture-specific code to make a deliberate choice to use
the unoptimised generic_tcpip_continue_chksum() function, if there is
no optimised version available.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
This commit adds support for driving virtio 1.0 PCI devices. In
addition to various helpers, a number of vpm_ functions are introduced
to be used instead of their legacy vp_ counterparts when accessing
virtio 1.0 (aka modern) devices.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Virtio 1.0 introduces new constants and data structures, common to all
devices as well as specific to virtio-net. This commit adds a subset
of these to be able to drive the virtio-net 1.0 network device.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI devices may support more capabilities of the same type (for
example PCI_CAP_ID_VNDR) and there was no way to discover all of them.
This commit adds a new API pci_find_next_capability which provides
this functionality. It would typically be used like so:
for (pos = pci_find_capability(pci, PCI_CAP_ID_VNDR);
pos > 0;
pos = pci_find_next_capability(pci, pos, PCI_CAP_ID_VNDR)) {
...
}
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The DHCP option 175.189 has been defined (by us) since 2006 as
containing the drive number to be used for a SAN boot, but has never
been automatically used as such by iPXE.
Use this option (if specified) to override the default SAN drive
number.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Interpret the maximum drive number (0xff for hard disks, 0x7f for
floppy disks) as meaning "use natural drive number".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide access to local files via the "file://" URI scheme. There are
three syntaxes:
- An opaque URI with a relative path (e.g. "file:script.ipxe").
This will be interpreted as a path relative to the iPXE binary.
- A hierarchical URI with a non-network absolute path
(e.g. "file:/boot/script.ipxe"). This will be interpreted as a
path relative to the root of the filesystem from which the iPXE
binary was loaded.
- A hierarchical URI with a network path in which the authority is a
volume label (e.g. "file://bootdisk/script.ipxe"). This will be
interpreted as a path relative to the root of the filesystem with
the specified volume label.
Note that the potentially desirable shell mappings (e.g. "fs0:" and
"blk0:") are concepts internal to the UEFI shell binary, and do not
seem to be exposed in any way to external executables. The old
EFI_SHELL_PROTOCOL (which did provide access to these mappings) is no
longer installed by current versions of the UEFI shell.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some architectures (such as ARM) the "@" character is used as a
comment delimiter. A section type argument such as "@progbits"
therefore becomes "%progbits".
This is further complicated by the fact that the "%" character has
special meaning for inline assembly when input or output operands are
used, in which cases "@progbits" becomes "%%progbits".
Allow the section type character(s) to be defined via Makefile
variables.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.
Fix by adding an explicit check on the I/O buffer length.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The assumption in asn1_type() that an ASN.1 cursor will always contain
a type byte is incorrect. A cursor that has been cleanly invalidated
via asn1_invalidate_cursor() will contain a type byte, but there are
other ways in which to arrive at a zero-length cursor.
Fix by explicitly checking the cursor length in asn1_type(). This
allows asn1_invalidate_cursor() to be reduced to simply zeroing the
length field.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations utilise an EoIB-to-Ethernet gateway device
that does not perform a FullMember join to the multicast group for the
EoIB broadcast domain. This has various exciting side-effects, such
as requiring every EoIB node to send every broadcast packet twice.
As an added bonus, the gateway may also break the EoIB MAC address to
GID mapping protocol by sending Ethernet-sourced packets from the
wrong QPN.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some EoIB implementations require each individual EoIB node to create
the multicast group for the EoIB broadcast domain.
It is left as an exercise for the interested reader to determine how
such an implementation might ever allow the parameters of such a
multicast group to be changed without requiring a simultaneous upgrade
of every driver on every operating system on every machine currently
attached to the fabric.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
EoIB is a fairly simple protocol in which raw Ethernet frames
(excluding the CRC) are encapsulated within Infiniband Unreliable
Datagrams, with a four-byte fixed EoIB header (which conveys no actual
information). The Ethernet broadcast domain is provided by a
multicast group, similar to the IPoIB IPv4 multicast group.
The mapping from Ethernet MAC addresses to Infiniband address vectors
is achieved by snooping incoming traffic and building a peer cache
which can then be used to map a MAC address into a port GID. The
address vector is completed using a path record lookup, as for IPoIB.
Note that this requires every packet to include a GRH.
Add basic support for EoIB devices. This driver is substantially
derived from the IPoIB driver. There is currently no mechanism for
automatically creating EoIB devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit e62e52b ("[ipoib] Simplify test for received broadcast
packets") relies upon the multicast LID being present in the
destination address vector as passed to ipoib_complete_recv().
Unfortunately, this information is not present in many Infiniband
devices' completion queue entries.
Fix by testing instead for the presence of a multicast GID.
Signed-off-by: Michael Brown <mcb30@ipxe.org>