david/ipxe
david
/
ipxe
Archived
1
0
Fork 0
Commit Graph

1137 Commits

Author SHA1 Message Date
Michael Brown 188789eb3c [tcp] Send TCP keepalives on idle established connections
In some circumstances, intermediate devices may lose state in a way
that temporarily prevents the successful delivery of packets from a
TCP peer.  For example, a firewall may drop a NAT forwarding table
entry.

Since iPXE spends most of its time downloading files (and hence purely
receiving data, sending only TCP ACKs), this can easily happen in a
situation in which there is no reason for iPXE's TCP stack to generate
any retransmissions.  The temporary loss of connectivity can therefore
effectively become permanent.

Work around this problem by sending TCP keepalives after a period of
inactivity on an established connection.

TCP keepalives usually send a single garbage byte in sequence number
space that has already been ACKed by the peer.  Since we do not need
to elicit a response from the peer, we instead send pure ACKs (with no
garbage data) in order to keep the transmit code path simple.

Originally-implemented-by: Ladi Prosek <lprosek@redhat.com>
Debugged-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-06-13 09:58:32 +01:00
Michael Brown b42e71921f [http] Accept headers with no whitespace following the colon
Reported-by: Raphael Cohn <raphael.cohn@stormmq.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-06-09 12:27:04 +01:00
Michael Brown f42b2585fe [http] Ignore unrecognised "Connection" header tokens
Some HTTP/2 servers send the header "Connection: upgrade, close".  This
currently causes iPXE to fail due to the unrecognised "upgrade" token.

Fix by ignoring any unrecognised tokens in the "Connection" header.

Reported-by: Ján ONDREJ (SAL) <ondrejj@salstar.sk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-25 15:35:43 +01:00
Michael Brown 231adda40f [netdevice] Fix failure path in register_netdev()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-05-23 14:17:47 +01:00
Michael Brown 40a8a5294c [ethernet] Make LACP support configurable at build time
Add a build configuration option NET_PROTO_LACP to control whether or
not LACP support is included for Ethernet devices.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-04-18 10:08:46 +01:00
Michael Brown 70509e6a03 [netdevice] Return ENOENT for an unknown bus type
It is possible for the preloaded UNDI device to end up with no
specified bus type, since it may not be recognised as either a PCI or
an ISAPnP device.  This will result in a bus type value of zero, which
currently results in NULL being treated as a string pointer by
netdev_fetch_bustype().

Fix by returning ENOENT if an unknown bus type is specified.

Reported-by: Todd Stansell <todd@stansell.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-29 20:59:30 +01:00
Michael Brown f8e1678b84 [crypto] Allow cross-certificate source to be configured at build time
Provide a build option CROSSCERT in config/crypto.h to allow the
default cross-signed certificate source to be configured at build
time.  The ${crosscert} setting may still be used to reconfigure the
cross-signed certificate source at runtime.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-24 19:25:03 +00:00
Michael Brown 64acfd9ddd [arp] Validate length of ARP packet
There is no practical way to generate an underlength ARP packet since
an ARP packet is always padded up to the minimum Ethernet frame length
(or dropped by the receiving Ethernet hardware if incorrectly padded),
but the absence of an explicit check causes warnings from some
analysis tools.

Fix by adding an explicit check on the I/O buffer length.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-12 01:24:03 +00:00
Michael Brown 05dcb07cb2 [tls] Avoid potential out-of-bound reads in length fields
Many TLS records contain variable-length fields.  We currently
validate the overall record length, but do so only after reading the
length of the variable-length field.  If the record is too short to
even contain the length field, then we may read uninitialised data
from beyond the end of the record.

This is harmless in practice (since the subsequent overall record
length check would fail regardless of the value read from the
uninitialised length field), but causes warnings from some analysis
tools.

Fix by validating that the overall record length is sufficient to
contain the length field before reading from the length field.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-11 16:09:40 +00:00
Michael Brown e44f6dcb89 [xsigo] Add support for Xsigo virtual Ethernet (XVE) EoIB devices
Add support for EoIB devices as implemented by Xsigo.  Based on the
public (but out-of-tree) Linux kernel drivers at

  https://oss.oracle.com/git/?p=linux-uek.git;a=log;h=v4.1.12-32.2.1

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-09 08:46:24 +00:00
Michael Brown 5bcaa1e4d4 [infiniband] Make IPoIB support configurable at build time
Add a build configuration option VNIC_IPOIB to control whether or not
IPoIB support is included for Infiniband devices.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-09 08:43:40 +00:00
Michael Brown 076d772648 [infiniband] Retrieve GID flag from cached path entries
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 17:40:52 +00:00
Michael Brown 6a3ffa0114 [infiniband] Assign names to queue pairs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 15:51:53 +00:00
Michael Brown 174bf6b569 [infiniband] Assign names to CMRC connections
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 15:51:19 +00:00
Michael Brown 5a7fd2cc90 [infiniband] Allow for the creation of multicast groups
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:23:30 +00:00
Michael Brown 14ad9cbd67 [infiniband] Parse MLID, rate, and SL from multicast membership record
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:23:30 +00:00
Michael Brown c335f8eae4 [infiniband] Record multicast GID attachment as part of group membership
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:23:30 +00:00
Michael Brown 114a2f19a6 [infiniband] Do not use GRH for local paths
Avoid including an unnecessary GRH in packets sent to unicast
destinations within the local subnet.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:23:24 +00:00
Michael Brown bd1687465c [infiniband] Use correct transaction identifier in CM responses
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown 8336186564 [infiniband] Use connection's local ID as debug message identifier
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown 36c4779356 [infiniband] Use "%d" as format specifier for LIDs
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown 7aef4d4c94 [infiniband] Use "%#lx" as format specifier for queue pair numbers
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown d7794dcac7 [infiniband] Assign names to Infiniband devices for debug messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown ff13eeb747 [infiniband] Add support for performing service record lookups
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:08:58 +00:00
Michael Brown 7544763626 [infiniband] Avoid multiple calls to ib_cmrc_shutdown()
When a CMRC connection is closed, the deferred shutdown process calls
ib_destroy_qp().  This will cause the receive work queue entries to
complete in error (since they are being cancelled), which will in turn
reschedule the deferred shutdown process.  This eventually leads to
ib_destroy_conn() being called on a connection that has already been
freed.

Fix by explicitly cancelling any pending shutdown process after the
shutdown process has completed.

Ironically, this almost exactly reverts commit 019d4c1 ("[infiniband]
Use a one-shot process for CMRC shutdown"); prior to the introduction
of one-shot processes the only way to achieve a one-shot process was
for the process to cancel itself.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-08 12:07:03 +00:00
Michael Brown fcf3b03544 [netdevice] Refuse to create duplicate network device names
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-03-07 21:04:40 +00:00
Michael Brown 4ddd3d99c3 [slam] Avoid potential division by zero
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-27 23:27:47 +00:00
Michael Brown fef8e34b6f [tcp] Guard against malformed TCP options
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-27 23:06:50 +00:00
Michael Brown f0e9e55442 [tftp] Mangle initial slash on TFTP URIs
TFTP URIs are intrinsically problematic, since:

- TFTP servers may use either normal slashes or backslashes as a
  directory separator,

- TFTP servers allow filenames to be specified using relative paths
  (with no initial directory separator),

- TFTP filenames present in a DHCP filename field may use special
  characters such as "?" or "#" that prevent parsing as a generic URI.

As of commit 7667536 ("[uri] Refactor URI parsing and formatting"), we
have directly constructed TFTP URIs from DHCP next-server and filename
pairs, avoiding the generic URI parser.  This eliminated the problems
related to special characters, but indirectly made it impossible to
parse a "tftp://..." URI string into a TFTP URI with a non-absolute
path.

Re-introduce the convention of requiring an extra slash in a
"tftp://..." URI string in order to specify a TFTP URI with an initial
slash in the filename.  For example:

  tftp://192.168.0.1/boot/pxelinux.0  => RRQ "boot/pxelinux.0"
  tftp://192.168.0.1//boot/pxelinux.0 => RRQ "/boot/pxelinux.0"

This is ugly, but there seems to be no other sensible way to provide
the ability to specify all possible TFTP filenames.

A side-effect of this change is that format_uri() will no longer add a
spurious initial "/" when formatting a relative URI string.  This
improves the console output when fetching an image specified via a
relative URI.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-21 18:00:33 +00:00
Andrew Widdersheim 3fd81799ba [netdevice] Add "ifname" setting
Expose the network interface name (e.g. "net0") as a setting.  This
allows a script to obtain the name of the most recently opened network
interface via ${netX/ifname}.

Signed-off-by: Andrew Widdersheim <amwiddersheim@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-18 08:50:44 +00:00
Michael Brown 8af8886d0a [stp] Fix incorrectly disambiguated errors
The three nominally-disambiguated ENOTSUP errors accidentally all used
the same error disambiguator, rendering them identical.  Fix by
changing all three values.  We avoid reusing the 0x01 disambiguator
value, since that remains ambiguous in older binaries.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-14 12:39:35 +00:00
Michael Brown 7c6858e95d [infiniband] Profile post work queue entry operations
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-10 15:44:00 +00:00
Michael Brown 0af0888832 [tftp] Do not change current working URI when TFTP server is cleared
For historical reasons, iPXE sets the current working URI to the root
of the TFTP server whenever the TFTP server address is changed.  This
was originally implemented in the hope of allowing a DHCP-provided
TFTP filename to be treated simply as a relative URI.  This usage
turns out to be impractical since DHCP-provided TFTP filenames may
include characters which would have special significance to the URI
parser, and so the DHCP next-server+filename combination is now
handled by the dedicated pxe_uri() function instead.

The practice of setting the current working URI to the root of the
TFTP server is potentially helpful for interactive uses of iPXE,
allowing a user to type e.g.

  iPXE> dhcp
  Configuring (net0 52:54:00:12:34:56)... ok
  iPXE> chain pxelinux.0

and have the URI "pxelinux.0" interpreted as being relative to the
root of the TFTP server provided via DHCP.

The current implementation of tftp_apply_settings() has an unintended
flaw.  When the "dhcp" command is used to renew a DHCP lease (or to
pick up potentially modified DHCP options), the old settings block
will be unregistered before the new settings block is registered.
This causes tftp_apply_settings() to believe that the TFTP server has
been changed twice (to 0.0.0.0 and back again), and so the current
working URI will always be set to the root of the TFTP server, even if
the DHCP response provides exactly the same TFTP server as previously.

Fix by doing nothing in tftp_apply_settings() whenever there is no
TFTP server address.

Debugged-by: Andrew Widdersheim <awiddersheim@inetu.net>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-09 14:51:21 +00:00
Michael Brown 74c812a68c [http] Handle relative redirection URIs
Resolve redirection URIs as being relative to the original HTTP
request URI, rather than treating them as being implicitly relative to
the current working URI.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2016-01-09 13:20:55 +00:00
Michael Brown ed0d7c4f6f [dhcp] Limit maximum number of DHCP discovery deferrals
For switches which remain permanently in the non-forwarding state (or
which erroneously report a non-forwarding state), ensure that iPXE
will eventually give up waiting for the link to become unblocked.

Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-11-10 14:05:46 +00:00
Michael Brown 7cc7e0ec86 [dhcp] Reset start time when deferring discovery
If we detect (via STP) that a switch port is in a non-forwarding
state, then the link is marked as being temporarily blocked and DHCP
discovery will be deferred until the link becomes unblocked.

The timer used to decide when to give up waiting for ProxyDHCPOFFERs
is currently based on the time that DHCP discovery was started, and
makes no allowances for any time spent waiting for the link to become
unblocked.  Consequently, if STP is used then the timeout for
ProxyDHCPOFFERs becomes essentially zero.

Fix by resetting the recorded start time whenever DHCP discovery is
deferred due to a blocked link.

Debugged-by: Sebastian Roth <sebastian.roth@zoho.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-10-30 13:29:03 +00:00
Michael Brown 3bd0d340f4 [http] Verify server port when reusing a pooled connection
Reported-by: Allen <allen@gtf.org>
Reported-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-10-02 07:54:51 +01:00
Michael Brown 0a4805bf94 [peerdist] Avoid NULL pointer dereference for plaintext blocks
Avoid accidentally dereferencing a NULL cipher context pointer for
plaintext blocks (which are usually messages with a block length of
zero, indicating a missing block).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-09-29 01:24:36 +01:00
Michael Brown 8baefad659 [tcpip] Avoid generating positive zero for transmitted UDP checksums
TCP/IP checksum fields are one's complement values and therefore have
two possible representations of zero: positive zero (0x0000) and
negative zero (0xffff).

In RFC768, UDP over IPv4 exploits this redundancy to repurpose the
positive representation of zero (0x0000) to mean "no checksum
calculated"; checksums are optional for UDP over IPv4.

In RFC2460, checksums are made mandatory for UDP over IPv4.  The
wording of the RFC is such that the UDP header is mandated to use only
the negative representation of zero (0xffff), rather than simply
requiring the checksum to be correct but allowing for either
representation of zero to be used.

In RFC1071, an example algorithm is given for calculating the TCP/IP
checksum.  This algorithm happens to produce only the positive
representation of zero (0x0000); this is an artifact of the way that
unsigned arithmetic is used to calculate a signed one's complement
sum (and its final negation).

A common misconception has developed (exemplified in RFC1624) that
this artifact is part of the specification.  Many people have assumed
that the checksum field should never contain the negative
representation of zero (0xffff).

A sensible receiver will calculate the checksum over the whole packet
and verify that the result is zero (in whichever representation of
zero happens to be generated by the receiver's algorithm).  Such a
receiver will not care which representation of zero happens to be used
in the checksum field.

However, there are receivers in existence which will verify the
received checksum the hard way: by calculating the checksum over the
remainder of the packet and comparing the result against the checksum
field.  If the representation of zero used by the receiver's algorithm
does not match the representation of zero used by the transmitter (and
so placed in the checksum field), and if the receiver does not
explicitly allow for both representations to compare as equal, then
the receiver may reject packets with a valid checksum.

For UDP, the combined RFCs effectively mandate that we should generate
only the negative representation of zero in the checksum field.

For IP, TCP and ICMP, the RFCs do not mandate which representation of
zero should be used, but the misconceptions which have grown up around
RFC1071 and RFC1624 suggest that it would be least surprising to
generate only the positive representation of zero in the checksum
field.

Fix by ensuring that all of our checksum algorithms generate only the
positive representation of zero, and explicitly inverting this in the
case of transmitted UDP packets.

Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-09-10 14:46:54 +01:00
Michael Brown be51713474 [pxe] Populate ciaddr in fake PXE Boot Server ACK packet
We currently do not populate the ciaddr field in the constructed PXE
Boot Server ACK packet.  This causes a WDS server to respond with a
broadcast packet, which is then ignored by wdsmgfw.efi since it does
not match the specified IP address filter.

Fix by populating ciaddr within the constructed PXE Boot Server ACK
packet.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-09-01 21:24:02 +01:00
Michael Brown 8430642642 [tcpip] Allow supported address families to be detected at runtime
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-09-01 21:04:45 +01:00
Michael Brown f0c6c4efd8 [dhcp] Do not skip ProxyDHCPREQUEST if next-server is empty
We attempt to mimic the behaviour of Intel's PXE ROM by skipping the
separate ProxyDHCPREQUEST if the ProxyDHCPOFFER already contains a
boot filename or a PXE boot menu.

Experimentation reveals that Intel's PXE ROM will also check for a
non-empty next-server address alongside the boot filename.  Update our
test to match this behaviour.

Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-26 16:08:58 +01:00
Michael Brown ba3695353a [settings] Re-add "uristring" setting type
Commit 09b057c ("[settings] Remove "uristring" setting type") removed
support for URI-encoded settings via the "uristring" setting type, on
the basis that such encoding was no longer necessary to avoid problems
with the command line parser.

Other valid use cases for the "uristring" setting type do exist: for
example, a password containing a '/' character expanded via

  chain http://username:${password:uristring}@server.name/boot.php

Restore the existence of the "uristring" setting, avoiding the
potentially large stack allocations that were used in the old code
prior to commit 09b057c ("[settings] Remove "uristring" setting
type").

Requested-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-25 13:31:46 +01:00
Michael Brown 0a34c2aab9 [dhcp] Ignore ProxyDHCPACKs without PXE options
Suggested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-18 17:18:38 +01:00
Michael Brown 60e2b71471 [dhcp] Allow pseudo-DHCP servers to use pseudo-identifiers
Some ProxyDHCP servers and PXE boot servers do not specify a DHCP
server identifier via option 54.  We currently work around this in a
variety of ad-hoc ways:

 - if a ProxyDHCPACK has no server identifier then we treat it as
   having the correct server identifier,

 - if a boot server ACK has no server identifier then we use the
   packet's source IP address as the server identifier.

Introduce the concept of a DHCP server pseudo-identifier, defined as
being:

 - the server identifier (option 54), or

 - if there is no server identifier, then the next-server address
   (siaddr),

 - if there is no server identifier or next-server address, then the
   DHCP packet's source IP address.

Use the pseudo-identifier in place of the server identifier when
handling ProxyDHCP and PXE boot server responses.

Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-18 15:43:06 +01:00
Wissam Shoukair eb8df9a046 [ipoib] Fix a race when chain-loading undionly.kpxe in IPoIB
The Infiniband link status change callback ipoib_link_state_changed()
may be called while the IPoIB device is closed, in which case there
will not be an IPoIB queue pair to be joined to the IPv4 broadcast
group.  This leads to NULL pointer dereferences in ib_mcast_attach()
and ib_mcast_detach().

Fix by not attempting to join (or leave) the broadcast group unless we
actually have an IPoIB queue pair.

Signed-off-by: Wissam Shoukair <wissams@mellanox.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 14:42:36 +01:00
Michael Brown fd18417cf1 [peerdist] Add support for PeerDist (aka BranchCache) HTTP content encoding
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 13:24:40 +01:00
Michael Brown d2b2a0adae [peerdist] Add block download multiplexer
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 13:24:39 +01:00
Michael Brown 4d032d5db8 [peerdist] Add individual block download mechanism
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 13:24:39 +01:00
Michael Brown dc9d24e7d2 [peerdist] Add segment discovery mechanism
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 13:24:39 +01:00
Michael Brown 518a98eb56 [http] Rewrite HTTP core to support content encodings
Rewrite the HTTP core to allow for the addition of arbitrary content
encoding mechanisms, such as PeerDist and gzip.

The core now exposes http_open() which can be used to create requests
with an explicitly selected HTTP method, an optional requested content
range, and an optional request body.  A simple wrapper provides the
preexisting behaviour of creating either a GET request or an
application/x-www-form-urlencoded POST request (if the URI includes
parameters).

The HTTP SAN interface is now implemented using the generic block
device translator.  Individual blocks are requested using http_open()
to create a range request.

Server connections are now managed via a connection pool; this allows
for multiple requests to the same server (e.g. for SAN blocks) to be
completely unaware of each other.  Repeated HTTPS connections to the
same server can reuse a pooled connection, avoiding the per-connection
overhead of establishing a TLS session (which can take several seconds
if using a client certificate).

Support for HTTP SAN booting and for the Basic and Digest
authentication schemes is now optional and can be controlled via the
SANBOOT_PROTO_HTTP, HTTP_AUTH_BASIC, and HTTP_AUTH_DIGEST build
configuration options in config/general.h.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-17 13:24:33 +01:00
Michael Brown b1caa48e4b [crypto] Support SHA-{224,384,512} in X.509 certificates
Add support for SHA-224, SHA-384, and SHA-512 as digest algorithms in
X.509 certificates, and allow the choice of public-key, cipher, and
digest algorithms to be configured at build time via config/crypto.h.

Originally-implemented-by: Tufan Karadere <tufank@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-02 16:54:24 +01:00
Michael Brown fc7885ed9e [tls] Report supported signature algorithms in ClientHello
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-02 14:17:24 +01:00
Michael Brown 1ac7434111 [tls] Do not access beyond the end of a 24-bit integer
The current implementation handles big-endian 24-bit integers (which
occur in several TLS record types) by treating them as big-endian
32-bit integers which are shifted by 8 bits.  This can result in
"Invalid read" errors when running under valgrind, if the 24-bit field
happens to be exactly at the end of an I/O buffer.

Fix by ensuring that we touch only the three bytes which comprise the
24-bit integer.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-08-01 00:06:58 +01:00
Michael Brown 51b99d8bc8 [peerdist] Add support for constructing and decoding discovery messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 16:09:14 +01:00
Michael Brown f0d594557c [peerdist] Include trimmed range within content information block
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 15:22:26 +01:00
Michael Brown b20d4a1522 [netdevice] Allow network devices to disclaim IRQ support at runtime
VLAN and 802.11 devices use a network device operations structure that
wraps an underlying structure.  For example, the vlan_operations
structure wraps the network device operations structure of the
underlying trunk device.  This can cause false positives from the
current implementation of netdev_irq_supported(), which will always
report that VLAN devices support interrupts since it has no visibility
into the support provided by the underlying trunk device.

Fix by allowing network devices to explicitly flag that interrupts are
not supported, despite the presence of an irq() method.

Originally-fixed-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 15:14:40 +01:00
Michael Brown 76338543f9 [iscsi] Add missing "break" statements
iscsi_tx_done() is missing "break" statements at the end of each case.
(Fortunately, this happens not to cause a bug in practice, since
iscsi_login_request_done() is effectively a no-op when completing a
data-out PDU.)

Reported-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 14:15:14 +01:00
Michael Brown 2bcf13f13a [ipv4] Allow IPv4 socket addresses to include a scope ID
Extend the IPv6 concept of "scope ID" (indicating the network device
index) to IPv4 socket addresses, so that IPv4 multicast transmissions
may specify the transmitting network device.

The scope ID is not (currently) exposed via the string representation
of the socket address, since IPv4 does not use the IPv6 concept of
link-local addresses (which could legitimately be specified in a URI).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 13:48:29 +01:00
Michael Brown 6efcabd415 [ipv4] Redefine IP address constants to avoid unnecessary byte swapping
Redefine various IPv4 address constants and testing macros to avoid
unnecessary byte swapping at runtime, and slightly rename the macros
to prevent code from accidentally using the old definitions.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 13:48:29 +01:00
Michael Brown 9c185e2eac [netdevice] Avoid using zero as a network device index
Avoid using zero as a network device index, so that a zero
sin6_scope_id can be used to mean "unspecified" (rather than
unintentionally meaning "net0").

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 13:48:29 +01:00
Michael Brown 41670ca2fe [ipv6] Treat a missing network device name as "netX"
When an IPv6 socket address string specifies a link-local or multicast
address but does not specify the requisite network device name
(e.g. "fe80::69ff:fe50:5845" rather than "fe80::69ff:fe50:5845%net0"),
assume the use of "netX".

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-28 13:48:23 +01:00
Michael Brown 1a30c20daf [802.11] Use correct SHA1_DIGEST_SIZE constant name
The constant SHA1_SIZE is defined only as part of the imported AXTLS code.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-27 15:59:10 +01:00
Michael Brown cbbd6b761e [xferbuf] Generalise to handle umalloc()-based buffers
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22 21:17:47 +01:00
Michael Brown d0325b1da6 [fault] Generalise NETDEV_DISCARD_RATE fault injection mechanism
Provide a generic inject_fault() function that can be used to inject
random faults with configurable probabilities.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22 21:17:47 +01:00
Michael Brown 9546b0c17b [tcp] Ensure FIN is actually sent if connection is closed while idle
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-22 21:16:40 +01:00
Michael Brown 38afcc51ea [tcp] Gracefully close connections during shutdown
We currently do not wait for a received FIN before exiting to boot a
loaded OS.  In the common case of booting from an HTTP server, this
means that the TCP connection is left consuming resources on the
server side: the server will retransmit the FIN several times before
giving up.

Fix by initiating a graceful close of all TCP connections and waiting
(for up to one second) for all connections to finish closing
gracefully (i.e. for the outgoing FIN to have been sent and ACKed, and
for the incoming FIN to have been received and ACKed at least once).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-07-04 12:51:23 +01:00
Michael Brown 8829634bd7 [ipoib] Attempt to generate ARPs as needed to repopulate REMAC cache
The only way to map an eIPoIB MAC address (REMAC) to an IPoIB MAC
address is to intercept an incoming ARP request or reply.

If we do not have an REMAC cache entry for a particular destination
MAC address, then we cannot transmit the packet.  This can arise in at
least two situations:

 - An external program (e.g. a PXE NBP using the UNDI API) may attempt
   to transmit to a destination MAC address that has been obtained by
   some method other than ARP.

 - Memory pressure may have caused REMAC cache entries to be
   discarded.  This is fairly likely on a busy network, since REMAC
   cache entries are created for all received (broadcast) ARP
   requests.  (We can't sensibly avoid creating these cache entries,
   since they are required in order to send an ARP reply, and when we
   are being used via the UNDI API we may have no knowledge of which
   IP addresses are "ours".)

Attempt to ameliorate the situation by generating a semi-spurious ARP
request whenever we find a missing REMAC cache entry.  This will
hopefully trigger an ARP reply, which would then provide us with the
information required to populate the REMAC cache.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-29 14:50:16 +01:00
Michael Brown d73982f098 [dhcp] Defer discovery if link is blocked
If the link is blocked (e.g. due to a Spanning Tree Protocol port not
yet forwarding packets) then defer DHCP discovery until the link
becomes unblocked.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 17:32:24 +01:00
Michael Brown 94dbfb4374 [stp] Fix interpretaton of hello time
Times in STP packets are expressed in units of 1/256 of a second.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 17:32:24 +01:00
Michael Brown fb28c4a979 [stp] Add support for detecting Spanning Tree Protocol non-forwarding ports
A fairly common end-user problem is that the default configuration of
a switch may leave the port in a non-forwarding state for a
substantial length of time (tens of seconds) after link up.  This can
cause iPXE to time out and give up attempting to boot.

We cannot force the switch to start forwarding packets sooner, since
any attempt to send a Spanning Tree Protocol bridge PDU may cause the
switch to disable our port (if the switch happens to have the Bridge
PDU Guard feature enabled for the port).

For non-ancient versions of the Spanning Tree Protocol, we can detect
whether or not the port is currently forwarding and use this to inform
the network device core that the link is currently blocked.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 16:58:38 +01:00
Michael Brown f3812395a2 [netdevice] Add a generic concept of a "blocked link"
When Spanning Tree Protocol (STP) is used, there may be a substantial
delay (tens of seconds) from the time that the link goes up to the
time that the port starts forwarding packets.

Add a generic concept of a "blocked link" (i.e. a link which is up but
which is not expected to communicate successfully), and allow "ifstat"
to indicate when a link is blocked.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 16:46:47 +01:00
Michael Brown 7e7870984b [ethernet] Add minimal support for receiving LLC frames
In some Ethernet framing variants the two-byte protocol field is used
as a length, with the Ethernet header being followed by an IEEE 802.2
LLC header.  The first two bytes of the LLC header are the DSAP and
SSAP.

If the received Ethernet packet appears to use this framing, then
interpret the two-byte DSAP and SSAP as being the network-layer
protocol.  This allows support for receiving Spanning Tree Protocol
frames (which use an LLC header with {DSAP,SSAP}=0x4242) to be added
without requiring a full LLC protocol layer.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 15:28:42 +01:00
Michael Brown c117b25e0b [tcp] Do not shrink window when discarding received packets
We currently shrink the TCP window permanently if we are ever forced
(by a low-memory condition) to discard a previously received TCP
packet.  This behaviour was intended to reduce the number of
retransmissions in a lossy network, since lost packets might
potentially result in the entire window contents being retransmitted.

Since commit e0fc8fe ("[tcp] Implement support for TCP Selective
Acknowledgements (SACK)") the cost of lost packets has been reduced by
around one order of magnitude, and the reduction in the window size
(which affects the maximum throughput) is now the more significant
cost.

Remove the code which reduces the TCP maximum window size when a
received packet is discarded.

Reported-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-06-25 10:20:48 +01:00
Michael Brown 15759e539e [neighbour] Return success when deferring a packet
Deferral of a packet for neighbour discovery is not really an error.
If we fail to discover a neighbour then the failure will eventually be
reported by the call to neighbour_destroy() when any outstanding I/O
buffers are discarded.

The current behaviour breaks PXE booting on FreeBSD, which seems to
treat the error return from PXENV_UDP_WRITE as a fatal error and so
never proceeds to poll PXENV_UDP_READ (and hence never allows iPXE to
receive the ARP reply and send the deferred UDP packet).

Change neighbour_tx() to return success when deferring a packet.  This
fixes interoperability with FreeBSD and removes transient neighbour
cache misses from the "ifstat" error output, while leaving genuine
neighbour discovery failures visible via "ifstat" (once neighbour
discovery times out, or the interface is closed).

Debugged-by: Wissam Shoukair <wissams@mellanox.com>
Tested-by: Wissam Shoukair <wissams@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-05-20 15:29:36 +01:00
Michael Brown 86aa959561 [ipv6] Disambiguate received ICMPv6 errors
Originally-implemented-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-05-11 12:45:14 +01:00
Michael Brown 1205721cbd [base64] Add buffer size parameter to base64_encode() and base64_decode()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-24 15:32:04 +01:00
Michael Brown 9aa8090d06 [base16] Add buffer size parameter to base16_encode() and base16_decode()
The current API for Base16 (and Base64) encoding requires the caller
to always provide sufficient buffer space.  This prevents the use of
the generic encoding/decoding functionality in some situations, such
as in formatting the hex setting types.

Implement a generic hex_encode() (based on the existing
format_hex_setting()), implement base16_encode() and base16_decode()
in terms of the more generic hex_encode() and hex_decode(), and update
all callers to provide the additional buffer length parameter.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-24 14:41:32 +01:00
Christian Hesse bf40b79734 [build] Add missing "const" qualifiers
This fixes "initialization discards 'const' qualifier from pointer
target type" warnings with GCC 5.1.0.

Signed-off-by: Christian Hesse <mail@eworm.de>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-24 13:03:28 +01:00
Michael Brown d9166bbcae [peerdist] Add support for decoding PeerDist Content Information
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-04-13 12:26:05 +01:00
Michael Brown c492a9fd92 [netdevice] Add missing bus types to netdev_fetch_bustype()
Reported-by: Robin Smidsrød <robin@smidsrod.no>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-18 16:42:39 +00:00
Michael Brown 57bab4e1d3 [tcpip] Fix dubious calculation of min_port
Detected using sparse.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-13 10:19:44 +00:00
Michael Brown e0fc8fe781 [tcp] Implement support for TCP Selective Acknowledgements (SACK)
The TCP Selective Acknowledgement option (specified in RFC2018)
provides a mechanism for the receiver to indicate packets that have
been received out of order (e.g. due to earlier dropped packets).

iPXE often operates in environments in which there is a high
probability of packet loss.  For example, the legacy USB keyboard
emulation in some BIOSes involves polling the USB bus from within a
system management interrupt: this introduces an invisible delay of
around 500us which is long enough for around 40 full-length packets to
be dropped.  Similarly, almost all 1Gbps USB2 devices will eventually
end up dropping packets because the USB2 bus does not provide enough
bandwidth to sustain a 1Gbps stream, and most devices will not provide
enough internal buffering to hold a full TCP window's worth of
received packets.

Add support for sending TCP Selective Acknowledgements.  This provides
the sender with more detailed information about which packets have
been lost, and so allows for a more efficient retransmission strategy.

We include a SACK-permitted option in our SYN packet, since
experimentation shows that at least Linux peers will not include a
SACK-permitted option in the SYN-ACK packet if one was not present in
the initial SYN.  (RFC2018 does not seem to mandate this behaviour,
but it is consistent with the approach taken in RFC1323.)  We ignore
any received SACK options; this is safe to do since SACK is only ever
advisory and we never have to send non-trivial amounts of data.

Since our TCP receive queue is a candidate for cache discarding under
low memory conditions, we may end up discarding data that has been
reported as received via a SACK option.  This is permitted by RFC2018.
We follow the stricture that SACK blocks must not report data which is
no longer held by the receiver: previously-reported blocks are
validated against the current receive queue before being included
within the current SACK block list.

Experiments in a qemu VM using forced packet drops (by setting
NETDEV_DISCARD_RATE to 32) show that implementing SACK improves
throughput by around 400%.

Experiments with a USB2 NIC (an SMSC7500) show that implementing SACK
improves throughput by around 700%, increasing the download rate from
35Mbps up to 250Mbps (which is approximately the usable bandwidth
limit for USB2).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-11 23:14:39 +00:00
Michael Brown 042a982c4d [http] Support MD5-sess Digest authentication
Microsoft IIS supports only MD5-sess for Digest authentication.

Requested-by: Andreas Hammarskjöld <junior@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-09 13:45:09 +00:00
Michael Brown 42ea20afee [http] Abstract out HTTP Digest hash algorithm operations
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-09 13:21:27 +00:00
Michael Brown 1a4e94a828 [legal] Relicense files under GPL2_OR_LATER_OR_UBDL
Relicense files with kind permission from

    Stefan Hajnoczi <stefanha@redhat.com>

alongside the contributors who have already granted such relicensing
permission.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05 11:40:13 +00:00
Michael Brown 93b4586447 [retry] Colourise debug output
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05 11:25:54 +00:00
Michael Brown 47ad8fc1ba [retry] Rewrite unrelicensable portions of retry.c
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05 11:06:03 +00:00
Michael Brown fbc4ba4b4e [build] Fix the REQUIRE_SYMBOL mechanism
At some point in the past few years, binutils became more aggressive
at removing unused symbols.  To function as a symbol requirement, a
relocation record must now be in a section marked with @progbits and
must not be in a section which gets discarded during the link (either
via --gc-sections or via /DISCARD/).

Update REQUIRE_SYMBOL() to generate relocation records meeting these
criteria.  To minimise the impact upon the final binary size, we use
existing symbols (specified via the REQUIRING_SYMBOL() macro) as the
relocation targets where possible.  We use R_386_NONE or R_X86_64_NONE
relocation types to prevent any actual unwanted relocation taking
place.  Where no suitable symbol exists for REQUIRING_SYMBOL() (such
as in config.c), the macro PROVIDE_REQUIRING_SYMBOL() can be used to
generate a one-byte-long symbol to act as the relocation target.

If there are versions of binutils for which this approach fails, then
the fallback will probably involve killing off REQUEST_SYMBOL(),
redefining REQUIRE_SYMBOL() to use the current definition of
REQUEST_SYMBOL(), and postprocessing the linked ELF file with
something along the lines of "nm -u | wc -l" to check that there are
no undefined symbols remaining.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05 00:59:38 +00:00
Michael Brown 86ae6e6c18 [build] Use REQUIRE_OBJECT() to drag in per-object configuration
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-05 00:57:44 +00:00
Michael Brown c1ac466838 [iscsi] Rewrite unrelicensable portions of iscsi.c
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-02 20:40:31 +00:00
Michael Brown 01d16d821f [libc] Rewrite byte-swapping code
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-02 16:35:37 +00:00
Michael Brown 2f020a8df3 [legal] Relicense files under GPL2_OR_LATER_OR_UBDL
These files cannot be automatically relicensed by util/relicense.pl
since they either contain unusual but trivial contributions (such as
the addition of __nonnull function attributes), or contain lines
dating back to the initial git revision (and so require manual
knowledge of the code's origin).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-02 16:35:29 +00:00
Michael Brown 626ccf76ea [legal] Relicense files under GPL2_OR_LATER_OR_UBDL
Relicence files with kind permission from the following contributors:

  Alex Williamson <alex.williamson@redhat.com>
  Eduardo Habkost <ehabkost@redhat.com>
  Greg Jednaszewski <jednaszewski@gmail.com>
  H. Peter Anvin <hpa@zytor.com>
  Marin Hannache <git@mareo.fr>
  Robin Smidsrød <robin@smidsrod.no>
  Shao Miller <sha0.miller@gmail.com>
  Thomas Horsten <thomas@horsten.com>

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-02 14:50:42 +00:00
Michael Brown b6ee89ffb5 [legal] Relicense files under GPL2_OR_LATER_OR_UBDL
Relicense files for which I am the sole author (as identified by
util/relicense.pl).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-03-02 14:17:31 +00:00
Alex Williamson 47aebc24d3 [dhcp] Extract timing parameters out to config/dhcp.h
iPXE uses DHCP timeouts loosely based on values recommended by the
specification, but often abbreviated to reduce timeouts for reliable
and/or simple network topologies.  Extract the DHCP timing parameters
to config/dhcp.h and document them.  The resulting default iPXE
behavior is exactly the same, but downstreams are now afforded the
opportunity to implement spec-compliant behavior via config file
overrides.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-02-25 16:58:43 +00:00
Michael Brown bb1abb2b21 [ipv4] Rewrite inet_aton()
The implementation of inet_aton() has an unknown provenance.  Rewrite
this code to avoid potential licensing uncertainty.

Also move the code from core/misc.c to its logical home in net/ipv4.c,
and add a few extra test cases.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-02-19 14:02:07 +00:00
Michael Brown 095c007aa3 [legal] Add missing copyright header to net/ipv4.c
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-02-18 14:16:59 +00:00
Michael Brown f3725a86e0 [rndis] Add rndis_rx_err()
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-02-11 17:26:51 +00:00
Michael Brown 2dfdcae938 [tftp] Explicitly abort connection whenever parent interface is closed
Fetching the TFTP file size is currently implemented via a custom
"tftpsize://" protocol hack.  Generalise this approach to instead
close the TFTP connection whenever the parent data-transfer interface
is closed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2015-02-06 12:08:54 +00:00