We seem to be having issues with various E820 memory maps. These
problems are often difficult to reproduce, requiring access to the
specific system exhibiting the problem.
Add a facility for hooking in a fake E820 map generator, using an
arbitrary map defined in a C array, solely in order to be able to test
the map-mangling code against arbitrary E820 maps.
In particular, allow BANNER_TIMEOUT=0 to inhibit the prompt banners
altogether.
Ironically, this request comes from the same OEM that originally
required the prompts to be present during POST.
Some really moronic BIOSes bring up the PXE stack via the UNDI loader
entry point during POST, and then don't bother to unload it before
overwriting the code and data segments. If this happens, we really
don't want to leave INT 15 hooked, because that will cause any loaded
OS to die horribly as soon as it attempts to fetch the system memory
map.
We use a heuristic to detect whether or not we are being loaded at the
top of free base memory. If we determine that we are being loaded at
some other arbitrary location in base memory, then we assume that it's
not safe to hook INT 15.
On non-BBS systems we hook INT 19, since there is no other way we can
guarantee gaining control of the flow of execution. If we end up
doing this, prompt the user before attempting boot, since forcibly
capturing INT 19 is rather antisocial.
If the INT 15,e820 memory map reports a region [0,0), this confuses
the "truncate to even megabytes" logic, which ends up rounding the
region 'down' to [0,fff00000).
Fix by ensuring that the region's end address is at least 1, before we
subtract 1 to obtain the "last byte in region" address.
INT 15,e801 is capable of returning a memory range that extends to
4GB, so allow for this in the debug message that shows the data
returned by INT 15,e801.
Apparently some BIOSes will place option ROMs on 512-byte boundaries.
While this is against specification, it doesn't actually hurt
anything, so we may as well increase our scan granularity to 512
bytes.
Contributed by Luca <lucarx76@gmail.com>
Wyse Streaming Manager server (WLDRM13.BIN) assumes that the PXENV+
entry point is at UNDI_CS:0000; apparently, somebody at Wyse has
difficulty distinguishing between the words "may" and "must"...
Add a dummy entry point at UNDI_CS:0000, which just jumps to the
correct entry point.
The multiboot specification states that, for raw images, if
load_end_addr is zero then it should be interpreted as meaning "use
the entire file", and if bss_end_addr is zero it should be interpreted
as meaning "no bss".
Explicitly state that we are using 32-bit addressing in 16-bit code.
GNU as 2.15 (FreeBSD/amd64 7-STABLE) got confused that 32-bit registers
are used in the code that was declared as 16-bit. Add explicit modifier
'addr32' to make assembler happy.
Signed-off-by: Eygene Ryabinkin <rea-fbsd@codelabs.ru>
IBM's iSCSI Firmware Initiator checks the UNDIROMID pointer in the
!PXE structure that gets created by the UNDI loader. We didn't
previously fill this value in.
Include PMM allocation result in POST banner.
Include full product string in "starting execution" message.
Also mark ourselves as supporting DDIM in PnP header, for
completeness.
On a system that doesn't support BBS, we end up hooking INT19 to gain
control of the boot process. If the system is PCI3.0, we must take
care to use the runtime value for %cs, rather than the POST-time
value, otherwise we end up pointing INT19 to the temporary option ROM
POST scratch area.
Allow for an arbitrary number of splits of the system memory map via
INT 15,e820.
Features of the new map-mangling algorithm include:
Supports random access to e820 map entries.
Requires only sequential access support from the underlying e820
map, even if our caller uses random access.
Empty regions will always be stripped.
Always terminates with %ebx=0, even if the underlying map terminates
with CF=1.
Allows for an arbitrary number of hidden regions, with underlying
regions split into as many subregions as necessary.
Total size increase to achieve this is 193 bytes.
Define a list of N allowed memory regions, and split each underlying
e820 region into up to N subregions. Strip resulting empty regions
out of the map, avoiding using the "return with CF set to strip last
empty region" trick, because it seems that bootmgr.exe in Win2k8 gets
upset if the memory map is terminated with CF set.
This is an intermediate checkin that defines a single allowed memory
region covering the entire 64-bit address space, and uses the existing
map-mangling code on top of the new region-splitting code. This
sanitises the memory map to the point that Win2k8 is able to boot even
on a system that defines a final zero-length region at the 4GB mark.
I'm checking this in because it may be useful for future debugging
efforts to be able to run with the existing and known-working map
mangling code together with the map sanitisation capabilities of the
new map mangling code.
H. Peter Anvin <hpa@zytor.com> sent word that Sergey Vlasov
<vsu@altlinux.ru> discovered gPXE lkrn images fail to load in SYSLINUX
3.70 because we have initrd_addr_max zeroed. This patch sets the same
value as the Linux kernel.
Also change the header jmp instruction to use a hardcoded opcode value
like Linux does. Just in case the assembler decides to use a three-byte
instruction instead of the desired two-byte jmp.
Add yet another ugly hack to iscsiboot.c, this time to allow the user to
inhibit the shutdown/removal of the iSCSI INT13 device (and the network
devices, since they are required for the iSCSI device to function).
On the plus side, the fact that shutdown() now takes flags to
differentiate between shutdown-for-exit and shutdown-for-boot means that
another ugly hack (to allow returning via the PXE stack on BIOSes that
have broken INT 18 calls) will be easier.
I feel dirty.
Shifting all INT13 drive numbers causes problems on systems that use a
sparse drive number space (e.g. qemu BIOS, which uses 0xe0 for the CD-ROM
drive).
The strategy now is:
Each drive is assigned a "natural" drive number, being the next
available drive number in the system (based on the BIOS drive count).
Each drive is accessed using its specified drive number. If the
specified drive number is -1, the natural drive number will be used.
Accesses to the specified drive number will be delivered to the
emulated drive, masking out any preexisting drive using this number.
Accesses to the natural drive number, if different, will be remapped to
the masked-out drive.
The overall upshot is that, for examples:
System has no drives. Emulated INT13 drive gets natural number 0x80
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive, and there is no remapping.
System has one drive. Emulated INT13 drive gets natural number 0x81
and specified number 0x80. Accesses to drive 0x80 go to the emulated
drive. Accesses to drive 0x81 get remapped to the original drive 0x80.
We can just treat all non-kernel images as initrds, which matches our
behaviour for multiboot kernels. This allows us to eliminate initrd as
an image type, and treat the "initrd" command as just another synonym for
"imgfetch".
__from_data16 and __from_text16 now take a pointer to a
.data16/.text16 variable, and return the real-mode offset within the
appropriate segment. This matches the use case for every occurrence
of these macros, and prevents potential future bugs such as that fixed
in commit d51d80f. (The bug arose essentially because "&pointer" is
still syntactically valid.)
When the 16-bit segment registers are accessed using 32-bit instructions
the high order bytes are undefined on older CPUs. We now explicitly
zero the high order bytes when snapshotting the CPU state. This ensures
that the GDB stub reports consistent values for the segment registers.
Commit fd0aef9 introduced a typo that caused PMM detection to start at
paragraph 0xe00 rather than 0xe000. (Detection would still work, since it
would scan until it ran out of base memory, but it would end up scanning
an unnecessarily large portion of base memory.)
Spotted by Sebastian Herbszt <herbszt@gmx.de>.
Send a null command, specifically "pulse outputs" with no outputs
selected, to the KBC after changing A20. This was apparently done by DOS,
presumably as a synchronization hack, and the authors of the UHCI spec
thought it was inherent. Therefore, there are systems out there (e.g. HP
DL360 G5) which will stop responsing to "legacy USB" unless they see the
null command, 0xFF, written to port 0x64 at the end of the A20 toggling
sequence.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the BIOS doesn't support BBS, hooking INT 19 is the only way to add
ourselves as a boot device. If we have to do this, we should at least
try to chain to the original INT 19 vector if our boot fails.
Idea suggested by Andrew Schran <aschran@google.com>
2.6.22+ kernels have an extra field in the bzimage_header structure to
indicate the maximum permitted command-line length. Use this if it is
available.
A bug in read_smbios_string() was causing the starting offset of the
SMBIOS structure to be added twice, resulting in completely the wrong
strings being returned.
Bug identified by Martin Herweg <m.herweg@gmx.de>
We never set up specific multicast filters; native drivers will ask
the card to receive all multicast packets. The only way to achieve
this via the UNDI API is to enable promiscuous mode.
Delete ELF as a generic image type. The method for invoking an
ELF-based image (as well as any tables that must be set up to allow it
to boot) will always depend on the specific architecture. core/elf.c
now only provides the elf_load() function, to avoid duplicating
functionality between ELF-based image types.
Add arch/i386/image/elfboot.c, to handle the generic case of 32-bit
x86 ELF images. We don't currently set up any multiboot tables, ELF
notes, etc. This seems to be sufficient for loading kernels generated
using both wraplinux and coreboot's mkelfImage.
Note that while Etherboot 5.4 allowed ELF images to return, we don't.
There is no callback mechanism for the loaded image to shut down gPXE,
which means that we have to shut down before invoking the image. This
means that we lose device state, protection against being trampled on,
etc. It is not safe to continue afterwards.
The GDBSYM config.h option was an attempt at QEMU GDB debugging. I have
removed the code since it is unused and may confuse people wanting to
use the GDB stub.
The ROM prefix now prompts the user to enter the gPXE shell during POST;
this allows for configuring gPXE without needing to attempt to boot from
it. (It also slows down system boot by three seconds per gPXE ROM, but
hey.)
This is apparently a certain OEM's requirement for option ROMs.
Add ability for network devices to flag link up/down state to the
networking core.
Autobooting code will now wait for link-up before attempting DHCP.
IPoIB reflects the Infiniband link state as the network device link state
(which is not strictly correct; we also need a succesful IPoIB IPv4
broadcast group join), but is probably more informative.
PXE is a catch-all image format with no signature checks. If an
unsupported image file is loaded, it will be treated as a PXE image. In
most cases, the image will be too large to be loaded as a PXE image (which
has to fit in base memory), so the error returned to the user will be that
the segment could not fit within the memory region.
Add an explicit check to pxe_image.c to reject images larger than base
memory with ENOEXEC.
Add ENOEXEC to the error string table.
Allow for settings to be described by something other than a DHCP option
tag if desirable. Currently used only for the MAC address setting.
Separate out fake DHCP packet creation code from dhcp.c to fakedhcp.c.
Remove notion of settings from dhcppkt.c.
Rationalise dhcp.c to use settings API only for final registration of the
DHCP options, rather than using {store,fetch}_setting throughout.
Add dedicated functions create_dhcpdiscover(), create_dhcpack() and
create_proxydhcpack() for use by external code such as the PXE preboot
code.
Register ProxyDHCP options under the global scope "proxydhcp".
Unregister previously-acquired DHCP and ProxyDHCP settings when DHCP
succeeds.
When PMM is used, the gPXE image source will no longer be in base memory.
Decompression of .text16 and .data16 can therefore no longer be done in
real mode.
Use BBS installation check to see if we need to hook INT19 even on a PnP
BIOS.
Verify that $PnP signature is paragraph-aligned; bochs/qemu BIOS provides
a dummy $PnP signature with no valid entry point, and deliberately
unaligns the signature to indicate that it is not properly valid.
Print message if INT19 is hooked.
Attempt to use PMM even if BBS check failed.
ROM initialisation vector now attempts to allocate a 2MB block using
PMM. If successful, it copies the ROM image to this block, then
shrinks the ROM image to allow for more option ROMs. If unsuccessful,
it leaves the ROM as-is.
ROM BEV now attempts to return to the BIOS, resorting to INT 18 only
if the BIOS stack has been corrupted.
This allows pxelinux to execute arbitrary gPXE commands. This is
remarkably unsafe (not least because some of the commands will assume
full ownership of memory and do nasty things like edit the e820 map
underneath the calling pxelinux), but it does allow access to the
"sanboot" command.
Replace a printf with a DBG in timer_rtdsc.c
Replace a printf in timer.c with assert
Return proper error codes from timer drivers
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Timer subsystem initialization code in core/timer.c
Split the BIOS and RTDSC timer drivers from i386_timer.c
Split arch/i386/firmware/pcbios/bios.c into the RTSDC
timer driver and arch/i386/core/nap.c
Split the headers properly:
include/unistd.h - delay functions to be used by the
gPXE core and drivers.
include/gpxe/timer.h - the fimer subsystem interface
to be used by the timer drivers
and currticks() to be used by
the code gPXE subsystems.
include/latch.h - removed
include/timer.h - scheduled for removal. Some driver
are using currticks, which is
only for core subsystems.
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
As written, if the if the UNDI ISR call clobbers the upper halves of
any of the GPRs (which by convention it is permitted to do, and by
paranoia should be expected to do) then nothing in the interrupt
handler will recover the state.
Additionally, save/restore %fs and %gs out of sheer paranoia - it's a
cheap enough operation, and may prevent problems due to poorly written
UNDI stacks.
Since we don't know what the UNDI code does, it is safest to
save/restore %eflags even though the lower half of %eflags is
automatically saved by the interrupt itself.
As written, if the if the UNDI ISR call clobbers the upper halves of
any of the GPRs (which by convention it is permitted to do, and by
paranoia should be expected to do) then nothing in the interrupt
handler will recover the state.
Additionally, save/restore %fs and %gs out of sheer paranoia - it's a
cheap enough operation, and may prevent problems due to poorly written
UNDI stacks.
_textdata_link_addr, _load_addr and _max_align in the linker scripts.
A bug in some versions of ld causes segfaults if the DEFINED() macro
is used in a linker script *and* the -Map option to ld is present.
We don't currently need to override any of these values; if we need to
do so in future then the solution will probably be to always specify
the values on the ld command line, and have the linker script not
define them at all.
memory map. (We achieve this by setting CF on the last entry if it is
zero-length; this avoids the need to look ahead to see at each entry
if the *next* entry would be both the last entry and zero-length).
This fixes the "0kB base memory" error message upon starting Windows
2003 on a SunFire X2100.
byte, rather than the number of permissible bytes (i.e. subtract one
from the value under the previous definition to get the value under
the new definition).
This avoids integer overflow on 64-bit kernels, where
bzhdr.initrd_addr_max may be 0xffffffffffffffff; under the old
behaviour we set mem_limit equal to initrd_addr_max+1, which meant it
ended up as zero. Kernel loads would fail with ENOBUFS.
Experimentation reveals that gcc ignores -mrtd for the implicit
arithmetic functions (e.g. __udivdi3), but not for the implicit
memcpy() and memset() functions. Mark the implicit arithmetic
functions with __attribute__((cdecl)) to compensate for this.
(Note: we cannot mark with with __cdecl, because we define __cdecl to
incorporate regparm(0) as well.)
us to round down the size for the relocation copy to the nearest 64kB
(+0x10 bytes); this just happened to work on most machines because the
last 64kB of the image is all-zeroes anyway (it's the .bss).
link-time check for section overlaps. (In order to avoid wasting
space in the executable image, .bss16 will overlap with the following
section, which is .text).
number of (potentially very slow) gateA20_set operations.
Die with a fatal error if we are unable to set gate A20; if this fails
then we are bound to experience memory corruption at a later stage,
and I'd prefer to pick it up early.
the UNDI stack.
Ignore obviously invalid length combinations (as returned by
e.g. VMWare's PXE stack).
Limit to one packet per poll to avoid memory exhaustion.
Always send EOI; do not chain to BIOS's default interrupt handler.
They are just too unpredictable; at least VMware's seems to kill the
machine if you go anywhere near it.
Disable interrupts after return from PXENV_UNDI_ISR, just in case some
dumb PXE stack enables them.
safe dropping of the netdev ref by the driver while other refs still
exist.
Add netdev_irq() method. Net device open()/close() methods should no
longer enable or disable IRQs.
Remove rx_quota; it wasn't used anywhere and added too much complexity
to implementing correct interrupt-masking behaviour in pxe_undi.c.
Use generic fields in struct device_description rather than assuming
that the struct device * is contained within a pci_device or
isapnp_device; this assumption is broken when using the undionly
driver.
Add PXENV_UNDI_SET_STATION_ADDRESS.
entirely self-hosted (which avoids problems when building the same
tree on multiple systems - e.g. when you have /home NFS-mounted).
Also saves around 50 bytes in total - not sure why.
clue what the "previous" interrupt handler will do, which could range
from "just an iret" to "disable the interrupt"), and that means that
we have to take responsibility for ACKing all interrupts. Joy.
refer to them by name from the command line, or build them into a
multiboot module list.
Use setting image->type to disambiguate between "not my image" and "bad
image"; this avoids relying on specific values of the error code.
names.
Add "dev" pointer in struct net_device to tie network interfaces back to a
hardware device.
Force natural alignment of data types in __table() macros. This seems to
prevent gcc from taking the unilateral decision to occasionally increase
their alignment (which screws up the table packing).
real_call(), rather than moving it to the RM stack and back again.
This allows the real-mode function to completely destroy the stack
contents, provided that it manages to return to real_call().