11.02.2008
Temporary halt
9.08.2008
also
8.27.2008
WANTED: PIC driver...
8.19.2008
Staying late pays...
Thanks to simple yet great advice from Andrew Turner about calls to openfirmware() and the use of *OF_buf, it seems I have finally reached the point where the kernel panics because no PIC is detected:
===================
atomic_subtract_16(0)... panic: no PIC detected
KDB: enter: panic
[thread pid 0 tid 100000 ]
Stopped at 0x29587c: addi r0, r0, 0x0
===================
As you can see, KDB itself stops at this point; no crash at last :)
All thanks to AT and the fact that I'm studying hard for the exam (Quantum Mechanics for PhD students....)
/*decr_init()*/
Some new things came up recently: I've added some code from NetBSD and commented out the decr_init() call in cpu_startup(). Now, during boot, I've reached subsystem 3800000, as I did last time without decr_init(), but this time it seems to have entered newbus!
==========================================
[thread pid 0 tid 100000 ]
Breakpoint at 0x4930d8: stwu r1, r1, -0x20
db>
nexus0: registered as a time-of-day clock (resolution 1000us)
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
nexus0:
sc0: no video adapter found.
nexus0:
done.
ofw_bus_gen_get_name(0)... cpu_exception:
SRR0 0x01035CFC SRR1 0x00003030 MSR 0x00003030
LR 0x0103E1E4 CTR 0x0101A5F0 CR 0x44002042 XER 0x20000000
DAR 0xD0004DDE DSISR 0x42000000 Type 3
GPR[] 0x00000007 0x00559EB4 0x00000000 0x00000000 0x07C05323 0x00000005 0x0000000D 0x0058EBB8
...
============================================
And again, the old friend. I can't spot the cause of those crashes. They always look the same, with the same contents of SRR0, SRR1 and MSR. Adding some stuff from NetBSD helped, but it only took things a few steps further. So it must be something with the OF, but what? This time it's not about the stack, because the registers contain some crazy addresses, neither from the kernel nor from the OF stack... During some previous tests I've encountered crashes on instructions reading/writing SPRG0 and IBAT4, in ofw_sprg_prepare(), and in
===========
from: src/sys/powerpc/aim/ofw_machdep.c
__asm __volatile( "\t"
"sync\n\t"
"mfmsr %0\n\t"
"mtmsr %1\n\t"
"isync\n"
: "=r" (oldmsr)
: "r" (ofmsr[0])
);
============
both in openfirmware()...
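Since SRR1 and MSR read 0x00003030 in nearly every dump on this blog, it helps to see which bits that value actually sets. Below is a minimal sketch of a decoder. The bit positions are the standard 32-bit PowerPC MSR ones, only a subset of the bits is listed, and the helper name msr_describe() is made up for this illustration (it is not a kernel function).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of the 32-bit PowerPC MSR bits, highest bit first. */
struct msr_bit {
	uint32_t	 mask;
	const char	*name;
};

static const struct msr_bit msr_bits[] = {
	{ 0x00008000u, "EE" },	/* external interrupts enabled */
	{ 0x00004000u, "PR" },	/* problem (user) state */
	{ 0x00002000u, "FP" },	/* floating point available */
	{ 0x00001000u, "ME" },	/* machine check enabled */
	{ 0x00000020u, "IR" },	/* instruction address translation on */
	{ 0x00000010u, "DR" },	/* data address translation on */
	{ 0x00000002u, "RI" },	/* recoverable interrupt */
};

/*
 * Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces, and return buf.
 */
static char *
msr_describe(uint32_t msr, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(msr_bits) / sizeof(msr_bits[0]); i++) {
		if ((msr & msr_bits[i].mask) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, msr_bits[i].name, len - strlen(buf) - 1);
	}
	return (buf);
}
```

Calling msr_describe(0x00003030, buf, sizeof(buf)) gives "FP ME IR DR": both instruction and data translation are on, and EE is clear, so these crashes happen with the MMU enabled and external interrupts masked.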
8.17.2008
odd discovery
I was fighting with OF_peer() to check whether it gets past the openfirmware() call, which is the OF entry, and after lots of tries it seems that it does. That's very different from what I thought before. After using ofw_stack() from NetBSD in OF_peer(), as was done in OF_read()/OF_write(), the function seems to work, but the crash remains.
Here's the dump:
DEBUG 2
done.
subsystem 2100000
freebsd4_sigreturn(0)...
cpu_startup() DEBUG 0
** DEBUGL OF_printf(): after OF_peer() in decr_init()
cpu_exception:
SRR0 0x004A0DF0 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
========================
What is very strange: during the trace I encountered a call to OF_write() just after OF_peer(), as if there were _two_ calls to openfirmware() instead of one. And it's the second one that crashes... Now, what I don't get is where this second call comes from. There's no printf() around OF_peer()...:
========================
from: sys/dev/ofw/openfirm.c
phandle_t
OF_peer(phandle_t node)
{
	static struct {
		cell_t name;
		cell_t nargs;
		cell_t nreturns;
		cell_t node;
		cell_t next;
	} args = {
		(cell_t)"peer",
		1,
		1,
	};

	ofw_stack();
	args.node = node;
	if (openfirmware(&args) == -1)
		return (-1);
	return (args.next);
}
=========================
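To make the calling convention in the listing above easier to follow, here is a small user-space sketch. The client interface takes one array of cells: the service name, the number of input cells, the number of output cells, then the inputs and the outputs. Everything prefixed with mock_ is invented for this illustration, including the tiny device tree; it only mimics the layout and does not talk to any real firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t cell_t;	/* wide enough to hold a pointer on the host */

/* Argument array for "peer": one input cell, one output cell. */
struct peer_args {
	cell_t name;		/* pointer to the service name */
	cell_t nargs;		/* number of input cells */
	cell_t nreturns;	/* number of output cells */
	cell_t node;		/* input: phandle */
	cell_t next;		/* output: sibling phandle */
};

/*
 * Invented tree: index = node, value = its peer.
 * peer(0) is the root (1); the root has no sibling; 2 and 3 are siblings.
 */
static const cell_t mock_sibling[] = { 1, 0, 3, 0 };

/* Stand-in for the firmware entry point: 0 on success, -1 on failure. */
static int
mock_openfirmware(void *argp)
{
	cell_t *cells = argp;
	const char *service;
	cell_t *in, *out;

	assert(argp != NULL);
	service = (const char *)cells[0];
	in = &cells[3];			/* inputs follow the 3 header cells */
	out = in + cells[1];		/* outputs follow the inputs */

	if (strcmp(service, "peer") != 0 || cells[1] != 1 || cells[2] != 1)
		return (-1);
	if (in[0] >= sizeof(mock_sibling) / sizeof(mock_sibling[0]))
		return (-1);
	out[0] = mock_sibling[in[0]];
	return (0);
}

/* Same shape as OF_peer(), but calling the mock entry point. */
static cell_t
mock_OF_peer(cell_t node)
{
	struct peer_args args = { (cell_t)"peer", 1, 1, 0, 0 };

	args.node = node;
	if (mock_openfirmware(&args) == -1)
		return ((cell_t)-1);
	return (args.next);
}
```

With this tree, mock_OF_peer(0) returns 1 (the root) and mock_OF_peer(2) returns 3. Each wrapper enters the firmware exactly once, so a second entry has to come from another wrapper. One candidate worth checking is the OF_printf() debug line visible in the dump: if OF_printf() ends up in OF_write(), as I believe it does, that alone would account for the extra call.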
8.16.2008
from OF_write()/OF_read() and OF_peer(), but I found none. It's not possible
to break into the fwentry() call from openfirmware(), since it always results in
spinlock corrupt crash. I've tried to insert the call to decr_init() into several
boot subsystems, it always ended with the same crash, even before the VM init.
I've also called the OF_peer() at the very beginning of OF_write() to see if it's
possible to execute it at the very early stage of the boot, before the KDB entry.
It crashed. I've also tried switching off the decr_init() call in cpu_startup(), and the system passed several more subsystems, up to subsystem 3000000 (or so), and then crashed in the same way it does with decr_init().
I've inspected what the ofw_stack() function does, but it only copies the current call stack to the firmware stack located in locore.S. This call was added to OF_read()/OF_write() to make them work on Efika. Now, my guess is that it has to do with what could have changed between p4 FBSD and FBSD 6.x, since the patch was for that version. I'll try to apply the patch to version 6.x. Also, I'll have a look at the NetBSD source. I really don't understand why it works for OF_read()/OF_write() and not for OF_peer(), since those calls are so similar.
8.08.2008
still no luck, and a new problem
sources in aim to try to remove the problem with the stack.
It doesn't work for now, but I've managed to get around that by using
some free registers. Now I've run into yet another problem: the
kernel now crashes on the Open Firmware entry in fwentry(). It's probably
caused by a wrong address of the entry point, a virtual address
instead of a real one. Here's the crash site:
=============
cpu_exception:
SRR0 0x00000000 SRR1 0x00083030 MSR 0x00003030
LR 0x004A0E70 CTR 0x00000000 CR 0x44002082 XER 0x20000000
=============
I've made a small test by changing the address the firmware entry is supposed to
be at, and I get a slightly similar crash:
=============
cpu_exception:
SRR0 0x01003ED8 SRR1 0x00003030 MSR 0x00003030
LR 0x004A0E70 CTR 0x00000000 CR 0x44002044 XER 0x0
=============
Both LRs point to the same location, the very next instruction after the branch to OF.
So I suppose it's the OF address, not just the argument addressing,
that is responsible at the moment. I'm trying to find out how to translate an
address to real mode. All help would be greatly appreciated... :)
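For addresses that fall inside a BAT block, the effective-to-real translation is simple enough to sketch in C. The field masks below are the standard 32-bit PowerPC BAT layout (BEPI and BRPN in the top 15 bits, BL in bits 0x1FFC, Vs as the supervisor-valid bit). The struct and the function name bat_translate() are made up for this illustration, and the register values in the self-check come from the BAT dump in the 7.06.2008 entry, assuming they have not changed since.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BAT_BEPI	0xfffe0000u	/* upper: block effective page index */
#define BAT_BL		0x00001ffcu	/* upper: block length mask */
#define BAT_VS		0x00000002u	/* upper: valid in supervisor mode */
#define BAT_BRPN	0xfffe0000u	/* lower: block real page number */

struct bat {
	uint32_t upper;
	uint32_t lower;
};

/*
 * Hypothetical helper: translate a supervisor-mode effective address
 * through one BAT pair. Returns false if the BAT does not cover it.
 */
static bool
bat_translate(struct bat b, uint32_t ea, uint32_t *pa)
{
	/* BL << 15 lines the 11 BL bits up with address bits 17..27. */
	uint32_t offmask = ((b.upper & BAT_BL) << 15) | 0x0001ffffu;

	assert(pa != NULL);
	if ((b.upper & BAT_VS) == 0)
		return (false);
	if ((ea & ~offmask) != (b.upper & BAT_BEPI & ~offmask))
		return (false);
	*pa = (b.lower & BAT_BRPN & ~offmask) | (ea & offmask);
	return (true);
}

/* Self-check against the ibat0/dbat values from the 7.06.2008 dump. */
static void
bat_selfcheck(void)
{
	const struct bat ibat0 = { 0x00001fffu, 0x00000012u };
	const struct bat dbat[4] = {
		{ 0xf0001fffu, 0xf000002au }, { 0x80001fffu, 0x8000002au },
		{ 0x00001fffu, 0x00000012u }, { 0xc0001fffu, 0xc000002au },
	};
	uint32_t pa = 0;
	size_t i;

	/* Kernel text (the LR value above) maps one-to-one. */
	assert(bat_translate(ibat0, 0x004a0e70u, &pa) && pa == 0x004a0e70u);
	/* An address in 0xD0000000..0xDFFFFFFF is covered by no DBAT. */
	for (i = 0; i < 4; i++)
		assert(!bat_translate(dbat[i], 0xd0004d3cu, &pa));
}
```

Under those BAT values every block is 256 MB and maps one-to-one, so for kernel text the effective and real addresses are the same number. Addresses in the 0xD0000000 range are not covered by any data BAT, so they would have to be translated through the segment registers and the page table instead.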
8.01.2008
hard progress
Eventually, with the aid of my mentor, we found the exact position of the faulting instruction. I've managed to turn on the KDB facility, and thus we were able to set some breakpoints. To spot the bug we used the contents of the DAR register and the toolchain version of the objdump tool. Fortunately, it turned out that the addresses obtained from objdump and those from the running kernel match. Here's the dump:
====================================================================
...
SRR0 0x004A0D10 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
GPR[] 0x004A0BC4 0xD0004D30 0x00000000 0x01003ED8 0x00003030 0x00000000 0x00000020 0x00000000
...
====================================================================
The evil instruction is in the Open Firmware entry point wrapper:
====================================================================
ofwreal.S
...
/*
 * Emulated firmware entry.
 */
fwentry:
...
	lis	%r3,clsave@ha		/* save mmu values of client */
	addi	%r3,%r3,clsave@l
	lis	%r3,fwsave@ha		/* restore mmu values of firmware */
	addi	%r3,%r3,fwsave@l
	bl	restoremmu
	lis	%r3,ofentry@ha
	lwz	%r3,ofentry@l(%r3)	/* get actual firmware entry */
	mtlr	%r3
	mfmsr	%r4
	stw	%r4,12(%r1)		/* save MSR */
	ori	%r4,%r4,PSL_IR|PSL_DR	/* turn on MMU */
	andi.	%r4,%r4,~PSL_EE@l	/* turn off interrupts */
	mtmsr	%r4
	isync
====================================================================
The problem is with the stack:
stw %r4,12(%r1) /* save MSR */
That line probably causes a DSI exception. It crashes only after the branch to restoremmu: if one stores something on the stack before that branch, there's no crash. So the problem lies in the restoremmu function, but it's not clear to me where exactly, because restoremmu does not touch the %r1 register explicitly.
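The numbers in the dump can be cross-checked against that stw with a few lines of C. This is only a sketch: it assumes the GPR[] row lists the registers starting at r0 (so r1 is the second value, 0xD0004D30), and the helper names are made up. The DSISR bit values are the standard 32-bit PowerPC ones.

```c
#include <assert.h>
#include <stdint.h>

#define DSISR_NOTFOUND	0x40000000u	/* no translation found for the address */
#define DSISR_PROTECT	0x08000000u	/* access not permitted */
#define DSISR_STORE	0x02000000u	/* the access was a store */

/* Effective address of a D-form load/store: base + signed 16-bit offset. */
static uint32_t
dform_ea(uint32_t base, int16_t disp)
{
	return (base + (uint32_t)(int32_t)disp);
}

/* Short description of the two DSISR causes that matter here. */
static const char *
dsi_cause(uint32_t dsisr)
{
	if (dsisr & DSISR_NOTFOUND)
		return ((dsisr & DSISR_STORE) ?
		    "store, no translation" : "load, no translation");
	if (dsisr & DSISR_PROTECT)
		return ((dsisr & DSISR_STORE) ?
		    "store, protection" : "load, protection");
	return ("other");
}

/* Self-check using the values from the dump above. */
static void
dsi_selfcheck(void)
{
	const uint32_t r1 = 0xd0004d30u;	/* assumed: second GPR[] value */
	const uint32_t dar = 0xd0004d3cu;
	const uint32_t dsisr = 0x42000000u;

	assert(dform_ea(r1, 12) == dar);	/* stw %r4,12(%r1) */
	assert((dsisr & DSISR_NOTFOUND) != 0);
	assert((dsisr & DSISR_STORE) != 0);
}
```

r1 + 12 is exactly the DAR value, and DSISR 0x42000000 decodes to a store for which no translation was found. That fits the stw to the stack. It also fits the rest of the register row under the same assumption: r3 holds 0x01003ED8 (the value in LR, loaded by mtlr) and r4 holds 0x00003030 (the MSR just read by mfmsr).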
7.07.2008
suspect spotted
7.06.2008
Bad surprise
in the driver source code, and it turned out that the boot doesn't even get to
SI_SUB_CONFIGURE. I've turned on VERBOSE_SYSINIT; here's the snippet:
=======================================================================
0x253718(0x57d068)... done.
0x253718(0x57d120)... done.
0x253718(0x578dcc)... done.
0x26d900(0x56f194)... done.
0x253718(0x56f19c)... done.
0x253718(0x57d5a0)... done.
0x2a46e8(0)... done.
0x308d88(0)... done.
0x295c50(0)... done.
subsystem 1c00000
0x293e94(0)... done.
subsystem 1c00001
0x27a9d8(0)... done.
subsystem 2000000
0x2520e0(0)... done.
0x245f24(0)... done.
0x284630(0)... done.
linker_file_unload(0)... done.
ktrsyscall(0)... done.
linker_file_lookup_set(0)... done.
subsystem 2100000
freebsd4_sigreturn(0)... cpu_exception:
SRR0 0x004A0D10 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
GPR[] 0x004A0BC4 0xD0004D30 0x00000000 0x01003ED8 0x00003030 0x00000000 0x00000020 0x00000000
GPR[] 0x00000000 0x00000000 0x00000000 0x004A0D00 0x00003030 0x00000000 0xEFF7737F 0x00575BC8
GPR[] 0x00590000 0x00554D0C 0x00522C6C 0x00522C4C 0x00522C5C 0x00522C90 0xD0004E18 0xD0004E1C
GPR[] 0x00590000 0x00800001 0x00590000 0x00000001 0x00590000 0x02100000 0x00003032 0xD0004D40
ibat0U 0x00001FFF ibat0L 0x00000012
ibat1U 0xF0001FFF ibat1L 0xF0000012
ibat2U 0x00000000 ibat2L 0x00000000
ibat3U 0x00000000 ibat3L 0x00000000
dbat0U 0xF0001FFF dbat0L 0xF000002A
dbat1U 0x80001FFF dbat1L 0x8000002A
dbat2U 0x00001FFF dbat2L 0x00000012
dbat3U 0xC0001FFF dbat3L 0xC000002A
HID0 0x0000C000
deadend:
=======================================================================
The crashing subsystem is subsystem 2100000, and is responsible for the
CPU resources:
=======================================================================
from sys/sys/kernel.h
...
enum sysinit_sub_id {
...
SI_SUB_KLD = 0x2000000, /* KLD and module setup */
SI_SUB_CPU = 0x2100000, /* CPU resource(s)*/
...
};
...
=======================================================================
It crashes somewhere near sigreturn(). The PIC has to wait for a second...
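The "subsystem" lines in the VERBOSE_SYSINIT output are the sysinit_sub_id values printed in hex without the 0x prefix, so they can be matched against kernel.h mechanically. A minimal sketch follows; the table holds only the two IDs quoted above, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct subsys {
	uint32_t	 id;
	const char	*name;
};

/* Only the two values quoted from sys/sys/kernel.h above. */
static const struct subsys known_subsys[] = {
	{ 0x2000000u, "SI_SUB_KLD" },
	{ 0x2100000u, "SI_SUB_CPU" },
};

/* Hypothetical helper: parse a "subsystem <hex>" line; 1 on success. */
static int
parse_subsystem_line(const char *line, uint32_t *id)
{
	unsigned int v;

	assert(line != NULL && id != NULL);
	if (sscanf(line, "subsystem %x", &v) != 1)
		return (0);
	*id = (uint32_t)v;
	return (1);
}

/* Name for a subsystem ID, or NULL if it is not in the table. */
static const char *
subsys_name(uint32_t id)
{
	size_t i;

	for (i = 0; i < sizeof(known_subsys) / sizeof(known_subsys[0]); i++)
		if (known_subsys[i].id == id)
			return (known_subsys[i].name);
	return (NULL);
}
```

Feeding it the line "subsystem 2100000" yields 0x2100000, which the table maps to SI_SUB_CPU. Lines such as "subsystem 1c00000" parse fine but come back as NULL until the table is extended from kernel.h.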
6.14.2008
Recent activity report 01
Now it's time to check whether device enumeration goes well. I'll try turning on debugging in the device source code to see if the devices' IDs are OK with OpenFirmware... After that I plan to get acquainted with the newbus framework, and then I'll be heading towards the almighty PIC... nexus.c awaits...
6.12.2008
So here we go...
This is my blog about my GSoC 2008 project, which is porting FreeBSD to Efika, a PPC-based evaluation board. I've decided to create this blog to show you how a geek spends his summer :) I'll be describing the ups and downs of being a GSoCer as time passes, and I'll also report on the progress of the project. More technical info can be found on my FreeBSD wiki:
http://wiki.freebsd.org/PrzemekWitaszczyk#preview