8.27.2008

WANTED: PIC driver...

Finally, I've managed to fix all (at least up till now..) broken OF calls, and the kernel went further, to the place where devices are attached. Among them, there should be a PIC driver, which is missing... After adding the ofwbus driver which I got from Andrew, I've constructed a PIC driver skeleton from the sys/powerpc/powermac/hrowpic.c. The PIC was found to resist on the ofwbus3, and the attch method fails to allocate memory (nice screenshot on my wiki http://wiki.freebsd.org/Porting_FreeBSD_to_Efika_(PPC_bring_up)). Now I have to mutate the old mac driver to obey the mpc5k2 will... I've also created a new folder in the p4 branch, where all the stuff concerning MPC5200B will be stored. It's called ... mpc5k2 :)

8.19.2008

Staying late pays...

Thanks to simple, yet great advice by Andrew Turner concerning calls to openfirmware and utilizing *OF_buf, it seems that eventually I came to the point where the kernel panics due to the lack of PIC!:

===================

 atomic_subtract_16(0)... panic: no PIC detected

KDB: enter: panic
[thread pid 0 tid 100000 ]
Stopped at 0x29587c: addi r0, r0, 0x0

===================

As you can see, the KDB by itself stops at this point, no crash at last:)

All thanks to the AT and the fact that I'm studying hard for the exam (Quantum Mechanics for PhD students....)

/*decr_init()*/

Some new things came up recentrly, I've added some code from NetBSD and commented out the decr_init() in cpu_startup(). Now, during the boot I've rached subsystem 3800000 , as last time w/o decr_init(), but this time it seems it entered the newbus!

==========================================

[thread pid 0 tid 100000 ]
Breakpoint at 0x4930d8: stwu r1, r1, -0x20
db>
nexus0: registered as a time-of-day clock (resolution 1000us)
nexus0: , type (unknown) (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type memory (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type (unknown) (no driver attached)
nexus0: , type serial (no driver attached)
nexus0: , type builtin (no driver attached)
nexus0: , type pci (no driver attached)
sc0: no video adapter found.
nexus0: , type syscons (no driver attached)
done.
  ofw_bus_gen_get_name(0)... cpu_exception:
SRR0 0x01035CFC SRR1 0x00003030 MSR 0x00003030
LR 0x0103E1E4 CTR 0x0101A5F0 CR 0x44002042 XER 0x20000000
DAR 0xD0004DDE DSISR 0x42000000 Type 3
GPR[] 0x00000007 0x00559EB4 0x00000000 0x00000000 0x07C05323 0x00000005 0x0000000D 0x0058EBB8

...

============================================

And again, the old friend. I can't spot the cause of those crashes. They always look the same, with the same content of SRR0,1 and MSR. Adding some stuff from NetBSD helped, but it only took it just a few steps further. Thus it must be something wih the OF, but what? This time it's nothing about the stack, because the registers contains some crazy addresses, neither form kernel, nor from the OF stack... During some previous tests I've encountered some crashes on instructions reading/writing to SPRG0 and IBAT4, in  ofw_sprg_prepare(); and in

===========

from: src/sys/powerpc/aim/ofw_machdep.c

 __asm __volatile( "\t"
  "sync\n\t"
  "mfmsr %0\n\t"
  "mtmsr %1\n\t"
  "isync\n"
  : "=r" (oldmsr)
  : "r" (ofmsr[0])
  );
============

both in openfirmware()...

8.17.2008

odd discovery

I was fighting with the OF_peer() to check if it passes the openfirmware() call, which is OF entry and after lots of tries it seems that it does. It's very different from what I thought before. After using ofw_stack() from NetBSD in OF_peer() as it was done in OF_read()/OF_write(), that function seems to be working but the crash remains. 

Here's the dump:

DEBUG 2
done.
subsystem 2100000
  freebsd4_sigreturn(0)...
cpu_startup() DEBUG 0
** DEBUGL OF_printf(): after OF_peer() in decr_init()
cpu_exception:
SRR0 0x004A0DF0 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
========================
What is very strange, during the trace I've encountered the call to OF_write() just after OF_peer(), like there were _two_ calls to openfirmware() insted of one. And it's the second one that crashes... Now, what I don't get is where from this second call come ? There's no printf() around OF_peer()...:

========================

from: sys/dev/ofw/openfirm.c

phandle_t
OF_peer(phandle_t node)
{
      static struct {
      cell_t name;
      cell_t nargs;
      cell_t nreturns;
      cell_t node;
      cell_t next; }

               args = {
                           (cell_t)"peer",
                            1,
                            1,
                            };


  ofw_stack();
  args.node = node;
  if (openfirmware(&args) == -1)
  return (-1);
  return (args.next);
}
=========================

8.16.2008

I've tried to find the differences between the contexts of calling openfirmware()
from OF_write()/OF_read() and OF_peer(), but I found none. It's not possible
to break into the fwentry() call from openfirmware(), since it always results in
spinlock corrupt crash.  I've tried to insert the call to decr_init() into several
boot subsystems, it always ended with the same crash, even before the VM init.
I've also called the OF_peer() at the very beginning of OF_write() to see if it's
possible to execute it at the very early stage of the boot, before the KDB entry.
It crashed. I've also tried to switch off the decr_init() call in the cpu_starup() and the system passed several following subsystems to subsystem 3000000 (or so) and crashed in the same way it does with decr_init().
I've inspected what does the ofw_stack() function do, but it only copies the current call stack to the firmware stack located in locore.S. This call was added in OF_read()/write() to make them work on Efika. Now, I can guess it's something with what could have change in p4 FBSD vs FBSD 6.x, since the patch was for that version. I'll try to apply the patch to version 6.x. Also, I'll have a look at the NetBSD source. I really don't understand, why it does work work the OF_read()/OF_write() and does not for OF_peer(), since those calls are so similar.

8.08.2008

still no luck, and a new problem

I've performed lots of experiments with the ofwreal.S and locore.S
sources in aim to try to remove the problem with the stack.
It does't work for now, but I've managed to overcome that by utilizing
some free registers. Now I've encountered yet another problem - the
kernel now crashes on the openfirmware entry in the fwentry(). It's probably
coused by a wrong address of the entry point, coming from the virtual
instead of the real addressing. Here's the crash site:
=============
cpu_exception:
SRR0 0x00000000 SRR1 0x00083030 MSR 0x00003030
LR 0x004A0E70 CTR 0x00000000 CR 0x44002082 XER 0x20000000
=============
I've made a small test by changing the address the firmware entry is supposed to
be at and I get slightly similar crash:
=============
cpu_exception:
SRR0 0x01003ED8 SRR1 0x00003030 MSR 0x00003030
LR 0x004A0E70 CTR 0x00000000 CR 0x44002044 XER 0x0
=============
The LR's point to the same location, the very next instrucion after branch to OF.
So I suppose it's the OF address, not just the argument addresing
that is responsible at the moment. I'm trying to find out how to translate an
address to the real mode. All help would be greatly appreciated...:)

8.01.2008

hard progress

Eventually, with the aid of my mentor, we found the exact position of the faulting instruction. I've managet to turn on the KDB facility and thus we were able to make some breakpoints. In aim to spot the bug we used the DAR register content and toolchan version of the objdump tool.  Fortunately, it appeared that addresses obtained from the objdump and those from the running kernel match. Here's the dump:

====================================================================

...

SRR0 0x004A0D10 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
GPR[] 0x004A0BC4 0xD0004D30 0x00000000 0x01003ED8 0x00003030 0x00000000 0x00000020 0x00000000
...

====================================================================

The evil instrucion is in the openfirmware entry point wrapper,

====================================================================

ofwreal.S

...

/*
 * Emulated firmware entry.
 */
fwentry:

...

 lis %r3,clsave@ha   /* save mmu values of client */
  addi %r3,%r3,clsave@l
  lis %r3,fwsave@ha /* restore mmu values of firmware */
  addi %r3,%r3,fwsave@l
  bl restoremmu


  lis %r3,ofentry@ha
  lwz %r3,ofentry@l(%r3) /* get actual firmware entry */
  mtlr %r3

  mfmsr %r4
  stw %r4,12(%r1) /* save MSR */
  ori %r4,%r4,PSL_IR|PSL_DR /* turn on MMU */
  andi. %r4,%r4,~PSL_EE@l /* turn off interrupts */
  mtmsr %r4
  isync

====================================================================

The problem is with the stack,

stw %r4,12(%r1) /* save MSR */

That line causes probably a DSI exception. The problem is that it crashes just after the restoremmu function branch. If one stores something on the stuck before that branch, ther's no crash. So the problem lies in the restoremmu function, but it's not clear to me where exactly it is, because the restoremmu does not operate the %r1 register explicitly.