8.01.2008

hard progress

Eventually, with my mentor's help, we found the exact position of the faulting instruction. I've managed to turn on the KDB facility, and thus we were able to set some breakpoints. To spot the bug we used the DAR register content and the toolchain version of the objdump tool. Fortunately, it turned out that the addresses obtained from objdump match those from the running kernel. Here's the dump:

====================================================================

...

SRR0 0x004A0D10 SRR1 0x00003030 MSR 0x00003030
LR 0x01003ED8 CTR 0x00000000 CR 0x24002044 XER 0x0
DAR 0xD0004D3C DSISR 0x42000000 Type 3
GPR[] 0x004A0BC4 0xD0004D30 0x00000000 0x01003ED8 0x00003030 0x00000000 0x00000020 0x00000000
...

====================================================================
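
To make the dump a bit easier to read, here is a minimal Python sketch that decodes the MSR and DSISR values above into flag names. The bit masks are the ones I know from the classic 32-bit PowerPC definitions (the PSL_* and DSISR_* style constants); I'm assuming this CPU follows that layout. The `decode` helper and the table names are made up for this illustration.

```python
# Bit masks as defined for classic 32-bit PowerPC (assumed to apply here).
MSR_BITS = {
    0x8000: "EE (external interrupts enabled)",
    0x4000: "PR (user mode)",
    0x2000: "FP (floating point available)",
    0x1000: "ME (machine check enabled)",
    0x0020: "IR (instruction translation on)",
    0x0010: "DR (data translation on)",
}
DSISR_BITS = {
    0x40000000: "translation not found",
    0x08000000: "protection violation",
    0x02000000: "store access",
    0x00400000: "DABR match",
}

def decode(value, table):
    """Return the names of all flags from `table` that are set in `value`."""
    return [name for mask, name in table.items() if value & mask]

# Values copied from the dump above.
msr_flags = decode(0x00003030, MSR_BITS)
dsisr_flags = decode(0x42000000, DSISR_BITS)

print("MSR:  ", ", ".join(msr_flags))
print("DSISR:", ", ".join(dsisr_flags))
```

If the masks apply, the MSR shows both translations already on with external interrupts off, and the DSISR describes a store to an address for which no translation was found.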

The evil instruction is in the Open Firmware entry point wrapper:

====================================================================

ofwreal.S

...

/*
 * Emulated firmware entry.
 */
fwentry:

...

  lis %r3,clsave@ha /* save mmu values of client */
  addi %r3,%r3,clsave@l
  lis %r3,fwsave@ha /* restore mmu values of firmware */
  addi %r3,%r3,fwsave@l
  bl restoremmu


  lis %r3,ofentry@ha
  lwz %r3,ofentry@l(%r3) /* get actual firmware entry */
  mtlr %r3

  mfmsr %r4
  stw %r4,12(%r1) /* save MSR */
  ori %r4,%r4,PSL_IR|PSL_DR /* turn on MMU */
  andi. %r4,%r4,~PSL_EE@l /* turn off interrupts */
  mtmsr %r4
  isync

====================================================================

The problem is with the stack:

stw %r4,12(%r1) /* save MSR */

That line probably causes a DSI exception. The strange part is that it crashes only right after the branch to the restoremmu function: if one stores something on the stack before that branch, there's no crash. So the problem lies in restoremmu, but it's not clear to me where exactly, because restoremmu does not touch the %r1 register explicitly.
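
As a sanity check on the suspected instruction, here is a small Python sketch that recomputes the effective address of `stw %r4,12(%r1)` from the register dump and compares it with DAR. It assumes the `GPR[]` line lists r0 through r7 in order, which the dump itself doesn't state; the variable names are just for illustration.

```python
# Values copied from the register dump; GPR[] is assumed to start at r0.
GPR = [0x004A0BC4, 0xD0004D30, 0x00000000, 0x01003ED8,
       0x00003030, 0x00000000, 0x00000020, 0x00000000]
DAR = 0xD0004D3C
LR = 0x01003ED8
MSR = 0x00003030

# stw %r4,12(%r1) stores to r1 + 12 (32-bit wraparound arithmetic).
effective_address = (GPR[1] + 12) & 0xFFFFFFFF

ea_matches_dar = effective_address == DAR  # the store targets the faulting address
r3_matches_lr = GPR[3] == LR               # consistent with "mtlr %r3" having run
r4_matches_msr = GPR[4] == MSR             # consistent with "mfmsr %r4" having run

print(hex(effective_address), ea_matches_dar, r3_matches_lr, r4_matches_msr)
```

Under that assumption all three checks come out true: r1 + 12 equals DAR, r3 equals LR and r4 equals MSR. So the registers look exactly as they should right after `mfmsr %r4`, with the store to the stack being the access that faults.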

2 comments:

phs said...

Maybe I'm missing something, but shouldn't you look at SRR0 to find the faulting instruction? I've re-read the relevant parts of Book III, both -S and -E, and both say that when a Data Storage Interrupt occurs, SRR0 is set to the address of the instruction that caused the exception.

So in other words, if the "stw" instruction isn't at 0x004A0D10, then I think you're looking in the wrong spot.

vi0 said...

Thanks a lot for the comment, it's actually a mistake - we used SRR0 to find the instruction and DAR to check that the stack is misaligned...