Discussion:
Issue with valgrind not taking branch (ppc64le power9)
Add Reply
Aaron Sawdey
2017-05-11 21:44:26 UTC
Reply
Permalink
Raw Message
Hello all,
  I've been working with Carl Love to try to get valgrind up and
functional on power9. The issue I've been seeing is a bl that is not
being followed. I'm hoping someone can help me with where to look to
see at what point valgrind is choosing the wrong execution path --
basically I need some help knowing where to look in the internal data
structures.

Here's a description of the situation where it is going awry. The code
looks like this:

   81b38:       a6 02 08 7c     mflr    r0
   81b3c:       e0 ff 81 fb     std     r28,-32(r1)
   81b40:       e8 ff a1 fb     std     r29,-24(r1)
   81b44:       f0 ff c1 fb     std     r30,-16(r1)
   81b48:       f8 ff e1 fb     std     r31,-8(r1)
   81b4c:       78 23 9e 7c     mr      r30,r4
   81b50:       ff ff 80 38     li      r4,-1
   81b54:       00 00 e0 38     li      r7,0
   81b58:       78 1b 7c 7c     mr      r28,r3
   81b5c:       78 2b bd 7c     mr      r29,r5
   81b60:       00 00 20 39     li      r9,0
   81b64:       00 00 c0 38     li      r6,0
   81b68:       ff ff a0 38     li      r5,-1
   81b6c:       20 04 84 54     rlwinm  r4,r4,0,16,16
   81b70:       10 00 01 f8     std     r0,16(r1)
   81b74:       d1 fe 21 f8     stdu    r1,-304(r1)
   81b78:       20 00 e1 3b     addi    r31,r1,32
   81b7c:       a8 00 21 f9     std     r9,168(r1)
   81b80:       78 fb e3 7f     mr      r3,r31
   81b84:       15 2c 01 48     bl      94798
<_IO_default_doallocate@@GLIBC_2.17+0x158>
   81b88:       00 00 00 60     nop
   81b8c:       ff ff 22 3d     addis   r9,r2,-1
   81b90:       78 e3 84 7f     mr      r4,r28
   81b94:       78 fb e3 7f     mr      r3,r31
   81b98:       00 00 c0 38     li      r6,0
   81b9c:       00 00 a0 38     li      r5,0
   81ba0:       a8 5a 29 39     addi    r9,r9,23208
   81ba4:       f8 00 21 f9     std     r9,248(r1)
   81ba8:       81 4f 01 48     bl      96b28
<_IO_str_pbackfail@@GLIBC_2.17+0xf8>

At the end of the first block we see this

        0x6CF1B80:  mr r3,r31

              ------ IMark(0x6CF1B80, 4, 0) ------
              t54 = GET:I64(264)
              t56 = GET:I64(264)
              t55 = t54
              PUT(40) = t55
              PUT(1296) = 0x6CF1B84:I64

        0x6CF1B84:  bl 0x6D04798

              ------ IMark(0x6CF1B84, 4, 0) ------
              PUT(1304) = 0x6CF1B88:I64
              t61 = 0x6D04798:I64
              ====== AbiHint(Sub64(GET:I64(24),0x120:I64), 288, t61)
======
              PUT(1296) = 0x6D04798:I64
              PUT(1296) = GET:I64(1296); exit-Call

Where 0x6D04798 is the correct address we should branch-and-link to.

At the end of the generated code there is this:

(xDirect) if (always) { imm64 r30,0x6D04798; std r30,1296(%r31); imm64-
fixed5 r30,$disp_cp_chain_me_to_fastEP; mtctr r30; bctrl }

D0 06 C0 3F 98 47 DE 63 10 05 DF FB 00 00 C0 3F 00 00 DE 63 C6 07 DE 7B
0A 38 DE 67 DC 1D DE 63 A6 03 C9 7F 21 04 80 4E 

But the very next block is the code following the bl at 0x6CF1B88, not
the code at 0x6D04798:

==== SB 8420 (evchecks 3471236) [tid 1] 0x6cf1b88
vsscanf@@GLIBC_2.17+88 /opt/at10.0-4-beta1/lib64/power9/libc-
2.24.so+0x81b88

------------------------ Front end ------------------------

        0x6CF1B88:  ori r0,r0,0x0

              ------ IMark(0x6CF1B88, 4, 0) ------
              t0 = GET:I64(16)
              t2 = GET:I64(16)
              t1 = Or64(t0,0x0:I64)
              PUT(16) = t1
              PUT(1296) = 0x6CF1B8C:I64

        0x6CF1B8C:  addis r9,r2,0xFFFFFFFF

              ------ IMark(0x6CF1B8C, 4, 0) ------
              t3 = GET:I64(32)
              t4 = GET:I64(264)
              t5 = Add64(t3,0xFFFFFFFFFFFF0000:I64)
              PUT(88) = t5
              PUT(1296) = 0x6CF1B90:I64

Thanks,
   Aaron

-- 
Aaron Sawdey, Ph.D.  ***@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

--
Aaron Sawdey, Ph.D. ***@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Ivo Raisr
2017-05-12 00:54:49 UTC
Reply
Permalink
Raw Message
Post by Aaron Sawdey
But the very next block is the code following the bl at 0x6CF1B88, not
==== SB 8420 (evchecks 3471236) [tid 1] 0x6cf1b88
2.24.so+0x81b88
An obvious question - are you sure that the next block executed is
really SB 8420?
It is not possible to deduce it from the VEX logs alone - they show
only translations.
I tracked a similar problem in past and had to add a debug statement
in the scheduler and disable block chaining
to get a clear picture what gets scheduled. Maybe there is a better way...

I.
Julian Seward
2017-05-12 08:55:33 UTC
Reply
Permalink
Raw Message
Yeah, to echo Ivo's point -- what you're looking at is a list of the blocks
the first time they are executed -- because at that point they have to be
instrumented. It's not an execution trace. All this really tells you is
that the block at 0x6D04798 (the call target) has already been instrumented.
You should be able to find evidence of that earlier in the debug output.

So to return to the original question -- do you have some other reason to
believe that bits of code that should be executed, are being missed?

J
Aaron Sawdey
2017-05-12 15:03:33 UTC
Reply
Permalink
Raw Message
Post by Julian Seward
Yeah, to echo Ivo's point -- what you're looking at is a list of the blocks
the first time they are executed -- because at that point they have to be
instrumented.  It's not an execution trace.  All this really tells
you is
that the block at 0x6D04798 (the call target) has already been
instrumented.
You should be able to find evidence of that earlier in the debug output.
So to return to the original question -- do you have some other reason to
believe that bits of code that should be executed, are being missed?
J
Julian & Ivo,
You are right, it was executed earlier:

==== SB 8296 (evchecks 3470788) [tid 1] 0x6d04798 _IO_no_init+8 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x94798

I was originally led to the conclusion that it wasn't being executed
when I single-stepped it (stepi) by running with --vgdb=yes --vgdb-
error=0 and attaching gdb and setting a breakpoint at 0x6cf1b84 (see
below). The presenting symptom is that the program segfaults shortly
after this when run in valgrind (any tool) and runs correctly native.
So either the gdbserver is behaving oddly and the problem is elsewhere,
or it is not executing this call, which results in segfault later in
glibc code because of stuff it was supposed to have initialized.

Breakpoint 1, 0x0000000006cf1b84 in _IO_vsscanf (
    string=0xfff00e37f " R 12626 18764 12626 34818 18764 4194304 2816 0 0 0 7546 117 0 0 20 0 1 0 265857706 160563200 1672 18446744073709551615 939524096 942755688 70368633113024 0 0 0 0 0 2008612607 0 0 0 17 28
 0 0 0 0 0 94"..., format=0x6b02170 "%*s%d", args=0xfff00e2a0 <incomplete sequence \343>) at iovsscanf.c:38
38      iovsscanf.c: No such file or directory.
(gdb) disass $pc-64,$pc+32
Dump of assembler code from 0x6cf1b44 to 0x6cf1ba4:
   0x0000000006cf1b44 <_IO_vsscanf+20>: std     r30,-16(r1)
   0x0000000006cf1b48 <_IO_vsscanf+24>: std     r31,-8(r1)
   0x0000000006cf1b4c <_IO_vsscanf+28>: mr      r30,r4
   0x0000000006cf1b50 <_IO_vsscanf+32>: li      r4,-1
   0x0000000006cf1b54 <_IO_vsscanf+36>: li      r7,0
   0x0000000006cf1b58 <_IO_vsscanf+40>: mr      r28,r3
   0x0000000006cf1b5c <_IO_vsscanf+44>: mr      r29,r5
   0x0000000006cf1b60 <_IO_vsscanf+48>: li      r9,0
   0x0000000006cf1b64 <_IO_vsscanf+52>: li      r6,0
   0x0000000006cf1b68 <_IO_vsscanf+56>: li      r5,-1
   0x0000000006cf1b6c <_IO_vsscanf+60>: rlwinm  r4,r4,0,16,16
   0x0000000006cf1b70 <_IO_vsscanf+64>: std     r0,16(r1)
   0x0000000006cf1b74 <_IO_vsscanf+68>: stdu    r1,-304(r1)
   0x0000000006cf1b78 <_IO_vsscanf+72>: addi    r31,r1,32
   0x0000000006cf1b7c <_IO_vsscanf+76>: std     r9,168(r1)
   0x0000000006cf1b80 <_IO_vsscanf+80>: mr      r3,r31
=> 0x0000000006cf1b84 <_IO_vsscanf+84>: bl      0x6d04798 <_IO_no_init+8>
   0x0000000006cf1b88 <_IO_vsscanf+88>: nop
   0x0000000006cf1b8c <_IO_vsscanf+92>: addis   r9,r2,-1
   0x0000000006cf1b90 <_IO_vsscanf+96>: mr      r4,r28
   0x0000000006cf1b94 <_IO_vsscanf+100>:        mr      r3,r31
   0x0000000006cf1b98 <_IO_vsscanf+104>:        li      r6,0
   0x0000000006cf1b9c <_IO_vsscanf+108>:        li      r5,0
   0x0000000006cf1ba0 <_IO_vsscanf+112>:        addi    r9,r9,23208
End of assembler dump.
(gdb) stepi
0x0000000006cf1b88      38      in iovsscanf.c
(gdb) stepi
39      in iovsscanf.c
(gdb) p/x $pc
$1 = 0x6cf1b8c
(gdb) stepi
40      in iovsscanf.c
(gdb) stepi
0x0000000006cf1b94      40      in iovsscanf.c
(gdb) stepi
0x0000000006cf1b98      40      in iovsscanf.c
(gdb) stepi
0x0000000006cf1b9c      40      in iovsscanf.c

Here's my attempt to instrument the chaining and get more of a picture
of what's going on:

First call to _IO_no_init:

handle_chain_me place_to_chain 0x80477E920 ip 0x6cfbdc8 to_sNo 0x0 to_tteNo 0x85f5
--26007:0:transtab tt_tc_do_chaining host code 0x80477E95C from__patch_addr 0x80477E920
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B9B30 tcptr 0x80477E940 entry 0x6cfbdc8
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B9BA0 tcptr 0x80477E3A8 entry 0x6cfbd88
sched: CHAIN_TO_FAST_EP: 0x80477EBF0
==== SB 8295 (evchecks 3470785) [tid 1] 0x6d04798 _IO_no_init+8 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x94798

handle_chain_me place_to_chain 0x80477EBF0 ip 0x6d04798 to_sNo 0x0 to_tteNo 0x85f4
--26007:0:transtab tt_tc_do_chaining host code 0x80477EC2C from__patch_addr 0x80477EBF0
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B9AC0 tcptr 0x80477EC10 entry 0x6d04798
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B9B30 tcptr 0x80477E940 entry 0x6cfbdc8
sched: CHAIN_TO_SLOW_EP: 0x80477EFC8

==== SB 8296 (evchecks 3470786) [tid 1] 0x6d04710 _IO_old_init /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x94710

handle_chain_me place_to_chain 0x80477EFC8 ip 0x6d04710 to_sNo 0x0 to_tteNo 0x85f3
--26007:0:transtab tt_tc_do_chaining host code 0x80477EFE8 from__patch_addr 0x80477EFC8
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B9A50 tcptr 0x80477EFE8 entry 0x6d04710
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B9AC0 tcptr 0x80477EC10 entry 0x6d04798

==== SB 8297 (evchecks 3470787) [tid 1] 0x6d047c8 _IO_no_init+56 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x947c8

I don't see chaining (through this function anyway) from 6d04710 to anywhere which seems surprising.

Later, when the call that's skipped in gdb occurs (the block that ends with the call at 0x6cf1b84):

==== SB 8417 (evchecks 3471231) [tid 1] 0x6cf1b38 vsscanf@@GLIBC_2.17+8 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x81b38

   ====== AbiHint(Sub64(t88,0x120:I64), 288, 0x6D04798:I64) ======

-- PUT(1296) = 0x6D04798:I64; exit-Call
(xDirect) if (always) { imm64 r30,0x6D04798; std r30,1296(%r31); imm64-fixed5 r30,$disp_cp_chain_me_to_fastEP; mtctr r30; bctrl }

handle_chain_me place_to_chain 0x80478FB18 ip 0x6cf1b38 to_sNo 0x0 to_tteNo 0x857a
--26007:0:transtab tt_tc_do_chaining host code 0x80478FB54 from__patch_addr 0x80478FB18
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B6560 tcptr 0x80478FB38 entry 0x6cf1b38
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B65D0 tcptr 0x80478F6D0 entry 0x6ce89d0

sched: CHAIN_TO_FAST_EP: 0x804790094

This suggests that we do go to 6d04798, so why didn't the gdbserver step into there?

handle_chain_me place_to_chain 0x804790094 ip 0x6d04798 to_sNo 0x0 to_tteNo 0x85f4
--26007:0:transtab tt_tc_do_chaining host code 0x80477EC2C from__patch_addr 0x804790094
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B9AC0 tcptr 0x80477EC10 entry 0x6d04798
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B6560 tcptr 0x80478FB38 entry 0x6cf1b38

If 6cf1b38 is a new block at this point, shouldn't we have seen chaining from a successor to 6d04798 back to
6cf1b88?

==== SB 8418 (evchecks 3471235) [tid 1] 0x6cf1b88 vsscanf@@GLIBC_2.17+88 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x81b88

This is chaining of the block after 6cf1b38 to another call:

sched: CHAIN_TO_FAST_EP: 0x804790304
handle_chain_me place_to_chain 0x804790304 ip 0x6d06b28 to_sNo 0x0 to_tteNo 0x85ef
--26007:0:transtab tt_tc_do_chaining host code 0x80478005C from__patch_addr 0x804790304
--26007:0:transtab tt_tc_do_chaining to_tteC 0x8050B9890 tcptr 0x804780040 entry 0x6d06b28
--26007:0:transtab tt_tc_do_chaining from_tteC 0x8050B64F0 tcptr 0x8047900B0 entry 0x6cf1b88
sched: CHAIN_TO_FAST_EP: 0x80478044C

==== SB 8419 (evchecks 3471237) [tid 1] 0x6d06b58 _IO_str_init_static_internal+56 /opt/at10.0-4-beta1/lib64/power9/libc-2.24.so+0x96b58

Thanks,
Aaron
--
Aaron Sawdey, Ph.D. ***@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Aaron Sawdey
2017-05-14 16:02:44 UTC
Reply
Permalink
Raw Message
Post by Julian Seward
Yeah, to echo Ivo's point -- what you're looking at is a list of the blocks
the first time they are executed -- because at that point they have to be
instrumented.  It's not an execution trace.  All this really tells
you is
that the block at 0x6D04798 (the call target) has already been
instrumented.
You should be able to find evidence of that earlier in the debug output.
So to return to the original question -- do you have some other reason to
believe that bits of code that should be executed, are being missed?
Julian,
I tried it again, and this time set a breakpoint at the call target
before trying to single step the bl at 0x6cf1b84. That breakpoint was
triggered, so the problem seems to be just that stepi is allowing the
subroutine call to execute, i.e. behaving like nexti instead. The
segfault is probably due to issues relating to the new p9 instructions.

Thanks for your assistance with this.

Aaron
--
Aaron Sawdey, Ph.D. ***@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Philippe Waroquiers
2017-05-14 16:16:55 UTC
Reply
Permalink
Raw Message
Post by Aaron Sawdey
I tried it again, and this time set a breakpoint at the call target
before trying to single step the bl at 0x6cf1b84. That breakpoint was
triggered, so the problem seems to be just that stepi is allowing the
subroutine call to execute, i.e. behaving like nexti instead.
If nexti/stepi were done in a gdb+vgdb context, using --vgdb=full
might help to have more precise debugging.

Philippe
Aaron Sawdey
2017-05-15 01:52:24 UTC
Reply
Permalink
Raw Message
Post by Philippe Waroquiers
  I tried it again, and this time set a breakpoint at the call
target
before trying to single step the bl at 0x6cf1b84. That breakpoint was
triggered, so the problem seems to be just that stepi is allowing the
subroutine call to execute, i.e. behaving like nexti instead. 
If nexti/stepi were done in a gdb+vgdb context, using --vgdb=full
might help to have more precise debugging.
Philippe
Thanks Philippe -- that did the trick.
--
Aaron Sawdey, Ph.D. ***@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Loading...