[Valgrind-developers] valgrind support for glibc/kernel arm64 HWCAPS

Discussion:

Mark Wielaard

2017-07-05 15:42:56 UTC

Hi,

Adding valgrind-developers (and dropping fedora glibc).

valgrind needs to mask out all unknown/unimplemented flags. And I
thought it was 1? LD_HWCAP_MASK=1 acts as a workaround, after all.

The remaining flags shouldn't actually matter to glibc since they're
essentially assumed features (asimd, fp) but there may be programs out
there that might read them.

#define HWCAP_FP (1 << 0)
#define HWCAP_ASIMD (1 << 1)
#define HWCAP_EVTSTRM (1 << 2)
#define HWCAP_AES (1 << 3)
#define HWCAP_PMULL (1 << 4)
#define HWCAP_SHA1 (1 << 5)
#define HWCAP_SHA2 (1 << 6)
#define HWCAP_CRC32 (1 << 7)
#define HWCAP_ATOMICS (1 << 8)
#define HWCAP_FPHP (1 << 9)
#define HWCAP_ASIMDHP (1 << 10)
#define HWCAP_CPUID (1 << 11)
#define HWCAP_ASIMDRDM (1 << 12)
#define HWCAP_JSCVT (1 << 13)
#define HWCAP_FCMA (1 << 14)
#define HWCAP_LRCPC (1 << 15)

BTW the glibc linux/aarch64/bits/hwcap.h only goes up to HWCAP_ASIMDRDM.
Is there a corresponding ARM ABI document that maps those values to
the corresponding arm64 cpu instruction sets? Valgrind supports some,
but certainly not all. Since valgrind emulates/translates all
instructions explicitly it makes sense to mask off anything unknown.
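
The masking idea can be sketched roughly as below. This is only an
illustration, not valgrind's actual code: filter_hwcap and
SUPPORTED_HWCAPS are hypothetical names, and treating FP and ASIMD as the
supported set is an assumption based on the "essentially assumed
features" remark above.

```c
#include <stdint.h>

/* Bit values copied from the list quoted in this thread. */
#define HWCAP_FP     (1UL << 0)
#define HWCAP_ASIMD  (1UL << 1)
#define HWCAP_CPUID  (1UL << 11)

/* Hypothetical: the features the emulator is known to implement.
 * Assumed here to be just FP and ASIMD. */
#define SUPPORTED_HWCAPS (HWCAP_FP | HWCAP_ASIMD)

/* Return the hwcap word the guest program should see: every bit that
 * is not explicitly known to be supported is cleared. */
static unsigned long filter_hwcap(unsigned long host_hwcap)
{
    return host_hwcap & SUPPORTED_HWCAPS;
}
```

Under this scheme, enabling a feature later would just mean adding its
bit to SUPPORTED_HWCAPS once the corresponding instructions are known to
be implemented.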

Yeah I assumed that anything before CPUID was probably implemented in
valgrind already, but if that's the conservative way to go then so be it.

The issue really is that I don't know what the HWCAP bits stand for. So
for me the only way is the conservative way assuming that since things
worked without any bits set, that is what we should default to for now.

Hopefully someone knows which instruction sets the HWCAPS bits stand for
and which ones are currently (fully) implemented in valgrind. Then we
can more selectively enable bits.

So does this mean that if there are specific hwcaps we know are
implemented in valgrind (now or in future), that the flags should be
enabled one by one?

Yes. IMHO.

For example, if valgrind disables hwcap_cpuid then
bugs in micro-architecture specific routines may get masked out since
they will never get called (unless you're using the not-merged-yet
glibc.tune.cpu tunable) and it would change program behaviour
considerably.

I am not sure I understand this part.
What are micro-architecture specific routines?

So once support for midr_el1 is in place, maybe
hwcap_cpuid should be brought back. Likewise for other flags.

Yes.

Cheers,

Mark

Mark Wielaard

2017-07-05 19:48:31 UTC

Post by Mark Wielaard
I am not sure I understand this part.
What are micro-architecture specific routines?

These are routines written specifically for vendor CPUs, such as the
thunderx version of memcpy and memmove. The HWCAP_CPUID allows for such
routines to be launched on the correct hardware, but when run under
valgrind, those routines will not get called and any potential bugs in
those routines may get masked.

aha. We probably already intercept such implementations of memcpy and
memmove with our own versions anyway. See the shared/vg_replace_strmem.c
source in valgrind for some of the reasons for intercepting these hyper
optimized string/memory functions.
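
For a rough idea of what a "simpler" replacement looks like, here is a
plain byte-by-byte copy. It is a sketch only, not the code from
vg_replace_strmem.c; the name simple_memcpy is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for an intercepted memcpy: copies one byte at
 * a time, so it never touches memory outside [src, src+n) or
 * [dst, dst+n) and uses no wide or vector loads. */
static void *simple_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

A loop like this is slow, but every access is to a byte the caller asked
for, which makes it easy for a checking tool to reason about.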

Cheers,

Mark

Mark Wielaard

2017-07-06 07:26:33 UTC

Why do you expect valgrind to be able to execute code specific to vendor
CPUs? That's not even true for i386 and x86-64 and micro-architecture
ISA extensions.

The expectation is that it looks for potential memory access issues in
these low level functions themselves, but that is probably too much to
(c) the glibc SSE-variants can read past the end of the input data
ranges. This can cause false-positive Memcheck / Helgrind / DRD
reports.
sounds scary. It's been a while since I've looked at the x86 versions
of the string functions, do you (or Mark) know what this is referring to
and why it is safe?

There is a page boundary check upfront, or this only happens after the
pointer is aligned and the load cannot cross a page boundary.
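
The kind of upfront check meant here can be sketched like this. It is not
glibc's code: the 4096-byte page size, the 16-byte vector width and the
name load_stays_in_page are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */
#define VEC_BYTES         16u   /* assumption: 16-byte SSE load */

/* True if a VEC_BYTES-wide load starting at addr stays within the page
 * that contains addr, so it cannot fault on an unmapped next page even
 * if it reads past the end of the string or buffer. */
static bool load_stays_in_page(uintptr_t addr)
{
    uintptr_t offset = addr & (PAGE_SIZE_ASSUMED - 1);
    return offset <= PAGE_SIZE_ASSUMED - VEC_BYTES;
}
```

A 16-byte-aligned address always passes this test, which is why an
over-read after aligning the pointer is safe at the hardware level even
though a tool that knows the exact buffer size will still flag it.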

(b) some of the normal versions are hyper-optimised, which fools
Memcheck and cause spurious value warnings. Our versions are
simpler.
specifically whether it is x86-specific or true for other architectures
as well

I'm not sure if this is what (b) is about, but for some instructions or
instruction sequences it is hard to implement proper uninitialized value
tracking. Valgrind then reports a dependency on uninitialized data
that does not in fact exist: the bits are masked away in practice, but
valgrind cannot see that.
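
A toy model may help show the difference between coarse and precise
tracking. It is purely illustrative and not how memcheck is implemented;
struct shadowed, and_coarse and and_precise are hypothetical names.

```c
#include <stdint.h>

/* Toy shadow value: a set bit in `undef` means the matching bit of
 * `val` is uninitialized. */
struct shadowed {
    uint32_t val;
    uint32_t undef;
};

/* Coarse rule: if any input bit is undefined, call the whole result
 * undefined. Cheap, but it over-reports. */
static struct shadowed and_coarse(struct shadowed a, struct shadowed b)
{
    struct shadowed r = { a.val & b.val,
                          (a.undef | b.undef) ? 0xFFFFFFFFu : 0u };
    return r;
}

/* Precise rule for AND: a result bit is defined whenever either input
 * has a defined 0 in that position, because the output bit is then 0
 * regardless of the other input. */
static struct shadowed and_precise(struct shadowed a, struct shadowed b)
{
    uint32_t a_def0 = ~a.undef & ~a.val; /* defined zero bits of a */
    uint32_t b_def0 = ~b.undef & ~b.val; /* defined zero bits of b */
    struct shadowed r = { a.val & b.val,
                          (a.undef | b.undef) & ~(a_def0 | b_def0) };
    return r;
}
```

With a value whose upper 24 bits are uninitialized and a fully defined
mask of 0xFF, the coarse rule marks the result as undefined while the
precise rule marks it as fully defined. Hand-written assembly that relies
on this sort of masking is where a coarse approximation reports a
dependency that is not really there.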

Right, the hand-written glibc mem/str assembly routines are sometimes a
little too clever. valgrind/memcheck is optimized for code generated by
actual compilers. And valgrind has a somewhat stricter definition/view
of addressable memory (since it knows exactly how big an allocated
buffer is because it intercepts all memory allocation functions). See
also the valgrind/memcheck options --partial-loads-ok and
--expensive-definedness-checks:
http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok
http://valgrind.org/docs/manual/mc-manual.html#opt.expensive-definedness-checks

In newer valgrind releases the first defaults to yes, the second to no.

Cheers,

Mark