Modify the instructions emitted for PREFETCH on arm64 (#10117)

Summary:
__builtin_prefetch(..., 1) prefetches into the L2 cache on x86, while the same
call emits a pldl3keep instruction on arm64, which doesn't seem to be close
enough to the core.

Testing on Graviton3 and M1 systems with memtablerep_bench fillrandom and
skiplist, throughput increased as follows when adjusting the 1 to 2 or 3:
```
             1 -> 2    1 -> 3
-----------------------------
Graviton3    +10%      +15%
M1           +10%      +10%
```

Given that prefetching into the L1 cache seems to help, I chose that
conversion (1 -> 3).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10117

Reviewed By: pdillinger

Differential Revision: D37120475

fbshipit-source-id: db1ef43f941445019c68316500a2250acc643d5e
Ali Saidi 2022-06-14 17:58:44 -07:00 committed by Facebook GitHub Bot
parent 751d1a3e48
commit b550fc0b09
1 changed file with 9 additions and 0 deletions

@@ -202,7 +202,16 @@ extern void *cacheline_aligned_alloc(size_t size);
extern void cacheline_aligned_free(void *memblock);
#if defined(__aarch64__)
// __builtin_prefetch(..., 1) turns into prfm pldl3keep on arm64. We want the
// data as close to the core as possible, so turn any locality >= 1 into an
// L1 prefetch; locality == 0 is left as is and becomes a non-temporal
// prefetch.
#define PREFETCH(addr, rw, locality) \
__builtin_prefetch(addr, rw, locality >= 1 ? 3 : locality)
#else
#define PREFETCH(addr, rw, locality) __builtin_prefetch(addr, rw, locality)
#endif
extern void Crash(const std::string& srcfile, int srcline);
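
For illustration only, the locality remapping done by the arm64 branch above can be sketched as a standalone snippet. The names `Arm64PrefetchLocality`, `PrefetchForRead` and `SumWithPrefetch` are hypothetical and not part of this commit; the sketch assumes GCC or Clang, which provide `__builtin_prefetch` and expect its `rw` and `locality` arguments to be compile-time constants.

```cpp
#include <cstddef>

// Hypothetical helper (not in the commit): mirrors the remapping in the
// arm64 branch of PREFETCH. Any locality >= 1 becomes 3 (closest to the
// core); locality 0 stays 0 (non-temporal).
constexpr int Arm64PrefetchLocality(int locality) {
  return locality >= 1 ? 3 : locality;
}

// Hypothetical wrapper: locality is a template parameter so that the value
// passed to __builtin_prefetch is a compile-time constant.
template <int kLocality>
inline void PrefetchForRead(const void* addr) {
  constexpr int kMapped = Arm64PrefetchLocality(kLocality);
  __builtin_prefetch(addr, 0 /* rw: read */, kMapped);
}

// Example use: sum an array while prefetching a few elements ahead.
// The lookahead distance of 8 is an arbitrary choice for this sketch.
inline long SumWithPrefetch(const long* data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + 8 < n) {
      PrefetchForRead<1>(&data[i + 8]);
    }
    sum += data[i];
  }
  return sum;
}
```

A prefetch is only a hint, so the remapping affects performance but not program results; whether it helps on a given core is something to measure, as was done with memtablerep_bench in the summary.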