author | Subrata Banik <subratabanik@google.com> | 2023-05-29 13:53:57 +0530 |
---|---|---|
committer | Subrata Banik <subratabanik@google.com> | 2023-06-01 07:52:12 +0000 |
commit | 36faccfec51dc5314193f048ff77f23a0304e0b1 (patch) | |
tree | c26f0867def3266ed2d6c22f5dba3c818e259236 /src/include | |
parent | dffb1c8933c09c041b10e1f6a666449e7a7531da (diff) |
include/cpu/x86: Skip `wbinvd` on CPUs with cache self-snooping (SS)
This patch refers to, and backports part of, earlier Linux kernel work
(https://lore.kernel.org/all/1561689337-19390-3-git-send-email-ricardo.neri-calderon@linux.intel.com/T/#u)
that speeds up MTRR register programming on multi-processor systems by
checking the CPUID self-snoop feature flag.
Refer to the details below:
Programming MTRR registers in multi-processor systems is a rather
lengthy process as it involves flushing caches. As a result, the
process may take a considerable amount of time. Furthermore, all
processors must program these registers serially.
The `wbinvd` instruction writes all modified cache lines back to memory
and then invalidates the caches. All logical processors are stopped from
executing until the write-back and invalidate operation has completed.
The amount of time or cycles for WBINVD to complete will vary due to the
size of different cache hierarchies and other factors. As a consequence,
the use of the WBINVD instruction can have an impact on response time.
According to measurements, around 98% of the time needed to program
MTRRs on multi-processor systems is spent flushing caches with
wbinvd(). Per Section 11.11.8 of the Intel 64 and IA-32 Architectures
Software Developer's Manual, flushing caches is not necessary if the
CPU supports cache self-snooping (SS):
"Flush all caches using the WBINVD instructions. Note on a processor
that supports self-snooping, CPUID feature flag bit 27, this step is
unnecessary."
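The bit test that the quoted note implies can be sketched as follows. This is a minimal illustration, not the coreboot code: the helper name `edx_reports_self_snoop` and the macro `SELF_SNOOP_BIT` are hypothetical, and the function takes the EDX value of CPUID leaf 1 as a plain argument so that it can be exercised without touching hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, EDX bit 27 is the self-snoop (SS) feature flag. */
#define SELF_SNOOP_BIT 27

/*
 * Hypothetical helper: takes the EDX value returned by CPUID leaf 1
 * and reports whether the self-snoop flag is set.
 */
static bool edx_reports_self_snoop(uint32_t leaf1_edx)
{
	return (leaf1_edx >> SELF_SNOOP_BIT) & 1u;
}
```

On a hosted GCC or Clang build for x86, the EDX value could presumably be obtained with `__get_cpuid()` from `<cpuid.h>`; in firmware it would come from whatever CPUID accessor the code base provides.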
Thus, skipping the cache flushes can cut the time needed to program the
MTRR registers by several tens of milliseconds:
Platform                           Before   After
12-core (14 Threads) MeteorLake    35ms     1ms
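As a quick check on the figures above, the relative saving can be computed directly. The helper below is only an arithmetic sketch and `percent_saved` is a hypothetical name; it assumes the "Before" and "After" columns measure the same procedure with and without the flush.

```c
/*
 * Hypothetical helper: percentage of the original time that was saved.
 * For the table above, (35 - 1) / 35 is about 97.1%.
 */
static double percent_saved(double before_ms, double after_ms)
{
	return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Calling `percent_saved(35.0, 1.0)` gives roughly 97.1, which is in the same range as the "around 98%" share attributed to cache flushing earlier in this message.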
BUG=b:260455826
TEST=Able to build and boot google/rex.
Change-Id: I83cac2b1e1707bbb1bc1bba82cf3073984e9768f
Signed-off-by: Subrata Banik <subratabanik@google.com>
Reviewed-on: https://review.coreboot.org/c/coreboot/+/75511
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
Reviewed-by: Jérémy Compostella <jeremy.compostella@intel.com>
Reviewed-by: Lean Sheng Tan <sheng.tan@9elements.com>
Reviewed-by: Himanshu Sahdev <himanshu.sahdev@intel.com>
Reviewed-by: Tarun Tuli <taruntuli@google.com>
Diffstat (limited to 'src/include')
-rw-r--r-- | src/include/cpu/x86/cache.h | 15 |
1 files changed, 14 insertions, 1 deletions
diff --git a/src/include/cpu/x86/cache.h b/src/include/cpu/x86/cache.h
index 4143d972f5..d4d9160252 100644
--- a/src/include/cpu/x86/cache.h
+++ b/src/include/cpu/x86/cache.h
@@ -9,9 +9,11 @@
 #define CR0_NoWriteThrough (CR0_NW)
 
 #define CPUID_FEATURE_CLFLUSH_BIT 19
+#define CPUID_FEATURE_SELF_SNOOP_BIT 27
 
 #if !defined(__ASSEMBLER__)
 
+#include <arch/cpuid.h>
 #include <stdbool.h>
 #include <stddef.h>
 
@@ -51,6 +53,16 @@ static __always_inline void enable_cache(void)
 	write_cr0(cr0);
 }
 
+/*
+ * Cache flushing is the most time-consuming step when programming the MTRRs.
+ * However, if the processor supports cache self-snooping (ss), we can skip
+ * this step and save time.
+ */
+static __always_inline bool self_snooping_supported(void)
+{
+	return (cpuid_edx(1) >> CPUID_FEATURE_SELF_SNOOP_BIT) & 1;
+}
+
 static __always_inline void disable_cache(void)
 {
 	/* Disable and write back the cache */
@@ -58,7 +70,8 @@ static __always_inline void disable_cache(void)
 	cr0 = read_cr0();
 	cr0 |= CR0_CD;
 	write_cr0(cr0);
-	wbinvd();
+	if (!self_snooping_supported())
+		wbinvd();
 }
 
 #endif /* !__ASSEMBLER__ */
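
The control flow of the patched `disable_cache()` can be modelled in hosted C to see when the flush is skipped. This is a sketch under stated assumptions, not the coreboot implementation: the privileged primitives are replaced by hypothetical stubs (`cpuid_edx_stub`, `wbinvd_stub`, the `fake_*` variables), and `CR0_CD` is defined locally as bit 30 of CR0.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_CD (1u << 30)               /* CR0 cache-disable bit */
#define CPUID_FEATURE_SELF_SNOOP_BIT 27

/* Hypothetical stand-ins for privileged state; not coreboot code. */
static uint32_t fake_cpuid_edx;         /* what CPUID leaf 1 would put in EDX */
static uint32_t fake_cr0;               /* modelled CR0 register */
static unsigned int wbinvd_calls;       /* how often the flush was requested */

static uint32_t cpuid_edx_stub(void) { return fake_cpuid_edx; }
static void wbinvd_stub(void) { wbinvd_calls++; }

static bool self_snooping_supported_model(void)
{
	return (cpuid_edx_stub() >> CPUID_FEATURE_SELF_SNOOP_BIT) & 1;
}

/* Same shape as the patched function: set CD, flush only without SS. */
static void disable_cache_model(void)
{
	fake_cr0 |= CR0_CD;
	if (!self_snooping_supported_model())
		wbinvd_stub();
}

/* Resets the model, runs one disable_cache_model() call for the given
 * EDX value, and returns the number of flushes it issued. */
static unsigned int flushes_for_edx(uint32_t edx)
{
	fake_cpuid_edx = edx;
	fake_cr0 = 0;
	wbinvd_calls = 0;
	disable_cache_model();
	return wbinvd_calls;
}
```

With an EDX value that has bit 27 clear, the model issues one flush; with bit 27 set, it issues none, while CR0.CD is set in both cases.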