[AutoBump] Merge with 50b1534 (Jun 27)
mgehre-amd committed Sep 13, 2024
2 parents 7d62407 + 50b1534 commit 6d8ed84
Showing 1,230 changed files with 105,843 additions and 47,723 deletions.
8 changes: 4 additions & 4 deletions .github/CODEOWNERS
@@ -64,8 +64,8 @@ clang/test/AST/Interp/ @tbaederr
/mlir/Dialect/*/Transforms/Bufferize.cpp @matthias-springer

# Linalg Dialect in MLIR.
-/mlir/include/mlir/Dialect/Linalg/* @dcaballe @nicolasvasilache @rengolin
-/mlir/lib/Dialect/Linalg/* @dcaballe @nicolasvasilache @rengolin
+/mlir/include/mlir/Dialect/Linalg @dcaballe @nicolasvasilache @rengolin
+/mlir/lib/Dialect/Linalg @dcaballe @nicolasvasilache @rengolin
/mlir/lib/Dialect/Linalg/Transforms/DecomposeLinalgOps.cpp @MaheshRavishankar @nicolasvasilache
/mlir/lib/Dialect/Linalg/Transforms/DropUnitDims.cpp @MaheshRavishankar @nicolasvasilache
/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp @MaheshRavishankar @nicolasvasilache
@@ -85,8 +85,8 @@ clang/test/AST/Interp/ @tbaederr
/mlir/**/*VectorToSCF* @banach-space @dcaballe @matthias-springer @nicolasvasilache
/mlir/**/*VectorToLLVM* @banach-space @dcaballe @nicolasvasilache
/mlir/**/*X86Vector* @aartbik @dcaballe @nicolasvasilache
-/mlir/include/mlir/Dialect/Vector/* @dcaballe @nicolasvasilache
-/mlir/lib/Dialect/Vector/* @dcaballe @nicolasvasilache
+/mlir/include/mlir/Dialect/Vector @dcaballe @nicolasvasilache
+/mlir/lib/Dialect/Vector @dcaballe @nicolasvasilache
/mlir/lib/Dialect/Vector/Transforms/* @hanhanW @nicolasvasilache
/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp @MaheshRavishankar @nicolasvasilache
/mlir/**/*EmulateNarrowType* @dcaballe @hanhanW
120 changes: 120 additions & 0 deletions bolt/docs/OptimizingLinux.md
@@ -0,0 +1,120 @@
# Optimizing Linux Kernel with BOLT


## Introduction

Many Linux applications spend a significant amount of their execution time in the kernel. Thus, when we consider code optimization for system performance, it is essential to improve the CPU utilization not only in the user-space applications and libraries but also in the kernel. BOLT has demonstrated double-digit gains when applied to user-space programs. This guide shows how to apply BOLT to the x86-64 Linux kernel and enhance your system's performance. In our experiments, BOLT boosted database TPS by 2 percent when applied to the kernel compiled with the highest-level optimizations, including PGO and LTO. The database spent ~40% of the time in the kernel and was quite sensitive to kernel performance.

BOLT optimizes code layout based on a low-level execution profile collected with the Linux `perf` tool. The best quality profile should include branch history, such as Intel's last branch records (LBR). BOLT runs on a linked binary and reorders the code while combining frequently executed blocks of instructions in a manner best suited for the hardware. Other than branch instructions, most of the code is left unchanged. Additionally, BOLT updates all metadata associated with the modified code, including DWARF debug information and Linux ORC unwind information.

While BOLT optimizations are not specific to the Linux kernel, certain quirks distinguish the kernel from user-level applications.

BOLT has been successfully applied to and tested with several flavors of the x86-64 Linux kernel.


## QuickStart Guide

BOLT operates on a statically-linked kernel executable, a.k.a. `vmlinux` binary. However, most Linux distributions use a `vmlinuz` compressed image for system booting. To use BOLT on the kernel, you must either repackage `vmlinuz` after BOLT optimizations or add steps for running BOLT into the kernel build and rebuild `vmlinuz`. Uncompressing `vmlinuz` and repackaging it with a new `vmlinux` binary falls beyond the scope of this guide, and at some point, we may add the capability to run BOLT directly on `vmlinuz`. Meanwhile, this guide focuses on steps for integrating BOLT into the kernel build process.


### Building the Kernel

After downloading the kernel sources and configuration for your distribution, you should be able to build `vmlinuz` using the `make bzImage` command. Ideally, the kernel you build should be binary-identical to the kernel on the system you are about to optimize (the target system). Binary matching is critical because BOLT performs profile matching and optimizations at the binary level. We recommend installing a freshly built kernel on the target system to avoid any discrepancies.

Note that the kernel build will produce several artifacts besides bzImage. The most important of them is the uncompressed `vmlinux` binary, which will be used in the next steps. Make sure to save this file.

Build and target systems should have a `perf` tool installed for collecting and processing profiles. If your build system differs from the target, make sure `perf` versions are compatible. The build system should also have the latest BOLT binary and tools (`llvm-bolt`, `perf2bolt`).

Once the target system boots with the freshly-built kernel, start your workload, such as a database benchmark. While the system is under load, collect the kernel profile using perf:


```bash
$ sudo perf record -a -e cycles -j any,k -F 5000 -- sleep 600
```


Convert the `perf` profile into a format suitable for BOLT by passing the `vmlinux` binary to `perf2bolt`:


```bash
$ sudo chown $USER perf.data
$ perf2bolt -p perf.data -o perf.fdata vmlinux
```


Under a high load, `perf.data` should be several gigabytes in size and you should expect the converted `perf.fdata` not to exceed 100 MB.

Two changes are required for the kernel build. The first one is optional but highly recommended. It introduces a BOLT-reserved space into `vmlinux` code section:


```diff
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -139,6 +139,11 @@ SECTIONS
STATIC_CALL_TEXT
*(.gnu.warning)

+ /* Allocate space for BOLT */
+ __bolt_reserved_start = .;
+ . += 2048 * 1024;
+ __bolt_reserved_end = .;
+
#ifdef CONFIG_RETPOLINE
__indirect_thunk_start = .;
*(.text.__x86.*)
```


The second patch adds a step that runs BOLT on `vmlinux` binary:


```diff
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -340,5 +340,13 @@ if is_enabled CONFIG_KALLSYMS; then
fi
fi

+# Apply BOLT
+BOLT=llvm-bolt
+BOLT_PROFILE=perf.fdata
+BOLT_OPTS="--dyno-stats --eliminate-unreachable=0 --reorder-blocks=ext-tsp --simplify-conditional-tail-calls=0 --skip-funcs=__entry_text_start,irq_entries_start --split-functions"
+mv vmlinux vmlinux.pre-bolt
+echo BOLTing vmlinux
+${BOLT} vmlinux.pre-bolt -o vmlinux --data ${BOLT_PROFILE} ${BOLT_OPTS}
+
# For fixdep
echo "vmlinux: $0" > .vmlinux.d
```


If you skipped the first step or are running BOLT on a pre-built `vmlinux` binary, drop the `--split-functions` option.


## Performance Expectations

By improving the code layout, BOLT can boost the kernel's performance by up to 5% by reducing instruction cache misses and branch mispredictions. When measuring total system performance, you should scale this number accordingly based on the time your application spends in the kernel (excluding I/O time).
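
As a rough worked example of this scaling, the sketch below applies an Amdahl-style estimate in which only the kernel share of the run time gets faster. The function name and the input values are illustrative assumptions, not BOLT outputs; the 40% kernel share simply mirrors the database example from the introduction.

```python
def estimate_system_gain(kernel_fraction: float, kernel_speedup: float) -> float:
    """Estimate the whole-system gain from a kernel-only speedup.

    kernel_fraction: share of (non-I/O) run time spent in the kernel, 0..1.
    kernel_speedup:  fractional kernel gain, e.g. 0.05 for 5%.
    Returns the fractional overall gain under an Amdahl-style model.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if kernel_speedup <= -1.0:
        raise ValueError("kernel_speedup must be greater than -1")
    # Time outside the kernel is unchanged; kernel time shrinks by the speedup.
    new_time = (1.0 - kernel_fraction) + kernel_fraction / (1.0 + kernel_speedup)
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    # Hypothetical inputs: 40% of the time in the kernel, 5% kernel speedup.
    gain = estimate_system_gain(kernel_fraction=0.40, kernel_speedup=0.05)
    print(f"Estimated overall gain: {gain:.1%}")
```

With these inputs the model gives an overall gain of roughly 1.9%. Treat the result as a first-order estimate; measured gains depend on the workload.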


## Profile Quality

The timing and duration of the profiling may have a significant effect on the performance of the BOLTed kernel. If you don't know your workload well, it's recommended that you profile for the whole duration of the benchmark run. As longer times will result in larger `perf.data` files, you can lower the profiling frequency by passing a smaller value to the `-F` flag. E.g., to record the kernel profile for half an hour, use the following command:


```bash
$ sudo perf record -a -e cycles -j any,k -F 1000 -- sleep 1800
```
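
To compare the two recording commands in this guide, the sketch below computes a nominal per-CPU sample count as frequency times duration. It assumes `perf` reaches the requested `-F` rate on a busy CPU for the whole run, which is only an approximation; the helper name is hypothetical, and the actual `perf.data` size also depends on the workload and the number of CPUs.

```python
def nominal_samples_per_cpu(frequency_hz: int, duration_s: int) -> int:
    """Nominal number of samples one busy CPU contributes at a given -F rate."""
    if frequency_hz <= 0 or duration_s <= 0:
        raise ValueError("frequency_hz and duration_s must be positive")
    return frequency_hz * duration_s


# perf record ... -F 5000 -- sleep 600
short_run = nominal_samples_per_cpu(5000, 600)
# perf record ... -F 1000 -- sleep 1800
long_run = nominal_samples_per_cpu(1000, 1800)

print(f"-F 5000 for 600 s:  {short_run:,} samples per CPU")
print(f"-F 1000 for 1800 s: {long_run:,} samples per CPU")
```

Under this assumption, the half-hour run at `-F 1000` covers three times the duration while collecting fewer samples per CPU than the ten-minute run at `-F 5000`.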



## BOLT Disassembly

BOLT annotates the disassembly with control-flow information and attaches Linux-specific metadata to the code. To view annotated disassembly, run:


```bash
$ llvm-bolt vmlinux -o /dev/null --print-cfg
```


If you want to limit the disassembly to a set of functions, add `--print-only=<func1regex>,<func2regex>,...`, where a function name is specified using regular expressions.
4 changes: 2 additions & 2 deletions clang-tools-extra/clang-doc/tool/CMakeLists.txt
@@ -25,7 +25,7 @@ set(assets
)

set(asset_dir "${CMAKE_CURRENT_SOURCE_DIR}/../assets")
-set(resource_dir "${CMAKE_BINARY_DIR}/share/clang")
+set(resource_dir "${CMAKE_BINARY_DIR}/share/clang-doc")
set(out_files)

function(copy_files_to_dst src_dir dst_dir file)
@@ -42,7 +42,7 @@ endfunction(copy_files_to_dst)

foreach(f ${assets})
install(FILES ${asset_dir}/${f}
-DESTINATION "${CMAKE_INSTALL_DATADIR}/clang"
+DESTINATION "${CMAKE_INSTALL_DATADIR}/clang-doc"
COMPONENT clang-doc)
copy_files_to_dst(${asset_dir} ${resource_dir} ${f})
endforeach(f)
2 changes: 1 addition & 1 deletion clang-tools-extra/clang-doc/tool/ClangDocMain.cpp
@@ -188,7 +188,7 @@ Example usage for a project using a compile commands database:
llvm::sys::path::native(ClangDocPath, NativeClangDocPath);
llvm::SmallString<128> AssetsPath;
AssetsPath = llvm::sys::path::parent_path(NativeClangDocPath);
-llvm::sys::path::append(AssetsPath, "..", "share", "clang");
+llvm::sys::path::append(AssetsPath, "..", "share", "clang-doc");
llvm::SmallString<128> DefaultStylesheet;
llvm::sys::path::native(AssetsPath, DefaultStylesheet);
llvm::sys::path::append(DefaultStylesheet,
44 changes: 42 additions & 2 deletions clang-tools-extra/clang-tidy/misc/UseInternalLinkageCheck.cpp
@@ -18,6 +18,26 @@

using namespace clang::ast_matchers;

+namespace clang::tidy {
+
+template <>
+struct OptionEnumMapping<misc::UseInternalLinkageCheck::FixModeKind> {
+static llvm::ArrayRef<
+std::pair<misc::UseInternalLinkageCheck::FixModeKind, StringRef>>
+getEnumMapping() {
+static constexpr std::pair<misc::UseInternalLinkageCheck::FixModeKind,
+StringRef>
+Mapping[] = {
+{misc::UseInternalLinkageCheck::FixModeKind::None, "None"},
+{misc::UseInternalLinkageCheck::FixModeKind::UseStatic,
+"UseStatic"},
+};
+return {Mapping};
+}
+};
+
+} // namespace clang::tidy
+
namespace clang::tidy::misc {

namespace {
@@ -57,6 +77,16 @@ AST_POLYMORPHIC_MATCHER(isExternStorageClass,

} // namespace

+UseInternalLinkageCheck::UseInternalLinkageCheck(StringRef Name,
+ClangTidyContext *Context)
+: ClangTidyCheck(Name, Context),
+HeaderFileExtensions(Context->getHeaderFileExtensions()),
+FixMode(Options.get("FixMode", FixModeKind::UseStatic)) {}
+
+void UseInternalLinkageCheck::storeOptions(ClangTidyOptions::OptionMap &Opts) {
+Options.store(Opts, "FixMode", FixMode);
+}
+
void UseInternalLinkageCheck::registerMatchers(MatchFinder *Finder) {
auto Common =
allOf(isFirstDecl(), isAllRedeclsInMainFile(HeaderFileExtensions),
@@ -82,11 +112,21 @@ static constexpr StringRef Message =

void UseInternalLinkageCheck::check(const MatchFinder::MatchResult &Result) {
if (const auto *FD = Result.Nodes.getNodeAs<FunctionDecl>("fn")) {
-diag(FD->getLocation(), Message) << "function" << FD;
+DiagnosticBuilder DB = diag(FD->getLocation(), Message) << "function" << FD;
+SourceLocation FixLoc = FD->getTypeSpecStartLoc();
+if (FixLoc.isInvalid() || FixLoc.isMacroID())
+return;
+if (FixMode == FixModeKind::UseStatic)
+DB << FixItHint::CreateInsertion(FixLoc, "static ");
return;
}
if (const auto *VD = Result.Nodes.getNodeAs<VarDecl>("var")) {
-diag(VD->getLocation(), Message) << "variable" << VD;
+DiagnosticBuilder DB = diag(VD->getLocation(), Message) << "variable" << VD;
+SourceLocation FixLoc = VD->getTypeSpecStartLoc();
+if (FixLoc.isInvalid() || FixLoc.isMacroID())
+return;
+if (FixMode == FixModeKind::UseStatic)
+DB << FixItHint::CreateInsertion(FixLoc, "static ");
return;
}
llvm_unreachable("");
11 changes: 8 additions & 3 deletions clang-tools-extra/clang-tidy/misc/UseInternalLinkageCheck.h
@@ -20,17 +20,22 @@ namespace clang::tidy::misc {
/// http://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html
class UseInternalLinkageCheck : public ClangTidyCheck {
public:
-UseInternalLinkageCheck(StringRef Name, ClangTidyContext *Context)
-: ClangTidyCheck(Name, Context),
-HeaderFileExtensions(Context->getHeaderFileExtensions()) {}
+UseInternalLinkageCheck(StringRef Name, ClangTidyContext *Context);
void registerMatchers(ast_matchers::MatchFinder *Finder) override;
void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+void storeOptions(ClangTidyOptions::OptionMap &Opts) override;
std::optional<TraversalKind> getCheckTraversalKind() const override {
return TK_IgnoreUnlessSpelledInSource;
}

+enum class FixModeKind {
+None,
+UseStatic,
+};
+
private:
FileExtensionsSet HeaderFileExtensions;
+FixModeKind FixMode;
};

} // namespace clang::tidy::misc
14 changes: 13 additions & 1 deletion clang-tools-extra/clang-tidy/tool/ClangTidyMain.cpp
@@ -325,6 +325,14 @@ option is recognized.
)"),
cl::init(false), cl::cat(ClangTidyCategory));

+static cl::opt<bool> AllowNoChecks("allow-no-checks", desc(R"(
+Allow empty enabled checks. This suppresses
+the "no checks enabled" error when disabling
+all of the checks.
+)"),
+cl::init(false),
+cl::cat(ClangTidyCategory));
+
namespace clang::tidy {

static void printStats(const ClangTidyStats &Stats) {
@@ -598,7 +606,7 @@ int clangTidyMain(int argc, const char **argv) {
}

if (ListChecks) {
-if (EnabledChecks.empty()) {
+if (EnabledChecks.empty() && !AllowNoChecks) {
llvm::errs() << "No checks enabled.\n";
return 1;
}
@@ -652,6 +660,10 @@ int clangTidyMain(int argc, const char **argv) {
}

if (EnabledChecks.empty()) {
+if (AllowNoChecks) {
+llvm::outs() << "No checks enabled.\n";
+return 0;
+}
llvm::errs() << "Error: no checks enabled.\n";
llvm::cl::PrintHelpMessage(/*Hidden=*/false, /*Categorized=*/true);
return 1;
7 changes: 7 additions & 0 deletions clang-tools-extra/clang-tidy/tool/clang-tidy-diff.py
@@ -229,6 +229,11 @@ def main():
default=[],
help="Load the specified plugin in clang-tidy.",
)
+parser.add_argument(
+"-allow-no-checks",
+action="store_true",
+help="Allow empty enabled checks.",
+)

clang_tidy_args = []
argv = sys.argv[1:]
@@ -327,6 +332,8 @@ def main():
common_clang_tidy_args.append("-p=%s" % args.build_path)
if args.use_color:
common_clang_tidy_args.append("--use-color")
+if args.allow_no_checks:
+common_clang_tidy_args.append("--allow-no-checks")
for arg in args.extra_arg:
common_clang_tidy_args.append("-extra-arg=%s" % arg)
for arg in args.extra_arg_before: