Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add nix build definitions #745

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aws-nslick
Copy link
Contributor

@aws-nslick aws-nslick commented Dec 9, 2024

Issue #, if available:

Description of changes:

This adds a nix flake for aws-ofi-nccl.

Usage:

  1. To use the development shell, install direnv then create .envrc with contents:
watch_file nix/shell.nix
use_flake . --impure

This will create a .clangd file which can be used for IDE-like features atop clangd. Individual editor configuration varies. Example emacs configuration using doom emacs:

(use-package! eglot
  :config
  (add-to-list 'eglot-server-programs
               `((c-mode c-ts-mode c++-mode c++-ts-mode objc-mode)
                 . ("clangd"
                    "-j=8"
                    "-background-index-priority=normal"
                    "--enable-config"
                    "--all-scopes-completion"
                    "--header-insertion=iwyu"
                    "--header-insertion-decorators"
                    "--log=verbose"
                    "--clang-tidy"
                    "--background-index"
                    "--malloc-trim"
                    "--completion-style=detailed"
                    "--import-insertions"
                    "--pch-storage=memory"
                    "--function-arg-placeholders=1"))))
  1. To build aws-ofi-nccl, libfabric, openmpi, etc.: nix build .#default.

  2. To build wrappers for nccl-tests that run atop ubuntu, run: nix build .#ubuntu-test-runners. ./result/bin/ will contain shell scripts that LD_PRELOAD nvidia libraries from ubuntu expected locations, then forwards args to the nccl-test in question.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@aws-nslick aws-nslick requested a review from a team as a code owner December 9, 2024 23:02
@aws-nslick aws-nslick force-pushed the nix-for-master branch 3 times, most recently from 6bca13b to bdfd488 Compare December 12, 2024 17:14
Copy link
Contributor

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix isn't really my thing, but I have no objection to most of this patch (other than the Libfabric patch I called out below). Other than that, I'm not sure it's a competent review, but I'd be fine approving.

nix/overlays/libfabric/0001-cuda-dlopen-correctly.patch Outdated Show resolved Hide resolved
Signed-off-by: Nicholas Sielicki <nslick@amazon.com>
@aws-nslick
Copy link
Contributor Author

@ConnorBaker if you have any free time, any comments or criticisms are much appreciated.

Comment on lines +17 to +26
dirs = {
third-party = ./../../../3rd-party;
docs = ./../../../doc;
headers = ./../../../include;
mfour = ./../../../m4;
nix = ./../../../nix;
tus = ./../../../src;
tests = ./../../../tests;
topologies = ./../../../topology;
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unused?

Comment on lines +51 to +55
cleanRepo = traceVal (gitTracked ../../../.);
cleaned = x: intersection x (gitTracked ../../../.);
sourceFiles = cleaned (sourceFilter ../../../.);
buildFiles = cleaned (buildFileFilter ../../../.);
thirdPartyFiles = cleaned ../../../.;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend adding root = ../../.. to the let in block here and then using that instead of ../../../. (which is equivalent) everywhere. At the end of the let in you can then do inherit root; in the lib.fileset.toSource call.


cudaBuildDepsJoined = symlinkJoin {
name = "cuda-build-deps-joined";
paths = lib.optionals (cudaSupport) (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
paths = lib.optionals (cudaSupport) (
paths = lib.optionals cudaSupport (

Unnecessary parenthesis.

Comment on lines +43 to +56
[
(lib.getDev cudaPackages.cuda_nvcc)
cudaPackages.cuda_cudart.include
]
++ (
if effectiveStdenv.hostPlatform.isStatic then
[
(lib.getOutput "static" cudaPackages.cuda_cudart)
]
else
[
(lib.getLib cudaPackages.cuda_cudart)
]
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[
(lib.getDev cudaPackages.cuda_nvcc)
cudaPackages.cuda_cudart.include
]
++ (
if effectiveStdenv.hostPlatform.isStatic then
[
(lib.getOutput "static" cudaPackages.cuda_cudart)
]
else
[
(lib.getLib cudaPackages.cuda_cudart)
]
)
[
(lib.getDev cudaPackages.cuda_nvcc)
cudaPackages.cuda_cudart.include
(
lib.getOutput
(if effectiveStdenv.hostPlatform.isStatic then "static" else "lib")
cudaPackages.cuda_cudart
)
]

Comment on lines +75 to +84
(import ./nix/cudaPackagesExtensions/0001-add-latest-nccl.nix { inherit (pkgs) fetchFromGitHub; })
(import ./nix/cudaPackagesExtensions/0002-use-latest-nccl.nix)
(import ./nix/cudaPackagesExtensions/0003-nccl-tests-use-mpi.nix { inherit config; })
(import ./nix/cudaPackagesExtensions/0004-add-ncclAws.nix {
inherit lib config;
inherit (pkgs) symlinkJoin patchelf;
})
(import ./nix/cudaPackagesExtensions/0005-add-nccl-tests-aws.nix {
inherit (pkgs) replaceDependency;
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not push the arguments inside the extension? That is, instead of passing pkgs.fetchFromGitHub to the extension, you should be able to have something like

ffinal:
let 
 inherit (ffinal.pkgs) fetchFromGitHub;
in
pprev:
...

pkgs is a member of every cudaPackages package set; this would also prevent your pkgs from contaminating the overlays generated by overlayAttrs (because you're passing things from your pkgs to them directly instead of allowing the extension to retrieve them from whatever fixed point it is being evaluated within).

I'd recommend changing usages in the extensions of config.packages.blarg with ffinal.pkgs.blarg for the same reason; if you're using overlays properly, everything that you'd normally need to pass through directly will be available in the resulting Nixpkgs instantiation which is used to evaluate these extensions.

Comment on lines +93 to +97
aws-ofi-nccl = (
pkgs.callPackage ./nix/pkgs/aws-ofi-nccl {
inherit inputs self;
}
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
aws-ofi-nccl = (
pkgs.callPackage ./nix/pkgs/aws-ofi-nccl {
inherit inputs self;
}
);
aws-ofi-nccl = pkgs.callPackage ./nix/pkgs/aws-ofi-nccl {
inherit inputs self;
};

rev = "v2.23.4-1";
hash = "sha256-DlMxlLO2F079fBkhORNPVN/ASYiVIRfLJw7bDoiClHw=";
};
name = "cuda${ffinal.cudaMajorMinorPatchVersion}-nccl-2.23.4-1";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to change the name? Using backendStdenv, name should be the result of ${backendStdenv.cudaNamePrefix}-${pname}-${version} (https://github.com/ConnorBaker/cuda-packages/blob/bb6bf09b919e8e7dda006bcee9a48675d034850f/cuda-packages/common/backendStdenv.nix#L55) by default, so updating the version should be enough.

@@ -0,0 +1,13 @@
{ fetchFromGitHub }:
ffinal: pprev: {
nccl_latest = pprev.nccl.overrideAttrs (prevAttrs: {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like prevAttrs is being used -- remember that overrideAttrs takes three different types of argument:

  • attribute set (unconditional overrides)
  • function to attribute set (uses previous values)
  • function to function to attribute set (uses previous and final values)

Copy link

@ConnorBaker ConnorBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nix code looks good to me, left a few comments/nits -- do note that I'm not able to test it at the moment.

@aws-nslick
Copy link
Contributor Author

Thank you for taking the time, I really appreciate it. Taking every change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants