Results from GH action on NVIDIA_RTX4090x2

mlcommons committed on Dec 23, 2024 · commit 541ca5b · 1 parent 57478f9

Showing 96 changed files with 21,721 additions and 0 deletions.
diff --git a/closed/MLCommons/code/stable-diffusion-xl/README.md b/closed/MLCommons/code/stable-diffusion-xl/README.md
@@ -0,0 +1 @@
+TBD
diff --git a/open/MLCommons/code/stable-diffusion-xl/README.md b/open/MLCommons/code/stable-diffusion-xl/README.md
@@ -0,0 +1 @@
+TBD
diff --git a/...inal-gpu-tensorrt-vdefault-default_config/stable-diffusion-xl/offline/README.md b/...inal-gpu-tensorrt-vdefault-default_config/stable-diffusion-xl/offline/README.md
@@ -0,0 +1,132 @@
+This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).
+
+*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*
+
+## Host platform
+
+* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29
+* CPU version: x86_64
+* Python version: 3.8.10 (default, Nov  7 2024, 13:10:47) 
+[GCC 9.4.0]
+* MLCommons CM version: 3.5.2
+
+## CM Run Command
+
+See [CM installation guide](https://docs.mlcommons.org/inference/install/).
+
+```bash
+pip install -U cmind
+
+cm rm cache -f
+
+cm pull repo mlcommons@mlperf-automations --checkout=7dcef66c48436c29b6faae8f6b00ee4f81265617
+
+cm run script \
+	--tags=app,mlperf,inference,generic,_nvidia,_sdxl,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \
+	--quiet=true \
+	--env.CM_QUIET=yes \
+	--env.CM_MLPERF_IMPLEMENTATION=nvidia \
+	--env.CM_MLPERF_MODEL=sdxl \
+	--env.CM_MLPERF_RUN_STYLE=valid \
+	--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \
+	--env.CM_DOCKER_PRIVILEGED_MODE=True \
+	--env.CM_MLPERF_BACKEND=tensorrt \
+	--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter,edge \
+	--env.CM_MLPERF_CLEAN_ALL=True \
+	--env.CM_MLPERF_DEVICE=cuda \
+	--env.CM_MLPERF_SUBMISSION_DIVISION=closed \
+	--env.CM_MLPERF_USE_DOCKER=True \
+	--env.CM_NVIDIA_GPU_NAME=rtx_4090 \
+	--env.CM_HW_NAME=RTX4090x2 \
+	--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=yes \
+	--env.CM_MLPERF_INFERENCE_PULL_CODE_CHANGES=yes \
+	--env.CM_MLPERF_INFERENCE_PULL_SRC_CHANGES=yes \
+	--env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \
+	--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/gh_action_submissions \
+	--env.CM_MLPERF_SUBMITTER=MLCommons \
+	--env.CM_USE_DATASET_FROM_HOST=yes \
+	--env.CM_USE_MODEL_FROM_HOST=yes \
+	--env.CM_MLPERF_LOADGEN_ALL_SCENARIOS=yes \
+	--env.CM_MLPERF_LOADGEN_COMPLIANCE=yes \
+	--env.CM_MLPERF_SUBMISSION_RUN=yes \
+	--env.CM_RUN_MLPERF_ACCURACY=on \
+	--env.CM_RUN_SUBMISSION_CHECKER=yes \
+	--env.CM_TAR_SUBMISSION_DIR=yes \
+	--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full \
+	--env.CM_MLPERF_INFERENCE_VERSION=5.0-dev \
+	--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \
+	--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \
+	--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=5.0.4 \
+	--env.CM_MLPERF_LAST_RELEASE=v5.0 \
+	--env.CM_TMP_PIP_VERSION_STRING= \
+	--env.CM_MODEL=sdxl \
+	--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \
+	--env.CM_RERUN=yes \
+	--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \
+	--env.CM_MLPERF_LOADGEN_MODE=performance \
+	--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \
+	--env.CM_MLPERF_LOADGEN_SCENARIOS,=SingleStream,Offline,Server \
+	--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \
+	--env.CM_OUTPUT_FOLDER_NAME=valid_results \
+	--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
+	--env.CM_DOCKER_DETACHED_MODE=yes \
+	--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
+	--env.CM_DOCKER_CONTAINER_ID=f55582df015c \
+	--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST04 \
+	--add_deps_recursive.compiler.tags=gcc \
+	--add_deps_recursive.coco2014-original.tags=_full \
+	--add_deps_recursive.coco2014-preprocessed.tags=_full \
+	--add_deps_recursive.imagenet-original.tags=_full \
+	--add_deps_recursive.imagenet-preprocessed.tags=_full \
+	--add_deps_recursive.openimages-original.tags=_full \
+	--add_deps_recursive.openimages-preprocessed.tags=_full \
+	--add_deps_recursive.openorca-original.tags=_full \
+	--add_deps_recursive.openorca-preprocessed.tags=_full \
+	--add_deps_recursive.coco2014-dataset.tags=_full \
+	--add_deps_recursive.igbh-dataset.tags=_full \
+	--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \
+	--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \
+	--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \
+	--adr.compiler.tags=gcc \
+	--adr.coco2014-original.tags=_full \
+	--adr.coco2014-preprocessed.tags=_full \
+	--adr.imagenet-original.tags=_full \
+	--adr.imagenet-preprocessed.tags=_full \
+	--adr.openimages-original.tags=_full \
+	--adr.openimages-preprocessed.tags=_full \
+	--adr.openorca-original.tags=_full \
+	--adr.openorca-preprocessed.tags=_full \
+	--adr.coco2014-dataset.tags=_full \
+	--adr.igbh-dataset.tags=_full \
+	--adr.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \
+	--adr.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \
+	--adr.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \
+	--v=False \
+	--print_env=False \
+	--print_deps=False \
+	--dump_version_info=True \
+	--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results \
+	--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/gh_action_submissions \
+	--env.SDXL_CHECKPOINT_PATH=/home/cmuser/CM/repos/local/cache/762e6805370c44eb/stable_diffusion_fp16 \
+	--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/4db00c74da1e44c8
+```
+*Note: to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts) instead of the pinned commit above,
+ re-pull mlcommons@mlperf-automations without `--checkout` and clear the CM cache as follows:*
+
+```bash
+cm rm repo mlcommons@mlperf-automations
+cm pull repo mlcommons@mlperf-automations
+cm rm cache -f
+
+```
+
+## Results
+
+Platform: RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config
+
+Model Precision: int8
+
+### Accuracy Results 
+
+### Performance Results 
+`Samples per second`: `1.39663`
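The CM run command in the README passes some `--env` keys twice (for example `OUTPUT_BASE_DIR`, first as a host path and later under `/cm-mount`) and writes list values as `KEY,=a,b,c` (for example `CM_MLPERF_LOADGEN_SCENARIOS`). The Python sketch below shows one plausible way to read such flags. It assumes, without checking the cmind parser, that a trailing comma on the key marks a list and that a later duplicate overrides an earlier one; `parse_env_flags` is a hypothetical helper, not part of CM.

```python
"""Hypothetical sketch: resolve a few --env.* flags from the CM run command."""

from typing import Dict, List, Union

Value = Union[str, List[str]]


def parse_env_flags(args: List[str]) -> Dict[str, Value]:
    """Collect --env.KEY=VALUE flags into a dict.

    Assumptions (not taken from the cmind source):
      * a key ending in ',' (e.g. --env.KEY,=a,b) denotes a list value;
      * when a key repeats, the later occurrence overrides the earlier one.
    """
    env: Dict[str, Value] = {}
    for arg in args:
        if not arg.startswith("--env."):
            continue  # ignore --adr.*, --v, --quiet, etc.
        # Split only on the first '=' so values may themselves contain '='.
        key, _, value = arg[len("--env."):].partition("=")
        if key.endswith(","):
            # Assumed CM list syntax: strip the comma, split the value.
            env[key[:-1]] = value.split(",") if value else []
        else:
            env[key] = value  # later duplicates overwrite earlier ones
    return env


# A subset of the flags from the README's CM run command.
flags = [
    "--env.CM_MLPERF_LOADGEN_SCENARIO=Offline",
    "--env.CM_MLPERF_LOADGEN_SCENARIOS,=SingleStream,Offline,Server",
    "--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy",
    "--env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results",
    "--env.CM_TMP_PIP_VERSION_STRING=",
    "--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results",
]

env = parse_env_flags(flags)
for name, value in env.items():
    print(f"{name} = {value!r}")
```

Under these assumptions the in-container `/cm-mount/...` path is the value that takes effect for `OUTPUT_BASE_DIR`, which is consistent with the run executing inside Docker (`CM_MLPERF_USE_DOCKER=True`).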
diff --git a/...-diffusion-xl/offline/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config.json b/...-diffusion-xl/offline/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config.json
@@ -0,0 +1,7 @@
+{
+  "starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174",
+  "retraining": "no",
+  "input_data_types": "int32",
+  "weight_data_types": "int8",
+  "weight_transformations": "quantization, affine fusion"
+}
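The JSON above (like the README's "Model Precision: int8") reports a single weight data type, but the `--gpu_engines` argument in the accuracy log below names four TensorRT engines whose file names carry different precisions. The sketch below extracts them, assuming a naming scheme `<model>-<component>-<scenario>-gpu-b<batch>-<precision>.<config>.plan` that is inferred only from these file names, not from NVIDIA's harness code. On that reading only the UNetXL engine is int8, while the two CLIP encoders are fp16 and the VAE is fp32.

```python
"""Hypothetical sketch: read per-component precision from engine file names."""

import re
from typing import Dict

# Engine paths as passed via --gpu_engines in the accuracy log.
GPU_ENGINES = (
    "./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,"
    "./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,"
    "./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,"
    "./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan"
)

# Assumed naming scheme; \w+ stops at '-', so each group captures one field.
ENGINE_RE = re.compile(
    r"stable-diffusion-xl-(?P<component>\w+)-(?P<scenario>\w+)"
    r"-gpu-b(?P<batch>\d+)-(?P<precision>\w+)\."
)


def engine_precisions(engines: str) -> Dict[str, str]:
    """Map each engine's component name to the precision in its file name."""
    result: Dict[str, str] = {}
    for path in engines.split(","):
        basename = path.rsplit("/", 1)[-1]
        match = ENGINE_RE.search(basename)
        if match is None:
            raise ValueError(f"unrecognised engine name: {basename}")
        result[match["component"]] = match["precision"]
    return result


precisions = engine_precisions(GPU_ENGINES)
print(precisions)
# Components whose precision matches "weight_data_types": "int8" above.
print([name for name, prec in precisions.items() if prec == "int8"])
```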
diff --git a/...nal-gpu-tensorrt-vdefault-default_config/stable-diffusion-xl/offline/accuracy_console.out b/...nal-gpu-tensorrt-vdefault-default_config/stable-diffusion-xl/offline/accuracy_console.out
@@ -0,0 +1,79 @@
+[2024-12-23 07:19:35,172 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-12-23 07:19:36,490 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/stable-diffusion-xl/Offline
+[2024-12-23 07:19:36,490 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=2 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/e26690e93e5f474d826157ebba2a2a17.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
+[2024-12-23 07:19:36,490 __init__.py:53 INFO] Overriding Environment
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
+  warnings.warn(_BETA_TRANSFORMS_WARNING)
+[2024-12-23 07:19:38,323 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:38,461 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:39,150 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:40,549 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:41,787 backend.py:96 INFO] Enabling cuda graphs for unet
+[2024-12-23 07:19:41,997 backend.py:154 INFO] captured graph for BS=1
+[2024-12-23 07:19:42,250 backend.py:154 INFO] captured graph for BS=2
+[2024-12-23 07:19:42,393 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:42,526 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:43,216 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:44,598 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
+[2024-12-23 07:19:45,786 backend.py:96 INFO] Enabling cuda graphs for unet
+[2024-12-23 07:19:45,932 backend.py:154 INFO] captured graph for BS=1
+[2024-12-23 07:19:46,183 backend.py:154 INFO] captured graph for BS=2
+[2024-12-23 07:19:46,184 harness.py:207 INFO] Start Warm Up!
+[2024-12-23 07:19:57,547 harness.py:209 INFO] Warm Up Done!
+[2024-12-23 07:19:57,547 harness.py:211 INFO] Start Test!
+[2024-12-23 08:19:38,444 backend.py:801 INFO] [Server] Received 5000 total samples
+[2024-12-23 08:19:38,445 backend.py:809 INFO] [Device 0] Reported 2494 samples
+[2024-12-23 08:19:38,445 backend.py:809 INFO] [Device 1] Reported 2506 samples
+[2024-12-23 08:19:38,445 harness.py:214 INFO] Test Done!
+[2024-12-23 08:19:38,445 harness.py:216 INFO] Destroying SUT...
+[2024-12-23 08:19:38,445 harness.py:219 INFO] Destroying QSL...
+benchmark : Benchmark.SDXL
+buffer_manager_thread_count : 0
+data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
+gpu_batch_size : 2
+gpu_copy_streams : 1
+gpu_inference_streams : 1
+input_dtype : int32
+input_format : linear
+log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.23-07.19.33
+mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
+model_path : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/models/SDXL/
+offline_expected_qps : 0.0
+precision : int8
+preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data
+scenario : Scenario.Offline
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.334532, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334532000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='RTX4090x2')
+tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
+test_mode : AccuracyOnly
+use_graphs : True
+user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/e26690e93e5f474d826157ebba2a2a17.conf
+system_id : RTX4090x2
+config_name : RTX4090x2_stable-diffusion-xl_Offline
+workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
+optimization_level : plugin-enabled
+num_profiles : 1
+config_ver : custom_k_99_MaxP
+accuracy_level : 99%
+inference_server : custom
+skip_file_checks : False
+power_limit : None
+cpu_freq : None
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
+[I] Loading bytes from ./build/engines/RTX4090x2/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
+[2024-12-23 08:19:38,939 run_harness.py:166 INFO] Result: Accuracy run detected.
+
+======================== Result summaries: ========================
+
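The accuracy log gives enough timestamps for a rough throughput cross-check: the test starts at 07:19:57.547 and the harness reports all 5000 samples at 08:19:38.444, split 2494/2506 across the two GPUs. The Python sketch below parses those lines (copied from the log above) and computes the wall-clock rate, about 1.396 samples/s over roughly 3581 s. This is only an approximation close to the README's 1.39663 samples/s: the official figure comes from the separate performance run, whose LoadGen summary is not part of this excerpt. The regexes assume the `[YYYY-MM-DD HH:MM:SS,mmm ...]` prefix seen here.

```python
"""Sketch: estimate wall-clock throughput of the accuracy run from its log."""

import re
from datetime import datetime

# Relevant lines copied verbatim from accuracy_console.out.
LOG = """\
[2024-12-23 07:19:57,547 harness.py:211 INFO] Start Test!
[2024-12-23 08:19:38,444 backend.py:801 INFO] [Server] Received 5000 total samples
[2024-12-23 08:19:38,445 backend.py:809 INFO] [Device 0] Reported 2494 samples
[2024-12-23 08:19:38,445 backend.py:809 INFO] [Device 1] Reported 2506 samples
"""

STAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def stamp(line: str) -> datetime:
    """Parse the leading timestamp; %f right-pads '547' to 547000 us."""
    match = STAMP_RE.match(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


lines = LOG.splitlines()
start = next(stamp(line) for line in lines if line.endswith("Start Test!"))
end_line = next(line for line in lines if "total samples" in line)
end = stamp(end_line)

total = int(re.search(r"Received (\d+) total samples", end_line).group(1))
per_device = [int(n) for n in re.findall(r"\[Device \d+\] Reported (\d+) samples", LOG)]

elapsed = (end - start).total_seconds()
throughput = total / elapsed

print(f"elapsed: {elapsed:.3f} s")
print(f"throughput: {throughput:.4f} samples/s")
print(f"per-device split: {per_device} (sum {sum(per_device)})")
```

Checking that the per-device counts sum to the total is a cheap way to confirm that no samples were dropped between the two GPUs.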