generated from mlcommons/mlperf_inference_submissions
Commit
Results from GH action on NVIDIA_RTX4090x2
1 parent 63f3664, commit 89807da
Showing 81 changed files with 24,520 additions and 0 deletions.
133 changes: 133 additions & 0 deletions
...ia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops). | ||
|
||
*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.* | ||
|
||
## Host platform | ||
|
||
* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29 | ||
* CPU version: x86_64 | ||
* Python version: 3.8.10 (default, Nov 7 2024, 13:10:47) | ||
[GCC 9.4.0] | ||
* MLCommons CM version: 3.5.2 | ||
|
||
## CM Run Command | ||
|
||
See [CM installation guide](https://docs.mlcommons.org/inference/install/). | ||
|
||
```bash | ||
pip install -U cmind | ||
|
||
cm rm cache -f | ||
|
||
cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094 | ||
|
||
cm run script \ | ||
--tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_valid,_r4.1-dev_default,_multistream \ | ||
--quiet=true \ | ||
--env.CM_QUIET=yes \ | ||
--env.CM_MLPERF_IMPLEMENTATION=nvidia \ | ||
--env.CM_MLPERF_MODEL=resnet50 \ | ||
--env.CM_MLPERF_RUN_STYLE=valid \ | ||
--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \ | ||
--env.CM_DOCKER_PRIVILEGED_MODE=True \ | ||
--env.CM_MLPERF_BACKEND=tensorrt \ | ||
--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter,edge \ | ||
--env.CM_MLPERF_CLEAN_ALL=True \ | ||
--env.CM_MLPERF_DEVICE=cuda \ | ||
--env.CM_MLPERF_SUBMISSION_DIVISION=closed \ | ||
--env.CM_MLPERF_USE_DOCKER=True \ | ||
--env.CM_NVIDIA_GPU_NAME=rtx_4090 \ | ||
--env.CM_HW_NAME=RTX4090x2 \ | ||
--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_CODE_CHANGES=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_SRC_CHANGES=yes \ | ||
--env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/gh_action_submissions \ | ||
--env.CM_MLPERF_SUBMITTER=MLCommons \ | ||
--env.CM_USE_DATASET_FROM_HOST=yes \ | ||
--env.CM_USE_MODEL_FROM_HOST=yes \ | ||
--env.CM_MLPERF_LOADGEN_ALL_SCENARIOS=yes \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE=yes \ | ||
--env.CM_MLPERF_SUBMISSION_RUN=yes \ | ||
--env.CM_RUN_MLPERF_ACCURACY=on \ | ||
--env.CM_RUN_SUBMISSION_CHECKER=yes \ | ||
--env.CM_TAR_SUBMISSION_DIR=yes \ | ||
--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full \ | ||
--env.CM_MLPERF_INFERENCE_VERSION=5.0-dev \ | ||
--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \ | ||
--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \ | ||
--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=5.0.4 \ | ||
--env.CM_MLPERF_LAST_RELEASE=v5.0 \ | ||
--env.CM_TMP_PIP_VERSION_STRING= \ | ||
--env.CM_MODEL=resnet50 \ | ||
--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \ | ||
--env.CM_RERUN=yes \ | ||
--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \ | ||
--env.CM_MLPERF_LOADGEN_MODE=performance \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIO=MultiStream \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIOS,=SingleStream,Offline,MultiStream,Server \ | ||
--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \ | ||
--env.CM_OUTPUT_FOLDER_NAME=valid_results \ | ||
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \ | ||
--env.CM_DOCKER_DETACHED_MODE=yes \ | ||
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \ | ||
--env.CM_DOCKER_CONTAINER_ID=242af263479b \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST04 \ | ||
--add_deps_recursive.compiler.tags=gcc \ | ||
--add_deps_recursive.coco2014-original.tags=_full \ | ||
--add_deps_recursive.coco2014-preprocessed.tags=_full \ | ||
--add_deps_recursive.imagenet-original.tags=_full \ | ||
--add_deps_recursive.imagenet-preprocessed.tags=_full \ | ||
--add_deps_recursive.openimages-original.tags=_full \ | ||
--add_deps_recursive.openimages-preprocessed.tags=_full \ | ||
--add_deps_recursive.openorca-original.tags=_full \ | ||
--add_deps_recursive.openorca-preprocessed.tags=_full \ | ||
--add_deps_recursive.coco2014-dataset.tags=_full \ | ||
--add_deps_recursive.igbh-dataset.tags=_full \ | ||
--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--adr.compiler.tags=gcc \ | ||
--adr.coco2014-original.tags=_full \ | ||
--adr.coco2014-preprocessed.tags=_full \ | ||
--adr.imagenet-original.tags=_full \ | ||
--adr.imagenet-preprocessed.tags=_full \ | ||
--adr.openimages-original.tags=_full \ | ||
--adr.openimages-preprocessed.tags=_full \ | ||
--adr.openorca-original.tags=_full \ | ||
--adr.openorca-preprocessed.tags=_full \ | ||
--adr.coco2014-dataset.tags=_full \ | ||
--adr.igbh-dataset.tags=_full \ | ||
--adr.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--adr.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--adr.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--v=False \ | ||
--print_env=False \ | ||
--print_deps=False \ | ||
--dump_version_info=True \ | ||
--env.CM_DATASET_IMAGENET_PATH=/home/cmuser/CM/repos/local/cache/6920a25715cb4646/imagenet-2012-val \ | ||
--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/gh_action_submissions \ | ||
--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/4db00c74da1e44c8 | ||
``` | ||
*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), | ||
you should simply reload mlcommons@mlperf-automations without checkout and clean CM cache as follows:* | ||
|
||
```bash | ||
cm rm repo mlcommons@mlperf-automations | ||
cm pull repo mlcommons@mlperf-automations | ||
cm rm cache -f | ||
|
||
``` | ||
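The `cm run script` command above passes some `--env.` keys more than once, for example `OUTPUT_BASE_DIR` and `CM_MLPERF_INFERENCE_SUBMISSION_DIR`, first with host paths and later with `/cm-mount` paths. The sketch below is one way to list such repeats before rerunning a copied command. The helper name `repeated_env_keys` is hypothetical and not part of CM, and the sketch makes no claim about which value CM ends up using.

```python
import shlex
from collections import defaultdict
from typing import Dict, List


def repeated_env_keys(command: str) -> Dict[str, List[str]]:
    """Return every --env.KEY that appears more than once, with its values in order."""
    # Join shell line continuations so shlex sees one logical command line.
    flattened = command.replace("\\\n", " ")
    values = defaultdict(list)
    for token in shlex.split(flattened):
        if token.startswith("--env.") and "=" in token:
            key, _, value = token[len("--env."):].partition("=")
            values[key].append(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Shortened excerpt of the command above (three of its flags only).
excerpt = (
    "cm run script \\\n"
    " --env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \\\n"
    " --env.CM_MLPERF_SUBMITTER=MLCommons \\\n"
    " --env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results"
)

duplicates = repeated_env_keys(excerpt)
for key, vals in duplicates.items():
    print(key, "->", vals)
```

Running the helper on the full command text should also report `CM_MLPERF_INFERENCE_SUBMISSION_DIR`, since it appears twice as well.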
|
||
## Results | ||
|
||
Platform: RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config | ||
|
||
Model Precision: int8 | ||
|
||
### Accuracy Results | ||
`acc`: `76.064`, Required accuracy for closed division `>= 75.6954` | ||
|
||
### Performance Results | ||
`Samples per query`: `502795.0` |
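The closed-division threshold quoted under Accuracy Results is consistent with 99% of a 76.46% reference top-1 accuracy for ResNet50. The README does not state the reference value, so 76.46 is an assumption here; the 99% target matches the `accuracy_level : 99%` entry in the accuracy log. A minimal sketch of the check, with hypothetical function names:

```python
# Assumption: 76.46 is the reference top-1 accuracy (%) for ResNet50.
# The README gives only the resulting threshold (75.6954).
REFERENCE_TOP1 = 76.46
ACCURACY_TARGET = 0.99  # 99% accuracy level


def closed_division_threshold(reference: float, target: float) -> float:
    """Minimum accuracy (%) required, rounded to the README's four decimals."""
    return round(reference * target, 4)


def meets_threshold(measured: float,
                    reference: float = REFERENCE_TOP1,
                    target: float = ACCURACY_TARGET) -> bool:
    """True if the measured accuracy (%) reaches the required threshold."""
    return measured >= closed_division_threshold(reference, target)


threshold = closed_division_threshold(REFERENCE_TOP1, ACCURACY_TARGET)
print(f"threshold={threshold}, passes={meets_threshold(76.064)}")
```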
7 changes: 7 additions & 0 deletions
.../resnet50/multistream/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"starting_weights_filename": "https://zenodo.org/record/2592612/files/resnet50_v1.onnx", | ||
"retraining": "no", | ||
"input_data_types": "int8", | ||
"weight_data_types": "int8", | ||
"weight_transformations": "no" | ||
} |
94 changes: 94 additions & 0 deletions
...a_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy_console.out
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
[2024-12-23 00:11:50,691 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2 | ||
[2024-12-23 00:11:50,868 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream | ||
[2024-12-23 00:11:50,868 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50 | ||
[2024-12-23 00:11:50,868 __init__.py:53 INFO] Overriding Environment | ||
benchmark : Benchmark.ResNet50 | ||
buffer_manager_thread_count : 0 | ||
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data | ||
disable_beta1_smallk : True | ||
gpu_batch_size : 8 | ||
gpu_copy_streams : 1 | ||
gpu_inference_streams : 1 | ||
input_dtype : int8 | ||
input_format : linear | ||
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.23-00.11.49 | ||
map_path : data_maps/imagenet/val_map.txt | ||
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf | ||
multi_stream_expected_latency_ns : 0 | ||
multi_stream_samples_per_query : 8 | ||
multi_stream_target_latency_percentile : 99 | ||
precision : int8 | ||
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data | ||
scenario : Scenario.MultiStream | ||
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.334532, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334532000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='RTX4090x2') | ||
tensor_path : build/preprocessed_data/imagenet/ResNet50/int8_linear | ||
test_mode : AccuracyOnly | ||
use_deque_limit : True | ||
use_graphs : True | ||
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf | ||
system_id : RTX4090x2 | ||
config_name : RTX4090x2_resnet50_MultiStream | ||
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP) | ||
optimization_level : plugin-enabled | ||
num_profiles : 1 | ||
config_ver : lwis_k_99_MaxP | ||
accuracy_level : 99% | ||
inference_server : lwis | ||
skip_file_checks : False | ||
power_limit : None | ||
cpu_freq : None | ||
&&&& RUNNING Default_Harness # ./build/bin/harness_default | ||
[I] mlperf.conf path: /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf | ||
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf | ||
Creating QSL. | ||
Finished Creating QSL. | ||
Setting up SUT. | ||
[I] [TRT] Loaded engine size: 26 MiB | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 78, GPU 837 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 79, GPU 847 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB) | ||
[I] Device:0.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded. | ||
[I] [TRT] Loaded engine size: 26 MiB | ||
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors. | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 581 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 110, GPU 591 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +25, now: CPU 0, GPU 49 (MiB) | ||
[I] Device:1.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded. | ||
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.) | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 839 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 83, GPU 847 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 66 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 84, GPU 583 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 84, GPU 591 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 83 (MiB) | ||
[I] Start creating CUDA graphs | ||
[I] Capture 8 CUDA graphs | ||
[I] Capture 8 CUDA graphs | ||
[I] Finish creating CUDA graphs | ||
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false | ||
Finished setting up SUT. | ||
Starting warmup. Running for a minimum of 5 seconds. | ||
Finished warmup. Ran for 5.02405s. | ||
Starting running actual test. | ||
|
||
No warnings encountered during test. | ||
|
||
No errors encountered during test. | ||
Finished running actual test. | ||
Device Device:0.GPU processed: | ||
3125 batches of size 8 | ||
Memcpy Calls: 0 | ||
PerSampleCudaMemcpy Calls: 0 | ||
BatchedCudaMemcpy Calls: 3125 | ||
Device Device:1.GPU processed: | ||
3125 batches of size 8 | ||
Memcpy Calls: 0 | ||
PerSampleCudaMemcpy Calls: 0 | ||
BatchedCudaMemcpy Calls: 3125 | ||
&&&& PASSED Default_Harness # ./build/bin/harness_default | ||
[2024-12-23 00:12:06,094 run_harness.py:166 INFO] Result: Accuracy run detected. | ||
[2024-12-23 00:12:06,094 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-imagenet.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy/mlperf_log_accuracy.json --imagenet-val-file data_maps/imagenet/val_map.txt --dtype int32 | ||
accuracy=76.064%, good=38032, total=50000 | ||
|
||
======================== Result summaries: ======================== | ||
|
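The figures in this log can be cross-checked against each other: the reported accuracy should equal `good / total`, and the per-device batch counts should add up to the full validation set. The sketch below hard-codes the values printed above (two devices, 3125 batches each, batch size 8); the function name `parse_accuracy_line` is hypothetical.

```python
import re

# Values copied from the log above.
ACCURACY_LINE = "accuracy=76.064%, good=38032, total=50000"
DEVICE_BATCHES = {"Device:0.GPU": 3125, "Device:1.GPU": 3125}
BATCH_SIZE = 8


def parse_accuracy_line(line: str):
    """Extract (accuracy_percent, good, total) from the accuracy summary line."""
    match = re.fullmatch(
        r"accuracy=([\d.]+)%, good=(\d+), total=(\d+)", line.strip()
    )
    if match is None:
        raise ValueError(f"unrecognized accuracy line: {line!r}")
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


reported, good, total = parse_accuracy_line(ACCURACY_LINE)
recomputed = 100.0 * good / total
samples_processed = sum(DEVICE_BATCHES.values()) * BATCH_SIZE

print(f"reported={reported}%, recomputed={recomputed:.3f}%")
print(f"samples processed={samples_processed}, total={total}")
```

Both checks agree for this run: the two GPUs together processed every sample counted in `total`, and the recomputed percentage matches the reported one.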