Results from GH action on NVIDIA_RTX4090x2
arjunsuresh committed Dec 29, 2024
1 parent f8d3d66 commit 9c3a45a
Showing 26 changed files with 799 additions and 797 deletions.
@@ -19,7 +19,7 @@ pip install -U cmind

cm rm cache -f

-cm pull repo mlcommons@mlperf-automations --checkout=467517e4a572872046058e394a0d83512cfff38b
+cm pull repo mlcommons@mlperf-automations --checkout=c52956b27fa8d06ec8db53f885e1f05021e379e9

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \
@@ -71,7 +71,7 @@ cm run script \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
--env.CM_DOCKER_DETACHED_MODE=yes \
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
-    --env.CM_DOCKER_CONTAINER_ID=0ea02743d854 \
+    --env.CM_DOCKER_CONTAINER_ID=6733602d12a8 \
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \
--add_deps_recursive.compiler.tags=gcc \
--add_deps_recursive.coco2014-original.tags=_full \
@@ -129,4 +129,4 @@ Model Precision: int8
`F1`: `90.15674`, Required accuracy for closed division `>= 89.96526`

### Performance Results
-`Samples per second`: `8277.86`
+`Samples per second`: `8237.36`
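The pass/fail logic behind the numbers above can be sketched in a few lines (hedged: the figures are copied from this commit's results, and the helper function is purely illustrative, not part of the MLPerf harness):

```python
# Sketch: sanity-check the figures reported in this commit.
# All numbers below are copied from the results above.

def meets_closed_division(f1: float, required_f1: float) -> bool:
    """True when the accuracy run clears the closed-division bar."""
    return f1 >= required_f1

f1 = 90.15674           # reported F1 for bert-99
required_f1 = 89.96526  # required accuracy for closed division
old_qps = 8277.86       # samples/second before this commit
new_qps = 8237.36       # samples/second in this run

print(meets_closed_division(f1, required_f1))         # True
print(f"throughput delta: {new_qps - old_qps:+.2f}")  # throughput delta: -40.50
```

So the run still clears the accuracy bar; the throughput moved by roughly half a percent, within normal run-to-run variation.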
@@ -1,7 +1,7 @@
-[2024-12-27 20:25:43,536 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
-[2024-12-27 20:25:44,069 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/bert-99/Offline
-[2024-12-27 20:25:44,069 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/6c5d0d8c0f4f47c1/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3ef477688d004a39a48da5ba31ae9c98.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
-[2024-12-27 20:25:44,069 __init__.py:53 INFO] Overriding Environment
+[2024-12-28 20:41:21,631 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
+[2024-12-28 20:41:22,190 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/bert-99/Offline
+[2024-12-28 20:41:22,190 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/10b872089277481d/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/733a046654da43acb8b34def0e921432.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
+[2024-12-28 20:41:22,190 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.BERT
buffer_manager_thread_count : 0
coalesced_tensor : True
@@ -11,8 +11,8 @@ gpu_copy_streams : 2
gpu_inference_streams : 2
input_dtype : int32
input_format : linear
-log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.27-20.25.42
-mlperf_conf_path : /home/cmuser/CM/repos/local/cache/6c5d0d8c0f4f47c1/inference/mlperf.conf
+log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.28-20.41.20
+mlperf_conf_path : /home/cmuser/CM/repos/local/cache/10b872089277481d/inference/mlperf.conf
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data
@@ -21,7 +21,7 @@ system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='In
tensor_path : build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy
test_mode : AccuracyOnly
use_graphs : False
-user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3ef477688d004a39a48da5ba31ae9c98.conf
+user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/733a046654da43acb8b34def0e921432.conf
system_id : RTX4090x2
config_name : RTX4090x2_bert_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
@@ -34,8 +34,8 @@ skip_file_checks : True
power_limit : None
cpu_freq : None
&&&& RUNNING BERT_HARNESS # ./build/bin/harness_bert
-I1227 20:25:44.119817 20262 main_bert.cc:163] Found 2 GPUs
-I1227 20:25:44.249424 20262 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
+I1228 20:41:22.237586 20263 main_bert.cc:163] Found 2 GPUs
+I1228 20:41:22.367789 20263 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 414 MiB
[I] [TRT] Loaded engine size: 414 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
@@ -45,53 +45,53 @@ I1227 20:25:44.249424 20262 bert_server.cc:147] Engine Path: ./build/engines/RTX
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 737, GPU 969 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 738, GPU 979 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +291, now: CPU 1, GPU 581 (MiB)
-I1227 20:25:44.739281 20262 bert_server.cc:208] Engines Creation Completed
-I1227 20:25:44.759653 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1227 20:25:44.759660 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1227 20:25:44.759663 20262 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
+I1228 20:41:22.845306 20263 bert_server.cc:208] Engines Creation Completed
+I1228 20:41:22.863188 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1228 20:41:22.863198 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1228 20:41:22.863202 20263 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 324, GPU 1901 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 324, GPU 1909 (MiB)
-I1227 20:25:44.826346 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 0
-I1227 20:25:44.826371 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1228 20:41:22.928879 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 0
+I1228 20:41:22.928902 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 1, GPU 581 (MiB)
-I1227 20:25:44.827178 20262 bert_core_vs.cc:476] Setup complete
-I1227 20:25:44.827324 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1227 20:25:44.827328 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1227 20:25:44.827332 20262 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
+I1228 20:41:22.929729 20263 bert_core_vs.cc:476] Setup complete
+I1228 20:41:22.929889 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1228 20:41:22.929893 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1228 20:41:22.929896 20263 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 447, GPU 1645 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 447, GPU 1653 (MiB)
-I1227 20:25:44.893194 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 0
-I1227 20:25:44.893208 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1228 20:41:22.995138 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 0
+I1228 20:41:22.995153 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 1, GPU 581 (MiB)
-I1227 20:25:44.894070 20262 bert_core_vs.cc:476] Setup complete
-I1227 20:25:44.894234 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1227 20:25:44.894239 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1227 20:25:44.894243 20262 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
+I1228 20:41:22.995955 20263 bert_core_vs.cc:476] Setup complete
+I1228 20:41:22.996120 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1228 20:41:22.996124 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1228 20:41:22.996127 20263 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 570, GPU 2715 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 570, GPU 2725 (MiB)
-I1227 20:25:44.957968 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 1
+I1228 20:41:23.060681 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 2, GPU 581 (MiB)
-I1227 20:25:44.958278 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
-I1227 20:25:44.959084 20262 bert_core_vs.cc:476] Setup complete
-I1227 20:25:44.959231 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1227 20:25:44.959236 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1227 20:25:44.959239 20262 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
+I1228 20:41:23.061033 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1228 20:41:23.061902 20263 bert_core_vs.cc:476] Setup complete
+I1228 20:41:23.062060 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1228 20:41:23.062063 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1228 20:41:23.062067 20263 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 693, GPU 2459 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 693, GPU 2469 (MiB)
-I1227 20:25:45.022871 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 1
+I1228 20:41:23.127297 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 2, GPU 581 (MiB)
-I1227 20:25:45.023195 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
-I1227 20:25:45.024029 20262 bert_core_vs.cc:476] Setup complete
-I1227 20:25:45.478945 20262 main_bert.cc:184] Starting running actual test.
-I1227 20:25:46.866901 20262 main_bert.cc:190] Finished running actual test.
+I1228 20:41:23.127616 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1228 20:41:23.128468 20263 bert_core_vs.cc:476] Setup complete
+I1228 20:41:23.583055 20263 main_bert.cc:184] Starting running actual test.
+I1228 20:41:24.961525 20263 main_bert.cc:190] Finished running actual test.

No warnings encountered during test.

No errors encountered during test.
-[2024-12-27 20:25:47,087 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2024-12-27 20:25:47,088 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
+[2024-12-28 20:41:25,182 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2024-12-28 20:41:25,182 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
{"exact_match": 82.81929990539263, "f1": 90.15673510616978}
Reading examples...
Loading cached features from 'eval_features.pickle'...
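The accuracy summary emitted by accuracy-squad.py above is a plain JSON object, so the metrics can be pulled out with a short sketch (the line is copied verbatim from the log; only the standard library is used):

```python
import json

# Parse the accuracy summary line from the harness log above.
line = '{"exact_match": 82.81929990539263, "f1": 90.15673510616978}'
metrics = json.loads(line)

# Round to the precision used in the results section.
print(round(metrics["f1"], 5))  # 90.15674
```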