Results from GH action on NVIDIA_RTX4090x2
arjunsuresh committed Dec 28, 2024
1 parent 6ab5cca commit 6efa216
Showing 29 changed files with 938 additions and 939 deletions.

@@ -19,7 +19,7 @@ pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@mlperf-automations --checkout=a90475d2de72bf0622cebe8d5ca8eb8c9d872fbd
cm pull repo mlcommons@mlperf-automations --checkout=467517e4a572872046058e394a0d83512cfff38b

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \
@@ -71,7 +71,7 @@ cm run script \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
--env.CM_DOCKER_DETACHED_MODE=yes \
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
--env.CM_DOCKER_CONTAINER_ID=c2a65ff58585 \
--env.CM_DOCKER_CONTAINER_ID=0ea02743d854 \
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \
--add_deps_recursive.compiler.tags=gcc \
--add_deps_recursive.coco2014-original.tags=_full \
@@ -129,4 +129,4 @@ Model Precision: int8
`F1`: `90.15674`, Required accuracy for closed division `>= 89.96526`

### Performance Results
`Samples per second`: `8268.76`
`Samples per second`: `8277.86`
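The accuracy gate above follows the MLPerf closed-division rule for bert-99: the measured F1 must reach at least 99% of the FP32 reference score (90.874 for BERT on SQuAD v1.1; 0.99 × 90.874 = 89.96526, matching the required threshold shown). A minimal sketch of the check:

```python
# Closed-division accuracy gate for bert-99: measured F1 must be
# >= 99% of the FP32 reference F1 (90.874 for BERT on SQuAD v1.1).
REFERENCE_F1 = 90.874
THRESHOLD = 0.99 * REFERENCE_F1  # 89.96526, as reported above

def passes_bert99(measured_f1: float) -> bool:
    """Return True if the run meets the closed-division accuracy target."""
    return measured_f1 >= THRESHOLD

# This run's accuracy result:
print(passes_bert99(90.15674))  # True
```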
@@ -1,7 +1,7 @@
[2024-12-24 20:31:04,302 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-24 20:31:04,852 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/bert-99/Offline
[2024-12-24 20:31:04,853 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/af0a19318f524c01/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/be76b677d3a64087a781cb7f69a2f3b4.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
[2024-12-24 20:31:04,853 __init__.py:53 INFO] Overriding Environment
[2024-12-27 20:25:43,536 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-27 20:25:44,069 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/bert-99/Offline
[2024-12-27 20:25:44,069 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/6c5d0d8c0f4f47c1/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3ef477688d004a39a48da5ba31ae9c98.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
[2024-12-27 20:25:44,069 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.BERT
buffer_manager_thread_count : 0
coalesced_tensor : True
@@ -11,8 +11,8 @@ gpu_copy_streams : 2
gpu_inference_streams : 2
input_dtype : int32
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.24-20.31.03
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/af0a19318f524c01/inference/mlperf.conf
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.27-20.25.42
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/6c5d0d8c0f4f47c1/inference/mlperf.conf
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data
@@ -21,7 +21,7 @@ system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='In
tensor_path : build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy
test_mode : AccuracyOnly
use_graphs : False
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/be76b677d3a64087a781cb7f69a2f3b4.conf
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/3ef477688d004a39a48da5ba31ae9c98.conf
system_id : RTX4090x2
config_name : RTX4090x2_bert_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
@@ -34,64 +34,64 @@ skip_file_checks : True
power_limit : None
cpu_freq : None
&&&& RUNNING BERT_HARNESS # ./build/bin/harness_bert
I1224 20:31:04.902976 20263 main_bert.cc:163] Found 2 GPUs
I1224 20:31:05.030550 20263 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
I1227 20:25:44.119817 20262 main_bert.cc:163] Found 2 GPUs
I1227 20:25:44.249424 20262 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 414 MiB
[I] [TRT] Loaded engine size: 414 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +66, GPU +8, now: CPU 787, GPU 1225 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 789, GPU 1235 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 727, GPU 1225 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 729, GPU 1235 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +581, now: CPU 0, GPU 581 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 737, GPU 969 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 738, GPU 979 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +291, now: CPU 1, GPU 581 (MiB)
I1224 20:31:05.517578 20263 bert_server.cc:208] Engines Creation Completed
I1224 20:31:05.530956 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1224 20:31:05.530967 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1224 20:31:05.530972 20263 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
I1227 20:25:44.739281 20262 bert_server.cc:208] Engines Creation Completed
I1227 20:25:44.759653 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1227 20:25:44.759660 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1227 20:25:44.759663 20262 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 324, GPU 1901 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 324, GPU 1909 (MiB)
I1224 20:31:05.595320 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 0
I1227 20:25:44.826346 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 0
I1227 20:25:44.826371 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 1, GPU 581 (MiB)
I1224 20:31:05.595347 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1224 20:31:05.596164 20263 bert_core_vs.cc:476] Setup complete
I1224 20:31:05.596310 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1224 20:31:05.596315 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1224 20:31:05.596319 20263 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
I1227 20:25:44.827178 20262 bert_core_vs.cc:476] Setup complete
I1227 20:25:44.827324 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1227 20:25:44.827328 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1227 20:25:44.827332 20262 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 447, GPU 1645 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 447, GPU 1653 (MiB)
I1224 20:31:05.660576 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 0
I1227 20:25:44.893194 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 0
I1227 20:25:44.893208 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 1, GPU 581 (MiB)
I1224 20:31:05.660593 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1224 20:31:05.661386 20263 bert_core_vs.cc:476] Setup complete
I1224 20:31:05.661542 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1224 20:31:05.661545 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1224 20:31:05.661548 20263 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
I1227 20:25:44.894070 20262 bert_core_vs.cc:476] Setup complete
I1227 20:25:44.894234 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1227 20:25:44.894239 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1227 20:25:44.894243 20262 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 570, GPU 2715 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 571, GPU 2725 (MiB)
I1224 20:31:05.724439 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 570, GPU 2725 (MiB)
I1227 20:25:44.957968 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 2, GPU 581 (MiB)
I1224 20:31:05.724767 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1224 20:31:05.725576 20263 bert_core_vs.cc:476] Setup complete
I1224 20:31:05.725715 20263 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1224 20:31:05.725718 20263 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1224 20:31:05.725721 20263 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 694, GPU 2459 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 694, GPU 2469 (MiB)
I1224 20:31:05.788931 20263 bert_core_vs.cc:426] Setting Opt.Prof. to 1
I1227 20:25:44.958278 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1227 20:25:44.959084 20262 bert_core_vs.cc:476] Setup complete
I1227 20:25:44.959231 20262 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
I1227 20:25:44.959236 20262 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
I1227 20:25:44.959239 20262 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 693, GPU 2459 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 693, GPU 2469 (MiB)
I1227 20:25:45.022871 20262 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 2, GPU 581 (MiB)
I1224 20:31:05.789244 20263 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1224 20:31:05.790045 20263 bert_core_vs.cc:476] Setup complete
I1224 20:31:06.244817 20263 main_bert.cc:184] Starting running actual test.
I1224 20:31:07.623162 20263 main_bert.cc:190] Finished running actual test.
I1227 20:25:45.023195 20262 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
I1227 20:25:45.024029 20262 bert_core_vs.cc:476] Setup complete
I1227 20:25:45.478945 20262 main_bert.cc:184] Starting running actual test.
I1227 20:25:46.866901 20262 main_bert.cc:190] Finished running actual test.

No warnings encountered during test.

No errors encountered during test.
[2024-12-24 20:31:07,851 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-24 20:31:07,851 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
[2024-12-27 20:25:47,087 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-27 20:25:47,088 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
{"exact_match": 82.81929990539263, "f1": 90.15673510616978}
Reading examples...
Loading cached features from 'eval_features.pickle'...
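The `{"exact_match": ..., "f1": ...}` line is the output of `accuracy-squad.py`, which scores predictions with the standard SQuAD v1.1 metrics. A simplified sketch of the token-level F1 it computes per question (the real evaluation script additionally normalizes case, punctuation, and articles, and takes the maximum over multiple reference answers):

```python
# Simplified sketch of SQuAD-style token-level F1 between one prediction
# and one ground-truth answer. The official evaluator also normalizes
# punctuation/articles and maxes over several references.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    # Overlap counts each shared token at most min(pred count, gt count) times.
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "the cat"))  # 0.8
```

The dataset-level `f1` reported above is this score averaged over all questions (scaled to 0–100); `exact_match` is the fraction of predictions that match a reference exactly after normalization.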