Skip to content

Commit

Permalink
Results from GH action on NVIDIA_RTX4090x2
Browse files Browse the repository at this point in the history
  • Loading branch information
arjunsuresh committed Dec 25, 2024
1 parent 9bea238 commit 7b24f2b
Show file tree
Hide file tree
Showing 52 changed files with 4,022 additions and 4,024 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094
cm pull repo mlcommons@mlperf-automations --checkout=a90475d2de72bf0622cebe8d5ca8eb8c9d872fbd

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_resnet50,_tensorrt,_cuda,_valid,_r4.1-dev_default,_multistream \
Expand Down Expand Up @@ -71,7 +71,7 @@ cm run script \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
--env.CM_DOCKER_DETACHED_MODE=yes \
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
--env.CM_DOCKER_CONTAINER_ID=242af263479b \
--env.CM_DOCKER_CONTAINER_ID=e8ed1d33e4c0 \
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST04 \
--add_deps_recursive.compiler.tags=gcc \
--add_deps_recursive.coco2014-original.tags=_full \
Expand Down Expand Up @@ -130,4 +130,4 @@ Model Precision: int8
`acc`: `76.064`, Required accuracy for closed division `>= 75.6954`

### Performance Results
`Samples per query`: `502795.0`
`Samples per query`: `498904.0`
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
[2024-12-23 00:11:50,691 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-23 00:11:50,868 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
[2024-12-23 00:11:50,868 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
[2024-12-23 00:11:50,868 __init__.py:53 INFO] Overriding Environment
[2024-12-24 23:15:46,315 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-24 23:15:46,483 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
[2024-12-24 23:15:46,484 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/ad206b0bdf344f129717e94ac08e54e5.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
[2024-12-24 23:15:46,484 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.ResNet50
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
Expand All @@ -11,7 +11,7 @@ gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int8
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.23-00.11.49
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.24-23.15.45
map_path : data_maps/imagenet/val_map.txt
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
multi_stream_expected_latency_ns : 0
Expand All @@ -25,7 +25,7 @@ tensor_path : build/preprocessed_data/imagenet/ResNet50/int8_linear
test_mode : AccuracyOnly
use_deque_limit : True
use_graphs : True
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/ad206b0bdf344f129717e94ac08e54e5.conf
system_id : RTX4090x2
config_name : RTX4090x2_resnet50_MultiStream
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
Expand All @@ -39,27 +39,27 @@ power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4e6a5741f75b4ffdb16375bfdfcf40d5.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/ad206b0bdf344f129717e94ac08e54e5.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 26 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 78, GPU 837 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 79, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 77, GPU 837 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 79, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
[I] Device:0.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 26 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 581 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 110, GPU 591 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 580 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 109, GPU 590 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +25, now: CPU 0, GPU 49 (MiB)
[I] Device:1.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 839 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 83, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 82, GPU 839 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 82, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 66 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 84, GPU 583 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 84, GPU 591 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 582 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 84, GPU 590 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 83 (MiB)
[I] Start creating CUDA graphs
[I] Capture 8 CUDA graphs
Expand All @@ -68,7 +68,7 @@ Setting up SUT.
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Finished setting up SUT.
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.02405s.
Finished warmup. Ran for 5.02417s.
Starting running actual test.

No warnings encountered during test.
Expand All @@ -86,8 +86,8 @@ Device Device:1.GPU processed:
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 3125
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2024-12-23 00:12:06,094 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-23 00:12:06,094 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-imagenet.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy/mlperf_log_accuracy.json --imagenet-val-file data_maps/imagenet/val_map.txt --dtype int32
[2024-12-24 23:16:01,951 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-24 23:16:01,951 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-imagenet.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/accuracy/mlperf_log_accuracy.json --imagenet-val-file data_maps/imagenet/val_map.txt --dtype int32
accuracy=76.064%, good=38032, total=50000

======================== Result summaries: ========================
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,5 +26,5 @@
],
"CM_HOST_PLATFORM_FLAVOR": "x86_64",
"CM_HOST_PYTHON_BITS": "64",
"CM_HOST_SYSTEM_NAME": "242af263479b"
"CM_HOST_SYSTEM_NAME": "e8ed1d33e4c0"
}
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
[2024-12-23 00:01:23,139 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-23 00:01:23,305 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
[2024-12-23 00:01:23,306 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="PerformanceOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/0d6dab1298e34a73990864b7d89f11a0.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
[2024-12-23 00:01:23,306 __init__.py:53 INFO] Overriding Environment
[2024-12-24 23:05:18,829 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-24 23:05:18,999 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/resnet50/MultiStream
[2024-12-24 23:05:19,000 __init__.py:46 INFO] Running command: ./build/bin/harness_default --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/resnet50/multistream/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=2048 --test_mode="PerformanceOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=8 --map_path="data_maps/imagenet/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/imagenet/ResNet50/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4f537047cf444b288f6102750b4ade03.conf" --gpu_engines="./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model resnet50
[2024-12-24 23:05:19,000 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.ResNet50
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
Expand All @@ -11,7 +11,7 @@ gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int8
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.23-00.01.21
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.24-23.05.17
map_path : data_maps/imagenet/val_map.txt
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
multi_stream_expected_latency_ns : 0
Expand All @@ -25,7 +25,7 @@ tensor_path : build/preprocessed_data/imagenet/ResNet50/int8_linear
test_mode : PerformanceOnly
use_deque_limit : True
use_graphs : True
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/0d6dab1298e34a73990864b7d89f11a0.conf
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4f537047cf444b288f6102750b4ade03.conf
system_id : RTX4090x2
config_name : RTX4090x2_resnet50_MultiStream
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
Expand All @@ -39,27 +39,27 @@ power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/0d6dab1298e34a73990864b7d89f11a0.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/4f537047cf444b288f6102750b4ade03.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 26 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 78, GPU 837 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 79, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 77, GPU 837 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 79, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +24, now: CPU 0, GPU 24 (MiB)
[I] Device:0.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 26 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 580 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 110, GPU 590 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 108, GPU 581 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 109, GPU 591 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +25, now: CPU 0, GPU 49 (MiB)
[I] Device:1.GPU: [0] ./build/engines/RTX4090x2/resnet50/MultiStream/resnet50-MultiStream-gpu-b8-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 839 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 83, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 82, GPU 839 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 82, GPU 847 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 66 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 84, GPU 582 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 84, GPU 590 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 83, GPU 583 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 84, GPU 591 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +17, now: CPU 0, GPU 83 (MiB)
[I] Start creating CUDA graphs
[I] Capture 8 CUDA graphs
Expand All @@ -68,37 +68,37 @@ Setting up SUT.
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Finished setting up SUT.
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.02414s.
Finished warmup. Ran for 5.02411s.
Starting running actual test.
================================================
MLPerf Results Summary
================================================
SUT name : LWIS_Server
Scenario : MultiStream
Mode : PerformanceOnly
99th percentile latency (ns) : 502574
99th percentile latency (ns) : 498723
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes
Early Stopping Result:
* Processed at least 662 queries (1237051).
* Would discard 12112 highest latency queries.
* Early stopping 99th percentile estimate: 502795
* Processed at least 662 queries (1240784).
* Would discard 12149 highest latency queries.
* Early stopping 99th percentile estimate: 498904

================================================
Additional Stats
================================================
Per-query latency:
Min latency (ns) : 442690
Max latency (ns) : 1509293
Mean latency (ns) : 474019
50.00 percentile latency (ns) : 472645
90.00 percentile latency (ns) : 484741
95.00 percentile latency (ns) : 490338
97.00 percentile latency (ns) : 494072
99.00 percentile latency (ns) : 502574
99.90 percentile latency (ns) : 648822
Min latency (ns) : 439652
Max latency (ns) : 899917
Mean latency (ns) : 472372
50.00 percentile latency (ns) : 471384
90.00 percentile latency (ns) : 482670
95.00 percentile latency (ns) : 487276
97.00 percentile latency (ns) : 491079
99.00 percentile latency (ns) : 498723
99.90 percentile latency (ns) : 647573

================================================
Test Parameters Used
Expand Down Expand Up @@ -128,17 +128,17 @@ No warnings encountered during test.
No errors encountered during test.
Finished running actual test.
Device Device:0.GPU processed:
618526 batches of size 8
620392 batches of size 8
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 618526
BatchedCudaMemcpy Calls: 620392
Device Device:1.GPU processed:
618525 batches of size 8
620392 batches of size 8
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 618525
BatchedCudaMemcpy Calls: 620392
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2024-12-23 00:11:32,450 run_harness.py:166 INFO] Result: result_99.00_percentile_per_query_latency_ns: 502574, Result is VALID
[2024-12-24 23:15:28,338 run_harness.py:166 INFO] Result: result_99.00_percentile_per_query_latency_ns: 498723, Result is VALID

======================== Result summaries: ========================

Loading

0 comments on commit 7b24f2b

Please sign in to comment.