hewumars
8/6/2019 - 10:47 AM

[20190806] Atlas300 performance experiments

Resolution | Type | NPU | Processes (threads) | Batch | AI-CPU usage (%) | ctrl-CPU usage (%) | NPU usage (%) | NPU memory (M) | Bandwidth (%) | Time (ms) | Images processed per day | CPU usage (%) | Memory usage (M)
2MP JPEG11(8)1630-4538-50506840-6050.613026000
JPEG11(7)165945.1722000
JPEG11(6)1625-3530-90505625-6039.86
JPEG11(5)1632-3625-35505630-5029.56
JPEG11(3)1615-2012-15365320-4018.38
H2641(1)41.75 (without encoding: 40.93 / after removing AiToMat: 37.50)
H26411(3)1615-403-4020-36565-5447.47150-250
H26411(6)1673.64
H26411(8)1615-3089-9548-507650-60147.22 (without encoding: 135.21 / after removing AiToMat: 139.59) 170-2502500

Original preprocessing

  1. PreProcess_Avg: Avg. of 3116 loops: 0.650000 ms
  2. PreProcess_Normal_: Avg. of 4357 loops: 0.185000 ms
  3. PreProcess_PlateRot_: Avg. of 79 loops: 2.083000 ms
  4. PreProcess_Plate1v1_: Avg. of 157 loops: 0.581000 ms
  5. PreProcess_Plate1v2_: Avg. of 1081 loops: 0.865000 ms

DVPP preprocessing

  1. PreProcess_Avg: Avg. of 3116 loops: 0.481000 ms
  2. PreProcess_Normal_: Avg. of 4357 loops: 0.179000 ms
  3. PreProcess_PlateRot_: Avg. of 79 loops: 2.120000 ms
  4. PreProcess_Plate1v1_1v2_: Avg. of 1238 loops: 0.427000 ms
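
The DVPP run reports the two plate stages as a single merged stage, so a direct comparison needs a loop-weighted average of the two original stages. The sketch below assumes that `PreProcess_Plate1v1_1v2_` covers the same work as `PreProcess_Plate1v1_` plus `PreProcess_Plate1v2_` (the loop counts add up: 157 + 1081 = 1238); the notes do not state this explicitly. The helper name `weighted_average` is only illustrative.

```python
# Loop counts and average times (ms) copied from the two lists above.
original_plate_stages = {
    "PreProcess_Plate1v1_": (157, 0.581),
    "PreProcess_Plate1v2_": (1081, 0.865),
}
# Merged stage from the DVPP run (assumed to cover both stages above).
dvpp_plate_stage = (1238, 0.427)


def weighted_average(stages):
    """Average time per loop, weighting each stage by its loop count."""
    total_loops = sum(loops for loops, _ in stages.values())
    total_time = sum(loops * avg_ms for loops, avg_ms in stages.values())
    return total_time / total_loops


baseline_ms = weighted_average(original_plate_stages)
speedup = baseline_ms / dvpp_plate_stage[1]

print(f"original plate stages (weighted): {baseline_ms:.3f} ms")
print(f"DVPP merged plate stage:          {dvpp_plate_stage[1]:.3f} ms")
print(f"speedup:                          {speedup:.2f}x")
```

Under that assumption the plate path is where DVPP helps most; `PreProcess_Normal_` and `PreProcess_PlateRot_` are essentially unchanged between the two runs.
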
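| Resolution | Type | NPU | Processes (threads) | Batch | ARM usage (%) | NPU usage (%) | NPU memory (M) | Time (ms) | Images processed per day | CPU usage (%) | Memory usage (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2MP | H264 | 1 | 1(6) | 4 | | | | 46.9279 | 11046733 | | |
| | | | 1(5) | 8 | | | | 35.53565 | 12156806 | | |
| | | | 1(4) | 16 | | | | 22.9951 | 15029288 | | |
| | JPEG | | 1(8) | 16 | | | | 125 | 5529600 | | |
| | H264 | | 1(8) | 16 | | | | 175 | 3949714 | | |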
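20191226

| Resolution | Type | NPU | Processes (threads) | Notes | Time (ms) | Images processed per day |
|---|---|---|---|---|---|---|
| 2MP | JPEG | 4 | 1(9) | Checkpoint images with few targets run 50% faster than video with many targets; 1.93 targets; 8.342 ms; there seems to be a speedup after changing the host to allocate buf with DMalloc; single core | 9.162 | 37721972 |
| 5MP | | | 1(7) | 6.44 targets | 17.235 | 20051842 |
| 7MP | | | 1(5) | 7.43 targets | 20.672 | 16718322 |
| 2MP | H264 | 4 | 1(9) | Including result return, encoding, tracking, etc. | 22.086 | 15648197 |
| | | | 1(9) | Without result return, encoding and tracking; 20% faster than the row above | 17.918 | 19287894 |
| | | | 1(9) | Few targets; 35% faster than video with many targets | 14.25 | 24251220 |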
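| Resolution / note | Type | NPU | Processes (threads) | Batch | ARM usage (%) | NPU usage (%) | NPU memory (M) | Algorithm time (ms) | Decode time (ms) | Images processed per day | CPU usage (%) | Memory usage (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2MP, 1-3 vehicles | ascend310 | | 1 (host:18, device:46) | 4 | 22 | 75 (+-20) | 1656 | 20.388*2 | 3.649 | 4237787/2 | 250 | 1585 |
| | | | 2 | | 22*2 | 80 | 1761*2 | 23.026*2 | 4.384 | 7504560/2 | 225*2 | 1602+1341 |
| Crashed after a little over 1000 images | | | 3 | | 22*3 | 115 | 1545*3 | 29.831*2 | 6.336 | 8688947/2 | 220*3 | 1484*3 |
| The statistics above are wrong: all times must be multiplied by 2 | | | | | | | | | | | | |
| 20190816: in main, sendData changed to loop once every 200 images | | | 1 | 4 | 35-45 | 130 | 1000 (for reference; memory keeps rising) | 23.736 | 3.649 | 3640040 | 60 | 517 |
| | | | 2 | 4 | 33*2 | | 1000*2 | 35.256 | 5.512 | 4901293 | | 522 |
| | | | 3 | 4 | 28*3 | | 1000 | 52.755 | 6.411 | 4913278 | | |
| 20190817: send 4 images, return 4 results | | | 8 | | (5-10)*8 | 40-120 | 246*8 | 135.867 | 3.6 | 5087328 | (5-10)*8 | 513*8 |
| 20190820: decodeBuf not returned | | | 8 | | | | | 103.608 | | 6671299 | | |
| 20190823: 8k memory pool | | | 5 | 4 | (5-10)*8 | | 639 | 64.446 | | 6703286 | | |
| | | | 1 | 4 | | | | 27.972 | | 3088803 | | |
| | | | 8 | 4 | | | 246 | 93.261 | | 7411458 | | |
| | | | 1 | 8 | | | | 26.217 | | 3295571 | | |
| 4k memory pool, memory management | | | 7 | 8 | | | 433 | 78.433 | | 7711040 | | 531 |
| | fp16 | | 5 | 16 | | | | 44.115 | | 9792657 | | 519 |
| | int8 | | 6 | 16 | | | 574 | 43.941 | | 11797571 | | 519 |
| | int8 | | 6 | 16 | | | 559.8 | 42.667 | | 12149816 | | 527 |
| 5MP | | | | | | | | | | | | |
| 7MP, 5 vehicles | int8 | | 6 | 16 | | | 569 | 47.719 | | 10863617 | | 541 |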
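
The images-per-day figures in the table above appear to follow from the measured time and the number of parallel processes: milliseconds per day, times processes, divided by the time in ms. The notes never state this formula, so the sketch below treats it as an assumption and only checks it against three rows copied from the table; `images_per_day` and `workers` are illustrative names.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def images_per_day(time_ms, workers=1):
    """Assumed relation: each worker finishes one image every time_ms."""
    return MS_PER_DAY * workers / time_ms


# (algorithm time in ms, processes, images per day as recorded in the notes)
samples = [
    (27.972, 1, 3088803),
    (103.608, 8, 6671299),
    (78.433, 7, 7711040),
]

for time_ms, workers, recorded in samples:
    estimate = images_per_day(time_ms, workers)
    print(f"{time_ms:>8} ms x {workers} -> {estimate:,.0f} (recorded {recorded:,})")
```

Rows marked `*2` and `/2` predate the correction about doubling the times, so they would need the doubled time before this relation applies.
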
| Module (single process) | Time, batch=4 | Time, Batch=8 | Time, Batch=16 |
|---|---|---|---|
| DvppJpegDecode | 2.878*4 | 2.873*8 | 2.859 |
| ObjectDetectStage1_v3_Input | 0.028 | 0.036 | 0.036 |
| ObjectDetectStage1_v3_PreProcess | 6.018 | 11.945 | 11.985 |
| ObjectDetectStage1_v3_Predict | 24.847 | 48.648 | 97.002 |
| ObjectDetectStage1_v3_GetLayer | 0.743 | 0.77 | 0.771 |
| ObjectDetectStage1_v3_PostProcess | 0.664*4 | 0.661*8 | 0.661 |
| ObjectDetectStage1_v3_Output | 0.011*4 | 0.011*8 | 0.01 |
| VehicleDetectStage2_Input | 0.028 | 0.034 | 0.034 |
| VehicleDetectStage2_Index | 0.005 | 0.007 | 0.009 |
| VehicleDetectStage2_PreProcess | 0.842 | 1.563 | 2.643 |
| VehicleDetectStage2_Predict | 4.516 | 7.334 | 13.754 |
| VehicleDetectStage2_GetLayer | 0.206 | 0.201 | 0.188 |
| VehicleDetectStage2_PostProcess | 0.172*4 | 0.163*8 | 0.159 |
| VehicleDetectStage2_Output | 0.003*4 | 0.003*8 | 0.021 |
| VehicleDetectStage2_Segment_Index | 0.005 | 0.009 | 0.01 |
| VehicleDetectStage2_Segment_PreProcess | 0.354 | 0.578 | 0.544 |
| VehicleDetectStage2_Segment_Predict | 0.784 | 0.947 | 1.02 |
| VehicleDetectStage2_Segment_Output | 0.002 | 0.002 | 0.002 |
| Vehicle_Input | 0.023 | 0.027 | 0.027 |
| Vehicle_Index | 0.002 | 0.005 | 0.006 |
| Vehicle_PreProcess | 0.759 | 1.526 | 2.741 |
| Vehicle_Predict | 1.555 | 2.247 | 3.327 |
| Vehicle_Classification | 0.011*4 | 0.02*8 | 0.021 |
| Vehicle_ExtractFeature | 0.008*4 | 0.016*8 | 0.018 |
| Vehicle_FeatureCode | 0.487 | 0.996 | 1.084 |
| VehicleDriver_Input | 0.025 | 0.031 | 0.028 |
| VehicleDriver_Index | 0.006 | 0.01 | 0.012 |
| VehicleDriver_PreProcess | 0.289 | 0.354 | 0.323 |
| VehicleDriver_Predict | 4.38 | 6.736 | 11.514 |
| VehicleDriver_ArgMax | 0.001*4 | 0.001*8 | 0.001 |
| VehicleDriver_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_Input | 0.024 | 0.03 | 0.027 |
| VehiclePlateAlign_Index | 0.063 | 0.066 | 0.07 |
| VehiclePlateAlign_PreProcess | 0.644 | 0.646 | 0.934 |
| VehiclePlateAlign_Predict | 1.614 | 2.774 | 6.691 |
| VehiclePlateAlign_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_ProcessPlate | 9.818 | 16.827 | 17.364 |
| VehiclePlate_Index | 5.442 | 10.142 | 10.578 |
| VehiclePlate_CropToMat | 0.186 | 0.152 | 0.154 |
| VehiclePlate_ImageRotate | 1.21 | 1.178 | 1.251 |
| VehiclePlate_PreProcess | 0.05 | 0.098 | 0.163 |
| VehiclePlate_Predict | 1.461 | 2.154 | 3.594 |
| VehiclePlate_Output | 0.094 | 0.172 | 0.279 |
| VehiclePlate_PostProcess | 0.054 | 0.098 | 0.1 |
| VehicleSpecial_Input | 0.019 | 0.019 | 0.018 |
| VehicleSpecial_Index | 0.018 | 0.026 | 0.027 |
| VehicleSpecial_PreProcess | 1.127 | 2.012 | 2.71 |
| VehicleSpecial_Predict | 2.053 | 3.078 | 5.366 |
| VehicleSpecial_GetLayer | 0.022 | 0.02 | 0.019 |
| VehicleSpecial_PostProcess | 0.039*4 | 0.035*8 | 0.035 |
| VehicleSpecial_Output | 0.001*4 | 0.001*8 | 0.001 |

Other (framework overhead?): 111.888-76.774=35.114
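
To see which networks actually benefit from larger batches, the `_Predict` times can be divided by the batch size. This is only a sketch: it assumes each `_Predict` figure is the time of one call on a batch of the stated size, which the notes do not confirm, and it uses three rows copied from the module timings above.

```python
# Predict times (ms) per batch size, copied from the module timings above.
predict_ms = {
    "ObjectDetectStage1_v3_Predict": {4: 24.847, 8: 48.648, 16: 97.002},
    "VehicleDetectStage2_Predict": {4: 4.516, 8: 7.334, 16: 13.754},
    "Vehicle_Predict": {4: 1.555, 8: 2.247, 16: 3.327},
}

# Assumption: each value is one call on `batch` items, so time / batch is
# the amortized cost per item.
per_item_ms = {
    name: {batch: total / batch for batch, total in times.items()}
    for name, times in predict_ms.items()
}

for name, times in per_item_ms.items():
    row = ", ".join(f"batch {b}: {t:.3f} ms" for b, t in sorted(times.items()))
    print(f"{name}: {row}")
```

Under that reading, the stage-1 detector stays at roughly 6 ms per item regardless of batch size, while the smaller networks keep getting cheaper per item as the batch grows.
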
| Network | Time (batch=1) (3.4GB) | Time (Batch=4) |
|---|---|---|
| ObjectDetectStageCentG320x384 | 18.99000 | |
| ObjectDetectStageV160x128 | 4.001000 | |
| VehiclePlateSegmentCH | 0.879000 | |
| VehicleDriverGeneral | 3.446000 | |
| VehiclePlateNo | 2.013000 | |
| VehiclePlateAlignCH | 1.655000 | |
| VehiclePlateNameCH | 1.576000 | |
| VehiclePlateExceptionPlate | 1.012000 | |
| VehiclePlateExceptionHead | 0.895000 | |
| VehicleLabel | 2.778000 | |
| VehicleColor | 1.679000 | |
| VehicleType | 1.083000 | |
| ObjectDetectStageT160x96 | 1.498000 | |

Overall Graph run time: 60ms
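
As a rough cross-check, the per-network batch=1 times can be summed and compared with the 60 ms overall Graph time. The sketch assumes every network runs exactly once per Graph invocation and that all times are in ms; neither is stated in the notes, so the remainder is only an indication of time spent outside inference.

```python
# Per-network batch=1 times copied from the list above (assumed to be ms).
network_ms = {
    "ObjectDetectStageCentG320x384": 18.99,
    "ObjectDetectStageV160x128": 4.001,
    "VehiclePlateSegmentCH": 0.879,
    "VehicleDriverGeneral": 3.446,
    "VehiclePlateNo": 2.013,
    "VehiclePlateAlignCH": 1.655,
    "VehiclePlateNameCH": 1.576,
    "VehiclePlateExceptionPlate": 1.012,
    "VehiclePlateExceptionHead": 0.895,
    "VehicleLabel": 2.778,
    "VehicleColor": 1.679,
    "VehicleType": 1.083,
    "ObjectDetectStageT160x96": 1.498,
}
GRAPH_TOTAL_MS = 60.0  # overall Graph run time from the notes

inference_ms = sum(network_ms.values())
remainder_ms = GRAPH_TOTAL_MS - inference_ms

print(f"sum of network times: {inference_ms:.3f} ms")
print(f"not accounted for:    {remainder_ms:.3f} ms")
```
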