最后编辑于2021-05
Doris 版本 : Doris version trunk (0.12)
100 个线程并发测试Doris 分区表的查询能力
查看Profile 发现查询计划下发到Be后,多个Instances中必定存在某一两个 Instance 的Scan Node很慢超过5s,其他Instance 都在毫秒级别返回
慢 Instance Profile 见附录1, 最开始怀疑是慢Be,把慢Be 加入到slow_be_addresses 后又会出现其他慢Be
在慢instance 的OlapScanNode中reader_init_time 占到2s多 想请教下大佬们 这种情况可能的原因有哪些
补充一点: 上述这种情况只会在查历史分区的时候出现(2天前),如果查最近这两天刚导入的分区就都很快没有慢Instance 这种情况
正常Instance Profile 见附录2
附录1:慢Instance Profile
Fragment 2: Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb890 (host=TNetworkAddress(hostname:10.16.166.23, port:9060)):(Active: 7s13ms, % non-child: 0.02%) - MemoryLimit: 2.00 GB - FragmentCpuTime: 15.635ms - PeakUsedReservation: 0.00 - PeakMemoryUsage: 37.02 MB - RowsProduced: 93 - AverageThreadTokens: 4611385778452229600.00 - ExecuteDelayAfterRpc: 173.0ms - PeakReservation: 34.00 MB - GetLockTime: 764ns - PrepareTime: 142.223ms BlockMgr: - BlocksCreated: 0 - BlockWritesOutstanding: 0 - BytesWritten: 0.00 - TotalEncryptionTime: 0ns - BufferedPins: 0 - TotalReadBlockTime: 0ns - TotalBufferWaitTime: 0ns - BlocksRecycled: 0 - TotalIntegrityCheckTime: 0ns - MaxBlockSize: 8.00 MB DataStreamSender (dst_id=6):(Active: 62.548us, % non-child: 0.00%) - OverallThroughput: 51.184444427490234 MB/sec - PeakMemoryUsage: 66.64 KB - SerializeBatchTime: 47.432us - IgnoreRows: 0 - BytesSent: 3.28 KB - NetworkThroughput(*): 0.0 /sec - ThriftTransmitTime(*): 0ns - UncompressedRowBatchSize: 5.49 KB AGGREGATION_NODE (id=3):(Active: 7s11ms, % non-child: 0.05%) ExecOption: Streaming Preaggregation - ReductionFactorEstimate: 0.00 - PeakMemoryUsage: 36.43 MB - LargestPartitionPercent: 0 - GetResultsTime: 62.531us - RowsReturned: 93 - PartitionsCreated: 16 - RowsPassedThrough: 0 - ReductionFactorThresholdToExpand: 0.00 - RowsReturnedRate: 13 - HashBuckets: 32.768K (32768) - HTResizeTime: 184.352us - StreamingTime: 2.867ms HASH_JOIN_NODE (id=2):(Active: 7s7ms, % non-child: 0.12%) - ProbeTime: 8.856ms - ProbeRows: 115.739K (115739) - PeakMemoryUsage: 569.38 KB - LoadFactor: 4604536993823588400.00 - BuildBuckets: 8.192K (8192) - PushDownTime: 0ns - BuildRows: 9.913K (9913) - RowsReturned: 3.479K (3479) - BuildTime: 1.472ms - RowsReturnedRate: 496 - PushDownComputeTime: 0ns EXCHANGE_NODE (id=5):(Active: 252.658ms, % non-child: 2.98%) - BytesReceived: 90.38 KB - DeserializeRowBatchTimer: 187.351us - ConvertRowBatchTime: 161.871us - PeakMemoryUsage: 32.00 KB - SendersBlockedTotalTimer(*): 0ns - RowsReturned: 9.913K (9913) - FirstBatchArrivalWaitTime: 233.874ms - DataArrivalWaitTime: 252.439ms - RowsReturnedRate: 39.234K /sec OLAP_SCAN_NODE (id=0):(Active: 6s744ms, % non-child: 79.61%) - CompressedBytesRead: 3.55 MB - MaxWaitScanTime: 373.8ms - ScannerBlockPutTimer: 37.247us - ScanCpuTime: 177.891ms - BitmapIndexFilterTimer: 0ns - NumScanners: 1 - RowsStatsFiltered: 0 - OpenTime: 8.300us - VectorPredEvalTime: 309.673us - BlockLookupCacheTime: 0ns - RowsDelFiltered: 0 - RowsetReaderInitTime: 663.739us - BlockSeekPosPageTime: 0ns - NumDiskAccess: 0 - GetNextTime: 6s744ms - TotalPagesNum: 0 - BlockLoadTime: 41.205ms - CachedPagesNum: 0 - BlocksLoad: 222 - TabletCount : 1 - DecompressorTimer: 5.63ms - RowsVectorPredFiltered: 0 - ReaderInitTime: 2s685ms - RowsRead: 191.621K (191621) - BytesRead: 11.05 MB - ScannerMaxPendingTimer: 3s886ms - TotalRawReadTime(*): 0ns - PeakMemoryUsage: 0.00 - RowsPushedCondFiltered: 0 - ScanRangesComplete: 0 - ScanTime: 177.910ms - BitmapIndexFilterCount: 0 - BlockSeekTime: 2s348ms - StartScanTime: 289.97us - RawRowsRead: 193.022K (193022) - ScannerThreadsVoluntaryContextSwitches: 0 - IndexLoadTime: 1.533ms - ScannerThreadsTotalWallClockTime: 0ns - MaterializeTupleTime(*): 0ns - ScannerThreadsUserTime: 0ns - ScannerThreadsSysTime: 0ns - BlockReadDecompressTime: 0ns - BlockVerifyChecksumTime: 0ns - RowsReturnedRate: 17.16K /sec - UncompressedBytesRead: 6.06 MB - RowsBloomFilterFiltered: 0 - RowsReturned: 115.739K (115739) - RowsetNum: 0 - ScannerThreadsInvoluntaryContextSwitches: 0 - BlockPutCacheTime: 0ns - BlockSeekPageIterTime: 0ns - PerReadThreadRawHdfsThroughput: 0.0 /sec - BlockFetchTime: 2s740ms - ShowHintsTime: 76.128us - TotalReadThroughput: 0.0 /sec - IOTimer: 2.52ms - BlockReadPageTime: 0ns - BlockConvertTime: 15.87ms - SegmentNum: 0 - BlockSeekCount: 0 Buffer pool: - PeakUnpinnedBytes: 0.00 - PeakUsedReservation: 34.00 MB - ReservationLimit: 80.00 GB - AllocTime: 19.110us - CumulativeAllocationBytes: 34.00 MB - CumulativeAllocations: 17 - PeakReservation: 34.00 MB
附录2 正常Instance Profile 情况
Instance 6f2c1b2e5bd5408b-9ca88adb9b7eb88f (host=TNetworkAddress(hostname:10.16.196.10, port:9060)):(Active: 988.455ms, % non-child: 0.02%) - MemoryLimit: 2.00 GB - FragmentCpuTime: 12.92ms - PeakUsedReservation: 0.00 - PeakMemoryUsage: 36.89 MB - RowsProduced: 91 - AverageThreadTokens: 4610184818551597600.00 - ExecuteDelayAfterRpc: 91.0ms - PeakReservation: 34.00 MB - GetLockTime: 462ns - PrepareTime: 184.704ms BlockMgr: - BlocksCreated: 0 - BlockWritesOutstanding: 0 - BytesWritten: 0.00 - TotalEncryptionTime: 0ns - BufferedPins: 0 - TotalReadBlockTime: 0ns - TotalBufferWaitTime: 0ns - BlocksRecycled: 0 - TotalIntegrityCheckTime: 0ns - MaxBlockSize: 8.00 MB DataStreamSender (dst_id=6):(Active: 69.690us, % non-child: 0.00%) - OverallThroughput: 44.939969062805176 MB/sec - PeakMemoryUsage: 66.64 KB - SerializeBatchTime: 57.438us - IgnoreRows: 0 - BytesSent: 3.21 KB - NetworkThroughput(*): 0.0 /sec - ThriftTransmitTime(*): 0ns - UncompressedRowBatchSize: 5.37 KB AGGREGATION_NODE (id=3):(Active: 986.636ms, % non-child: 0.05%) ExecOption: Streaming Preaggregation - ReductionFactorEstimate: 0.00 - PeakMemoryUsage: 36.30 MB - LargestPartitionPercent: 0 - GetResultsTime: 80.590us - RowsReturned: 91 - PartitionsCreated: 16 - RowsPassedThrough: 0 - ReductionFactorThresholdToExpand: 0.00 - RowsReturnedRate: 92 - HashBuckets: 32.768K (32768) - HTResizeTime: 107.622us - StreamingTime: 3.49ms HASH_JOIN_NODE (id=2):(Active: 982.247ms, % non-child: 0.10%) ExecOption: Hash Table Built Asynchronously - ProbeTime: 6.751ms - ProbeRows: 116.102K (116102) - PeakMemoryUsage: 569.38 KB - LoadFactor: 4604536993823588400.00 - BuildBuckets: 8.192K (8192) - PushDownTime: 0ns - BuildRows: 9.913K (9913) - RowsReturned: 3.582K (3582) - BuildTime: 1.836ms - RowsReturnedRate: 3.646K /sec - PushDownComputeTime: 0ns EXCHANGE_NODE (id=5):(Active: 241.235ms, % non-child: 2.85%) - BytesReceived: 90.38 KB - DeserializeRowBatchTimer: 220.75us - ConvertRowBatchTime: 154.816us - PeakMemoryUsage: 32.00 KB - SendersBlockedTotalTimer(*): 0ns - RowsReturned: 9.913K (9913) - FirstBatchArrivalWaitTime: 220.854ms - DataArrivalWaitTime: 241.19ms - RowsReturnedRate: 41.092K /sec OLAP_SCAN_NODE (id=0):(Active: 732.300ms, % non-child: 8.64%) - CompressedBytesRead: 3.57 MB - MaxWaitScanTime: 207.670us - ScannerBlockPutTimer: 32.941us - ScanCpuTime: 154.120ms - BitmapIndexFilterTimer: 0ns - NumScanners: 1 - RowsStatsFiltered: 0 - OpenTime: 9.472us - VectorPredEvalTime: 277.373us - BlockLookupCacheTime: 0ns - RowsDelFiltered: 0 - RowsetReaderInitTime: 1.872ms - BlockSeekPosPageTime: 0ns - NumDiskAccess: 0 - GetNextTime: 732.309ms - TotalPagesNum: 0 - BlockLoadTime: 43.325ms - CachedPagesNum: 0 - BlocksLoad: 212 - TabletCount : 1 - DecompressorTimer: 5.299ms - RowsVectorPredFiltered: 0 - ReaderInitTime: 583.623ms - RowsRead: 192.361K (192361) - BytesRead: 11.12 MB - ScannerMaxPendingTimer: 138.885us - TotalRawReadTime(*): 0ns - PeakMemoryUsage: 0.00 - RowsPushedCondFiltered: 0 - ScanRangesComplete: 0 - ScanTime: 154.136ms - BitmapIndexFilterCount: 0 - BlockSeekTime: 576.145ms - StartScanTime: 274.938us - RawRowsRead: 194.34K (194340) - ScannerThreadsVoluntaryContextSwitches: 0 - IndexLoadTime: 1.546ms - ScannerThreadsTotalWallClockTime: 0ns - MaterializeTupleTime(*): 0ns - ScannerThreadsUserTime: 0ns - ScannerThreadsSysTime: 0ns - BlockReadDecompressTime: 0ns - BlockVerifyChecksumTime: 0ns - RowsReturnedRate: 158.544K /sec - UncompressedBytesRead: 6.11 MB - RowsBloomFilterFiltered: 0 - RowsReturned: 116.102K (116102) - RowsetNum: 0 - ScannerThreadsInvoluntaryContextSwitches: 0 - BlockPutCacheTime: 0ns - BlockSeekPageIterTime: 0ns - PerReadThreadRawHdfsThroughput: 0.0 /sec - BlockFetchTime: 641.70ms - ShowHintsTime: 52.977us - TotalReadThroughput: 0.0 /sec - IOTimer: 1.857ms - BlockReadPageTime: 0ns - BlockConvertTime: 17.966ms - SegmentNum: 0 - BlockSeekCount: 0 Buffer pool: - PeakUnpinnedBytes: 0.00 - PeakUsedReservation: 34.00 MB - ReservationLimit: 80.00 GB - AllocTime: 23.533us - CumulativeAllocationBytes: 34.00 MB - CumulativeAllocations: 17 - PeakReservation: 34.00 MB
收藏
点赞
0
个赞
请登录后评论
TOP
切换版块
定位了问题 应该是storage v1 导致的异常,BlockSeekTime 都要两秒多,换到v2 后一切正常 感谢
好的 感谢
我也觉得很奇怪 明明应该是实时导入的 没有compaction 应该慢才对
先升级之后再测下试试
首先,0.12 这个版本有些古老了,Doris 后面也做过大量的这方便的优化。
reader init 耗时长有可能是 IO 资源问题,也有可能是因为 tablet 对应的数据版本过多的问题。
你这个场景比较奇怪,通常查询新导入的数据可能会更慢一点(因为部分版本还没有合并),而历史数据应该会更快(因为合并度更高。)
建议先进行版本升级,再做测试。