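After importing data from HDFS with Broker Load, starting another Broker Load job to import the next partition's data causes a BE to crash abruptly. This reproduces every time. The be.out log is below; could someone please help take a look?
The cluster has 5 servers: 5 BEs and 1 FE.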
type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = there is no scanNode Backend. [10010: not alive]
start time: Fri Dec 24 10:28:57 CST 2021
start time: Fri Dec 24 11:36:42 CST 2021
*** Aborted at 1640322560 (unix time) try "date -d @1640322560" if you are using GNU date ***
PC: @ 0x1267e3a doris::SlotRef::get_string_val()
*** SIGSEGV (@0xffffffffffffffff) received by PID 41093 (TID 0x7f5c43f80700) from PID 18446744073709551615; stack trace: ***
@ 0x214c712 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f5cf7db9630 (unknown)
@ 0x1267e3a doris::SlotRef::get_string_val()
@ 0x12522b1 doris::ExprContext::get_value()
@ 0x1261b27 doris::ScalarFnCall::evaluate_children()
@ 0x12646ce doris::ScalarFnCall::interpret_eval<>()
@ 0x1a36472 doris::ExecNode::eval_conjuncts()
@ 0x1b9a4e1 doris::BaseScanner::fill_dest_tuple()
@ 0x1b7e450 doris::ORCScanner::get_next()
@ 0x1b4def2 doris::BrokerScanNode::scanner_scan()
@ 0x1b4e93c doris::BrokerScanNode::scanner_worker()
@ 0x3907400 execute_native_thread_routine
@ 0x7f5cf7db1ea5 start_thread
@ 0x7f5cf77d8b0d __clone
@ 0x0 (unknown)
start time: Fri Dec 24 13:16:22 CST 2021
start time: Fri Dec 24 15:09:23 CST 2021
*** Aborted at 1640333188 (unix time) try "date -d @1640333188" if you are using GNU date ***
PC: @ 0x125b5ed doris::LikePredicate::constant_starts_with_fn()
*** SIGSEGV (@0x0) received by PID 39262 (TID 0x7f19fd86e700) from PID 0; stack trace: ***
@ 0x214c712 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f1ab3d6f630 (unknown)
@ 0x125b5ed doris::LikePredicate::constant_starts_with_fn()
@ 0x1a36472 doris::ExecNode::eval_conjuncts()
@ 0x1b9a4e1 doris::BaseScanner::fill_dest_tuple()
@ 0x1b7e450 doris::ORCScanner::get_next()
@ 0x1b4def2 doris::BrokerScanNode::scanner_scan()
@ 0x1b4e93c doris::BrokerScanNode::scanner_worker()
@ 0x3907400 execute_native_thread_routine
@ 0x7f1ab3d67ea5 start_thread
@ 0x7f1ab378eb0d __clone
@ 0x0 (unknown)
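To line the two crashes up with the "start time" entries and the BE warning log, the "Aborted at" Unix timestamps can be converted to local time. This is a minimal sketch, assuming the "CST" in be.out means China Standard Time (UTC+8); the helper name `crash_time` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the be.out "start time" lines is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8), "CST")


def crash_time(epoch_seconds: int) -> str:
    """Format an 'Aborted at <unix time>' value as local wall-clock time."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST).strftime("%Y-%m-%d %H:%M:%S %Z")


# The two values reported in the "*** Aborted at ..." lines above.
for ts in (1640322560, 1640333188):
    print(ts, "->", crash_time(ts))
# 1640322560 -> 2021-12-24 13:09:20 CST
# 1640333188 -> 2021-12-24 16:06:28 CST
```

Under that assumption, the first crash (13:09:20) falls shortly before the 13:16:22 restart, and the second (16:06:28) happens after the 15:09:23 restart.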
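The BE warning log also contains some "Failed to close block" entries. Their timestamps do not match the times at which the BE crashed, but they may be useful as a reference when diagnosing the failure.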
W1224 15:36:39.529726 39678 file_block_manager.cpp:135] Failed to close block /data/soft/doris/storage/data/765/17605/1148778291/02000000000000570b4f44e3ab985f6da5f3a1622678f8aa_2.dat: Not found: /data/soft/doris/storage/data/765/17605/1148778291/02000000000000570b4f44e3ab985f6da5f3a1622678f8aa_2.dat: No such file or directory (error 2)
W1224 15:36:39.532016 39678 file_block_manager.cpp:135] Failed to close block /data/soft/doris/storage/data/765/17605/1148778291/02000000000000570b4f44e3ab985f6da5f3a1622678f8aa_3.dat: Not found: /data/soft/doris/storage/data/765/17605/1148778291/02000000000000570b4f44e3ab985f6da5f3a1622678f8aa_3.dat: No such file or directory (error 2)
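For comparing these warnings against other events, the glog line header (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`) can be parsed into fields. The following is only a sketch: the function name `parse_glog` is hypothetical, the year is not present in glog headers and is assumed to be 2021 based on the be.out "start time" lines, and the sample message is shortened for readability.

```python
import re
from datetime import datetime

# glog header: severity letter, month+day, time with microseconds, thread id, file:line]
GLOG_HEADER = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(line: str, year: int = 2021):
    """Split one glog line into fields; returns None if the header does not match.

    The year is an assumption supplied by the caller (glog does not log it).
    """
    m = GLOG_HEADER.match(line)
    if m is None:
        return None
    when = datetime.strptime(
        f"{year}{m['month']}{m['day']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
    )
    return {
        "level": m["level"],
        "when": when,
        "tid": int(m["tid"]),
        "source": f"{m['file']}:{m['line']}",
        "msg": m["msg"],
    }


# Header copied from the first warning above; the message is shortened here.
sample = "W1224 15:36:39.529726 39678 file_block_manager.cpp:135] Failed to close block"
rec = parse_glog(sample)
print(rec["level"], rec["when"].isoformat(), rec["source"])
```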