2019-01-09T01:53:56.144297984Z ('./datasets/train_data/ColorImage_road02/ColorImage/Record005/Camera 5/170927_064343490_Camera_5.jpg', './datasets/train_label/Label_road02/Label/Record005/Camera 5/170927_064343490_Camera_5_bin.png')
2019-01-09T01:54:12.264223554Z F0109 01:54:12.263927 148 grpc_client.cc:357] GetRPC name:[conv1_1_3_3_s2.w_0], ep:[192.168.158.161:7164], status:[-1] meets grpc error, error_code:14 error_message:Socket closed error_details:
2019-01-09T01:54:12.26426681Z *** Check failure stack trace: ***
2019-01-09T01:54:12.270665494Z @ 0x7f95c029ea6d google::LogMessage::Fail()
2019-01-09T01:54:12.276017515Z @ 0x7f95c02a251c google::LogMessage::SendToLog()
2019-01-09T01:54:12.282589075Z @ 0x7f95c029e593 google::LogMessage::Flush()
2019-01-09T01:54:12.323237914Z @ 0x7f95c02a3a2e google::LogMessageFatal::~LogMessageFatal()
2019-01-09T01:54:12.33002892Z @ 0x7f95c17c820c paddle::operators::distributed::GRPCClient::Proceed()
2019-01-09T01:54:12.335986199Z @ 0x7f95e0f14c80 (unknown)
2019-01-09T01:54:12.342922082Z @ 0x7f965e4386ba start_thread
2019-01-09T01:54:12.349492249Z @ 0x7f965e16e41d clone
2019-01-09T01:54:12.356391526Z @ (nil) (unknown)
2019-01-09T01:54:13.368908749Z Aborted (core dumped)
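Could you provide the job ID?
The job ID is 2395.
We're looking into it. Could you retry the job and see whether the problem can be reproduced?
Thanks. It looks like it was a cluster issue; the job runs normally now.
Great, glad it's resolved.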
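For anyone who lands here with the same log: `error_code:14` is gRPC's `UNAVAILABLE` status, which usually points to a transient connection problem (here, the socket to the parameter server endpoint was closed). That matches the job succeeding on a later attempt. Below is a minimal sketch of retrying a flaky call with exponential backoff. `TransientRPCError` and `retry_with_backoff` are hypothetical names used for illustration only; they are not part of PaddlePaddle or the cluster tooling, and the failure inside Paddle's `GRPCClient` cannot be caught this way from Python, since it aborts the process.

```python
import time


class TransientRPCError(Exception):
    """Hypothetical stand-in for a transient failure such as gRPC code 14 (UNAVAILABLE)."""


def retry_with_backoff(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on TransientRPCError with exponentially growing delays.

    The delay before retry k (1-based) is base_delay * 2 ** (k - 1).
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientRPCError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

In practice `fn` would wrap whatever resubmits or relaunches the job; if the error persists across several attempts, it is more likely a real problem with the endpoint than a passing cluster hiccup.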