Out-of-memory error when running StarGAN

import os


def set_paddle_flags(flags):
    # Only set a flag if it is not already defined in the environment.
    for key, value in flags.items():
        if os.environ.get(key, None) is None:
            os.environ[key] = str(value)


set_paddle_flags({
    'FLAGS_conv_workspace_size_limit': 500,
    'FLAGS_eager_delete_tensor_gb': 0,  # enable gc
    'FLAGS_memory_fraction_of_eager_deletion': 1,
    'FLAGS_fraction_of_gpu_memory_to_use': 0.4,
})
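
A side note on the helper above: it only writes a flag when the variable is not yet defined, so a value already present in the environment wins over the dictionary. The log in this post reports FLAGS_fraction_of_gpu_memory_to_use as 0.92 rather than 0.4, which would be consistent with that, or with the flags being set after Paddle had already read them; the thread does not confirm either. A minimal sketch of the behavior, where the preset value 0.92 is an assumption made purely for illustration:

```python
import os


def set_paddle_flags(flags):
    # Same logic as the helper above: existing environment values are kept.
    for key, value in flags.items():
        if os.environ.get(key, None) is None:
            os.environ[key] = str(value)


# Assumption for illustration: the fraction flag is already set in the
# environment, while the eager-deletion flag is not.
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.92'
os.environ.pop('FLAGS_eager_delete_tensor_gb', None)

set_paddle_flags({
    'FLAGS_fraction_of_gpu_memory_to_use': 0.4,
    'FLAGS_eager_delete_tensor_gb': 0,
})

# Collect what a process reading the environment would actually see.
effective = {
    key: os.environ[key]
    for key in ('FLAGS_fraction_of_gpu_memory_to_use',
                'FLAGS_eager_delete_tensor_gb')
}
print(effective)
```

Printing the effective values like this before training starts is a cheap way to check which settings actually reach the process.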

 

Setting FLAGS_eager_delete_tensor_gb=0 has no effect either. The error output is below:

W0930 14:11:48.960021 169 system_allocator.cc:121] Cannot malloc 14124.3 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mbenvironment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0.92. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0
F0930 14:11:48.960222 169 legacy_allocator.cc:201] Cannot allocate 16.000000MB in GPU 0, available 1.214783GBtotal 16945512448GpuMinChunkSize 256.000000BGpuMaxChunkSize 13.793239GBGPU memory used: 13.778815GB
*** Check failure stack trace: ***
@ 0x7fc394fc435d google::LogMessage::Fail()
@ 0x7fc394fc7e0c google::LogMessage::SendToLog()
@ 0x7fc394fc3e83 google::LogMessage::Flush()
@ 0x7fc394fc931e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc396e74604 paddle::memory::legacy::Alloc<>()
@ 0x7fc396e748e5 paddle::memory::allocation::LegacyAllocator::AllocateImpl()
@ 0x7fc396e68a05 paddle::memory::allocation::AllocatorFacade::Alloc()
@ 0x7fc396e68b8a paddle::memory::allocation::AllocatorFacade::AllocShared()
@ 0x7fc396a7346c paddle::memory::AllocShared()
@ 0x7fc396e3b434 paddle::framework::Tensor::mutable_data()
@ 0x7fc394e8f027 paddle::framework::Tensor::mutable_data<>()
@ 0x7fc395325d9c paddle::operators::SumToLoDTensor<>()
@ 0x7fc39532e778 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EINS0_9operators9SumKernelINS7_17CUDADeviceContextEfEENSA_ISB_dEENSA_ISB_iEENSA_ISB_lEENSA_ISB_NS7_7float16EEEEEclEPKcSK_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
@ 0x7fc396de4d77 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7fc396de5151 paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7fc396de274c paddle::framework::OperatorBase::Run()
@ 0x7fc396bddc4a paddle::framework::details::ComputationOpHandle::RunImpl()
@ 0x7fc396bd05f0 paddle::framework::details::OpHandleBase::Run()
@ 0x7fc396bb1966 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
@ 0x7fc396bb05cf paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
@ 0x7fc396bb098f _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
@ 0x7fc3950b1633 std::_Function_handler<>::_M_invoke()
@ 0x7fc394f48467 std::__future_base::_State_base::_M_do_set()
@ 0x7fc3fa4f7a99 __pthread_once_slow
@ 0x7fc396bac012 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
@ 0x7fc394f499e4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
@ 0x7fc3b8088421 execute_native_thread_routine_compat
@ 0x7fc3fa4f06ba start_thread
@ 0x7fc3f990e41d clone
@ (nil) (unknown)
Aborted (core dumped)
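
The warning line at the top of this log states which flag values Paddle was actually using, so it can be compared against what the script intended to set. The sketch below pulls those values out with a regular expression; the function name `parse_current_flags` and the `requested` dictionary are hypothetical helpers for illustration, not part of Paddle.

```python
import re

# The warning line from the log above, copied verbatim.
LOG_LINE = (
    "W0930 14:11:48.960021 169 system_allocator.cc:121] Cannot malloc "
    "14124.3 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use "
    "or FLAGS_initial_gpu_memory_in_mb or "
    "FLAGS_reallocate_gpu_memory_in_mbenvironment variable to a lower value. "
    "Current FLAGS_fraction_of_gpu_memory_to_use value is 0.92. "
    "Current FLAGS_initial_gpu_memory_in_mb value is 0. "
    "Current FLAGS_reallocate_gpu_memory_in_mb value is 0"
)


def parse_current_flags(line):
    # Match fragments of the form "Current FLAGS_xxx value is <number>".
    pattern = r"Current (FLAGS_\w+) value is ([0-9]+(?:\.[0-9]+)?)"
    return {name: float(value) for name, value in re.findall(pattern, line)}


reported = parse_current_flags(LOG_LINE)

# What the script in this post tried to set.
requested = {'FLAGS_fraction_of_gpu_memory_to_use': 0.4}

# Flags whose reported value differs from the requested one.
mismatches = {
    name: (want, reported[name])
    for name, want in requested.items()
    if name in reported and reported[name] != want
}
print(mismatches)
```

A non-empty result suggests the settings from the script never took effect in the training process.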

 

 

 

All comments (3)
AIStudio810261
#2 Replied 2019-09

Looks like the GPU ran out of memory. Usually the fix is to reduce the batch size further.
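
One way to act on this suggestion is to halve the batch size until a run succeeds. Because the failure in this thread ends in "Aborted (core dumped)", it kills the whole process and cannot be caught with try/except, so each attempt would have to be a separate process. The sketch below only shows the halving logic; `run` is a hypothetical callback (for example, one that launches train.py with a given --batch_size and returns True on exit code 0), and the fake runner is an assumption used to make the example self-contained.

```python
def find_fitting_batch_size(run, start, minimum=1):
    """Halve the batch size until run(batch_size) reports success.

    run is a hypothetical callback returning True when training with the
    given batch size did not crash. Returns None if nothing fits.
    """
    batch_size = start
    while batch_size >= minimum:
        if run(batch_size):
            return batch_size
        batch_size //= 2
    return None


# Fake runner for illustration: pretend only batch sizes up to 4 fit.
def fake_run(batch_size):
    return batch_size <= 4


fitting = find_fitting_batch_size(fake_run, start=16)
print(fitting)
```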

口乐观口
#3 Replied 2019-10

Thanks, this is solved now. Setting FLAGS_eager_delete_tensor_gb on the same command line as the training run is all it takes:

!export FLAGS_eager_delete_tensor_gb=0 && \
python train.py --model_net StarGAN \
--dataset celeba --crop_size 178 --image_size 128 \
--train_list ./data/celeba/list_attr_celeba.txt \
--batch_size 16 --epoch 20 --gan_mode wgan --output ./output/stargan/ \
--init_model ./output/stargan/checkpoints/4/ \
--print_freq=1000
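
The thread does not explain why the one-line form works. One relevant fact is that each `!` command in a notebook runs in its own shell, so a variable exported on one line is gone by the next, while `export ... && python ...` keeps both in the same shell. The sketch below reproduces that with `subprocess` instead of a notebook; it assumes a POSIX shell, and `run_shell` is a hypothetical helper written for this example.

```python
import os
import subprocess

FLAG = "FLAGS_eager_delete_tensor_gb"


def run_shell(command):
    # Start a fresh shell each time, without the flag inherited from us.
    env = {k: v for k, v in os.environ.items() if k != FLAG}
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, env=env)
    return result.stdout.strip()


# Shell command that prints the flag's value, or "unset" if it is missing.
show_flag = 'echo "${' + FLAG + ':-unset}"'

# Export in one shell, read in another: the value does not carry over.
run_shell("export " + FLAG + "=0")
separate = run_shell(show_flag)

# Export and read in the same shell, as in the command above.
same_line = run_shell("export " + FLAG + "=0 && " + show_flag)

print(separate, same_line)
```

The same reasoning applies to any flag that the training process has to see at startup.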

该搬砖去了
#4 Replied 2021-06

Hi, the activitynet-v1.3 dataset you uploaded unpacks into nothing but CSV files, each with 100*400 dimensions. What do these files represent? Are they feature maps?
