PaddlePaddle Deep Learning 500 Questions is launching soon ~ leave the topics you are interested in

The content-collection campaign for PaddlePaddle Deep Learning 500 Questions is launching soon~

Don't just pass by:

leave a comment with the deep learning directions you are interested in,

or describe any problem you have run into while using AI Studio,

or list the project content you think AI Studio should add.

We will pick lucky commenters from the replies to receive a 100-hour compute card~

One draw for every ten replies~

Winners, please join the official Baidu AI Studio QQ group: 816164541

and just @滴滴-AI Studio in the group.

All comments (79), sorted by time
skywalk163
#43 Replied 2019-11

I get this error during training:

RuntimeError: DataLoader worker (pid 17629) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.

After searching on Baidu, I found it is caused by the shm space being too small: only 64M, which is not enough.

aistudio@jupyter-141218-157872:~$ df -H
Filesystem Size Used Avail Use% Mounted on
overlay 832G 165G 625G 21% /
tmpfs 68M 0 68M 0% /dev
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/vda1 521G 53G 469G 10% /home/aistudio
/dev/vdb 832G 165G 625G 21% /etc/hosts
shm 68M 39M 29M 59% /dev/shm
tmpfs 60G 13k 60G 1% /proc/driver/nvidia
tmpfs 12G 1.3G 11G 11% /run/nvidia-persistenced/socket
udev 60G 0 60G 0% /dev/nvidia0
tmpfs 60G 0 60G 0% /proc/acpi
tmpfs 60G 0 60G 0% /proc/scsi
tmpfs 60G 0 60G 0% /sys/firmware

 

Can the shm space be increased? According to what I found on Baidu, you need to add a parameter to the docker command:

--shm-size=4g

Alternatively it could be increased manually inside the system, but I have no sudo access, so I cannot do that.

 

I hope the shm space can be increased, thanks.
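A common workaround when /dev/shm cannot be enlarged (for example, no sudo inside the container) is to disable the DataLoader worker processes, since single-process loading never puts tensors into shared memory. A minimal sketch, with torchvision's FakeData standing in for the real dataset:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Stand-in dataset; any torch Dataset behaves the same way here.
dataset = datasets.FakeData(size=256, transform=transforms.ToTensor())

# num_workers=0 loads batches in the main process, so /dev/shm is never
# touched and the "Bus error / out of shared memory" crash cannot occur.
# The trade-off is a slower input pipeline.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

for images, labels in loader:
    pass  # training step would go here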

cheeryoung79
#44 Replied 2019-11

I am running my own text classification; after removing the batch run, it reports the following error:

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run(self, program, exe, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
747 self._feed_data(program, feed, feed_var_name, scope)
748 if not use_program_cache:
--> 749 exe.run(program.desc, scope, 0, True, True, fetch_var_name)
750 else:
751 exe.run_cached_prepared_ctx(ctx, scope, False, False, False)

The arguments passed to exe.run seem to be wrong.
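For reference, the call in the traceback is the internal C++ entry point inside executor.py; ordinary user code goes through the public Executor.run interface with program, feed and fetch_list, and the executor fills in the desc/scope arguments itself. A minimal sketch under Paddle Fluid 1.x (the tiny network and feed name are invented for illustration):

import numpy as np
import paddle.fluid as fluid

# A trivial network: one fully connected layer.
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.layers.fc(input=x, size=1)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Public interface: pass the program plus feed/fetch_list; the desc, scope
# and cache arguments seen in the traceback are handled internally.
out, = exe.run(fluid.default_main_program(),
               feed={'x': np.random.rand(4, 13).astype('float32')},
               fetch_list=[y])
print(out.shape)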

口乐观口
#45 Replied 2019-11
Hello, could you tell us the specific scenario in which you need v5 and v6 chips, and which chip you are using? Paddle Lite currently mainly supports ARMv7 and ARMv8, since those are the mainstream chips on the market. If we can confirm that your requirement has broader market demand, we will consider adding it to the roadmap. (You are welcome to join our Lite QQ group for discussion: 696965088)

The earlier Hi3518 used an ARM926 core; the newer models are ARM Cortex-A7 now! Thank you very much.

无知者1215
#46 Replied 2019-11
skywalk163 #43
I get this error during training: RuntimeError: DataLoader worker (pid 17629) is killed by signal: Bus error... Can the shm space be increased? I hope so, thanks.

Hello, what is the ID of the project where the problem occurred, and where exactly does the error happen? Could you provide that information?

skywalk163
#47 Replied 2019-11
无知者1215 #46
Hello, what is the ID of the project where the problem occurred, and where exactly does the error happen? Could you provide that information?

Project ID:

https://aistudio.baidu.com/aistudio/projectdetail/157872

 

The test code is lesson 6 from the fastai course: lesson6-pets-more

 

learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8) reports an error when this line runs:

[fastai progress bar: 0.00% [0/3 00:00<00:00]; columns epoch, train_loss, valid_loss, error_rate, time; Interrupted]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/work/py3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
723 try:
--> 724 data = self._data_queue.get(timeout=timeout)
725 return (True, data)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/queues.py in get(self, block, timeout)
103 timeout = deadline - time.monotonic()
--> 104 if not self._poll(timeout):
105 raise Empty
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in poll(self, timeout)
256 self._check_readable()
--> 257 return self._poll(timeout)
258
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in _poll(self, timeout)
413 def _poll(self, timeout):
--> 414 r = wait([self], timeout)
415 return bool(r)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in wait(object_list, timeout)
919 while True:
--> 920 ready = selector.select(timeout)
921 if ready:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/selectors.py in select(self, timeout)
414 try:
--> 415 fd_event_list = self._selector.poll(timeout)
416 except InterruptedError:
~/work/py3/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
65 # Python can still get and update the process status successfully.
---> 66 _error_if_any_worker_fails()
67 if previous_handler is not None:
RuntimeError: DataLoader worker (pid 25593) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
RuntimeError                              Traceback (most recent call last)
 in
----> 1 learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)

(omitted ....)

~/work/py3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
735 if len(failed_workers) > 0:
736 pids_str = ', '.join(str(w.pid) for w in failed_workers)
--> 737 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
738 if isinstance(e, queue.Empty):
739 return (False, None)
RuntimeError: DataLoader worker (pid(s) 25723) exited unexpectedly
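As a side check for anyone reproducing this, the shared-memory limit can also be read from inside the notebook without running df; a small sketch using only the Python standard library:

import shutil

# /dev/shm is the mount that DataLoader worker processes use to pass tensors
# back to the main process.
total, used, free = shutil.disk_usage('/dev/shm')
print(f"/dev/shm: total {total / 2**20:.0f} MiB, free {free / 2**20:.0f} MiB")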

佐勒F
#48 Replied 2019-11

How do I solve this? I installed paddlepaddle-gpu 1.6.1.post107, my CUDA is 10.0 and my cuDNN is 7.6.

>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ...
E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py:774: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "", line 1, in 
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\install_check.py", line 123, in run_check
    test_simple_exe()
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\install_check.py", line 119, in test_simple_exe
    exe0.run(startup_prog)
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 775, in run
    six.reraise(*sys.exc_info())
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\six.py", line 696, in reraise
    raise value
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 770, in run
    use_program_cache=use_program_cache)
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 817, in _run_impl
    use_program_cache=use_program_cache)
  File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 894, in _run_program
    fetch_var_name)
paddle.fluid.core_noavx.EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

----------------------
Error Message Summary:
----------------------
PaddleCheckError: cudaGetDeviceProperties failed in paddle::platform::GetCUDAComputeCapability, error code : 30, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: unknown error at [D:\1.6.1\paddle\paddle\fluid\platform\gpu_info.cc:84]
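Error code 30 is cudaErrorUnknown, which in practice usually points at a driver or environment problem rather than at the PaddlePaddle wheel itself. A hedged sketch of a quick sanity check (it assumes nvidia-smi is on PATH and is a generic diagnostic, not an official fix):

import subprocess
import paddle.fluid as fluid

# Confirm that the installed wheel was built with CUDA support at all.
print('compiled with CUDA:', fluid.is_compiled_with_cuda())

# Ask the driver directly; a failure here, or a driver too old for CUDA 10.0,
# would explain cudaGetDeviceProperties returning an unknown error.
print(subprocess.run(['nvidia-smi'], capture_output=True, text=True).stdout)

# Re-run a trivial program on CPU to separate install problems from GPU problems.
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
print('CPU executor OK')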
毛毛毛
#49 Replied 2019-11

Hello, I would like to ask: when you create a project in Notebook, is PaddlePaddle already installed? And if I want to work on a different project, do I need to install PaddlePaddle again?

学习委员
#50 Replied 2019-12
毛毛毛 #49
Hello, I would like to ask: when you create a project in Notebook, is PaddlePaddle already installed? And if I want to work on a different project, do I need to install PaddlePaddle again?

Yes, it comes pre-installed~ Just pick the Paddle version you need when you create the project~
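A quick way to confirm which version the current Notebook environment ships with, and to switch to another one if needed, might look like the following (the version number in the comment is only an example):

import paddle

# Version preinstalled in the current environment.
print(paddle.__version__)

# To use a different release inside the notebook, a cell such as the
# following could be run (example version only):
#   !pip install paddlepaddle==1.6.1 --user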

一生平安是福68
#51 Replied 2019-12

For the generative adversarial network projects, you could do a round of CycleGAN.

54lyll
#52 Replied 2019-12

Sentiment classification: a strange issue with prediction results and the choice of neural network model

I trained the lstm_net model from the PaddlePaddle GitHub model zoo, and the prediction results are as follows:

Predict probability of 0.533816 to be positive and 0.46618402 to be negative for review ' read the book forget the movie '
Predict probability of 0.8875562 to be positive and 0.11244377 to be negative for review ' this is a great movie '
Predict probability of 0.42828017 to be positive and 0.5717198 to be negative for review ' this is very bad '

For the negative sentence ' this is very bad ' the result is ambiguous.

Training and predicting with bi_lstm_net instead gives the opposite result: negative sentences are classified accurately, but positive ones become ambiguous.

Strangest of all, when I train once with lstm_net and save the model, then train once with bi_lstm_net and save the model to the same location, and afterwards load that model for prediction, the results are accurate for both positive and negative sentences. What could be the reason? Could any expert give some pointers?
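Not a confirmed diagnosis, but one thing worth ruling out is parameter files from the two networks ending up mixed in a single directory, since fluid's save/load APIs write one file per variable by default. A minimal sketch of keeping the checkpoints apart (the stand-in network and directory names are invented for illustration):

import numpy as np
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)

# Stand-in network; the real lstm_net / bi_lstm_net would go here.
words = fluid.data(name='words', shape=[None, 8], dtype='float32')
prediction = fluid.layers.fc(input=words, size=2, act='softmax')
exe.run(fluid.default_startup_program())

# Give every network its own directory so a later save of a different
# network cannot overwrite or partially mix with these parameter files.
fluid.io.save_inference_model('output/lstm_net', ['words'], [prediction], exe)

# Loading then restores exactly the network that was saved there.
program, feed_names, fetch_targets = fluid.io.load_inference_model('output/lstm_net', exe)
out, = exe.run(program,
               feed={feed_names[0]: np.random.rand(1, 8).astype('float32')},
               fetch_list=fetch_targets)
print(out)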

 

学习委员
#53 Replied 2019-12
54lyll #52
Sentiment classification: a strange issue with prediction results and the choice of neural network model ... What could be the reason? Could any expert give some pointers?

You are welcome to join the PaddlePaddle discussion QQ group, number 432676488, where professional engineers will answer your questions.

佐勒F
#54 Replied 2020-03

2020-03-18: I cannot get onto the server to train.

Question: how do I change code that used the dygraph FC layer to use the dygraph Linear layer instead?
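On the FC-to-Linear question: in the newer 1.x dygraph API the Linear layer replaces FC and takes explicit input and output dimensions instead of inferring the input size. A minimal sketch of the migration (the 784 input width is an assumption, purely for illustration):

import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear, to_variable

with fluid.dygraph.guard():
    # Old style was roughly: fc = FC('fc', size=10)   # input dim inferred
    # New style gives both dimensions explicitly:
    fc = Linear(input_dim=784, output_dim=10, act='softmax')

    x = to_variable(np.random.rand(4, 784).astype('float32'))
    y = fc(x)
    print(y.shape)  # [4, 10]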

Action
#55 Replied 2020-03

Please increase the maximum session duration on AI Studio.

AIStudio810258
#56 Replied 2020-03

Please make VisualDL fully support dynamic graph (dygraph) mode.
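For what it is worth, scalar logging already works the same way under dygraph, since the writer only records plain Python values; a small sketch assuming VisualDL 2.x is installed (the logdir and tag names are arbitrary):

from visualdl import LogWriter

# Log a toy loss curve; the values stand in for whatever the dygraph
# training loop computes.
with LogWriter(logdir='./log/dygraph_run') as writer:
    for step in range(100):
        loss = 1.0 / (step + 1)
        writer.add_scalar(tag='train/loss', step=step, value=loss)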

thinc
#57 Replied 2020-03

It would be great if AI Studio competitions came with walkthroughs of the solutions after they end~

倍七
#58 Replied 2020-04
skywalk163 #47
Project ID: https://aistudio.baidu.com/aistudio/projectdetail/157872 ... RuntimeError: DataLoader worker (pid(s) 25723) exited unexpectedly

Did you ever solve this? I am hitting the same problem, and changing num_workers and batch size does not help.

skywalk163
#59 Replied 2020-05
倍七 #58
Did you ever solve this? I am hitting the same problem, and changing num_workers and batch size does not help.

No, it never got solved, so I simply switched to PaddlePaddle!

By now I am much more familiar with PaddlePaddle than with PyTorch.

小心
#60 Replied 2020-05

Come on, more datasets please.

thinc
#61 Replied 2020-05
小心 #60
Come on, more datasets please.

Let's upload some together!

peng4554
#62 Replied 2020-05

It would be nice if AI Studio gave users a few more permissions.
