>>> paddle.fluid.install_check.run_check()
Running Verify Fluid Program ...
E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py:774: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\install_check.py", line 123, in run_check
test_simple_exe()
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\install_check.py", line 119, in test_simple_exe
exe0.run(startup_prog)
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 775, in run
six.reraise(*sys.exc_info())
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\six.py", line 696, in reraise
raise value
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 770, in run
use_program_cache=use_program_cache)
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 817, in _run_impl
use_program_cache=use_program_cache)
File "E:\Pycharm\anconda\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 894, in _run_program
fetch_var_name)
paddle.fluid.core_noavx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.
----------------------
Error Message Summary:
----------------------
PaddleCheckError: cudaGetDeviceProperties failed in paddle::platform::GetCUDAComputeCapability, error code : 30, Please see detail in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038: unknown error at [D:\1.6.1\paddle\paddle\fluid\platform\gpu_info.cc:84]
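A note on the error above: error code 30 (cudaErrorUnknown) from cudaGetDeviceProperties usually means the CUDA runtime could not initialize the GPU at all, which points at a driver/CUDA mismatch rather than at Paddle itself. A minimal, hedged way to narrow it down, assuming the Paddle 1.6 fluid API: confirm the CPU path works first, then probe the GPU separately.

import paddle.fluid as fluid

# Was this wheel built with CUDA support at all?
print("compiled with CUDA:", fluid.is_compiled_with_cuda())

# Run a trivial program on CPU first; if this passes, the Paddle install itself
# is fine and the problem is limited to GPU/driver initialization.
cpu_exe = fluid.Executor(fluid.CPUPlace())
cpu_exe.run(fluid.default_startup_program())
print("CPU executor OK")

# Only once the CPU path works, retry the GPU path; fluid.CUDAPlace(0) will
# raise the same cudaGetDeviceProperties error if the driver/runtime is broken.
# gpu_exe = fluid.Executor(fluid.CUDAPlace(0))
# gpu_exe.run(fluid.default_startup_program())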
The following error is raised during training:
RuntimeError: DataLoader worker (pid 17629) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
After searching on Baidu, I found it is caused by the shm space being too small: only 64 MB, which is far too little.
aistudio@jupyter-141218-157872:~$ df -H
Filesystem Size Used Avail Use% Mounted on
overlay 832G 165G 625G 21% /
tmpfs 68M 0 68M 0% /dev
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/vda1 521G 53G 469G 10% /home/aistudio
/dev/vdb 832G 165G 625G 21% /etc/hosts
shm 68M 39M 29M 59% /dev/shm
tmpfs 60G 13k 60G 1% /proc/driver/nvidia
tmpfs 12G 1.3G 11G 11% /run/nvidia-persistenced/socket
udev 60G 0 60G 0% /dev/nvidia0
tmpfs 60G 0 60G 0% /proc/acpi
tmpfs 60G 0 60G 0% /proc/scsi
tmpfs 60G 0 60G 0% /sys/firmware
Could the shm space be increased? According to what I found on Baidu, this requires adding a parameter to the docker command:
--shm-size=4g
Alternatively, it can be enlarged manually inside the system, but there is no sudo password, so I cannot do that myself.
I hope the shm space can be increased. Thank you.
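Since there is no sudo inside the environment and the container's --shm-size cannot be changed from the user side, a common client-side workaround is to stop the DataLoader from spawning worker processes, because only multi-process workers exchange batches through /dev/shm. Below is a hedged sketch, assuming the fastai v1 API used by the pets notebooks; path_img, fnames, pat and bs are placeholders for whatever the notebook already defines, and num_workers is passed straight through to PyTorch's DataLoader.

from fastai.vision import *

# Rebuild the DataBunch with num_workers=0 so batches are loaded in the main
# process and no shared memory is needed (slower, but avoids the Bus error).
data = ImageDataBunch.from_name_re(
    path_img, fnames, pat, ds_tfms=get_transforms(),
    size=224, bs=bs, num_workers=0
).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)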
I was running text classification on my own; after removing batched execution, the following error appeared:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run(self, program, exe, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
747 self._feed_data(program, feed, feed_var_name, scope)
748 if not use_program_cache:
--> 749 exe.run(program.desc, scope, 0, True, True, fetch_var_name)
750 else:
751 exe.run_cached_prepared_ctx(ctx, scope, False, False, False)
The arguments passed to exe.run seem to be wrong.
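For reference, the frame at line 749 in the traceback is Paddle's internal C++ entry point; user code normally does not call it with positional arguments but goes through the Python-level Executor.run with feed and fetch_list. A minimal hedged sketch of a single-sample (non-batched) run, assuming the Paddle 1.6 fluid API and a hypothetical tiny text-classification program with an input variable named "words":

import numpy as np
import paddle.fluid as fluid

# Hypothetical tiny program, just to show the shape of the run() call:
# int64 word-id sequence -> embedding -> average pooling -> softmax.
words = fluid.layers.data(name="words", shape=[1], dtype="int64", lod_level=1)
emb = fluid.layers.embedding(input=words, size=[100, 16])
pooled = fluid.layers.sequence_pool(input=emb, pool_type="average")
prediction = fluid.layers.fc(input=pooled, size=2, act="softmax")

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Even a single sample has to be fed as a batch of size 1, wrapped in a
# LoDTensor; the feed-dict key must match the data variable's name.
one_sample = fluid.create_lod_tensor(
    np.array([[1], [5], [9]], dtype="int64"), [[3]], fluid.CPUPlace())

out, = exe.run(fluid.default_main_program(),
               feed={"words": one_sample},
               fetch_list=[prediction])
print(out)  # shape (1, 2): probabilities for the two classes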
The earlier Hi3518 used an ARM926 core; the newer models use an ARM Cortex-A7 now. Thanks very much!
Hi, could you tell us the ID of the project where the problem occurs and where exactly it fails? Please provide that information.
Project ID:
https://aistudio.baidu.com/aistudio/projectdetail/157872
The test code is lesson 6 of the fastai course: lesson6-pets-more.
The line learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8) raises an error when it runs:
0.00% [0/3 00:00<00:00]   epoch | train_loss | valid_loss | error_rate | time
Interrupted
---------------------------------------------------------------------------RuntimeError Traceback (most recent call last)~/work/py3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
723 try:
--> 724 data = self._data_queue.get(timeout=timeout)
725 return (True, data)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/queues.py in get(self, block, timeout)
103 timeout = deadline - time.monotonic()
--> 104 if not self._poll(timeout):
105 raise Empty
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in poll(self, timeout)
256 self._check_readable()
--> 257 return self._poll(timeout)
258
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in _poll(self, timeout)
413 def _poll(self, timeout):
--> 414 r = wait([self], timeout)
415 return bool(r)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/connection.py in wait(object_list, timeout)
919 while True:
--> 920 ready = selector.select(timeout)
921 if ready:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/selectors.py in select(self, timeout)
414 try:
--> 415 fd_event_list = self._selector.poll(timeout)
416 except InterruptedError:
~/work/py3/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
65 # Python can still get and update the process status successfully.
---> 66 _error_if_any_worker_fails()
67 if previous_handler is not None:
RuntimeError: DataLoader worker (pid 25593) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last) in
----> 1 learn.fit_one_cycle(3, slice(1e-2), pct_start=0.8)
...(middle of the traceback omitted)...
~/work/py3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
735 if len(failed_workers) > 0:
736 pids_str = ', '.join(str(w.pid) for w in failed_workers)
--> 737 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
738 if isinstance(e, queue.Empty):
739 return (False, None)
RuntimeError: DataLoader worker (pid(s) 25723) exited unexpectedly
How can this be solved? I installed paddlepaddle-gpu 1.6.1.post107; my CUDA version is 10.0 and my cuDNN is 7.6.
Hi, I'd like to ask: when creating a project in the Notebook, PaddlePaddle is already installed, right? If I want to work on other projects, do I need to install paddlepaddle again?
Yes, it comes pre-installed. Just select the Paddle version you need when creating the project.
For the generative adversarial network project, you could take a shot at CycleGAN.
Sentiment classification: a strange issue with prediction results and the choice of neural network model
I trained with the lstm_net model from the PaddlePaddle GitHub model zoo; the prediction results are as follows:
Predict probability of 0.533816 to be positive and 0.46618402 to be negative for review ' read the book forget the movie '
Predict probability of 0.8875562 to be positive and 0.11244377 to be negative for review ' this is a great movie '
Predict probability of 0.42828017 to be positive and 0.5717198 to be negative for review ' this is very bad '
For the negative sentence 'this is very bad', the result is ambiguous.
Training and predicting with bi_lstm_net instead gives the opposite result: negative sentences are classified accurately, while positive ones are ambiguous.
Strangest of all, when I train once with lstm_net and save the model, then train once with bi_lstm_net and save the model to the same location, and afterwards load that model for prediction, both positive and negative sentences are classified accurately. What causes this? Could anyone shed some light on it?
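One plausible explanation (a guess, nothing confirmed): lstm_net and bi_lstm_net have different parameter sets, so if both are saved into the same directory their parameter files end up mixed, and what gets loaded afterwards is an unintended combination of the two. A hedged sketch of keeping the two models apart, assuming the Paddle 1.x fluid inference save/load API; the "words" input and the prediction variable below are minimal stand-ins for whatever lstm_net / bi_lstm_net actually defines.

import paddle.fluid as fluid

# Minimal stand-in network; in the real script these come from lstm_net / bi_lstm_net.
words = fluid.layers.data(name="words", shape=[128], dtype="float32")
prediction = fluid.layers.fc(input=words, size=2, act="softmax")

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Key point: give every network its own directory (e.g. "model_lstm_net" vs
# "model_bi_lstm_net") so their parameter files can never overwrite each other.
fluid.io.save_inference_model(dirname="model_lstm_net",
                              feeded_var_names=["words"],
                              target_vars=[prediction],
                              executor=exe)

# At prediction time, load explicitly from the directory of the intended model.
infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname="model_lstm_net", executor=exe)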
You are welcome to join the PaddlePaddle discussion group (group number 432676488), where professional engineers will answer your questions.
2020-03-18: cannot enter training on the server.
Q: How do I rewrite the old dygraph FC layer using the dygraph Linear layer?
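A hedged sketch of that migration, assuming Paddle 1.7's dygraph API: fluid.dygraph.Linear takes an explicit input_dim and output_dim, whereas the old dygraph FC only took an output size, so the input width now has to be stated by hand.

import numpy as np
import paddle.fluid as fluid
from paddle.fluid.dygraph import Linear, to_variable

with fluid.dygraph.guard():
    # Old style was roughly: fc = fluid.dygraph.FC("fc", size=10)
    # New style: the input dimension is explicit.
    linear = Linear(input_dim=32, output_dim=10, act="softmax")

    x = to_variable(np.random.rand(4, 32).astype("float32"))  # batch of 4, 32 features
    y = linear(x)
    print(y.shape)  # [4, 10]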
AI Studio, please increase the maximum session duration.
Please make VisualDL fully support dygraph (dynamic graph) mode.
It would be nice if AI Studio offered walkthroughs after the competitions.
Has your problem been solved? I have the same issue; changing num_workers and batch size didn't help.
No, it wasn't solved; I ended up switching to PaddlePaddle instead!
By now I'm much more familiar with PaddlePaddle than with PyTorch.
Come on, add more datasets.
Let's move them over together!
It would be nice to have a few more permissions on AI Studio.