Study Notes from the "7-Day Reinforcement Learning Camp" (《强化学习7日打卡营》): Tuning the Network Size

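In this post I share my notes on tuning network parameters while choosing the network size in the early stage of the final assignment, "Quadrotor Hovering" (《大作业:四轴飞行器悬浮》).
Both the Actor and the Critic use two hidden FC layers followed by an output layer, as shown below.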

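import parl
from parl import layers

# ACTOR_MODEL_SIZE and CRITIC_MODEL_SIZE are set per experiment (see the cases below).

class ActorModel(parl.Model):
    def __init__(self, act_dim):
        # Two hidden FC layers, then a tanh output layer that bounds each action to [-1, 1].
        self.fc1 = layers.fc(size=ACTOR_MODEL_SIZE, act='relu')
        self.fc2 = layers.fc(size=ACTOR_MODEL_SIZE, act='relu')
        self.fc3 = layers.fc(size=act_dim, act='tanh')

    def policy(self, obs):
        out = self.fc1(obs)
        out = self.fc2(out)
        out = self.fc3(out)
        return out

class CriticModel(parl.Model):
    def __init__(self):
        # Two hidden FC layers, then a linear output layer that produces a scalar Q value.
        self.fc1 = layers.fc(size=CRITIC_MODEL_SIZE, act='relu')
        self.fc2 = layers.fc(size=CRITIC_MODEL_SIZE, act='relu')
        self.fc3 = layers.fc(size=1, act=None)

    def value(self, obs, act):
        # The critic scores a (state, action) pair, so both are concatenated as input.
        concat = layers.concat([obs, act], axis=1)
        Q = self.fc1(concat)
        Q = self.fc2(Q)
        Q = self.fc3(Q)
        # Drop the trailing dimension: [batch, 1] -> [batch].
        Q = layers.squeeze(Q, axes=[1])
        return Q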
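
To get a feel for how much capacity each candidate size implies, the weights and biases of the fully connected layers can be counted by hand. The sketch below does this for the layer layout shown above. `OBS_DIM` and `ACT_DIM` are placeholder values chosen purely for illustration (they are not taken from the assignment), and `mlp_param_count` is a hypothetical helper, not part of PARL.

```python
def mlp_param_count(in_dim, hidden, out_dim):
    """Weights plus biases of an MLP shaped in_dim -> hidden -> hidden -> out_dim."""
    return (in_dim + 1) * hidden + (hidden + 1) * hidden + (hidden + 1) * out_dim


# Placeholder dimensions for illustration only; read the real ones from the environment.
OBS_DIM = 16
ACT_DIM = 4

for size in (256, 192, 128, 96):
    actor = mlp_param_count(OBS_DIM, size, ACT_DIM)
    # The critic takes the concatenated observation and action as its input.
    critic = mlp_param_count(OBS_DIM + ACT_DIM, size, 1)
    print(f"size={size:3d}  actor={actor:6d}  critic={critic:6d}")
```

The hidden-to-hidden term grows with the square of the layer size, so going from 96 to 256 units multiplies the parameter count by far more than the ratio of the sizes.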

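With all other hyperparameters held fixed, how does the FC layer size affect convergence during training?
Below are the results for four sizes: 256, 192, 128, and 96.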
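
Single test rewards are noisy, so it can help to smooth them before comparing runs. The following is a minimal sketch, assuming the log format shown in this post; `parse_test_rewards` and `moving_average` are hypothetical helper names, and the window size is an arbitrary choice.

```python
import re

# Matches lines such as:
# [06-21 22:31:53 MainThread @hw05-v1.py:356] Steps 146, Test reward: -1700.806426705496
LOG_PATTERN = re.compile(r"Steps (\d+), Test reward: (-?\d+(?:\.\d+)?)")


def parse_test_rewards(log_text):
    """Return a list of (steps, reward) pairs found in the log text."""
    return [(int(s), float(r)) for s, r in LOG_PATTERN.findall(log_text)]


def moving_average(values, window):
    """Trailing moving average; early entries average over fewer points."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# The first three log lines of the size-256 run, used here as sample input.
sample = """\
[06-21 22:31:53 MainThread @hw05-v1.py:356] Steps 146, Test reward: -1700.806426705496
[06-21 22:33:01 MainThread @hw05-v1.py:356] Steps 10112, Test reward: -596.2561082563233
[06-21 22:36:24 MainThread @hw05-v1.py:356] Steps 20030, Test reward: -2282.1797897001693
"""
points = parse_test_rewards(sample)
rewards = [r for _, r in points]
smoothed = moving_average(rewards, window=2)
for (steps, _), avg in zip(points, smoothed):
    print(f"steps={steps:7d}  avg_reward={avg:10.2f}")
```

Pasting a full log into `sample` and using a larger window (for example 10 evaluations) makes the overall trend of each run easier to see than the raw values.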


Case 1
ACTOR_MODEL_SIZE = 256
CRITIC_MODEL_SIZE = 256

[06-21 22:31:53 MainThread @hw05-v1.py:356] Steps 146, Test reward: -1700.806426705496
[06-21 22:33:01 MainThread @hw05-v1.py:356] Steps 10112, Test reward: -596.2561082563233
[06-21 22:36:24 MainThread @hw05-v1.py:356] Steps 20030, Test reward: -2282.1797897001693
[06-21 22:40:15 MainThread @hw05-v1.py:356] Steps 30676, Test reward: -2407.1158758712363
[06-21 22:45:40 MainThread @hw05-v1.py:356] Steps 40110, Test reward: -349.126986411294
[06-21 22:53:53 MainThread @hw05-v1.py:356] Steps 50122, Test reward: -857.516794810552
[06-21 23:02:00 MainThread @hw05-v1.py:356] Steps 60150, Test reward: -1503.7728487440188
[06-21 23:10:32 MainThread @hw05-v1.py:356] Steps 70240, Test reward: -882.794797194164
[06-21 23:18:49 MainThread @hw05-v1.py:356] Steps 80061, Test reward: -1549.8080052208447
[06-21 23:27:42 MainThread @hw05-v1.py:356] Steps 90378, Test reward: -2083.989995594228
[06-21 23:36:07 MainThread @hw05-v1.py:356] Steps 100098, Test reward: -2147.181152423419
[06-21 23:42:47 MainThread @hw05-v1.py:356] Steps 110046, Test reward: -1640.1224792214803
[06-21 23:51:14 MainThread @hw05-v1.py:356] Steps 120808, Test reward: -2130.0448243820565
[06-21 23:58:35 MainThread @hw05-v1.py:356] Steps 130982, Test reward: -638.0628017428105
[06-22 00:05:42 MainThread @hw05-v1.py:356] Steps 140169, Test reward: -1724.1579971285087
[06-22 00:13:30 MainThread @hw05-v1.py:356] Steps 150119, Test reward: -4506.748354873329
[06-22 00:21:18 MainThread @hw05-v1.py:356] Steps 160073, Test reward: -1332.9875802709978
[06-22 00:30:15 MainThread @hw05-v1.py:356] Steps 170328, Test reward: -2015.2794296879117
[06-22 00:37:42 MainThread @hw05-v1.py:356] Steps 180113, Test reward: -1222.6069539442537
[06-22 00:46:01 MainThread @hw05-v1.py:356] Steps 190232, Test reward: -2354.914852317128
[06-22 00:54:59 MainThread @hw05-v1.py:356] Steps 200558, Test reward: -1548.1599205127736
[06-22 01:03:31 MainThread @hw05-v1.py:356] Steps 210014, Test reward: -1811.4597778433429
[06-22 01:10:47 MainThread @hw05-v1.py:356] Steps 220359, Test reward: -2872.5418137765046
[06-22 01:19:18 MainThread @hw05-v1.py:356] Steps 230036, Test reward: -879.1907517998216
[06-22 01:28:23 MainThread @hw05-v1.py:356] Steps 240301, Test reward: -663.3317883017937
[06-22 01:35:51 MainThread @hw05-v1.py:356] Steps 250312, Test reward: -1535.8588765369361
[06-22 01:44:10 MainThread @hw05-v1.py:356] Steps 260267, Test reward: -1402.0804749746671
[06-22 01:51:50 MainThread @hw05-v1.py:356] Steps 270163, Test reward: -1114.1574552599939
[06-22 02:00:45 MainThread @hw05-v1.py:356] Steps 280168, Test reward: -1766.1534968862823
[06-22 02:09:36 MainThread @hw05-v1.py:356] Steps 290325, Test reward: -1098.1700325679526
[06-22 02:17:08 MainThread @hw05-v1.py:356] Steps 300223, Test reward: -1048.6820430051548
[06-22 02:24:03 MainThread @hw05-v1.py:356] Steps 310053, Test reward: -1714.3435400593862
[06-22 02:32:37 MainThread @hw05-v1.py:356] Steps 320223, Test reward: -3368.5639165091043
[06-22 02:40:52 MainThread @hw05-v1.py:356] Steps 330352, Test reward: -1212.442554286275
[06-22 02:49:25 MainThread @hw05-v1.py:356] Steps 340031, Test reward: -1279.224195910458
[06-22 02:58:21 MainThread @hw05-v1.py:356] Steps 350254, Test reward: -1798.0573213944638
[06-22 03:06:39 MainThread @hw05-v1.py:356] Steps 360067, Test reward: -2901.4716026975393
[06-22 03:15:37 MainThread @hw05-v1.py:356] Steps 370222, Test reward: -491.67144967390186
[06-22 03:24:24 MainThread @hw05-v1.py:356] Steps 380218, Test reward: -1690.9849126795011
[06-22 03:33:11 MainThread @hw05-v1.py:356] Steps 390165, Test reward: -933.4521264656596
[06-22 03:42:12 MainThread @hw05-v1.py:356] Steps 400198, Test reward: -1494.9127519293081
[06-22 03:51:14 MainThread @hw05-v1.py:356] Steps 410035, Test reward: -2098.3290570085383
[06-22 03:58:48 MainThread @hw05-v1.py:356] Steps 420263, Test reward: -562.391600799572
[06-22 04:07:18 MainThread @hw05-v1.py:356] Steps 430266, Test reward: -734.0009007640546
[06-22 04:15:44 MainThread @hw05-v1.py:356] Steps 440012, Test reward: -752.6809121898956
[06-22 04:24:44 MainThread @hw05-v1.py:356] Steps 450084, Test reward: -807.9862501778732
[06-22 04:33:33 MainThread @hw05-v1.py:356] Steps 460239, Test reward: -768.9486142995946
[06-22 04:42:21 MainThread @hw05-v1.py:356] Steps 470023, Test reward: -1451.3082016975764
[06-22 04:51:24 MainThread @hw05-v1.py:356] Steps 480289, Test reward: -704.8861017209724
[06-22 04:59:13 MainThread @hw05-v1.py:356] Steps 490579, Test reward: -1291.0813561074308
[06-22 05:05:43 MainThread @hw05-v1.py:356] Steps 500067, Test reward: -530.2484607244259
[06-22 05:13:54 MainThread @hw05-v1.py:356] Steps 510173, Test reward: -820.6288430870812
[06-22 05:21:31 MainThread @hw05-v1.py:356] Steps 520299, Test reward: -1400.3266448899044
[06-22 05:30:01 MainThread @hw05-v1.py:356] Steps 530041, Test reward: -806.2989194646635
[06-22 05:39:05 MainThread @hw05-v1.py:356] Steps 540897, Test reward: -1068.4452278722636
[06-22 05:47:07 MainThread @hw05-v1.py:356] Steps 550276, Test reward: -1054.9583816353465
[06-22 05:56:06 MainThread @hw05-v1.py:356] Steps 560090, Test reward: -767.2149014931867
[06-22 06:05:04 MainThread @hw05-v1.py:356] Steps 570288, Test reward: -2456.2596330850574
[06-22 06:12:35 MainThread @hw05-v1.py:356] Steps 580026, Test reward: -691.0786157681566
[06-22 06:22:07 MainThread @hw05-v1.py:356] Steps 590252, Test reward: -1587.0018443949732
[06-22 06:30:55 MainThread @hw05-v1.py:356] Steps 600108, Test reward: -455.06304056188753
[06-22 06:39:55 MainThread @hw05-v1.py:356] Steps 610353, Test reward: -303.7387623741239
[06-22 06:47:58 MainThread @hw05-v1.py:356] Steps 620483, Test reward: -805.596918329632
[06-22 06:56:30 MainThread @hw05-v1.py:356] Steps 630069, Test reward: -633.3636558366767
[06-22 07:04:58 MainThread @hw05-v1.py:356] Steps 640122, Test reward: -710.6246475376696
[06-22 07:11:53 MainThread @hw05-v1.py:356] Steps 650129, Test reward: 558.4131329715196
[06-22 07:20:04 MainThread @hw05-v1.py:356] Steps 660135, Test reward: -920.5650848225225
[06-22 07:27:47 MainThread @hw05-v1.py:356] Steps 670635, Test reward: -426.45576167417977
[06-22 07:36:03 MainThread @hw05-v1.py:356] Steps 680015, Test reward: 8.559098365941827
[06-22 07:45:11 MainThread @hw05-v1.py:356] Steps 690236, Test reward: -829.4398190762283
[06-22 07:54:04 MainThread @hw05-v1.py:356] Steps 700103, Test reward: -475.40376050814956
[06-22 08:02:08 MainThread @hw05-v1.py:356] Steps 710486, Test reward: -312.862535134536
[06-22 08:08:59 MainThread @hw05-v1.py:356] Steps 720032, Test reward: -1217.7179468702202
[06-22 08:16:11 MainThread @hw05-v1.py:356] Steps 730240, Test reward: -1060.652994290966
[06-22 08:23:20 MainThread @hw05-v1.py:356] Steps 740531, Test reward: -1018.2728166702469
[06-22 08:30:43 MainThread @hw05-v1.py:356] Steps 750346, Test reward: 325.6599341150508
[06-22 08:37:39 MainThread @hw05-v1.py:356] Steps 760275, Test reward: -152.86613264933024
[06-22 08:44:38 MainThread @hw05-v1.py:356] Steps 770171, Test reward: -233.53009630935412
[06-22 08:52:41 MainThread @hw05-v1.py:356] Steps 780086, Test reward: -375.0858888911174
[06-22 09:00:58 MainThread @hw05-v1.py:356] Steps 790205, Test reward: 332.4173254451142
[06-22 09:09:26 MainThread @hw05-v1.py:356] Steps 800037, Test reward: 884.1683598422962
[06-22 09:19:13 MainThread @hw05-v1.py:356] Steps 810716, Test reward: -776.8838620391595
[06-22 09:28:06 MainThread @hw05-v1.py:356] Steps 820328, Test reward: -602.9508595685186
[06-22 09:36:00 MainThread @hw05-v1.py:356] Steps 830335, Test reward: 934.4122438078491
[06-22 09:42:55 MainThread @hw05-v1.py:356] Steps 840207, Test reward: -404.8603169363038
[06-22 09:50:42 MainThread @hw05-v1.py:356] Steps 850908, Test reward: -369.17348402519417
[06-22 09:57:27 MainThread @hw05-v1.py:356] Steps 860623, Test reward: 954.1124968089231
[06-22 10:02:03 MainThread @hw05-v1.py:356] Steps 870415, Test reward: 551.8287705292196
[06-22 10:08:46 MainThread @hw05-v1.py:356] Steps 880563, Test reward: 1028.020564888524
[06-22 10:16:36 MainThread @hw05-v1.py:356] Steps 890337, Test reward: -2365.1029547597363
[06-22 10:23:01 MainThread @hw05-v1.py:356] Steps 900288, Test reward: 777.7792214682383
[06-22 10:28:21 MainThread @hw05-v1.py:356] Steps 910180, Test reward: -2751.4257469524987
[06-22 10:33:34 MainThread @hw05-v1.py:356] Steps 920669, Test reward: -327.49938724541346
[06-22 10:40:34 MainThread @hw05-v1.py:356] Steps 930629, Test reward: 2447.089597802201
[06-22 10:49:23 MainThread @hw05-v1.py:356] Steps 940832, Test reward: 2076.544492212057
[06-22 10:54:22 MainThread @hw05-v1.py:356] Steps 950275, Test reward: 1106.9466597484297
[06-22 10:58:49 MainThread @hw05-v1.py:356] Steps 960673, Test reward: 1557.210010659887
[06-22 11:05:44 MainThread @hw05-v1.py:356] Steps 970491, Test reward: 4483.829704634486
[06-22 11:10:36 MainThread @hw05-v1.py:356] Steps 980471, Test reward: 2818.799495587752
[06-22 11:15:11 MainThread @hw05-v1.py:356] Steps 990303, Test reward: 819.4551919747184
[06-22 11:19:54 MainThread @hw05-v1.py:356] Steps 1000953, Test reward: 223.748957332402
[06-22 11:20:28 MainThread @hw05-v1.py:380] Evaluate reward: -471.1924942243069


Case 2
ACTOR_MODEL_SIZE = 192
CRITIC_MODEL_SIZE = 192

[06-21 22:38:20 MainThread @hw05-v1.py:356] Steps 148, Test reward: -1717.3668349145871
[06-21 22:39:34 MainThread @hw05-v1.py:356] Steps 10140, Test reward: -731.2550947459227
[06-21 22:45:57 MainThread @hw05-v1.py:356] Steps 20081, Test reward: -676.023075149617
[06-21 22:53:34 MainThread @hw05-v1.py:356] Steps 30227, Test reward: -4157.450209377154
[06-21 23:00:45 MainThread @hw05-v1.py:356] Steps 40000, Test reward: -2276.555661727213
[06-21 23:07:56 MainThread @hw05-v1.py:356] Steps 50196, Test reward: -2527.6334012049433
[06-21 23:15:34 MainThread @hw05-v1.py:356] Steps 60189, Test reward: -1370.5313844760362
[06-21 23:23:27 MainThread @hw05-v1.py:356] Steps 70185, Test reward: -3587.203217120275
[06-21 23:31:11 MainThread @hw05-v1.py:356] Steps 80137, Test reward: -2348.620619088043
[06-21 23:38:23 MainThread @hw05-v1.py:356] Steps 90073, Test reward: -3970.8341325316105
[06-21 23:44:19 MainThread @hw05-v1.py:356] Steps 100067, Test reward: -1017.0264217728585
[06-21 23:50:58 MainThread @hw05-v1.py:356] Steps 110221, Test reward: -705.4908618832517
[06-21 23:58:23 MainThread @hw05-v1.py:356] Steps 120158, Test reward: -3322.520271807424
[06-22 00:06:15 MainThread @hw05-v1.py:356] Steps 130103, Test reward: -494.75957869087915
[06-22 00:13:10 MainThread @hw05-v1.py:356] Steps 140250, Test reward: -188.31484708172115
[06-22 00:19:43 MainThread @hw05-v1.py:356] Steps 150156, Test reward: -1179.5468735074055
[06-22 00:26:46 MainThread @hw05-v1.py:356] Steps 160558, Test reward: -2105.0047050208186
[06-22 00:32:57 MainThread @hw05-v1.py:356] Steps 170106, Test reward: -397.1974326416874
[06-22 00:39:49 MainThread @hw05-v1.py:356] Steps 180085, Test reward: -859.7878862073369
[06-22 00:47:10 MainThread @hw05-v1.py:356] Steps 190134, Test reward: -2364.7298629554048
[06-22 00:55:03 MainThread @hw05-v1.py:356] Steps 200410, Test reward: -380.49324696520205
[06-22 01:02:44 MainThread @hw05-v1.py:356] Steps 210345, Test reward: -462.1609777132362
[06-22 01:11:06 MainThread @hw05-v1.py:356] Steps 220506, Test reward: -3596.2764116029953
[06-22 01:17:59 MainThread @hw05-v1.py:356] Steps 230011, Test reward: -1216.3878118760933
[06-22 01:25:50 MainThread @hw05-v1.py:356] Steps 240165, Test reward: -33.21578552824526
[06-22 01:33:45 MainThread @hw05-v1.py:356] Steps 250274, Test reward: -542.5137175055959
[06-22 01:40:58 MainThread @hw05-v1.py:356] Steps 260851, Test reward: -683.3782534978504
[06-22 01:47:31 MainThread @hw05-v1.py:356] Steps 270016, Test reward: -1444.1323692197534
[06-22 01:53:39 MainThread @hw05-v1.py:356] Steps 280054, Test reward: -851.6494904818616
[06-22 02:01:03 MainThread @hw05-v1.py:356] Steps 290240, Test reward: -1384.9305054633755
[06-22 02:10:03 MainThread @hw05-v1.py:356] Steps 300958, Test reward: -1847.1301036365498
[06-22 02:17:34 MainThread @hw05-v1.py:356] Steps 310054, Test reward: -1280.3735055212867
[06-22 02:24:34 MainThread @hw05-v1.py:356] Steps 320207, Test reward: -1880.6913541011177
[06-22 02:31:19 MainThread @hw05-v1.py:356] Steps 330077, Test reward: 20.397941845201252
[06-22 02:39:06 MainThread @hw05-v1.py:356] Steps 340038, Test reward: -1132.1675919686554
[06-22 02:46:49 MainThread @hw05-v1.py:356] Steps 350230, Test reward: 204.71506857634395
[06-22 02:54:34 MainThread @hw05-v1.py:356] Steps 360193, Test reward: -906.246050077317
[06-22 03:02:38 MainThread @hw05-v1.py:356] Steps 370553, Test reward: -815.3717057462736
[06-22 03:10:02 MainThread @hw05-v1.py:356] Steps 380243, Test reward: -74.13008300871795
[06-22 03:18:20 MainThread @hw05-v1.py:356] Steps 390475, Test reward: -715.4623355339812
[06-22 03:26:13 MainThread @hw05-v1.py:356] Steps 400152, Test reward: -96.95510277015669
[06-22 03:33:01 MainThread @hw05-v1.py:356] Steps 410637, Test reward: 70.12802635070916
[06-22 03:40:57 MainThread @hw05-v1.py:356] Steps 420257, Test reward: -1185.3246305961684
[06-22 03:49:06 MainThread @hw05-v1.py:356] Steps 430329, Test reward: 58.411485243209746
[06-22 03:57:00 MainThread @hw05-v1.py:356] Steps 440236, Test reward: -1066.7379612567852
[06-22 04:04:45 MainThread @hw05-v1.py:356] Steps 450351, Test reward: -309.9663108325991
[06-22 04:11:10 MainThread @hw05-v1.py:356] Steps 460321, Test reward: 365.7464389993108
[06-22 04:19:09 MainThread @hw05-v1.py:356] Steps 470441, Test reward: -211.53593599324958
[06-22 04:27:30 MainThread @hw05-v1.py:356] Steps 480530, Test reward: -134.82163889448105
[06-22 04:35:27 MainThread @hw05-v1.py:356] Steps 490196, Test reward: -13.72098350273293
[06-22 04:43:46 MainThread @hw05-v1.py:356] Steps 500349, Test reward: -358.2226917279892
[06-22 04:51:13 MainThread @hw05-v1.py:356] Steps 510114, Test reward: 287.07997234721813
[06-22 04:59:53 MainThread @hw05-v1.py:356] Steps 520507, Test reward: 134.8262475366188
[06-22 05:07:18 MainThread @hw05-v1.py:356] Steps 530130, Test reward: -728.5110112959326
[06-22 05:14:31 MainThread @hw05-v1.py:356] Steps 540624, Test reward: 462.80591757717696
[06-22 05:22:15 MainThread @hw05-v1.py:356] Steps 550159, Test reward: 293.13089979247445
[06-22 05:30:25 MainThread @hw05-v1.py:356] Steps 560420, Test reward: -169.99895386442182
[06-22 05:38:00 MainThread @hw05-v1.py:356] Steps 570184, Test reward: -163.51095525221416
[06-22 05:46:41 MainThread @hw05-v1.py:356] Steps 580267, Test reward: 82.20299183502817
[06-22 05:55:01 MainThread @hw05-v1.py:356] Steps 590302, Test reward: -368.9409631327404
[06-22 06:04:09 MainThread @hw05-v1.py:356] Steps 600854, Test reward: 141.51059791061024
[06-22 06:12:12 MainThread @hw05-v1.py:356] Steps 610877, Test reward: -77.87665733254124
[06-22 06:19:08 MainThread @hw05-v1.py:356] Steps 620631, Test reward: -380.3258041097182
[06-22 06:27:19 MainThread @hw05-v1.py:356] Steps 630103, Test reward: -1085.9264597400334
[06-22 06:35:45 MainThread @hw05-v1.py:356] Steps 640196, Test reward: 113.28467570827206
[06-22 06:43:49 MainThread @hw05-v1.py:356] Steps 650546, Test reward: -463.74301197090307
[06-22 06:51:50 MainThread @hw05-v1.py:356] Steps 660137, Test reward: -686.3230828903294
[06-22 07:00:09 MainThread @hw05-v1.py:356] Steps 670208, Test reward: 1102.230326997474
[06-22 07:08:13 MainThread @hw05-v1.py:356] Steps 680029, Test reward: -1754.2805393732856
[06-22 07:17:04 MainThread @hw05-v1.py:356] Steps 690326, Test reward: -3131.3936964381337
[06-22 07:24:43 MainThread @hw05-v1.py:356] Steps 700087, Test reward: 503.1533481252679
[06-22 07:32:53 MainThread @hw05-v1.py:356] Steps 710241, Test reward: -162.34585841407866
[06-22 07:41:28 MainThread @hw05-v1.py:356] Steps 720599, Test reward: 1483.727660780239
[06-22 07:49:37 MainThread @hw05-v1.py:356] Steps 730178, Test reward: -2462.5498339460855
[06-22 07:57:28 MainThread @hw05-v1.py:356] Steps 740047, Test reward: 3166.77009277287
[06-22 08:04:55 MainThread @hw05-v1.py:356] Steps 750417, Test reward: 3811.543817567684
[06-22 08:12:42 MainThread @hw05-v1.py:356] Steps 760903, Test reward: 660.8475093601903
[06-22 08:18:36 MainThread @hw05-v1.py:356] Steps 770197, Test reward: 228.00324345315502
[06-22 08:26:53 MainThread @hw05-v1.py:356] Steps 780708, Test reward: 1968.4488297758094
[06-22 08:35:31 MainThread @hw05-v1.py:356] Steps 790676, Test reward: 2078.7968506608804
[06-22 08:42:50 MainThread @hw05-v1.py:356] Steps 800868, Test reward: 2359.7511434415874
[06-22 08:50:40 MainThread @hw05-v1.py:356] Steps 810577, Test reward: 4350.50017038279
[06-22 08:58:59 MainThread @hw05-v1.py:356] Steps 820865, Test reward: 1584.7545270675164
[06-22 09:07:16 MainThread @hw05-v1.py:356] Steps 830584, Test reward: 2812.7140739207794
[06-22 09:15:30 MainThread @hw05-v1.py:356] Steps 840196, Test reward: 3453.78322160048
[06-22 09:23:50 MainThread @hw05-v1.py:356] Steps 850845, Test reward: 3727.6258307068324
[06-22 09:31:24 MainThread @hw05-v1.py:356] Steps 860453, Test reward: 1930.718369194158
[06-22 09:38:22 MainThread @hw05-v1.py:356] Steps 870514, Test reward: 3036.930599202952
[06-22 09:44:46 MainThread @hw05-v1.py:356] Steps 880031, Test reward: 3559.0976048697325
[06-22 09:51:37 MainThread @hw05-v1.py:356] Steps 890031, Test reward: 3602.762257351339
[06-22 09:58:10 MainThread @hw05-v1.py:356] Steps 900935, Test reward: 2471.4515931569827
[06-22 10:02:38 MainThread @hw05-v1.py:356] Steps 910824, Test reward: 3447.6730598731046
[06-22 10:08:16 MainThread @hw05-v1.py:356] Steps 920867, Test reward: 3570.195233657163
[06-22 10:14:18 MainThread @hw05-v1.py:356] Steps 930867, Test reward: 4056.704182575736
[06-22 10:20:34 MainThread @hw05-v1.py:356] Steps 940797, Test reward: 5625.240856460428
[06-22 10:28:27 MainThread @hw05-v1.py:356] Steps 950797, Test reward: 5470.545896695958
[06-22 10:35:37 MainThread @hw05-v1.py:356] Steps 960633, Test reward: 3562.69288026089
[06-22 10:42:11 MainThread @hw05-v1.py:356] Steps 970633, Test reward: 4591.598849029477
[06-22 10:48:11 MainThread @hw05-v1.py:356] Steps 980621, Test reward: 5305.803331356341
[06-22 10:55:46 MainThread @hw05-v1.py:356] Steps 990621, Test reward: 5826.029451769832
[06-22 11:03:36 MainThread @hw05-v1.py:356] Steps 1000417, Test reward: 3756.173821592451
[06-22 11:04:51 MainThread @hw05-v1.py:380] Evaluate reward: 4827.196838272286


Case 3
ACTOR_MODEL_SIZE = 128
CRITIC_MODEL_SIZE = 128

[06-21 22:40:15 MainThread @hw05-v1.py:356] Steps 120, Test reward: -908.3218089235846
[06-21 22:42:59 MainThread @hw05-v1.py:356] Steps 10069, Test reward: -1525.8710374382417
[06-21 22:50:00 MainThread @hw05-v1.py:356] Steps 20094, Test reward: -2914.383594368284
[06-21 22:56:59 MainThread @hw05-v1.py:356] Steps 30226, Test reward: -5239.132011875457
[06-21 23:03:28 MainThread @hw05-v1.py:356] Steps 40186, Test reward: -1255.2027066238657
[06-21 23:09:44 MainThread @hw05-v1.py:356] Steps 50178, Test reward: -2171.185143939497
[06-21 23:16:56 MainThread @hw05-v1.py:356] Steps 60207, Test reward: -2699.42115988864
[06-21 23:24:00 MainThread @hw05-v1.py:356] Steps 70069, Test reward: 298.89371076524657
[06-21 23:31:38 MainThread @hw05-v1.py:356] Steps 80088, Test reward: -3724.0743993374717
[06-21 23:39:34 MainThread @hw05-v1.py:356] Steps 90672, Test reward: -5519.664003506499
[06-21 23:46:30 MainThread @hw05-v1.py:356] Steps 100072, Test reward: -2255.685452339715
[06-21 23:53:09 MainThread @hw05-v1.py:356] Steps 110116, Test reward: -922.0366582723257
[06-22 00:00:27 MainThread @hw05-v1.py:356] Steps 120634, Test reward: -792.8609030746959
[06-22 00:06:32 MainThread @hw05-v1.py:356] Steps 130224, Test reward: -2465.3817300764013
[06-22 00:13:51 MainThread @hw05-v1.py:356] Steps 140211, Test reward: -5396.344735623849
[06-22 00:21:07 MainThread @hw05-v1.py:356] Steps 150308, Test reward: -671.0581603945426
[06-22 00:28:27 MainThread @hw05-v1.py:356] Steps 160263, Test reward: -937.935077040227
[06-22 00:36:09 MainThread @hw05-v1.py:356] Steps 170690, Test reward: -1347.0322225476011
[06-22 00:43:19 MainThread @hw05-v1.py:356] Steps 180100, Test reward: -1005.1139465656012
[06-22 00:50:25 MainThread @hw05-v1.py:356] Steps 190646, Test reward: -2432.039822061076
[06-22 00:57:28 MainThread @hw05-v1.py:356] Steps 200406, Test reward: -2788.0741976894356
[06-22 01:05:26 MainThread @hw05-v1.py:356] Steps 210657, Test reward: -3621.407388547939
[06-22 01:13:03 MainThread @hw05-v1.py:356] Steps 220638, Test reward: -3510.922076579684
[06-22 01:20:19 MainThread @hw05-v1.py:356] Steps 230145, Test reward: -3337.785932180954
[06-22 01:28:07 MainThread @hw05-v1.py:356] Steps 240488, Test reward: -1933.2211315655163
[06-22 01:35:01 MainThread @hw05-v1.py:356] Steps 250067, Test reward: -915.5610299472179
[06-22 01:43:03 MainThread @hw05-v1.py:356] Steps 260692, Test reward: -2248.270606938438
[06-22 01:50:40 MainThread @hw05-v1.py:356] Steps 270908, Test reward: -2198.1989558061446
[06-22 01:57:18 MainThread @hw05-v1.py:356] Steps 280666, Test reward: -276.9215391843842
[06-22 02:04:10 MainThread @hw05-v1.py:356] Steps 290129, Test reward: -1016.268009167807
[06-22 02:10:30 MainThread @hw05-v1.py:356] Steps 300092, Test reward: -2156.0134235942573
[06-22 02:16:24 MainThread @hw05-v1.py:356] Steps 310040, Test reward: -2617.280237421796
[06-22 02:22:49 MainThread @hw05-v1.py:356] Steps 320577, Test reward: -3402.6822733488707
[06-22 02:29:41 MainThread @hw05-v1.py:356] Steps 330294, Test reward: -977.3413776423524
[06-22 02:37:33 MainThread @hw05-v1.py:356] Steps 340503, Test reward: -2669.728557690949
[06-22 02:44:50 MainThread @hw05-v1.py:356] Steps 350366, Test reward: -696.2066327979371
[06-22 02:52:02 MainThread @hw05-v1.py:356] Steps 360319, Test reward: -802.0135012464773
[06-22 02:59:30 MainThread @hw05-v1.py:356] Steps 370218, Test reward: -1421.6957996536892
[06-22 03:06:50 MainThread @hw05-v1.py:356] Steps 380583, Test reward: -576.1689844263731
[06-22 03:12:22 MainThread @hw05-v1.py:356] Steps 390313, Test reward: -601.6169738222961
[06-22 03:18:45 MainThread @hw05-v1.py:356] Steps 400802, Test reward: -1065.2237051737218
[06-22 03:25:23 MainThread @hw05-v1.py:356] Steps 410494, Test reward: -1950.830775478746
[06-22 03:30:59 MainThread @hw05-v1.py:356] Steps 420194, Test reward: -162.38443239908426
[06-22 03:38:01 MainThread @hw05-v1.py:356] Steps 430310, Test reward: -836.7553265934628
[06-22 03:43:47 MainThread @hw05-v1.py:356] Steps 440035, Test reward: -185.21144352090067
[06-22 03:51:23 MainThread @hw05-v1.py:356] Steps 450583, Test reward: -438.46667220175294
[06-22 03:58:39 MainThread @hw05-v1.py:356] Steps 460014, Test reward: 423.99257969852687
[06-22 04:06:44 MainThread @hw05-v1.py:356] Steps 470250, Test reward: -135.47777783830415
[06-22 04:14:04 MainThread @hw05-v1.py:356] Steps 480055, Test reward: -2390.9681666547185
[06-22 04:21:34 MainThread @hw05-v1.py:356] Steps 490387, Test reward: -580.6672220963708
[06-22 04:29:08 MainThread @hw05-v1.py:356] Steps 500536, Test reward: -1031.2108391192812
[06-22 04:36:22 MainThread @hw05-v1.py:356] Steps 510196, Test reward: -88.41495603803075
[06-22 04:42:06 MainThread @hw05-v1.py:356] Steps 520123, Test reward: 62.63623259589326
[06-22 04:48:40 MainThread @hw05-v1.py:356] Steps 530217, Test reward: -2906.5564903272925
[06-22 04:55:46 MainThread @hw05-v1.py:356] Steps 540279, Test reward: -59.82264506081666
[06-22 05:02:15 MainThread @hw05-v1.py:356] Steps 550429, Test reward: 497.2237613311712
[06-22 05:09:05 MainThread @hw05-v1.py:356] Steps 560771, Test reward: -36.0384040661141
[06-22 05:14:56 MainThread @hw05-v1.py:356] Steps 570394, Test reward: -511.4686241002726
[06-22 05:21:39 MainThread @hw05-v1.py:356] Steps 580819, Test reward: -560.8192591143123
[06-22 05:28:19 MainThread @hw05-v1.py:356] Steps 590368, Test reward: 691.8300830623207
[06-22 05:35:43 MainThread @hw05-v1.py:356] Steps 600783, Test reward: 305.52107191024766
[06-22 05:43:04 MainThread @hw05-v1.py:356] Steps 610502, Test reward: -617.5818523077432
[06-22 05:50:06 MainThread @hw05-v1.py:356] Steps 620162, Test reward: 291.52268448895404
[06-22 05:57:53 MainThread @hw05-v1.py:356] Steps 630242, Test reward: 1395.6260486467843
[06-22 06:04:22 MainThread @hw05-v1.py:356] Steps 640124, Test reward: 773.2846804473193
[06-22 06:11:35 MainThread @hw05-v1.py:356] Steps 650268, Test reward: 1066.4346815397673
[06-22 06:17:48 MainThread @hw05-v1.py:356] Steps 660203, Test reward: 1580.9164978723757
[06-22 06:25:20 MainThread @hw05-v1.py:356] Steps 670169, Test reward: 1498.8405361544385
[06-22 06:31:15 MainThread @hw05-v1.py:356] Steps 680004, Test reward: -32.562685655075846
[06-22 06:38:36 MainThread @hw05-v1.py:356] Steps 690490, Test reward: 476.7426738862654
[06-22 06:45:49 MainThread @hw05-v1.py:356] Steps 700041, Test reward: 967.6726202935939
[06-22 06:53:09 MainThread @hw05-v1.py:356] Steps 710235, Test reward: 3571.5434983207283
[06-22 07:00:56 MainThread @hw05-v1.py:356] Steps 720300, Test reward: 3240.908714955565
[06-22 07:07:57 MainThread @hw05-v1.py:356] Steps 730751, Test reward: 2154.106349404805
[06-22 07:13:35 MainThread @hw05-v1.py:356] Steps 740484, Test reward: 1420.9584153029305
[06-22 07:20:56 MainThread @hw05-v1.py:356] Steps 750436, Test reward: 2502.9684285028316
[06-22 07:27:26 MainThread @hw05-v1.py:356] Steps 760427, Test reward: 5004.882724318376
[06-22 07:35:39 MainThread @hw05-v1.py:356] Steps 770809, Test reward: 3362.9364328941656
[06-22 07:42:20 MainThread @hw05-v1.py:356] Steps 780615, Test reward: 1927.0035251081222
[06-22 07:49:41 MainThread @hw05-v1.py:356] Steps 790731, Test reward: 1115.8668800859655
[06-22 07:57:00 MainThread @hw05-v1.py:356] Steps 800138, Test reward: 2258.0689716793595
[06-22 08:04:45 MainThread @hw05-v1.py:356] Steps 810364, Test reward: 985.13563908015
[06-22 08:11:21 MainThread @hw05-v1.py:356] Steps 820890, Test reward: -595.6799526944831
[06-22 08:18:55 MainThread @hw05-v1.py:356] Steps 830963, Test reward: 441.21013879154106
[06-22 08:26:21 MainThread @hw05-v1.py:356] Steps 840176, Test reward: 3172.0486084676877
[06-22 08:32:22 MainThread @hw05-v1.py:356] Steps 850114, Test reward: -1459.8211147387206
[06-22 08:38:31 MainThread @hw05-v1.py:356] Steps 860683, Test reward: -194.5917140867492
[06-22 08:44:27 MainThread @hw05-v1.py:356] Steps 870562, Test reward: 1667.1862104162792
[06-22 08:51:08 MainThread @hw05-v1.py:356] Steps 880868, Test reward: 296.77362545718535
[06-22 08:57:25 MainThread @hw05-v1.py:356] Steps 890094, Test reward: 555.5232515944328
[06-22 09:04:22 MainThread @hw05-v1.py:356] Steps 900400, Test reward: -1177.820764429534
[06-22 09:11:53 MainThread @hw05-v1.py:356] Steps 910384, Test reward: 12.703656818721583
[06-22 09:19:52 MainThread @hw05-v1.py:356] Steps 920455, Test reward: -3633.1080862607514
[06-22 09:27:12 MainThread @hw05-v1.py:356] Steps 930616, Test reward: 1942.2243036188004
[06-22 09:34:39 MainThread @hw05-v1.py:356] Steps 940057, Test reward: -336.8532937182439
[06-22 09:42:39 MainThread @hw05-v1.py:356] Steps 950384, Test reward: -1249.729168577946
[06-22 09:49:26 MainThread @hw05-v1.py:356] Steps 960362, Test reward: -2249.753554248257
[06-22 09:55:07 MainThread @hw05-v1.py:356] Steps 970210, Test reward: -823.4084738421528
[06-22 10:01:13 MainThread @hw05-v1.py:356] Steps 980764, Test reward: -1520.6836013534094
[06-22 10:08:33 MainThread @hw05-v1.py:356] Steps 990332, Test reward: -1149.2414109302267
[06-22 10:15:52 MainThread @hw05-v1.py:356] Steps 1000017, Test reward: -1295.5062268977454
[06-22 10:17:00 MainThread @hw05-v1.py:380] Evaluate reward: 243.17577703374013


Case 4
ACTOR_MODEL_SIZE = 96
CRITIC_MODEL_SIZE = 96

[06-21 22:42:42 MainThread @hw05-v1.py:356] Steps 199, Test reward: -1098.6922514055498
[06-21 22:45:23 MainThread @hw05-v1.py:356] Steps 10076, Test reward: -1707.3351772351841
[06-21 22:51:50 MainThread @hw05-v1.py:356] Steps 20004, Test reward: -2297.23724552508
[06-21 22:58:34 MainThread @hw05-v1.py:356] Steps 30054, Test reward: -2014.961154587177
[06-21 23:05:29 MainThread @hw05-v1.py:356] Steps 40321, Test reward: -5569.42058897087
[06-21 23:11:46 MainThread @hw05-v1.py:356] Steps 50197, Test reward: -2137.214796088984
[06-21 23:19:02 MainThread @hw05-v1.py:356] Steps 60188, Test reward: -4553.901672352282
[06-21 23:25:57 MainThread @hw05-v1.py:356] Steps 70032, Test reward: -3212.6783639184146
[06-21 23:32:51 MainThread @hw05-v1.py:356] Steps 80082, Test reward: -693.5175824806996
[06-21 23:39:46 MainThread @hw05-v1.py:356] Steps 90091, Test reward: -1990.0352969399537
[06-21 23:47:04 MainThread @hw05-v1.py:356] Steps 100218, Test reward: -3651.0994446924246
[06-21 23:53:15 MainThread @hw05-v1.py:356] Steps 110428, Test reward: -3012.1004909933035
[06-21 23:59:27 MainThread @hw05-v1.py:356] Steps 120296, Test reward: -587.302653588665
[06-22 00:06:16 MainThread @hw05-v1.py:356] Steps 130215, Test reward: -1994.5828498680385
[06-22 00:13:11 MainThread @hw05-v1.py:356] Steps 140234, Test reward: -753.2002981459527
[06-22 00:19:37 MainThread @hw05-v1.py:356] Steps 150034, Test reward: -1445.5102605040636
[06-22 00:25:50 MainThread @hw05-v1.py:356] Steps 160105, Test reward: -1804.8161801208075
[06-22 00:32:57 MainThread @hw05-v1.py:356] Steps 170616, Test reward: -5127.134211655926
[06-22 00:40:27 MainThread @hw05-v1.py:356] Steps 180750, Test reward: -2519.425805785317
[06-22 00:47:02 MainThread @hw05-v1.py:356] Steps 190083, Test reward: -1703.293779049818
[06-22 00:53:30 MainThread @hw05-v1.py:356] Steps 200281, Test reward: -408.0099075879602
[06-22 00:59:45 MainThread @hw05-v1.py:356] Steps 210149, Test reward: -746.4955114077364
[06-22 01:06:15 MainThread @hw05-v1.py:356] Steps 220334, Test reward: -3616.9691302565043
[06-22 01:11:54 MainThread @hw05-v1.py:356] Steps 230284, Test reward: -1830.9060135379273
[06-22 01:17:48 MainThread @hw05-v1.py:356] Steps 240003, Test reward: -1316.7274417570284
[06-22 01:25:10 MainThread @hw05-v1.py:356] Steps 250426, Test reward: -2223.845515021142
[06-22 01:31:31 MainThread @hw05-v1.py:356] Steps 260050, Test reward: -2660.0585177372527
[06-22 01:37:24 MainThread @hw05-v1.py:356] Steps 270332, Test reward: -3351.2644747494496
[06-22 01:44:51 MainThread @hw05-v1.py:356] Steps 280314, Test reward: -2037.7914837827175
[06-22 01:51:44 MainThread @hw05-v1.py:356] Steps 290112, Test reward: -1019.1673200940613
[06-22 01:57:52 MainThread @hw05-v1.py:356] Steps 300089, Test reward: -1445.0810167551438
[06-22 02:05:04 MainThread @hw05-v1.py:356] Steps 310308, Test reward: -1014.5602303984936
[06-22 02:11:17 MainThread @hw05-v1.py:356] Steps 320802, Test reward: -1233.3861558984825
[06-22 02:17:45 MainThread @hw05-v1.py:356] Steps 330356, Test reward: -1890.7265039434012
[06-22 02:23:42 MainThread @hw05-v1.py:356] Steps 340384, Test reward: -785.9636596674472
[06-22 02:29:15 MainThread @hw05-v1.py:356] Steps 350123, Test reward: -719.3738803601149
[06-22 02:35:24 MainThread @hw05-v1.py:356] Steps 360621, Test reward: -332.7325008835124
[06-22 02:41:57 MainThread @hw05-v1.py:356] Steps 370032, Test reward: -390.5549375870993
[06-22 02:49:31 MainThread @hw05-v1.py:356] Steps 380008, Test reward: 174.24037211990156
[06-22 02:56:37 MainThread @hw05-v1.py:356] Steps 390081, Test reward: -2160.0319355599445
[06-22 03:02:46 MainThread @hw05-v1.py:356] Steps 400187, Test reward: -268.17754755820505
[06-22 03:08:39 MainThread @hw05-v1.py:356] Steps 410130, Test reward: -774.7979948858634
[06-22 03:14:00 MainThread @hw05-v1.py:356] Steps 420136, Test reward: -508.61058312905834
[06-22 03:19:56 MainThread @hw05-v1.py:356] Steps 430505, Test reward: 1.887079316680115
[06-22 03:25:59 MainThread @hw05-v1.py:356] Steps 440411, Test reward: -676.5310191293636
[06-22 03:32:51 MainThread @hw05-v1.py:356] Steps 450174, Test reward: -67.6732367320163
[06-22 03:39:59 MainThread @hw05-v1.py:356] Steps 460661, Test reward: -138.4761152087226
[06-22 03:45:46 MainThread @hw05-v1.py:356] Steps 470450, Test reward: -1305.8526564230972
[06-22 03:52:32 MainThread @hw05-v1.py:356] Steps 480204, Test reward: 95.41356858829468
[06-22 03:58:45 MainThread @hw05-v1.py:356] Steps 490215, Test reward: 1215.501029323275
[06-22 04:05:15 MainThread @hw05-v1.py:356] Steps 500142, Test reward: -583.8527126745339
[06-22 04:11:09 MainThread @hw05-v1.py:356] Steps 510656, Test reward: 248.06814980818163
[06-22 04:17:18 MainThread @hw05-v1.py:356] Steps 520101, Test reward: -1228.63315862918
[06-22 04:24:47 MainThread @hw05-v1.py:356] Steps 530822, Test reward: -1038.9195499385048
[06-22 04:31:39 MainThread @hw05-v1.py:356] Steps 540109, Test reward: 1382.7097273907652
[06-22 04:38:16 MainThread @hw05-v1.py:356] Steps 550084, Test reward: 2064.9340855870105
[06-22 04:44:13 MainThread @hw05-v1.py:356] Steps 560552, Test reward: 1055.1614001539108
[06-22 04:51:40 MainThread @hw05-v1.py:356] Steps 570982, Test reward: 1771.8507640457142
[06-22 04:57:32 MainThread @hw05-v1.py:356] Steps 580095, Test reward: 2608.800004284003
[06-22 05:03:42 MainThread @hw05-v1.py:356] Steps 590487, Test reward: 631.4017390550938
[06-22 05:10:14 MainThread @hw05-v1.py:356] Steps 600374, Test reward: 2266.460702273034
[06-22 05:18:03 MainThread @hw05-v1.py:356] Steps 610729, Test reward: 4511.914807150148
[06-22 05:25:22 MainThread @hw05-v1.py:356] Steps 620127, Test reward: 4525.702096996129
[06-22 05:32:29 MainThread @hw05-v1.py:356] Steps 630417, Test reward: 3479.996635827585
[06-22 05:39:22 MainThread @hw05-v1.py:356] Steps 640241, Test reward: 2749.7036808522053
[06-22 05:46:21 MainThread @hw05-v1.py:356] Steps 650422, Test reward: 540.0892372505944
[06-22 05:53:06 MainThread @hw05-v1.py:356] Steps 660018, Test reward: 1702.7190072808062
[06-22 06:00:16 MainThread @hw05-v1.py:356] Steps 670097, Test reward: 1205.3174132364377
[06-22 06:06:34 MainThread @hw05-v1.py:356] Steps 680593, Test reward: 1293.2328526655856
[06-22 06:13:06 MainThread @hw05-v1.py:356] Steps 690311, Test reward: 711.3617338111256
[06-22 06:20:54 MainThread @hw05-v1.py:356] Steps 700521, Test reward: 2518.672060461336
[06-22 06:27:46 MainThread @hw05-v1.py:356] Steps 710975, Test reward: 1801.195629293735
[06-22 06:33:49 MainThread @hw05-v1.py:356] Steps 720653, Test reward: 4128.680008569094
[06-22 06:40:39 MainThread @hw05-v1.py:356] Steps 730385, Test reward: 934.868623497982
[06-22 06:47:33 MainThread @hw05-v1.py:356] Steps 740093, Test reward: 3769.571172090582
[06-22 06:54:47 MainThread @hw05-v1.py:356] Steps 750796, Test reward: 2447.307770675349
[06-22 07:02:10 MainThread @hw05-v1.py:356] Steps 760598, Test reward: 2388.5184709005052
[06-22 07:09:44 MainThread @hw05-v1.py:356] Steps 770427, Test reward: 205.72324360575766
[06-22 07:17:33 MainThread @hw05-v1.py:356] Steps 780442, Test reward: 2473.475961858184
[06-22 07:24:18 MainThread @hw05-v1.py:356] Steps 790331, Test reward: 2050.5617741942347
[06-22 07:31:15 MainThread @hw05-v1.py:356] Steps 800595, Test reward: 2363.1817245418033
[06-22 07:38:41 MainThread @hw05-v1.py:356] Steps 810581, Test reward: 1064.0672059738067
[06-22 07:44:26 MainThread @hw05-v1.py:356] Steps 820527, Test reward: 1665.8997688961886
[06-22 07:51:54 MainThread @hw05-v1.py:356] Steps 830456, Test reward: 2989.741171792259
[06-22 07:59:26 MainThread @hw05-v1.py:356] Steps 840261, Test reward: 2930.5262811949588
[06-22 08:07:06 MainThread @hw05-v1.py:356] Steps 850092, Test reward: 2009.7293739980814
[06-22 08:15:00 MainThread @hw05-v1.py:356] Steps 860505, Test reward: 342.18391374741907
[06-22 08:22:34 MainThread @hw05-v1.py:356] Steps 870714, Test reward: 2983.925080627444
[06-22 08:28:17 MainThread @hw05-v1.py:356] Steps 880178, Test reward: 303.0927531722144
[06-22 08:35:57 MainThread @hw05-v1.py:356] Steps 890527, Test reward: 1637.0179016084421
[06-22 08:42:07 MainThread @hw05-v1.py:356] Steps 900396, Test reward: 3215.678900727367
[06-22 08:48:59 MainThread @hw05-v1.py:356] Steps 910622, Test reward: 2547.35767856875
[06-22 08:55:26 MainThread @hw05-v1.py:356] Steps 920325, Test reward: 2880.6803233860423
[06-22 09:02:46 MainThread @hw05-v1.py:356] Steps 930680, Test reward: 4997.080447984204
[06-22 09:10:26 MainThread @hw05-v1.py:356] Steps 940864, Test reward: 4489.2711987065095
[06-22 09:17:22 MainThread @hw05-v1.py:356] Steps 950655, Test reward: 1740.018707105609
[06-22 09:24:47 MainThread @hw05-v1.py:356] Steps 960815, Test reward: 5324.7080512536295
[06-22 09:31:09 MainThread @hw05-v1.py:356] Steps 970269, Test reward: 2749.1778291508895
[06-22 09:37:55 MainThread @hw05-v1.py:356] Steps 980279, Test reward: 1578.198075709337
[06-22 09:44:53 MainThread @hw05-v1.py:356] Steps 990407, Test reward: 1508.8248968233936
[06-22 09:52:17 MainThread @hw05-v1.py:356] Steps 1000596, Test reward: 2285.78992725459
[06-22 09:53:06 MainThread @hw05-v1.py:380] Evaluate reward: 2193.44541582207


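As the logs show, with this set of hyperparameters a SIZE of 192 converged best among the four settings tried (256, 192, 128, 96).
These numbers are offered only as a reference for fellow students.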
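
One way to make the comparison between runs less dependent on eyeballing the logs is to parse the evaluation lines and summarize each run. The following is a minimal sketch, assuming the log format shown above ("Steps N, Test reward: R"). The function names `parse_test_rewards` and `summarize` and the `last_n` parameter are hypothetical helpers introduced here; they are not part of the original training script.

```python
import re
from statistics import mean

# Matches evaluation lines such as:
# "[06-22 09:52:17 MainThread @hw05-v1.py:356] Steps 1000596, Test reward: 2285.78992725459"
LOG_PATTERN = re.compile(
    r"Steps (\d+), Test reward: (-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
)


def parse_test_rewards(log_text):
    """Return a list of (steps, test_reward) tuples found in the log text."""
    return [(int(s), float(r)) for s, r in LOG_PATTERN.findall(log_text)]


def summarize(log_text, last_n=10):
    """Summarize one run: best test reward and mean of the last `last_n` evaluations."""
    records = parse_test_rewards(log_text)
    rewards = [r for _, r in records]
    best_steps, best_reward = max(records, key=lambda rec: rec[1])
    return {
        "evaluations": len(records),
        "best_reward": best_reward,
        "best_steps": best_steps,
        "mean_last_n": mean(rewards[-last_n:]),
    }


if __name__ == "__main__":
    # A few lines copied from the log above, used only as sample input.
    sample = """
[06-22 09:31:09 MainThread @hw05-v1.py:356] Steps 970269, Test reward: 2749.1778291508895
[06-22 09:37:55 MainThread @hw05-v1.py:356] Steps 980279, Test reward: 1578.198075709337
[06-22 09:44:53 MainThread @hw05-v1.py:356] Steps 990407, Test reward: 1508.8248968233936
[06-22 09:52:17 MainThread @hw05-v1.py:356] Steps 1000596, Test reward: 2285.78992725459
"""
    print(summarize(sample, last_n=4))
```

Running `summarize` on the saved log of each SIZE setting and comparing `mean_last_n` gives a rough single-number view of late-training performance. Because test rewards swing widely between evaluations, an average over several evaluations is likely to be more informative than any single line.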


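The hyperparameters used at the time are attached below.
(They are essentially the same as the instructor's BASELINE; only GAMMA was changed.)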

ACTOR_LR = 0.0002 # learning rate for Actor network updates
CRITIC_LR = 0.001 # learning rate for Critic network updates
GAMMA = 0.95 # reward discount factor, typically between 0.9 and 0.999
TAU = 0.001 # soft-update coefficient for syncing target_model parameters with model
MEMORY_SIZE = 1e6 # size of the replay memory; larger values use more RAM
MEMORY_WARMUP_SIZE = 1e4 # experiences to pre-store in replay_memory before sampling batches for the agent to learn from
REWARD_SCALE = 0.01 # reward scaling factor
BATCH_SIZE = 256 # number of samples per learn step, drawn at random from replay memory
TRAIN_TOTAL_STEPS = 1e6 # total number of training steps
TEST_EVERY_STEPS = 1e4 # evaluate every N steps; each evaluation averages the reward over 5 episodes
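
To make the roles of TAU and GAMMA more concrete, here is a minimal sketch in plain Python. It assumes the usual soft-update rule, target = (1 - TAU) * target + TAU * model, and the common rule of thumb that a discount factor GAMMA corresponds to an effective horizon of roughly 1 / (1 - GAMMA) steps. The functions `soft_update` and `effective_horizon` are hypothetical illustrations; plain lists of floats stand in for network weights, and the framework's own target-sync mechanism is not shown.

```python
def soft_update(target_params, model_params, tau):
    """Move each target parameter a fraction `tau` of the way toward the model parameter."""
    return [(1.0 - tau) * t + tau * m for t, m in zip(target_params, model_params)]


def effective_horizon(gamma):
    """Rough number of future steps that still carry weight under discount `gamma`."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    TAU = 0.001
    GAMMA = 0.95

    # Toy weights (assumed values for illustration only).
    target = [0.0, 1.0]
    model = [1.0, 3.0]

    # With TAU = 0.001 the target moves only 0.1% of the gap per update.
    print(soft_update(target, model, TAU))

    # GAMMA = 0.95 looks ahead about 20 steps; 0.99 would look ahead about 100.
    print(round(effective_horizon(GAMMA)), round(effective_horizon(0.99)))
```

With a small TAU the target network changes slowly, which is generally intended to stabilize the Critic's learning targets. Lowering GAMMA shortens the horizon the agent optimizes over, so changing it can noticeably alter how test rewards evolve during training.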

All comments (2)
AIStudio810258
#2, replied 2020-06

The data speaks for itself~~

Keep it up! Let's improve together!
aaaaaa
#3, replied 2020-06

I'd suggest publishing this as a public project; the output is too long, haha

0
回复
在@后输入用户全名并按空格结束,可艾特全站任一用户