DC Field | Value | Language |
---|---|---|
dc.contributor.author | 김한결 | - |
dc.date.accessioned | 2023-08-31T16:35:28Z | - |
dc.date.available | 2023-08-31T16:35:28Z | - |
dc.date.issued | 2023 | - |
dc.identifier.other | OAK-2015-10227 | - |
dc.identifier.uri | http://postech.dcollection.net/common/orgView/200000690720 | ko_KR |
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/118424 | - |
dc.description | Master | - |
dc.description.abstract | Behavior cloning (BC) has been considered a practical policy constraint for alleviating the value-overestimation problem caused by out-of-distribution (OOD) actions in the offline reinforcement learning (RL) setting. However, it has been reported that BC often suffers from insignificant policy updates due to low-quality data. To overcome this problem, this paper proposes a data-selective approach that prescreens favorable data before learning a policy. Data with positive advantage is first selected to exploit the advantage function, and an advantage-weighted method is then applied to further refine the policy. Finally, we present a new RL+BC algorithm that combines RL with the proposed method, and we suggest practical implementation techniques to resolve the quality-quantity dilemma. The proposed algorithm outperforms state-of-the-art algorithms on continuous-control offline RL benchmarks. | - |
dc.language | eng | - |
dc.publisher | Pohang University of Science and Technology (POSTECH) | - |
dc.title | Data-selective Advantage-weighted Method for Offline Reinforcement Learning | - |
dc.type | Thesis | - |
dc.contributor.college | Department of Convergence IT Engineering | - |
dc.date.degree | 2023-08 | - |
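The abstract's two-stage scheme, prescreening positive-advantage transitions and then applying advantage weighting, might be sketched as below. The function names, the exponential weighting with temperature `beta`, and the toy advantage values are illustrative assumptions for this sketch, not the thesis's actual implementation.

```python
import numpy as np

def select_positive_advantage(advantages, threshold=0.0):
    """Stage 1: return indices of transitions whose advantage exceeds the threshold."""
    return np.nonzero(advantages > threshold)[0]

def advantage_weights(advantages, beta=1.0):
    """Stage 2: exponential advantage weights (normalized), as in
    advantage-weighted regression; beta is an assumed temperature."""
    w = np.exp(advantages / beta)
    return w / w.sum()

# Toy batch of advantage estimates for five stored transitions.
adv = np.array([-0.5, 0.2, 1.0, -0.1, 0.7])
idx = select_positive_advantage(adv)  # keep only positive-advantage data
w = advantage_weights(adv[idx])       # weight the retained transitions
```

Under this sketch, the retained indices and their normalized weights would then drive a weighted BC loss during policy refinement; the threshold/`beta` trade-off mirrors the quality-quantity dilemma the abstract mentions.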
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.