> Paper pending acceptance at WCCI'20
# Simultaneously Evolving Deep Reinforcement Learning Models via Multifactorial Optimization
## Abstract
In recent years, Multifactorial Optimization (MFO) has attracted a lot of interest in the optimization community. MFO is known for its inherent ability to address multiple complex optimization tasks at the same time, while inter-task information transfer is used to improve their convergence speed. These skills make Multifactorial Evolution appealing for evolving Deep Q-Learning (DQL) models, which is the scenario tackled in this paper. Complex DQL models usually find it difficult to converge to optimal solutions, due to the lack of exploration or sparse rewards. In order to overcome these drawbacks, pre-trained models are commonly used for Transfer Learning, transferring knowledge from the pre-trained domain to the target domain. Moreover, it has been shown that the lack of exploration can be reduced by using meta-heuristic optimization approaches. In this paper we explore the use of the MFO framework to optimize DQL models, comparing MFO with traditional Transfer Learning and metaheuristic approaches in terms of convergence speed and policy quality.
Source code of MFEA used in the paper: [link](https://github.com/HuangLingYu96/MFEA)
## Codification
<div align="center">
<img src="/uploads/19a99a34f98d7d915f6b7082ef83ca5a/Codification.png" width="500" height="300" title="Codification schema">
<img src="/uploads/6f3a6b1a35bf53a8cdc94ab86a6a3883/Codification.png" width="500" height="300" title="Codification schema">
</div>
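
The codification flattens each DQL network's layer weights into a single real-valued chromosome that MFEA can evolve directly. Below is a minimal sketch of this encode/decode step, assuming a small fully-connected network; the layer shapes are illustrative, not the exact architectures used in the paper.

```python
import numpy as np

# Hypothetical layer shapes for a small Cartpole-style MLP:
# (input -> hidden) weights, hidden biases, (hidden -> output) weights, output biases.
LAYER_SHAPES = [(4, 24), (24,), (24, 2), (2,)]

def encode(weights):
    """Flatten a list of per-layer weight arrays into one chromosome vector."""
    return np.concatenate([w.ravel() for w in weights])

def decode(chromosome):
    """Rebuild the per-layer weight arrays from a flat chromosome."""
    weights, i = [], 0
    for shape in LAYER_SHAPES:
        size = int(np.prod(shape))
        weights.append(chromosome[i:i + size].reshape(shape))
        i += size
    return weights
```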
In this work we experiment with MFEA, using it to train/evolve DQL networks. First, each environment is evolved separately (a fitness-evaluation sketch follows the table):
| Evolution | Model Performance |
|:-------------------------:|:-------------------------:|
<img src="/uploads/025f302667cbb45f300790cd1b6486db/1Cartpole.png" width="550" height="300" title="Cartpole"/> | <img src="/uploads/9d0fc2d90b4fbf5cd175d885820de73b/cartpole.gif" width="350" height="250" title="Cartpole"/>
<img src="/uploads/fad775bbbd469e5a7efed2bd30b1caf2/1Acrobot.png" width="550" height="300" title="Acrobot"/> | <img src="/uploads/7e6d535ce7000ed1929309b7b9fd3a50/acrobot.gif" width="250" height="250" title="Acrobot"/>
<img src="/uploads/64a25495d7dded22b96ec959963a7620/1Pendulum.png" width="550" height="300" title="Pendulum"> | <img src="/uploads/2f5ee3bf4abd7acac2a6a863f50b1b89/pendulum.gif" width="250" height="250" title="Pendulum">
<img src="/uploads/15a151b7299ddca5166fe29b3f38f6e9/1Cartpole.png" width="550" height="300" title="Cartpole"/> | <img src="/uploads/55e5e56c801ea181604965edf19302d8/cartpole.gif" width="350" height="250" title="Cartpole"/>
<img src="/uploads/1f55f9240b1078aa0b58b69e1e8c2ff9/1Acrobot.png" width="550" height="300" title="Acrobot"/> | <img src="/uploads/1ce54972bb9223fad1963a2e90f8c796/acrobot.gif" width="250" height="250" title="Acrobot"/>
<img src="/uploads/39372523a1c631f9cf22457063649029/1Pendulum.png" width="550" height="300" title="Pendulum"> | <img src="/uploads/486666ff961f3103bc4bf77f3c2de1fd/pendulum.gif" width="250" height="250" title="Pendulum">
Then, the multi-environment evolution skills are tested (a skill-factor sketch follows the table):
| Evolution | Result |
|:-------------:|:-----:|
| <img src="/uploads/6172d9b8a257b3eb07052433812fcb61/cartpole_all.png" width="550" height="350" title="Cartpole(v0-3)"> | <img src="/uploads/0e4ecbc0e7d3b99be2df024bd86d4469/cartpole4.gif" width="350" height="250" title="Cartpole(v0-3)"> |
| <img src="/uploads/3e4777db8f6877759f77d3589d9c1ced/acrobot_all.png" width="550" height="350" title="Acrobot(v0-3)"> | <img src="/uploads/a7c6a3467d8c70d8d6410104a2d3126a/acrobot4.gif" width="250" height="250" title="Acrobot(v0-3)"> |
| <img src="/uploads/4079af76a88d6537f85ce46632b7ee61/pendulum_all.png" width="550" height="350" title="Pendulum(v0-3)"> | <img src="/uploads/a6036bbc452c960cfc06f0f5a9ea59a9/pendulum4.gif" width="250" height="250" title="Pendulum(v0-3)"> |
| <img src="/uploads/3c06e797826f895c0c8525b9e8b2a28d/cartpole_all.png" width="550" height="350" title="Cartpole(v0-3)"> | <img src="/uploads/e803e57687b90b8ca44079a43c403ccf/cartpole4.gif" width="350" height="250" title="Cartpole(v0-3)"> |
| <img src="/uploads/3bd629190f948c199744b1b30c8f1c7f/acrobot_all.png" width="550" height="350" title="Acrobot(v0-3)"> | <img src="/uploads/092bb71181be6260a0e16140ac9b1ad5/acrobot4.gif" width="250" height="250" title="Acrobot(v0-3)"> |
| <img src="/uploads/ac0fcbcc0cc6a1377306c78315e5b0e6/pendulum_all.png" width="550" height="350" title="Pendulum(v0-3)"> | <img src="/uploads/0bafa527b803585d280457161853a0f8/pendulum4.gif" width="250" height="250" title="Pendulum(v0-3)"> |
MFEA is able to evolve multiple scenarios with the proposed codification, achieving good results. In scenarios like *Pendulum* it finds convergence more difficult and thus obtains worse results.
Finally, the effectiveness of the crossovers is studied. Knowledge transfer in MFEA happens through this mechanism, so it is relevant to check its effectiveness (a mating sketch follows the figure):
<div align="center">
<img src="/uploads/6de521b5bfe0be2efdabdce8b22cfe1d/crossover_matrix.png" width="500" height="375">
<img src="/uploads/69ea886ead6db1be6583482ddaa117e2/crossover_matrix.png" width="500" height="375">
</div>
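
Inter-task transfer happens when parents with different skill factors are allowed to cross over, which MFEA gates with a random mating probability (rmp). The sketch below shows this standard assortative-mating rule; the rmp value, the arithmetic blend, and the Gaussian fallback mutation are illustrative choices, not values taken from the paper.

```python
import numpy as np

RMP = 0.3  # random mating probability (assumed value)

def mate(p1, p2, sf1, sf2, rng=np.random.default_rng()):
    """Produce one child chromosome; inter-task crossover is gated by RMP."""
    if sf1 == sf2 or rng.random() < RMP:
        alpha = rng.random(p1.shape)                    # arithmetic blend crossover
        child = alpha * p1 + (1.0 - alpha) * p2
        child_sf = sf1 if rng.random() < 0.5 else sf2   # child imitates one parent's task
    else:
        child = p1 + rng.normal(0.0, 0.1, p1.shape)     # otherwise, mutate a parent
        child_sf = sf1
    return child, child_sf
```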
## Conclusions