Citing A-MFEA-RL
Aritz D. Martinez, Javier Del Ser, Eneko Osaba and Francisco Herrera, Adaptive Multi-factorial Evolutionary Optimization for Multi-task Reinforcement Learning, 2020.
A-MFEA-RL: Adaptive Multi-factorial Evolutionary Optimization for Multi-task Reinforcement Learning
(ABSTRACT) Evolutionary Computation has largely exhibited its potential to replace conventional learning algorithms in a manifold of Machine Learning tasks, especially those related to unsupervised (clustering) and supervised learning. It has not been until recently that the computational efficiency of evolutionary solvers has been put into perspective for training Reinforcement Learning (RL) models. However, most studies framed in this context so far have considered environments and tasks conceived in isolation, without any exchange of knowledge among related tasks. In this manuscript we present A-MFEA-RL, an adaptive version of the well-known MFEA algorithm whose search and inheritance operators are tailored for multitask RL environments. Specifically, our A-MFEA-RL approach includes crossover and inheritance mechanisms for refining the exchange of genetic material that rely on the multi-layered structure of modern Deep Learning based RL models. In order to assess the performance of the proposed evolutionary multitasking approach, we design an extensive experimental setup comprising different multitask RL environments of varying levels of complexity, comparing them to those furnished by alternative non-evolutionary multitask RL approaches. As concluded from the discussion of the obtained results, A-MFEA-RL not only achieves competitive success rates over the tasks being simultaneously solved, but also fosters the exchange of knowledge among tasks that could be intuitively expected to keep a degree of synergistic relationship.
In this framework, a reformulation of the well-known MFEA/MFEA-II algorithms is introduced. The algorithm is designed so that Multifactorial Optimization can be applied to train neural networks, taking advantage of inter-task similarities by mimicking the traditional model-based Transfer Learning procedure. The adaptation rests on three crucial points:
- Design of the unified space towards favoring model-based Transfer Learning: specifically, aspects such as the neural network architecture, the number of neurons of each layer, and the presence of shared layers among models evolved for each task are taken into account.
- Adapted crossover operator: the crossover operator must support the previous aspects by preventing neural models from exchanging irrelevant information.
- Layer-based Transfer Learning: unlike in traditional means to implement Transfer Learning, the number of layers to be transferred between models evolved for different tasks is autonomously decided by A-MFEA-RL during the search.
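The layer-based exchange above can be illustrated with a minimal sketch. The function below swaps the first `n_transfer` layers (assumed shared in the unified space) between two parent networks; layer shapes, the `layer_crossover` name, and the operator itself are simplified assumptions for illustration, not the exact A-MFEA-RL implementation:

```python
import numpy as np

def layer_crossover(parent_a, parent_b, n_transfer):
    """Hypothetical layer-wise crossover sketch: swap the first
    `n_transfer` (shared) layers between two parents, keeping the
    deeper, task-specific layers of each parent intact."""
    child_a = [w.copy() for w in parent_a]
    child_b = [w.copy() for w in parent_b]
    for i in range(n_transfer):
        child_a[i], child_b[i] = child_b[i].copy(), child_a[i].copy()
    return child_a, child_b

rng = np.random.default_rng(0)
# Two toy policies with identical layer shapes, as the unified space requires.
a = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
b = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
# Transfer only the first layer; the output layers stay task-specific.
ca, cb = layer_crossover(a, b, n_transfer=1)
```

In A-MFEA-RL the number of transferred layers is not fixed as here but adapted during the search.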
The code works on top of Ray. The experimentation carried out considers three scenarios: TOY, MT-10/MT-10-R and MT-50/MT-50-R (results included in the Results section below), where the R suffix denotes randomly initialized episodes, as in the next image:
MT-10-R results
Running the experimentation
It is recommended to use the conda environment provided with the code (mujoco36.yml) for ease:

```shell
conda env create -f mujoco36.yml
conda activate mujoco36
```
A-MFEA-RL depends on Metaworld and MuJoCo (license required). To install Metaworld, please follow the instructions in its repository, or run:

```shell
pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld
```
The experimentation can be replicated by running the RUN_ALL.sh script. In order to run experiments independently:

```shell
python3 exp.py -exp INT -t INT -p STR
```
- `-exp`: Integer. 0 = TOY, 1 = MT-10/MT-10-R, 2 = MT-50/MT-50-R.
- `-t`: Integer. Number of threads used by Ray.
- `-p`: String. Name of the folder under `summary/` where results are saved.
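As a quick illustration of the flags above, the sketch below mirrors the `exp.py` command-line interface with `argparse`; the actual script may define its arguments differently, so treat this as an assumed, simplified parser:

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented exp.py flags."""
    parser = argparse.ArgumentParser(
        description="A-MFEA-RL experiment launcher (sketch)")
    parser.add_argument("-exp", type=int, choices=[0, 1, 2],
                        help="0 = TOY, 1 = MT-10/MT-10-R, 2 = MT-50/MT-50-R")
    parser.add_argument("-t", type=int,
                        help="number of threads used by Ray")
    parser.add_argument("-p", type=str,
                        help="folder under summary/ where results are saved")
    return parser

# Example: MT-10 scenario, 8 Ray threads, results under summary/mt10_demo
args = build_parser().parse_args(["-exp", "1", "-t", "8", "-p", "mt10_demo"])
```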
Results
| Environment name (complexity) | MT-10 A | MT-10 B | MT-10 C | MT-10-R A | MT-10-R B | MT-10-R C | MT-50 A | MT-50 B | MT-50 C | MT-50-R A | MT-50-R B | MT-50-R C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
assembly (H) | - | - | - | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 |
basketball (H) | - | - | - | - | - | - | 0 | 0 | 0 | 22 | 33 | 0 |
bin-picking (H) | - | - | - | - | - | - | 0 | 0 | 0 | 0 | 0 | 11 |
box-close (H) | - | - | - | - | - | - | 44 | 44 | 0 | 22 | 33 | 0 |
button-press-topdown (M) | 100 | 100 | 100 | 100 | 89 | 91 | 100 | 100 | 100 | 100 | 100 | 97 |
button-press-topdown-wall (H) | - | - | - | - | - | - | 67 | 78 | 100 | 67 | 100 | 100 |
button-press (M) | - | - | - | - | - | - | 44 | 67 | 100 | 44 | 55 | 100 |
button-press-wall (H) | - | - | - | - | - | - | 100 | 100 | 100 | 100 | 100 | 98 |
coffee-button (H) | - | - | - | - | - | - | 44 | 78 | 100 | 56 | 89 | 100 |
coffee-pull (M) | - | - | - | - | - | - | 78 | 100 | 0 | 100 | 100 | 70 |
coffee-push (M) | - | - | - | - | - | - | 78 | 89 | 100 | 89 | 89 | 40 |
dial-turn (H) | - | - | - | - | - | - | 100 | 100 | 100 | 100 | 100 | 99 |
disassemble (H) | - | - | - | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 |
door-close (H) | - | - | - | - | - | - | 78 | 56 | 100 | 78 | 55 | 100 |
door-lock (H) | - | - | - | - | - | - | 89 | 100 | 100 | 89 | 89 | 100 |
door-open (H) | 100 | 33 | 100 | 100 | 100 | 100 | 78 | 67 | 100 | 67 | 67 | 100 |
door-unlock (M) | - | - | - | - | - | - | 78 | 89 | 100 | 89 | 100 | 100 |
drawer-close (H) | 100 | 100 | 100 | 100 | 100 | 100 | 79 | 89 | 100 | 67 | 78 | 100 |
drawer-open (H) | 0 | 33 | 100 | 33 | 0 | 99 | 22 | 33 | 100 | 22 | 44 | 98 |
faucet-close (M) | - | - | - | - | - | - | 100 | 67 | 100 | 78 | 44 | 81 |
faucet-open (M) | - | - | - | - | - | - | 89 | 89 | 100 | 89 | 67 | 91 |
hammer (H) | - | - | - | - | - | - | 33 | 56 | 100 | 11 | 67 | 100 |
hand-insert (M) | - | - | - | - | - | - | 100 | 100 | 100 | 100 | 100 | 100 |
handle-press-side (H) | - | - | - | - | - | - | 0 | 11 | 100 | 100 | 33 | 40 |
handle-press (H) | - | - | - | - | - | - | 89 | 78 | 60 | 100 | 78 | 35 |
handle-pull-side (H) | - | - | - | - | - | - | 56 | 67 | 0 | 56 | 89 | 0 |
handle-pull (H) | - | - | - | - | - | - | 89 | 100 | 0 | 78 | 100 | 0 |
lever-pull (M) | - | - | - | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 |
peg-insert-side (H) | 67 | 33 | 0 | 56 | 56 | 0 | 0 | 22 | 0 | 44 | 33 | 0 |
peg-unplug-side (H) | - | - | - | - | - | - | 100 | 100 | 0 | 100 | 100 | 0 |
pick-out-of-hole (H) | - | - | - | - | - | - | 0 | 0 | 0 | 0 | 0 | 0 |
pick-place (H) | 66 | 100 | 0 | 0 | 0 | 0 | 44 | 11 | 0 | 33 | 11 | 0 |
pick-place-wall (H) | - | - | - | - | - | - | 44 | 33 | 0 | 33 | 0 | 10 |
plate-slide-back-side (M) | - | - | - | - | - | - | 100 | 89 | 40 | 78 | 89 | 45 |
plate-slide-back (M) | - | - | - | - | - | - | 67 | 89 | 100 | 89 | 100 | 58 |
plate-slide-side (M) | - | - | - | - | - | - | 100 | 89 | 100 | 55 | 100 | 100 |
plate-slide (M) | - | - | - | - | - | - | 33 | 100 | 100 | 78 | 78 | 77 |
push-back (E) | - | - | - | - | - | - | 89 | 100 | 0 | 89 | 100 | 71 |
push (E) | 100 | 100 | 100 | 78 | 67 | 59 | 44 | 89 | 100 | 78 | 33 | 47 |
push-wall (M) | - | - | - | - | - | - | 56 | 33 | 100 | 55 | 44 | 47 |
reach (E) | 100 | 100 | 100 | 100 | 100 | 91 | 100 | 100 | 100 | 100 | 100 | 98 |
reach-wall (E) | - | - | - | - | - | - | 100 | 100 | 100 | 100 | 100 | 98 |
shelf-place (H) | - | - | - | - | - | - | 0 | 0 | 0 | 44 | 55 | 0 |
soccer (E) | - | - | - | - | - | - | 67 | 78 | 0 | 55 | 33 | 48 |
stick-pull (H) | - | - | - | - | - | - | 11 | 33 | 0 | 11 | 44 | 79 |
stick-push (H) | - | - | - | - | - | - | 0 | 0 | 0 | 11 | 0 | 100 |
sweep-into (E) | - | - | - | - | - | - | 100 | 78 | 100 | 67 | 89 | 80 |
sweep (E) | - | - | - | - | - | - | 100 | 89 | 100 | 100 | 67 | 74 |
window-close (H) | 33 | 33 | 100 | 100 | 78 | 100 | 67 | 44 | 100 | 89 | 44 | 100 |
window-open (H) | 67 | 100 | 100 | 78 | 89 | 99 | 11 | 67 | 100 | 44 | 78 | 93 |
Average success rate | 73.3 | 73.2 | 80.0 | 74.5 | 67.9 | 73.9 | 57.3 | 62.0 | 60.0 | 61.5 | 62.1 | 59.7 |
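The reported averages can be recomputed directly from the per-environment values in the table; for instance, the MT-10 column-A figures (only the ten environments present in that scenario) reproduce the 73.3 average:

```python
# Per-environment MT-10 column-A success rates, copied from the table above.
mt10_a = {
    "button-press-topdown": 100, "door-open": 100, "drawer-close": 100,
    "drawer-open": 0, "peg-insert-side": 67, "pick-place": 66,
    "push": 100, "reach": 100, "window-close": 33, "window-open": 67,
}
avg = sum(mt10_a.values()) / len(mt10_a)  # 73.3, matching the table's last row
```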