Reward-based selection can be used within Multi-armed bandit framework for Multi-objective optimization to obtain a better approximation of the Pareto front.
[1]
The newborn
and its parents receive a reward
, if
was selected for new population
, otherwise the reward is zero.
Several reward definitions are possible:
- 1.
, if the newborn individual
was selected for new population
.
- 2.
, where
is the rank of newly inserted individual in the population of
individuals. Rank can be computed using a well-known non-dominated sorting procedure.[2]
- 3.
, where
is the hypervolume indicator contribution of the individual
to the population
. The reward
if the newly inserted individual improves the quality of the population, which is measured as its hypervolume contribution in the objective space.
- 4. A relaxation of the above reward, involving a rank-based penalization for points for
-th dominated Pareto front: 
Reward-based selection can quickly identify the most fruitful directions of search by maximizing the cumulative reward of individuals.