Generalized swarm intelligence algorithms with domain-specific heuristics

Received Oct 24, 2020; revised Jan 13, 2021; accepted Feb 16, 2021.

In recent years, hybrid population-based algorithms have been applied increasingly often in industrial settings. In this paper, we present an approach that combines universal, problem-independent swarm intelligence (SI) algorithms with simple deterministic domain-specific heuristic algorithms. The approach improves efficiency by combining the advantages of domain-specific heuristics and swarm algorithms. The heuristic algorithm takes into account the specifics of the problem and effectively translates the positions of agents (particles, ants, bees) into solutions of the problem, while the swarm algorithm increases the adaptability and efficiency of the approach through its stochastic and self-organized properties. We demonstrate this approach on two non-trivial optimization tasks: a scheduling problem and finding the minimum distance between 3D isomers.


INTRODUCTION
Swarm intelligence algorithms are applied in almost every area of science and engineering, since in practice they have shown high efficiency in solving complex real-life engineering optimization problems [1][2][3]. Complex optimization problems are understood as high-dimensional problems with a large-scale, complex topology of the solution search space, nonlinear criteria, and constraints. These problems may also have more than one objective function, as well as stochastic and dynamic properties. A researcher who understands the mechanisms of various swarm algorithms and their weak and strong sides can create hybrid algorithms, combining, as a rule, two swarm algorithms or a swarm and an evolutionary algorithm. This allows achieving a synergistic effect, increasing the overall efficiency of the resulting hybrid algorithm [4]. There are several directions for hybridization.
Each algorithm in the ensemble works according to its own rules; however, they use a shared population of agents or exchange the best-found solutions, for example, a hybrid of the PSO and the genetic algorithm (GA) [5], PSO + differential evolution [6], PSO + grey wolf optimizer [7], PSO + biogeography-based optimization [8]. This approach does not require changing the underlying algorithms. They work in parallel, exchanging data or using a shared (public) population. As a result, the complexity of the algorithms does not increase, and the basic algorithms can be easily replaced. At the same time, in such an approach the synergistic effect may be small because of the weak integration of the algorithms into one system.
Adding a rule or a stage of updating the agents' positions from another algorithm to the main algorithm. In [9], a mutation stage was added to the classical PSO algorithm at the end of each iteration, similar to that in GA; similarly, mutation was added to the grey wolf optimizer in [10]; the study [11] also added a mutation stage to the PSO algorithm, but according to the principles of differential evolution. Papers [12][13] describe PSO with an additional allowance for particle acceleration, calculated using formulas from the gravitational search algorithm. In this approach, a new algorithm is created, often using the principles of both swarm intelligence and evolutionary computation.
Hybrid algorithms with a higher degree of integration of the rules and principles of two algorithms are also possible: FireFly + PSO [14], PSO + artificial bee colony algorithm [15], PSO + ant colony optimization (ACO) [16], PSO + wolf pack algorithm [17]. This method allows creating an algorithm that takes into account the features of the problem being solved and compensates for the shortcomings of one algorithm by adding the properties of another. Nevertheless, this significantly complicates both the created algorithm and the selection of its heuristic parameters. In other words, the complexity of the hybrid algorithm exceeds the sum of the complexities of the parts that comprise it.
Using meta-optimization, when one algorithm performs the selection of parameters of another algorithm. For example, in [18], local unimodal sampling was used to select the values of the PSO algorithm's parameters; in [19], the PSO parameters are selected using GA; in [20], the ACO parameters are selected using the bacterial foraging algorithm. This approach requires very large computational resources.
At each iteration of the algorithm, the positions of all agents in the population, or only of the best agents, are used as initial approximations for a local search algorithm. In [21], the unstringing and stringing operator was used for ACO with local search; the use of various local search algorithms in PSO is discussed in [22]; FireFly + gradient descent is used in [23]; PSO + the Lin-Kernighan algorithm was considered in [24].
Swarm and evolutionary metaheuristic algorithms are themselves quite difficult to understand and to fine-tune to the features of the tasks being solved. Hybrids of different metaheuristic algorithms turn out to be even more complex and less universal. However, as shown in [17], a metaheuristic algorithm's efficiency can sometimes be increased by, on the contrary, simplifying it. Perhaps that is why, although there are many works in which complex hybrid algorithms are successfully applied to a specific problem, none of them has become as widespread as the less complicated ACO and PSO among SI algorithms [4] or GA among evolutionary ones.
Our approach to solving optimization problems aims to increase the efficiency and versatility of SI algorithms without increasing their complexity. We apply a combination of a universal, problem-independent SI algorithm with simple deterministic domain-specific heuristic algorithms or, in other words, greedy heuristics. A hybrid of a stochastic universal SI algorithm and a simple deterministic domain-specific algorithm achieves a greater synergy effect than combining two swarm or evolutionary algorithms.

GENERALIZED SCHEME OF SI ALGORITHM AND APPLICATION APPROACH
To implement the proposed approach, it is necessary to make sure that it does not depend on the choice of a SI algorithm and a domain-specific heuristic algorithm. In this case, it will be quite versatile and flexible and will not require any modifications to the algorithms; only their interaction features will change. Therefore, it is necessary to define a generalized SI algorithm model so that various SI algorithms can be easily applied since it is impossible to say in advance which one will be better in a given task. Next, we need to define a general universal scheme of interaction between a SI algorithm and a domain-specific heuristic without strong dependencies.

Generalized SI algorithm model
An SI algorithm, like many other algorithms, is specified by data structures and operations on them. SI uses a swarm, i.e., a population of agents, as the basic data structure, and rules for moving the agents as the algorithm for transforming the data. A distinctive feature of SI algorithms is the use of indirect information exchange between agents. For example, the PSO uses the best-found position in the search space [25]; ant colony optimization uses a pheromone graph [26]; the monkey search algorithm uses a weighted center of the agents' positions [27]. The basic steps of an SI algorithm can be formulated as follows:
1. Generate the initial population of agents.
2. Move the agents according to the algorithm's rules.
3. Calculate the criterion values at the agents' new positions and update the shared information (e.g., the best-found solution).
4. If the stop condition is satisfied, finish the algorithm; otherwise, go to Step 2.
The saved best solution is the algorithm's result.
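This generalized loop can be sketched in code (a minimal illustration, not the paper's implementation; the movement rule here is a placeholder that any concrete SI algorithm such as PSO would replace, and all names are our own):

```python
import random

def si_optimize(dimension, evaluate, n_agents=30, n_iterations=100):
    """Generic SI-style loop sketch: agents move in [0, 1]^dimension and an
    external `evaluate` callback maps each position to a criterion value
    (minimization assumed)."""
    # Step 1: initialize agents at random positions in the unit hypercube.
    agents = [[random.random() for _ in range(dimension)]
              for _ in range(n_agents)]
    best_pos, best_val = None, float("inf")
    for _ in range(n_iterations):
        # Step 3: evaluate every agent position and update the shared best.
        for pos in agents:
            val = evaluate(pos)
            if val < best_val:
                best_pos, best_val = list(pos), val
        # Step 2 (next pass): move each agent toward the shared best with a
        # random step; a concrete SI rule (PSO, ACO, ...) would replace this.
        for pos in agents:
            for i in range(dimension):
                pos[i] += 0.5 * random.random() * (best_pos[i] - pos[i])
                pos[i] = min(1.0, max(0.0, pos[i]))  # keep inside [0, 1]
    return best_pos, best_val
```

The placeholder movement rule only illustrates the structure; the indirect information exchange here is the shared best-found position.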

Interaction between a SI algorithm and a domain-specific algorithm
The following interface and interaction steps are proposed for an SI algorithm and a domain-specific heuristic algorithm:
− The SI algorithm receives the dimension of the search space of an optimization task via input I. It is used as the length of the vector X.
− The SI algorithm generates variants of the problem solution by agents' movements. The number of variants equals the number of agents.
− Each position is sent to the domain-specific heuristic algorithm of the optimized system via output Opos.
− The domain-specific heuristic algorithm decodes the position and uses it for solving the optimization problem and calculating the criterion values. Thus, the SI algorithm works as a meta-optimizer for the domain-specific algorithm.
− All obtained criterion values are sent to the SI algorithm via input I.
− The SI algorithm receives the criterion values and performs the agents' movement, i.e., we go to the second step of the algorithm. When the stop condition is satisfied, the best agent position is transmitted to output O. The simplest stopping condition is the execution of a given number of iterations.
The listed interaction steps ensure the independence of the algorithms used. The proposed approach allows us to implement different SI algorithms quickly: changes in the optimization problem do not require any changes in the SI algorithm implementation, and vice versa. The absence of close coupling makes it possible to combine any SI algorithm with domain-specific heuristics.
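The interaction just described amounts to an ask/tell-style interface. A minimal, self-contained sketch under our own naming (the random-search class below is only a stand-in for any real SI algorithm behind the same interface):

```python
import random

class RandomSearchSI:
    """Stand-in for an SI algorithm behind the interface: it only needs
    ask() -> agent positions in [0, 1]^dim (output Opos) and
    tell(values) <- criterion values (input I). Illustrative stub."""
    def __init__(self, dim, n_agents=10):
        self.dim, self.n_agents = dim, n_agents
        self.best_pos, self.best_val = None, float("inf")
    def ask(self):
        self.positions = [[random.random() for _ in range(self.dim)]
                          for _ in range(self.n_agents)]
        return self.positions
    def tell(self, values):
        for pos, val in zip(self.positions, values):
            if val < self.best_val:
                self.best_pos, self.best_val = pos, val

def optimize(si, decode_and_evaluate, n_iterations=50):
    """The interaction loop: the SI proposes positions, the domain-specific
    heuristic decodes each into a solution and reports the criterion value.
    The best position found is the result (output O)."""
    for _ in range(n_iterations):
        positions = si.ask()
        si.tell([decode_and_evaluate(p) for p in positions])
    return si.best_pos, si.best_val
```

Swapping in a different SI algorithm or a different heuristic only requires replacing one side of this interface, which is the low coupling the text describes.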

Mapping of agents' positions
In the proposed approach, we constrain the search space from 0 to 1 in each dimension. These restrictions are necessary for successful adaptation and simple integration of the algorithm with various optimization problems. The point is that some SI parameters act differently depending on the scale of the optimization variables. For example, in the firefly optimization algorithm [28], the distance between agents affects the agents' mutual attraction nonlinearly. If the range of a search-space dimension is from 0 to 100, the distance from the middle (50) to the upper edge (100) is 50, while if this dimension is scaled from 0 to 1, the same distance equals 0.5, and the degree of attraction between the agents would be completely different, even though the distance is equivalent in terms of the optimization task. The same applies to the width of the neighborhood region in the bees algorithm [29]: the larger the distance between the boundaries of a search-space dimension, the greater the neighborhood width should be. For a non-homogeneous search space, it would be necessary to introduce different parameter values for different dimensions.
As a result, the algorithm parameters would have to be changed even after a simple change of measurement units in the task (hours to seconds, kilometers to feet). To avoid this, we suggest bounding the search space inside the algorithm from 0.0 to 1.0 in all directions and mapping the agents' coordinates into the values of the optimized variables. In the simplest variant, this can be done as follows. We denote the coordinate vector of an agent X = {x1, x2, …, xL} and the vector of optimized variables Q = {q1, q2, …, qL}. Under the restrictions ai ≤ qi ≤ bi, we obtain a map of the form qi = xi(bi − ai) + ai, i = 1, …, L.
A more sophisticated variant is shown in sections 3-4. The proposed parameter mapping is necessary to eliminate unnecessary relationships between the optimization problem model and the optimization algorithms.
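The simple linear mapping above can be written directly (illustrative code; the function name is ours):

```python
def map_position(x, a, b):
    """Map agent coordinates x_i in [0, 1] to optimized variables
    q_i = x_i * (b_i - a_i) + a_i, so that a_i <= q_i <= b_i."""
    return [xi * (bi - ai) + ai for xi, ai, bi in zip(x, a, b)]
```

Because the SI algorithm only ever sees the unit hypercube, changing the task's units changes a and b but not the algorithm's parameters.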

Application example: job-shop scheduling problem
This subsection introduces its own symbolic notation, which should not be confused with the notation in the descriptions of the SI algorithms (Section II). As mentioned above, the model of an optimization problem and the SI algorithms are completely independent. The job-shop scheduling problem is among the most challenging combinatorial optimization problems. Scheduling may be characterized as one of the most significant optimization tasks, since plans and schedules need to be arranged in all fields. The problem consists of scheduling a set of jobs N on a set of machines M to minimize the total processing time, called the makespan: the time needed to process all jobs. Each job includes a number of operations and has a predefined operation order; the order defines a specific machine and time interval for each operation and the sequence of operations. Each machine can process only one operation at a time. The goal is to find the shortest (quickest) schedule, i.e., a distribution of the operations over time intervals on the machines. This problem belongs to the NP-hard class [30].
The value of max(t(o) + p(o)) over all operations o, where t(o) is the start time and p(o) is the processing time of operation o, is called the makespan.

SI Algorithm with the greedy heuristic for job-shop scheduling problem
The following algorithm can be used as a simple greedy heuristic for solving the problem.
As a result, for each operation o ∈ O, the start time t(o) will be determined, and the value of the optimality criterion will be equal to t(oL).
This greedy heuristic schedules operations by prioritizing longer ones. The longer the execution time p(o) of an operation, the more difficult it is to find a free machine for it; therefore, priority is given to longer operations. The advantage of the algorithm is its simplicity and the possibility of adding further restrictions on the relationships between operations. However, such an algorithm can give solutions that are far from optimal.
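A list-scheduling sketch of such a greedy heuristic is given below (our own illustration under assumed data structures: each job is a list of (machine, duration) pairs in technological order; to keep the schedule feasible, at each step the sketch picks the highest-priority operation among those whose job predecessors are already scheduled):

```python
def greedy_schedule(jobs, n_machines, priority=None):
    """Greedy list scheduling. `jobs[j]` is a list of (machine, duration)
    pairs in technological order. `priority[j][k]` ranks operation k of
    job j (higher scheduled first); by default the duration itself is the
    priority, i.e., the longest-operation-first rule.
    Returns (makespan, starts) with starts[(j, k)] = start time."""
    if priority is None:
        priority = [[d for (_, d) in job] for job in jobs]
    next_op = [0] * len(jobs)        # next unscheduled operation of each job
    job_ready = [0] * len(jobs)      # time the job's last scheduled op ends
    machine_free = [0] * n_machines  # time each machine becomes free
    starts = {}
    total = sum(len(job) for job in jobs)
    while len(starts) < total:
        # Among jobs with unscheduled operations, pick the one whose next
        # operation has the highest priority (predecessors are always done).
        ready = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        j = max(ready, key=lambda r: priority[r][next_op[r]])
        k = next_op[j]
        machine, dur = jobs[j][k]
        start = max(job_ready[j], machine_free[machine])
        starts[(j, k)] = start
        job_ready[j] = machine_free[machine] = start + dur
        next_op[j] = k + 1
    return max(job_ready), starts
```

Passing a different `priority` table is exactly the hook the next paragraphs use: instead of durations, priorities can be supplied by an SI algorithm.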
The algorithm's efficiency in terms of the criterion of the problem (minimization of the makespan) can be significantly increased by intelligent prioritization of the operations. In this case, the same heuristic scheme can be applied, only ordering the operations not by their duration but by priorities, which can be determined using SI algorithms. To apply an SI algorithm to the job-shop scheduling problem, an agent position (vector X) needs to be mapped into a possible schedule. The vector X must contain as many elements as there are operations in all jobs (L). The mapping process consists of several steps. For example, let O = {o1, o2, o3, o4, o5, o6, o7, o8, o9} and X = {0.11, 0.72, 0.57, 0.92, 0.43, 0.21, 0.12, 0.4, 0.3}. After sorting the operations by descending coordinate value: OX = {{o4, 0.92}, {o2, 0.72}, {o3, 0.57}, {o5, 0.43}, {o8, 0.4}, {o9, 0.3}, {o6, 0.21}, {o7, 0.12}, {o1, 0.11}}. The operation order obtained cannot be used as a schedule directly, since it may violate the sequence of operations within one job.
The vector O* does not exactly specify the order in which operations start. It means that operations are extracted from O in the order in which they appear in O*. For each extracted operation, the earliest possible start time is determined. Thus, O* specifies not the order of processing operations, but the order of determining their start times. Since the execution time of each operation is assumed constant, O* uniquely determines the final schedule; therefore, the vector X is uniquely translated into the required discrete schedule. The advantage of this scheme is that an admissible schedule is obtained for any values of the vector X.
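The sorting step that produces O* from the agent position X can be sketched as follows (illustrative code; the function name is ours):

```python
def decode_order(x, op_ids):
    """Sort operation ids by descending coordinate value, giving the
    order O* in which start times are determined (not the processing
    order itself)."""
    pairs = sorted(zip(op_ids, x), key=lambda p: p[1], reverse=True)
    return [op for op, _ in pairs]
```

On the example values above, this reproduces the order o4, o2, o3, o5, o8, o9, o6, o7, o1 from OX.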

Experimental results
The PSO algorithm was used to solve the set of job-shop problem instances [33] denoted LA01-LA21, of various sizes, provided by OR-Library [34] (LA because the author is S. Lawrence). The sizes of the instances range from 10×5 to 15×10 (n×m). Table 1 shows the results (the average and the best solutions from ten runs of the algorithm). Table 1 also lists the best known results from the literature [34]. The values of the PSO parameters: population size 50, α1 = 1.76, α2 = 1.38, ω = 0.73, β = 0.28 [18].
The total deviation between our best results and the well-known ones using the PSO is 135 hours (0.71%). For most instances (81%), the results are within 1% of the best known. Thus, the PSO algorithm solves job-shop scheduling problems very close to the best-known results. Moreover, the PSO algorithm is fast, since only 100 PSO iterations were used; increasing the number of iterations can give better results at the cost of longer runtime. The longest time for solving one instance was about 0.5 seconds on a 2.4 GHz Intel Core i7 CPU; the shortest was 0.08 seconds. The variation is explained by the different dimensions of the instances.

Application example: exploring potential energy surface of nanocluster isomers
This subsection discusses two related problems. The first is the search for local minima on the potential energy surface; these minima correspond to isomers of a cluster of a given composition. The second is calculating the «reaction path» as the energetically most favorable path of the phase point's movement between two known local minima. The lower the minimum, the more stable the corresponding configuration of atomic nuclei. In bioinformatics, there are specialized software solutions for exploring ligand binding/unbinding pathways [35]. Isomers are two (or more) molecules (nanoclusters) with the same molecular formula, i.e., atomic composition. Isomers are divided into two main categories. Structural (or constitutional) isomers have the same formula, but their atoms are bonded together in different orders. Stereoisomers have the same connectivity but different spatial arrangements [36]. Isomers can have very different physical properties, such as boiling point, melting point, and chemical reactivity. In metal systems, the properties of stereoisomers, as a rule, do not differ.
The act of a structural phase transition in a nanocluster is a mutual geometric rearrangement of atoms occurring in time [37]. Accordingly, the cluster's energy and its physicochemical properties substantially depend on the geometric arrangement of the atoms. Even insignificant changes in the spatial structure of a cluster can lead to rather large changes in its total energy. A suitable choice of the initial geometry is crucial for obtaining reliable calculated values of a cluster's physicochemical characteristics in a stationary (equilibrium) state. For a successfully calculated geometry, changes in the energy values during the optimization process should be insignificant (less than a given value). Also, for an optimized geometry, the first derivatives of the energy (the gradient) with respect to all geometric distortions should be close to zero, which indicates that we are on a flat region of the energy surface. When considering the broader features of the potential energy surface (PES), including its topology and topography, it has become usual to refer to the PES as the «potential energy landscape» [38].
Energy surfaces and landscapes hold the key to understanding a wide range of molecular phenomena. To construct the PES, it is convenient to use the so-called internal coordinates, which include bond lengths or other interatomic distances, as well as bond and dihedral angles of a cluster/molecule. The 3n−6 independent coordinates (where n is the number of atomic nuclei) should include those structural parameters that change most dramatically during the transformation under investigation. As shown in [39], describing a potential energy surface in terms of local minima and the transition states that connect them provides a conceptual and computational framework for understanding and predicting observable properties.
A potential barrier to structural transformation is the energy of the transition state (saddle point) relative to the initial minimum of the potential energy surface. It is the potential barrier that determines the rate of the structural transformation. In the Arrhenius equation, this is the activation energy EA: k = k0 exp(−EA / (kB T)), where EA is the activation energy, k0 is the pre-exponential factor of the reaction, kB is the Boltzmann constant, and T is the absolute temperature (in kelvins).
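For a numeric illustration of this relation, the Arrhenius rate can be evaluated directly (our own sketch, with EA in joules and T in kelvins):

```python
import math

def arrhenius_rate(E_A, T, k0=1.0):
    """Arrhenius rate k = k0 * exp(-E_A / (k_B * T)).
    E_A: activation energy in joules; T: absolute temperature in kelvins;
    k0: pre-exponential factor."""
    k_B = 1.380649e-23  # Boltzmann constant, J/K
    return k0 * math.exp(-E_A / (k_B * T))
```

The exponential form makes the two qualitative dependencies explicit: a higher barrier suppresses the rate, while a higher temperature increases it.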
To construct a minimum energy path between two known constitutional isomers, an efficient solution to the assignment problem in a bipartite graph is required. Each atom of isomer A (reagent) is associated with one and only one atom of isomer B (product). The problem can be solved as a global optimization problem. Applying traditional numerical methods is impractical because they need huge amounts of computational resources like time and memory [40].

Greedy heuristic
Thus, the search for isomers corresponding to local minima and the construction of the minimum energy path between them are the basic stages of the PES construction process. It is necessary to find the permutation function P that associates with each atom ai from A (i = 1, …, n) one and only one atom bP(i) from B, so that the root-mean-square distance RMSD(P) = (Σi=1..n |ai − bi*|²)^(1/2) → min, where bi* = bP(i). Additional restrictions are imposed to avoid collisions of the transition paths of atoms: the minimum distance between the centers of the vectors connecting the atoms ai and bP(i) (i = 1, …, n) must be no less than cmin = cdist · min(min_distance_A, min_distance_B), where cdist is an empirical coefficient (~0.8) and min_distance_X is the minimum distance among all pairs of atoms of isomer X (A or B). This limitation, the high dimensionality, and the presence of several types of atoms do not allow applying existing methods for solving assignment problems without significant modification.
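The criterion can be sketched as follows (our own illustration; following the formula in the text, it is the square root of the summed squared distances, without division by n):

```python
import math

def rmsd(A, B, P):
    """Criterion from the text: square root of the summed squared
    distances between each atom a_i of isomer A and its assigned atom
    b_{P(i)} of B. Atoms are 3D coordinate triples; P is a permutation
    of the indices of B."""
    return math.sqrt(sum(
        sum((a - b) ** 2 for a, b in zip(A[i], B[P[i]]))
        for i in range(len(A))))
```

Minimizing this quantity over all permutations P (subject to the collision constraint) is the assignment problem described above.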
We combine a straightforward greedy heuristic algorithm with the universal PSO algorithm. The heuristic sequentially walks through the atoms of isomer B and matches each of them with the nearest atom from A that has not yet been matched. This heuristic can be written as:
Input: A, B, n, L (L is a list of the numbers from 1 to n without repetitions).
1. tabu ← {}.
2. For i = 1 … n, j = 1 … n: Dij ← |ai − bj|² (for atoms of different sorts, the distance is taken as infinite).
Step 3.2 means that atom Li of B goes over to the position of the j-th atom of A.
The resulting table of the function P is the desired permutation. The advantage of this heuristic is its very high speed: the solution is obtained in one pass through the atoms of isomer B. If the phase structures of the two isomers are close and isomer B is not rotated relative to isomer A, this method finds the optimal solution regardless of the number of atoms and the number of atom sorts. However, it is not suitable for more complex problems, since at the very first iterations it can match atoms of isomer B to atoms of isomer A that, in the optimal solution, should be matched with other atoms of isomer B.
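A minimal sketch of this nearest-unmatched-atom rule (our own code, following the verbal description above; the visiting order L is supplied externally, e.g., decoded from a PSO agent position):

```python
def dist2(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def greedy_match(A, B, sorts_A, sorts_B, L):
    """Visit the atoms of isomer B in the order L and match each to the
    nearest still-unmatched atom of A of the same sort (atoms of
    different sorts are treated as infinitely far apart).
    Returns a dict: assignment[b_index] = a_index."""
    tabu = set()  # indices of A already matched
    assignment = {}
    for bi in L:
        best_j, best_d = None, float("inf")
        for j in range(len(A)):
            if j in tabu or sorts_A[j] != sorts_B[bi]:
                continue
            d = dist2(A[j], B[bi])
            if d < best_d:
                best_j, best_d = j, d
        tabu.add(best_j)
        assignment[bi] = best_j
    return assignment
```

Because the result depends on the visiting order L, an SI algorithm can optimize L (via the position-sorting decode) to escape the bad early matches described above.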

SI algorithm with the greedy heuristic
The resulting algorithm combines the PSO and the greedy heuristic according to the interaction scheme described above. The developed method has been tested on isomer configurations of 7 to 39 atoms. Table 2 shows the RMSD values obtained by the greedy heuristic alone and by the PSO combined with the greedy heuristic. The mathematical modeling confirmed that combining greedy heuristics based on the physical properties of the problem with SI algorithms significantly improves the results compared with applying these methods separately.

CONCLUSION
In this article, a hybrid approach is proposed that utilizes the strengths of both SI and domain-specific heuristic algorithms. The main ideas behind its development are: (a) to help the SI algorithm take into account the specifics of the solved problem thanks to the domain-specific algorithm, and (b) to improve the efficiency of the domain-specific algorithm by introducing stochastic and self-organized properties from the SI algorithm. Different SI algorithms can be easily applied due to the low coupling between the SI algorithm and the domain-specific heuristic algorithm. The approach's application is demonstrated on the job-shop scheduling problem and the problem of constructing the minimum energy path between two isomers. The experiments confirmed that the proposed approach allows the use of SI algorithms as meta-optimizers that increase the efficiency of domain-specific heuristic algorithms.