Efficient fault tolerant cost optimized approach for scientific workflow via optimal replication technique within cloud computing ecosystem

ABSTRACT


INTRODUCTION
The "pay-as-you-go" model offered by cloud computing services is regarded as the most efficient commercial framework for computation, since it provides consumers with both a computing platform and computing resources. This virtual computing paradigm also lets users present providers with their quality of service (QoS) requirements [1]-[5]. Recent advances in cloud computing (CC) have prompted a significant expansion of workflow applications across a number of disciplines, including astrophysics, astronomy, and bioinformatics, which are now being assessed on CC platforms. CC platforms offer properties such as dynamic resource allocation and storage resources. These features can be exploited through effective scheduling, which addresses the specific issues covered later in this section, to achieve efficient system performance [6]-[9]. Figure 1 shows the directed acyclic graph (DAG) model. The main aim of workflow scheduling is to make the best use of the heterogeneous cloud paradigm; in this context, users focus on QoS factors such as cost and execution deadline when submitting their workflow requirements. In addition, the growing demand for computation and services in task-scheduling applications brings issues of energy consumption, deadline pressure, time-frame minimization, and cost cutting. Workflows are therefore modeled as a DAG, a pipeline model in which each node represents a task and each edge represents a dependency between two tasks [10]-[12].

ISSN: 2252-8938  Efficient fault tolerant cost optimized approach for … (Asma Anjum) 123
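To make the DAG model concrete, the following minimal Python sketch assumes a workflow is stored as a mapping from each task to its successor tasks and derives an execution order in which every task follows its predecessors, using Kahn's topological sort. The task names and edges are hypothetical and do not come from any of the workflows discussed here.

```python
from collections import deque


def topological_order(edges):
    """Return an order in which every task appears after all of its predecessors (Kahn's algorithm)."""
    # Count unfinished predecessors (incoming edges) for every task.
    indegree = {task: 0 for task in edges}
    for successors in edges.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1
    # Entry tasks have no predecessors and can start immediately.
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in edges.get(task, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical four-task workflow: t0 fans out to t1 and t2, which both feed t3.
workflow = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}
print(topological_order(workflow))  # ['t0', 't1', 't2', 't3']
```

The cycle check matters because a workflow with a circular dependency could never be scheduled; rejecting it early keeps later scheduling steps simple.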
Figure 1. DAG model

Epigenomics is a highly pipelined biology application that maps the epigenetic state of human cells. Most of its tasks have low I/O and high CPU consumption. To detect gravitational waves, the laser interferometer gravitational-wave observatory (LIGO) workflow uses many CPU-intensive tasks that demand a lot of memory. Using the Pegasus project's workflow generator, we can create workflows with a variety of task counts; these workflows are stored in DAX format (DAG in XML). The task execution times in the DAX files are based on an Intel Core 2 quad-core processor running at 2.4 GHz, whose processing power is roughly equal to 8 EC2 compute units (ECUs) (2.33 × 4 / 1.2 ≈ 8). We consider three sizes for each workflow: Small comprises 50 tasks, Medium 200 tasks, and Large 1,000 tasks. For each size, 20 distinct instances with the same structure but different communication and computation workloads are constructed. Researchers interested in green computing who want to reduce costs generally develop energy-aware scheduling, typically using dynamic voltage and frequency scaling (DVFS) as the energy-reduction mechanism. Energy-aware mechanisms, however, neglect fault tolerance and cost in favor of minimizing energy use alone. To handle enormous volumes of data on clouds, scientific workflows need many resources. Real-time cloud services also demand a variety of computational capabilities, which raises the risk of transient failures. Moreover, growth in failures and complexity has a negative impact on resource management, leading to QoS issues, particularly with regard to dependability requirements [13]-[15]. While serving client requests, servers have to balance the load arriving from several clients; when servers are clustered, the original server is scaled out horizontally. Clouds provide various services; infrastructure as a service is one of the most popular and widely used, providing the capability to release or pre-configure virtual machines from the cloud infrastructure. Moreover, diverse computing requirements can be fulfilled through cloud computing because it provides on-demand, scalable services, including resources and platforms [16]-[19]. Cost minimization is also crucial because an ideal cost reflects the effectiveness of the model [20], [21].

Fault tolerance is an efficient way to enhance the dependability of any workflow, and primary-backup replication is one of the vital software fault-tolerance methods applied to meet given reliability requirements. Existing fault-tolerant techniques either apply a fixed number of backups to every primary to tolerate simultaneous failures, following the active replication scheme, which may meet the reliability requirements but can lead to unnecessary redundancy and cost, or apply one backup to each primary to tolerate a single fault, following the passive replication scheme, which cannot withstand multiple failures [22]-[24]. The following points further emphasize the value of this research work.
a. To meet reliability requirements, we design and build an effective fault-tolerant mechanism, called optimal replication technique with fault tolerance and cost minimization (ORT-FTC).
b. ORT-FTC employs an iterative selection process to choose virtual machines and feasible duplicates with the least amount of redundancy.
c. The cost parameter is taken into account when evaluating ORT-FTC; typically, when a virtual machine crashes, more resources are required to handle the failure, which increases the cost.
d. In addition, scientific workflows are considered to demonstrate the effectiveness of the paradigm, and four instances, each with a specific number of virtual machines, are constructed to test the model.
e. A comparison is made, and the proposed methodology proves more effective than the current paradigm.
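The redundancy trade-off between fixed and per-task replica counts can be sketched in Python. Assuming every replica of a task succeeds independently with the same probability r, k replicas give a task reliability of 1 - (1 - r)^k. The sketch compares choosing the smallest sufficient k for each task with applying one fixed replica count to every task, sized for the weakest task as a fixed active-replication scheme would be. The reliabilities and target are hypothetical, and this illustrates the trade-off only; it is not the ORT-FTC selection procedure.

```python
def replicas_needed(replica_reliability, target):
    """Smallest k such that 1 - (1 - r)^k >= target, for k independent, identical replicas."""
    if not 0.0 < replica_reliability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("reliability and target must lie strictly between 0 and 1")
    k, achieved = 1, replica_reliability
    while achieved < target:
        k += 1
        achieved = 1.0 - (1.0 - replica_reliability) ** k
    return k


# Hypothetical per-replica reliabilities of three tasks and a common per-task target.
reliabilities = [0.999, 0.95, 0.8]
target = 0.995
adaptive = [replicas_needed(r, target) for r in reliabilities]
fixed = max(adaptive) * len(reliabilities)  # one fixed count, large enough for the weakest task
print(adaptive, sum(adaptive), fixed)  # [1, 2, 4] 7 12
```

Here a uniform count wastes five replicas on tasks that were already reliable enough, which is the unnecessary redundancy attributed to fixed active replication.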

LITERATURE SURVEY
Scheduling and cost minimization of scientific workflows is one of the key research areas in cloud computing. Related topics that have been investigated include diverse workloads, workflows, platforms, and scheduling mechanisms. Although prior work targets objectives such as makespan, energy consumption, cost, reliability, or a multi-objective combination of these, this section concentrates on studies related to fault tolerance and cost minimization. A robust technique against missing information was developed in [22], which recognized that workflow scheduling becomes more difficult in the presence of potential resource failures. A failure-aware technique using a Markov chain-based resource availability model was also suggested in [23]. That model, however, had a high degree of dependency, so [24] created another dependent model that adds a replication strategy and an additional schema for handling further failures. Because resource allocation in clouds is highly interdependent, Tao et al. [25] presented WorkQueue with Replication (WQR) to address the substantial performance penalties that arise from this. Fault tolerance is typically achieved using one of two distinct techniques: passive replication or active replication [26], [27]. The fault tolerance approach can also be regarded as a cost-saving way to improve reliability [28]-[31].
In passive replication, the backup replica is executed on a virtual machine only when the primary replica fails; therefore, relying on a single backup alone makes it difficult to tolerate failures [32], [33]. In addition, passive replication incurs a higher cost, and the redundancy it introduces is uncertain. In active replication, the primary replica is reproduced a predetermined number of times, each on a different virtual machine, so that the task can still be completed successfully. The MaxRe algorithm was first presented in [34]; it incurs both cost and redundancy. In MaxRe, all tasks have equal reliability requirements: each task's requirement is the n-th root of the workflow's reliability requirement, where n is the number of tasks in the workflow. In contrast to this mechanism, Zhao et al. [28] offer a resource-optimal methodology for meeting the reliability requirement, which lowers the per-task reliability demands but does so at a high expense in redundancy. To apply quantitative replication further, Zhao et al. [28] proposed the ERRM technique, which uses an iterative approach. For smaller workflows, this model does provide low-cost redundancy minimization, but when applied to large workflows it performs poorly, making it an inefficient model. Although evolutionary and multi-objective techniques and energy utilization are not addressed in [35]-[37], their test results demonstrate that their methods greatly reduce failure events compared with other existing load-balancing techniques. An improved non-dominated sorting genetic algorithm (NSGA-III) has also been combined with an intelligent fault tolerance mechanism to maximize system utilization.
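As described above, MaxRe assigns every task the n-th root of the workflow's reliability requirement. Assuming independent task failures, the workflow succeeds only if all n tasks succeed, so workflow reliability is the product of task reliabilities and meeting each equal share meets the overall requirement. A minimal Python sketch with a hypothetical requirement of 0.9 and 10 tasks (the function name is illustrative):

```python
import math


def maxre_task_requirement(workflow_requirement, num_tasks):
    """Per-task requirement in MaxRe: the num_tasks-th root of the workflow requirement."""
    if not 0.0 < workflow_requirement <= 1.0 or num_tasks < 1:
        raise ValueError("need 0 < requirement <= 1 and at least one task")
    return workflow_requirement ** (1.0 / num_tasks)


per_task = maxre_task_requirement(0.9, 10)
print(round(per_task, 5))                  # 0.98952
# The product of the ten equal shares recovers the workflow requirement.
print(math.isclose(per_task ** 10, 0.9))   # True
```

Because every task receives the same share regardless of how reliable its candidate VMs are, tasks on reliable VMs may be over-provisioned, which is the redundancy the iterative schemes try to remove.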

PROPOSED METHODOLOGY
Because cloud computing involves a huge number of servers and components loaded with workloads, it generally has significant failure rates. These failures can limit the availability of virtual machines, a problem that can be addressed with a suitable fault tolerance method. Accordingly, this section presents the mathematical model of the proposed ORT-FTC methodology, which seeks to satisfy the reliability requirement while reducing cost.

Preliminaries
Consider a set of virtual machine configurations, represented by $G = \{g_0, g_1, \ldots, g_m\}$; each configuration includes parameters such as cost, memory, and the total number of virtual machines. Given this configuration set, the virtual machine instances can be represented by $W = \{w_0, w_1, \ldots, w_h\}$, where $w_h$ denotes an instance of a virtual machine with a particular configuration. Note that ORT-FTC is designed for a parallel processing environment.

Workflow modelling
Consider a large, complex scientific workflow represented by $A = (X, G)$, where $X$ is the set of tasks and $G$ the set of dependencies among them; such dependencies are common in large scientific workflow models. We also define several other task-related factors; for example, if task $v$ is executed on resource $w$, the workflow cost model is calculated as given in (1). In addition, bandwidth resources are modeled as given in (2).
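Equations (1) and (2) are not reproduced in this copy. As a hedged illustration only, the Python sketch below assumes a simple pay-per-use model: execution time is a task's workload divided by the VM's processing capacity, execution cost is that time multiplied by the VM's hourly price, and transfer time is data size divided by bandwidth. The `VMType` class, its fields, the units, and the example prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    """One hypothetical entry of the configuration set G."""
    name: str
    capacity: float        # work units processed per hour (assumed unit)
    price_per_hour: float  # on-demand price per hour (assumed pricing model)
    bandwidth: float       # network bandwidth in MB/s (assumed unit)


def execution_time(workload, vm):
    """Hours needed to process `workload` work units on `vm`."""
    return workload / vm.capacity


def execution_cost(workload, vm):
    """Cost of running the task on `vm`: execution time multiplied by the hourly price."""
    return execution_time(workload, vm) * vm.price_per_hour


def transfer_time(data_mb, vm):
    """Seconds needed to move `data_mb` megabytes over the VM's link."""
    return data_mb / vm.bandwidth


small = VMType("small", capacity=2.0, price_per_hour=0.1, bandwidth=100.0)
print(execution_cost(8.0, small))   # 0.4  (4 hours at 0.1 per hour)
print(transfer_time(500.0, small))  # 5.0
```

Real providers may bill in whole hours or per second; rounding the execution time up before multiplying would be a one-line change under such a pricing rule.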

Task modelling with respect to fault tolerance
Suppose that every task in the task set is assigned the same time frame for completion. The interval parameter of a task can then be computed as given in (3), where $n_p$ is the ideal fault tolerance level and $i_p$ is the frequency of operation. The optimal scenario, in which no failure ever occurs, is given by (4). Furthermore, assuming task $w_x$ is allocated to resource $V_l$, its ideal execution is defined by (5), where $Q(w_x, V_l)$ is the total number of jobs and $Q_l$ is the overhead. The length of an interval is then computed through (6).
$$I(w_x, V_l) = \big(E(w_x)\big)\big(S(V_l)\big)^{-1}\big(Q(w_x, V_l) + 1\big)^{-1} \qquad (6)$$
where $E(w_x)$ is the workload of $w_x$ and $S(V_l)$ is the processing speed of $V_l$. After the best case has been established, the worst case must be constructed, in which the most errors occur for the specified task and virtual machine; the total overhead and the fault tolerance overhead of $w_x$ on $V_l$ are given in (7). The intervals are then optimized for the worst-case scenario, and their optimality is determined using (8). Finally, an error probability is defined in order to tolerate faults, which in turn defines the task reliability as shown in (9).
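The interval and worst-case reasoning above can be illustrated with a small Python sketch. Because equations (5), (7), and (8) are not reproduced in this copy, the sketch assumes a common checkpointing model rather than the paper's exact one: the best case adds one overhead per checkpoint, the interval length is the execution time divided by the number of checkpoints plus one (the structure of (6)), and each fault in the worst case forces one interval to be re-executed plus one recovery overhead. All function and parameter names are hypothetical.

```python
import math


def interval_length(exec_time, checkpoints):
    """Length of each interval when `checkpoints` split the run evenly (assumed form of (6))."""
    return exec_time / (checkpoints + 1)


def worst_case_time(exec_time, checkpoints, overhead, faults):
    """Assumed worst case: each fault redoes one interval and pays one extra overhead."""
    best_case = exec_time + checkpoints * overhead  # assumed form of (5)
    return best_case + faults * (interval_length(exec_time, checkpoints) + overhead)


def best_checkpoint_count(exec_time, overhead, faults, limit=1000):
    """Brute-force the checkpoint count that minimises the assumed worst-case time."""
    return min(range(limit + 1), key=lambda q: worst_case_time(exec_time, q, overhead, faults))


q = best_checkpoint_count(exec_time=100.0, overhead=1.0, faults=1)
print(q, worst_case_time(100.0, q, 1.0, 1))  # 9 120.0
print(math.sqrt(1 * 100.0 / 1.0) - 1)        # 9.0, closed-form optimum of this model
```

Under this assumed model, with execution time E, per-checkpoint overhead C, and k tolerated faults, minimising $QC + kE/(Q+1)$ gives $Q + 1 = \sqrt{kE/C}$, which the brute-force search reproduces.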

Reliability requirement
Two types of system failure generally occur: permanent failures and transient failures. This research considers only transient failures; accordingly, reliability is modeled with respect to the number of failures occurring within a specific time frame, as given in (11), where $\nu$ specifies the average number of failures per unit time. With $\lambda_l$ denoting the constant failure rate of virtual machine $V_l$, the reliability of task $w_x$ on $V_l$ within its execution time $T(w_x, V_l)$ is computed as in (12). The failure probability without the ORT-FTC model is given in (13).
$$R(w_x, V_l) = e^{-\lambda_l\, T(w_x, V_l)} \qquad (12)$$
Further, each task holds a certain number of duplicates: task $w_h$ has $k_h$ replicas ($k_h \le |V|$). The replica set of $w_h$ is $\langle w_h^1, w_h^2, \ldots, w_h^{k_h} \rangle$, where $w_h^1$ is the original (primary) and the others are duplicates. The total number of duplicates in the workflow is given in (14). Once the $k_h$ replicas are established, the task is considered successful if at least one of them completes without failure, and its reliability is updated as in (15).
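The reliability model above can be sketched in Python. Equation (12) gives a replica's reliability as $e^{-\lambda T}$ under a constant failure rate λ. Equation (15) is not reproduced in this copy, so the sketch assumes the standard form for independent replicas, in which a task fails only if all of its replicas fail: $R(w_h) = 1 - \prod_{y=1}^{k_h} \big(1 - R(w_h^y)\big)$. The failure probability of (13) is then $1 - R$. The rates and execution times are hypothetical.

```python
import math


def replica_reliability(failure_rate, exec_time):
    """Probability that no transient fault hits a replica during its run: exp(-lambda * t)."""
    return math.exp(-failure_rate * exec_time)


def task_reliability(replicas):
    """Reliability of a task whose replicas are given as (failure_rate, exec_time) pairs.

    Assumes independent failures, so the task fails only if every replica fails.
    """
    prob_all_fail = 1.0
    for rate, time in replicas:
        prob_all_fail *= 1.0 - replica_reliability(rate, time)
    return 1.0 - prob_all_fail


# Hypothetical primary (lambda = 0.001 per s, 100 s) and one duplicate on a faster,
# less reliable VM (lambda = 0.002 per s, 50 s).
primary_only = task_reliability([(0.001, 100.0)])
with_backup = task_reliability([(0.001, 100.0), (0.002, 50.0)])
print(round(primary_only, 4), round(with_backup, 4))  # 0.9048 0.9909
```

One duplicate cuts the failure probability from about 9.5% to under 1%, which is why a small number of well-placed duplicates can meet demanding requirements.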

ORT-FTC cost modelling
Given the workflow model and the set of virtual machines described earlier in this section, the problem is to allocate the replicas of every task, together with their associated virtual machines, so as to minimize the execution cost while meeting the dependability requirement. Execution costs are reduced, and fault tolerance is guaranteed by the reliability condition designed in the previous subsection. The recommended allocation of duplicates (also called backups) and virtual machines minimizes
$$\mathrm{cost}(A) = \sum_{h} \sum_{y=1}^{k_h} p\big(w_h^y\big)\, T\big(w_h^y\big) \qquad (18)$$
where $p(w_h^y)$ is the unit price of the virtual machine hosting replica $w_h^y$ and $T(w_h^y)$ is its execution time.
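Because the exact symbols of (18) are uncertain in this copy, the following Python sketch only assumes its general shape: the workflow cost is the sum, over all tasks and all of their replicas, of the hosting VM's unit price multiplied by the replica's execution time. The allocation below is hypothetical.

```python
def workflow_cost(allocation):
    """Sum price * execution time over every replica (primary and duplicates) of every task.

    allocation maps a task name to a list of (price per hour of hosting VM, hours on that VM).
    """
    return sum(price * time for replicas in allocation.values() for price, time in replicas)


# Hypothetical allocation: one task with only a primary, one with a primary and a duplicate.
allocation = {
    "t0": [(0.1, 4.0)],
    "t1": [(0.1, 2.0), (0.2, 1.0)],
}
print(round(workflow_cost(allocation), 2))  # 0.8
```

Billing every replica matches active replication; under passive replication a backup is billed only when its primary fails, so this sum is an upper bound in that setting.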

Reliability requirement for the individual task
To set each individual task's reliability requirement, we first define the requirements and then satisfy them. Initially, these requirements are computed from the actual reliability obtained by the prior allocations, as per (18), in which $w_{sq(i)}$ denotes the $i$-th task in the scheduling order. We then optimize the dependability requirements as follows. The upper bound on the reliability requirement of task $w_h$ is $\sqrt[|X|]{R_{req}(A)}$, which we take into account when determining the requirement of each individual task.
When the task being scheduled is $w_{sq(i)}$, the task set is split into the unallocated tasks $\{w_{sq(i+1)}, \ldots, w_{sq(|X|)}\}$ and the allocated tasks $\{w_{sq(1)}, w_{sq(2)}, \ldots, w_{sq(i-1)}\}$. The ORT-FTC model assumes that each task in the workflow will be allocated a virtual machine together with a reliability value. This is calculated using (19), which guarantees reliability. Consequently, the overall reliability requirement is calculated as given in (21).
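The requirement-assignment step can be sketched in Python in the style of the iterative scheme outlined above. Equations (19) to (21) are not reproduced in this copy, so the formula here is an assumption rather than the paper's exact one: when task $w_{sq(i)}$ is scheduled, already-allocated tasks contribute the reliability they actually achieved, each unallocated task is provisionally charged the upper bound $R_{req}(A)^{1/|X|}$, and the current task receives whatever share remains. The function name and numbers are hypothetical.

```python
def task_requirement(workflow_req, achieved, num_unallocated):
    """Requirement for the current task w_sq(i) under the assumed assignment rule.

    achieved: reliabilities actually reached by the allocated tasks w_sq(1..i-1).
    num_unallocated: number of tasks still to be scheduled after the current one.
    """
    total_tasks = len(achieved) + 1 + num_unallocated
    upper = workflow_req ** (1.0 / total_tasks)  # provisional share of each unallocated task
    reached = 1.0
    for r in achieved:
        reached *= r
    return workflow_req / (reached * upper ** num_unallocated)


# Hypothetical three-task workflow with R_req(A) = 0.9.
r1 = task_requirement(0.9, [], 2)
print(round(r1, 4))  # 0.9655
# Suppose the first task overshoots and actually reaches 0.99.
r2 = task_requirement(0.9, [0.99], 1)
print(round(r2, 4))  # 0.9416
```

When an earlier task overshoots its share, the requirement passed to the next task drops, so slack from coarse replica choices can translate into fewer duplicates later in the schedule.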

ORT-FTC process
The main goal of the algorithm is to divide the overall reliability requirement among the individual tasks; in addition, the proposed ORT-FTC algorithm reduces the operating cost by choosing duplicates and optimal virtual machines. Here, optimal virtual machines are those with effective execution time; moreover, some redundancy has been observed, and some of the multiple copies can be discarded. Table 1 displays the ORT-FTC algorithm. ORT-FTC schedules the tasks $w_h$ in a priority order, which is determined using (22) on the basis of each task's execution time.
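Equation (22) is not reproduced in this copy. Many list-scheduling approaches, HEFT among them, order tasks by an upward rank that adds a task's execution time to the longest chain of communication and execution times down to the exit task. The Python sketch below assumes that form purely for illustration; the example graph and times are hypothetical.

```python
def upward_ranks(succ, exec_time, comm):
    """rank(t) = exec_time[t] + max over successors s of (comm[(t, s)] + rank(s))."""
    ranks = {}

    def rank(task):
        if task not in ranks:
            # Exit tasks have no successors, so their tail is zero.
            tail = max((comm.get((task, s), 0.0) + rank(s) for s in succ.get(task, [])), default=0.0)
            ranks[task] = exec_time[task] + tail
        return ranks[task]

    for task in exec_time:
        rank(task)
    return ranks


def priority_order(succ, exec_time, comm):
    """Tasks by descending upward rank; with positive execution times this respects precedence."""
    ranks = upward_ranks(succ, exec_time, comm)
    return sorted(ranks, key=ranks.get, reverse=True)


succ = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}
exec_time = {"t0": 2.0, "t1": 3.0, "t2": 1.0, "t3": 2.0}  # e.g. mean time over VM types
comm = {("t0", "t1"): 1.0, ("t0", "t2"): 1.0, ("t1", "t3"): 2.0, ("t2", "t3"): 2.0}
print(priority_order(succ, exec_time, comm))  # ['t0', 't1', 't2', 't3']
```

Sorting in descending order of rank places every task before its successors, because a task's rank always exceeds that of each successor when execution times are positive.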
Table 1. ORT-FTC algorithm
Step 1: Start.
Step 2: Input: the DAG data (the node set, execution times, and communication times), the virtual machine set, and the reliability requirement.
Output: the reliability value, the incurred cost, and the schedule length.
Step 3: Sorting is done in descending order.

In addition, ORT-FTC selects the replicas with the best virtual machines; these virtual machines are set aside and sorted in the best possible order. Once the ideal virtual machines have been sorted, the ORT-FTC model clears the previous allocations and moves on to assigning duplicates. ORT-FTC selects only the virtual machines with the highest dependability, and it also computes the duplication, the execution costs, and the optimal makespan.
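The selection loop described in Table 1 and the paragraph above can be approximated by the Python sketch below. It is a simplified, hypothetical reading, not the authors' implementation: tasks are visited in a given priority order, each task's requirement is derived from the workflow requirement by charging unscheduled tasks an n-th-root share, and replicas are added on the most reliable candidate VMs (cheaper first on ties) until the requirement is met. VM names, reliabilities, and costs are invented for the example.

```python
def allocate_task(requirement, candidates):
    """Greedily add replicas of one task until its reliability reaches `requirement`.

    candidates: (vm_name, reliability_on_vm, cost_on_vm) tuples for this task.
    If every candidate is used and the requirement is still unmet, the best
    achievable allocation is returned.
    """
    ranked = sorted(candidates, key=lambda c: (-c[1], c[2]))
    chosen, prob_all_fail, cost = [], 1.0, 0.0
    for vm, rel, price in ranked:
        chosen.append(vm)
        prob_all_fail *= 1.0 - rel  # independent replica failures
        cost += price
        if 1.0 - prob_all_fail >= requirement:
            break
    return chosen, 1.0 - prob_all_fail, cost


def schedule(order, candidates, workflow_req):
    """Allocate tasks in priority order, passing any reliability surplus on to later tasks."""
    n = len(order)
    upper = workflow_req ** (1.0 / n)  # provisional share of each unscheduled task
    achieved, plan, total_cost = 1.0, {}, 0.0
    for i, task in enumerate(order):
        remaining = n - i - 1
        requirement = workflow_req / (achieved * upper ** remaining)
        vms, rel, cost = allocate_task(requirement, candidates[task])
        plan[task] = vms
        achieved *= rel
        total_cost += cost
    return plan, achieved, total_cost


# Hypothetical two-task example with a workflow reliability requirement of 0.95.
candidates = {
    "a": [("vm1", 0.99, 5.0), ("vm2", 0.90, 2.0)],
    "b": [("vm1", 0.90, 4.0), ("vm2", 0.80, 1.0)],
}
plan, reliability, cost = schedule(["a", "b"], candidates, 0.95)
print(plan)                         # {'a': ['vm1'], 'b': ['vm1', 'vm2']}
print(round(reliability, 4), cost)  # 0.9702 10.0
```

Ranking VMs purely by reliability follows the text's emphasis on dependability; a cost-aware variant could rank candidates by reliability gained per unit cost instead.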

PERFORMANCE EVALUATION
Cloud computing resources form an effective real-time computing model that is accessible, affordable, and reachable from anywhere over the internet. Cloud computing can be used for a wide range of purposes and has a number of advantages and disadvantages; its uses include workflow scheduling for scientific workflows with interdependent and independent operations. This section presents the simulation results and a comprehensive comparison of the proposed methodology. Workflow scheduling problems are widely studied using simulation approaches to test novel scheduling algorithms, which lets researchers evaluate the performance of proposed algorithms in a controlled and repeatable way. All simulations are run in a Java environment on a Windows 10 PC with an Intel(R) Core(TM) i5-8300H processor clocked at 2.30 GHz and 8 GB of RAM. To assess the proposed algorithm with a realistic workload, five scientific workflows with large data volumes and varied computational characteristics are explored: Montage (I/O intensive), CyberShake (data intensive), Epigenomics (CPU intensive), LIGO (memory intensive), and sipht (CPU intensive).

Instance design
To evaluate the model, we created four separate instances, each containing a particular number of virtual machines and a particular variant of the workflow. Four workflows are considered: CyberShake, Inspiral, Montage, and sipht. For example, the first CyberShake instance combines CyberShake 30 with 20 virtual machines, the second CyberShake 50 with 40 virtual machines, the third CyberShake 100 with 60 virtual machines, and the fourth CyberShake 1,000 with 80 virtual machines. Similarly, the four instances Inspiral 30, Inspiral 50, Inspiral 100, and Inspiral 1,000 use 20, 40, 60, and 80 virtual machines, respectively. The same numbers of virtual machines are used for the Montage and sipht instances.

Work cost comparison

CyberShake
CyberShake, an earthquake science application with high memory and CPU needs, is used to quantify earthquake hazards by merging massive datasets. Figure 2 compares the four instances of the CyberShake workflow. In the first instance, the execution cost of the current model is 1,933.71, whereas that of the proposed model is 18,418.71. Similarly, the execution costs of the current model for the second and third instances are 21,451.33 and 26,222.62, respectively, while those of ORT-FTC are 20,038.49 and 23,877.95. In the fourth instance, the current model costs 177,836.21, while the proposed model costs 153,904.64.

LIGO flow analysis
Here, the LIGO workflow is used to generate and analyze gravitational waveforms from data collected during the coalescence of compact binary systems. Figure 3 displays the LIGO workflow. Figure 4 compares the execution costs of the existing system and ORT-FTC. In the first instance, the cost of the current model is 19,957.79, whereas that of ORT-FTC is 13,340.13. Similarly, the execution costs of the current model for the second and third instances are 35,468.35 and 63,426.63, respectively, while those of ORT-FTC are 23,705.17 and 42,400.89. The cost of running the fourth instance is 6,886,731.17 for the current model and 459,009.39 for ORT-FTC.

Comparison of cost
Figure 4 contrasts the execution costs of the current model and the proposed ORT-FTC for the four Montage instances. The execution costs of the current model for the first and second instances are 736.37 and 1,628.23, respectively, whereas those of ORT-FTC are 508.31 and 1,118.82. For the third and fourth instances, the execution costs of the current model are 3,430.49 and 36,032.25, respectively, while those of the proposed model are 2,349.4 and 24,636.46.

Comparative analysis of makespan model
Figure 5 in the Appendix displays the makespan evaluation for each of the four scenarios. Makespan is the total time needed to complete the workflow from beginning to end. In the first scenario, the makespan of the existing model is 58.72, while that of ORT-FTC is 48.67; in the second, the makespan of the existing model is 97.84, while that of the proposed paradigm is 82.13.

Comparative assessment
For the CyberShake workflow, ORT-FTC reduces cost by 4.72%, 6.58%, 8.94%, and 13.45% for the first, second, third, and fourth instances, respectively. For the LIGO workflow, ORT-FTC achieves improvements of 33.15%, 33.16%, 33.14%, and 93.33% for the four instances. For the Montage workflow, ORT-FTC improves cost over the current model by 30.97%, 31.28%, 31.51%, and 31.62% for the four instances, and it improves makespan by 17.11%, 16.05%, 19.92%, and 20.76%. For the sipht workflow, ORT-FTC achieves an improvement of 33.28% in three of the instances and 93.32% in the fourth compared with the current model.
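The improvement percentages above are consistent with the relative cost reduction (existing - proposed) / existing × 100. A short Python check using the Montage execution costs reported earlier (the helper name is illustrative):

```python
def improvement(existing, proposed):
    """Percentage reduction of `proposed` relative to `existing`."""
    return (existing - proposed) / existing * 100.0


montage_existing = [736.37, 1628.23, 3430.49, 36032.25]
montage_ortftc = [508.31, 1118.82, 2349.4, 24636.46]
gains = [improvement(e, p) for e, p in zip(montage_existing, montage_ortftc)]
print([round(g, 2) for g in gains])       # [30.97, 31.29, 31.51, 31.63]
print(round(sum(gains) / len(gains), 2))  # 31.35
```

The computed values agree with the reported Montage percentages to within 0.01 percentage points, and their mean is close to the 31.34% average quoted in the conclusion.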

CONCLUSION
The widespread acceptance and rising popularity of cloud computing have made it possible to deploy it for large-scale applications; scientific organizations also choose cloud environments to ensure that their workflows are executed as intended. Despite being so dynamic, cloud computing has a high failure rate. Failures can occur for a variety of reasons and make a virtual machine unavailable to execute its task; a fault tolerance approach can be used to address this problem. Accordingly, the ORT-FTC methodology developed in this work aims to improve execution cost and reliability. ORT-FTC uses duplication to choose virtual machines and feasible duplicates with the least amount of redundancy, and it not only minimizes cost but also reduces makespan. To analyze the proposed methodology, we established four random instances, each comprising a particular workflow variant and a certain number of virtual machines. Averaged over the instances, ORT-FTC improves cost by 8.42%, 48.19%, 31.34%, and 48.29% for the CyberShake, LIGO, Montage, and sipht workflows, respectively. In addition, an average makespan improvement of 18.46% is observed for the Montage workflow. Although ORT-FTC has demonstrated better fault tolerance and cost minimization than other mechanisms across a variety of scientific workflows, its effectiveness may vary from one workflow to another depending on workload and task complexity; future studies may consider this.

Similarly, in the second and third cases, the makespan of the present model is 125.93 and 100.84, respectively, whereas the makespan of the ORT-FTC model is 299.84 and 1,679.35, respectively.

Figure 6 shows the cost assessment analysis of the existing model and the proposed model.

Figure 6. Cost assessment analysis of the existing model and the proposed model

Equation (21) is also used to calculate the reliability requirement for specific tasks, and, to satisfy the reliability requirements, duplicate copies and virtual machines with the shortest makespan are chosen iteratively. Additionally, we provide an algorithm that lowers costs and offers efficient fault tolerance for the workflow.