
balancing is to transfer the potential computational burden from
overloaded nodes to lightly loaded nodes by prediction. In this
way, the performance and stability of the entire distributed
system are improved. On the other hand, because different tasks
arrive at different servers with different delays, and the time
required to complete each task also differs, the load
distribution remains uneven even in a symmetric homogeneous
distributed system. In current systems, these problems can only
be dealt with manually, by modifying the structure and
permissions of the system one by one, which reduces the
system's flexibility.
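The idea of moving predicted load from overloaded nodes to lightly loaded nodes can be illustrated with a minimal sketch. It is not the method proposed in this paper: the predicted loads are simply taken as input, and the function name `plan_transfers` and the `capacity` threshold are hypothetical names introduced only for illustration.

```python
def plan_transfers(predicted_load, capacity):
    """Plan load moves from overloaded nodes to lightly loaded nodes.

    predicted_load: dict mapping node name -> predicted load for the next interval
    capacity: hypothetical threshold above which a node counts as overloaded
    Returns (moves, resulting_load), where moves is a list of (source, target, amount).
    """
    load = dict(predicted_load)  # work on a copy, leave the input untouched
    moves = []
    # Handle the most heavily overloaded nodes first.
    overloaded = sorted((n for n in load if load[n] > capacity),
                        key=lambda n: -load[n])
    for src in overloaded:
        while load[src] > capacity:
            dst = min(load, key=load.get)   # currently lightest node
            spare = capacity - load[dst]    # room left on that node
            if dst == src or spare <= 0:
                break                       # no lightly loaded node left
            amount = min(load[src] - capacity, spare)
            load[src] -= amount
            load[dst] += amount
            moves.append((src, dst, amount))
    return moves, load


moves, balanced = plan_transfers({"a": 90, "b": 20, "c": 40}, capacity=60)
print(moves)
print(balanced)
```

In this sketch the quality of the plan depends entirely on the accuracy of the predicted loads, which is why the prediction step is the focus of the paper.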
This paper proposes a distributed database load balancing
prediction method based on a convolutional neural network.
Compared with the traditional load balancing prediction
schemes described above, this method is better suited to
real-time load prediction under sudden load bursts and
heterogeneous data in the grid, and it performs well in
scenarios with high data inflow and outflow.
The structure of this paper is as follows. First, the primary
solution for load balancing prediction in distributed database
management is introduced. Then the distributed database load
balancing prediction solution based on an improved convolutional
neural network is proposed, and test results for the scheme on a
grid-specific data set are given. Finally, the summary and
outlook are given.
II. LOAD BALANCING PREDICTION SOLUTION IN DISTRIBUTED
DATABASE MANAGEMENT
A. Distributed database management
Large-scale scientific simulations need to be performed in
parallel in a high-performance computing (HPC) environment.
These simulations often involve many program calls, and each
program call is treated as a task, implying a many-task
computing (MTC) paradigm in which each task generates the data
used by the next task, forming a data stream that can be modeled
as a workflow. The workflow is carefully orchestrated by a
Scientific Workflow Management System (SWFMS). For the parallel
execution engine, it is important to decide which tasks are
distributed to which of the computers that make up the HPC
environment and what data those tasks will use. SWFMS should
also collect source
data that represents metadata for workflow specifications and
execution results. Storing source data is critical for
repeatability, sharing, analysis, and knowledge reuse. In
addition to the source and workflow execution data, SWFMS
must also manage domain-specific data such as wave
propagation speed (seismic domain). In summary, SWFMS
should manage three types of data along the data stream: (i)
performance execution data, primarily related to MTC; (ii)
source data; and (iii) domain data.
The main goal of distributed database management is to
support the management of large amounts of data in an energy-
efficient manner. In fact, the energy consumption in the
network is mainly due to data communication tasks between
nodes. To solve this problem, there are various data reduction
techniques, including data aggregation, packet merging, data
compression techniques, data fusion, and approximation-based
techniques. Data aggregation techniques perform aggregation at
intermediate nodes between a source node and a sink node.
Packet merging combines multiple small
packets into one large packet, regardless of the semantics and
the correlation between the packets. Data compression
techniques are also used to reduce the amount of data
transmitted between nodes, but they involve data encoding at
the source node and data decoding at the sink node. Data fusion
technology refers to more complex operations on data sets and
is commonly used for multimedia data processing. The
approximation-based technique uses statistical techniques to
approximate the query results. Among other advantages, these
techniques reduce the size of the transmitted data, the number
of communication tasks, the network load, and the data transfer
time [9, 10, 11].
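Two of the data reduction techniques described above, packet merging and data aggregation at an intermediate node, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `merge_packets` and `aggregate_readings`, the `max_size` limit, and the choice of summary statistics are hypothetical and are not taken from the cited works.

```python
from statistics import mean


def merge_packets(packets, max_size):
    """Packet merging: concatenate small payloads into larger packets.

    The payload content is never inspected, matching the description that
    merging ignores the semantics and correlation of the packets.
    A single payload longer than max_size is emitted on its own.
    """
    merged, current = [], b""
    for payload in packets:
        # Flush the current packet if adding this payload would exceed the limit.
        if current and len(current) + len(payload) > max_size:
            merged.append(current)
            current = b""
        current += payload
    if current:
        merged.append(current)
    return merged


def aggregate_readings(readings):
    """Data aggregation: an intermediate node forwards a summary
    instead of every raw value it received from the source nodes."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


print(merge_packets([b"ab", b"cd", b"efg", b"h"], max_size=4))
print(aggregate_readings([1, 2, 3, 6]))
```

In both cases fewer or smaller messages travel between nodes, which is the source of the savings in communication tasks and transfer time; aggregation additionally discards the raw values, so it suits queries that only need summaries.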
B. Load balancing prediction method
Load balancing in cloud computing provides an effective
solution to various problems in setting up and using a cloud
computing environment. Load balancing must consider two main
tasks: resource configuration (resource allocation) and task
scheduling in a distributed environment. Effectively
configuring resources and scheduling tasks ensures that
resources are easily available on demand, resources are used
efficiently under both high and low load conditions, energy is
saved at low loads, and the cost of using resources is reduced
[12].
In the management of distributed database systems,
workloads typically exceed resources in the computing
environment. Three separate operations commonly need to be
performed: load balancing, resource discovery, and process
migration. Load balancing algorithms offer the potential to
improve the performance of large-scale computing systems and
applications because they are designed to maximize resource
utilization and throughput while minimizing response time and
avoiding overload of the computing system. To make better use
of resources, an effective load balancing solution should also
reduce resource over-provisioning. Several models and
techniques provide efficient scheduling and load balancing;
they are either static or dynamic.
Static mechanisms require prior knowledge of the environment
and application requirements. However, as the application
begins to execute, these models cannot adapt to changes in the
environment or requirements. In contrast, a dynamic
mechanism monitors the environment and application
requirements at runtime and attempts to reassign tasks and
adjust the load as needed [13].
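The difference between the two mechanisms can be made concrete with a small sketch. It assumes, purely for illustration, that the static mechanism fixes a cyclic task-to-node plan before execution and that the dynamic mechanism sends each task to the node with the smallest load observed at dispatch time; the function names and the task cost values are hypothetical.

```python
from itertools import cycle


def static_assign(tasks, nodes):
    """Static mechanism: a fixed cyclic plan decided before execution.
    It never looks at how loaded a node actually is."""
    order = cycle(nodes)
    return {task: next(order) for task in tasks}


def dynamic_assign(task_costs, nodes):
    """Dynamic mechanism: each task is sent to the node whose load,
    observed at dispatch time, is currently the smallest."""
    load = {n: 0 for n in nodes}
    placement = {}
    for task, cost in task_costs.items():
        target = min(nodes, key=lambda n: load[n])  # ties go to the first node
        placement[task] = target
        load[target] += cost
    return placement, load


# Hypothetical task costs: one heavy task followed by three light ones.
task_costs = {"t1": 8, "t2": 1, "t3": 1, "t4": 1}
nodes = ["n1", "n2"]

print(static_assign(task_costs, nodes))
print(dynamic_assign(task_costs, nodes))
```

With these example costs the static plan places loads of 9 and 2 on the two nodes, while the dynamic plan ends with 8 and 3, because it steers the light tasks away from the node that received the heavy one.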
Commonly used load balancing algorithms fall into two
categories: static load balancing algorithms and dynamic load
balancing algorithms. Round Robin (RR) and Weighted Round Robin
(WRR) are commonly used static algorithms. RR is relatively
simple and does not consider server availability, server load,
or the distance between the client and the server. WRR
addresses differences in server performance by assigning
weights, but when requests are long-running, the load may
become skewed. Least Connection (LC) and Weighted Least
Connection (WLC) are commonly used dynamic algorithms. LC does not
2018 IEEE International Conference of Safety Produce Informatization (IICSPI)