This research paper presents a method for load balancing prediction in distributed databases using convolutional neural networks. It addresses the challenges of real-time prediction accuracy and sudden load adjustments in distributed systems. The study demonstrates how the proposed method improves resource utilization and computational efficiency. Simulation results indicate that this approach effectively manages resource occupancy and adjusts to varying loads, making it suitable for big data applications. This work is essential for researchers and practitioners looking to enhance distributed database performance.

Key Points

  • Proposes a convolutional neural network method for load balancing in distributed databases.
  • Addresses challenges in real-time prediction accuracy and sudden load adjustments.
  • Demonstrates improved resource utilization and computational efficiency through simulations.
  • Highlights the importance of effective load management for big data applications.
Sharanya Kamath
Author:Xuanni Huo, Zhongshu Bo
5 pages
Language:English
Type:Research Paper
Distributed Database Load Balancing Prediction
Based on Convolutional Neural Network
Xuanni Huo
Northeast Yucai School
Shenyang, China
huoxuanni@163.com
Zhongshu Bo
Department of Mathematics and Computer Science
University of Central Oklahoma
Edmond, Oklahoma, USA
Zbo@uco.edu
AbstractTraditional database services have been unable to
handle the data surge in terms of system scalability and price
performance ratio. Distributed database services are proposed to
support the rapid development of enterprise services and are
suitable for various applications in big data scenarios. Load
balancing prediction method, an important part of distributed
database services, is used to predict the current situation of
distributed system resources occupancy. However, the traditional
load balancing prediction algorithm has shortcomings in the
accuracy of real-time prediction and dealing with sudden loading.
This paper proposes a distributed database load balancing
prediction method based on convolutional neural network, which
further realizes better real-time load balancing prediction and
effective adjustment of sudden loading. The simulation results
show that the load balancing prediction method proposed in this
paper can effectively utilize the performance of each node to
predict the usage of distributed database resources and
effectively adjust the sudden loading, which can avoid the waste
of computing resources and ensure the computational efficiency.
Keywords—Convolutional Neural Network, Distributed
Database
I. INTRODUCTION
Computer network systems are often composed of multiple
individual nodes. All the information in the system is stored
and processed by different forms of database at the various
nodes. The database stored on any node is designed to be able to
access any other node. The consumption on any node should be
kept within its limit so that the system operates at its best
and potential problems are avoided [1].
The distributed database is "a collection of multiple,
logically interrelated databases distributed over a computer
network". The distributed database management system is
defined as "the software system that permits the management
of the distributed database and makes the distribution
transparent to the users" [2, 3]. In this paper, the distributed
database is a collection of data distributed over several
computer network systems or nodes, which are regarded as one
single system. Each system or node has its own processing
capabilities and permission to perform local tasks and to
access global tasks.
However, load handling in modern distributed database systems
faces many difficulties. One of them is operating in
unpredictable and unstable environments. Load balancing
methods assign transactions to multiple corresponding database
servers to balance their respective loads. Compared with static
load balancing, dynamic load balancing is an effective approach
that has no fixed allocation decision: the load is balanced
according to the time-dependent state of the system. Its
strategy is characterized by how it exchanges information and
controls load distribution [4, 5].
The work in [6] proposed a neural network method for database
load balance prediction, which can observe, learn and predict
the resource utilization in the system to promote its efficient
operation. Different tasks are allocated to different servers
by predicting how to assign resources such as CPU or memory.
The prediction is realized by a neural network structure, which
can observe and predict the resource allocation on each server.
In [7], the supermarket model was analyzed together with other
randomized load balancing methods. The supermarket model is
defined as an idealized process, corresponding to a system of
infinite size, where the number of servers goes to infinity. This
idealized process is given by a family of differential equations,
whose behavior is cleaner and easier to analyze because it is
completely deterministic. This idealized system can be related
to systems of finite size using the appropriate mathematical
tools.
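The family of differential equations is not written out above. As an illustration only, the sketch below integrates the form commonly given for the supermarket model in the randomized load balancing literature, ds_i/dt = lam * (s_{i-1}^d - s_i^d) - (s_i - s_{i+1}), where s_i is the fraction of servers with at least i queued jobs, lam < 1 is the arrival rate per server, and each job joins the shortest of d randomly chosen queues. The function name, the truncation level and the Euler step size are assumptions made for this example and do not come from [7].

```python
def supermarket_fixed_point(lam, d, levels=20, dt=0.01, t_end=200.0):
    """Euler-integrate ds_i/dt = lam*(s_{i-1}^d - s_i^d) - (s_i - s_{i+1}).

    s[i] is the fraction of servers whose queue length is at least i.
    The system starts empty and is truncated at `levels` (assumption).
    """
    # s[0] is always 1; s[levels + 1] is held at 0 as the truncation boundary.
    s = [1.0] + [0.0] * (levels + 1)
    for _ in range(int(t_end / dt)):
        new = s[:]
        for i in range(1, levels + 1):
            # A job raises a queue from i-1 to i when all d sampled queues
            # hold at least i-1 jobs but not all of them hold at least i.
            arrivals = lam * (s[i - 1] ** d - s[i] ** d)
            # A queue with exactly i jobs finishes one job at unit rate.
            departures = s[i] - s[i + 1]
            new[i] = s[i] + dt * (arrivals - departures)
        s = new
    return s[: levels + 1]
```

Calling supermarket_fixed_point(0.5, 2) settles close to the known fixed point s_i = lam^(2^i - 1), that is s_1 near 0.5 and s_2 near 0.125, which shows how quickly long queues become rare when each job samples two servers.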
In [8], CPU load balancing in networks of workstations was
studied. The paper addressed the question of whether preemptive
migration is necessary, or whether remote execution is enough
for load balancing. It showed that this issue is strongly tied
to understanding the process lifetime distribution, and how to
apply this distribution to derive a preemptive migration policy
that requires no hand-tuned parameters. A trace-driven
simulation showed that the new preemptive migration strategy is
far more effective than remote execution, even when the memory
transfer cost is high.
2018 IEEE International Conference of Safety Produce Informatization (IICSPI)
978-1-5386-5514-6/18/$31.00 ©2018 IEEE
Despite the advances in the techniques and methods
mentioned above, distributed database load balancing still has
some shortcomings. For example, uneven load distribution occurs
frequently in distributed databases. The important role of load
balancing is to transfer, by prediction, the possible
computational burden from overloaded nodes to lightly loaded
nodes. In this way, the performance and stability of the entire
distributed system are improved. On the other hand, because
different tasks arrive at different servers, there is a delay, and
the time required to complete the tasks also differs. Even in a
symmetric homogeneous distributed system, the load distribution
can still be uneven. In current systems, these problems can only
be dealt with manually by modifying the structure and permissions
of the system one by one, which reduces the system's
flexibility.
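The transfer of load from overloaded nodes to lightly loaded nodes can be sketched as follows. This is a minimal illustration and not the procedure used in the paper: the function name plan_migrations, the integer percentage loads and the thresholds high and low are all hypothetical, and the predicted loads are assumed to come from some external predictor.

```python
def plan_migrations(predicted, high=80, low=50):
    """Plan moves from nodes predicted above `high` to nodes below `low`.

    `predicted` maps node name -> predicted load in integer percent.
    Assumes high >= low. Returns (moves, resulting_load), where each move
    is a (source, target, amount) tuple.
    """
    load = dict(predicted)  # work on a copy so the input is not modified
    moves = []
    # Handle the most overloaded nodes first.
    overloaded = sorted((n for n in load if load[n] > high),
                        key=lambda n: -load[n])
    for src in overloaded:
        while load[src] > high:
            dst = min(load, key=load.get)  # currently lightest node
            if load[dst] >= low:
                break  # no lightly loaded node is left to absorb work
            # Move only what the source must shed and the target can take.
            amount = min(load[src] - high, low - load[dst])
            load[src] -= amount
            load[dst] += amount
            moves.append((src, dst, amount))
    return moves, load
```

With predicted loads {"n1": 95, "n2": 30, "n3": 60}, the plan moves 15 points from n1 to n2 and leaves n3 untouched. If no node is below the low threshold, an overloaded node stays overloaded, which is the situation the text describes as requiring manual intervention.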
This paper proposes a distributed database load balancing
prediction method based on a convolutional neural network.
Compared with the traditional load balancing prediction
schemes described above, this method is better suited to
real-time load prediction under sudden load and heterogeneous
data in the grid, and performs well in scenarios with high data
inflow and outflow.
The structure of this paper is as follows. First, the primary
solutions for load balancing prediction in distributed database
management are introduced. Then the distributed database load
balancing prediction solution based on an improved convolutional
neural network is proposed, and test results of the scheme on a
grid-specific data set are given. Finally, the summary and
outlook are given.
II. LOAD BALANCING PREDICTION SOLUTION IN DISTRIBUTED
DATABASE MANAGEMENT
A. Distributed database management
Large-scale scientific simulations need to be performed in
parallel in a high-performance computing (HPC) environment.
These simulations often involve many program calls, and each
program call is treated as a task, implying a multitasking (MTC)
paradigm in which each task generates the data used by the next
task, forming a data stream that can be modeled as a workflow
carefully orchestrated by the Scientific Workflow Management
System (SWFMS). It is important for the parallel execution
engine to decide which tasks are distributed to which computers
of the HPC environment and what data the tasks will use. The
SWFMS should also collect source data that represents metadata
for workflow specifications and execution results. Storing
source data is critical for repeatability, sharing, analysis,
and knowledge reuse. In addition to the source and workflow
execution data, the SWFMS must also manage domain-specific data
such as wave propagation speed (seismic domain). In summary,
the SWFMS should manage three types of data along the data
stream: (i) performance execution data, primarily related to
MTC, (ii) source data, and (iii) domain data.
The main goal of distributed database management is to
support the management of large amounts of data in an
energy-efficient manner. In fact, the energy consumption in the
network is mainly due to data communication tasks between
nodes. To solve this problem, there are various data reduction
techniques, including data aggregation, packet merging, data
compression, data fusion, and approximation-based techniques.
Data aggregation is performed at intermediate nodes between a
source node and a sink node. Packet merging combines multiple
small packets into one large packet, regardless of the
semantics of the packets and the correlation between them. Data
compression also reduces the amount of data transmitted between
nodes, but involves data encoding at the source node and data
decoding at the sink node. Data fusion refers to more complex
operations on data sets and is commonly used for multimedia
data processing. Approximation-based techniques use statistical
methods to approximate query results. Among other advantages,
these techniques reduce the size of the transmitted data, the
communication tasks, the network load and the data transfer
time [9, 10, 11].
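Two of the data reduction techniques above, packet merging and data aggregation, are simple enough to sketch. The function names, the byte-string packet representation and the choice of summary statistics are assumptions made for illustration and are not taken from [9, 10, 11].

```python
def merge_packets(packets, max_size):
    """Packet merging: concatenate small payloads into larger packets.

    Payloads are joined in arrival order without looking at their content.
    A single payload larger than `max_size` is forwarded unchanged.
    """
    merged, current = [], b""
    for payload in packets:
        # Flush the current packet if adding this payload would overflow it.
        if current and len(current) + len(payload) > max_size:
            merged.append(current)
            current = b""
        current += payload
    if current:
        merged.append(current)
    return merged


def aggregate_readings(readings):
    """Data aggregation: an intermediate node forwards one summary record
    instead of every raw value it received from the source nodes."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }
```

Merging four payloads of 2, 2, 3 and 1 bytes with max_size=4 yields two packets instead of four, and aggregating four readings yields one record instead of four, which is where the reduction in communication tasks comes from.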
B. Load balancing prediction method
Load balancing in cloud computing provides an effective
solution to various problems in setting up and using a cloud
computing environment. Load balancing must consider two main
tasks: one is resource configuration or resource allocation,
and the other is task scheduling in a distributed environment.
Effectively configuring resources and scheduling tasks ensures
that resources are easily available on demand, that resources
are used efficiently under high and low load conditions, that
energy is saved at low loads, and that the cost of using
resources is reduced [12].
In the management of distributed database systems,
workloads typically exceed the resources in the computing
environment. The common operations that need to be performed
are handled by three separate units, usually called load
balancing, resource discovery, and process migration. Load
balancing algorithms offer the potential to improve the
performance of large-scale computing systems and applications
because they are designed to maximize resource utilization and
throughput while minimizing response time and avoiding
overload of the computing system. To make better use of
resources, an effective load balancing solution also needs to
reduce resource over-provisioning. Several models and
techniques provide efficient scheduling and load balancing, and
they can be static or dynamic. Static mechanisms require prior
knowledge of the environment and application requirements.
However, once the application begins to execute, these models
cannot adapt to changes in the environment or requirements. In
contrast, a dynamic mechanism monitors the environment and
application requirements at runtime and attempts to reassign
tasks and adjust the load as needed [13].
The commonly used load balancing algorithms fall into two
categories: static load balancing algorithms and dynamic load
balancing algorithms. Round Robin (RR) and Weighted Round Robin
(WRR) are commonly used static algorithms. RR is relatively
simple and ignores server availability, server load, and the
distance between the client and the server. WRR addresses
inconsistent server performance by assigning weights, but when
requests take a long time to serve, the load may become skewed.
Least Connection (LC) and Weighted Least Connection (WLC) are
commonly used dynamic algorithms. LC does not consider service
capabilities, the distance between the client and the server,
and other factors. WLC is a relatively good dynamic scheduling
algorithm [14].
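The four algorithms named above can be sketched in a few lines each. This is a simplified illustration under stated assumptions: servers are plain names, weights are positive integers, WRR uses the naive approach of repeating each server by its weight, and the function names are chosen for this example rather than taken from [14].

```python
from itertools import cycle


def round_robin(servers):
    """RR (static): hand out servers in a fixed rotation, ignoring load."""
    return cycle(servers)


def weighted_round_robin(weights):
    """WRR (static): a server with weight w appears w times per rotation."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return cycle(expanded)


def least_connection(active):
    """LC (dynamic): pick the server with the fewest active connections."""
    return min(active, key=active.get)


def weighted_least_connection(active, weights):
    """WLC (dynamic): pick the lowest connections-to-weight ratio, so a
    more capable server may be chosen even when it holds more connections."""
    return min(active, key=lambda name: active[name] / weights[name])
```

For example, with active connections {"a": 4, "b": 3} LC picks b, while WLC with weights {"a": 4, "b": 1} picks a because its ratio of 1.0 is lower than 3.0. The static variants never consult the connection counts, which is why a run of long requests can skew the load under RR and WRR.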
The main focus of [15] is to propose a computationally feasible
and automated optimization-based residential load control
scheme for the retail power market, where real-time pricing
(RTP) is combined with inclining block rates (IBR). The goal is
to minimize household electricity payments by optimally
arranging the operation and energy consumption of each device,
depending on the specific needs indicated by the user. Each
residential user is assumed to be equipped with a smart meter
that is connected via a computer network to an intelligent power
distribution system with two-way digital communication. Each
smart meter periodically receives updated price information
from the utility and includes an energy scheduling unit that
determines energy consumption in the home. Depending on the
scheduling range, the operation of the energy scheduling unit is
supplemented by a price predictor unit that estimates the
upcoming price by applying a weighted average filter to past
prices. The authors obtained the best coefficients for the price
predictor filter and showed that it is best to use different
coefficients for different days of the week.
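A weighted average filter over past prices can be sketched as below. The function name, the normalization by the sum of the coefficients and the example coefficient values are assumptions for illustration; the actual filter and its fitted coefficients are those of [15] and are not reproduced here.

```python
def predict_price(past_prices, coefficients):
    """Estimate the upcoming price as a weighted average of the most
    recent len(coefficients) prices (oldest first, newest last)."""
    k = len(coefficients)
    if len(past_prices) < k:
        raise ValueError("need at least as many past prices as coefficients")
    recent = past_prices[-k:]
    weighted_sum = sum(c * p for c, p in zip(coefficients, recent))
    # Normalize so that a constant price series predicts the same price.
    return weighted_sum / sum(coefficients)


# Hypothetical per-weekday coefficient sets (0 = Monday), illustrative only.
COEFFICIENTS_BY_WEEKDAY = {
    0: [1, 1, 2],
    5: [2, 1, 1],
}
```

Selecting the coefficient set by weekday before calling predict_price mirrors the finding that different days of the week are best served by different coefficients.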
III. DISTRIBUTED DATABASE LOAD BALANCING PREDICTION
BASED ON CONVOLUTIONAL NEURAL NETWORK
Load balancing is an important indicator that affects the
overall performance of distributed databases. By assigning all
operations across the nodes, tasks are executed and managed
efficiently on each distributed node, which optimizes node
resource usage and achieves load balancing. Load balancing
algorithms are divided into static, dynamic, and adaptive
methods. Whether static load balancing can be achieved depends
on the accuracy of the predetermined task assignment, and
whether dynamic load balancing can be achieved depends on
whether task migration is efficient. The disadvantage of static
load balancing is that, since the assignment of tasks is
predetermined, it is difficult to adjust when an unscheduled
task occurs. Dynamic load balancing is mainly implemented by
performing task assignment and mapping during the computation.
A distributed database with dynamic load balancing has better
overall performance, but in scenarios with large data volumes
and heterogeneous data it still cannot achieve the maximum
performance improvement.
A. Convolutional Neural Network
In recent years, deep learning models have achieved
remarkable results in computer vision and speech recognition.
A Convolutional Neural Network (CNN) uses layers with
convolution filters applied to local features. The CNN model,
originally invented for computer vision, was subsequently
proven to be effective for natural language processing and
achieved excellent results in other traditional pattern
recognition tasks [16, 17].
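To make "a convolution filter applied to local features" concrete, the sketch below slides a small kernel over a one-dimensional series, which is the core operation of a convolutional layer. Applying it to a load series is an assumption made to match the topic of this paper, since the input representation is not specified here, and the kernel values are hand-picked rather than learned.

```python
def conv1d(series, kernel):
    """Valid-mode 1-D convolution as used in CNN layers (no kernel flip):
    each output is the weighted sum of one local window of the input."""
    k = len(kernel)
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def relu(values):
    """Rectified linear activation, commonly applied after a convolution."""
    return [max(0.0, v) for v in values]
```

With the difference kernel [-1, 1], the load series [1, 2, 4, 7, 11] becomes [1, 2, 3, 4], so the filter responds to how fast the load is rising in each local window. In a trained CNN such kernels are learned from data and several layers are stacked.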
Recent advances in CNNs have yielded promising results in
difficult deep learning tasks. However, the success of a CNN
depends on finding the architecture that fits the given
problem. Due to the large number of architectural design
choices, handcrafting an architecture is a challenging and
time-consuming process that requires expertise and effort.
Figure 1. Example of convolutional neural networks [18]
CNNs have been studied extensively in the field of image
recognition and computer vision, providing improvements over
deep neural networks (DNNs) on many tasks. Recently, CNNs have
also been used for speech recognition and show improvements
over DNNs, but only for small vocabulary tasks with shallow
networks. Although a new framework has been introduced to model
spectral correlation, one limitation of this spectral modeling
approach is that the network is limited to one convolutional
layer, unlike most CNNs, which use multiple convolutional
layers. The paper [18] explores spatial modeling similar to that
in the image recognition community, which allows multiple
convolution layers and encourages deeper networks.
The paper [19] presents an effective framework for
automatically designing high-performance CNN architectures
for a given problem. The framework introduces a new
optimization objective function that combines the error rate
with information learned through a set of feature maps using a
deconvnet network. The new objective function allows the
hyperparameters of the CNN architecture to be optimized by
guiding the CNN to better visualize learned features through the
deconvnet. The actual optimization of the objective function is
performed by the Nelder-Mead method (NMM). In addition, the
objective function leads to faster convergence towards a better
architecture. The proposed framework enables an efficient way
to explore the numerous design choices of the CNN architecture
and also allows for efficient distributed execution and
synchronization through Web services.
B. Load Balancing Prediction based on Convolutional Neural
Network
To overcome the problem of inaccurate real-time prediction
of distributed database system load under heterogeneous data
and burst load, this paper proposes a distributed database load
balancing prediction method based on an improved convolutional
neural network. The main components are shown in Figure 2.
The input and output data are processed by the distributed
database management system. The distributed database management
system includes three modules: load balancing, resource
discovery and process migration. Each local node likewise
contains a local load balancing module, a resource discovery
module and a process migration module. The three modules of the
local node interact with the overall load balancing module in
the management system. The interaction covers resource
allocation, resource use and task execution, and the resource
usage status of each local node is predicted by the
convolutional neural network load prediction analyzer.
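The interaction between local nodes, the overall load balancing module and the load prediction analyzer can be outlined structurally as follows. Every name here (LocalNode, GlobalLoadBalancer, trend_predictor, the 0.8 threshold) is hypothetical, and trend_predictor is a trivial stand-in, not the convolutional neural network analyzer of the paper; the sketch only shows where a predictor would plug in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A predictor maps a node's usage history to its expected next utilization.
Predictor = Callable[[List[float]], float]


@dataclass
class LocalNode:
    """Hypothetical local node: stores the utilization its local modules report."""
    name: str
    usage_history: List[float] = field(default_factory=list)

    def report_usage(self, utilization: float) -> None:
        # Local resource discovery: record the latest utilization (0.0 to 1.0).
        self.usage_history.append(utilization)


@dataclass
class GlobalLoadBalancer:
    """Hypothetical overall load balancing module of the management system."""
    predictor: Predictor
    threshold: float = 0.8
    nodes: Dict[str, LocalNode] = field(default_factory=dict)

    def register(self, node: LocalNode) -> None:
        self.nodes[node.name] = node

    def predict_all(self) -> Dict[str, float]:
        # Ask the prediction analyzer for every node's next utilization.
        return {name: self.predictor(node.usage_history)
                for name, node in self.nodes.items()}

    def migration_candidates(self) -> List[str]:
        # Nodes predicted above the threshold would hand tasks to process migration.
        predictions = self.predict_all()
        return sorted(name for name, value in predictions.items()
                      if value > self.threshold)


def trend_predictor(history: List[float]) -> float:
    """Stand-in predictor (NOT a CNN): last value plus the last change."""
    if not history:
        return 0.0
    if len(history) == 1:
        return history[0]
    return history[-1] + (history[-1] - history[-2])
```

A node reporting 0.25, 0.5 and 0.75 is predicted at 1.0 and flagged for migration, while a node steady at 0.5 is not. Replacing trend_predictor with a trained model leaves the rest of the structure unchanged, which is the point of keeping the predictor as a separate component.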

FAQs

What is the main focus of the paper on distributed database load balancing?
The paper focuses on proposing a load balancing prediction method for distributed databases based on convolutional neural networks (CNN). It addresses the shortcomings of traditional load balancing algorithms, particularly their accuracy in real-time predictions and their ability to handle sudden loads. The study emphasizes the need for efficient load balancing to improve the performance and stability of distributed systems, especially in unpredictable environments.
How does the proposed CNN method improve load balancing prediction?
The proposed method utilizes an improved convolutional neural network to enhance real-time load balancing predictions under conditions of heterogeneous data and burst loads. By processing input data through a structured CNN, the system can predict local load distribution and resource usage more accurately. This approach allows for effective adjustments to sudden loading, thereby optimizing resource utilization and ensuring computational efficiency.
What are the key components of the distributed database management system described in the paper?
The distributed database management system outlined in the paper includes three main components: load balancing, resource discovery, and process migration. Each local node within the system interacts with these components to manage tasks efficiently. The load balancing module ensures that operations are evenly distributed across nodes, while resource discovery identifies available resources, and process migration facilitates the movement of tasks as needed to maintain balance.
What experimental results support the effectiveness of the CNN-based load balancing method?
The experimental results demonstrate that the CNN-based load balancing prediction method outperforms traditional methods in terms of stability and response time. Specifically, under varying levels of concurrent demand, the CNN approach shows a more balanced load distribution as indicated by metrics such as Requests per Second (RPS), Average Response Time (ART), and Peak Response Time (PRT). These results highlight the method's ability to maintain efficiency even during high-load scenarios.
What challenges do traditional load balancing algorithms face according to the paper?
Traditional load balancing algorithms often struggle with the uneven distribution of load in distributed databases, leading to performance issues. They typically require manual adjustments to address these imbalances, which reduces system flexibility. Additionally, these algorithms may not adapt well to sudden changes in workload or data heterogeneity, resulting in inefficiencies in resource utilization and increased response times.
What methodology is used to implement the CNN load prediction in the paper?
The methodology for implementing CNN load prediction involves several steps: first, inputting heterogeneous data into the distributed database management system. Next, the system initializes load distribution and resource discovery. Local nodes then predict future load distributions using the CNN model, which iteratively refines its predictions based on feedback from local load distributions. This process continues until the load balance meets specified requirements.
Why is load balancing crucial for distributed database systems?
Load balancing is crucial for distributed database systems because it directly impacts overall system performance and stability. By evenly distributing computational tasks across nodes, load balancing helps prevent any single node from becoming overloaded, which can lead to delays and inefficiencies. Effective load balancing ensures that resources are utilized optimally, enhancing the system's ability to handle varying workloads and maintain high levels of service.