Distributed Database Load Balancing Prediction

Based on Convolutional Neural Network

Xuanni Huo

Northeast Yucai School

Shenyang, China

[email protected]

Zhongshu Bo

Department of Mathematics and Computer Science

University of Central Oklahoma

Edmond, Oklahoma, USA

[email protected]

AbstractTraditional database services have been unable to

handle the data surge in terms of system scalability and price

performance ratio. Distributed database services are proposed to

support the rapid development of enterprise services and are

suitable for various applications in big data scenarios. Load

balancing prediction, an important part of distributed

database services, is used to predict the current resource

occupancy of the distributed system. However, the traditional

load balancing prediction algorithm has shortcomings in the

accuracy of real-time prediction and dealing with sudden loading.

This paper proposes a distributed database load balancing

prediction method based on convolutional neural network, which

further realizes better real-time load balancing prediction and

effective adjustment of sudden loading. The simulation results

show that the load balancing prediction method proposed in this

paper can effectively utilize the performance of each node to

predict the usage of distributed database resources and

effectively adjust the sudden loading, which can avoid the waste

of computing resources and ensure the computational efficiency.

Keywords—Convolutional Neural Network, Distributed

Database

I. INTRODUCTION

Computer network systems are often composed of multiple individual nodes. All the information in the system is saved and processed by different forms of database at the various nodes. The database saved on any node is designed with the ability to access any other node. The consumption on any node should be kept within its limit to ensure the system's best operation and avoid potential problems [1].

A distributed database is "a collection of multiple, logically interrelated databases distributed over a computer network". A distributed database management system is defined as "the software system that permits the management of the distributed database and makes the distribution transparent to the users" [2, 3]. In this paper, the distributed database is a collection of data distributed over several computer network systems or nodes, which are regarded as a single system. Each system or node has its own processing capabilities, permission to perform local tasks, and access to global tasks.

However, modern distributed database systems face many difficulties in handling load. One of them is operating in unpredictable and unstable environments. Load balancing methods assign transactions to multiple database servers to balance their respective loads. Compared with static load balancing, dynamic load balancing has no fixed allocation decision; the load is balanced according to the time-dependent state of the system. A dynamic strategy is characterized by how it exchanges state information and how it controls load distribution [4, 5].

The work in [6] proposed a neural network method for database load balance prediction, which can observe, learn, and predict resource utilization in the system to promote its efficient operation. Different tasks are allocated to different servers by predicting how to assign resources such as CPU or memory. The prediction is realized by a neural network structure, which can observe and predict the resource's allocation on each server.

In [7], the supermarket model was analyzed along with other randomized load balancing methods. The supermarket model is defined as an idealized process corresponding to a system of infinite size, where the number of servers goes to infinity. This idealized process is described by a family of differential equations, which is cleaner and easier to analyze because its behavior is completely deterministic. The idealized system can then be related to systems of finite size using the appropriate mathematical tools.
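As an illustration, the family of differential equations for the supermarket model is commonly written as ds_i/dt = lam * (s_{i-1}^d - s_i^d) - (s_i - s_{i+1}), where s_i is the fraction of servers holding at least i tasks and s_0 = 1. The sketch below integrates this system with a plain Euler scheme and compares it with the known equilibrium s_i = lam^((d^i - 1)/(d - 1)). The symbols lam (arrival rate per server, below 1), d (number of servers sampled per task, at least 2), unit service rate, and all function names are assumptions taken from the standard formulation, not from this paper.

```python
def supermarket_fixed_point(lam, d, levels):
    """Closed-form equilibrium: fraction of servers with at least i tasks (d >= 2)."""
    return [lam ** ((d ** i - 1) / (d - 1)) for i in range(levels + 1)]


def integrate_supermarket(lam, d, levels=10, dt=0.01, t_end=100.0):
    """Euler-integrate ds_i/dt = lam*(s_{i-1}^d - s_i^d) - (s_i - s_{i+1})."""
    # s[i] is the fraction of servers holding at least i tasks; s[0] stays 1.
    # The system starts empty and is truncated: s[levels + 1] is held at 0.
    s = [1.0] + [0.0] * (levels + 1)
    for _ in range(int(t_end / dt)):
        nxt = s[:]
        for i in range(1, levels + 1):
            arrivals = lam * (s[i - 1] ** d - s[i] ** d)
            departures = s[i] - s[i + 1]
            nxt[i] = s[i] + dt * (arrivals - departures)
        s = nxt
    return s[: levels + 1]


if __name__ == "__main__":
    simulated = integrate_supermarket(lam=0.5, d=2)
    expected = supermarket_fixed_point(lam=0.5, d=2, levels=10)
    for i in range(4):
        print(i, round(simulated[i], 6), round(expected[i], 6))
```

Because the trajectory is deterministic, the integration settles on the equilibrium tail, which for d = 2 decays doubly exponentially in i.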

In [8], CPU load balancing in networks of workstations

was studied. The paper addressed the question of whether

preemptive migration is necessary, or whether remote

execution is enough for load balancing. The paper showed this

issue is strongly tied to understanding the process lifetime

distribution. The paper showed how to apply this distribution to

derive a preemptive migration policy that requires no hand-

tuned parameters and used a trace-driven simulation to show

the new preemptive migration strategy is far more effective

than remote execution, even when the memory transfer cost is

high.

Despite all the advances in these techniques and methods

mentioned above, the current distributed database load

balancing problem still has some shortcomings. For example,

uneven load distribution occurs

frequently in distributed databases. The important role of load

2018 IEEE International Conference of Safety Produce Informatization (IICSPI)

balancing is to use prediction to transfer computational burden from overloaded nodes to lightly loaded nodes. In this way, the performance and stability of the entire distributed system are improved. On the other hand, because different tasks arrive at different servers with different delays, the time required to complete them also differs. Even in a symmetric, homogeneous distributed system, the load distribution is therefore still uneven. In current systems, these problems can only be dealt with manually by modifying the structure and permissions of the system one by one, which reduces the system's flexibility.
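The transfer of work from overloaded to lightly loaded nodes described above can be sketched as a simple greedy loop. This is only an illustration under stated assumptions: load is measured as a count of queued tasks, the current counts stand in for predicted loads, and the names `rebalance`, `tolerance`, and `max_rounds` are hypothetical rather than taken from the paper.

```python
def rebalance(loads, tolerance=1, max_rounds=1000):
    """Greedily move one task at a time from the most to the least loaded node.

    `loads` maps node name -> number of queued tasks (an assumed load measure).
    Returns a tuple (balanced_loads, migrations), where migrations is the list
    of (source, target) moves that were performed.
    """
    loads = dict(loads)  # work on a copy so the caller's mapping is untouched
    migrations = []
    for _ in range(max_rounds):
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        if loads[src] - loads[dst] <= tolerance:
            break  # the spread is within the required balance
        loads[src] -= 1
        loads[dst] += 1
        migrations.append((src, dst))
    return loads, migrations


if __name__ == "__main__":
    balanced, moves = rebalance({"a": 6, "b": 0, "c": 3})
    print(balanced, len(moves))
```

A prediction-based scheme would call the same loop on forecast loads instead of observed ones, so that migrations happen before a node actually becomes overloaded.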

This paper proposes a distributed database load balancing

prediction method based on convolutional neural network.

Compared with the traditional load balancing prediction

scheme described above, this method is more suitable for

dealing with real-time load prediction under the condition of

sudden real-time load and heterogeneous data in the grid, and

has good performance in the scenario of high inflow and

outflow data.

The structure of this paper is as follows. First, the primary solutions for load balancing prediction in distributed database management are introduced. Then the distributed database load balancing prediction solution based on an improved convolutional neural network is proposed, and the scheme is tested on a grid-specific data set. Finally, the test results, a summary, and an outlook are given.

II. LOAD BALANCING PREDICTION SOLUTION IN DISTRIBUTED DATABASE MANAGEMENT

A. Distributed database management

Large-scale scientific simulations need to be performed in parallel in a high-performance computing (HPC) environment. These simulations often involve many program calls, and each program call is treated as a task, implying a many-task computing (MTC) paradigm in which each task generates the data used by the next task, forming a data stream that can be modeled as a workflow. The workflow is carefully orchestrated by a Scientific Workflow Management System (SWFMS). It is important for the parallel execution engine to schedule which tasks are distributed to which computers in the HPC environment and what data the tasks will use. The SWFMS should also collect source data, that is, metadata about workflow specifications and execution results. Storing source data is critical for repeatability, sharing, analysis, and knowledge reuse. In addition to the source and workflow execution data, the SWFMS must also manage domain-specific data such as wave propagation speed (seismic domain). In summary, the SWFMS should manage three types of data along the data stream: (i) performance execution data, primarily related to MTC, (ii) source data, and (iii) domain data.

The main goal of distributed database management is to

support the management of large amounts of data in an energy-

efficient manner. In fact, the energy consumption in the

network is mainly due to data communication tasks between

nodes. To solve this problem, there are various data reduction

techniques, including data aggregation, packet merging, data

compression techniques, data fusion, and approximation-based

techniques. Data aggregation techniques include performing

data aggregation at intermediate nodes between a source node

and a sink node. Packet merging combines multiple small

packets into one large packet, regardless of the semantics and

the correlation between the packets. Data compression

techniques are also used to reduce the amount of data

transmitted between nodes, but they involve data encoding at

the source node and data decoding at the sink node. Data fusion

technology refers to more complex operations on data sets and

is commonly used for multimedia data processing. The

approximation-based technique uses statistical techniques to

approximate the query results. Among other advantages, these

techniques reduce the size of the transmitted data, the number of

communication tasks, the network load, and the data transfer

time [9, 10, 11].
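Three of the data reduction techniques listed above can be illustrated with a short sketch. It assumes numeric sensor-style readings and byte-string packets; the function names and the `max_bytes` limit are hypothetical, and `zlib` is used only as one readily available compression codec.

```python
import zlib


def aggregate(readings):
    """Data aggregation: an intermediate node forwards one summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }


def merge_packets(packets, max_bytes):
    """Packet merging: concatenate small payloads regardless of their meaning."""
    merged, current = [], b""
    for payload in packets:
        if current and len(current) + len(payload) > max_bytes:
            merged.append(current)
            current = b""
        current += payload
    if current:
        merged.append(current)
    return merged


def compress(payload):
    """Data compression: encode at the source node."""
    return zlib.compress(payload)


def decompress(blob):
    """Data compression: decode at the sink node."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    print(aggregate([1.0, 2.0, 3.0]))
    print(merge_packets([b"aa", b"bb", b"cc"], max_bytes=4))
    raw = b"load=0.5;" * 100
    print(len(raw), len(compress(raw)))
```

Aggregation and approximation lose detail in exchange for fewer bytes, whereas merging and compression preserve the original payloads, which is why the techniques are often combined.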

B. Load balancing prediction method

Load balancing in cloud computing provides an effective

solution to the various problems in cloud computing

environment setup and use. Load balancing must consider two

main tasks, one is resource configuration or resource allocation,

and the other is task scheduling in a distributed environment.

Effectively configuring resources and scheduling tasks ensures that resources are readily available on demand, that resources are used efficiently under both high and low load, that energy is saved at low load, and that the cost of using resources is reduced [12].

In the management of distributed database systems, workloads typically exceed the resources available in the computing environment. Three separate operations commonly need to be performed: load balancing, resource discovery, and process migration. Load balancing algorithms offer the potential to improve the performance of large-scale computing systems and applications because they are designed to maximize resource utilization and throughput while minimizing response time and avoiding overload of the computing system. To make better use of resources, an effective load balancing solution should also reduce resource over-provisioning. Several models and techniques provide efficient scheduling and load balancing, and they fall into static and dynamic mechanisms. Static mechanisms require prior knowledge of the environment and application requirements; once the application begins to execute, they cannot adapt to changes in the environment or requirements. In contrast, a dynamic mechanism monitors the environment and application requirements at runtime and attempts to reassign tasks and adjust the load as needed [13].

The commonly used load balancing algorithms fall into two

categories. One is a static load balancing algorithm, and the

other is a dynamic load balancing algorithm. Round Robin (RR)

and Weighted Round Robin (WRR) are commonly used static

algorithms. RR is relatively simple and ignores server

availability, server load, and the distance between the client

and the server. WRR addresses inconsistent server performance

by assigning weights, but when a service request

lasts a long time, the load may become skewed. Least

Connection (LC) and Weighted Least Connection (WLC)

are commonly used dynamic algorithms. LC does not

consider service capabilities, the distance between the client

and the server, and other factors. WLC is a relatively good

dynamic scheduling algorithm [14].
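The four algorithms named above can be contrasted in a minimal sketch. The `Server` class and its `weight` and `active` fields are hypothetical, and the weighted round robin shown is the naive expanded-list variant rather than any particular production implementation.

```python
import itertools


class Server:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight  # relative capacity, used by WRR and WLC
        self.active = 0       # open connections, used by LC and WLC


def round_robin(servers):
    """RR (static): cycle through servers, ignoring load and capacity."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """WRR (static): a server with weight w appears w times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connection(servers):
    """LC (dynamic): pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active)


def weighted_least_connection(servers):
    """WLC (dynamic): pick the lowest connections-to-weight ratio."""
    return min(servers, key=lambda s: s.active / s.weight)


if __name__ == "__main__":
    a, b = Server("a", weight=1), Server("b", weight=3)
    wrr = weighted_round_robin([a, b])
    print([next(wrr).name for _ in range(4)])
    a.active, b.active = 2, 3
    print(least_connection([a, b]).name, weighted_least_connection([a, b]).name)
```

The static policies never read `active`, which is why long-lived requests can skew the load under them, while the dynamic policies consult the connection counts on every decision.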

The main focus of [15] is a computationally feasible and automated optimization-based residential load control scheme for the retail electricity market, where real-time pricing (RTP) is combined with inclining block rates (IBR). Its goal is to minimize household electricity payments by optimally scheduling the operation and energy consumption of each appliance, subject to the specific needs indicated by the user. Each residential user is assumed to be equipped with a smart meter that is connected via a computer network to an intelligent power distribution system with two-way digital communication. Each smart meter periodically receives updated price information from the utility and includes an energy scheduling unit that determines energy consumption in the home. Depending on the scheduling horizon, the energy scheduling unit is supplemented by a price predictor unit that estimates upcoming prices by applying a weighted average filter to past prices. The authors obtain the best coefficients for the price predictor filter and show that it is best to use different coefficients for different days of the week.
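A weighted average filter over past prices, with a separate coefficient set per day of the week, might look like the sketch below. The function name, the three-tap filter length, and every coefficient value are hypothetical placeholders and are not the coefficients obtained in [15].

```python
def predict_price(past_prices, coefficients):
    """Weighted average of recent prices; coefficients[0] weights the newest."""
    if len(past_prices) < len(coefficients):
        raise ValueError("not enough price history")
    recent = past_prices[::-1][: len(coefficients)]  # newest first
    return sum(k * p for k, p in zip(coefficients, recent))


# Hypothetical coefficient sets keyed by weekday (0 = Monday, 6 = Sunday).
COEFFS_BY_WEEKDAY = {day: [0.5, 0.3, 0.2] for day in range(5)}
COEFFS_BY_WEEKDAY.update({5: [0.7, 0.2, 0.1], 6: [0.7, 0.2, 0.1]})


if __name__ == "__main__":
    history = [10.0, 20.0, 30.0]  # oldest to newest
    print(predict_price(history, COEFFS_BY_WEEKDAY[0]))
    print(predict_price(history, COEFFS_BY_WEEKDAY[6]))
```

Keeping one coefficient set per weekday lets the predictor reflect that weekday and weekend price patterns differ.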

III. DISTRIBUTED DATABASE LOAD BALANCING PREDICTION BASED ON CONVOLUTIONAL NEURAL NETWORK

Load balancing is an important indicator that affects the overall performance of distributed databases. By distributing operations across the nodes, tasks are executed and managed efficiently on each distributed node, which optimizes node resource usage and achieves load balancing. Load balancing algorithms are divided into static, dynamic, and adaptive methods. Whether static load balancing can be achieved depends on the accuracy of the predetermined task assignment, and whether dynamic load balancing can be achieved depends on whether task migration is efficient. The disadvantage of static load balancing is that, since the assignment of tasks is predetermined, it is difficult to adjust when an unscheduled task occurs. Dynamic load balancing is mainly implemented by reassigning and remapping tasks during the computation. A distributed database with dynamic load balancing therefore has better overall performance, but in scenarios with large data volumes and heterogeneous data it still cannot achieve the maximum performance improvement.

A. Convolutional Neural Network

In recent years, deep learning models have achieved

remarkable results in computer vision and speech recognition.

A Convolutional Neural Network (CNN) utilizes a layer with a

convolution filter applied to local features. The CNN model

originally invented for computer vision was subsequently

proven to be effective for natural language processing and

achieved excellent results in other traditional pattern

recognition tasks [16, 17].
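To make "a convolution filter applied to local features" concrete, the sketch below slides a small kernel over a one-dimensional sequence, such as a series of load samples, and applies a ReLU activation. It is a pure-Python illustration with hypothetical names and values; like most deep learning frameworks, it computes cross-correlation (no kernel flip), and a real CNN would learn the kernel weights rather than fix them by hand.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution in the CNN sense (cross-correlation)."""
    n, k = len(signal), len(kernel)
    # Each output depends only on k neighbouring inputs: a local feature.
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def relu(values):
    """Rectified linear unit: keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]


if __name__ == "__main__":
    load = [1.0, 1.0, 4.0, 4.0, 2.0]   # hypothetical load samples
    rising_edge = [-1.0, 1.0]          # responds to sudden increases
    print(relu(conv1d(load, rising_edge)))
```

With this hand-picked kernel the only non-zero response is at the jump from 1.0 to 4.0, which shows how a single filter can act as a detector for a sudden load increase.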

Recent advances in Convolutional Neural Networks (CNN)

have yielded promising results in difficult deep learning tasks.

However, the success of CNN depends on finding the

architecture that fits the given problem. Due to the large

number of architectural design choices, handcrafting an

architecture is a challenging and time-consuming process that

requires expertise and effort.

Figure 1. Example of convolutional neural networks [18]

CNNs have been studied extensively in image recognition and computer vision, where they provide improvements over deep neural networks (DNNs) on many tasks. More recently, CNNs have been used for speech recognition and also show improvements over DNNs, but only for small-vocabulary tasks with shallow networks. Although a framework has been introduced to model spectral correlation, one limitation of this spectral modeling approach is that the network is restricted to one convolutional layer, unlike most CNNs, which use multiple convolutional layers. Paper [18] explores spatial modeling similar to that used in the image recognition community, which allows multiple convolutional layers and encourages deeper networks.

Paper [19] presents an effective framework for automatically designing high-performance CNN architectures for a given problem. The framework introduces a new optimization objective function that combines the error rate with the information learned by a set of feature maps, measured using a deconvolutional network (deconvnet). The new objective function allows the hyperparameters of the CNN architecture to be optimized by guiding the CNN toward better visualization of the learned features through the deconvnet. The actual optimization of the objective function is performed with the Nelder-Mead method (NMM). In addition, the objective function leads to faster convergence towards a better architecture. The framework enables efficient exploration of the numerous design choices of a CNN architecture and also allows for efficient distributed execution and synchronization through Web services.

B. Load Balancing Prediction based on Convolutional Neural

Network

To overcome the problem of inaccurate real-time prediction of distributed database system load under heterogeneous data and burst load, this paper proposes a distributed database load balancing prediction method based on an improved convolutional neural network. The main components are shown in Figure 2. Input and output data are processed by the distributed database management system, which includes three modules: load balancing, resource discovery, and process migration. Each local node likewise contains a local load balancing module, a resource discovery module, and a process migration module. The three modules of each local node interact with the overall load balancing module in the management system; the interaction covers resource allocation, resource usage, and task execution. The convolutional neural network load prediction analyzer predicts the resource usage status of each local node.
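The interaction between local nodes, the overall load balancing module, and the load prediction analyzer can be sketched structurally as follows. This is not the paper's model: the `LoadPredictor` below is a fixed weighted filter standing in for the trained CNN analyzer, and every class name, the utilization scale in [0, 1], and the 0.8 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalNode:
    name: str
    usage_history: list = field(default_factory=list)  # utilization in [0, 1]

    def report(self, usage):
        """Resource discovery: record the latest observed utilization."""
        self.usage_history.append(usage)


class LoadPredictor:
    """Stand-in for the CNN load prediction analyzer (fixed filter, untrained)."""

    def __init__(self, kernel):
        self.kernel = kernel  # hypothetical weights, oldest reading first

    def predict(self, node):
        window = node.usage_history[-len(self.kernel):]
        if len(window) < len(self.kernel):
            return node.usage_history[-1]  # too little history: repeat last value
        return sum(w * x for w, x in zip(self.kernel, window))


class Manager:
    """Overall load balancing module that polls every local node."""

    def __init__(self, nodes, predictor, threshold=0.8):
        self.nodes = nodes
        self.predictor = predictor
        self.threshold = threshold

    def overloaded(self):
        """Names of nodes whose predicted utilization exceeds the threshold."""
        return [n.name for n in self.nodes
                if self.predictor.predict(n) > self.threshold]


if __name__ == "__main__":
    a, b = LocalNode("a"), LocalNode("b")
    for u in (0.5, 0.75, 1.0):
        a.report(u)
    for u in (0.25, 0.25, 0.25):
        b.report(u)
    print(Manager([a, b], LoadPredictor([0.0, 0.5, 0.5])).overloaded())
```

Nodes flagged by `overloaded` would be candidates for process migration; replacing `LoadPredictor` with a trained model leaves the surrounding structure unchanged.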

Figures

Example of convolutional neural networks [18]

The Requests per second (RPS) compared with two methods

The Average response time (ART) compared with two methods

The Peak response time (PRT) compared with two methods

FAQs

What is the main focus of the paper on distributed database load balancing?

The paper focuses on proposing a load balancing prediction method for distributed databases based on convolutional neural networks (CNN). It addresses the shortcomings of traditional load balancing algorithms, particularly their accuracy in real-time predictions and their ability to handle sudden loads. The study emphasizes the need for efficient load balancing to improve the performance and stability of distributed systems, especially in unpredictable environments.

How does the proposed CNN method improve load balancing prediction?

The proposed method utilizes an improved convolutional neural network to enhance real-time load balancing predictions under conditions of heterogeneous data and burst loads. By processing input data through a structured CNN, the system can predict local load distribution and resource usage more accurately. This approach allows for effective adjustments to sudden loading, thereby optimizing resource utilization and ensuring computational efficiency.

What are the key components of the distributed database management system described in the paper?

The distributed database management system outlined in the paper includes three main components: load balancing, resource discovery, and process migration. Each local node within the system interacts with these components to manage tasks efficiently. The load balancing module ensures that operations are evenly distributed across nodes, while resource discovery identifies available resources, and process migration facilitates the movement of tasks as needed to maintain balance.

What experimental results support the effectiveness of the CNN-based load balancing method?

The experimental results demonstrate that the CNN-based load balancing prediction method outperforms traditional methods in terms of stability and response time. Specifically, under varying levels of concurrent demand, the CNN approach shows a more balanced load distribution as indicated by metrics such as Requests per Second (RPS), Average Response Time (ART), and Peak Response Time (PRT). These results highlight the method's ability to maintain efficiency even during high-load scenarios.

What challenges do traditional load balancing algorithms face according to the paper?

Traditional load balancing algorithms often struggle with the uneven distribution of load in distributed databases, leading to performance issues. They typically require manual adjustments to address these imbalances, which reduces system flexibility. Additionally, these algorithms may not adapt well to sudden changes in workload or data heterogeneity, resulting in inefficiencies in resource utilization and increased response times.

What methodology is used to implement the CNN load prediction in the paper?

The methodology for implementing CNN load prediction involves several steps: first, inputting heterogeneous data into the distributed database management system. Next, the system initializes load distribution and resource discovery. Local nodes then predict future load distributions using the CNN model, which iteratively refines its predictions based on feedback from local load distributions. This process continues until the load balance meets specified requirements.

Why is load balancing crucial for distributed database systems?

Load balancing is crucial for distributed database systems because it directly impacts overall system performance and stability. By evenly distributing computational tasks across nodes, load balancing helps prevent any single node from becoming overloaded, which can lead to delays and inefficiencies. Effective load balancing ensures that resources are utilized optimally, enhancing the system's ability to handle varying workloads and maintain high levels of service.
