My typing has been exceptionally slow today, fewer than a thousand words in two hours, so the update will be delayed, probably until around midnight or one or two o'clock. Just refresh this chapter then to see the update. By the way, it is already the twenty-eighth, yet this chapter's comments still have not been opened; it seems we will have to wait until next month for them to be unblocked.
...
Abstract: To ensure network security, a method for network security risk mining and estimation based on big data analysis is proposed. The Map and Reduce functions of the Hadoop platform are used to mine association rules from network security events. The mined association rules serve as the features of the network security events and are input to a support vector machine with a radial basis function kernel. A network security risk estimation model is built through training, and the quantum-behaved particle swarm optimization (QPSO) algorithm is used to find the optimal parameters of the support vector machine. Experimental results show that the method improves the precision of network security risk estimation and provides a useful reference for defending against network security risks.
Keywords: Big data analysis; Network security risk; Association rules; Support Vector Machine
1 Introduction
Internet technology is developing rapidly, and the internet environment is highly open. Some attackers exploit the uncertainty and diversity of the network to launch attacks, seriously threatening its safe operation [1-2]. Earlier network defense methods used only the information contained in data packets to obtain risk estimates, which resulted in low accuracy. To keep a network operating safely, administrators need to understand the network status in real time, anticipate security risks, and adopt appropriate defensive measures [3-5]. Network security risk is now studied extensively. Han Xiaolu, He Chunrong, and others use intuitionistic fuzzy sets and attention mechanisms to assess network security status [6-7]. However, because of the large data volumes involved, these assessments still suffer from excessive alarms and high false alarm rates. Mining useful risk data from massive network data is therefore key to precise network security risk assessment. When attacks occur in a network, they generate a large amount of alarm information of many types, which makes data mining more difficult [8]. Efficient big data mining methods are thus essential for improving assessment accuracy. This paper therefore proposes a method for network security risk mining and estimation based on big data analysis, and tests and analyzes its performance.
2 Network security risk mining and estimation method based on big data analysis
2.1 Extraction of association rules in data mining
Security events are collected from massive network data. Because the collected events differ significantly in format, they are first normalized so that the association rules they contain can be mined. The mined rules are used to analyze similar viruses [9], similar vulnerabilities, and other attack behaviors in network security risks, which improves the accuracy of network security risk assessment. The association rules of network security events are extracted with the data mining methods of big data analysis technology.
Let W = {w1, w2, …, wn} denote the set of security event elements and R = {r1, r2, …, rn} denote the dataset, where each element ri of R is a set built from W, i.e., ri ⊆ W.
Definition 1: Let C be a set built from the elements appearing in R. If l elements ri of R satisfy C ⊆ ri, the support of C in the dataset is
sup(C) = l / |R| (1)
Definition 2: Given sets C and D satisfying C ⊆ W, D ⊆ W, and C ∩ D = ∅, the confidence of the rule C→D is conf(C→D) = sup(C ∪ D) / sup(C). Any C→D in the mined dataset that satisfies both the minimum confidence and the minimum support is an association rule to be mined by the big data mining method.
Association rules are obtained by mining the frequent item sets within the transaction set and identifying the rules that hold between different transactions. Network security events are extremely large in scale [10], so the Hadoop cloud computing platform is selected to mine association rules from massive network security events. Mining association rules with big data analysis technology is divided into two parts: (1) mining the frequent item sets that meet the minimum support; (2) using these frequent item sets to mine the association rules that meet the minimum confidence.
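As a minimal sketch of Definitions 1 and 2, the following Python computes the support of an item set and the confidence of a rule over a small in-memory dataset. The event names and the helper names `support` and `confidence` are hypothetical and chosen only for illustration; confidence is taken in its usual form, sup(C ∪ D) / sup(C).

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], dataset: List[FrozenSet[str]]) -> float:
    """Fraction of records r_i in the dataset that contain every item of the set."""
    items = frozenset(itemset)
    hits = sum(1 for record in dataset if items <= record)
    return hits / len(dataset)


def confidence(c: Iterable[str], d: Iterable[str],
               dataset: List[FrozenSet[str]]) -> float:
    """Confidence of the rule C -> D, i.e. sup(C u D) / sup(C)."""
    sup_c = support(c, dataset)
    if sup_c == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(frozenset(c) | frozenset(d), dataset) / sup_c


# Hypothetical normalized security events (illustrative data only).
events = [
    frozenset({"port_scan", "ssh_bruteforce"}),
    frozenset({"port_scan", "ssh_bruteforce", "privilege_escalation"}),
    frozenset({"port_scan"}),
    frozenset({"sql_injection"}),
]

sup_scan = support({"port_scan"}, events)
conf_rule = confidence({"port_scan"}, {"ssh_bruteforce"}, events)
```

Here three of the four records contain `port_scan`, and two of those three also contain `ssh_bruteforce`, so the rule port_scan→ssh_bruteforce has support 0.5 and confidence 2/3.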
The Hadoop platform uses the Map function and the Reduce function to obtain item subsets and to compute the support of the subsets obtained. Analyzing the support of all subsets gives the support of the frequent items in the mined network security events and identifies the frequent item sets contained in the network security event dataset. The input to the Hadoop platform is the minimum support β and the original network security event dataset R; the output is the frequent items that meet the minimum support. The process is as follows.
Map Task:
(1) According to the input file path, and using the minimum support of the frequent item sets, divide the original network security dataset into data subsets of size n and format each subset into key-value pairs, where the value and the key represent the data information and the character offset, respectively;
(2) Read the key-value pairs of the different subsets with the Map function, parse the data information in the value with the split function, and pass the parsed result to the set;
(3) Use the output key to represent each subset and set the value of each subset to 1;
(4) Call the optional Combine function. The Map ends generate many key-value pairs with the same key; merging identical keys with the Combine function avoids the low computational efficiency caused by sending every key-value pair to the Reduce ends over the network.
Reduce Task:
(1) Sort the key-value pairs sent by the Combine function, merge the pairs with the same key, read the merged pairs with the Reduce function, and accumulate the L() values within the key-value pairs. The support count of the key set in the network security dataset R is the global support of the candidate frequent item set at the Reduce end;
(2) Send the candidate item sets that exceed the minimum support to an external storage table, use this external table to query and mine frequent item sets, and set these frequent items as the inputs and related files of the MapReduce program.
For mining the association rules of network security events, the input is the minimum confidence δ and the output is the association rules that satisfy δ. The computation proceeds as follows:
(1) The Map function calls the setup method to connect to the database;
(2) Divide the frequent item sets stored in the external table into n data subsets and format all data into key-value pairs;
(3) Parse the elements of the frequent item sets in the value, represent the parsed result as (C, D, SValue), and store the acquired (C, D) in the set;
(4) Enumerate the subsets C of each frequent item set, read the support sup(C), and use it to compute the confidence of C→D;
(5) When the confidence exceeds the preset threshold, an association rule holds between the subset and the remaining elements of the frequent item set. The subset and the difference set form the key, and the confidence of the rule is the value.
Through this process, the association rules of network security events are mined, and network security risk is then estimated from the mined rules with the support vector machine method.
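The Map, Combine, and Reduce steps above can be sketched in a single Python process. This is only an illustration of the data flow under stated assumptions: it does not use the Hadoop API, the function names (`map_task`, `combine`, `reduce_task`, `mine_rules`) and the sample events are hypothetical, and the thresholds 0.3 and 0.7 are illustrative.

```python
from collections import Counter
from itertools import combinations


def map_task(split, max_size=3):
    """Emit (item subset, 1) for every non-empty subset of each event in a split."""
    for event in split:
        items = sorted(event)
        for k in range(1, min(max_size, len(items)) + 1):
            for subset in combinations(items, k):
                yield subset, 1


def combine(pairs):
    """Merge pairs with the same key on the map side to cut network traffic."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return local


def reduce_task(partials, n_events, min_support):
    """Sum the partial counts and keep item sets whose global support is high enough."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {key: count / n_events for key, count in total.items()
            if count / n_events >= min_support}


def mine_rules(frequent, min_confidence):
    """Split each frequent item set into C -> D and keep rules above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for c in combinations(itemset, k):
                d = tuple(item for item in itemset if item not in c)
                conf = sup / frequent[c]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((c, d, conf))
    return rules


# Hypothetical normalized events, divided into two input splits.
splits = [
    [{"port_scan", "ssh_bruteforce"},
     {"port_scan", "ssh_bruteforce", "privilege_escalation"}],
    [{"port_scan"}, {"sql_injection"}],
]
n_events = sum(len(split) for split in splits)
partials = [combine(map_task(split)) for split in splits]
frequent = reduce_task(partials, n_events, min_support=0.3)
rules = mine_rules(frequent, min_confidence=0.7)
```

Enumerating every subset of an event grows exponentially with event size, which is why the sketch caps subsets at `max_size`; a production job would more likely generate candidates level by level.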
2.2 Network Security Risk Estimation Method
The mined association rules are used as the features of network security events, and network security risk is estimated from them. Let (xi, yi) denote the training sample set of network security events, composed of the sample inputs xi and the sample outputs yi, with xi ∈ R^n and yi ∈ R. The samples are mapped to a high-dimensional feature space by a nonlinear mapping function φ(·), which yields the linear regression function for network security event assessment:
$$f(x) = w^{T}\varphi(x) + b \qquad (2)$$
where b and w are the bias and the weight, respectively. The LSSVM regression model is solved according to the structural risk minimization principle:
$$\min_{w,b,e} J(w, e) = \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{2}\sum_{i} e_i^{2} \qquad (3)$$
$$\text{s.t.}\quad y_i = w^{T}\varphi(x_i) + b + e_i \qquad (4)$$
where ei is the error between the regression function and the actual result, and C is the penalty factor. Applying the Lagrange multiplier method to the constrained optimization problem in formula (4) gives
$$L(w, b, e, a) = \frac{1}{2}\lVert w \rVert^{2} + \frac{C}{2}\sum_{i} e_i^{2} - \sum_{i} a_i\left(w^{T}\varphi(x_i) + b + e_i - y_i\right) \qquad (5)$$
where ai is the Lagrange multiplier. The kernel function is defined according to the Mercer condition:
$$K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j) \qquad (6)$$
The radial basis kernel function is selected as the kernel function for network security risk estimation:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right) \qquad (7)$$
where σ is the width of the radial basis kernel function. The final support vector machine regression model is
$$f(x) = \sum_{i} a_i K(x, x_i) + b \qquad (8)$$
The estimation precision of the support vector machine is determined by its parameters, so selecting appropriate parameters improves the accuracy of network security risk estimation. The QPSO algorithm is used to optimize the parameters of the support vector machine.
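A minimal sketch of LSSVM regression with a radial basis kernel follows. It assumes the standard least-squares formulation, in which setting the derivatives of the Lagrangian to zero reduces training to one linear system in the bias b and the multipliers a. The helper names, the pure-Python solver, and the sine-curve training data are hypothetical stand-ins for real network security event features.

```python
import math


def rbf(x, z, sigma):
    """Radial basis kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((p - q) ** 2 for p, q in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A s = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol


def lssvm_fit(X, y, C, sigma):
    """Solve [[0, 1^T], [1, K + I/C]] [b; a] = [0; y] and return (b, a)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / C if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(X, a, b, sigma, x_new):
    """Evaluate f(x) = sum_i a_i K(x, x_i) + b."""
    return sum(ai * rbf(x_new, xi, sigma) for ai, xi in zip(a, X)) + b


# Toy data (an assumption for illustration): 30 samples of a sine curve.
X = [[2.0 * math.pi * i / 29.0] for i in range(30)]
y = [math.sin(x[0]) for x in X]
b, a = lssvm_fit(X, y, C=100.0, sigma=1.0)
preds = [lssvm_predict(X, a, b, 1.0, x) for x in X]
max_err = max(abs(p - t) for p, t in zip(preds, y))
```

The first row of the system enforces that the multipliers sum to zero, and the 1/C term on the diagonal is where the penalty factor enters, which is why C and σ together control how closely the model follows the training data.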
The QPSO algorithm is set up with m particles in a D-dimensional search space. The position of particle i is xi = (xi1, xi2, …, xiD), PB = (pb1, pb2, …, pbD) is the current optimal position, and GB = (gb1, gb2, …, gbD) is the global optimal position. The particle evolution is given by Eq. (9), in which mbest is the mean best position of the particle swarm and β is the contraction-expansion coefficient, which controls the convergence speed of the algorithm. At iteration t, β is calculated by Eq. (10).
The process of network security risk assessment is as follows:
(1) Set the number of particles in the swarm according to the scale of the network security risk assessment. The dimensions of each particle represent the parameters C and σ of the support vector machine used to estimate network security risk;
(2) Set the parameters of the particle swarm algorithm used to optimize the support vector machine, including the maximum number of iterations;
(3) Compute the fitness function of each particle;
(4) Calculate the individual optimal position of each particle and the global optimal position, and establish a network security information database;
(5) Update the position of each particle in the swarm;
(6) Check whether the termination condition is met. If so, proceed to step (7); if not, return to step (3);
(7) Use the optimal particle obtained through the above process as the parameters of the support vector machine. This completes the network security risk estimation model, which is then used to obtain the network security risk estimation results.
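Steps (1) to (7) can be sketched with a commonly used form of QPSO. Everything specific here is an assumption rather than the paper's exact setting: each coordinate is resampled as p ± β·|mbest − x|·ln(1/u) around a random point p between the personal and global bests, β decreases linearly from 1.0 to 0.5, and `toy_fitness` is a hypothetical stand-in for the validation error of a support vector machine trained with the candidate (C, σ).

```python
import math
import random


def qpso(fitness, lower, upper, n_particles=30, n_iter=200, seed=0):
    """Minimize fitness over a box with quantum-behaved particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(lower)
    x = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
         for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in pbest]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        beta = 1.0 - 0.5 * t / n_iter  # assumed linear schedule from 1.0 toward 0.5
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                moved = attractor + step if rng.random() < 0.5 else attractor - step
                x[i][d] = min(max(moved, lower[d]), upper[d])  # stay inside the box
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


def toy_fitness(params):
    """Hypothetical error surface over (C, sigma) with its minimum at (50, 20)."""
    c, sigma = params
    return (c - 50.0) ** 2 + (sigma - 20.0) ** 2


best_params, best_val = qpso(toy_fitness, lower=[0.1, 0.1], upper=[200.0, 200.0])
```

In practice `toy_fitness` would be replaced by a function that trains the risk estimation model with the candidate parameters and returns its error on held-out security events, so each fitness evaluation costs one model fit.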
3 Case Analysis
Sixty minutes of communication data from the operation of a communication network are selected as the test object, giving a total of 5,846,544 samples, and the method proposed in this paper is used to assess network security risk. The intuitionistic fuzzy set method (reference [6]) and the attention mechanism method (reference [7]) are selected as comparison methods.
The proposed method uses big data analysis technology to mine the association rules in the massive network communication data, and the number of rules mined at different minimum confidence and minimum support values is counted. The statistics are shown in Figure 1. As Figure 1 shows, a relatively large number of association rules can be mined with a minimum confidence of 0.7 and a minimum support of 0.3. When mining massive network data with the proposed method, the minimum confidence δ and the minimum support β are therefore set to 0.7 and 0.3, respectively. The proposed method mines association rules effectively and remains efficient when applied to massive network communication data.
After the association rules are mined, the QPSO algorithm is used to obtain the optimal parameters of the support vector machine. The convergence of the QPSO algorithm at different numbers of iterations is shown in Figure 2. As Figure 2 shows, the QPSO algorithm needs only about 40 iterations to find the optimal support vector machine parameters for estimating network security risk. The QPSO algorithm therefore optimizes efficiently and obtains the optimal parameters in a short time, which improves network security risk estimation performance.
The optimal parameters of the support vector machine obtained by the QPSO algorithm are C=130 and σ=135. The network security risk assessment model is built with these parameters and used to evaluate the number of security risk incidents during 5 hours of network operation. The results are compared with those of the other two methods in Figure 3. As Figure 3 shows, the risk results estimated by the proposed method are very close to the actual network security risk results, and their fluctuation trends agree closely. The comparison indicates that the proposed method can effectively predict network security risks and that its predictions are reliable enough to support network administrators in managing network security.
The network security risk assessment performance of the three methods is compared over multiple tests, as shown in Figure 4. As Figure 4 shows, the proposed method mitigates deficiencies such as the need for a large amount of historical data and high sensitivity to missing data, which indicates high reliability when it is applied to network security risk assessment.
Table 1 shows the security risk conditions of a test network from 7:00 to 24:00 on January 3, 2020, as evaluated by the proposed method. Based on the network security events in Table 1, the proposed method is used to evaluate the attack types of the risk events, and the results are shown in Table 2.
Analysis of Table 2 shows that the proposed method can evaluate security risk events and effectively identify the specific attack behaviors behind them, which confirms its validity in assessing security risk events.
4 Conclusion
Network security risk estimation is a crucial part of the current network security defense system. As the volume of data in networks grows, higher requirements are placed on network security risk estimation. This paper considers the attack situation during network operation and applies big data analysis technology to network security risk estimation, using its strength in processing massive data to mine the association rules within network security events and to estimate network security risks. Experimental verification shows that the method achieves effective network security risk estimation and helps protect network security in an environment with massive data.