SGCC dataset.
Sgcc dataset The suggested technique is based on electricity consumption data from the State Grid Corporation of China (SGCC). Electricity theft detection released by the State Grid Corporation of China (SGCC) dataset data set. In this project, I have presented a study that utilizes a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model for detecting electricity theft in the State Grid This is a realistic electricity consumption dataset released by State Grid Corporation of China (http://www. With CAN, the knowledge is transferred from the source domain containing rich annotation information to the target application scenario without annotations, which effectively reduces the demand for labelled data. The SGCC dataset includes raw data where incomplete information, erroneous values, and missing values are there. Such results outperform the other recently reported state-of-the-art methods for NTL detection that are applie Electricity Theft Detection With Automatic Labeling and Enhanced RUSBoost Classification Using Differential Evolution and Jaya Algorithm For the SGCC dataset, CAN does not use any SGCC label information, but only the unlabelled data in SGCC (abundant and easy to access). ARIFET AL. This imbalanced nature of the dataset adversely affects the performances of the supervised learning techniques because of the biasn towards the majority class. In this research work State Grid Corporation of China (SGCC) dataset is used for electricity theft detection. Likewise, in the SGCC dataset, we have identified outliers, which skew the data, making the training process complex and have a negative impact on the final ETD performance because of overfitting. SGCC data first column is consumer ID that is alphanumeric. Taking the SGCC dataset as an example, there are 1034 consumption records for every customer committing electricity theft. 
Compared with some of the current studies, the number of features extracted in this paper is small, and the detection SGCC dataset; electricity theft; metaheuristic algorithms 1. formed dataset and the power of the GBM algorithm in detecting NTL fraud cases. Download scientific diagram | Metadata Information of SGCC Dataset. Then I evaluated the performance of the CNN-LSTM model on the SGCC dataset, using various performance metrics such as precision, recall, and F1 score. Subsequently, once the data is balanced, we pass the data to the GRU for ETD. Simulations are performed on a Core-i7 machine with 8GB of RAM. ABSTRACT Electricity theft is a widespread issue with far-reaching consequences, negatively impacting both utility companies and consumers. ACQUIRING THE DATASET The performance of the proposed model is evaluated through SGCC dataset. The SGCC dataset is an authentic dataset made publicly available by the SGCC and includes the electricity consumption data of 42,372 customers over a duration of 1035 days. The dataset history starts from January in the year 2014 to October in the year 2016. 72, which The data preprocessing phase contains the following steps, which are explained below. The selected SGCC dataset is high dimensional and not linearly separable. It includes the electricity consumption data of 42,372 customers within 1,035 days (from Jan. 1, 2014 to Oct. SGCC dataset utilized in this study lacked statistical characteristics. sgcc. In this paper, the “three-sigma rule of thumb” [44] is practiced for detecting and recovering the outliers according to the following equation: O SGCC GASCAD-II dataset. Although SMOTE has been widely used in the literature to handle data imbalance issues in the SGCC dataset, this work prefers to use the under-sampling technique due to the following reasons. supply, including substations, transmission lines, and maintenance records. 
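The three-sigma rule mentioned above can be illustrated with a minimal sketch. It assumes one common variant in which readings above mean + k·std are capped at that threshold; the function name `cap_outliers`, the default `k = 3`, and the toy readings are illustrative assumptions, not the equation of any cited paper.

```python
from statistics import mean, pstdev


def cap_outliers(readings, k=3.0):
    """Cap readings above mean + k * std at that threshold (assumed variant)."""
    mu = mean(readings)
    sigma = pstdev(readings)  # population standard deviation
    upper = mu + k * sigma
    return [min(x, upper) for x in readings]


# Toy consumption series (kWh): twenty ordinary days and one extreme spike.
daily_kwh = [5.0] * 20 + [500.0]
capped = cap_outliers(daily_kwh)
print(capped[-1])  # the spike is pulled down to the threshold
```

Only the upper side is capped here because consumption cannot be negative; a two-sided variant would be an equally plausible reading of the rule.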
State Grid Corporation of China (SGCC) dataset is used, and it handles missing values by linear interpolation, removes an empty record from a dataset, and orders the dataset by date. The simulation results depict that the proposed solution efficiently performs ETD using both datasets with a 61. 1. [5] present a study using a dataset with real electricity theft data provided by State Grid Corporation of China (SGCC). 3%, F1-score increases from 0. The SGCC dataset is used in the model. 1 Dataset collection details. The average detection accuracy under the SGCC dataset containing five FDIs is 80. Data Preparation: Data preprocessing is necessary for the identification of Then I evaluated the performance of the CNN-LSTM model on the SGCC dataset, using various performance metrics such as precision, recall, and F1 score. used the CNN-LSTM approach on the SGCC dataset and achieved 87. This study, which has become a baseline for following recent works, introduces a neural network architecture based on a wide (dense) and a deep (convolu-tional) component trained together. The dataset is organized in tabular dataset used in this work is the State Grid Corporation of China (SGCC) (https:// github. The SGCC dataset will likely contain comprehensive records of electricity usage, carefully organized by residential, commercial, and industrial sectors and covering various periods. In particular, they use a special long short-term memory (LSTM), UNet model and an ensemble adaptive boosting (AdaBoost) approach for The SGCC dataset is imbalanced in the ratio of 10:1 with class 0 (genuine consumers) as the majority and class 1 (electricity thieves) as the minority. Therefore, in this study, The SVM model takes 220 s during the training phase, which is higher than all other schemes. 6%, an F1-score of 0. The proposed models are compared with benchmark models, such as. 46% are normal and remaining are thieves. 
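The linear interpolation of missing values mentioned above can be sketched as follows. This is a generic illustration, not the procedure of a specific paper: the function name `interpolate_missing` is hypothetical, and filling leading or trailing gaps with the nearest known value is an assumption.

```python
import math


def interpolate_missing(series):
    """Fill NaN gaps linearly between the nearest known neighbours.

    Leading and trailing gaps take the nearest known value (an assumption).
    """
    values = list(series)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not known:
        return values  # nothing to interpolate from
    for i in range(known[0]):  # leading gap
        values[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):  # trailing gap
        values[i] = values[known[-1]]
    for left, right in zip(known, known[1:]):  # interior gaps
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)
    return values


nan = float("nan")
filled = interpolate_missing([nan, 2.0, nan, nan, 8.0, nan])
print(filled)
```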
Related studies have attempted to fill in the missing values using methods such as Lagrange interpolation, mean interpolation and piecewise cubic hermite interpolating polynomial (PCHIP) [19] , [20] , [21] . , 2022). This is a key resource in the field of power distribution and management, with The SGCC dataset, released by the State Grid Corporation of China, is a realistic electricity consumption dataset. There are several reasons of such erroneous data as data corruption, hardware failure, and missing values, especially in time series data. The columns of the dataset are ordered by date. Javaid et al. The SGCC dataset includes daily power usage statistics for 42,372 consumers, including 3615 electrical criminals (class 1) and 38,757 real consumers (class 0). The primary emphasis lies in hyperparameter optimization for fraud detection in smart grid applications. We are using 1500 benign consumers’ data of six months due to the limited resources of our machine. P. 851 to 0. from publication: Electricity Theft Detection in Smart Meters Using a Hybrid Bi-directional GRU Bi-directional LSTM Model | In Our proposed attack is validated with State Grid Corporation of China (SGCC) dataset. • SGCC dataset [70] Energy theft detection [70] Renewable energy effects and solar panel simulation AEMO [63] 6-ALTAMIMI applied on the smart grid corporation of China (SGCC) dataset. Table 2 shows the details of the dataset (Shehzad et al. from publication: Security Threats and Promising Solutions Arising from the Intersection of AI and IoT: A Study of IoMT and IoET Information of State Grid Corporation of China (SGCC) dataset. My findings demonstrate that deep learning models outperform traditional methods for electricity theft detection and effectively detects technical and non-technical losses. The dataset provided includes authentic power consumers as well as those engaged in electricity theft, with more information about the dataset available in Table 1. 
Javaid devised a theft detection model utilizing the SGCC dataset, which extracts features through an attention-driven feature extractor and classifies potential thieves using an Echo State Network [24]. Feature Engineering: We crafted and engineered new features from the dataset to enhance the predictive power of the model, provide better insights into useful features, and add robustness to In the second stage, the distributed random forest (DRF) generates the learned model. Data mining in SGCC Dataset Something went wrong, please refresh the page to try again. Such results outperform the other recently reported state-of-the-art methods for NTL detection that are applied to the same SGCC dataset. Thus, the model constructed from such a dataset may be Simulations are performed on a Core-i7 machine with 8GB of RAM. SGCC equipment inspection using Histogram Gradient Boosting for health assessment. It comprises EC data of 42,372 consumers, out of which 91. The data preprocessing of the SGCC dataset was achieved in these three steps; handling missing values, data reduction, outliers, and data normalisation. The dataset released by State Grid Corporation of China (SGCC) commonly used in electricity theft detection contains numerous vacant values. Introduction Electricity theft is a widespread problem that presents considerable challenges to the stability and economic sustainability of energy distribution networks across the globe. SGCC dataset has missing values and also has class imbalance problems. The dataset consisted of power consumption in kWh unit. Feature Selection For the day dimension, the dataset published by the SGCC is selected [31]. This study strictly analyzes the electricity consumption characteristics of customers on a weekly basis. So it is a reasonable assumption that the users are honest consumers. 
The SGCC dataset consists of a total of 42,372 records, with 3615 instances representing abnormal consumer data and 38,757 instances representing normal consumer data. The advantages and disadvantages of existing techniques. The dataset was trained to represent the consumer class. Electricity data are first preprocessed using normalization, the three-sigma rule, and interpolation of missing values techniques. PCA applies to reduce the number of features to reduce the classification model’s complexity. Furthermore Codes and datasets for the paper "Unsupervised Abnormal Power Consumption Detection Via Deep Siamese Autoregressive Network" - ChenBaiyang/DSAD Energies 2023, 16, 2852 4 of 18 2. , January 2014 to October 2016 (1,035 days) [26]. Our experiments show that the proposed LSTM-CNN surpasses current methods, with a precision of 93. These outliers must be data, this study uses a real-time electricity consumption dataset released by State Grid Corporation of China (SGCC) [32]. Extensive experiments based on realistic dataset show that wide and deep CNN model outperforms other existing methods. 31, 2016). Keywords: Electricity theft detection · Anomaly Transformer · Deep Learning · Non-technical losses · Smart meters 1 Introduction Line losses in power systems can be divided into technical formed dataset and the power of the GBM algorithm in detecting NTL fraud cases. 2. 2. The dataset contains values of 1 for the abnormal class and 0 for the normal user class. Imbalanced dataset for the classification problem of electricity theft detection This paper provides a comprehensive review of ETD methods, highlighting the limitations of current datasets and technical approaches to improve training datasets and the ETD in smart grids. For achiev ing an effi-cient performance by any theft-detection model, its input features must reflect sufficient . TABLE 3. 
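The normalization step named above is not specified further in the text. A minimal sketch, assuming per-consumer min-max scaling to the range [0, 1] (the function name `min_max_scale` and the sample readings are illustrative):

```python
def min_max_scale(readings):
    """Scale one consumer's readings to [0, 1]; a flat series maps to zeros."""
    lo, hi = min(readings), max(readings)
    if hi == lo:
        return [0.0 for _ in readings]
    return [(x - lo) / (hi - lo) for x in readings]


scaled = min_max_scale([2.0, 4.0, 6.0, 10.0])
print(scaled)
```

Scaling each consumer separately keeps large and small households comparable, which matters when the classifier looks at the shape of the consumption curve rather than its absolute level.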
by "Sensors"; Science and technology, general Forecasts and trends Business performance management Computational linguistics Consumer research Electric utilities Language processing Machine learning Marketing You signed in with another tab or window. 1, it was evident that the ratio of normal electricity consumers to electricity theft consumers in the SGCC dataset was close to 10:1. In our model the dataset is reduced using principal component analysis (PCA). FIGURE 1. This dataset contains the electricity consumption data of 42,372 The dataset used in this research is real customers’ electricity usage data publicly provided by the State Grid Corporation of China (SGCC). SVM draws n − 1 hyperplanes and then picks an optimal hyperplane of high margin for distinguishing two classes (n represents the number of dimensions). 7% to 96. However, malicious consumers tamper with their SMs to Corporation of China (SGCC) dataset. cn/). The essential features from the preprocessed dataset were extracted using CNN, and then the data were classified using the XGB model. RFE Based Feature Selection and KNNOR Based Data Balancing for Electricity Theft Detection Using BiLSTM-LogitBoost China (SGCC) is preprocessing; first, order dataset by date, second, remove the empty record from the dataset, third, missing values by linear interpolation, and finally, imbalanced data SGCC dataset: This is the first largest publicly available dataset that provides a detailed analysis of user/consumer consumption patterns. In summary, XGBoost is a valuable tool in the fight against fraud. Whole exome sequencing was conducted using Agilent SureSelect Human All Exon V6 kits. You signed in with another tab or window. Electricity theft dataset of SGCC is a labeled dataset recorded for the time period of approximately 3 years, i. A hybrid resampling technique is proposed, named synthetic minority oversampling technique with near miss. 
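The oversampling half of such a hybrid resampling scheme can be sketched in a few lines. The code below is a simplified SMOTE-style generator in plain Python: it interpolates between a minority sample and one of its k nearest minority neighbours. The NearMiss under-sampling half is omitted, and the function name `smote_like`, the parameters, and the toy points are assumptions for illustration.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by neighbour interpolation."""
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sqdist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (e.g. two features per theft consumer).
thieves = [[1.0, 2.0], [1.5, 2.5], [3.0, 1.0], [2.0, 2.0]]
new_rows = smote_like(thieves, n_new=6)
print(len(new_rows))
```

The sketch needs at least two minority rows. In practice a maintained library implementation would normally be preferred over hand-written code.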
For achieving an efficient performance by any theft-detection model, its input features must reflect sufficient underlying abnormalities in customer consumption data. Our findings reveal model performance degradation under our proposed generative evasion attack ranging from 96. Ashraf Ullah et al. This dataset contains the electricity consumption data of 42,372 users for a total of 1035 days from 2014 to 2016. Mebarkia, and I In the second stage, the distributed random forest (DRF) generates the learned model. from publication: A novel feature engineered-CatBoost-based supervised machine learning framework for electricity Similarly, in the SGCC dataset, the benign electricity consumers are higher in number than the electricity thieves, as shown in Table 1. from publication: An Attention Guided Semi-Supervised Learning Mechanism to Detect Electricity Frauds in the Distribution The SGCC dataset will likely contain comprehensive records of electricity usage, carefully organized by residential, commercial, and industrial sectors and covering various periods. Fig. 951, Furthermore, the Optuna framework was utilized to optimize the hyperparameters of XGBoost and validated by applying the established TBM dataset of the KS Tunnel. Based on the analysis results, the DANN outperforms compared to other supervised learning classifiers such as ANN, AdaBoost, and DT in recall, F1-Score, and AUC. The pre-processing steps are performed initially to refine the data. suggested a hybrid DL approach for ETD in SGs. However, compared to other approaches, the proposed model is not very satisfactory. It contains consumers’ IDs, daily EC and labels either 0 or 1. csv contains 1037 columns and 42,372 rows for electric consumption from January first 2014 to 30 October 2016. This pervasive problem not only hampers the economic development of utility providers but also poses risks of electrical hazards while contributing to the overall high cost of energy for end-users. 
The dataset used is the Cora citation graph dataset, which comes built-in with torch-geometric. Zheng et at. The GASCAD-II dataset from the Singapore Gastric Cancer Consortium includes paired tumor-blood whole exome sequencing data for 209 gastric cancer (GC) patients, along with whole transcriptome sequencing data for 125 GC samples. Moreover, the Feature Extraction and Scalable Corporation of China (SGCC) dataset is used for electricity theft detection. The experimental findingsindicate that the suggested technique outperforms the state‐of‐the‐art Finally, a comprehensive study was conducted in [28] on the SGCC dataset, where various ML supervised algorithms such as Decision trees, ANN, deep ANN, and Adaboost were compared. Kocaman and Tümen [ 32 ] introduced an LSTM classifier for This hybrid model, consisting of both LSTM and CNN components, adeptly processes time-series electricity usage data. A public dataset from the State Grid Corporation of China (SGCC) was used for this study. This data is crucial for understanding consumption patterns comprehensively and plays a vital role in forecasting demand. A. The number of theft users is significantly lower than the number of honest consumers, which is addressed by using the Synthetic Minority Oversampling Technique (SMOTE). Our findings reveal model performance deg-radation under our proposed generative evasion attack ranging from 96. The proposed models are compared with benchmark models, such as SAGAN, Wide and Deep Convolutional Neural Network (WDCNN), CNN and Long Short Term Memory (LSTM). Moreover, The dataset used in this study is collected from PRECON 1, an energy informatics group in Pakistan. Most researchers use the SGCC dataset for ET detection due to its extensive coverage. pre-processing steps of the SGCC dataset. The paper entitled “Analog Circuits Fault Diagnosis Using ISM Technique and a GA-SVM Classifier Approach,” authors S. 
If the data contain missing values, Corporation of China (SGCC) dataset, and the reported results show that increase of adversarial accuracy by up to 97% and decrease of the attack success rate (ASR) by up to 3%. The suggested approach is used for the smart meter data of consumers’ daily electricity consumption, which is sourced from the State Grid Corporation of China (SGCC) . Moreover, The simulator is Google CoLab. The presented scheme achieved an ACY of 92% for ETD when tested on the State Grid Corporation of China (SGCC) dataset. This causes problems in ML model's data generalization. The SGCC dataset utilized in this study lacked statistical characteristics. 80 GHz 1. 9% accuracy. 35% to 89. This is a key resource in the field of power distribution and management, with The dataset we used comes from the real-world and open-sourced labeled data collected by the State Grid Corporation of China (SGCC). If the problem persists, check the GitHub status page or contact support . Initially, this study takes the dataset of electricity theft from SGCC . : Using GANCNN and ERNET for Detection of NTLs to Secure Smart Grids TABLE 2. China (SGCC) dataset. Download scientific diagram | Statistics of obtained SGCC data. Considering the SGCC dataset, a special method of interpolation is used, which is adopted from paper [23], because there are more missing values in SGCC dataset as compared to UMass∗ . The SGCC dataset does, however, show an imbalance in the distribution of EC, which must be acknowledged [6]. As a result, wide and deep CNN model can achieve the excellent performance in electricity-theft detection. Remove all empty records, where there are five empty All simulations are performed using State Grid Corporation of China (SGCC) dataset. These values occur for many reasons, such as improper operation of smart meters, human typos, data storage problems, and distribution line faults. The method is validated using the SGCC dataset and provides a detection rate of 77. 
Third, residual network extracts the latent features from the SGCC dataset. 41%. The data is collected at 30-min intervals data, this study uses a real-time electricity consumption dataset released by State Grid Corporation of China (SGCC) [32]. Free Online Library: Hyperparameter Optimization with Genetic Algorithms and XGBoost: A Step Forward in Smart Grid Fraud Detection. 3% on the SGCC dataset. The SGCC dataset is divided into training and testing data. In order to verify the effectiveness of the proposed method, this paper uses the State Grid Corporation of China (SGCC) dataset [11] to conduct simulation experiments and compares it with other Finally, we used bidirectional GRU for classification of NTL detection by analyzing the electricity consumption patterns of consumers. From the analysis in Section 3. 1: TCN Loss on training and Validation TABLE I: SGCC Dataset Description Description Values Time window of SGCC 2014/01/01 - 2016/10/31 Total Users 42,372 Honest Users 38,757 Fraudulent Users 3,615 B. In smart grids, homes are equipped with smart meters (SMs) to monitor electricity consumption and report fine-grained readings to electric utility companies for billing and energy management. SGCC dataset, it is evident that these features carry some in-trinsic relation between them. Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals. Description Value; Administering years of the dataset: 2014–2016: Total number of benign consumers: 38,756: Total number of fraudulent consumers: 3616 The SGCC dataset used in this work has 1034 columns considered 1034 features. The SGCC dataset has a relatively high imbalance distribution of the target class values and using traditional k-fold CV may lead to inconsistent test results [36]. 
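One common response to the k-fold concern raised above is to stratify the folds so that each one keeps roughly the same class ratio. A minimal sketch in plain Python, with a hypothetical helper `stratified_folds` and toy labels at a 10:1 ratio:

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


labels = [0] * 100 + [1] * 10  # toy data: 10:1 normal to theft
folds = stratified_folds(labels, n_splits=5)
for fold in folds:
    print(len(fold), sum(labels[i] for i in fold))
```

Each fold then serves once as the test set while the remaining folds are used for training.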
/ IJEEE, 10(2), 35-43, 2023 DATASET SGCC Data pre-processing and Normalization Missing values Interpolation Feature extraction using Cheetah Optimization Technique Moreover, SGCC dataset covers data from Jan 2014 to Oct 2016 while ISET consists of data of only 2 years, i. This study provides a comprehensive analysis of the combination of Genetic Algorithms (GA) and XGBoost, a well-known machine-learning model. Training and testing samples are confined to their specific operations only. In order to balance the data, the synthetic theft attacks are applied on the smart grid corporation of China (SGCC) dataset. 00 GHz, RAM 4 GB. from publication: Electricity theft detection in smart grid systems: A CNN-LSTM based approach | Among an For example, on the SGCC dataset, when training the Transformer with additional synthesized anomalies, the precision increases slightly from 92. Footnote 1 The details of the dataset are shown in Table 1 . SGCC dataset contains 9% fraudulent consumers, which are extremely less than non-fraudulent consumers, due to the imbalance nature of data. the SGCC dataset to achieve the classification of the consumers as fraudulent or non- fraudulent smart meter readings, and the results are compared with the results of the state-of-the-art methods. Table 3 shows the detail of SGCC dataset. All simulations are performed using State Grid Corporation of China (SGCC) dataset. 5of21 FIGURE 2 Flowchartoftheproposedmodel TABLE 2 Datasetinformation Dataset SGCC Totalobservations 42,372 Fraudulent 3615 Normal 38,757 Year 2014/01/01–2016/10/31 It is a dataset of 1035 days and 42,372 consumers. The 1034 features are reduced to 960, 480, 240, 120, 60, and 30. SGCC dataset has missing values Consumption data from real SGCC dataset is used for analysis purpose. SGCC Dataset: The proposed method comes from the CNCP structure, which can capture features of electricity-consumption time series at different scales. 
The SGCC dataset comprises of daily electricity consumption of 42,372 consumers with 38,757 genuine consumers (class 0) and 3615 electricity thieves (class 1) recorded over a period of 2 years (1st January 2014 to 31st Download scientific diagram | SGCC dataset based ROC-AUC comparison. Dataset consists of customer identification number, flag, and features. However, a major drawback of LSTM is hard training because of memory-bandwidth-bound processing. Download scientific diagram | Overview of the SGCC dataset [2]. The detail of SGCC dataset. 08% with a detection speed of 2105 obs/sec, while the average detection accuracy under the SDSS dataset is 85. For example, on the SGCC dataset, when training the Transformer with additional synthesized anomalies, the precision increases slightly from 92. Elec-tricity theft, including illegal connections, meter tampering, and bypassing, has significant Meanwhile, the wide component can capture the global features of 1-D electricity consumption data. Following this, five distinct classification models are used to train and evaluate a fraud detection model using the SGCC dataset. Curse of Dimensionality The Curse of Dimensionality in Machine Learning arises when working with high- dimensional data, leading to increased computational complexity, overfitting, and spurious correlations. The data is collected at 30-min intervals throughout the specified time as indicated in Table 2. The evaluation of the proposed strategy and comparing it to current methods in the literature that also utilise the SGCC dataset. Electricity Fraud Problem Analysis To demonstrate the electricity fraud problem, Zheng et al. Subsequently, once the data is balanced, we pass the data to the GRU In State Grid Corporation of China (SGCC) dataset, there are numerous outliers due to which data is skewed; hence training process becomes complex. 
The proposed attack is validated using the State Grid Corporation of China (SGCC) dataset, and the reported results show an increase in adversarial accuracy of up to 97% and a decrease in the attack success rate (ASR) of up to 3%. This integration showed promising results and opens avenues for further research. The meta information of the SGCC dataset is shown in Table 2. The features are extracted after identifying abrupt changes in electricity consumption patterns using the sum of finite differences, the Auto-Regressive Integrated Moving Average model, and the Holt-Winters model. Therefore, the SGCC dataset from January 6, 2014 (Monday) to October 30, 2016 (Sunday) will be used, and the 1029 days of electricity consumption data for each customer will be divided into 147 weeks. The training and testing samples are segregated into subgroups using stratified sampling in order to avoid misclassification due to extensive diversity in the data. Then, the slope, average, and moment for each month are determined and entered into the CNN model. Out of these, 3,615 are electricity thieves and the remaining 38,373 are normal consumers [23]. The dataset covers a reported duration of 1035 days, which totals around three years. A deep learning (DL)-based approach is proposed in [3] to circumvent the limitations of manual feature engineering. It reported recall, F1 score, and AUC as performance metrics, recommending experimentation with other supervised learning algorithms. The SGCC dataset has 3615 abnormal and 38,757 normal consumers’ data instances out of a total of 42,372 records. The introduced model's ability to detect theft and fraud from electricity smart meter readings is verified on the SGCC dataset.
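The weekly split described above (January 6, 2014 to October 30, 2016, 1029 days, 147 weeks) can be checked with the standard library. The helper `split_into_weeks` and the placeholder readings are illustrative assumptions.

```python
from datetime import date

start = date(2014, 1, 6)    # stated to be a Monday
end = date(2016, 10, 30)    # stated to be a Sunday

n_days = (end - start).days + 1       # inclusive day count
n_weeks, leftover = divmod(n_days, 7)
print(n_days, n_weeks, leftover)


def split_into_weeks(daily_values):
    """Reshape a day-ordered series into consecutive 7-day rows."""
    if len(daily_values) % 7 != 0:
        raise ValueError("series length must be a multiple of 7")
    return [daily_values[i:i + 7] for i in range(0, len(daily_values), 7)]


# Placeholder readings: one value per day in the window.
weeks = split_into_weeks(list(range(n_days)))
print(len(weeks), weeks[0])
```

Starting on a Monday means every row of the resulting matrix lines up on the same weekdays, so column-wise comparisons across weeks are meaningful.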
If there are less than seven consecutive A public dataset from the State Grid Corporation of China (SGCC) was used for this study. ET consumers are 9% in the SGCC dataset that make the model inefficient to correctly classify both classes (normal and theft). We test the CNN-Adaboost energy theft detection model and other models’ performance under 5% and 10% evasion attacks. The process of visualizing data is relatively complex and carries the potential risk of Corporation of China (SGCC) dataset is used for electricity theft detection. 9%, while the recall increases from 78. Download scientific diagram | SGCC dataset and their description from publication: MicroTrust: Empowering Microgrids with Smart Peer-to-Peer Energy Sharing through Trust Management in IoT | The The performance of the proposed deeplearning framework assessed using the electricity consumption dataset provided by the State Grid Corporationof China (SGCC) demonstrates the reliability and efficiency of the LSTM-FCN model 4. This dataset contains the electricity consumption records of 42,372 customers from January 1, 2014, to October 31, 2016, and its distribution is shown in Table 1. Google Colab is used for code simulation. 23%. 20. Another approach focuses on detecting abnormal behaviours in a private manner, such as the Dataset used “SGCC” and PDCL electricity consumption dataset publicly available and for experimentation and validation of results respectively. As the GRU model stores and memorizes a huge In the second stage, the distributed random forest (DRF) generates the learned model. The SGCC dataset was used for both model training and testing, and it was split 80:20. However, these data are an irregularly mixed distribution of normal and electricity theft modes and are limited in number. SGCC dataset provides comprehensive data on the physical infrastructure of energy . VOLUME 9, 2021 N. 82% with a detection speed of 2291 obs/sec. 
Electricity Consumption Dataset of State Grid Corporation of China (SGCC): the data come from SGCC and are used in the paper Wide and Deep Convolutional Neural Networks for Electricity-Theft Detection to Secure Smart Grids (TII 2017). The SGCC dataset contains missing values and non-numeric values, indicated by 'NAN'. Machine specifications are Intel(R) core (TM) M-5y10c, CPU@ 0. This dataset undergoes a two-stage categorisation procedure. 947, and an accuracy of 95. Table 1 shows that the ratio of normal customers to those committing electricity theft is 10. The study focused on the SGCC dataset and applied a deep artificial neural network for electricity theft detection. Therefore, applying the dimensionality reduction technique to identify a set of informative features. The performance is evaluated on each set of features. com/henryRDlab/ElectricityTheftDetection, accessed on 28 December 2023) which The only publicly available known labelled dataset for this purpose is the State Grid Corporation of China (SGCC) dataset. The proposed model is applied to the public SGCC dataset, and the approach is reported to reach 98% accuracy and F1-score. Data Pre-processing: in the data pre-processing, we perform the normalization and handle the missing values. Then, columns 2 to 1036 give the daily electricity consumption. 9%, a recall of 95.
Based on the analysis results, the DANN outperforms compared to other supervised In the second stage, the distributed random forest (DRF) generates the learned model. 6% to 93. (1 January 2014 to 31 October 2016). The original dataset consists of actual values along with the erroneous and missing values. More specifically, Below is a comprehensive Python example implementing Self-Supervised Graph Convolutional Clustering (SGCC) using popular libraries like PyTorch, PyTorch Geometric, and Scikit-learn. There are large number of data points in the dataset and it is difficult to use all of them for analysis due to the higher computational complexity problem. 851 Download scientific diagram | Consumer data SGCCD electricity theft detection dataset in 2016. The ratio of the normal and abnormal consumers in the dataset is 1:9. Step-2 addresses the issue of high dimensionality of the SGCC dataset. Techniques like dimensionality reduction, feature selection, and careful model design are essential for mitigating its effects and improving algorithm SGCC dataset contains 9% fraudulent consumers, which are extremely less than non-fraudulent consumers, due to the imbalance nature of data. In order to evaluate the model performance, we use five performance evaluation metrics using real electricity consumption dataset of SGCC. Moreover, the Feature Extraction and Scalable Hypothesis algorithm was employed for the purpose of collecting and selecting the most optimal and pertinent temporal, statistical, and Adil et al. Explanation Values; Total consumers: 42,372: Data collection period: 01-01-2014 to 31-10-2016: Honest consumers: 38,575: Theft consumers: 3,615: Most researchers use the state grid of China dataset for electricity theft detection due to its extensive coverage. The SGCC dataset comprises electrical energy smart meter readings for over 32,000 users during a 1035‐day time frame. 
The following model gets the preprocessed data and pre-processing steps of the SGCC dataset. State Grid Corporation of China (SGCC) Dataset: Researchers explored integrating Genetic Algorithms with XGBoost for fraud detection in the SGCC dataset. Among these, the State Grid Corporation of China (SGCC) dataset stands as the most prevalent for theft detection studies. This is a key resource in the field of power distribution and management, with a large and varied set of data about electricity transport and grid operations. The consumers who participated in this study have smart electricity meters installed in their homes. In this paper, real-time electricity consumption data of consumers are used, taken from an easily available online source, the State Grid Corporation of China (SGCC). Specifically, there are 42,372 consumer data records in total, 38,752 of which belong to regular users and 3615 of which belong to customers who have engaged in theft.
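The class counts quoted across these snippets do not all agree, so a quick arithmetic check is useful. The sketch below tests which reported normal/theft pairs add up to the stated total of 42,372 and derives the imbalance ratio from the most frequently quoted pair; the variable names are illustrative.

```python
TOTAL = 42_372

# (normal, theft) pairs as quoted in different snippets above
reported = {
    "38,757 / 3,615": (38_757, 3_615),
    "38,756 / 3,616": (38_756, 3_616),
    "38,752 / 3,615": (38_752, 3_615),
    "38,575 / 3,615": (38_575, 3_615),
    "38,373 / 3,615": (38_373, 3_615),
}

consistent = {name: n + t == TOTAL for name, (n, t) in reported.items()}
for name, ok in consistent.items():
    print(name, "sums to total" if ok else "does NOT sum to total")

normal, theft = 38_757, 3_615
imbalance_ratio = normal / theft   # normal consumers per theft consumer
theft_share = theft / TOTAL        # fraction of theft consumers
print(round(imbalance_ratio, 2), round(theft_share * 100, 2))
```

Only the first two pairs are consistent with the total. The derived ratio of about 10.7 to 1 matches the "close to 10:1" and "9% fraudulent" figures quoted elsewhere.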
The issues mentioned earlier are present in the SGCC dataset, and the final output of the proposed ETD model will be plagued with faulty insights if the dataset is not well preprocessed. This ordering makes the dataset easy to work with and helps determine the periods of theft. have used the dataset provided by the State Grid Corporation of China (SGCC) [60]. 3.2 Dataset Preparation. The large number of features increases time complexity, and irrelevant features reduce the model's performance. The SGCC dataset is used for the CNN-LSTM model in Reference. The empirical findings demonstrate a noteworthy enhanc We validate the proposed model using the SGCC dataset, and our experimental results demonstrate high accuracy, precision, F1-score, and AUC values.
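The metrics named above can be computed directly from the confusion counts. A minimal sketch with a hypothetical helper `binary_metrics` and toy labels (1 = theft, 0 = normal); the numbers are illustrative and not results from any cited study.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall, F1 and accuracy for the positive class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone is a weak signal on data this imbalanced: with 38,757 normal consumers out of 42,372, a model that labels everyone as normal would already score about 91.5% accuracy while detecting no theft, which is why precision, recall, and F1 are reported alongside it.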