The Korean text of this paper can be translated into multiple languages with Google Translate on the journal website (http://jksee.or.kr).

J Korean Soc Environ Eng 2021;43(3): 160-170. doi: https://doi.org/10.4491/KSEE.2021.43.3.160
Outlier Detection of Water Quality Data Using Ensemble Empirical Mode Decomposition
Sangsu Park1, No-Suk Park2, Seong-su Kim3, Gwirae Jo4, Sukmin Yoon5
1Gyeongsangbuk-do Institute of Health and Environment 
2Department of Civil Engineering and Engineering Research Institute, Gyeongsang National University
3Korea Water Resources Corporation
4Department of Information and Statistics, Gyeongsang National University
5Gyeongsang National University Office of Academy and Industry Collaboration and Engineering Research Institute
Corresponding author: Sukmin Yoon, Tel: 055-755-8707, Fax: 055-772-1797, Email: gnuysm@gmail.com
Published online: April 7, 2021.
Objectives
This study proposes a new methodology for efficiently identifying and removing the various outliers that occur in data collected by automated water quality monitoring systems. The performance of the proposed methodology was tested on water temperature data collected from the G_water supply system in Korea.
Methods
Outliers in the water quality data were identified with the following procedure. First, a normality test was performed on the collected data. If the data satisfied the normality condition, outliers were identified with the Z-score; otherwise, they were identified with a quartile-based method, and the limitations of these existing methods were analyzed. Second, the collected data were decomposed into intrinsic mode functions by empirical mode decomposition and by ensemble empirical mode decomposition, and the occurrence of mode mixing was examined. Finally, a group of intrinsic mode functions was selected on the basis of its statistical characteristics and used to identify outliers. Outliers were detected with regression analysis and Cook's distance, then removed and interpolated, and the performance of the method was verified.
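The first step can be sketched as follows: a normality test, then either a Z-score or a quartile rule. The abstract names neither the normality test nor the thresholds, so several choices here are assumptions.

- The Jarque-Bera test stands in for the authors' test. Its asymptotic chi-square(2) p-value has the closed form exp(-JB/2).
- |z| > 3 is the assumed Z-score cutoff.
- Tukey's fences (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR) stand in for the quartile method.

The function names are hypothetical.

```python
import math
import statistics


def jarque_bera_pvalue(x):
    """Jarque-Bera normality test (assumed stand-in); returns the asymptotic p-value."""
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB ~ chi-square(2), whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def univariate_outliers(x, alpha=0.05, z_cut=3.0, k=1.5):
    """Hypothetical helper: choose Z-score or quartile fences based on a normality test.

    Returns (method_name, list of flagged indices in ascending order).
    """
    if jarque_bera_pvalue(x) >= alpha:
        # Normality not rejected: flag points more than z_cut standard deviations from the mean.
        mu = statistics.fmean(x)
        sd = statistics.stdev(x)
        return "z-score", [i for i, v in enumerate(x) if abs(v - mu) / sd > z_cut]
    # Normality rejected: Tukey fences built from the first and third quartiles.
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return "quartile", [i for i, v in enumerate(x) if v < lower or v > upper]
```

Consider a point that is implausible for its date but still within the annual range, such as a summer-level temperature recorded in winter. On a seasonal series it stays inside the fences. This illustrates the limitation the study reports for univariate methods.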
Results and Discussion
Because the water temperature data did not satisfy the normality condition, outliers were identified with the modified quartile method. This method failed to identify any of the outliers distributed within the seasonal component. With empirical mode decomposition, mode mixing occurred because of the outliers. With ensemble empirical mode decomposition, however, mode mixing was resolved, and distinct seasonal components were extracted as intrinsic mode functions. The selected intrinsic mode functions were synthesized, and the synthesized signal was statistically correlated with the raw water temperature data. A regression model was then developed between the synthesized intrinsic mode functions and the raw water temperature data. An outlier search based on Cook's distance effectively identified the various outliers distributed within the seasonal component.
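The decomposition step can be sketched with NumPy as below. This is a simplified, illustrative EMD/EEMD, not the authors' implementation, and it rests on several assumptions:

- Envelopes are natural cubic splines through the local extrema, with the series endpoints added as knots. This is a crude end condition.
- Sifting runs a fixed number of iterations instead of testing a convergence criterion.
- The ensemble parameters (noise amplitude 0.2 x the series standard deviation, 50 trials) are assumed defaults.

EEMD adds independent white noise to the series in each trial, decomposes each noisy copy with EMD, and averages the corresponding IMFs. The added noise populates all scales, which is the mechanism EEMD relies on to reduce mode mixing.

```python
import numpy as np


def _natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x."""
    n = len(xk)
    h = np.diff(xk)
    # Tridiagonal system for the second derivatives m, with m[0] = m[-1] = 0.
    a = np.zeros((n, n))
    b = np.zeros(n)
    a[0, 0] = a[-1, -1] = 1.0
    for i in range(1, n - 1):
        a[i, i - 1] = h[i - 1]
        a[i, i] = 2.0 * (h[i - 1] + h[i])
        a[i, i + 1] = h[i]
        b[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(a, b)
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)  # interval index for each x
    x0, x1, hj = xk[j], xk[j + 1], h[j]
    return (m[j] * (x1 - x) ** 3 / (6 * hj) + m[j + 1] * (x - x0) ** 3 / (6 * hj)
            + (yk[j] / hj - m[j] * hj / 6) * (x1 - x)
            + (yk[j + 1] / hj - m[j + 1] * hj / 6) * (x - x0))


def _extrema(sig):
    """Indices of interior local maxima and minima."""
    d = np.diff(sig)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    return maxima, minima


def _envelope(t, sig, idx):
    # Endpoints are added as knots: a simple, assumed end condition.
    knots = np.concatenate(([0], idx, [len(sig) - 1]))
    return _natural_cubic_spline(t[knots], sig[knots], t)


def emd(x, max_imfs=6, n_sift=10):
    """Simplified EMD with a fixed number of sifting iterations per IMF."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    residue = x.copy()
    imfs = []
    for _ in range(max_imfs):
        mx, mn = _extrema(residue)
        if len(mx) < 2 or len(mn) < 2:
            break  # too few extrema left: the residue is treated as the trend
        h = residue.copy()
        for _ in range(n_sift):
            mx, mn = _extrema(h)
            if len(mx) < 2 or len(mn) < 2:
                break
            # Subtract the mean of the upper and lower envelopes.
            h = h - (_envelope(t, h, mx) + _envelope(t, h, mn)) / 2.0
        imfs.append(h)
        residue = residue - h
    return np.array(imfs).reshape(-1, len(x)), residue


def eemd(x, n_imfs=4, trials=50, noise_ratio=0.2, seed=0):
    """EEMD: average the IMFs of many noise-perturbed copies of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(trials):
        noisy = x + noise_ratio * x.std() * rng.standard_normal(len(x))
        imfs, res = emd(noisy, max_imfs=n_imfs)
        imf_sum[: len(imfs)] += imfs  # IMFs missing in a trial count as zero
        res_sum += res
    return imf_sum / trials, res_sum / trials
```

Each EMD trial reconstructs its noisy input exactly. As a result, the averaged IMFs plus the averaged residue differ from the original series only by the averaged noise, which shrinks as the number of trials grows.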
Conclusions
An outlier identification procedure is essential if statistical analysis of data collected by automated water quality monitoring systems is to yield satisfactory results. However, conventional univariate outlier search methods perform poorly on data with strong inherent variability, and they provide no way to interpolate the outliers they detect. In contrast, the method proposed in this study combines ensemble empirical mode decomposition with regression analysis. It discriminates well between outliers and normal values even in data with strong inherent variability. It also reduces the analyst's reliance on subjective judgment by providing statistical cutoff criteria. Finally, it allows data to be interpolated with intrinsic mode functions after outliers are removed. The proposed outlier search and interpolation method is therefore expected to be a more effective and more widely applicable analysis tool than existing univariate outlier search methods.
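The regression and Cook's distance step, together with interpolation, might look like the sketch below. The function name is hypothetical, and the sketch makes three assumptions:

- The selected intrinsic mode functions have already been summed into one signal `s`. The model y = b0 + b1 * s is fitted by least squares, and Cook's distance is computed as D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2.
- The 4/n cutoff is a common rule of thumb; the abstract does not state the authors' criterion.
- Replacing flagged points with the regression's fitted values is one hypothetical way to interpolate with intrinsic mode functions.

```python
import numpy as np


def cooks_distance_outliers(y, s, cutoff=None):
    """Hypothetical helper: regress raw data y on the synthesized IMF signal s.

    Returns (Cook's distances, boolean outlier mask, series with outliers replaced).
    """
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), s])  # design matrix: intercept + IMF signal
    p = X.shape[1]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    mse = resid @ resid / (n - p)
    # Leverages h_ii: diagonal of the hat matrix X (X^T X)^-1 X^T.
    lev = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    d = resid ** 2 / (p * mse) * lev / (1.0 - lev) ** 2
    if cutoff is None:
        cutoff = 4.0 / n  # assumed rule-of-thumb threshold
    flagged = d > cutoff
    # Interpolate flagged points with the values predicted from the IMF signal.
    cleaned = np.where(flagged, fitted, y)
    return d, flagged, cleaned
```

Take a point that is plausible for the year but not for its date, such as a summer-level temperature recorded in winter. It has a large residual against the seasonal IMF signal. Its Cook's distance therefore exceeds the cutoff even though the value lies inside the quartile fences.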
Key Words: Automated Water Quality Monitoring System, Water Quality Data, Outlier Detection, Ensemble Empirical Mode Decomposition, Cook’s Distance