Clustering of Time Series Data With Application
Abstract
Cluster analysis of time series data is an important topic in data analysis: it identifies series that follow similar trends, a task that remains challenging in many fields. Interest in time series clustering has grown because it has proven effective at extracting useful information across disciplines. Clustering time series facilitates prediction for the resulting clusters and saves time and effort, and it has been used in many scientific fields to discover patterns that let analysts extract valuable information from large, complex data sets. Series are grouped into homogeneous clusters on the basis of a chosen similarity measure. In this study, monthly data on electrical energy productivity in Kirkuk were used to study its temporal behavior. Hierarchical clustering was applied with Ward's method as the linkage criterion, and the city-block (Manhattan) distance was used to compute the similarity (distance) matrix between clusters. To obtain homogeneous groups (clusters) whose members share common characteristics in terms of productivity, hierarchical clustering, a tree diagram (dendrogram), and prediction of the future values of each cluster's productivity were used.
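The clustering step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's own code or data: the array `series`, its shape, the level values, and the use of SciPy are all assumptions made for the example. Note that SciPy documents Ward linkage as correctly defined only for Euclidean distances, so feeding it a city-block distance matrix, as the abstract describes, should be read as a heuristic.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: 8 generating units observed over 24 months.
# Units come in pairs that share a production level (assumed values).
rng = np.random.default_rng(0)
n_months = 24
levels = np.repeat([0.0, 50.0, 200.0, 500.0], 2)
series = levels[:, None] + rng.normal(0.0, 1.0, size=(8, n_months))

# Pairwise city-block (Manhattan) distances between the unit series,
# returned in SciPy's condensed form.
dist = pdist(series, metric="cityblock")

# Ward linkage applied to the precomputed distances
# (heuristic here, because the distances are not Euclidean).
tree = linkage(dist, method="ward")

# Cut the tree so that at most four clusters remain.
labels = fcluster(tree, t=4, criterion="maxclust")
print(labels)
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the corresponding tree diagram.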
The main findings are the formation of four clusters and the construction of a time series model for each cluster. Analysis of the series showed that it was non-stationary and non-random, so the necessary transformations were applied to achieve stationarity and randomness. The model selection criteria used were the Akaike Information Criterion (AIC), the Schwarz Bayesian Information Criterion (SBIC), the Hannan-Quinn Criterion (H-Q), and the Root Mean Square Error (RMSE); they served to diagnose the significant models and to choose the most appropriate and efficient one. The first cluster was forecast with an ARIMA(0,1,1) model and the second with a SARIMA(2,0,0)x(1,1,2)12 model. In the third cluster, the units stopped production in 2014 and have remained idle since, producing no electricity because the rehabilitation works were abandoned, so the predicted future values are zero. The fourth cluster was forecast with an ARIMA(2,1,0) model. The predictions were good and close to the actual values for the two-year period from November 2020 to November 2022.