Distance measures-based information technology for identifying similar data series

Батурінець, Анастасія Геннадіївна; Baturinets, Anastasiia

doi:https://doi.org/10.33108/visnyk_tntu2022.01.128

Please use this identifier to cite or link to this item: http://elartu.tntu.edu.ua/handle/lib/39491

Full metadata record

DC Field	Value	Language
dc.contributor.author	Батурінець, Анастасія Геннадіївна
dc.contributor.author	Baturinets, Anastasiia
dc.date.accessioned	2022-12-23T08:09:23Z	-
dc.date.available	2022-12-23T08:09:23Z	-
dc.date.created	2022-04-19
dc.date.issued	2022-04-19
dc.date.submitted	2022-01-18
dc.identifier.citation	Baturinets A. Distance measures-based information technology for identifying similar data series / Anastasiia Baturinets // Scientific Journal of TNTU. — Tern. : TNTU, 2022. — Vol 105. — No 1. — P. 128–140.
dc.identifier.issn	2522-4433
dc.identifier.uri	http://elartu.tntu.edu.ua/handle/lib/39491	-
dc.description.abstract	Метою є розроблення та реалізація технології визначення схожих рядів даних, а також її апробація на рядах даних, представлених гідрологічними показниками. Предметом дослідження є методи та підходи визначення схожих рядів даних. Об'єктом дослідження є процес визначення схожих рядів даних, представлених певними показниками. Завдання: запропонувати й реалізувати міри відстані, одна з яких враховує схожість між значеннями рядів даних та їх зв’язок, а друга – заснована на зваженій евклідовій відстані, але з урахуванням необхідності актуалізації даних, які є важливішими за певних умов задачі. Реалізувати технологію визначення схожих рядів даних, представлених певними показниками. Для стійкішого розв’язку реалізувати процедуру визначення набору схожих рядів на підставі отриманих результатів за кожною окремою відстанню. Проаналізувати отримані результати та зробити висновки щодо можливості практичного використання технології. Використовуваними методами є методи статистичного аналізу, методи обчислення відстаней та схожості між рядами. Отримані результати реалізовано технологію визначення схожих рядів даних. Як складову технології реалізовано дві запропоновані й описані міри відстаней. Реалізовано процедуру визначення набору схожих рядів за отриманими значеннями відстаней. Наукова новизна: описано та застосовано евклідову зважену відстань з урахуванням актуальності даних. Описано та застосовано нову міру відстані, яка дозволяє врахувати як ступінь подібності між значеннями рядів, так і їх кореляційний зв’язок. Розроблено технологію визначення схожих рядів за множиною обраних відстаней. Практична значущість розробленої та реалізованої технології полягає в таких можливостях: застосування на рядах даних різних прикладних областей; проведення оцінювання та визначення схожих рядів, зокрема як проміжний етап аналізу. Крім того, запропоновані міри відстані дозволяють підвищити якість визначення схожих рядів або їх групування. 
Подальші дослідження планується спрямувати на дослідження можливостей подовження рядів даних та поповнення пропусків за значеннями показників інших рядів, визначених як схожі.
dc.description.abstract	The aim of the work is to develop and implement a technology for identifying similar data series and to test it on data series represented by hydrological indicators. The subject of the study is the methods and approaches for identifying similar data series. The object of the study is the process of identifying similar data series represented by certain indicators. The tasks are: to propose and implement two distance measures, one of which takes into account both the similarity between the values of the series and their relationship, while the other is based on the weighted Euclidean distance and gives greater weight to the data that are more important under the conditions of the task; to implement a technology for identifying similar data series represented by certain indicators; to obtain a more robust solution by implementing a procedure that determines a set of similar series from the results obtained for each individual distance; and to analyze the results and draw conclusions about the practical applicability of the technology. The methods used are statistical analysis methods and methods for calculating distances and similarity between data series. The following results were obtained: the technology for identifying similar data series was implemented; the two proposed distance measures were described and implemented as part of the technology; and a procedure for determining a set of similar series from the calculated distances was implemented.
The scientific novelty is as follows: a weighted Euclidean distance that takes into account the relevance of data series values was described and applied; a new distance measure was described and applied that accounts for both the degree of similarity between the values of the series and their correlation; and a technology was developed for determining similar series from a set of selected distance measures. The practical significance of the developed and implemented technology lies in the following possibilities: application to data series from different applied fields; and assessment and identification of similar series, in particular as an intermediate step of analysis. In addition, the proposed distance measures make it possible to improve the quality of identifying or grouping similar data series. Further research is planned to investigate the possibilities of extending data series and filling gaps with values from other series identified as similar.
dc.format.extent	128-140
dc.language.iso	en
dc.publisher	ТНТУ
dc.publisher	TNTU
dc.relation.ispartof	Вісник Тернопільського національного технічного університету, 1 (105), 2022
dc.relation.ispartof	Scientific Journal of the Ternopil National Technical University, 1 (105), 2022
dc.relation.uri	https://doi.org/10.1016/j.patcog.2005.01.025
dc.relation.uri	https://doi.org/10.1016/j.neucom.2017.06.053
dc.relation.uri	https://doi.org/10.1007/978-81-322-1665-0_17
dc.relation.uri	https://doi.org/10.1137/1.9781611972719.1
dc.relation.uri	https://doi.org/10.1007/s10618-018-0565-y
dc.relation.uri	https://doi.org/10.1109/VTC2020-Fall49728.2020.9348487
dc.relation.uri	https://doi.org/10.5815/ijisa.2018.07.07
dc.relation.uri	https://doi.org/10.1145/359581.359603
dc.relation.uri	https://doi.org/10.1145/322033.322044
dc.relation.uri	https://doi.org/10.1609/aaai.v24i1.7493
dc.relation.uri	https://doi.org/10.1109/ICPP.2008.79
dc.relation.uri	https://doi.org/10.1007/s10618-012-0250-5
dc.relation.uri	https://doi.org/10.23939/sisn2021.09.096
dc.relation.uri	https://doi.org/10.1007/s10618-015-0418-x
dc.subject	міри відстані
dc.subject	схожість числових рядів
dc.subject	LCS
dc.subject	DTW
dc.subject	TSD
dc.subject	подібність рядів даних
dc.subject	гідрологія
dc.subject	distance measures
dc.subject	similarity of numerical series
dc.subject	LCS
dc.subject	DTW
dc.subject	TSD
dc.subject	similarity of data series
dc.subject	hydrology
dc.title	Distance measures-based information technology for identifying similar data series
dc.title.alternative	Інформаційна технологія визначення схожих рядів даних із використанням мір відстаней
dc.type	Article
dc.rights.holder	© Тернопільський національний технічний університет імені Івана Пулюя, 2022
dc.coverage.placename	Тернопіль
dc.coverage.placename	Ternopil
dc.format.pages	13
dc.subject.udc	004.67
dc.subject.udc	519.25
dc.relation.references	1. Liao T. W. Clustering of time series data – A survey. Pattern Recognit. Vol. 38. No. 11. Nov. 2005. P. 1857–1874. DOI: https://doi.org/10.1016/j.patcog.2005.01.025
dc.relation.references	2. Saxena A., Prasad M., Gupta A., Bharill N., et al. A review of clustering techniques and developments. Neurocomputing. 267. 2017. P. 664–681. DOI: https://doi.org/10.1016/j.neucom.2017.06.053
dc.relation.references	3. Zhu X., Li Y., Wang J., Zheng T., Fu J. Automatic Recommendation of a Distance Measure for Clustering Algorithms. ACM Transactions on Knowledge Discovery from Data (TKDD), 15 (1). 2020. P. 1–22. DOI: https://doi.org/10.1007/978-81-322-1665-0_17
dc.relation.references	4. Савчук Т. О., Петришин С. І. Визначення евклідової відстані між надзвичайними ситуаціями на залізничному транспорті під час кластерного аналізу. Наукові праці Вінницького національного технічного університету. Серія «Інформаційні технології та комп’ютерна техніка». 2010. Випуск № 3. 2010. 8 с.
dc.relation.references	5. Keogh E. J., Pazzani M. J. Derivative dynamic time warping. In Proceedings of the 2001 SIAM international conference on data mining. Society for Industrial and Applied Mathematics. April 2001. P. 1–11. DOI: https://doi.org/10.1137/1.9781611972719.1
dc.relation.references	6. Dau H. A., Silva D. F., Petitjean F. et al. Optimizing dynamic time warping’s window width for time series data mining applications. Data Mining and Knowledge Discovery 32. 2018. P. 1074–1120. DOI: https://doi.org/10.1007/s10618-018-0565-y
dc.relation.references	7. Raida V., Svoboda P., Rupp M. Modified dynamic time warping with a reference path for alignment of repeated drive-tests. In 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) IEEE. 2020. P. 1–6. DOI: https://doi.org/10.1109/VTC2020-Fall49728.2020.9348487
dc.relation.references	8. Senin P. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 2008, 23 p.
dc.relation.references	9. Kate R. J. Using dynamic time warping distances as features for improved time series classification. Data Mining and Knowledge Discovery, 30 (2). 2016. P. 283–312. DOI: https://doi.org/10.1007/s10618-015-0418-x
dc.relation.references	10. Hu Z., Mashtalir S. V., Tyshchenko O. K., Stolbovyi M. I. Clustering matrix sequences based on the iterative dynamic time deformation procedure. International Journal of Intelligent Systems and Applications, 10 (7). 2018. P. 66–73. DOI: https://doi.org/10.5815/ijisa.2018.07.07
dc.relation.references	11. Hunt J. W., Szymanski T. G. A fast algorithm for computing longest common subsequences. Communications of the ACM. Vol. 20. No. 5. 1977. P. 350–353. DOI: https://doi.org/10.1145/359581.359603
dc.relation.references	12. Hirschberg, Daniel S. Algorithms for the longest common subsequence problem. Journal of the ACM (JACM) 24.4. 1977. P. 664–675. DOI: https://doi.org/10.1145/322033.322044
dc.relation.references	13. Wang, Qingguo, et al. A fast heuristic search algorithm for finding the longest common subsequence of multiple strings. Twenty-Fourth AAAI Conference on Artificial Intelligence. 2010. P. 1287–1292. DOI: https://doi.org/10.1609/aaai.v24i1.7493
dc.relation.references	14. Wang Q., Korkin D., Shang Y. Efficient dominant point algorithms for the multiple longest common subsequence (MLCS) problem. Twenty-First International Joint Conference on Artificial Intelligence. 2009. P. 1494–1499.
dc.relation.references	15. Korkin D., Wang Q., Shang Y. An efficient parallel algorithm for the multiple longest common subsequence (MLCS) problem. 37th International Conference on Parallel Processing. IEEE, 2008. P. 354–363. DOI: https://doi.org/10.1109/ICPP.2008.79
dc.relation.references	16. Wang X., Mueen A., Ding H., Trajcevski G., Scheuermann P., Keogh E. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26 (2). 2013. P. 275–309. DOI: https://doi.org/10.1007/s10618-012-0250-5
dc.relation.references	17. Григорович В. Аналіз метрик для інтелектуальних інформаційних систем. Вісник Національного університету «Львівська політехніка». «Інформаційні системи та мережі». 2021. Вип. 9. С. 96–111. URL: https://doi.org/10.23939/sisn2021.09.096
dc.relation.references	18. Батурінець А., Антоненко С. Найдовша спільна підпослідовність в задачі визначення схожості гідрологічних рядів даних. Deutsche Internationale Zeitschrift für zeitgenössische Wissenschaft. 2021. № 18. С. 62–64.
dc.relation.referencesen	1. Liao T. W., Clustering of time series data – A survey, Pattern Recognit. Vol. 38. No. 11. Nov. 2005. P. 1857–1874. DOI: https://doi.org/10.1016/j.patcog.2005.01.025
dc.relation.referencesen	2. Saxena A., et al. A review of clustering techniques and developments. Neurocomputing, 267, 2017. P. 664–681. DOI: https://doi.org/10.1016/j.neucom.2017.06.053
dc.relation.referencesen	3. Zhu X., Li Y., Wang J., Zheng T., Fu J. Automatic Recommendation of a Distance Measure for Clustering Algorithms. ACM Transactions on Knowledge Discovery from Data (TKDD), 15 (1). 2020. P. 1–22. DOI: https://doi.org/10.1007/978-81-322-1665-0_17
dc.relation.referencesen	4. Savchuk T. O. Viznachennya evklidovoyi vidstani mizh nadzvichaynimi situatsiyami na zaliznichnomu transporti pid chas klasternogo analizu, Naukovi pratsi Vinnitskogo natsionalnogo tehnichnogo universitetu. – Seriya “Informatsiyni tehnologiyi ta komp’yuterna tehnika”. 2010. No. 3. 2010.
dc.relation.referencesen	5. Keogh E. J., Pazzani M. J. Derivative dynamic time warping. In Proceedings of the 2001 SIAM international conference on data mining. Society for Industrial and Applied Mathematics. 2001. April. P. 1–11. DOI: https://doi.org/10.1137/1.9781611972719.1
dc.relation.referencesen	6. Dau H. A., Silva D. F., Petitjean F. et al. Optimizing dynamic time warping’s window width for time series data mining applications. Data Mining and Knowledge Discovery 32. 2018. P. 1074–1120. DOI: https://doi.org/10.1007/s10618-018-0565-y
dc.relation.referencesen	7. Raida V., Svoboda P., Rupp M. Modified dynamic time warping with a reference path for alignment of repeated drive-tests. In 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall) IEEE. 2020. P. 1–6. DOI: https://doi.org/10.1109/VTC2020-Fall49728.2020.9348487
dc.relation.referencesen	8. Senin P. Dynamic time warping algorithm review. Information and Computer Science Department University of Hawaii at Manoa Honolulu, USA, 2008, 23 p.
dc.relation.referencesen	9. Kate R. J. Using dynamic time warping distances as features for improved time series classification. Data Mining and Knowledge Discovery, 30 (2). 2016. P. 283–312. DOI: https://doi.org/10.1007/s10618-015-0418-x
dc.relation.referencesen	10. Hu Z., Mashtalir S. V., Tyshchenko O. K., Stolbovyi M. I. Clustering matrix sequences based on the iterative dynamic time deformation procedure. International Journal of Intelligent Systems and Applications, 10 (7). 2018. P. 66–73. DOI: https://doi.org/10.5815/ijisa.2018.07.07
dc.relation.referencesen	11. Hunt J. W., Szymanski T. G. A fast algorithm for computing longest common subsequences. Communications of the ACM. Vol. 20. No. 5. 1977. P. 350–353. DOI: https://doi.org/10.1145/359581.359603
dc.relation.referencesen	12. Hirschberg, Daniel S. Algorithms for the longest common subsequence problem. Journal of the ACM (JACM) 24.4. 1977. P. 664–675. DOI: https://doi.org/10.1145/322033.322044
dc.relation.referencesen	13. Wang, Qingguo, et al. A fast heuristic search algorithm for finding the longest common subsequence of multiple strings. Twenty-Fourth AAAI Conference on Artificial Intelligence. 2010. P. 1287–1292. DOI: https://doi.org/10.1609/aaai.v24i1.7493
dc.relation.referencesen	14. Wang Q., Korkin D., Shang Y. Efficient dominant point algorithms for the multiple longest common subsequence (MLCS) problem. Twenty-First International Joint Conference on Artificial Intelligence. 2009. P. 1494–1499.
dc.relation.referencesen	15. Korkin D., Wang Q., Shang Y. An efficient parallel algorithm for the multiple longest common subsequence (MLCS) problem. 37th International Conference on Parallel Processing. IEEE, 2008. P. 354–363. DOI: https://doi.org/10.1109/ICPP.2008.79
dc.relation.referencesen	16. Wang X., Mueen A., Ding H., Trajcevski G., Scheuermann P., Keogh E. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26 (2). 2013. P. 275–309. DOI: https://doi.org/10.1007/s10618-012-0250-5
dc.relation.referencesen	17. Hryhorovych V. Analiz metryk dlia intelektualnykh informatsiinykh system, Visnyk Natsionalnoho universytetu “Lvivska politekhnika” “Informatsiini systemy ta merezhi”. 2021. 9. P. 96–111. URL: https://doi.org/10.23939/sisn2021.09.096
dc.relation.referencesen	18. Baturinets A., Antonenko S. Longest common subsequence in the problem of determining the similarity of hydrological data series, Deutsche Internationale Zeitschrift für zeitgenössische Wissenschaft. 2021. No. 18. P. 62–64.
dc.identifier.citationen	Baturinets A. (2022) Distance measures-based information technology for identifying similar data series. Scientific Journal of TNTU (Tern.), vol. 105, no 1, pp. 128-140.
dc.identifier.doi	https://doi.org/10.33108/visnyk_tntu2022.01.128
dc.contributor.affiliation	Дніпровський національний університет імені Олеся Гончара, Дніпро, Україна
dc.contributor.affiliation	Oles Honchar Dnipro National University, Dnipro, Ukraine
dc.citation.journalTitle	Вісник Тернопільського національного технічного університету
dc.citation.volume	105
dc.citation.issue	1
dc.citation.spage	128
dc.citation.epage	140
Appears in collections:	Вісник ТНТУ (Scientific Journal of TNTU), 2022, No. 1 (105)
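
The abstract in this record names a weighted Euclidean distance that emphasizes more relevant data, a measure that accounts for the correlation between series, and a procedure that selects similar series from the results of several distances. The record does not give the formulas, so the following Python sketch is only an illustration under stated assumptions: the linear recency weights (`recency_weights`), the use of `1 - r` as a correlation distance, the top-`k` intersection rule (`similar_by_consensus`), and the toy series are all hypothetical and are not taken from the article.

```python
import math


def recency_weights(n, floor=0.2):
    """Hypothetical weights rising linearly from `floor` (oldest) to 1.0 (newest)."""
    if n == 1:
        return [1.0]
    return [floor + (1.0 - floor) * i / (n - 1) for i in range(n)]


def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))


def correlation_distance(x, y):
    """1 - Pearson r: near 0 for correlated series, near 2 for anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: treat a constant series as uncorrelated
    return 1.0 - cov / (sx * sy)


def similar_by_consensus(target, candidates, distances, k=2):
    """Names that rank among the k nearest to `target` under every distance."""
    agreed = None
    for dist in distances:
        ranked = sorted(candidates, key=lambda name: dist(target, candidates[name]))
        top = set(ranked[:k])
        agreed = top if agreed is None else agreed & top
    return sorted(agreed or [])


# Toy data (hypothetical, not from the article).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "A": [1.1, 2.1, 2.9, 4.2, 5.0],       # close in value and in shape
    "B": [5.0, 4.0, 3.0, 2.0, 1.0],       # reversed trend
    "C": [11.0, 12.0, 13.0, 14.0, 15.0],  # same shape, shifted level
    "D": [1.0, 2.5, 2.5, 4.5, 4.5],       # close in value, rougher shape
}
w = recency_weights(len(target))
distances = [lambda x, y: weighted_euclidean(x, y, w), correlation_distance]
result = similar_by_consensus(target, candidates, distances, k=2)
print(result)
```

With this toy data only series "A" is among the two nearest under both distances: "C" is highly correlated but far in value, and "D" is close in value but less correlated. Requiring agreement across distances is one simple way to make the selection less sensitive to the choice of a single measure; a voting threshold instead of a strict intersection would be an equally plausible variant.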

Files in this item:

File	Size	Format
TNTUSJ_2022v105n1_Baturinets_A-Distance_measures_based_128-140.pdf	6.5 MB	Adobe PDF
TNTUSJ_2022v105n1_Baturinets_A-Distance_measures_based_128-140.djvu	768.93 kB	DjVu
TNTUSJ_2022v105n1_Baturinets_A-Distance_measures_based_128-140__COVER.png	1.39 MB	image/png


All items in the electronic archive are protected by copyright, all rights reserved.