LLM Applications in Cybersecurity (Застосування LLM в галузі кібербезпеки)

Павлат, Олександр Володимирович; Pavlat, Oleksandr

Please use this identifier to cite or link to this item: http://elartu.tntu.edu.ua/handle/lib/49962

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Стадник, Марія Андріївна	-
dc.contributor.advisor	Stadnyk, Maria	-
dc.contributor.author	Павлат, Олександр Володимирович	-
dc.contributor.author	Pavlat, Oleksandr	-
dc.date.accessioned	2025-08-19T13:51:07Z	-
dc.date.available	2025-08-19T13:51:07Z	-
dc.date.issued	2025-06-26	-
dc.date.submitted	2025-06-12	-
dc.identifier.citation	Pavlat O. V. LLM Applications in Cybersecurity : bachelor's qualification thesis : specialty 125 - Cybersecurity / supervisor Stadnyk M. A. Ternopil : Ternopil Ivan Puluj National Technical University, 2025. 92 p.	uk_UA
dc.identifier.uri	http://elartu.tntu.edu.ua/handle/lib/49962	-
dc.description.abstract	This bachelor’s thesis investigates the potential of large language models (LLMs) in cybersecurity and assesses their effectiveness for the automatic detection of phishing e-mails. On the theoretical level, the study reviews the GPT, BERT and T5 architectures, analyses the stages of pre-training, instruction tuning and RLHF alignment, and situates LLMs within the workflow of a Security Operations Center (SOC). The practical component involved constructing a balanced corpus of e-mails (Enron + Phishing Email Dataset). The experimental evaluation, based on accuracy, recall, precision and F1-score, compares the LLM-based solution with classical approaches (support-vector machines, decision trees and a rule-based filter). The LLM achieved 98.5% accuracy and a 90% F1-score, surpassing traditional methods by 7-10 percentage points while maintaining a comparable false-positive rate. The findings support the use of LLMs for automated analysis of textual threats and demonstrate their advantage over classical algorithms in phishing-detection tasks. The proposed methods and results can improve the operational effectiveness of security monitoring centers and provide a foundation for formulating safe-deployment policies for generative AI within corporate defence platforms.	uk_UA
dc.description.tableofcontents	Introduction (p. 9); Chapter 1. Theoretical foundations of LLMs (p. 11); 1.1 The concept of large language models (LLMs) (p. 11); 1.2 Main LLM architectures: BERT, GPT, T5 (p. 16); 1.2.1 Transformers (p. 23); 1.2.2 BERT architecture (p. 25); 1.2.3 GPT architecture (p. 27); 1.2.4 T5 architecture (p. 30); 1.3 Applications of AI in cybersecurity (p. 32); Chapter 2. Applications of LLMs in cybersecurity (p. 35); 2.1 Detecting phishing attacks with LLMs (p. 37); 2.2 Log analysis and anomaly detection (p. 39); 2.3 Generating security policies and documentation (p. 41); 2.4 Using LLMs in penetration testing (p. 43); 2.5 Vulnerability analysis and intelligent monitoring (p. 45); 2.6 Automated repair of program code (p. 49); 2.7 Other applications of LLMs in cybersecurity (p. 50); Chapter 3. Applying LLMs to phishing e-mail detection (p. 52); 3.1 Building the LLM (p. 52); 3.2 Description of the methodology and dataset (p. 54); 3.3 Model results (p. 61); 3.4 Comparison with classical methods for identifying phishing e-mails (p. 65); Chapter 4. Life safety and fundamentals of occupational safety (p. 68); 4.1 Measures that improve the operator's working conditions (p. 68); 4.2 Emergencies of a metrological nature (p. 72); Conclusions (p. 77); References (p. 79); Appendix A. Listing of the file BERT.PY (p. 85)	uk_UA
dc.language.iso	uk	uk_UA
dc.subject	large language models	uk_UA
dc.subject	phishing emails	uk_UA
dc.subject	threat detection	uk_UA
dc.subject	machine learning	uk_UA
dc.subject	generative artificial intelligence	uk_UA
dc.title	Застосування LLM в галузі кібербезпеки	uk_UA
dc.title.alternative	LLM Applications in Cybersecurity	uk_UA
dc.type	Bachelor Thesis	uk_UA
dc.rights.holder	© Павлат Олександр Володимирович, 2025	uk_UA
dc.contributor.committeeMember	Луцків, Андрій Мирославович	-
dc.contributor.committeeMember	Lutskiv, Andrii	-
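dc.coverage.placename	Ternopil Ivan Puluj National Technical University (TNTU), Faculty of Computer Information Systems and Software Engineering (FIS), Ternopil, Ukraine	uk_UA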
dc.subject.udc	004.56	uk_UA
dc.contributor.affiliation	Ternopil Ivan Puluj National Technical University (TNTU), Faculty of Computer Information Systems and Software Engineering, Department of Cybersecurity, Ternopil, Ukraine	uk_UA
dc.coverage.country	UA	uk_UA
Appears in Collections:	125 - Cybersecurity, Cybersecurity and Information Protection (bachelor's)

Files in This Item:

File	Description	Size	Format
Pavlat_Oleksandr_СБ-42_2025.pdf		1.7 MB	Adobe PDF
