This paper proposes a method that combines ChatGPT data augmentation with Instruction Supervised Fine-Tuning of open large language models.
Real-world tactics, techniques, and procedures (TTPs) are often embedded in large volumes of heterogeneous, unstructured text. Manual identification alone demands significant human effort, so automatically and efficiently classifying TTPs from unstructured text is a crucial task.
Prominent TTP description frameworks include STRIDE, Cyber Kill Chain, and MITRE ATT&CK.
- This method exhibits a long-tail issue [9]: 108 techniques lack adequate examples, with some having only a description and others only a single procedure example.
- Traditional data augmentation methods fall short of both preserving contextual semantic integrity and increasing the diversity of training samples.
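
To make the long-tail issue and the instruction-tuning setup more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy dataset of (text, technique ID) pairs; the sample sentences, the `min_examples` threshold, the helper names `find_tail_techniques` and `to_instruction_record`, and the instruction/input/output record layout are all illustrative assumptions. It shows how under-represented techniques (the candidates for augmentation) might be identified and how labeled samples might be wrapped as instruction records for supervised fine-tuning.

```python
from collections import Counter

# Hypothetical toy dataset: (procedure example text, technique ID) pairs.
# The sentences are invented for illustration only.
samples = [
    ("The malware sends spearphishing emails with malicious attachments.", "T1566"),
    ("The adversary delivered a weaponized document via email.", "T1566"),
    ("Attackers emailed a link to a credential harvesting page.", "T1566"),
    ("The implant uses PowerShell to execute commands.", "T1059"),
    ("The tool modifies the system firmware.", "T1542"),
]


def find_tail_techniques(samples, min_examples=2):
    """Return the sorted technique IDs with fewer than `min_examples` samples.

    `min_examples` is an assumed threshold, not a value from the paper.
    """
    counts = Counter(label for _, label in samples)
    return sorted(t for t, n in counts.items() if n < min_examples)


def to_instruction_record(text, label):
    """Wrap one labeled sample in an assumed instruction/input/output layout."""
    return {
        "instruction": "Identify the MITRE ATT&CK technique described in the text.",
        "input": text,
        "output": label,
    }


# Techniques in the long tail are the ones that would need augmented samples.
tail = find_tail_techniques(samples)
records = [to_instruction_record(text, label) for text, label in samples]

print(tail)
print(records[0])
```

In this sketch, techniques returned by `find_tail_techniques` would be the ones passed to an augmentation step, and the resulting records would form the fine-tuning set; how the augmented text is generated is left out deliberately.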