A CROSS-LANGUAGE FRAMEWORK FOR MALICIOUS SCRIPT DEOBFUSCATION

Lu, Weijia

Please use this identifier to cite or link to this item: https://hdl.handle.net/11264/2743

Title:	A CROSS-LANGUAGE FRAMEWORK FOR MALICIOUS SCRIPT DEOBFUSCATION
Authors:	Lu, Weijia; Royal Military College of Canada; Leblanc, Sylvain
Keywords:	Malware; Large Language Model; Abstract Syntax Tree; AST; LLM; PowerShell; JavaScript; Parser; Malicious Script Deobfuscation; Transformer; Island Parser
Issue Date:	28-Apr-2026
Abstract:	Syntax analysis, the process of parsing code sequences into an intermediate representation, is a critical step in the malware deobfuscation pipeline: it imposes a grammatically defined hierarchical structure on seemingly random sequences of obfuscated characters. Current approaches to this step are restricted to traditional parsers that rely on static grammatical rules, which has produced a fragmented landscape of tools with narrow applications. The aim of this research is to evaluate the effectiveness of the transformer architecture at performing syntax analysis on malicious scripts. To this end, the Transformer-based Robust Island Parser (TRIP) is introduced, a cross-language framework designed around heuristic-driven fragmentation and neural inference parsing of obfuscated code. In the modern machine learning landscape, transformer models such as TreeBERT and AST-T5 have demonstrated code understanding and generation capabilities by accurately mapping programming language syntax and semantics. This research explores the application of this structural awareness to the neural inference of an intermediate representation of code. Results demonstrate that the transformer achieved superior robustness in parsing fragmented code compared with traditional parsing solutions, especially for JavaScript, while presenting a unified cross-language output format to simplify analysis.
URI:	https://hdl.handle.net/11264/2743
Appears in Collections:	Theses

Files in This Item:

File	Description	Size	Format
A_CROSS-LANGUAGE_FRAMEWORK_FOR_MALICIOUS_SCRIPT_DEOBFUSCATION_Lu.pdf	A Cross-Language Framework for Malicious Script Deobfuscation, by Capt Lu, Weijia	1.88 MB	Adobe PDF