Date of Award

12-2009

Type

Thesis

Major

Computer Science - Applied Computing Track

Degree Type

Master of Science in Applied Computer Science

Department

TSYS School of Computer Science

First Advisor

Mohamed R. Chouchane

Second Advisor

Edward L. Bosworth

Third Advisor

Jianhua Yang

Abstract

Malware-generating engines challenge typical malware analysts by requiring them to quickly extract, and upload to their customers' machines, a signature for each of a possibly vast number of never-before-seen malware instances that an engine can generate in a short amount of time. In this thesis we propose and evaluate two methods for linking variants of engine-generated malware to its engine. The proposed methods use the n-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware instance as a feature vector for the instance. An NFV is a tuple that maps n-grams to their frequencies. The information contained within the NFV of an engine-generated malware instance is then used to attribute the instance to the engine. The first method implements a Bayesian-like classifier that uses 1-gram frequency vectors of programs as feature vectors. This method was successfully evaluated on a sample of benign programs and one of malicious programs from the W32.Simile family of self-mutating malware. The second method, which is an extension of the first method, uses optimized 2-gram frequency vectors as feature vectors and classifies malware by computing its proximity to the average of the NFVs of instances known to have been generated by a known engine. The second method was successfully evaluated on four malware-generating engines: W32.Simile, W32.Evol, W32.NGCVK, and W32.VCL. The evaluation yielded a set of four 17-tuples of doubles as signatures, one for each of the engines, and achieved a 95% discrimination accuracy between a sample of benign programs and samples of malware instances that were generated by these engines. Accuracies of 94.8% were achieved for engine signatures of size 6, 8, and 14 doubles. We also used four k-NN classifiers, which, unlike the second method, require the time-consuming task of creating and storing one signature per known malware instance, to countercheck the accuracies achieved by the second method.
This work is inspired by successful methods for attributing natural language texts to their respective authors. The proposed methods may be viewed as filtering (or decision support) tools that malware detectors may use to determine whether extensive engine-specific program analyses such as emulation and control flow analysis are needed on a suspect program.
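
As a rough illustration of the second method described in the abstract, the following Python sketch builds 2-gram frequency vectors from opcode mnemonic sequences, averages the NFVs of known instances into one signature per engine, and attributes a suspect program to the nearest signature. It is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not say which proximity measure, threshold, or n-gram selection ("optimization") was used, so Euclidean distance, the `threshold` value, the tiny `VOCAB`, the engine names, and all function names here are hypothetical.

```python
from collections import Counter
from math import sqrt

def ngram_frequency_vector(opcodes, n, vocabulary):
    """Relative frequency of each n-gram in `vocabulary` within `opcodes`."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero on very short inputs
    return [counts[g] / total for g in vocabulary]

def centroid(vectors):
    """Component-wise average of equally sized vectors (the engine signature)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def euclidean(u, v):
    # Assumption: the abstract only says "proximity"; Euclidean is one choice.
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def attribute(nfv, signatures, threshold):
    """Name of the nearest engine signature, or None if it is too far away."""
    name, dist = min(
        ((k, euclidean(nfv, s)) for k, s in signatures.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= threshold else None

# Hypothetical toy data: a 3-entry 2-gram vocabulary and two made-up engines.
VOCAB = [("mov", "push"), ("push", "call"), ("add", "mov")]
training = {
    "engine_a": [["mov", "push", "call", "mov", "push", "call"],
                 ["mov", "push", "call"]],
    "engine_b": [["add", "mov", "add", "mov"],
                 ["add", "mov"]],
}
signatures = {
    engine: centroid([ngram_frequency_vector(s, 2, VOCAB) for s in samples])
    for engine, samples in training.items()
}

suspect = ngram_frequency_vector(["mov", "push", "call", "mov"], 2, VOCAB)
benign = ngram_frequency_vector(["xor", "ret", "nop"], 2, VOCAB)
print(attribute(suspect, signatures, threshold=0.5))
print(attribute(benign, signatures, threshold=0.5))
```

Storing one averaged vector per engine is what distinguishes this approach from the k-NN classifiers mentioned in the abstract, which keep one signature per known malware instance.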

Recommended Citation

Milgo, Edna Chelangat, "Statistical Tools for Linking Engine-Generated Malware to Its Engine" (2009). Theses and Dissertations. 83.
https://csuepress.columbusstate.edu/theses_dissertations/83

Included in

Cybersecurity Commons
