Abhishek_Polymorphic - Background Current Approach to...


Background

Current Approach to Malware Identification
  Signature Generation
    - One or more strings of machine code unique to the malicious program.
    - Produced by human examination or possibly automation.
  Scanning for a Signature
    - Known offset, known order
    - Any offset, known order
    - Any offset, any order

Buffer Overflow
  - Idea introduced in "Smashing the Stack for Fun and Profit".
  - Most common method of attack.
  - IDSs apply pattern matching to defend against buffer overflow attacks.

Polymorphic Viruses
  - Dark Avenger introduced polymorphism in 1992.
  - A method against pattern matching.
  - Cipher the code and generate a decipher routine which is different each time.

    push byte 0x68
    push dword 0x7361622f
    push dword 0x6e69622f
    mov ebx,esp
    xor edx,edx
    push edx
    push ebx
    mov ecx,esp
    push byte 11
    pop eax
    int 80h

    "\x6A\x68\x68\x2F\x62\x61\x73\x68\x2F\x62\x69\x6E\x89\xE3\x31\xD2\x52\x53\x89\xE1\x6A\x0B\x58\xCD\x80"

    [PPPPPPPPPPPPPPPP]   Plain Text
    [KKKKKKKKKKKKKK]     Cipher Key
    [DDDKKKKKKKKKKKKKK]  Encrypted Code

  - If the decipher routine [DDD] does not change much, signatures can be created.
  - Generate a different decipher routine each time, with a different key.
  - Do not use simple XOR alone. ADD/SUB/ROL/ROR can be used.
  - Use "dead code" between decipher code.
  - Use fake registers.

CLET: A Polymorphic Shell Code
  Buffer layout: NOP | ShellCode | Cram Bytes | Return Address
  - A NIDS tries to find consecutive NOPs and applies pattern matching on the shellcode.
  - Usually a combination of NOPs and return addresses can be identified.

Goals of CLET
  - Generate fake NOPs.
  - Cipher the shellcode (use random methods, not only XOR) and use a randomly generated decipher routine.
  - Avoid a big return address zone to protect against data mining methods.
  Buffer components: Decipher Routine, NOP, ShellCode, Cram Bytes, Return Address

Fake NOPs with 2- and 3-Byte Instructions
  - NOPs are necessary before the shellcode, since we don't know where our JMP will end up.
  - For a "NOP sled", anywhere within the sled is fine.
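The NIDS heuristic mentioned above, looking for consecutive NOPs, can be sketched roughly as follows. This is only an illustration: the function names, the 0x90 opcode default, and the `min_run` cutoff of 16 are assumptions made for the example and do not come from the slides or from any particular NIDS.

```python
def longest_nop_run(buf: bytes, nop: int = 0x90) -> int:
    """Return the length of the longest run of consecutive `nop` bytes."""
    longest = current = 0
    for b in buf:
        # Extend the current run on a NOP byte, otherwise reset it.
        current = current + 1 if b == nop else 0
        longest = max(longest, current)
    return longest


def looks_like_nop_sled(buf: bytes, min_run: int = 16) -> bool:
    """Hypothetical heuristic: flag buffers with at least `min_run` NOPs in a row."""
    return longest_nop_run(buf) >= min_run


if __name__ == "__main__":
    sled = b"\x90" * 32 + b"\xCC" * 8    # plain sled followed by placeholder bytes
    mixed = b"\x90\x41\x90\x42" * 8      # NOPs present but never consecutive
    print(looks_like_nop_sled(sled))
    print(looks_like_nop_sled(mixed))
```

A check this simple only sees literal runs of one opcode, so a sled built from other instructions would pass unnoticed.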
(1-byte instructions)
  - We can replace NOPs with other "non-dangerous" instructions.
  - Problem: there are not many 1-byte non-dangerous instructions (an advantage to the NIDS).
  - Many 1-byte instructions are also alphanumeric.
  - How do we generate random fake NOPs using other instructions that are several bytes long?
  - We could generate 2-byte instructions whose second byte is itself a 1-byte instruction or the first byte of another 2-byte instruction.

  Consider "\x15\x11\xF8\xFA\x81\xF9\x27\x2F\x90\x9E"

    Read from the first byte:
      ADC $0x11F8FA81
      STC
      DAA
      DAS
      NOP
      SAHF

    Read from a later offset:
      ADC %eax,%edx
      CMP %ecx,$0x272F909E

./clet
  -n nnop  : generate nnop NOP.
  -a       : use American English dictionary to generate NOP.
  -c       : print C form of the buffer.
  -i nint  : decryption routine has nint instructions (default is 5)
  -f file  : spectrum file used to polymorph.
  -b ncra  : generate ncra cramming bytes using spectrum or not
  -B       : cramming bytes zone is adapted to beginning
  -t       : number of bytes generated is a multiple of 4
  -x XXXX  : XXXX is the address for the address zone, FE011EC9 for instance
  -z nadd  : generate address zone of nadd*4 bytes
  -e       : execute shellcode.
  -d       : dump shellcode to stdout.
  -s       : spectrum analysis.
  -S file  : load shellcode from file.
  -E [1-3] : load an embedded shellcode.
  -h       : display this message.

Metamorphic Viruses
  - Do not encrypt themselves.
  - Contain a mutation engine.
  - They completely rewrite themselves at random, keeping functionality the same but appearing different.
  - Creating this kind of virus requires considerable skill.
  - Example: W32.Simile

Example
  Original code:
    A: A_Instr1
       A_Instr2
    B: B_Instr1
       B_Instr2
    C: C_Instr1
       C_Instr2

  Code flow (after Mutate Code):
    A: A_Instr1
       JMP Z
    Y: B_Instr1
       B_Instr2
       JMP X
    Z: A_Instr2
       JMP Y
    C: C_Instr1
       C_Instr2
    X: NOP
       NOP
       JMP C

Example: Insertion of Dead Code, Code Transposition
  Original:
    add  r1, r2, r3
    mov  r4, r1
    add  r5, r6, r7

  After insertion of dead code:
    add  r1, r2, r3
    nop
    push r1
    pop  r1
    mov  r4, r1
    add  r5, r6, r7

  After code transposition:
    add  r1, r2, r3
    nop
    push r1
    add  r5, r6, r7
    pop  r1
    mov  r4, r1

Biological Immune Systems
  - The human body is under constant siege by pathogens, which replicate:
    - Bacteria
    - Parasites
    - Viruses
    - Fungi
  - Homeostasis: a stable state of equilibrium.
  - The immune system (IS) helps maintain homeostasis.
  - Different pathogens are eliminated in different ways, hence the IS must pick the correct "effectors".

IS (Immune System) Mapping
  - The IS detects abuses of security policy.
  - It responds by counterattacking the source of abuse.
  - Policy is specified by "natural selection" and emphasizes the following:
    - Availability: enables the body to continue functioning
    - Correctness: prevents the IS from attacking itself (autoimmune disorder)
    - Integrity: ensures that genes that encode for correctness of functions are not corrupted
    - Accountability: finding and destroying pathogens responsible for illness

Principles of an Artificial Immune System
  Serves as an analogy for computer systems. Some principles:
    - Distributed protection: millions of local interactions
    - Diversity: every individual has a different immune system
    - Robustness: lack of hierarchy
    - Adaptability: the IS adapts to (learns) pathogenic structure
    - Memory: of previously encountered pathogens
    - Implicit policy specification: Self / non-Self.
      It knows its behavior.
    - Flexibility: more resources used when infection is detected
    - Scalable: being distributed, communication is localized

Architecture of an Immune System
  Multilayered protection:
    - Skin: anatomic barrier
    - Physiological: pH and temperature provide inappropriate living conditions
    - Innate immune system: consists of endocytic and phagocytic systems, which involve motile scavenger cells such as macrophages that ingest extracellular molecules
    - Adaptive immune system: learns specific kinds of pathogens and retains memory for faster responses the next time
      - Previously unknown pathogens generate "learning"
      - Primary response and secondary response

Adaptive Immune System
  - Consists of white blood cells called lymphocytes.
  - Lymphocytes are mobile, independent detectors that cooperate and circulate via the lymph system.
  - Millions of cells (detectors) make simple localized interactions.

Recognition
  - Receptors: on the surface of immune cells
  - Epitope: location on the surface of pathogens (protein fragments, peptides)
  - Detection: chemical bond established between receptor and epitope
  - Monospecificity: all receptors on a lymphocyte are identical
  - Pathogens often have multiple different epitopes
  - High-affinity binding that causes the lymphocyte to cross a certain threshold activates the lymphocyte

Adaptation
  - B cells (generated and trained in bone marrow)
  - T cells (generated and trained in thymus)

Negative Selection (Clonal Deletion)
  - Cells are trained in the bone marrow and the thymus, where they are placed with "Self".
  - Receptor surfaces are formed at random and over generations are taught NOT to identify self.
  - If immature cells are activated by binding to self, they are eliminated.
  - Mature cells will tolerate most self epitopes and are said to have undergone central tolerization.
  - On maturation these cells are released into the lymph system, where they attempt to form affinity bonds with non-Self by competing with each other.
  - If a certain cell wins this affinity race, copies of it are made, which are subject to very high mutation rates, called somatic hypermutation.

Goal
  Evaluate a method of improving the identification of polymorphic malware using Genetic Algorithms (GAs).

  Sig1(P) = {substring1, substring2, ..., substringM}

Our Solution: The Classifier
  [Figure: a file, shown as a bit string, is passed to the classifier, which answers "Is it of the target class?"]

Our Solution: The Classifier
  [Figure: the signature is a set of file fragments (bit strings); each fragment is searched for in the file.]

    return number_found > threshold

Our Solution: The GA
  Representation

Our Solution: The GA
  Operators: common bitwise mutation and uniform crossover.
  Fitness function: accuracy on a training dataset.
  The training dataset should:
    - Consist of examples of both classes. (Equal number?)
    - Be diverse.
    - Accurately represent reality.

Experiments: Solution has Merit?
  Target class: programs produced by CLET.
    - CLET is a polymorphic engine that creates shellcode buffer overflow malware.
  Training dataset: 20 CLET shellcodes, 20 non-CLET shellcodes
  Testing dataset (validation of our solution): 180 CLET shellcodes, 13 non-CLET shellcodes

Experiments: Solution has Merit?
  Procedure
    - Search for GA parameters.
    - All the normal parameters, plus: number of file fragments in a signature, number of bytes in a file fragment.
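The classifier rule (`return number_found > threshold`), the accuracy-based fitness function, and the two operators named above can be sketched roughly as follows. The slides do not show the actual representation, so everything concrete here is an assumption made for illustration: a signature is taken to be a list of equal-length byte fragments, the threshold is a fixed integer, and the names `classify`, `fitness`, `mutate` and `uniform_crossover` are hypothetical.

```python
import random


def classify(data: bytes, signature: list[bytes], threshold: int) -> bool:
    """Report the target class if more than `threshold` fragments occur in `data`."""
    number_found = sum(1 for fragment in signature if fragment in data)
    return number_found > threshold


def fitness(signature: list[bytes], threshold: int,
            dataset: list[tuple[bytes, bool]]) -> float:
    """Accuracy on a labelled dataset of (data, is_target_class) pairs."""
    correct = sum(classify(data, signature, threshold) == label
                  for data, label in dataset)
    return correct / len(dataset)


def mutate(signature: list[bytes], rate: float, rng: random.Random) -> list[bytes]:
    """Bitwise mutation: flip each bit independently with probability `rate`."""
    mutated = []
    for fragment in signature:
        frag = bytearray(fragment)
        for i in range(len(frag)):
            for bit in range(8):
                if rng.random() < rate:
                    frag[i] ^= 1 << bit
        mutated.append(bytes(frag))
    return mutated


def uniform_crossover(a: list[bytes], b: list[bytes],
                      rng: random.Random) -> list[bytes]:
    """Uniform crossover: every bit of the child is drawn from either parent."""
    child = []
    for frag_a, frag_b in zip(a, b):
        out = bytearray(len(frag_a))
        for i, (x, y) in enumerate(zip(frag_a, frag_b)):
            mask = rng.getrandbits(8)          # 1-bits take from a, 0-bits from b
            out[i] = (x & mask) | (y & ~mask & 0xFF)
        child.append(bytes(out))
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for labelled files; not real shellcode.
    dataset = [(b"abcdef", True), (b"abxxcd", True), (b"uvwxyz", False)]
    parent_a = [b"ab", b"cd"]
    parent_b = [b"uv", b"yz"]
    child = mutate(uniform_crossover(parent_a, parent_b, rng), 0.01, rng)
    print(fitness(parent_a, 1, dataset), fitness(child, 1, dataset))
```

Counting each fragment at most once, wherever it appears, corresponds to the "any offset, any order" style of matching; other choices are possible and the slides do not say which one was used.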
Results
  - Relative insensitivity to normal GA parameters.
  - Number of file fragments and bytes per file fragment seem to be the most important.
  - Led to three final experiments...

Results
  Testing: one best classifier of each type.

  Number of File Fragments   Training    Testing Accuracy
  in a Signature             Accuracy    CLET       non-CLET
  3 File Fragments           97.5%       100.0%     92.3%
  5 File Fragments           100.0%      83.3%      92.3%
  10 File Fragments          100.0%      95.0%      100.0%

Conclusions / Extensions
  - Verify results on larger datasets.
  - Evolve signature length.
    - Number of file fragments.
    - Number of bytes per file fragment.
  - Extend to other areas.
    - Other types of polymorphic / metamorphic malware.
    - Non-polymorphic / metamorphic malware ...
  - Parallelize the search process.

This note was uploaded on 07/04/2011 for the course CIS 3360 taught by Professor Guha during the Fall '06 term at University of Central Florida.
