Embedded Files: Binwalk – Searching for data

January 15, 2026

Embedded Files: Binwalk – Finding unseen data

Have you ever wondered what sort of data could be embedded into a file?

I don’t mean application resources and other easily found files, such as the pictures and data you can find with tools like Resource Hacker, PE Explorer, PPEE (Professional PE file Explorer), or DiE (Detect it Easy). I mean file structures embedded in such a way that only their signature tells you they are most likely there.

Before we look at binwalk, let’s start with some basics to understand how it works.

The Basics: Viewing Resources


Using Resource Hacker (PE Explorer would work as well) we can, for example, view the icon associated with the executable (steam.exe in this case) and change it if we would like to. Pretty cool if you ask me.

So what else can we do to explore a little more? If you don’t already know, a lot of the time you can use 7-Zip to view some things in some files. It won’t show you everything, but it’s a useful thing to keep in mind if you’re trying to view data that might be stored within a file. It’s most helpful for self-extracting archives and files that have archives stored in them. You might be surprised at what you find if you start poking around! Below is the same .exe opened with 7-Zip. What you see below is the root “folder” containing the PE Sections and the Resource Table folder, then below, the same resource we have selected in Resource Hacker in the example above.

If you find this interesting or just want to learn more about the PE file format of x86 executable files, here are a few links to get you started:

Understanding: It’s Magic Numbers!


If you dug around and read a bit from those links above, you may have noticed something: the “magic number” or “magic byte(s)”. The magic of those 2 bytes (there can be more) is that they are the file type signature: they tell the OS and other applications what type of file it is and what contents to expect.

Now that we have taken a quick look at the resources in “steam.exe”, have you ever wondered how 7-Zip knows something is an archive that it can extract? You might think it parses the structure of the whole file. While possible, that would waste precious resources, and the structure can be validated during extraction anyway. What it’s actually doing is looking for the file signature, those magic bytes at the beginning of a file that identify what kind of file it is without relying on the file extension. Neat, huh? You could change the extension to “.8z” and it would still extract just the same.

So, if you ever wondered how the OS knows a given file is an executable even if you remove the .exe file extension, now you know! The beauty of this is that not only can we have some fun with signatures, we can use them to identify files embedded within other files! One fun thing you can do by manipulating the magic bytes is spoof a file type to get past upload forms that filter the file types you may upload.

Getting a PHP Reverse Shell through an image upload web page exploit via magic byte manipulation.

Above we looked at “steam.exe”, but executable files aren’t the only file type with a file signature. The specific byte pattern 7-Zip uses to identify a “.7z” file is “37 7A BC AF 27 1C” in hexadecimal. If we open an archive in a hex editor, it looks like this:

The same is true for Windows executable (.exe) files, just with a different signature: “4D 5A”, or “MZ” in ASCII. That signature is actually shared by a few other file types: COM, DLL, DRV, EXE, PIF, QTS, QTX, and SYS.

Fun fact: “MZ” are the initials of Mark Zbikowski, one of the leading developers of MS-DOS.
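To make the signature check concrete, here is a minimal Python sketch of the idea, not how 7-Zip or Windows actually implement it. The lookup table is hypothetical and only holds the two signatures mentioned in this post, and the function names are made up for illustration.

```python
# Hypothetical lookup table holding only the two signatures from this post.
SIGNATURES = {
    bytes.fromhex("377ABCAF271C"): "7-Zip archive",
    b"MZ": "DOS/Windows executable (EXE, DLL, SYS, ...)",
}


def identify(header: bytes) -> str:
    """Return a description for the first signature that `header` starts with."""
    for magic, description in SIGNATURES.items():
        if header.startswith(magic):
            return description
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough leading bytes from `path` to test every signature."""
    longest = max(len(magic) for magic in SIGNATURES)
    with open(path, "rb") as handle:
        return identify(handle.read(longest))


if __name__ == "__main__":
    print(identify(bytes.fromhex("377ABCAF271C0004")))  # 7-Zip archive
    print(identify(b"MZ\x90\x00"))                      # DOS/Windows executable (EXE, DLL, SYS, ...)
    print(identify(b"hello"))                           # unknown
```

Note that the file name never enters into it: identify_file() only looks at the first few bytes, which is why renaming an archive to “.8z” changes nothing.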

For anyone that is curious, the hex editor I’m using is ImHex, and it’s by far (to date) my absolute favorite hex editor. It has a ton of cool and useful features, including analysis, pattern searching, YARA rules, and its own custom language, “PatternLanguage”. You should check it out if you’re looking to try a new open source hex editor.

Finding Known Signatures


Since we now understand what they are, how many are there? Basically an unlimited number, since the magic bytes are decided by the creator / developer of a given file type. There are quite a few common ones you will usually find: exe, dll, com, jpg, png, etc. This is where it’s nice to have a list of known signatures to refer to:

Keep in mind, if you can’t find a signature for something publicly, you can “create” your own. A signature is just a unique byte pattern that can be used to identify a file or anything else. Hopefully it’s at the start or end of a file. Otherwise you’re looking for any pattern that shows up across multiple files of the same type (e.g. you see XY or “58 59” at the beginning of every file).
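One rough way to hunt for a candidate signature is to compare the leading bytes of several samples of the same file type and keep whatever they share. The Python sketch below assumes the signature sits at the very start of the file. The sample data is invented for illustration, and common_prefix() is a hypothetical helper, not part of any tool mentioned here.

```python
def common_prefix(samples: list[bytes]) -> bytes:
    """Return the longest run of leading bytes shared by every sample."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    for index, byte in enumerate(shortest):
        # Stop at the first position where any sample disagrees.
        if any(sample[index] != byte for sample in samples):
            return shortest[:index]
    return shortest


if __name__ == "__main__":
    # Made-up headers standing in for three files of the same unknown type.
    headers = [
        b"XY\x01\x00first file",
        b"XY\x02\x00second file",
        b"XY\x07\x00third",
    ]
    candidate = common_prefix(headers)
    print(candidate.hex(" "))  # 58 59
```

A shared prefix is only a candidate. The more samples you compare, the less likely it is that the match is a coincidence.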

Discovering Data: Binwalk


GitHub Repository – https://github.com/ReFirmLabs/binwalk

Now that we understand what file signatures are and how they work, we already understand how Binwalk identifies embedded files! Originally written to find embedded data within firmware files (usually .bin), it can be used on any file. So let’s take a look at steam.exe again and see what binwalk finds.

Just above is another example using my motherboard’s BIOS update file, just to show you some other things it finds. You can also check this GitHub page to view the most up-to-date list of signatures binwalk detects.

$ binwalk --list
----------------------------------------------------------------------------------------------------
Signature Description              Signature Name                     Extraction Utility
----------------------------------------------------------------------------------------------------
7-zip archive data                 7zip                               7zz
AES Acceleration Table             aes_acceleration_table             None
AES Forward Table                  aes_forward_table                  None
AES RCON                           aes_rcon                           None
AES Reverse Table                  aes_reverse_table                  None
AES S-Box                          aes_sbox                           None
Android boot image                 android_bootimg                    None
Android sparse image               android_sparse                     Built-in
Apple Disk iMaGe                   dmg                                dmg2img
APple File System                  apfs                               7zz
Arcadyan obfuscated LZMA           arcadyan                           Built-in
ARJ archive data                   arj                                7zz
Autel obfuscated firmware          autel                              Built-in
BIN firmware header                binhdr                             None
BMP image (Bitmap)                 bmp                                Built-in
BTRFS file system                  btrfs                              None
bzip2 compressed data              bzip2                              Built-in
CFE bootloader                     cfe                                None
CHK firmware header                chk                                None
compress'd data                    compressd                          7zz
Copyright text                     copyright                          None
CPIO ASCII archive                 cpio                               7zz
CramFS filesystem                  cramfs                             7zz
CRC32 polynomial table             crc32                              None
CSman DAT file                     csman                              Built-in
D-Link Encrpted Image              encrpted_img                       Built-in
D-Link MH01 firmware image         mh01                               Built-in
D-Link TLV firmware                dlink_tlv                          Built-in
Dahua ZIP archive                  dahua_zip                          Built-in
Debian package file                deb                                None
Device tree blob (DTB)             dtb                                Built-in
DirectX shader bytecode            dxbc                               Built-in
DKBS firmware header               dkbs                               Built-in
DLK encrypted firmware             dlke                               Built-in
DLOB firmware header               dlob                               None
DMS firmware image                 dms                                Built-in
DOS Master Boot Record             mbr                                Built-in
DPAPI blob data                    dpapi                              None
eCos kernel exception handler      ecos                               None
EFI Global Partition Table         efigpt                             7zz
ELF binary                         elf                                None
EXT filesystem                     ext                                tsk_recover
FAT file system                    fat                                tsk_recover
GIF image                          gif                                Built-in
GPG signed file                    gpg_signed                         Built-in
gzip compressed data               gzip                               Built-in
HP Printer Job Language data       pjl                                None
Intel serial flash for PCH ROM     pchrom                             uefi-firmware-parser
ISO9660 primary volume             iso9660                            7zz
JBOOT firmware header              jboot_arm                          None
JBOOT SCH2 header                  jboot_sch2                         Built-in
JBOOT STAG header                  jboot_stag                         None
JFFS2 filesystem                   jffs2                              jefferson
JPEG image                         jpeg                               Built-in
Known encrypted firmware           encfw                              Built-in
Linux ARM boot executable zImage   linux_arm_zimage                   None
Linux kernel ARM64 boot image      linux_arm64_boot_image             None
Linux kernel boot image            linux_boot_image                   None
Linux kernel version               linux_kernel                       vmlinux-to-elf
LogFS file system                  logfs                              None
LUKS header                        luks                               None
LZ4 compressed data                lz4                                lz4
LZFSE compressed data              lzfse                              lzfse
LZMA compressed data               lzma                               Built-in
LZO compressed data                lzop                               lzop
Matter OTA firmware                matter_ota                         Built-in
MD5 hash constants                 md5                                None
Microsoft Cabinet archive          cab                                cabextract
Motorola S-record                  srecord                            srec_cat
Motorola S-record (generic)        srecord_generic                    srec_cat
NTFS partition                     ntfs                               tsk_recover
OpenSSL encryption                 openssl                            Built-in
PackImg firmware header            packimg                            None
Pcap-NG capture file               pcapng                             Built-in
PDF document                       pdf                                None
PEM certificate                    pem_certificate                    Built-in
PEM private key                    pem_private_key                    Built-in
PEM public key                     pem_public_key                     Built-in
PKCS DER hash                      pkcs_der_hash                      None
PNG image                          png                                Built-in
POSIX tar archive                  tarball                            tar
QEMU QCOW Image                    qcow                               None
QNX IFS image                      qnx_ifs                            dumpifs
RAR archive                        rar                                unrar
RIFF image                         riff                               Built-in
RomFS filesystem                   romfs                              Built-in
RSA encrypted session key          rsa                                None
RTK firmware header                rtk                                None
SEAMA firmware header              seama                              None
SHA256 hash constants              sha256                             None
SHRS encrypted firmware            shrs                               Built-in
SquashFS file system               squashfs                           sasquatch
SVG image                          svg                                Built-in
TP-Link firmware header            tplink                             None
TP-Link RTOS firmware              tplink_rtos                        None
TRX firmware image                 trx                                Built-in
U-Boot version string              uboot                              None
UBI image                          ubi                                ubireader_extract_images
UBIFS image                        ubifs                              ubireader_extract_files
UEFI capsule image                 uefi_capsule                       uefi-firmware-parser
UEFI PI firmware volume            uefi_pi_volume                     uefi-firmware-parser
uImage firmware image              uimage                             Built-in
VxWorks symbol table               vxworks_symtab                     Built-in
VxWorks WIND kernel version        wind_kernel                        None
Windows CE binary image            wince                              Built-in
Windows PE binary                  pe                                 None
XZ compressed data                 xz                                 Built-in
YAFFSv2 filesystem                 yaffs                              unyaffs
ZIP archive                        zip                                7zz
Zlib compressed file               zlib                               Built-in
ZSTD compressed data               zstd                               zstd
----------------------------------------------------------------------------------------------------
Total signatures: 111
Extractable signatures: 72

That’s quite a bit of output from running binwalk on steam.exe. As you can see, binwalk can identify tons of different file types. Not only does it show what’s in the file, it also tells you where in the file. This allows us to use something like “dd” in Linux to extract the files ourselves if we would like to, or utilize binwalk’s extraction features.
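The core idea of reporting what was found and where can be sketched in a few lines of Python. This is a simplified illustration, not binwalk’s actual implementation, which validates each hit far more carefully. The table is hypothetical: the 7-Zip bytes come from earlier in this post, while the PNG and Zip signatures are well-known values added here for the example.

```python
# Hypothetical signature table; real tools know many more and validate each hit.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "Zip archive data",
    bytes.fromhex("377ABCAF271C"): "7-zip archive data",
}


def scan(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every signature found, sorted by offset."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, description))
            start = data.find(magic, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up blob: padding, a PNG signature, more padding, a Zip signature.
    blob = b"\x00" * 16 + b"\x89PNG\r\n\x1a\n" + b"\x00" * 8 + b"PK\x03\x04" + b"\x00" * 4
    for offset, description in scan(blob):
        print(f"{offset:<10}{hex(offset):<10}{description}")
```

On the made-up blob this reports the PNG signature at offset 16 and the Zip signature at offset 32. A naive byte search like this produces false positives on real files, which is one reason short signatures need extra validation.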

Binwalk: Extraction


Ok, we finally made it to the important part: getting the data we want out! Below are the arguments I like to use most of the time, as well as a screenshot of what the output files look like.

binwalk -MDe steam.exe

-M, --matryoshka Recursively scan extracted files
-D, --dd=<type[:ext[:cmd]]> Extract <type> signatures (regular expression), give the files an extension of <ext>, and execute <cmd>
-e, --extract Automatically extract known file types

brock@DESKTOP-V89BI36:/mnt/c/Program Files (x86)/Steam$ sudo binwalk -MDe --run-as=root steam.exe
[sudo] password for brock:

Scan Time:     2025-12-06 21:02:41
Target File:   /mnt/c/Program Files (x86)/Steam/steam.exe
MD5 Checksum:  0384cab4ffa96ec5973f99cd18852e75
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Microsoft executable, portable (PE)
17408         0x4400          SHA256 hash constants, little endian
3341080       0x32FB18        Base64 standard index table
3350660       0x332084        Copyright string: "Copyright (C) 2010, Thomas G. Lane, Guido Vollbeding"
3382176       0x339BA0        HTML document header
3382214       0x339BC6        HTML document footer
3383640       0x33A158        CRC32 polynomial table, little endian
3391876       0x33C184        OpenSSH RSA1 private key, version "519"
3434600       0x346868        Copyright string: "Copyright Violation"
3434976       0x3469E0        Zip archive data, at least v2.0 to extract, name: z
3435056       0x346A30        End of Zip archive, footer length: 22
3454464       0x34B600        HTML document header
3454534       0x34B646        HTML document footer
3476484       0x350C04        PEM certificate
3481184       0x351E60        Base64 standard index table
3807952       0x3A1AD0        AES Inverse S-Box
3935300       0x3C0C44        Ubiquiti firmware header, third party, ~CRC32: 0x0, version: "SSL_ENGINES"
3968944       0x3C8FB0        Base64 standard index table
4343312       0x424610        PNG image, 256 x 256, 8-bit/color RGBA, non-interlaced
4343353       0x424639        Zlib compressed data, best compression
4407492       0x4340C4        Boot section Start 0x42424242 End 0x9E4142
4407497       0x4340C9        Boot section Start 0x9E41 End 0x0
4413224       0x435728        PNG image, 256 x 256, 8-bit/color RGBA, non-interlaced
4413265       0x435751        Zlib compressed data, best compression
4484636       0x446E1C        PNG image, 256 x 256, 8-bit/color RGBA, non-interlaced
4484677       0x446E45        Zlib compressed data, best compression
4554519       0x457F17        XML document, version: "1.0"
4689416       0x478E08        Object signature in DER format (PKCS header length: 4, sequence length: 11404
4689557       0x478E95        Certificate in DER format (x509 v3), header length: 4, sequence length: 1421
4690982       0x479426        Certificate in DER format (x509 v3), header length: 4, sequence length: 1424
4692410       0x4799BA        Certificate in DER format (x509 v3), header length: 4, sequence length: 1712
4694126       0x47A06E        Certificate in DER format (x509 v3), header length: 4, sequence length: 1716
4695846       0x47A726        Certificate in DER format (x509 v3), header length: 4, sequence length: 1753
4697603       0x47AE03        Certificate in DER format (x509 v3), header length: 4, sequence length: 1773


Scan Time:     2025-12-06 21:02:45
Target File:   /mnt/c/Program Files (x86)/Steam/_steam.exe-6.extracted/424639
MD5 Checksum:  cf93e82f08b20e896d044bdbe1bf4b2c
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------


Scan Time:     2025-12-06 21:02:45
Target File:   /mnt/c/Program Files (x86)/Steam/_steam.exe-6.extracted/435751
MD5 Checksum:  ce8b997ea29fd4cb68776af65b1a49af
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------


Scan Time:     2025-12-06 21:02:45
Target File:   /mnt/c/Program Files (x86)/Steam/_steam.exe-6.extracted/446E45
MD5 Checksum:  19115af34f22073c014e5de1d7b26e3a
Signatures:    411

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
229393        0x38011         MySQL MISAM index file Version 1

You might have noticed that the files are named after the offset of the data, in hexadecimal. That just tells us how far from the start of the file the data begins. So from the start of the file, we move X bytes, then read until (in this case) we hit the beginning of the next detected file signature.
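That carving rule (start at an offset, stop at the next one) is easy to sketch in Python. The snippet below is only an illustration of the description above under that assumption, not binwalk’s code. carve() is a hypothetical helper, and the sample blob is made up.

```python
def carve(data: bytes, offsets: list[int]) -> dict[str, bytes]:
    """Split `data` at each offset; every piece runs to the next offset or EOF.

    Pieces are keyed by their start offset in uppercase hex, mirroring
    file names such as 424639 in the output above.
    """
    ordered = sorted(offsets)
    ends = ordered[1:] + [len(data)]
    return {f"{start:X}": data[start:end] for start, end in zip(ordered, ends)}


if __name__ == "__main__":
    # Made-up blob: 4 bytes of header, then two "embedded" regions.
    blob = b"AAAA" + b"BBBBBBBB" + b"CC"
    for name, piece in carve(blob, [4, 12]).items():
        print(name, len(piece), piece)
```

For a single artifact this is the same arithmetic you would feed to “dd”: skip to the start offset, then copy (next offset minus start offset) bytes.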

Binwalk: Command Arguments


Binwalk v2.3.3
Craig Heffner, ReFirmLabs
https://github.com/ReFirmLabs/binwalk

Usage:
binwalk [OPTIONS] [FILE1] [FILE2] [FILE3] ...

-B, --signature Scan target file(s) for common file signatures
-R, --raw=<str> Scan target file(s) for the specified sequence of bytes
-A, --opcodes Scan target file(s) for common executable opcode signatures
-m, --magic=<file> Specify a custom magic file to use
-b, --dumb Disable smart signature keywords
-I, --invalid Show results marked as invalid
-x, --exclude=<str> Exclude results that match <str>
-y, --include=<str> Only show results that match <str>

-e, --extract Automatically extract known file types
-D, --dd=<type[:ext[:cmd]]> Extract <type> signatures (regular expression), give the files an extension of <ext>, and execute <cmd>
-M, --matryoshka Recursively scan extracted files
-d, --depth=<int> Limit matryoshka recursion depth (default: 8 levels deep)
-C, --directory=<str> Extract files/folders to a custom directory (default: current working directory)
-j, --size=<int> Limit the size of each extracted file
-n, --count=<int> Limit the number of extracted files
-0, --run-as=<str> Execute external extraction utilities with the specified user’s privileges
-1, --preserve-symlinks Do not sanitize extracted symlinks that point outside the extraction directory (dangerous)
-r, --rm Delete carved files after extraction
-z, --carve Carve data from files, but don’t execute extraction utilities
-V, --subdirs Extract into sub-directories named by the offset

-E, --entropy Calculate file entropy
-F, --fast Use faster, but less detailed, entropy analysis
-J, --save Save plot as a PNG
-Q, --nlegend Omit the legend from the entropy plot graph
-N, --nplot Do not generate an entropy plot graph
-H, --high=<float> Set the rising edge entropy trigger threshold (default: 0.95)
-L, --low=<float> Set the falling edge entropy trigger threshold (default: 0.85)

-W, --hexdump Perform a hexdump / diff of a file or files
-G, --green Only show lines containing bytes that are the same among all files
-i, --red Only show lines containing bytes that are different among all files
-U, --blue Only show lines containing bytes that are different among some files
-u, --similar Only display lines that are the same between all files
-w, --terse Diff all files, but only display a hex dump of the first file

-X, --deflate Scan for raw deflate compression streams
-Z, --lzma Scan for raw LZMA compression streams
-P, --partial Perform a superficial, but faster, scan
-S, --stop Stop after the first result

-l, --length=<int> Number of bytes to scan
-o, --offset=<int> Start scan at this file offset
-O, --base=<int> Add a base address to all printed offsets
-K, --block=<int> Set file block size
-g, --swap=<int> Reverse every n bytes before scanning
-f, --log=<file> Log results to file
-c, --csv Log results to file in CSV format
-t, --term Format output to fit the terminal window
-q, --quiet Suppress output to stdout
-v, --verbose Enable verbose output
-h, --help Show help output
-a, --finclude=<str> Only scan files whose names match this regex
-p, --fexclude=<str> Do not scan files whose names match this regex
-s, --status=<int> Enable the status server on the specified port

Conclusion


So for now, I think that’s all I have! Play around a bit with the extraction arguments, as there are a few different and useful ways you can have everything extracted. Otherwise, using “dd” in Linux is by far the easiest option if you are just looking to extract a single artifact.

I hope this was at least interesting, and that more people will play with it, as it can be quite a lot of fun.