cisco-talos / clamav-large-archive-scanner

This project extends the ClamAV software capability to be able to extract and scan the contents of archives greater than 2GB. ClamAV is unable to scan files larger than 2GB.

License: BSD 3-Clause "New" or "Revised" License

clamav-large-archive-scanner's Introduction

ClamAV Large Archive Scanner

The ClamAV Large Archive Scanner utility is a wrapper around the ClamAV clamd and clamdscan programs that provides a way to scan archives which exceed ClamAV's maximum file size limit. At the time of writing (2024/03/09), ClamAV may not scan any file or archive larger than 2 GiB.

Important: This utility is a workaround to supplement ClamAV until such time as archives larger than 2 GiB can be scanned. This utility is not intended to replace clamscan or clamdscan. We have no intention of providing feature parity between this utility and clamscan or clamdscan.

This utility does not work around ClamAV's file size limitations for non-archive files. It will not enable you to scan large documents, graphics, videos, etc. In case you were wondering if large files could be chunked into smaller files and then scanned... No. That is not an effective solution to scan large files.

The utility has three sub-commands: scan, unpack, and cleanup.

The scan command combines the other two commands to unpack, scan, and clean up.

The unpack command provides the ability to extract archives or mount disk images of the supported archive types without scanning them.

The cleanup command is complementary to the unpack command, enabling you to easily un-mount or delete the extracted archive contents.

Supported Archive Types

The ClamAV Large Archive Scanner supports extraction or mounting of the following types of archives:

  • TAR
  • ZIP
  • ISO
  • VMDK
  • TARGZ
  • QCOW2
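
As a rough illustration of how these container formats can be told apart, the sketch below checks well-known magic bytes at fixed offsets. It is not how the utility itself detects types (the Installation section notes that libmagic is used for that); the function names and labels are hypothetical, and the gzip signature is simply assumed to wrap a tarball.

```python
from typing import Optional

# (offset, magic bytes, label). Labels mirror the list above.
_SIGNATURES = [
    (0, b"PK\x03\x04", "ZIP"),
    (0, b"\x1f\x8b", "TARGZ"),   # any gzip stream; assumed here to wrap a tarball
    (0, b"QFI\xfb", "QCOW2"),
    (0, b"KDMV", "VMDK"),        # sparse-extent VMDK header only
    (257, b"ustar", "TAR"),      # POSIX ustar header field
    (0x8001, b"CD001", "ISO"),   # ISO 9660 volume descriptor
]


def sniff_archive_type(header: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in _SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of the file to cover the furthest signature."""
    with open(path, "rb") as handle:
        return sniff_archive_type(handle.read(0x8001 + 5))
```

Only the first 32 KiB or so of the file is read, so the check stays cheap even for multi-gigabyte archives.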

Installation

We provide two options for installation. You may run the utility in your local environment or you may run the utility in a Docker container. The Docker container is easier.

Running in Your Local Environment

To use the ClamAV Large Archive Scanner in your local environment, you will need to install an assortment of supporting tools and libraries:

  • Install Python 3.9 or newer.

  • Install the required Python packages. We suggest using a venv virtual environment. From the project root directory, run the following:

    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install .

    If you open a new terminal, you will need to reactivate your Python virtual environment using: source .venv/bin/activate

  • Install ClamAV. Both clamd and clamdscan are required. On some Linux distributions, these are packaged separately. You can verify that they are present by running which clamd and which clamdscan.

  • Install libmagic which is required to determine file types.

  • Install libguestfs which is needed to unpack VMDK/QCOW2 disk images.
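
The `which clamd` and `which clamdscan` check mentioned above can also be done from Python. This is a minimal sketch using the standard library's `shutil.which`; the helper names are hypothetical and it only confirms that the two programs are on the current `$PATH`, not that clamd is running.

```python
import shutil
from typing import Callable, Dict, List, Optional

REQUIRED_TOOLS = ("clamd", "clamdscan")


def locate_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each required program to its resolved path, or None if absent."""
    return {tool: which(tool) for tool in REQUIRED_TOOLS}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """List the programs that could not be found on the PATH."""
    return [tool for tool, path in found.items() if path is None]
```

Calling `missing_tools(locate_tools())` returns an empty list when both programs are installed and visible.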

You will need to start the clamd service before you can use the ClamAV Large Archive Scanner. This may require some initial configuration, including running freshclam to download the latest malware detection signatures. See the ClamAV documentation for more information on how to set up ClamAV.

Regarding clamd.conf config options, you must set the LocalSocket option (or TCPSocket option), at a minimum. On some systems, this is preconfigured. For the ClamAV Large Archive Scanner project, the goal is to scan extremely large archives, so you'll also need to add the following settings to max out ClamAV's file size capabilities:

MaxFileSize 0
MaxScanSize 0
MaxScanTime 3600000
MaxFiles 100000
MaxRecursion 20

Finally, you may also wish to raise an alert if the limits have been exceeded. You can do so by adding this option:

AlertExceedsMax yes

Note: Regarding the selected config options...

  • MaxFileSize 0 - This maxes out the file size limit for ClamAV.

  • MaxScanSize 0 - This maxes out the scan size limit. The scan size is the total number of bytes scanned per file when extracting files from archives, decompressing embedded files, normalizing scripts, or even re-scanning the same data as a different type. The total number of bytes scanned is often much larger than the original file size, even for plain text files.

  • MaxScanTime 3600000 - This increases the scan time limit per file scanned to 1 hour (60 x 60 x 1000 milliseconds). Scanning large archives will take a long time. You may wish to increase or disable this limit.

  • MaxFiles 100000 - This increases the limit for the number of embedded files scanned. You may wish to increase or disable this limit.

  • MaxRecursion 20 - This increases the maximum recursion depth from 17 to 20. Scan recursion is the process of unpacking and scanning embedded files. Files unpacked by the Large Archive Scanner before passing to ClamAV for scanning do not count towards the maximum recursion depth. The maximum recursion depth cannot be disabled.

  • AlertExceedsMax yes - This option causes scans to alert when a scan limit is exceeded, with signature names like:

    • Heuristics.Limits.Exceeded.MaxFileSize
    • Heuristics.Limits.Exceeded.MaxScanSize
    • Heuristics.Limits.Exceeded.MaxFiles
    • Heuristics.Limits.Exceeded.MaxRecursion
    • Heuristics.Limits.Exceeded.MaxScanTime

See the clamd.conf.sample config for more details.
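
To double-check an existing clamd.conf against the limits recommended above, something like the following sketch may help. It assumes the plain `Option value` line format shown in this section, compares values literally (so `MaxFileSize 0` matches but an equivalent value written differently would not), and keeps only the last occurrence of a repeated option. The function names are hypothetical.

```python
RECOMMENDED = {
    "MaxFileSize": "0",
    "MaxScanSize": "0",
    "MaxScanTime": str(60 * 60 * 1000),  # 1 hour expressed in milliseconds
    "MaxFiles": "100000",
    "MaxRecursion": "20",
}


def parse_clamd_conf(text):
    """Parse 'Option value' lines, skipping blank lines and # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def differing_settings(text):
    """Return {option: (current, recommended)} for values that differ."""
    options = parse_clamd_conf(text)
    return {
        name: (options.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if options.get(name) != wanted
    }
```

An option that is missing from the file shows up with `None` as its current value, which makes it easy to spot settings still at their defaults.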

After you've installed everything and have started clamd, you may run a scan with the ClamAV Large Archive Scanner. For example:

archive scan /path/to/archive

Tip: If you have multiple ClamAV installations outside the normal $PATH, you may need to add the bin directory for your preferred version to the $PATH before attempting a scan.

For example:

PATH=~/clams/1.3.0/bin:$PATH

archive scan /path/to/archive

To learn more, run archive --help, or skip to Usage.

Running in a Docker Container

The simplest way to use the ClamAV Large Archive Scanner is using a Docker container.

The provided Dockerfile may be used to build a container with the environment and tools necessary to run this utility. This Dockerfile also increases the ClamAV scan limits described in the previous section.

This Docker container is based on the ClamAV project's clamav-debian image. You can find additional instructions for how to customize and use this container here.

Note: Privileged mode is needed to mount ISO archives when they are unpacked.

To build the image, run:

docker build . -t clamav-large-archive-scanner --load

To start the container, run:

docker run \
    --interactive \
    --tty \
    --rm \
    --name "clam_container_01" \
    clamav-large-archive-scanner

Tip: You may wish to mount a directory containing the archives you wish to scan as a volume in the container. You can do so when starting the container, as shown here. Replace /some/path with the actual directory you wish to mount. The directory contents will be found within the running container under /target:

docker run \
    --interactive \
    --tty \
    --rm \
    --mount type=bind,source=/some/path,target=/target \
    --name "clam_container_01" \
    clamav-large-archive-scanner

After the container is up and running, and after clamd has finished loading, you may execute commands in the running container, or open a shell in the running container to execute commands.

Tip: Within the container, you can use clamdscan --ping 100 to wait up to 100 seconds for clamd to finish loading. If clamd takes longer than 100 seconds to load, or fails to load, then the command will exit with a non-zero exit code. For example:

docker exec --interactive --tty "clam_container_01" clamdscan --ping 100
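
If you are scripting inside the container, the same readiness check can be wrapped in Python. The sketch below only relies on the `clamdscan --ping` behavior described above (exit code zero once clamd responds, non-zero otherwise); the helper names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def ping_command(timeout_seconds: int) -> List[str]:
    """Build the argv for `clamdscan --ping <timeout>`."""
    return ["clamdscan", "--ping", str(timeout_seconds)]


def clamd_is_ready(
    timeout_seconds: int = 100,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Return True if clamd answered within the timeout (exit code 0)."""
    return run(ping_command(timeout_seconds)) == 0
```

The `run` parameter exists only so the check can be exercised without a live clamd; in normal use the default `subprocess.call` is enough.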

If you used the --mount option to mount a directory at /target containing some_archive.tgz, you might try scanning it like this:

docker exec --interactive --tty "clam_container_01" archive scan /target/some_archive.tgz

Or, to enter a shell in the container, run:

docker exec --interactive --tty "clam_container_01" /bin/bash

To shut down the container, run:

docker kill clam_container_01

Usage

Usage: archive [OPTIONS] COMMAND [ARGS]...

Options:
  -t, --trace        Enable trace logging. By default, log all actions to
                     /tmp/clam_unpacker.log
  --trace-file PATH  Override the default trace log file
  -v, --verbose      Enable verbose logging
  -q, --quiet        Disable all logging
  --help             Show this message and exit.

Commands:
  cleanup
  scan
  unpack

Commands

  • scan

    This command is used to scan regular files and directories.

    The scan command combines the other two commands to unpack, scan, and clean up.

    Use the following options to customize scan behavior:

    Usage: archive scan [OPTIONS] PATH
    
    Options:
      --min-size TEXT   Minimum file size to unpack (default: 2.0 GiB).
      --ignore-size     Ignore file size lower limit (equivalent to --min-size=0).
      --tmp-dir PATH    Temporary working directory (default: /tmp).
      -ff, --fail-fast  Stop scanning after the first failure.
      --allmatch        Continue scanning if a signature match occurs.
      --help            Show this message and exit.
    
  • unpack

    This command unpacks or mounts supported large archives to a given directory. By default, a "large" archive is one greater than 2 GiB. This action is recursive.

    Archives smaller than 2 GiB will be skipped. You may use --ignore-size or --min-size=0 to unpack all supported archives, regardless of size.

    Usage: archive unpack [OPTIONS] PATH
    
    Options:
      -r, --recursive  Recursively unpack files.
      --min-size TEXT  Minimum file size to unpack (default: 2.0 GiB).
      --ignore-size    Ignore file size lower limit (equivalent to --min-size=0).
      --tmp-dir PATH   Directory to unpack files to (default: /tmp).
      --help           Show this message and exit.
    
  • cleanup

    This command cleans up the temporary directories and files created while unpacking or scanning the input file or directory.

    Usage: archive cleanup [OPTIONS] PATH
    
    Options:
      --file          Recursively cleanup directories associated with the file.
      --tmp-dir PATH  Directory to search for unpacked files(default: /tmp).
      --help          Show this message and exit.
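
The size threshold used by scan and unpack can be pictured with a little arithmetic. This is only a sketch of the documented behavior (default 2.0 GiB, with --ignore-size equivalent to --min-size=0); the function names are hypothetical, and treating a file that is exactly at the threshold as large enough is an assumption.

```python
GIB = 1024 ** 3
DEFAULT_MIN_SIZE = 2 * GIB  # 2.0 GiB = 2,147,483,648 bytes


def effective_min_size(min_size: int = DEFAULT_MIN_SIZE,
                       ignore_size: bool = False) -> int:
    """--ignore-size behaves like --min-size=0."""
    return 0 if ignore_size else min_size


def meets_size_threshold(size_bytes: int,
                         min_size: int = DEFAULT_MIN_SIZE,
                         ignore_size: bool = False) -> bool:
    # Assumption: a file exactly at the threshold counts as large enough.
    return size_bytes >= effective_min_size(min_size, ignore_size)
```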
    

Examples

Using the scan command to scan an archive:

archive -t -v scan /path/to/archive

Using the unpack command to unpack an archive:

archive -t -v unpack /path/to/archive

Contributing

There are many ways to contribute.

Unit Tests

This repo includes some tests to verify correct functionality. You can run the tests from your local environment or within the running Docker container.

  1. First, install the ClamAV Large Archive Scanner utility using one of the two methods described under Installation.

  2. Then install the test prerequisites:

     source .venv/bin/activate
     pip3 install -r ./src/clamav_large_archive_scanner/test/requirements.txt

  3. Now run the unit tests:

     pytest -v

License

This project is licensed under the BSD 3-Clause license.

clamav-large-archive-scanner's People

Contributors

micahsnyder, rasundri, yanbzhu-cisco

clamav-large-archive-scanner's Issues

Multiple scan paths are not supported

The scan command only handles one scan target.

E.g. this will fail

❯ archive scan ./*
Usage: archive scan [OPTIONS] PATH
Try 'archive scan --help' for help.

Error: Got unexpected extra arguments (./clam.zip ./CODE_OF_CONDUCT.md ./CONTRIBUTING.md ./Dockerfile ./LICENSE ./pyproject.toml ./README.md ./SECURITY.md ./src)

It would be convenient to be able to pass multiple files to be scanned. But it is also fine for now that it does not. We just need to adjust the documentation and help-message to make it clear how to use it.
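
Until multiple paths are supported, one possible workaround is to call `archive scan` once per path from a small wrapper. The sketch below is hypothetical (it is not part of the utility) and assumes that a non-zero exit status from `archive scan` means the scan of that path did not come back clean.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def scan_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build one `archive scan PATH` argv per path."""
    return [["archive", "scan", path] for path in paths]


def scan_each(
    paths: Iterable[str],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Scan paths one at a time; return those whose scan exited non-zero."""
    failed = []
    for command in scan_commands(list(paths)):
        if run(command) != 0:
            failed.append(command[-1])
    return failed
```

Scans run sequentially, so a failure on one path does not prevent the remaining paths from being scanned.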

If file size less than min-size, should still scan.

If the file size is less than min-size (default 2GiB) the scan skips it. Note, this "clam.zip" file contains the clam.exe test program which would alert with Clamav.Test.File-6 FOUND:

❯ docker exec --interactive --tty "clam_container_01" archive scan /target/clam.zip
24.03.13 21:53:57: root: WARNING: File size is below the threshold of 2.0 GiB, not unpacking. See help for options
24.03.13 21:53:57: root: INFO   : ================================================================================
24.03.13 21:53:57: root: INFO   : No malware found by clamdscan, all clear!
24.03.13 21:53:57: root: INFO   : ================================================================================

Note that if I run with --ignore-size, it does scan:


~ via 🐍 v3.8.10
❯ docker exec --interactive --tty "clam_container_01" archive scan --ignore-size /target/clam.zip
24.03.13 21:53:47: root: WARNING: Ignoring unhandled large file: /tmp/clam_unpacker_zip_clam.zip_ht6dv3lp/clam.exe
24.03.13 21:53:47: root: INFO   : Found and unpacked the following:
24.03.13 21:53:47: root: INFO   : clam.zip -> /tmp/clam_unpacker_zip_clam.zip_ht6dv3lp
24.03.13 21:53:47: root: INFO   : Scanning clam.zip
24.03.13 21:53:47: root: WARNING: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
24.03.13 21:53:47: root: WARNING: Malware found by clamdscan in file clam.zip:
24.03.13 21:53:47: root: WARNING: clam.zip/clam.exe: Clamav.Test.File-6 FOUND

----------- SCAN SUMMARY -----------
Infected files: 1
Time: 0.034 sec (0 m 0 s)
Start Date: 2024:03:13 21:53:47
End Date:   2024:03:13 21:53:47

24.03.13 21:53:47: root: WARNING: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error: Found virus in /target/clam.zip

In short, for files smaller than the unpack "min-size", the tool should skip unpacking but should still scan them.

CVE-2023-38545 - Libcurl

The base image set in the Dockerfile, clamav/clamav-debian:1.3, contains libcurl.so.4.7.0, which is being flagged for CVE-2023-38545. The base image should be updated and patched. Thank you.

root@18310c7298b3:/usr/lib/x86_64-linux-gnu# ls -lah libcur*            
lrwxrwxrwx. 1 root root   19 Dec 10 06:05 libcurl-gnutls.so.3 -> libcurl-gnutls.so.4
lrwxrwxrwx. 1 root root   23 Dec 10 06:05 libcurl-gnutls.so.4 -> libcurl-gnutls.so.4.7.0
-rw-r--r--. 1 root root 610K Dec 10 06:05 libcurl-gnutls.so.4.7.0
lrwxrwxrwx. 1 root root   16 Dec 10 06:05 libcurl.so.4 -> libcurl.so.4.7.0
-rw-r--r--. 1 root root 618K Dec 10 06:05 libcurl.so.4.7.0
root@18310c7298b3:/usr/lib/x86_64-linux-gnu# dpkg -l | grep curl
ii  curl                                 7.74.0-1.3+deb11u11                    amd64        command line tool for transferring data with URL syntax
ii  libcurl3-gnutls:amd64                7.74.0-1.3+deb11u11                    amd64        easy-to-use client-side URL transfer library (GnuTLS flavour)
ii  libcurl4:amd64                       7.74.0-1.3+deb11u11                    amd64        easy-to-use client-side URL transfer library (OpenSSL flavour)

If the file type is not supported by the large archive scanner, and is less than the min-size, it should still scan it.

If the file type is not supported for unpacking, this tool skips it:

❯ docker exec --interactive --tty "clam_container_01" archive scan --ignore-size /target/README.md
Usage: archive scan [OPTIONS] PATH
Try 'archive scan --help' for help.

Error: Invalid value: Unhandled file type: FileType.UNKNOWN

File types not supported for archive extraction should still be scanned if they are less than the min-size.
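
Taken together with the min-size issue above, the requested behavior could be summarized as: unpack only when the type is supported and the file is large enough, but hand the file to clamdscan either way. The sketch below expresses that proposal as a small decision function; the names are hypothetical and it does not reflect the utility's current code.

```python
from typing import NamedTuple

DEFAULT_MIN_SIZE = 2 * 1024 ** 3  # 2.0 GiB


class Plan(NamedTuple):
    unpack: bool  # extract or mount with the large archive scanner first
    scan: bool    # pass the file (or its unpacked contents) to clamdscan


def plan_for(size_bytes: int, type_supported: bool,
             min_size: int = DEFAULT_MIN_SIZE) -> Plan:
    """Proposed behavior: skipping the unpack step never skips the scan."""
    unpack = type_supported and size_bytes >= min_size
    return Plan(unpack=unpack, scan=True)
```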

Scanning directories is not supported

While testing #6, I encountered this issue:

clamav-large-archive-scanner on  main [?] is 📦 v0.1.0 via 🐍 v3.8.10 (.venv) took 10s
❯ archive scan .
Traceback (most recent call last):
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/bin/archive", line 8, in <module>
    sys.exit(cli())
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/clamav_large_archive_scanner/main.py", line 184, in scan
    _scan(path, min_size, ignore_size, fail_fast, allmatch, tmp_dir)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/clamav_large_archive_scanner/main.py", line 153, in _scan
    unpacked_ctxs = _unpack(path, True, min_size, ignore_size, tmp_dir)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/clamav_large_archive_scanner/main.py", line 99, in _unpack
    unpack_ctxs = unpacker.unpack_recursive(file_meta, min_file_size, tmp_dir)
  File "/home/micah/workspace/clamav-large-archive-scanner/.venv/lib/python3.9/site-packages/clamav_large_archive_scanner/lib/unpack.py", line 204, in unpack_recursive
    for root, _, files in os.walk(ctx_to_inspect.unpacked_dir_location):
  File "/usr/lib/python3.9/os.py", line 342, in walk
    return _walk(fspath(top), topdown, onerror, followlinks)
TypeError: expected str, bytes or os.PathLike object, not NoneType

I hadn't realized until now that the scan command only handles one file at a time. You cannot scan a directory, or pass in multiple paths to one scan command.

That's okay for now, though we should emit an error if you attempt to scan a directory and should properly document it.
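
One way to turn the traceback above into a clear error is to validate the target before unpacking begins. This is a minimal sketch of such a guard using only the standard library; the function name and messages are hypothetical, and where exactly it would be called in the utility is left open.

```python
import os


def validate_scan_target(path: str) -> None:
    """Raise ValueError unless path is an existing non-directory file."""
    if not os.path.exists(path):
        raise ValueError(f"Path does not exist: {path}")
    if os.path.isdir(path):
        raise ValueError(
            f"Scanning directories is not supported, pass a single file: {path}"
        )
```

Raising early keeps the failure message about the user's input rather than about an internal `NoneType` value.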

Set root logger name

When running the program you will see "root" as name of the application in the log instead of something recognizable:

24.03.10 00:42:39: root: WARNING: File size is below the threshold of 2.0 GiB, not unpacking. See help for options
24.03.10 00:42:39: root: INFO   : ================================================================================
24.03.10 00:42:39: root: INFO   : No malware found by clamdscan, all clear!
24.03.10 00:42:39: root: INFO   : ================================================================================

We should set it to something recognizable.
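
A named logger from Python's standard `logging` module would replace "root" in these lines. The sketch below is one possible approach, not the project's actual fix: the logger name is a hypothetical choice, and the format string is only an approximation inferred from the log lines shown above.

```python
import logging

LOGGER_NAME = "clamav_large_archive_scanner"  # hypothetical choice of name


def build_logger(name: str = LOGGER_NAME) -> logging.Logger:
    """Return a named logger whose output resembles the lines above."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            fmt="%(asctime)s: %(name)s: %(levelname)-7s: %(message)s",
            datefmt="%y.%m.%d %H:%M:%S",
        ))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records from also reaching the root logger
    return logger
```

With this in place, messages logged through `build_logger()` carry the chosen name where "root" appears today.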
