file-format's Introduction

file-format

Crate for determining the file format of a given file or stream.

It provides a variety of functions for identifying a wide range of file formats, including ZIP, Compound File Binary (CFB), Extensible Markup Language (XML) and more.

It checks the file's signature to determine its format and, when a matching reader feature is enabled, uses a format-specific reader for more accurate identification. If the signature is not recognized, the crate falls back to the default file format, which is Arbitrary Binary Data (BIN).
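As a rough illustration of signature checking with a fallback, the sketch below matches a few well-known magic numbers using only the standard library. The `SIGNATURES` table and the `sniff_media_type` function are hypothetical names made up for this example; they are not part of the crate's API and do not reflect its actual implementation.

```rust
/// Hypothetical signature table: (magic bytes, media type).
/// This is an assumption for illustration, not the crate's real table.
const SIGNATURES: &[(&[u8], &str)] = &[
    (b"%PDF-", "application/pdf"),
    (&[0xFF, 0xD8, 0xFF], "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
];

/// Returns the media type of the first signature that prefixes `bytes`,
/// or the generic binary media type when nothing matches.
fn sniff_media_type(bytes: &[u8]) -> &'static str {
    SIGNATURES
        .iter()
        .find(|(magic, _)| bytes.starts_with(magic))
        .map(|(_, media_type)| *media_type)
        .unwrap_or("application/octet-stream")
}
```

Unrecognized input falls through to `application/octet-stream`, which mirrors the fallback to Arbitrary Binary Data described above.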

Examples

Determine the format from a file:

use file_format::{FileFormat, Kind};

let fmt = FileFormat::from_file("fixtures/document/sample.pdf")?;
assert_eq!(fmt, FileFormat::PortableDocumentFormat);
assert_eq!(fmt.name(), "Portable Document Format");
assert_eq!(fmt.short_name(), Some("PDF"));
assert_eq!(fmt.media_type(), "application/pdf");
assert_eq!(fmt.extension(), "pdf");
assert_eq!(fmt.kind(), Kind::Document);

Determine the format from bytes:

use file_format::{FileFormat, Kind};

let fmt = FileFormat::from_bytes(&[0xFF, 0xD8, 0xFF]);
assert_eq!(fmt, FileFormat::JointPhotographicExpertsGroup);
assert_eq!(fmt.name(), "Joint Photographic Experts Group");
assert_eq!(fmt.short_name(), Some("JPEG"));
assert_eq!(fmt.media_type(), "image/jpeg");
assert_eq!(fmt.extension(), "jpg");
assert_eq!(fmt.kind(), Kind::Image);

Usage

Add this to your Cargo.toml:

[dependencies]
file-format = "0.25"

Crate features

All features below are disabled by default.

Reader features

These features enable the detection of file formats that require a specific reader for identification.

  • reader - Enables all reader features.
  • reader-asf - Enables Advanced Systems Format (ASF) based file formats detection.
  • reader-cfb - Enables Compound File Binary (CFB) based file formats detection.
  • reader-ebml - Enables Extensible Binary Meta Language (EBML) based file formats detection.
  • reader-exe - Enables MS-DOS Executable (EXE) based file formats detection.
  • reader-mp4 - Enables MPEG-4 Part 14 (MP4) based file formats detection.
  • reader-pdf - Enables Portable Document Format (PDF) based file formats detection.
  • reader-rm - Enables RealMedia (RM) based file formats detection.
  • reader-sqlite3 - Enables SQLite 3 based file formats detection.
  • reader-txt - Enables Plain Text (TXT) file format detection.
  • reader-xml - Enables Extensible Markup Language (XML) based file formats detection.
  • reader-zip - Enables ZIP-based file formats detection.

Supported file formats

Archive

  • 7-Zip (7Z)
  • ACE
  • ALZ
  • Archived by Robert Jung (ARJ)
  • Cabinet (CAB)
  • Extensible Archive (XAR)
  • LArc (LZS)
  • LHA
  • Mozilla Archive (MAR)
  • Multi Layer Archive (MLA)
  • PMarc (PMA)
  • Roshal Archive (RAR)
  • SeqBox (SBX)
  • Squashfs
  • StuffIt (SIT)
  • StuffIt X (SITX)
  • Tape Archive (TAR)
  • UNIX archiver (archiver)
  • Windows Imaging Format (WIM)
  • ZIP
  • ZPAQ
  • cpio
  • zoo

Audio

  • 8-Bit Sampled Voice (8SVX)
  • Adaptive Multi-Rate (AMR)
  • Advanced Audio Coding (AAC)
  • Apple iTunes Audio (M4A)
  • Apple iTunes Audiobook (M4B)
  • Apple iTunes Protected Audio (M4P)
  • Au
  • Audio Codec 3 (AC-3)
  • Audio Interchange File Format (AIFF)
  • Audio Visual Research (AVR)
  • Creative Voice (VOC)
  • FastTracker 2 Extended Module (XM)
  • Flash MP4 Audio (F4A)
  • Flash MP4 Audiobook (F4B)
  • Free Lossless Audio Codec (FLAC)
  • Impulse Tracker Module (IT)
  • MPEG-1/2 Audio Layer 2 (MP2)
  • MPEG-1/2 Audio Layer 3 (MP3)
  • MPEG-4 Part 14 Audio (MP4)
  • Matroska Audio (MKA)
  • Monkey's Audio (APE)
  • Musepack (MPC)
  • Musical Instrument Digital Interface (MIDI)
  • Ogg FLAC (OGA)
  • Ogg Opus (Opus)
  • Ogg Speex (Speex)
  • Ogg Vorbis (Vorbis)
  • Qualcomm PureVoice (QCP)
  • Quite OK Audio (QOA)
  • RealAudio (RA)
  • Scream Tracker 3 Module (S3M)
  • Sony DSD Stream File (DSF)
  • SoundFont 2 (SF2)
  • Ultimate Soundtracker Module (MOD)
  • WavPack (WV)
  • Waveform Audio (WAV)
  • Windows Media Audio (WMA)

Compressed

  • BZip3 (BZ3)
  • LZ4
  • Lempel-Ziv Finite State Entropy (LZFSE)
  • Lempel-Ziv-Markov chain algorithm (LZMA)
  • Long Range ZIP (LRZIP)
  • Snappy
  • UNIX compress (compress)
  • XZ
  • Zstandard (zstd)
  • bzip (BZ)
  • bzip2 (BZ2)
  • gzip (GZ)
  • lzip (LZ)
  • lzop (LZO)
  • rzip (RZ)

Database

  • Microsoft Access 2007 Database (ACCDB)
  • Microsoft Access Database (MDB)
  • Microsoft Works Database (WDB)
  • OpenDocument Database (ODB)
  • SQLite 3

Diagram

  • Circuit Diagram Document (CDDX)
  • Microsoft Visio Drawing (VSD)
  • Office Open XML Drawing (VSDX)
  • StarChart (SDS)
  • draw.io (DRAWIO)

Disk

  • Amiga Disk File (ADF)
  • Apple Disk Image (DMG)
  • ISO 9660 (ISO)
  • Microsoft Virtual Hard Disk (VHD)
  • Microsoft Virtual Hard Disk 2 (VHDX)
  • QEMU Copy On Write (QCOW)
  • Virtual Machine Disk (VMDK)
  • VirtualBox Virtual Disk Image (VDI)

Document

  • AbiWord (ABW)
  • AbiWord Template (AWT)
  • Adobe InDesign Document (INDD)
  • DjVu
  • InDesign Markup Language (IDML)
  • LaTeX (TeX)
  • Microsoft Publisher Document (PUB)
  • Microsoft Word Document (DOC)
  • Microsoft Works Word Processor (WPS)
  • Microsoft Write (WRI)
  • Office Open XML Document (DOCX)
  • OpenDocument Text (ODT)
  • OpenDocument Text Master (ODM)
  • OpenDocument Text Master Template (OTM)
  • OpenDocument Text Template (OTT)
  • OpenXPS (OXPS)
  • Portable Document Format (PDF)
  • PostScript (PS)
  • Rich Text Format (RTF)
  • StarWriter (SDW)
  • Sun XML Writer (SXW)
  • Sun XML Writer Global (SGW)
  • Sun XML Writer Template (STW)
  • Uniform Office Format Text (UOT)
  • WordPerfect Document (WPD)

Ebook

  • Broad Band eBook (BBeB)
  • Electronic Publication (EPUB)
  • FictionBook (FB2)
  • FictionBook ZIP (FBZ)
  • Microsoft Reader (LIT)
  • Mobipocket (MOBI)

Executable

  • Commodore 64 Program (PRG)
  • Common Object File Format (COFF)
  • Dalvik Executable (DEX)
  • Dynamic Link Library (DLL)
  • Executable and Linkable Format (ELF)
  • Java Class
  • LLVM Bitcode (BC)
  • Linear Executable (LE)
  • Lua Bytecode
  • MS-DOS Executable (EXE)
  • Mach-O
  • New Executable (NE)
  • Nintendo Switch Executable (NSO)
  • Optimized Dalvik Executable (DEY)
  • Portable Executable (PE)
  • WebAssembly Binary (Wasm)
  • Xbox 360 Executable (XEX)
  • Xbox Executable (XBE)

Font

  • BMFont ASCII (FNT)
  • BMFont Binary (FNT)
  • Embedded OpenType (EOT)
  • Glyphs
  • OpenType (OTF)
  • TrueType (TTF)
  • Web Open Font Format (WOFF)
  • Web Open Font Format 2 (WOFF2)

Formula

  • Mathematical Markup Language (MathML)
  • OpenDocument Formula (ODF)
  • OpenDocument Formula Template (OTF)
  • StarMath (SMF)
  • Sun XML Math (SXM)

Geospatial

  • Flexible and Interoperable Data Transfer (FIT)
  • GPS Exchange Format (GPX)
  • Geography Markup Language (GML)
  • Keyhole Markup Language (KML)
  • Keyhole Markup Language ZIP (KMZ)
  • Shapefile (SHP)
  • Training Center XML (TCX)

Image

  • AV1 Image File Format (AVIF)
  • AV1 Image File Format Sequence (AVIFS)
  • Adaptable Scalable Texture Compression (ASTC)
  • Adobe Illustrator Artwork (AI)
  • Adobe Photoshop Document (PSD)
  • Animated Portable Network Graphics (APNG)
  • Apple Icon Image (ICNS)
  • Better Portable Graphics (BPG)
  • Canon Raw (CRW)
  • Canon Raw 2 (CR2)
  • Canon Raw 3 (CR3)
  • Cineon (CIN)
  • Digital Picture Exchange (DPX)
  • Encapsulated PostScript (EPS)
  • Experimental Computing Facility (XCF)
  • Free Lossless Image Format (FLIF)
  • Fujifilm Raw (RAF)
  • Graphics Interchange Format (GIF)
  • High Efficiency Image Coding (HEIC)
  • High Efficiency Image Coding Sequence (HEICS)
  • High Efficiency Image File Format (HEIF)
  • High Efficiency Image File Format Sequence (HEIFS)
  • JPEG 2000 Codestream (J2C)
  • JPEG 2000 Part 1 (JP2)
  • JPEG 2000 Part 2 (JPX)
  • JPEG 2000 Part 6 (JPM)
  • JPEG Extended Range (JXR)
  • JPEG Network Graphics (JNG)
  • JPEG XL (JXL)
  • JPEG-LS (JLS)
  • Joint Photographic Experts Group (JPEG)
  • Khronos Texture (KTX)
  • Khronos Texture 2 (KTX2)
  • Magick Image File Format (MIFF)
  • Microsoft DirectDraw Surface (DDS)
  • Multiple-image Network Graphics (MNG)
  • Nikon Electronic File (NEF)
  • Olympus Raw Format (ORF)
  • OpenDocument Graphics (ODG)
  • OpenDocument Graphics Template (OTG)
  • OpenEXR (EXR)
  • OpenRaster (ORA)
  • Panasonic Raw (RW2)
  • Picture Exchange (PCX)
  • Portable Arbitrary Map (PAM)
  • Portable BitMap (PBM)
  • Portable FloatMap (PFM)
  • Portable GrayMap (PGM)
  • Portable Network Graphics (PNG)
  • Portable PixMap (PPM)
  • Quite OK Image (QOI)
  • Radiance HDR (HDR)
  • Scalable Vector Graphics (SVG)
  • Silicon Graphics Image (SGI)
  • Sketch
  • Sketch 43
  • StarDraw (SDA)
  • Sun XML Draw (SXD)
  • Sun XML Draw Template (STD)
  • Tag Image File Format (TIFF)
  • WebP
  • Windows Animated Cursor (ANI)
  • Windows Bitmap (BMP)
  • Windows Cursor (CUR)
  • Windows Icon (ICO)
  • Windows Metafile (WMF)
  • WordPerfect Graphics (WPG)
  • X PixMap (XPM)
  • farbfeld (FF)

Metadata

  • Android Binary XML (AXML)
  • BitTorrent (Torrent)
  • CD Audio (CDA)
  • Meta Information Encapsulation (MIE)
  • TASTy
  • Windows Shortcut (LNK)
  • macOS Alias

Model

  • 3D Manufacturing Format (3MF)
  • 3D Studio (3DS)
  • 3D Studio Max (MAX)
  • Additive Manufacturing Format (AMF)
  • AutoCAD Drawing (DWG)
  • Autodesk 123D (123DX)
  • Autodesk Alias (WIRE)
  • Autodesk Inventor Assembly (IAM)
  • Autodesk Inventor Drawing (IDW)
  • Autodesk Inventor Part (IPT)
  • Autodesk Inventor Presentation (IPN)
  • Blender (BLEND)
  • Cinema 4D (C4D)
  • Collaborative Design Activity (COLLADA)
  • Design Web Format (DWF)
  • Design Web Format XPS (DWFX)
  • Drawing Exchange Format ASCII (DXF)
  • Drawing Exchange Format Binary (DXF)
  • Extensible 3D (X3D)
  • Filmbox (FBX)
  • Fusion 360 (F3D)
  • GL Transmission Format Binary (GLB)
  • Google Draco (Draco)
  • Initial Graphics Exchange Specification (IGES)
  • Inter-Quake Export (IQE)
  • Inter-Quake Model (IQM)
  • MagicaVoxel (VOX)
  • Maya ASCII (MA)
  • Maya Binary (MB)
  • Model 3D ASCII (A3D)
  • Model 3D Binary (M3D)
  • Polygon ASCII (PLY)
  • Polygon Binary (PLY)
  • SketchUp (SKP)
  • SolidWorks Assembly (SLDASM)
  • SolidWorks Drawing (SLDDRW)
  • SolidWorks Part (SLDPRT)
  • SpaceClaim Document (SCDOC)
  • Standard for the Exchange of Product model data (STEP)
  • Stereolithography ASCII (STL)
  • Universal 3D (U3D)
  • Universal Scene Description ASCII (USDA)
  • Universal Scene Description Binary (USDC)
  • Universal Scene Description ZIP (USDZ)
  • Virtual Reality Modeling Language (VRML)
  • openNURBS (3DM)

Other

  • ActiveMime (MSO)
  • Advanced Systems Format (ASF)
  • Android Resource Storage Container (ARSC)
  • Apache Arrow Columnar (Arrow)
  • Apache Avro (Avro)
  • Apache Parquet (Parquet)
  • Arbitrary Binary Data (BIN)
  • Atom
  • Clojure Script
  • Compound File Binary (CFB)
  • DER Certificate (DER)
  • Digital Imaging and Communications in Medicine (DICOM)
  • Empty
  • Extensible Binary Meta Language (EBML)
  • Extensible Markup Language (XML)
  • Extensible Stylesheet Language Transformations (XSLT)
  • Flash CS5 Project (FLA)
  • Flash Project (FLA)
  • Flexible Image Transport System (FITS)
  • HyperText Markup Language (HTML)
  • ICC Profile (ICC)
  • JSON Feed
  • Java KeyStore (JKS)
  • Lua Script
  • MPEG-4 Part 14 (MP4)
  • MS-DOS Batch (Batch)
  • Microsoft Compiled HTML Help (CHM)
  • Microsoft Project Plan (MPP)
  • Microsoft Visual Studio Solution (SLN)
  • MusicXML
  • MusicXML ZIP (MXL)
  • Ogg Multiplexed Media (OGX)
  • PCAP Dump (PCAP)
  • PCAP Next Generation Dump (PCAPNG)
  • PEM Certificate (PEM)
  • PEM Certificate Signing Request (PEM)
  • PEM Private Key (PEM)
  • PEM Public Key (PEM)
  • PGP Message (PGP)
  • PGP Private Key Block (PGP)
  • PGP Public Key Block (PGP)
  • PGP Signature (PGP)
  • PGP Signed Message (PGP)
  • Perl Script
  • Personal Storage Table (PST)
  • Plain Text (TXT)
  • Python Script
  • RealMedia (RM)
  • Really Simple Syndication (RSS)
  • Ruby Script
  • Shell Script
  • Simple Object Access Protocol (SOAP)
  • Small Web Format (SWF)
  • Tiled Map XML (TMX)
  • Tiled Tileset XML (TSX)
  • Tool Command Language Script (Tcl Script)
  • WebAssembly Text (WAT)
  • WordPerfect Macro (WPM)
  • XML Localization Interchange File Format (XLIFF)
  • age encryption
  • gettext Machine Object (MO)
  • iCalendar (ICS)
  • vCalendar (VCS)
  • vCard (VCF)

Package

  • Adobe Integrated Runtime (AIR)
  • Android App Bundle (AAB)
  • Android Package (APK)
  • AppImage
  • Debian Package (DEB)
  • Enterprise Application Archive (EAR)
  • Google Chrome Extension (CRX)
  • Java Archive (JAR)
  • Microsoft Software Installer (MSI)
  • Microsoft Visual Studio Extension (VSIX)
  • Nintendo Switch Package (NSP)
  • Red Hat Package Manager (RPM)
  • Web Application Archive (WAR)
  • Windows App Bundle (APPXBUNDLE)
  • Windows App Package (APPX)
  • XAP
  • XPInstall (XPI)
  • iOS App Store Package (IPA)

Playlist

  • Advanced Stream Redirector (ASX)
  • MP3 URL (M3U)
  • MPEG-DASH MPD (MPD)
  • SHOUTcast Playlist (PLS)
  • Windows Media Playlist (WPL)
  • XML Shareable Playlist Format (XSPF)

Presentation

  • Corel Presentations (SHW)
  • Corel Presentations 7 (SHW)
  • Microsoft PowerPoint Presentation (PPT)
  • Office Open XML Presentation (PPTX)
  • OpenDocument Presentation (ODP)
  • OpenDocument Presentation Template (OTP)
  • StarImpress (SDD)
  • Sun XML Impress (SXI)
  • Sun XML Impress Template (STI)
  • Uniform Office Format Presentation (UOP)
  • WordPerfect Presentations (SHW)

ROM

  • Atari 7800 ROM (A78)
  • Commodore 64 Cartridge (CRT)
  • Game Boy Advance ROM (GBA)
  • Game Boy Color ROM (GBC)
  • Game Boy ROM (GB)
  • Game Gear ROM (GG)
  • Mega Drive ROM (MD)
  • Neo Geo Pocket Color ROM (NGC)
  • Neo Geo Pocket ROM (NGP)
  • Nintendo 64 ROM (Z64)
  • Nintendo DS ROM (NDS)
  • Nintendo Entertainment System ROM (NES)
  • Nintendo Switch ROM (XCI)
  • Sega Master System ROM (SMS)

Spreadsheet

  • Microsoft Excel Spreadsheet (XLS)
  • Microsoft Works 6 Spreadsheet (XLR)
  • Microsoft Works Spreadsheet (WKS)
  • Office Open XML Spreadsheet (XLSX)
  • OpenDocument Spreadsheet (ODS)
  • OpenDocument Spreadsheet Template (OTS)
  • StarCalc (SDC)
  • Sun XML Calc (SXC)
  • Sun XML Calc Template (STC)
  • Uniform Office Format Spreadsheet (UOS)

Subtitle

  • MPEG-4 Part 14 Subtitles (MP4)
  • Matroska Subtitles (MKS)
  • SubRip Text (SRT)
  • Timed Text Markup Language (TTML)
  • Universal Subtitle Format (USF)
  • Web Video Text Tracks (WebVTT)

Video

  • 3rd Generation Partnership Project (3GPP)
  • 3rd Generation Partnership Project 2 (3GPP2)
  • Actions Media Video (AMV)
  • Apple QuickTime (MOV)
  • Apple iTunes Video (M4V)
  • Audio Video Interleave (AVI)
  • Autodesk Animator (FLI)
  • Autodesk Animator Pro (FLC)
  • BDAV MPEG-2 Transport Stream (M2TS)
  • Flash MP4 Protected Video (F4P)
  • Flash MP4 Video (F4V)
  • Flash Video (FLV)
  • JPEG 2000 Part 3 (MJ2)
  • MPEG-1/2 Video (MPG)
  • MPEG-2 Transport Stream (TS)
  • MPEG-4 Part 14 Video (MP4)
  • MTV
  • Material Exchange Format (MXF)
  • Matroska 3D Video (MK3D)
  • Matroska Video (MKV)
  • Microsoft Digital Video Recording (DVR-MS)
  • Ogg Media (OGM)
  • Ogg Theora (Theora)
  • RealVideo (RV)
  • Silicon Graphics Movie (SGI)
  • Sony Movie (MQV)
  • WebM
  • Windows Media Video (WMV)
  • Windows Recorded TV Show (WTV)

Fixtures

The fixtures are samples of file formats used for testing purposes, located in the fixtures directory and organized by kind in subdirectories. These samples are often intentionally truncated to reduce size, which can sometimes prevent them from being fully decoded by compatible software.

License

This project is licensed under either of:

file-format's People

Contributors

kin9-0rz, mlund, mmalecot, nicolasgb, petru-tazz

file-format's Issues

.ini file recognised as Mpeg1AudioLayer1 in Windows

Hello!

When detecting the type of a .ini file on Windows, the crate reports it as Mpeg1AudioLayer1.

This is a problem when you want to filter out audio files.

I am fairly sure the error does not come from my code. My current workaround is to skip .ini files.

Handle complex "svg headers" starting with <?xml version="1.0" encoding="utf-8"?>

FileFormat does not detect SVG files that begin with an XML declaration or other prolog content.
Handling this requires scanning past the XML header to find an <svg ... element within a bounded range.

Example case:

Parsing....as_bytes()

<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 27.4.0, SVG Export Plug-In . SVG Version: 6.00 Build 0)  -->
<svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
	 viewBox="0 0 375.5 134.2" style="enable-background:new 0 0 375.5 134.2;" xml:space="preserve">

Using FileFormat::from_bytes(&buffer);

let format = FileFormat::from_bytes(&buffer);
if format == FileFormat::ScalableVectorGraphics {
    debug!("Is format an SVG file: {}", format.media_type());
} else ....

The parser should look past the <?xml declaration to see whether an <svg element exists in a text/xml document, since such a document may in fact be an SVG.

Currently, FileFormat only recognizes the <?xml node in this example.
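One way to express the requested behavior is sketched below with the standard library only. `root_is_svg` and `SCAN_LIMIT` are hypothetical names, and the prolog handling is deliberately simplified (for instance, a DOCTYPE with an internal subset is not handled), so this is an illustration of the idea and not the crate's reader.

```rust
/// Hypothetical helper: reports whether the first element of an XML-like
/// document is `<svg`, skipping the XML declaration, comments and DOCTYPE.
fn root_is_svg(bytes: &[u8]) -> bool {
    // Only inspect a bounded prefix (the limit is an arbitrary assumption).
    const SCAN_LIMIT: usize = 4096;
    let prefix = &bytes[..bytes.len().min(SCAN_LIMIT)];
    let text = String::from_utf8_lossy(prefix);
    // Drop a byte order mark if present, then leading whitespace.
    let mut rest = text.trim_start_matches('\u{feff}').trim_start();

    loop {
        // Pick the terminator of whatever prolog item comes next.
        let terminator = if rest.starts_with("<?") {
            "?>"
        } else if rest.starts_with("<!--") {
            "-->"
        } else if rest.starts_with("<!") {
            ">"
        } else {
            break;
        };
        match rest.find(terminator) {
            Some(end) => rest = rest[end + terminator.len()..].trim_start(),
            None => return false, // Prolog item is not closed within the prefix.
        }
    }

    // The root tag name must be exactly `svg`, not merely start with it.
    match rest.strip_prefix("<svg") {
        Some(after) => after.starts_with(|c: char| c.is_whitespace() || c == '>' || c == '/'),
        None => false,
    }
}
```

Because the prolog is optional in this sketch, the same function also accepts an SVG that starts directly with `<svg`.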

Add support for APPX Bundles

.msixbundle and .appxbundle are bundles of MSIXs or APPXs respectively.

These can be detected in a similar way to APPXs. Instead of an AppxManifest.xml at the root of the archive, the bundles have an AppxBundleManifest.xml under the AppxMetadata folder.
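The distinction described above can be sketched as a check on ZIP entry names. The `AppxKind` enum and `classify_appx` function are hypothetical and exist only for this example; obtaining the entry names from the archive is assumed to be done elsewhere by a ZIP reader.

```rust
/// Hypothetical classification result for this sketch.
#[derive(Debug, PartialEq)]
enum AppxKind {
    /// `AppxManifest.xml` found at the archive root.
    Package,
    /// `AppxBundleManifest.xml` found under the `AppxMetadata` folder.
    Bundle,
}

/// Classifies a ZIP archive from its entry names alone.
fn classify_appx<'a, I>(entry_names: I) -> Option<AppxKind>
where
    I: IntoIterator<Item = &'a str>,
{
    for name in entry_names {
        match name {
            "AppxManifest.xml" => return Some(AppxKind::Package),
            "AppxMetadata/AppxBundleManifest.xml" => return Some(AppxKind::Bundle),
            _ => {}
        }
    }
    // Neither manifest is present, so this is some other ZIP-based format.
    None
}
```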

More kind types

Hello, is there any possibility for new kind types, such as archives, documents, executables, etc.?

Does not detect file shebangs with extra whitespace

file-format's shebang-based checks (e.g. #!/usr/bin/env bash) do not match shebangs with extra whitespace. For example, #! /usr/bin/env bash is a valid shebang, but it will not match because file-format looks only for static strings.

Any number of spaces or tabs may be present after the #! and between each argument.

One solution could be to do what the libmagic library does: use a whitespace flag that treats any whitespace in a pattern as zero or more whitespace characters.
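A whitespace-tolerant match could look roughly like the following. `shebang_interpreter` is a hypothetical helper written for this example with the standard library only; it is a different technique from libmagic's whitespace flag (it tokenizes the line instead of relaxing a pattern) and is not how the crate currently works.

```rust
/// Hypothetical helper: extracts the interpreter name from a shebang line,
/// tolerating spaces or tabs after `#!` and between arguments.
fn shebang_interpreter(first_line: &str) -> Option<&str> {
    let rest = first_line.strip_prefix("#!")?;
    // `split_whitespace` collapses any run of spaces or tabs.
    let mut words = rest.split_whitespace();
    let program = words.next()?;
    // Keep only the final path component, e.g. `/usr/bin/env` -> `env`.
    let name = program.rsplit('/').next()?;
    if name == "env" {
        // With `env`, the interpreter is the first argument that is not an option.
        words.find(|word| !word.starts_with('-'))
    } else {
        Some(name)
    }
}
```

The returned name can then be compared against known interpreters such as `bash`, `python` or `perl`.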

Possibility of ignoring certain control chars in txt reader

Hi, I'll open by saying thanks for writing this lib. I've been testing a few different crates of this type for a file previewer I'm writing, and this is the best I've found.

This is more of a request than an actual issue. I mostly write terminal-centric code, and a lot of it contains a handful of control chars (mostly the terminal escape char and the bell char). This causes the source files to be detected as the default "application/octet-stream" instead of "text/plain", so my app skips generating a preview for these files.

Would you be open to adding some control char exclusions to from_txt_reader()?

I'm currently using a local fork that changes the function to

    pub(crate) fn from_txt_reader<R: Read + Seek>(reader: &mut BufReader<R>) -> Result<Self> {
        // Constants for limits.
        const READ_LIMIT: u64 = 8_388_608;
        const LINE_LIMIT: usize = 256;

        // Rewinds to the beginning of the stream.
        reader.rewind()?;

        // Determines if the reader contains mostly ASCII/UTF-8-encoded text by checking the first
        // lines for control characters other than whitespace and certain select control characters.
        reader
            .take(READ_LIMIT)
            .lines()
            .take(LINE_LIMIT)
            .try_for_each(|line| {
                line?
                    .chars()
                    .find(|char| if char.is_control() {
                        match *char {
                            '\u{1b}' => false, // escape (ESC)
                            '\u{07}' => false, // bell (BEL)
                            _ if char.is_whitespace() => false,
                            _ => true
                        }
                    } else { false })
                    .map(|_| Err(Error::new(ErrorKind::InvalidData, "Invalid characters")))
                    .unwrap_or(Ok(()))
            })
            .map(|_| Self::PlainText)
    }

I used a match statement to make it easier to add more options later if needed.

I can open a PR with the changes if this is something you're ok with, thanks for your time / consideration.

How are some files categorized

How are some files categorized using the .kind() method?

For example, https://github.com/jvns/dnspeep/releases/download/v0.1.3/dnspeep-linux.tar.gz is categorized as an application even though it is packaged as a compressed file. The compressed file contains a binary. Does that mean the crate reads the file, detects the binary inside it, and returns Kind::Application instead of reporting the kind as compressed? I would expect it to return Kind::Compression, so that to find all compressed files I could just match on Kind::Compression.

Similarly, a simple TXT file containing the words Foo Bar is read as Kind::Application with media type application/octet-stream.

PDF files are classified as Kind::Application. Wouldn't they be classified as Kind::Document?

My question: is this how the library works, or is it a bug?

Detect SVG when XML declaration is missing

Hello,

I'm currently working on a project that relies heavily on file-format to handle images according to their media type.

I've just finished implementing SVG support in our app and, while testing, the crate failed to detect that our logo is an SVG. The main reason is that it lacks the <?xml declaration, so the code never reaches this part.

After looking in our (source) SVGs, I found that there are many cases:

  • some of them contain the <?xml declaration;
  • some of them contain the xmlns attribute on <svg>;

The key point of this issue is that, as per this doc, <?xml is optional unless the encoding is not UTF-8 or UTF-16.

While I made a patch for our needs (and am willing to finish up the PR), I wanted to open this issue to talk about other SVG versions and how they could be handled better for wider, more general use.

[new requirement] Would you support metadata for audio/video files, both read and write?

As a hobby project, I want to batch edit my movie tags, but I don't want to identify the file formats and metadata in my own repository. I would love to have a third-party library that can do the job.
Would this project consider extending to reading the metadata of video and audio files and providing the ability to set it?

MKV shows as Application

As the title says, I have two MKV files, one version 2 and the other version 4.
Both are shown as "application" instead of "video".

The name for both is reported as: Extensible Binary Meta Language
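The reported name suggests that only the EBML container signature was matched, which is the case the reader-ebml feature listed under Crate features appears to target. Telling Matroska apart from other EBML documents requires reading the DocType element from the header. The sketch below is a simplified, standard-library-only illustration: `ebml_doc_type` is a hypothetical helper, it scans for the DocType element ID instead of walking the header properly, and it only handles one-byte element sizes.

```rust
/// Hypothetical helper: returns the EBML DocType (e.g. "matroska" or "webm")
/// found in the leading bytes of a file, if any.
fn ebml_doc_type(bytes: &[u8]) -> Option<&str> {
    // EBML header signature and DocType element ID.
    const EBML_MAGIC: [u8; 4] = [0x1A, 0x45, 0xDF, 0xA3];
    const DOC_TYPE_ID: [u8; 2] = [0x42, 0x82];

    if !bytes.starts_with(&EBML_MAGIC) {
        return None;
    }
    // Simplification: scan for the DocType element ID instead of walking the header.
    let id_pos = bytes.windows(2).position(|pair| pair == &DOC_TYPE_ID[..])?;
    let size_byte = *bytes.get(id_pos + 2)?;
    // Simplification: only one-byte sizes (marker bit 0x80 set) are handled.
    if size_byte & 0x80 == 0 {
        return None;
    }
    let len = usize::from(size_byte & 0x7F);
    let start = id_pos + 3;
    let value = bytes.get(start..start + len)?;
    std::str::from_utf8(value).ok()
}
```

A detector built on this idea would map a DocType of `matroska` to a Matroska format and `webm` to WebM, and report plain EBML only when no known DocType is found.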
