larskoelpin / magic-bytes

A library for detecting file types.

Home Page: https://github.com/LarsKoelpin/magic-bytes

License: MIT License

JavaScript 2.30% HTML 6.97% TypeScript 87.25% CSS 3.48%
magic-bytes file-type detection file validation bytes signature

magic-bytes's Introduction

Magic bytes

Magic Bytes is a JavaScript library that analyzes the first bytes of a file to tell you its type. Use it in the browser or server-side with Node.js.

The procedure is based on https://en.wikipedia.org/wiki/List_of_file_signatures.

Note

A small note on versioning: strictly speaking, each new file type supported by this library can break someone's API. Please note that this library adds new file types in minor releases. This means that files which validate to "null" in some versions may produce a result in a newer version.

In some cases the library will also find more results than before, so do not depend on the size of the result array in any way. File types will not be removed, though.

Installation

Run npm install magic-bytes.js

Interactive example

There is an interactive example present at https://larskoelpin.github.io/magic-bytes/.

Usage

The following functions are available:

  • filetypeinfo(bytes: number[]): Returns type information such as name, extension, and MIME type: [{typename: "zip"}, {typename: "jar"}]
  • filetypename(bytes: number[]): Returns type names only: ["zip", "jar"]
  • filetypemime(bytes: number[]): Returns MIME types only: ["application/zip", "application/jar"]
  • filetypeextension(bytes: number[]): Returns extensions only: ["zip", "jar"]

Each function returns an empty array [] when it cannot detect the file signature. Keep in mind that plain-text files such as .txt fall into this category.

You don't have to load the whole file into memory. For example, when validating a file uploaded to S3 using Lambda, it may be
enough to load the file's first 100 bytes and validate against them. This is especially useful for big files.

See examples for practical usage.

On server:

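import fs from 'fs'
import { filetypename } from 'magic-bytes.js'

// Pass the file's bytes; the result lists every type name whose signature matches
filetypename(fs.readFileSync("myimage.png")) // ["png"]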

To run the HTML example, check out the project and run

npm install; npm run example

This opens an HTML example that uses magic bytes as a window variable. It looks roughly like this:

<input type="file" id="file" />

<script src="node_modules/magic-bytes.js/dist/browser.js" type="application/javascript"></script>
<script>
    document.getElementById("file").addEventListener('change', (event) => {
      const fileReader = new FileReader();
      fileReader.onloadend = (f) => {
        // Wrap the ArrayBuffer so the bytes can be read as numbers
        const bytes = new Uint8Array(f.target.result);
        // Serialize the result; concatenating the array directly would print "[object Object]"
        console.log("Possible filetypes: " + JSON.stringify(filetypeinfo(bytes)))
      }
      fileReader.readAsArrayBuffer(event.target.files[0])
    })
</script>

Tests

Run npm test

Example

See examples/

How does it work

The create-snapshot.js script creates a new tree. The tree has a shape similar to the following:

{
  "0x47": {
    "0x49": {
      "0x46": {
        "0x38": {
          "0x37": {
            "0x61": {
              "matches": [
                {
                  "typename": "gif",
                  "mime": "image/gif",
                  "extension": "gif"
                }
              ]
            }
          }
        }
      }
    }
  }
}

It acts as a giant lookup map for the given byte signatures.
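
A minimal sketch of how such a lookup could work, assuming a trie keyed by byte value. The names TrieNode, TypeMatch, addSignature, and lookup are hypothetical and this is not the library's actual source. For easier typing, the sketch keeps child nodes in a separate children map instead of mixing hex keys and "matches" in one object as the snapshot above does.

```typescript
// Hypothetical sketch of a byte-signature trie; not the library's real code.
type TypeMatch = { typename: string; mime?: string; extension?: string };

interface TrieNode {
  matches: TypeMatch[];
  children: { [byte: number]: TrieNode | undefined };
}

function newNode(): TrieNode {
  return { matches: [], children: {} };
}

// Walk the signature byte by byte, creating nodes as needed,
// and record the match on the node where the signature ends.
function addSignature(root: TrieNode, signature: number[], match: TypeMatch): void {
  let node = root;
  for (const byte of signature) {
    let next = node.children[byte];
    if (!next) {
      next = newNode();
      node.children[byte] = next;
    }
    node = next;
  }
  node.matches.push(match);
}

// Follow the input bytes down the trie and collect every match seen on the way,
// so shorter and longer signatures that share a prefix are both reported.
function lookup(root: TrieNode, bytes: ArrayLike<number>): TypeMatch[] {
  let found: TypeMatch[] = [];
  let node: TrieNode | undefined = root;
  for (let i = 0; i < bytes.length && node; i++) {
    node = node.children[bytes[i]];
    if (node) found = found.concat(node.matches);
  }
  return found;
}

const trieRoot = newNode();
addSignature(trieRoot, [0x47, 0x49, 0x46, 0x38, 0x37, 0x61], {
  typename: "gif",
  mime: "image/gif",
  extension: "gif",
});

// "GIF87a" followed by two arbitrary bytes
const gifHeader = [0x47, 0x49, 0x46, 0x38, 0x37, 0x61, 0x01, 0x00];
const gifMatches = lookup(trieRoot, gifHeader);
```

Because the walk stops at the first byte with no child node, the cost of a lookup is bounded by the length of the longest signature, not by the size of the file.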

magic-bytes's People

Contributors

alexkiro, aoliva-sefas, bioruebe, daveteu, dependabot[bot], drinking-code, hustle-dev, iamfirecracker, komagata, larskoelpin, mbuhot, nikitapryymak, pastelmind, pierrejeanjacquot, semantic-release-bot

magic-bytes's Issues

Can we relax the pattern for JPGs?

I encountered a JPEG file with magic bytes 0xFF 0xD8 0xFF 0xE2. The current version (1.7.0) of magic-bytes.js does not recognize this file, but my PC and Google Chrome seem to handle it fine. I also tested it against file-type, which correctly identified it as a JPG.

file-type accepts my JPG because it only checks the first three bytes (0xFF 0xD8 0xFF) to decide that a file is JPG:

https://github.com/sindresorhus/file-type/blob/37233b1c9c5c55aab4308f5c1991569dd4e414cc/core.js#L304-L316

		if (this.check([0xFF, 0xD8, 0xFF])) {
			if (this.check([0xF7], {offset: 3})) { // JPG7/SOF55, indicating a ISO/IEC 14495 / JPEG-LS file
				return {
					ext: 'jls',
					mime: 'image/jls',
				};
			}

			return {
				ext: 'jpg',
				mime: 'image/jpeg',
			};
		}

(technically, there is a check for JLS files, but I am unfamiliar with them)
I assume this is OK, since file-type has kept this code around for at least 3 years and no one has complained about false positives. (also see sindresorhus/file-type#487 (comment))

Unfortunately, our bundler situation prevents us from using file-type. It would be great if magic-bytes.js supported my JPG as well, perhaps by relaxing the pattern for JPGs.

Can this be done? I am willing to submit a PR.
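
The relaxed check requested here can be sketched as a three-byte prefix comparison. hasPrefix and looksLikeJpeg are hypothetical helper names, not part of magic-bytes.js or file-type, and this sketch leaves out the JPEG-LS branch from the quoted snippet.

```typescript
// Hypothetical helpers for illustration only.
function hasPrefix(bytes: ArrayLike<number>, prefix: number[], offset = 0): boolean {
  if (bytes.length < offset + prefix.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (bytes[offset + i] !== prefix[i]) return false;
  }
  return true;
}

// Relaxed JPEG test: only the first three bytes are compared,
// so any fourth byte (0xE0, 0xE1, 0xE2, ...) is accepted.
function looksLikeJpeg(bytes: ArrayLike<number>): boolean {
  return hasPrefix(bytes, [0xff, 0xd8, 0xff]);
}

// The header reported in this issue
const reportedHeader = [0xff, 0xd8, 0xff, 0xe2];
const reportedHeaderAccepted = looksLikeJpeg(reportedHeader);
```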

Question: dealing with false positives within same signature group

Hi,

First, thanks for putting this together, because I was trying to use "file-type", but I can't get it running in the browser at all with various different setups and configs, so finding this package was fortuitous.

This is not a bug, just a question: is there a way to deal with false positives when a file's extension is changed to another one within the same signature set?

Example: change a file extension of a "zip" to "docx". These files both have signatures matching ["0x50", "0x4B", "0x03", "0x04"], so the guess list will contain matches for both the mime type (application/zip) and the extension (docx).

Thanks.
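
One possible approach, sketched under assumptions: narrow the candidate list by the extension the file claims to have. TypeCandidate, extensionOf, and narrowByExtension are hypothetical names, and the candidate list below is illustrative data, not output captured from the library. This only confirms that the claimed extension is plausible for the signature; telling a real docx from a renamed zip would need a look inside the archive, which a signature check cannot do.

```typescript
// Hypothetical sketch; the candidate shape mirrors the fields named in the README.
type TypeCandidate = { typename: string; extension?: string; mime?: string };

// Returns the lower-cased extension without the dot, or "" if there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
}

// Keeps only the candidates whose extension equals the file name's extension.
function narrowByExtension(fileName: string, candidates: TypeCandidate[]): TypeCandidate[] {
  const ext = extensionOf(fileName);
  return candidates.filter((c) => c.extension === ext);
}

// Illustrative candidates for the shared signature 0x50 0x4B 0x03 0x04.
const zipFamily: TypeCandidate[] = [
  { typename: "zip", extension: "zip", mime: "application/zip" },
  { typename: "docx", extension: "docx" },
];

const narrowed = narrowByExtension("report.DOCX", zipFamily);
const rejected = narrowByExtension("report.png", zipFamily);
```

An empty result, as for "report.png" here, means the claimed extension does not belong to the detected signature group at all.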

Issue with MKV files

Hi,
I'm testing the 'filetypemime' function on a '.mkv' file, and it returns an array of length 5
which contains 4 empty strings and 1 string, 'webm'. (I would expect it to return 'mkv', 'mka', 'mks', 'mk3d' and 'webm', as suggested in the list of file signatures.)
I have tested it with two '.mkv' files.
I would appreciate your help in the matter, unless I'm doing something wrong :)
Thanks!

Filetype mime

Thank you for this package.

I had to switch to this package because file-type@17 moved to ESM, which forced a lot of code changes just to accommodate it. Checking out this package proved to be a wise choice.

May I know how your pattern tree is generated, and is there any way for us to help update it?

There are a lot of file types without a MIME type. For example, pdf has the MIME type application/pdf, but the library returns an empty string ''.

The automated release is failing 🚨

🚨 The automated release from the master branch failed. 🚨

I recommend you give this issue a high priority, so other packages depending on you can benefit from your bug fixes and new features again.

You can find below the list of errors reported by semantic-release. Each one of them has to be resolved in order to automatically publish your package. I’m sure you can fix this 💪.

Errors are usually caused by a misconfiguration or an authentication problem. With each error reported below you will find explanation and guidance to help you to resolve it.

Once all the errors are resolved, semantic-release will release your package the next time you push a commit to the master branch. You can also manually restart the failed CI job that runs semantic-release.

If you are not sure how to resolve this, here are some links that can help you:

If those don’t help, or if this issue is reporting something you think isn’t right, you can always ask the humans behind semantic-release.


Invalid npm token.

The npm token configured in the NPM_TOKEN environment variable must be a valid token allowing to publish to the registry https://registry.npmjs.org/.

If you are using Two Factor Authentication for your account, set its level to "Authorization only" in your account settings. semantic-release cannot publish with the default "Authorization and writes" level.

Please make sure to set the NPM_TOKEN environment variable in your CI with the exact value of the npm token.


Good luck with your project ✨

Your semantic-release bot 📦🚀

Detects "tda2" instead of truetype ttf

The "Palm Desktop Data File (Access format)" (which I suppose is detected here) has the magic number 00 01 00 00.
TrueType has the magic number 00 01 00 00 00.

AFAIS, TrueType isn't added as of v1.5.0.
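
Since the two magic numbers differ only in length, one way to resolve the clash would be to prefer the longest matching signature. The sketch below assumes that policy; SigEntry, matchesPrefix, and longestMatch are hypothetical names, "ttf" is an assumed type name, and none of this is the library's actual behavior.

```typescript
// Hypothetical sketch of a "longest signature wins" policy.
type SigEntry = { typename: string; bytes: number[] };

function matchesPrefix(input: ArrayLike<number>, sig: number[]): boolean {
  if (input.length < sig.length) return false;
  for (let i = 0; i < sig.length; i++) {
    if (input[i] !== sig[i]) return false;
  }
  return true;
}

// Among all signatures that match the start of the input, return the longest one.
function longestMatch(input: ArrayLike<number>, sigs: SigEntry[]): SigEntry | undefined {
  let best: SigEntry | undefined;
  for (const sig of sigs) {
    if (!matchesPrefix(input, sig.bytes)) continue;
    if (!best || sig.bytes.length > best.bytes.length) best = sig;
  }
  return best;
}

// The two magic numbers quoted in this issue.
const clashingSigs: SigEntry[] = [
  { typename: "tda2", bytes: [0x00, 0x01, 0x00, 0x00] },
  { typename: "ttf", bytes: [0x00, 0x01, 0x00, 0x00, 0x00] },
];

const fiveZeroHeader = [0x00, 0x01, 0x00, 0x00, 0x00, 0x0c];
const otherHeader = [0x00, 0x01, 0x00, 0x00, 0x4d, 0x53];

const fiveZeroResult = longestMatch(fiveZeroHeader, clashingSigs); // the "ttf" entry
const otherResult = longestMatch(otherHeader, clashingSigs); // the "tda2" entry
```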

.srt & .vtt subtitle/caption file support

Just started out with this lib and was wondering why I couldn't get it working, until I realized that .srt isn't yet supported 😅

Also .vtt for WebVTT files would be fantastic please 🙏❤️

Specs and magic numbers
SRT
WebVTT

According to this SO post, the MIME type of .srt could be application/x-subrip or application/octet-stream. The library should probably just return the former, which is also the one named by the Wikipedia page linked above.

Update: I tried to implement this but probably have no idea what I'm doing 😅 Should I still push a PR as a starting point?

new file types: ELF, macho, eml

Hi, was hoping to add support for ELF, mach-o, and eml file types.

ELF:
hex: 7F 45 4C 46 mime: application/x-executable

Mach-O:
hex: FE ED FA C mime: application/x-mach-binary
&
hex: FE ED FA CF mime: application/x-mach-binary

EML:
hex: 52 65 63 65 69 76 65 64 3A mime: message/rfc822

Thanks!

Add CSV support?

Hi,
thanks for the nice lib.
Can you please add support for CSV files?
text/csv

Thanks

License

Hi, thanks for the great library. I was wondering whether it has an MIT license or not, since I cannot find a license file in the repo but the package.json contains "MIT" as the license.

Thanks in advance!

Add support for SVG

Would be useful if this library could verify SVG files as well. E.g. by checking for <svg
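
A rough sketch of what such a check could look like. SVG is a text format, so instead of comparing fixed leading bytes this scans the start of the file for the marker. looksLikeSvg and asciiBytes are hypothetical helpers, and the 512-byte scan limit is an arbitrary assumption.

```typescript
// Hypothetical helper: scans the leading bytes for the text "<svg".
function looksLikeSvg(bytes: ArrayLike<number>, scanLimit = 512): boolean {
  const limit = Math.min(bytes.length, scanLimit);
  let head = "";
  // Treat each byte as one character; enough to find the ASCII marker.
  for (let i = 0; i < limit; i++) head += String.fromCharCode(bytes[i]);
  return head.indexOf("<svg") !== -1;
}

// Test helper: converts an ASCII string into an array of byte values.
function asciiBytes(text: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < text.length; i++) out.push(text.charCodeAt(i) & 0xff);
  return out;
}

const svgSample = asciiBytes('<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"></svg>');
const pngSample = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

const svgDetected = looksLikeSvg(svgSample);
const pngDetected = looksLikeSvg(pngSample);
```

A substring scan like this is weaker than a binary signature: any text file that mentions "<svg" early on would also pass.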

mp4

Hi. Could you please add the mime type for .mp4?

This is what I am getting now:
[{typename: 'pic'}, {typename: 'pif'}, {typename: 'sea'}, {typename: 'ytr'}]

Thank you!

filetypeinfo failing on subsequent requests

I'm using Node and Express to run some microservices. One of the services needs to get the MIME type from a file, and I noticed that the first request performs as expected. Every subsequent (identical) request fails to get the right MIME type. The input to the function is the same (verified), but the result is just an empty array.
If I use filetypeinfo several times in the same request it works as expected.

I narrowed it down: it seems that the original tree gets modified by accident, and whatever was matched is removed from the tree, so it can't be matched again.

I have a fix that works for me (PR incoming). I was not able to reproduce this behaviour with jest.
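
The library code responsible is not shown in this thread, so the following is only a hypothetical illustration of the class of bug described: a lookup that mutates a shared structure works once and then returns nothing. ByteTree, lookupMutating, and lookupCopying are made-up names.

```typescript
// Hypothetical illustration; not the library's actual code.
type FoundType = { typename: string };
type ByteTree = { [firstByte: number]: FoundType[] | undefined };

const sharedTree: ByteTree = { 0x89: [{ typename: "png" }] };

// Buggy: splice() removes the entries from the shared array,
// so the next call for the same byte finds nothing.
function lookupMutating(tree: ByteTree, firstByte: number): FoundType[] {
  const matches = tree[firstByte];
  return matches ? matches.splice(0, matches.length) : [];
}

// Safe: slice() returns a copy and leaves the shared tree untouched.
function lookupCopying(tree: ByteTree, firstByte: number): FoundType[] {
  const matches = tree[firstByte];
  return matches ? matches.slice() : [];
}

const safeFirst = lookupCopying(sharedTree, 0x89).length; // 1
const safeSecond = lookupCopying(sharedTree, 0x89).length; // 1
const buggyFirst = lookupMutating(sharedTree, 0x89).length; // 1
const buggySecond = lookupMutating(sharedTree, 0x89).length; // 0
```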

No browser.js file in /dist folder

Hi, I just tried 1.0.6 and according to the docs there should be a module in node_modules/magic-bytes.js/dist/browser.js but there isn't. Has this changed and you forgot to update the docs or is something else wrong here?

new file types to add

Was hoping to add in a few more file types to this lib.

pcap
hex: D4 C3 B2 A1 & hex: A1 B2 C3 D4
ext: pcap
mime: 'application/vnd.tcpdump.pcap'

cfbf
hex: D0 CF 11 E0 A1 B1 1A E1
multiple exts: doc, xls, ppt, msi, msg
multiple mime types

Thanks!
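
As a sketch of how the two pcap byte orders listed above could be checked, assuming signatures are written as space-separated hex pairs. parseHex and isPcap are hypothetical helpers, not part of the library.

```typescript
// Hypothetical helper: turns "D4 C3 B2 A1" into [0xd4, 0xc3, 0xb2, 0xa1].
function parseHex(hex: string): number[] {
  return hex
    .trim()
    .split(/\s+/)
    .map((pair) => parseInt(pair, 16));
}

// Both byte orders of the pcap magic number from this issue.
const pcapSignatures = [parseHex("D4 C3 B2 A1"), parseHex("A1 B2 C3 D4")];

// True if the input starts with either signature.
function isPcap(bytes: ArrayLike<number>): boolean {
  return pcapSignatures.some((sig) => sig.every((b, i) => bytes[i] === b));
}

const firstOrder = isPcap([0xd4, 0xc3, 0xb2, 0xa1, 0x02, 0x00]);
const secondOrder = isPcap([0xa1, 0xb2, 0xc3, 0xd4, 0x00, 0x02]);
const cfbfHeader = isPcap([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);
```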
