ajmitev / filetypechecker
Cross platform file type validator for .NET
License: MIT License
Word Version 2108 (Microsoft 365)
Default "docx" format.
Most mp3 files are MPEG-1 Audio Layer 3 (MP3) audio files with the hex signatures '49 44 33' and 'FF FB'. In our scenario, however, we record blob data as mp3 on the client, and the resulting file starts with 'FF E3', which is an MPEG audio frame sync pattern.
The following signatures should be added to the mp3 file type: 'FF Ex' and 'FF Fx', where x is a wildcard. https://www.garykessler.net/library/file_sigs.html
Download blob mp3 recording with this signature:
https://gofile.io/d/REND6F
For now I added a custom file type as a workaround.
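The 'FF Ex' / 'FF Fx' wildcard can be written as a bit mask: an MPEG audio frame starts with eleven set bits, so the first byte is 0xFF and the top three bits of the second byte are set. The sketch below is standalone code, not part of FileTypeChecker; the class and method names are made up for illustration.

```csharp
using System;

// Illustrative helper, not part of the FileTypeChecker API.
public static class Mp3Signature
{
    // True when the buffer starts with an ID3v2 tag ("ID3" = 49 44 33).
    public static bool StartsWithId3(ReadOnlySpan<byte> header) =>
        header.Length >= 3 && header[0] == 0x49 && header[1] == 0x44 && header[2] == 0x33;

    // True when the first two bytes look like an MPEG audio frame sync:
    // 0xFF followed by a byte in the range 0xE0..0xFF ("FF Ex" or "FF Fx").
    public static bool StartsWithFrameSync(ReadOnlySpan<byte> header) =>
        header.Length >= 2 && header[0] == 0xFF && (header[1] & 0xE0) == 0xE0;

    public static bool LooksLikeMp3(ReadOnlySpan<byte> header) =>
        StartsWithId3(header) || StartsWithFrameSync(header);
}
```

A two-byte mask like this is loose: it also matches other MPEG audio layers and unrelated data such as the UTF-16 LE byte order mark (FF FE), so it is probably best combined with other checks.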
Looks like a feature was added but it caused a behavior change on 0fcfcc4 for me:
System.InvalidOperationException: Can't search in collection with no items!
at FileTypeChecker.FileTypeValidator.FindMaxScore(IEnumerable`1 matches)
at FileTypeChecker.FileTypeValidator.FindBestMatch(Stream fileContent, IEnumerable`1 result)
at FileTypeChecker.FileTypeValidator.GetFileType(Stream fileContent)
I think I'm passing a zero-byte stream (unit test) or an unknown file type. Before, it returned null; now it throws this exception.
Is this the expected behavior now?
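Until the intended behavior is confirmed, a caller-side guard can restore the old "null for unknown" result. This is only a sketch under that assumption; `SafeDetection` and `DetectOrNull` are hypothetical names, and the detector is passed in as a delegate so the helper does not depend on any particular library version.

```csharp
using System;
using System.IO;

// Hypothetical wrapper; not part of FileTypeChecker.
public static class SafeDetection
{
    // Runs `detect` and returns null instead of throwing when the stream is
    // empty or when the detector throws InvalidOperationException.
    public static T DetectOrNull<T>(Stream content, Func<Stream, T> detect) where T : class
    {
        if (content.CanSeek && content.Length == 0)
        {
            return null; // nothing to inspect
        }

        try
        {
            return detect(content);
        }
        catch (InvalidOperationException)
        {
            return null; // treat "no match" as an unknown type
        }
    }
}
```

With the call from the stack trace above, usage would look like `SafeDetection.DetectOrNull(stream, s => FileTypeValidator.GetFileType(s))`.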
A zip file is recognized as a docx file, and IsArchive() returns false.
Expected behavior
The file is recognized as a zip archive.
Describe the bug
A filestream of a dll is identified as DOS MZ executable .exe
To Reproduce
Steps to reproduce the behavior:
Use a local Windows dll, such as System.Windows.Extensions.dll shipped with .NET 7.
Run it through var fileType = FileTypeValidator.GetFileType(stream);
Examine the fileType.
Expected behavior
Reported as a dll (dynamic-link library).
Additional context
It's not really important, since I will not allow executables either, but it would be nice to know whether a file is an exe or a dll, if possible.
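Both exe and dll files start with the same 'MZ' bytes, so the two can only be told apart further into the file. In the PE format, the 4-byte value at offset 0x3C points to the "PE\0\0" signature, and the COFF Characteristics field 22 bytes after that carries the IMAGE_FILE_DLL flag (0x2000). A minimal sketch, assuming a seekable stream; `PeKind` and `PeInspector` are illustrative names, not library types.

```csharp
using System;
using System.IO;
using System.Text;

public enum PeKind { NotPe, Executable, Dll }

// Illustrative helper, not part of FileTypeChecker. Requires a seekable stream.
public static class PeInspector
{
    private const ushort ImageFileDll = 0x2000; // IMAGE_FILE_DLL in COFF Characteristics

    public static PeKind Classify(Stream content)
    {
        using var reader = new BinaryReader(content, Encoding.ASCII, leaveOpen: true);
        try
        {
            content.Position = 0;
            if (reader.ReadUInt16() != 0x5A4D) return PeKind.NotPe;     // "MZ", little-endian
            content.Position = 0x3C;
            uint peOffset = reader.ReadUInt32();                         // e_lfanew
            content.Position = peOffset;
            if (reader.ReadUInt32() != 0x00004550) return PeKind.NotPe; // "PE\0\0"
            content.Position = peOffset + 4 + 18;                        // Characteristics field
            ushort characteristics = reader.ReadUInt16();
            return (characteristics & ImageFileDll) != 0 ? PeKind.Dll : PeKind.Executable;
        }
        catch (EndOfStreamException)
        {
            return PeKind.NotPe; // file too short to hold the headers
        }
    }
}
```

Because the check reads the content rather than the file name, a renamed file is classified the same way as the original.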
Describe the bug
Hi, I'm using your FileTypeChecker package from NuGet in my project. When I run the Windows App Certification Kit to publish to the Windows Store, I get an error message saying that your package's dll files were built in Debug mode. Would you publish a NuGet package built in Release mode?
Is your feature request related to a problem? Please describe.
When we upload a file with the extension .xlsx, the library recognizes it with an extension of .docx (by calling fileTypeObject.Extension).
Describe the solution you'd like
It would be nice if the extensions of document types were distinguished.
Describe alternatives you've considered
Possibly, instead of isDocument, this could be broken down into isDocx, isXlsx, etc.
When getting the type of an XML file with UTF-8 encoding, everything is OK, but if the file is UTF-8 with a BOM, it is not recognized.
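A UTF-8 byte order mark is the three bytes EF BB BF in front of the actual content, which pushes the "<?xml" declaration from offset 0 to offset 3. One way to handle this is to skip the BOM before comparing signatures. The helper below is a standalone sketch with made-up names, not the library's behavior.

```csharp
using System;

// Illustrative helper, not part of FileTypeChecker.
public static class BomHelper
{
    private static readonly byte[] XmlDecl = { 0x3C, 0x3F, 0x78, 0x6D, 0x6C }; // "<?xml"

    // Returns 3 when the buffer starts with the UTF-8 BOM (EF BB BF), otherwise 0.
    public static int ContentOffset(ReadOnlySpan<byte> header) =>
        header.Length >= 3 && header[0] == 0xEF && header[1] == 0xBB && header[2] == 0xBF ? 3 : 0;

    // True when the bytes after an optional BOM start with "<?xml".
    public static bool LooksLikeXml(ReadOnlySpan<byte> header)
    {
        ReadOnlySpan<byte> expected = XmlDecl;
        ReadOnlySpan<byte> rest = header.Slice(ContentOffset(header));
        return rest.Length >= expected.Length
            && rest.Slice(0, expected.Length).SequenceEqual(expected);
    }
}
```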
Hi,
I honestly have no idea what the following error means. The DependencyCheck scan fails on this package every time with the message below.
cdf_read_property_info in cdf.c in file through 5.37 does not restrict the number of CDF_VECTOR elements, which allows a heap-based buffer overflow (4-byte out-of-bounds write).
cvssV3: HIGH, score: 7.8 (CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H)
I added a custom file type for mp3 with different magic bytes ('0xFF, 0xE3') than the already existing mp3 file type. When I try to get the file type via FileTypeValidator.GetFileType() for this custom file type, I get the following exception:
System.InvalidOperationException: Sequence contains more than one matching element
at System.Linq.ThrowHelper.ThrowMoreThanOneMatchException()
at System.Linq.Enumerable.SingleOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
at FileTypeChecker.FileTypeValidator.GetFileType(Stream fileContent)
It's failing on line 74 of FileTypeValidator.cs.
Suggestion:
Support custom file types with same extensions, but with different magic bytes.
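SingleOrDefault throws as soon as more than one element satisfies the predicate, which is what the stack trace above shows. A lookup that tolerates several matches could, for example, prefer the longest matching magic sequence. The following is only an illustration of that idea; the `Signature` record and `SignatureLookup` class are hypothetical and do not reflect how FileTypeValidator.cs is actually written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a registered file type.
public sealed record Signature(string Extension, byte[] Magic);

public static class SignatureLookup
{
    // Returns the candidate with the longest magic sequence that matches the
    // start of `header`, or null when nothing matches. Unlike SingleOrDefault,
    // this does not throw when several candidates match.
    public static Signature FindBest(IEnumerable<Signature> candidates, byte[] header) =>
        candidates
            .Where(c => header.Take(c.Magic.Length).SequenceEqual(c.Magic))
            .OrderByDescending(c => c.Magic.Length)
            .FirstOrDefault();
}
```

Two entries can then share the "mp3" extension while carrying different magic bytes.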
Describe the bug
GetFileType method is blowing up when testing HEIC file.
This is more of a question.
Is HEIC file type supported?
If it's not supported, are there plans to support it, as this is the default file format on iOS?
The problem I see is that if I send an Excel file with an "xlsx" extension, the returned file type has the Extension "docx", which can cause an error. I understand that the magic bytes are the same, but the extension is not.
I used to use version 3.0.0 of the NuGet package and earlier this week upgraded to the latest version, 4.0.0.
The file version of the dll in the latest NuGet package (4.0.0) is reported as 1.0.0.0. This causes problems with some installers (such as WiX). When you install a new deployment package built with these tools, the installer does not replace the file from the previous NuGet package version on the server, because that file had a file version of 3.0.0.0. Since the new file's version looks older, it simply gets skipped.
This could be a potential problem for other users in the future as well.
The product version of the file also shows 1.0.0 instead of 4.0.0, whereas the previous package's file showed 3.0.0.
System.IO.FileNotFoundException: Could not load file or assembly 'System.Linq.Async, Version=5.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263'. The system cannot find the file specified.
File name: 'System.Linq.Async, Version=5.0.0.0, Culture=neutral, PublicKeyToken=94bc3704cddfc263'
at FileTypeChecker.FileTypeValidator.GetFileTypeAsync(Stream fileContent)
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
at FileTypeChecker.FileTypeValidator.GetFileTypeAsync(Stream fileContent)
This happens when only the .NET 6 SDK is installed; this is from a dockerized CI.
If a file type's magic sequence has an offset, it cannot be recognised, because the offset is applied to the magic byte sequence as well; see, for instance, the Mp4 and M4v file types.
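To spell out the expected behavior: the offset should move only the read position in the file data, while the magic sequence is always compared from its own first byte. A minimal standalone sketch (the names are illustrative, not the library's):

```csharp
using System;

// Illustrative helper, not part of FileTypeChecker.
public static class OffsetSignature
{
    // Compares `magic` against `header` starting at `offset` in the header.
    // The magic sequence itself is indexed from 0, never from `offset`.
    public static bool Matches(ReadOnlySpan<byte> header, ReadOnlySpan<byte> magic, int offset)
    {
        if (offset < 0 || header.Length - offset < magic.Length)
        {
            return false; // not enough data at that position
        }
        return header.Slice(offset, magic.Length).SequenceEqual(magic);
    }
}
```

For an MP4-style file, the bytes 66 74 79 70 ("ftyp") sit at offset 4, after a 4-byte box size, so the call would be `OffsetSignature.Matches(header, ftyp, 4)`.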
According to https://en.wikipedia.org/wiki/List_of_file_signatures, docx, xlsx, and zip share the same signature. So I guess @hedandan1989 suggests a better behavior for these file types.
Originally posted by @duynl71 in #30 (comment)
Many file types share the same signature. Is it possible to recognize them more intelligently? I have just tried it, and office files were recognized as zip.
But some online websites can identify them correctly.
Thanks.
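Since docx, xlsx, and plain zip files share the same leading bytes, telling them apart means looking inside the archive. Office Open XML packages contain a "[Content_Types].xml" entry, and the main part usually lives at a conventional path such as "word/document.xml" or "xl/workbook.xml". The sketch below uses System.IO.Compression from the .NET base library; `ZipKind` and `ZipClassifier` are illustrative names, and the path check is a heuristic rather than a full package parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public enum ZipKind { NotZip, PlainZip, Docx, Xlsx, Pptx }

// Illustrative helper, not part of FileTypeChecker. Requires a seekable stream.
public static class ZipClassifier
{
    public static ZipKind Classify(Stream content)
    {
        try
        {
            using var archive = new ZipArchive(content, ZipArchiveMode.Read, leaveOpen: true);
            var names = new HashSet<string>(
                archive.Entries.Select(e => e.FullName),
                StringComparer.OrdinalIgnoreCase);

            // Every Office Open XML package carries this entry.
            if (!names.Contains("[Content_Types].xml")) return ZipKind.PlainZip;

            // Conventional main-part locations; a heuristic, not a guarantee.
            if (names.Contains("word/document.xml")) return ZipKind.Docx;
            if (names.Contains("xl/workbook.xml")) return ZipKind.Xlsx;
            if (names.Contains("ppt/presentation.xml")) return ZipKind.Pptx;
            return ZipKind.PlainZip;
        }
        catch (InvalidDataException)
        {
            return ZipKind.NotZip; // not a readable zip archive
        }
    }
}
```

Reading the entry list is more expensive than comparing a few magic bytes, so it probably makes sense to run it only after the zip signature has already matched.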
Depending on the situation, the file data source may be either a stream or a byte array. Although each can be converted to the other, converting large files doubles the memory usage.
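Until both input kinds are accepted directly, the copy can often be avoided on the caller's side. A MemoryStream constructed over an existing array uses that array as its backing store, and a signature check only needs the first few bytes of a stream. A small sketch with made-up helper names:

```csharp
using System;
using System.IO;

// Illustrative helpers, not part of FileTypeChecker.
public static class HeaderReader
{
    // Wraps an existing array in a read-only stream without copying the data.
    public static Stream AsStream(byte[] data) => new MemoryStream(data, writable: false);

    // Reads at most `count` leading bytes, so a large file never has to be
    // loaded into memory just to inspect its signature.
    public static byte[] ReadHeader(Stream content, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = content.Read(buffer, total, count - total);
            if (read == 0) break; // end of stream reached early
            total += read;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```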
It would be useful to be able to check which file types are currently supported, just from the extension. When verifying that a file is what it claims to be, I compare its extension with the extension resolved by FileTypeValidator.GetFileType(stream); however, I still need to know or hard-code which extensions FileTypeValidator actually supports in the first place.
For example:
FileTypeValidator.IsSupportedType("png"); //true
FileTypeValidator.IsSupportedType("webp"); //false
Describe the bug
If I rename a file calc.exe -> calc.png and then upload it, it's not detected as executable.
To Reproduce
Rename an exe file to another file extension, eg: calc.exe -> calc.png and then upload and try to detect it as executable:
if (stream.Is<Executable>() || stream.Is<ExecutableAndLinkableFormat>())
{
    return new ValidationResult("Invalid file was detected.");
}
Expected behavior
I would expect the file to still be detected as an executable.