Project Status: Active – The project has reached a stable, usable state and is being actively developed.

wand

Retrieve Magic Attributes from Files and Directories

Description

MIME types are shorthand descriptors for file contents and can be determined from “magic” bytes in file headers, file contents or intuited from file extensions. Tools are provided to perform curated “magic” tests as well as mapping MIME types from a database of over 1,800 extension mappings.

SOME IMPORTANT DETAILS

The header checking is minimal (i.e. nowhere near as comprehensive as libmagic) but covers quite a bit of ground. If there are content-check types from magic sources that you would like coded into the package, please file an issue and include the full line(s) from that linked magic.tab that you would like mapped.

What’s Inside The Tin

The following functions are implemented:

  • get_content_type: Discover MIME type of a file based on contents
  • guess_content_type: Guess MIME type from filename (extension)
  • simplemagic_mime_db: File extension-to-MIME mapping data frame
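
To make the extension-based approach concrete, here is a minimal base R sketch of an extension-to-MIME lookup. It does not use the package: `mime_db` is a tiny made-up table and `guess_from_extension()` is a hypothetical helper, so neither should be read as the actual contents of simplemagic_mime_db or the implementation of guess_content_type.

```r
# Hypothetical, tiny extension-to-MIME table (not the package's real data).
mime_db <- data.frame(
  extension = c("csv", "png", "pdf", "xlsx"),
  mime_type = c(
    "text/csv",
    "image/png",
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  ),
  stringsAsFactors = FALSE
)

# Guess a MIME type from the file extension alone; the file is never opened.
guess_from_extension <- function(path, db = mime_db, not_found = "???") {
  ext <- tolower(tools::file_ext(path))
  hit <- db$mime_type[match(ext, db$extension)]
  ifelse(is.na(hit), not_found, hit)
}

guess_from_extension(c("report.PDF", "actions.csv", "mystery.bin"))
```

Because only the name is inspected, a renamed file gets whatever type its extension suggests, which is why a content check is the stronger signal when the two disagree.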

Installation

install.packages("wand", repos = "https://cinc.rud.is")
# or
remotes::install_git("https://git.rud.is/hrbrmstr/wand.git")
# or
remotes::install_git("https://git.sr.ht/~hrbrmstr/wand")
# or
remotes::install_gitlab("hrbrmstr/wand")
# or
remotes::install_bitbucket("hrbrmstr/wand")
# or
remotes::install_github("hrbrmstr/wand")

NOTE: To use the ‘remotes’ install options you will need to have the {remotes} package installed.

Usage

library(wand)
library(tidyverse)

# current version
packageVersion("wand")
## [1] '0.6.0'
list.files(system.file("extdat", "pass-through", package="wand"), full.names=TRUE) %>% 
  map_df(~{
    tibble(
      fil = basename(.x),
      mime = list(get_content_type(.x))
    )
  }) %>% 
  unnest(mime)
## # A tibble: 85 x 2
##    fil                        mime                                                             
##    <chr>                      <chr>                                                            
##  1 actions.csv                application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##  2 actions.txt                application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##  3 actions.xlsx               application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##  4 test_128_44_jstereo.mp3    audio/mp3                                                        
##  5 test_excel_2000.xls        application/msword                                               
##  6 test_excel_spreadsheet.xml application/xml                                                  
##  7 test_excel_web_archive.mht message/rfc822                                                   
##  8 test_excel.xlsm            application/zip                                                  
##  9 test_excel.xlsx            application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## 10 test_nocompress.tif        image/tiff                                                       
## # … with 75 more rows

wand Metrics

Lang  # Files  (%)   LoC  (%)   Blank lines  (%)   # Lines  (%)
R     7        0.78  159  0.62  62           0.78  72       0.71
JSON  1        0.11  80   0.31  0            0.00  0        0.00
Rmd   1        0.11  17   0.07  17           0.22  29       0.29

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

wand's People

Contributors

hrbrmstr

wand's Issues

encoding has a bug

A UTF-8 Unicode (with BOM) text file is reported with an encoding of just utf-8, but under Windows, utf-8 and utf-8-bom are different.
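
For anyone who needs to tell the two apart in the meantime, a minimal base R sketch follows. It does not touch the package; `has_utf8_bom()` is a hypothetical helper that only checks whether the first three bytes are the UTF-8 byte order mark (EF BB BF).

```r
# Return TRUE when a file starts with the UTF-8 byte order mark (EF BB BF).
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Demo on two temporary files, one with a BOM and one without.
with_bom <- tempfile(fileext = ".txt")
without_bom <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("a,b\n1,2\n")), with_bom)
writeBin(charToRaw("a,b\n1,2\n"), without_bom)

has_utf8_bom(with_bom)
has_utf8_bom(without_bom)
```

A file shorter than three bytes yields a shorter raw vector, so the comparison is simply FALSE rather than an error.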

By the way, I have compiled a static file.exe (version 5.28), and I cannot unzip magic.mgc.zip under Windows.

Make wand::get_content_type independent of guess_content_type, or add a new function

[Feature request]

If possible, make get_content_type independent of guess_content_type. By that I mean get_content_type should perform only magic-number-based detection and simply return a placeholder, say '???' (which could be configurable), for unknown types.

This would be a specific indication to the user that magic-number-based detection returned an unknown file type. What happens now is that a CSV file wrongly saved as .docx is reported as a docx file. The reason is that CSV files are not binary, so all magic-number checks fail to detect them, and guess_content_type then determines the type to be docx.

Here is the reproducible example

wand::get_content_type(system.file("extdata", "messy", "csv.docx", package = "tidycells", mustWork = TRUE))
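
The requested behaviour can be sketched in base R without the package. Everything below is an assumption made for illustration: `magic_only_type()` is a hypothetical function, it knows only three well-known signatures (PNG, PDF, ZIP), and it is not how get_content_type is implemented. The point is that it never falls back to the extension.

```r
# Hypothetical magic-only detector: a few well-known signatures, nothing else.
magic_only_type <- function(path, unknown = "???") {
  hdr <- readBin(path, what = "raw", n = 8L)
  starts_with <- function(sig) {
    length(hdr) >= length(sig) && identical(hdr[seq_along(sig)], sig)
  }
  if (starts_with(as.raw(c(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)))) {
    "image/png"
  } else if (starts_with(charToRaw("%PDF-"))) {
    "application/pdf"
  } else if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) {
    "application/zip"  # also the container format behind .docx and .xlsx
  } else {
    unknown  # no signature matched; the extension is deliberately ignored
  }
}

# A CSV saved with a misleading .docx extension, similar to the report above.
csv_as_docx <- tempfile(fileext = ".docx")
writeLines(c("a,b", "1,2"), csv_as_docx)

magic_only_type(csv_as_docx)
```

With this shape, a caller who still wants an extension-based guess can apply it explicitly when the result equals the placeholder.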

via Ripley (I knowingly hacked the configure script so I expected something like this)

PKG_CPPFLAGS= -L/usr/include -L/usr/local/include

(on my OS X box) make no sense, and by calling $CPP you have not checked compilation let alone linking. And it is linking which is failing on Fedora: it has

/usr/lib64/libmagic.so.1
/usr/lib64/libmagic.so.1.0.0

installed but no libmagic.so (nor is an RPM listed containing one). Similarly on Solaris (but those are in /opt/csw/lib, not on your list).

You ship magic.h, but whether or not it is used depends on the compiler---the default include path may or may not prefer '.'. If that is what you intend, use "./magic.h" or -I. .

Next, configure is not a legal sh script, as $(mkdir) is a bashism for mkdir (see the R manual). As it does not log what it does, it is hard to know what goes wrong ....

Next, you are guessing at library paths in

LIBDIRS="/usr/lib/x86_64-linux-gnu /usr/lib/i386-linux-gnu /usr/lib64 /usr/lib32 /usr/local/lib /opt/local/lib /usr/lib /lib"

and give the user no way to override your guesses (which are inconsistent: why no /usr/local/lib64? and would be wrong for a 32-bit build on most 64-bit Linuxen).

Finally, you have

SystemRequirements: libmagic (>= 5.14) for Unix/Linux/macOS; Rtools

but you don't check that. The system file (and hence I guess libmagic) on OS X is 5.04 -- I have 5.28 installed as file 5.04 does not work very well.


s/mkdir/mktemp/

And a few more things:

Some compilers require a valid extension on a C++ file.

You try to delete ${temp.exe} which may not exist.

On Solaris if I try to compile wand.cpp:

"wand.cpp", line 68: Error: The function "strnlen" must have a prototype.
"wand.cpp", line 94: Error: The function "strnlen" must have a prototype.
"wand.cpp", line 121: Error: The function "strnlen" must have a prototype.
"wand.cpp", line 147: Error: The function "strnlen" must have a prototype.

strnlen is not C++98, and even on Linux the man page says

Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

   strnlen():
       Since glibc 2.10:
           _POSIX_C_SOURCE >= 200809L
       Before glibc 2.10:
           _GNU_SOURCE

Error: 'file.exe' not found. Please install 'Rtools' and restart R.

Hi,

On Windows, I can compile the package but I can't use it. I have Rtools, but the error asks me to install it again (I did, twice). Where is this file.exe file?

Regards


> system.file("extdata", "img", package="wand") %>% 
+   list.files(full.names=TRUE) %>% 
+   incant() %>% 
+   glimpse()
Error: 'file.exe' not found. Please install 'Rtools' and restart R. See 'https://github.com/stan-dev/rstan/wiki/Install-Rtools-for-Windows' for more information on how to install 'Rtools'
> 
> devtools::find_rtools()
[1] TRUE
> Sys.getenv()
ALLUSERSPROFILE                       C:\ProgramData
APPDATA                               C:\Users\Vincent\AppData\Roaming
BINPREF                               C:/RBuildTools/3.4/mingw_$(WIN)/bin/
CLICOLOR_FORCE                        1
CommonProgramFiles                    C:\Program Files\Common Files
CommonProgramFiles(x86)               C:\Program Files (x86)\Common Files
CommonProgramW6432                    C:\Program Files\Common Files
COMPUTERNAME                          DESKTOP-L8OS3S3
ComSpec                               C:\Windows\system32\cmd.exe
DISPLAY                               :0
FONTCONFIG_PATH                       C:/Users/Vincent/Documents/R/win-library/3.4/gdtools/fontconfig
FPS_BROWSER_APP_PROFILE_STRING        Internet Explorer
FPS_BROWSER_USER_PROFILE_STRING       Default
GFORTRAN_STDERR_UNIT                  -1
GFORTRAN_STDOUT_UNIT                  -1
HOME                                  C:/Users/Vincent/Documents
HOMEDRIVE                             C:
HOMEPATH                              \Users\Vincent
LOCALAPPDATA                          C:\Users\Vincent\AppData\Local
LOGONSERVER                           \\DESKTOP-L8OS3S3
MOZ_PLUGIN_PATH                       C:\Program Files (x86)\Foxit Software\Foxit Reader\plugins\
NUMBER_OF_PROCESSORS                  8
OneDrive                              C:\Users\Vincent\OneDrive
OS                                    Windows_NT
PATH                                  C:\Rtools\bin;C:\Program Files\R\R-3.4.3\bin\x64;C:\Program Files
                                      (x86)\Intel\iCLS Client\;C:\Program
                                      Files\Docker\Docker\Resources\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program
                                      Files\Intel\iCLS
                                      Client\;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program
                                      Files\PuTTY\;C:\Program Files (x86)\Intel\Intel(R) Management Engine
                                      Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine
                                      Components\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine
                                      Components\IPT;C:\Program Files\Intel\Intel(R) Management Engine
                                      Components\IPT;C:\Program
                                      Files\Git\cmd;C:\Users\Vincent\AppData\Local\Microsoft\WindowsApps
PATHEXT                               .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE                AMD64
PROCESSOR_IDENTIFIER                  Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
PROCESSOR_LEVEL                       6
PROCESSOR_REVISION                    8e0a
ProgramData                           C:\ProgramData
ProgramFiles                          C:\Program Files
ProgramFiles(x86)                     C:\Program Files (x86)
ProgramW6432                          C:\Program Files
PSModulePath                          C:\Program
                                      Files\WindowsPowerShell\Modules;C:\Windows\system32\WindowsPowerShell\v1.0\Modules
PUBLIC                                C:\Users\Public
R_ARCH                                /x64
R_COMPILED_BY                         gcc 4.9.3
R_DOC_DIR                             C:/PROGRA~1/R/R-34~1.3/doc
R_HOME                                C:/PROGRA~1/R/R-34~1.3
R_LIBS_USER                           C:/Users/Vincent/Documents/R/win-library/3.4
R_USER                                C:/Users/Vincent/Documents
RMARKDOWN_MATHJAX_PATH                C:/Program Files/RStudio/resources/mathjax-26
RS_LOCAL_PEER                         \\.\pipe\20436-rsession
RS_RPOSTBACK_PATH                     C:/Program Files/RStudio/bin/rpostback
RS_SHARED_SECRET                      63341846741
RSTUDIO                               1
RSTUDIO_CONSOLE_COLOR                 256
RSTUDIO_CONSOLE_WIDTH                 80
RSTUDIO_MSYS_SSH                      C:/Program Files/RStudio/bin/msys-ssh-1000-18
RSTUDIO_PANDOC                        C:/Program Files/RStudio/bin/pandoc
RSTUDIO_SESSION_PORT                  20436
RSTUDIO_USER_IDENTITY                 Vincent
RSTUDIO_WINUTILS                      C:/Program Files/RStudio/bin/winutils
SESSIONNAME                           Console
SystemDrive                           C:
SystemRoot                            C:\Windows
TEMP                                  C:\Users\Vincent\AppData\Local\Temp
TERM                                  xterm-256color
TMP                                   C:\Users\Vincent\AppData\Local\Temp
USERDOMAIN                            DESKTOP-L8OS3S3
USERDOMAIN_ROAMINGPROFILE             DESKTOP-L8OS3S3
USERNAME                              Vincent
USERPROFILE                           C:\Users\Vincent
windir                                C:\Windows


> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.4 wand_0.2.1 

loaded via a namespace (and not attached):
 [1] zoo_1.8-1            purrr_0.2.4          rJava_0.9-9          lattice_0.20-35      XLConnect_0.2-14    
 [6] colorspace_1.3-2     htmltools_0.3.6      yaml_2.1.18          base64enc_0.1-3      rlang_0.2.0         
[11] R.oo_1.21.0          pillar_1.2.1         glue_1.2.0           withr_2.1.1          R.utils_2.6.0       
[16] rappdirs_0.3.1       gdtools_0.1.7        readxl_1.0.0         bindrcpp_0.2         uuid_0.1-2          
[21] ReporteRsjars_0.0.3  bindr_0.1.1          plyr_1.8.4           munsell_0.4.3        gtable_0.2.0        
[26] cellranger_1.1.0     R.methodsS3_1.7.1    zip_1.0.0            htmlwidgets_1.0      devtools_1.13.5     
[31] leaps_3.0            ReporteRs_0.8.9      memoise_1.1.0        knitr_1.20           httpuv_1.3.6.2      
[36] curl_3.1             Rcpp_0.12.16         xtable_1.8-2         XLConnectJars_0.2-14 scales_0.5.0        
[41] flashClust_1.01-2    scatterplot3d_0.3-40 mime_0.5             ggplot2_2.2.1        png_0.1-7           
[46] digest_0.6.15        stringi_1.1.7        shiny_1.0.5          grid_3.4.3           tools_3.4.3         
[51] magrittr_1.5         lazyeval_0.2.1       tibble_1.4.2         cluster_2.0.6        tidyr_0.8.0         
[56] FactoMineR_1.39      pkgconfig_2.0.1      MASS_7.3-47          xml2_1.2.0           dygraphs_1.1.1.4    
[61] rvg_0.1.8            httr_1.3.1           assertthat_0.2.0     officer_0.2.1        R6_2.2.2            
[66] git2r_0.21.0         compiler_3.4.3
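
One way to narrow this down is to ask R itself whether a `file` executable is visible on the PATH of the current session. The sketch below uses only base R; it assumes nothing about how wand 0.2.1 searches for file.exe, so a hit here does not guarantee the package will find it, but a miss shows which directories were searched.

```r
# Look for a `file` executable on the PATH that this R session sees.
file_path <- unname(Sys.which("file"))

if (nzchar(file_path)) {
  message("Found file at: ", file_path)
} else {
  # List the PATH entries, one per line, to check for the expected directory.
  path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  message("No `file` executable on PATH; directories searched:")
  message(paste(path_dirs, collapse = "\n"))
}
```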
