Giter Site home page Giter Site logo

boxuk / describr Goto Github PK

View Code? Open in Web Editor NEW
7.0 11.0 2.0 1.74 MB

So, tell me about your file... Given any file, PHP library describr will describe it

Home Page: http://www.boxuk.com/

License: GNU General Public License v3.0

PHP 99.38% Shell 0.62%

describr's Introduction

Describr

Describr is an object-oriented library for describing files. It uses a plugins to determine as much information as it can about a file.

Describr can be run from the command line via a supplied script, or it can be integrated into any PHP project to gather information about a variety of file types. You can write your own plugins, and a few of the file types Describr supports "out of the box" with the bundled plugins include:

  • GIF
  • JPEG
  • PNG
  • WMV
  • MP4
  • MP3
  • AVI
  • ... and many more!

If there's a file you want to interrogate and its type isn't supported, you can add a plugin to {describr root}/lib/BoxUK/Describr/Plugins and it will automatically be picked up and ready for use!

Requirements:

  • PHP 5.3+
  • Fileinfo extension

Plugins may have their own requirements. Some of the requirements for bundled plugins are:

Installation

It's easy to get Describring! If you just want to use it without writing plugins, you can install through our PEAR channel, pear.boxuk.net (check our Box UK Labs page for details on that), but we're assuming that because you're here on our Github you want to get stuck in and grab the source. Great! Here's how you do it:

  1. Install Zend Framework version 1.(latest). Probably best installed via PEAR so it's automatically on your PHP path, else you'll need to add it to the path manually.
  2. Download or checkout PHP-reader 1.8.1 or better and add it to your PHP include path.
  3. Grab the source from Github by cloning the repo (or downloading the zip).

What if I can't add things to my php include path?

If, in stages (1) and (2), you weren't able to add Zend and PHP-Reader to your PHP include path, you can do the following:

In the root of Describr, copy {describr root}/lib/bootstrap.custom.php-sample to {describr root}/lib/bootstrap.custom.php. You'll now need to edit {describr root}/lib/bootstrap.custom.php and provide a path to PHP-Reader. You might also need to set a path to Zend Framework in there too.

If you don't include the Zend Framework and PHP-Reader, Describr will still work but you will not be able to use it to get much information about audio/video files.

Using Describr

It's time to test that Describr is installed and ready to rock! We've provided a simple command line executable script to help you do this, so if you're on Linux/OSX/Unix, chmod {describr root}/bin/describr to be executable. On Windows, you should just be able to run {describr root}/bin/describr.bat:

gavd@gavd-desktop:/opt/BoxUK/describr$ bin/describr
describr - tell me about your file...
(c) 2011 Box UK
Usage: describr [path to file]
       describr [path to file 1] [path to file 2] ... [path to file N]

If you don't see the above, head back to the Installation section and see if there's anything you've missed. Failing that, there's a Troubleshooting section below

and then you should be able to do something like:

gavd@gavd-desktop:/opt/BoxUK/describr$ bin/describr tmp.txt
Analysing tmp.txt...
array(2) {
  ["BoxUK\General"]=>
  array(6) {
    ["errors"]=>
    array(0) {
    }
    ["extension"]=>
    string(3) "txt"
    ["type"]=>
    string(8) "document"
    ["mimeType"]=>
    string(10) "text/plain"
    ["fileSizeInBytes"]=>
    int(4)
    ["fileSize"]=>
    string(11) "Extra Small"
  }
  ["BoxUK\PlainText"]=>
  array(4) {
    ["errors"]=>
    array(0) {
    }
    ["lines"]=>
    int(1)
    ["characters"]=>
    int(4)
    ["words"]=>
    int(1)
  }
}

So, we're up and running! Unless we're not, in which case:

Troubleshooting

Please check that:

  1. Make sure you've chmodded {describr}/bin/describr to be executable if you're on Linux/Unix/OSX:
    gavd@gavd-desktop:/opt/BoxUK/describr$ chmod u+x bin/describr
  1. Zend Framework and PHP-Reader are both on the PHP include path OR you are using a custom bootstrap.php. In the latter case, check that {describr root}lib/bootstrap.custom.php exists and $describr_pathToPHPReaderLibrary is set correctly. It must point to the "library" or "src" directory (depending on the version of php-reader you're using) - the one that contains the directory "Zend"
  2. Zend Framework should be installed. Installing this by PEAR is probably the cleanest way to do this, but if you are not able to use PEAR (e.g. you're on restrictive shared hosting, you can add the following to {describr root}/lib/bootstrap.custom.php:
<?php
// ... add the line below to the end of the file
set_include_path('.:/home/you/yourProject/lib/ZendFramework-1.11.1/library');

Of course, you will have to adjust the paths to point to where Zend is installed. This is only as a last resort, it's better to use PEAR.

Any other problems, please put in an issue on the Github project for Describr and we'll try to help you out. As much info as possible, please!

Plugins

Describr is based around plugins. Each plugin has a list of file types that it knows how to describe. The built-in plugins live in {describr root}/lib/BoxUK/Describr/Plugins/BoxUK. Each plugin has one class that implements Plugin, mainly by extending AbstractPlugin.

Plugin dependencies

You specify the dependencies for each plugin in the plugin's code. Here's an example from ImagePlugin.php:

/**
 * Make sure that this plugin has everything that it needs - i.e. GD
 *
 * @throws UnmetDependencyException If a dependency is not met
 */
public function checkDependencies() {
    if (!extension_loaded('gd') || !function_exists('gd_info')) {
        throw new UnmetDependencyException('GD is not installed');
    }
}

If the dependencies of a plugin are not met (i.e. checkDependencies() fails), that plugin cannot be used, and will fail silently and record an error when it finds a file it can match but can't report on because of missing dependencies:

gavd@gavd-desktop:/opt/BoxUK/describr$ bin/describr tests/resources/test.mov
Analysing tests/resources/test.mov...
array(2) {
  ["BoxUK\General"]=>
    *removed for brevity*
  ["BoxUK\AudioVideo\Iso14496Video"]=>
  array(1) {
    ["errors"]=>
    array(1) {
      [0]=>
      string(1236) "This plugin matched the file tests/resources/test.mov, but the dependencies could not be matched. Details:
exception 'BoxUK\Describr\Plugins\UnmetDependencyException' with message 'Class Zend_Media_Iso14496 is not loaded - please ensure the php-reader library is on the include path if you wish to use this plugin' in /opt/BoxUK/describr/lib/BoxUK/Describr/Plugins/BoxUK/AudioVideo/Iso14496VideoPlugin.php:30
Stack trace: *removed for brevity*

    }
  }
}

Creating a plugin

Let's create a trivial plugin. This plugin will just estimate the number of tags in an XML file by counting the < characters and dividing by two - of course, you'd probably not do this in a production application but it should serve to illustrate Describr plugins!

We create a file {describr root}/lib/BoxUK/Describr/Plugins/Custom/XmlPlugin.php with the following contents:

<?php

namespace BoxUK\Describr\Plugins\Custom;

/**
 * Plugin for automatically describing XML files
 *
 * @package   BoxUK\Describr\Plugins\BoxUK
 * @author    Box UK <[email protected]>
 * @copyright Copyright (c) 2011, Box UK
 * @license   http://opensource.org/licenses/mit-license.php MIT License and http://www.gnu.org/licenses/gpl.html GPL license
 * @link      http://github.com/boxuk/describr
 * @since     1.0.5
 */
class XmlPlugin extends \BoxUK\Describr\Plugins\AbstractPlugin
{

    /**
     * @return array Types of file this plugin can determine information about
     */
    public function getMatchingMimeTypes() {
        return array(
            'text/xml'
        );
    }

    /**
     * @return array File extensions this plugin can determine information about.
     * The "." is not included, so "wmf" is OK, ".wmf" is not
     */
    public function getMatchingFileExtensions() {
        return array(
            'xml',
            'xsl',
        );
    }

    /**
     * Stub out configuration loading
     */
    protected function loadConfiguration() {}

    /**
     * @return array With key 'tags' which is a count of tags in this document
     */
    protected function setAttributes() {
        $fileContents = file_get_contents($this->fullPathToFileOnDisk);
        $tagOpenCount = substr_count($fileContents, '<');
        $tagCloseCount = substr_count($fileContents, '</');
        $tagCount = $tagOpenCount - $tagCloseCount;

        $this->attributes['tags'] = $tagCount;
    }
}

Notice we haven't specified any dependencies here - it's all vanilla PHP.

Now, if we run our test script against Describr's own build.xml (for example), the XML plugin should automatically be used!

gavd@gavd-desktop:/opt/BoxUK/describr$ bin/describr build.xml
Analysing build.xml...
array(3) {
  ["BoxUK\General"]=>
  array(5) {
    ["errors"]=>
    array(0) {
    }
    ["extension"]=>
    string(3) "xml"
    ["type"]=>
    NULL
    ["mimeType"]=>
    string(15) "application/xml"
    ["fileSizeInBytes"]=>
    int(1191)
  }
  ["BoxUK\PlainText"]=>
  array(4) {
    ["errors"]=>
    array(0) {
    }
    ["lines"]=>
    int(34)
    ["characters"]=>
    int(1191)
    ["words"]=>
    int(122)
  }
  ["Custom\Xml"]=>
  array(2) {
    ["errors"]=>
    array(0) {
    }
    ["tags"]=>
    int(15)
  }
}

Accessing through your project

Using Describr on the command line is all well and good for getting going and testing plugins, but the meat of it is using it in a project to tell you about files. You can do something like:

<?php
// ...
$this->describr = new \BoxUK\Describr\Facade();
$responseFromDescribr = $this->describr->describeFile($pathToFile);

This will give you a BoxUK\Describr\MediaFileAttributes object, which you can interrogate to find out which plugins said what about the file! The toArray method gives you what we've used so far in the command line scripts, or you can access it plugin-by-plugin for more fine-grained and powerful control.

Get involved!

If you find this project useful, or you've found a cool way of using it, let us know! If you've written any useful plugins, then by all means submit a pull request and we'll take a look!

describr's People

Contributors

craigmarvelley avatar rodnaph avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

describr's Issues

Command line scripts should be included in PEAR package

We need to install the scripts that live in the bin directory as part of the PEAR install, so you can work with it on the command line from the off.

It also needs to allow you to setup your /lib/bootstrap.custom.php as part of the install process. Some PEAR packages seem to have an interactive install, so hopefully we could take advantage of that.

Failed to find mime.types

Running on Windows, the following error is raised:

Warning: fopen(/etc/mime.types): failed to open stream: No such file or directory in describr\0.9.2\lib\BoxUK\Describr\Helper\FileHelper.class.php on line 126

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.