pombredanne / skeleton-test-suite-generator

This project forked from exponential-decay/skeleton-test-suite-generator

License: zlib License

#DROID Skeleton Test Suite Generator (skeleton-test-suite-generator)

Herein lies a tool for the automated generation of digital objects based on the digital signatures documented in the PRONOM database maintained by The National Archives, UK. PRONOM data is licensed under the Open Government Licence (OGL): http://www.nationalarchives.gov.uk/doc/open-government-licence/

The skeleton-test-suite-generator fills a gap: the community requires a corpus of digital objects for the validation and evaluation of format identification tools and techniques. The tool should complement a methodology in which skeleton files are also generated manually by signature developers.

The tool takes a signature specified for a digital object in PRONOM and constructs a digital object that will match its footprint. For example, given the signature:

CAFED00D{4}CAFEBABE(0D|0D0A)

Here {4} is a gap of four arbitrary bytes and (0D|0D0A) is a choice between two endings. The hex sequences of digital objects that match this signature in DROID will look like the following:

CA FE D0 0D 00 00 00 00 CA FE BA BE 0D

Or:

CA FE D0 0D 00 00 00 00 CA FE BA BE 0D 0A
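The expansion above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repository: the function name `signature_to_bytes` is hypothetical, and it handles only hex pairs, fixed gaps such as {4}, and alternatives such as (0D|0D0A), always taking the first alternative as a simplification.

```python
import re

# Tokens recognised by this sketch: a fixed gap {n}, an alternation (..|..),
# or a run of hex digits. Anything else is reported as unsupported.
TOKEN = re.compile(r"\{(\d+)\}|\(([0-9A-Fa-f|]+)\)|([0-9A-Fa-f]+)")


def signature_to_bytes(signature, fill=0x00):
    """Expand a simple PRONOM-style signature into one matching byte string."""
    out = bytearray()
    pos = 0
    while pos < len(signature):
        match = TOKEN.match(signature, pos)
        if match is None:
            raise ValueError("unsupported syntax at offset %d" % pos)
        gap, choices, hex_run = match.groups()
        if gap is not None:
            # A gap of n arbitrary bytes: pad with the fill byte.
            out.extend(bytes([fill]) * int(gap))
        elif choices is not None:
            # Several byte strings would match; this sketch takes the first.
            out.extend(bytes.fromhex(choices.split("|")[0]))
        else:
            out.extend(bytes.fromhex(hex_run))
        pos = match.end()
    return bytes(out)


skeleton = signature_to_bytes("CAFED00D{4}CAFEBABE(0D|0D0A)")
print(skeleton.hex(" ").upper())  # CA FE D0 0D 00 00 00 00 CA FE BA BE 0D
```

Taking the second alternative instead would produce the 0D 0A variant shown above.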

The scripts take an export of the PRONOM database in XML, extract the internal signature information belonging to each format record and generate the digital objects - creating the 'skeleton test suite'.
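As a rough sketch of the extraction step, the snippet below pulls byte sequences out of a single format record. The record is a hypothetical, heavily trimmed stand-in: the element names (`InternalSignature`, `ByteSequence`, `PositionType`, `Offset`, `ByteSequenceValue`) are assumptions about the export layout, and the function `extract_sequences` is illustrative rather than part of this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed record; a real export is assumed to carry more fields.
SAMPLE_RECORD = """\
<FileFormat>
  <FormatName>Example Format</FormatName>
  <InternalSignature>
    <ByteSequence>
      <PositionType>Absolute from BOF</PositionType>
      <Offset>0</Offset>
      <ByteSequenceValue>CAFED00D{4}CAFEBABE(0D|0D0A)</ByteSequenceValue>
    </ByteSequence>
  </InternalSignature>
</FileFormat>
"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_sequences(xml_text):
    """Return one dict per ByteSequence element found in the record."""
    root = ET.fromstring(xml_text)
    sequences = []
    for element in root.iter():
        if local_name(element.tag) != "ByteSequence":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        sequences.append({
            "position": fields.get("PositionType", ""),
            "offset": int(fields.get("Offset") or 0),
            "value": fields.get("ByteSequenceValue", ""),
        })
    return sequences


for sequence in extract_sequences(SAMPLE_RECORD):
    print(sequence["position"], sequence["offset"], sequence["value"])
```

Matching on the local tag name keeps the sketch working whether or not the export declares an XML namespace.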

The objects can be used for:

  • Understanding where signatures in the PRONOM database will conflict, therefore generating multiple identifications for some files.

  • Creating signatures purely based on format specifications where getting sample files or making them available to those able to create signatures is extremely difficult.

  • Incorporation into the DROID unit test-suite to ensure modifications to the identification engine do not impact identification capability.

  • Testing the stability of signature files over time.
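The first use in the list, spotting signatures that conflict, can be illustrated with a small sketch. It is not how DROID matches files: the helper `signature_to_regex` and the identifiers `example/1` and `example/2` are hypothetical, only hex runs, {n} gaps and (a|b) alternatives are handled, and every signature is assumed to be anchored at the beginning of the file.

```python
import re


def signature_to_regex(signature):
    """Translate a simple signature into a compiled bytes regex.

    Unsupported syntax is silently skipped by findall in this sketch.
    """
    parts = []
    tokens = re.findall(r"\{(\d+)\}|\(([0-9A-Fa-f|]+)\)|([0-9A-Fa-f]+)", signature)
    for gap, choices, hex_run in tokens:
        if gap:
            # n arbitrary bytes
            parts.append(b".{%d}" % int(gap))
        elif choices:
            alternatives = [re.escape(bytes.fromhex(c)) for c in choices.split("|")]
            parts.append(b"(?:" + b"|".join(alternatives) + b")")
        else:
            parts.append(re.escape(bytes.fromhex(hex_run)))
    # DOTALL so that '.' also matches newline bytes inside a gap.
    return re.compile(b"".join(parts), re.DOTALL)


# Two made-up signatures where the second is a prefix of the first.
signatures = {
    "example/1": "CAFED00D{4}CAFEBABE(0D|0D0A)",
    "example/2": "CAFED00D",
}
skeleton = bytes.fromhex("CAFED00D00000000CAFEBABE0D")

matches = [name for name, sig in signatures.items()
           if signature_to_regex(sig).match(skeleton)]
print(matches)  # ['example/1', 'example/2']
```

A skeleton file that matches more than one signature, as here, points at a pair of signatures worth reviewing.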

Other benefits include a small footprint: zipped, the suite is just over 150 KB in size; unzipped, it is approximately 390 KB.

The suite does not suffer from issues relating to IPR and copyright. The suite and generator tool are licensed under CC BY-SA (see below).

The tool so far is a prototype and does not yet handle every sequence in PRONOM. Signatures with multiple BOF sequences, for example, will not generate correctly. While this can be corrected by the team working on PRONOM, these are legitimate sequences that should be handled by the tool.

###HOWTO

python skeletongenerator.py

Easy as. The scripts require the existence of the 'pronom-export' folder generated by the scripts in the pronom-xml-export repository: https://github.com/exponential-decay/pronom-xml-export

The input and output locations can be configured by modifying the accompanying cfg file skeletonsuite.cfg.

By default, files are generated using NULL bytes to 'fill' the gaps dictated by a signature. This can be changed in the cfg file: set the character value of the desired fill byte, or a value <0 or >255 for random bytes.
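The fill rule can be sketched as follows. The function name `make_fill` is hypothetical and the sketch does not read skeletonsuite.cfg; it only shows how a single integer setting could select between a fixed fill byte and random bytes.

```python
import os


def make_fill(length, fill_value=0):
    """Return `length` filler bytes.

    A value from 0 to 255 is used as the fill byte itself (0 is NULL);
    any value below 0 or above 255 selects random bytes instead.
    """
    if 0 <= fill_value <= 255:
        return bytes([fill_value]) * length
    return os.urandom(length)


print(make_fill(4))        # b'\x00\x00\x00\x00'
print(make_fill(3, 65))    # b'AAA'
print(len(make_fill(5, -1)))  # 5 (contents are random)
```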

Version information can be displayed by running:

python skeletongenerator.py --version

###TODO

  • Handle multiples of sequence types, e.g. multiple non-colliding BOF sequences.

  • Understand the requirements for metadata to be associated with files, e.g. should the internal structure of files be self-describing?

  • A repository needs to be created on GitHub to host the first non-prototypical output of this generator and the test-suite henceforth.

  • Understand what we need to do with multiple combinations of byte sequences - currently we always turn-left.

  • Unit tests for signature2bytegenerator.py and filewriter.py as a priority.

###For the community TODO

  • Incorporate suite into unit tests for DROID and FIDO

  • Together understand if we can adapt this approach for the UNIX File utility

  • Talk about this tool and potential approach and help to understand how to refine it!

  • Sit tight as we build an infrastructure to host the suite itself online.

###License

Copyright (c) 2012 Ross Spencer

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.

  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

  3. This notice may not be removed or altered from any source distribution.
