wyleyr / profiler

This project forked from weblicht/profiler

A Java library that profiles an arbitrary file, i.e. determines its mediatype, language, etc.

License: Other

Java 39.83% HTML 47.07% Rich Text Format 13.10%

Profiler

The Profiler is a Java library that profiles an arbitrary file, i.e. determines its mediatype, format variant and language.

How to use the Profiler

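// Requires java.io.File and java.util.List, plus the library's Profiler, DefaultProfiler and Profile classes.
File myFile = new File("/path/to/my/file");
Profiler profiler = new DefaultProfiler();
// May throw IOException or ProfilingException; the most confident profile comes first.
List<Profile> detectedProfiles = profiler.profile(myFile);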

API

A profiler is a Java class implementing the Profiler interface (Profiler.java). The interface specifies a single method:

List<Profile> profile(File file) throws IOException, ProfilingException;

A profiler can return multiple Profile objects for a single file when the data is ambiguous. The order of the returned profiles matters: the profile with the highest confidence should come first in the list.
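The ordering contract above means a caller who only wants the best guess can take the first element. The following is a minimal sketch of that idea, not the library's own code: the `Profile`, `ProfilingException` and `Profiler` types here are simplified stand-ins so the example compiles on its own (the `mediaType` field and the `BestProfile` helper are assumptions made for illustration).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins so the sketch is self-contained; in real code
// these types come from the Profiler library.
class Profile {
    final String mediaType; // assumed field, for illustration only
    Profile(String mediaType) { this.mediaType = mediaType; }
}

class ProfilingException extends Exception {
    ProfilingException(String message) { super(message); }
}

interface Profiler {
    List<Profile> profile(File file) throws IOException, ProfilingException;
}

// Hypothetical helper, not part of the library.
class BestProfile {
    // Returns the first (most confident) profile, or empty if none was detected.
    static Optional<Profile> best(Profiler profiler, File file)
            throws IOException, ProfilingException {
        List<Profile> profiles = profiler.profile(file);
        return profiles.isEmpty() ? Optional.empty() : Optional.of(profiles.get(0));
    }
}
```

Returning an `Optional` keeps the "nothing detected" case explicit instead of relying on an index into a possibly empty list.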

A Profile is a simple data object that stores a data profile (e.g. mediatype, language, version, other features). Use the static builder() method to create a profile builder, or the nested Profile.Flat class for serialization/deserialization.

The default profiler

A profiler can be specialized to detect a single format, or it can be more general: perform a few simple tests, then delegate detection to other, specialized profilers. The main profiler (DefaultProfiler) invokes the Apache Tika library to detect the general mediatype, then invokes specialized profilers for various formats (XML, text). These profilers in turn invoke more specialized profilers.
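The delegation pattern described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the DefaultProfiler implementation: a trivial file-extension check stands in for the Apache Tika call, and `DelegatingProfiler`, `register` and `guessMediaType` are hypothetical names. The `Profile`, `ProfilingException` and `Profiler` types are simplified stand-ins so the example compiles on its own.

```java
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for the library's types.
class Profile {
    final String mediaType; // assumed field, for illustration only
    Profile(String mediaType) { this.mediaType = mediaType; }
}

class ProfilingException extends Exception {
    ProfilingException(String message) { super(message); }
}

interface Profiler {
    List<Profile> profile(File file) throws IOException, ProfilingException;
}

// Hypothetical general profiler: cheap general detection first, then delegation.
class DelegatingProfiler implements Profiler {
    private final Map<String, Profiler> specialized = new HashMap<>();

    void register(String mediaType, Profiler profiler) {
        specialized.put(mediaType, profiler);
    }

    // Placeholder for general mediatype detection (extension check only);
    // the README says the real DefaultProfiler uses Apache Tika for this step.
    static String guessMediaType(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        if (name.endsWith(".xml")) return "application/xml";
        if (name.endsWith(".txt")) return "text/plain";
        return "application/octet-stream";
    }

    @Override
    public List<Profile> profile(File file) throws IOException, ProfilingException {
        String mediaType = guessMediaType(file);
        Profiler delegate = specialized.get(mediaType);
        if (delegate != null) {
            List<Profile> refined = delegate.profile(file);
            if (!refined.isEmpty()) {
                return refined; // the specialized profiler knows more
            }
        }
        // Fall back to the general mediatype when nothing more specific is known.
        return Collections.singletonList(new Profile(mediaType));
    }
}
```

Because a specialized profiler is itself a `Profiler`, it can hold its own delegates, which gives the multi-level chain the paragraph describes.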

Adding a specialized profiler

To add your own specialized profiler, first implement it in a separate class, then find the place where a call to your profiler should be inserted, starting the search from the DefaultProfiler.

For instance, a profiler for an XML subformat would be placed in the eu.clarin.switchboard.profiler.xml package, and a call to it would be added to the profile method of the more general XmlProfiler.
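As a rough illustration of these two steps, the sketch below defines a specialized profiler for one XML subformat and shows where a more general XML profiler might call it. Everything named here is an assumption made for the example: `TeiProfiler` and `XmlProfilerSketch` are hypothetical classes (the latter is not the library's XmlProfiler), TEI is chosen only as a sample subformat, and `Profile`, `ProfilingException` and `Profiler` are simplified stand-ins so the code compiles on its own. Detection uses the JDK's StAX parser to read the root element.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-ins for the library's types.
class Profile {
    final String mediaType; // assumed field, for illustration only
    Profile(String mediaType) { this.mediaType = mediaType; }
}

class ProfilingException extends Exception {
    ProfilingException(String message) { super(message); }
}

interface Profiler {
    List<Profile> profile(File file) throws IOException, ProfilingException;
}

// Step 1: the specialized profiler in its own class.
// It reports a profile only when the root element is named "TEI".
class TeiProfiler implements Profiler {
    @Override
    public List<Profile> profile(File file) throws IOException, ProfilingException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Input files are untrusted: switch off DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try (InputStream in = new FileInputStream(file)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // Only the root element is inspected.
                        if ("TEI".equals(reader.getLocalName())) {
                            return Collections.singletonList(new Profile("application/tei+xml"));
                        }
                        return Collections.emptyList();
                    }
                }
                return Collections.emptyList();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new ProfilingException("could not parse XML: " + e.getMessage());
        }
    }
}

// Step 2: the more general profiler calls the specialized one from profile().
class XmlProfilerSketch implements Profiler {
    private final Profiler teiProfiler = new TeiProfiler();

    @Override
    public List<Profile> profile(File file) throws IOException, ProfilingException {
        List<Profile> specific = teiProfiler.profile(file); // the inserted call
        if (!specific.isEmpty()) {
            return specific;
        }
        return Collections.singletonList(new Profile("application/xml"));
    }
}
```

Returning an empty list from the specialized profiler, rather than throwing, lets the general profiler fall back to its own answer when the subformat does not match.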

Contributors

emanueldima, proycon
