:eyeglasses: NodeJS BIG-DATA analyser & transformer clustered processing chain

License: MIT License

JavaScript 99.90% Smarty 0.10%
nodejs xpath pdf xml extract-keywords analyzer multithread javascript poppler pdfjs

Sisyphe

Sisyphe is a simple NodeJS application that recursively analyses folders. It is organised as a git monorepo managed with lerna.

It reports information about the files it analyses; check here for details

Requirements

Tested with [email protected], [email protected]

Works on Linux/OSX/Windows

Example of running a quick local Redis instance with Docker:

docker run --name sisyphe-redis -p 6379:6379 redis:3.2.6

Install it

  1. Download the latest Sisyphe version
  2. Run npm install (this also runs an npm postinstall script)
  3. ... that's it.

Test

npm run test will test sisyphe & its workers

Help

./app.js --help prints the help

Options

-V, --version               output the version number
-n, --corpusname <name>     Corpus name (session name)
-s, --select <name>         Choose modules for the analysis
-c, --config-dir <path>     Configuration folder path
-t, --thread <number>       Number of processes sisyphe will use
-b, --bundle <number>       Group jobs into bundles of jobs
-r, --remove-module <name>  Remove module name from the workflow
-q, --quiet                 Silence output
-l, --list                  List all available workers
-h, --help                  output usage information
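The options above can also be assembled from a script. The following is a minimal sketch, not part of Sisyphe itself: `buildSisypheArgs` and `runSisyphe` are hypothetical helper names, only flags listed in the help output are used, and passing `skeeft` in lower case to `-r` is an assumption about how module names are written.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: builds the argument list for app.js from an options object.
// Only flags that appear in the help output are emitted.
function buildSisypheArgs(options, corpusPath) {
  if (!corpusPath) {
    throw new Error('A corpus folder path is required');
  }
  const args = [];
  if (options.corpusname) args.push('-n', options.corpusname);
  if (options.configDir) args.push('-c', options.configDir);
  if (options.thread !== undefined) args.push('-t', String(options.thread));
  if (options.removeModule) args.push('-r', options.removeModule);
  if (options.quiet) args.push('-q');
  // The folder to analyse comes last, as in the usage examples.
  args.push(corpusPath);
  return args;
}

// Hypothetical helper: starts "node app <args>" from the Sisyphe folder.
// sisypheDir is assumed to be the folder that contains app.js.
function runSisyphe(sisypheDir, options, corpusPath) {
  const args = buildSisypheArgs(options, corpusPath);
  return spawn(process.execPath, ['app', ...args], {
    cwd: sisypheDir,
    stdio: 'inherit',
  });
}

// Build the arguments only; nothing is started here.
console.log(buildSisypheArgs({ corpusname: 'sessionName', thread: 4 }, '/data/corpus'));
```

Keeping the argument building separate from the process start makes it easy to check the command line before launching a long analysis.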

How does it work?

Just start Sisyphe on a folder with any files in it.

node app -n sessionName ~/Documents/customfolder/corpus

node app -n sessionName -c ~/Documents/customfolder/folderResources ~/Documents/customfolder/session

Sisyphe now works in the background, using all your computer's threads. Grab a coffee and wait; it will notify you when it's done :)

Sisyphe writes its results to sisyphe/out/{timestamp}-corpusname/ (errors, info, duration...)
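To pick up the results of the most recent run from a script, something like the following sketch could be used. It assumes only that output folders end with `-` followed by the corpus name, as in the path pattern above; the timestamp format is not documented here, so the sketch sorts by modification time instead of parsing it. `findLatestOutputDir` is a hypothetical helper name.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the most recently modified folder in outRoot
// whose name ends with "-<corpusname>", or null if there is none.
function findLatestOutputDir(outRoot, corpusname) {
  if (!fs.existsSync(outRoot)) return null;
  const candidates = fs
    .readdirSync(outRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && entry.name.endsWith(`-${corpusname}`))
    .map((entry) => {
      const fullPath = path.join(outRoot, entry.name);
      return { fullPath, mtimeMs: fs.statSync(fullPath).mtimeMs };
    })
    // Newest first.
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  return candidates.length > 0 ? candidates[0].fullPath : null;
}

// Prints null when no matching folder exists yet.
console.log(findLatestOutputDir(path.join('sisyphe', 'out'), 'sessionName'));
```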

Interface

For a control panel and a fully integrated app, go to Sisyphe-monitor. Sisyphe also has a server that lets you control it and get more information about its execution. Run the server with npm run server to access these features.

Modules

These are the default modules (focused on xml & pdf).

  • FILETYPE Will detect the MIME type, extension, corrupted files...
  • PDF Will get information from PDF files (version, author, metadata...)
  • XML Will check that files are well-formed and valid against their DTDs, and get elements from tags...
  • LANG Will detect the language of files (XML, text files...)
  • XPATH Will generate a complete list of XPaths from the submitted folder
  • OUT Will export data to a JSON file and an ElasticSearch database
  • NB Tries to assign categories to an XML document using its abstract
  • MULTICAT Tries to assign categories to an XML document using its identifiers
  • TEEFT Tries to extract keywords from a full text
  • SKEEFT Tries to extract keywords from a structured full text using the teeft algorithm and the text structure

Development on workers

When you work on a worker, just:

  • Commit your changes as usual
  • Run npm run updated (to check which workers have changed)
  • Run npm run publish (it will ask you to change the version of the worker module and publish it to GitHub)

Module information

Some bugs may occur with certain files when using the 'skeeft' module on Windows; please deactivate it until we fix them.
