Giter Site home page Giter Site logo

alexlaforge / inferno Goto Github PK

View Code? Open in Web Editor NEW

This project forked from brigr/inferno

0.0 2.0 0.0 455 KB

InFeRno: An Intelligent Framework for Recognizing Pornographic Web Pages

License: GNU Lesser General Public License v3.0

C++ 98.63% Shell 1.37%

inferno's Introduction

InFeRno is an open-source implementation of the system described in [1]
and demonstrated at ECML/PKDD 2011 [2], available under the terms of GNU
LGPL (v3 or later).

InFeRno currently consists of two components:
a. A C-ICAP module -- srv_inferno -- responsible for prefetching web
   data to a local data store and for maintaining state in a MySQL
   database.
b. An image classification service -- Sead -- operating on files in the
   above data store, classifying them in one of three categories;
   namely, "benign", "bikini", and "porn".

By design, the system makes heavy use of threads:
* C-ICAP is using a process-pool model, where each of the worker
  processes maintains a separate thread pool. Each process instantiates
  a single InFeRno object and then threads from the pool are assigned to
  incoming requests, calling on InFeRno routines to do the task at hand.
* Each InFeRno thread is also using a thread pool of its own (whose size
  can be tweaked via srv_inferno.conf) to parallelize the transfers of
  requested web objects.
* Last, Sead is also using a thread pool for the classification
  requests. As image classification is highly CPU bound, the size of
  this pool is computed automatically, based on the number of processing
  cores in the host system.

All state is stored in a MySQL database table. Although all of the above
threads read/update rows of the same table, our design allows these
operations to be performed safely without any table/row locks, thus
resulting in a very low overhead concerning database I/O operations.

To avoid expensive data conversion/escaping operations, fetched web
objects are stored as files in a Squid-like directory structure, using
the MD5 of the object's URL as the filename. Please, see the section on
run-time dependencies for more on this.

Sead and InFeRno log all messages to syslog (facility: "local1", name:
"InFeRno"), unless built with the --enable-debug configure parameter in
which case all output is redirected to stderr. Several other aspects of
the system can be tweaked through srv_inferno.conf. Please refer to the
comments in said file for more information.

InFeRno is work-in-progress, in the sense that it is constantly updated
and tweaked, and is provided as-is.

For any bug reports, feature requests, or other comments, please contact
the authors.

[1] http://www.cs.uoi.gr/tech_reports/publications/TR-2011-09.pdf
[2] http://www.ecmlpkdd2011.org/acceptedDemos.php


+----------------------------------------------------------------------+
+ InFeRno build-time dependencies                                      +
+----------------------------------------------------------------------+

In order to build InFeRno, you will need the following packages (Ubuntu
package names in parentheses):
- GNU C++ compiler          (g++)
- GNU Automake              (automake)
- GNU Autoconf              (autoconf)
- GNU Libtool               (libtool)
- GNU make                  (make)
- XML/HTML parser library   (libxml2-dev)
- MySQL client library      (libmysqlclient-dev)
- URI parser library        (liburiparser-dev)
- C-ICAP API library        (libicapapi-dev)
- OpenSSL library           (libssl-dev)
- CURL library              (libcurl3)
- Croco library             (libcroco3-dev)
- LibSVM 3.1+               (libsvm-dev)
- OpenCV 2.1.0+             (libcv-dev)
- OpenCV Contrib            (libopencv-contrib-dev)
- HighGUI library           (libhighgui-dev)

NOTE: We recommend that you compile and install your own build of the
latest Open Computer Vision framework (http://opencv.sourceforge.net/).
Some Linux distributions provide development packages of older versions
of this library (e.g., Ubuntu bundles OpenCV 2.1.0). Although we have
tested and verified our system with some of them, we recommend that you
go with the 2.3.0 bundle.

+----------------------------------------------------------------------+
+ InFeRno run-time dependencies                                        +
+----------------------------------------------------------------------+

To run InFeRno, you will also need:

1. MySQL:

   It is mandatory that you have a server running MySQL with InFeRno's
   database schema (see doc/schema.sql), accessible by the C-ICAP module
   and Sead. The credentials can be configured through srv_inferno.conf.

2. Spool directory:

   As already mentioned, InFeRno spools and caches downloaded data in a
   Squid-like directory hierarchy. You will need to define a directory
   in srv_inferno.conf, making sure C-ICAP and Sead have full (rwx)
   access on it.

   For better performance it is advised that the spool directory resides
   on a memory/swap-backed storage device (aka. ramdisk). An easy way of
   accomplishing this in Linux is by using the tmpfs file system (see
   mount(8) for more information).

   When operating in this mode, you will have to make sure that either
   the cache database is emptied if the tmpfs volume is unmounted, or
   take extra steps to store cached data on a disk-backed volume and
   restore them in the tmpfs volume when it is remounted.

3. C-ICAP:

   InFeRno is implemented as a C-ICAP module, so a C-ICAP server is also
   mandatory. See the bundled srv_inferno.conf configuration file for
   ways of enabling InFeRno within C-ICAP and for relevant configuration
   parameters.

4. ICAP-enabled web proxy server:

   Last, unless you have an ICAP-capable web client, you will also need
   a web proxy cache to act as an intermediate between web clients and
   C-ICAP. Squid3 (squid3 v3.1.11) was used during the development of
   InFeRno, but any other ICAP-enabled web proxy server should be fine.
   InFeRno is built to work on the request/pre-cache path (that is, in
   the reqmod_precache vectoring point) and does not support PREVIEW
   requests. For more information on configuring Squid for use with an
   ICAP service, please refer to the documentation of Squid.

--

Copyright (C) 2011:
* Nikos Ntarmos <[email protected]>
* Sotirios Karavarsamis <[email protected]>

inferno's People

Contributors

brigr avatar ntarmos avatar ppaulojr avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.