
kiwix-apache's Introduction

Kiwix

Kiwix is an offline reader for Web content. It is especially intended to make Wikipedia available offline. This is done by reading the content of the project stored in the ZIM file format, a highly compressed open format with additional metadata.

COMPILATION INSTRUCTIONS

Note that the sections below are organized by target platform. If you want to build Kiwix for Android on a GNU/Linux system, you should follow the instructions in the "Android" section.

GNU/Linux

Install the prerequisites in your distro, e.g. on Debian-based systems:

sudo apt-get install zip pkg-config aptitude
sudo apt-get install libmagic-dev
sudo aptitude install libtool automake

Static (Probably what you need to do if you don't know)

Kiwix uses shared libraries only. A static build of Kiwix is a build which packages the dependencies. Command line tools (indexer, server, etc) are compiled statically.

Run automake

./autogen.sh

Run autoconf

./configure --enable-compileall --enable-staticbins --disable-android;

Download all dependencies

cd src/dependencies;
make;
cd ../..;

Reconfigure with dependencies (Gecko version)

./configure --enable-compileall --enable-staticbins --disable-android;

Compile

make;

Create a tarball suitable for distribution (no deps)

make diststatic
  • If you want to be able to run directly from your dev repository, copy the "xulrunner" directory from the distribution tarball to the "kiwix" sub-directory of your development tree.

  • Don't try to install with "make install". If you want to have Kiwix installed on your system, just copy the directory into a /usr/local/bin directory.

Dynamic

The GNU/Linux build uses shared libraries by default. You will need the following dependencies (including their -dev packages) to compile Kiwix for Linux (this list can vary slightly depending on the GNU/Linux distribution):

  • g++
  • gcc
  • autoconf
  • automake
  • libtool
  • pkg-config
  • liblzma
  • libicu
  • libmicrohttpd
  • zlib
  • libcrypto++
  • cmake
  • wget
  • aria2
  • libuuid
  • libssl
  • libzim
  • libpugixml
  • libctpp2
  • xulrunner

Debian is the only distribution that natively provides all of these packages. Ubuntu provides most of them, but not xulrunner and zimlib. You will have to download and install those separately, or run configure --with-static-dep=SELF. Then run the following commands:

Run automake

./autogen.sh;

Configure the compilation; check --help for options. Most dependencies accept --with-dep=XX and --with-static-dep=XX.a. Static versions of the libraries are used for building static binaries (server, indexer, etc.). Use --with-dep=SELF or --with-static-dep=SELF to trigger a fetch and build of the dependency.

./configure;

Compile

make;

Optionally install on the filesystem

sudo make install;

Mac OSX Universal

WARNING: To build the Mac OS version you will have to install proprietary software, which is free of charge. You will also need to build on an Apple Mac computer.

Configure Macports

Install the following tools and libraries

sudo port install autogen +universal pkgconfig +universal wget +universal gmake +universal coreutils +universal libidl +universal autoconf213 +universal icu +universal;

NOTE: the following commands seem to work better on OS X Yosemite; apparently 'universal' is less relevant there, and various packages are no longer available as universal options.

sudo port  install autogen pkgconfig  wget  gmake  coreutils  libidl  autoconf213  icu;
sudo port -v install aclocal automake libtool autoconf cmake imagemagick
./autogen.sh alt && ./configure --enable-compileall;
cd src/dependencies && make;
make clean;
./autogen.sh alt && ./configure --enable-compileall --disable-dependency-tracking --with-target-arch=i386;
make;
make distmac;
make clean;
./autogen.sh alt && ./configure --enable-compileall --disable-dependency-tracking --with-target-arch=x86_64;
make;
make distmac;
make universal;

Debugging tips:
  • components file type can be either Mach-O dynamically linked shared library or Mach-O bundle but above instructions will create dynamic libraries.
  • Shared Object (.so files on Linux) have .dylib extension on OSX.
  • use dtruss to inspect program execution like strace.
  • dyldinfo -lazy_bind | dyldinfo -bind components/zimAccessor.dylib |grep zim
  • nm -gm components/zimAccessor.dylib | nm -u | nm -g
  • otool -L libzim.dylib
  • install_name_tool -change @executable_path/../libicuuc.dylib libicuuc.dylib kiwix-serve
  • lldb

Android

Look at android/README

Windows

  • Install Windows XP SP2+

  • Install Visual Studio Express 2010

  • Install 7-zip

  • Install MozillaBuild 1.6

  • Install ActivePerl

  • Install Ruby

  • Install NSIS 2.46

  • Install nsis_locate

  • Install nsis_uac

  • Replace installed UAC.dll by new one.

  • Install all software in default locations.

  • Change your Windows PATH environment variable:

    • 7zip
    • NSIS
    • ruby
    • Perl
    • mozilla-build\msys\bin\
  • Get shell from c:\mozilla-build\start-msvc10.bat

mkdir -p /c/slave/windows-32b
git clone git://git.code.sf.net/p/kiwix/kiwix kiwix
cd kiwix
./autogen.sh alt
./configure --disable-indexer --enable-jar
make win
make windist
make wininstaller

Contact

Email: [email protected] or [email protected]

Jabber: [email protected]

IRC: #kiwix on irc.freenode.net

You can use IRC web interface on http://chat.kiwix.org/

More... http://www.kiwix.org/wiki/Communication

LEGAL & DISCLAIMER

Read 'COPYING' file

kiwix-apache's People

Contributors

julianharty, kelson42, mossroy

kiwix-apache's Issues

Engineering: Improve the CB/CI/Testing of this codebase

Let's start well, and consider ways to improve how we build, integrate and test our work.

  • How can we find ways to test Apache Modules, especially automatically?
  • What can we do to convert the current basic build script into something more capable, refined and useful?

Searched term specific address opening

Is it possible now, or could such a feature be added to Kiwix, so that, once the ZIM file has been configured, visiting a search-term-specific address of the extension (like moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/term.html or moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/index.html?title=A/term.html) opens the proper offline Wiktionary page (as happens when visiting https://en.wiktionary.org/wiki/term on the online Wiktionary)?

This is possible with some online Wiktionary extensions e.g. by visiting moz-extension://67f64f05-efc0-4366-8a29-de881b3aa9ad/main.html#input=term#English (Quick Dictionary extension).

Engineering: Testing performance and reliability of the module

Even early testing of the module indicates there's work to do to improve its reliability and resilience so it's able to cope with sustained and frequent requests.
Here's my initial testing:

  1. Install httping (sudo apt-get install httping)
  2. Explore running the command (see below)
    httping -G -b -s -r -i 0.01 localhost/kiwix/A/index.html

After several thousand requests I noticed the 302 responses were replaced with 200 response codes. I didn't expect this (much as I'd like httping to follow redirects), so I tested the module in Firefox and found the following error waiting for me:
error 24 opening file "/var/www/html/wiktionary_fr_all_2016-11.zim": Too many open files

So it seems there's some useful work to do to improve the file handling, and then the performance, reliability, robustness, resilience, etc. Oh, and improve the tests too :)
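To make the observation above easier to reproduce, here is a rough sketch that records the status code of every response and counts the file descriptors a server process has open. The function names are hypothetical, count_fds relies on /proc and so only works on Linux, and the choice of curl instead of httping is mine; redirects are deliberately not followed.

```shell
#!/bin/sh
# tally_codes: read one HTTP status code per line on stdin and print
# "<code> <count>" for each distinct code, sorted by code.
tally_codes() {
    awk 'NF { seen[$1]++ } END { for (c in seen) print c, seen[c] }' | sort
}

# count_fds: print how many file descriptors process <pid> has open.
# Linux only (reads /proc); needs permission to inspect the process.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# probe: request <url> <n> times and print the status code of each response.
# Uses the curl write-out variable %{http_code}; redirects are not followed.
probe() {
    i=0
    while [ "$i" -lt "$2" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' "$1"
        i=$((i + 1))
    done
}
```

A run such as `probe http://localhost/kiwix/A/index.html 5000 | tally_codes` would show when 302 responses give way to 200. Calling count_fds with the PID of an Apache worker before and after the run may show whether descriptors accumulate, which would be consistent with the "Too many open files" error.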

Engineering: Integrate travis-ci build

Let's at least compile the code. No tests yet and this may be hard to test on travis-ci as we'd need an Apache Server configured and running to do a system test.

Feature: Random articles

It'd be good to add the random article feature/capability to kiwix-apache. This would reflect the behaviour of other Kiwix implementations e.g. the Android app and in kiwix-serve. The feature seems fairly simple to implement especially if it's modeled on the implementation in kiwix-serve.
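The core of the feature is choosing a uniformly random article number. As a sketch of just that step (not the kiwix-serve implementation, and written in shell rather than the module's C++), the hypothetical helper below picks an index in the range 0 to n-1, where n would be the article count of the ZIM file. How the module maps the index to an article and responds is left open here.

```shell
#!/bin/sh
# random_index: print a pseudo-random integer in the range [0, n).
# Reads 4 bytes from /dev/urandom as an unsigned 32-bit number and reduces
# it modulo n. The slight modulo bias is negligible for a "random article".
random_index() {
    r=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
    echo $((r % $1))
}
```

For example, `random_index 1000` prints a number between 0 and 999.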

Engineering: Implement integration/system tests

We've already got a ticket (issue) to implement some unit tests in C++. However these don't test the overall module. httping does, but doesn't check any aspects of the contents served, only the HTTP response code. Here are my outline ideas on adding/improving the testing of this Apache module.

  1. Firstly, it'd be helpful to have the random article feature #9 so we can easily serve some content
  2. Secondly, adding visibility into the runtime state of the module (e.g. by enhancing the ./status method) would improve the ability of the tests to check the health of the server.
  3. Let's have some tests in Selenium, these may well be the 'heavyweight' tests in terms of infrastructure, nonetheless they're well understood in the industry and could be extended by others who wish to check for specific contents (from a known ZIM file).
  4. Perhaps some 'headless' browser tests would also be useful, they're likely to run faster and have a lower overhead than Selenium tests.
  5. Investigate whether we can use memory leak tools such as valgrind as suggested by other members of the Kiwix project team. In addition, explore ways to run long-running, high-volume tests to support #8
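As a lightweight starting point before Selenium or headless browser tests, a check could look at the body as well as the status code. The sketch below is one possible shape for such a check, not an agreed design: check_page is a hypothetical name, and it expects the response headers and body to have been saved to files first, for instance with curl's -D and -o options.

```shell
#!/bin/sh
# check_page <headers_file> <body_file> <expected_text>
# Succeeds only if the response status is 200 and the body contains the
# expected text. Prints a reason and returns non-zero otherwise.
check_page() {
    # The status line looks like "HTTP/1.1 200 OK"; field 2 is the code.
    status=$(awk 'NR == 1 { print $2 }' "$1")
    if [ "$status" != "200" ]; then
        echo "unexpected status: $status"
        return 1
    fi
    # -F: fixed string, -q: quiet; only the exit status matters.
    if ! grep -qF -- "$3" "$2"; then
        echo "expected text not found: $3"
        return 1
    fi
}
```

A test could then run `curl -s -D headers.txt -o body.html <article URL>` followed by `check_page headers.txt body.html "<text known to be in that article>"`, using an article from a known ZIM file.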

Personally, each of these helps me practice and extend my skills, understanding and abilities. For the project they can help us to increase our confidence in this codebase. As AFAIK we don't have any users of this module yet, there's little rush or external pressure to add these tests or capabilities. Let's measure, so we can also improve what we do (and how we do it).

Resolving licensing and permissions for kiwix-apache

I/we would like to make sure we address software licensing and permissions for kiwix-apache so that it's acceptable to users and contributors for this project. Some of the current code is based on the current implementation of kiwix-serve.cpp https://github.com/kiwix/kiwix-tools/blob/master/src/server/kiwix-serve.cpp which is currently released under GPLv3+ as part of establishing the proof-of-concept for the viability of creating an Apache module for kiwix. If we decide and agree that the GPLv3 code can be incorporated and made available as part of kiwix-apache under the current license for kiwix-serve that's fine, otherwise we may need to ask permission and possibly also consider changing the license for kiwix-serve. Another alternative may be to write kiwix-apache afresh, not based on kiwix-serve, however that may lead to additional maintenance and support challenges since we'd need to maintain and update both codebases in parallel. Personally I'd prefer us to establish a compilation unit (i.e. source code) that is common to both kiwix-apache and kiwix-serve and refactor both these servers to share the common code.

I'd like to resolve any licensing and permission issues before we officially 'release' kiwix-apache.

So that we know who we may need to involve in the process, the current contributors to kiwix-serve.cpp are listed as:

@kelson42
@mgautierfr
@kiranmathewkoshy
@rgaudin
@Skylsmoi
@ShivamSarodia
@schuellerf

Note: @Skylsmoi's contributions were made after kiwix-apache was created.

Please see also kiwix/libkiwix#60 regarding the licensing of kiwix-lib and openzim/libzim#30 regarding the licensing of zimlib (used by kiwix-lib).

TBC: Provide an externally callable way to track usage

Various people are keen to see how Kiwix is being used, what's popular (and what's not), etc. This issue is to gather various ideas on how to achieve this in a way that fits the Apache Server ecosystem.

There are many ways we could implement this feature, ranging from an entirely internal module (that tracks the requests it's served), to using existing, other Apache modules (from various sources), to log-file parsing on-the-fly.

  • Internal: will involve using memory and local storage, and formatting the output. The code could probably be modularised and perhaps be useful for other modules.
  • Using existing modules: it's very likely there'll be at least one existing module available, probably as part of the main Apache Server project, that'll help implement the capabilities. We'd need to search for suitable modules, try them and write up our perspective on their suitability and viability.
  • Log-file parsing: This could be fairly compute intensive and risk exposing sensitive server-internals as it'd access files on the server and present information externally. We'd want/need ways to filter out data from other modules. Like the internal approach, perhaps we could write something that'd be generally useful - particularly if we don't find a suitable, existing module. Perhaps a scheduled task could generate the usage statistics e.g. on a daily basis (as webmasters have done for several decades).

For the internal option, perhaps the reporting could be integrated with the ./status request.
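To give the log-file parsing option something concrete to discuss, here is a small sketch of a daily-report style script. It assumes the access log is in Common Log Format and that the module is served under a path prefix such as /kiwix/ (taken from the httping example URL); usage_counts is a hypothetical name. Filtering on the prefix is one way to keep data from other modules out of the report.

```shell
#!/bin/sh
# usage_counts <prefix>: read an access log in Common Log Format on stdin and
# print "<count> <path>" for successful (200) requests whose path starts
# with <prefix>, most requested first.
usage_counts() {
    awk -v prefix="$1" '
        # With whitespace splitting, field 7 is the request path and
        # field 9 is the status code in Common Log Format.
        $9 == 200 && index($7, prefix) == 1 { hits[$7]++ }
        END { for (p in hits) print hits[p], p }
    ' | sort -k1,1nr -k2,2
}
```

A scheduled task could run something like `usage_counts /kiwix/ < access.log`, where the location of the log file depends on the Apache configuration.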

Build a deb package

... and publish it to Debian. This is important to get this module adopted by many users.

Content-Type sometimes incorrectly set in HTTP responses

As Mossroy kindly noted during his detailed testing, the HTTP Content-Type sometimes seems to be set incorrectly. Sometimes the last letter is missing, e.g. it's set to text/htm rather than text/html; other times it's seemingly set correctly; and occasionally it seems to contain random text. All of this indicates a classic C-style memory issue.
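Because the problem is intermittent, it may help to collect the Content-Type of many responses and list only the ones that look wrong. The sketch below does that comparison; it does not fix the underlying bug, and unexpected_types is a hypothetical helper name. It expects one Content-Type value per line, which curl can produce with its %{content_type} write-out variable.

```shell
#!/bin/sh
# unexpected_types <expected>: read one Content-Type value per line on stdin
# and print every value whose media type differs from <expected>.
# Parameters such as "; charset=utf-8" are ignored in the comparison.
unexpected_types() {
    awk -v want="$1" '
        {
            sub(/\r$/, "")               # drop a trailing carriage return
            type = $0
            sub(/[ \t]*;.*$/, "", type)  # strip parameters
            if (tolower(type) != tolower(want)) print $0
        }
    '
}
```

For instance, repeating `curl -s -o /dev/null -w '%{content_type}\n' <article URL>` in a loop and piping the output to `unexpected_types text/html` would print values such as text/htm whenever they occur.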

Gameplan for implementing a ZIM module for Apache

Now that we have a p-o-c comprising a simple C++ file that also integrates and uses zimlib we're able to start sketching out an approach to implementing a module that's useful within the context of serving content from ZIM files via Apache Server. Here's the first proposal to enhance the current p-o-c. Our aim is to get a more functional alpha prototype as the next stage. It's unlikely to be production-ready.

No doubt the actual implementation steps and the functionality will change. That's fine, we'll iterate and learn by doing.

  1. Devise an interface for URL requests to navigate and serve the contents from a ZIM file. We'll investigate what we can reuse from kiwix-serve e.g. perhaps the protocol and URL parameters can be replicated? perhaps some of the code can be repurposed, etc.
  2. Switch to using kiwixlib rather than zimlib. I understand kiwixlib offers various useful improvements.
  3. Implement one or two basic capabilities e.g. /random so we can see the effects of serving content and rendering it in a web browser.
  4. Decide and then implement additional functionality for an alpha.
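As a talking point for step 1, the sketch below splits a request path into a ZIM namespace and an article URL. The layout prefix/namespace/article is an assumption inferred from the example URL localhost/kiwix/A/index.html used in testing, not a decided interface, and split_zim_path is a hypothetical name. The real module would do this in C++.

```shell
#!/bin/sh
# split_zim_path <prefix> <request_path>
# Print "<namespace> <article_url>" for a path of the assumed form
# <prefix>/<namespace>/<article_url>; return 1 if the path does not match.
split_zim_path() {
    case "$2" in
        "$1"/?*/?*) ;;
        *) return 1 ;;
    esac
    rest=${2#"$1"/}        # drop the prefix and the following slash
    namespace=${rest%%/*}  # text before the first slash
    article=${rest#*/}     # everything after the first slash
    printf '%s %s\n' "$namespace" "$article"
}
```

With these assumptions, `split_zim_path /kiwix /kiwix/A/index.html` prints "A index.html", and a path without a namespace and article part is rejected.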

There are some technical challenges too. These include addressing potential limitations of using static libraries which may reduce portability of the code.

Target platforms include:

  • Debian (we're developing the code in Debian, so it'd better work on Debian)
  • Raspberry Pi (useful for field deployments)
  • Windows XP/Vista/7 (as a possible candidate for projects such as RACHEL-USB)
