kiwix-apache's Issues

Engineering: Improve the CB/CI/Testing of this codebase

Let's start well, and consider ways to improve how we build, integrate and test our work.

  • How can we find ways to test Apache Modules, especially automatically?
  • What can we do to convert the current basic build script into something more capable, refined and useful?

Searched term specific address opening

Is this possible now, or could such a feature be added to Kiwix: after configuring the ZIM file once, visiting any search-term-specific address of the extension (like moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/term.html or moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/index.html?title=A/term.html) would open the corresponding offline Wiktionary page, just as visiting https://en.wiktionary.org/wiki/term does on the online Wiktionary?

This is possible with some online Wiktionary extensions e.g. by visiting moz-extension://67f64f05-efc0-4366-8a29-de881b3aa9ad/main.html#input=term#English (Quick Dictionary extension).

Build a deb package

... and publish it to Debian. This is important to get this module adopted by many users.

Engineering: Testing performance and reliability of the module

Even early testing of the module indicates there's work to do to improve its reliability and resilience so it's able to cope with sustained and frequent requests.
Here's my initial testing:

  1. Install httping (sudo apt-get install httping)
  2. Explore running the command (see below)
    httping -G -b -s -r -i 0.01 localhost/kiwix/A/index.html

After several thousand requests I noticed the 302 responses stopped and were replaced with 200 response codes. I didn't expect this (much as I'd like httping to follow redirects), so I tested the module in Firefox and found the following error waiting for me.
error 24 opening file "/var/www/html/wiktionary_fr_all_2016-11.zim": Too many open files

So it seems there's some useful work to do to improve the file handling, and then the performance, reliability, robustness, resilience, etc. Oh, and improve the tests too :)
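
To make the point where the responses change easier to spot, the httping run above could be complemented by a small script. This is only a rough sketch: the function names probe and first_status_change are made up for illustration, and it assumes the module is reachable on localhost port 80. http.client doesn't follow redirects, so a 302 stays visible as a 302.

```python
import http.client
from typing import Iterable, Iterator, Optional, Tuple


def probe(host: str, path: str, count: int, port: int = 80) -> Iterator[int]:
    """Yield the status code of `count` GET requests; redirects are not followed."""
    for _ in range(count):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body before closing the connection
            yield response.status
        finally:
            conn.close()


def first_status_change(codes: Iterable[int]) -> Optional[Tuple[int, int, int]]:
    """Return (request_index, old_code, new_code) for the first change, else None."""
    previous = None
    for index, code in enumerate(codes):
        if previous is not None and code != previous:
            return index, previous, code
        previous = code
    return None
```

For example, first_status_change(probe("localhost", "/kiwix/A/index.html", 5000)) would report the request number at which the status code first differs from the one before it, or None if it never changes.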

Engineering: Integrate travis-ci build

Let's at least compile the code. No tests yet and this may be hard to test on travis-ci as we'd need an Apache Server configured and running to do a system test.

Engineering: Implement integration/system tests

We've already got a ticket (issue) to implement some unit tests in C++. However these don't test the overall module. httping does, but doesn't check any aspects of the contents served, only the HTTP response code. Here are my outline ideas on adding/improving the testing of this Apache module.

  1. Firstly, it'd be helpful to have the random article feature #9 so we can easily serve some content
  2. Secondly, adding visibility into the runtime state of the module (e.g. by enhancing the ./status method) would improve the ability of the tests to check the health of the server.
  3. Let's have some tests in Selenium. These may well be the 'heavyweight' tests in terms of infrastructure; nonetheless they're well understood in the industry and could be extended by others who wish to check for specific contents (from a known ZIM file).
  4. Perhaps some 'headless' browser tests would also be useful; they're likely to run faster and have a lower overhead than Selenium tests.
  5. Investigate whether we can use memory leak tools such as valgrind as suggested by other members of the Kiwix project team. In addition, explore ways to run long-running, high-volume tests to support #8
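
Before we get as far as Selenium, a very light content check could be scripted. Here's a minimal sketch, assuming the server is running and that we know some text that should appear in a page from a known ZIM file; the names fetch and find_problems are hypothetical. Note that urllib follows redirects and raises an HTTPError for 4xx/5xx responses.

```python
import urllib.request
from typing import List, Tuple


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, bytes]:
    """Return (status, body) for url; redirects are followed by urllib."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read()


def find_problems(status: int, body: bytes, expected_text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the page passed."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    if expected_text.encode("utf-8") not in body:
        problems.append(f"expected text {expected_text!r} not found in body")
    return problems
```

A test would call fetch on an article URL and pass the result to find_problems together with text we expect in that article. Keeping the check separate from the fetch means the check itself can be exercised without a running server.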

Personally, each of these helps me practice and extend my skills, understanding and abilities. For the project they can help us to increase our confidence in this codebase. As AFAIK we don't have any users of this module yet, there's little rush or external pressure to add these tests or capabilities. Let's measure, so we can also improve what we do (and how we do it).

Gameplan for implementing a ZIM module for Apache

Now that we have a p-o-c comprising a simple C++ file that also integrates and uses zimlib we're able to start sketching out an approach to implementing a module that's useful within the context of serving content from ZIM files via Apache Server. Here's the first proposal to enhance the current p-o-c. Our aim is to get a more functional alpha prototype as the next stage. It's unlikely to be production-ready.

No doubt the actual implementation steps and the functionality will change. That's fine, we'll iterate and learn by doing.

  1. Devise an interface for URL requests to navigate and serve the contents from a ZIM file. We'll investigate what we can reuse from kiwix-serve, e.g. perhaps the protocol and URL parameters can be replicated, perhaps some of the code can be repurposed, etc.
  2. Switch to using kiwixlib rather than zimlib. I understand kiwixlib offers various useful improvements.
  3. Implement one or two basic capabilities e.g. /random so we can see the effects of serving content and rendering it in a web browser.
  4. Decide and then implement additional functionality for an alpha.
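
To help the discussion of step 1, here's a rough sketch of how a request path might be split up. It's written in Python purely to illustrate the idea, not as module code. It assumes a /kiwix prefix and a single-character namespace followed by the article path, as in the /kiwix/A/index.html URL used in testing; the Request type and parse_request function are hypothetical, and the real interface may well end up different once we've looked at kiwix-serve.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    action: str                 # "random", "article" or "unknown"
    namespace: Optional[str]    # e.g. "A"; only set for articles
    article: Optional[str]      # e.g. "index.html"; only set for articles


def parse_request(path: str, prefix: str = "/kiwix") -> Request:
    """Classify a request path that sits below the (assumed) module prefix."""
    if not path.startswith(prefix + "/"):
        return Request("unknown", None, None)
    rest = path[len(prefix) + 1:]
    if rest == "random":
        return Request("random", None, None)
    namespace, separator, article = rest.partition("/")
    if len(namespace) == 1 and separator and article:
        return Request("article", namespace, article)
    return Request("unknown", None, None)
```

With this, /kiwix/random maps to the random action and /kiwix/A/index.html maps to an article in namespace A; anything else is reported as unknown so the module can decline the request.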

There are some technical challenges too. These include addressing potential limitations of using static libraries which may reduce portability of the code.

Target platforms include:

  • Debian (we're developing the code in Debian, so it'd better work on Debian)
  • Raspberry Pi (useful for field deployments)
  • Windows XP/Vista/7 (as a possible candidate for projects such as RACHEL-USB)

Content-Type sometimes incorrectly set in HTTP responses

As Mossroy kindly noted during his detailed testing, the HTTP Content-Type seems to be set incorrectly at times. Sometimes the last letter is missing, e.g. it's set to text/htm rather than text/html; other times it's seemingly set correctly, and occasionally it seems to contain random text. All of these indicate a classic C-style memory issue.
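
Until the underlying bug is found, a test could at least catch the symptom by comparing the header against the type we expect for a given article. A small sketch, with hypothetical helper names; it ignores parameters such as charset and compares case-insensitively.

```python
def media_type(header_value: str) -> str:
    """Return the lower-cased type/subtype part of a Content-Type header value."""
    return header_value.split(";", 1)[0].strip().lower()


def content_type_matches(header_value: str, expected: str) -> bool:
    """True when the header's media type equals the expected one."""
    return media_type(header_value) == expected.strip().lower()
```

Running this check over many requests for the same page should show how often the truncated or garbled values turn up.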

Resolving licensing and permissions for kiwix-apache

I/we would like to address software licensing and permissions for kiwix-apache so that the project is acceptable to its users and contributors. As part of establishing the proof-of-concept for the viability of an Apache module for Kiwix, some of the current code was based on the current implementation of kiwix-serve.cpp https://github.com/kiwix/kiwix-tools/blob/master/src/server/kiwix-serve.cpp, which is currently released under GPLv3+. If we decide and agree that the GPLv3 code can be incorporated and made available as part of kiwix-apache under the current license for kiwix-serve, that's fine. Otherwise we may need to ask permission and possibly also consider changing the license for kiwix-serve. Another alternative may be to write kiwix-apache afresh, not based on kiwix-serve; however, that may lead to additional maintenance and support challenges since we'd need to maintain and update both codebases in parallel. Personally, I'd prefer us to establish a compilation unit (i.e. source code) that is common to both kiwix-apache and kiwix-serve, and to refactor both servers to share the common code.

I'd like to resolve any licensing and permission issues before we officially 'release' kiwix-apache.

So that we know who we may need to involve in the process, the current contributors to kiwix-serve.cpp are listed as:

@kelson42
@mgautierfr
@kiranmathewkoshy
@rgaudin
@Skylsmoi
@ShivamSarodia
@schuellerf

Note: @Skylsmoi contributions are after kiwix-apache was created.

Please see also kiwix/libkiwix#60 regarding the licensing of kiwix-lib and openzim/libzim#30 regarding the licensing of zimlib (used by kiwix-lib).

TBC: Provide an externally callable way to track usage

Various people are keen to see how Kiwix is being used, what's popular (and what's not), etc. This issue is to gather various ideas on how to achieve this congruently in the Apache Server microcosm.

There are many ways we could implement this feature, ranging from an entirely internal module (that tracks the requests it's served), to using existing, other Apache modules (from various sources), to log-file parsing on-the-fly.

  • Internal: will involve using memory and local storage, and formatting the output. The code could probably be modularised and perhaps be useful for other modules.
  • Using existing modules: it's very likely there'll be at least one existing module available, probably as part of the main Apache Server project, that'll help implement the capabilities. We'd need to search for suitable modules, try them and write up our perspective on their suitability and viability.
  • Log-file parsing: This could be fairly compute intensive and risk exposing sensitive server-internals as it'd access files on the server and present information externally. We'd want/need ways to filter out data from other modules. Like the internal approach, perhaps we could write something that'd be generally useful - particularly if we don't find a suitable, existing module. Perhaps a scheduled task could generate the usage statistics e.g. on a daily basis (as webmasters have done for several decades).

For the internal option, perhaps the reporting could be integrated with the ./status request.
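
To give a feel for the log-file option, here's a minimal sketch of a scheduled-task style counter. It assumes Apache's common/combined access log layout and a /kiwix/ URL prefix; the function name count_article_hits is made up. Only successful (200) GET/HEAD requests under the prefix are counted, which also filters out traffic for other modules.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line and the status code that follows it.
REQUEST_PATTERN = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_article_hits(lines: Iterable[str], prefix: str = "/kiwix/") -> Counter:
    """Count successful requests per path for paths below `prefix`."""
    hits = Counter()
    for line in lines:
        match = REQUEST_PATTERN.search(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path")
        if path.startswith(prefix):
            hits[path] += 1
    return hits
```

Passing an open access log file to count_article_hits and calling most_common(10) on the result would list the ten most requested pages. Since only request paths are reported, client addresses and other server internals stay out of the output.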

Feature: Random articles

It'd be good to add the random article feature/capability to kiwix-apache. This would reflect the behaviour of other Kiwix implementations e.g. the Android app and in kiwix-serve. The feature seems fairly simple to implement especially if it's modeled on the implementation in kiwix-serve.
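
As a sketch of the idea only (this is not how kiwix-serve does it, and the names are hypothetical): given a list of article URLs from the ZIM file, pick one at random and redirect the browser to it. The /kiwix/ prefix is an assumption taken from the test URL used elsewhere in these issues.

```python
import random
from typing import Optional, Sequence


def random_article_redirect(
    article_urls: Sequence[str],
    prefix: str = "/kiwix/",
    rng: Optional[random.Random] = None,
) -> str:
    """Return the Location value for a redirect to a randomly chosen article."""
    if not article_urls:
        raise ValueError("no articles available")
    chooser = rng or random  # allow a seeded generator to be injected for tests
    return prefix + chooser.choice(article_urls)
```

The module would then answer a /random request with a 302 and this value in the Location header, which fits the 302 responses already seen in testing.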
