kiwix / kiwix-apache
[ARCHIVED] An Apache module for ZIM files
License: Apache License 2.0
Let's start well, and consider ways to improve how we build, integrate and test our work.
Is it possible now, or could such a feature be added to Kiwix, so that, once the ZIM file has been configured, visiting any search-term-specific address of the extension (like moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/term.html or moz-extension://4bdb0fa0-b097-4e0d-838d-c0bb566f3a71/www/index.html?title=A/term.html) opens the proper offline Wiktionary page (as happens when visiting https://en.wiktionary.org/wiki/term on the online Wiktionary)?
This is possible with some online Wiktionary extensions e.g. by visiting moz-extension://67f64f05-efc0-4366-8a29-de881b3aa9ad/main.html#input=term#English (Quick Dictionary extension).
... and publish it to Debian. This is important to get this module adopted by many users.
This is a minor enhancement, so we don't try to display redirects (which fail with an exception on getMimeType())
Even early testing of the module indicates there's work to do to improve its reliability and resilience so it's able to cope with sustained and frequent requests.
Here's my initial testing:
sudo apt-get install httping
httping -G -b -s -r -i 0.01 localhost/kiwix/A/index.html
After several thousand requests I noticed the 302 responses changed and were replaced with 200 response codes. I didn't expect this (much as I'd like httping to follow redirects), so I tested the module in Firefox and found the following error waiting for me.
error 24 opening file "/var/www/html/wiktionary_fr_all_2016-11.zim": Too many open files
So it seems there's some useful work to do to improve the file handling, and then the performance, reliability, robustness, resilience, etc. Oh, and improve the tests too :)
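One plausible cause of the "Too many open files" error is that the archive is opened again on every request and never closed. That is an assumption, not something confirmed by the testing above. Under that assumption, a minimal sketch of a fix is to open each archive once and share the handle between requests. `Archive`, `ArchiveCache` and the opener callback are hypothetical names used only for illustration; the real module would hold a zimlib or kiwixlib object instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an open ZIM archive; the real module would hold
// a zimlib/kiwixlib object here instead.
struct Archive {
    explicit Archive(std::string p) : path(std::move(p)) {}
    std::string path;
};

// Opens each archive at most once and hands out shared handles afterwards,
// so the number of open files no longer grows with the number of requests.
class ArchiveCache {
public:
    using Opener = std::function<std::shared_ptr<Archive>(const std::string&)>;

    explicit ArchiveCache(Opener opener) : opener_(std::move(opener)) {}

    std::shared_ptr<Archive> get(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(path);
        if (it != cache_.end()) {
            return it->second;  // reuse the archive that is already open
        }
        std::shared_ptr<Archive> archive = opener_(path);  // first use: open once
        assert(archive && "opener must return a valid archive");
        cache_.emplace(path, archive);
        return archive;
    }

    // Number of distinct archives currently held open by the cache.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    Opener opener_;
    mutable std::mutex mutex_;
    std::map<std::string, std::shared_ptr<Archive>> cache_;
};
```

With a process-based Apache MPM, each worker process would keep its own cache, so the open-file count would scale with the number of processes rather than the number of requests.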
Let's at least compile the code. No tests yet and this may be hard to test on travis-ci as we'd need an Apache Server configured and running to do a system test.
We've already got a ticket (issue) to implement some unit tests in C++. However these don't test the overall module. httping does, but doesn't check any aspects of the contents served, only the HTTP response code. Here are my outline ideas on adding/improving the testing of this Apache module.
A ./status method would improve the ability of the tests to check the health of the server.
valgrind, as suggested by other members of the Kiwix project team. In addition, explore ways to run long-running, high-volume tests to support #8.
Personally, each of these helps me practice and extend my skills, understanding and abilities. For the project they can help us to increase our confidence in this codebase. As far as I know we don't have any users of this module yet, so there's little rush or external pressure to add these tests or capabilities. Let's measure, so we can also improve what we do (and how we do it).
Break the need to compile kiwix lib locally.
Now that we have a p-o-c comprising a simple C++ file that also integrates and uses zimlib we're able to start sketching out an approach to implementing a module that's useful within the context of serving content from ZIM files via Apache Server. Here's the first proposal to enhance the current p-o-c. Our aim is to get a more functional alpha prototype as the next stage. It's unlikely to be production-ready.
No doubt the actual implementation steps and the functionality will change. That's fine, we'll iterate and learn by doing.
Use kiwixlib rather than zimlib. I understand kiwixlib offers various useful improvements.
There are some technical challenges too. These include addressing potential limitations of using static libraries which may reduce portability of the code.
Target platforms include:
It'd be great to start implementing unit tests for various functions we're using. Let's find at least one unit testing framework for C++ and try it. To try it we'll need some tests, so let's write at least one. Some code restructuring may also help.
As Mossroy kindly noted during his detailed testing, the HTTP Content-Type seemed to be set incorrectly at times. Sometimes the last letter is missing, e.g. it's set to text/htm rather than text/html; other times it's seemingly set correctly, and occasionally it seems to contain random text. All of these indicate a classic C-style memory issue.
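Symptoms like these are typical of a pointer that outlives the string it points into. The sketch below assumes, without confirmation from the module's source, that the MIME type is returned as a temporary `std::string` and that only a raw pointer to its buffer is kept. `lookupMimeType`, `Response` and `setContentType` are hypothetical names; the point is simply that the response owns its own copy of the value.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the library call that reports an article's MIME
// type. It returns the string by value, i.e. as a temporary.
std::string lookupMimeType(const std::string& url) {
    auto endsWith = [&url](const std::string& suffix) {
        return url.size() >= suffix.size() &&
               url.compare(url.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
    if (endsWith(".html")) return "text/html";
    if (endsWith(".css")) return "text/css";
    return "application/octet-stream";
}

// Hypothetical response object that owns its header values.
struct Response {
    std::string content_type;
};

// Risky pattern (shown only as a comment, never do this):
//   const char* type = lookupMimeType(url).c_str();  // temporary destroyed here
//   ... use `type` later ...                         // dangling pointer
//
// Safer pattern: copy the value into storage that lives as long as the
// response does.
void setContentType(Response& response, const std::string& url) {
    response.content_type = lookupMimeType(url);
    assert(!response.content_type.empty());
}
```

In an Apache handler the equivalent step would likely be to duplicate the string into the request pool with `apr_pstrdup(r->pool, ...)` before passing it to `ap_set_content_type`, since the pointer has to stay valid until the response has been sent.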
I/we would like to make sure we address software licensing and permissions for kiwix-apache so that it's acceptable to users and contributors for this project. To establish the proof-of-concept for the viability of an Apache module for Kiwix, some of the current code was based on the current implementation of kiwix-serve.cpp (https://github.com/kiwix/kiwix-tools/blob/master/src/server/kiwix-serve.cpp), which is currently released under GPLv3+. If we decide and agree that the GPLv3 code can be incorporated and made available as part of kiwix-apache under the current license for kiwix-serve, that's fine. Otherwise we may need to ask permission and possibly also consider changing the license for kiwix-serve. Another alternative may be to write kiwix-apache afresh, not based on kiwix-serve; however, that may lead to additional maintenance and support challenges, since we'd need to maintain and update both codebases in parallel. Personally, I'd prefer us to establish a compilation unit (i.e. source code) that is common to both kiwix-apache and kiwix-serve, and refactor both servers to share the common code.
I'd like to resolve any licensing and permission issues before we officially 'release' kiwix-apache.
So that we know who we may need to involve in the process, the current contributors to kiwix-serve.cpp are listed as:
@kelson42
@mgautierfr
@kiranmathewkoshy
@rgaudin
@Skylsmoi
@ShivamSarodia
@schuellerf
Note: @Skylsmoi's contributions were made after kiwix-apache was created.
Please see also kiwix/libkiwix#60 regarding the licensing of kiwix-lib and openzim/libzim#30 regarding the licensing of zimlib (used by kiwix-lib).
Various people are keen to see how Kiwix is being used, what's popular (and what's not), etc. This issue is to gather various ideas on how to achieve this congruently in the Apache Server microcosm.
There are many ways we could implement this feature, ranging from an entirely internal module (that tracks the requests it's served), to using existing, other Apache modules (from various sources), to log-file parsing on-the-fly.
For the internal option, perhaps the reporting could be integrated with the ./status request.
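As a rough illustration of the internal option, the sketch below keeps a per-URL request counter that a status page could read. `RequestStats` and its methods are hypothetical names, not part of the module. The sketch also assumes a single process: with a process-based Apache MPM each worker would hold its own counters, which would need to be merged elsewhere (for example in shared memory or from the access log).

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical in-process usage counter: one count per requested URL plus a
// running total. Guarded by a mutex so that threaded workers can share it.
class RequestStats {
public:
    // Call once for every request the module serves.
    void record(const std::string& url) {
        assert(!url.empty() && "an empty URL indicates a caller bug");
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[url];
        ++total_;
    }

    // Number of times `url` has been served; 0 if it was never requested.
    std::uint64_t count(const std::string& url) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(url);
        return it == counts_.end() ? 0 : it->second;
    }

    // Total number of requests recorded so far.
    std::uint64_t total() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::string, std::uint64_t> counts_;
    std::uint64_t total_ = 0;
};
```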
It'd be good to add the random article feature/capability to kiwix-apache. This would reflect the behaviour of other Kiwix implementations e.g. the Android app and in kiwix-serve. The feature seems fairly simple to implement especially if it's modeled on the implementation in kiwix-serve.
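The core of such a feature is picking a uniformly distributed article index. The sketch below shows one way to do that; it is not taken from kiwix-serve, and `pickRandomArticleIndex` is a hypothetical helper. It assumes the archive can report its article count and look up an article by index.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <stdexcept>

// Returns a uniformly distributed index in [0, articleCount).
// Throws if the archive is empty, since no article can be chosen then.
std::uint32_t pickRandomArticleIndex(std::uint32_t articleCount, std::mt19937& rng) {
    if (articleCount == 0) {
        throw std::invalid_argument("archive contains no articles");
    }
    // Both bounds of uniform_int_distribution are inclusive.
    std::uniform_int_distribution<std::uint32_t> dist(0, articleCount - 1);
    const std::uint32_t index = dist(rng);
    assert(index < articleCount);
    return index;
}
```

A handler could seed one `std::mt19937` from `std::random_device` at start-up, call this helper on each random-article request, and respond with a redirect to the chosen article. If the chosen entry turns out to be a redirect itself, the caller may need to resolve it or pick again.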
To date, I've used a hard-coded location and filename for the zim file. This hasn't been a problem when developing or testing locally, however it'd be useful to enable the server administrator to configure the module to specify the location and filename for the zim file.