openzim / overview
:balloon: Start here for current projects, how to get involved, and joining community calls. A resource for new and veteran members of the offline community
Please bump Python version used in all python repos (docker images) to the
latest minor versions (within that major version)
Note: this is an automatic reminder intended for the assignee(s).
Hi,
I tested the artofproblemsolving_en_all_maxi_2019-04 ZIM file; some pages are empty or are missing elements.
I also tested it at this link: http://library.kiwix.org/artofproblemsolving_en_all_maxi_2019-04/A/Main_Page
The result is the same.
I guess this will mean we have to move to a cloud code signing solution. For example https://www.ssl.com/esigner/ (electron-userland/electron-builder#6158)
https://www.ssl.com/how-to/cloud-code-signing-integration-with-github-actions/
From @kelson42 on March 16, 2017 11:53
Data are here https://world.openfoodfacts.org/data
Copied from original issue: openzim/zimfarm#1
@Popolechien commented on Jan 24, 2020, 3:01 PM UTC:
There is a bunch of wikis running on that mediawiki competitor, it would be a neat addition to our portfolio.
This issue was moved by kelson42 from openzim/zim-requests#221.
In many of our projects, we deploy local development servers (e.g., an API and a Database) on our development machines for testing purposes. These servers expose a TCP (occasionally UDP) port on our local machine. Currently, there is no standardized convention for the usage of these TCP/UDP ports across projects. For instance, some projects use port 8000 for web APIs, while others use 8080.
Note: This intentionally simplifies the distinction between TCP and UDP ports and assumes we don't want two distinct services, one on TCP and one on UDP, running on the same port number. Although technically possible, it's deemed cumbersome for our purposes.
The absence of a convention on TCP/UDP port assignments for local development services leads to two issues:
docker compose up -d.
We can address the problem by establishing a convention for TCP/UDP port assignments.
The proposed convention is to use port XXXY for every server in our systems, where:
All feedback is welcome; after that, I will transition this to a Wiki entry.
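The port collision described above can be detected before a local service starts. The following is a minimal sketch, not part of the proposal: the function name `is_tcp_port_free` and the default host are assumptions chosen for illustration, and it covers TCP only.

```python
import socket


def is_tcp_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper: it only reports the state at the time of the call,
    so another process may still take the port right afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another service holds the port.
            return False
    return True


if __name__ == "__main__":
    # 8000 and 8080 are the two ports mentioned in the issue.
    for candidate in (8000, 8080):
        state = "free" if is_tcp_port_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

A check like this only reveals a conflict once it exists; an agreed convention avoids it in the first place.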
@srv-twry commented on Feb 13, 2018, 9:31 PM UTC:
Proposal
Most open-source organisations these days have a bot to manage things like assigning and unassigning developers, or adding labels such as "In-Progress".
OpenDataKit - opendatakitbot handles labelling/unlabelling of issues and PRs using commands.
You can claim an issue using it.
It automatically assigns developers and shows a welcome message if it's your first contribution to the project.
It automatically un-assigns the developer in 7 days if there isn't any activity associated with it.
Here is an example issue and how it works.
Other organisations such as Fossasia also have their bots.
This will be very helpful in the long run, especially with GSoC around the corner.
@mhutti1 @kelson42 Please review.
PS: I don't know how to make one, but I can certainly ask them if it seems interesting to you.
This issue was moved by kelson42 from kiwix/kiwix-android#373.
Please assess the most appropriate major Python version to use for next year.
https://en.wikipedia.org/wiki/History_of_Python#Table_of_versions
If it differs from the current one, upgrade all Python repositories to it.
Note: this is an automatic reminder intended for the assignee(s).
In our standard stack, we previously decided to use Vuex for state management: https://github.com/openzim/overview/wiki/openZIM-workflow#frontend
Vuex is no longer the official state management library: https://vuex.vuejs.org/
I recommend adapting our standard to use Pinia instead of Vuex, so that new projects (including the offspot/metrics frontend) use this new official Vue library.
I need you to help me clarify the PR review process, because it is too blurry for me and causing frustration.
I do not find enough precision in kiwix/overview/CONTRIBUTING.md or in openzim/overview Wiki. Please point me to the correct direction if I missed something.
This is a mid-term enhancement. I hope to reach a conclusion on this within a few weeks, but there is clearly no hurry.
This issue describes below the situation(s) that occurred between Renaud and me to give some background, but this is not a personal conflict (at least from my position 😄).
I've once again failed to understand the PR process and caused frustration to Renaud by resolving a conversation too soon (this time): offspot/metrics#25 (comment)
I had understood that once a PR is approved, I have to resolve all conversations on my own, because they are just comments to raise my awareness should I want to fix them. But obviously this is not always the case. And I find it weird to have pending conversations on an approved PR.
In contradiction to that, on an unapproved PR Renaud asked me to stop waiting for him on unresolved conversations and to resolve the conversations on my own once the requested code change is done. I always hesitate to do that, because I often doubt whether I understood the change request properly and whether I made the right change.
In the past, it has already happened that Renaud was disappointed by a change I made after his PR approval and his explicit request to resolve the conversation and proceed with the merge once the change was done. This was exactly because I didn't apply the change or understand the change request properly. I consider that misunderstanding is normal.
All this is very frustrating for all of us, and I would really like to come up with a much simpler process / rule(s). Simpler meaning, for me, less subject to personal interpretation.
Luckily (or not), we are not the only ones to struggle on this topic (I just did a very fast Google search):
My process proposition is:
This has some drawbacks of course (probably more work for the reviewer), but it is way clearer for me.
I would like also to add some resources to read as a PR author:
And other to read as a PR reviewer:
Suggestions / feedback welcome!
copy/paste from zimfarm/719
Some of our zim files have fairly long descriptions and we end up with a block of text. It would be convenient if in the Description field of recipes we could insert some basic HTML (e.g.
) so as to render this text in a more palatable format
We are now sufficiently affected by sub-dependency issues for this to be necessary.
Scrapers and other Python projects should all be switched from good-old requirements.txt to repeatable, frozen environments. Harmonizing how we build/publish (from a Python perspective, not publish workflows!) should probably be looked at as well.
I'd suggest using pipenv/Pipfile/Pipfile.lock, but checking what the currently recommended approach is remains mandatory.
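To illustrate the gap between a plain requirements.txt and a frozen environment, here is a rough sketch that lists entries not pinned to an exact version. The helper name `unpinned_requirements` and the regular expression are assumptions made for this example; it does not parse the full requirements syntax (URLs, editable installs), and the package names in the sample are placeholders.

```python
import re

# Assumed, simplified pattern: "name==version" with optional extras and an
# optional environment marker after ";". Wildcards such as "==1.*" do not count.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "requests==2.31.0\nbeautifulsoup4>=4.9\nlxml\n"
    print(unpinned_requirements(sample))
```

Even when every top-level entry is pinned, sub-dependencies can still change between installs; that is the part a lock file such as Pipfile.lock is meant to cover.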
Currently, it looks like documentation about content editing is kept in a lot of places, and I am probably missing some locations.
This documentation is about:
Some documentation is stored in various google docs (and I don't know where most of them are stored).
Some documentation is placed in the Zimfarm wiki (https://github.com/openzim/zimfarm/wiki), e.g. https://github.com/openzim/zimfarm/wiki/Ticket-Lifecycle-(Zimit), https://github.com/openzim/zimfarm/wiki/Tickets-Lifecycle-(Mwoffliner), https://github.com/openzim/zimfarm/wiki/Youtube-scraper-configuration-and-debug
And I just added a new location with Kiwix content Google shared drive.
I find this situation inconvenient because:
From my perspective we should have:
WDYT? Did I miss some locations? What would you suggest?
For the sake of harmonization and ease of dev/maint, we want to use a single major Python version as the base for all our Python projects.
Following https://en.wikipedia.org/wiki/History_of_Python#Table_of_versions, we want to use 3.11 as of now.
As such, support for Python 3.6 (not receiving security updates) and 3.7 (security updates to end in 6 months) will be dropped from those projects.
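A project could enforce such a baseline at startup. This is only a sketch under the assumption that 3.11 is the agreed minimum, as stated above; the constant and function names are made up for the example.

```python
import sys

# Assumption: the minimum comes from the version named in this issue.
MINIMUM_PYTHON = (3, 11)


def is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor) pair meets the agreed minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not is_supported():
        wanted = ".".join(str(part) for part in MINIMUM_PYTHON)
        sys.exit(f"Python {wanted} or newer is required")
    print("Python version OK")
```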
Python scrapers must be updated to use python-scraperlib 3.x
At the same time, they must:
For every scraper, do not forget to also update Zimfarm configuration to add these new CLI parameters + set type (input or textarea) + maximum length.
This is an overview ticket; work has to be done in the individual scrapers:
It seems to be difficult to get a natural/grassroots agreement about coding style, in particular the case used for variables, types, classes, constants, etc. I would be happy if we got a minimal agreement on this.
Hi,
First of all thank you for developing and providing this tool.
I use the Zimserver Python module to serve a ZIM archive on localhost.
I noticed that wikipedia_tr_all_novid_2019-04.zim wouldn't render the pages; I just see blurry empty pages. I also tried other tools such as web-archives and kiwix, but none of them worked.
How can I create a ZIM archive of https://tr.wikipedia.org by myself?
I also want to inform maintainers of https://download.kiwix.org/zim about the issue.
Not sure if this is the right place to note this issue. It may be "by design", which is fine, but at least one of the ZIMs for Stackexchange -- stackoverflow.com_eng_all_2017-05.zim -- is 52GB, so a split version could be useful if time/resources permit. Or if not, perhaps a README should be put in the /portable/stackexchange directory pointing people to the FAQ where it explains how to split files manually.
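For readers without the Unix `split` command, manual splitting can be sketched in Python. This is an illustration only: the function name `split_file` is hypothetical, and the `aa`, `ab`, ... suffixes simply mirror the default naming of `split`; whether a given reader accepts parts named this way is an assumption to verify against the FAQ.

```python
from itertools import product
from pathlib import Path
from string import ascii_lowercase


def split_file(source: Path, chunk_size: int, buffer_size: int = 1024 * 1024) -> list[Path]:
    """Split source into parts of at most chunk_size bytes, next to the original.

    Parts are named <original name> + "aa", "ab", ... and returned in order.
    """
    if chunk_size <= 0 or buffer_size <= 0:
        raise ValueError("chunk_size and buffer_size must be positive")
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    parts: list[Path] = []
    with source.open("rb") as src:
        while True:
            data = src.read(min(buffer_size, chunk_size))
            if not data:
                break  # nothing left to write
            try:
                suffix = next(suffixes)
            except StopIteration:
                raise ValueError("more than 676 parts would be needed") from None
            part = source.with_name(source.name + suffix)
            written = 0
            with part.open("wb") as dst:
                # Copy in small buffers so large files never sit fully in memory.
                while data:
                    dst.write(data)
                    written += len(data)
                    remaining = chunk_size - written
                    if remaining == 0:
                        break
                    data = src.read(min(buffer_size, remaining))
            parts.append(part)
    return parts
```

Calling `split_file(Path("example.zim"), 4 * 1024**3)` would produce parts of at most 4 GiB each; the file name and the size are placeholders.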
And specify it is the ZIM specification.
For the moment, we have only the ZIM creation date, but this might be really different from the content publication date, in particular if the content is really old.
Following a comment from https://github.com/veloman-yunkan at kiwix/libkiwix#702 (comment)
Currently there is a link in https://wiki.openzim.org/wiki/ZIM_file_format#Namespaces to documentation on the fulltext index that resides in namespace X of some ZIMs. Clicking on this link leads to https://wiki.openzim.org/wiki/ZIM_Index_Format, which is a mostly empty page ("There is currently no text in this page"). It would be useful to have some information about the format.