overview's Issues

Define a convention on TCP/UDP ports used by development stacks

Background

In many of our projects, we deploy local development servers (e.g., an API and a Database) on our development machines for testing purposes. These servers expose a TCP (occasionally UDP) port on our local machine. Currently, there is no standardized convention for the usage of these TCP/UDP ports across projects. For instance, some projects use port 8000 for web APIs, while others use 8080.

Note: This intentionally simplifies the distinction between TCP and UDP ports and assumes we don't want two distinct services, one on TCP and one on UDP, running on the same port number. Although technically possible, it's deemed cumbersome for our purposes.

Problem Statement

The absence of a convention on TCP/UDP port assignments for local development services leads to two issues:

  • After starting a local development stack, it's unclear where the services are listening, causing delays when switching between projects.
    • This becomes more pronounced with the shift to docker-compose-based local dev stacks, initiated with a simple docker compose up -d.
  • Running two local development stacks simultaneously is usually impossible due to port conflicts.
    • This often occurs when transitioning from developing project A to reviewing project B.

Proposition

We can address the problem by establishing a convention for TCP/UDP port assignments.

The proposed convention is to use port XXXY for every server in our systems, where:

  • Y is a number indicating the type of service:
    • UI is always on Y=0
    • Backend server (+/- API) is on Y=1
    • Database is on Y=2
    • Y=3 to 5 are reserved for potential generic usage
    • Y=6 to 9 are available for non-generic services (e.g., a second backend server)
  • XXX is a number reserved per project (GitHub repository)
    • Each GitHub repository will reserve a number in a centralized reference.
    • Repositories may reserve multiple numbers if needed, and these numbers are contiguous. If the need wasn't anticipated, the project is moved to other contiguous numbers.
    • To determine where XXX starts, we need a port range broad enough to accommodate all our projects. Since few other services run on our development machines, and the registered TCP/UDP port ranges are cluttered with various services anyway, we can use any convenient port range, reserving some numbers for external services if conflicts arise.
    • XXX will hence start at 800, with the 800 and 808 ranges already reserved due to known conflicts with many of our (not yet migrated) projects and other web servers.
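To illustrate, the convention above could be encoded in a tiny helper (hypothetical code, purely to demonstrate the scheme; it is not part of any repository):

```python
# Hypothetical helper illustrating the XXXY convention: XXX is the per-project
# number (starting at 800) and Y identifies the service type.
SERVICE_TYPE = {
    "ui": 0,        # UI is always on Y=0
    "backend": 1,   # backend server (+/- API) on Y=1
    "database": 2,  # database on Y=2
}

def dev_port(project_number: int, service: str) -> int:
    """Return the local development port for a project's service."""
    if not 800 <= project_number <= 6552:  # stay within the 0-65535 port space
        raise ValueError(f"project number out of range: {project_number}")
    return project_number * 10 + SERVICE_TYPE[service]

# A project assigned number 801 would expose:
#   UI on 8010, backend on 8011, database on 8012
```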

Feedback and implementation

All feedback is welcome; once it is collected, I will transition this to a wiki entry.

Proposal: Create a bot for the organisation.

@srv-twry commented on Feb 13, 2018, 9:31 PM UTC:

Proposal

Most open-source organisations these days have a bot to manage things like assigning and unassigning developers, adding labels like "In-Progress", etc.

OpenDataKit's opendatakitbot handles labelling/unlabelling of issues and PRs using commands.
You can claim an issue using it.
It automatically assigns developers and shows a welcome message if it's your first contribution to the project.
It automatically unassigns a developer after 7 days if there isn't any associated activity.
Here is an example issue showing how it works.

Other organisations such as Fossasia also have their bots.

This will be very helpful in the long run, especially with GSoC around the corner.

@mhutti1 @kelson42 Please review.
PS: I don't know how to make one, but I can certainly ask them if it seems interesting to you.

This issue was moved by kelson42 from kiwix/kiwix-android#373.

Clarify the PR review process

I need your help to clarify the PR review process, because it is too blurry for me and causes frustration.

I do not find enough precision in kiwix/overview/CONTRIBUTING.md or in the openzim/overview wiki. Please point me in the right direction if I missed something.

This is a mid-term enhancement; I hope to reach a conclusion on this within a few weeks, but there is clearly no hurry.

This issue describes the situation(s) that occurred between Renaud and me to give some background, but this is not a personal conflict (at least not from my side 😄).

What happened

I've once again failed to understand the PR process and caused frustration to Renaud by resolving a conversation too soon (this time): offspot/metrics#25 (comment)

I had understood that once a PR is approved, I have to resolve all remaining conversations on my own, because they are just comments to raise my awareness should I want to address them. But obviously this is not always the case. And I find it weird to have pending conversations on an approved PR.

Contradicting that, on an unapproved PR Renaud asked me to stop waiting for him on unresolved conversations and to resolve them on my own once the requested code change is done. But I always hesitate to do so, because I often doubt whether I understood the change request properly and made the right change.

In the past, it has already happened that Renaud was disappointed by a change I made after his PR approval and his explicit request to resolve the conversation and proceed with the merge once the change was done. This was precisely because I didn't apply the change / understand the change request properly. I consider such misunderstanding normal.

All this is very frustrating for all of us, and I would really like to come up with a much simpler process / rule(s). Simpler meaning, for me, less subject to personal interpretation.

Background research

Luckily (or not), we are not the only ones struggling with this topic (I just did a very quick Google search):

Proposition

My process proposition is:

  • a PR must not be approved until the reviewer is happy with the code / conversations
  • no conversation may be left unresolved when the reviewer gives approval
  • it is the reviewer's responsibility to resolve conversations
  • authors must not resolve conversations, except for obvious code suggestions that have been applied to the code base
  • once the PR is approved, the author can merge
  • the author must not change the code once approval is given, at least not without requesting a new approval (this should be rare: the normal process is to merge ASAP to spread the change and open a new PR for further changes; holding off only makes sense if merging no longer makes sense, e.g. because a very significant bug has been discovered)
  • should something still need to be discussed without blocking the merge, an issue must be opened to track the discussion point and the conversation resolved (the reviewer can explicitly ask the author to do this, or the author can suggest it in a conversation)

This has some drawbacks of course (probably more work for the reviewer), but it is way clearer for me.

I would also like to add some resources to read as a PR author:

And others to read as a PR reviewer:

Suggestions / feedback welcome!

Enable basic html formatting in content descriptor

copy/paste from zimfarm/719

Some of our ZIM files have fairly long descriptions, and we end up with a single block of text. It would be convenient if we could insert some basic HTML in the Description field of recipes, so as to render this text in a more palatable format.
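If this is implemented, the inserted HTML should probably be restricted to a whitelist of tags. A minimal sketch using Python's stdlib html.parser (the whitelist and the approach are assumptions for illustration, not the project's actual plan; note that text inside a dropped tag is still kept):

```python
from html.parser import HTMLParser

# Hypothetical whitelist of tags allowed in a description field
ALLOWED = {"br", "b", "i", "em", "strong", "p", "ul", "li"}

class Sanitizer(HTMLParser):
    """Keep only whitelisted tags (attributes stripped); keep all text."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # text content is always kept

def sanitize(text: str) -> str:
    s = Sanitizer()
    s.feed(text)
    s.close()
    return "".join(s.out)
```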

(Screenshot: 2022-09-23 14:41)

Switch to repeatable Python setups

We are now sufficiently affected by sub-dependency issues for this to be necessary.

Scrapers and other Python projects should all be switched from the good old requirements.txt to repeatable, frozen environments. Harmonizing how we build/publish (from a Python perspective, not the publish workflows!) should probably be looked at as well.

I'd suggest pipenv/Pipfile/Pipfile.lock, but we must first check what the currently recommended approach is.
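If we go the pipenv route, a project's Pipfile could look roughly like this (package names, versions, and the Python version are purely illustrative); `pipenv lock` then produces the Pipfile.lock that pins the whole dependency tree, including sub-dependencies:

```toml
# Hypothetical Pipfile for a scraper project
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "==2.31.*"

[dev-packages]
black = "*"

[requires]
python_version = "3.11"
```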

Note: this should be tackled with #13 and #14

Decide where content team documentation should be placed

Currently, it looks like there are a lot of places where documentation about content editing is placed, and I probably missed some locations.

This documentation is about:

  • what the overall process of content editing is (a high-level picture of how we go from a ZIM request to a published ZIM)
  • how we configure an offliner kind (youtube, mwoffliner, ...)

Some documentation is stored in various Google Docs (and I don't know where most of them live).

Some documentation is placed in the Zimfarm wiki (https://github.com/openzim/zimfarm/wiki), e.g. https://github.com/openzim/zimfarm/wiki/Ticket-Lifecycle-(Zimit), https://github.com/openzim/zimfarm/wiki/Tickets-Lifecycle-(Mwoffliner), https://github.com/openzim/zimfarm/wiki/Youtube-scraper-configuration-and-debug

And I just added yet another location with the Kiwix content Google Shared Drive.

I find this situation not convenient because:

  • the various Google Docs are not centralized and will probably get lost at some point; from my experience, every team document created in Google Drive must be placed in a Google Shared Drive, and that Shared Drive must be shared with the appropriate team members
  • the documentation placed in the Zimfarm wiki is mixed with very technical content for devs/ops
  • GitHub wikis do not allow any review process; every change goes live immediately
  • GitHub wikis do not allow "folders" to structure the documentation

From my perspective we should have:

  • a central, public, reviewable, dedicated location for most documentation
  • a very small Google Shared Drive (or something else) only for documentation which cannot be made public (such as the API keys file I just created, since these are secrets)

WDYT? Did I miss some locations? What would you suggest?

Upgrade python-scraperlib to 3.x, including CLI support for description / long_description flags

Python scrapers must be updated to use python-scraperlib 3.x

At the same time, they must:

  • add (if not already present) CLI parameters to set description + long_description
  • use the shared logic of openzim/python-scraperlib#110 to handle these fields

For every scraper, do not forget to also update the Zimfarm configuration to add these new CLI parameters, set their type (input or textarea), and set their maximum length.
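A minimal argparse sketch of such CLI parameters (the flag names and length limits below are assumptions for illustration; the real values come from python-scraperlib 3.x and the ZIM metadata conventions):

```python
import argparse

# Assumed maximum lengths -- verify against python-scraperlib 3.x
MAX_DESCRIPTION = 80
MAX_LONG_DESCRIPTION = 4000

def bounded_text(max_len: int):
    """Return an argparse type that rejects values longer than max_len."""
    def check(value: str) -> str:
        if len(value) > max_len:
            raise argparse.ArgumentTypeError(
                f"value is {len(value)} chars, maximum is {max_len}"
            )
        return value
    return check

parser = argparse.ArgumentParser(description="scraper CLI (sketch)")
parser.add_argument("--description", type=bounded_text(MAX_DESCRIPTION))
parser.add_argument("--long-description", type=bounded_text(MAX_LONG_DESCRIPTION))

args = parser.parse_args(["--description", "A short description"])
```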

This is an overview ticket; the work has to be done in the individual scrapers:

Code norms

It seems difficult to reach a natural/grassroots agreement about coding style, in particular the casing of variables, types, classes, constants, etc. I would be happy if we could reach a minimal agreement on this.
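As a possible baseline for discussion, PEP 8 naming conventions cover exactly these cases (this is an illustration of one candidate convention, not a decision):

```python
# PEP 8 naming at a glance
MAX_RETRIES = 3                  # constants: UPPER_SNAKE_CASE

class DownloadTask:              # classes: CapWords
    """A unit of work for a scraper."""
    def __init__(self, url: str):
        self.target_url = url    # attributes/variables: snake_case

def build_task(url: str) -> DownloadTask:   # functions: snake_case
    return DownloadTask(url)

task = build_task("https://example.org")
```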

wikipedia_tr_all_novid_2019-04.zim is not browsable

Hi,
First of all thank you for developing and providing this tool.
I use the Zimserver Python module to serve ZIM archives on localhost.
I noticed that wikipedia_tr_all_novid_2019-04.zim wouldn't render the pages; I just see blurry, empty pages. I also tried other tools such as web-archives and Kiwix, but none of them worked.

How can I create zim archive of https://tr.wikipedia.org by myself?
I also want to inform maintainers of https://download.kiwix.org/zim about the issue.

There are no portable (split) distributions of stackexchange ZIMs

Not sure if this is the right place to note this issue. It may be "by design", which is fine, but at least one of the Stack Exchange ZIMs -- stackoverflow.com_eng_all_2017-05.zim -- is 52 GB, so a split version could be useful if time/resources permit. If not, perhaps a README should be put in the /portable/stackexchange directory pointing people to the FAQ that explains how to split files manually.
