sjtug / mirror-clone
All-in-one mirror utility for SJTUG mirror
License: Apache License 2.0
Currently, we do not set the MIME type on objects stored in S3. As a result, HTML files are offered as downloads instead of being rendered in the browser.
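One possible fix is to derive the content type from the object key before uploading. The following is a minimal sketch using Python's standard `mimetypes` module; the helper name `guess_content_type` and the fallback value are assumptions for illustration, not part of mirror-clone.

```python
import mimetypes


def guess_content_type(key: str, default: str = "application/octet-stream") -> str:
    """Guess a Content-Type value from the file extension of an object key."""
    content_type, _encoding = mimetypes.guess_type(key)
    # guess_type returns None for unknown extensions, so fall back to a generic type.
    return content_type or default


print(guess_content_type("docs/index.html"))
print(guess_content_type("pkgs/blob-without-extension"))
```

The resulting string would then be passed as the Content-Type of the object when it is uploaded, so that HTML pages are served as pages rather than as attachments.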
Now every task has its own configuration, and the mirror-clone program has its global configuration. I suggest the following config scheme.
The final config used by mirror-clone is composed of three parts: the configuration provided in TOML format, the default config, and the command-line config.
For example, suppose we have the following config.toml:
[global]
io-limit = 16 # only a total of 16 concurrent downloads are allowed
cpu-limit = 16 # only 16 concurrent CPU-bound tasks are allowed
io-thread-pool = 4
cpu-thread-pool = 4
[global.log]
log-format = "json"
log-level = "warning"
[opam]
use-cache = false
[conda]
[[conda.repo]]
name = "anaconda/pkgs/main/win-64"
url = "balahbalah"
We then call mirror-clone with the following arguments.
The basic usage of mirror-clone is mirror-clone <task> <base_dir> <config>
mirror-clone --config config.toml conda /data/conda --all-repos # clone all repos specified in config
mirror-clone --config config.toml conda /data/conda --repos=anaconda/pkgs/main/win-64,anaconda/pkgs/main/linux-64 # use pre-defined repo in config
mirror-clone --config config.toml conda /data/conda/pkgs/main/win-64 --url=mirrors.sjtug.sjtu.edu.cn/anaconda/pkgs/main/win-64
Command-line arguments take precedence. For example, we could override use-cache in opam.
mirror-clone --config config.toml opam --use-cache=true # override use-cache from config.toml
If we specify cpu-thread-pool in neither config.toml nor the command-line arguments, mirror-clone will use the default value built into the program.
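The three-layer precedence described above (defaults, then config.toml, then command-line arguments) can be sketched as a recursive merge. This is only an illustration: the default values below are hypothetical, and the config file and command line are represented as already-parsed dictionaries.

```python
from copy import deepcopy

# Hypothetical built-in defaults; the real values are not stated in the proposal.
DEFAULTS = {
    "global": {"io-thread-pool": 2, "cpu-thread-pool": 2},
    "opam": {"use-cache": True},
}


def merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively, table by table."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve(defaults: dict, file_config: dict, cli_config: dict) -> dict:
    """Lowest to highest precedence: defaults, config file, command line."""
    return merge(merge(defaults, file_config), cli_config)


# Values as they would look after parsing config.toml and the command line.
file_config = {
    "global": {"io-limit": 16, "io-thread-pool": 4},
    "opam": {"use-cache": False},
}
cli_config = {"opam": {"use-cache": True}}  # from --use-cache=true

final = resolve(DEFAULTS, file_config, cli_config)
print(final)
```

Here cpu-thread-pool is absent from both the file and the command line, so the default survives, while use-cache ends up with the command-line value.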
Mirror-Clone (+https://mirrors.sjtug.sjtu.edu.cn), and users can specify their site
The ultimate goal of mirror-clone is to provide an easy-to-use abstraction layer for developers who want to clone a software repo to their own local registry.
Developers will need to implement two interfaces, SourceFS and TargetFS, in order to clone a registry.
SourceFS generally refers to the source software registry, for example crates.io, opam, or conda. It provides the following functionalities:
  snapshot provides a file list of the current software registry.
    snapshot involves downloading repo and index.tar.gz, and parsing the information.
    [...] repodata.json and generate file list.
  entry provides a way to download a file from the source filesystem.
    [...] index.tar.gz.
TargetFS generally refers to a local filesystem. It could also be an object storage or a key-value database.
TargetFS should be able to:
  list files
  read file
  write file
  metadata of a file
mirror-clone provides utilities for mirroring a repo.
tmpfs stores files temporarily. When taking a snapshot, the source filesystem may download some index files. They can be saved to tmpfs and served directly when entry is called.
downloader helps download a file from a given URL.
Transferrer transfers a file from source filesystem to target filesystem. It will automatically retry failed requests.
Given an entry on source filesystem and target filesystem, a comparator decides whether a file requires re-transferring.
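To make the pieces described so far concrete, here is a rough sketch of the two interfaces together with a comparator and a retrying transferrer. It is not mirror-clone's actual API: the method signatures, the in-memory classes, the existence-only comparison rule, and the retry count are all assumptions chosen for illustration.

```python
from typing import Dict, List, Protocol, Set


class SourceFS(Protocol):
    def snapshot(self) -> List[str]: ...          # file list of the registry
    def entry(self, path: str) -> bytes: ...      # download one file


class TargetFS(Protocol):
    def list(self) -> List[str]: ...
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...
    def metadata(self, path: str) -> Dict[str, int]: ...


class MemorySource:
    """Toy source that fails a fixed number of times to simulate network errors."""

    def __init__(self, files: Dict[str, bytes], failures: int = 0):
        self.files, self.failures = files, failures

    def snapshot(self) -> List[str]:
        return sorted(self.files)

    def entry(self, path: str) -> bytes:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated network error")
        return self.files[path]


class MemoryTarget:
    """Toy target backed by a dict."""

    def __init__(self):
        self.files: Dict[str, bytes] = {}

    def list(self) -> List[str]:
        return sorted(self.files)

    def read(self, path: str) -> bytes:
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def metadata(self, path: str) -> Dict[str, int]:
        return {"size": len(self.files[path])}


def needs_transfer(path: str, existing: Set[str]) -> bool:
    """Assumed comparator: re-transfer only files missing from the target."""
    return path not in existing


def transfer(source: SourceFS, target: TargetFS, retries: int = 3) -> List[str]:
    """Copy files that need transferring, retrying each failed download."""
    existing = set(target.list())
    transferred = []
    for path in source.snapshot():
        if not needs_transfer(path, existing):
            continue
        for attempt in range(1, retries + 1):
            try:
                data = source.entry(path)
            except ConnectionError:
                if attempt == retries:
                    raise  # give up after the last attempt
                continue
            target.write(path, data)
            transferred.append(path)
            break
    return transferred


source = MemorySource({"a.txt": b"A", "b.txt": b"B"}, failures=2)
target = MemoryTarget()
target.write("a.txt", b"A")  # already mirrored, so the comparator skips it
print(transfer(source, target))
```

A real comparator would more likely compare sizes, timestamps, or checksums; the existence check is used here only to keep the sketch short.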
The buffer layer stands between the transferrer and the target filesystem.
Transaction Buffer provides a transaction-commit interface. It is normal for a download to fail because of network issues. The buffer layer commits a file to the target filesystem only when the file has been downloaded successfully (or waits until all files have been downloaded).
Fuse Buffer ensures that a file is never downloaded twice by fusing it. It will also record file metadata in a single cache file to speed up listing all files in the target filesystem.
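The transaction-commit idea can be sketched as a buffer that stages writes and applies them to the target only on commit. This shows the "wait until all files have been downloaded" variant; the class and function names are hypothetical, and staging in memory is an assumption made to keep the example small.

```python
from typing import Callable, Dict, List


class DictTarget:
    """Minimal stand-in for a target filesystem."""

    def __init__(self):
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


class TransactionBuffer:
    """Stage writes and apply them to the target only when commit is called."""

    def __init__(self, target: DictTarget):
        self.target = target
        self.staged: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.staged[path] = data

    def commit(self) -> List[str]:
        for path, data in self.staged.items():
            self.target.write(path, data)
        committed = sorted(self.staged)
        self.staged.clear()
        return committed

    def rollback(self) -> None:
        self.staged.clear()


def mirror_all(downloads: Dict[str, Callable[[], bytes]], buffer: TransactionBuffer) -> bool:
    """Download everything first; commit only if no download failed."""
    try:
        for path, fetch in downloads.items():
            buffer.write(path, fetch())
    except ConnectionError:
        buffer.rollback()  # nothing reaches the target
        return False
    buffer.commit()
    return True


def broken() -> bytes:
    raise ConnectionError("simulated network error")


target = DictTarget()
buffer = TransactionBuffer(target)
first = mirror_all({"a": lambda: b"x", "b": broken}, buffer)
second = mirror_all({"a": lambda: b"x"}, buffer)
print(first, second, target.files)
```

Because the failed run is rolled back before anything is written, the target never sees a partially mirrored set of files.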
Some Coq packages use http: instead of src: in their OPAM files. This makes the OPAM download logic very complex. We may wait for @PhotonQuantum's OPAM parser before working on this.