buildinspace / peru

A generic package manager, for including other people's code in your projects.

License: MIT License
Some things we should probably mention:
`build`, `export`, and `files`. This would allow the user to import the same target twice without hacks. It would also let them control the merge order, though I don't know why you'd want that, but maybe.
Should just work after I add it to docopt.
I should be able to build (and force-build) any target, not just local rules. Likewise, I should be able to export any tree, including the local imports. Once export can do that, our `validate_third_party.sh` script can use it and be simpler/faster.
Our parallelism uses module object locking to avoid fetching the same module twice. But there's nothing preventing two different modules from using the same URL. Those two modules could get fetched in parallel, and then you have two instances of the git plugin (or whatever) trying to write to the same directory.
We definitely don't want to shove any locking responsibilities down to the plugins. What we should do is create more granular plugin cache directories (instead of one big global one) and use a lock in peru itself to prevent two fetches from touching one cache at the same time. I'm tempted to use the full hash of a module's fields to name this directory, but we don't want to invalidate a git clone when the user changes `rev`, for example. We could use the name+type of a module (because a module should definitely get a clean plugin cache if it changes type), but that could still get confused if one module swaps names with another. Maybe the solution is to name/lock the cache dir with a hash of all plugin fields, but also allow `plugin.yaml` to restrict the list of fields that get hashed. So the git plugin, for example, could say, "Only use my `url` field for the purposes of plugin caching." Is that too complicated? It might even make sense to make this configuration semi-mandatory, so that plugins that don't specify their cacheable fields get `/dev/null` as their `PERU_PLUGIN_CACHE`. Random upside to all this: we can get rid of the urlencoding that the plugins are doing now.
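Here's a minimal sketch of that naming scheme, assuming a hypothetical cacheable-fields list declared in plugin.yaml (all names here are illustrative):

```python
import hashlib
import json

def plugin_cache_dir_name(module_type, fields, cacheable_fields=None):
    """Name a plugin cache dir by hashing only the fields that plugin.yaml
    declares cacheable. None means the plugin declared nothing, which the
    caller would map to /dev/null for PERU_PLUGIN_CACHE."""
    if cacheable_fields is None:
        return None
    relevant = {name: fields[name] for name in cacheable_fields if name in fields}
    blob = json.dumps([module_type, relevant], sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()
```

Two git modules that differ only in `rev` would then hash identically and share a clone, which is exactly what we want.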
Related: You could have two modules with exactly the same fields. Ideally the second one should be a cache hit. But if they're fetched in parallel, they might both be cache misses, and then they would duplicate work. The solution to this would be to take module locks by cache key, rather than just by module object instance. (This should've been obvious from the beginning, since the read-write that we're protecting is done on that key.) Unlike the plugin issue above, this distinction is just a duplicated-work issue in a weird corner case, rather than a serious correctness issue. But since we already have to do module-level locking (to cover the case where both A and B depend on C), we might as well do it right.
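A sketch of locking by cache key rather than by module instance (helper names are illustrative, and modern `async with` syntax stands in for tulip-era code):

```python
import asyncio
from collections import defaultdict

# One lock per cache key, created on demand. Two modules with identical
# fields share a key, so the second fetch waits and then hits the cache.
_key_locks = defaultdict(asyncio.Lock)

async def fetch_with_lock(cache_key, do_fetch, cache):
    async with _key_locks[cache_key]:
        tree = cache.get(cache_key)  # hypothetical cache lookup
        if tree is None:
            tree = await do_fetch()
            cache.put(cache_key, tree)
        return tree
```

Since the lock is per key, the case where both A and B depend on C serializes on C's key too, which covers the existing module-level requirement.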
All together, here's what that locking is going to look like. All of this lives in `RemoteModule.get_tree`, though `RemoteModule.reup` will probably want to do it too, so hopefully we can share it cleanly.
One lock is keyed by the hash of the cacheable fields from `plugin.yaml`. Think of this as the "only one job at a time using a given plugin cache directory" lock. If the plugin hasn't configured these fields, there's no lock here, and we don't provide a cache dir at all.

Right now they only output one line.
These need plugins. We should probably refactor some shared logic out of the plugin main functions when we do this.
We tend to use spaces in our field names, because it's just nicer to read (`required fields` vs `required_fields`). We should probably allow plugins to do the same with the names they define. For example, suppose the curl plugin wanted a field called "fallback url". We should probably let them call it `fallback url` with the space. But we'd want to pass it along as `$PERU_MODULE_FALLBACK_URL` rather than allowing whitespace into an env var name.
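The name mangling is simple enough; a sketch:

```python
import re

def field_env_name(field_name):
    """Map a user-facing field name like "fallback url" to an env var name
    like PERU_MODULE_FALLBACK_URL. The PERU_MODULE_ prefix is just the
    convention suggested above."""
    cleaned = re.sub(r'\W+', '_', field_name.strip())
    return 'PERU_MODULE_' + cleaned.upper()

assert field_env_name('fallback url') == 'PERU_MODULE_FALLBACK_URL'
```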
When doing a reup, it would be nice to see the before/after diff. One way to do this would be to support some kind of `peru diff FILE [FILE2]`. Note that `FILE` could be something like `<(git show HEAD^:peru.yaml)`. Peru could prepare the imports tree and then do some kind of `git diff` between that and the current tree.
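For the tree-vs-tree flavor, a sketch using plumbing we already have available (`git diff` accepts two tree-ish arguments):

```python
import subprocess

def diff_import_trees(git_dir, old_tree, new_tree):
    """Show a unified diff between two git trees, e.g. the imports tree
    before and after a reup."""
    return subprocess.check_output(
        ['git', '--git-dir', git_dir, 'diff', old_tree, new_tree],
        universal_newlines=True)
```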
One way to hack around this right now is to `git add -A --force` and commit all your imported files in a temp branch, do the reup, make another commit, and then compare those two.
If so, presumably we'd want a flag to suppress it.
Right now plugins have to do some nontrivial parsing to separate out plugin fields from command arguments. This gets duplicated in every plugin, even though some of it is shared. A fairly trivial plugin like `cp`, which should be one line, ends up being four or five (let alone the Bash `rsync` plugin), and the sets of mandatory and optional fields also get duplicated between fetch and reup scripts.
One of the reasons we didn't use more environment variables earlier is that it's difficult for the plugin to recognize invalid fields if it doesn't have its fields in a list. But it shouldn't be the plugin's responsibility to recognize invalid fields -- that's more duplicated logic that should live in peru core. We should create a `plugin.yaml` convention that lets the plugin declare what fields it supports. (And possibly other stuff in the future, who knows.)
Once that's done, there's no reason not to pass the `url` field as e.g. `PERU_FIELD_URL` or something. Then the plugin never needs to parse anything.
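A sketch of the core-side logic, assuming hypothetical `required fields`/`optional fields` lists in plugin.yaml:

```python
def check_and_export_fields(module_fields, required, optional):
    """Validate a module's fields against the plugin's declaration, then
    build the env vars to pass to the plugin. PERU_FIELD_* is just the
    naming suggested above."""
    allowed = set(required) | set(optional)
    missing = set(required) - set(module_fields)
    unknown = set(module_fields) - allowed
    if missing or unknown:
        raise RuntimeError('missing fields: {}, unknown fields: {}'
                           .format(sorted(missing), sorted(unknown)))
    return {'PERU_FIELD_' + name.upper().replace(' ', '_'): value
            for name, value in module_fields.items()}
```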
We should probably let the `build` field optionally take a mapping of system names to build commands. Complicated build commands can already do their own `uname` testing on posix systems, so this will almost exclusively be intended to support Windows. We should probably use `sys.platform` and the `.startswith()` idiom (https://docs.python.org/3.4/library/sys.html#sys.platform), but it might be nice to also check against `os.name`, so that users could specify `posix` without needing to duplicate things for each different posix-like os. Should we match against an ordered list?
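A sketch of the matching logic (the ordering policy here, longest key first, is just one possibility):

```python
import os
import sys

def pick_build_command(build_map):
    """Pick a build command from a platform->command mapping. Checks
    os.name (e.g. 'posix') and the sys.platform .startswith() idiom,
    trying the most specific key first."""
    for key, command in sorted(build_map.items(), key=lambda kv: -len(kv[0])):
        if key == os.name or sys.platform.startswith(key):
            return command
    return None

# e.g. pick_build_command({'posix': 'make', 'win32': 'nmake'})
```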
It's been fairly common for me to use the `build` field to do something like `mkdir out && cp myfile out/` when I want to export only part of a directory. That feels pretty hacky, and it will be very inconvenient in builds that need to support Windows, or even just `cp -r` (Mac requires the `-R` flag instead).
It would be better to have some explicit filter field. `git add` supports `*` and `**` globs natively, so it shouldn't be too much trouble to expose this through `Cache.import_tree`.
My guess is that it would make sense to apply the filter step after export, which means that filter paths would be relative to the export dir rather than relative to the module root. That would save the user from duplicating the export path in the filter spec. The order of application of rule fields would then be: build, then export, then filter.
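A rough sketch of the relative-path semantics using fnmatch (the real implementation would ride on git's native globbing via `Cache.import_tree`, which also handles `**`):

```python
from fnmatch import fnmatch

def apply_filter(exported_paths, patterns):
    """Given the file paths of an already-exported tree, keep only paths
    matching one of the filter globs. Paths are relative to the export
    dir, per the reasoning above."""
    return [path for path in exported_paths
            if any(fnmatch(path, pat) for pat in patterns)]

# apply_filter(['src/a.c', 'docs/readme'], ['src/*']) == ['src/a.c']
```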
We should add a `requirements.txt` file for Python dependencies, and also a `peru.yaml` file that does the same thing, so that we can be self-hosting without having to bootstrap ourselves. How should we keep these two things in sync? Can we generate one from the other?
`.peru/log` seems like a reasonable place. It would be nice to record entries like `module foo cached: 5d5fb9a5c41a0bca34af6fcb1e554b79af6534ea`, so that when I want to clear the cache for just one module, I can find its cache key in the log. And of course, we should be logging errors.
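A minimal sketch, assuming a plain append-only text file under `.peru`:

```python
import os
import time

def log_event(peru_dir, message):
    """Append a timestamped line like 'module foo cached: 5d5fb9...'
    to .peru/log."""
    with open(os.path.join(peru_dir, 'log'), 'a') as f:
        f.write('{} {}\n'.format(time.strftime('%Y-%m-%d %H:%M:%S'), message))
```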
We want `peru sync` to be careful about overwriting the user's files. Peru doesn't pave over preexisting files, or any changes that have been made since a file was created. This is to avoid accidentally deleting users' work, and also to try to catch some of the cases where users have done the Wrong Thing with peru (like checking in synced files (#dowhatisaynotwhatido)). But currently we also freak out if the user has deleted files that peru synced, and I think that might be overzealous. Consider this scenario:

1. I run `git clean -dfx --exclude .peru`. Maybe I have an alias for that.
2. I run `peru sync`. Peru absolutely refuses until I use `-f`.

I think it would be better if peru stopped complaining here. There's no risk of losing work, and it's not really catching any Wrong Things. If a user sees this error all the time, they're not going to be paying attention when it eventually catches a real mistake. (Especially if we get them in the habit of using `-f`.)
tldr: `peru sync` should consider deleted files "clean".
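The dirty check under that policy might look like this sketch (names are hypothetical; peru's real check presumably compares against the last synced tree):

```python
import hashlib
import os

def is_dirty(path, last_synced_hash):
    """A file blocks syncing only if it exists and differs from what peru
    last wrote there. A deleted file counts as clean."""
    if not os.path.exists(path):
        return False  # deleted since the last sync: clean, just resync it
    with open(path, 'rb') as f:
        current = hashlib.sha1(f.read()).hexdigest()
    return current != last_synced_hash
```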
We want to be able to make changes to the format without needing everyone to `git clean` their projects. Possibly also version the plugin caches?
It's often convenient to store metadata inside the `.git` directory. See http://tbaggery.com/2011/08/08/effortless-ctags-with-git.html. We should make it easy to move `.peru` to `.git/peru`, without breaking the ability to run peru commands outside the project root. Probably some modification to how we handle `$PERU_DIR`, or some new variable.
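A sketch of project-dir discovery that would support both locations (the `.git/peru` fallback is the proposal above, and the lookup order is a guess):

```python
import os

def find_peru_dir(start=None):
    """Walk up from the current directory looking for the project's peru
    state dir: $PERU_DIR wins, then ./.peru, then ./.git/peru."""
    if 'PERU_DIR' in os.environ:
        return os.environ['PERU_DIR']
    here = os.path.abspath(start or os.curdir)
    while True:
        for candidate in ('.peru', os.path.join('.git', 'peru')):
            path = os.path.join(here, candidate)
            if os.path.isdir(path):
                return path
        parent = os.path.dirname(here)
        if parent == here:
            raise RuntimeError('no peru project found')
        here = parent
```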
Maybe support a `plugins:` field in `peru.yaml`, given as a path or a list of paths.
There's no reason we shouldn't keep backups of `peru.yaml` under the `.peru` dir when we modify that file. It wouldn't take very much disk space, and it could be helpful for users who aren't under version control. (As, for example, our future workspace feature might not be.)
Something in `.peru`. Maybe a pseudo symlink like git uses.

The `PERU_CACHE` env var is a little too broad. You might want to have some projects share the cache but not others.
It would be nice to be able to manage a big ecosystem of projects with peru. We'd probably build on the existing `overrides` feature to do it. One idea we had was generating a `peru.yaml` file (not version controlled) that refers to your project repository as an overridden remote module. We might use recursive peru to make this work.
- Use `ProactorEventLoop` on Windows. The default loop doesn't support subprocesses.
- Use `create_subprocess_shell` instead of `create_subprocess_exec` to execute e.g. `.py` plugins.
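A sketch of the loop selection, per the notes above:

```python
import asyncio
import sys

def make_event_loop():
    # The default selector event loop can't spawn subprocesses on Windows,
    # so swap in ProactorEventLoop there.
    if sys.platform.startswith('win'):
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
        return loop
    return asyncio.get_event_loop()
```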
As I'm writing the README, I find myself telling new users to set the `$PERU_CACHE` variable to avoid recloning things after they clean. When new users need to configure some random setting, that's usually a sign that the default is bad. Should we be storing the cache in a centralized spot by default?
Pros:
Cons:
A `--skipcache` flag or something in the future could force plugin fetches, and doing that would update the cache for all callers.

This is a consistent question I get when I demo peru for people. Why not put the import path for a module in the module's declaration? There are two decent reasons and one bad reason:
Allowing import paths as part of a module declaration definitely simplifies the hello world example. I could go either way on examples that are more complicated than that. I really want to avoid having two different ways to do the same thing, like allowing both an imports list and inline import paths. I think the biggest question for me right now is whether point (1) is really true, or whether I just think it's true because I'm used to it...
https://github.com/olson-sean-k/dot-config
https://gist.github.com/oconnor663/fffb0f2fcd7ca472b589
Doing a peru sync in that repo causes YouCompleteMe to get fetched ten times. It's probably something to do with submodules. Weirdly, in the end everything seems to work fine.
The main blocker for this one is `Cache.merge_trees()`. We use the `--prefix` flag for `git-read-tree`, and libgit2 doesn't seem to support a similar feature. Tracking issue: libgit2/libgit2#154. We could use the treebuilder feature to build a prefixed tree, and then use `git_merge_trees()` on that, but that function isn't exposed through pygit2 anyway.
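For reference, here's roughly the prefixed read-tree dance we currently get from the git CLI, as a sketch of what a pygit2 replacement would need to cover (no conflict handling shown):

```python
import os
import subprocess

def merge_tree_with_prefix(git_dir, base_tree, new_tree, prefix):
    """Read base_tree into a temporary index, overlay new_tree under the
    given prefix (which must end with '/'), and write the merged tree."""
    env = dict(os.environ, GIT_DIR=git_dir,
               GIT_INDEX_FILE=os.path.join(git_dir, 'merge_index'))
    def git(*args):
        return subprocess.check_output(('git',) + args, env=env,
                                       universal_newlines=True).strip()
    git('read-tree', base_tree)
    git('read-tree', '--prefix=' + prefix, new_tree)
    return git('write-tree')
```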
Other features we need in pygit2 that we've already implemented:

- setting the working dir: https://github.com/oconnor663/pygit2/commit/a063867fe0e4506e29f22c45dd403d805e3fb1b7
- setting a detached HEAD: https://github.com/oconnor663/pygit2/commit/b190169f5e83cbdb2346acd52cea30e14a205eb5

EDIT: These were pushed as part of pygit2 v0.21.0 (libgit2/pygit2#377).
Remote modules should be able to include their own peru.yaml files. This should allow default rules, as well as referencing rules and modules defined in the remote.
The `svn` plugin does not have adequate test coverage. The initial version of the plugin didn't even check out the correct revision, as seen in this change: https://phabricator.buildinspace.com/D37. The tests will likely need some refactoring for this.
I've opened up `http://` on port 80, but our `.arcconfig` and commit logs are still pointing to `https://`, so we should really fix this.
The `validate_third_party.sh` script has to copy `peru.yaml` around and then clean it up. That's annoying. We also have hacks in tests to handle `peru.yaml` when we're comparing contents of directories. Make all this cleaner.
https://github.com/halst/schema. Same guy who wrote Docopt.
Others:
https://github.com/alecthomas/voluptuous
https://github.com/Julian/jsonschema
https://github.com/podio/valideer
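For a taste of what adopting one of these would look like, here's the first library validating a hypothetical git module's field set (the fields shown are illustrative):

```python
from schema import Optional, Schema

# A hypothetical schema for a git module's fields.
git_module_schema = Schema({
    'url': str,
    Optional('rev'): str,
    Optional('reup'): str,
})

fields = git_module_schema.validate({'url': 'https://example.com/repo.git'})
```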
Export loses its safety guarantees when the stomped files are gitignored. Import (probably) fails to import gitignored files.
Otherwise when we install peru, we won't be able to run it out of the repo anymore.
`sync` only ever syncs one thing (everything). That's a good thing. It means you don't have a lot of state that you need to worry about. You're either synced, or you're not.
`build` has a similar restriction, but it seems to make a lot less sense. Almost all builds need to support multiple different invocations, like `make` and `make install`. To be useful in anything but the most trivial cases, `build` would need to start taking parameters that it passes along to build commands, and the target syntax would need to support this too.
Rather than trying to patch up a bad model, I think we should scrap the `build` command. We should encourage the pattern where other build tools call `peru sync`.
One question this raises: projects can have a toplevel `build:` field. Previously `peru sync` ignored this, and only `peru build` triggered it. With this change, the only way to invoke a toplevel build field will be to have another module depend on you as a recursive project (not implemented yet). Is that a world that makes sense?
Actually, it's no different from `export:` and `files:`, neither of which is meaningful at the top level unless someone depends on you as a recursive project. Maybe it's good that `build:` would be more like those.
But that raises another question: does it really make sense for `build`, `export`, and `files` to be first-class, toplevel fields? Maybe we should cordon them off in a section of their own?
https://coveralls.io/ Should work well with Travis. Here's a Python API for coveralls that also showcases the coverage badge: https://github.com/z4r/python-coveralls
It can fall back to an `--unshallow` fetch if the needed rev is still missing after a standard fetch. (Or clone?)
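A sketch of that fallback in the git plugin's fetch path (helper names are hypothetical):

```python
import subprocess

def git(repo_path, *args):
    return subprocess.check_output(['git', '-C', repo_path] + list(args))

def has_rev(repo_path, rev):
    """True if rev resolves to a commit in the local clone."""
    try:
        git(repo_path, 'rev-parse', '--verify', '--quiet', rev + '^{commit}')
        return True
    except subprocess.CalledProcessError:
        return False

def fetch_rev(repo_path, rev):
    git(repo_path, 'fetch', '--prune')
    if not has_rev(repo_path, rev):
        # The clone might be shallow; grab the full history. (git errors
        # out here if the repo is already complete and the rev is bogus.)
        git(repo_path, 'fetch', '--unshallow')
```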
Maybe that's more complicated than it's worth. (Especially when it comes to overridden modules, where we have to stick a .peru dir in them.) I'm not sure I can think of a good use case. We want to encourage nontrivial build commands to come out of the peru.yaml file anyway, right? Maybe only the toplevel project (and hypothetical recursive projects) should have imports.
We'll use `asyncio` for this, from the 3.3-compatible "tulip" library. Some things to remember:
- Locks around the `get_tree` methods. We don't want to allow multiple fetches to happen at once for the same module.
- Keep the event loop in `resolver.py`, so that not everything needs to become a coroutine.

The test harness is kind of a mess. In particular, the plugin tests have some inflexible scaffolding that doesn't work well for anything but distributed VCS plugins like `git` and `hg`. Until this is done, it may be difficult and hacky to test plugins like `svn`.
This will help us force the plugin interface to stay simple, and to give an example of how plugins in other languages should be written.
We can compute the cache key for a rule without building it. So we should really be able to do that without building its dependencies either. The current approach has the benefit of noticing when we've run a rule on the same inputs before though. Can we get both?
That conflicts with the fancy display. Maybe the displays could be extended to provide a different kind of output writer, which works like the `print` method does now.
We don't do anything special to cancel existing jobs when something fails. What actually happens? Presumably we should be sending a kill signal to existing jobs. What if a job fails to die?
Right now we force plugins to separate their fetch and reup scripts, at least to some degree. This forces all of our plugins to use the `*_shared` idiom, which is pretty annoying. That layout used to make sense before we had `plugin.yaml`, but now maybe it doesn't. It should be easy enough for that file to tell us what to invoke for fetch and reup, and there's no reason those couldn't be the same thing. (We could use another env var like `PERU_PLUGIN_COMMAND` to make it possible-but-not-required to use one script for both.) @olson-sean-k what do you think?
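Under that scheme, a single-script plugin might look like this sketch (`PERU_PLUGIN_COMMAND` is the proposed variable, not something that exists today):

```python
#!/usr/bin/env python3
import os
import sys

def fetch():
    pass  # fetch the module's files into the destination dir

def reup():
    pass  # print the updated field values

command = os.environ['PERU_PLUGIN_COMMAND']
if command == 'fetch':
    fetch()
elif command == 'reup':
    reup()
else:
    sys.exit('unknown plugin command: ' + command)
```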
Say you run `peru sync`, and then you change the value of `PERU_CACHE` and run `peru sync` again. The `lastimports` file will contain a reference to a tree that's not in your new cache, and you'll get a git error. We should detect this case ("hey, it looks like your last imports tree is no longer in cache") and allow the `--force` flag to just pave over everything.
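Detecting the stale tree could be as simple as asking git whether the recorded tree still exists before using it; a sketch:

```python
import subprocess

def tree_in_cache(cache_git_dir, tree_hash):
    """True if the lastimports tree is still present in the cache repo."""
    result = subprocess.call(
        ['git', '--git-dir', cache_git_dir, 'cat-file', '-e',
         tree_hash + '^{tree}'],
        stderr=subprocess.DEVNULL)
    return result == 0
```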
Users should be able to set `PERU_CACHE` to their home dir without causing peru to write a ton of temp files there. Honestly, there really shouldn't be a reason to set `PERU_PLUGINS_CACHE` instead of `PERU_CACHE`.