databio / unitar
Targets that span projects in R
Home Page: http://unitar.databio.org
License: Other
In #3, we talked about 3 use cases for unitar and created a target factory to track external targets into a secondary project.
But a common use case (for me at least) is to track the upstream target as an input without duplicating it. But as @wlandau said,
I think it might be difficult to make a sufficiently flexible target factory for (3).
I've been thinking about it a bit more. I think we could create a target factory that would take as input:
I think it may be possible to use something like:
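track_external_target = function(tname, ext_tname, func) {
  # Path to the upstream target's stored object in the external project
  fullpath = unitar_path(ext_tname)
  # Name of the companion target that tracks that file
  name_file = paste0(tname, "_file")
  # Build the command func(<name_file>); func is the loader's name as a string
  command_data = substitute(func(fullpath), env = list(fullpath = as.symbol(name_file), func = as.symbol(func)))
  list(
    tar_target_raw(name_file, fullpath, format = "file"),
    tar_target_raw(tname, command_data)
  )
}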
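To see what command the factory above hands to tar_target_raw(), the substitute() step can be run on its own in base R. This is only a minimal sketch: the target name my_data and the loader readRDS are placeholder assumptions, not part of the proposal.

```r
# Illustrative values (assumptions): the downstream target name and a loader
# given by name, as the factory's as.symbol(func) call expects.
tname <- "my_data"
func <- "readRDS"
name_file <- paste0(tname, "_file")

# Swap the placeholder symbols `func` and `fullpath` for the real names.
command_data <- substitute(
  func(fullpath),
  env = list(fullpath = as.symbol(name_file), func = as.symbol(func))
)

print(command_data)
```

The data target's command refers to the file target by name, which is what lets targets see the dependency between the two.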
I just wanted to capture this as a separate issue.
As I propose in ropensci/targets#297 (comment), it may be possible to take output from one pipeline (A) as input into another pipeline (B). For local files, this would rely on file tracking. ("url" and "aws_*" storage formats would need different workarounds.) Sketch of the tar_target() calls:
# _targets.R
# ...
list(
tar_target(file_a, unitar_path(...), format = "file"),
tar_target(data_a, unitar_read_from_path(file_a))
)
unitar_read_from_path() would need to be defined separately. Given a path like .../project_a/_targets/objects/target_a, it could call withr::with_dir("../project_a", tar_read_raw("target_a")) or something like that.
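One way the helper described above might look is sketched here. It is not the package's implementation: parse_target_path() is a hypothetical name, the store is assumed to sit at the default _targets location, and the sketch passes the store argument of tar_read_raw() instead of changing directory with withr::with_dir(), which assumes a targets version that supports that argument.

```r
# Hypothetical helper: split a store path such as
# "<project>/_targets/objects/<target>" into its parts.
parse_target_path <- function(path) {
  objects_dir <- dirname(path)   # <project>/_targets/objects
  store <- dirname(objects_dir)  # <project>/_targets
  list(
    project = dirname(store),    # <project>
    store = store,
    name = basename(path)        # <target>
  )
}

# Sketch of unitar_read_from_path(): read the upstream target through targets
# so that the storage format recorded in its metadata is respected.
unitar_read_from_path <- function(path) {
  parts <- parse_target_path(path)
  targets::tar_read_raw(parts$name, store = parts$store)
}

str(parse_target_path("../project_a/_targets/objects/target_a"))
```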
It should be possible to simplify the above two target calls down to a single target factory:
# _targets.R
# ...
list(
tar_unitar_read(new_name, "other_pipeline_dir", "target_name_in_other_pipeline")
)
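A rough sketch of such a factory follows. Everything here is an assumption for illustration: unitar_read_commands() is a hypothetical helper, names are passed as strings (not as the bare symbol used in the call above), the other pipeline is assumed to keep its store in the default _targets/objects directory, and unitar_read_from_path() is the separately defined reader mentioned earlier.

```r
# Hypothetical helper: build the two commands the factory needs.
unitar_read_commands <- function(name, pipeline_dir, target_name) {
  name_file <- paste0(name, "_file")
  path <- file.path(pipeline_dir, "_targets", "objects", target_name)
  list(
    name_file = name_file,
    # Command of the file target: evaluates to the path being tracked.
    command_file = bquote(c(.(path))),
    # Command of the data target: read the tracked file.
    command_data = bquote(unitar_read_from_path(.(as.symbol(name_file))))
  )
}

# Sketch of the factory: a file target plus a data target that depends on it.
tar_unitar_read <- function(name, pipeline_dir, target_name) {
  cmds <- unitar_read_commands(name, pipeline_dir, target_name)
  list(
    targets::tar_target_raw(cmds$name_file, cmds$command_file, format = "file"),
    targets::tar_target_raw(name, cmds$command_data)
  )
}

cmds <- unitar_read_commands(
  "new_name", "other_pipeline_dir", "target_name_in_other_pipeline"
)
print(cmds$command_file)
print(cmds$command_data)
```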
unitar_load() calls readRDS() to load data (line 71 in commit 2370582).
Maybe use tar_read_raw() instead? That way, if a target has a local storage format other than "rds", unitar_load() will still be able to read it.
In targets, I use "read" for functions that return values and "load" for functions that assign to an environment and return NULL. In other words, in targets, "read" has a return value and no side effects, and "load" is the reverse. Interested in aligning on this, or do you prefer to stick with the name unitar_load()?
One of my use cases for unitar is that I have a bunch of general-purpose files that I re-use across lots of R projects. I also use these files outside of R.
For my R projects, I process the files in various ways and then re-use these derived files across many projects. I want to use targets to track the processing and caching. Then, I want to use unitar to treat this as a central repository with targets I re-use across many projects.
To make this simpler for me, I created a new target factory that builds targets from a list in a CSV file. This CSV contains 1 row per target. For example, each row can correspond to one of my resource files, and it tracks that file and specifies a function for loading it into R. I like this because the CSV file helps me keep track of each of my resource files, and it feels convenient to me that this is the way I specify targets. But in addition, a row in the CSV can also correspond to an R function call that would process data in some other way.
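The idea can be illustrated with a small sketch. The column names (name, path, loader), the file paths, and the function names row_command() and tar_from_csv() are all hypothetical; the CSV layout actually used by unitar is not shown here and may differ.

```r
# Hypothetical resource table; in practice this would be a CSV file on disk.
csv_text <- "name,path,loader
genes,resources/genes.csv,read.csv
model,resources/model.rds,readRDS"
resources <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Turn one row into the command loader(<name>_file).
row_command <- function(name, loader) {
  as.call(list(as.symbol(loader), as.symbol(paste0(name, "_file"))))
}

commands <- Map(row_command, resources$name, resources$loader)
print(commands$genes)
print(commands$model)

# Sketch of the factory: one file target plus one data target per CSV row.
tar_from_csv <- function(resources) {
  lapply(seq_len(nrow(resources)), function(i) {
    row <- resources[i, ]
    name_file <- paste0(row$name, "_file")
    list(
      targets::tar_target_raw(name_file, bquote(c(.(row$path))), format = "file"),
      targets::tar_target_raw(row$name, row_command(row$name, row$loader))
    )
  })
}
```

Keeping the command construction in a small function of its own makes it possible to check the generated calls without running a pipeline.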
I created a demo repository to show how this works here: unitar resources demo. For now I put this target factory in the unitar package, but I'm now realizing it might be a more general concept. While my original intent was to use it for a "resources repository" that works with unitar's cross-project concept, in fact, it's really just a way to specify targets using a CSV file instead of traditional R functions.
@wlandau, I'm curious to hear your thoughts on this kind of a CSV-to-target factory.