intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.

Home Page: https://intel.github.io/dffml/main/

License: MIT License

machine-learning data-flow asyncio event-based flow-based-programming dag hyperautomation dataflows datasets dffml

dffml's Introduction


Mission Statement

As we all know, the Machine Learning space has a lot of tools and libraries for creating pipelines to train, test, and deploy models, and dealing with these many different APIs can be cumbersome.

Our project aims to make this process a breeze by introducing interoperability under a modular and easily extensible API. DFFML's plugin-based architecture makes it a Swiss Army knife of ML research and MLOps.

We heavily rely on DataFlows, which are basically directed graphs. We are also working on a WebUI to make building dataflows a completely drag-and-drop experience. Currently, all of our functionality is accessible through the Python API, CLI, and HTTP API.

We broadly have two audiences: Citizen Data Scientists and ML researchers, who would typically use the WebUI to experiment and design models, and MLOps engineers, who will deploy models and set up data processing pipelines via the HTTP/CLI/Python APIs.

Documentation

Documentation for the latest release is hosted at https://intel.github.io/dffml/

Documentation for the main branch is hosted at https://intel.github.io/dffml/main/index.html

Contributing

The contributing page will guide you through getting set up and contributing to DFFML.

Help

License

DFFML is distributed under the MIT License.

Legal

This software is subject to the U.S. Export Administration Regulations and other U.S. law, and may not be exported or re-exported to certain countries (Cuba, Iran, Crimea Region of Ukraine, North Korea, Sudan, and Syria) or to persons or entities prohibited from receiving U.S. exports (including Denied Parties, Specially Designated Nationals, and entities on the Bureau of Export Administration Entity List or involved with missile technology or nuclear, chemical or biological weapons).

dffml's People

Contributors

0dust, aadarshsingh191198, aghinsa, aliceoa-intel, aryanxk02, dependabot[bot], ichisadashioko, justinwmoore, mhash1m, naeemkh, naman1233, nitesh585, onkar627, patil2099, pbhutori, pdxjohnny, pranav-msc, programmer290399, pyther-hub, sakshamarora1, sanjibansg, sgeetansh, sidhu1012, sk-ip, spur19, sudharsana-kjl, us, xomute, yash-varshney, yashlamba


dffml's Issues

df: Rust implementation

The Data Flow Facilitator portion of DFFML would be great to port to other languages, since it is just a helper for using a data flow architecture. I hear Rust has some async support these days.

APIs should be kept the same. If needed, we'll change the Python APIs to match so that it's all one API.

scripts: skel: Update setup for markdown README

These:

description='',
long_description=readme,
author='John Andersen',

description='',
long_description=readme,
author='John Andersen',

Need to be updated to include the text/markdown content type

long_description=readme,
long_description_content_type='text/markdown',
author='John Andersen',
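For context, a minimal setuptools call with the markdown content type might look like the sketch below; the package name and file paths are placeholders, not the real skel metadata.

from setuptools import setup, find_packages

# Hypothetical skel package setup; name and paths are placeholders.
with open("README.md", "r") as f:
    readme = f.read()

setup(
    name="dffml-skel-example",  # placeholder
    version="0.0.1",
    description="",
    long_description=readme,
    long_description_content_type="text/markdown",
    author="John Andersen",
    packages=find_packages(),
)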

source: Instantiation pattern should be consistent with data flow

As seen in df/base.py, we want to make the Source class and its children follow the same design pattern.

class BaseSourceContext(abc.ABC):
    '''
    Abstract Base Class for context managing a Source
    '''

    async def __aenter__(self) -> 'BaseSourceContext':
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        pass

class BaseSource(Entrypoint):
    '''
    Abstract Base Class for a Source
    '''

    ENTRY_POINT = 'dffml.source'

    def __init__(self, config: BaseConfig) -> None:
        self.config = config
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @classmethod 
    @abc.abstractmethod 
    def args(cls) -> Dict[str, Arg]: 
        pass 
  
    @classmethod 
    @abc.abstractmethod 
    def config(cls, cmd: CMD): 
        pass 

    @abc.abstractmethod
    def __call__(self) -> 'BaseSourceContext':
        return BaseSourceContext(self)

CMD will need to change to resemble DataFlowFacilitatorCMD as well.

dffml/dffml/util/cli/cmd.py

Lines 130 to 193 in bf3493e

class BaseDataFlowFacilitatorCMD(CMD):
    '''
    Set timeout for features
    '''
    arg_ops = Arg('-ops', required=True, nargs='+',
                  action=ParseOperationAction)
    arg_input_network = Arg('-input-network',
                            action=ParseInputNetworkAction,
                            default=MemoryInputNetwork)
    arg_operation_network = Arg('-operation-network',
                                action=ParseOperationNetworkAction,
                                default=MemoryOperationNetwork)
    arg_lock_network = Arg('-lock-network',
                           action=ParseLockNetworkAction,
                           default=MemoryLockNetwork)
    arg_rchecker = Arg('-rchecker',
                       action=ParseRedundancyCheckerAction,
                       default=MemoryRedundancyChecker)
    # TODO We should be able to specify multiple operation implementation
    # networks. This would enable operations to live in different place,
    # accessed via the orchestrator transparently.
    arg_opimpn = Arg('-opimpn',
                     action=ParseOperationImplementationNetworkAction,
                     default=MemoryOperationImplementationNetwork)
    arg_orchestrator = Arg('-orchestrator',
                           action=ParseOrchestratorAction,
                           default=MemoryOrchestrator)
    arg_output_specs = Arg('-output-specs', required=True, nargs='+',
                           action=ParseOutputSpecsAction)
    arg_inputs = Arg('-inputs', nargs='+',
                     action=ParseInputsAction, default=[],
                     help='Other inputs to add under each ctx (repo\'s src_url will ' + \
                          'be used as the context)')
    arg_repo_def = Arg('-repo-def', default=False, type=str,
                       help='Definition to be used for repo.src_url.' + \
                            'If set, repo.src_url will be added to the set of inputs ' + \
                            'under each context (which is also the repo\'s src_url)')
    arg_remap = Arg('-remap', nargs='+', required=True,
                    action=ParseRemapAction,
                    help='For each repo, -remap output_operation_name.sub=feature_name')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dff = DataFlowFacilitator()
        self.linker = Linker()
        self.exported = self.linker.export(*self.ops)
        self.definitions, self.operations, _outputs = \
            self.linker.resolve(self.exported)

    # Load all entrypoints which may possibly be selected. Then have them add
    # their arguments to the DataFlowFacilitator-tots command.
    @classmethod
    def add_bases(cls):
        class LoadedDataFlowFacilitator(cls):
            pass
        for base in [BaseInputNetwork,
                     BaseOperationNetwork,
                     BaseLockNetwork,
                     BaseRedundancyChecker,
                     BaseOperationImplementationNetwork,
                     BaseOrchestrator]:
            for loaded in base.load():
                for arg_name, arg in loaded.args().items():
                    setattr(LoadedDataFlowFacilitator, arg_name, arg)
        return LoadedDataFlowFacilitator

DataFlowFacilitatorCMD = BaseDataFlowFacilitatorCMD.add_bases()

Related:

async def __aenter__(self):
    await self.open()
    # TODO Context management
    return self

plugin: model: darknet: Create YOLO Model

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: YOLO/darknet Model.

Project description:
DFFML's initial release included a Model for Tensorflow's DNN estimator.

YOLOv1, 2, and 3 are awesome; it would be possible to wrap the YOLO work in a DFFML model so that it could be used within the DFFML API.

This involves filling out the Model abstract base class, just as the Tensorflow DNN does (use it as an example, or probably more as an education, since this model is likely to be rather different because we're working with images).

Tensorflow DNN: https://github.com/intel/dffml/blob/master/model/tensorflow/dffml_model_tensorflow/model/dnn.py

Model ABC:

async def train(self, sources: Sources, features: Features,
                classifications: List[Any], steps: int, num_epochs: int):
    '''
    Train using repos as the data to learn from.
    '''
    raise NotImplementedError()

@abc.abstractmethod
async def accuracy(self, sources: Sources, features: Features,
                   classifications: List[Any]) -> Accuracy:
    '''
    Evaluates the accuracy of our model after training using the input repos
    as test data.
    '''
    raise NotImplementedError()

@abc.abstractmethod
async def predict(self, repos: AsyncIterator[Repo], features: Features,
                  classifications: List[Any]) -> \
        AsyncIterator[Tuple[Repo, Any, float]]:
    '''
    Uses trained data to make a prediction about the quality of a repo.
    '''

Skills: Python, git
Difficulty level: Hard

Related Readings/Links:

Demo video
YOLOv3

Potential mentors: @pdxjohnny

Getting Started: Start by copying the directory model/tensorflow to model/darknet and re-naming everything. Then move dnn.py to darknet.py (still re-naming everything) and make sure all the tests in the model/darknet directory still pass. Then you'll need to gut the DNN class and start replacing it with subprocess.call or check_output or whatever, which will call out to the darknet binary you compiled from Joseph's darknet repo for training, accuracy, and prediction (which is object detection in this case).
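For illustration only, here is a rough sketch of shelling out to a compiled darknet binary with subprocess; the binary path, config, weights, and the "detect" subcommand follow the upstream darknet README and are assumptions, not DFFML model code.

import subprocess
from pathlib import Path

# Hypothetical paths; the real locations depend on how darknet was compiled.
DARKNET = Path("./darknet")
CFG = Path("cfg/yolov3.cfg")
WEIGHTS = Path("yolov3.weights")

def detect(image_path: str) -> str:
    # Run object detection on a single image and return darknet's stdout,
    # which the DFFML model wrapper would then parse into predictions.
    return subprocess.check_output(
        [str(DARKNET), "detect", str(CFG), str(WEIGHTS), image_path],
        text=True,
    )

if __name__ == "__main__":
    print(detect("data/dog.jpg"))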

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", perhaps use the Python C bindings instead of calling subprocess out to the darknet binary. Don't forget to include some time for building appropriate tests.

source: file: Compression

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: File Source Compression

Project description:

DFFML's initial release includes a FileSource which saves and loads data from files using the load_fd and dump_fd methods.

JSON Example

async def load_fd(self, fd):
    repos = json.load(fd)
    self.mem = {src_url: Repo(src_url, data=data) \
                for src_url, data in repos.items()}
    LOGGER.debug('%r loaded %d records', self, len(self.mem))

async def dump_fd(self, fd):
    json.dump({repo.src_url: repo.dict() for repo in self.mem.values()}, fd)
    LOGGER.debug('%r saved %d records', self, len(self.mem))

For the open method of FileSource

async def _open(self):
    if not os.path.exists(self.filename) \
            or os.path.isdir(self.filename):
        LOGGER.debug('%r is not a file, initializing memory to empty dict',
                     self.filename)
        self.mem = {}
        return
    with open(self.filename, 'r') as fd:
        await self.load_fd(fd)

Allow for reading and writing the following file formats, transparently (so without subclasses having to do anything) to any source which is a subclass of FileSource.

Skills: Python, git
Difficulty level: Easy

Related Readings/Links:

See https://docs.python.org/3/library/archiving.html for documentation

Potential mentors: @pdxjohnny

Getting Started: Figure out how to do one of the file types, probably gzip (as that probably is as simple as using https://docs.python.org/3/library/gzip.html#gzip.GzipFile if the filename ends in .gz) then move on to the rest. For now just make modifications directly to the FileSource class. We may have you split out the logic later, but don't worry about another class for now.
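A minimal sketch of what the gzip case could look like, assuming the existing FileSource._open shown above; it uses gzip.open (a thin wrapper over GzipFile) and the same pattern would extend to bz2 and lzma.

import gzip
import os

async def _open(self):
    # Same missing-file check as the existing FileSource._open
    if not os.path.exists(self.filename) or os.path.isdir(self.filename):
        self.mem = {}
        return
    # Pick the opener based on the file extension; gzip.open in text mode
    # returns a file object that behaves like the one from builtins.open
    if self.filename.endswith('.gz'):
        opener = gzip.open
    else:
        opener = open
    with opener(self.filename, 'rt') as fd:
        await self.load_fd(fd)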

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", maybe implement a remote file source which reads from URLs. Don't forget to include some time for building appropriate tests.

source: Labeled and Versioned datasets

Assignee: @sudharsana-kjl

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: Labeled and Versioned Datasets.

Project description:
DFFML's initial release includes sources which abstract the format in which the data is stored from the dataset generation and usage in models.

Add information allowing users to have different versions and datasets from the same source.

Skills: Python, git
Difficulty level: Intermediate

Related Readings/Links:

class Source(abc.ABC, Entrypoint):
    '''
    Abstract base class for all sources. New sources must be derived from this
    class and implement the repos method.
    '''

    ENTRY_POINT = 'dffml.source'

    def __init__(self, src: str) -> None:
        self.src = src

    @abc.abstractmethod
    async def update(self, repo: Repo):
        '''
        Updates a repo for a source
        '''

    @abc.abstractmethod
    async def repos(self) -> AsyncIterator[Repo]:
        '''
        Returns a list of repos retrieved from self.src
        '''
        # mypy ignores AsyncIterator[Repo], therefore this is needed
        yield Repo('')  # pragma: no cover

    @abc.abstractmethod
    async def repo(self, src_url: str):
        '''
        Get a repo from the source or add it if it doesn't exist
        '''

dffml/dffml/repo.py

Lines 90 to 116 in dd8007d

class Repo(object):
    '''
    Manages feature independent information and actions for a repo.
    '''

    REPO_DATA = RepoData

    def __init__(self, src_url: str, *,
                 data: Optional[Dict[str, Any]] = None,
                 extra: Optional[Dict[str, Any]] = None) -> None:
        if data is None:
            data = {}
        if extra is None:
            extra = {}
        data['src_url'] = src_url
        if 'extra' in data:
            # Prefer extra from init arguments to extra stored in data
            data['extra'].update(extra)
            extra = data['extra']
            del data['extra']
        self.data = self.REPO_DATA(**data)
        self.extra = extra

    def dict(self):
        data = self.data.dict()
        data['extra'] = self.extra
        return data

Potential mentors: @pdxjohnny

Getting Started: Source.__init__ probably needs another two arguments, label and version, which should probably have defaults (say, default and v0). Since the same backend (aka, a csv file or json file) would be used to store all the data, you'll have to change the existing sources we have to understand how to deal with this. For CSVSource that might mean adding another column to each repo, for JSONSource that might mean instead of one big array, the array of repos is stored like so:

{
    "default": {
        "v0": [
            "... all the repos ..."
        ]
    }
}
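As a starting point, a JSON-backed source could nest its records by label and version to match the layout above. The class below is a self-contained sketch, not DFFML code; the real change would go into Source.__init__ and the existing JSONSource.

import json

class LabeledJSONSketch:
    '''
    Sketch: nest repos under {label: {version: {...}}}, defaulting to
    "default" and "v0" as suggested above.
    '''

    def __init__(self, label: str = 'default', version: str = 'v0') -> None:
        self.label = label
        self.version = version
        self.mem = {}

    def load_fd(self, fd):
        everything = json.load(fd)
        # Only the repos under our label and version are loaded into memory
        self.mem = everything.get(self.label, {}).get(self.version, {})

    def dump_fd(self, fd):
        # A complete implementation would merge with other labels/versions
        # already present in the file instead of overwriting them
        json.dump({self.label: {self.version: self.mem}}, fd)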

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", maybe you'll implement a source using sqlite too or something. Don't forget to include some time for building appropriate tests.

Refactor README

  • Add: Asyncio enabled Data Flow programming feeding into Machine Learning for processing

Abstract it so that developers can run lots of functions concurrently. Based on the idea of an event loop, where data of various types entering the network are the events which trigger operations (functions) to be run.

Abstracts multithreading for I/O to the function level instead of the thread level.

Give an example of ThreadPoolExecutor usage (see the sketch below).
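A standard-library sketch that could back that README example: blocking I/O pushed onto a ThreadPoolExecutor from asyncio. The fetch helper and the URL list are placeholders for illustration.

import asyncio
import concurrent.futures
import urllib.request

def fetch(url: str) -> int:
    # Blocking I/O; runs inside the thread pool
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

async def main():
    loop = asyncio.get_running_loop()
    urls = ["https://intel.github.io/dffml/main/"]  # illustrative only
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Each blocking call becomes an awaitable scheduled on the pool
        sizes = await asyncio.gather(
            *(loop.run_in_executor(pool, fetch, url) for url in urls)
        )
    print(dict(zip(urls, sizes)))

asyncio.run(main())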

df: base: BaseOperationImplementationNetworks

We should be able to specify multiple operation implementation networks. This would enable operations to live in different places, accessed via the orchestrator transparently.

As seen in:

dffml/dffml/df/base.py

Lines 682 to 689 in 062946b

def __call__(self,
             ictx: BaseInputNetworkContext,
             octx: BaseOperationNetworkContext,
             lctx: BaseLockNetworkContext,
             nctx: BaseOperationImplementationNetworkContext,
             rctx: BaseRedundancyCheckerContext) -> 'BaseOrchestratorContext':
    return self.CONTEXT(ictx=ictx, octx=octx, lctx=lctx, nctx=nctx,
                        rctx=rctx)

There is currently a BaseOperationImplementationNetworkContext

The Orchestrator calls dispatch on it to run an operation. Similar to how the Features class wraps a set of Feature objects, we should have a BaseOperationImplementationNetworks and a BaseOperationImplementationNetworksContext, which would be an AsyncContextManagerList of BaseOperationImplementationNetworkContexts. Its dispatch method would check each BaseOperationImplementationNetworkContext to see if it contains the implementation of the requested operation; if so, it dispatches within that context, otherwise it moves on to the next BaseOperationImplementationNetworkContext, and so forth, until it finds one which contains the OperationImplementation and can therefore dispatch it.
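A rough sketch of that dispatch-until-found behavior, with simplified stand-in types; the real base classes carry more context plumbing, and the contains/dispatch helpers on each wrapped context are assumptions here.

from typing import Any, List

class OperationImplementationNotFound(Exception):
    pass

class OperationImplementationNetworksContextSketch:
    '''
    Sketch: wraps several network contexts and dispatches to the first one
    which holds the requested operation. Stand-in for the proposed
    BaseOperationImplementationNetworksContext, not real DFFML code.
    '''

    def __init__(self, contexts: List[Any]) -> None:
        self.contexts = contexts

    async def dispatch(self, operation, parameter_set):
        for nctx in self.contexts:
            # Assumed helper: each context reports whether it contains the
            # implementation for this operation
            if await nctx.contains(operation):
                return await nctx.dispatch(operation, parameter_set)
        raise OperationImplementationNotFound(operation)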

df: memory: Locking must be dynamic

The current implementation of MemoryLockNetworkContext is such that it locks all parent inputs and the input itself.

What it should do is look at the parents and let descendants of a parent all operate on descendant Input items at the same time, but not let anyone operate on that parent item until all the operations working on descendant Inputs have completed.

Effectively, all operations working on the descendant Inputs "share" the lock on the parent they descend from.

dffml/dffml/df/memory.py

Lines 466 to 492 in 2c67198

async def acquire(self, parameter_set: BaseParameterSet):
    '''
    Acquire the lock for each input in the input set which must be locked
    prior to running an operation using the input.
    '''
    need_lock = {}
    # Acquire the master lock to find and or create needed locks
    async with self.parent.lock:
        # Get all the inputs up the ancestry tree
        inputs = [item async for item in parameter_set.inputs()]
        # Only lock the ones which require it
        for item in filter(lambda item: item.definition.lock, inputs):
            # Create the lock for the input if not present
            if not item.uid in self.parent.locks:
                self.parent.locks[item.uid] = asyncio.Lock()
            # Retrieve the lock
            need_lock[item.uid] = (item, self.parent.locks[item.uid])
    # Use AsyncExitStack to lock the variable amount of inputs required
    async with AsyncExitStack() as stack:
        # Take all the locks we found we needed for this parameter set
        for _uid, (item, lock) in need_lock.items():
            # Take the lock
            self.logger.debug('Acquiring: %s(%r)', item.uid, item.value)
            await stack.enter_async_context(lock)
            self.logger.debug('Acquired: %s(%r)', item.uid, item.value)
        # All locks for these parameters have been acquired
        yield
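One possible shape for the shared-parent semantics is a counted lock, where descendants bump a counter on the parent and locking the parent itself waits until that count drains to zero. The sketch below uses plain asyncio primitives and is not the planned MemoryLockNetworkContext implementation.

import asyncio
from contextlib import asynccontextmanager

class SharedParentLock:
    '''
    Sketch: operations on descendant Inputs "share" the parent by holding a
    descendant slot; locking the parent waits for all descendants to finish.
    '''

    def __init__(self) -> None:
        self._cond = asyncio.Condition()
        self._active_descendants = 0

    @asynccontextmanager
    async def descendant(self):
        # Many descendant operations may hold this at the same time
        async with self._cond:
            self._active_descendants += 1
        try:
            yield
        finally:
            async with self._cond:
                self._active_descendants -= 1
                self._cond.notify_all()

    @asynccontextmanager
    async def parent(self):
        # Exclusive: wait until all descendant work has drained, and keep the
        # condition's lock held so no new descendants can start meanwhile
        async with self._cond:
            await self._cond.wait_for(lambda: self._active_descendants == 0)
            yield

Under this sketch, operations working on descendant Inputs would wrap their work in "async with shared.descendant():", while anything that needs the parent item exclusively would take "async with shared.parent():".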

model: Instantiation pattern should be consistent with data flow

As seen in df/base.py, we want to make the Model class and its children follow the same design pattern.

class BaseModelContext(abc.ABC):
    '''
    Abstract Base Class for context managing a Model
    '''

    async def __aenter__(self) -> 'BaseModelContext':
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        pass

class BaseModel(Entrypoint):
    '''
    Abstract Base Class for a Model
    '''

    ENTRY_POINT = 'dffml.model'

    def __init__(self, config: BaseConfig) -> None:
        self.config = config
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @classmethod 
    @abc.abstractmethod 
    def args(cls) -> Dict[str, Arg]: 
        pass 
  
    @classmethod 
    @abc.abstractmethod 
    def config(cls, cmd: CMD): 
        pass 

    @abc.abstractmethod
    def __call__(self) -> 'BaseModelContext':
        return BaseModelContext(self)

CMD will need to change to resemble DataFlowFacilitatorCMD as well.

dffml/dffml/util/cli/cmd.py

Lines 130 to 193 in bf3493e

class BaseDataFlowFacilitatorCMD(CMD):
    '''
    Set timeout for features
    '''
    arg_ops = Arg('-ops', required=True, nargs='+',
                  action=ParseOperationAction)
    arg_input_network = Arg('-input-network',
                            action=ParseInputNetworkAction,
                            default=MemoryInputNetwork)
    arg_operation_network = Arg('-operation-network',
                                action=ParseOperationNetworkAction,
                                default=MemoryOperationNetwork)
    arg_lock_network = Arg('-lock-network',
                           action=ParseLockNetworkAction,
                           default=MemoryLockNetwork)
    arg_rchecker = Arg('-rchecker',
                       action=ParseRedundancyCheckerAction,
                       default=MemoryRedundancyChecker)
    # TODO We should be able to specify multiple operation implementation
    # networks. This would enable operations to live in different place,
    # accessed via the orchestrator transparently.
    arg_opimpn = Arg('-opimpn',
                     action=ParseOperationImplementationNetworkAction,
                     default=MemoryOperationImplementationNetwork)
    arg_orchestrator = Arg('-orchestrator',
                           action=ParseOrchestratorAction,
                           default=MemoryOrchestrator)
    arg_output_specs = Arg('-output-specs', required=True, nargs='+',
                           action=ParseOutputSpecsAction)
    arg_inputs = Arg('-inputs', nargs='+',
                     action=ParseInputsAction, default=[],
                     help='Other inputs to add under each ctx (repo\'s src_url will ' + \
                          'be used as the context)')
    arg_repo_def = Arg('-repo-def', default=False, type=str,
                       help='Definition to be used for repo.src_url.' + \
                            'If set, repo.src_url will be added to the set of inputs ' + \
                            'under each context (which is also the repo\'s src_url)')
    arg_remap = Arg('-remap', nargs='+', required=True,
                    action=ParseRemapAction,
                    help='For each repo, -remap output_operation_name.sub=feature_name')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dff = DataFlowFacilitator()
        self.linker = Linker()
        self.exported = self.linker.export(*self.ops)
        self.definitions, self.operations, _outputs = \
            self.linker.resolve(self.exported)

    # Load all entrypoints which may possibly be selected. Then have them add
    # their arguments to the DataFlowFacilitator-tots command.
    @classmethod
    def add_bases(cls):
        class LoadedDataFlowFacilitator(cls):
            pass
        for base in [BaseInputNetwork,
                     BaseOperationNetwork,
                     BaseLockNetwork,
                     BaseRedundancyChecker,
                     BaseOperationImplementationNetwork,
                     BaseOrchestrator]:
            for loaded in base.load():
                for arg_name, arg in loaded.args().items():
                    setattr(LoadedDataFlowFacilitator, arg_name, arg)
        return LoadedDataFlowFacilitator

DataFlowFacilitatorCMD = BaseDataFlowFacilitatorCMD.add_bases()

Misc Q & A and discussion

Please use this issue for any Google Summer of Code-related questions and discussion that does not fit elsewhere.

plugin: model: Add some new models!

Add a model!

This issue is for discussion and help-needed comments while adding new Models to DFFML.

First, get familiar with how models can be used via the DFFML command line: https://intel.github.io/dffml/master/plugins/dffml_model.html

Make sure you follow: https://intel.github.io/dffml/master/contributing/dev_env.html

Look at what libraries are already being wrapped or models have already been implemented. If you want to use a library that has not yet been integrated, reference the new model tutorial: https://intel.github.io/dffml/master/tutorials/models/

If you want to create a new model using a library we already have a wrapper for, just start working in the packages that already exist under model/. Create a new file under dffml_model_library_name. Each library wrapper does things differently, so check how that wrapper interacts with the underlying library by looking at how the existing models are implemented.

dffml: feature: Move to data flow architecture

Currently DFFML does not use a data flow architecture for evaluation. That's because locking was hard (shocking I know), and I had to just release. I put some effort into this around December / January. I'll try to finish it out in the next few weeks. It will make writing features SOOOOO much cleaner.

demo: Create interactive demo website

The web UI will be this.

What follows was the old server-side approach.

We can use runme.io

When a user clicks a https://runme.io/run?app_id=a1774bbe-b840-4853-b4f1-e6d4440c5e2e style link they'll be redirected to a https://runme.io/show?build_id=3ee099f0-7680-464e-89bf-9e2357e9b5af style link.

We can take that UUID at the end and get the URL the HTTP server is running at: https://3ee099f0-7680-464e-89bf-9e2357e9b5af.runme.io/

So we'll want to add a runme button to the initial popup of the web UI saying that they can try demo mode, or they can try with a real backend if they click the link. If they click it, open another modal that gives them a couple of steps:

  1. Click this link (replace with real one eventually): https://runme.io/run?app_id=a1774bbe-b840-4853-b4f1-e6d4440c5e2e
  2. Wait until the app launches
  3. Copy the URL from their address bar back into the new modal of the web UI and parse it to get the value of app_id (see the sketch below)
  4. Set the backend URL to https://$app_id.runme.io/ and close all the modals so they can now play around
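Pulling the UUID out of the pasted URL is simple with the standard library; sketched in Python here for brevity, though the web UI itself would do this in JavaScript. The build_id/app_id parameter names are taken from the example links above.

from urllib.parse import urlparse, parse_qs

def backend_url(pasted_url: str) -> str:
    # Extract the UUID from the query string and build the backend URL
    query = parse_qs(urlparse(pasted_url).query)
    uuid = (query.get("app_id") or query.get("build_id") or [""])[0]
    return f"https://{uuid}.runme.io/"

print(backend_url(
    "https://runme.io/show?build_id=3ee099f0-7680-464e-89bf-9e2357e9b5af"
))
# -> https://3ee099f0-7680-464e-89bf-9e2357e9b5af.runme.io/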

df: types: Add properties to Definition

We need to have locality (thread, disk), and then attach things like disk usage to Inputs and make the orchestrator aware of this when dispatching to an OpImpN (operation implementation network).
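A minimal sketch of what locality metadata on a Definition-like type could look like; the field names and the Locality enum are made up for illustration and are not the existing dffml.df.types.Definition.

import enum
from typing import NamedTuple, Optional

class Locality(enum.Enum):
    # Hypothetical locality values from the description above
    THREAD = "thread"
    DISK = "disk"

class DefinitionSketch(NamedTuple):
    '''
    Stand-in for Definition with an added locality field and an optional
    resource hint (e.g. disk usage) the orchestrator could consult when
    dispatching to an operation implementation network.
    '''
    name: str
    primitive: str
    locality: Optional[Locality] = None
    disk_usage: Optional[int] = None  # bytes; hypothetical hint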

source: csv: Add update functionality

Remove the _close method from CSVSource, as FileSource would normally use this to open the file and call dump_fd on it.

async def _close(self):
    LOGGER.debug('%r save to file not implemented', self)

Implement dump_fd; this will probably include adding new headers to the file.

Something like prediction for the prediction.classification and confidence for prediction.confidence might be appropriate.

async def dump_fd(self, fd):
    pass
    # LOGGER.debug('%r saved %d records', self, len(self.mem))
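A rough sketch of what dump_fd could do with csv.DictWriter, appending the suggested prediction and confidence headers. The per-repo column layout (what repo.dict() contains) is an assumption here, not the final implementation.

import csv

async def dump_fd(self, fd):
    # Column layout is an assumption: one row per repo, with the suggested
    # prediction and confidence headers appended to whatever columns exist.
    rows = []
    for repo in self.mem.values():
        row = {'src_url': repo.src_url}
        row.update(repo.dict().get('features', {}))  # assumed key
        rows.append(row)
    fieldnames = sorted({key for row in rows for key in row} \
                        | {'prediction', 'confidence'})
    writer = csv.DictWriter(fd, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    LOGGER.debug('%r saved %d records', self, len(self.mem))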

df: types: Audit Input.get_parents/inputs/Parameter.inputs

get_parents was meant as a way to get the ancestry of a piece of data in the InputNetwork. This functionality may need to be removed.

dffml/dffml/df/types.py

Lines 115 to 117 in 2c67198

def get_parents(self) -> Iterator['Input']:
    return itertools.chain(*[[item] + list(item.get_parents()) \
                             for item in self.parents])

Places needing modification

dffml/dffml/df/base.py

Lines 284 to 296 in 2c67198

class BaseInputSet(abc.ABC):
    def __init__(self, config: BaseInputSetConfig) -> None:
        self.config = config
        self.ctx = config.ctx
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @abc.abstractmethod
    async def inputs(self) -> AsyncIterator[Input]:
        pass

    async def _asdict(self) -> Dict[str, Any]:
        '''

dffml/dffml/df/base.py

Lines 294 to 304 in 2c67198

    async def _asdict(self) -> Dict[str, Any]:
        '''
        Returns an input definition name to input value dict
        '''
        return {item.definition.name: item.value \
                async for item in self.inputs()}

class BaseParameterSetConfig(NamedTuple):
    ctx: BaseInputSetContext

dffml/dffml/df/base.py

Lines 313 to 321 in 2c67198

    async def parameters(self) -> AsyncIterator[Parameter]:
        pass

    @abc.abstractmethod
    async def inputs(self) -> AsyncIterator[Input]:
        pass

    async def _asdict(self) -> Dict[str, Any]:
        '''

dffml/dffml/df/memory.py

Lines 79 to 87 in 2c67198

    def __init__(self, config: MemoryInputSetConfig) -> None:
        super().__init__(config)
        self.__inputs = config.inputs

    async def inputs(self) -> AsyncIterator[Input]:
        for item in self.__inputs:
            yield item

class MemoryParameterSetConfig(NamedTuple):

dffml/dffml/df/memory.py

Lines 97 to 105 in 2c67198

    async def parameters(self) -> AsyncIterator[Parameter]:
        for parameter in self.__parameters:
            yield parameter

    async def inputs(self) -> AsyncIterator[Input]:
        for item in itertools.chain(*[[parameter.origin] + \
                                      list(parameter.origin.get_parents()) \
                                      for parameter in \
                                      self.__parameters]):

dffml/dffml/df/memory.py

Lines 168 to 176 in 2c67198

    definitions: Dict[Definition, List[Input]]

class MemoryDefinitionSetContext(BaseDefinitionSetContext):
    async def inputs(self, definition: Definition) -> AsyncIterator[Input]:
        # Grab the input set context handle
        handle = await self.ctx.handle()
        handle_string = handle.as_string()
        # Associate inputs with their context handle grouped by definition

dffml/dffml/df/memory.py

Lines 194 to 202 in 2c67198

            await ctx.add(input_set.ctx, [])
        # Add the input set to the incoming inputs
        async with self.parent.input_notification_set[handle_string]() as ctx:
            await ctx.add(input_set,
                          [item async for item in input_set.inputs()])
        # Associate inputs with their context handle grouped by definition
        async with self.parent.ctxhd_lock:
            # Create dict for handle_string if not present
            if not handle_string in self.parent.ctxhd:

dffml/dffml/df/memory.py

Lines 205 to 213 in 2c67198

                    ctx=input_set.ctx,
                    definitions={}
                )
            # Go through each item in the input set
            async for item in input_set.inputs():
                # Create set for item definition if not present
                if not item.definition in \
                        self.parent.ctxhd[handle_string].definitions:
                    self.parent.ctxhd[handle_string].definitions\

dffml/dffml/df/memory.py

Lines 345 to 353 in 2c67198

                 stage: Stage = Stage.PROCESSING) -> AsyncIterator[Operation]:
    # Set list of needed input definitions if given
    if not input_set is None:
        input_definitions = set([item.definition \
                                 async for item in input_set.inputs()])
    # Yield all operations with an input in the input set
    for operation in self.parent.memory:
        # Only run operations of the requested stage
        if operation.stage != stage:

dffml/dffml/df/memory.py

Lines 392 to 400 in 2c67198

    SHA384 hash of the parameter set context handle as a string, the
    operation name, and the sorted list of input uuids.
    '''
    uid_list = sorted(map(lambda x: x.uid, [item async for item in \
                                            parameter_set.inputs()]))
    uid_list.insert(0, (await parameter_set.ctx.handle()).as_string())
    uid_list.insert(0, operation.name)
    return hashlib.sha384(', '.join(uid_list).encode('utf-8')).hexdigest()

dffml/dffml/df/memory.py

Lines 471 to 479 in 2c67198

    need_lock = {}
    # Acquire the master lock to find and or create needed locks
    async with self.parent.lock:
        # Get all the inputs up the ancestry tree
        inputs = [item async for item in parameter_set.inputs()]
        # Only lock the ones which require it
        for item in filter(lambda item: item.definition.lock, inputs):
            # Create the lock for the input if not present
            if not item.uid in self.parent.locks:

dffml/dffml/df/memory.py

Lines 592 to 600 in 2c67198

                for value in output:
                    inputs.append(Input(value=value,
                                        definition=operation.outputs[key],
                                        parents=[item async for item in \
                                                 parameter_set.inputs()]))
        except KeyError as error:
            raise KeyError(
                'Value %s missing from output:definition mapping %s(%s)' \
                % (str(error), operation.name,

dffml/dffml/df/memory.py

Lines 770 to 778 in 2c67198

            more, new_input_set = input_set_enters_network.result()
            self.logger.debug('[%s]: input_set_enters_network: %s',
                              ctx_str,
                              [(i.definition.name, i.value) \
                               async for i in new_input_set.inputs()])
            # Identify which operations have complete contextually
            # appropriate input sets which haven't been run yet
            async for operation, parameter_set in \
                    self.nctx.operations_parameter_set_pairs(self.ictx,

        group_by = collections.OrderedDict(sorted(group_by.items()))
        # Find all inputs within the input network for the by definition
        async for item in od.inputs(output.by):
            # Get all the parents of the input
            parents = list(item.get_parents())
            for group, related in group_by.values():
                # Ensure that the definition we need to group by is in
                # the parents
                if not group in parents:
