intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.

Home Page: https://intel.github.io/dffml/main/

License: MIT License

machine-learning data-flow asyncio event-based flow-based-programming dag hyperautomation dataflows datasets dffml

dffml's Introduction


Mission Statement

As we all know, the Machine Learning space has a lot of tools and libraries for creating pipelines to train, test, and deploy models, and dealing with these many different APIs can be cumbersome.

Our project aims to make this process a breeze by introducing interoperability under a modular and easily extensible API. DFFML's plugin-based architecture makes it a Swiss Army knife of ML research and MLOps.

We heavily rely on DataFlows, which are basically directed graphs. We are also working on a WebUI to make building dataflows a completely drag-and-drop experience. Currently, all of our functionality is accessible through the Python API, CLI, and HTTP API.

We broadly have two audiences: Citizen Data Scientists and ML researchers, who would typically use the WebUI to experiment and design models, and MLOps engineers, who will deploy models and set up data processing pipelines via the HTTP/CLI/Python APIs.

Documentation

Documentation for the latest release is hosted at https://intel.github.io/dffml/

Documentation for the main branch is hosted at https://intel.github.io/dffml/main/index.html

Contributing

The contributing page will guide you through getting set up and contributing to DFFML.

Help

License

DFFML is distributed under the MIT License.

Legal

This software is subject to the U.S. Export Administration Regulations and other U.S. law, and may not be exported or re-exported to certain countries (Cuba, Iran, Crimea Region of Ukraine, North Korea, Sudan, and Syria) or to persons or entities prohibited from receiving U.S. exports (including Denied Parties, Specially Designated Nationals, and entities on the Bureau of Export Administration Entity List or involved with missile technology or nuclear, chemical or biological weapons).

dffml's People

Contributors

0dust, aadarshsingh191198, aghinsa, aliceoa-intel, aryanxk02, dependabot[bot], ichisadashioko, justinwmoore, mhash1m, naeemkh, naman1233, nitesh585, onkar627, patil2099, pbhutori, pdxjohnny, pranav-msc, programmer290399, pyther-hub, sakshamarora1, sanjibansg, sgeetansh, sidhu1012, sk-ip, spur19, sudharsana-kjl, us, xomute, yash-varshney, yashlamba


dffml's Issues

df: Rust implementation

The Data Flow Facilitator portion of DFFML would be great to port to other languages, since it is just a helper for using a data flow architecture. I hear Rust has some async support these days.

APIs should be kept the same. If needed, we'll change the Python APIs to match so that it's all one API.

scripts: skel: Update setup for markdown README

These:

description='',
long_description=readme,
author='John Andersen',

description='',
long_description=readme,
author='John Andersen',

Need to be updated to include the text/markdown content type

long_description=readme,
long_description_content_type='text/markdown',
author='John Andersen',
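For context, a minimal setuptools call with the markdown content type might look like the sketch below; the package name and file paths are placeholders, not the real skel metadata.

from setuptools import setup, find_packages

# Hypothetical skel package setup; name and paths are placeholders.
with open("README.md", "r") as f:
    readme = f.read()

setup(
    name="dffml-skel-example",  # placeholder
    version="0.0.1",
    description="",
    long_description=readme,
    long_description_content_type="text/markdown",
    author="John Andersen",
    packages=find_packages(),
)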

source: Instantiation pattern should be consistent with data flow

As seen in df/base.py, we want to make the Source class and its children follow the same design pattern.

class BaseSourceContext(abc.ABC):
    '''
    Abstract Base Class for context managing a Source
    '''

    async def __aenter__(self) -> 'BaseSourceContext':
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        pass

class BaseSource(Entrypoint):
    '''
    Abstract Base Class for a Source
    '''

    ENTRY_POINT = 'dffml.source'

    def __init__(self, config: BaseConfig) -> None:
        self.config = config
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @classmethod 
    @abc.abstractmethod 
    def args(cls) -> Dict[str, Arg]: 
        pass 
  
    @classmethod 
    @abc.abstractmethod 
    def config(cls, cmd: CMD): 
        pass 

    @abc.abstractmethod
    def __call__(self) -> 'BaseSourceContext':
        return BaseSourceContext(self)

CMD will need to change to resemble DataFlowFacilitatorCMD as well.

dffml/dffml/util/cli/cmd.py

Lines 130 to 193 in bf3493e

class BaseDataFlowFacilitatorCMD(CMD):
    '''
    Set timeout for features
    '''
    arg_ops = Arg('-ops', required=True, nargs='+',
                  action=ParseOperationAction)
    arg_input_network = Arg('-input-network',
                            action=ParseInputNetworkAction,
                            default=MemoryInputNetwork)
    arg_operation_network = Arg('-operation-network',
                                action=ParseOperationNetworkAction,
                                default=MemoryOperationNetwork)
    arg_lock_network = Arg('-lock-network',
                           action=ParseLockNetworkAction,
                           default=MemoryLockNetwork)
    arg_rchecker = Arg('-rchecker',
                       action=ParseRedundancyCheckerAction,
                       default=MemoryRedundancyChecker)
    # TODO We should be able to specify multiple operation implementation
    # networks. This would enable operations to live in different place,
    # accessed via the orchestrator transparently.
    arg_opimpn = Arg('-opimpn',
                     action=ParseOperationImplementationNetworkAction,
                     default=MemoryOperationImplementationNetwork)
    arg_orchestrator = Arg('-orchestrator',
                           action=ParseOrchestratorAction,
                           default=MemoryOrchestrator)
    arg_output_specs = Arg('-output-specs', required=True, nargs='+',
                           action=ParseOutputSpecsAction)
    arg_inputs = Arg('-inputs', nargs='+',
                     action=ParseInputsAction, default=[],
                     help='Other inputs to add under each ctx (repo\'s src_url will ' + \
                          'be used as the context)')
    arg_repo_def = Arg('-repo-def', default=False, type=str,
                       help='Definition to be used for repo.src_url.' + \
                            'If set, repo.src_url will be added to the set of inputs ' + \
                            'under each context (which is also the repo\'s src_url)')
    arg_remap = Arg('-remap', nargs='+', required=True,
                    action=ParseRemapAction,
                    help='For each repo, -remap output_operation_name.sub=feature_name')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dff = DataFlowFacilitator()
        self.linker = Linker()
        self.exported = self.linker.export(*self.ops)
        self.definitions, self.operations, _outputs = \
            self.linker.resolve(self.exported)

    # Load all entrypoints which may possibly be selected. Then have them add
    # their arguments to the DataFlowFacilitator-tots command.
    @classmethod
    def add_bases(cls):
        class LoadedDataFlowFacilitator(cls):
            pass
        for base in [BaseInputNetwork,
                     BaseOperationNetwork,
                     BaseLockNetwork,
                     BaseRedundancyChecker,
                     BaseOperationImplementationNetwork,
                     BaseOrchestrator]:
            for loaded in base.load():
                for arg_name, arg in loaded.args().items():
                    setattr(LoadedDataFlowFacilitator, arg_name, arg)
        return LoadedDataFlowFacilitator

DataFlowFacilitatorCMD = BaseDataFlowFacilitatorCMD.add_bases()

Related:

async def __aenter__(self):
    await self.open()
    # TODO Context management
    return self

plugin: model: darknet: Create YOLO Model

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: YOLO/darknet Model.

Project description:
DFFML's initial release included a Model for Tensorflow's DNN estimator.

YOLOv1, 2, and 3 are awesome; it would be possible to wrap the YOLO work in a DFFML model so that it could be used within the DFFML API.

This involves filling out the Model abstract base class, just as the Tensorflow DNN does (use it as an example, or probably more as an education, since this model is likely to be rather different because we're working with images).

Tensorflow DNN: https://github.com/intel/dffml/blob/master/model/tensorflow/dffml_model_tensorflow/model/dnn.py

Model ABC:

async def train(self, sources: Sources, features: Features,
                classifications: List[Any], steps: int, num_epochs: int):
    '''
    Train using repos as the data to learn from.
    '''
    raise NotImplementedError()

@abc.abstractmethod
async def accuracy(self, sources: Sources, features: Features,
                   classifications: List[Any]) -> Accuracy:
    '''
    Evaluates the accuracy of our model after training using the input repos
    as test data.
    '''
    raise NotImplementedError()

@abc.abstractmethod
async def predict(self, repos: AsyncIterator[Repo], features: Features,
                  classifications: List[Any]) -> \
        AsyncIterator[Tuple[Repo, Any, float]]:
    '''
    Uses trained data to make a prediction about the quality of a repo.
    '''

Skills: Python, git
Difficulty level: Hard

Related Readings/Links:

Demo video
YOLOv3

Potential mentors: @pdxjohnny

Getting Started: Start by copying the directory model/tensorflow to model/darknet and re-naming everything. Then move dnn.py to darknet.py (still re-naming everything) and make sure all the tests in the model/darknet directory still pass. Then you'll need to gut the DNN class and start replacing it with subprocess.call or check_output or whatever, which will call out to the darknet binary you compiled from Joseph's darknet repo for training, accuracy, and prediction (which is object detection in this case).
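For illustration only, here is a rough sketch of shelling out to a compiled darknet binary with subprocess; the binary path, config, weights, and the "detect" subcommand follow the upstream darknet README and are assumptions, not DFFML model code.

import subprocess
from pathlib import Path

# Hypothetical paths; the real locations depend on how darknet was compiled.
DARKNET = Path("./darknet")
CFG = Path("cfg/yolov3.cfg")
WEIGHTS = Path("yolov3.weights")

def detect(image_path: str) -> str:
    # Run object detection on a single image and return darknet's stdout,
    # which the DFFML model wrapper would then parse into predictions.
    return subprocess.check_output(
        [str(DARKNET), "detect", str(CFG), str(WEIGHTS), image_path],
        text=True,
    )

if __name__ == "__main__":
    print(detect("data/dog.jpg"))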

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", perhaps use the Python C bindings instead of calling subprocess out to the darknet binary. Don't forget to include some time for building appropriate tests.

source: file: Compression

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: File Source Compression

Project description:

DFFML's initial release includes a FileSource which saves and loads data from files using the load_fd and dump_fd methods.

JSON Example

async def load_fd(self, fd):
    repos = json.load(fd)
    self.mem = {src_url: Repo(src_url, data=data) \
                for src_url, data in repos.items()}
    LOGGER.debug('%r loaded %d records', self, len(self.mem))

async def dump_fd(self, fd):
    json.dump({repo.src_url: repo.dict() for repo in self.mem.values()}, fd)
    LOGGER.debug('%r saved %d records', self, len(self.mem))

For the open method of FileSource

async def _open(self):
    if not os.path.exists(self.filename) \
            or os.path.isdir(self.filename):
        LOGGER.debug('%r is not a file, initializing memory to empty dict',
                     self.filename)
        self.mem = {}
        return
    with open(self.filename, 'r') as fd:
        await self.load_fd(fd)

Allow for reading and writing the following file formats, transparently (so without subclasses having to do anything) to any source which is a subclass of FileSource.

Skills: Python, git
Difficulty level: Easy

Related Readings/Links:

See https://docs.python.org/3/library/archiving.html for documentation

Potential mentors: @pdxjohnny

Getting Started: Figure out how to do one of the file types, probably gzip (as that probably is as simple as using https://docs.python.org/3/library/gzip.html#gzip.GzipFile if the filename ends in .gz) then move on to the rest. For now just make modifications directly to the FileSource class. We may have you split out the logic later, but don't worry about another class for now.
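A minimal sketch of what the gzip case could look like, assuming the existing FileSource._open shown above; it uses gzip.open (a thin wrapper over GzipFile) and the same pattern would extend to bz2 and lzma.

import gzip
import os

async def _open(self):
    # Same missing-file check as the existing FileSource._open
    if not os.path.exists(self.filename) or os.path.isdir(self.filename):
        self.mem = {}
        return
    # Pick the opener based on the file extension; gzip.open in text mode
    # returns a file object that behaves like the one from builtins.open
    if self.filename.endswith('.gz'):
        opener = gzip.open
    else:
        opener = open
    with opener(self.filename, 'rt') as fd:
        await self.load_fd(fd)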

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", maybe implement a remote file source which reads from URLs. Don't forget to include some time for building appropriate tests.

source: Labeled and Versioned datasets

Assignee: @sudharsana-kjl

DFFML is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged gsoc and project are not generally available bugs, but related to project ideas for GSoC.

Project Idea: Labeled and Versioned Datasets.

Project description:
DFFML's initial release includes sources which abstract the format in which the data is stored from the dataset generation and usage in models.

Add information allowing users to have different versions and datasets from the same source.

Skills: Python, git
Difficulty level: Intermediate

Related Readings/Links:

class Source(abc.ABC, Entrypoint):
    '''
    Abstract base class for all sources. New sources must be derived from this
    class and implement the repos method.
    '''

    ENTRY_POINT = 'dffml.source'

    def __init__(self, src: str) -> None:
        self.src = src

    @abc.abstractmethod
    async def update(self, repo: Repo):
        '''
        Updates a repo for a source
        '''

    @abc.abstractmethod
    async def repos(self) -> AsyncIterator[Repo]:
        '''
        Returns a list of repos retrieved from self.src
        '''
        # mypy ignores AsyncIterator[Repo], therefore this is needed
        yield Repo('')  # pragma: no cover

    @abc.abstractmethod
    async def repo(self, src_url: str):
        '''
        Get a repo from the source or add it if it doesn't exist
        '''

dffml/dffml/repo.py

Lines 90 to 116 in dd8007d

class Repo(object):
    '''
    Manages feature independent information and actions for a repo.
    '''

    REPO_DATA = RepoData

    def __init__(self, src_url: str, *,
                 data: Optional[Dict[str, Any]] = None,
                 extra: Optional[Dict[str, Any]] = None) -> None:
        if data is None:
            data = {}
        if extra is None:
            extra = {}
        data['src_url'] = src_url
        if 'extra' in data:
            # Prefer extra from init arguments to extra stored in data
            data['extra'].update(extra)
            extra = data['extra']
            del data['extra']
        self.data = self.REPO_DATA(**data)
        self.extra = extra

    def dict(self):
        data = self.data.dict()
        data['extra'] = self.extra
        return data

Potential mentors: @pdxjohnny

Getting Started: Source.__init__ probably needs another two arguments, label and version, which should probably have defaults (say, default and v0). Since the same backend (aka, a csv file or json file) would be used to store all the data, you'll have to change the existing sources we have to understand how to deal with this. For CSVSource that might mean adding another column to each repo, for JSONSource that might mean instead of one big array, the array of repos is stored like so:

{
    "default": {
        "v0": [
            "... all the repos ..."
        ]
    }
}
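As a starting point, a JSON-backed source could nest its records by label and version to match the layout above. The class below is a self-contained sketch, not DFFML code; the real change would go into Source.__init__ and the existing JSONSource.

import json

class LabeledJSONSketch:
    '''
    Sketch: nest repos under {label: {version: {...}}}, defaulting to
    "default" and "v0" as suggested above.
    '''

    def __init__(self, label: str = 'default', version: str = 'v0') -> None:
        self.label = label
        self.version = version
        self.mem = {}

    def load_fd(self, fd):
        everything = json.load(fd)
        # Only the repos under our label and version are loaded into memory
        self.mem = everything.get(self.label, {}).get(self.version, {})

    def dump_fd(self, fd):
        # A complete implementation would merge with other labels/versions
        # already present in the file instead of overwriting them
        json.dump({self.label: {self.version: self.mem}}, fd)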

What we want to see in your application: Describe how you intend to solve the problem, and give us some "stretch goals", maybe you'll implement a source using sqlite too or something. Don't forget to include some time for building appropriate tests.

Refactor README

  • Add: Asyncio enabled Data Flow programming feeding into Machine Learning for processing

Abstract it so that developers can run lots of functions concurrently. Based on the idea of an event loop, where data of various types entering the network are the events which trigger operations (functions) to be run.

Abstracts multithreading for I/O to the function level instead of the thread level.

Give an example of ThreadPoolExecutor usage (see the sketch below).
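A standard-library sketch that could back that README example: blocking I/O pushed onto a ThreadPoolExecutor from asyncio. The fetch helper and the URL list are placeholders for illustration.

import asyncio
import concurrent.futures
import urllib.request

def fetch(url: str) -> int:
    # Blocking I/O; runs inside the thread pool
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

async def main():
    loop = asyncio.get_running_loop()
    urls = ["https://intel.github.io/dffml/main/"]  # illustrative only
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Each blocking call becomes an awaitable scheduled on the pool
        sizes = await asyncio.gather(
            *(loop.run_in_executor(pool, fetch, url) for url in urls)
        )
    print(dict(zip(urls, sizes)))

asyncio.run(main())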

df: base: BaseOperationImplementationNetworks

We should be able to specify multiple operation implementation networks. This would enable operations to live in different places, accessed via the orchestrator transparently.

As seen in:

dffml/dffml/df/base.py

Lines 682 to 689 in 062946b

def __call__(self,
             ictx: BaseInputNetworkContext,
             octx: BaseOperationNetworkContext,
             lctx: BaseLockNetworkContext,
             nctx: BaseOperationImplementationNetworkContext,
             rctx: BaseRedundancyCheckerContext) -> 'BaseOrchestratorContext':
    return self.CONTEXT(ictx=ictx, octx=octx, lctx=lctx, nctx=nctx,
                        rctx=rctx)

There is currently a BaseOperationImplementationNetworkContext

The Orchestrator calls dispatch on it to run an operation. Similar to how the Features class wraps a set of Feature objects, we should have a BaseOperationImplementationNetworks and a BaseOperationImplementationNetworksContext, which would be an AsyncContextManagerList of BaseOperationImplementationNetworkContexts. Its dispatch method would check each BaseOperationImplementationNetworkContext to see if it contains the implementation of the requested operation; if so, it dispatches within that context, otherwise it moves on to the next BaseOperationImplementationNetworkContext, and so forth, until it finds one which contains the OperationImplementation and can therefore dispatch it.
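A rough sketch of that dispatch-until-found behavior, with simplified stand-in types; the real base classes carry more context plumbing, and the contains/dispatch helpers on each wrapped context are assumptions here.

from typing import Any, List

class OperationImplementationNotFound(Exception):
    pass

class OperationImplementationNetworksContextSketch:
    '''
    Sketch: wraps several network contexts and dispatches to the first one
    which holds the requested operation. Stand-in for the proposed
    BaseOperationImplementationNetworksContext, not real DFFML code.
    '''

    def __init__(self, contexts: List[Any]) -> None:
        self.contexts = contexts

    async def dispatch(self, operation, parameter_set):
        for nctx in self.contexts:
            # Assumed helper: each context reports whether it contains the
            # implementation for this operation
            if await nctx.contains(operation):
                return await nctx.dispatch(operation, parameter_set)
        raise OperationImplementationNotFound(operation)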

df: memory: Locking must be dynamic

The current implementation of MemoryLockNetworkContext is such that it locks all parent inputs and the input itself.

What it should do is look at the parents and let descendants of a parent all operate on descendant Input items at the same time, but not let anyone operate on that parent item until all the operations working on descendant Inputs have completed.

Effectively, all operations working on the descendant Inputs "share" the lock on the parent they descend from.

dffml/dffml/df/memory.py

Lines 466 to 492 in 2c67198

async def acquire(self, parameter_set: BaseParameterSet):
    '''
    Acquire the lock for each input in the input set which must be locked
    prior to running an operation using the input.
    '''
    need_lock = {}
    # Acquire the master lock to find and or create needed locks
    async with self.parent.lock:
        # Get all the inputs up the ancestry tree
        inputs = [item async for item in parameter_set.inputs()]
        # Only lock the ones which require it
        for item in filter(lambda item: item.definition.lock, inputs):
            # Create the lock for the input if not present
            if not item.uid in self.parent.locks:
                self.parent.locks[item.uid] = asyncio.Lock()
            # Retrieve the lock
            need_lock[item.uid] = (item, self.parent.locks[item.uid])
    # Use AsyncExitStack to lock the variable amount of inputs required
    async with AsyncExitStack() as stack:
        # Take all the locks we found we needed for this parameter set
        for _uid, (item, lock) in need_lock.items():
            # Take the lock
            self.logger.debug('Acquiring: %s(%r)', item.uid, item.value)
            await stack.enter_async_context(lock)
            self.logger.debug('Acquired: %s(%r)', item.uid, item.value)
        # All locks for these parameters have been acquired
        yield
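One possible shape for the shared-parent semantics is a counted lock, where descendants bump a counter on the parent and locking the parent itself waits until that count drains to zero. The sketch below uses plain asyncio primitives and is not the planned MemoryLockNetworkContext implementation.

import asyncio
from contextlib import asynccontextmanager

class SharedParentLock:
    '''
    Sketch: operations on descendant Inputs "share" the parent by holding a
    descendant slot; locking the parent waits for all descendants to finish.
    '''

    def __init__(self) -> None:
        self._cond = asyncio.Condition()
        self._active_descendants = 0

    @asynccontextmanager
    async def descendant(self):
        # Many descendant operations may hold this at the same time
        async with self._cond:
            self._active_descendants += 1
        try:
            yield
        finally:
            async with self._cond:
                self._active_descendants -= 1
                self._cond.notify_all()

    @asynccontextmanager
    async def parent(self):
        # Exclusive: wait until all descendant work has drained, and keep the
        # condition's lock held so no new descendants can start meanwhile
        async with self._cond:
            await self._cond.wait_for(lambda: self._active_descendants == 0)
            yield

Under this sketch, operations working on descendant Inputs would wrap their work in "async with shared.descendant():", while anything that needs the parent item exclusively would take "async with shared.parent():".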

model: Instantiation pattern should be consistent with data flow

As seen in df/base.py, we want to make the Model class and its children follow the same design pattern.

class BaseModelContext(abc.ABC):
    '''
    Abstract Base Class for context managing a Model
    '''

    async def __aenter__(self) -> 'BaseModelContext':
        return self

    async def __aexit__(self, exc_type, exc_value, traceback):
        pass

class BaseModel(Entrypoint):
    '''
    Abstract Base Class for a Model
    '''

    ENTRY_POINT = 'dffml.model'

    def __init__(self, config: BaseConfig) -> None:
        self.config = config
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @classmethod 
    @abc.abstractmethod 
    def args(cls) -> Dict[str, Arg]: 
        pass 
  
    @classmethod 
    @abc.abstractmethod 
    def config(cls, cmd: CMD): 
        pass 

    @abc.abstractmethod
    def __call__(self) -> 'BaseModelContext':
        return BaseModelContext(self)

CMD will need to change to resemble DataFlowFacilitatorCMD as well.

dffml/dffml/util/cli/cmd.py

Lines 130 to 193 in bf3493e

class BaseDataFlowFacilitatorCMD(CMD):
    '''
    Set timeout for features
    '''
    arg_ops = Arg('-ops', required=True, nargs='+',
                  action=ParseOperationAction)
    arg_input_network = Arg('-input-network',
                            action=ParseInputNetworkAction,
                            default=MemoryInputNetwork)
    arg_operation_network = Arg('-operation-network',
                                action=ParseOperationNetworkAction,
                                default=MemoryOperationNetwork)
    arg_lock_network = Arg('-lock-network',
                           action=ParseLockNetworkAction,
                           default=MemoryLockNetwork)
    arg_rchecker = Arg('-rchecker',
                       action=ParseRedundancyCheckerAction,
                       default=MemoryRedundancyChecker)
    # TODO We should be able to specify multiple operation implementation
    # networks. This would enable operations to live in different place,
    # accessed via the orchestrator transparently.
    arg_opimpn = Arg('-opimpn',
                     action=ParseOperationImplementationNetworkAction,
                     default=MemoryOperationImplementationNetwork)
    arg_orchestrator = Arg('-orchestrator',
                           action=ParseOrchestratorAction,
                           default=MemoryOrchestrator)
    arg_output_specs = Arg('-output-specs', required=True, nargs='+',
                           action=ParseOutputSpecsAction)
    arg_inputs = Arg('-inputs', nargs='+',
                     action=ParseInputsAction, default=[],
                     help='Other inputs to add under each ctx (repo\'s src_url will ' + \
                          'be used as the context)')
    arg_repo_def = Arg('-repo-def', default=False, type=str,
                       help='Definition to be used for repo.src_url.' + \
                            'If set, repo.src_url will be added to the set of inputs ' + \
                            'under each context (which is also the repo\'s src_url)')
    arg_remap = Arg('-remap', nargs='+', required=True,
                    action=ParseRemapAction,
                    help='For each repo, -remap output_operation_name.sub=feature_name')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dff = DataFlowFacilitator()
        self.linker = Linker()
        self.exported = self.linker.export(*self.ops)
        self.definitions, self.operations, _outputs = \
            self.linker.resolve(self.exported)

    # Load all entrypoints which may possibly be selected. Then have them add
    # their arguments to the DataFlowFacilitator-tots command.
    @classmethod
    def add_bases(cls):
        class LoadedDataFlowFacilitator(cls):
            pass
        for base in [BaseInputNetwork,
                     BaseOperationNetwork,
                     BaseLockNetwork,
                     BaseRedundancyChecker,
                     BaseOperationImplementationNetwork,
                     BaseOrchestrator]:
            for loaded in base.load():
                for arg_name, arg in loaded.args().items():
                    setattr(LoadedDataFlowFacilitator, arg_name, arg)
        return LoadedDataFlowFacilitator

DataFlowFacilitatorCMD = BaseDataFlowFacilitatorCMD.add_bases()

Misc Q & A and discussion

Please use this issue for any Google Summer of Code-related questions and discussion that does not fit elsewhere.

plugin: model: Add some new models!

Add a model!

This issue is for discussion and help-needed comments while adding new Models to DFFML.

First, get familiar with how models can be used via the DFFML command line: https://intel.github.io/dffml/master/plugins/dffml_model.html

Make sure you follow: https://intel.github.io/dffml/master/contributing/dev_env.html

Look at what libraries are already being wrapped or models have already been implemented. If you want to use a library that has not yet been integrated, reference the new model tutorial: https://intel.github.io/dffml/master/tutorials/models/

If you want to create a new model using a library we already have a wrapper for, just start working in the packages that already exist under model/. Create a new file under dffml_model_library_name. Each library wrapper does things differently, so check how that wrapper interacts with the underlying library by looking at how the existing models are implemented.

dffml: feature: Move to data flow architecture

Currently DFFML does not use a data flow architecture for evaluation. That's because locking was hard (shocking I know), and I had to just release. I put some effort into this around December / January. I'll try to finish it out in the next few weeks. It will make writing features SOOOOO much cleaner.

demo: Create interactive demo website

The web UI will be this.

What follows was the old server-side approach.

We can use runme.io

When a user clicks a https://runme.io/run?app_id=a1774bbe-b840-4853-b4f1-e6d4440c5e2e style link they'll be redirected to a https://runme.io/show?build_id=3ee099f0-7680-464e-89bf-9e2357e9b5af style link.

We can take that UUID at the end and get the URL the HTTP server is running at: https://3ee099f0-7680-464e-89bf-9e2357e9b5af.runme.io/

So we'll want to add a runme button to the initial popup of the web UI saying that they can try demo mode, or they can try with a real backend if they click the link. If they click it, open another modal that gives them a couple of steps:

  1. Click this link (replace with real one eventually): https://runme.io/run?app_id=a1774bbe-b840-4853-b4f1-e6d4440c5e2e
  2. Wait until the app launches
  3. Copy the URL from their address bar back into the new modal of the web UI and parse it to get the value of app_id (see the sketch below)
  4. Set the backend URL to https://$app_id.runme.io/ and close all the modals so they can now play around
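Pulling the UUID out of the pasted URL is simple with the standard library; sketched in Python here for brevity, though the web UI itself would do this in JavaScript. The build_id/app_id parameter names are taken from the example links above.

from urllib.parse import urlparse, parse_qs

def backend_url(pasted_url: str) -> str:
    # Extract the UUID from the query string and build the backend URL
    query = parse_qs(urlparse(pasted_url).query)
    uuid = (query.get("app_id") or query.get("build_id") or [""])[0]
    return f"https://{uuid}.runme.io/"

print(backend_url(
    "https://runme.io/show?build_id=3ee099f0-7680-464e-89bf-9e2357e9b5af"
))
# -> https://3ee099f0-7680-464e-89bf-9e2357e9b5af.runme.io/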

df: types: Add properties to Definition

We need to have locality (thread, disk), and then attach things like disk usage to Inputs and make the orchestrator aware of this when dispatching to an OpImpN (operation implementation network).
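A minimal sketch of what locality metadata on a Definition-like type could look like; the field names and the Locality enum are made up for illustration and are not the existing dffml.df.types.Definition.

import enum
from typing import NamedTuple, Optional

class Locality(enum.Enum):
    # Hypothetical locality values from the description above
    THREAD = "thread"
    DISK = "disk"

class DefinitionSketch(NamedTuple):
    '''
    Stand-in for Definition with an added locality field and an optional
    resource hint (e.g. disk usage) the orchestrator could consult when
    dispatching to an operation implementation network.
    '''
    name: str
    primitive: str
    locality: Optional[Locality] = None
    disk_usage: Optional[int] = None  # bytes; hypothetical hint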

source: csv: Add update functionality

Remove the _close method from CSVSource, as FileSource would normally use this to open the file and call dump_fd on it.

async def _close(self):
    LOGGER.debug('%r save to file not implemented', self)

Implement dump_fd; this will probably include adding new headers to the file.

Something like prediction for the prediction.classification and confidence for prediction.confidence might be appropriate.

async def dump_fd(self, fd):
    pass
    # LOGGER.debug('%r saved %d records', self, len(self.mem))
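A rough sketch of what dump_fd could do with csv.DictWriter, appending the suggested prediction and confidence headers. The per-repo column layout (what repo.dict() contains) is an assumption here, not the final implementation.

import csv

async def dump_fd(self, fd):
    # Column layout is an assumption: one row per repo, with the suggested
    # prediction and confidence headers appended to whatever columns exist.
    rows = []
    for repo in self.mem.values():
        row = {'src_url': repo.src_url}
        row.update(repo.dict().get('features', {}))  # assumed key
        rows.append(row)
    fieldnames = sorted({key for row in rows for key in row} \
                        | {'prediction', 'confidence'})
    writer = csv.DictWriter(fd, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    LOGGER.debug('%r saved %d records', self, len(self.mem))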

df: types: Audit Input.get_parents/inputs/Parameter.inputs

get_parents was meant as a way to get the ancestry of a piece of data in the InputNetwork. This functionality may need to be removed.

dffml/dffml/df/types.py

Lines 115 to 117 in 2c67198

def get_parents(self) -> Iterator['Input']:
    return itertools.chain(*[[item] + list(item.get_parents()) \
                             for item in self.parents])

Places needing modification

dffml/dffml/df/base.py

Lines 284 to 296 in 2c67198

class BaseInputSet(abc.ABC):
    def __init__(self, config: BaseInputSetConfig) -> None:
        self.config = config
        self.ctx = config.ctx
        self.logger = LOGGER.getChild(self.__class__.__qualname__)

    @abc.abstractmethod
    async def inputs(self) -> AsyncIterator[Input]:
        pass

    async def _asdict(self) -> Dict[str, Any]:
        '''

dffml/dffml/df/base.py

Lines 294 to 304 in 2c67198

    async def _asdict(self) -> Dict[str, Any]:
        '''
        Returns an input definition name to input value dict
        '''
        return {item.definition.name: item.value \
                async for item in self.inputs()}

class BaseParameterSetConfig(NamedTuple):
    ctx: BaseInputSetContext

dffml/dffml/df/base.py

Lines 313 to 321 in 2c67198

    async def parameters(self) -> AsyncIterator[Parameter]:
        pass

    @abc.abstractmethod
    async def inputs(self) -> AsyncIterator[Input]:
        pass

    async def _asdict(self) -> Dict[str, Any]:
        '''

dffml/dffml/df/memory.py

Lines 79 to 87 in 2c67198

    def __init__(self, config: MemoryInputSetConfig) -> None:
        super().__init__(config)
        self.__inputs = config.inputs

    async def inputs(self) -> AsyncIterator[Input]:
        for item in self.__inputs:
            yield item

class MemoryParameterSetConfig(NamedTuple):

dffml/dffml/df/memory.py

Lines 97 to 105 in 2c67198

    async def parameters(self) -> AsyncIterator[Parameter]:
        for parameter in self.__parameters:
            yield parameter

    async def inputs(self) -> AsyncIterator[Input]:
        for item in itertools.chain(*[[parameter.origin] + \
                                      list(parameter.origin.get_parents()) \
                                      for parameter in \
                                      self.__parameters]):

dffml/dffml/df/memory.py

Lines 168 to 176 in 2c67198

    definitions: Dict[Definition, List[Input]]

class MemoryDefinitionSetContext(BaseDefinitionSetContext):
    async def inputs(self, definition: Definition) -> AsyncIterator[Input]:
        # Grab the input set context handle
        handle = await self.ctx.handle()
        handle_string = handle.as_string()
        # Associate inputs with their context handle grouped by definition

dffml/dffml/df/memory.py

Lines 194 to 202 in 2c67198

            await ctx.add(input_set.ctx, [])
        # Add the input set to the incoming inputs
        async with self.parent.input_notification_set[handle_string]() as ctx:
            await ctx.add(input_set,
                          [item async for item in input_set.inputs()])
        # Associate inputs with their context handle grouped by definition
        async with self.parent.ctxhd_lock:
            # Create dict for handle_string if not present
            if not handle_string in self.parent.ctxhd:

dffml/dffml/df/memory.py

Lines 205 to 213 in 2c67198

                    ctx=input_set.ctx,
                    definitions={}
                )
            # Go through each item in the input set
            async for item in input_set.inputs():
                # Create set for item definition if not present
                if not item.definition in \
                        self.parent.ctxhd[handle_string].definitions:
                    self.parent.ctxhd[handle_string].definitions\

dffml/dffml/df/memory.py

Lines 345 to 353 in 2c67198

                 stage: Stage = Stage.PROCESSING) -> AsyncIterator[Operation]:
    # Set list of needed input definitions if given
    if not input_set is None:
        input_definitions = set([item.definition \
                                 async for item in input_set.inputs()])
    # Yield all operations with an input in the input set
    for operation in self.parent.memory:
        # Only run operations of the requested stage
        if operation.stage != stage:

dffml/dffml/df/memory.py

Lines 392 to 400 in 2c67198

    SHA384 hash of the parameter set context handle as a string, the
    operation name, and the sorted list of input uuids.
    '''
    uid_list = sorted(map(lambda x: x.uid, [item async for item in \
                                            parameter_set.inputs()]))
    uid_list.insert(0, (await parameter_set.ctx.handle()).as_string())
    uid_list.insert(0, operation.name)
    return hashlib.sha384(', '.join(uid_list).encode('utf-8')).hexdigest()

dffml/dffml/df/memory.py

Lines 471 to 479 in 2c67198

    need_lock = {}
    # Acquire the master lock to find and or create needed locks
    async with self.parent.lock:
        # Get all the inputs up the ancestry tree
        inputs = [item async for item in parameter_set.inputs()]
        # Only lock the ones which require it
        for item in filter(lambda item: item.definition.lock, inputs):
            # Create the lock for the input if not present
            if not item.uid in self.parent.locks:

dffml/dffml/df/memory.py

Lines 592 to 600 in 2c67198

                for value in output:
                    inputs.append(Input(value=value,
                                        definition=operation.outputs[key],
                                        parents=[item async for item in \
                                                 parameter_set.inputs()]))
        except KeyError as error:
            raise KeyError(
                'Value %s missing from output:definition mapping %s(%s)' \
                % (str(error), operation.name,

dffml/dffml/df/memory.py

Lines 770 to 778 in 2c67198

            more, new_input_set = input_set_enters_network.result()
            self.logger.debug('[%s]: input_set_enters_network: %s',
                              ctx_str,
                              [(i.definition.name, i.value) \
                               async for i in new_input_set.inputs()])
            # Identify which operations have complete contextually
            # appropriate input sets which haven't been run yet
            async for operation, parameter_set in \
                    self.nctx.operations_parameter_set_pairs(self.ictx,

        group_by = collections.OrderedDict(sorted(group_by.items()))
        # Find all inputs within the input network for the by definition
        async for item in od.inputs(output.by):
            # Get all the parents of the input
            parents = list(item.get_parents())
            for group, related in group_by.values():
                # Ensure that the definition we need to group by is in
                # the parents
                if not group in parents:
