The root of this particular pipeline starts out in the default namespace, then it's im

exception: "ancestor argument should match namespace" during /_ah/pipeline/output about appengine-pipelines HOT 5 OPEN

NickFranceschina commented on July 29, 2024

exception: "ancestor argument should match namespace" during /_ah/pipeline/output

from appengine-pipelines.

Comments (5)

NickFranceschina commented on July 29, 2024

for now I'm forcing use_barrier_indexes = False so that it uses the legacy code-path... then I can at least get things deployed

soundofjw commented on July 29, 2024

Good call using use_barrier_indexes for now - this is a potentially tricky one.

If we move to fix this, it may be surprising behavior for others.
We use namespaces as well, and I think this change should be ok from my perspective, because we don't manipulate the namespaces through the pipeline. For us, a pipeline starts completely in a namespace, and stays on that namespace through appengine_config.namespace_manager_default_namespace_for_request.

With all of that noted, I want to make sure I understand the problem:

The datastore is complaining because you're using an ancestor query where the namespace of the ancestor does not match the namespace passed to the query (defaults to empty string). (https://cloud.google.com/appengine/docs/python/ndb/queryclass)

I believe the fix may be as simple as changing this line in notify_barriers:
_BarrierIndex.all(cursor=cursor, keys_only=True)
=>
_BarrierIndex.all(cursor=cursor, keys_only=True, namespace=ancestor_key.namespace())

NickFranceschina commented on July 29, 2024

when you say "defaults to empty" I don't think that's what is happening... instead I believe the query is defaulting to the namespace of the task that called it (child task which was in namespace "1" called /output which triggered BarrierHandler), but the ancestor key is from the root pipeline's namespace (which is '')... that's how it looks from the last line printed out in the trace:

(ancestor.name_space(), namespace)) ---> ("''" != "'1'")

when I grab the string keys from the headers and build datastore Keys out of them, it appears the ancestor has the root pipeline ID in it, but that ID doesn't exist in the current namespace (even in the console if you open the detail barrier record, you can't click on the ancestor because it doesn't exist)

guessing, as you explained how you guys normally use the namespaces, that the Key path is just getting generated with a list of kind/id assuming they are all in the same namespace.... but in my case the top-level kind/id isn't... so it makes that key technically invalid
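To make the mismatch described above concrete, here is a minimal sketch in plain Python. It does not use the App Engine SDK: `FakeKey`, `NamespaceMismatchError`, and `ancestor_query` are hypothetical stand-ins invented for illustration, and the check only imitates what the trace suggests the datastore does. The key ID `root-id` is also made up.

```python
# Stand-in types for illustration only; these are NOT the App Engine datastore API.
from collections import namedtuple

# A key reduced to the two things that matter here: its namespace and its path.
FakeKey = namedtuple("FakeKey", ["namespace", "path"])


class NamespaceMismatchError(Exception):
    """Hypothetical error mirroring the datastore's complaint."""


def ancestor_query(ancestor, current_namespace, namespace=None):
    """Validate an ancestor query the way the trace suggests the datastore does.

    When no namespace is passed, the query falls back to the namespace of the
    request that is running it (current_namespace).
    """
    effective = current_namespace if namespace is None else namespace
    if ancestor.namespace != effective:
        raise NamespaceMismatchError(
            "ancestor argument should match namespace (%r != %r)"
            % (ancestor.namespace, effective))
    return effective


# Root pipeline lives in the default namespace; the child task runs in "1".
root_key = FakeKey(namespace="", path=("_PipelineRecord", "root-id"))

try:
    ancestor_query(root_key, current_namespace="1")
    mismatch_raised = False
except NamespaceMismatchError:
    mismatch_raised = True

# Passing the ancestor's own namespace explicitly avoids the mismatch.
fixed_namespace = ancestor_query(root_key, current_namespace="1",
                                 namespace=root_key.namespace)
```

Under these assumptions, the first call fails because the query inherits namespace "1" from the calling task, while the second succeeds because the namespace is taken from the ancestor key, which is the idea behind the `namespace=ancestor_key.namespace()` suggestion earlier in the thread.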

I can probably re-engineer our stuff to kick off a pipeline per namespace... but let me know if you think you can get it working! Thanks!

soundofjw commented on July 29, 2024

Obviously, long term, you don't want to kick off one pipeline per namespace permanently; you lose the major benefits of pipeline fan-out abortion and success if you aren't yielding child pipelines.

You are also correct about the default namespace: it would be the namespace of the process that created the task, for any of the pipeline tasks (complete, fanout, abort, etc.).

A much more comfortable solution, now that I'm seeing a larger scope here, would involve keeping ALL pipeline entities in the namespace of the root pipeline. Then, if you need namespace switching, you would explicitly achieve this per pipeline.

This also simplifies the answer to "How do I find a pipeline with a given pipeline_id?".
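As a rough sketch of what "explicit namespace switching per pipeline" could look like, the snippet below wraps application work in a context manager that restores the previous namespace afterwards. It assumes only the `get_namespace()` / `set_namespace()` pair from `google.appengine.api.namespace_manager`; `FakeNamespaceManager` is a stand-in so the example runs outside App Engine, and `in_namespace` is a hypothetical helper, not part of the pipeline library.

```python
import contextlib


class FakeNamespaceManager(object):
    """Stand-in for google.appengine.api.namespace_manager (illustration only)."""

    def __init__(self):
        self._namespace = ""

    def get_namespace(self):
        return self._namespace

    def set_namespace(self, namespace):
        self._namespace = namespace


namespace_manager = FakeNamespaceManager()


@contextlib.contextmanager
def in_namespace(namespace):
    """Switch to `namespace` for the duration of the block, then restore."""
    previous = namespace_manager.get_namespace()
    namespace_manager.set_namespace(namespace)
    try:
        yield
    finally:
        # Restore even if the body raises, so pipeline bookkeeping that runs
        # afterwards sees the root pipeline's namespace again.
        namespace_manager.set_namespace(previous)


seen = []
with in_namespace("1"):
    seen.append(namespace_manager.get_namespace())  # application work
seen.append(namespace_manager.get_namespace())      # back in the root namespace
```

The point of restoring in `finally` is that pipeline records would stay in the root namespace while only the application's own datastore calls run in the switched one.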

One paradigm I use a lot is class inheritance for pipelines with a common setup function, to prepare for any common variables. This is good for your larger pipeline chains, like I believe you may have.

from google.appengine.api import namespace_manager
from pipeline import Pipeline

class MyRootPipeline(Pipeline):
    def setup(self, **kwargs):
        """Perform setup for MyRootPipeline and all derivative pipes."""
        # Get pipeline information, and do setup.
        self.namespace = kwargs.get('namespace', None)
        self.kwargs = kwargs.copy()  # changes to kwargs shouldn't affect local copy

        if self.namespace:
            # Sets the namespace for the current HTTP request.
            namespace_manager.set_namespace(self.namespace)

    def run(self, **kwargs):
        # Do your setup
        self.setup(**kwargs)

        # Do stuff

        # Yield child with same kwargs, and any additional args.
        kwargs['namespace'] = "other_namespace"
        yield ChildPipeline(child_wants_candy=True, **kwargs)

class ChildPipeline(MyRootPipeline):
    """Subclassed from MyRootPipeline for common setup procedure."""

    def run(self, child_wants_candy, **kwargs):
        # Performs setup, switches namespace, ...
        self.setup(**kwargs)

        # Stuff and things in the new namespace.

This won't work now, until the pipeline knows to explicitly use the namespace for the yielded child pipeline and callback tasks, etc. - but that's the support I'd consider targeting for this issue.

I hope this all makes sense!!! 🐻

NickFranceschina commented on July 29, 2024

Funny... that's pretty much exactly our pipeline subclass paradigm as well

class MigrateFiles(MigrationPipeline):
    def run(self):
        self.setup()
        ...

as for our existing structure, we don't need to run a single pipeline across namespaces, we just had it set up that way. we could definitely restructure it to be one per namespace... but as you stated, long term, it would be better to not have to think about it... and it would indeed be easier to figure out where the pipeline records are (and clear out the old ones) if they were always stored in the default namespace

So yeah, this all makes sense... and I really appreciate your input!
