Comments (6)

TomNicholas commented on August 31, 2024

I question whether a dict approach can handle all of the possible operations with Datasets, because it seems like a lot of them return new Dataset objects, e.g., merge, combine, etc.

The reason xarray does almost everything through methods that return new objects is (a) to avoid unintuitive copying behavior under the hood with in-place operations and (b) to enable method chaining. I actually just now found this rather interesting blog post discussing this rationale in pandas (which inspired xarray).

So while your solution for renaming vars should suffice, I'm not fully convinced that it solves the root issue entirely.

You know you can turn the data in a datatree node into a standalone Dataset, do whatever you want to that, and then attach it back to the same node of the same tree, right? i.e.

ds = dt['pick/a/node'].to_dataset()

new_ds = rename_whatever(ds)

dt['pick/a/node'].ds = new_ds

No new tree is created when you do this; you just assign new variables to the specific node of the tree. With that pattern, I would say this becomes entirely an xarray usage question, not a datatree usage question.

from datatree.

TomNicholas commented on August 31, 2024

Actually you're right, I don't know if the docs currently mention anywhere that assigning to .ds is allowed!


TomNicholas commented on August 31, 2024

Hi @marcel-goldschen-ohm, thank you so much for your thoughtful feedback!

the design choice of having the tree linkage and dataset all wrapped in the same DataTree object

Original context for this decision is here #2 (comment)

the current lack of support in xarray for renaming vars/coords "inplace"

I don't think this should be a huge barrier. Conceptually xarray Datasets are intended to be mostly treatable like dictionaries. I think we should be able to find some other solution that allows you to treat the variables in the way you want.

(as far as I can tell) the only way to rename a var/coord is to create a new DataTree object (i.e., dt2 = dt1.rename_vars({'x': 'y'})), just as for datasets

With a dataset you can also just do

In [3]: ds = xr.Dataset({'x': 1})

In [4]: ds
Out[4]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    x        int64 1
 
In [5]: ds.rename_vars({'x': 'y'})  # one approach
Out[5]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    y        int64 1

In [6]: ds['y'] = ds['x']  # but we can also just treat the ds like a dict

In [7]: ds
Out[7]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    x        int64 1
    y        int64 1

In [8]: ds.drop_vars('x')
Out[8]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    y        int64 1

In [9]: del ds['x']  # this also works

In [10]: ds
Out[10]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    y        int64 1

Is there a reason why that approach can't work in your case?

when working with a UI such as a tree model/view to interface a datatree things get annoying. Imagine that your UI tree has items for both the datasets and the vars/coords within the datasets.

Are you wrapping the DataTree class? If so, then the code the UI calls can do anything to the tree object, right?


marcel-goldschen-ohm commented on August 31, 2024

Thanks for the helpful suggestions @TomNicholas. rename_vars and drop_vars won't work for me because they return new Dataset objects. However, I did not realize I could use the dict approach ds['y'] = ds['x'] and del ds['x'], which seems like it should do what I want. Thanks!

I haven't fully thought this through yet, but I question whether a dict approach can handle all of the possible operations with Datasets, because it seems like a lot of them return new Dataset objects, e.g., merge, combine, etc. So while your solution for renaming vars should suffice, I'm not fully convinced that it solves the root issue entirely.

I am indeed using a UI tree node class which wraps references to the DataTree. My issue is not that I cannot deal with this at all; it is simply that having to rebuild the entire UI tree (or at least a portion of it) every time an operation changes a DataTree ref, in a way that should intuitively not require a UI tree rebuild (rename, align, merge?, etc.), is annoying. However, the dict rename solution will go a long way toward mitigating this, so it may not be as painful as I'd first thought. We'll see how it pans out. Thanks again for the suggestion.
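To make the identity concern concrete, here is a minimal sketch in plain Python. The Node and UIItem classes are hypothetical stand-ins invented for illustration; they are not datatree, xarray, or any UI toolkit API. It only shows why a wrapper holding a reference goes stale when an operation returns a new object, and stays valid when the same object is mutated dict-style.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node holding named variables."""
    variables: dict = field(default_factory=dict)

    def renamed(self, old: str, new: str) -> "Node":
        # Functional style: build and return a brand-new node.
        return Node({(new if k == old else k): v
                     for k, v in self.variables.items()})

    def rename_inplace(self, old: str, new: str) -> None:
        # Dict-style mutation: the node object itself stays the same.
        self.variables[new] = self.variables.pop(old)


@dataclass
class UIItem:
    """Hypothetical UI wrapper that keeps a reference to one node."""
    node: Node

    def labels(self) -> list:
        return sorted(self.node.variables)


node = Node({"x": 1})
item = UIItem(node)

new_node = node.renamed("x", "y")  # new object; `item` still points at the old one
stale_labels = item.labels()

node.rename_inplace("x", "y")      # same object; `item` sees the change
fresh_labels = item.labels()
```

In the first case the wrapper would have to be re-pointed at new_node (the "rebuild" described above); in the second case nothing needs to be rebuilt.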


marcel-goldschen-ohm commented on August 31, 2024

Oh wow, this is exactly what I wanted. Nice!

Perhaps I just did not read the docs closely enough, but I thought DataTree.ds returned an immutable view, so it was not obvious to me that I could assign to it. This is perfect.


TomNicholas commented on August 31, 2024

Perhaps I just did not read the docs closely enough

It's mentioned in the docs, but not documented super well...

I thought DataTree.ds returned an immutable view, so it was not obvious to me that I could assign to it.

It does return an immutable view, but you can still assign a new Dataset to the .ds property. To get something mutable, use DataTree.to_dataset() first. Notice the difference in types:

In [3]: ds = xr.Dataset({'x': 1})

In [4]: dt = DataTree(data=ds)

In [5]: dt
Out[5]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        x        int64 1

In [6]: type(dt.ds)
Out[6]: datatree.datatree.DatasetView

In [7]: type(dt.to_dataset())
Out[7]: xarray.core.dataset.Dataset

In [8]: ds_view = dt.ds

In [9]: ds_copy = dt.to_dataset()

In [10]: ds_copy['y'] = ds['x']  # works fine because the copy is a mutable xr.Dataset

In [11]: ds_view['y'] = ds['x']  # forbidden
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[11], line 1
----> 1 ds_view['y'] = ds['x']

File ~/Documents/Work/Code/datatree/datatree/datatree.py:157, in DatasetView.__setitem__(self, key, val)
    156 def __setitem__(self, key, val) -> None:
--> 157     raise AttributeError(
    158         "Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, "
    159         "or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,"
    160         "use `.copy()` first to get a mutable version of the input dataset."
    161     )

AttributeError: Mutation of the DatasetView is not allowed, please use `.__setitem__` on the wrapping DataTree node, or use `dt.to_dataset()` if you want a mutable dataset. If calling this from within `map_over_subtree`,use `.copy()` first to get a mutable version of the input dataset.

In [12]: dt['y'] = ds['x']  # you can do this though

I went for this design because most of the time the distinction between the copy and the view doesn't matter, and an immutable view is a lot easier to make safe (in the sense of not accidentally creating linked objects with inconsistent state, see #38 (comment) and #99 if you're interested). Your use case happens to be one where the distinction very much matters.
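The general shape of this design (read-only view out, assignment in) can be sketched with the standard library alone. This is a toy under stated assumptions, not datatree's implementation: the Node class below is hypothetical, plain dicts stand in for Datasets, and types.MappingProxyType stands in for DatasetView. One visible difference is that the proxy raises TypeError on item assignment, whereas the DatasetView in the traceback above raises AttributeError.

```python
from types import MappingProxyType


class Node:
    """Hypothetical node: read-only view out, wholesale replacement in."""

    def __init__(self, variables=None):
        self._variables = dict(variables or {})

    @property
    def ds(self):
        # Read-only proxy: item assignment on it raises TypeError.
        return MappingProxyType(self._variables)

    @ds.setter
    def ds(self, new_variables):
        # Assigning to the property swaps the node's contents; the node
        # object (and anything referencing it) stays the same.
        self._variables = dict(new_variables)

    def to_dataset(self):
        # Independent, mutable copy of the node's contents.
        return dict(self._variables)

    def __setitem__(self, key, value):
        # Mutation goes through the node, never through the view.
        self._variables[key] = value


node = Node({"x": 1})

try:
    node.ds["y"] = 2           # forbidden: the view is read-only
    view_was_mutable = True
except TypeError:
    view_was_mutable = False

copy = node.to_dataset()       # mutable copy
copy["y"] = copy.pop("x")      # rename on the copy
node.ds = copy                 # attach the result back to the same node

node["z"] = 3                  # item assignment on the node itself also works
```

Because the setter stores its own copy of the incoming mapping, later changes to the node do not leak back into the object that was assigned, which is one simple way to avoid the linked-state problem mentioned above.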

This is perfect.

Great! I will close this now then.
