Comments (8)

ilan-gold avatar ilan-gold commented on September 28, 2024

@AlejandraRodelaRo I am not so familiar with raw as it somewhat predates my time on the project. I will say, though, that I don't think there is a copy made of the object when you assign it to raw, so the change makes sense (even if the raw name indicates otherwise). And looking at the code, when you assign to raw using an already made AnnData object, there is no copy made (from what I can tell). Whether or not this is a bug or feature is sort of out of my knowledge base.

cc @flying-sheep

from scanpy.

mssher07 avatar mssher07 commented on September 28, 2024

Hi all, @flying-sheep @falexwolf

Wanted to echo Alejandro and highlight this is a critical bug, since nearly every function carries a use_raw flag, and the assumption that .raw contains counts is used explicitly or implicitly in numerous scanpy functions. We just realized that a massive dataset we've been processing for ~6 weeks also has no reads in the .raw despite saving it prior to log1p/normalize functions.

I am not sure it's helpful, but we see this bug in scanpy 1.9.8, whereas in an old dataset/environment with scanpy 1.6.0, .raw correctly preserved counts.

ivirshup avatar ivirshup commented on September 28, 2024

My opinion would be that you need to write adata.raw = adata.copy() if you want a copy to be made, since almost all assignments do not create a copy of the assigned object in anndata. But we should look into whether this is a change that was made deliberately or not.

If we don't change it, we could maybe warn if we're mutating adata.X and adata.raw.X also refers to the same thing?

Overall, I would recommend that you use adata.layers["counts"] = adata.X.copy() instead of using .raw at all though.

mssher07 avatar mssher07 commented on September 28, 2024

> My opinion would be that you need to write adata.raw = adata.copy() if you want a copy to be made, since almost all assignments do not create a copy of the assigned object in anndata. But we should look into whether this is a change that was made deliberately or not.

That makes python-sense. This is absolutely a change in convention though, see:

  1. The original scanpy tutorial
  2. The scVI tutorial (where they discuss needing to retain counts in raw)
  3. Most notably in the anndata API

In addition, both sc.pl.umap and sc.pl.paga_path() come to mind as functions that default to using .raw.

> If we don't change it, we could maybe warn if we're mutating adata.X and adata.raw.X also refers to the same thing?

I think that's a good idea. In general, it would be very helpful to preserve in the anndata structure some record of the major transformations to .X (or any layer)
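
For dense arrays, a check of this kind could be sketched with `np.shares_memory`. The helper name `warn_if_shared` is hypothetical and not part of anndata or scanpy, and sparse matrices would need separate handling.

```python
import warnings

import numpy as np


def warn_if_shared(x, raw_x):
    """Hypothetical helper: warn when two dense arrays overlap in memory."""
    shared = np.shares_memory(x, raw_x)
    if shared:
        warnings.warn(
            "X and raw.X share memory; in-place changes to X will also change raw.X"
        )
    return shared


x = np.array([[0.0, 1.0], [2.0, 3.0]])
alias = x            # same buffer, like an assignment without a copy
snapshot = x.copy()  # independent buffer

# Record warnings instead of printing them, so the outcome can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias_shared = warn_if_shared(x, alias)
    snapshot_shared = warn_if_shared(x, snapshot)
```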

> Overall, I would recommend that you use adata.layers["counts"] = adata.X.copy() instead of using .raw at all though.

This seems like good practice and the workaround we'll apply for now.

I do wonder if some change was made after this conversation which you were a part of. Thank you by the way, this package is an amazing tool.

AlejandraRodelaRo avatar AlejandraRodelaRo commented on September 28, 2024

Hello, thank you for your helpful answers.

Could someone please elaborate on when to use .copy() and when it is not needed?
In the original scanpy tutorial I see it on occasion, but not always, when modifying adata.
For example:
image

flying-sheep avatar flying-sheep commented on September 28, 2024

It’s needed when you modify the AnnData object afterwards.

The above slices it twice, and only then copies it, because slicing isn’t a modification. So what’s happening is:

adata_orig = AnnData(...)

adata_sliced_view = adata_orig[..., :]
assert adata_sliced_view.is_view
adata_sliced_copy = adata_sliced_view[..., :].copy()
assert not adata_sliced_copy.is_view

do_modify(adata_sliced_copy)

The slicing could also have been done in one operation:

adata = adata_orig[(adata_orig.obs["n_genes_by_counts"] < 2500) & (adata_orig.obs["pct_counts_mt"] < 5), :].copy()

AlejandraRodelaRo avatar AlejandraRodelaRo commented on September 28, 2024

So, does that mean that every time we apply some kind of filtration (adata = adata[ condition]) we should use .copy()?
For instance, when filtering the highly variable genes (see the image extracted from the scanpy legacy workflow)?
image

flying-sheep avatar flying-sheep commented on September 28, 2024

Yeah, that way, you’ll free up memory too, as the full dataset is no longer referenced
