
Adata.raw gets modified upon log normalization of adata (scanpy issue, open)

AlejandraRodelaRo commented on September 28, 2024

Adata.raw gets modified upon log normalization of adata

from scanpy.

Comments (8)

ilan-gold commented on September 28, 2024

@AlejandraRodelaRo I am not so familiar with raw as it somewhat predates my time on the project. I will say, though, that I don't think there is a copy made of the object when you assign it to raw, so the change makes sense (even if the raw name indicates otherwise). And looking at the code, when you assign to raw using an already made AnnData object, there is no copy made (from what I can tell). Whether or not this is a bug or feature is sort of out of my knowledge base.

cc @flying-sheep


mssher07 commented on September 28, 2024

Hi all, @flying-sheep @falexwolf

Wanted to echo Alejandra and highlight that this is a critical bug, since nearly every function carries a use_raw flag, and the assumption that .raw contains counts is used explicitly or implicitly in numerous scanpy functions. We just realized that a massive dataset we've been processing for ~6 weeks also has no reads in .raw, despite saving it prior to the log1p/normalize functions.

I am not sure if it's helpful, but we see this bug in scanpy 1.9.8, whereas in an old dataset/environment with scanpy 1.6.0, .raw correctly preserved counts.


ivirshup commented on September 28, 2024

My opinion would be that you need to write adata.raw = adata.copy() if you want a copy to be made, since almost all assignments do not create a copy of the assigned object in anndata. But we should look into whether this is a change that was made deliberately or not.

If we don't change it, we could maybe warn if we're mutating adata.X and adata.raw.X also refers to the same thing?

Overall, I would recommend that you use adata.layers["counts"] = adata.X.copy() instead of using .raw at all though.
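To illustrate the point about assignment not copying, here is a minimal sketch using plain NumPy arrays rather than AnnData itself. It assumes the transform runs in place on the array (as `np.log1p(..., out=...)` does); the variable names are made up for the example.

```python
import numpy as np

# Toy "count matrix" (values are made up for illustration).
counts = np.array([[1.0, 0.0], [3.0, 7.0]])

alias = counts          # plain assignment: both names point at the same buffer
backup = counts.copy()  # explicit copy: an independent buffer

# In-place transform, analogous to log-normalizing X without copying first.
np.log1p(counts, out=counts)

# The alias followed the mutation; the explicit copy kept the original values.
alias_changed = not np.array_equal(alias, backup)
backup_intact = np.array_equal(backup, [[1.0, 0.0], [3.0, 7.0]])
print(alias_changed, backup_intact)
```

The same reasoning is why `adata.layers["counts"] = adata.X.copy()` keeps the counts safe: the `.copy()` is what decouples the stored matrix from later in-place changes to `adata.X`.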


mssher07 commented on September 28, 2024

> My opinion would be that you need to write adata.raw = adata.copy() if you want a copy to be made, since almost all assignments do not create a copy of the assigned object in anndata. But we should look into whether this is a change that was made deliberately or not.

That makes python-sense. This is absolutely a change in convention though, see:

The original scanpy tutorial
The scVI tutorial (where they discuss needing to retain counts in raw)
Most notably in the anndata API

In addition, both sc.pl.umap and sc.pl.paga_path() come to mind as functions that default to using the .raw layer

> If we don't change it, we could maybe warn if we're mutating adata.X and adata.raw.X also refers to the same thing?

I think that's a good idea. In general, it would be very helpful to preserve in the anndata structure some record of the major transformations to .X (or any layer)
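As a rough sketch of what such a warning could look like for dense arrays: `warn_if_shared` below is a hypothetical helper, not an existing anndata or scanpy function, and it relies on `np.shares_memory`, which only works on NumPy arrays (sparse matrices would need their underlying `.data` buffers compared instead).

```python
import warnings

import numpy as np


def warn_if_shared(x, stored, name="stored matrix"):
    """Hypothetical helper: warn when two dense arrays overlap in memory."""
    shared = np.shares_memory(x, stored)
    if shared:
        warnings.warn(
            f"X shares memory with {name}; in-place changes will affect both.",
            UserWarning,
            stacklevel=2,
        )
    return shared


x = np.arange(6, dtype=float).reshape(2, 3)
aliased = x         # same buffer as x
copied = x.copy()   # independent buffer

print(warn_if_shared(x, copied))   # no warning is emitted for the copy
```

A check like this would run just before an in-place transform, so the user finds out about the shared buffer before the stored counts are overwritten.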

> Overall, I would recommend that you use adata.layers["counts"] = adata.X.copy() instead of using .raw at all though.

This seems like good practice and the workaround we'll apply for now.

I do wonder if some change was made after this conversation which you were a part of. Thank you by the way, this package is an amazing tool.


AlejandraRodelaRo commented on September 28, 2024

Hello, thank you for your helpful answers.

Could someone please elaborate on when to use .copy() and when it is not needed?
In the original scanpy tutorial I see it used on some occasions, but not always, when modifying adata.
For example:


flying-sheep commented on September 28, 2024

It’s needed when you modify the AnnData object afterwards.

The above slices it twice, and only then copies it, because slicing isn’t a modification. So what’s happening is:

adata_orig = AnnData(...)

adata_sliced_view = adata_orig[..., :]
assert adata_sliced_view.is_view
adata_sliced_copy = adata_sliced_view[..., :].copy()
assert not adata_sliced_copy.is_view

do_modify(adata_sliced_copy)

The slicing could also have been done in one operation:

adata = adata_orig[(adata_orig.obs["n_genes_by_counts"] < 2500) & (adata_orig.obs["pct_counts_mt"] < 5), :].copy()


AlejandraRodelaRo commented on September 28, 2024

So, does that mean that every time we apply some kind of filtering (adata = adata[condition]) we should use .copy()?
For instance, when filtering the highly variable genes (see the image extracted from the scanpy legacy workflow)?


flying-sheep commented on September 28, 2024

Yeah, that way, you’ll free up memory too, as the full dataset is no longer referenced.
