Comments (8)
@AlejandraRodelaRo I am not so familiar with `raw`, as it somewhat predates my time on the project. I will say, though, that I don't think a copy of the object is made when you assign it to `raw`, so the change makes sense (even if the name `raw` suggests otherwise). Looking at the code, when you assign an already constructed `AnnData` object to `raw`, no copy is made (as far as I can tell). Whether this is a bug or a feature is somewhat outside my knowledge.
from scanpy.
Hi all, @flying-sheep @falexwolf
Wanted to echo Alejandro and highlight that this is a critical bug: nearly every function carries a `use_raw` flag, and the assumption that `.raw` contains counts is used explicitly or implicitly in numerous scanpy functions. We just realized that a massive dataset we've been processing for ~6 weeks also has no raw counts in `.raw`, even though we saved it prior to the `log1p`/normalize steps.
I'm not sure if it's helpful, but we see this bug in scanpy 1.9.8, whereas in an old dataset/environment with scanpy 1.6.0, `.raw` correctly preserved counts.
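Since this thread hinges on whether `.raw` still holds counts, a quick sanity check can help. The sketch below uses a hypothetical helper, `looks_like_counts` (not a scanpy or anndata function), that treats a matrix as counts only if every stored value is a non-negative integer. It is a heuristic: it cannot tell raw counts apart from any other non-negative integer matrix.

```python
import numpy as np
import scipy.sparse as sp


def looks_like_counts(X) -> bool:
    """Hypothetical heuristic: True if all stored values are non-negative integers."""
    # For sparse matrices only the explicitly stored entries need checking;
    # implicit zeros are already valid counts.
    values = X.data if sp.issparse(X) else np.asarray(X)
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


counts = np.array([[0, 1, 4], [2, 0, 7]], dtype=np.float32)
print(looks_like_counts(counts))                 # True
print(looks_like_counts(np.log1p(counts)))       # False
print(looks_like_counts(sp.csr_matrix(counts)))  # True
```

On a real object this would be called as `looks_like_counts(adata.raw.X)`; a `False` after normalization suggests `.raw` was altered along with `.X`.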
My opinion would be that you need to write `adata.raw = adata.copy()` if you want a copy to be made, since almost all assignments in anndata do not copy the assigned object. But we should look into whether this change was made deliberately or not.
If we don't change it, we could maybe warn when we're mutating `adata.X` and `adata.raw.X` also refers to the same data?
Overall, though, I would recommend using `adata.layers["counts"] = adata.X.copy()` instead of using `.raw` at all.
> My opinion would be that you need to write `adata.raw = adata.copy()` if you want a copy to be made, since almost all assignments do not create a copy of the assigned object in anndata. But we should look into whether this is a change that was made deliberately or not.
That makes python-sense. This is absolutely a change in convention though, see:
- The original scanpy tutorial
- The scVI tutorial (where they discuss needing to retain counts in raw)
- Most notably in the anndata API
In addition, both `sc.pl.umap` and `sc.pl.paga_path()` come to mind as functions that default to using the `.raw` layer.
> If we don't change it, we could maybe warn if we're mutating `adata.X` and `adata.raw.X` also refers to the same thing?
I think that's a good idea. More generally, it would be very helpful to preserve some record of the major transformations to `.X` (or any layer) in the AnnData structure.
> Overall, I would recommend that you use `adata.layers["counts"] = adata.X.copy()` instead of using `.raw` at all though.
This seems like good practice and the workaround we'll apply for now.
I do wonder whether something changed after this conversation, which you were a part of. Thank you, by the way; this package is an amazing tool.
Hello, thank you for your helpful answers.
Could someone please elaborate on when to use `.copy()` and when it is not needed?
In the original scanpy tutorial, I see it used occasionally, but not every time `adata` is modified.
For example:
It’s needed when you modify the `AnnData` object afterwards.
The above slices it twice and only then copies it, because slicing isn’t a modification. So what’s happening is:
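```python
import numpy as np
from anndata import AnnData

adata_orig = AnnData(np.ones((100, 10), dtype=np.float32))  # placeholder data
adata_sliced_view = adata_orig[:, :]  # any slice (e.g. a boolean filter) returns a view
assert adata_sliced_view.is_view
adata_sliced_copy = adata_sliced_view[:, :].copy()  # .copy() turns the view into an independent object
assert not adata_sliced_copy.is_view
adata_sliced_copy.X[0, 0] = 0  # stands in for do_modify(adata_sliced_copy)
assert adata_orig.X[0, 0] == 1  # the original is left untouched
```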
The slicing could also have been done in one operation:

```python
adata = adata_orig[(adata_orig.obs["n_genes_by_counts"] < 2500) & (adata_orig.obs["pct_counts_mt"] < 5), :].copy()
```
So, does that mean that every time we apply some kind of filtering (`adata = adata[condition]`), we should use `.copy()`?
For instance, when filtering the highly variable genes (see the image extracted from the scanpy legacy workflow)?
Yeah, and that way you’ll free up memory too, since the full dataset is no longer referenced.