Comments (3)
Potentially through group metadata. But these are just HDF5 files which can be manipulated separately. The `to_hdf5` and `from_hdf5` methods don't even operate on files directly, but instead take h5py `Group` objects.
from biom-format.
Thank you for the swift reply, btw. It sort of works, but I have two more issues:
- It wasn't clear to me, but the data type seems to be restricted? I can store the following without problems:
  `observation_group_metadata={"ranks": ("csv", "Root;Kingdom;Phylum;Clade;Order;Family")},`
  However, originally, I tried to just use a list of strings of the ranks and that failed when writing to HDF5:
  `observation_group_metadata={"ranks": ("list", ["Root", "Kingdom", "Phylum", "Clade", "Order", "Family"])},`
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[13], line 2
          1 with biom_open("/tmp/test.biom", permission="w") as handle:
    ----> 2     tbl.to_hdf5(handle, generated_by="me")

    File ~/.pyenv/versions/taxpasta/lib/python3.11/site-packages/biom/table.py:4618, in Table.to_hdf5(self, h5grp, generated_by, compress, format_fs, creation_date)
       4616 for key, value in group_md.items():
       4617     datatype, val = value
    -> 4618     grp_dataset = grp.create_dataset(
       4619         'group-metadata/%s' % key,
       4620         shape=(1,), dtype=H5PY_VLEN_STR,
       4621         data=val, compression=compression)
       4622     grp_dataset.attrs['data_type'] = datatype
       4624 grp.create_group('matrix')

    File ~/.pyenv/versions/taxpasta/lib/python3.11/site-packages/h5py/_hl/group.py:183, in Group.create_dataset(self, name, shape, dtype, data, **kwds)
        180 parent_path, name = name.rsplit(b'/', 1)
        181 group = self.require_group(parent_path)
    --> 183 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
        184 dset = dataset.Dataset(dsid)
        185 return dset

    File ~/.pyenv/versions/taxpasta/lib/python3.11/site-packages/h5py/_hl/dataset.py:60, in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, dapl, efile_prefix, virtual_prefix, allow_unknown_filter, rdcc_nslots, rdcc_nbytes, rdcc_w0)
         58 shape = (shape,) if isinstance(shape, int) else tuple(shape)
         59 if data is not None and (numpy.product(shape, dtype=numpy.ulonglong) != numpy.product(data.shape, dtype=numpy.ulonglong)):
    ---> 60     raise ValueError("Shape tuple is incompatible with data")
         62 if isinstance(maxshape, int):
         63     maxshape = (maxshape,)

    ValueError: Shape tuple is incompatible with data
- When trying to read the BIOM file from R instead of Python, there seems to be no concept of group metadata at all. I'm trying to establish the BIOM format as my primary mode of moving data from Python to R, so this is a big problem for me.
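
For anyone hitting the same error, here is a rough sketch of what seems to be going on, judging only from the traceback: the dataset is created with `shape=(1,)`, so a single string fits while a six-element list does not. The code below uses only the standard library and does not call biom or h5py. `element_count`, `fits_declared_shape`, `encode_ranks` and `decode_ranks` are hypothetical helper names made up for illustration, and the size check is a simplified stand-in for the comparison shown in the traceback, not the actual h5py code.

```python
import math

RANKS = ["Root", "Kingdom", "Phylum", "Clade", "Order", "Family"]


def element_count(value):
    """Simplified element count: 1 for a single string, len() for a list."""
    return 1 if isinstance(value, str) else len(value)


def fits_declared_shape(value, shape=(1,)):
    """Stand-in for the size comparison in the traceback (assumption, not h5py code)."""
    return math.prod(shape) == element_count(value)


def encode_ranks(ranks, sep=";"):
    """Hypothetical helper: pack the ranks into the ("csv", str) form that could be written."""
    return ("csv", sep.join(ranks))


def decode_ranks(entry, sep=";"):
    """Hypothetical helper: split the stored string back into a list of ranks."""
    _datatype, text = entry
    return text.split(sep)


entry = encode_ranks(RANKS)
print(fits_declared_shape(RANKS))     # six elements against shape (1,)
print(fits_declared_shape(entry[1]))  # one string against shape (1,)
print(decode_ranks(entry) == RANKS)   # round trip through the joined string
```

The joined string can then be passed as `observation_group_metadata={"ranks": encode_ranks(RANKS)}`. I have not checked what type comes back when reading (it may be bytes or wrapped in an array), so `decode_ranks` might need a decoding step first. This also assumes no rank name contains the separator.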
Hi @Midnighter,
- It might be. To be honest, we added support in expectation of use, but I'm not aware of much use of the group metadata in practice. As such, it is possible that those components are not as well sorted out as they should be. We would certainly welcome contributions to the project to improve their use.
- Would it be possible to open up an issue with the relevant R project and cc me?