Comments (3)
I benchmarked 4 use cases where I use baseband.open as a context manager and do something inexpensive like get fh.shape:
import baseband

# Opening a single DADA file (no format given)
def dada_no_format():
    with baseband.open('sample.dada') as fh:
        z = fh.shape
    return z

# Opening a single DADA file (format given)
def dada_with_format():
    with baseband.open('sample.dada', format='dada') as fh:
        z = fh.shape
    return z

# Opening sequential GUPPI files (to test SequentialFileReader)
def guppi_multifile_no_format():
    fs = ['fake.0.raw', 'fake.1.raw', 'fake.2.raw', 'fake.3.raw']
    with baseband.open(fs) as fh:
        z = fh.shape
    return z

# Opening sequential GUPPI files (to test SequentialFileReader) with format given
def guppi_multifile_with_format():
    fs = ['fake.0.raw', 'fake.1.raw', 'fake.2.raw', 'fake.3.raw']
    with baseband.open(fs, format='guppi') as fh:
        z = fh.shape
    return z

The results for these are:
In [2]: %timeit dada_no_format()
11.4 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit dada_with_format()
3.28 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [4]: %timeit guppi_multifile_no_format()
27.8 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [5]: %timeit guppi_multifile_with_format()
7.63 ms ± 592 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Giving a format leads to a significant speed-up (roughly 3.5x in both cases). Beyond that, you would have to create a reader over 3000 times to make it worth me typing this sentence.
I'd say readers are quick enough for now, until someone comes up with a compelling use case for opening a new reader object every 10 milliseconds.
PS: Perhaps there's a particular file format that's slower to open than GUPPI/DADA? Also, all files I used are small test files - I don't think the size of the files should have any bearing on the efficiency of creating a reader instance (but I could be wrong).
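For reference, the arithmetic behind the numbers above can be checked with a few lines of Python. This is only a sketch: the dictionary keys and the summarize helper are names made up for illustration, and the values are the mean timings copied from the %timeit output (the spread is ignored).

```python
# Mean timings in milliseconds, copied from the %timeit output above.
# The key names are made up for this illustration.
timings_ms = {
    "dada": {"no_format": 11.4, "with_format": 3.28},
    "guppi_multifile": {"no_format": 27.8, "with_format": 7.63},
}


def summarize(timings):
    """Return the speed-up factor and the time saved per open (in ms)."""
    summary = {}
    for case, t in timings.items():
        summary[case] = {
            "speedup": t["no_format"] / t["with_format"],
            "saved_ms": t["no_format"] - t["with_format"],
        }
    return summary


summary = summarize(timings_ms)
for case, s in summary.items():
    # Total time saved, in seconds, if a reader were opened 3000 times.
    total_s = 3000 * s["saved_ms"] / 1000
    print(f"{case}: {s['speedup']:.2f}x faster, "
          f"{s['saved_ms']:.2f} ms saved per open, "
          f"{total_s:.1f} s saved over 3000 opens")
```

Even over thousands of opens, the saving from passing a format adds up to well under two minutes, which is the point of the remark about 3000 readers.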
@theXYZT - OK, I think this is a bit of a non-issue, then (I think VDIF would be slower as it needs to read several headers, and Mark 4 as it needs to find the start, but unlikely to be so much slower that it actually matters).
p.s. It is expected that it is faster if you give the format, since then it doesn't have to try all formats to see which one works. There is obvious room for improvement in how it selects formats that might work...
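To illustrate why skipping the format search is cheaper, here is a toy sketch of the "try every format until one works" idea. It is not baseband's actual implementation: FormatMismatch, make_opener, OPENERS, open_any, and the magic byte strings are all hypothetical, made up for this example.

```python
class FormatMismatch(Exception):
    """Raised by a toy opener when the data is not in its format."""


def make_opener(name, magic):
    """Build a toy opener that accepts data starting with `magic`."""
    def opener(data):
        if not data.startswith(magic):
            raise FormatMismatch(name)
        return name
    return opener


# Hypothetical registry; the magic bytes are invented, not real file headers.
OPENERS = {
    "vdif": make_opener("vdif", b"VDIF"),
    "mark4": make_opener("mark4", b"MARK4"),
    "guppi": make_opener("guppi", b"GUPPI"),
    "dada": make_opener("dada", b"DADA"),
}


def open_any(data, format=None):
    """Return (format_name, number_of_openers_tried).

    With format=None every registered opener is tried in turn;
    with a format given, only that one opener is called.
    """
    candidates = [format] if format is not None else list(OPENERS)
    for tried, name in enumerate(candidates, start=1):
        try:
            return OPENERS[name](data), tried
        except FormatMismatch:
            continue
    raise ValueError("no format could open the data")


print(open_any(b"DADA header..."))                 # all openers tried in order
print(open_any(b"DADA header...", format="dada"))  # a single attempt
```

In this toy version the cost of auto-detection grows with the number of failed attempts before the right opener is reached, so a smarter ordering or a cheap pre-check would be the kind of improvement hinted at above.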
Conclusion: not really important, so closing for now.