mhvk / baseband Goto Github PK

View Code? Open in Web Editor NEW

27.0 4.0 27.0 5.04 MB

Package to read and write radio baseband data

Home Page: http://baseband.readthedocs.io/

License: GNU General Public License v3.0

Python 100.00%

python astronomy vlbi baseband-recording

baseband's Issues

Mark5B reader invalid BCD encoded value?

Hi,
I noticed that in the Crab data (Onsala Mark5B data) at certain times (not necessarily at midnight), the reader throws an error.

Here's some iPython output using a file, "ek036a_o8_no0002.m5a" which spans from 2015-10-18T23:11:06.000000000 to 2015-10-18T23:23:25.000078125 in ISO time to illustrate that:

In[5]: in_file = 'telescopes/o8/ek036a_o8_no0002.m5a'

In [6]: sample_rate = 32 * u.MHz

In [7]: thread_ids = [8, 12, 0, 4, 9, 13, 1, 5, 10, 14, 2, 6, 11, 15, 3, 7]

In [8]: Nsamples = 2**25

In [9]: t_before = Time('2015-10-18T23:15:38.000000000', format='isot', precision = 9)

In [10]: t_after = Time('2015-10-18T23:15:40.000000000', format='isot', precision = 9)

In [11]: t_error = Time('2015-10-18T23:15:39.000000000', format='isot', precision = 9)

In [12]: fh = mark5b.open(in_file, mode='rs', nchan=16, sample_rate=sample_rate,
   ....:                  thread_ids=thread_ids, ref_mjd=57000)

In [13]: fh.seek(t_before)
Out[13]: 8704000000

In [14]: z = fh.read(Nsamples).astype(np.float32)

 fh.seek(t_after)
Out[15]: 8768000000

In [16]: z = fh.read(Nsamples).astype(np.float32)

In [17]: fh.seek(t_error)
Out[17]: 8736000000

In [18]: z = fh.read(Nsamples).astype(np.float32)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-ed0974a3def5> in <module>()
----> 1 z = fh.read(Nsamples).astype(np.float32)

/mnt/raid-cita/rlin/packages/baseband/baseband/mark5b/base.pyc in read(self, count, fill_value, squeeze, out)
    232                 # Read relevant frame, reusing data array from previous frame.
    233                 self._read_frame()
--> 234                 dt_expected = (self._frame.seconds - self.header0.seconds +
    235                                86400 * (self._frame.jday - self.header0.jday))
    236                 assert dt == dt_expected

/mnt/raid-cita/rlin/packages/baseband/baseband/vlbi_base/frame.pyc in __getattr__(self, attr)
    177         except AttributeError:
    178             if attr in self.header._properties:
--> 179                 return getattr(self.header, attr)
    180             else:
    181                 raise

/mnt/raid-cita/rlin/packages/baseband/baseband/mark5b/header.pyc in seconds(self)
    171     def seconds(self):
    172         """Integer seconds on day (decoded from 'bcd_seconds')."""
--> 173         return bcd_decode(self['bcd_seconds'])
    174
    175     @seconds.setter

/mnt/raid-cita/rlin/packages/baseband/baseband/vlbi_base/utils.pyc in bcd_decode(value)
     11         if not isinstance(value, np.ndarray):
     12             raise ValueError("Invalid BCD encoded value {0}={1}."
---> 13                              .format(value, hex(value)))
     14     except TypeError:  # Might still be an array (newer python versions)
     15         if not isinstance(value, np.ndarray):

ValueError: Invalid BCD encoded value 571193=0x8b739.

As you can see, the data is readable except at particular times.

VDIF frameset header consistency checker

Currently only the first header of each VDIF frameset is used by stream readers, though the others are read into memory. It might be useful to check that all headers of the frameset are consistent (ie. only differ by thread_id) and throw a warning if it is not the case.

GSB header reader can seek by wrong amount

GSB header reading seeks based on the number of characters in the first line of the .timestamp file, and will seek to the wrong point when the line lengths change. This can regularly happen in GMRT headers, as the second last value increases incrementally - I noticed this as an assertion error when it changed from 9999 to 10000, increasing the line length by one character. I've adjusted the whitespace padding in the timestamp file to create uniform line lengths as a temporary workaround.

Consistent docstring & documentation terminology

Documentation for parameters shared between formats should have consistent (identical, if appropriate) descriptions, and should use terminology that's consistent with each other and with the documentation. Proposed common terms and descriptions are (currently an incomplete list):

Terms (maybe we need a glossary?):

channel: may mean frequency sub-band channels or Fourier channels, depending on the data set.
frame: a block of data, or payload, accompanied by a header.
header: metadata accompanying a data frame.
payload: the data within a data frame.
sample: an individual data point.
complete sample: samples from all threads, polarization, channels, etc. for one point in time.
sample rate: rate of complete samples.
component: individual threads, polarizations or channels of the complete sample.
elementary sample (maybe change to atomic sample?): the smallest possible subdivision of a complete sample, i.e. the real/imaginary part of one component of a complete sample.
elementary sample rate: rate of elementary samples, typically used only when describing the rate at which data is being read from or written to disk, or from the internet.
stream: timeseries of complete samples; may refer to all of, or a subsection of, the dataset.
subset: a subset of a complete sample, in particular one defined by the user for selective decoding.

open parameters when reading (each format may have specific defaults or features that are additionally described):

    fh_raw : 
        File handle of the raw <FORMAT> stream
    sample_rate : `~astropy.units.Quantity`
        Number of complete samples per second (ie. the sampling rate of each <COMPONENTS>).
    ref_time : `~astropy.time.Time`, or None, optional
        Reference time within <TIME FRAME> of the start time of the observations.
    samples_per_frame : int
        Number of complete samples per frame.
    nthread : int
        Number of threads in a complete sample.
    nchan : int
        Number of channels in a complete sample.
    complex_data : bool
        Whether the data is complex.
    bps : int
        Bits per elementary sample (e.g., the real or imaginary part of one
        component of the complete sample).
    subset: indexing object, or tuple of objects
        Specific components of the complete sample to decode.  <SOME FORMAT SPECIFIC INSTRUCTIONS.>
    squeeze : bool
        If `True` (default), remove any dimensions of length unity from
        decoded data.

open parameters when writing:

    raw :
        Which will write filled sets of frames to storage.
    header : :class:`~baseband.vdif.VDIFHeader`
        Header for the first frame, holding time information, etc.
    squeeze : bool
        If `True` (default), ``write`` accepts squeezed arrays as input,
        and adds channel and thread dimensions if they have length unity.
    **kwargs
        If no header is given, an attempt is made to construct the header from
        these.  For a standard header, the following suffices.

Descriptive "whence" values for `.seek()`

.seek() takes in an argument whence which can be either 0, 1 or 2 to mean "from start", "from current" and "from end", respectively.

Can we change this to a string-based argument to be more obviously descriptive in the code and in general, good code design.

So, whence='start', whence='current', or whence='end'.
That would be much more readable.

Mark 5b time setting does not also set frame_nr

(Might affect mark4 as well)

from astropy.time import Time
from baseband.mark5b import mark5b
mark5b.Mark5BHeader.fromvalues(time=Time('2014-06-13T05:30:01'), frame_nr=89)
<Mark5BHeader sync_pattern: 0xabaddeed,
              user: 0,
              internal_tvg: False,
              frame_nr: 89,
              bcd_jday: 0x821,
              bcd_seconds: 0x19801,
              bcd_fraction: 0x0,
              crc: 0x975d>

Here, one has an inconsistent header, since at an integer second, frame_nr should be 0.

GSB stream reader's `fill_value` doesn't do anything

Following on #101, fill_value in GSB doesn't do anything. This makes sense since GSB has no built-in way to flag for invalid data. I'm not sure why the option even exists, though - should we just get rid of it?

`fill_value` doesn't do anything for the first frame(set) using any stream reader

Following on #101: because fill_value is passed to a frame(set) reader, which sets invalid values with fill_value, and because the first frame is read upon initialization, fill_value in the stream readers' read method doesn't do anything until a new frame is read in. Likewise if the user switches fill_value while reading, and the stream reader doesn't need to read in a new frame, the fill_value remains whatever the user previously set.

Solutions:

Allow the user to pass fill_value upon stream reader initialization. This doesn't solve the issue when switching fill_value in the middle of a frame, and feels clunky.
Within read, check if fill_value == self._frame(set).invalid_data_value, and reread the frame(set) if it isn't. This makes the read method more complicated, but is invisible to the user and shouldn't slow down read too much (since, if the user sticks to a convention, we've only added an if statement check).

Not a high priority issue - no one complained that fill_value didn't do anything in Mark4 and 5B, so it's likely not a feature people need.

Cannot read VDIF streams with non-monotonically increasing frame number

@ishengyang, @pharaofranz: taking this out of e-mails to github so we don't forget. Not completely trivial so when this is addressed depends on how urgent it is. Note that I did merge #12, so the legacy headers now do get recognized properly in master.

A Mark 5B file converted to vdif by using jive5ab uses a somewhat peculiar ordering of threads and frames:

thread frame

0      0
0      1
1      0
1      1
2      0
2      1
3      0
3      1
4      0
4      1
5      0
5      1
6      0
6      1
7      0
7      1
0      2
0      3

It would be nice to ensure that the stream reader can read this, perhaps by explicitly telling it that there are 8 threads. A possible issue is that with the above ordering, one cannot seek a particular frame number into the raw data file, so this may involve more generally addressing that the data file can have gaps or inconsistent ordering.

Tab-complete prints out deprecation warnings due to `time0` and `time1`

Just an annoyance (and not sure if it's preventable) - using tab-complete on a stream reader object raises Astropy deprecation warnings for time0 and time, since they're still in the attributes dictionary.

Unify way of passing time rate information

Right now we have samplerate, frames_per_second, samples_per_frame - need at least 2 and some headers give information on one or two. Do we allow all three for the user? Is there a priority.

What do formats need (for a normal, long file):

GSB - samples_per_frame.
Mark 4 - nothing (2 frames is enough)
Mark 5B - nothing (full second is enough)
VDIF - nothing (full second enough for all, EDV=3 never a problem)
DADA - nothing (always OK)

Decision: remove frames_per_second from arguments, but allow passing in sample_rate for all, as well as samples_per_frame for GSB.

Mark5b within VDIF not currently tested - should be noted in documentation

Just a reminder for myself.

Sequentialfile should be properly documented

Right now, it has a good docstring in open, but the module itself could do with some examples (including a title, which would presumably become visible for helpers).

Final bits on subset combined with squeeze

From #118 (review):
One more general question: we could insert singleton integers for dimensions that get squeezed in subset, thus avoiding the need for squeezing the payload. (This means subset would no longer be equal to the input for any value of squeeze, but since it is now edited for squeeze=False, perhaps that's OK after all...) One advantage of this is that the __repr__ will then correctly give the subset that would reproduce the input (currently, squeeze is not shown in the repr).

mark5b stream reader cannot detect sample rate

Opening a large (9min) mark5b file with correct channels and reference time (e.g. mark5b.open(fn, 'rs', nchan=8, ref_time=ref_time) ), fails with message " KeyError: ('Mark5BHeader header does not contain seconds', 'the sample rate could not be auto-detected. This can happen if the file is too short to determine the sample rate, or because it is corrupted. Try passing in an explicit sample_rate.') "

The stream reader still works when explicitly giving the sample_rate, but the file is uncorrupted and long, and should be able to infer the sample_rate.

Generalize reading subsets

Right now we have thread_ids, which is logical for VDIF only. Would like to be able to select both on "thread" or Fourier channel, or really anything else. Should have a new subset keyword which is just a tuple of appropriate slices or indices.

EDIT: apply subset first, then squeeze. Logic is: get payload sample_shape, create empty array with that shape, apply subset, then remove unity channels if squeeze.

Methods for copying frames?

While there's a native copy function for headers, there doesn't seem to be one for payloads and frames. It's generally not a great idea to copy entire frames, but Dana just wanted to do it to create dummy frames with the same size and shape as the ones she's analyzing. Maybe worthwhile to consider.

Create an inspection function to determine stream properties

It would be useful to have a function that can inspect a stream, tell which type it is, what its basic properties are, and whether it seems basically correct (e.g., do tests on EDV3 that the sample rate makes sense).

Obviously, this would be a requirement for eventually having a baseband.open()

Decide on order of arguments for opening files

Ideally, order is by relevance. For writing, the overall logic seems obvious:

header0 - ideally this has everything one needs;
Keywords not normally inferable from a header in a given format;
Keywords that define a full header.

For reading, one would only need (2), so the real question is about order of those. Going by what is essential in order of how the data is presented makes some sense, ie.,

sample_rate
organization over/in payload: samples_per_frame, sample_shape (or nchan, whatever defines it)
sample definition: bps, complex_data

Anyway, for discussion. cc @cczhu.

Consider including warnings for VDIF non-compliance such as inconsistency between headers

VDIF EDV 1, 3 `sample_rate` is not well defined

EDV 1 and 3 have a "sample rate" in their headers in either ksps or Msps. Our sample file (EVN/VLBA PSR B1957+20 observation, EDV 3) sets sample rate to the bandwidth (16 MHz) rather than the sampling rate (32 MHz, since it's real samples). We therefore aren't certain how to treat "sample rate" (if we should use it at all).

Also, EDV 3 only allows for 2 bps real samples (EDV 1 is unclear about this), but our sample rate getter assumes complex data is possible, and is sampled at half the rate real data is for the same bandwidth.

Add GSB rawdump file + header

Currently, the tests all use made-up data - would be good to ensure it works on an actual file!

Change VDIF frameset output array ordering?

Currently, VDIFFrameSet.data has order threads, samples, channels - it might make more sense to make samples the first dimension, like everything else. That way the main code in the streamreaders can be more similar to that of other formats, and one avoids having to do a transpose there before any slicing (thus enforcing decoding of all frames in the set even if only parts are needed).

(Inspired by #108 and #118)

Implement Guppi reader and use it for a new-reader tutorial

Quite important to have, as it will guide people in implementing new formats themselves.

`fill_value` in Mark 4, 5B stream reader `read` method doesn't do anything

I figure this is a result of a combination of function standardization and Mark 4 and 5B not having an invalid data flag, but don't like the fact the user can set an argument that doesn't actually do anything. Should we at least mention in the docstring this currently doesn't function?

Mark5B reader gets confused at midnight?

Hi,
I noticed this bug in the Crab data (Effelsberg Mark5B data). If an observation spans across two MJD dates (that is, when midnight is in the middle of the observation), the reader throws an error.

Here's some iPython output using a file, "ek036a_ef_no0006.m5a" which spans from 2015-10-18T23:51:35.000000000 to 2015-10-19T00:03:55.000078125 in ISO time.

The file is inexplicably unreadable past midnight. Here's some iPython output to illustrate that:

In [34]: (fh.time0 + 504*u.s).isot
Out[34]: '2015-10-18T23:59:59.000000000'

In [35]: fh.seek(fh.time0 + 504*u.s)
Out[35]: 16128000000

In [36]: z = fh.read(32000000)

In [37]: fh.seek(fh.time0 + 504*u.s)
Out[37]: 16128000000

In [38]: z = fh.read(32000001)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-c3cf624c62ff> in <module>()
----> 1 z = fh.read(32000001)

/mnt/raid-cita/rlin/packages/baseband/baseband/mark5b/base.pyc in read(self, count, fill_value, squeeze, out)
    229                 # Read relevant frame, reusing data array from previous frame.
    230                 self._read_frame()
--> 231                 assert dt == (self._frame.seconds - self.header0.seconds)
    232                 assert frame_nr == self._frame['frame_nr']
    233

AssertionError:

In [39]: fh.time1.isot
Out[39]: '2015-10-19T00:03:55.000078125'

As you can see, in the test I seek to 1 second before midnight and successfully read exactly 1 second of data (32 million samples). However, reading 32000000+1 samples leads to an error, because the file is somehow unreadable beyond midnight. The same is true for seeking well beyond midnight and attempting to read any number of samples.

The file is perfectly readable for all samples before midnight.

Bug in writing incomplete Mark4 frame to file

Writing the sample Mark4 data to a new file, but not filling a frame leads to a ValueError. For example, this code snippet:

import baseband.vdif as vdif
import baseband.mark4 as mark4
from baseband.data import SAMPLE_MARK4
fr = mark4.open(SAMPLE_MARK4, 'rs', ntrack=64, decade=2010,
                frames_per_second=400)
record = fr.read(642)

with mark4.open("./my_partial_data.m4", 'ws', sample_rate=32*u.MHz,
                time=time0, ntrack=64, bps=2, fanout=4) as fw:
    # write in bits and pieces and with some invalid data as well.
    fw.write(record[:10])
    fw.write(record[10])
    fw.fh_raw.flush()

Leads to

ValueError: could not broadcast input array from shape (79989,8,8) into shape (79989,8)

following the warning that it will pad additional values to fill the frame.

The root cause is line 255 in vlbi_base/base.py, which assumes that channels are an separate dimension from thread (like for VDIF, where channel is "channels per frame"). Mark4StreamReader, though, inherits the line self.nthread = nchan if thread_ids is None else len(thread_ids).

ValueError when reading a Mark5B data

Hi @cczhu @mhvk :
When I using the following code to reading a Mark5B data, I encounter some problem.

CODE is

`from baseband.baseband import mark5b
from astropy.time import Time

def get_data_time(file_name, ref_day):
with open(file_name, 'rb') as fh:
header = mark5b.Mark5BHeader.fromfile(fh, ref_mjd=Time(ref_day).mjd)
return header

if name == 'main':
get_data_time('test.m5b', '2010-01-01')`

The OUTPUT is

`Traceback (most recent call last):
File "/private/tmp/test/src/baseband/baseband/vlbi_base/utils.py", line 9, in bcd_decode
return int('{:x}'.format(value))
ValueError: invalid literal for int() with base 10: 'cc0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "readmark5b.py", line 10, in
get_data_time('test.m5b', '2010-01-01')
File "readmark5b.py", line 6, in get_data_time
header = mark5b.Mark5BHeader.fromfile(fh, ref_mjd=Time(ref_day).mjd)
File "/private/tmp/test/src/baseband/baseband/vlbi_base/header.py", line 353, in fromfile
return cls(cls._struct.unpack(s), *args, **kwargs)
File "/private/tmp/test/src/baseband/baseband/mark5b/header.py", line 89, in init
self._header_parser.parsers'bcd_jday')
File "/private/tmp/test/src/baseband/baseband/vlbi_base/utils.py", line 13, in bcd_decode
.format(value, hex(value)))
ValueError: Invalid BCD encoded value 3264=0xcc0.`

Do you have any ideas?

Mark4 Reader does not operate correctly on BlueGene/Q

Running baseband.test() on BGQ generates a test output that I have pasted here:
https://pastebin.com/raw/zeTMJaqP

It seems like the mark4 reader doesn't operate correctly in big-endian environments.

Reminder: more efficient Mark4 decoder

Reproducing Marten's comment in #104 regarding baseband/mark4/base.py, since we won't be implementing it in #104 itself:

We can make the decoding somewhat less inefficient by using the fact that one can slice the frame, in which case only part will be decoded; so remove this one and rewrite as:

nsample = ...
selection = slice(sample_offset, sample_offset + nsample)
if self.thread_ids:
    selection = (selection, self.thread_ids)
sample = self.offset - offset0
result[sample:sample + nsample] = self._frame[selection]

Mark 4 fan-out ratio definition inverse of what Baseband uses

In the Mark 4 specifications (Sec. 2.2.1.1), the "fan out ratio" is defined as the bitstreams / track, while Baseband appears to be using the inverse definition (if bps and number of channels is fixed, Baseband takes a larger fan-out number to mean more tracks).

The inverse definition may be inherited from Walter Brisken's mark5access.

The easiest solution would be to replace "fanout" with "fanin", but this looks uglier and may break legacy codes. An alternative would be to clearly define what we mean by fanout, but I disagree with violating the data specification - it would lead to confusion from users.

Enable ipython completion of header keys?

Might be nice for headers at least. See astropy/astropy#7071 for implementation in astropy. No hurry, though. cc @cczhu.

Require newer version of pytest

Testing also gives:

None
  [pytest] section in setup.cfg files is deprecated, use [tool:pytest] instead.

I think this is for pytest < 3. Maybe we can require pytest >=3?
(Milestone is for deciding about this)

Error & warning messages should start with lowercase

Since errors and warnings start with TypeError: or UserWarning:, the trailing text message should start in lower-case. For release 1.0 we should check that this is the case in Baseband.

Can't subset Mark 4 frame object

Mark 4's frame class currently has a __getitem__ that explicitly prevents slicing. No idea why, and the error messages and comments suggests it's a placeholder.

Allow users to turn verification off?

During sequential decode, approximately 10% of the time is taken up by verifying header and payload integrity. While this is essential for selective or initial sequential decode, if the user is re-analyzing data they know to be error-free, it would be useful to pass verify=False to stream readers for the performance boost.

Reserve size for total number of samples

I was thinking that perhaps frame, payload should be even closer to arrays, by having the appropriate __len__ and size. But we use size for number of bytes as well. Arguably, that is much less useful to the typical user, so one solution would be to rename that. E.g., encodedsize (the .payloadsize and .framesize of a header could stay that way).

User is allowed to seek time outside the file's bounds without warning

The .seek() function in VLBIStreamReaderBase does not check to see if the offset provided is valid.

It would be helpful if the seek function threw an error if the user attempts to seek an offset out of the bounds of the data file.

Factor out `read` and `write` to `StreamReaderBase`

Currently, the code of read and write is nearly identical for the different formats, which suggests the method can be moved up, as long as one keeps any format-specific frame checking in the file formats (likely, in _read_frame).

Mark5B find_header is very unelegant and slow

It works now by trying to read header at every position; it would make much more sense to start by finding sync_pattern.

Runtimewarning in docs

Testing currently gives:

docs/baseband/tutorials/getting_started.rst
  /usr/lib/python3/dist-packages/astropy/units/quantity.py:639: RuntimeWarning: invalid value encountered in true_divide
    result = super().__array_ufunc__(function, method, *arrays, **kwargs)

-- Docs: http://doc.pytest.org/en/latest/warnings.html

It would be good to see if we can get rid of it (not sure where it happens)

`dada.open` doesn't accept a `sequentialfile` object

As noted...somewhere in one of the PRs... dada.open throws an error if the user passes a sequentialfile object, which is a bug.

Baseband fails if DADA header payload size doesn't match actual file payload size

This happens often for the last file of an observation run. I am working on this issue.

Just to practice documenting, here's the traceback on a test DADA file where this happens:

In [61]: b = dada.open('b.dada', 'rs', squeeze=False)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-737657ac9e72> in <module>()
----> 1 b = dada.open('b.dada', 'rs', squeeze=False)

~/src/baseband/baseband/dada/base.py in open(name, mode, subset, header, **kwargs)
    482             kwargs['subset'] = subset
    483
--> 484     return opener(name, mode, **kwargs)
    485
    486

~/src/baseband/baseband/vlbi_base/base.py in open(name, mode, **kwargs)
    521                 except Exception:  # pragma: no cover
    522                     pass
--> 523             raise exc
    524
    525     open.__doc__ = (default_open_doc.replace('baseband', fmt) + doc

~/src/baseband/baseband/vlbi_base/base.py in open(name, mode, **kwargs)
    514                              .format(fmt))
    515         try:
--> 516             return classes[cls_type](name, **kwargs)
    517         except Exception as exc:
    518             if not got_fh:

~/src/baseband/baseband/dada/base.py in __init__(self, fh_raw, subset, squeeze)
    236         header = DADAHeader.fromfile(fh_raw)
    237         super(DADAStreamReader, self).__init__(fh_raw, header, subset, squeeze)
--> 238         self._get_frame(0)
    239
    240     @lazyproperty

~/src/baseband/baseband/dada/base.py in _get_frame(self, frame_nr)
    301     def _get_frame(self, frame_nr):
    302         self.fh_raw.seek(frame_nr * self.header0.framesize)
--> 303         self._frame = self.read_frame(memmap=True)
    304         self._frame_nr = frame_nr
    305         assert (self._frame.header['OBS_OFFSET'] ==

~/src/baseband/baseband/dada/base.py in read_frame(self, memmap)
    126             parts of interest by slicing the frame.
    127         """
--> 128         return DADAFrame.fromfile(self.fh_raw, memmap=memmap)
    129
    130

~/src/baseband/baseband/dada/frame.py in fromfile(cls, fh, memmap, valid, verify)
     74         """
     75         header = cls._header_class.fromfile(fh, verify)
---> 76         payload = cls._payload_class.fromfile(fh, header=header, memmap=memmap)
     77         return cls(header, payload, valid=valid, verify=verify)
     78

~/src/baseband/baseband/dada/payload.py in fromfile(cls, fh, header, memmap, payloadsize, **kwargs)
     94             words = np.memmap(fh, mode=mode, dtype=cls._dtype_word,
     95                               offset=offset, shape=None if payloadsize is None
---> 96                               else (payloadsize // cls._dtype_word.itemsize,))
     97             fh.seek(offset + words.size * words.dtype.itemsize)
     98         return cls(words, header=header, **kwargs)

/usr/local/lib/python3.6/site-packages/numpy/core/memmap.py in __new__(subtype, filename, dtype, mode, offset, shape, order)
    262         bytes -= start
    263         array_offset = offset - start
--> 264         mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
    265
    266         self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,

ValueError: mmap length is greater than file size

Stream _frame_class is defined but not used

Makes sense really only on binary reader/writer. If defined, could be used to infer header and payload classes.

Some stream reader attributes aren't documented

Sphinx's automodapi is unable to document instance attributes (for obvious reasons), but this leaves a number of useful attributes initialized by stream readers, such as nchan, frames_per_second and even header0, undocumented in the API.

One solution would be to include these in the class whose __init__ creates them, eg. nchan in VLBIStreamBase. This does, however, increase clutter. Strangely, attribute docstrings show up in the API of their defined class, but only the attribute shows up in subclasses (if :inherited-members: is used).

Remove pytest=3.2 fixes when astropy lts upgraded to 2.0.3

Reminder to ourselves.

Make fill_value a property of the file handle

There would seem to be a great benefit of fh.read(...) to have only arguments that are valid for all formats. We've gone partially that way by moving squeeze up and I think we should continue that by also moving fill_value to be an argument of open rather than of read (this in particular since gsb does not have a fill value; #105). I think initially at least I would advocate for not making this settable after opening the file (though that is not hard to do in principle; it would require invalidating any frame already decoded, though).

Make all header properties "quasi-lazy"?

Line profiling finds that decoding the header takes a significant amount of time (perhaps around 10% of the total time for squential decode). A possible solution is to make header properties "quasi-lazy" - the properties themselves are lazy and are only set once, but all lazy properties are cleared whenever mutable is set to True to allow for user changes.

Remove ability to get header items as attributes?

Does anything break? It seem not bad to keep a logical separation between what is really in the header and what is inferred from it.

VDIF header fromvalues could be cleaner

Right now the fromvalues method pops properties related to setting the time (and verify), creates the header, and then sets the time using sample_rate (which not every EDV has). A perhaps more elegant implementation is:

        kwargs.setdefault('legacy_mode', True if edv is False else False)
        kwargs['edv'] = edv
        time = kwargs.pop('time', None)
        sample_rate = kwargs.pop('sample_rate', None)
        if time is not None:
            kwargs['seconds'], kwargs['frame_nr'] = self.convert_time(time, sample_rate)
        # Pop verify and pass on False so verify happens after time is set.
        return super(VDIFHeader, cls).fromvalues(edv, **kwargs)

(not sure if we should allow the user to pass a time and frame_nr - I think we had concluded against it in past discussion.) We then would have to also have a convert_time function in the header that does the same thing as set_time now, and set_time would then be:

    def set_time(self, time, sample_rate=None, frame_nr=None):
        self['seconds'], self['frame_nr'] = self.convert_time(time, sample_rate=sample_rate,
                                                                                        frame_nr=frame_nr)

With the discussion in #135 that EDV header values aren't necessarily trustworthy, this also frees us from explicitly using the header's internal sample_rate.

mhvk / baseband Goto Github PK

baseband's Issues

Recommend Projects

Recommend Topics

Recommend Org