selik / xport
Python reader and writer for SAS XPORT data transport files.
License: MIT License
Since Windows users may not have command-line pipes, it'd be nice if the CLI tool allowed specifying the output file as a command-line argument. This would match the behavior of many other CLI tools.
The file format is slightly different.
https://support.sas.com/techsup/technote/ts140_2.pdf
HEADER RECORD*******LIBV8 HEADER RECORD!!!!!!!000000000000000000000000000000
aaaaaaaabbbbbbbbccccccccddddddddeeeeeeee ffffffffffffffff
where aaaaaaaa and bbbbbbbb are each 'SAS ' and cccccccc is 'SASLIB ', dddddddd is
the version of the SAS system that created the file, and eeeeeeee is the operating system
creating it. ffffffffffffffff is the datetime created, formatted as ddMMMyy:hh:mm:ss.
Note that only a 2-digit year appears. If any program needs to read in this 2-digit year, be
prepared to deal with dates in the 1900s or the 2000s.
Another way to consider this record is as a C structure:
struct REAL_HEADER {
char sas_symbol[2][8];
char saslib[8];
char sasver[8];
char sas_os[8];
char blanks[24];
char sas_create[16];
};
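The fixed-width layout above maps directly onto Python's struct module. A minimal sketch (the helper name parse_real_header and the returned keys are illustrative, not part of any library):

```python
import struct

# Layout of the first "real" header record: two 8-byte SAS symbols,
# 'SASLIB  ', SAS version, OS, 24 blanks, and a 16-byte datetime.
REAL_HEADER = struct.Struct('>8s8s8s8s8s24s16s')
assert REAL_HEADER.size == 80  # one 80-byte transport record

def parse_real_header(record: bytes) -> dict:
    sas1, sas2, saslib, sasver, sas_os, _blanks, created = REAL_HEADER.unpack(record)
    return {
        'saslib': saslib.rstrip(b' ').decode('ascii'),
        'version': sasver.rstrip(b' ').decode('ascii'),
        'os': sas_os.rstrip(b' ').decode('ascii'),
        'created': created.decode('ascii'),
    }
```

The `>` prefix disables struct's alignment padding, so the seven fields sum to exactly the 80-byte record length.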
ddMMMyy:hh:mm:ss
where the string is the datetime modified. Most often, the datetime created and datetime
modified will be the same. Pad with ASCII blanks to 80 bytes.
Note that only a 2-digit year appears. If any program needs to read in this 2-digit year, be
prepared to deal with dates in the 1900s or the 2000s.
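These ddMMMyy:hh:mm:ss stamps can be parsed with the standard library, which already applies the POSIX two-digit-year pivot (00-68 map to the 2000s, 69-99 to the 1900s), matching the spec's warning; a minimal sketch:

```python
from datetime import datetime

def parse_xport_datetime(text: str) -> datetime:
    """Parse the ddMMMyy:hh:mm:ss stamps used in transport headers.

    %y applies the POSIX pivot: 00-68 -> 20xx, 69-99 -> 19xx.
    strptime matches the month abbreviation case-insensitively,
    so the all-caps 'FEB' style in transport files is fine.
    """
    return datetime.strptime(text, '%d%b%y:%H:%M:%S')
```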
HEADER RECORD*******MEMBV8 HEADER RECORD!!!!!!!000000000000000001600000000140
HEADER RECORD*******DSCPTV8 HEADER RECORD!!!!!!!000000000000000000000000000000
Note the 0140 that appears in the member header record above. That value is the size of the variable descriptor (NAMESTR) record that is described later in this document.
aaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbccccccccddddddddeeeeeeeeffffffffffffffff
where aaaaaaaa is 'SAS ', bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb is the data set name,
cccccccc is SASDATA (if a SAS data set is being created), dddddddd is the version of
the SAS System under which the file was created, and eeeeeeee is the operating system
name. ffffffffffffffff is the datetime created, formatted as in previous headers. Consider
this C structure:
struct REAL_HEADER {
char sas_symbol[8];
char sas_dsname[32];
char sasdata[8];
char sasver[8];
char sas_osname[8];
char sas_create[16];
};
The second header record is
ddMMMyy:hh:mm:ss aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbb
where the datetime modified appears using DATETIME16. format, followed by blanks
up to column 33, where the a's above correspond to a blank-padded data set label, and
bbbbbbbb is the blank-padded data set type. Note that data set labels can be up to 256
characters as of Version 8 of the SAS System, but only up to the first 40 characters are
stored in the second header record. Note also that only a 2-digit year appears in the
datetime modified value. If any program needs to read in this 2-digit year, be prepared to
deal with dates in the 1900s or the 2000s.
Consider the following C structure:
struct SECOND_HEADER {
char dtmod_day[2];
char dtmod_month[3];
char dtmod_year[2];
char dtmod_colon1[1];
char dtmod_hour[2];
char dtmod_colon2[1];
char dtmod_minute[2];
char dtmod_colon3[1];
char dtmod_second[2];
char padding[16];
char dslabel[40];
char dstype[8];
};
HEADER RECORD*******NAMSTV8 HEADER RECORD!!!!!!!000000xxxxxx000000000000000000
Here is the C structure definition for the namestr record:
struct NAMESTR {
short ntype; /* VARIABLE TYPE: 1=NUMERIC, 2=CHAR */
short nhfun; /* HASH OF NNAME (always 0) */
short nlng; /* LENGTH OF VARIABLE IN OBSERVATION */
short nvar0; /* VARNUM */
char8 nname; /* NAME OF VARIABLE */
char40 nlabel; /* LABEL OF VARIABLE */
char8 nform; /* NAME OF FORMAT */
short nfl; /* FORMAT FIELD LENGTH OR 0 */
short nfd; /* FORMAT NUMBER OF DECIMALS */
short nfj; /* 0=LEFT JUSTIFICATION, 1=RIGHT JUST */
char nfill[2]; /* (UNUSED, FOR ALIGNMENT AND FUTURE) */
char8 niform; /* NAME OF INPUT FORMAT */
short nifl; /* INFORMAT LENGTH ATTRIBUTE */
short nifd; /* INFORMAT NUMBER OF DECIMALS */
long npos; /* POSITION OF VALUE IN OBSERVATION */
char longname[32]; /* long name for Version 8-style */
short lablen; /* length of label */
char rest[18]; /* remaining fields are irrelevant */
};
The variable name truncated to 8 characters goes into nname, and the complete name
goes into longname. Use blank padding in either case if necessary. The variable label
truncated to 40 characters goes into nlabel, and the total length of the label goes into
lablen. If your label exceeds 40 characters, you will have the opportunity to write the
complete label in the label section described below.
Note that the length given in the last 4 bytes of the member header record indicates the
actual number of bytes for the NAMESTR structure. The size of the structure listed
above is 140 bytes.
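The NAMESTR layout translates to a single struct format string, assuming the big-endian byte order used throughout the transport format (the helper parse_namestr and the chosen field subset are illustrative):

```python
import struct

# Big-endian layout mirroring the NAMESTR C structure above.
# 'char8'/'char40' become fixed-width byte fields; npos is a 4-byte int.
NAMESTR = struct.Struct('>hhhh8s40s8shhh2s8shhi32sh18s')
assert NAMESTR.size == 140  # matches the 0140 in the member header record

def parse_namestr(chunk: bytes) -> dict:
    fields = NAMESTR.unpack(chunk)
    return {
        'type': fields[0],      # ntype: 1=numeric, 2=character
        'length': fields[2],    # nlng: bytes per observation
        'varnum': fields[3],    # nvar0
        'name': fields[4].rstrip(b' ').decode('ascii'),
        'label': fields[5].rstrip(b' ').decode('ascii'),
        'position': fields[14],  # npos: offset within the observation
        'long_name': fields[15].rstrip(b' ').decode('ascii'),
    }
```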
If you have any labels that exceed 40 characters, they can be placed in this section. The
label records section starts with this header:
HEADER RECORD*******LABELV8 HEADER RECORD!!!!!!!nnnnn
where nnnnn is the number of variables for which long labels will be defined.
Each label is defined using the following:
aabbccd.....e.....
where
aa = variable number
bb = length of name
cc = length of label
d.... = name in bb bytes
e.... = label in cc bytes
For example, variable number 1 named x with the 43-byte label 'a very long label for x is
given right here' would be provided as a stream of 6 bytes in hex '00010001002B'X
followed by the ASCII characters.
xa very long label for x is given right here
These are streamed together. The last label descriptor is followed by ASCII blanks
('20'X) to an 80-byte boundary.
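A sketch of building one of these label records with Python's struct module, reproducing the '00010001002B'X example above (the helper name v8_label_record is made up):

```python
import struct

def v8_label_record(varnum: int, name: bytes, label: bytes) -> bytes:
    # aa, bb, cc are 2-byte big-endian integers, followed immediately
    # by the name and label text with no separators.
    return struct.pack('>hhh', varnum, len(name), len(label)) + name + label

label = b'a very long label for x is given right here'
assert len(label) == 43  # 0x2B bytes, as in the spec's example
record = v8_label_record(1, b'x', label)
assert record[:6].hex().upper() == '00010001002B'
```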
If you have any format or informat names that exceed 8 characters, regardless of the
label length, a different form of label record header is used:
HEADER RECORD*******LABELV9 HEADER RECORD!!!!!!!nnnnn
where nnnnn is the number of variables for which long format names and any labels will
be defined.
Each label is defined using the following:
aabbccddeef.....g.....h.....i.....
where
aa=variable number
bb=length of name in bytes
cc=length of label in bytes
dd=length of format description in bytes
ee=length of informat description in bytes
f.....=text for variable name
g.....=text for variable label
h.....=text for format description
i.....=text of informat description
Note: The FORMAT and INFORMAT descriptions are in the form used in a FORMAT
or INFORMAT statement. For example, my_long_fmt., my_long_fmt8.,
my_long_fmt8.2. The text values are streamed together and no characters appear for
attributes with a length of 0 bytes.
For example, variable number 1 is named X and has a label of 'ABC', no attached
format, and an 11-character informat named my_long_fmt with informat length=8 and
informat decimal=0. The data would be
(hex) (characters)
010103000d XABCmy_long_fmt
The last label descriptor is followed by ASCII blanks ('20'X) to an 80-byte boundary.
HEADER RECORD*******OBSV8 HEADER RECORD!!!!!!!000000000000000000000000000000
Data records are streamed in the same way that namestrs are. There is ASCII blank
padding at the end of the last record if necessary. There is no special trailing record.
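The blank-padding rule that appears throughout the format (namestrs, labels, observations) is simple modular arithmetic; a minimal sketch (pad_to_80 is an illustrative name):

```python
def pad_to_80(data: bytes) -> bytes:
    """Blank-pad to the next 80-byte boundary (no-op if already aligned)."""
    return data + b' ' * (-len(data) % 80)
```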
Hi All,
Since xpt files cannot be edited as directly as CSV files, can you please guide me on how to write to a column to change its value in an xpt file? I only see an example of mapping. I do not want to map; here is what I need to do:
Please help: how do I access one column, change its value, and save the file?
Also: 2. Can we convert to CSV and then back to an XPT file? Or is that not needed?
Thanks,
Saga
I'm trying to figure out if None in character columns should be written as an empty string or some kind of sentinel value, like an ASCII NUL character.
These sound like good ideas.
http://agdr.org/2020/05/14/Polyglot-Makefiles.html
https://tech.davis-hansson.com/p/make/
The library doesn't support compressed XPORT files.
I.e. files starting with:
**COMPRESSED** **COMPRESSED** **COMPRESSED** **COMPRESSED** **COMPRESSED**
Example files can be fetched from https://www.ctti-clinicaltrials.org/aact-database.
Having support for these files would be great, but might be complicated to do, as the format specification is not available.
Can you add an error message saying that the file is a compressed XPORT (CPORT) file, which is not supported by the library?
Thanks.
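Until CPORT support exists, the library could fail fast with a clearer message by sniffing the first record. A hypothetical sketch, assuming the **COMPRESSED** marker shown above appears within the first 80 bytes:

```python
def check_not_cport(fp):
    """Raise a clear error if fp points at a compressed (CPORT) file."""
    head = fp.read(80)
    fp.seek(0)  # leave the stream where the real parser expects it
    if b'**COMPRESSED**' in head:
        raise ValueError(
            'this looks like a compressed (CPORT) file; '
            'only plain XPORT transport files are supported'
        )
```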
Hello Mr. Selik,
I am trying to read a .XPT file into Python for a class assignment and I came across your library. Using the sample code provided at https://pypi.org/project/xport/ , I am receiving the error:
AttributeError: module 'pandas' has no attribute 'NA'
Here is the code I used, modified from the website:
import xport
import xport.v56
with open('data/DXX_J.XPT', 'rb') as f:
    library = xport.v56.load(f)
It is past my skill level to look into the source code and try to fix the error myself.
From email: "from_dataframe function bugs out as currently the list is referencing df not the passed in dataframe object"
From email: "the from_columns function throws an error when any column is blank... Need to catch this and pass in some representation of Null."
ImportError: cannot import name reading
I get this error when running the code below:
with open('nsIQTScriptablePlugin.xpt', 'r') as f:
    for row in xport.Reader(f):
        print row
For example, if the first row has a float but the next row has a 10-character str, the XPT namestr will say it's a numeric column of length 10. That will cause a NotImplementedError when reading the XPT file.
I'm getting the following error parsing an xpt file -
xport.py, line 188, in __iter__
for obs in self._read_observations(self._variables):
xport.py, line 367, in _read_observations
raise ParseError('incomplete record', sentinel, block)
data_integration.utilities.xport.ParseError: incomplete record -- expected b' ', got b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
Note there are many spaces between the quotes in "expected b' ' " that I omitted.
It looks like spaces are expected, but it's getting null characters instead. This is a SAS version 9.4 xpt file, but I've parsed many other version 9.4 files using this library without an issue. Also, interestingly, I can read the header/schema of this file fine using this library; I'm just having issues parsing the rows.
I don't think the file itself is corrupt, because it can be parsed correctly in R using haven::read_xpt. I'm using the most recent version of xport published to PyPI.
Currently, while writing an xpt file from a dataframe, the from_columns function is called, and here we write the member header records and default the name of the data set to b'dataset'; see the code below for reference:
# Member header data
fp.write(b'SAS'
         b'dataset'  # dataset name - customize this field
         b'SASDATA'
         + sas_version
         + os_version
         + 24 * b' '
         + created)
Can you help me make the dataset name customizable, or is there already a function for doing this? We don't want all datasets to have the same name, 'dataset'.
Hello,
I was trying to change the name of the SAS dataset to more than 8 characters and figured out that it is not supported by this module; it supports only up to V6. You asked me to submit an issue if I want to work with SAS V9. I tried to change the version number from 6.06 to 9.4, but it didn't work.
Can you please help me out with just changing the name of the Dataset as it can contain more than 8 characters.
Thanks
Vignesh
Traceback (most recent call last):
File "main.py", line 124, in <module>
rows = xport.load(f)
File "build/bdist.linux-x86_64/egg/xport.py", line 380, in load
File "build/bdist.linux-x86_64/egg/xport.py", line 164, in __iter__
File "build/bdist.linux-x86_64/egg/xport.py", line 339, in _read_observations
ValueError: Incomplete record, ''
Whenever I try to load an xpt file with more than 9999 values, it gives me an incomplete record error.
I am using the method below for generating the xpt file, but I have 15 columns or so:
mapping = {'numbers': [1, 3.14, 42],
           'text': ['life', 'universe', 'everything']}

# as a mapping of labels to columns
with open('answers.xpt', 'wb') as f:
    dump(f, mapping, mode='columns')
But while loading I am using the method below:
rows = xport.load(f)
Everything works fine if I loop up to 9999 rows when inserting data, but for 10000 records it gives me the above error.
Also, I am working on debugging the code to identify the problem. Let me know if you find something.
I'm trying to convert an XPT file to CSV, and am getting the error below. I installed xport from pip.
The file ( MGX_H.XPT ) is from this cdc.gov page. A direct link to the file is here.
I'm a bit of a newbie with SAS and XPT files, so I'm sorry if I'm missing anything obvious!
$ xport MGX_H.XPT > mgx_h.csv
Traceback (most recent call last):
File "/home/user/.local/bin/xport", line 8, in <module>
sys.exit(cli())
File "/home/user/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/user/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/xport/cli.py", line 72, in cli
library = xport.v56.load(input)
File "/home/user/.local/lib/python3.9/site-packages/xport/v56.py", line 900, in load
return loads(bytestring)
File "/home/user/.local/lib/python3.9/site-packages/xport/v56.py", line 911, in loads
return Library.from_bytes(bytestring)
File "/home/user/.local/lib/python3.9/site-packages/xport/v56.py", line 700, in from_bytes
self = Library(
File "/home/user/.local/lib/python3.9/site-packages/xport/__init__.py", line 589, in __init__
for dataset in members:
File "/home/user/.local/lib/python3.9/site-packages/xport/v56.py", line 607, in from_bytes
data.copy_metadata(head)
File "/home/user/.local/lib/python3.9/site-packages/xport/__init__.py", line 412, in copy_metadata
for k, v in self.items():
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 957, in items
yield k, self._get_item_cache(k)
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 3542, in _get_item_cache
res = self._box_col_values(values, loc)
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3192, in _box_col_values
return klass(values, index=self.index, name=name, fastpath=True)
File "/home/user/.local/lib/python3.9/site-packages/xport/__init__.py", line 310, in __init__
LOG.debug(f'Initialized {self}')
File "/home/user/.local/lib/python3.9/site-packages/xport/__init__.py", line 276, in __repr__
return f'{type(self).__name__}\n{super().__repr__()}\n{", ".join(metadata)}'
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/series.py", line 1327, in __repr__
self.to_string(
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/series.py", line 1386, in to_string
formatter = fmt.SeriesFormatter(
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/formats/format.py", line 261, in __init__
self._chk_truncate()
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/formats/format.py", line 285, in _chk_truncate
series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 274, in concat
op = _Concatenator(
File "/home/user/.local/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 395, in __init__
axis = sample._constructor_expanddim._get_axis_number(axis)
File "/home/user/.local/lib/python3.9/site-packages/xport/__init__.py", line 340, in _constructor_expanddim
raise NotImplementedError("Can't copy SAS variable metadata to dataframe")
NotImplementedError: Can't copy SAS variable metadata to dataframe
$ python --version
Python 3.9.0
$ pip show pandas
Name: pandas
Version: 1.1.4
...
$ pip show xport
Name: xport
Version: 3.2.1
...
Hi, can you explain how we can save the dataframe (df) to a new xpt file?
Thanks,
Sagar
Hello,
Thanks for maintaining this package, it's quite helpful.
I'm trying to run it and, while typing exactly what's in the help section, I'm getting a strange error message. I'm pretty sure it used to work. I'm using version 3.1.3 (from Anaconda).
import pandas
import xport
import xport.v56
df = pandas.DataFrame({
    'alpha': [10, 20, 30],
    'beta': ['x', 'y', 'z'],
})
...  # Analysis work ...
ds = xport.Dataset(df, name='DATA', label='Wonderful data')
for k, v in ds.items():
    v.label = k  # Use the column name as SAS label
    v.name = k.upper()[:8]  # SAS names are limited to 8 chars
    if v.dtype == 'object':
        v.format = '$CHAR20.'  # Variables will parse SAS formats
    else:
        v.format = '10.2'
library = xport.Library({'DATA': ds})
# Libraries can have multiple datasets.
with open('example.xpt', 'wb') as f:
    xport.v56.dump(library, f)
Getting this log in Jupyter:
Converting column 'alpha' from int64 to float
Converting column 'beta' from object to string
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
TypeError: data type "string" not understood
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in __bytes__(self)
613 try:
--> 614 self[column] = self[column].astype(dtype)
615 except Exception:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
5690 # GH 24704: use iloc to handle duplicate column names
-> 5691 results = [
5692 self.iloc[:, i].astype(dtype, copy=copy)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
530 for b in blocks
--> 531 ]
532
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
394
--> 395 self._consolidate_inplace()
396
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
533 return self.make_block(nv)
--> 534
535 # ndim > 1
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
594 return self.make_block(Categorical(self.values, dtype=dtype))
--> 595
596 dtype = pandas_dtype(dtype)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
TypeError: data type 'string' not understood
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-43-8d39bacd8d51> in <module>
23
24 with open('example.xpt', 'wb') as f:
---> 25 xport.v56.dump(library, f)
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in dump(library, fp)
905
906 """
--> 907 fp.write(dumps(library))
908
909
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in dumps(library)
924
925 """
--> 926 return bytes(Library(library))
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in __bytes__(self)
704 b'created': strftime(self.created if self.created else datetime.now()),
705 b'modified': strftime(self.modified if self.modified else datetime.now()),
--> 706 b'members': b''.join(bytes(Member(member)) for member in self.values()),
707 }
708
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in <genexpr>(.0)
704 b'created': strftime(self.created if self.created else datetime.now()),
705 b'modified': strftime(self.modified if self.modified else datetime.now()),
--> 706 b'members': b''.join(bytes(Member(member)) for member in self.values()),
707 }
708
C:\ProgramData\Anaconda3\lib\site-packages\xport\v56.py in __bytes__(self)
614 self[column] = self[column].astype(dtype)
615 except Exception:
--> 616 raise TypeError(f'Could not coerce column {column!r} to {dtype}')
617 header = bytes(MemberHeader.from_dataset(self))
618 observations = bytes(Observations.from_dataset(self))
TypeError: Could not coerce column 'beta' to string
Any idea what's causing this?
thanks a lot,
Kind regards,
Nicolas
Hi, in order to edit the value of one column in a file, do we need to configure the pandas Python package?
What is example code to access and change the value of one column? Please advise, since the tutorial says it's not the same as CSV and only shows a mapping example.
Thanks,
Sagar
Code used below- simple open and read all rows from xpt file
import xport
#this portion is for opening and closing xpt files
with open('bg.xpt', 'rb') as f:
for row in xport.Reader(f):
print row
Error:
Traceback (most recent call last):
File "test1.py", line 18, in
for row in xport.Reader(f):
File "/usr/lib/python2.7/site-packages/xport.py", line 160, in init
version, os, created, modified = self._read_header()
File "/usr/lib/python2.7/site-packages/xport.py", line 197, in _read_header
tokens = tuple(t.rstrip() for t in struct.unpack(fmt, raw))
struct.error: unpack requires a string argument of length 80
I get this error when I attempt to do import xport.v56.
I am using a Python standard-library virtual environment created via python -m venv .venv, but I note that there may be a conda configuration requirement. Could this be what is causing my code to fail?
I tried to convert NHANES data in xpt format into csv format in a Jupyter notebook, and installed xport with the following code:
import sys
!{sys.executable} -m pip install xport
import xport, csv

with xport.XportReader('MCQ_J.xpt') as reader:
    with open('MCQ_J.csv', 'rb') as out:
        writer = csv.DictWriter(out, [f['name'] for f in reader.fields])
        for row in reader:
            writer.writerow(row)
But I get the error "module 'xport' has no attribute 'XportReader'". Was my installed package wrong, or do you have advice on how to solve this?
Hi,
I am able to read some xpt files correctly, but for some files with the same code I am getting the following error:
xport.ParseError: header -- expected b'HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!' got b'STUDYID,DOMAIN,USUBJID,SUBJID,RFSTDTC,RFENDTC,RF'
The latter part of the error after "got" are the column names in my .xpt file.
I am wondering if this has to do with the xpt file being generated in a newer version of SAS? If so, please advise how best to go around this issue.
Thanks!
Hi, when using the from_columns function from xport, I get the following error for a couple of files. The other 15 work fine. Please advise why this error occurs?
"xport.py", line 666, in from_columns
column[ i ] = value.encode('ISO-8859-1')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 11: ordinal not in range(128).
Hi,
I tried without any success to read metadata such as the label and format of the xpt variables.
Such an option seems not to be available.
Many thanks !
Currently, I can convert my JSON data to xpt format, and when I convert it to a SAS data set, it contains only a single SAS data set (.sas7bdat format).
Is there any way I can convert JSON data into xpt format with multiple SAS datasets (.sas7bdat format)?
Thanks in advance
Thanks @MarkPinches for pointing this out. When the number of bytes read is a multiple of 80, the XPT file may end without padding.
Hi,
If I open an xpt file in Notepad, I can see the column names there. Could you please guide me on how to extract the column names from an xpt file using xport?
Installed in a fresh conda environment:
# Name Version Build Channel
ca-certificates 2020.1.1 0
certifi 2020.4.5.1 py38_0
click 7.1.1 pypi_0 pypi
libcxx 4.0.1 hcfea43d_1
libcxxabi 4.0.1 hcfea43d_1
libedit 3.1.20181209 hb402a30_0
libffi 3.2.1 h0a44026_6
ncurses 6.2 h0a44026_0
numpy 1.18.3 pypi_0 pypi
openssl 1.1.1g h1de35cc_0
pandas 1.0.3 pypi_0 pypi
pip 20.0.2 py38_1
python 3.8.2 hc70fcce_0
python-dateutil 2.8.1 pypi_0 pypi
pytz 2019.3 pypi_0 pypi
pyyaml 5.3.1 pypi_0 pypi
readline 8.0 h1de35cc_0
setuptools 46.1.3 py38_0
six 1.14.0 pypi_0 pypi
sqlite 3.31.1 h5c1f38d_1
tk 8.6.8 ha441bb4_0
wheel 0.34.2 py38_0
xport 3.1.2 pypi_0 pypi
xz 5.2.5 h1de35cc_0
zlib 1.2.11 h1de35cc_3
Tried to convert one file to another via xport file1.xpt > file2.csv.
Got an enormous error traceback, ending with:
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 141, in __init__
self._consolidate_check()
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 656, in _consolidate_check
ftypes = [blk.ftype for blk in self.blocks]
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 656, in <listcomp>
ftypes = [blk.ftype for blk in self.blocks]
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 349, in ftype
return f"{dtype}:{self._ftype}"
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/numpy/core/_dtype.py", line 54, in __str__
return dtype.name
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/numpy/core/_dtype.py", line 347, in _name_get
if _name_includes_bit_suffix(dtype):
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/numpy/core/_dtype.py", line 326, in _name_includes_bit_suffix
elif np.issubdtype(dtype, np.flexible) and _isunsized(dtype):
File "/Users/samuelrobertmathias/miniconda3/envs/xport/lib/python3.8/site-packages/numpy/core/numerictypes.py", line 392, in issubdtype
if not issubclass_(arg1, generic):
RecursionError: maximum recursion depth exceeded
Hi, is there a function to convert CSV to XPT? (The reverse of the listed xpt-to-csv conversion.)
Thanks,
Sagar
It should be easy enough to publish the docs website either on GitHub, ReadTheDocs, or somewhere similar.
I followed this wonderful link from the docs, which described just what I wanted.
If you want the relative comfort of SAS Transport v8’s limit of 246 characters, please make an enhancement request.
Is this upgrade feasible?
We have command line: python -m xport dm_1.xpt > ex.csv (input = xpt, output = csv)
Is there a reverse function we can use (input = csv, output = xpt)? Please advise on a solution.
Thanks,
Sagar
I want to add a label and format to columns. Is there any function for this?
Some XPORT files can be quite large, so it'd be nice to make string decoding faster. I suspect Cython could give us a boost.
Is there any way to handle different character encodings? Most of the SAS files that I have to read are encoded in CP-1252 (gross, I know), and it looks like there isn't a good way to handle that here.
Currently, when writing to the xpt from_columns, the module creates column names from the labels. It would be helpful for our submissions to the FDA to be able to specify both the column names and column labels.
For example, the first column is the study identifier. The required column name is "STUDYID" and the column label should be "Study Identifier". When I pass "Study Identifier" as the column label using from_columns, the column name "Study_Id" is created, which is not acceptable to the FDA.
When writing from_rows, the column name remains what I had populated in the dataframe but there is apparently not a way to then add column labels.
Michael, Thank you for your excellent work on this project!
Why not? Let's make this an all-around SAS file reader/writer.
The XPORT format specifies that the file is padded with b' ' to ensure the total file length (in bytes) is a multiple of 80. If there are no numeric columns and the last row consists of only empty strings or strings with only spaces, these are indistinguishable from the XPORT file padding.
This appears to be a defect in the XPORT format specification. Wontfix?
Nice library, and I would like to use it. I'm not sure how much work xport is doing, but I'm finding it much slower than pandas .to_csv().
We're trying to process data outside of SAS, then move it back into SAS 9.4.6.
I have a pandas DataFrame (10000000, 10) [all string objects].
pdf.to_csv() takes 30.46 seconds [670MB produced].
xport.v56.dump(ds, f) takes 6.12 minutes [3.5GB produced].
We're using pandas ==1.0.5 and underlying Pandas dataframe came from an Arrow type data structure.
I'm noticing that most of the time comes after the last "Converting column 'column10' from object to string" message.
I don't know what the results for a corresponding .sas7bdat file would be, but sas7bdat would be the real end goal.
Thanks!
There is no way to specify the name of the dataset generated. Looking in the code, I see it always defaults to 'dataset'.
Maybe the from_rows and from_columns functions could take an optional parameter to specify the dataset name.
It seems some archaic FDA submission rules require(d) SAS XPT or CPT-format files. The Aggregate Analysis of ClinicalTrials.gov Database hosts the same data in Oracle "dmp", pipe-delimited text, and SAS CPORT formats. Perhaps we can use these files as a sort of Rosetta stone to infer the specification of the SAS CPT/CPORT format.
Hello,
I have two questions:
How can I manage to call the script to create a .xpt file with my data from the command line with all required data, since I work in a C# environment? I think I can call the Python executable/script with properly formatted arguments, but what about argument length? I also need to look up how to retrieve the program output (using Windows). Maybe the documentation can be improved for Python newbies like me?
Is the produced .xpt file XPORT version 5 compliant? I'm required to produce such files, not version 6.
Thank you a lot.
Code extract
ds = xport.Dataset(table_df, name='Data', label='Test data')
for column, variable in ds.items():
    if (a_condition):
        variable.format = "10.2"
with open('my_file.xpt', 'wb') as f:
    xport.v56.dump(ds, f)
That gives me an error when opening my_file.xpt, and a_condition is true when the column is numeric.
Please assist, and could you add more examples on exporting?
I'm attempting to read the Behavioral Risk Factor Surveillance System (BRFSS) report from the CDC. When I attempt to read it in, I get ValueError: Field names cannot start with an underscore: '_STATE'. The obvious solution to me is to remove the underscore, but then I get either ValueError: 256 is not a valid VariableType or ValueError: 0 is not a valid VariableType.
Is the parse error you were referring to in your description "ParseError: header -- expected b'HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!', got b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00>\x00\x03\x00\xfe\xff\t\x00\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00'"?
Potentially with custom bytes on the end there? If so, this library could use updating.
Very convenient library for quick binary XPT file serialization with the xport.to_* functions. However, they would more naturally be named after the returned object instead of the NumPy library name:
xport.to_dict()
xport.to_ndarray()
xport.to_dataframe()