fixedwidth's People

Contributors

antoinel, miceno, rohanza, shawnmilo

fixedwidth's Issues

Support utf-8 and other encodings

Some characters are encoded as more than one byte when written to a file, so character positions and byte positions do not line up; they coincide only for ASCII characters.

For example, the following line representing a SEPA direct debit:

0319154003010000000031                       010000000031                       FRST    0000000180020091031BIC123BBXXXFARRÉS I JULIÀ, ANDREU                                               CARRER ESPAÑA, 2                                  08012 ALBACETE                                     ALBACETE                                 ES                                                                        AES4721003311852100084378              QUOTA BARÇA                                                                                                                                                

FARRÉS is 6 characters long but is stored as 7 bytes, because É is encoded as the bytes 0xc3 0x89 in UTF-8 (see https://www.utf8-chartable.de/unicode-utf8-table.pl for a table of UTF-8 characters).

00000000  30 33 31 39 31 35 34 30  30 33 30 31 30 30 30 30  |0319154003010000|
00000010  30 30 30 30 33 31 20 20  20 20 20 20 20 20 20 20  |000031          |
00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 30 31 30  |             010|
00000030  30 30 30 30 30 30 30 33  31 20 20 20 20 20 20 20  |000000031       |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000050  46 52 53 54 20 20 20 20  30 30 30 30 30 30 30 31  |FRST    00000001|
00000060  38 30 30 32 30 30 39 31  30 33 31 42 49 43 31 32  |80020091031BIC12|
00000070  33 42 42 58 58 58 46 41  52 52 c3 89 53 20 49 20  |3BBXXXFARR..S I |
00000080  4a 55 4c 49 c3 80 2c 20  41 4e 44 52 45 55 20 20  |JULI.., ANDREU  |
00000090  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000b0  20 20 20 20 20 20 20 20  20 20 20 20 20 43 41 52  |             CAR|
000000c0  52 45 52 20 45 53 50 41  c3 91 41 2c 20 32 20 20  |RER ESPA..A, 2  |
000000d0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000f0  30 38 30 31 32 20 41 4c  42 41 43 45 54 45 20 20  |08012 ALBACETE  |
00000100  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000120  20 20 20 41 4c 42 41 43  45 54 45 20 20 20 20 20  |   ALBACETE     |
00000130  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000140  20 20 20 20 20 20 20 20  20 20 20 20 45 53 20 20  |            ES  |
00000150  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000190  20 20 20 20 20 20 41 45  53 34 37 32 31 30 30 33  |      AES4721003|
000001a0  33 31 31 38 35 32 31 30  30 30 38 34 33 37 38 20  |311852100084378 |
000001b0  20 20 20 20 20 20 20 20  20 20 20 20 20 51 55 4f  |             QUO|
000001c0  54 41 20 42 41 52 c3 87  41 20 20 20 20 20 20 20  |TA BAR..A       |
000001d0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000250  20 20 20 20 20 20 20 20  20 0a 0a                 |         ..|
0000025b

There are some options to deal with it:

  • Add a new type
  • Extend the current string type to accept a new parameter, encoding, and handle multi-byte characters when calculating positions.
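
To make the mismatch concrete, here is a minimal sketch using only the standard library. The helpers `slice_by_bytes` and `pad_to_bytes` are hypothetical names invented for illustration and are not part of fixedwidth; they assume positions are 1-based, inclusive byte offsets and that the padding character encodes to a single byte.

```python
# Character count vs. encoded byte count for the name in the sample record.
name = "FARRÉS"
encoded = name.encode("utf-8")

print(len(name))     # number of characters
print(len(encoded))  # number of bytes once written as UTF-8


def slice_by_bytes(line, start_pos, end_pos, encoding="utf-8"):
    """Hypothetical helper: extract a field using 1-based, inclusive byte positions."""
    raw = line.encode(encoding)
    # Raises UnicodeDecodeError if the positions cut a multi-byte character in half.
    return raw[start_pos - 1:end_pos].decode(encoding)


def pad_to_bytes(value, length, padding=" ", encoding="utf-8"):
    """Hypothetical helper: left-align a value and pad it to a fixed byte length.

    Assumes the padding character encodes to exactly one byte.
    """
    raw = value.encode(encoding)
    if len(raw) > length:
        raise ValueError("value does not fit in %d bytes" % length)
    return raw + padding.encode(encoding) * (length - len(raw))


print(slice_by_bytes("FARRÉS I JULIÀ", 1, 7))
print(pad_to_bytes("FARRÉS", 10))
```

Either option in the list above would need logic along these lines: positions are applied to the encoded bytes rather than to the Python string.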

Handling variable occurrences at run time

This library is excellent for fixed-width conversion.
There is a use case which I am trying to resolve:
a field might occur a number of times, depending on a count held in some other field (say, the previous field's value) at run time.
In such a case, will start_pos have to be changed dynamically, or can there be a field_index property with only the field_length?
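
One way to approach this without changes to the library might be to generate the field definitions at run time, once the count is known. The sketch below is only an illustration: `build_repeating_config`, the `item_N` field names, and the sample line are all made up, and the slicing is done by hand rather than through FixedWidth.

```python
def build_repeating_config(count, item_length, first_start_pos=1):
    """Hypothetical helper: field definitions for `count` repeats of a fixed-length item."""
    config = {}
    start_pos = first_start_pos
    for index in range(1, count + 1):
        config["item_%d" % index] = {
            "required": True,
            "type": "string",
            "start_pos": start_pos,
            "length": item_length,
            "alignment": "left",
            "padding": " ",
        }
        start_pos += item_length  # next item starts right after this one
    return config


# The first two characters hold the count; the repeated items follow.
line = "03AAAABBBBCCCC"
count = int(line[0:2])
config = build_repeating_config(count, item_length=4, first_start_pos=3)

# Slice each generated field out of the line (positions are 1-based).
values = {
    name: line[spec["start_pos"] - 1:spec["start_pos"] - 1 + spec["length"]]
    for name, spec in config.items()
}
print(values)
```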

No date type support

The fixedwidth_test.py lists one of the fields as date:

SAMPLE_CONFIG = {
...
    "date": {
        "required": False,
        "type": "date",
        "default": datetime.datetime.strptime('20170101', '%Y%m%d'),
        "start_pos": 101,
        "end_pos": 108,
        "alignment": "right",
        "padding": " ",
        "format": '%Y%m%d',
        },

However, when attempting to use a similar config, I get:

ValueError: Field purchdate has an invalid type (date). Allowed: 'string', 'integer', 'decimal', 'numeric'

Is the test working if the module has no support for this type?
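
If the installed version rejects "date", a possible workaround (an assumption, not something the library documents) is to declare the field as "string" and convert at the edges with the standard library. `date_to_field` and `field_to_date` are hypothetical helper names.

```python
import datetime

DATE_FORMAT = "%Y%m%d"  # same format string as in the test config above


def date_to_field(value):
    """Hypothetical helper: render a date as the text stored in the string field."""
    return value.strftime(DATE_FORMAT)


def field_to_date(text):
    """Hypothetical helper: parse the text of the string field back into a date."""
    return datetime.datetime.strptime(text.strip(), DATE_FORMAT).date()


print(date_to_field(datetime.date(2017, 1, 1)))
print(field_to_date(" 20170101"))
```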

Build only what is needed

Hi. For performance reasons I needed to build only the objects that I really need. Our fixed-width files contain 14,000+ columns, and normally I only require around 50 of them. I built the .json layout file for all 14,000+ columns, but then filter the config down to only what I need. I had to comment out the validation part.

import json
from copy import deepcopy

headers = ['HEAD001', 'HEAD002', 'HEAD005']
configfile = 'layout.json'

with open(configfile, 'r') as f:
    config = json.load(f)

# Create FixedWidth object
fw_config = deepcopy(config)
fw_configFiltered = {x: k for x, k in fw_config.items() if x in headers}

Please see below the part that I had to comment out. I was thinking of adding a parameter to skip this check when needed. Is there perhaps a better way of doing this?

# ensure start_pos and end_pos or length is correct in config
# current_pos = 1
# for start_pos, field_name in self.ordered_fields:
#
#     if start_pos != current_pos:
#         raise ValueError("Field %s starts at position %d; \
#         should be %d (or previous field definition is incorrect)." \
#         % (field_name, start_pos, current_pos))
#
#     current_pos = current_pos + config[field_name]['length']
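
An alternative that avoids patching the library might be to slice the wanted columns directly from each line, since a sparse subset of fields can never pass a contiguity check. This is a rough sketch only: `extract_fields` is a hypothetical helper, the layout values are invented, and it skips the type conversion, alignment, and padding handling that FixedWidth performs.

```python
def extract_fields(line, config, wanted):
    """Hypothetical helper: slice only the wanted fields out of one fixed-width line."""
    result = {}
    for name in wanted:
        spec = config[name]
        start = spec["start_pos"] - 1   # positions in the config are 1-based
        end = start + spec["length"]
        result[name] = line[start:end].strip()
    return result


# Invented three-column layout standing in for the 14,000+ column file.
layout = {
    "HEAD001": {"start_pos": 1, "length": 3},
    "HEAD002": {"start_pos": 4, "length": 5},
    "HEAD005": {"start_pos": 9, "length": 2},
}

row = extract_fields("abcdefghij", layout, ["HEAD001", "HEAD005"])
print(row)
```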

Could not import FixedWidth

>>> from fixedwidth import FixedWidth
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name FixedWidth

>>> import fixedwidth
>>> fixedwidth.__
fixedwidth.__class__(         fixedwidth.__format__(        fixedwidth.__new__(         fixedwidth.__repr__(
fixedwidth.__delattr__(       fixedwidth.__getattribute__(  fixedwidth.__package__      fixedwidth.__setattr__(
fixedwidth.__dict__           fixedwidth.__hash__(          fixedwidth.__path__         fixedwidth.__sizeof__(
fixedwidth.__doc__            fixedwidth.__init__(          fixedwidth.__reduce__(      fixedwidth.__str__(
fixedwidth.__file__           fixedwidth.__name__           fixedwidth.__reduce_ex__(   fixedwidth.__subclasshook__(
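
The other reports on this page import the class from the submodule, as `from fixedwidth.fixedwidth import FixedWidth`, which suggests the package's top level does not re-export it. The sketch below tries that path and returns None when the package is missing; `load_fixedwidth_class` is a hypothetical helper written for illustration.

```python
import importlib


def load_fixedwidth_class():
    """Hypothetical helper: return FixedWidth from the submodule, or None if unavailable."""
    try:
        module = importlib.import_module("fixedwidth.fixedwidth")
    except ImportError:
        return None  # package not installed, or submodule missing
    return getattr(module, "FixedWidth", None)


FixedWidth = load_fixedwidth_class()
print("FixedWidth available:", FixedWidth is not None)
```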

[Suggestion] mapping type?

Hello and thanks again for implementing this, very useful!
It turns out there are plenty of organisations still using fixed width files, especially older ERP systems.

I was wondering if you could consider improving your module further with an additional data type that allows seamless translation of key values between formats, as is frequently needed. It would make these conversions more transparent and keep the data type configuration in one place.

Let us suppose that you have format A, in which a certain field ORDER_TYPE can (only) contain certain codes, for example ["SALE", "PRCH", "XFER"], and you need to convert it to another format B, in which the same codes are called ["SELL", "BUY", "TRANSFER"]. It seems to me it would be elegant if it were possible to do something like this:

config = {
    "ORDER_TYPE": {
        "required": True,
        "type": "mapping",
        "start_pos": 1,
        "length": 4,
        "values": dict(SALE="SELL", PRCH="BUY", XFER="TRANSFER")
    }
}

and the effect would be this:

proc = FixedWidth(config)
proc.line = "XFER"
proc.data
    {'ORDER_TYPE': 'TRANSFER'}
proc.line = "NONE"
    ValueError: Key not found in mapping data type!

I would be glad to implement it myself, but I am not certain enough of my Python skills.
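
Until something like this exists in the library, the translation can be done after parsing. The following is a minimal sketch of the lookup described above, outside of FixedWidth; `translate` is a hypothetical function name and the error message wording is only a suggestion.

```python
ORDER_TYPE_VALUES = dict(SALE="SELL", PRCH="BUY", XFER="TRANSFER")


def translate(field_name, raw_value, values):
    """Hypothetical helper: map a code from format A to format B, or raise ValueError."""
    key = raw_value.strip()
    try:
        return values[key]
    except KeyError:
        raise ValueError(
            "%s: key %r not found in mapping data type" % (field_name, key)
        ) from None


print(translate("ORDER_TYPE", "XFER", ORDER_TYPE_VALUES))

try:
    translate("ORDER_TYPE", "NONE", ORDER_TYPE_VALUES)
except ValueError as exc:
    print(exc)
```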

new function "fw_from_dict"

Can you implement a new function?

#current way
from fixedwidth.fixedwidth import FixedWidth
dict_01 = {"aaa": {"required": True,"type": "string","start_pos": 1,"end_pos": 5,"alignment": "left","padding": " "},"bbb": {"required": True,"type": "string","start_pos": 6,"end_pos": 10,"alignment": "left","padding": " "}}
dict_02 ={ "aaa" :"xxx","bbb":"yyy"}
fw_obj = FixedWidth(dict_01)
fw_obj.update(aaa=dict_02.get("aaa",""),bbb=dict_02.get("bbb", ""))
print(fw_obj.line)

#future way
from fixedwidth.fixedwidth import FixedWidth
dict_01 = {"aaa": {"required": True,"type": "string","start_pos": 1,"end_pos": 5,"alignment": "left","padding": " "},"bbb": {"required": True,"type": "string","start_pos": 6,"end_pos": 10,"alignment": "left","padding": " "}}
dict_02 ={ "aaa" :"xxx","bbb":"yyy"}
fw_obj = FixedWidth(dict_01)
fw_obj.fw_from_dict(dict_02)
print(fw_obj.line)
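
Because update is called with keyword arguments in the "current way" snippet, unpacking the dictionary with `**` may already give the requested behaviour. Whether FixedWidth.update accepts arbitrary keywords this way is an assumption to check against the installed version; the `Record` class below is a minimal stand-in, not the library class.

```python
class Record:
    """Minimal stand-in (not the library class) whose update() takes keyword arguments."""

    def __init__(self):
        self.data = {}

    def update(self, **kwargs):
        self.data.update(kwargs)


dict_02 = {"aaa": "xxx", "bbb": "yyy"}

rec = Record()
rec.update(**dict_02)  # same as rec.update(aaa="xxx", bbb="yyy")
print(rec.data)
```

Note that `**` unpacking only works when every key is a string that is valid as a keyword argument name.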

'default': None not allowed for type=decimal

A simple test is to replace none_date with none_decimal in the current test suite (see attached source and the easier-to-read diff). This fails loudly:

ERROR: test_basic (__main__.TestFixedWidth)

Traceback (most recent call last):
  File "...\fixedwidth.git\fixedwidth\tests\fixedwidth_testNoneDecimal.py", line 150, in test_basic
    fw_obj = FixedWidth(fw_config)
  File "...\fixedwidth.git\fixedwidth\fixedwidth.py", line 134, in __init__
    value['default'] = Decimal(value['default'])
TypeError: conversion from NoneType to Decimal is not supported

Since reading (_string_to_dict) a space-filled field also raises an exception when there is no default (Decimal('') is not accepted), this makes decimal awkward to deal with.
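
Both failure modes can be reproduced with the standard library alone. The sketch below also shows one possible way of handling blanks; `to_decimal_or_none` is a hypothetical helper, not part of fixedwidth, and treating a space-filled field as None is a design assumption.

```python
from decimal import Decimal, InvalidOperation


def to_decimal_or_none(text):
    """Hypothetical helper: treat a blank (space-filled) field as None instead of failing."""
    stripped = text.strip()
    if not stripped:
        return None
    return Decimal(stripped)


# Decimal(None) raises TypeError; Decimal('') raises InvalidOperation.
errors = []
for bad in (None, ""):
    try:
        Decimal(bad)
    except (TypeError, InvalidOperation) as exc:
        errors.append(type(exc).__name__)

print(errors)
print(to_decimal_or_none("     "))
print(to_decimal_or_none("  12.50"))
```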

fixedwidth_testNoneDecimal.zip

Negative values not supported for integer types

Negative values for integers cause a
ValueError: <field> is defined as a integer, but the value is not of that type

To reproduce:

from fixedwidth.fixedwidth import FixedWidth

CONFIG = { 'foo' : { 'required' : True, 'type': 'integer', 'start_pos': 1, 'end_pos': 5, 'alignment': 'right', 'padding': ' '}}

f = FixedWidth(CONFIG)
f.update(foo=-1)
f.line
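
One plausible cause, which is an assumption and has not been verified against the library source, is an integer check that accepts digits only. A sign-aware check and right-aligned formatting could look like the sketch below; `is_integer_value` and `format_integer` are hypothetical names.

```python
import re


def is_integer_value(value):
    """Hypothetical check: optional leading minus sign followed by digits."""
    return re.fullmatch(r"-?\d+", str(value)) is not None


def format_integer(value, length, padding=" "):
    """Hypothetical helper: right-align an integer in a field of the given length."""
    if not is_integer_value(value):
        raise ValueError("%r is not an integer" % (value,))
    text = str(value)
    if len(text) > length:
        raise ValueError("%r does not fit in %d characters" % (value, length))
    return text.rjust(length, padding)


print(is_integer_value(-1))
print(repr(format_integer(-1, 5)))
```

With a padding of "0" the sign would end up in the middle of the field, so zero-padded negative values would need separate handling.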

[Suggestion] Regenerate config when changes are made

@ShawnMilo First of all, thank you for this library - it has come to my rescue on a few projects I have worked on to date.

The project I'm currently working on is still in an early phase, so the structure of my config is always changing. What I struggle with is having to change all positions below the point where I make a change (such as adding a new field or changing start_pos, end_pos or length of an existing field). Since the config is rather large, this is time consuming and can take a few attempts before all start/end positions are updated correctly.

The script below detects any incorrect positions within the config and adjusts them accordingly. Please feel free to add it in should you like to.

"""
Author: @muhammedabad

Assuming all lengths are correct, this script takes a fixed-width config and rewrites the start and
end positions, field by field, when new fields are added to the config or existing lengths are modified.

"""

from fixedwidth.fixedwidth import FixedWidth

sample_config = {
    'field_one': {
        'required': True,
        'type': 'string',
        'start_pos': 1,
        'end_pos': 2,
        'alignment': 'right',
        'padding': '0',
        'length': 2,
    },

    'field_two': {
        'required': True,
        'type': 'string',
        'start_pos': 3,
        'end_pos': 6,
        'alignment': 'right',
        'padding': '0',
        'length': 4,
    },

    'field_three': {
        'required': True,
        'type': 'string',
        'start_pos': 7,
        'end_pos': 41,
        'alignment': 'left',
        'padding': ' ',
        'length': 35,
    }
}

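start_pos = 0
end_pos = 0

# Relies on dict insertion order (Python 3.7+): fields must be listed in positional order.
for key in sample_config.keys():
    if not all([end_pos, start_pos]):
        # First field: keep its configured start position.
        start_pos = sample_config[key]['start_pos']
    else:
        # Later fields start right after the previous field ends.
        start_pos = end_pos + 1

    end_pos = start_pos + sample_config[key]['length'] - 1

    # Write back for every field, including the first, in case its length changed.
    sample_config[key]['start_pos'] = start_pos
    sample_config[key]['end_pos'] = end_pos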

# Initialize the new config in a FW object which handles the validation.
fw_config = FixedWidth(config=sample_config)

# Use an online python prettifier like https://pythoniter.appspot.com/ and paste in the output from the line below
print(sample_config)

[Suggestion] Type Test add date and datetime

Hi, is it possible to add date and datetime to the type formats?

< from datetime import date
188,189c187
< 'datetime': lambda x: isinstance(x, datetime),
< 'date': lambda x: isinstance(x, date),
---
> 'date': lambda x: isinstance(x, datetime),
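
One detail worth keeping in mind with this change: in the standard library, datetime is a subclass of date, so a plain isinstance(x, date) test also accepts datetime objects. If the two types should be told apart, the tests could be written as in this sketch; the `TYPE_TESTS` name is made up here and does not refer to the library's own table.

```python
from datetime import date, datetime

# Hypothetical type-test table, shaped like the lambdas in the diff above.
TYPE_TESTS = {
    "datetime": lambda x: isinstance(x, datetime),
    # datetime is a subclass of date, so exclude it explicitly here.
    "date": lambda x: isinstance(x, date) and not isinstance(x, datetime),
}

print(isinstance(datetime(2017, 1, 1), date))       # datetime also counts as a date
print(TYPE_TESTS["date"](date(2017, 1, 1)))
print(TYPE_TESTS["date"](datetime(2017, 1, 1)))
```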
