
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

License: MIT License


omniparser

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.

Min Golang Version: 1.14

Licenses and Sponsorship

Omniparser is publicly available under the MIT License. Individual and corporate sponsorships are welcome and gratefully appreciated, and will be listed on the SPONSORS page. Company-level sponsors get additional benefits and support, as granted in the COMPANY LICENSE.

Documentation

Docs:

Getting Started: a tutorial for writing your first omniparser schema.
IDR: in-memory data representation of ingested data for omniparser.
XPath Based Record Filtering and Data Extraction: xpath queries are essential to omniparser schema writing. Learn the concept and tricks in depth.
All About Transforms: everything about transform_declarations.
Use of custom_func, Especially javascript: an in-depth look at how custom_func is used, especially the almighty javascript (and javascript_with_context).
CSV Schema in Depth: everything about schemas for CSV input.
Fixed-Length Schema in Depth: everything about schemas for fixed-length (e.g. TXT) input.
JSON/XML Schema in Depth: everything about schemas for JSON or XML input.
EDI Schema in Depth: everything about schemas for EDI input.
Programmability: Advanced techniques for using omniparser (or some of its components) in your code.

References:

Custom Functions: a complete reference of all built-in custom functions.

Examples:

In the example folders above you will find pairs of input files and their schema files. Then in the .snapshots sub directory, you'll find their corresponding output files.

Online Playground (not functioning)

~~Use The Playground (may need to wait for a few seconds for instance to wake up) for trying out schemas and inputs, yours or existing samples, to see how ingestion and transform work.~~

As of 2023/03/14, all of our previous free docker hosting solutions have gone away and we haven't found a replacement yet. For now, please clone the repo and use ./cli.sh as described in the Getting Started page.

Why

No good ETL transform/parser library exists in Golang.
Even looking into Java and other languages, choices aren't many and all have limitations:
- Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
- BeanIO can't deal with EDI input.
- Jolt can't deal with anything other than JSON input.
- JSONata still only JSON -> JSON transform.
Many of the parsers/transforms don't support streaming read, loading entire input into memory - not acceptable in some situations.
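
To make the streaming point concrete, here is a minimal sketch that uses only the Go standard library. It is not omniparser's API: `streamCSV` and `collect` are hypothetical helpers and the sample input is made up. It reads one CSV record at a time and emits one JSON object per record, so memory use is bounded by the size of a record rather than the size of the input.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// streamCSV reads CSV records one at a time from r and calls emit with a
// JSON object for each data row. The first row is treated as the header.
// Only the current record is held in memory.
func streamCSV(r io.Reader, emit func(jsonRecord string)) error {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return err
	}
	for {
		row, err := cr.Read()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		obj := make(map[string]string, len(header))
		for i, name := range header {
			obj[name] = row[i]
		}
		b, err := json.Marshal(obj)
		if err != nil {
			return err
		}
		emit(string(b))
	}
}

// collect is a demo convenience that gathers the emitted records into a
// slice. A real caller would write each record to its sink instead.
func collect(input string) ([]string, error) {
	var out []string
	err := streamCSV(strings.NewReader(input), func(rec string) { out = append(out, rec) })
	return out, err
}

func main() {
	recs, err := collect("id,name\n1,apple\n2,pear\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range recs {
		fmt.Println(rec)
	}
}
```

The callback shape keeps the reader loop free of any output concern; swapping `emit` for a writer to a file or a queue does not change the memory profile.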

Requirements

Golang 1.14 or later.

Recent Major Feature Additions/Changes

2022/09: v1.0.4 released: added csv2 file format that supersedes the original csv format with support of hierarchical and nested records.
2022/09: v1.0.3 released: added fixedlength2 file format that supersedes the original fixed-length format with support of hierarchical and nested envelopes.
1.0.0 Released!
Added Transform.RawRecord() for caller of omniparser to access the raw ingested record.
Deprecated custom_parse in favor of custom_func (custom_parse is still usable for back-compatibility, it is just removed from all public docs and samples).
Added NonValidatingReader EDI segment reader.
Added fixed-length file format support in omniv21 handler.
Added EDI file format support in omniv21 handler.
Major restructure/refactoring
- Upgraded omni schema version to omni.2.1 due to a number of incompatible schema changes:
  - 'result_type' -> 'type'
  - 'ignore_error_and_return_empty_str' -> 'ignore_error'
  - 'keep_leading_trailing_space' -> 'no_trim'
- Changed how we handle custom functions: previously we always used strings for both the input param types and the result type. Not anymore: all types are now supported for custom function input and output params.
- Changed the way we package custom functions for extensions: previously we collected custom functions from all extensions and then passed all of them to the extension that is used; this feels weird, now only the custom functions included in a particular extension are used in that extension.
- Deprecated/removed most of the custom functions in favor of using 'javascript'.
- A number of package renaming.
Added CSV file format support in omniv2 handler.
Introduced IDR node cache for allocation recycling.
Introduced IDR for in-memory data representation.
Added trie based high performance times.SmartParse.
Command line interface (one-off transform cmd or long-running http server mode).
javascript engine integration as a custom_func.
JSON stream parser.
Extensibility:
- Ability to provide custom functions.
- Ability to provide custom schema handler.
- Ability to customize the built-in omniv2 schema handler's parsing code.
- Ability to provide a new file format support to built-in omniv2 schema handler.

Footnotes

omniparser is a collaboration effort of jf-tech, Simon and Steven.


omniparser's Issues

Error in go get install latest version

Hi,

I am getting the following error while installing the latest version, v1.0.2, of omniparser:

extensions/omniv21/fileformat/flatfile/fixedlength/.snapshots/TestReadAndMatchRowsBasedEnvelope-non-empty_buf;_no_read;_match;_create_IDR: malformed file path "extensions/omniv21/fileformat/flatfile/fixedlength/.snapshots/TestReadAndMatchRowsBasedEnvelope-non-empty_buf;_no_read;_match;_create_IDR": invalid char ';'

Probably a file name with an accidental typo was checked in.

Ignore blank rows CSV

Hi,

I encountered a CSV (converted from xlsx) where one line was entirely blank (,,,,,,).

It errored with 'cannot convert "" to int'.

I have fixed it at the source, but in general it might be useful to add an 'ignore blank row' option, or to be able to set a default for empty values (as you can in EDI), especially when casting
(in this case, setting '' to 0).
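
Until such an option exists, one possible workaround is to pre-filter rows before they reach the parser, or to apply the default when casting in your own code. The sketch below shows both ideas in plain Go. `isBlankRow` and `intOrDefault` are hypothetical helpers, not omniparser functions, and the sample rows are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isBlankRow reports whether every field in the row is empty or whitespace.
func isBlankRow(row []string) bool {
	for _, f := range row {
		if strings.TrimSpace(f) != "" {
			return false
		}
	}
	return true
}

// intOrDefault converts s to an int and returns def when s is empty.
// A non-empty value that is not a valid integer is still an error.
func intOrDefault(s string, def int) (int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return def, nil
	}
	return strconv.Atoi(s)
}

func main() {
	rows := [][]string{
		{"1", "widget", "3"},
		{"", "", ""}, // e.g. a ",," line left over from an xlsx export
		{"2", "gadget", ""},
	}
	for _, row := range rows {
		if isBlankRow(row) {
			continue // skip fully blank rows
		}
		qty, err := intOrDefault(row[2], 0)
		if err != nil {
			fmt.Println("bad quantity:", err)
			continue
		}
		fmt.Println(row[0], row[1], qty)
	}
}
```

Keeping the default explicit at the call site means a genuinely malformed number still surfaces as an error instead of silently becoming 0.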

Question - JSON/XML to EDI conversion

Can this library be used to perform JSON/XML to EDI conversion?

Define new package IDR and new Node basic struct and flags

IDR == intermediate data representation OR in-memory data representation

Let 'javascript' handle aggregation!

The current way of doing sum/avg feels awkward and clumsy. Since we have 'javascript' custom_func, let it handle aggregation. The idea is to have its argument decl support array:

from:

                "total_price": { "custom_func": {
                    "name": "javascript",
                    "args": [
                        { "const": "num * price" },
                        { "const": "num:int" }, { "xpath": "number_purchased" },
                        { "const": "price:float" }, { "xpath": "item_price" }
                    ]
                }}

to:

                "total_price": { "custom_func": {
                    "name": "javascript",
                    "args": [
                        { "const": "var total=0; for (i=0; i<Math.min(num.length,price.length); i++) { total+=num[i]*price[i]; } total; " },
                        { "const": "num:int:array" }, { "xpath": "number_purchased" },
                        { "const": "price:float:array" }, { "xpath": "item_price" }
                    ]
                }}
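
For reference, the proposed script computes a pairwise sum of products over the shorter of the two arrays. The same arithmetic written in Go (a sketch for illustration only; `totalPrice` is a hypothetical name and is not part of omniparser):

```go
package main

import "fmt"

// totalPrice mirrors the proposed script: it multiplies quantities and
// prices pairwise, stops at the end of the shorter slice, and returns the sum.
func totalPrice(num []int, price []float64) float64 {
	n := len(num)
	if len(price) < n {
		n = len(price)
	}
	total := 0.0
	for i := 0; i < n; i++ {
		total += float64(num[i]) * price[i]
	}
	return total
}

func main() {
	// 2*1.5 + 1*4 = 7
	fmt.Println(totalPrice([]int{2, 1}, []float64{1.5, 4}))
}
```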

Converting JSON to EDI X12

Hi all, first of all I would say this tool is great, thanks for all that you have done. I just have a simple question if there is a way to convert json to EDI (X12), or whether something like that is on a roadmap somewhere?

Make `FINAL_OUTPUT` xpath optional for xml/json - default to `.`

There is no need to require xpath on the root object FINAL_OUTPUT; if it's not specified, just default it to `.`.

Rethink how we handle custom_func argument and return value

Currently, custom_func args are all strings and the return value is a string. There are situations where we want an arg to be a slice of strings, e.g..

[Feature] additional output options?

Hi everyone,
Are there plans to add mappings to additional output formats, or some documentation around how to output to custom formats?

Our current use case includes populating ProtoBuf-sourced Go structs from, say, EDI sources, ideally without requiring a JSON intermediary. That said, reusing the JSON output as a stepping stone toward custom formats might help other consumers adopt omniparser and set a precedent for different output options.

Question-make an error when processing csv separated by commas

When processing a cell that contains a ",", the cell is split into multiple pieces even though the whole cell is wrapped in quotation marks.
The csv data :

The parsing result:

Is it my default?
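
As a point of comparison, this is how a quoted cell containing a comma is expected to parse under RFC 4180 style quoting. The sketch uses Go's standard `encoding/csv` package, not omniparser, and the sample line is made up since the original data was not included in the post. Whether omniparser behaves the same way for the reporter's file depends on schema settings and input that are not shown here.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLine parses a single CSV line using RFC 4180 style quoting.
func parseLine(line string) ([]string, error) {
	return csv.NewReader(strings.NewReader(line)).Read()
}

func main() {
	fields, err := parseLine(`1,"Smith, John",NY`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The quoted cell stays in one piece: three fields, not four.
	fmt.Printf("%d fields: %q\n", len(fields), fields)
}
```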

EDI: How to assign values to the array, if the field part to be fetched is a loop part, it is not in the loop

I want to add a ponumber field to each element of the edi850LinesInterface array. The value is currently based on the relative path ../BEG/PurchaseOrderNumber; is there a way to use an absolute path instead?

#input
ISA*00**00**ZZ*AMAZON*ZZ*KK8T5JV*230619*0735*U*00400*000003059*0*P*>~
GS*PO*AMAZON*KK8T5JV*20230619*0735*6059*X*004010~
ST*850*0001~
BEG*00*NE*3F7L587U**20230619~
REF*CR*8T5JV~
N1*ST**15*9920757~
PO1*1*1*EA*28.79*PE*VN*840386503775~
PO1*2*1*EA*23.99*PE*VN*840386503737~
CTT*2*2~
SE*10*0001~
GE*1*3059~
IEA*1*000006059~

# schema
{
    "parser_settings": {
        "version": "omni.2.1",
        "file_format_type": "edi"
    },
    "file_declaration": {
        "segment_delimiter": "~",
        "element_delimiter": "*",
        "ignore_crlf": true,
        "segment_declarations": [
            {
                "name": "omniparser_detail",
                "type": "segment_group",
                "min": 0,
                "max": -1,
                "is_target": true,
                "child_segments": [
                    {
                        "name": "ISA",
                        "min": 0,
                        "max": -1
                    },
                    {
                        "name": "GS",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "FunctionalIdentifierCode",
                                "index": 1
                            },
                            {
                                "name": "ApplicationSender'sCode",
                                "index": 2
                            },
                            {
                                "name": "ApplicationReceiver'sCode",
                                "index": 3
                            },
                            {
                                "name": "Date",
                                "index": 4
                            },
                            {
                                "name": "Time",
                                "index": 5
                            },
                            {
                                "name": "GroupControlNumber",
                                "index": 6
                            },
                            {
                                "name": "ResponsibleAgencyCode",
                                "index": 7
                            },
                            {
                                "name": "Version/Release/IndustryIdentifierCode",
                                "index": 8
                            }
                        ]
                    },
                    {
                        "name": "ST",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "TransactionSetIdentifierCode",
                                "index": 1
                            },
                            {
                                "name": "TransactionSetControlNumber",
                                "index": 2
                            }
                        ]
                    },
                    {
                        "name": "BEG",
                        "min": 0,
                        "elements": [
                            {
                                "name": "TransactionSetPurposeCode",
                                "index": 1
                            },
                            {
                                "name": "PurchaseOrderTypeCode",
                                "index": 2
                            },
                            {
                                "name": "PurchaseOrderNumber",
                                "index": 3
                            },
                            {
                                "name": "Date",
                                "index": 5
                            }
                        ]
                    },
                    {
                        "name": "CUR",
                        "min": 0
                    },
                    {
                        "name": "REF",
                        "min": 0,
                        "elements": [
                            {
                                "name": "ReferenceIdentificationQualifier",
                                "index": 1
                            },
                            {
                                "name": "ReferenceIdentification",
                                "index": 2
                            }
                        ]
                    },
                    {
                        "name": "N1",
                        "min": 0,
                        "elements": [
                            {
                                "name": "EntityIdentifierCode",
                                "index": 1
                            },
                            {
                                "name": "Name",
                                "index": 2
                            }
                        ]
                    },
                    {
                        "name": "HL",
                        "type": "segment_group",
                        "min": 0,
                        "max": -1,
                        "child_segments": [
                            {
                                "name": "PO1",
                                "min": 0,
                                "elements": [
                                    {
                                        "name": "AssignedIdentification",
                                        "index": 1
                                    },
                                    {
                                        "name": "QuantityOrdered",
                                        "index": 2
                                    },
                                    {
                                        "name": "UnitorBasisforMeasurementCode",
                                        "index": 3
                                    },
                                    {
                                        "name": "UnitPrice",
                                        "index": 4
                                    },
                                    {
                                        "name": "VendorPN",
                                        "index": 7
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "name": "CTT",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "NumberofLineItems",
                                "index": 1
                            },
                            {
                                "name": "HashTotal",
                                "index": 2
                            }
                        ]
                    },
                    {
                        "name": "SE",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "TransactionSetControlNumber",
                                "index": 2
                            },
                            {
                                "name": "NumberofIncludedSegments",
                                "index": 1
                            }
                        ]
                    },
                    {
                        "name": "GE",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "NumberofTransactionSetsIncluded",
                                "index": 1
                            },
                            {
                                "name": "GroupControlNumber",
                                "index": 2
                            }
                        ]
                    },
                    {
                        "name": "IEA",
                        "min": 0,
                        "max": -1,
                        "elements": [
                            {
                                "name": "NumberofIncludedFunctionalGroups",
                                "index": 1
                            },
                            {
                                "name": "InterchangeControlNumber",
                                "index": 2
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "transform_declarations": {
        "FINAL_OUTPUT": {
            "object": {
                "datas": {
                    "array": [
                        {
                            "object": {
                                "edi850LinesInterface": {
                                    "array": [
                                        {
                                            "xpath": "HL",
                                            "object": {
                                                "vendorPn": {
                                                    "xpath": "PO1/VendorPN",
                                                    "keep_empty_or_null": true
                                                },
                                                "orderQuantity": {
                                                    "xpath": "PO1/QuantityOrdered",
                                                    "keep_empty_or_null": true
                                                },
                                                "status": {
                                                    "const": "Pending",
                                                    "type": "string"
                                                },
                                                "ponumber": {
                                                    "xpath": "../BEG/PurchaseOrderNumber",
                                                    "keep_empty_or_null": true
                                                }
                                            }
                                        }
                                    ]
                                },
                                "ediSummaryInterface": {
                                    "object": {
                                        "documentNumber": {
                                            "xpath": "/BEG/PurchaseOrderNumber",
                                            "keep_empty_or_null": true
                                        }
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

EDI Parsing

Hello. I have a particular EDI case that if resolvable will add another unique example to your docs. In the example file below a unique entity is defined from the INS segment to the DTP segment(s). Therefore, there are 2 unique entities below.

There can be many DTP segments and they are usually the last segments in the entity after the HD segment. However, as you can see with the first entity in this example below, there is an extra DTP segment after the REF1L segment. Is it possible to define a schema with a DTP segment after REF1L and it is optional (0 or 1) and also define DTP segment(s) [0 or more] following the HD segment?

For the one after REF*0F I use { "name": "DTP", "min": 0, "max": 1 }. For the one after HD I used the declaration below. But I get the error: Error: input '' at segment no.1 (char[1,1]): segment 'ISA' needs min occur 1, but only got 0. If I remove the first DTP segment from the schema and remove the first entity from the input, then it works. So the parser doesn't like it when that first DTP segment is missing, even though I declared it with min 0 and max 1.

{
	"name": "DTP",
	"min": 0,
	"max": -1,
	"elements": [
		{
			"name": "dateTimeQualifier", "index": 1
		},
		{
			"name": "dateTimePeriod",
			"index": 3
		}
	]
}

ISA*00*          *00*          *30*Entitled FTI     *30*631157085      *200214*0933*^*00501*000000001*1*P*:~
GS*BE*Entitled FTI*631157085*20200214*09333565*1*X*005010X220A1~
ST*834*0001*005010X220A1~BGN*00*ENTITLED LLC*20200214*09333595****4~
REF*38*00000~DTP*007*D8*20200214~N1*P5*ENTITLED LLC*FI*770594061~
N1*IN*Prescription Benefits Inc*FI*631157085~
INS*Y*18*030*XN*A***FT~
REF*0F*555446666~
REF*1L*B196~
DTP*336*D8*20131210~
NM1*IL*1*James*Fredericks*G***34*555446666~
PER*IP**CP*9183333339~
N3*14752 Zoo Avenue~
N4*Lincolnville*OK*76234~
DMG*D8*20000101*M*S~
HD*030**PDG*ENTE0100*EMP~
DTP*348*D8*20200101~
INS*N*19*030*XN*C****N*N~
REF*0F*555446666~
REF*1L*B196~
NM1*IL*1*Jackson*Pollock*S***34*123446666~
N3*142 Bumpkin Road~
N4*Lakeville*SD*79004~
DMG*D8*20000101*F~
HD*030**PDG*ENTA8888~
DTP*348*D8*20200401~
DTP*344*D8*20200401~

Debug mode

Hi,

First off, thanks for this parser. Recently found out I needed to parse some EDI and this helped out, well, eventually. Being new to omniparser and EDI made the learning curve pretty much vertical.

What didn't help was the nontrivial nonstandard message I needed to parse (it comes with a 102 page manual)
Only after giving up and moving to a different library, giving up again and moving to a javascript library, days of trial and error and finally getting a vague grasp of EDI did I realize what I was doing wrong**. Came back to omniparser and managed to create a schema that could handle both test files I have.

Anyway, what helped tremendously was adding these lines to the output:

diff --git a/extensions/omniv21/fileformat/edi/seg.go b/extensions/omniv21/fileformat/edi/seg.go
index cefc213..99a3833 100644
--- a/extensions/omniv21/fileformat/edi/seg.go
+++ b/extensions/omniv21/fileformat/edi/seg.go
@@ -1,6 +1,8 @@
 package edi

 import (
+       "fmt"
+
        "github.com/jf-tech/go-corelib/maths"
 )

@@ -95,8 +97,19 @@ func (d *segDecl) matchSegName(segName string) bool {
                //    "...loop is optional, but if any segment in the loop is used, the first segment
                //    within the loop becomes mandatory..."
                //  - https://github.com/smooks/smooks-edi-cartridge/blob/54f97e89156114e13e1acd3b3c46fe9a4234918c/edi-sax/src/main/java/org/smooks/edi/edisax/model/internal/SegmentGroup.java#L68
+               if len(d.Children) > 0 {
+                       children := make([]string, len(d.Children))
+                       for i, c := range d.Children {
+                               children[i] = c.Name
+                       }
+                       fmt.Printf("group "+d.Name+" children %v\n", children)
+               }
                return len(d.Children) > 0 && d.Children[0].matchSegName(segName)
        default:
+               fmt.Printf("node %s found: %v\n", d.fqdn, d.Name == segName)
+               if d.Name != segName {
+                       fmt.Printf("unexpected node %s \n", segName)
+               }
                return d.Name == segName
        }
 }

It helped me figure out the last known 'good state', what the parser saw, where I was, etc. I don't expect you to add that exact code, as it's pretty ugly/messy, but I'd like to suggest adding some kind of verbose mode.

**I'm not sure, but I don't think any of the parsers out there handle EDI segment compression well. Was trying to strictly implement the specification I had, but had to loosen it up a bit.

Accessing Sub Lines in Omniparser

We have an issue where we receive line items that reference the line above them, and we would like to merge these lines.

UNA:+.? '
UNB+UNOC:3+238098000000:14+222070370009:14+230516:0651+00407++DATA'
UNH+1+INVOIC:D:01B:UN:EAN010'
BGM+380+328322+9'
DTM+137:20230515:102'
DTM+35:20230515:102'
FTX+AAK+++Entgeltminderungen:Konditionsvereinbarungen'
FTX+ZZZ+++BBN-WARENLIEFERANT    254'
FTX+ABN+1+BA:LEI:246'
FTX+ABO+1+KOR::246'
RFF+DQ:8003964'
RFF+ABO:00407'
DTM+171:20230516:102'
RFF+XC1:DE-OEKO-006'
NAD+SU+2228098000000::9+GESCHAEFTSFUEHRER?::Peter Maly(Sprecher) (SPRECHER:Dr. Daniela Büchel:Christoph Eltze:Thomas Nonn+Telerik Schischmanow'
RFF+FC:215/5940/0012'
RFF+VA:DE812706034'
NAD+BY+2222101000003::9'
RFF+VA:DE49365817'
NAD+DP+2222220573598::9'
TAX+7+VAT+++:::7+S'
CUX+2:EUR:4'
LIN+000001++0891451000066:SRV'
IMD+A++8914510:::DUSCHE BARBER CLUB     250ML FL    MEN EXPERT'
IMD+C++IN'
QTY+47:1.000:PCE'
FTX+ZZZ+++44033'
MOA+203:8.13'
PRI+AAB:8.130:::1:PCE'
RFF+ZZZ:01'
RFF+ON:44033'
TAX+7+VAT+++:::19+S'
LIN+000002++03600524036607:SRV+1:000001'
IMD+A++8914510:::DUSCHE BARBER CLUB     250ML FL    MEN EXPERT'
IMD+C++CU'
QTY+59:6:PCE'
UNS+S'
MOA+124:0.03'
MOA+77:0.25'
MOA+79:0.22'
MOA+125:0.22'
TAX+7+VAT+++:::19+S'
MOA+79:0.11'
MOA+124:0.02'
MOA+125:0.11'
TAX+7+VAT+++:::7+S'
MOA+79:0.11'
MOA+124:0.01'
MOA+125:0.11'
UNT+52+1'
UNZ+1+00407'

In the EDI above, you can see that LIN+000002++03600524036607:SRV+1:000001' references LIN+000001++0891451000066:SRV'. How do you suggest we go about combining the two?
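
One possible approach is to merge after the transform: extract the line number and the referenced parent line number into each output record, then attach sub-lines to their parents in a post-processing step. The sketch below shows only that step, in plain Go. The `LineItem` shape, its field names, and `mergeSubLines` are hypothetical and are not part of omniparser; it also assumes the sub-line reference (the `000001` in `+1:000001`) has already been extracted into `ParentLineNo`.

```go
package main

import "fmt"

// LineItem is a hypothetical shape for one transformed LIN record.
type LineItem struct {
	LineNo       string
	ParentLineNo string // empty for a top-level line
	ItemNumber   string
	SubLines     []LineItem
}

// mergeSubLines attaches every item that references a parent line to that
// parent and returns the top-level items in their input order. Items whose
// parent cannot be found are appended at the end as top-level items.
func mergeSubLines(items []LineItem) []LineItem {
	index := make(map[string]int) // LineNo -> position in result
	var result []LineItem
	for _, it := range items {
		if it.ParentLineNo == "" {
			index[it.LineNo] = len(result)
			result = append(result, it)
		}
	}
	for _, it := range items {
		if it.ParentLineNo == "" {
			continue
		}
		if pos, ok := index[it.ParentLineNo]; ok {
			result[pos].SubLines = append(result[pos].SubLines, it)
		} else {
			result = append(result, it)
		}
	}
	return result
}

func main() {
	items := []LineItem{
		{LineNo: "000001", ItemNumber: "0891451000066"},
		{LineNo: "000002", ParentLineNo: "000001", ItemNumber: "03600524036607"},
	}
	for _, it := range mergeSubLines(items) {
		fmt.Printf("line %s has %d sub-line(s)\n", it.LineNo, len(it.SubLines))
	}
}
```

Two passes are used so that a sub-line can be attached even if it appears before its parent in the input.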

[EDI] Handle (or ignore?) line endings

Consider the header

UNA:+.? '

This wouldn't be

    "component_delimiter": ":",
    "element_delimiter": "+",
    "segment_delimiter": "'",
    "release_character": "?",

But instead is

    "component_delimiter": ":",
    "element_delimiter": "+",
    "segment_delimiter": "'\n", <--
    "release_character": "?",

If the file you're trying to read contains line endings. Maybe the safest/easiest option would be removing all line endings?
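
The EDI schemas quoted elsewhere on this page set "ignore_crlf": true, which appears to target exactly this case. To illustrate the idea of dropping line endings while splitting, here is a minimal sketch in plain Go. It is not omniparser's implementation: `splitSegments` is a hypothetical helper and the sample input is made up. It splits on the segment delimiter, honors the release character, and discards CR/LF.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSegments splits raw into segments on segDelim. A character preceded
// by the release character is treated as literal data. Unescaped CR and LF
// are dropped, so a file with one segment per line splits the same way as a
// single-line file. Note that this also removes line breaks inside data.
func splitSegments(raw string, segDelim, release rune) []string {
	var segs []string
	var cur strings.Builder
	escaped := false
	for _, r := range raw {
		switch {
		case escaped:
			cur.WriteRune(r)
			escaped = false
		case r == release:
			cur.WriteRune(r) // keep it; element-level parsing can unescape later
			escaped = true
		case r == '\r' || r == '\n':
			// ignore line endings
		case r == segDelim:
			segs = append(segs, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	if cur.Len() > 0 {
		segs = append(segs, cur.String())
	}
	return segs
}

func main() {
	raw := "BGM+380+328322+9'\nFTX+AAK+++it?'s ok'\r\nUNT+3+1'"
	for i, seg := range splitSegments(raw, '\'', '?') {
		fmt.Printf("%d: %s\n", i, seg)
	}
}
```

With this approach the schema can keep "'" as the segment delimiter regardless of whether the sender wrapped segments onto separate lines.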

Type casting for output

Hey @jf-tech,

I have a question about the type casting: we actually need to be able to specify the type to safe cast the value to, instead of relying on the source format type.

The example of the scenario: take a JSON payload and transform it into a Parquet or SQL set of records. Both have extended value types such as DATE, TIME and others. Currently, the only way to achieve that is to have another schema with column/type mappings and type handlers, which seems a little bit redundant.

It'd be great if we can somehow specify it in the schema and have it as a source of truth.

{
    "parser_settings": {
        "version": "omni.2.1",
        "file_format_type": "json"
    },
    "transform_declarations": {
        "FINAL_OUTPUT": { "xpath": "/publishers/*/books/*[author!='']", "object": {
            "publisher_name": { "xpath": "../../name", "type": "string" },
            "author": { "xpath": "author", "type": "string" },
            "year": { "xpath": "year", "type": "date" },
            "price": { "xpath": "price", "type": "double" },
            "title": { "xpath": "title", "type": "string" }
        }}
    }
}

Is it something that could be extended programmatically? Does it make any sense to you?
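
To show what such an extension point could look like, here is a small sketch of a type-name-to-converter registry applied to string values after the transform. It is purely illustrative: the `caster` type, the `casters` map, and `cast` are hypothetical names, the "double" and "date" type names are taken from the example schema above rather than from omniparser, and the date layout is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// caster converts a raw string value into a typed Go value.
type caster func(raw string) (interface{}, error)

// casters maps hypothetical schema type names to conversion functions.
var casters = map[string]caster{
	"string": func(raw string) (interface{}, error) { return raw, nil },
	"double": func(raw string) (interface{}, error) { return strconv.ParseFloat(raw, 64) },
	// Assumes ISO 8601 calendar dates; adjust the layout for other inputs.
	"date": func(raw string) (interface{}, error) { return time.Parse("2006-01-02", raw) },
}

// cast looks up typeName and applies its caster to raw.
func cast(typeName, raw string) (interface{}, error) {
	c, ok := casters[typeName]
	if !ok {
		return nil, fmt.Errorf("unknown type %q", typeName)
	}
	return c(raw)
}

func main() {
	fields := []struct{ typ, raw string }{
		{"string", "Good Omens"},
		{"double", "12.50"},
		{"date", "1990-05-01"},
	}
	for _, f := range fields {
		v, err := cast(f.typ, f.raw)
		fmt.Printf("%s -> %T %v (err: %v)\n", f.typ, v, v, err)
	}
}
```

New target types (for example a Parquet or SQL TIME) can be supported by adding an entry to the map, which keeps the schema as the single place where column types are named.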

Omniparser gives duplicate records incase of missing values

Hi I am working on a csv file with some missing values:

INPUT
A,B,C,D,E
1,2,3,4,5
4,5,6,7,8
,,,13,14
,,,16,17

SCHEMA:
{
    "parser_settings": { "version": "omni.2.1", "file_format_type": "csv2" },
    "file_declaration": {
        "delimiter": ",",
        "replace_double_quotes": false,
        "records": [
            { "min": 1, "max": 1, "header": "^A,B,C,D,E$" },
            {
                "is_target": true,
                "columns": [
                    { "name": "A" }, { "name": "B" }, { "name": "C" }, { "name": "D" }, { "name": "E" }
                ]
            }
        ]
    },
    "transform_declarations": {
        "FINAL_OUTPUT": { "object": {
            "a": { "xpath": "A", "type": "int" },
            "b": { "xpath": "B", "type": "string" },
            "c": { "xpath": "C", "type": "string" },
            "d": { "xpath": "D", "type": "string" },
            "e": { "xpath": "E", "type": "string" }
        }}
    }
}

The first 2 records are processed correctly, but the last 2 records come out as duplicates of the "4,5,6,7,8" row.

OUTPUT:

{
 "Result": [
  {
   "a": 1,
   "b": "2",
   "c": "3",
   "d": "4",
   "e": "5"
  },
  {
   "a": 4,
   "b": "5",
   "c": "6",
   "d": "7",
   "e": "8"
  },
  {
   "a": 4,
   "b": "5",
   "c": "6",
   "d": "7",
   "e": "8"
  },
  {
   "a": 4,
   "b": "5",
   "c": "6",
   "d": "7",
   "e": "8"
  }
 ]
}

How can I ignore such records?

[EDI] Huge memory leak when parsing ~20MB EDI file

Hi, we found a huge memory leak when parsing an EDI file larger than 20MB.

The EDI is a sample INVOIC message with millions of LIN items.
Attaching an SVG exported from the benchmark's --memprofile:
it allocates over 5GB of memory for a single ~23MB EDI file.

Is there an alternative to calling ingester.Parse() for large files?

How to add constant values in the schema?

I was wondering if we can add constants to our schema. I was able to add strings but was unable to add boolean values. Is there a way? I checked the docs but didn't find much.

EDIFACT parser segment skip

Could you please let me know if it's possible to skip a segment if it's not declared on schema but it's present on the input and vice versa?

Error generated:
bad request: transform failed. err: input 'test-input' at segment no.8 (char[247,247]): segment 'details/NAD' needs min occur 1, but only got 0
Input:
UNA:+.? '
UNB+UNOC:3+9999999999999:14+9999999999998:14+210419:1622+446047262+ORDERS'
UNH+1+ORDERS:D:96A:UN:EAN008'
BGM+220::9+6666666666+9'
DTM+137:20210419:102'
DTM+2:20210518:102'
FTX+PUR+3++STORE ORDER:DR01'
RFF+PUR+3++STORE ORDER PLEN:DR01'
NAD+BY+9999999999999::9'

Schema:
{
"parser_settings":{
"version":"omni.2.1",
"file_format_type":"edi"
},
"file_declaration":{
"segment_delimiter":"'",
"element_delimiter":"+",
"component_delimiter":":",
"ignore_crlf":true,
"segment_declarations":[
{
"name":"details",
"is_target":true,
"type":"segment_group",
"min":0,
"max":-1,
"child_segments":[
{
"name":"UNA",
"elements":[
{
"name":"random1",
"index":1
}
]
},
{
"name":"UNB",
"elements":[
{
"name":"syntaxIdentifier",
"index":1
},
{
"name":"buyerGln",
"index":2
},
{
"name":"sellerGln",
"index":3
},
{
"name":"docDate",
"index":4
},
{
"name":"transferNumber",
"index":5
},
{
"name":"documentType",
"index":6
}
]
},
{
"name":"UNH",
"elements":[
{
"name":"documentType2",
"index":1
},
{
"name":"fileFormatType",
"index":2
}
]
},
{
"name":"BGM",
"elements":[
{
"name":"orderType",
"index":1
},
{
"name":"orderNumber",
"index":2
},
{
"name":"SignatureForOriginal",
"index":3
}
]
},
{
"name":"DTM",
"elements":[
{
"name":"qualifierDocDate",
"index":1,
"component_index":1
},
{
"name":"docDate",
"index":1,
"component_index":2
},
{
"name":"formatDate",
"index":1,
"component_index":3
}
]
},
{
"name":"DTM",
"elements":[
{
"name":"qualifierDeliveryDate",
"index":1,
"component_index":1
},
{
"name":"deliveryDate",
"index":1,
"component_index":2
},
{
"name":"deliveryformatDate",
"index":1,
"component_index":3
}
]
},
{
"name":"FTX",
"elements":[
{
"name":"containPurchaseInformation",
"index":1
},
{
"name":"defaultValue",
"index":2
},
{
"name":"freeText1",
"index":4,
"component_index":1
},
{
"name":"freeText2",
"index":4,
"component_index":2
}
]
},
{
"name":"NAD",
"elements":[
{
"name":"partyQualifier1",
"index":1
},
{
"name":"partyGln1",
"index":2,
"component_index":1
},
{
"name":"partyIDcode1",
"index":2,
"component_index":3
}
]
}
]
}
]
},
"transform_declarations":{
"FINAL_OUTPUT":{
"object":{
"una_elem1":{
"xpath":"UNA/random1"
},
"header1":{
"object":{
"syntaxIdentifier":{
"xpath":"UNB/syntaxIdentifier"
},
"buyerGln":{
"xpath":"UNB/buyerGln"
},
"sellerGln":{
"xpath":"UNB/sellerGln"
},
"docDate":{
"xpath":"UNB/docDate"
},
"transferNumber":{
"xpath":"UNB/transferNumber"
},
"documentType":{
"xpath":"UNB/documentType"
}
}
},
"heade2":{
"object":{
"documentType2":{
"xpath":"UNH/documentType2"
},
"fileFormatType":{
"xpath":"UNH/fileFormatType"
}
}
},
"document":{
"object":{
"documentType2":{
"xpath":"BGM/orderType"
},
"orderNumber":{
"xpath":"BGM/orderNumber"
},
"SignatureForOriginal":{
"xpath":"BGM/SignatureForOriginal"
}
}
},
"docDate":{
"object":{
"qualifierDocDate":{
"xpath":"DTM/qualifierDocDate"
},
"docDate":{
"xpath":"DTM/docDate"
},
"formatDate":{
"xpath":"DTM/formatDate"
}
}
},
"deliveryDate":{
"object":{
"qualifierDeliveryDate":{
"xpath":"DTM/qualifierDeliveryDate"
},
"deliveryDate":{
"xpath":"DTM/deliveryDate"
},
"deliveryformatDate":{
"xpath":"DTM/deliveryformatDate"
}
}
},
"freeText":{
"object":{
"containPurchaseInformation":{
"xpath":"FTX/containPurchaseInformation"
},
"defaultValue":{
"xpath":"FTX/defaultValue"
},
"freeText1":{
"xpath":"FTX/freeText1"
},
"freeText2":{
"xpath":"FTX/freeText2"
}
}
},
"PartyInformation1":{
"object":{
"partyQualifier":{
"xpath":"NAD/partyQualifier1"
},
"partyGln":{
"xpath":"NAD/partyGln1"
},
"partyIDcode":{
"xpath":"NAD/partyIDcode1"
}
}
}
}
}
}
}

Skip a line before parsing a csv header line

From @MrRobo-t

OK. Thanks for this. I really appreciate your response.

I am working on a CSV file and want to skip the first line, which contains junk values; the second line contains my headers. How can I make the schema understand that my headers begin on the second row?

A,B,,
1,2,3,4

Protobuf input support

I was looking at converting protobuf to JSON. This tool would be useful if it supported that.

Segment is missing in output produced by "javascript_with_context"

Steps to reproduce the error:

Have a schema nested at least 3 to 4 levels deep.
Make sure no element details are declared for a particular segment in the schema, i.e. select no elements from that segment.

I'm able to reproduce it using the schema below:

{
  "name": "TDT",
  "min": 1,
  "max": 99,
  "elements": [],
  "child_segments": [
    {
      "name": "LOC",
      "min": 0,
      "max": 99,
      "elements": [ { "name": "reference", "index": 1 } ],
      "child_segments": [
        {
          "name": "DTM",
          "min": 0,
          "max": 9,
          "elements": [ { "name": "date", "index": 1 } ]
        }
      ]
    }
  ]
}

and transform_declaration is:
"transform_declarations": {
  "FINAL_OUTPUT": {
    "object": {
      "data": {
        "custom_func": {
          "name": "javascript_with_context",
          "args": [ { "const": "JSON.parse(_node)" } ]
        }
      }
    }
  }
}

and the output is:
"TDT": [
  [
    { "DTM": { "date": "2" }, "reference": "10" },
    { "reference": "7" }
  ],
  [
    { "reference": "9" },
    { "DTM": { "date": "10" }, "reference": "12" }
  ]
]
The "LOC" segment is missing.

The issue does not occur if I add element details for the TDT segment.

Let me know if any other details are required.

generic EDI schema

Love the project! thanks!!

Was wondering whether you have a generic EDI schema (it's okay if it is just for X12 docs) that one can use to convert any EDI doc to JSON without missing information. Thanks!

-- Todd

Flatten JSON file

Hi @jf-tech ,

is it possible to create a schema for flattening JSON object using current version? E.g.:

{
    "publishers": [
        {
            "name": "Scholastic Press",
            "books": [
                {
                    "title": "Harry Potter and the Philosopher's Stone",
                    "price": 9.99,
                    "author": "J. K. Rowling",
                    "year": 1997
                },
                {
                    "title": "Harry Potter and the Chamber of Secrets",
                    "price": 10.99,
                    "author": "J. K. Rowling",
                    "year": 1998
                }
            ]
        },
        {
            "name": "Harper & Brothers",
            "books": [
                {
                    "title": "Goodnight Moon",
                    "price": 5.99,
                    "author": "Margaret Wise Brown",
                    "year": 1947
                }
            ]
        }
    ]
}

Expected output:

[
  {
    "publisher_name": "Scholastic Press",
    "title": "Harry Potter and the Philosopher's Stone",
    "price": 9.99,
    "author": "J. K. Rowling",
    "year": 1997
  },
  {
    "publisher_name": "Scholastic Press",
    "title": "Harry Potter and the Chamber of Secrets",
    "price": 10.99,
    "author": "J. K. Rowling",
    "year": 1998
  },
  {
    "publisher_name": "Harper & Brothers",
    "title": "Goodnight Moon",
    "price": 5.99,
    "author": "Margaret Wise Brown",
    "year": 1947
  }
]
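
For comparison, the flattening being asked for can be sketched in plain Go with `encoding/json`. This is not an omniparser schema and says nothing about whether the current version supports it; the struct and function names are hypothetical and only mirror the sample above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// book and publisher mirror the nested input shape shown above.
type book struct {
	Title  string  `json:"title"`
	Price  float64 `json:"price"`
	Author string  `json:"author"`
	Year   int     `json:"year"`
}

type publisher struct {
	Name  string `json:"name"`
	Books []book `json:"books"`
}

// flatBook is one row of the flattened output.
type flatBook struct {
	PublisherName string  `json:"publisher_name"`
	Title         string  `json:"title"`
	Price         float64 `json:"price"`
	Author        string  `json:"author"`
	Year          int     `json:"year"`
}

// flatten emits one flatBook per (publisher, book) pair, in input order.
func flatten(input []byte) ([]flatBook, error) {
	var doc struct {
		Publishers []publisher `json:"publishers"`
	}
	if err := json.Unmarshal(input, &doc); err != nil {
		return nil, err
	}
	var out []flatBook
	for _, p := range doc.Publishers {
		for _, b := range p.Books {
			out = append(out, flatBook{
				PublisherName: p.Name,
				Title:         b.Title,
				Price:         b.Price,
				Author:        b.Author,
				Year:          b.Year,
			})
		}
	}
	return out, nil
}

func main() {
	input := []byte(`{"publishers":[{"name":"Harper & Brothers","books":[
		{"title":"Goodnight Moon","price":5.99,"author":"Margaret Wise Brown","year":1947}]}]}`)
	rows, err := flatten(input)
	if err != nil {
		panic(err)
	}
	pretty, _ := json.MarshalIndent(rows, "", "  ")
	fmt.Println(string(pretty))
}
```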

XLSX Parsing

Hi team,

I have a use case of parsing xlsx files. Not sure how to do it right now using custom file formats. I am looking to parse single sheet as well as multisheet xlsx files. Can you guys drop in an example around it?

Mogenius not responding even after waiting several minutes

Hi team,

I tried the playground for Omniparser to try out schemas but the screen is stuck and not responding for several minutes.
Can you have a look at it?

CSV Example

In package csv, decl.go's RowsBased (called from reader.go ReadAndMatch) function says if r.Header == nil then it's rows based but the line above makes the header not nil? So now it thinks my CSV is Header and Footer based.

What am I missing?

JSON/XML to EDI conversion

Could you please clarify if JSON / XML data can be converted to EDI EDIFACT data?
If feasible, could you please provide a small example?
Thank you in advance.

Return unique hash for input

So I'm busy ingesting shipments; they arrive as either CSV, JSON, XML, or EDI.

The interface I'm working on should take an array of shipments, divide it into individual shipments, hash those, and store the original input for success/audit/retry/failure tracking. This would make it easier to ingest 99 out of 100 shipments and retry (after locating and fixing the issue) the one shipment that is invalid for whatever reason.

To decide whether something has been ingested correctly, I thought a solution could be hashing each 'unit' of input and storing the original input somewhere as well.

Quite easy for CSV.

Weird Python-and-bash-esque pseudocode:

for line in csv:
  process(line) && hash(line) && gzip(line) -> store result, hash, line in db
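
The per-line hashing part of that pseudocode could look like this in Go. This is a minimal sketch using only the standard library; `hashLines` is a hypothetical name, and it assumes each record is exactly one physical line, which breaks for quoted CSV fields that contain newlines.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashLines returns the hex-encoded SHA-256 digest of every line in input,
// in input order. The line terminator is not part of the hashed bytes.
func hashLines(input string) []string {
	var digests []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		sum := sha256.Sum256(sc.Bytes())
		digests = append(digests, hex.EncodeToString(sum[:]))
	}
	return digests
}

func main() {
	// Each digest could be stored next to the processed result and the
	// (optionally compressed) original line.
	for i, d := range hashLines("1,2,3,4\n5,6,7,8\n") {
		fmt.Printf("line %d: %s\n", i+1, d)
	}
}
```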

It becomes less so for JSON and XML; even a marshal/unmarshal round trip is not 100% identical to the input.

Even worse is EDI.

So, even though I liked the idea of storing the original, it quickly becomes cumbersome. A decent alternative is hashing and storing the output of transform.Read().

But that comes with several issues:

I can change the output, and thus the hash, using the schema (not really an issue).
It's not the original (but it is more consistent, all JSON), so kind of a bug/feature.
I don't see what I haven't told omniparser to see, such as new fields that might have been added.

None of these is a major issue, but they all come from hashing a new representation of the input, not the input itself.

I was wondering how hard it would be to hash the input of whatever generates the output.
So:
hash, data, err := transform.Read

Is your internal data stable enough that you could, say, 'for loop' the IDR input through the SHA-256 encoder (it supports streaming) and return a stable, unchanging hash?

As in, in theory ["a", "b", "c"] should return the same hash for a, b and c regardless of ordering

Also, I imagine being able to verify whether a file has been fully processed is interesting for more than one use case.
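
If the ordering remark above is read as wanting a single digest for a whole set of records that does not change when the records are reordered, one common approach is to hash each record, sort the digests, and hash the sorted sequence. The sketch below shows that technique with the standard library only; `unorderedHash` is a hypothetical name and nothing here reflects omniparser internals.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// unorderedHash returns a digest of items that is independent of their order:
// each item is hashed, the digests are sorted, and the sorted list is hashed.
func unorderedHash(items []string) string {
	digests := make([]string, len(items))
	for i, it := range items {
		sum := sha256.Sum256([]byte(it))
		digests[i] = hex.EncodeToString(sum[:])
	}
	sort.Strings(digests)

	h := sha256.New()
	for _, d := range digests {
		h.Write([]byte(d)) // hash.Hash writes never return an error
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := unorderedHash([]string{"a", "b", "c"})
	b := unorderedHash([]string{"c", "a", "b"})
	fmt.Println(a == b)
}
```

Hashing fixed-length digests rather than the raw items avoids ambiguity at item boundaries, at the cost of holding one digest per record in memory until the end.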

Non-authenticating EDI segment reader, any examples?

Issue parsing csv file delimited with asterisk

Any reason you know of that an asterisk-delimited file should fail parsing?

I'm trying to parse an asterisk-delimited file and am getting the following InvalidCsv error message:
record/record_group '' needs min occur 1, but only got 0

So apparently it's not able to recognize the line as a record. I'm not sure why; we've tried many different delimiters successfully with the same code/templates. We are specifying the delimiter in code.

The same input file parses successfully when the asterisks are replaced with commas and comma is specified as the delimiter, so it's specifically the asterisk that causes the error. We are also seeing the same issue with the caret "^" character as a delimiter. Thanks for any assistance!

How to use nested custom_func, such as using upper+concat function

input:
{
"tracking_number": "1z9999999999999999"
}
expected output
{
"tracking_number": "1Z9999999999999999_ORD"
}

How do I write the schema JSON?
{
"parser_settings": {
"version": "omni.2.1",
"file_format_type": "json"
},
"transform_declarations": {
"FINAL_OUTPUT": {
"object": {
"tracking_number": {
"custom_func": [
{
"name": "upper",
"args": [ { "xpath": "tracking_number" } ]
},
{
"name": "concat",
"args": [ { "xpath": "tracking_number" } ,{ "const": "_ORD" }]
}
]
}
}
}
}
}

[EDI] Handle segment compression

Disclaimer: I only assume this is segment compression, as defined in the manual

7.1 Exclusion of segments
Conditional segments containing no data shall be omitted
(including their segment tags).

This is what I encountered in the schema, basically a mandatory/conditional sandwich.

SG25 R 99
43 NAD M 1
44 LOC Orts 9 O

SG25 R 99
45 NAD M 1
46 LOC O 9
    SG29 C 9
    47 RFF M 1

SG25 O 99
48 NAD M 1

SG25 D 99
49 NAD M 1

SG25 D 99
50 NAD M 1

SG25 O 99
51 NAD M 1

SG25 M 99
52 NAD M 1
    SG29 C 9
    53 RFF M 1

SG25 D 99
54 NAD M 1

SG25 R 99
55 NAD M 1

SG25 R 99
56 NAD M 1
    SG26 C 9
    57 CTA O 1
    58 COM O 9

None of the conditional statements were present in the data I was trying to parse, ended up fixing it using:

                    "name": "SG25-SENDER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "NAD",
                        "min": 1,
                        "elements": [
                          { "name": "cityName", "index": 1 },
                          { "name": "provinceCode", "index": 2 },
                          { "name": "postalCode", "index": 3 },
                          { "name": "countryCode", "index": 4 }
                        ]
                      },
                      { "name": "LOC", "min": 0 }
                    ]
                  },
                  {
                    "name": "SG25-RECEIVER",
                    "min": 1,
                    "type": "segment_group",
                    "child_segments": [
                      { "name": "NAD", "min": 1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },
                  {
                    "name": "SG25-OTHERS",
                    "min": 0,
                    "max": 99,
                    "type": "segment_group",
                    "child_segments": [
                      {
                        "name": "SG26",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [
                          { "name": "CTA", "min": 0 },
                          { "name": "COM", "min": 0, "max": -1 }
                        ]
                      },
                      { "name": "NAD", "min": 0, "max": -1 },
                      { "name": "LOC", "min": 0 },
                      {
                        "name": "SG29",
                        "min": 0,
                        "type": "segment_group",
                        "child_segments": [{ "name": "RFF", "min": 1 }]
                      }
                    ]
                  },

The message I'm trying to parse

NAD+CZ+46388514++Foo A/S+Foo 2+Foo++Foo+DK'
NAD+CN+46448510++NL01001 Foo Foo Foo:Foo+Foo 6+Foo++Foo+NL'
CTA+CN+AS:NL01001 Foo'
COM+0031765140344:TE'
[email protected]:EM'
NAD+LP+04900000250'

Which basically means, grab the two explicit ones (luckily at top), and do as you wish with the others in whatever order you encounter them. I'm not sure how I would have handled it if I did care about NAD+LP

Also had to use min/max 1 instead of the specified 99, as it only considers NAD, not NAD+FIRSTVALUE when 'collapsing' similar but not same segments.

Basically, the EDI specification has a lot of implicitness which I think is quite hard to easily parse.

JSONify2/J2NodeToInterface doesn't include attrs when translating nodes obtained from XML parsing.

Given that JSONify2 is used for serializing *node.Node to the _node JSON for the javascript custom_func, in the XML input case we might pass an incomplete _node to the javascript custom_func.

how to parse an XML document with identical tags

I have an API that returns XML using the following structure:

<root>
    <objects>
        <numObjects>2</numObjects>
        <object>
            <field1></field1>
            <field2></field2>
        </object>
        <object>
            <field1></field1>
            <field2></field2>
        </object>
    </objects>
</root>

Is there a way to parse this using xpath, without resorting to a custom function? I tried "root/objects/object/*[position()=1]", but omniparser picks up both objects in the output and doesn't seem capable of discriminating between the two.

the snippet from my schema:

 "array": [
                    {
                          "xpath": "root/objects/object/*[position()=1]",
                          "object": {
                              "key": { "const": "constValue"},
                              ...
                          }
                     }
             ]

produces the output:

 [
        {
          "key": "constValue"
        },
        {
          "key": "constValue"
        }
      ]

as if the XPath picked up both objects erroneously. It also does not populate anything in my schema based on the field1 or field2 xpaths nested inside the object.

JSON schema validation only allows alphanumeric property names

When using a valid template with non alphanumeric names for the properties, omniparser refuses the template because of patternProperties declaration such as this one: https://github.com/jf-tech/omniparser/blob/master/extensions/omniv21/validation/transformDeclarations.json#L87, in combination with the additionalProperties: false

Example of a template (notice the $ in the property name):

{
    "parser_settings": {
        "version": "omni.2.1",
        "file_format_type": "json"
    },
    "transform_declarations": {
        "FINAL_OUTPUT": {
            "object": {
                "$test": { "xpath": "/foo/bar" }
            }
        }
    }
}

Which returns this error:

transform_declarations.FINAL_OUTPUT.object: Additional property $test is not allowed
transform_declarations.FINAL_OUTPUT.object: Additional property $test is not allowed
transform_declarations.FINAL_OUTPUT: Must validate one and only one schema (oneOf)
transform_declarations.FINAL_OUTPUT: Must validate one and only one schema (oneOf)

Happy to provide additional information if needed.

Thanks

Change quote character in CSV format

In the CSV file declaration section, there is a config to change the delimiter, but I don't see one to change the quote character. Is there a way to change it?

Is it possible to chain javascript custom functions?

Hello. I was wondering if it is possible to chain javascript or javascript_with_context custom functions. I haven't been able to get the right syntax if so. If I swap the value for name below from concat to javascript and add javascript args, I get either invalid number of args, ILLEGAL javascript, or cannot specify 'xpath' or 'xpath_dynamic' on both [sic: field and template].

I was able to figure out how to chain javascript and built-in functions, as shown below. If chaining javascript functions is not possible, is the only solution to code both operations in a single javascript function?

"left_pad_with_zero_for_7_and_append_dash_old": {
	"custom_func": {
		"name": "concat",
		"args": [
			{
				"custom_func": {
					"name": "javascript",
					"args": [
						{
							"const": "if (x.length < 7) { i=0; b=7-x.length; for (i=0; i < b; i++) { x = '0' + x}} else { x };"
						},
						{ 
							"const": "x"
						},
						{
							"xpath": "."
						}
					]
				}
			},
			{
				"const": "-old"
			}
		]
	}
}

Add `idr.Node` pool and recycling to save repeated allocation

Some initial CSV reader prototyping shows a 10x reduction in memory allocation and a 2x reduction in latency (only 2x because recycling takes time to remove all nodes from the trees and put them back into sync.Pool). Still very much worth it!
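
The general pool-and-recycle pattern described here can be sketched as follows. The `node` type is a hypothetical stand-in with far fewer fields than the real `idr.Node`, and the functions are illustrative only; the point is the recursive walk that resets each node before returning it to `sync.Pool`, which is where the recycling cost mentioned above comes from.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical, simplified tree node used only for this sketch.
type node struct {
	data     string
	children []*node
}

// nodePool hands out recycled nodes, allocating only when the pool is empty.
var nodePool = sync.Pool{
	New: func() interface{} { return new(node) },
}

// newNode takes a node from the pool and initializes it.
func newNode(data string) *node {
	n := nodePool.Get().(*node)
	n.data = data
	return n
}

// recycle returns n and its whole subtree to the pool. Every node is reset
// first so a later Get never observes stale data or stale child pointers.
func recycle(n *node) {
	for i, c := range n.children {
		recycle(c)
		n.children[i] = nil
	}
	n.children = n.children[:0] // keep capacity for reuse
	n.data = ""
	nodePool.Put(n)
}

func main() {
	root := newNode("row")
	root.children = append(root.children, newNode("col1"), newNode("col2"))
	fmt.Println(len(root.children))

	// Once a record has been fully processed, give its nodes back.
	recycle(root)

	next := newNode("next row")
	fmt.Println(next.data, len(next.children))
}
```

A recycled tree must not be referenced after `recycle` returns, since its nodes may already be in use by another record.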

fixedLength2 with footer failing.

Hi, I'm trying to understand why parsing of this fixed-length file is failing.
Error: input 'CNHI_input.txt' line 9: envelope/envelope_group 'FOOTER' needs min occur 1, but only got 0
I get that it's not finding the footer envelope; I'm just not sure why. I appreciate any help.

The schema and input seem to be very similar to the example here: https://github.com/jf-tech/omniparser/blob/master/extensions/omniv21/samples/fixedlength2/3_header_footer.schema.json

CNHI_input.txt

CNHI_schema.txt

Merge `custom_parse` into `custom_func` and retire `custom_parse`?

Previously we created custom_parse because of the limitation that the return type of a custom_func had to be string. Since we revamped the custom_func type system with the introduction of javascript, there seems to be no reason for custom_parse to exist anymore. custom_parse is simply another custom_func with two arguments, transformctx and idr.Node, and an arbitrary interface{} return type.

Should we just deprecate and retire custom_parse?

custom_func uses properties from two arrays of objects

input data
{
"count":7,
"next":"xxx",
"previous":"xxx",
"results":[
{
"order_reference":"T3008659677",
"order_date":"2023-01-06T12:37:37",
"test_flag":true,
"supplier":"https://XXXX",
"currency_code":"GBP",
"subtotal":"1.40",
"tax":"8.40",
"total":"9.80",
"items":[
{
"url":"https://items",
"part_number":"RESKU1",
"supplier_sku_reference":"SUSKU1",
"line_reference":"1",
"quantity":3
},
{
"url":"https://items2",
"part_number":"RESKU2",
"supplier_sku_reference":"SUSKU2",
"line_reference":"2",
"quantity":1
}
],
"goods":[
{
"g_url":"https://goods",
"g_part_number":"RESKU1",
"g_supplier_sku_reference":"SUSKU1",
"g_line_reference":"1",
"g_quantity":3
},
{
"g_url":"https://goods2",
"g_part_number":"RESKU2",
"g_supplier_sku_reference":"SUSKU2",
"g_line_reference":"2",
"g_quantity":1
}
]
}
]}
expected output:
e.g. use the concat func => iterm_url_number = items.url + "_" + g_part_number
How do I write the schema? Is there a sample?
{
"parser_settings": {
"version": "omni.2.1",
"file_format_type": "json"
},
"transform_declarations": {
"FINAL_OUTPUT": {
"object": {
"datas": {
"array": [{
"xpath": "/results/",
"object": {
"iterms_list": { "array": [ { "xpath": "items/", "object": {
"iterm_url": {
"xpath": "url"
},
"iterm_url_number": {
"custom_func": {
"name": "concat",
"args": [
{"xpath": "url"},
{"const": "_"},
{"xpath": "../goods//g_part_number"}
]
}
}
}} ] },
"goods_list": { "array": [ { "xpath": "goods/", "object": {
"iterm_part_number": {
"xpath": "g_part_number"
}
}} ] }
}
}]
}
}
}
}
}

no support for creating hierarchical structure in transform with CSV format

When parsing a CSV, each row is treated as a new root IDR DocumentNode. It would be convenient if the FINAL_OUTPUT transform could support adding each CSV row as an object in an array.

Specifically, using this schema:

{
  "parser_settings": {
    "version": "omni.2.1",
    "file_format_type": "csv"
  },
  "file_declaration": {
    "delimiter": ",",
    "replace_double_quotes": true,
    "data_row_index": 1,
    "columns": [ { "name": "col1" } ]
  },
  "transform_declarations": {
    "FINAL_OUTPUT": { "object": {
      "records": {
        "array": [
          {
            "object": {
              "col1": { "xpath": "col1" }
            }
          }
        ]
      }
    }}
  }
}

I would hope to be able to transform a CSV into the following JSON output:

{
  "records": [
    {
      "col1": "abc"
    },
    {
      "col1": "abc"
    },
    ...
  ]
}

Unfortunately, I don't think this is currently possible. The closest thing I can achieve without modifying the byte slices returned by transform.Read() gives me something like:

{
  "records": [
    {
      "col1": "abc"
    }
  ]
}
{
  "records": [
    {
      "col1": "abc"
    }
  ]
}
...

Notice how these are not elements in an array, but separate root objects, each returned from a new Read() call.

I don't fully grok the IDR system, so I'm not sure how large a feature request this is, but I would assume this could be as easy as adding some new optional DocumentNode root and adding each CSV row node as a child of this root node?

Parser only loops 10 times

I cannot seem to get this to loop EB more than 10 times. Any help would be appreciated.

{
    "parser_settings": {
        "version": "omni.2.1",
        "file_format_type": "edi"
    },
    "file_declaration": {
        "segment_delimiter": "~",
        "element_delimiter": "*",
        "component_delimiter": "|",
        "ignore_crlf": true,
        "segment_declarations": [
            {
                "name": "ISA",
                "child_segments": [
                    {
                        "name": "GS",
                        "child_segments": [
                            {
                                "name": "transaction_set_id",
                                "type": "segment_group",
                                "is_target": true,
                                "child_segments": [
                                    {
                                        "name": "ST",
                                        "elements": [
                                            {
                                                "name": "X12Form",
                                                "index": 1
                                            },
                                            {
                                                "name": "TransactionSetControlNumber ",
                                                "index": 2
                                            },
                                            {
                                                "name": "ImplementationConventionReference",
                                                "index": 3
                                            }
                                        ]
                                    },
                                    {
                                        "name": "BHT",
                                        "min": 0,
                                        "elements": [
                                            {
                                                "name": "BHT01",
                                                "index": 1
                                            },
                                            {
                                                "name": "BHT04",
                                                "index": 3
                                            },
                                            {
                                                "name": "BHT05",
                                                "index": 4
                                            },
                                            {
                                                "name": "BHT06",
                                                "index": 5
                                            }
                                        ]
                                    },
                                    {
                                        "name": "HL",
                                        "type": "segment_group",
                                        "min": 0,
                                        "max": -1,
                                        "child_segments": [
                                            {
                                                "name": "HL",
                                                "elements": [
                                                    {
                                                        "name": "HL1",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "HL4",
                                                        "index": 4
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "TRN",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "TRN00",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "TRN01",
                                                        "index": 2
                                                    },
                                                    {
                                                        "name": "TRN02",
                                                        "index": 3
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "NM1",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "NM1101",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "NM1102",
                                                        "index": 2
                                                    },
                                                    {
                                                        "name": "NM1103",
                                                        "index": 3
                                                    },
                                                    {
                                                        "name": "NM1108 ",
                                                        "index": 8
                                                    },
                                                    {
                                                        "name": "NM1109",
                                                        "index": 9
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "N3",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "N301",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "N302",
                                                        "index": 2
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "N4",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "N401",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "N402",
                                                        "index": 2
                                                    },
                                                    {
                                                        "name": "N403",
                                                        "index": 3
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "DMG",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "DMG01",
                                                        "index": 1
                                                    },
                                                    {
                                                        "name": "DMG02",
                                                        "index": 2
                                                    },
                                                    {
                                                        "name": "DMG03",
                                                        "index": 3
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "DTP",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "DTP01",
                                                        "index": 1
                                                    }
                                                ]
                                            }
                                        ]
                                    },
                                    {
                                        "name": "EB",
                                        "min": 0,
                                        "max": -1,
                                        "type": "segment_group",
                                        "child_segments": [
                                            {
                                                "name": "EB",
                                                "elements": [
                                                    {
                                                        "name": "EB01",
                                                        "index": 1,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB02",
                                                        "index": 2,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB03",
                                                        "index": 3,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB04",
                                                        "index": 4,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB05",
                                                        "index": 5,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB06",
                                                        "index": 6,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB07",
                                                        "index": 7,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "EB08",
                                                        "index": 8,
                                                        "default": ""
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "DTP",
                                                "min": 0,
                                                "max": -1,
                                                "elements": [
                                                    {
                                                        "name": "DTP01",
                                                        "index": 1,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "DTP02",
                                                        "index": 2,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "DTP03",
                                                        "index": 3,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "DTP04",
                                                        "index": 4,
                                                        "default": ""
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "LS",
                                                "min": 0,
                                                "max": -1,
                                                "elements": [
                                                    {
                                                        "name": "LS01",
                                                        "index": 1,
                                                        "default": ""
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "HSD",
                                                "min": 0,
                                                "max": -1,
                                                "type": "segment_group",
                                                "child_segments": [
                                                    {
                                                        "name": "HSD",
                                                        "min": 0,
                                                        "elements": [
                                                            {
                                                                "name": "HSD01",
                                                                "index": 1,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD02",
                                                                "index": 2,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD03",
                                                                "index": 3,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD04",
                                                                "index": 4,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD05",
                                                                "index": 5,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD06",
                                                                "index": 6,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD07",
                                                                "index": 7,
                                                                "default": ""
                                                            },
                                                            {
                                                                "name": "HSD08",
                                                                "index": 8,
                                                                "default": ""
                                                            }
                                                        ]
                                                    }
                                                ]
                                            },
                                            {
                                                "name": "MSG",
                                                "min": 0,
                                                "elements": [
                                                    {
                                                        "name": "MSG01",
                                                        "index": 1,
                                                        "default": ""
                                                    },
                                                    {
                                                        "name": "MSG02",
                                                        "index": 2,
                                                        "default": ""
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            {
                "name": "SE",
                "min": 0
            },
            {
                "name": "IEA",
                "min": 0
            }
        ]
    },
    "transform_declarations": {
        "FINAL_OUTPUT": {
            "object": {
                "transaction_set_id1": {
                    "xpath": "ST/X12Form"
                },
                "transaction_set_id2": {
                    "xpath": "ST/TransactionSetControlNumber"
                },
                "transaction_set_id3": {
                    "xpath": "ST/ImplementationConventionReference"
                },
                "BHT01": {
                    "xpath": "BHT/BHT01"
                },
                "BHT04": {
                    "xpath": "BHT/BHT04"
                },
                "BHT05": {
                    "xpath": "BHT/BHT05"
                },
                "BHT06": {
                    "xpath": "BHT/BHT06"
                },
                "HL": {
                    "array": [
                        {
                            "xpath": "HL",
                            "object": {
                                "HL1": {
                                    "xpath": "HL/HL1"
                                },
                                "HL4": {
                                    "xpath": "HL/HL4"
                                },
                                "TRN": {
                                    "xpath": "TRN/TRN00"
                                },
                                "TRN1": {
                                    "xpath": "TRN/TRN01"
                                },
                                "TRN2": {
                                    "xpath": "TRN/TRN02"
                                },
                                "NM1": {
                                    "xpath": "NM1/NM1101"
                                },
                                "NM2": {
                                    "xpath": "NM1/NM1102"
                                },
                                "NM3": {
                                    "xpath": "NM1/NM1103"
                                },
                                "NM8": {
                                    "xpath": "NM1/NM1108"
                                },
                                "NM9": {
                                    "xpath": "NM1/NM1109"
                                },
                                "N3": {
                                    "xpath": "N3/N301"
                                },
                                "N302": {
                                    "xpath": "N3/N302"
                                },
                                "N4": {
                                    "xpath": "N4/N401"
                                },
                                "N402": {
                                    "xpath": "N4/N402"
                                },
                                "N403": {
                                    "xpath": "N4/N403"
                                },
                                "DMG01": {
                                    "xpath": "DMG01/DMG01"
                                },
                                "DM02": {
                                    "xpath": "DMG02/DMG02"
                                },
                                "DMG03": {
                                    "xpath": "DMG03/DMG03"
                                },
                                "DMG04": {
                                    "xpath": "DMG04/DMG04"
                                }
                            }
                        }
                    ]
                },
                "EB": {
                    "array": [
                        {
                            "xpath": "EB",
                            "object": {
                                "EB": {
                                    "xpath": "EB/EB01"
                                },
                                "EB2": {
                                    "xpath": "EB/EB02"
                                },
                                "EB3": {
                                    "xpath": "EB/EB03"
                                },
                                "EB4": {
                                    "xpath": "EB/EB04"
                                },
                                "EB5": {
                                    "xpath": "EB/EB05"
                                },
                                "DTP1": {
                                    "xpath": "DTP/DTP01"
                                },
                                "DTP2": {
                                    "xpath": "DTP/DTP02"
                                },
                                "DTP3": {
                                    "xpath": "DTP/DTP03"
                                },
                                "HSD": {
                                    "array": [
                                        {
                                            "xpath": "HSD",
                                            "object": {
                                                "HSD1": {
                                                    "xpath": "HSD/HSD01"
                                                },
                                                "HSD2": {
                                                    "xpath": "HSD/HSD02"
                                                },
                                                "HSD3": {
                                                    "xpath": "HSD/HSD03"
                                                },
                                                "HSD4": {
                                                    "xpath": "HSD/HSD04"
                                                },
                                                "HSD5": {
                                                    "xpath": "HSD/HSD05"
                                                },
                                                "HSD6": {
                                                    "xpath": "HSD/HSD06"
                                                },
                                                "HSD7": {
                                                    "xpath": "HSD/HSD07"
                                                },
                                                "HSD8": {
                                                    "xpath": "HSD/HSD08"
                                                }
                                            }
                                        }
                                    ]
                                },
                                "MSG1": {
                                    "xpath": "MSG/MSG01"
                                },
                                "MSG2": {
                                    "xpath": "MSG/MSG02"
                                },
                                "LS": {
                                    "xpath": "LS/LS01"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

ISA*00* *00* *ZZ*CMS *ZZ*SUBMITTERID *171104*0734*^*00501*111111111*0*P*|~
GS*HB*CMS*SUBMITTERID*20171104*07340000*1*X*005010X279A1~
ST*271*0001*005010X279A1~
BHT*0022*11*TRANSA*20171104*07342355~
HL*1**20*1~
NM1*PR*2*CMS*****PI*CMS~
HL*2*1*21*1~
NM1*1P*2*IRNAME*****XX*1234567893~
HL*3*2*22*0~
TRN*2*TRACKNUM*ABCDEFGHIJ~
NM1*IL*1*LNAME*FNAME*M***MI*123456789A~
N3*ADDRESSLINE1*ADDRESSLINE2~
N4*CITY*ST*ZIPCODE~
DMG*D8*19400401*F~
DTP*307*RD8*20170101-20171204~
EB*6**30~
DTP*307*RD8*20170101-20170108~
EB*I**41^54~
EB*1**88~
EB*1**30^10^42^45^48^49^69^76^83^A5^A7^AG^BT^BU^BV*MA~
DTP*291*D8*20050401~
EB*D**30*MA~
DTP*292*RD8*20170116-20170120~
EB*C**30*MA**26*1316~
DTP*291*RD8*20170101-20171231~
EB*C**30*MA**29*1316~
DTP*291*RD8*20170101-20171231~
EB*C**30*MA**29*0~
DTP*291*RD8*20170116-20170120~
EB*C**42^45*MA**26*0~
DTP*292*RD8*20170101-20171231~
EB*B**30*MA**26*0~
HSD***DA**30*0~
HSD***DA**31*60~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**30*MA**7*329~
HSD***DA**30*60~
HSD***DA**31*90~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**30*MA**26*0~
HSD***DA**29*60~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**30*MA**7*329~
HSD***DA**29*30~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**30*MA**26*0~
HSD***DA**29*56~
HSD*****26*1~
DTP*435*RD8*20170116-20170120~
EB*B**30*MA**7*329~
HSD***DA**29*30~
HSD*****26*1~
DTP*435*RD8*20170116-20170120~
EB*B**AG*MA**26*0~
HSD***DA**30*0~
HSD***DA**31*20~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**AG*MA**7*164.50~
HSD***DA**30*20~
HSD***DA**31*100~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**AG*MA**26*0~
HSD***DA**29*20~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**AG*MA**7*164.50~
HSD***DA**29*80~
HSD*****26*1~
DTP*435*RD8*20170101-20171231~
EB*B**AG*MA**26*0~
HSD***DA**29*16~
HSD*****26*1~
DTP*435*RD8*20170116-20170120~
EB*B**AG*MA**7*164.50~
HSD***DA**29*80~
HSD*****26*1~
DTP*435*RD8*20170116-20170120~
EB*K**30*MA**32***DY*60~
EB*K**30*MA**33***DY*58~
EB*K**30*MA**7*658~
DTP*435*RD8*20170101-20171231~
EB*K**A7*MA**32***DY*190~
EB*K**A7*MA**33***DY*180~
EB*D**45*MA**26***99*1~
EB*1**30^2^3^5^10^14^23^24^25^26^27^28^33^36^37^38^39^40^42^50^51^52^53^67^69^73^76^83^86^98^A4^A6^A8^AD^AE^AF^AI^AJ^AK^AL^BF^BG^BT^BU^BV^DM^UC*MB~
DTP*291*D8*20050401~
EB*C**30*MB**23*183~
DTP*291*RD8*20170101-20171231~
EB*C**30*MB**29*0~
DTP*291*RD8*20170101-20171231~
EB*A**30*MB**27**.2~
DTP*291*RD8*20170101-20171231~
EB*C**42^67^AJ*MB**23*0~
DTP*292*RD8*20170101-20171231~
EB*A**42^67^AJ*MB**27**0~
DTP*292*RD8*20170101-20171231~
EB*C***MB**23*0******HC|80061~
DTP*292*D8*20171104~
EB*A***MB**27**0*****HC|80061~
DTP*292*D8*20171104~
EB*D***MB*********HC|80061~
DTP*348*D8*20130105~
EB*D***MB*********HC|G0117~
DTP*348*D8*20120107~
EB*F**67*MB**22***VS*8~
HSD*VS*6***29~
EB*D**AD*MB***200~
DTP*292*RD8*20170101-20171231~
MSG*USED AMOUNT~
EB*D**AE*MB***0~
DTP*292*RD8*20170101-20171231~
MSG*USED AMOUNT~
EB*F**BF*MB**29***CA*72~
MSG*Professional~
EB*F**BF*MB**29***CA*72~
MSG*Technical~
EB*F**BG*MB*****99*0~
MSG*Professional~
EB*F**BG*MB*****99*0~
MSG*Technical~
EB*F**BG*MB*****99*15~
MSG*Intensive Cardiac Rehabilitation – Professional~
EB*F**BG*MB*****99*15~
MSG*Intensive Cardiac Rehabilitation – Technical~
EB*X**42***26~
DTP*472*RD8*20161222-20170116~
LS*2120~
NM1*PR*2*ORGNAME*****PI*CONTR~
NM1*1P*2******XX*1234567890~
LE*2120~
EB*X************HC|G0180~
DTP*193*D8*20140101~
EB*X************HC|G0179~
DTP*193*D8*20140501~
DTP*193*D8*20140301~
EB*X**45*MA**26~
DTP*292*RD8*20170201-20170301~
MSG*Revocation Code – 1~
LS*2120~
NM1*1P*2******XX*1234567890~
LE*2120~
EB*D**14*MB~
DTP*356*D8*20110601~
DTP*096*D8*20130105~
EB*E**10***23***DB*3~
HSD*FL*2***29~
DTP*292*RD8*20170101-20171231~
EB*R**88*OT~
REF*18*S0000 999~
DTP*292*D8*20130101~
LS*2120~
NM1*PRP*2*ORGNAME~
N3*ADDRESSLINE1*ADDRESSLINE2~
N4*CITY*ST*ZIPCODE~
PER*IC**TE*AAABBBCCCC*UR*www.website.com~
LE*2120~
EB*R**30*IN~
REF*18*H0000 999~
DTP*290*D8*20090101~
MSG*MCO Bill Option Code- C~
LS*2120~
NM1*PRP*2*ORGNAME~
N3*ADDRESSLINE1*ADDRESSLINE2~
N4*CITY*ST*ZIPCODE~
PER*IC**TE*AAABBBCCCC*UR*www.website.com~
LE*2120~
EB*R**30*13~
REF*IG*GROUPCOVERAGEPLANPOLICYNUMBER~
DTP*290*RD8*20110601-20170601~
LS*2120~
NM1*PRP*2*ORGNAME~
N3*ADDRESSLINE1*ADDRESSLINE2~
N4*CITY*ST*ZIPCODE~
LE*2120~
SE*181*0001~
GE*1*1~
IEA*1*111111111~

[feature] parsing multiple nested records under fixed-length parser

Hi,

Are there plans, or is there documentation, for parsing multiple nested objects with the fixed-length parser?

For example, a format like the one below has repeating and nested records: NWR records under GRHNWR, and SPU, SPT, SWR, and SWT records under each NWR record.

HDRPB123456789SAMPLE MEDIA MUSIC                        01.1000000003200123412340713
GRHNWR0000102.100000003035

NWR0000000000000000Song 1 - 1 Pub1/ 0 Wrt                                        00000000100001T000000000100000000            POP000000Y      ORI         TM (BULK)
SPU000000000000000101Pub10    Publisher 10                         E 00000000000700000010              010012000340120009901200 N                                          OG
SPT0000000000000002RMM            002090041800000I2136N001

NWR0000000100000000Song 2 - 1 Pub2/ 0 Wrt // 1 SPU-AM                            00000000100002T000000000200000000            POP000000Y      ORI         TM (BULK)
SPU000000010000000101Pub20    Publisher 20                         E 00000000000700000020              010012000340120009901200 N                                          OG
SPU000000010000000201RMM      SAMPLE MEDIA MUSIC                   AM000000000005470437533745610       010000000340000009900000 N                            5594837       PG
SPT0000000100000003RMM            002090041800000I2136N001

NWR0000000300000000Song 4 - 1 Pub1/ 1 Wrt1                                       00000000100004T000000000400000000            POP000000Y      ORI         TM (BULK)
SPU000000030000000101Pub10    Publisher 10                         E 00000000000700000010              010012000340120009901200 N                                          OG
SPT0000000300000002RMM            002090041800000I2136N001
SWR0000000300000003Wrt100   Writer 100                                   Controlled                     CA00000000000700000100010040000340400009904000 N
SWT0000000300000004Wrt100   004180000000000I2136N001

NWR0000000500000000Song 5 - 2 Pub1,2 / 2 Wrt - 1 new 1 old                       00000000100005T000000000500000000            POP000000Y      ORI         TM (BULK)
SPU000000050000000101Pub10    Publisher 10                         E 00000000000700000010              010010000340100009901000 N                                          OG
SPT0000000500000002RMM            002090041800000I2136N001
SPU000000050000000301Pub50    Publisher 50                         E 00000000000700000050              010010000340100009901000 N                                          OG
SPT0000000500000004RMM            002090041800000I2136N001
SWR0000000500000005Wrt100   Writer 100                                   Controlled                     CA00000000000700000100010030000340300009903000 N
SWT0000000500000006Wrt100   004180000000000I2136N001
SWR0000000500000007Wrt500   Writer 500                                   Controlled                     CA00000000000700000500010030000340300009903000 N
SWT0000000500000008Wrt500   004180000000000I2136N001

GRT000010000000100000163
TRL000010000000100000165
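
To make the nesting being asked about concrete, here is a minimal sketch in plain Go. It does not use omniparser or its schema format; the `Work` type and `groupWorks` function are hypothetical names for illustration only. It assumes the first three characters of every line are the record type, and that each SPU, SPT, SWR, or SWT line belongs to the most recent NWR line above it.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Work is one NWR record together with the detail records that follow it.
// (Hypothetical type, for illustration only.)
type Work struct {
	Header  string   // the raw NWR line
	Details []string // raw SPU/SPT/SWR/SWT lines, in file order
}

// groupWorks scans fixed-length lines and attaches every SPU, SPT, SWR and
// SWT line to the most recent NWR line. Other record types (HDR, GRH, GRT,
// TRL) and lines shorter than three characters are skipped.
func groupWorks(input string) ([]Work, error) {
	var works []Work
	sc := bufio.NewScanner(strings.NewReader(input))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		if len(line) < 3 {
			continue
		}
		switch line[:3] {
		case "NWR":
			works = append(works, Work{Header: line})
		case "SPU", "SPT", "SWR", "SWT":
			if len(works) == 0 {
				return nil, fmt.Errorf("line %d: %s record before any NWR", n, line[:3])
			}
			last := &works[len(works)-1]
			last.Details = append(last.Details, line)
		}
	}
	return works, sc.Err()
}

func main() {
	input := "HDRPB123456789\nGRHNWR0000102.10\n" +
		"NWR00000000 Song 1\nSPU00000000 Pub10\nSPT00000000 RMM\n\n" +
		"NWR00000001 Song 2\nSPU00000001 Pub20\n" +
		"GRT00001\nTRL00001\n"
	works, err := groupWorks(input)
	if err != nil {
		panic(err)
	}
	for _, w := range works {
		fmt.Printf("%s with %d detail records\n", w.Header[:3], len(w.Details))
	}
}
```

A schema-driven parser would express the same parent/child relationship declaratively; the sketch only shows the grouping rule the sample data implies.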

Tab delimiter

Is it possible to use a tab as the file delimiter? If so, what is the proper way to specify it? Should the header row use tabs as well?
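
As a general illustration of tab-delimited parsing in Go, here is a minimal sketch using only the standard library's `encoding/csv` package. It is not omniparser's API, and how an omniparser schema specifies the delimiter is not covered here. The `readTSV` helper is a hypothetical name. With a single reader, the header row is split by the same delimiter as the data rows, so it needs tabs too.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTSV parses tab-separated text and returns all rows, header included.
// (Hypothetical helper, for illustration only.)
func readTSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'      // use a tab instead of the default comma
	r.LazyQuotes = true // tolerate stray double quotes inside fields
	rows, err := r.ReadAll()
	if err != nil {
		return nil, fmt.Errorf("reading tab-delimited input: %w", err)
	}
	return rows, nil
}

func main() {
	rows, err := readTSV("id\tname\n1\tSong 1\n2\tSong 2\n")
	if err != nil {
		panic(err)
	}
	header, data := rows[0], rows[1:]
	fmt.Println(header, len(data))
}
```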

OMNIPARSER Playground Broken

There seems to be an issue with your playground on Heroku.

Disable or change sorting of generated keys

Hello there,

First off, I want to thank you for creating this amazing library and making it open source!

I am currently trying to transform some XML data and noticed that the keys generated in the output file are always sorted alphabetically. This leads to confusing output where keys with incrementing numerical suffixes are out of order:

KEY1
KEY10
KEY2
KEY3
KEY4
...

I wasn't able to find anything in the documentation about this, and searching the code didn't help me either (I found some schema validation methods where sorting occurs, but disabling those didn't change the result).

Is it possible to adjust the sorting behavior, or to disable it altogether and rely on the order of keys in the schema?

EDI parsing failing with error: segment needs min occur 1, but only got 0

I am not able to parse the EDI. The issue can be reproduced from this repo: https://github.com/rohanbr/edi-transformer/tree/feat/edi_parser. When I start the Go application, I get this error: segment no.1 (char[1,1]): segment 'ISA' needs min occur 1, but only got 0