Header and Trailer records about cobrix HOT 7 CLOSED

absaoss commented on August 30, 2024
Header and Trailer records

from cobrix.

Comments (7)

yruslan commented on August 30, 2024

There is support for similar files (the record_start_offset and record_end_offset options), but it only works for fixed record length files, and for variable record length files only when record sizes are determined by a length field present in the copybook, not by RDW headers.

We are currently working on custom record header parser support, so that variable record length files that contain no RDW but instead have some custom record size field or discriminator can be loaded.

I also think supporting the record layout above could be useful. I have a couple of questions about it:

  • Are the HEADER and TRAILER records real records? I mean, does each of them contain an RDW header?
  • Do you just want to filter them out, or to extract some information from them?
  • If you just want to filter them out, would filtering records by record size help in this case?

tr11 commented on August 30, 2024

No record contains an RDW header, unfortunately. The layout is really just some info added at the beginning and at the end of each file. I can still read it and discard the first and last records (since those correspond to the header/trailer), but it'd be great if I could extract info from those without either (1) creating an aggregate copybook, or (2) reading the file twice to extract the header.

yruslan commented on August 30, 2024

Is it a fixed record length file?
If so, it can be done using the options above, like this:
https://github.com/AbsaOSS/cobrix/blob/master/spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test2RecordOffsetsSpec.scala

That is, you just specify the number of bytes to skip at the beginning and at the end of each record.
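
To make the per-record semantics concrete, here is a minimal sketch in plain Python. It does not use Cobrix; the function name `iter_record_payloads` and its parameters are made up for illustration, and it assumes every record has the same on-disk size.

```python
from typing import Iterator


def iter_record_payloads(data: bytes, record_size: int,
                         start_offset: int, end_offset: int) -> Iterator[bytes]:
    """Yield the payload of each fixed-length record with its padding stripped.

    record_size is the full on-disk size of one record, including the
    start_offset bytes before the payload and the end_offset bytes after it.
    (All names here are hypothetical, not Cobrix identifiers.)
    """
    if start_offset + end_offset >= record_size:
        raise ValueError("offsets leave no room for a payload")
    if len(data) % record_size != 0:
        raise ValueError("data length is not a multiple of record_size")
    for pos in range(0, len(data), record_size):
        record = data[pos:pos + record_size]
        # Skip bytes at the beginning and at the end of this one record.
        yield record[start_offset:record_size - end_offset]


# Two 8-byte records: 2 bytes of prefix, 4 bytes of payload, 2 bytes of suffix.
raw = b"<<AAAA>>" + b"<<BBBB>>"
payloads = list(iter_record_payloads(raw, record_size=8, start_offset=2, end_offset=2))
```

Note that the skipping is repeated for every record, which is why it does not fit a layout with a single header and trailer per file.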

tr11 commented on August 30, 2024

But that's per record, right? Mine happen at the file level, one header and one trailer per file.

yruslan commented on August 30, 2024

Oh, I see. Sorry, I've misunderstood the issue.

Yes, I think it makes perfect sense to add file header / footer offsets at file level. Adding to the backlog.
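
As a rough illustration of what file-level offsets could mean, here is a plain Python sketch. It is not Cobrix code; `split_file`, `header_size` and `trailer_size` are hypothetical names, and it assumes the header and trailer sizes are known in advance and the body consists of fixed-length records.

```python
from typing import List, NamedTuple


class SplitFile(NamedTuple):
    header: bytes
    records: List[bytes]
    trailer: bytes


def split_file(data: bytes, header_size: int, trailer_size: int,
               record_size: int) -> SplitFile:
    """Split one file into its header, fixed-length records, and trailer.

    The offsets apply once per file, not once per record.
    (Hypothetical helper; all parameter names are assumptions.)
    """
    if header_size + trailer_size > len(data):
        raise ValueError("header and trailer are larger than the file")
    body_end = len(data) - trailer_size
    body = data[header_size:body_end]
    if len(body) % record_size != 0:
        raise ValueError("body length is not a multiple of record_size")
    records = [body[i:i + record_size] for i in range(0, len(body), record_size)]
    return SplitFile(data[:header_size], records, data[body_end:])


# A 7-byte header, two 4-byte records, and a 5-byte trailer.
raw = b"HDR2024" + b"AAAA" + b"BBBB" + b"TRL02"
result = split_file(raw, header_size=7, trailer_size=5, record_size=4)
```

The header and trailer bytes are returned alongside the records, so information can be extracted from them in the same pass instead of reading the file twice.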

tr11 commented on August 30, 2024

This would probably be handled automatically by the new custom record header parser and by the work on issue #33.

We could merge segments from different copybooks into a larger one and process all segments (a potential solution to #33), provided we pass a function that selects the correct segment. In this case, the record's offset within the file determines whether it is a header, a trailer, or a regular record.
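
A minimal sketch of such a selector function, in plain Python rather than Cobrix code. The names `classify_offset` and `iter_segments` are hypothetical, and the sketch assumes each segment type has a known fixed size.

```python
from typing import Iterator, Tuple


def classify_offset(offset: int, file_size: int,
                    header_size: int, trailer_size: int) -> str:
    """Pick the segment a record belongs to from its offset within the file."""
    if offset < header_size:
        return "header"
    if offset >= file_size - trailer_size:
        return "trailer"
    return "record"


def iter_segments(data: bytes, header_size: int, trailer_size: int,
                  record_size: int) -> Iterator[Tuple[str, bytes]]:
    """Walk the file once, tagging each chunk with the segment it belongs to."""
    body_size = len(data) - header_size - trailer_size
    if record_size <= 0 or body_size < 0 or body_size % record_size != 0:
        raise ValueError("sizes do not match the file length")
    sizes = {"header": header_size, "record": record_size, "trailer": trailer_size}
    offset = 0
    while offset < len(data):
        segment = classify_offset(offset, len(data), header_size, trailer_size)
        yield segment, data[offset:offset + sizes[segment]]
        offset += sizes[segment]


# A 2-byte header, two 3-byte records, and a 4-byte trailer.
raw = b"HH" + b"aaa" + b"bbb" + b"TTTT"
segments = list(iter_segments(raw, header_size=2, trailer_size=4, record_size=3))
```

Each tagged chunk could then be decoded with the part of the merged copybook that matches its segment name.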

yruslan commented on August 30, 2024

Yeah, this might be handled by a custom record header parser that returns isValid=false for the header and trailer records. But it might involve too much boilerplate code, so let's keep this issue in the backlog for now.
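
To show the idea (and the boilerplate it implies), here is a rough plain Python sketch. It is not the Cobrix parser interface: `RecordMetadata`, `parse_record_header` and `iter_valid_records` are illustrative names, and the 2-byte big-endian length prefix in front of each record is purely an assumption made so the example is self-contained.

```python
import struct
from typing import Iterator, NamedTuple


class RecordMetadata(NamedTuple):
    record_length: int
    is_valid: bool


HEADER_LEN = 2  # assumed format: 2-byte big-endian payload length per record


def parse_record_header(header: bytes, file_offset: int,
                        file_size: int) -> RecordMetadata:
    """Return the record length and whether the record should be kept.

    The first and the last record of the file are flagged as invalid,
    which drops the file-level header and trailer.
    """
    (length,) = struct.unpack(">H", header)
    is_first = file_offset == 0
    is_last = file_offset + HEADER_LEN + length == file_size
    return RecordMetadata(length, not (is_first or is_last))


def iter_valid_records(data: bytes) -> Iterator[bytes]:
    """Yield the payloads of the records that the parser marks as valid."""
    offset = 0
    while offset < len(data):
        meta = parse_record_header(data[offset:offset + HEADER_LEN], offset, len(data))
        start = offset + HEADER_LEN
        if meta.is_valid:
            yield data[start:start + meta.record_length]
        offset = start + meta.record_length


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length, matching the assumed format."""
    return struct.pack(">H", len(payload)) + payload


raw = frame(b"HEADER") + frame(b"rec-1") + frame(b"rec-22") + frame(b"TRAILER")
kept = list(iter_valid_records(raw))
```

The parser still has to report a length for the skipped records so that the reader can step over them.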

Issue #33 is about loading a hierarchical data structure all at once, without splitting and joining segments. Each child segment will become an array of structs within its parent segment. It absolutely makes sense to ignore records that are neither a parent nor a child (nor a child of a child, etc.). The header and trailer records will (I think) be ignored. Yes, solving #33 could help a lot with loading hierarchical DBs into hierarchical dataframes.
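
The nesting described above can be sketched in plain Python as follows. This is only an illustration of the idea, not how #33 is or will be implemented; the segment ids ("C", "P", "H", "T"), the `nest_segments` name, and the one-level parent/child layout are all assumptions.

```python
from typing import Any, Dict, List, Tuple


def nest_segments(records: List[Tuple[str, str]],
                  parent_id: str, child_id: str) -> List[Dict[str, Any]]:
    """Fold a flat list of (segment_id, payload) records into parents with children.

    Each child record is attached to the most recent parent. Records whose
    segment id is neither the parent nor the child id are ignored, and so
    are children that appear before any parent.
    """
    parents: List[Dict[str, Any]] = []
    for segment_id, payload in records:
        if segment_id == parent_id:
            parents.append({"value": payload, "children": []})
        elif segment_id == child_id and parents:
            parents[-1]["children"].append({"value": payload})
        # Anything else (e.g. a file header or trailer) is skipped.
    return parents


flat = [
    ("H", "file header"),
    ("C", "Company A"),
    ("P", "Contact 1"),
    ("P", "Contact 2"),
    ("C", "Company B"),
    ("P", "Contact 3"),
    ("T", "file trailer"),
]
nested = nest_segments(flat, parent_id="C", child_id="P")
```

Here the "H" and "T" records drop out simply because they match neither segment id.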
