gedcom.io's People

Contributors

bigderekwatkins, clarkegj, dapug, davidmstraub, dependabot[bot], dthaler, dthaler2, fisharebest, funwithbots, jfcardinal, jimmyz, tychonievich

gedcom.io's Issues

maximal70.ged contains @VOID@ without PHRASE

Some structures in the file use a void pointer, but do not use an associated PHRASE. Whilst I'm not sure this is strictly illegal, it doesn't really offer any useful information. Could we add a phrase for these void pointers? I currently have validation checks for this and it's breaking my tests.
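A minimal sketch of the kind of validation check mentioned above, assuming a simplified line grammar and treating "no PHRASE directly under the @VOID@ pointer" as the condition to flag. The function name and the sample data are made up for illustration; this is not taken from any existing validator.

```python
import re

# Simplified line pattern: level, optional xref, tag, optional value.
LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")

def void_without_phrase(text):
    """Return (line_number, tag) for each @VOID@ pointer lacking a PHRASE child."""
    rows = []
    for number, raw in enumerate(text.splitlines(), start=1):
        match = LINE.match(raw)
        if match:
            level, _xref, tag, value = match.groups()
            rows.append((number, int(level), tag, value or ""))
    findings = []
    for index, (number, level, tag, value) in enumerate(rows):
        if value != "@VOID@":
            continue
        has_phrase = False
        for _n, child_level, child_tag, _v in rows[index + 1:]:
            if child_level <= level:
                break  # left the block of substructures
            if child_level == level + 1 and child_tag == "PHRASE":
                has_phrase = True
                break
        if not has_phrase:
            findings.append((number, tag))
    return findings

# Hypothetical sample: the first ASSO has no PHRASE, the second does.
sample = """0 @I1@ INDI
1 DEAT
2 ASSO @VOID@
3 ROLE PARENT
2 ASSO @VOID@
3 PHRASE Mr Stockdale
3 ROLE OTHER
0 TRLR"""
print(void_without_phrase(sample))
```

Whether such a finding should be an error or only a warning is exactly the open question in this issue, so the sketch only reports locations.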

Possible semantic contradictions in maximal.ged

In maximal.ged, we have

  • Semantic contradictions
    • FAM F1 has both ANUL Y and NO ANUL without a date, asserting both "there was an annulment" and "there wasn't an annulment"
    • INDI I1 has both EMIG and NO EMIG without a date, asserting both "there was an emigration" and "there wasn't an emigration"
  • Semantically-void structures
    • Many ASSO @VOID@ with no PHRASE substructure, as e.g. (omitting some lines)

      0 @I1@ INDI
      1 DEAT
      2 ASSO @VOID@
      3 ROLE PARENT
      2 NOTE Note Text

      This appears to say "someone was a parent of the deceased," a self-evidently true statement. There are many other similar examples throughout maximal.ged. They also seem to violate the spirit of the spec saying

      A voidPtr and PHRASE can be used to describe associations to people not referenced by any INDI record.

  • Meaningless structures
    • FAM F1 has a SUBM @VOID@ and OBJE @VOID@ with no substructures
    • INDI I1 has a NAME.SNOTE @VOID@ and ALIA @VOID@ with no substructures

We could leave these, arguing that the file is not designed to be sensible, just to exercise every part of the spec. But I think I would support parsers that flag these as errors and/or as prunable zero-information structures, and hence I would rather not have them in maximal.ged.
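For the first category, a parser could flag the contradiction with something like the following sketch. It assumes that only an undated level-1 NO is an unconditional "never happened" assertion, and it ignores the harder case of a dated NO overlapping a dated event. The function name and sample are illustrative only.

```python
def undated_contradictions(text):
    """Return (record, tag) pairs asserted both by an event and by an undated NO."""
    findings = []
    record, events, negations = None, set(), []
    current_no = None  # the [tag, has_date] entry we are inside, if any

    def flush():
        # Compare the NO structures of the finished record with its other level-1 tags.
        for tag, has_date in negations:
            if not has_date and tag in events and (record, tag) not in findings:
                findings.append((record, tag))

    for raw in text.splitlines():
        level, _, rest = raw.partition(" ")
        tag, _, value = rest.partition(" ")
        if level == "0":
            flush()
            record, events, negations, current_no = tag, set(), [], None
        elif level == "1":
            current_no = None
            if tag == "NO":
                current_no = [value, False]
                negations.append(current_no)
            else:
                events.add(tag)
        elif level == "2" and tag == "DATE" and current_no is not None:
            current_no[1] = True  # a dated NO only denies the event for that period
    flush()
    return findings

# Hypothetical sample: ANUL is contradicted, DIV is not (its NO is dated).
sample = """0 @F1@ FAM
1 ANUL Y
1 NO ANUL
1 NO DIV
2 DATE FROM 1700 TO 1800
1 DIV
2 DATE 1850
0 TRLR"""
print(undated_contradictions(sample))
```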

typo in FAMC definition

On page 74 we have:
FAMC (Family child) g7:ADOP-FAMC
The individual or couple that adopted this this individual.

Should read:
The individual or couple that adopted this individual.

Migration guide example has invalid GEDCOM 5.5.1

https://gedcom.io/migrate/#non-pointer-sour-substructures has this 5.5.1:

1 DEAT
2 DATE 1910
2 SOUR Letter from Alice Smith, 13 April 1946
3 PAGE According to his grand-daughter, who was present.

The GEDCOM 5.5.1 spec however has:

SOURCE_CITATION:=
[ /* pointer to source record (preferred)*/
n SOUR @<XREF:SOUR>@ {1:1} p.27
+1 PAGE <WHERE_WITHIN_SOURCE> {0:1} p.64
+1 EVEN <EVENT_TYPE_CITED_FROM> {0:1} p.49
+2 ROLE <ROLE_IN_EVENT> {0:1} p.61
+1 DATA {0:1}
+2 DATE <ENTRY_RECORDING_DATE> {0:1} p.48
+2 TEXT <TEXT_FROM_SOURCE> {0:M} p.63
+3 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} p.37, 26
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} p.43
| /* Systems not using source records */
n SOUR <SOURCE_DESCRIPTION> {1:1} p.61
+1 [CONC|CONT] <SOURCE_DESCRIPTION> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:M} p.63
+2 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} p.37, 26
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} p.43
]

That is, the PAGE substructure is only legal when @XREF:SOUR@ is present; it cannot appear when a source record is not used.
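A rough check for this case, assuming the grammar quoted above and a simplified line format: it reports PAGE lines that sit directly under a SOUR whose value is a description rather than a pointer. The function name is hypothetical.

```python
import re

POINTER = re.compile(r"^@[^@]+@$")

def page_under_description(lines):
    """Return line numbers of PAGE structures directly under a non-pointer SOUR."""
    findings = []
    sour_level = None  # level of the enclosing description-style SOUR, if any
    for number, raw in enumerate(lines, start=1):
        level_text, _, rest = raw.partition(" ")
        if not level_text.isdigit():
            continue  # skip blank or malformed lines
        tag, _, value = rest.partition(" ")
        level = int(level_text)
        if sour_level is not None and level <= sour_level:
            sour_level = None  # left the SOUR block
        if tag == "SOUR" and level > 0:
            # HEAD.SOUR is also matched here, but it has no PAGE child to report.
            sour_level = None if POINTER.match(value) else level
        elif sour_level is not None and level == sour_level + 1 and tag == "PAGE":
            findings.append(number)
    return findings

migration_example = [
    "1 DEAT",
    "2 DATE 1910",
    "2 SOUR Letter from Alice Smith, 13 April 1946",
    "3 PAGE According to his grand-daughter, who was present.",
]
print(page_under_description(migration_example))
```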

Sample Files Usefulness

Here are my results testing the available files with MacFamilyTree 10:

escapes.ged: Works, no issues

extension-record.ged: _LOC is not imported, but user is alerted of this behaviour. A future version will be able to import custom structures to user-defined fields

long-url.ged: Works, no issues

maximal70.ged: Several issues: Duplicate PHON and EMAIL tags in Header are unsupported and silently dropped, „STAT CHALLENGED“ unsupported, PHRASE for ASSO is unsupported, OBJE->FILE->MEDI->PHRASE unsupported, PHRASE from SDATE silently dropped…

minimal70.ged: Works, no issues

remarriage1.ged: Works, no issues

remarriage2.ged: Works, no issues

same-sex-marriage.ged: Works, no issues

spaces.ged: Works, no issues (values are considered empty if they are just spaces)

voidptr.ged: @VOID@ is currently considered „not set“. This will change in a subsequent release of MacFamilyTree.

I need to investigate maximal70.ged further – this file is of tremendous help.

Best,
Mendel

gedcom-registries discussion

  • Currently gedcom-registries is only mentioned on the Community page, not Tools or Guides or Specifications. This seems like a gap... perhaps Tools is most appropriate?
  • Currently the registry path puts all structure YAML files in the same directory, regardless of version. Of course only v7 URIs exist currently, but there could be v7.1 or v8 URIs in the future, which would need separate files (e.g., because of different superstructure lists or substructure lists). Also there could be v5.5.1 extension structures that one wants to register now as well. Shouldn't we create a directory structure, e.g., by version?

Listing superstructures of relocated standard structures

https://gedcom.io/terms/format has a "superstructures" section per structure.
However a relocated standard structure is supposed to have the same URI as the standard structure, so an extension that has a relocated standard structure can't have a different YAML file.

Should the standard YAML file be updated for each extension that relocates it?
Or is there no way to specify the "superstructures" section, and only rely on the "substructures" section of the extension structure it's relocated under?

example file `spaces.ged` violates the spec

I think that testfiles/gedcom70/spaces.ged should be removed. Its internal TYPEs suggest a subtle yet important misunderstanding of the specification: conflating LineVal and payload.

The LineVal is the sequence of characters after the space that follows the tag and before the newline. It must not be empty, and if it starts with an @ it must start with two of them.

The payload is the string made by processing the LineVal and any following CONT pseudo-structures. It de-duplicates a leading @@, adds line breaks for CONTs, and is the empty string if there was no LineVal. It also treats the empty string and the lack of a payload as equivalent; or, in more theoretical language, payloads implicitly cast between Ɛ (the empty string) and ∅ (no string at all) as needed by their structure type's expected payload.

Thus

  • the line "1 XYZ" (nothing after the tag) has no LineVal and has a payload that is equivalently either "" or null.
  • the line "1 XYZ " (a trailing space after the tag) is a syntax error: it has a space after the tag but no LineVal, which is forbidden by the spec
  • the phrase "with space but no payload" in spaces.ged is nonsensical, and should say "with syntax error caused by space but no LineVal"
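The distinction can be sketched in code. This is a simplified reading of the rules described above, not the spec's full grammar: the regular expression, the function names, and the choice to raise ValueError are all assumptions made for illustration.

```python
import re

# level, optional xref, tag, then optionally a space delimiter and a LineVal.
LINE = re.compile(r"^(\d+) (?:(@[^@]+@) )?([A-Z0-9_]+)( (.*))?$")

def line_val(raw):
    """Return the LineVal of one line, or None if the line has none."""
    match = LINE.match(raw)
    if match is None:
        raise ValueError(f"not a GEDCOM line: {raw!r}")
    delimiter, value = match.group(4), match.group(5)
    if delimiter is None:
        return None  # "1 XYZ": no LineVal at all
    if value == "":
        raise ValueError("space after the tag but no LineVal")  # "1 XYZ "
    return value

def payload(raw_lines):
    """Join a line and its CONT lines into a payload; '' when there is no LineVal."""
    pieces = []
    for raw in raw_lines:
        value = line_val(raw) or ""
        if value.startswith("@@"):
            value = value[1:]  # de-duplicate the leading @@
        pieces.append(value)
    return "\n".join(pieces)

print(repr(payload(["1 XYZ"])))
print(repr(payload(["1 NOTE @@me", "2 CONT", "2 CONT second"])))
```

Here a missing LineVal becomes the empty payload, while a trailing space with nothing after it is rejected before any payload is computed.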

Copyright statements

In the General FAQ it states the following:

How can I use the published specifications?
The specifications may be copied for the purpose of reviewing or programming of genealogical software, provided the notice below is included:

Copyright © 1987, 1989, 1992, 1993, 1995, 1999, 2019, 2021 by The Church of Jesus Christ of Latter-day Saints. All rights reserved.

In the introduction to the specification it states:

In accordance with the Apache 2.0 license that governs this work, any other work that is based on or derived from this work must include a readable copy of the following NOTICE. For more information, please refer to the full copy of the Apache 2.0 license.

NOTICE:

This work comprises, is based on, or is derived from the FAMILYSEARCH GEDCOM™ Specification, © 1984-2022 Intellectual Reserve, Inc. All rights reserved.

“FAMILYSEARCH GEDCOM™” and “FAMILYSEARCH®” are trademarks of Intellectual Reserve, Inc. and may not be used except as allowed by the Apache 2.0 license that governs this work or as expressly authorized in writing and in advance by Intellectual Reserve, Inc.

Is there any way to make these two statements converge to be more consistent?

gedcom7code/test-files vs FamilySearch/GEDCOM.io/testfiles

https://gedcom.io/tools/ points to a number of sample FamilySearch GEDCOM 7.0 and GEDZIP files.

https://github.com/gedcom7code/test-files/tree/main/7 also has some 7.0 GEDCOM files that test some things not currently covered by files in https://github.com/FamilySearch/GEDCOM.io/tree/master/testfiles/gedcom70

Options:

  1. Update https://github.com/FamilySearch/GEDCOM.io/blob/master/_pages/tools.md#example-familysearch-gedcom-70-files to also point to files in the gedcom7code/test-files repository.
  2. Copy such files into the FamilySearch/GEDCOM.io repository. (This would be my preference.)

If we go with option 2, we should also consider copying GEDCOM 5.5.1 files from https://github.com/gedcom7code/test-files/tree/main/5 to say https://github.com/FamilySearch/GEDCOM.io/tree/master/testfiles/gedcom551 (which doesn't currently exist).

Add a next-minor branch with maximal70.ged

  • INDI.SEX.EXID (multiple).TYPE
  • INDI.FAMC.EXID (multiple).TYPE
  • REPO.RESN
  • SNOTE.RESN
  • <<SOURCE_RECORD>>.RESN
  • SUBM.RESN
  • <<EVENT_DETAIL>>.DATE.TIME.PHRASE
  • <<EVENT_DETAIL>>.SDATE.TIME.PHRASE
  • <<EVENT_DETAIL>>.EXID (multiple).TYPE
  • <<LDS_ORDINANCE_DETAIL>>.DATE.TIME.PHRASE
  • NOTE.RESN
  • NOTE.<<IDENTIFIER_STRUCTURE>> (multiple)
  • NAME.LANG
  • NAME.EXID (multiple).TYPE
  • SOUR.RESN
  • SOUR.DATA.DATE.TIME.PHRASE
  • SOUR.EXID (multiple).TYPE

Extension tag URIs in YAML file

https://gedcom.io/terms/format has an "extension tags" section which contains extTag values.

However the extTag is simply the tag:

extTag  = underscore 1*tagchar

and there seems to be nowhere to list the URI if any. So if we have a documented extension tag that is deployed today, with a URI pointing to a YAML file for it, and then later a standard tag is assigned, it will result in a new standard URI that is semantically equivalent to the documented extension tag's URI. But there seems to be no way to correlate these URIs as having the same meaning. One can add an HTTP redirect perhaps, but that seems fragile.

Request for @ escapes in maximal70.ged

It would be useful if the maximal file had examples of @ symbols, both within line values (which already exist), and with @ escapes at the beginning of line values, including on CONT lines. This will be useful for testing purposes.
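To make the requested cases concrete, here is a small sketch of how a writer might serialize a string payload, assuming that only a leading @ on each line needs doubling and that line breaks become CONT lines. The function name is hypothetical, and pointer payloads are out of scope.

```python
def payload_to_lines(level, tag, payload):
    """Serialize a string payload as one line plus CONT lines, escaping leading @."""
    def escape(text):
        # Only a leading @ is doubled; @ elsewhere in the value is left alone.
        return "@" + text if text.startswith("@") else text

    first, *rest = payload.split("\n")
    lines = [f"{level} {tag} {escape(first)}" if first else f"{level} {tag}"]
    for part in rest:
        # An empty part becomes a bare CONT with no trailing space.
        lines.append(f"{level + 1} CONT {escape(part)}" if part else f"{level + 1} CONT")
    return lines

for line in payload_to_lines(1, "NOTE", "me@example.com\n@handle\n\n@@x"):
    print(line)
```

A test file covering these cases would let a reader check both directions: that the leading @@ collapses on import and that it is restored on export.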

Confusion around how to represent stillborn

GEDCOM 5.5.1 had "AGE STILLBORN" and also said:

Other descriptor values might include, for example, 'stillborn' as a qualifier to BIRTh or `Tribal Custom' as a qualifier to MARRiage.

FamilySearch GEDCOM 7 has only the latter:

Using the subordinate TYPE classification method provides a further classification of
the superstructure but does not change its basic meaning.
Example — A MARR (p.71) with a TYPE could clarify what kind of marriage was
performed:
This classifies the entry as a common law marriage but the event is still a marriage event.
Other descriptor values might include, for example,
“Stillborn” as a qualifier to BIRT (p.61) (birth)
“Tribal Custom” as a qualifier to MARR (marriage)
“College” as a qualifier to GRAD (p.69) (graduation)
See also FACT and EVEN for additional examples.

However, the migration guide (https://gedcom.io/migrate/#age-words) says only:

  • STILLBORN was defined to mean 0y and also to imply the existence of a DEAT event

These have been removed from 7.0 in preference for their simpler and more expressive year-based forms. A PHRASE may be used to clarify the age category of a person.

This implies that "AGE STILLBORN" should be replaced with "AGE 0y", with "DEAT Y" added if no DEAT exists.

So if you see this in GEDCOM 5.5.1:

1 BIRT
2 AGE STILLBORN

then in GEDCOM 7 should that be:

1 DEAT
2 AGE 0y

or

1 DEAT
2 AGE 0y
3 PHRASE Stillborn

or

1 BIRT
2 TYPE Stillborn

or

1 BIRT
2 TYPE Stillborn
1 DEAT
2 AGE 0y
3 PHRASE Stillborn

Or are all possibilities legal with no recommendation between them?

Parentage types (PEDI? ASSO?)

These days, surrogacy and sperm donation are common enough that they should have a standard term in an enum-set for PEDI and/or ASSO instead of depending on PHRASE.

Some extension features not in maximal.ged

We currently have two documented extension substructures in maximal.ged, but there are various extension features not in that file:

  • Undocumented extension
    • record
    • substructure
    • enumeration value
    • calendar
  • Documented extension
    • record
    • enumeration value
    • calendar
    • re-use of tag for two different URIs
  • Relocated standard structure
  • Extension-defined substructure

Do we want to add these? If so, I can work on a PR to add them

Cannot use file:/// in maximal.gdz

Per the spec

A URL with scheme file refers to a machine-local file as defined by RFC 8089. Machine-local files must not be used in FamilySearch GEDZIP nor when sharing datasets on the web or with unknown parties, but may be used for close collaboration between parties with known similar file structures.

maximal.gdz violates this rule, as it has a file:///... URI in it.

We could argue that maximal.ged violates it too, as it is shared on the web, but I think it's OK because the purpose of the file is not the data itself but the formatting of the data.
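A check for this rule could look like the following sketch. It works on the GEDCOM text only (reading it out of the .gdz archive is left out) and assumes that the FILE payloads are the only place such URLs need to be checked. The function name and sample values are made up.

```python
from urllib.parse import urlparse

def local_file_references(gedcom_text):
    """Return (line_number, payload) for each FILE payload using the file: scheme."""
    findings = []
    for number, raw in enumerate(gedcom_text.splitlines(), start=1):
        parts = raw.split(" ", 2)
        if len(parts) == 3 and parts[1] == "FILE":
            if urlparse(parts[2]).scheme.lower() == "file":
                findings.append((number, parts[2]))
    return findings

# Hypothetical sample with one machine-local file and one relative path.
sample = """0 @O1@ OBJE
1 FILE file:///unix/absolute
2 FORM image/bmp
1 FILE media/original.mp3
2 FORM audio/mp3"""
print(local_file_references(sample))
```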

GEDCOM.io sample files should be in a public repo

So that other GitHub projects can incorporate them as a submodule, e.g., for CI/CD testing.

Ideally this gedcom.io repo could be public, but barring that, the sample files should be moved to a separate repo.

Recording relations in CENS events

Is there a preferred way of recording Head/Wife/Son/Daughter etc. in INDI.CENS events? I'd like to use a substructure within the event but NOTE seems to be the only way of recording it. I was hoping for something more formal. It would be nice to see at a glance what position they had in the household.

I guess another way of doing it would be to use a single FAM.CENS event instead and then just add a load of FAM.CENS.ASSO substructures?

Thoughts?

Defining compatibility without referencing specific features

The current compatibility guide suggests compatibility with the specification is tied to supporting a wide set of features. I'd rather define it in terms of the alignment between whatever features an application supports and the files they read and write.

As a discussion proposal, perhaps we could define something like the following


The FamilySearch GEDCOM 7 specification contains more than 150 standard structure types appearing in more than 1000 contexts, and many family history applications use only a subset of them. Additionally, many applications implement features that are not (yet) part of the specification. Because of this, compatibility with the specification is dependent on the features implemented by a given application.

The following compatibility categories are defined.

Import Compliance
The application can successfully import any file that conforms to the specification.
Export Compliance
Every file exported by the application with a HEAD.GEDC.VERS is a valid file as defined by the identified version of the specification.
Import Coverage
For each component of the application's data model, if a standard structure in the imported file corresponds to that component then that component is set to match that structure during import.
Export Coverage
For each component of the application's data model, if a standard structure is available in the specification to represent that component then that structure is used to represent that component during export.
Import Transparency
The application alerts the user of any structures in an imported file that are not fully imported into the application's data model.
Export Transparency
The application alerts the user of any structures in the application's internal data that are not fully represented in the exported file. This is trivially achieved if the application has lossless exports.
Lossless Exports
The application loses no data if it (1) exports a FamilySearch GEDCOM 7 file, (2) clears its internal state, and then (3) imports the file it exported. Achieving this may entail the use of extension structures in the exported file.

INIL obsoletes WAC

A web search on GEDC "1 WAC" will show that there exist files in the wild that use the WAC tag (and I've personally seen many more such files).

https://raw.githubusercontent.com/camhart/webapp/master/uploads/Ira%20Fulton.ged and http://www.waldensian.info/gedcom-files/CardonAncestors-14Aug2013.txt are examples that show use by RootsMagic 5.0 and 6.0, respectively.

The GEDCOM 5.3 spec defined it:

LDS_INDI_ORD:= {Size=3:4}
[ BAPL | CONL | WAC | ENDL ]
A tag that represents an individual's religious event associated with The Church of Jesus Christ
of Latter-day Saints. (See Appendix A for a definition of these tags.)

but it did not appear in Appendix A. The same was true in GEDCOM 5.0.
The GEDCOM 4.0 spec however did contain the fuller definition of the tag:

WAC Used to identify an event: indicates the temple initiatory ordinances of The Church of Jesus Christ of Latter-day Saints.

Similarly GEDCOM 3.0 did as well:

WAC The temple initiatory ordinances.

Tamura Jones's site https://www.tamurajones.net/GEDCOMTags.xhtml puts it back into the non-standard so-called "5.6" list.

GEDCOM 7.0 changed the tag to INIL:

INIL A religious event where an initiatory ordinance for an individual was performed by priesthood authority in a temple of The Church of Jesus Christ of Latter-day Saints.

We should document that migrating an older GEDCOM file to 7.0 entails changing WAC to INIL, since INIL obsoletes WAC.
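The tag change itself is mechanical. A minimal sketch, assuming WAC only needs renaming where it appears as a level-1 tag and that its substructures are handled by the rest of the migration (the function name is hypothetical):

```python
def migrate_wac(lines):
    """Rename level-1 WAC structures to INIL, leaving every other line untouched."""
    out = []
    for raw in lines:
        parts = raw.split(" ", 2)
        if len(parts) >= 2 and parts[0] == "1" and parts[1] == "WAC":
            parts[1] = "INIL"
        out.append(" ".join(parts))
    return out

old = ["0 @I1@ INDI", "1 WAC", "2 DATE 5 MAY 1950", "1 NOTE WAC"]
print(migrate_wac(old))
```

Only the tag position is compared, so a payload that happens to contain the text WAC is not changed.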

Extension YAML file location

When an extension structure is being defined, should it be hosted on an external website or hosted in the gedcom.io repository or both? Currently I don't think there is any guidance and we should probably provide some guidance.

A documented extension tag needs a URI, and we say that URI should be a URL, ideally resolvable to the YAML description (see FamilySearch/GEDCOM#330 and FamilySearch/GEDCOM#350).

If the URI points to an external website, then the question of durability of the URL arises (what happens if the external website goes away in 10 years?)

RFN text is incorrect

The GEDCOM 5.5.1 spec defines:

PERMANENT_RECORD_FILE_NUMBER:= {Size=1:90}
<REGISTERED_RESOURCE_IDENTIFIER>:<RECORD_IDENTIFIER>
The record number that uniquely identifies this record within a registered network resource. The number will be usable as a cross-reference pointer. The use of the colon (:) is reserved to indicate the separation of the "registered resource identifier" (which precedes the colon) and the unique "record identifier" within that resource (which follows the colon).

https://gedcom.io/terms/v7/RFN then says:

The fragment identifer of this URI (the part after the #) is the
registered resource identifier; the payload of the EXID is the record
identifier within that resource.

All good so far, other than the typo above (identifer). But it then goes on to say:

It is recommend that the 5.5.1 structure

    2 RFN xyz:123abc

be converted to 7.0 structures

    2 EXID xyz
    3 TYPE https://gedcom.io/terms/v7/RFN#123abc

The last part has the registered resource identifier and record identifier backwards.
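For comparison, here is a sketch of the conversion as this issue reads the definition, with the registered resource identifier in the URI fragment and the record identifier as the EXID payload. The function name is hypothetical, and this reflects the reading argued here rather than published guidance.

```python
RFN_TYPE = "https://gedcom.io/terms/v7/RFN"

def rfn_to_exid(line):
    """Convert one 5.5.1 RFN line into the 7.0 EXID and TYPE lines."""
    level_text, tag, value = line.split(" ", 2)
    if tag != "RFN":
        raise ValueError("not an RFN line")
    # The colon separates the resource identifier (before) from the record identifier (after).
    resource, sep, record = value.partition(":")
    if not sep:
        raise ValueError("RFN payload has no colon")
    level = int(level_text)
    return [f"{level} EXID {record}", f"{level + 1} TYPE {RFN_TYPE}#{resource}"]

print(rfn_to_exid("2 RFN xyz:123abc"))
```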

Read/write compatibility requirements

Here's my feedback on the initial compatibility guide, which I'm filing in an issue since the guide wasn't done via a pull request where I would have commented on it there...

READ FAMILYSEARCH GEDCOM FILE
The vendor needs to demonstrate the ability to read a sample FamilySearch GEDCOM file and display the contents in there own environement.

One of the sample files provided on the tools page is https://gedcom.io/testfiles/gedcom70/minimal70.ged and simply reading that file (which is "a sample FamilySearch GEDCOM file") and displaying the contents is not very interesting from a compatibility perspective.

WRITE FAMILYSEARCH GEDCOM FILE
The vendor needs to demonstrate the ability to write a sample FamilySearch GEDCOM file and prove that it can be read by a FamilySearch GEDCOM Compatible product.

Similarly, the ability to write the minimal70.ged file without any real information in it is not very interesting from a compatibility perspective.

In my opinion, a more useful approach would be a score, rather than a binary yes/no. For example, we might create a somewhat "maximal" file that uses all the standard tags, enum values, etc. in legal ways, specifically reference that file in the requirement, and then provide some scale or metric on how much of the data was preserved after being read (vs being lost).

If an implementation can ONLY read but not write, it might be a bunch of effort to come up with the appropriate score by verifying what was and was not preserved, but it is not impossible. Desktop applications and some websites on the other hand would want to both read and write, and for that I think it's actually more practical to score Read+Write compliance together. That is: read the maximal file into a local environment, and then write a GEDCOM file back out of the local environment, verify the result passes a GEDCOM validator (like the one on the Tools page) and compare the output with the original input, which is relatively straightforward to compute a score for (how much info was preserved vs lost) without any manual effort. (This is similar to what we do with our desktop app in our github CI/CD today.)
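As a rough illustration of such a score, the sketch below compares the structures of the original file with those of the re-exported file and reports the fraction preserved. Everything here is an assumption made for the example: the metric (matching on tag path plus payload), the decision to ignore record identifiers, and the function names. It is not the scoring used in any existing CI setup.

```python
import re
from collections import Counter

# level, optional xref (ignored, since applications may renumber records), tag, value.
LINE = re.compile(r"^(\d+) (?:@[^@]+@ )?(\w+)(?: (.*))?$")

def structures(text):
    """Count (tag path, payload) pairs, e.g. ('INDI.NAME', 'John /Doe/')."""
    counts = Counter()
    stack = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match is None:
            continue
        level, tag, value = int(match.group(1)), match.group(2), match.group(3) or ""
        del stack[level:]  # drop tags deeper than the current level
        stack.append(tag)
        counts[(".".join(stack), value)] += 1
    return counts

def preservation_score(original, exported):
    """Fraction of the original structures that also appear in the export."""
    before, after = structures(original), structures(exported)
    kept = sum((before & after).values())
    total = sum(before.values())
    return kept / total if total else 1.0

original = "0 HEAD\n1 GEDC\n2 VERS 7.0\n0 @I1@ INDI\n1 NAME John /Doe/\n1 SEX M\n0 TRLR"
exported = "0 HEAD\n1 GEDC\n2 VERS 7.0\n0 @X9@ INDI\n1 NAME John /Doe/\n0 TRLR"
print(preservation_score(original, exported))
```

A real metric would also need to follow pointers and tolerate reordering of substructures, which this sketch handles only partly.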

That same approach BTW could be used to provide a migration score (read a "maximal" legal GEDCOM 5.5.1 file, write it to FamilySearch GEDCOM 7, and provide an automated score based on the result).

I posit that there is no demand for an implementation that writes but does not read GEDCOM 7 (the reverse being much more common), but I would be happy to be proven wrong. If there is such a demand, then I would say the requirement would be to have the implementation generate a file containing as many different tags as it supports (this may be hard to verify, so best effort and an honor system is probably the only practical way), and then verify that it passes an official GEDCOM validator like the one on the Tools page. That will still not measure things like the use of extension tags that should have been standard tags instead, which I don't know how to score. Just saying "prove that it can be read by a FamilySearch GEDCOM Compatible product" is, to me, too open-ended, because you could have two products that are compatible with each other but with nothing else, which isn't very helpful if they are niche products used by largely disjoint communities.

extension tags in YAML format

Currently https://gedcom.io/terms/format has:

  • Key: extension tags
    Type: seq of extTag
    Required by: *
    Allowed by: types calendar, enumeration, month, structure

    * Required instead of allowed if no standard tag is provided

    A list, with the most-preferred tag first, of extension tags known to be used by applications for this concept.

    Standard structures may have an extension tags entry to list fully compatible extensions that predated the standard and can be converted to the standard tag without any other modification.
    For example, 7.0's UID structure is fully compatible with the common 5.5.1 extension identified by tag _UID.

However, extTag is ambiguous. That is, two separate applications might use the same extTag with very different meanings, even under the same superstructure. As such, simply listing the extTag under extension tags can cause tools that consume the YAML to do the wrong thing with GEDCOM files. A URI on the other hand would be unambiguous. So would the combination of HEAD.SOUR payload plus extTag.

I claim that the new subsumes key can be used to more accurately represent the intent of the extension tags key, in an unambiguous way. Now that we have subsumes, I believe extension tags provides no real value and I would propose replacing standard tag and extension tags with just tag (which could be a standard tag or an extTag) and the existing subsumes.

To resolve ambiguity of different applications using the same extTag, a URI is required for use with subsumes, even for undocumented extension tags. A proposal to construct such a URI is:

  1. If the tag is a documented extension tag, use the URI provided in the SCHMA
  2. Else, if the HEAD.SOUR payload is itself a URI as suggested by https://gedcom.io/specifications/FamilySearchGEDCOMv7.html#HEAD-SOUR, construct the extension URI as: HEAD.SOUR payload / extTag
  3. Else, construct the extension URI as, say: https://gedcom.io/terms/ext/ HEAD.SOUR payload / extTag
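The three steps could be sketched as follows. Several details are my own assumptions rather than part of the proposal: the SCHMA contents are passed in as a dict, only http and https payloads count as "itself a URI", a trailing slash is trimmed before appending, and the HEAD.SOUR payload is percent-encoded in the fallback case. The function name is hypothetical.

```python
from urllib.parse import quote, urlparse

FALLBACK_BASE = "https://gedcom.io/terms/ext/"  # base taken from step 3 of the proposal

def extension_uri(ext_tag, schema, head_sour):
    """Build a URI for an extension tag following the three proposed steps."""
    # Step 1: a documented extension tag already has a URI in the SCHMA.
    if ext_tag in schema:
        return schema[ext_tag]
    # Step 2: the HEAD.SOUR payload is itself a URI, so extend it with the tag.
    parsed = urlparse(head_sour)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return head_sour.rstrip("/") + "/" + ext_tag
    # Step 3: fall back to a shared base plus the encoded HEAD.SOUR payload.
    return FALLBACK_BASE + quote(head_sour, safe="") + "/" + ext_tag

print(extension_uri("_LOC", {"_LOC": "https://example.com/loc"}, "My App"))
print(extension_uri("_FOO", {}, "https://example.com/app"))
print(extension_uri("_FOO", {}, "My App"))
```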
