familysearch / gedcom.io
Files for the GEDCOM.io website
https://gedcom.io/community/ doesn't show DNA as a team, should there be?
See FamilySearch/GEDCOM#119 and FamilySearch/GEDCOM#118 for community discussion to date, on which a number of different individuals have weighed in.
Some structures in the file use a void pointer, but do not use an associated PHRASE. Whilst I'm not sure this is strictly illegal, it doesn't really offer any useful information. Could we add a phrase for these void pointers? I currently have validation checks for this and it's breaking my tests.
An optional field in YAML should be contact info of the maintainer of the YAML file, which could be a personal contact, mailing list, etc.
In maximal.ged, we have
- ANUL Y and NO ANUL without a date, asserting both "there was an annulment" and "there wasn't an annulment"
- EMIG and NO EMIG without a date, asserting both "there was an emigration" and "there wasn't an emigration"
- Many ASSO @VOID@ with no PHRASE substructure, as e.g. (omitting some lines)
  0 @I1@ INDI
  1 DEAT
  2 ASSO @VOID@
  3 ROLE PARENT
  2 NOTE Note Text
  This appears to say "someone was a parent of the deceased," a self-evidently true statement. There are many other similar examples throughout maximal.ged. They also seem to violate the spirit of the spec saying
  A voidPtr and PHRASE can be used to describe associations to people not referenced by any INDI record.
- SUBM @VOID@ and OBJE @VOID@ with no substructures
- NAME.SNOTE @VOID@ and ALIA @VOID@ with no substructures
We could leave these, arguing that the file is not designed to be sensible, just to exercise every part of the spec. But I think I would support parsers that flag these as errors and/or as prunable zero-information structures, and hence would rather not have them in maximal.ged.
On page 74 we have:
FAMC (Family child) g7:ADOP-FAMC
The individual or couple that adopted this this individual.
Should read:
The individual or couple that adopted this individual.
https://gedcom.io/migrate/#non-pointer-sour-substructures has this 5.5.1 example:
1 DEAT
2 DATE 1910
2 SOUR Letter from Alice Smith, 13 April 1946
3 PAGE According to his grand-daughter, who was present.
The GEDCOM 5.5.1 spec however has:
SOURCE_CITATION:=
[ /* pointer to source record (preferred)*/
n SOUR @<XREF:SOUR>@ {1:1} p.27
+1 PAGE <WHERE_WITHIN_SOURCE> {0:1} p.64
+1 EVEN <EVENT_TYPE_CITED_FROM> {0:1} p.49
+2 ROLE <ROLE_IN_EVENT> {0:1} p.61
+1 DATA {0:1}
+2 DATE <ENTRY_RECORDING_DATE> {0:1} p.48
+2 TEXT <TEXT_FROM_SOURCE> {0:M} p.63
+3 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} p.37, 26
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} p.43
| /* Systems not using source records */
n SOUR <SOURCE_DESCRIPTION> {1:1} p.61
+1 [CONC|CONT] <SOURCE_DESCRIPTION> {0:M}
+1 TEXT <TEXT_FROM_SOURCE> {0:M} p.63
+2 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} p.37, 26
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} p.43
]
That is, the PAGE substructure is only legal when @XREF:SOUR@ is present; it cannot appear when not using a source record.
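A check for this pattern could be sketched as follows. This is a minimal illustration, not a full 5.5.1 parser: the function name `page_under_inline_sour` is hypothetical, and it assumes well-formed `level tag [value]` lines without CONC/CONT handling.

```python
import re

# A pointer payload looks like @S1@; anything else is an inline source description.
POINTER = re.compile(r"^@[^@]+@$")

def page_under_inline_sour(lines):
    """Return 1-based numbers of PAGE lines whose parent SOUR is not a pointer."""
    problems = []
    stack = []  # (level, tag, value) for each currently open ancestor line
    for number, line in enumerate(lines, start=1):
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        # Close every structure at this level or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if tag == "PAGE" and stack:
            parent_level, parent_tag, parent_value = stack[-1]
            if (parent_tag == "SOUR" and parent_level == level - 1
                    and not POINTER.match(parent_value)):
                problems.append(number)
        stack.append((level, tag, value))
    return problems

sample = [
    "1 DEAT",
    "2 DATE 1910",
    "2 SOUR Letter from Alice Smith, 13 April 1946",
    "3 PAGE According to his grand-daughter, who was present.",
]
print(page_under_inline_sour(sample))  # [4]
```

Run against the migration guide's example, the sketch flags the PAGE line, which matches the reading of the grammar above.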
The maximal70.ged file reuses the same UID for several records, e.g. `bbcc0025-34cb-4542-8cfb-45ba201c9c2c`. For my purposes this is not sufficient for testing.
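Reused UIDs can be found with a few lines of Python. This is only a sketch: `duplicate_uids` is a hypothetical helper, the second UID in the sample is made up for illustration, and the scan looks at any line whose tag is UID without checking which record it belongs to.

```python
from collections import Counter

def duplicate_uids(lines):
    """Return the UID payloads that occur on more than one UID line."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[1] == "UID":
            counts[parts[2]] += 1
    return sorted(uid for uid, count in counts.items() if count > 1)

sample = [
    "0 @I1@ INDI",
    "1 UID bbcc0025-34cb-4542-8cfb-45ba201c9c2c",
    "0 @I2@ INDI",
    "1 UID bbcc0025-34cb-4542-8cfb-45ba201c9c2c",
    "0 @F1@ FAM",
    "1 UID 11111111-2222-3333-4444-555555555555",  # made-up value
]
print(duplicate_uids(sample))
```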
Originally posted by @jl5000 in #42 (comment)
Posts should show the date.
FamilySearch/GEDCOM#75 announces a move to the github discussions board.
Gedcom.io currently has two pages referencing the Google group that would need to be updated:
Here are my results testing the available files with MacFamilyTree 10:
escapes.ged: Works, no issues
extension-record.ged: _LOC is not imported, but user is alerted of this behaviour. A future version will be able to import custom structures to user-defined fields
long-url.ged: Works, no issues
maximal70.ged: Several issues: Duplicate PHON and EMAIL tags in Header are unsupported and silently dropped, „STAT CHALLENGED“ unsupported, PHRASE for ASSO is unsupported, OBJE->FILE->MEDI->PHRASE unsupported, PHRASE from SDATE silently dropped…
minimal70.ged: Works, no issues
remarriage1.ged: Works, no issues
remarriage2.ged: Works, no issues
same-sex-marriage.ged: Works, no issues
spaces.ged: Works, no issues (values are considered empty if they are just spaces)
voidptr.ged: @void@ is currently considered „not set“. This will change in a subsequent release of MacFamilyTree.
I need to investigate maximal70.ged further – this file is of tremendous help.
Best,
Mendel
https://gedcom.io/terms/format has a "superstructures" section per structure.
However a relocated standard structure is supposed to have the same URI as the standard structure, so an extension that has a relocated standard structure can't have a different YAML file.
Should the standard YAML file be updated for each extension that relocates it?
Or is there no way to specify the "superstructures" section, and only rely on the "substructures" section of the extension structure it's relocated under?
I've tried toc_sticky: false and true in the --- settings at the top of the page.
maximal70.gdz has maximal70/gedcom.ged inside rather than just gedcom.ged.
minimal70.gdz is correct
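The layout difference is easy to test for with the standard `zipfile` module. A minimal sketch, assuming the only requirement being checked is that an entry named exactly `gedcom.ged` sits at the archive root; the function name is hypothetical.

```python
import io
import zipfile

def gedzip_has_root_gedcom(data: bytes) -> bool:
    """True if the archive bytes contain an entry named exactly 'gedcom.ged'."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return "gedcom.ged" in archive.namelist()

def make_archive(entry_name: str) -> bytes:
    """Build a tiny in-memory archive with one GEDCOM entry (test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(entry_name, "0 HEAD\n0 TRLR\n")
    return buffer.getvalue()

print(gedzip_has_root_gedcom(make_archive("gedcom.ged")))            # True
print(gedzip_has_root_gedcom(make_archive("maximal70/gedcom.ged")))  # False
```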
I think that testfiles/gedcom70/spaces.ged should be removed. Its internal TYPEs suggest a subtle yet important misunderstanding of the specification: conflating LineVal and payload.
The LineVal is the sequence of characters after a space after the tag and before the newline. It must not be empty and if it starts with an @ it must start with two of them.
The payload is the string made by processing the LineVal and any following CONT pseudo-structures. It de-duplicates a leading @@, adds line breaks for CONTs, and is the empty string if there was no LineVal. It also treats the empty string and the lack of a payload as equivalent; or, in more theoretical language, payloads implicitly cast between Ɛ (the empty string) and ∅ (no string at all) as needed by their structure type's expected payload.
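The distinction can be made concrete with a small sketch. It assumes lines of the form `level tag [LineVal]` with no cross-reference identifier and ignores pointers; `parse_line`, `unescape`, and `payload` are hypothetical names, not part of any existing library.

```python
def parse_line(line):
    """Split a line into (level, tag, LineVal); LineVal is None when absent."""
    level, _, rest = line.partition(" ")
    tag, sep, line_val = rest.partition(" ")
    if sep and line_val == "":
        # A space after the tag promises a LineVal, and a LineVal is never empty.
        raise ValueError("space after tag but no LineVal")
    return int(level), tag, (line_val if sep else None)

def unescape(line_val):
    """Map an absent LineVal to "" and collapse a leading @@ to a single @."""
    if line_val is None:
        return ""
    if line_val.startswith("@@"):
        return line_val[1:]
    return line_val

def payload(lines):
    """Build the payload from a structure's first line plus its CONT lines."""
    level, _tag, first = parse_line(lines[0])
    pieces = [unescape(first)]
    for cont in lines[1:]:
        cont_level, cont_tag, value = parse_line(cont)
        if cont_tag != "CONT" or cont_level != level + 1:
            raise ValueError("expected a CONT line one level deeper")
        pieces.append(unescape(value))
    return "\n".join(pieces)

print(repr(payload(["1 XYZ"])))                          # ''
print(repr(payload(["1 NOTE @@me", "2 CONT second"])))   # '@me\nsecond'
```

In this sketch the empty string stands in for both "" and null, which mirrors the equivalence described above.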
Thus
- `1 XYZ` has no LineVal and has a payload that is equivalently either "" or null.
- `1 XYZ ` (with a trailing space) is a syntax error: it has a space after the tag but no LineVal, which is forbidden by the spec.
- spaces.ged is nonsensical, and should say "with syntax error caused by space but no LineVal"
In the General FAQ it states the following:
How can I use the published specifications?
The specifications may be copied for the purpose of reviewing or programming of genealogical software, provided the notice below is included:
Copyright © 1987, 1989, 1992, 1993, 1995, 1999, 2019, 2021 by The Church of Jesus Christ of Latter-day Saints. All rights reserved.
In the introduction to the specification it states:
In accordance with the Apache 2.0 license that governs this work, any other work that is based on or derived from this work must include a readable copy of the following NOTICE. For more information, please refer to the full copy of the Apache 2.0 license.
NOTICE:
This work comprises, is based on, or is derived from the FAMILYSEARCH GEDCOM™ Specification, © 1984-2022 Intellectual Reserve, Inc. All rights reserved.
“FAMILYSEARCH GEDCOM™” and “FAMILYSEARCH®” are trademarks of Intellectual Reserve, Inc. and may not be used except as allowed by the Apache 2.0 license that governs this work or as expressly authorized in writing and in advance by Intellectual Reserve, Inc.
Is there any way to make these two statements converge to be more consistent?
https://gedcom.io/tools/ points to a number of sample FamilySearch GEDCOM 7.0 and GEDZIP files.
https://github.com/gedcom7code/test-files/tree/main/7 also has some 7.0 GEDCOM files that test some things not currently covered by files in https://github.com/FamilySearch/GEDCOM.io/tree/master/testfiles/gedcom70
Options:
If we go with option 2, we should also consider copying GEDCOM 5.5.1 files from https://github.com/gedcom7code/test-files/tree/main/5 to say https://github.com/FamilySearch/GEDCOM.io/tree/master/testfiles/gedcom551 (which doesn't currently exist).
https://gedcom.io/terms/format has an "extension tags" section which contains extTag values.
However the extTag is simply the tag:
extTag = underscore 1*tagchar
and there seems to be nowhere to list the URI if any. So if we have a documented extension tag that is deployed today, with a URI pointing to a YAML file for it, and then later a standard tag is assigned, it will result in a new standard URI that is semantically equivalent to the documented extension tag's URI. But there seems to be no way to correlate these URIs as having the same meaning. One can add an HTTP redirect perhaps, but that seems fragile.
GEDCOM issue 97 comment from @jl5000 notes that the ADR# fields in maximal.ged might not match best practices.
This translation structure does not have a MIME or LANG payload but it needs at least one.
It would be useful if the maximal file had examples of @ symbols, both within line values (which already exists), and with @ escapes at the beginning of line values, including on CONT lines. This will be useful for testing purposes.
The spec states that BIC only applies to SLGC, however it appears under BAPL on line 402.
GEDCOM 5.5.1 had "AGE STILLBORN" and also said:
Other descriptor values might include, for example, 'stillborn' as a qualifier to BIRTh or `Tribal Custom' as a qualifier to MARRiage.
FamilySearch GEDCOM 7 has only the latter:
Using the subordinate TYPE classification method provides a further classification of
the superstructure but does not change its basic meaning.
Example — A MARR (p.71) with a TYPE could clarify what kind of marriage was
performed:
This classifies the entry as a common law marriage but the event is still a marriage event.
Other descriptor values might include, for example,
“Stillborn” as a qualifier to BIRT (p.61) (birth)
“Tribal Custom” as a qualifier to MARR (marriage)
“College” as a qualifier to GRAD (p.69) (graduation)
See also FACT and EVEN for additional examples.
However, the migration guide (https://gedcom.io/migrate/#age-words) says only:
- STILLBORN was defined to mean 0y and also to imply the existence of a DEAT event
These have been removed from 7.0 in preference for their simpler and more expressive year-based forms. A PHRASE may be used to clarify the age category of a person.
This implies that "AGE STILLBORN" should be replaced with "AGE 0y", adding "DEAT Y" if no DEAT exists.
So if you see this in GEDCOM 5.5.1:
1 BIRT
2 AGE STILLBORN
then in GEDCOM 7 should that be:
1 DEAT
2 AGE 0y
or
1 DEAT
2 AGE 0y
3 PHRASE Stillborn
or
1 BIRT
2 TYPE Stillborn
or
1 BIRT
2 TYPE Stillborn
1 DEAT
2 AGE 0y
3 PHRASE Stillborn
Or are all possibilities legal with no recommendation between them?
These days, surrogacy and sperm donation are common enough that they should have a standard term in an enum-set for PEDI and/or ASSO instead of depending on PHRASE.
From FamilySearch/GEDCOM#122, we had this suggestion:
Example:
https://gedcom.io/exid-type/FamilySearch-PersonId/ABCD-123 should invoke a script that redirects everything underneath https://gedcom.io/exid-type/FamilySearch-PersonId/ where that one might redirect initially to, say, https://familysearch.org/platform/tree/persons/ABCD-123
Jimmy reports we should look at: https://superdevresources.com/redirects-jekyll-github-pages/
We currently have two documented extension substructures in maximal.ged, but there are various extension features not in that file:
Do we want to add these? If so, I can work on a PR to add them.
Per the spec
A URL with scheme file refers to a machine-local file as defined by RFC 8089. Machine-local files must not be used in FamilySearch GEDZIP nor when sharing datasets on the web or with unknown parties, but may be used for close collaboration between parties with known similar file structures.
maximal.gdz violates this rule, as it has a file:///... URI in it.
We could argue that maximal.ged violates it too as it is shared on the web, but I think it's OK because it is not the data that is the purpose of the file, it is the formatting of the data.
So other github projects can incorporate as a submodule, e.g., for CI/CD testing.
Ideally this gedcom.io repo could be public, but barring that, the sample files should be moved to a separate repo.
Get instructions on Github pages to Gordon. Gordon to setup with the IP.
Is there a preferred way of recording Head/Wife/Son/Daughter etc. in INDI.CENS events? I'd like to use a substructure within the event but NOTE seems to be the only way of recording it. I was hoping for something more formal. It would be nice to see at a glance what position they had in the household.
I guess another way of doing it would be to use a single FAM.CENS event instead and then just add a load of FAM.CENS.ASSO substructures?
Thoughts?
The current compatibility guide suggests compatibility with the specification is tied to supporting a wide set of features. I'd rather define it in terms of the alignment between whatever features an application supports and the files they read and write.
As a discussion proposal, perhaps we could define something like the following
The FamilySearch GEDCOM 7 specification contains more than 150 standard structure types appearing in more than 1000 contexts, and many family history applications use only a subset of them. Additionally, many applications implement features that are not (yet) part of the specification. Because of this, compatibility with the specification is dependent on the features implemented by a given application.
The following compatibility categories are defined.
A web search on GEDC "1 WAC" will show that there exist files in the wild that use the WAC tag (and I've personally seen many more such files).
https://raw.githubusercontent.com/camhart/webapp/master/uploads/Ira%20Fulton.ged and http://www.waldensian.info/gedcom-files/CardonAncestors-14Aug2013.txt are examples that show use by RootsMagic 5.0 and 6.0, respectively.
The GEDCOM 5.3 spec defined it:
LDS_INDI_ORD:= {Size=3:4}
[ BAPL | CONL | WAC | ENDL ]
A tag that represents an individual's religious event associated with The Church of Jesus Christ
of Latter-day Saints. (See Appendix A for a definition of these tags.)
but it did not appear in Appendix A. The same was true in GEDCOM 5.0.
The GEDCOM 4.0 spec however did contain the fuller definition of the tag:
WAC Used to identify an event: indicates the temple initiatory ordinances of The Church of Jesus Christ of Latter-day Saints.
Similarly GEDCOM 3.0 did as well:
WAC The temple initiatory ordinances.
Tamura Jones's site https://www.tamurajones.net/GEDCOMTags.xhtml puts it back into the non-standard so-called "5.6" list.
GEDCOM 7.0 changed the tag to INIL:
INIL A religious event where an initiatory ordinance for an individual was performed by priesthood authority in a temple of The Church of Jesus Christ of Latter-day Saints.
We should document that migrating an older GEDCOM file to 7.0 entails changing WAC to INIL, since INIL obsoletes WAC.
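The tag rename itself is mechanical, as the sketch below shows. It is an assumption-laden illustration: `migrate_wac` is a hypothetical helper that renames any line whose tag is WAC and leaves everything else untouched, and it does not attempt the other changes a real 5.x to 7.0 migration needs.

```python
def migrate_wac(lines):
    """Rename the obsolete WAC tag to INIL, leaving all other lines unchanged."""
    migrated = []
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[1] == "WAC":
            parts[1] = "INIL"
        migrated.append(" ".join(parts))
    return migrated

old = ["0 @I1@ INDI", "1 WAC", "2 DATE 1 JAN 1900", "1 NOTE WAC recorded"]
print(migrate_wac(old))
```

Only the tag position is examined, so the word WAC inside a payload (as in the NOTE line) is preserved.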
When an extension structure is being defined, should it be hosted on an external website or hosted in the gedcom.io repository or both? Currently I don't think there is any guidance and we should probably provide some guidance.
A documented extension tag needs a URI, and we say that URI should be a URL, ideally resolvable to the YAML description (see FamilySearch/GEDCOM#330 and FamilySearch/GEDCOM#350).
If the URI points to an external website, then the question of durability of the URL arises (what happens if the external website goes away in 10 years?)
The GEDCOM 5.5.1 spec defines:
PERMANENT_RECORD_FILE_NUMBER:= {Size=1:90}
<REGISTERED_RESOURCE_IDENTIFIER>:<RECORD_IDENTIFIER>
The record number that uniquely identifies this record within a registered network resource. The number will be usable as a cross-reference pointer. The use of the colon (:) is reserved to indicate the separation of the "registered resource identifier" (which precedes the colon) and the unique "record identifier" within that resource (which follows the colon).
https://gedcom.io/terms/v7/RFN then says:
The fragment identifer of this URI (the part after the #) is the
registered resource identifier; the payload of the EXID is the record
identifier within that resource.
All good so far, other than the typo above (identifer). But it then goes on to say:
It is recommend that the 5.5.1 structure 2 RFN xyz:123abc be converted to 7.0 structures 2 EXID xyz 3 TYPE https://gedcom.io/terms/v7/RFN#123abc
The last part has the registered resource identifier and record identifier backwards.
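Following the definition quoted above (the fragment is the registered resource identifier, the EXID payload is the record identifier), a conversion could look like the sketch below. The helper name `rfn_to_exid` is hypothetical, and the output reflects this issue's reading of the text rather than the page's current recommendation.

```python
RFN_TYPE_BASE = "https://gedcom.io/terms/v7/RFN"

def rfn_to_exid(level, rfn_payload):
    """Convert a 5.5.1 RFN payload 'resource:record' into EXID and TYPE lines."""
    resource, sep, record = rfn_payload.partition(":")
    if not sep:
        raise ValueError("RFN payload has no ':' separator")
    return [
        f"{level} EXID {record}",                            # record identifier
        f"{level + 1} TYPE {RFN_TYPE_BASE}#{resource}",      # resource in fragment
    ]

print(rfn_to_exid(2, "xyz:123abc"))
# ['2 EXID 123abc', '3 TYPE https://gedcom.io/terms/v7/RFN#xyz']
```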
The spec says:
Local file URLs must not be used in FamilySearch GEDZIP
But maximal70.ged has:
1 FILE file:///path/to/file1
And so maximal70.gdz does too.
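A pre-packaging check for this rule might be sketched as follows. `local_file_urls` is a hypothetical name, and the scan only looks at the payload of FILE lines; the second FILE line in the sample is invented for contrast.

```python
def local_file_urls(lines):
    """Return (line number, payload) for FILE lines that use the file: scheme."""
    hits = []
    for number, line in enumerate(lines, start=1):
        parts = line.strip().split(" ", 2)
        if (len(parts) == 3 and parts[1] == "FILE"
                and parts[2].lower().startswith("file:")):
            hits.append((number, parts[2]))
    return hits

sample = [
    "0 @O1@ OBJE",
    "1 FILE file:///path/to/file1",
    "1 FILE media/photo.jpg",  # made-up relative path for contrast
]
print(local_file_urls(sample))  # [(2, 'file:///path/to/file1')]
```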
In the YAML file format, superstructures is not allowed for enumeration values.
This limitation resulted in bugs like #100 since it could not be validated. If instead, https://gedcom.io/terms/v7/enum-BIC had had something like
superstructures:
"https://gedcom.io/terms/v7/SLGC": "{0:1}"
then the bug with
1 BAPL
2 STAT BIC
would have been automatically detected.
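To illustrate how a validator could use such data, here is a minimal sketch. The table `ENUM_SUPERSTRUCTURES` is hypothetical and uses bare tags instead of URIs for brevity; it stands in for what the proposed `superstructures` entry on the enumeration value would provide.

```python
# Hypothetical table: enumeration value -> tags of the structures that may
# contain a STAT carrying that value (mirrors the proposed YAML entry).
ENUM_SUPERSTRUCTURES = {"BIC": {"SLGC"}}

def misplaced_enums(lines):
    """Return 1-based numbers of STAT lines whose value is not allowed there."""
    problems = []
    stack = []  # (level, tag) for each currently open ancestor line
    for number, line in enumerate(lines, start=1):
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        while stack and stack[-1][0] >= level:
            stack.pop()
        if tag == "STAT" and value in ENUM_SUPERSTRUCTURES and stack:
            parent_tag = stack[-1][1]
            if parent_tag not in ENUM_SUPERSTRUCTURES[value]:
                problems.append(number)
        stack.append((level, tag))
    return problems

print(misplaced_enums(["1 BAPL", "2 STAT BIC"]))  # [2]
print(misplaced_enums(["1 SLGC", "2 STAT BIC"]))  # []
```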
Here's my feedback on the initial compatibility guide, which I'm filing in an issue since the guide wasn't done via a pull request where I would have commented on it there...
READ FAMILYSEARCH GEDCOM FILE
The vendor needs to demonstrate the ability to read a sample FamilySearch GEDCOM file and display the contents in there own environement.
One of the sample files provided on the tools page is https://gedcom.io/testfiles/gedcom70/minimal70.ged and simply reading that file (which is "a sample FamilySearch GEDCOM file") and displaying the contents is not very interesting from a compatibility perspective.
WRITE FAMILYSEARCH GEDCOM FILE
The vendor needs to demonstrate the ability to write a sample FamilySearch GEDCOM file and prove that it can be read by a FamilySearch GEDCOM Compatible product.
Similarly, the ability to write the minimal70.ged file without any real information in it is not very interesting from a compatibility perspective.
In my opinion, a more useful approach would be a score, rather than a binary yes/no. For example, we might create a somewhat "maximal" file that uses all the standard tags, enum values, etc. in legal ways, specifically reference that file in the requirement, and then provide some scale or metric on how much of the data was preserved after being read (vs being lost).
If an implementation can ONLY read but not write, it might be a bunch of effort to come up with the appropriate score by verifying what was and was not preserved, but it is not impossible. Desktop applications and some websites on the other hand would want to both read and write, and for that I think it's actually more practical to score Read+Write compliance together. That is: read the maximal file into a local environment, and then write a GEDCOM file back out of the local environment, verify the result passes a GEDCOM validator (like the one on the Tools page) and compare the output with the original input, which is relatively straightforward to compute a score for (how much info was preserved vs lost) without any manual effort. (This is similar to what we do with our desktop app in our github CI/CD today.)
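One way such a Read+Write score could be computed is sketched below. This is an illustration only, not the method used in any existing CI: `flatten` and `preservation_score` are hypothetical names, the score is the fraction of (tag path, payload) pairs from the input that reappear in the output, and pointer payloads are compared literally, so an application that renumbers cross-reference identifiers would be penalized.

```python
from collections import Counter

def flatten(lines):
    """Turn GEDCOM lines into (tag path, payload) pairs, e.g. ('INDI.NAME', ...)."""
    items = []
    path = []  # tags of the open ancestors, indexed by level (levels start at 0)
    for line in lines:
        fields = line.split(" ")
        level, rest = int(fields[0]), fields[1:]
        if level == 0 and rest and rest[0].startswith("@"):
            rest = rest[1:]  # drop the record's cross-reference identifier
        tag, payload = rest[0], " ".join(rest[1:])
        del path[level:]
        path.append(tag)
        items.append((".".join(path), payload))
    return items

def preservation_score(original, round_tripped):
    """Fraction of the original's structures that survive the round trip."""
    before = Counter(flatten(original))
    after = Counter(flatten(round_tripped))
    total = sum(before.values())
    if total == 0:
        return 1.0
    return sum((before & after).values()) / total

original = ["0 HEAD", "1 GEDC", "2 VERS 7.0", "0 @I1@ INDI",
            "1 NAME John /Doe/", "0 TRLR"]
lossy = [line for line in original if not line.startswith("1 NAME")]
print(preservation_score(original, lossy))  # 5 of 6 structures preserved
```

A multiset comparison is used so that repeated structures (for example several NOTE lines with the same text) each count once.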
That same approach BTW could be used to provide a migration score (read a "maximal" legal GEDCOM 5.5.1 file, write it to FamilySearch GEDCOM 7, and provide an automated score based on the result).
And I posit that there is no demand for some implementation that would want to write but not read GEDCOM 7 (the reverse being much more common), but would be happy to be proven wrong. If there is such a demand, then I would say the requirement would be to have the implementation generate a file containing as many different tags as it supports (this may be hard to verify so best effort and honor system is probably the only practical way), and then verify it passes an official GEDCOM validator like the one on the Tools page as compliant. That will still not measure things like use of extension tags that should have been standard tags instead, which I don't know how to score. Just saying "prove that it can be read by a FamilySearch GEDCOM Compatible product" to me is too open ended because you could have 2 products that are compatible with each other, but nothing else, which isn't very helpful if they are niche products used by largely disjoint communities.
Currently https://gedcom.io/terms/format has:
Key: extension tags
Type: seq of extTag
Required by: *
Allowed by: types calendar, enumeration, month, structure
* Required instead of allowed if no standard tag is provided
A list, with the most-preferred tag first, of extension tags known to be used by applications for this concept.
Standard structures may have an extension tags entry to list fully compatible extensions that predated the standard and can be converted to the standard tag without any other modification.
For example, 7.0's UID structure is fully compatible with the common 5.5.1 extension identified by tag _UID.
However, extTag is ambiguous. That is, two separate applications might use the same extTag with very different meanings, even under the same superstructure. As such, simply listing the extTag under extension tags can cause tools that consume the YAML to do the wrong thing with GEDCOM files. A URI on the other hand would be unambiguous. So would the combination of HEAD.SOUR payload plus extTag.
I claim that the new subsumes key can be used to more accurately represent the intent of the extension tags key, in an unambiguous way. Now that we have subsumes, I believe extension tags provides no real value and I would propose replacing standard tag and extension tags with just tag (which could be a standard tag or an extTag) and the existing subsumes.
To resolve ambiguity of different applications using the same extTag, a URI is required for use with subsumes, even for undocumented extension tags. A proposal to construct such a URI is:
SCHMA
HEAD.SOUR payload is itself a URI as suggested by https://gedcom.io/specifications/FamilySearchGEDCOMv7.html#HEAD-SOUR, construct the extension URI as: HEAD.SOUR payload / extTag
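The HEAD.SOUR rule could be sketched like this. It covers only the case spelled out above; `extension_uri` is a hypothetical name, "is itself a URI" is approximated as "parses with both a scheme and a host", and `https://example.com/app` is a placeholder value.

```python
from urllib.parse import urlparse

def extension_uri(head_sour, ext_tag):
    """Return 'HEAD.SOUR payload / extTag', or None if the payload is not a URL."""
    parsed = urlparse(head_sour)
    if not (parsed.scheme and parsed.netloc):
        return None  # this rule does not apply; another rule would be needed
    return head_sour.rstrip("/") + "/" + ext_tag

print(extension_uri("https://example.com/app", "_LOC"))  # https://example.com/app/_LOC
print(extension_uri("MyApp", "_LOC"))                    # None
```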