
gis-metadata-parser's People

Contributors

dharvey-consbio

gis-metadata-parser's Issues

Writing secondary properties

The docs do a great job of describing secondary lookup locations, e.g.:

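from gis_metadata.utils import COMPLEX_DEFINITIONS, CONTACTS

# Copy the built-in complex definitions (dict.items() replaces six's iteritems on Python 3)
FGDC_DEFINITIONS = dict({k: dict(v) for k, v in COMPLEX_DEFINITIONS.items()})
FGDC_DEFINITIONS[CONTACTS].update({
    '_name': '{_name}',
    '_organization': '{_organization}'
})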

This then supports reading from a path such as cntorgp/cntorg in an FGDC metadata file. The docs also indicate that secondary properties are parsed, but not validated or written.

Is there any way to make the secondary location the primary one? For example, compliant FGDC has either a primary contact person or a primary contact organization. Right now, I am not sure how to serialize the latter, because the primary contact organization is configured as the secondary property. On write, the result is XML like this:

<cntinfo>
  <cntvoice>555-555-5555</cntvoice>
  <cntperp>
    <cntorg>My Organization</cntorg>
  </cntperp>
  <cntorgp>
  </cntorgp>
  <cntaddr>
    <country>US</country>
    <postal>00000</postal>
    <state>??</state>
    <city>Home</city>
    <address>My Address</address>
    <addrtype>mailing address</addrtype>
  </cntaddr>
</cntinfo>

The desired output is:

<cntinfo>
  <cntvoice>555-555-5555</cntvoice>
  <cntorgp>
    <cntorg>My Organization</cntorg>
  </cntorgp>
  <cntaddr>
  ...
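Until the library can write the organization as the primary contact, one possible workaround is to post-process the serialized XML. This is a sketch, not part of gis-metadata-parser: when a <cntperp> has no <cntper>, it moves the person block's children into <cntorgp> and drops the empty <cntperp>. The helper name promote_contact_org is hypothetical, and the sketch assumes plain, namespace-free FGDC XML like the output shown above.

```python
import xml.etree.ElementTree as ET


def promote_contact_org(xml_text):
    """Hypothetical workaround: turn an organization-only <cntperp> into <cntorgp>."""
    root = ET.fromstring(xml_text)
    # Element.iter() includes the root itself, so a bare <cntinfo> document works too
    for cntinfo in root.iter('cntinfo'):
        cntperp = cntinfo.find('cntperp')
        if cntperp is None or cntperp.find('cntper') is not None:
            continue  # no person block, or a real contact person exists: leave as-is
        cntorgp = cntinfo.find('cntorgp')
        if cntorgp is None:
            cntorgp = ET.SubElement(cntinfo, 'cntorgp')
        # Move the person block's children (e.g. <cntorg>) into the organization block
        for child in list(cntperp):
            cntperp.remove(child)
            cntorgp.append(child)
        cntinfo.remove(cntperp)
    return ET.tostring(root, encoding='unicode')
```

Blocks that do contain a <cntper> are left untouched, so the helper only changes records that were meant to be organization-primary.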

FrozenOrderedDict is removed from frozendict version 2.0.2

The frozendict dependency is listed in setup.py as frozendict >= 1.2. Since 2.0.2 was recently released, that is the version installed whenever gis-metadata-parser is freshly installed. This breaks the library, because gis-metadata-parser tries to import FrozenOrderedDict, which no longer exists (or at least not in the same place).
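One way to sidestep the import error is a compatibility shim that tries the 1.x name first. This is a sketch that assumes frozendict 2.x's plain frozendict preserves insertion order the way a regular dict does on Python 3.7+ (worth checking against the installed version); the final fallback class is hypothetical and only used when frozendict is missing entirely. Pinning frozendict>=1.2,<2 in setup.py would be the blunter alternative.

```python
from collections.abc import Mapping

try:
    # frozendict 1.x shipped a dedicated ordered variant
    from frozendict import FrozenOrderedDict
except ImportError:
    try:
        # frozendict >= 2.0: assumed to keep insertion order like a regular dict
        from frozendict import frozendict as FrozenOrderedDict
    except ImportError:
        class FrozenOrderedDict(Mapping):
            """Hypothetical minimal read-only, insertion-ordered mapping."""

            def __init__(self, *args, **kwargs):
                self._data = dict(*args, **kwargs)

            def __getitem__(self, key):
                return self._data[key]

            def __iter__(self):
                return iter(self._data)

            def __len__(self):
                return len(self._data)

            def __hash__(self):
                return hash(frozenset(self._data.items()))

            def __repr__(self):
                return f'FrozenOrderedDict({self._data!r})'
```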

Questions about adding FGDC projection information - nested complex?

I am trying to parse/include projection information in some FGDC metadata. Specifically, the and then the projection-specific sections.

I have the following working to set the projection name:

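from frozendict import frozendict
from gis_metadata.fgdc_metadata_parser import FgdcParser
from gis_metadata.utils import format_xpaths, ParserProperty

class CustomFGDCMetadata(FgdcParser):
    mapproj_definition = frozendict({
        'name': '{name}'
    })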
    
    def _init_data_map(self):
        super(CustomFGDCMetadata, self)._init_data_map()
        
        mapproj_prop = 'projection'
        mapproj_xpath = 'spref/horizsys/planar/mapproj/{mapproj_path}'

        self._data_structures[mapproj_prop] = format_xpaths(
            self.mapproj_definition,
            name=mapproj_xpath.format(mapproj_path='mapprojn'),
        )
        
        self._data_map['_{prop}_root'.format(prop=mapproj_prop)] = 'projection'
        self._data_map[mapproj_prop] = ParserProperty(self._parse_complex, self._update_complex)
        self._metadata_props.add(mapproj_prop)

I am struggling to set the projection parameters themselves alongside the name. Here, I am trying to extend the above to define an equirectangular projection.

class CustomFGDCMetadata(FgdcParser):
    equi_definition = frozendict({
        'standard_parallel': '{standard_parallel}',
        'lon_center_meridian': '{lon_center_meridian}',
        'false_easting': '{false_easting}',
        'false_northing': '{false_northing}'
    })

    mapproj_definition = frozendict({
        'name': '{name}',
        'crs': '{crs}'
    })
    
    def _init_data_map(self):
        super(CustomFGDCMetadata, self)._init_data_map()
        
        mapproj_prop = 'projection'
        mapproj_xpath = 'spref/horizsys/planar/mapproj/{mapproj_path}'
        crs_prop = 'crs'
        crs_xpath = 'spref/horizsys/planar/mapproj/{projname}/{mapproj_path}'
        
        self._data_structures[mapproj_prop] = format_xpaths(
            self.mapproj_definition,
            name=mapproj_xpath.format(mapproj_path='mapprojn'),
            projection=mapproj_xpath.format(mapproj_path='equirect'),
            standard_parallel=crs_xpath.format(projname='equirect', mapproj_path='stdparll')
        )
        
        self._data_map['_{prop}_root'.format(prop=mapproj_prop)] = 'projection'
        self._data_map[mapproj_prop] = ParserProperty(self._parse_complex, self._update_complex)
        self._metadata_props.add(mapproj_prop)

This throws a KeyError when trying to get the crs xpath. I also tried defining mapprojn separately from the projection information, but this resulted in duplicate wrapper elements (specifically planar). Is there an example of nested objects that I can/should follow?

Thanks!
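A likely cause of the KeyError, assuming format_xpaths fills each '{key}' template with str.format-style keyword arguments (consistent with the error, but not confirmed here): every placeholder in mapproj_definition needs a matching keyword. 'crs' has none, while the projection and standard_parallel keywords match no placeholder and are silently ignored. The sketch reproduces this with a hypothetical stand-in, format_xpaths_sketch, then shows a flattened definition in which each leaf element gets its own key; the tag names (mapprojn, stdparll, longcm, feast, fnorth) are the ones visible in the serialized output further down this page. Whether _parse_complex/_update_complex handle these deeper xpaths is an open question.

```python
def format_xpaths_sketch(xpath_map, **kwargs):
    """Hypothetical stand-in for format_xpaths: fill each template via str.format."""
    return {key: template.format(**kwargs) for key, template in xpath_map.items()}


mapproj_xpath = 'spref/horizsys/planar/mapproj/{mapproj_path}'

# The definition from the question: '{crs}' has no matching keyword below.
mapproj_definition = {'name': '{name}', 'crs': '{crs}'}
missing_key = None
try:
    format_xpaths_sketch(
        mapproj_definition,
        name=mapproj_xpath.format(mapproj_path='mapprojn'),
        projection=mapproj_xpath.format(mapproj_path='equirect'),  # matches no placeholder: ignored
    )
except KeyError as err:
    missing_key = err.args[0]

# Flattened alternative: one key per leaf element, each with its own full xpath.
equirect_definition = {
    'name': '{name}',
    'standard_parallel': '{standard_parallel}',
    'lon_center_meridian': '{lon_center_meridian}',
    'false_easting': '{false_easting}',
    'false_northing': '{false_northing}',
}
equirect_xpaths = format_xpaths_sketch(
    equirect_definition,
    name=mapproj_xpath.format(mapproj_path='mapprojn'),
    standard_parallel=mapproj_xpath.format(mapproj_path='equirect/stdparll'),
    lon_center_meridian=mapproj_xpath.format(mapproj_path='equirect/longcm'),
    false_easting=mapproj_xpath.format(mapproj_path='equirect/feast'),
    false_northing=mapproj_xpath.format(mapproj_path='equirect/fnorth'),
)
```

Flattening avoids a nested complex type altogether: mapprojn and the equirect children all hang off one mapproj path, so no wrapper element has to be declared twice.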

Associate with gis shapefile

Thanks for this awesome library, it's really, really great. I am wondering how you might use its output to import the metadata into ArcGIS shapefiles, particularly those in geodatabases.

Basically, I am looking for an "import_to_shapefile" function that supports both geodatabases and shapefiles in folders.
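For shapefiles in plain folders, ArcGIS reads metadata from a sidecar file named after the shapefile plus .xml (roads.shp becomes roads.shp.xml), so a minimal sketch only needs to write the serialized XML there. The helper write_shapefile_metadata is hypothetical, and the usage below assumes serialize() returns the XML as a string. Feature classes inside a geodatabase keep their metadata in the geodatabase itself, so they would most likely need ArcGIS (arcpy) rather than a file write; this sketch does not cover them.

```python
from pathlib import Path


def write_shapefile_metadata(shp_path, xml_text):
    """Hypothetical helper: write xml_text to the '<name>.shp.xml' sidecar of a shapefile."""
    shp = Path(shp_path)
    if shp.suffix.lower() != '.shp':
        raise ValueError(f'expected a .shp path, got {shp}')
    sidecar = shp.with_name(shp.name + '.xml')  # roads.shp -> roads.shp.xml
    sidecar.write_text(xml_text, encoding='utf-8')
    return sidecar
```

Usage might look like write_shapefile_metadata('data/roads.shp', parser.serialize()), where parser is an already-populated FgdcParser.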

Added to conda-forge

@dharvey-consbio I just wanted to give you a heads up that I have submitted both this package and parserutils to conda-forge. This gets your packages into that ecosystem as well (not just the PyPI ecosystem). Whenever you cut a new release, conda-forge will detect it and update the feedstock (their recipe) automatically. Nothing for you to do!

I wanted to give you a heads up in case you also wanted to be a maintainer on your package over on conda-forge. If so, you can open a PR on the recipe repo adding yourself as a recipe maintainer.

Many thanks for both this library and parserutils! They are amazing and continue to be a joy to work with.

Error serializing FGDC example

I am trying to serialize my own FGDC file and see the following error:

~/anaconda3/envs/amg/lib/python3.9/site-packages/parserutils/elements.py in element_is_empty(elem_to_parse, element_path)
    263         (element.tail is None or not element.tail.strip()) and
    264         (element.attrib is None or not len(element.attrib)) and
--> 265         (not len(element.getchildren()))
    266     )
    267 

AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'

I have tracked this down to an issue in the data_map on the contacts key. I assumed the problem was my metadata file (perhaps I missed a key?), so I pulled an example from fgdc.gov, and I see the same error when attempting to serialize it. GitHub does not support XML uploads, though I can upload the file as plain text.

Here is the code that I am using:

from gis_metadata.fgdc_metadata_parser import FgdcParser

with open('sample.xml', 'r') as metadata:
    data = FgdcParser(metadata)
    data.serialize()

Perhaps this is operator error or a lack of understanding about how to serialize the data?

As an aside, I have very much appreciated working with this library and using it to get programmatic access to the metadata records. Thanks to the maintainers!
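The traceback points at parserutils rather than the metadata file: Element.getchildren() was deprecated for years and removed from xml.etree.ElementTree in Python 3.9, which is the interpreter in the traceback path. len(element) gives the same direct-child count. The sketch below is a hypothetical rewrite of that emptiness check (element_is_empty_sketch is not the parserutils function), shown only to illustrate the replacement; running on Python 3.8, or checking whether a newer parserutils release avoids getchildren(), are the practical options.

```python
import xml.etree.ElementTree as ET


def element_is_empty_sketch(element):
    """Hypothetical emptiness check that avoids the removed Element.getchildren()."""
    return (
        (element.text is None or not element.text.strip()) and
        (element.tail is None or not element.tail.strip()) and
        not element.attrib and
        len(element) == 0  # counts direct children, like len(element.getchildren()) did
    )
```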

Add support for other ArcGIS standard keyword types

We want to support at least the keyword types that are available in FGDC:

<!ELEMENT theme    (themekt, themekey+)>
<!ELEMENT place    (placekt, placekey+)>
<!ELEMENT stratum  (stratkt, stratkey+)>
<!ELEMENT temporal (tempkt, tempkey+)>

The full list of known ArcGIS types is:

<!ELEMENT discKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT otherKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT placeKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT productKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT searchKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT stratKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT subTopicCatKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT tempKeys (keyword+, thesaName?, thesaLang?)>
<!ELEMENT themeKeys (keyword+, thesaName?, thesaLang?)>

In addition to the four FGDC keyword types, consider supporting searchKeys. This can be done in the ISO standard by utilizing MD_KeywordTypeCode (exactly as it is currently used for theme and place keys), and for cross-compatibility with FGDC by appending them to themekey on import.

Note: because there is no standard place to write search keys in FGDC, they will not convert losslessly back to ISO or ArcGIS after being converted to FGDC, and I don't feel good about making up a new FGDC tag just for this:

<!ELEMENT search (searchkt, searchkey+)>
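The conversion proposed above can be sketched as a lookup table from ArcGIS keyword elements to the four FGDC types, with searchKeys folded into theme keywords on the way to FGDC. The names ARCGIS_TO_FGDC and to_fgdc_keywords are hypothetical, and dropping the unmapped types (discKeys, otherKeys, productKeys, subTopicCatKeys) is an assumption made for brevity.

```python
# Hypothetical mapping from ArcGIS keyword elements to FGDC keyword types
ARCGIS_TO_FGDC = {
    'themeKeys': 'theme',
    'placeKeys': 'place',
    'stratKeys': 'stratum',
    'tempKeys': 'temporal',
    'searchKeys': 'theme',  # no FGDC home: appended to theme keywords (lossy)
}


def to_fgdc_keywords(arcgis_keywords):
    """Group ArcGIS keyword lists by FGDC type, skipping types with no mapping."""
    fgdc = {}
    for arcgis_type, words in arcgis_keywords.items():
        fgdc_type = ARCGIS_TO_FGDC.get(arcgis_type)
        if fgdc_type is None:
            continue  # e.g. discKeys, otherKeys: not mapped in this sketch
        bucket = fgdc.setdefault(fgdc_type, [])
        for word in words:
            if word not in bucket:  # avoid duplicating a theme keyword that is also a search keyword
                bucket.append(word)
    return fgdc
```

Once search keywords are merged into theme, nothing records which ones they were, which is exactly the lossy round trip back to ISO or ArcGIS described in the note above.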

XML tags losing 'gco:', 'gmd:', etc. prefixes when serializing

I am just using the serialize function to output the data and then writing it to a new XML file, and I notice that the structure of the original XML isn't preserved: the <gmd:...> and <gco:...> prefixes that are standard for ISO 19139 metadata are lost.

Basic structure looks like this from input:

<gmd:MD_Metadata xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:gss="http://www.isotc211.org/2005/gss" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:gmd="http://www.isotc211.org/2005/gmd">
  gmd:fileIdentifier
  gmd:language
  gmd:hierarchyLevel
  gmd:contact
  gmd:dateStamp
  gmd:metadataStandardName
  gmd:metadataStandardVersion
  gmd:referenceSystemInfo
  gmd:identificationInfo
  gmd:distributionInfo
  gmd:dataQualityInfo
</gmd:MD_Metadata>

Comes out like this:

<MD_Metadata>
  fileIdentifier
  language
  hierarchyLevel
  contact
  etc...

I believe those prefixes are quite important for ISO quality standards.
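For comparison, the standard library can round-trip these prefixes if they are registered before serializing; without registration ElementTree writes generated prefixes such as ns0: rather than dropping them. That the output above has no prefix at all suggests namespaces are being stripped somewhere during parsing or serialization, though that is not confirmed here. This sketch (roundtrip_with_prefixes is a hypothetical helper, using the gmd and gco URIs from the root element above) does not change how gis-metadata-parser serializes; it only shows what prefix-preserving output looks like.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the ISO 19139 root element in the issue
ISO_NAMESPACES = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'gco': 'http://www.isotc211.org/2005/gco',
}


def roundtrip_with_prefixes(xml_text):
    """Hypothetical helper: parse and re-serialize ISO XML, keeping gmd:/gco: prefixes."""
    for prefix, uri in ISO_NAMESPACES.items():
        # Registration is global; without it ElementTree emits ns0:, ns1:, ...
        ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding='unicode')
```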

Serialization and New Line Characters

How should one control newline characters when serializing XML metadata with custom parsers? For example, I am using the CustomFGDC parser that handles map projections. When I serialize an instance of that object, I see the following:

<metadata>
    <spref><horizsys><planar><mapproj><equirect><longcm>180.0</longcm><stdparll>0.0</stdparll><fnorth>0.0</fnorth><feast>0.0</feast></equirect><mapprojn>Equirectangular EUROPA</mapprojn></mapproj><planci><plance>coordinate pair</plance><plandu>meters</plandu><coordrep><ordres>7346.0</ordres><absres>7346.0</absres></coordrep></planci></planar></horizsys></spref><idinfo>
        <useconst>none</useconst><spdom><bounding><northbc>84.94594418192924</northbc><westbc>0.0</westbc><southbc>-84.77108883435208</southbc><eastbc>0.0</eastbc></bounding></spdom><ptcontac><cntinfo><cntpos>Research Space Scientist</cntpos><cntemail>[email protected]</cntemail><cntperp><cntorg>Southwest Region: ASTROGEOLOGY SCIENCE CENTER</cntorg><cntper>Michael T Bland</cntper></cntperp></cntinfo></ptcontac><datacred>NASA's Galileo Mission Solid State Imaging Team</datacred><citation>
            <citeinfo>
                <pubdate>2021</pubdate><origin>Michael Bland</origin><onlink>mydoinumber</onlink><title>Photogrammetrically Controlled Galileo Images of Europa</title><geoform>raster digital data</geoform>
                <pubinfo>
                    <pubplace>Flagstaff, AZ</pubplace>
                    <publish>U. S. Geological Survey</publish>
                </pubinfo>
                </citeinfo>
        </citation>

I included extra output here to show the formatting of the projection fields (which are custom) and of the contact fields (which are not). How does the code decide when to include a newline? Can this be controlled via this package, or is this an issue in parserutils? (I also did not see a way to be explicit in that library, but I only briefly skimmed the code base, looking at the write methods.)

Insight much appreciated!
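Independently of where the mixed formatting comes from, one option is to normalize it after serializing. This sketch assumes serialize() returns the XML as a string and uses xml.etree.ElementTree.indent, available from Python 3.9 (the version in the earlier traceback), which replaces whitespace-only text and tails so custom and built-in fields end up indented the same way. pretty_print_xml is a hypothetical helper name, not a gis-metadata-parser or parserutils function.

```python
import xml.etree.ElementTree as ET


def pretty_print_xml(xml_text, space='    '):
    """Hypothetical post-processing helper: re-indent serialized XML uniformly (Python 3.9+)."""
    root = ET.fromstring(xml_text)
    # indent() overwrites whitespace-only text/tails, so existing mixed layout is normalized
    ET.indent(root, space=space)
    return ET.tostring(root, encoding='unicode')
```

Note that ET.tostring with encoding='unicode' omits the XML declaration, so it would need to be prepended if the written file should keep one.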
