keeps / dbptk-developer

DBPTK Developer - library and command-line tool for execution of database preservation actions

Home Page: http://www.database-preservation.com

License: GNU Lesser General Public License v3.0

Java 99.70% Shell 0.30%
preservation database relational-databases siard preservation-formats

dbptk-developer's People

Contributors

006627, andreaskring, antoniog70, chalkos, daniel-skovenborg, dependabot[bot], dudeusinglinux, hmiguim, hsilva-keep, jmaferreira, luis100, miguelsc, snyk-bot, thomaskristensen, tomasmferreira

dbptk-developer's Issues

Column type change in MSSQLServer

When round-trip testing from MSSQLServer to SIARD and back, the following changes occurred on the column data types:

  • [bigint] became [decimal](19, 0)
  • [nvarchar](max) became [text]
  • [nvarchar](50) became [varchar](50)
  • [datetime] became [datetime2](7)
  • [tinyint] became [smallint]
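
One possible reading of the first change, offered as an assumption rather than a confirmed explanation: a 64-bit integer never needs more than 19 decimal digits, so decimal(19, 0) is the narrowest exact numeric type that can hold every bigint value. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
import java.math.BigDecimal;

public class BigintPrecisionSketch {
    // Number of decimal digits needed to write the given 64-bit value.
    static int decimalDigits(long value) {
        return BigDecimal.valueOf(value).precision();
    }

    public static void main(String[] args) {
        System.out.println(decimalDigits(Long.MAX_VALUE)); // prints 19
        System.out.println(decimalDigits(Long.MIN_VALUE)); // prints 19
    }
}
```

The value range survives such a mapping, but the declared column type does not, which is what the round-trip test reports.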

Missing complexType "clobType" or "blobType" in table.xsd (SIARD Output)

If "clobType" or "blobType" is used in the complexType "rowType", then table.xsd must also define the complexType "clobType" or "blobType".

Wrong / v2.0-rc2:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.admin.ch/xmlns/siard/1.0/northwind/suppliers.xsd" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.admin.ch/xmlns/siard/1.0/northwind/suppliers.xsd">
    <xs:element name="table">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" minOccurs="0" name="row" type="rowType">
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:complexType name="rowType">
        <xs:sequence>
            <xs:element name="c1" type="xs:decimal"/>
            <xs:element minOccurs="0" name="c2" type="xs:string"/> 
            <xs:element minOccurs="0" name="c3" type="xs:string"/>
            <xs:element minOccurs="0" name="c4" type="xs:string"/>
            <xs:element minOccurs="0" name="c5" type="xs:string"/>
            <xs:element minOccurs="0" name="c6" type="xs:string"/>
            <xs:element minOccurs="0" name="c7" type="xs:string"/>
            <xs:element minOccurs="0" name="c8" type="xs:string"/>
            <xs:element minOccurs="0" name="c9" type="xs:string"/>
            <xs:element minOccurs="0" name="c10" type="xs:string"/>
            <xs:element minOccurs="0" name="c11" type="xs:string"/>
            <xs:element minOccurs="0" name="c12" type="clobType"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

Correct:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.admin.ch/xmlns/siard/1.0/northwind/suppliers.xsd" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.admin.ch/xmlns/siard/1.0/northwind/suppliers.xsd">
    <xs:element name="table">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" minOccurs="0" name="row" type="rowType">
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:complexType name="rowType">
        <xs:sequence>
            <xs:element name="c1" type="xs:decimal"/>
            <xs:element minOccurs="0" name="c2" type="xs:string"/> 
            <xs:element minOccurs="0" name="c3" type="xs:string"/>
            <xs:element minOccurs="0" name="c4" type="xs:string"/>
            <xs:element minOccurs="0" name="c5" type="xs:string"/>
            <xs:element minOccurs="0" name="c6" type="xs:string"/>
            <xs:element minOccurs="0" name="c7" type="xs:string"/>
            <xs:element minOccurs="0" name="c8" type="xs:string"/>
            <xs:element minOccurs="0" name="c9" type="xs:string"/>
            <xs:element minOccurs="0" name="c10" type="xs:string"/>
            <xs:element minOccurs="0" name="c11" type="xs:string"/>
            <xs:element minOccurs="0" name="c12" type="clobType"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="clobType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
                <xs:attribute name="file" type="xs:string"/>
                <xs:attribute name="length" type="xs:integer"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
    <xs:complexType name="blobType">
        <xs:simpleContent>
            <xs:extension base="xs:hexBinary">
                <xs:attribute name="file" type="xs:string"/>
                <xs:attribute name="length" type="xs:integer"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
</xs:schema>
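
A minimal sketch of how the problem could be detected automatically, using the standard javax.xml.validation API. The class name, the helper names, and the reduced schema (only column c12) are assumptions made for illustration; this is not part of the toolkit.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class TableXsdCheck {
    static final String NS =
        "http://www.admin.ch/xmlns/siard/1.0/northwind/suppliers.xsd";

    // Builds a reduced table.xsd; the clobType definition is optional.
    static String tableXsd(boolean withClobType) {
        String clobType = withClobType
            ? "<xs:complexType name=\"clobType\"><xs:simpleContent>"
              + "<xs:extension base=\"xs:string\">"
              + "<xs:attribute name=\"file\" type=\"xs:string\"/>"
              + "<xs:attribute name=\"length\" type=\"xs:integer\"/>"
              + "</xs:extension></xs:simpleContent></xs:complexType>"
            : "";
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " xmlns=\"" + NS + "\" elementFormDefault=\"qualified\""
            + " targetNamespace=\"" + NS + "\">"
            + "<xs:complexType name=\"rowType\"><xs:sequence>"
            + "<xs:element minOccurs=\"0\" name=\"c12\" type=\"clobType\"/>"
            + "</xs:sequence></xs:complexType>"
            + clobType
            + "</xs:schema>";
    }

    // Returns true if the schema compiles, false if the parser rejects it
    // (for example because a referenced type is not defined).
    static boolean compiles(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without clobType: " + compiles(tableXsd(false)));
        System.out.println("with clobType:    " + compiles(tableXsd(true)));
    }
}
```

A check of this kind could run over every generated table.xsd before the SIARD archive is packaged.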

Missing charsets on SIARD to MySQL conversion

When round-trip testing from MySQL to SIARD and back again, column charset definitions are missing. Example: COLLATE utf8_bin is missing.

Also, default charset changed from utf8 to latin1.

Error converting data from SIARD to MSSQLServer

When converting from SIARD to MSSQLServer the following error occurred:

ERROR 2014-12-04 18:19:50,977 (SIARDImportModule) An error occurred while handling data row
pt.gov.dgarq.roda.common.convert.db.model.exception.ModuleException: Error executing insert batch
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataRow(JDBCExportModule.java:540)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule$SIARDContentSAXHandler.endElement(SIARDImportModule.java:953)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule.getDatabase(SIARDImportModule.java:224)
    at pt.gov.dgarq.roda.common.convert.db.Main.main(Main.java:89)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The stream value is not the specified length. The specified length was 4,241, the actual length is 4,198.
    at com.microsoft.sqlserver.jdbc.TDSWriter.error(IOBuffer.java:3426)
    at com.microsoft.sqlserver.jdbc.TDSWriter.writeReader(IOBuffer.java:3418)
    at com.microsoft.sqlserver.jdbc.TDSWriter.writeRPCReaderUnicode(IOBuffer.java:4641)
    at com.microsoft.sqlserver.jdbc.DTV$SendByRPCOp.execute(dtv.java:887)
    at com.microsoft.sqlserver.jdbc.DTV.executeOp(dtv.java:1078)
    at com.microsoft.sqlserver.jdbc.DTV.sendByRPC(dtv.java:1116)
    at com.microsoft.sqlserver.jdbc.Parameter.sendByRPC(Parameter.java:660)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.sendParamsByRPC(SQLServerPreparedStatement.java:473)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doPrepExec(SQLServerPreparedStatement.java:628)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatementBatch(SQLServerPreparedStatement.java:1282)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtBatchExecCmd.doExecute(SQLServerPreparedStatement.java:1209)
    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:5696)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1715)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:180)
    at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:155)
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1173)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataRow(JDBCExportModule.java:535)
    ... 14 more
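
One possible cause, stated here as an assumption that the report does not confirm: the length passed to the driver was measured in one unit (for example UTF-8 bytes) while the stream was read in another (characters). The sketch below only shows that the two counts differ for text with non-ASCII characters; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class StreamLengthSketch {
    // Length in UTF-16 code units, as reported by String.length().
    static int charLength(String s) {
        return s.length();
    }

    // Length of the same text once encoded as UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Preservação", written with escapes to stay encoding-independent.
        String value = "Preserva\u00e7\u00e3o";
        System.out.println(charLength(value));     // prints 11
        System.out.println(utf8ByteLength(value)); // prints 13
    }
}
```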

Missing ANSI_PADDING definition on MSSQLServer

When round-trip testing from MSSQLServer to SIARD and back, the ANSI_PADDING definition is missing. Example: SET ANSI_PADDING ON.

Check if this is a problem or if ANSI_PADDING is not a significant property.

java.lang.NullPointerException when exporting database

Hi.

As part of the E-ARK project (http://eark-project.com/) I am trying out the keeps/db-preservation-toolkit.

When trying to export an Alfresco MySQL database I get the following error:

INFO 2014-11-17 13:19:06,747 (Main) Translating database: MySQLJDBCImportModule to DBMLExportModule
ERROR 2014-11-17 14:30:25,105 (Main) Unexpected exception
java.lang.NullPointerException
    at org.apache.commons.transaction.util.FileHelper.copy(FileHelper.java:331)
    at org.apache.commons.transaction.util.FileHelper.copy(FileHelper.java:296)
    at org.apache.commons.transaction.util.FileHelper.copy(FileHelper.java:271)
    at pt.gov.dgarq.roda.common.convert.db.model.data.FileItem.<init>(FileItem.java:52)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.convertRawToCell(JDBCImportModule.java:480)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.convertRawToRow(JDBCImportModule.java:443)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.getDatabase(JDBCImportModule.java:529)
    at pt.gov.dgarq.roda.common.convert.db.Main.main(Main.java:93)

I have made a mysqldump from the file, but it is too big to attach here (105MB). If you have a place where I can upload the file, please let me know, so you can recreate the problem.

- Torben

Setting different schema names

When round-trip testing from DBMS to SIARD and back again, schema names are being suffixed by "_dbpresexport" in at least PostgreSQL and MSSQLServer. In PostgreSQL this creates difficulties when accessing the database.

Better XMLBufferedWriter

Put attributes as arguments of the openTag(String, Attribute...) and automatically calculate indentation level.
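
A minimal sketch of what such a writer could look like. Only the openTag(String, Attribute...) signature comes from the issue; the class name, the Attribute holder, closeTag(), and the two-space indent are assumptions for illustration and do not reflect the toolkit's actual XMLBufferedWriter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IndentingXmlWriter {
    /** Hypothetical name/value pair for one XML attribute. */
    public static final class Attribute {
        final String name;
        final String value;

        public Attribute(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final StringBuilder out = new StringBuilder();
    // Stack of currently open tags; its size is the indentation level.
    private final Deque<String> open = new ArrayDeque<>();

    public IndentingXmlWriter openTag(String tag, Attribute... attributes) {
        indent();
        out.append('<').append(tag);
        for (Attribute a : attributes) {
            out.append(' ').append(a.name)
               .append("=\"").append(escape(a.value)).append('"');
        }
        out.append(">\n");
        open.push(tag);
        return this;
    }

    public IndentingXmlWriter closeTag() {
        // Pop first so the closing tag aligns with its opening tag.
        String tag = open.pop();
        indent();
        out.append("</").append(tag).append(">\n");
        return this;
    }

    private void indent() {
        for (int i = 0; i < open.size(); i++) {
            out.append("  ");
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(new IndentingXmlWriter()
            .openTag("table", new Attribute("name", "suppliers"))
            .openTag("row")
            .closeTag()
            .closeTag());
    }
}
```

Keeping the open tags on a stack means the caller never passes an indentation level and cannot close a tag under the wrong name.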

Testing environments

Add different testing groups/suites/configurations so that the tests that depend on databases (and maybe other future tests with external dependencies) don't run unless we specifically ask maven to run them.

NullPointerException when converting SIARD to MySQL

When round-trip testing from MySQL to SIARD and back again, the following error occurred on the SIARD to MySQL conversion:

ERROR 2014-12-04 17:03:32,931 (Main) Unexpected exception
java.lang.NullPointerException
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataCell(JDBCExportModule.java:637)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataRow(JDBCExportModule.java:526)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule$SIARDContentSAXHandler.endElement(SIARDImportModule.java:953)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule.getDatabase(SIARDImportModule.java:224)
    at pt.gov.dgarq.roda.common.convert.db.Main.main(Main.java:89)

MySQL database name is suffixed

When round-trip testing from MySQL to SIARD and back again, the database name gets suffixed with the original database name. Example: alfresco_siard becomes alfresco_siard_alfresco, where the original database name was alfresco.

JDBC driver class could not be found

Do I need to download an Access JDBC driver myself?

$ java -jar db-preservation-toolkit-1.0.0-jar-with-dependencies.jar -i MSAccess books.mdb -o DBML ./books/   

INFO 2013-09-16 12:37:59,699 (Main) Translating database: MsAccessImportModule to DBMLExportModule
ERROR 2013-09-16 12:37:59,707 (Main) Error while importing/exporting
pt.gov.dgarq.roda.common.convert.db.model.exception.ModuleException: JDBC driver class could not be found
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.getDatabase(JDBCImportModule.java:538)
    at pt.gov.dgarq.roda.common.convert.db.Main.main(Main.java:93)
Caused by: java.lang.ClassNotFoundException: sun.jdbc.odbc.JdbcOdbcDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:171)
    at pt.gov.dgarq.roda.common.convert.db.modules.msAccess.in.MsAccessImportModule.getConnection(MsAccessImportModule.java:55)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.getDatabaseStructure(JDBCImportModule.java:174)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.in.JDBCImportModule.getDatabase(JDBCImportModule.java:523)
    ... 1 more

VIEWS are missing in MSSQLServer

When round-trip testing from MSSQLServer to SIARD and back, the VIEW declarations are missing. This may also happen in other DBMSs.

Missing UNIQUE constraint

When round-trip testing from PostgreSQL to SIARD and back again, UNIQUE constraints on columns are missing. This may occur with other DBMSs, but it was only verified in PostgreSQL.

Lost precision on dates

When round-trip testing from PostgreSQL to SIARD and back again, dates lose precision (milliseconds are lost). Example: 2014-11-27 11:12:38.373 becomes 2014-11-27 11:12:38.37. Other DBMSs may also have this problem.

Column type change in MySQL

When round-trip testing from MySQL to SIARD and back again, the following column data types changed:

  • tinyint(4) became smallint(6)
  • bigint(20) became decimal(19,0)
  • timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP became datetime NOT NULL
  • tinyint(1) became bit(1)
  • blob became longblob

Missing SEQUENCE in SIARD to PostgreSQL

When round-trip testing from PostgreSQL to SIARD and back, SEQUENCES are missing.

Check if this is a problem or if SEQUENCE is not considered to be a significant property.

Possibly lost one hour on date values in SIARD to MySQL conversion

When round-trip testing from MySQL to SIARD and back again, in a column whose type changed from timestamp to datetime, the inserted value shifted by one hour. This is possibly due to the difference between the two data types. Example: 2014-08-27 11:04:23 becomes 2014-08-27 12:04:23.
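
One way such a shift could arise, stated as an assumption since the report does not confirm the cause: MySQL stores timestamp values in UTC and converts them to the session time zone on retrieval, while datetime values are stored as written. The sketch below reproduces the one-hour difference with java.time. The zone Europe/Lisbon and all names are hypothetical, chosen only because that zone is one hour ahead of UTC in August.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampShiftSketch {
    // Reinterprets a wall-clock value stored in UTC as local time in the given zone.
    static LocalDateTime utcToZone(LocalDateTime utcValue, ZoneId zone) {
        return utcValue.atOffset(ZoneOffset.UTC)
                       .atZoneSameInstant(zone)
                       .toLocalDateTime();
    }

    public static void main(String[] args) {
        LocalDateTime stored = LocalDateTime.of(2014, 8, 27, 11, 4, 23);
        // prints 2014-08-27T12:04:23
        System.out.println(utcToZone(stored, ZoneId.of("Europe/Lisbon")));
    }
}
```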

Column type change in PostgreSQL

When round-trip testing PostgreSQL to SIARD and back again, the following column data types change:

  • character varying becomes text
  • bigint becomes numeric(19,0)

Don't initialize variables

Avoid initializing local variables at declaration when every code path is expected to assign them later. The purpose is to get a compile-time error if a future code path is added that does not set the variable.
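
A small illustration of the idea, relying on Java's definite-assignment rule. The class, the method, and the type mapping are hypothetical and not taken from the toolkit.

```java
public class DefiniteAssignmentSketch {
    // Maps a decimal precision to an integer column type (illustrative only).
    static String integerTypeFor(int precision) {
        final String type; // deliberately left uninitialized
        if (precision <= 4) {
            type = "smallint";
        } else if (precision <= 9) {
            type = "integer";
        } else {
            type = "bigint";
        }
        // If a new branch were added above without assigning 'type',
        // the compiler would reject this line:
        // "variable type might not have been initialized".
        return type;
    }

    public static void main(String[] args) {
        System.out.println(integerTypeFor(3));  // prints smallint
        System.out.println(integerTypeFor(19)); // prints bigint
    }
}
```

Had the variable been declared as String type = null, a forgotten branch would compile and only fail at run time, if at all.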

SimpleTypeBinary not supported by SIARD Import module

When testing export from SIARD to MySQL on a database with BLOBs, the following error occurred:

ERROR 2014-12-04 17:03:32,917 (SIARDImportModule) An error occurred while handling data row
pt.gov.dgarq.roda.common.convert.db.model.exception.InvalidDataException: SimpleTypeBinary not applicable to simple cell or not yet supported
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataCell(JDBCExportModule.java:591)
    at pt.gov.dgarq.roda.common.convert.db.modules.jdbc.out.JDBCExportModule.handleDataRow(JDBCExportModule.java:526)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule$SIARDContentSAXHandler.endElement(SIARDImportModule.java:953)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at pt.gov.dgarq.roda.common.convert.db.modules.siard.in.SIARDImportModule.getDatabase(SIARDImportModule.java:224)
    at pt.gov.dgarq.roda.common.convert.db.Main.main(Main.java:89)

Need SIARD support

Hi.

I need to have SIARD support integrated into the db-preservation-toolkit. I can see that there is already a pull request for such a feature: #3

Would you please consider merging this into the toolkit? I am using an Alfresco database for testing, and I might as well test the SIARD support too :-)

- Torben/e-ark
