Comments (7)

mattsunsjf commented on August 16, 2024

It's actually the character 𝒷 that caused the error, not the German characters. In the file, 𝒷 appears in escaped form, as a character reference to MATHEMATICAL SCRIPT SMALL B (U+1D4B7).
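For context, 𝒷 lies outside the Basic Multilingual Plane, so it is encoded differently from the German characters: it takes four bytes in UTF-8 and a surrogate pair (two `char` values) in a Java `String`, whereas `ä` takes two UTF-8 bytes and one `char`. The standalone check below illustrates this; the class and method names are hypothetical and not part of MLCP.

```java
import java.nio.charset.StandardCharsets;

public class ScriptSmallB {
    // Built from the code point so this source file stays ASCII-only.
    static final String SCRIPT_B = new String(Character.toChars(0x1D4B7));

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** UTF-8 encoding of the string as space-separated hex bytes. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4";
        System.out.println("code point:   U+" + Integer.toHexString(SCRIPT_B.codePointAt(0)).toUpperCase());
        System.out.println("UTF-16 chars: " + SCRIPT_B.length() + " (surrogate pair), a-umlaut: " + aUmlaut.length());
        System.out.println("UTF-8 bytes:  " + utf8Hex(SCRIPT_B) + ", a-umlaut: " + utf8Hex(aUmlaut));
    }
}
```

It prints U+1D4B7, a UTF-16 length of 2 versus 1 for `ä`, and the UTF-8 bytes F0 9D 92 B7 versus C3 A4.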

sravanr commented on August 16, 2024

I used the following command:

mlcp/mlcp-9.0/bin/mlcp.sh import -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
Contents of special-char.xml:

b Justus-Liebig-Universität , Gie𝒷en , Germany.

I am getting exceptions such as the following:
ERROR contentpump.MultithreadedMapper: Parsing error
ERROR contentpump.MultithreadedMapper: java.lang.NullPointerException

See the attached file 42812_MLCP_console_errorlog.txt for more details. The same log is also attached in Bugtrack.
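With `-input_file_type aggregates -aggregate_record_element q`, mlcp splits the input file so that each `q` element becomes one record (the successful run later in this thread reports `INPUT_RECORDS: 1`). A rough sketch of that splitting idea with the JDK's `javax.xml.stream` API follows. It is an illustration under simplifying assumptions (it keeps only the text of each record and ignores namespaces), not MLCP's `AggregateXMLReader`; the class name and the sample input are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AggregateSplitSketch {
    /**
     * Returns the text content of each outermost element named recordElement.
     * Simplified: a real aggregate reader emits each record as an XML document;
     * this sketch keeps only the record's text.
     */
    static List<String> splitRecords(String xml, String recordElement) {
        List<String> records = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                StringBuilder current = null; // non-null while inside a record
                int depth = 0;                // element depth inside the current record
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if (current == null && r.getLocalName().equals(recordElement)) {
                            current = new StringBuilder(); // a new record starts here
                        }
                        if (current != null) {
                            depth++;
                        }
                    } else if (current != null && (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA)) {
                        current.append(r.getText()); // text may arrive in several events
                    } else if (current != null && event == XMLStreamConstants.END_ELEMENT) {
                        if (--depth == 0) { // closing tag of the record element itself
                            records.add(current.toString());
                            current = null;
                        }
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed aggregate XML", e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical aggregate input; character references keep this source ASCII-only.
        String xml = "<root><q>Gie&#x1D4B7;en</q><q>Universit&#xE4;t, <b>Germany</b></q></root>";
        for (String record : splitRecords(xml, "q")) {
            System.out.println(record);
        }
    }
}
```

Tracking depth rather than just the element name keeps a nested element with the same name from ending the record early.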

sravanr commented on August 16, 2024
[ypaladug@engrlab-131-083 qa]$ mlcp/mlcp-9.0/bin/mlcp.sh  import -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com  -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Command: IMPORT
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Arguments: -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Running in: localmode
16/12/07 19:02:31 INFO contentpump.LocalJobRunner: Content type: XML
16/12/07 19:02:32 INFO contentpump.ContentPump: Job name: local_864518043_1
16/12/07 19:02:32 DEBUG contentpump.LocalJobRunner: Thread pool size: 4
16/12/07 19:02:32 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
16/12/07 19:02:32 DEBUG contentpump.LocalJobRunner: Thread Count for Split#0 : 4
16/12/07 19:02:32 DEBUG contentpump.MultithreadedMapper: Running with 4 threads
16/12/07 19:02:32 ERROR contentpump.AggregateXMLReader: Parsing error
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: Invalid byte 2 of 3-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper: Parsing error
java.io.IOException: Parsing error
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:554)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: Invalid byte 2 of 3-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        ... 13 more
16/12/07 19:02:32 ERROR contentpump.AggregateXMLReader: Parsing error
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: XML document structures must start and end within the same entity.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper: Parsing error
java.io.IOException: Parsing error
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:554)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: XML document structures must start and end within the same entity.
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        ... 13 more
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper:
java.lang.NullPointerException
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1788)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:954)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2820)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:553)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at com.marklogic.contentpump.MultithreadedMapper.run(MultithreadedMapper.java:215)
        at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:378)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper:
java.lang.NullPointerException
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1788)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:954)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2820)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:553)
        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 INFO contentpump.LocalJobRunner:  completed 0%
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: Total execution time: 0 sec
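The first error, "Invalid byte 2 of 3-byte UTF-8 sequence", is what a UTF-8 decoder reports when a byte that announces a 3-byte sequence is followed by a byte that is not a continuation byte (10xxxxxx). Assuming the failing file was ISO-8859-1 encoded (later comments only say it was not UTF-8), the `ä` in "Universität" is the single byte 0xE4, which in UTF-8 is a 3-byte lead byte, and the following `t` (0x74) is not a valid second byte. A minimal illustration with a strict `java.nio` decoder; the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1AsUtf8Demo {
    /** Decodes bytes as UTF-8, returning null on malformed input instead of substituting U+FFFD. */
    static String strictUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String word = "Universit\u00e4t";
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        // U+00E4 is the single byte 0xE4 in ISO-8859-1. In UTF-8, 0xE4 (1110xxxx) starts a
        // 3-byte sequence, but the next byte, 't' = 0x74, is not a continuation byte (10xxxxxx).
        System.out.printf("byte 9: %02X, byte 10: %02X%n", latin1[9] & 0xFF, latin1[10] & 0xFF);
        System.out.println("Latin-1 bytes valid UTF-8? " + (strictUtf8(latin1) != null));
        System.out.println("UTF-8 bytes valid UTF-8?   " + (strictUtf8(word.getBytes(StandardCharsets.UTF_8)) != null));
    }
}
```

It prints E4 and 74 for bytes 9 and 10, then `false` for the Latin-1 bytes and `true` for the UTF-8 bytes.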

mattsunsjf commented on August 16, 2024

I can't reproduce this.

[jsun@msun-z620 mlcp-9.0]$ bin/mlcp.sh import -username admin -password admin -host msun  -port 8000 -mode local -input_file_path /space/tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Command: IMPORT
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Arguments: -username admin -password admin -host msun -port 8000 -mode local -input_file_path /space/tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Running in: localmode
16/12/08 10:57:29 INFO contentpump.LocalJobRunner: Content type: XML
16/12/08 10:57:30 INFO contentpump.ContentPump: Job name: local_1721263359_1
16/12/08 10:57:30 DEBUG contentpump.LocalJobRunner: Thread pool size: 4
16/12/08 10:57:30 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
16/12/08 10:57:30 DEBUG contentpump.LocalJobRunner: Thread Count for Split#0 : 4
16/12/08 10:57:30 DEBUG contentpump.MultithreadedMapper: Running with 4 threads
16/12/08 10:57:30 DEBUG mapreduce.ContentWriter: Connect to msun
16/12/08 10:57:30 INFO contentpump.LocalJobRunner:  completed 100%
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: Total execution time: 0 sec

sravanr commented on August 16, 2024

I am not able to attach the file. I found that Yeshwanth used a file that is not UTF-8 encoded, which is the reason for the stack trace above.

With the same special characters in a UTF-8 encoded file, the import works fine. However, a non-UTF-8 file triggers NullPointerExceptions, and I expect those to be fixed as part of this bug.

The file we used to produce the stack trace above is at /project/qa/skottam/special-char.xml.
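Since the failures came from a file that was not UTF-8 encoded, a pre-import check can flag such files before mlcp runs. The sketch below reports the byte offset of the first malformed UTF-8 sequence using `CharsetDecoder` with `CodingErrorAction.REPORT`. The class name and the default file name are hypothetical, and it reads the whole file into memory, which is fine for small test files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8PreflightCheck {
    /** Returns the byte offset of the first malformed UTF-8 sequence, or -1 if the data is valid UTF-8. */
    static long firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            // endOfInput = true: a truncated sequence at the very end also counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the bad sequence starts at the current position
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars, only validation matters here
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "special-char.xml");
        long offset = firstInvalidUtf8Offset(Files.readAllBytes(path));
        if (offset < 0) {
            System.out.println(path + ": valid UTF-8");
        } else {
            System.out.println(path + ": not valid UTF-8, first bad byte at offset " + offset);
        }
    }
}
```

With JDK 11 or later it can be run directly as `java Utf8PreflightCheck.java special-char.xml`.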

mattsunsjf commented on August 16, 2024

The originally reported bug has already been fixed. The NPE is a separate corner case; I will create another bug against 9.0.1 to track it. Please verify the original bug fix in EA4.
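In the log above, the NullPointerException inside `XMLEntityScanner.load` follows two `XMLStreamException`s from the same input while four mapper threads share one split. One plausible reading (an inference, not confirmed in this thread) is that `next()` kept being called on a parser that had already hit a fatal error. A defensive pattern is to remember the first fatal error and never advance the parser again. A minimal sketch of that idea; `FailFastXmlReader` is a hypothetical wrapper, not MLCP code, and the Latin-1 input in `main` is an assumption about the failing file.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical wrapper: once the parser reports a fatal error, it is never advanced again. */
public class FailFastXmlReader {
    private final XMLStreamReader reader;
    private XMLStreamException failure; // first fatal error, or null

    FailFastXmlReader(XMLStreamReader reader) {
        this.reader = reader;
    }

    /** Convenience factory; wraps the checked exception. */
    static FailFastXmlReader ofString(String xml) {
        try {
            return new FailFastXmlReader(XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot open XML input", e);
        }
    }

    /** Moves to the next event; returns false at end of input or after a fatal error. */
    synchronized boolean advance() { // synchronized: several threads may share one reader
        if (failure != null) {
            return false; // never call next() on a parser that already failed
        }
        try {
            if (!reader.hasNext()) {
                return false;
            }
            reader.next();
            return true;
        } catch (XMLStreamException e) {
            failure = e; // remember the first error and stop
            return false;
        }
    }

    synchronized XMLStreamException failure() {
        return failure;
    }

    public static void main(String[] args) {
        // Assumed Latin-1 bytes parsed as UTF-8, similar to the non-UTF-8 file in this thread.
        byte[] latin1 = "<q>Universit\u00e4t</q>".getBytes(StandardCharsets.ISO_8859_1);
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(latin1), "UTF-8");
            FailFastXmlReader in = new FailFastXmlReader(r);
            int events = 0;
            while (in.advance()) {
                events++;
            }
            System.out.println("events read: " + events + ", failure: " + in.failure());
            System.out.println("advance() again: " + in.advance());
        } catch (XMLStreamException e) {
            // The parser may also detect the bad bytes while the reader is being created.
            System.out.println("failed while opening: " + e.getMessage());
        }
    }
}
```

The caller can then report the stored failure once, as a single parsing error for the file, instead of surfacing follow-on exceptions from a parser in an undefined state.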

sravanr commented on August 16, 2024

We have validated the document with UTF-8 encoding, and it works.
