Comments (7)
It's actually the character 𝒷 that caused the error, not the German characters. 𝒷 is MATHEMATICAL SCRIPT SMALL B (U+1D4B7), which was given in escaped form in the input.
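For reference, here is a minimal Java sketch (not taken from mlcp; the class and method names are made up for illustration) showing why this character is unusual: U+1D4B7 lies outside the Basic Multilingual Plane, so it takes a surrogate pair in a Java String and four bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class ScriptSmallB {
    // U+1D4B7 MATHEMATICAL SCRIPT SMALL B, given as a code point so the
    // encoding of this source file does not matter.
    static final int CODE_POINT = 0x1D4B7;

    // Builds a String holding just this one code point.
    static String asString() {
        return new String(Character.toChars(CODE_POINT));
    }

    // Number of bytes the character occupies when encoded as UTF-8.
    static int utf8Length() {
        return asString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = asString();
        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("UTF-8 bytes: " + utf8Length());
        System.out.println("Supplementary: " + Character.isSupplementaryCodePoint(CODE_POINT));
    }
}
```

Any code that walks the text one `char` at a time sees two surrogates here instead of one character, which is the usual place where such input goes wrong.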
from marklogic-contentpump.
I used the following command:
mlcp/mlcp-9.0/bin/mlcp.sh import -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
special-char.xml:
b Justus-Liebig-Universität , Gie𝒷en , Germany.
I am getting the following exceptions:
ERROR contentpump.MultithreadedMapper: Parsing error
ERROR contentpump.MultithreadedMapper: java.lang.NullPointerException
For more details, see the file 42812_MLCP_console_errorlog.txt attached in bugtrak.
[ypaladug@engrlab-131-083 qa]$ mlcp/mlcp-9.0/bin/mlcp.sh import -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Command: IMPORT
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Arguments: -username admin -password admin -host engrlab-131-083.engrlab.marklogic.com -port 8000 -mode local -input_file_path /tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/07 19:02:31 DEBUG contentpump.ContentPump: Running in: localmode
16/12/07 19:02:31 INFO contentpump.LocalJobRunner: Content type: XML
16/12/07 19:02:32 INFO contentpump.ContentPump: Job name: local_864518043_1
16/12/07 19:02:32 DEBUG contentpump.LocalJobRunner: Thread pool size: 4
16/12/07 19:02:32 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
16/12/07 19:02:32 DEBUG contentpump.LocalJobRunner: Thread Count for Split#0 : 4
16/12/07 19:02:32 DEBUG contentpump.MultithreadedMapper: Running with 4 threads
16/12/07 19:02:32 ERROR contentpump.AggregateXMLReader: Parsing error
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper: Parsing error
java.io.IOException: Parsing error
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:554)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:591)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
... 13 more
16/12/07 19:02:32 ERROR contentpump.AggregateXMLReader: Parsing error
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper: Parsing error
java.io.IOException: Parsing error
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:554)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,32]
Message: XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
... 13 more
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper:
java.lang.NullPointerException
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1788)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:954)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2820)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:553)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at com.marklogic.contentpump.MultithreadedMapper.run(MultithreadedMapper.java:215)
at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:378)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 ERROR contentpump.MultithreadedMapper:
java.lang.NullPointerException
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1788)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:954)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2820)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:553)
at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:468)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: completed 0%
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/12/07 19:02:32 INFO contentpump.LocalJobRunner: Total execution time: 0 sec
I can't reproduce this.
[jsun@msun-z620 mlcp-9.0]$ bin/mlcp.sh import -username admin -password admin -host msun -port 8000 -mode local -input_file_path /space/tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Command: IMPORT
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Arguments: -username admin -password admin -host msun -port 8000 -mode local -input_file_path /space/tmp/special-char.xml -input_file_type aggregates -aggregate_record_element q -generate_uri
16/12/08 10:57:29 DEBUG contentpump.ContentPump: Running in: localmode
16/12/08 10:57:29 INFO contentpump.LocalJobRunner: Content type: XML
16/12/08 10:57:30 INFO contentpump.ContentPump: Job name: local_1721263359_1
16/12/08 10:57:30 DEBUG contentpump.LocalJobRunner: Thread pool size: 4
16/12/08 10:57:30 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
16/12/08 10:57:30 DEBUG contentpump.LocalJobRunner: Thread Count for Split#0 : 4
16/12/08 10:57:30 DEBUG contentpump.MultithreadedMapper: Running with 4 threads
16/12/08 10:57:30 DEBUG mapreduce.ContentWriter: Connect to msun
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: completed 100%
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/12/08 10:57:30 INFO contentpump.LocalJobRunner: Total execution time: 0 sec
I am not able to attach the file. I found that Yeshwanth used a file that is not UTF-8 encoded, which is the reason for the above stack trace.
If I use the same special characters in a UTF-8 file, the import works fine, but I see NullPointerExceptions when the file is not UTF-8, so I expect those to be fixed as part of this bug.
You can find the file we used to get the above stack trace at /project/qa/skottam/special-char.xml
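A quick way to confirm whether an input file is valid UTF-8 before running the import is a strict decode. The sketch below is only an illustration (the class name `Utf8Check` is made up, and this is not part of mlcp). As a side note, in ISO-8859-1 the letter "ä" is the single byte 0xE4, which a UTF-8 decoder reads as the lead byte of a 3-byte sequence; that would be consistent with the "Invalid byte 2 of 3-byte UTF-8 sequence" message above, although the actual encoding of the file is an assumption here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        // Reads the whole file into memory, so this is meant for small test files.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

Running it against the file at the path above should show directly whether the encoding is the problem.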
The originally reported bug has already been fixed. The NPE is a separate issue and a corner case; I will create another bug against 9.0.1 to track it. Please test the original bug for EA4.
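For context on the NPE: in the log above, the NullPointerExceptions appear to come from XMLStreamReaderImpl.next() being called again after the parser had already reported a parsing error. The sketch below is not mlcp code and says nothing about how the fix will be done; it only illustrates, with the standard StAX API and a made-up class name, the general idea of treating the first XMLStreamException as fatal for that stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StopOnParseError {
    // Returns the number of events read, or -1 if the parser reported an error.
    static int countEvents(InputStream in) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            int events = 0;
            while (reader.hasNext()) {
                reader.next();
                events++;
            }
            return events;
        } catch (XMLStreamException e) {
            // First parse error ends processing of this stream; next() is not called again.
            return -1;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        // Same text encoded two ways; only the UTF-8 bytes are well-formed for the parser.
        byte[] good = "<q>Gie\u00DFen</q>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "<q>Gie\u00DFen</q>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 input: " + countEvents(new ByteArrayInputStream(good)));
        System.out.println("ISO-8859-1 input: " + countEvents(new ByteArrayInputStream(bad)));
    }
}
```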
We have validated the document with UTF-8 encoding and it's working.