kite-sdk / kite-examples
Kite SDK Examples
License: Apache License 2.0
Hi,
I am trying to run the demos, but when I create the datasets with the mvn command I get an error:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] demo ............................................... SUCCESS [ 5.167 s]
[INFO] demo-core .......................................... SKIPPED
[INFO] demo-crunch ........................................ SKIPPED
[INFO] demo-logging-webapp ................................ SKIPPED
[INFO] demo-reports-webapp ................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.426 s
[INFO] Finished at: 2016-06-02T11:06:06+02:00
[INFO] Final Memory: 35M/1370M
[INFO] ------------------------------------------------------------------------
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ShutdownHookManager$2
    at org.apache.hadoop.util.ShutdownHookManager.getShutdownHooksInOrder(ShutdownHookManager.java:124)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:52)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ShutdownHookManager$2
    at org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy.loadClass(SelfFirstStrategy.java:50)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.unsynchronizedLoadClass(ClassRealm.java:271)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:247)
    at org.codehaus.plexus.classworlds.realm.ClassRealm.loadClass(ClassRealm.java:239)
    ... 2 more
The dataset is not created.
We are not using the Cloudera VM; we use Cloudera Enterprise in cluster mode.
Thanks in advance.
Martin
Hi everyone,
I want to adapt the JSON example provided, but I get this error:
15/05/20 08:09:47 INFO conf.FlumeConfiguration: Processing:UFOKiteDS
15/05/20 08:09:47 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [UFOAgent]
15/05/20 08:09:47 INFO node.AbstractConfigurationProvider: Creating channels
15/05/20 08:09:47 INFO channel.DefaultChannelFactory: Creating instance of channel archivo type file
15/05/20 08:09:47 INFO node.AbstractConfigurationProvider: Created channel archivo
15/05/20 08:09:47 INFO source.DefaultSourceFactory: Creating instance of source UFODir, type spooldir
15/05/20 08:09:47 INFO interceptor.StaticInterceptor: Creating StaticInterceptor: preserveExisting=true,key=flume.avro.schema.url,value=file:/home/itam/schemas/ufos.avsc
15/05/20 08:09:47 INFO api.MorphlineContext: Importing commands
15/05/20 08:09:52 INFO api.MorphlineContext: Done importing commands
15/05/20 08:09:52 INFO sink.DefaultSinkFactory: Creating instance of sink: UFOKiteDS, type: org.apache.flume.sink.kite.DatasetSink
15/05/20 08:09:52 ERROR node.AbstractConfigurationProvider: Sink UFOKiteDS has been removed due to an error during configuration
java.lang.IllegalArgumentException
at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at org.kitesdk.data.URIBuilder.<init>(URIBuilder.java:106)
at org.kitesdk.data.URIBuilder.<init>(URIBuilder.java:90)
at org.apache.flume.sink.kite.DatasetSink.configure(DatasetSink.java:188)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/05/20 08:09:52 INFO node.AbstractConfigurationProvider: Channel archivo connected to [UFODir]
15/05/20 08:09:52 INFO node.Application: Starting new configuration:{ sourceRunners:{UFODir=EventDrivenSourceRunner: { source:Spool Directory source UFODir: { spoolDir: /opt/ufos } }} sinkRunners:{} channels:{archivo=FileChannel archivo { dataDirs: [/opt/ufos/log/data] }} }
15/05/20 08:09:52 INFO node.Application: Starting Channel archivo
15/05/20 08:09:52 INFO file.FileChannel: Starting FileChannel archivo { dataDirs: [/opt/ufos/log/data] }...
15/05/20 08:09:52 INFO file.Log: Encryption is not enabled
I ran the Flume agent with:
flume-ng agent -n UFOAgent -Xmx100m --conf ingestion -f ingestion/spooldir_example.conf
The spooldir_example.conf is:
# Components
UFOAgent.sources = UFODir
UFOAgent.channels = archivo
UFOAgent.sinks = UFOKiteDS
# Channel
UFOAgent.channels.archivo.type = file
UFOAgent.channels.archivo.checkpointDir = /opt/ufos/log/checkpoint/
UFOAgent.channels.archivo.dataDirs = /opt/ufos/log/data/
# Source
UFOAgent.sources.UFODir.type = spooldir
UFOAgent.sources.UFODir.channels = archivo
UFOAgent.sources.UFODir.spoolDir = /opt/ufos
UFOAgent.sources.UFODir.fileHeader = true
UFOAgent.sources.UFODir.deletePolicy = immediate
# Interceptor
UFOAgent.sources.UFODir.interceptors = attach-schema morphline
UFOAgent.sources.UFODir.interceptors.attach-schema.type = static
UFOAgent.sources.UFODir.interceptors.attach-schema.key = flume.avro.schema.url
UFOAgent.sources.UFODir.interceptors.attach-schema.value = file:/home/itam/schemas/ufos.avsc
UFOAgent.sources.UFODir.interceptors.morphline.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
UFOAgent.sources.UFODir.interceptors.morphline.morphlineFile = /home/itam/ingestion/morphline.conf
UFOAgent.sources.UFODir.interceptors.morphline.morphlineId = convertUFOFileToAvro
# Sink
UFOAgent.sinks.UFOKiteDS.type = org.apache.flume.sink.kite.DatasetSink
UFOAgent.sinks.UFOKiteDS.channel = archivo
UFOAgent.sinks.UFOKiteDS.kite.repo.uri = dataset:hive
UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos
UFOAgent.sinks.UFOKiteDS.kite.batchSize = 10
I created the dataset as follows:
kite-dataset create ufos --schema /home/itam/schemas/ufos.avsc --format avro
Finally, the morphline.conf is:
morphlines: [
  {
    id: convertUFOFileToAvro
    importCommands: ["com.cloudera.**", "org.kitesdk.**"]
    commands: [
      { tryRules {
          catchExceptions: false
          throwExceptionIfAllRulesFailed: true
          rules: [
            # first rule of tryRules cmd:
            {
              commands: [
                { readCSV: {
                    separator: "\t"
                    columns: [Timestamp, City, State, Shape, Duration, Summary, Posted]
                    trim: true
                    charset: UTF-8
                    quoteChar: "\""
                  }
                }
                { toAvro {
                    schemaFile: /home/itam/schemas/ufos.avsc
                  }
                }
                { writeAvroToByteArray: {
                    format: containerlessBinary
                  }
                }
              ]
            }
            # next rule of tryRules cmd:
            {
              commands: [
                { dropRecord {} }
              ]
            }
          ]
        }
      }
      { logTrace { format: "output record: {}", args: ["@{}"] } }
    ]
  }
]
What am I doing wrong?
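For reference, the IllegalArgumentException comes from a precondition check in Kite's URIBuilder, which expects a repository URI (repo: scheme) rather than a dataset URI in the kite.repo.uri property. A minimal sketch of the sink block under that assumption (not a verified fix for this cluster) would be:

```properties
# Sketch, assuming kite.repo.uri must be a repository URI:
# "dataset:hive" is a dataset URI and fails URIBuilder's check,
# while "repo:hive" names the Hive metastore repository.
UFOAgent.sinks.UFOKiteDS.type = org.apache.flume.sink.kite.DatasetSink
UFOAgent.sinks.UFOKiteDS.channel = archivo
UFOAgent.sinks.UFOKiteDS.kite.repo.uri = repo:hive
UFOAgent.sinks.UFOKiteDS.kite.dataset.name = ufos
UFOAgent.sinks.UFOKiteDS.kite.batchSize = 10
```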
Hi everybody, I'm trying to write some unit tests for my morphline files and I'm facing a problem related to the location of the morphline files.
The files are under the "src/main/resources" folder, so after compilation they end up in "target/classes", but the AbstractMorphlineTest class looks for them under "target/test-classes". I understand that the obvious solution is to move my files to "src/test/resources", but I wonder if there is any way to override this setting.
Thanks in advance
P.S.: I have also looked for AbstractMorphlineTest in order to make some changes and submit a PR, but I am not able to find it in any repo.
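One way to make files under src/main/resources visible on the test classpath without moving them, assuming a standard Maven layout, is to declare both directories as test resources in the pom. This is a sketch, not the project's actual configuration:

```xml
<!-- Sketch: expose main resources to tests in addition to the default
     test resources, so classpath lookups from test code can find them. -->
<build>
  <testResources>
    <testResource>
      <directory>src/test/resources</directory>
    </testResource>
    <testResource>
      <directory>src/main/resources</directory>
    </testResource>
  </testResources>
</build>
```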