logbee / keyscore
License: Apache License 2.0
In my opinion, writing descriptors for sources, filters and sinks is a pain. There are multiple reasons why we should rewrite the descriptor-api.

Mixing the parameter definition and the translation of the name and description pollutes the code and makes it hard to read. Additionally, creating a map of descriptors (to support multiple languages) makes it even worse. In the example below, taken from the AddFieldsFilterLogic, a lot of the code is related to localization rather than to defining parameters:
```scala
private val filterName = "io.logbee.keyscore.agent.pipeline.contrib.filter.AddFieldsFilterLogic"
private val bundleName = "io.logbee.keyscore.agent.pipeline.contrib.filter.AddFieldsFilter"
private val filterId = "1a6e5fd0-a21b-4056-8a4a-399e3b4e7610"

override def describe: MetaFilterDescriptor = {
  val descriptorMap = mutable.Map.empty[Locale, FilterDescriptorFragment]
  descriptorMap ++= Map(
    Locale.ENGLISH -> descriptor(Locale.ENGLISH),
    Locale.GERMAN -> descriptor(Locale.GERMAN)
  )
  MetaFilterDescriptor(fromString(filterId), filterName, descriptorMap.toMap)
}

private def descriptor(language: Locale): FilterDescriptorFragment = {
  val translatedText: ResourceBundle = ResourceBundle.getBundle(bundleName, language)
  FilterDescriptorFragment(
    displayName = translatedText.getString("displayName"),
    description = translatedText.getString("description"),
    previousConnection = FilterConnection(true),
    nextConnection = FilterConnection(true),
    parameters = List(
      MapParameterDescriptor("fieldsToAdd", translatedText.getString("fieldsToAddName"), translatedText.getString("fieldsToAddDescription"),
        TextParameterDescriptor("fieldName", translatedText.getString("fieldKeyName"), translatedText.getString("fieldKeyDescription")),
        TextParameterDescriptor("fieldValue", translatedText.getString("fieldValueName"), translatedText.getString("fieldValueDescription"))
      )
    ))
}
```
To extract the parameter values from a given configuration, I have to search through the configuration and check each parameter by its name, given as a string. This approach is error-prone and needs a lot of boilerplate code, as shown in the example below, also taken from the AddFieldsFilterLogic:
```scala
for (parameter <- configuration.parameters) {
  parameter.name match {
    case "fieldsToAdd" =>
      val dataMap = parameter.value.asInstanceOf[Map[String, String]]
      dataToAdd ++= dataMap.map(pair => (pair._1, TextField(pair._1, pair._2)))
    case _ =>
  }
}
```
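A typed accessor would remove both the string matching and the unchecked cast. The following is only a sketch with assumed names, not the actual keyscore API:

```scala
// Hypothetical sketch: a ParameterRef that carries its value type, so a
// configuration can be queried without string matching or manual casting.
case class ParameterRef[T](id: String)

case class Configuration(parameters: Map[String, Any]) {
  // Returns the typed value, or the supplied default if the parameter is absent.
  def getValueOrDefault[T](ref: ParameterRef[T], default: T): T =
    parameters.get(ref.id).map(_.asInstanceOf[T]).getOrElse(default)
}

val fieldsToAdd = ParameterRef[Map[String, String]]("fieldsToAdd")
val configuration = Configuration(Map("fieldsToAdd" -> Map("env" -> "prod")))

val dataToAdd: Map[String, String] = configuration.getValueOrDefault(fieldsToAdd, Map.empty)
```

The cast still happens once inside the accessor, but the call sites stay type-safe and free of boilerplate.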
There is no automatic check whether a given configuration suits a certain logic. The developer of a filter has to write it on his own, or the filter crashes at runtime due to a missing parameter.
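Such a check could be derived from the descriptors themselves. A minimal sketch, with assumed names rather than the actual keyscore types:

```scala
// Hypothetical sketch: verify that a configuration provides a value for every
// parameter a descriptor declares, instead of crashing at runtime.
case class ParameterDescriptor(name: String)
case class Parameter(name: String, value: Any)

def missingParameters(declared: Seq[ParameterDescriptor],
                      configured: Seq[Parameter]): Seq[String] = {
  val present = configured.map(_.name).toSet
  declared.map(_.name).filterNot(present)
}

val declared = Seq(ParameterDescriptor("fieldsToAdd"), ParameterDescriptor("separator"))
val configured = Seq(Parameter("fieldsToAdd", Map("a" -> "b")))

missingParameters(declared, configured) // Seq("separator")
```

The frontier could reject a configuration whose missing-parameter list is non-empty before it ever reaches an agent.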
The following shows how the new ParameterDescriptors could look:
```scala
import io.logbee.keyscore.model.FieldNameHint.PresentField
import io.logbee.keyscore.model.PatternType.Grok
import io.logbee.keyscore.model._
import io.logbee.keyscore.model.localization.{Locale, Localization, TextRef}

val EN = Locale("en", "US")
val DE = Locale("de/DE")

val filterDisplayName = TextRef("displayName")
val filterDescription = TextRef("description")
val category = TextRef("aa5de1cd-1122-758f-97fa-228ca8911378")

val parameterARef = ParameterRef("37024d8b-4aec-4b3e-8074-21ef065e5ee2")
val parameterADisplayName = TextRef("parameterADisplayName")
val parameterADescription = TextRef("parameterADescription")

// val parameterBRef = ParameterRef("ff543cab-15bf-114a-47a1-ce1f065e5513")
val parameterBDisplayName = TextRef("parameterBDisplayName")
val parameterBDescription = TextRef("parameterBDescription")

val parameterCRef = ParameterRef("b7cc9c84-ae6e-4ea3-bbff-f8d62af4caed")
// val parameterDRef = ParameterRef("5f28c6dd-f88f-4530-afd1-c8b946bc5406")

val descriptor = Descriptor(
  id = "1a6e5fd0-a21b-4056-8a4a-399e3b4e7610",
  describe = FilterDescriptor(
    name = "io.logbee.keyscore.agent.pipeline.contrib.filter.AddFieldsFilterLogic",
    displayName = filterDisplayName,
    description = filterDescription,
    category = category,
    parameters = Seq(
      TextParameterDescriptor(parameterARef, ParameterInfo(parameterADisplayName, parameterADescription), defaultValue = "Hello World", validator = StringValidator("Hello*", PatternType.Glob)),
      BooleanParameterDescriptor(parameterCRef, ParameterInfo(TextRef("parameterDDisplayName"), TextRef("parameterDDescription")), defaultValue = true),
      ConditionalParameterDescriptor(condition = BooleanParameterCondition(parameterCRef, negate = true), parameters = Seq(
        PatternParameterDescriptor("98276284-a309-4f21-a0d8-50ce20e3376a", patternType = Grok),
        ListParameterDescriptor("ff543cab-15bf-114a-47a1-ce1f065e5513",
          ParameterInfo(parameterBDisplayName, parameterBDescription),
          FieldNameParameterDescriptor(hint = PresentField, validator = StringValidator("^_.*", PatternType.RegEx)),
          min = 1, max = Int.MaxValue)
      )),
      ChoiceParameterDescriptor("e84ad685-b7ad-421e-80b4-d12e5ca2b4ff", min = 1, max = 1, choices = Seq(
        Choice("red"),
        Choice("green"),
        Choice("blue")
      ))
    )
  ),
  localization = Localization(Set(EN, DE), Map(
    filterDisplayName -> Map(
      EN -> "Add Fields",
      DE -> "Feld Hinzufuegen"
    ),
    filterDescription -> Map(
      EN -> "Adds the specified fields.",
      DE -> "Fuegt die definierten Felder hinzu."
    ),
    category -> Map(
      EN -> "Source",
      DE -> "Quelle"
    ),
    parameterADisplayName -> Map(
      EN -> "A Parameter",
      DE -> "Ein Parameter"
    ),
    parameterADescription -> Map(
      EN -> "A simple text parameter as example.",
      DE -> "Ein einfacher Textparameter als Beispiel."
    ),
    parameterBDisplayName -> Map(
      EN -> "A Parameter",
      DE -> "Ein Parameter"
    ),
    parameterBDescription -> Map(
      EN -> "A simple text parameter as example.",
      DE -> "Ein einfacher Textparameter als Beispiel."
    )
  )
))
```
Serialized to JSON, the descriptor above looks like this:

```json
{
  "jsonClass": "io.logbee.keyscore.model.Descriptor",
  "id": "1a6e5fd0-a21b-4056-8a4a-399e3b4e7610",
  "describe": {
    "jsonClass": "io.logbee.keyscore.model.FilterDescriptor",
    "name": "io.logbee.keyscore.agent.pipeline.contrib.filter.AddFieldsFilterLogic",
    "displayName": { "id": "displayName" },
    "description": { "id": "description" },
    "category": { "id": "aa5de1cd-1122-758f-97fa-228ca8911378" },
    "parameters": [
      {
        "jsonClass": "io.logbee.keyscore.model.TextParameterDescriptor",
        "ref": { "id": "37024d8b-4aec-4b3e-8074-21ef065e5ee2" },
        "info": {
          "displayName": { "id": "parameterADisplayName" },
          "description": { "id": "parameterADescription" }
        },
        "defaultValue": "Hello World",
        "validator": {
          "pattern": "Hello*",
          "patternType": "Glob"
        }
      },
      {
        "jsonClass": "io.logbee.keyscore.model.BooleanParameterDescriptor",
        "ref": { "id": "b7cc9c84-ae6e-4ea3-bbff-f8d62af4caed" },
        "info": {
          "displayName": { "id": "parameterDDisplayName" },
          "description": { "id": "parameterDDescription" }
        },
        "defaultValue": true
      },
      {
        "jsonClass": "io.logbee.keyscore.model.ConditionalParameterDescriptor",
        "ref": { "id": "" },
        "condition": {
          "jsonClass": "io.logbee.keyscore.model.BooleanParameterCondition",
          "parameter": { "id": "b7cc9c84-ae6e-4ea3-bbff-f8d62af4caed" },
          "negate": true
        },
        "parameters": [
          {
            "jsonClass": "io.logbee.keyscore.model.PatternParameterDescriptor",
            "ref": { "id": "98276284-a309-4f21-a0d8-50ce20e3376a" },
            "patternType": "Grok",
            "defaultValue": ""
          },
          {
            "jsonClass": "io.logbee.keyscore.model.ListParameterDescriptor",
            "ref": { "id": "ff543cab-15bf-114a-47a1-ce1f065e5513" },
            "info": {
              "displayName": { "id": "parameterBDisplayName" },
              "description": { "id": "parameterBDescription" }
            },
            "kind": {
              "jsonClass": "io.logbee.keyscore.model.FieldNameParameterDescriptor",
              "ref": { "id": "" },
              "defaultValue": "",
              "hint": "PresentField",
              "validator": {
                "pattern": "^_.*",
                "patternType": "RegEx"
              }
            },
            "min": 1,
            "max": 2147483647
          }
        ]
      },
      {
        "jsonClass": "io.logbee.keyscore.model.ChoiceParameterDescriptor",
        "ref": { "id": "e84ad685-b7ad-421e-80b4-d12e5ca2b4ff" },
        "min": 1,
        "max": 1,
        "choices": [
          { "name": "red" },
          { "name": "green" },
          { "name": "blue" }
        ]
      }
    ]
  },
  "localization": {
    "locales": [
      { "language": "en", "country": "US" },
      { "language": "de", "country": "DE" }
    ],
    "mapping": {
      "parameterADisplayName": {
        "translations": { "en/US": "A Parameter", "de/DE": "Ein Parameter" }
      },
      "description": {
        "translations": { "en/US": "Adds the specified fields.", "de/DE": "Fuegt die definierten Felder hinzu." }
      },
      "parameterBDisplayName": {
        "translations": { "en/US": "A Parameter", "de/DE": "Ein Parameter" }
      },
      "aa5de1cd-1122-758f-97fa-228ca8911378": {
        "translations": { "en/US": "Source", "de/DE": "Quelle" }
      },
      "parameterADescription": {
        "translations": { "en/US": "A simple text parameter as example.", "de/DE": "Ein einfacher Textparameter als Beispiel." }
      },
      "displayName": {
        "translations": { "en/US": "Add Fields", "de/DE": "Feld Hinzufuegen" }
      },
      "parameterBDescription": {
        "translations": { "en/US": "A simple text parameter as example.", "de/DE": "Ein einfacher Textparameter als Beispiel." }
      }
    }
  }
}
```
When starting the gradle task "startContainers" while one or more instances of those containers are already running, the task fails. Currently, the user has to stop and remove those containers manually before executing the task again.

Possible solution: stop and remove all specified containers automatically before executing the startContainers task.
The throughput time is a good metric to evaluate the performance of KEYSCORE. The ValveStage is already prepared to compute the throughput time of the filter in front of a valve and the throughput time from the beginning of the pipeline to a valve.
- The ValveStage uses the MovingMedian, but this class is currently not implemented.
- The ValveState already contains the throughput time and the total throughput time.
- The FilterController has to request the throughput time from the valves in question.
- The FilterState has to be enhanced to carry the throughput time and the total throughput time.

Possible solution taken from https://groups.google.com/forum/#!topic/akka-user/z0y1kvcY97I:
One way is that in your seed node subscribe to cluster membership changes and write the current set of nodes to a file. When you restart the seed node you construct the list of seed nodes from the file, and include your own address as the first element in the seed-nodes list. Then it will first try to join the other nodes, before joining itself.
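On the missing MovingMedian mentioned in the throughput issue above: a minimal sketch could look like the following. The API names and the fixed window size are assumptions; the real class may need timestamps or a different windowing strategy.

```scala
import scala.collection.mutable

// Hypothetical sketch of the missing MovingMedian: keeps the last `windowSize`
// samples and returns the median of that window.
class MovingMedian(windowSize: Int = 8) {
  private val window = mutable.Queue.empty[Long]

  def +=(sample: Long): Unit = {
    window.enqueue(sample)
    if (window.size > windowSize) window.dequeue()
  }

  def get: Long =
    if (window.isEmpty) 0L
    else {
      val sorted = window.toIndexedSeq.sorted
      val n = sorted.size
      if (n % 2 == 1) sorted(n / 2)
      else (sorted(n / 2 - 1) + sorted(n / 2)) / 2
    }
}
```

The ValveStage could then feed the measured per-dataset durations into such a window and read the median on demand.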
With the current implementation it is not possible to launch multiple instances of the same Pipeline(Configuration). But there are use-cases where multiple pipelines could run in parallel to speed up data processing.
A fundamental problem in distributed systems is that network partitions (split brain scenarios) and machine crashes are indistinguishable for the observer [...] Temporary and permanent failures are indistinguishable because decisions must be made in finite time, and there always exists a temporary failure that lasts longer than the time limit for the decision.
[...]
The Akka cluster has a failure detector that will notice network partitions and machine crashes (but it cannot distinguish the two).
[...]
The failure detector in itself is not enough for making the right decision in all situations. The naive approach is to remove an unreachable node from the cluster membership after a timeout. This works great for crashes and short transient network partitions, but not for long network partitions. Both sides of the network partition will see the other side as unreachable and after a while remove it from its cluster membership. Since this happens on both sides the result is that two separate disconnected clusters have been created.
One of the most used programming languages in data analytics is Python. That's why it is very important for KS to get a Python integration. This integration can be split into two problems.

There are a couple of algorithms already written in Python, and a user of keyscore for whom it doesn't matter in which programming language a filter is written wants to use these algorithms as filters in her pipelines in the same way as any other filter. So KS has to make it transparent to the user, and there should be no difference between filters written in different programming languages.

Another user of keyscore, a filter developer, wants to write his algorithm in Python. Therefore KS has to provide an environment in which a Python developer can implement sources, filters and sinks in Python. And she needs a mechanism to plug these custom filters into KS.
I'm going to track these two problems in separate issues:
These icons are for keyscore-manager:
- TypeIcons (field value types)
- BlockTypeIcons
- StandardBlockIcons
Apache Avro is a data serialization system. Avro provides:
- Rich data structures.
- A compact, fast, binary data format.
- A container file, to store persistent data.
- Remote procedure call (RPC).
- Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
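For illustration, a minimal Avro schema for a hypothetical keyscore field record could look like this (the record and field names are assumptions, not the actual keyscore model):

```json
{
  "type": "record",
  "name": "Field",
  "namespace": "io.logbee.keyscore.model",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "value", "type": ["null", "string", "long", "double"], "default": null}
  ]
}
```

Such a schema would give the binary format, container files and RPC listed above without hand-written serialization code.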
The target of the second release is to stabilize existing features and clean up the code base in preparation for the upcoming features.
Possible solution taken from https://groups.google.com/forum/#!topic/akka-user/z0y1kvcY97I:
One way is that in your seed node subscribe to cluster membership changes and write the current set of nodes to a file. When you restart the seed node you construct the list of seed nodes from the file, and include your own address as the first element in the seed-nodes list. Then it will first try to join the other nodes, before joining itself.
The original issue #4 was resolved by a work-around. After we introduce mechanisms to persist the state of agents and frontiers, we can rework the issue.
KafkaSource and KafkaSink are not working anymore.
KafkaSource stops with: "Message [akka.kafka.KafkaConsumerActor$Internal$Stop$] without sender to Actor[akka://keyscore/system/kafka-consumer-2#2056765281] was not delivered. [3] dead letters encountered."
Kafka Sink stops with: "KafkaProducer - Closing the Kafka producer with timeoutMillis = 60000 ms."
I tried combining KafkaSource with StdOutSink and KafkaSink with HttpSource, which produces the exact same errors, while a stream with HttpSource and StdOutSink works as expected.
Add a collapse button to the description and the configuration component so the user can hide elements he doesn't need. After a collapse action the rest of the UI has to rearrange and use the freed space to actually improve the layout. Evaluate whether it's possible to merge these two components.
The task 'gradle startContainers' fails locally with the following exceptions:

which is followed by:

This seems to be a problem only when running the task locally. It works in Travis.
The actual workflow provides that a user creates a PipelineConfiguration within the KS:M. The configuration is sent to the KS:F and from there to an agent, which materializes the configuration into several stages and starts these stages. So there is currently no way to just store a PipelineConfiguration without starting a pipeline.
The Agent is not able to translate the KafkaSink descriptor. On registering the KafkaSink extension, the Agent throws a

java.util.MissingResourceException: Can't find resource for bundle java.util.PropertyResourceBundle, key category2

In Localization.scala on line 37 it tries to get the TextRef(category2) string from the bundle, which is not there. But I have to admit that I have no idea why it even tries to resolve the category2 key.
Add the functionality to search through datasets, whether searching for keys, values or everything. The UI element for the search should be placed in the datasets-visualizer component.
Method: DELETE
URL: /pipeline/configuration/*
Body: <empty>
.Case: Deletion was successful.
200 OK
<empty>
.Case: Something went wrong.
500 INTERNAL SERVER ERROR
<empty>
Method: DELETE
URL: /pipeline/instances/*
Body: <empty>
.Case: Deletion was successful.
200 OK
<empty>
.Case: Something went wrong.
500 INTERNAL SERVER ERROR
<empty>
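A minimal sketch of how a handler could map the delete outcome to the status codes specified above. The type and function names are assumptions for illustration, not the actual keyscore frontier code:

```scala
// Hypothetical sketch: mapping the outcome of a delete operation to the
// HTTP status codes specified in the endpoint description above.
sealed trait DeleteResult
case object DeletionSuccessful extends DeleteResult
final case class DeletionFailed(cause: Throwable) extends DeleteResult

def statusCodeFor(result: DeleteResult): Int = result match {
  case DeletionSuccessful => 200 // OK, empty body
  case DeletionFailed(_)  => 500 // INTERNAL SERVER ERROR, empty body
}
```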
When stopping a container with a bmuschko gradle task, e.g. the kafka stopContainer task, the task fails:

No value has been specified for property 'containerId'.

Therefore no container that was started by the bmuschko plugin in keyscore can be stopped and removed.
The Keyscore-Manager navigation should be changed to a sidemenu.
Every main component shows some kind of title or other similar information, but every component does it in a different way. Build a header bar as an Angular dumb component. The other components can use the header bar to show similar information like:

Note: If a component is very special or does not have similar information to display, it doesn't have to use the common header bar.
The KS:F should offer a rest-endpoint to remove a non-responding agent from the cluster.
The keyscore-manager is still not loading in Internet Explorer although we're now using webpack v4.19.0.
Currently, if the user navigates to /pipelines/pipeline/uuid and the given pipeline does not exist, a new pipeline is created. Showing a 404 page and redirecting the user to /pipelines/pipeline might be the better approach.
When inserting 2 datasets into the first pipeline, the expected count of extracted datasets of the filter of the second pipeline is 2 - but it is actually 3. The problem does not extend to the actual dataflow in the pipeline; only the extraction of datasets from a filter is not as expected.
The goal of the first release is to illustrate the functioning of KS and the interaction of all technologies.
Summary:
More and more features and their components get integrated into the WebUI of keyscore. Some of them assist the user to monitor the system and provide information about the system's health and performance. Other components allow the user to tweak every aspect of the system in detail.

This can lead to a bloated UI where most users won't find what they are looking for. We call this an airplane cockpit: the UI offers too many levers and buttons and displays too much information at the same time.

To solve the problem described above, the UI could offer the possibility to reduce or increase the level of detail. Think of it as looking at a picture of a lovely forest with many trees, a small lake and great mountains in the background. If you stand far away from the picture, you see that there is a forest, some mountains etc. If you get a bit closer, you notice that it isn't just a big forest: there are many different trees and some geese on the small lake. If you get even closer, you will spot the leaves of the trees and the feathers of the geese. So the amount of detail you see depends on the distance from which you look at the picture.

With this in mind, we build a UI where the user can increase or reduce the amount of information like he/she changes the distance to a picture. To implement this, we define a range, e.g. [1..9], and assign each UI element to one of the levels. A UI element is visible if it belongs to the currently selected level or below.

This approach enables users to decide on their own how detailed and complex the UI should be. An experienced user can set a high LOD to tweak every detail of KEYSCORE. On the other hand, a new user can set a low LOD so that only the most important UI elements are displayed, allowing him/her to orientate quickly.
The base-url is currently stored in the conf/application.conf file of the keyscore-manager, which gets packed into the docker image during build. Therefore it is not possible to change the base-url easily after deployment.

To enhance the keyscore-manager, a modal settings dialog could be implemented where the user can set the base-url to a backend of his choice. These settings could be stored in a cookie.
Currently a stream cannot be built, because in the FilterManager the methods getSinkStageLogicConstructor, getSourceStageLogicConstructor and getFilterStageLogicConstructor fail: logicClass.getConstructor(...) fails with a NoSuchMethodException, and logicClass.getConstructors() returns an empty array.
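For illustration (not the actual keyscore classes): getConstructor(...) throws a NoSuchMethodException when no public constructor matches the given argument types, and getConstructors returns an empty array when the class declares no public constructor at all, e.g. when its sole constructor is private:

```scala
// Illustration only: a class whose sole constructor is private exposes no
// public constructors to reflection.
class Hidden private (val value: Int)

val publicCtors = classOf[Hidden].getConstructors          // public constructors only
val declaredCtors = classOf[Hidden].getDeclaredConstructors // includes non-public ones

assert(publicCtors.isEmpty)
assert(declaredCtors.nonEmpty)
```

So it is worth checking whether the logic classes (or the way they are loaded) leave them without a matching public constructor.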