gchq / gaffer-doc
Documentation for Gaffer
Home Page: https://gchq.github.io/gaffer-doc/
License: Apache License 2.0
The user guide contains some typos, and it could be made easier to follow for users who would like to start from a blank project rather than cloning the examples.
There are several links that still point to files in github/gchq/Gaffer/doc. These need to be updated to point to the files in this repository.
When run on Accumulo, reading directly from RFiles is supported in GetDataFrameOfElements and GetJavaRDDOfAllElements.
Depends on gchq/koryphe#64.
Similar to the Aggregate and Filter operations, a few examples for usage of the Transform operation should be written.
The examples for operations such as GetJavaRDDOfAllElements don't show how the Hadoop configuration can be passed in as an option. Passing this option in isn't essential if you're running your code either via the Hadoop command or within a Spark job that has been configured with Hadoop, but it means you can't override properties in the configuration.
Copied from gchq/Gaffer#1328
For example, if a user executes a GetAllElements on Accumulo:
final Iterable<? extends Element> elements = graph.execute(new GetAllElements(), getUser());
The 'elements' iterable is lazy and the query is only executed on Accumulo when you start iterating around the results. So if you add another element 'X' to the graph before you consume the 'elements' iterable you will notice the results now also contain 'X'.
For this reason you should be very careful if you do an AddElements with a lazy iterable returned from a Get query on the same Graph. The problem that could arise is that the AddElements will lazily consume the lazy iterable of elements, potentially causing duplicates to be added.
To do a Get followed by an Add on the same Graph, we recommend consuming and caching the Get results first. For a small number of results, this can be done simply by using the ToList operation in your chain. For example:
new OperationChain.Builder()
.first(new GetAllElements())
.then(new ToList<>())
.then(new AddElements())
.build();
For a large number of results you could add them to the gaffer cache temporarily:
new OperationChain.Builder()
.first(new GetAllElements())
.then(new ExportToGafferResultCache<>())
.then(new DiscardOutput())
.then((Operation) new GetGafferResultCacheExport())
.then(new AddElements())
.build();
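The lazy behaviour described above can be illustrated without Gaffer at all. The following is a minimal plain-Java sketch, not Gaffer code: the class `LazyIterableSketch`, the in-memory list standing in for the store, and all method names are hypothetical. It shows a lazy iterable picking up a later add, an add loop re-reading its own writes, and the snapshot-first approach that ToList provides in an operation chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a store; none of these names are Gaffer classes.
class LazyIterableSketch {

    // Returns a lazy view: the backing list is only read while iterating.
    static Iterable<String> lazyGetAll(final List<String> store) {
        return () -> new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < store.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return store.get(index++);
            }
        };
    }

    // An element added after the "query" but before iteration shows up in the results.
    static List<String> seesLaterAdd() {
        List<String> store = new ArrayList<>(Arrays.asList("A", "B"));
        Iterable<String> results = lazyGetAll(store);
        store.add("X");
        List<String> consumed = new ArrayList<>();
        results.forEach(consumed::add);
        return consumed;
    }

    // Hazard: adding while lazily reading the same store re-reads the new elements.
    // The limit is only here to stop the loop, which would otherwise never end.
    static List<String> addFromLazy(final int limit) {
        List<String> store = new ArrayList<>(Arrays.asList("A", "B"));
        for (String element : lazyGetAll(store)) {
            if (store.size() >= limit) {
                break;
            }
            store.add(element);
        }
        return store;
    }

    // Fix: consume the lazy results into a list first, then add.
    static List<String> addFromSnapshot() {
        List<String> store = new ArrayList<>(Arrays.asList("A", "B"));
        List<String> snapshot = new ArrayList<>();
        lazyGetAll(store).forEach(snapshot::add);
        store.addAll(snapshot);
        return store;
    }

    public static void main(final String[] args) {
        System.out.println(seesLaterAdd());    // [A, B, X]
        System.out.println(addFromLazy(6));    // [A, B, A, B, A, B]
        System.out.println(addFromSnapshot()); // [A, B, A, B]
    }
}
```

In this sketch the snapshot version adds each original element exactly once, whereas the lazy version keeps growing until the artificial limit stops it.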
With the addition of new Operations, the development guide should be updated to cover all of the tasks that may be required when adding a new Operation.
The getting started pages are getting quite big. It would be good to try and break them down into smaller pages for each section.
This should go in the Filtering and/or Views section within the User Guide.
The documentation should explain that when applying filtering, aggregation and transformation in Views instead of selecting a property name you can select one of these fields: VERTEX, SOURCE, DESTINATION, DIRECTED, MATCHED_VERTEX, ADJACENT_MATCHED_VERTEX.
When producing the examples for the different functions, predicates and operations we should check if the class is annotated with Deprecated. If it is, we should add a note to tell users that it is deprecated in the description.
Examples should be written for the GetGraphFrameOfElements operation.
The Dev guide contains a note about setting the timestamp property in the schema, but doesn't say anything about what it's for. I think it just lets you specify which property is used to set the timestamp in an Accumulo key, but that doesn't achieve anything as far as the user is concerned.
Setting numReduceTasks has now been deprecated; the minimum and/or maximum number of reducers should be set instead. This means that, in most cases, Accumulo will be able to choose the right number of reducers for the user, based on the number of tablet servers. If the minimum is more than the number Accumulo chooses, the number of reducers is increased, and equally, if the maximum is less than the number Accumulo chooses, the number of reducers is reduced to the maximum.
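The min/max rule above amounts to clamping a value between optional bounds. The following plain-Java sketch illustrates that reading. The class and method names are hypothetical and this is not the Gaffer implementation. It also assumes that "increased" means raised to exactly the minimum.

```java
// Hypothetical illustration of the min/max rule; not the Gaffer implementation.
class ReducerBoundsSketch {

    // accumuloChoice: the number of reducers Accumulo would pick on its own.
    // min / max: optional user bounds, null when not set.
    static int resolveReducers(final int accumuloChoice, final Integer min, final Integer max) {
        int reducers = accumuloChoice;
        if (min != null && reducers < min) {
            reducers = min; // assumption: raised to exactly the minimum
        }
        if (max != null && reducers > max) {
            reducers = max; // reduced to the maximum
        }
        return reducers;
    }

    public static void main(final String[] args) {
        System.out.println(resolveReducers(10, 20, null)); // 20
        System.out.println(resolveReducers(10, null, 5));  // 5
        System.out.println(resolveReducers(10, 5, 20));    // 10
    }
}
```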
The new GetSchema operation should have a few examples, to demonstrate usage, and mention the implications of the 'compact' boolean flag.
Also update koryphe version to 1.0.0 and gaffer-tools version to 1.0.0-RC4
We don't really have any documentation explaining what global filters in a View are and how they work.
We should also document using a global groupBy and global properties/excludeProperties.
Should either add a new example or modify the existing examples to include the new optional field of "Score".
Examples for the new Map and FlatMap operations (see gchq/Gaffer#1345)
Use the same code tabs we use for showing how to create an operation to display the results of the operation. This should show what the results look like in java (or a simple toString) and json.
This could involve just including the static Swagger documentation for the REST API.
This should be similar to the Predicates section.
An example should be written for the ScoreOperationChain to demonstrate how it is used.
Currently, as a workaround until 1.0.0-RC3 is released, the configuration is encoded without using:
final String encodedConf = AbstractGetRDDHandler.convertConfigurationToString(configuration);
This needs to be updated to use the above line when 1.0.0-RC3 is released.
This just involves copying the comment from gchq/Gaffer#1798 to the Filtering page in the documentation.
This page should detail the level of our testing, such as what combinations of different schemas have been tested with each of the different ingest mechanisms on the different Stores.
It should also describe our Integration Test Suite and the areas of the framework that are included/missed.
This should explain that undirected edges are bidirectional and how they are aggregated. It should also mention that Gaffer will flip the edge for consistency so the source is always ordered 'less' than the destination (based on natural ordering).
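The flipping rule described above can be sketched in plain Java. This is an illustration only: the class and method names are hypothetical, and the code does not use Gaffer's Edge class. It orders the two vertices of an undirected edge by natural ordering, so that an edge entered as B-A and one entered as A-B end up with the same source and destination and can therefore be treated as the same edge for aggregation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of ordering the ends of an undirected edge.
class UndirectedEdgeSketch {

    // Returns [source, destination]. For undirected edges the smaller vertex
    // (by natural ordering) is always placed first; directed edges are untouched.
    static <V extends Comparable<V>> List<V> orderEndpoints(
            final V source, final V destination, final boolean directed) {
        if (!directed && source.compareTo(destination) > 0) {
            return Arrays.asList(destination, source);
        }
        return Arrays.asList(source, destination);
    }

    public static void main(final String[] args) {
        System.out.println(orderEndpoints("B", "A", false)); // [A, B]
        System.out.println(orderEndpoints("A", "B", false)); // [A, B]
        System.out.println(orderEndpoints("B", "A", true));  // [B, A]
    }
}
```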
Examples to be written, demonstrating usage of the new Aggregate operation.
We should update the documentation on sketches to include this link to show the difference in performance of Clearspring's HLL and Datasketches HLL (https://datasketches.apache.org/docs/HLL/Hll_vs_CS_Hllpp.html) and use this to justify recommending use of datasketches over Clearspring.
This is dependent on Gaffer 1.0.0-RC3
Extract the duplicated instructions for setting up a cache (in the NamedOperations, Jobs and Federated Store sections of the Dev Guide) into a single cache section in the Dev Guide. The other sections should reference it.
This depends on Gaffer version 1.1.0
Documentation and examples to be added for:
If: gchq/koryphe#81
IsLongerThan: gchq/koryphe#89
Length: gchq/koryphe#85
ExtractId and ExtractProperty: gchq/Gaffer#1705
If: gchq/Gaffer#1649
There doesn't seem to be any documentation of GraphConfig.
It would be useful to explain the different ways of traversing a Gaffer graph. This should include the use of:
A number of new functions have been added to Gaffer - examples for each of these should be written.
A new filter operation has been written for Gaffer, an example should be written to give users an idea of usage.
Documentation should be written for the recent additions of a number of Extraction functions in Koryphe, and operations in Gaffer.
See:
gchq/koryphe#60
gchq/koryphe#74
Some links don't work, for example: https://gchq.github.io/gaffer-doc/getting-started/properties-guide/simple-properties/hashmap.html
Examples should also be added for the SchemaMigration options.