gaffer-doc's People

Contributors

cn337131, dependabot[bot], gchqdev404, gchqdeveloper314, l46978, lb324567, n3101, p3430233, r32575, rj77259, t92549, tb06904

gaffer-doc's Issues

Update User Guide

The user guide contains some typos, and it could be made easier to follow for users who would like to start from a blank project rather than cloning the examples.

Fix broken links

There are several links that still point to files in github/gchq/Gaffer/doc. These need to be updated to point to the files in this repository.

Spark operation examples don't show how to specify Hadoop conf

The examples for operations such as GetJavaRDDOfAllElements don't show how the Hadoop configuration can be passed in as an option. Passing this option isn't essential if you're running your code via the hadoop command or within a Spark job that has already been configured with Hadoop, but without it you can't override properties in the configuration.

Copied from gchq/Gaffer#1328

Add documentation to explain that Get operations return a lazy iterable

For example if a user executes a GetAllElements on Accumulo:

final Iterable<? extends Element> elements = graph.execute(new GetAllElements(), getUser());

The 'elements' iterable is lazy: the query is only executed on Accumulo when you start iterating over the results. So if you add another element 'X' to the graph before you consume the 'elements' iterable, you will notice that the results now also contain 'X'.
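To illustrate the laziness described above without an Accumulo instance, here is a minimal, self-contained Java sketch. ToyStore, getAllElements and demo are hypothetical names invented for this example and are not Gaffer or Accumulo APIs. The sketch only shows that the "query" runs when iteration starts, so an element added after the call but before iteration still appears in the results.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyGetDemo {
    // Toy stand-in for a store table; this is NOT a Gaffer or Accumulo API.
    static final class ToyStore {
        private final List<String> rows = new ArrayList<>();

        void add(String row) {
            rows.add(row);
        }

        // Like a Get operation's result: the "query" (here, copying the rows)
        // only runs when iterator() is called, i.e. when iteration starts.
        Iterable<String> getAllElements() {
            return () -> new ArrayList<>(rows).iterator();
        }
    }

    static List<String> demo() {
        ToyStore store = new ToyStore();
        store.add("A");
        store.add("B");
        Iterable<String> elements = store.getAllElements(); // nothing read yet
        store.add("X");                                     // added before consuming
        List<String> seen = new ArrayList<>();
        for (String element : elements) {                   // the query runs here
            seen.add(element);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A, B, X]
    }
}
```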

For this reason, be very careful if you run an AddElements over a lazy iterable returned from a Get query on the same Graph. The AddElements will consume that iterable lazily, so the query is still running while new elements are being written, and this can cause duplicates to be added.

To do a Get followed by an Add on the same Graph, we recommend consuming and caching the Get results first. For a small number of results, you can do this by adding the ToList operation to your chain, e.g.:

new OperationChain.Builder()
                .first(new GetAllElements())
                .then(new ToList<>())
                .then(new AddElements())
                .build();
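The duplicate risk can be sketched with a toy table whose cursor reads rows live by position, so rows written during a scan are visible to that same scan. This behaviour is an assumption chosen to make the effect easy to see. It is not a description of how Accumulo scanners actually behave, and ToyTable, scan and addAll are hypothetical. Copying the lazy scan straight back into the table keeps re-reading its own writes, and a cap is needed to stop it. Consuming the results into a list first, as ToList does in the chain above, copies each row exactly once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class GetThenAddDemo {
    // Toy table whose cursor reads rows live by position, so rows appended
    // during a scan are visible to that same scan. NOT a Gaffer API.
    static final class ToyTable {
        final List<String> rows = new ArrayList<>();

        Iterable<String> scan() {
            return () -> new Iterator<String>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < rows.size();
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return rows.get(next++);
                }
            };
        }

        // Consumes its input lazily, writing each element as soon as it is read.
        // maxWrites is only a safety cap so that the lazy demo terminates.
        void addAll(Iterable<String> input, int maxWrites) {
            int written = 0;
            for (String row : input) {
                if (written >= maxWrites) {
                    break;
                }
                rows.add(row);
                written++;
            }
        }
    }

    static ToyTable seeded() {
        ToyTable table = new ToyTable();
        table.rows.add("A");
        table.rows.add("B");
        return table;
    }

    // Get -> Add over a lazy iterable: the add keeps re-reading its own writes.
    static int rowsAfterLazyCopy() {
        ToyTable table = seeded();
        table.addAll(table.scan(), 10);
        return table.rows.size();
    }

    // Get -> ToList -> Add: the results are consumed and cached first.
    static int rowsAfterEagerCopy() {
        ToyTable table = seeded();
        List<String> cached = new ArrayList<>();
        for (String row : table.scan()) {
            cached.add(row);
        }
        table.addAll(cached, 10);
        return table.rows.size();
    }

    public static void main(String[] args) {
        System.out.println(rowsAfterLazyCopy());  // 12: stopped only by the cap
        System.out.println(rowsAfterEagerCopy()); // 4: each row copied once
    }
}
```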

For a large number of results, you could add them to the Gaffer cache temporarily instead:

new OperationChain.Builder()
                .first(new GetAllElements())
                .then(new ExportToGafferResultCache<>())
                .then(new DiscardOutput())
                .then((Operation) new GetGafferResultCacheExport())
                .then(new AddElements())
                .build();

Add documentation for MATCHED_VERTEX and ADJACENT_MATCHED_VERTEX

This should go in the Filtering and/or Views section within the User Guide.

The documentation should explain that, when applying filtering, aggregation and transformation in Views, you can select one of these fields instead of a property name: VERTEX, SOURCE, DESTINATION, DIRECTED, MATCHED_VERTEX, ADJACENT_MATCHED_VERTEX.
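Here is a minimal sketch of the idea. It assumes, as the names suggest, that MATCHED_VERTEX is whichever end of an edge matched the query seed and ADJACENT_MATCHED_VERTEX is the other end. The Identifier enum and ToyEdge class are hypothetical stand-ins, not Gaffer's own classes. The same stored edge 1 -> 2 reports a different matched vertex depending on which seed found it, which is why a filter on MATCHED_VERTEX can be simpler than separate filters on SOURCE and DESTINATION.

```java
public class MatchedVertexDemo {
    // Toy identifiers mirroring the names used in Views; NOT Gaffer's own enum.
    enum Identifier { SOURCE, DESTINATION, MATCHED_VERTEX, ADJACENT_MATCHED_VERTEX }

    // Toy edge that remembers which end matched the query seed.
    static final class ToyEdge {
        final String source;
        final String destination;
        final boolean seedMatchedSource;

        ToyEdge(String source, String destination, boolean seedMatchedSource) {
            this.source = source;
            this.destination = destination;
            this.seedMatchedSource = seedMatchedSource;
        }

        String get(Identifier id) {
            switch (id) {
                case SOURCE:
                    return source;
                case DESTINATION:
                    return destination;
                case MATCHED_VERTEX:
                    return seedMatchedSource ? source : destination;
                case ADJACENT_MATCHED_VERTEX:
                    return seedMatchedSource ? destination : source;
                default:
                    throw new IllegalArgumentException("Unknown identifier: " + id);
            }
        }
    }

    // The same stored edge, as it would be returned for the given seed.
    static ToyEdge seenFromSeed(String source, String destination, String seed) {
        return new ToyEdge(source, destination, seed.equals(source));
    }

    public static void main(String[] args) {
        ToyEdge viaDestination = seenFromSeed("1", "2", "2");
        System.out.println(viaDestination.get(Identifier.MATCHED_VERTEX));          // 2
        System.out.println(viaDestination.get(Identifier.ADJACENT_MATCHED_VERTEX)); // 1
    }
}
```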

Document timestamp property

The Dev guide contains a note about setting the timestamp property in the schema, but doesn't say anything about what it's for. I think it just lets you specify which property is used to set the timestamp in an Accumulo key, but that doesn't achieve anything as far as the user is concerned.

Add documentation for the addElementsFromHdfs logic updates

Setting numReduceTasks is now deprecated; set the minimum and/or maximum number of reduce tasks instead. This means that, in most cases, Accumulo will be able to choose the right number of reducers for the user, based on the number of tablet servers. If the minimum is greater than the number Accumulo chooses, the number of reducers is increased to the minimum; likewise, if the maximum is less than the number Accumulo chooses, the number of reducers is reduced to the maximum.
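The min/max rule described above amounts to clamping Accumulo's choice into the user's range. The following is a Java sketch in which chooseNumReducers and its parameters are hypothetical names, not Gaffer's implementation. Rejecting a minimum greater than the maximum is also an assumption.

```java
public class ReducerCountDemo {
    // Hypothetical helper, not Gaffer's implementation. accumuloChoice stands for
    // the number Accumulo would pick; min/max are the optional user bounds (null = unset).
    static int chooseNumReducers(int accumuloChoice, Integer minReducers, Integer maxReducers) {
        if (minReducers != null && maxReducers != null && minReducers > maxReducers) {
            throw new IllegalArgumentException("min reducers must not exceed max reducers");
        }
        int reducers = accumuloChoice;
        if (minReducers != null && reducers < minReducers) {
            reducers = minReducers; // raise Accumulo's choice to the minimum
        }
        if (maxReducers != null && reducers > maxReducers) {
            reducers = maxReducers; // cap Accumulo's choice at the maximum
        }
        return reducers;
    }

    public static void main(String[] args) {
        System.out.println(chooseNumReducers(10, 20, null)); // 20
        System.out.println(chooseNumReducers(10, null, 5));  // 5
        System.out.println(chooseNumReducers(10, 2, 50));    // 10
    }
}
```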

Create GetSchema examples

The new GetSchema operation should have a few examples to demonstrate its usage, and should mention the implications of the 'compact' boolean flag.

Add documentation for Global View filters

We don't really have any documentation on what global filters in a View are or how they work.

We should also document using a global groupBy and global properties/excludeProperties.
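Here is a rough sketch of the behaviour that this documentation would need to explain. It assumes that a global filter is combined with each group's own filter and that groups not listed in the View are not returned. ToyElement, ToyView, exampleView and the group names BasicEntity and BasicEdge are all hypothetical. In Gaffer this would be declared in the View itself rather than written as code like this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class GlobalFilterDemo {
    // Toy element: a group name and one "count" property. NOT Gaffer's Element.
    static final class ToyElement {
        final String group;
        final long count;

        ToyElement(String group, long count) {
            this.group = group;
            this.count = count;
        }
    }

    // Toy view: one global filter plus a filter for each group in the view.
    static final class ToyView {
        private final Predicate<ToyElement> globalFilter;
        private final Map<String, Predicate<ToyElement>> groupFilters = new HashMap<>();

        ToyView(Predicate<ToyElement> globalFilter) {
            this.globalFilter = globalFilter;
        }

        ToyView group(String group, Predicate<ToyElement> groupFilter) {
            groupFilters.put(group, groupFilter);
            return this;
        }

        boolean accept(ToyElement element) {
            Predicate<ToyElement> groupFilter = groupFilters.get(element.group);
            if (groupFilter == null) {
                return false; // groups not listed in the view are not returned
            }
            // The global filter applies to every group, on top of the group's own filter.
            return globalFilter.and(groupFilter).test(element);
        }
    }

    static ToyView exampleView() {
        return new ToyView(e -> e.count > 1)     // global: count > 1
                .group("BasicEntity", e -> true) // no group-specific filter
                .group("BasicEdge", e -> e.count < 100);
    }

    public static void main(String[] args) {
        ToyView view = exampleView();
        System.out.println(view.accept(new ToyElement("BasicEntity", 5)));  // true
        System.out.println(view.accept(new ToyElement("BasicEntity", 1)));  // false (global filter)
        System.out.println(view.accept(new ToyElement("BasicEdge", 200))); // false (group filter)
        System.out.println(view.accept(new ToyElement("Other", 5)));        // false (not in view)
    }
}
```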

Add a Testing page that documents our testing coverage

This page should detail the level of our testing, such as which combinations of schemas have been tested with each ingest mechanism on each of the Stores.

It should also describe our Integration Test Suite and the areas of the framework that are included/missed.

Add Cache docs

Extract the duplicated cache set-up instructions (in the NamedOperations, Jobs and Federated Store sections of the Dev Guide) into a single cache section in the Dev Guide, and have the other sections reference it.

Add a Traversing walkthrough to the user guide

It would be useful to explain the different ways of traversing a Gaffer graph. This should include the use of:

  • GetAdjacentIds -> GetAdjacentIds
  • GetWalks (GetElements, GetElements)
  • GetElements -> ToVertices -> ToEntitySeeds -> GetElements
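The following toy sketch contrasts the first two approaches on a small in-memory adjacency map. The getAdjacentIds and getWalks methods here are hypothetical helpers that only mimic the idea of the Gaffer operations; they are not their implementations. Chaining two adjacency hops returns the set of vertices two hops away but loses how each was reached (vertex 4 is reachable by two routes but appears once). Walks keep every path. GetElements -> ToVertices -> ToEntitySeeds -> GetElements is another way of performing a hop like the ones below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TraversalDemo {
    // Toy directed graph as an adjacency map; NOT a Gaffer store.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("1", Arrays.asList("2", "3"));
        graph.put("2", Arrays.asList("4"));
        graph.put("3", Arrays.asList("4", "5"));
        return graph;
    }

    // One hop: every vertex adjacent to any seed (similar in spirit to one GetAdjacentIds).
    static Set<String> getAdjacentIds(Map<String, List<String>> graph, Set<String> seeds) {
        Set<String> adjacent = new TreeSet<>();
        for (String seed : seeds) {
            adjacent.addAll(graph.getOrDefault(seed, Collections.<String>emptyList()));
        }
        return adjacent;
    }

    // Every walk of exactly `hops` edges from the seed, keeping the full path.
    static List<List<String>> getWalks(Map<String, List<String>> graph, String seed, int hops) {
        List<List<String>> walks = new ArrayList<>();
        walks.add(Collections.singletonList(seed));
        for (int i = 0; i < hops; i++) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> walk : walks) {
                String last = walk.get(walk.size() - 1);
                for (String next : graph.getOrDefault(last, Collections.<String>emptyList())) {
                    List<String> longer = new ArrayList<>(walk);
                    longer.add(next);
                    extended.add(longer);
                }
            }
            walks = extended;
        }
        return walks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = exampleGraph();
        Set<String> oneHop = getAdjacentIds(graph, Collections.singleton("1"));
        System.out.println(getAdjacentIds(graph, oneHop)); // [4, 5]
        System.out.println(getWalks(graph, "1", 2));       // [[1, 2, 4], [1, 3, 4], [1, 3, 5]]
    }
}
```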
