
Introduction

Tendrl API

  • Unit tests: Build Status

  • Functional tests: Build Status

Note
All the commands are run as a regular user that has sudo privileges. The commands are all assumed to be run from a single directory, which by default could be the user’s home directory. If different, the required current directory is indicated in [] before the shell prompt $.

Ensure that etcd is running on a node in the network and is reachable from the node you’re about to install tendrl-api on. Note its address and port. In most development setups, both etcd and tendrl-api reside on the same host.

  1. Install the build toolchain.

    $ sudo yum groupinstall 'Development Tools'
  2. Install Ruby 2.0.0p598.

    $ sudo yum install ruby ruby-devel rubygem-bundler
  1. Clone tendrl-api.

    $ git clone https://github.com/Tendrl/tendrl-api.git
  2. Install the gem dependencies, either:

    $ cd tendrl-api
    1. everything,

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin
    2. OR development setup only,

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin \
                     --without production
    3. OR production setup only.

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin \
                     --without development test documentation
Note
Using binstubs allows any of the executables to be executed directly from vendor/bin, instead of via bundle exec.

To configure the etcd connection information, copy the sample configuration file to the appropriate location and make the necessary changes based on your etcd configuration, as discussed in the Deployment Requirements section.

[tendrl-api] $ cp config/etcd.sample.yml config/etcd.yml
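For reference, a minimal sketch of what the copied configuration might contain and how it would be consumed. The keys shown here are hypothetical; the authoritative list is in config/etcd.sample.yml. The host and port below are the etcd defaults for a same-host development setup.

```ruby
require 'yaml'

# Hypothetical contents of config/etcd.yml -- adjust to match the
# actual sample file shipped with the repository.
sample = <<-YAML
development:
  host: 127.0.0.1
  port: 2379
YAML

config   = YAML.load(sample)['development']
endpoint = "http://#{config['host']}:#{config['port']}"
# endpoint now points at the local etcd client port
```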
Note
All the commands below are assumed to be run from inside the git checkout directory.
  1. Tendrl Definitions:

    The API needs the proper Tendrl definitions yaml file to generate the attributes and actions. You can either download it or use the one from the fixtures to explore the API.

    [tendrl-api] $ cp spec/fixtures/sds/tendrl_definitions_gluster-3.8.3.yaml \
                   config/sds/tendrl_definitions_gluster-3.8.3.yaml
  2. Seed the etcd instance (optional):

    The script will seed the etcd instance with mock cluster data and print a cluster uuid which can be used to make API requests.

    [tendrl-api] $ vendor/bin/rake etcd:seed # Seed the local store with cluster
  3. Start the development server:

    This server will reload itself when any of the source files are updated.

    [tendrl-api] $ vendor/bin/shotgun
    Note
    This makes the development server queryable on localhost:9393 by default. See vendor/bin/shotgun --help to change the ip:port binding.

Running the tests does not require a local etcd instance.

[tendrl-api] $ vendor/bin/rspec

Binding to port 80 requires root permissions; however, tendrl-api runs as a normal user. To make the application available on port 80, Apache needs to be installed and configured.

  1. Install apache

    $ sudo yum install httpd
  2. Copy over the sample configuration file and validate its syntax.

    Important
    Update the file for your specific host details. The file is commented to point out the suggested changes. The file is configured to connect to the tendrl-api application server on port 9292.
    Important
    Running behind Apache makes the API available at http://<hostname>:80/api/. Client applications' (including tendrl frontend’s) configuration needs to be updated to direct all API queries to this endpoint.
    [tendrl-api] $ sudo cp config/apache.vhost.sample \
                   /etc/httpd/conf.d/tendrl.conf
    $ sudo apachectl configtest
  3. Update the SELinux configuration to allow apache to make connections.

    $ sudo setsebool -P httpd_can_network_connect 1
  4. Run the application via the production server puma, daemonised, listening on port 9292.

    [tendrl-api] $ vendor/bin/puma -e production -d
    Note
    It is possible to run both the development and the production servers at the same time, with the production server behind Apache. The production server (puma) runs on port 9292 by default, while the development server (shotgun) listens on port 9393.
  5. Start apache.

    $ sudo systemctl start httpd.service


Issues

Consistency

I note on Tendrl Core Components that there are multiple "Tendrl Application Instance" entities. When they are handling incoming requests that modify the same storage resources, how is consistency guaranteed?

Is there a requirement that the Central Store provides some level of transactional access across different pieces of data?

A worked example would be useful.
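As a starting point for such an example, the sketch below (not Tendrl code) shows one way multiple application instances could coordinate writes to the same key: optimistic concurrency using a version counter, analogous to etcd's modifiedIndex. The in-memory store stands in for the Central Store; key names are illustrative.

```ruby
# Optimistic-concurrency sketch: a write succeeds only if nobody else
# has written to the key since the caller read it.
class VersionedStore
  Entry = Struct.new(:value, :version)

  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  def read(key)
    @mutex.synchronize { @data[key] ||= Entry.new(nil, 0) }
  end

  # Succeeds only if the key's version still matches `version`.
  def write(key, value, version)
    @mutex.synchronize do
      entry = (@data[key] ||= Entry.new(nil, 0))
      return false unless entry.version == version
      @data[key] = Entry.new(value, version + 1)
      true
    end
  end
end

store = VersionedStore.new
entry = store.read('/clusters/1/size')
ok1 = store.write('/clusters/1/size', 10, entry.version) # first instance wins
ok2 = store.write('/clusters/1/size', 12, entry.version) # stale version, rejected
```

A second instance holding the stale version would re-read the key and retry, rather than silently overwriting the first write.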

Security

What is the security model?

  • How does Tendrl establish mutual trust with nodes?
  • How are users authenticated and user permissions managed?
  • What crypto is used?

Does Tendrl have a mailing list and IRC channel?

I have some queries regarding Tendrl (node_agent) and I am not able to reach you. Could someone please let me know your mailing list and IRC channel, so I can contact you easily?

Sorry for creating this issue, but I could not find any other channel to ask this question.

Thanks

API overview doc : Put vs Patch

While going through the document, I observed that 'PATCH' is not present in the supported HTTP verbs list. In some cases (like editing only certain parameters of a resource), I feel that PATCH could be used over PUT, as PUT is generally used for replacing a whole resource.

Is there any specific reason why PUT is proposed for update operations?
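To illustrate the semantic difference the issue is asking about, here is a small sketch (not Tendrl code) modeling a resource as a plain Hash. The resource attributes are hypothetical.

```ruby
# A resource as the server currently stores it.
pool = { 'name' => 'rbd', 'pg_num' => 64, 'size' => 3 }

# PUT: the request body is the complete new representation;
# attributes omitted from the body are dropped.
def put(_resource, body)
  body.dup
end

# PATCH: the request body contains only the attributes to change;
# everything else on the resource is preserved.
def patch(resource, body)
  resource.merge(body)
end

after_put   = put(pool,   { 'name' => 'rbd', 'pg_num' => 128 })
after_patch = patch(pool, { 'pg_num' => 128 })
# after_put has lost 'size'; after_patch has kept it.
```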

Creating a definition of "done"

As the project moves through the stages of completing chunks of tasks and making features available, it would be good to have a baseline agreement on what is a definition of "done". Such a definition can be applicable at various instances - for a feature; for a story/epic or, for a release. The following lists the basic items in a check-list which can be used to arrive at the state of 'done'. This is not intended to be a static list; it is expected that it will evolve over time being driven by the needs of the project organization, focus and direction.

The definition is expected to capture the practices necessary to enhance the quality of the software produced, though it does not by itself assert the functional value of a feature.

Definition of Done (DoD)

A listed item will be considered done if:

  • code, including unit tests, is merged upstream (in mainline+branch, as appropriate)
  • feature has passed all integration gates
  • feature is available as a packaged bundle
  • test plan is completed; documented; reviewed; signed off
  • test plan is implemented in CI or, automation
  • feature is integrated with install, tear-down or, CRUD as appropriate
  • feature passes automated tests/tests in test plan
  • documentation/notes for the feature in functionality is available to content author
  • no major, critical defects including data loss inducing ones have been introduced
  • performance regression has not occurred, if applicable
  • a completed feature is accompanied with a recorded demo

Give details about services, languages, protocols

Looking at the diagram on Tendrl Core Components:

  • Is Central Store etcd?
  • Is Tendrl Application Instance a Go binary?
  • Are the arrows between big boxes etcd clients?
  • What protocol are the arrows within nodes?

[Monitoring] Identify the list of physical resources to be monitored

Skyring monitored the following physical resources:

Utilizations:

  • CPU Utilization
  • Memory Utilization
  • Network Utilization
  • inode Usage
  • Disk iops
  • Swap Utilization
  • Mount point utilization
  • Network Latency
  • Network throughput

Availability:

  • Node availability
  • Collectd availability

In addition to the above availability, we could probably include monitoring of the following processes:

  • node-agent
  • sds-bridge

Does the above list suffice, or do we need to add more resources to it?

Document the design for import cluster workflow

Document how import would be triggered from tendrl. The steps below should be detailed:

  • How an existing cluster created using the CLI gets known/discovered by tendrl
  • What the procedure is to trigger the import cluster flow (e.g. select a bootstrap node and then get the whole details into the tendrl central store)
  • What details are to be pulled, and how they are maintained in the tendrl central store
  • How to make sure any command-line changes to the underlying cluster get reflected in tendrl seamlessly

Tendrl App

  1. What language is used for the implementation?
  2. What HTTP server is used?
  3. Is there an API spec?
  4. How is the job queue constructed?
  5. Is any web framework planned?

Document the replace storage device workflow in tendrl

If a storage device goes faulty (which is bound to happen in the real world), we need clearly defined workflows for how a new device can be brought into the system and swapped in for the faulty one.

  • This needs to take care of moving data onto the new device as it comes into the picture
  • Then slowly phasing out the old faulty device
  • Finally bringing down the faulty device and removing it from the underlying cluster

This flow looks simple but involves a lot of technicalities in ceph and gluster; it is risky and needs to be done very carefully, so the flow and the steps involved should be well thought through and implemented.

Travis CI setup for Tendrl/documentation repository is broken

The Travis CI setup for the Tendrl/documentation repository is broken, so the checks run for every pull request fail. Based on the workflow drafted by @r0h4n (every repository in the Tendrl group should have Travis CI configured, without exception), we need to fix the travis setup for this repo. I guess that would mean running asciidoctor during the check.

HTTP verbs

It seems unclear how HTTP verbs should be used in the API. Currently the API uses both HTTP verbs and verbs in function names. For example, in https://github.com/anivargi/documentation/blob/c26ccba15091efd70de41c54929f5f710626b8ec/api/ceph-pool-examples.adoc
the HTTP verb DELETE is used, and Delete also appears in the function name. I think we should pick one way to express the action:

  1. use HTTP verbs
  2. use POST and verb in function name

This should also be clarified in https://github.com/Tendrl/documentation/blob/master/api/overview.adoc

An example of HTTP verb usage can be found here: https://github.com/ManageIQ/manageiq_docs/blob/master/api/overview/http.adoc#methods-and-related-urls
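The two styles under discussion can be sketched as follows; the paths are illustrative, not the actual Tendrl routes.

```
# Style 1: the HTTP verb carries the action
DELETE /api/1.0/cluster/{cluster-id}/pool/{pool-id}

# Style 2: POST with the action encoded in the path
POST /api/1.0/cluster/{cluster-id}/pool/{pool-id}/delete
```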

[Monitoring] Monitoring Architecture Details

The Tendrl documentation doesn't provide information about the monitoring architecture or how the tools employed for #39 and #25 would fit into it. Details about the approach for handling thresholds and sending notifications need to be added.

Scaling

How does Tendrl scale to managing systems with many thousands of nodes?

  • What is the ballpark number of Central Store transactions per second per node or per OSD?
  • What is the ballpark transactions per second limit of the Central Store?
  • How does the system behave if nodes are sending updates faster than the store can persist them?
  • On a busy system, what takes priority, writes from the node side or queries from the UI side?

Tendrl HA

We need clarity on how we are planning to achieve HA in Tendrl.

Details about the clustering of central store.

  1. Where will the central store be located? Will all the instances be on the same machine, or will it span multiple machines?
  2. Considering this to be a distributed data store, how many instances are we planning to have? Will the number of instances be increased based on the number of nodes managed by Tendrl?
  3. How difficult/easy is it to deploy the central store? Is it just a package installation? If not, which component will take care of deploying it?

Tendrl Build Process

There should be a document explaining the build process exactly. It should cover Jenkins CI, Fedora Copr, and package availability, including what our Jenkins does, what Copr does, and how the two work together.

Detailed explanation for tendrl agent on storage nodes

Looking at the architecture diagram, it is still not very clear whether we are planning to use collectd or some new tool to collect the various time series data and push it to a time series DB.

It is also unclear which time series DB and monitoring system are suggested/planned for the system.

need more detail on etcd configuration

we should include something like this in https://github.com/Tendrl/documentation/blob/master/deployment.adoc

You will need to edit the following options in /etc/etcd/etcd.conf and replace localhost with your node's IP address (10.0.2.15 in this example):

    ETCD_LISTEN_PEER_URLS="http://10.0.2.15:2380"
    ETCD_LISTEN_CLIENT_URLS="http://10.0.2.15:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.2.15:2380"
    ETCD_INITIAL_CLUSTER="default=http://10.0.2.15:2380"
    ETCD_ADVERTISE_CLIENT_URLS="http://10.0.2.15:2379"

Document the shrink cluster workflow in tendrl

This might be required if cluster storage is under-used and the admin wants to take out a few devices and use them for a different purpose. It's critical to have a clear flow defined and implemented, as it requires data movement to the remaining devices and there must not be any data loss during the process. The inventory structure maintained in tendrl also changes.

Glossary clarification: translator

Tendrl Architectural Guidelines mentions a concept of translator, but the Glossary file doesn't mention it at all.

I would suggest either including a description of translator in the Glossary, or providing a comparison between translator and (node) agent in the description of the node agent.

Moreover, there is a conflict with the translator of GlusterFS terminology, so if this tendrl translator is not well established yet, I would suggest renaming it to avoid confusion. That said, it's a general term, so maybe I'm wrong and a glossary description would be good enough.

Preset criteria and guidelines for the frontend

With respect to the component diagram, the frontend part needs expansion.

This is to enable the frontend team to make better choices to build the solution. Some high level criteria will help us align with the direction that we're all headed.

For example, limitations regarding:

  • Dependencies
  • Licenses
  • Packaging
  • Deployment
  • Platforms
  • User agent etc.

Approach for gluster brick creation from disks.

This issue is to arrive at a suitable approach for creating gluster bricks from disks. To achieve that, a few of the available tools like pyudev, blivet, pyudevDAG, storaged and udisks will be investigated, and the most suitable tool will be picked. The current approach used by the oVirt project to create bricks will also be investigated and its positive points considered.

Initial user creation

The documentation should describe how to create or set up an initial user (superuser) so that Tendrl can be used with authentication.

Analyze feasibility of ceph-mgr in tendrl and parity with calamari

We understand that ceph-mgr might not be readily available for consumption in tendrl, but we need to analyze the feasibility of using it in the tendrl stack.
The items to be checked:

  1. How easy/hard it is to set up ceph-mgr
  2. What the deployment mechanism is (on all the mons, or on the leader mon only)
  3. What the feature parity with calamari is
  4. How feasible it is to integrate with tendrl-core

[Monitoring] Identify the list of ceph resources to be monitored

Skyring monitored the following ceph resources:

Utilizations:

  • RBD Utilization
  • OSD utilization
  • Pool Utilization
  • Cluster Utilization

Availabilities:

  • Pool add/delete
  • cluster health
  • mon availability
  • rbd add/delete

Does the above list suffice, or do we need to add more resources to it?

[Monitoring] Identify the list of gluster logical resources to be monitored

During skyring planning the below list was identified as the list of gluster logical resources to be monitored:

Utilization:

  • Cluster Utilization
  • Volume utilization
  • Thin Pool Data Utilization
  • Thin Pool Meta Data Utilization
  • Brick Utilization

Availability:

  • Cluster Health
  • Volume Status

Does the above list suffice, or do we need to add more resources to it?

Document the major workflows designs

To make it clear how the workflows look and would work, document the flows and their designs for the following:

  • Create cluster
    This is important because we should be very clear on how storage nodes are discovered and accepted into tendrl. Once storage node entities are known to tendrl, how is it going to frame ansible requests based on the user's selection of nodes and disks for creation of bricks/OSDs etc.?
  • Import cluster
    I understand it's pretty straightforward, but we should have a document clearly explaining how details are pushed to the tendrl central store
  • Create volume/pool (other CRUD operations would follow)
    This is a critical workflow, as different SDS systems have different sets of parameters related to quota, whether a volume is erasure coded, etc. To be specific, gluster volumes have more types than ceph pools, for example. It should be well documented for clarity, so that at development time less effort goes into finalizing the data model
  • Expand cluster (by adding bricks/OSDs)
    There are peculiarities involved in expanding clusters. For example, in gluster we can expand a volume by adding more disks, whereas in ceph we expand the whole cluster by adding more OSDs; pools share the new capacity as PGs are recalculated and data movement happens to rebalance the cluster
  • Shrink cluster
    This is critical from the customer's point of view and needs clear guidelines and a set of steps followed very carefully, in gluster as well as ceph, because it involves moving data to the remaining bricks/OSDs and phasing out the non-performing ones
  • Replace brick/OSD
    This is very much required because hardware is prone to failures, and admins do need replacement of bricks/OSDs without downtime (or with minimal impact on the running cluster)

Document the design for create volume/pool workflow in tendrl

This is critical to tendrl, as different SDS systems require different sets of parameters and flows for creating the actual underlying storage entities. The flow should be detailed with all the involved components, how the underlying storage is carved out, and how tendrl gets the details of the full inventory into the central store.

tendrl-alerting documentation

tendrl-alerting packages are available. The Tendrl Package Installation Reference on the wiki of Tendrl/documentation should be updated to document how to install and configure tendrl-alerting and on which servers it should run.

Atomicity of operations executed by Tendrl

How do we ensure the atomicity of critical operations? Suppose a cluster creation operation is in progress with a number of storage nodes; it is not desirable for another user to attempt a similar operation with the same set of nodes. How do we prevent this? Does the central store provide any mechanism, or is Tendrl supposed to implement some lock framework?
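One possible answer, sketched below (not Tendrl code), is a lock built on an atomic compare-and-swap primitive such as etcd's test-and-set. The in-memory Store stands in for the central store; the lock key name is hypothetical.

```ruby
# Minimal compare-and-swap store: set key to value only if it
# currently holds `expected` (nil means "key must be absent").
class Store
  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  def compare_and_swap(key, expected, value)
    @mutex.synchronize do
      return false unless @data[key] == expected
      @data[key] = value
      true
    end
  end
end

store = Store.new
# The first user acquires the lock: the key is absent, so CAS succeeds.
got_lock = store.compare_and_swap('/locks/create_cluster', nil, 'user-a')
# A second user attempting the same operation is rejected atomically.
blocked  = store.compare_and_swap('/locks/create_cluster', nil, 'user-b')
```

In a real deployment the lock entry would also carry a TTL so that a crashed instance cannot hold the lock forever.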

Document the design for expand cluster workflow

An existing cluster's storage capacity could be increased by adding new storage devices to existing cluster nodes, or by adding new nodes with their own sets of storage devices.

The document should clearly articulate the flow, including:

  • How the additional hardware (be it a new device in an existing node or a fresh node) is discovered in tendrl
  • How the admin can select and decide what is to be done with the new devices (storage profiles to be decided, or in the case of gluster, where a disk fits and a brick gets added)
  • How the expansion of the cluster with the new set of devices is triggered and the async task started

Time series data

Tendrl Core Components mentions that the Central Store handles "Current time series data needed for automated conditional triggers." which I take to mean a specialised subset of the overall time series data.

Where does the rest of the time series data go?

Object model for tendrl central store

It is not clear whether there would be a generic data model followed by all the SDS systems to populate data in the central store, or whether SDS-specific objects would be maintained as-is. If SDS-specific objects are maintained, does that mean the tendrl App would have SDS-specific logic to process them and return the output of REST GET calls?

It would become clearer if the data models were published here and it were clearly explained whether a generic model or an SDS-specific one is suggested, and if SDS-specific, how the App handles them.

Document the design for create cluster workflow in tendrl

Add documentation detailing the create cluster workflow: which components are involved and how the data flows. This is important to tendrl, as discovery of storage nodes and acceptance (bootstrapping) of storage nodes with full entity details like devices, CPUs etc. is at the core of what must happen before cluster creation can start. The points to keep in mind while designing are:

  • How the storage nodes and their full inventory structure are discovered and known to tendrl
  • How the underlying storage layout is framed for ansible to submit to the backend
  • How public and cluster network segregation is taken care of
  • Storage profile considerations (maybe applicable to ceph only)
  • The async task framework and reporting details of specific steps to client code
  • Storage node provisioning (installation of all required bits)
  • Cluster provisioning (forming the full cluster layout at the backend)

These are just a few points; they may be expanded when the design is detailed.

Clarification of component diagram in *Deploying Tendrl*

In Deploying Tendrl document there is a diagram with Tendrl Core Components, which presents a general overview and doesn't try to convey particular deployment scenario.

The problem is that I'm not sure if I understand a relation between Non-HA Layout and the Tendrl Core Components diagram:

  • Storage Node from the Non-HA Layout matches with Storage Node from the diagram, that's clear
  • It seems that the Tendrl Master machine (from the Non-HA Layout) contains all the other components from the diagram, including the Central Store, Tendrl Application and UI; do I read it right?
  • It seems that the Central Store is provided by etcd, right?

If the answers to these questions are yes, I would suggest improving the description of the Layout so that the relation is absolutely clear (e.g. there is no clear statement that the Central Store is provided by etcd anywhere, including the Glossary and Components Overview documents). Suggested improvements include:

  • update the diagram to make it clear what box is a machine, group of machines or a component.
  • add/update description of Tendrl Master and Storage Node so that the same terminology is used and relation between a particular layout and the general component overview is clear

On the other hand, if the answer to any of the questions above is no, more work would be needed to make the document clear.
