
Introduction

Tendrl API

  • Unit tests: Build Status

  • Functional tests: Build Status

Note
All the commands are run as a regular user that has sudo privileges. The commands are all assumed to be run from a single directory, which by default could be the user’s home directory. If different, the required current directory is indicated in [] before the shell prompt $.

Ensure that etcd is running on a node in the network and is reachable from the node you’re about to install tendrl-api on. Note its address and port. In most development setups, both etcd and tendrl-api reside on the same host.

  1. Install the build toolchain.

    $ sudo yum groupinstall 'Development Tools'
  2. Install Ruby 2.0.0p598.

    $ sudo yum install ruby ruby-devel rubygem-bundler
  1. Clone tendrl-api.

    $ git clone https://github.com/Tendrl/tendrl-api.git
  2. Install the gem dependencies, either:

    $ cd tendrl-api
    1. everything,

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin
    2. OR development setup only,

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin \
                     --without production
    3. OR production setup only.

      [tendrl-api] $ bundle install --path vendor/bundle --binstubs vendor/bin \
                     --without development test documentation
Note
Using binstubs allows any of the executables to be executed directly from vendor/bin, instead of via bundle exec.

To configure the etcd connection information, copy the sample configuration file to the appropriate location and make the necessary changes based on your etcd configuration, as discussed in the Deployment Requirements section.

[tendrl-api] $ cp config/etcd.sample.yml config/etcd.yml
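For reference, a minimal sketch of what the copied configuration might contain and how it would be consumed. The keys shown here are hypothetical; the authoritative list is in config/etcd.sample.yml. The host and port below are the etcd defaults for a same-host development setup.

```ruby
require 'yaml'

# Hypothetical contents of config/etcd.yml -- adjust to match the
# actual sample file shipped with the repository.
sample = <<-YAML
development:
  host: 127.0.0.1
  port: 2379
YAML

config   = YAML.load(sample)['development']
endpoint = "http://#{config['host']}:#{config['port']}"
# endpoint now points at the local etcd client port
```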
Note
All the commands below are assumed to be run from inside the git checkout directory.
  1. Tendrl Definitions:

    The API needs the proper Tendrl definitions yaml file to generate the attributes and actions. You can either download it or use the one from the fixtures to explore the API.

    [tendrl-api] $ cp spec/fixtures/sds/tendrl_definitions_gluster-3.8.3.yaml \
                   config/sds/tendrl_definitions_gluster-3.8.3.yaml
  2. Seed the etcd instance (optional):

    The script will seed the etcd instance with mock cluster data and print a cluster uuid which can be used to make API requests.

    [tendrl-api] $ vendor/bin/rake etcd:seed # Seed the local store with cluster
  3. Start the development server:

    This server will reload itself when any of the source files are updated.

    [tendrl-api] $ vendor/bin/shotgun
    Note
    This makes the development server queryable on localhost:9393 by default. See vendor/bin/shotgun --help to change the ip:port binding.

Running the tests does not require a local etcd instance.

[tendrl-api] $ vendor/bin/rspec

Binding to port 80 requires root permissions; however, tendrl-api runs as a normal user. To make the application available on port 80, Apache needs to be installed and configured.

  1. Install apache

    $ sudo yum install httpd
  2. Copy over the sample configuration file and validate its syntax.

    Important
    Update the file for your specific host details. The file is commented to point out the suggested changes. The file is configured to connect to the tendrl-api application server on port 9292.
    Important
    Running behind Apache makes the API available at http://<hostname>:80/api/. Client applications' (including tendrl frontend’s) configuration needs to be updated to direct all API queries to this endpoint.
    [tendrl-api] $ sudo cp config/apache.vhost.sample \
                   /etc/httpd/conf.d/tendrl.conf
    $ sudo apachectl configtest
  3. Update the SELinux configuration to allow apache to make connections.

    $ sudo setsebool -P httpd_can_network_connect 1
  4. Run the application via the production server puma, daemonised, listening on port 9292.

    [tendrl-api] $ vendor/bin/puma -e production -d
    Note
    It is possible to run both the development and the production servers at the same time, with the production server behind Apache. The production server (puma) runs on port 9292 by default, while the development server (shotgun) listens on port 9393.
  5. Start apache.

    $ sudo systemctl start httpd.service


Issues

Consistency

I note on Tendrl Core Components that there are multiple "Tendrl Application Instance" entities. When they are handling incoming requests that modify the same storage resources, how is consistency guaranteed?

Is there a requirement that the Central Store provides some level of transactional access across different pieces of data?

A worked example would be useful.
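As a starting point for such an example, the sketch below (not Tendrl code) shows one way multiple application instances could coordinate writes to the same key: optimistic concurrency using a version counter, analogous to etcd's modifiedIndex. The in-memory store stands in for the Central Store; key names are illustrative.

```ruby
# Optimistic-concurrency sketch: a write succeeds only if nobody else
# has written to the key since the caller read it.
class VersionedStore
  Entry = Struct.new(:value, :version)

  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  def read(key)
    @mutex.synchronize { @data[key] ||= Entry.new(nil, 0) }
  end

  # Succeeds only if the key's version still matches `version`.
  def write(key, value, version)
    @mutex.synchronize do
      entry = (@data[key] ||= Entry.new(nil, 0))
      return false unless entry.version == version
      @data[key] = Entry.new(value, version + 1)
      true
    end
  end
end

store = VersionedStore.new
entry = store.read('/clusters/1/size')
ok1 = store.write('/clusters/1/size', 10, entry.version) # first instance wins
ok2 = store.write('/clusters/1/size', 12, entry.version) # stale version, rejected
```

A second instance holding the stale version would re-read the key and retry, rather than silently overwriting the first write.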

Security

What is the security model?

  • How does Tendrl establish mutual trust with nodes?
  • How are users authenticated and user permissions managed?
  • What crypto is used?

Does Tendrl have a mailing list and IRC channel?

I have some queries regarding Tendrl (node_agent) and I am not able to reach you. Could someone please let me know your mailing list and IRC channel, so I can contact you easily?

Sorry for creating this issue, but I could not find any other channel to ask this question.

Thanks

API overview doc : Put vs Patch

While going through the document, I observed that 'PATCH' is not present in the supported HTTP verbs list. In some cases (like editing only certain parameters of a resource), I feel that PATCH could be used over PUT, as PUT is generally used for replacing a whole resource.

Is there any specific reason why PUT is proposed for update operations?
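To illustrate the semantic difference the issue is asking about, here is a small sketch (not Tendrl code) modeling a resource as a plain Hash. The resource attributes are hypothetical.

```ruby
# A resource as the server currently stores it.
pool = { 'name' => 'rbd', 'pg_num' => 64, 'size' => 3 }

# PUT: the request body is the complete new representation;
# attributes omitted from the body are dropped.
def put(_resource, body)
  body.dup
end

# PATCH: the request body contains only the attributes to change;
# everything else on the resource is preserved.
def patch(resource, body)
  resource.merge(body)
end

after_put   = put(pool,   { 'name' => 'rbd', 'pg_num' => 128 })
after_patch = patch(pool, { 'pg_num' => 128 })
# after_put has lost 'size'; after_patch has kept it.
```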

Creating a definition of "done"

As the project moves through the stages of completing chunks of tasks and making features available, it would be good to have a baseline agreement on what is a definition of "done". Such a definition can be applicable at various instances - for a feature; for a story/epic or, for a release. The following lists the basic items in a check-list which can be used to arrive at the state of 'done'. This is not intended to be a static list; it is expected that it will evolve over time being driven by the needs of the project organization, focus and direction.

The definition is expected to capture the practices necessary to enhance the quality of the software produced, though it does not by itself assert the functional value of a feature.

Definition of Done (DoD)

A listed item will be considered done if:

  • code, including unit tests, is merged upstream (in mainline+branch, as appropriate)
  • feature has passed all integration gates
  • feature is available as a packaged bundle
  • test plan is completed; documented; reviewed; signed off
  • test plan is implemented in CI or, automation
  • feature is integrated with install, tear-down or, CRUD as appropriate
  • feature passes automated tests/tests in test plan
  • documentation/notes for the feature in functionality is available to content author
  • no major, critical defects including data loss inducing ones have been introduced
  • performance regression has not occurred, if applicable
  • a completed feature is accompanied with a recorded demo

Give details about services, languages, protocols

Looking at the diagram on Tendrl Core Components:

  • Is Central Store etcd?
  • Is Tendrl Application Instance a Go binary?
  • Are the arrows between big boxes etcd clients?
  • What protocol are the arrows within nodes?

[Monitoring] Identify the list of physical resources to be monitored

Skyring monitored the following physical resources:

Utilizations:

  • CPU Utilization
  • Memory Utilization
  • Network Utilization
  • inode Usage
  • Disk iops
  • Swap Utilization
  • Mount point utilization
  • Network Latency
  • Network throughput

Availability:

  • Node availability
  • Collectd availability

In addition to the above availability, we could probably include monitoring of the following processes:

  • node-agent
  • sds-bridge

Does the above list suffice, or do we need to add more resources to it?

Document the design for import cluster workflow

Document how import would be triggered from tendrl. The steps below should be detailed:

  • How an existing cluster created using the CLI gets known/discovered by tendrl
  • What the procedure is to trigger the import cluster flow (e.g. select a bootstrap node and then get the whole details into the tendrl central store)
  • What details are to be pulled, and how they are maintained in the tendrl central store
  • How to make sure any command-line changes to the underlying cluster get reflected in tendrl seamlessly

Tendrl App

  1. What language is used for the implementation?
  2. What HTTP server is used?
  3. Is there an API spec?
  4. How is the job queue constructed?
  5. Is any web framework planned?

Document the replace storage device workflow in tendrl

If a storage device goes faulty (which is bound to happen in the real world), we need clearly defined workflows for how a new device can be brought into the system and swapped in for the faulty one.

  • This needs to take care of moving data onto the new device as it comes into the picture
  • Then slowly phasing out the old faulty device
  • Finally bringing down the faulty device and removing it from the underlying cluster

This flow looks simple but involves a lot of technicalities in ceph and gluster; it is risky and needs to be done very carefully, so the flow and the steps involved should be well thought through and implemented.

Travis CI setup for Tendrl/documentation repository is broken

The Travis CI setup for the Tendrl/documentation repository is broken, so the checks run for every pull request fail. Based on the workflow drafted by @r0h4n (every repository in the Tendrl group should have Travis CI configured, without exception), we need to fix the travis setup for this repo. I guess that would mean running asciidoctor during the check.

HTTP verbs

It seems unclear how HTTP verbs should be used in the API. Currently the API uses both HTTP verbs and verbs in function names. For example, in https://github.com/anivargi/documentation/blob/c26ccba15091efd70de41c54929f5f710626b8ec/api/ceph-pool-examples.adoc
the HTTP verb DELETE is used, and Delete also appears in the function name. I think we should pick one way to express the action:

  1. use HTTP verbs
  2. use POST and verb in function name

This should also be clarified in https://github.com/Tendrl/documentation/blob/master/api/overview.adoc

An example of HTTP verb usage can be found here: https://github.com/ManageIQ/manageiq_docs/blob/master/api/overview/http.adoc#methods-and-related-urls
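The two styles under discussion can be sketched as follows; the paths are illustrative, not the actual Tendrl routes.

```
# Style 1: the HTTP verb carries the action
DELETE /api/1.0/cluster/{cluster-id}/pool/{pool-id}

# Style 2: POST with the action encoded in the path
POST /api/1.0/cluster/{cluster-id}/pool/{pool-id}/delete
```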

[Monitoring] Monitoring Architecture Details

The Tendrl documentation doesn't provide information about the monitoring architecture or how the tools employed for #39 and #25 would fit into it. Details about the approach for handling thresholds and sending notifications need to be added.

Scaling

How does Tendrl scale to managing systems with many thousands of nodes?

  • What is the ballpark number of Central Store transactions per second per node or per OSD?
  • What is the ballpark transactions per second limit of the Central Store?
  • How does the system behave if nodes are sending updates faster than the store can persist them?
  • On a busy system, what takes priority, writes from the node side or queries from the UI side?

Tendrl HA

We need clarity on how we are planning to achieve HA in Tendrl.

Details about the clustering of central store.

  1. Where will the central store be located? Will all the instances be on the same machine, or will it span multiple machines?
  2. Considering this to be a distributed data store, how many instances are we planning to have? Will the number of instances be increased based on the number of nodes managed by Tendrl?
  3. How difficult/easy is it to deploy the central store? Is it just a package installation? If not, which component will take care of deploying it?

Tendrl Build Process

There should be a document explaining the build process exactly. It should cover Jenkins CI, Fedora Copr, and package availability, including what our Jenkins does, what Copr does, and how the two work together.

Detailed explanation for tendrl agent on storage nodes

Looking at the architecture diagram, it is still not very clear whether we are planning to use collectd or some new tool to collect the various time series data and push it to a time series DB.

It is also unclear which time series DB and monitoring system are suggested/planned for the system.

need more detail on etcd configuration

we should include something like this in https://github.com/Tendrl/documentation/blob/master/deployment.adoc

You will need to edit the following options in /etc/etcd/etcd.conf and replace localhost with your node's IP address (10.0.2.15 in this example):

    ETCD_LISTEN_PEER_URLS="http://10.0.2.15:2380"
    ETCD_LISTEN_CLIENT_URLS="http://10.0.2.15:2379"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.2.15:2380"
    ETCD_INITIAL_CLUSTER="default=http://10.0.2.15:2380"
    ETCD_ADVERTISE_CLIENT_URLS="http://10.0.2.15:2379"

Document the shrink cluster workflow in tendrl

This might be required if cluster storage is under-used and the admin wants to take out a few devices and use them for a different purpose. It's critical to have a clear flow defined and implemented, as it requires data movement to the remaining devices and there must not be any data loss during the process. The inventory structure maintained in tendrl also changes.

Glossary clarification: translator

Tendrl Architectural Guidelines mentions a concept of translator, but the Glossary file doesn't mention it at all.

I would suggest either including a description of translator in the Glossary, or providing a comparison between translator and (node) agent in the description of the node agent.

Moreover, there is a conflict with the translator of GlusterFS terminology, so if this tendrl translator is not well established yet, I would suggest renaming it to avoid confusion. That said, it's a general term, so maybe I'm wrong and a glossary description would be good enough.

Preset criteria and guidelines for the frontend

With respect to the component diagram, the frontend part needs expansion.

This is to enable the frontend team to make better choices to build the solution. Some high level criteria will help us align with the direction that we're all headed.

For example, limitations regarding:

  • Dependencies
  • Licenses
  • Packaging
  • Deployment
  • Platforms
  • User agent etc.

Approach for gluster brick creation from disks.

This issue is to arrive at a suitable approach for creating gluster bricks from disks. To achieve that, a few of the available tools like pyudev, blivet, pyudevDAG, storaged and udisks will be investigated, and the most suitable tool will be picked. The current approach used by the oVirt project to create bricks will also be investigated and its positive points considered.

Initial user creation

The documentation should describe how to create or set up an initial user (superuser) so that Tendrl can be used with authentication.

Analyze feasibility of ceph-mgr in tendrl and parity with calamari

We understand that ceph-mgr might not be readily available for consumption in tendrl, but we need to analyze the feasibility of using it in the tendrl stack.
The items to be checked:

  1. How easy/hard it is to set up ceph-mgr
  2. What the deployment mechanism is (on all the mons, or on the leader mon only)
  3. What the feature parity with calamari is
  4. How feasible it is to integrate with tendrl-core

[Monitoring] Identify the list of ceph resources to be monitored

Skyring monitored the following ceph resources:

Utilizations:

  • RBD Utilization
  • OSD utilization
  • Pool Utilization
  • Cluster Utilization

Availabilities:

  • Pool add/delete
  • cluster health
  • mon availability
  • rbd add/delete

Does the above list suffice, or do we need to add more resources to it?

[Monitoring] Identify the list of gluster logical resources to be monitored

During skyring planning the below list was identified as the list of gluster logical resources to be monitored:

Utilization:

  • Cluster Utilization
  • Volume utilization
  • Thin Pool Data Utilization
  • Thin Pool Meta Data Utilization
  • Brick Utilization

Availability:

  • Cluster Health
  • Volume Status

Does the above list suffice, or do we need to add more resources to it?

Document the major workflows designs

To make it clear how the workflows look and would work, document the flows and their designs for the following:

  • Create cluster
    This is important because we should be very clear on how storage nodes are discovered and accepted into tendrl. Once storage node entities are known to tendrl, how is it going to frame ansible requests based on the user's selection of nodes and disks for creation of bricks/OSDs etc.?
  • Import cluster
    I understand it's pretty straightforward, but we should have a document clearly explaining how details are pushed to the tendrl central store
  • Create volume/pool (other CRUD operations would follow)
    This is a critical workflow, as different SDS systems have different sets of parameters related to quota, whether a volume is erasure coded, etc. To be specific, gluster volumes have more types than ceph pools, for example. It should be well documented for clarity, so that at development time less effort goes into finalizing the data model
  • Expand cluster (by adding bricks/OSDs)
    There are peculiarities involved in expanding clusters. For example, in gluster we can expand a volume by adding more disks, whereas in ceph we expand the whole cluster by adding more OSDs; pools share the new capacity as PGs are recalculated and data movement happens to rebalance the cluster
  • Shrink cluster
    This is critical from the customer's point of view and needs clear guidelines and a set of steps followed very carefully, in gluster as well as ceph, because it involves moving data to the remaining bricks/OSDs and phasing out the non-performing ones
  • Replace brick/OSD
    This is very much required because hardware is prone to failures, and admins do need replacement of bricks/OSDs without downtime (or with minimal impact on the running cluster)

Document the design for create volume/pool workflow in tendrl

This is critical to tendrl, as different SDS systems require different sets of parameters and flows for creating the actual underlying storage entities. The flow should be detailed with all the involved components, how the underlying storage is carved out, and how tendrl gets the details of the full inventory into the central store.

tendrl-alerting documentation

tendrl-alerting packages are available. The Tendrl Package Installation Reference on the wiki of Tendrl/documentation should be updated to document how to install and configure tendrl-alerting and on which servers it should run.

Atomicity of operations executed by Tendrl

How do we ensure the atomicity of critical operations? Suppose a cluster creation operation is in progress with a number of storage nodes; it is not desirable for another user to attempt a similar operation with the same set of nodes. How do we prevent this? Does the central store provide any mechanism, or is Tendrl supposed to implement some lock framework?
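One possible answer, sketched below (not Tendrl code), is a lock built on an atomic compare-and-swap primitive such as etcd's test-and-set. The in-memory Store stands in for the central store; the lock key name is hypothetical.

```ruby
# Minimal compare-and-swap store: set key to value only if it
# currently holds `expected` (nil means "key must be absent").
class Store
  def initialize
    @data  = {}
    @mutex = Mutex.new
  end

  def compare_and_swap(key, expected, value)
    @mutex.synchronize do
      return false unless @data[key] == expected
      @data[key] = value
      true
    end
  end
end

store = Store.new
# The first user acquires the lock: the key is absent, so CAS succeeds.
got_lock = store.compare_and_swap('/locks/create_cluster', nil, 'user-a')
# A second user attempting the same operation is rejected atomically.
blocked  = store.compare_and_swap('/locks/create_cluster', nil, 'user-b')
```

In a real deployment the lock entry would also carry a TTL so that a crashed instance cannot hold the lock forever.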

Document the design for expand cluster workflow

An existing cluster's storage capacity could be increased by adding new storage devices to existing cluster nodes, or by adding new nodes with their own sets of storage devices.

The document should clearly articulate the flow, including:

  • How the additional hardware (be it a new device in an existing node or a fresh node) is discovered in tendrl
  • How the admin can select and decide what is to be done with the new devices (storage profiles to be decided, or in the case of gluster, where a disk fits and a brick gets added)
  • How the expansion of the cluster with the new set of devices is triggered and the async task started

Time series data

Tendrl Core Components mentions that the Central Store handles "Current time series data needed for automated conditional triggers." which I take to mean a specialised subset of the overall time series data.

Where does the rest of the time series data go?

Object model for tendrl central store

It is not clear whether there would be a generic data model followed by all the SDS systems to populate data in the central store, or whether SDS-specific objects would be maintained as-is. If SDS-specific objects are maintained, does that mean the tendrl App would have SDS-specific logic to process them and return the output of REST GET calls?

It would become clearer if the data models were published here and it were clearly explained whether a generic model or an SDS-specific one is suggested, and if SDS-specific, how the App handles them.

Document the design for create cluster workflow in tendrl

Add documentation detailing the create cluster workflow: which components are involved and how the data flows. This is important to tendrl, as discovery of storage nodes and acceptance (bootstrapping) of storage nodes with full entity details like devices, CPUs etc. is at the core of what must happen before cluster creation can start. The points to keep in mind while designing are:

  • How the storage nodes and their full inventory structure are discovered and known to tendrl
  • How the underlying storage layout is framed for ansible to submit to the backend
  • How public and cluster network segregation is taken care of
  • Storage profile considerations (maybe applicable to ceph only)
  • The async task framework and reporting details of specific steps to client code
  • Storage node provisioning (installation of all required bits)
  • Cluster provisioning (forming the full cluster layout at the backend)

These are just a few points; they may be expanded when the design is detailed.

Clarification of component diagram in *Deploying Tendrl*

In Deploying Tendrl document there is a diagram with Tendrl Core Components, which presents a general overview and doesn't try to convey particular deployment scenario.

The problem is that I'm not sure if I understand a relation between Non-HA Layout and the Tendrl Core Components diagram:

  • Storage Node from the Non-HA Layout matches with Storage Node from the diagram, that's clear
  • It seems that the Tendrl Master machine (from the Non-HA Layout) contains all the other components from the diagram, including the Central Store, Tendrl Application and UI; do I read it right?
  • It seems that the Central Store is provided by etcd, right?

If the answers to these questions are yes, I would suggest improving the description of the Layout so that the relation is absolutely clear (e.g. there is no clear statement that the Central Store is provided by etcd anywhere, including the Glossary and Components Overview documents). Suggested improvements include:

  • update the diagram to make it clear what box is a machine, group of machines or a component.
  • add/update description of Tendrl Master and Storage Node so that the same terminology is used and relation between a particular layout and the general component overview is clear

On the other hand, if the answer to any of the questions above is no, more work would be needed to make the document clear.
