dCache nearline-storage driver implementation for small file aggregation

Home Page: https://dcache.org

License: Other

Sapphire Plugin for dCache

1. What is Sapphire?

Sapphire is a dCache plugin that packs small files into larger container files in order to improve performance when writing files to tape. Sapphire consists of two parts: the driver and the packer. The driver runs directly inside dCache, while the packer runs separately, usually on a dedicated machine.

Requirements

  • Driver:
    • dCache 7.2.2 or higher with
      • WebDAV
      • Frontend
  • Packer:
    • Python 3
      • pymongo 3.11.0 or higher
      • requests
    • MongoDB 4.4 or higher
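
The pymongo requirement above can be checked before starting the packer. The following is only a minimal sketch: the helper names are made up for this illustration and are not part of Sapphire, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib import metadata

# Minimum pymongo version taken from the requirements list above.
MIN_PYMONGO = (3, 11, 0)


def parse_version(text):
    """Turn '3.11.0' into (3, 11, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '3.11' so that tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pymongo_is_recent_enough(installed_version):
    """Return True if the given pymongo version string meets the minimum."""
    return parse_version(installed_version) >= MIN_PYMONGO


def check_installed_pymongo():
    """Look up the installed pymongo version; returns False if pymongo is missing."""
    try:
        return pymongo_is_recent_enough(metadata.version("pymongo"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `check_installed_pymongo()` on the packer machine returns `True` only if pymongo is installed in a sufficiently recent version.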

2. Installation

Installation via RPM

Driver

Install the Sapphire driver RPM on the machine that runs dCache and the pool on which you want to use Sapphire.

Packer

Install the RPM on the machine that should pack the files.

Installation from source

Driver

First, compile the plugin by running mvn clean package in the driver folder of this repository. This creates a new directory called target, which contains a tarball. Unpack the tarball into your dCache plugin directory, usually /usr/share/dcache/plugins. Afterwards, run systemctl daemon-reload and restart your pool. When you log in to your pool via the admin shell, the plugin should now appear in the output of hsm show providers.

Packer

To install the packer, move the three Python scripts in packer/src (pack-files.py, verify-container.py and stage-files.py) to /usr/local/bin and make them executable with chmod +x. Then put container.conf from packer/conf into /etc/dcache. The file can also be kept in another directory; you'll find more information in the Start section of this documentation.
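
The manual steps can also be scripted. The following is a minimal sketch, not part of Sapphire: the function name `install_packer` is made up for this illustration, and it copies the files rather than moving them so that the repository checkout stays intact. Writing to /usr/local/bin and /etc/dcache usually requires root.

```python
import shutil
import stat
from pathlib import Path

# Script names as listed in the installation instructions.
PACKER_SCRIPTS = ("pack-files.py", "verify-container.py", "stage-files.py")


def install_packer(repo_root, bin_dir="/usr/local/bin", conf_dir="/etc/dcache"):
    """Copy the packer scripts and container.conf, and mark the scripts executable."""
    repo_root, bin_dir, conf_dir = Path(repo_root), Path(bin_dir), Path(conf_dir)
    installed = []
    for name in PACKER_SCRIPTS:
        target = bin_dir / name
        shutil.copy2(repo_root / "packer" / "src" / name, target)
        # Equivalent of `chmod +x`: add the execute bit for user, group and others.
        mode = target.stat().st_mode
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        installed.append(target)
    conf_target = conf_dir / "container.conf"
    shutil.copy2(repo_root / "packer" / "conf" / "container.conf", conf_target)
    installed.append(conf_target)
    return installed
```

For example, `install_packer("/path/to/sapphire")` would use the default target directories, where the repository path is a placeholder.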

To run the scripts with systemd, place the files from packer/service in /etc/systemd/system. Afterwards, run systemctl daemon-reload.


3. Preparation and Configuration

General:

Sapphire needs a MongoDB instance to run correctly. For installation and configuration, see the MongoDB documentation. The MongoDB instance has to be reachable from all machines that are part of Sapphire. Inside MongoDB, a database is needed. Its name can be chosen freely, as long as it complies with MongoDB's own rules for naming databases. Two collections in this database need indexes that have to be created manually; other collections are created by the scripts themselves. Run the following commands in the mongo shell:

db.files.createIndex( { ctime: 1 }, { sparse: true } )
db.files.createIndex( { pnfsid: 1 }, { unique: true } )
db.stage.createIndex( { pnfsid: 1 }, { unique: true } )
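
Since the packer already depends on pymongo, the same indexes could also be created from Python. This is a minimal sketch under assumptions: the function name, the connection URL and the database name in the usage comment are placeholders. The dropDups option is left out because MongoDB 3.0 and later no longer support it.

```python
# Index definitions corresponding to the three shell commands above:
# (collection name, key list, index options).
INDEX_SPECS = [
    ("files", [("ctime", 1)], {"sparse": True}),
    ("files", [("pnfsid", 1)], {"unique": True}),
    ("stage", [("pnfsid", 1)], {"unique": True}),
]


def create_sapphire_indexes(db):
    """Create the indexes on a pymongo Database object and return their names."""
    names = []
    for collection, keys, options in INDEX_SPECS:
        names.append(db[collection].create_index(keys, **options))
    return names


# Usage (needs a reachable MongoDB; URL and database name are placeholders):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   create_sapphire_indexes(client["sapphire"])
```

Creating an index that already exists with the same options is a no-op in MongoDB, so the function can be run repeatedly.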

Packer:

One configuration file has to be filled in: container.conf. By default it is located at /etc/dcache/container.conf, but it can be placed somewhere else and renamed, too. The individual parameters are explained in the file itself. The file starts with a mandatory DEFAULT section. Below the DEFAULT section, further sections can be added; these are needed for the packing itself and make it possible to define rules for different directories. The section names can be chosen freely. Please read the chapter about macaroons in the dCache UserGuide to learn how to obtain one for the configuration.
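
The layout described above (a DEFAULT section followed by freely named sections) matches the INI conventions of Python's configparser module. The following sketch illustrates how such a file behaves when read that way. It assumes, without confirmation from the source, that the packer reads the file with configparser; the option names and section names are hypothetical placeholders, not the real parameters documented in container.conf.

```python
import configparser

# Hypothetical example content: option and section names are placeholders,
# not the real parameters documented inside container.conf.
EXAMPLE_CONF = """
[DEFAULT]
example_option = shared-value

[small-files]
example_path = /data/small

[logs]
example_path = /data/logs
example_option = overridden
"""


def load_packing_rules(text):
    """Parse the config text and return {section name: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # This sketch expects a DEFAULT section with at least one option.
    if not parser.defaults():
        raise ValueError("the DEFAULT section is mandatory")
    # parser.sections() lists every section except DEFAULT; each section
    # inherits the DEFAULT values unless it overrides them.
    return {name: dict(parser[name]) for name in parser.sections()}
```

With the example content, the section small-files inherits example_option from DEFAULT, while the section logs overrides it with its own value.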

Driver and dCache:

On the driver side, dCache has to be prepared to interact with a tertiary storage system. Follow the link to find instructions on how to configure pool(s) to run correctly: https://dcache.org/old/manuals/Book-5.0/config-hsm.shtml#configuring-pools-to-interact-with-a-tertiary-storage-system.


4. Start

Packer:

Manual start

To run the scripts in the background, the following commands can be used as root:

nohup /usr/local/bin/pack-files.py > /tmp/pack-files.log &
nohup /usr/local/bin/verify-container.py > /tmp/verify-container.log &
nohup /usr/local/bin/stage-files.py > /tmp/stage-files.log &

If the configuration file is not /etc/dcache/container.conf, its full path, including the filename, has to be passed as a parameter to the scripts.
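
The manual start can be sketched in Python as well. This is only an illustration under assumptions: the helper names are made up, the configuration path is assumed to be passed as the first positional argument, and unlike the nohup commands above the sketch also redirects stderr into the log file.

```python
import subprocess
from pathlib import Path

DEFAULT_CONF = "/etc/dcache/container.conf"


def build_command(script, config_path=DEFAULT_CONF, bin_dir="/usr/local/bin"):
    """Return the argument list; the config path is only added when it is not the default."""
    command = [str(Path(bin_dir) / script)]
    if config_path != DEFAULT_CONF:
        command.append(config_path)
    return command


def log_path_for(script, log_dir="/tmp"):
    """pack-files.py -> /tmp/pack-files.log, matching the commands above."""
    return str(Path(log_dir) / (Path(script).stem + ".log"))


def start_in_background(script, config_path=DEFAULT_CONF):
    """Start one packer script detached from the current session."""
    with open(log_path_for(script), "wb") as log:
        return subprocess.Popen(
            build_command(script, config_path),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # keeps the script running after logout
        )
```

For example, `start_in_background("pack-files.py", "/opt/sapphire/my.conf")` would start the packing script with a configuration file from a non-default location, where the path is a placeholder.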

Systemd

Note that it is not possible to use a configuration file other than /etc/dcache/container.conf when running the scripts with systemd.

Start the scripts with systemctl start pack-files.service, systemctl start verify-container.service and systemctl start stage-files.service.

Driver:

Create an instance of the plugin in your pool via the admin shell with:

hsm create <instance> <name> sapphire [-key=value]...

Please make sure that <instance> matches the hsmInstance tag of the directory that contains the files to be packed.

The available configuration options:

| Name | Description | Required | Default |
|---|---|---|---|
| database | The mongo database name | yes | - |
| mongo_url | The mongodb connection url | yes | - |
| port | The port where the plugin should run | yes | - |
| whitelist | The IP addresses that are allowed to connect to Sapphire | yes | - |
| period | The period between successive scans of flush queue | no | 1 |
| period_unit | The time unit of period, SECONDS, MINUTES ... | no | MINUTES |
| certfile | The path to the certificate file for TLS | no | /etc/dcache/grid-security/hostcert.pem |
| keyfile | The path to the key file for TLS | no | /etc/dcache/grid-security/hostkey.pem |

After successful creation of the HSM instance, the packing should work.
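
To avoid forgetting one of the required options, the admin shell command can be assembled by a small helper. This is a minimal sketch, not part of Sapphire: the function name is made up, and all values in the usage example are placeholders. How several IP addresses are written in whitelist is not specified here, so the sketch passes the value through unchanged.

```python
# Option names taken from the list of available configuration options.
REQUIRED_OPTIONS = ("database", "mongo_url", "port", "whitelist")
OPTIONAL_OPTIONS = ("period", "period_unit", "certfile", "keyfile")


def build_hsm_create(instance, name, options):
    """Return the `hsm create` line for the admin shell, validating option names."""
    missing = [key for key in REQUIRED_OPTIONS if key not in options]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    unknown = [
        key for key in options
        if key not in REQUIRED_OPTIONS and key not in OPTIONAL_OPTIONS
    ]
    if unknown:
        raise ValueError("unknown options: " + ", ".join(unknown))
    arguments = " ".join("-%s=%s" % (key, value) for key, value in options.items())
    return "hsm create %s %s sapphire %s" % (instance, name, arguments)


# Usage with placeholder values:
#   build_hsm_create("osm", "sapphire-hsm", {
#       "database": "hsm",
#       "mongo_url": "mongodb://localhost:27017",
#       "port": "20000",
#       "whitelist": "127.0.0.1",
#   })
```

The returned string can be pasted into the admin shell of the pool; optional options that are left out keep the defaults listed above.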
