instaclustr / icarus

Sidecar for Cassandra with integrated backup / restore

Home Page: https://instaclustr.com

License: Apache License 2.0


icarus's Introduction

Instaclustr Icarus


Cloud-ready and Kubernetes-enabled Sidecar for Apache Cassandra

Icarus

This repository is the home of a sidecar for the Apache Cassandra database. The sidecar is meant to run alongside a Cassandra instance and talks to it via JMX.

The sidecar JAR also seamlessly integrates Esop for backup and restore, so it can be used in two modes: as a long-running server application which responds to client requests, or as a CLI application with which one can back up/restore a Cassandra node.

Note
Icarus is used in the Instaclustr Kubernetes operator as well as in CassKop—the Kubernetes operator for Cassandra from Orange—where it is used for backup and restore of clusters.

Overview

Icarus is a server-like application which acts as a gateway to the Cassandra node it is logically coupled with. In theory, the Icarus process can run anywhere, but it needs to talk to a Cassandra node via a JMX connection; by default, Icarus connects to 127.0.0.1:7199, so you do not need to expose JMX remotely. The other prerequisite for backup and restore operations is that Icarus has to see the same directory structure as Cassandra does, since it uploads/downloads SSTables and so on.

If you have a cluster of, let’s say, 5 nodes, there should be one Icarus server running alongside each node.

Each Icarus is independent and will execute operations only against the particular node it is configured to interact with. However, for backups and restores, we need a cluster-wide orchestration layer which deals with the complicated logic of on-the-fly backup and restore for the whole cluster. This mode of operation is called "global". When a global backup request is sent to one of the sidecars, the Icarus that receives it will act as the coordinator of that operation.

The job of the coordinator Icarus is to disperse individual requests to each node in the cluster (including itself), to continually poll for results, and to advance the execution once all nodes finish their current phase. Backup and restore only advance to the next phase once every node has finished the phase the coordinator is orchestrating. After all phases have passed successfully, the coordinator's operation transitions to the completed state.

Currently, the coordination/orchestration layer exists only for backup and restore operations, as they inherently require such behavior. All other operations are, as of now, executed only against one particular sidecar, and only the Cassandra node that sidecar is configured to connect to is affected.

Note
For global operations, it is important to realize that there is no main Icarus; all Icaruses are equal in their role. Only when a global backup or restore request is sent to one particular Icarus is that instance promoted to coordinator of the cluster-wide operation.

The whole idea of how Icarus works in the case of backup/restore is depicted in the following image:

Icarus Workflow
  • c1,c2,c3 are Cassandra nodes, forming a ring

  • Client is a REST client, submitting operations (1) to Icarus s1, which in the image acts as the coordinator of the whole operation

  • Coordinator Icarus disperses modified requests (2) to other Icaruses, s2 and s3 (it sends a request to itself too)

  • each Icarus talks to Cassandra via JMX (3), executing its particular operation

  • s1 knows that the other nodes finished a phase because it continually retrieves these operations and checks their statuses (4)

  • After orchestration is done and all phases are finished, the client can retrieve the status (5). Icarus never sends anything back to the client on its own; it can only be queried.

Build and Release

$ mvn clean install

If you want to build an rpm or deb package, you need to enable the rpm and/or deb Maven profile.

RPM and DEB packages, as well as the Icarus JAR, are built for each GitHub release. Each release is also published to Maven Central.

If you install the RPM or DEB package, a bash script is installed at /usr/local/bin/icarus through which Icarus is invoked. You can of course also invoke Icarus via java -jar icarus.jar.

CLI Tool or Server?

Whichever installation method you prefer, Icarus contains both Esop and the Icarus server. The reason for the CLI being embedded into Icarus is purely practical: Icarus needs to be able to do backups/restores, and this logic is implemented in Esop, so Icarus depends on that tool. Secondly, we do not need two separate JARs. This is particularly handy when Icarus is used in our Kubernetes operator for Cassandra: for restoration purposes it is invoked like a CLI tool, while as the sidecar container it is invoked like a server application which accepts operation requests from the operator.

Note
Icarus reuses Esop, which means that Esop's CLI configuration parameters have the same meaning for Icarus. Instead of feeding these flags via the CLI as with Esop, we interact with Icarus via REST by sending a JSON body to the operations endpoint.
$ java -jar target/icarus.jar
Usage: <main class> [-hV] [COMMAND]
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  esop    Application for backup and restore of a Cassandra node.
  icarus  Sidecar management application for Cassandra.

To execute Icarus:

$ java -jar target/icarus.jar icarus

To execute help for Icarus:

$ java -jar target/icarus.jar icarus help

Connecting to Cassandra

By default, Icarus uses 127.0.0.1:7199 for the JMX connection to Cassandra. Please consult the Icarus help command to learn how to secure it or how to point it to another location.

The connection to Cassandra via JMX is made lazily; it is initiated only when it is needed and closed afterwards, so each JMX interaction is a completely separate JMX connection.
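As a sketch, the JMX service URL for such a per-call connection can be built like this; the jmxUrl helper and class name are hypothetical, but the URL shape matches the standard Cassandra JMX endpoint:

```java
import javax.management.remote.JMXServiceURL;

public class JmxUrlSketch {

    // Hypothetical helper: builds the standard RMI-based JMX service URL
    // for a Cassandra node; Icarus' default target is 127.0.0.1:7199.
    static JMXServiceURL jmxUrl(String host, int port) throws Exception {
        return new JMXServiceURL(
                String.format("service:jmx:rmi://%s/jndi/rmi://%s:%d/jmxrmi",
                              host, host, port));
    }

    public static void main(String[] args) throws Exception {
        // A new JMXConnector would be opened from this URL for each
        // interaction and closed afterwards (the lazy model described above).
        System.out.println(jmxUrl("127.0.0.1", 7199));
    }
}
```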

REST Interface

You may consult (and try) all operations via our Swagger spec.

To walk the reader through an operation's lifecycle, let's explain the most "complicated" operations—backups and restores. Afterwards, a user will be able to back up and restore a running Cassandra node (or cluster), and to submit and query any other operation in a similar fashion.

The default address and port for REST is 127.0.0.1:4567.

Backup Request Submission

Let’s say we have keyspaces test1 and test2 with tables testtable1, testtable2, and testtable3 which we want to back up. In order to do that, we need to send a backup operation to one of the sidecars.

Let’s say that we have three nodes, with hostnames node1, node2, and node3 respectively, each resolving to that node’s IP.

You need to send the following JSON as the body of POST http://node1:4567/operations with application/json Content-Type and Accept headers.

{
    "type": "backup",
    "storageLocation": "s3://my-s3-bucket",
    "snapshotTag": "icarus-snapshot",
    "entities": "system_schema,test1,test2",
    "globalRequest": true
}
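For illustration, the submission could be sketched with the JDK's built-in HTTP client. The request is only constructed here, not sent; node1 and the JSON body are the example values from above, and the helper is hypothetical, not Icarus' API:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BackupSubmitSketch {

    // Hypothetical helper: builds (but does not send) the POST /operations
    // request from the example above. 4567 is Icarus' default REST port.
    static HttpRequest backupRequest(String host, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":4567/operations"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = "{\"type\":\"backup\","
                + "\"storageLocation\":\"s3://my-s3-bucket\","
                + "\"snapshotTag\":\"icarus-snapshot\","
                + "\"entities\":\"system_schema,test1,test2\","
                + "\"globalRequest\":true}";
        HttpRequest request = backupRequest("node1", body);
        // HttpClient.newHttpClient().send(request, BodyHandlers.ofString())
        // would dispatch it against a running Icarus.
        System.out.println(request.method() + " " + request.uri());
    }
}
```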

Firstly, we said that this operation is of type backup—each operation has to have this field with its own type.

Next, we specify the bucket to back up the cluster to via storageLocation. We do not expose credentials in this request, yet the interaction with S3, for example, has to be backed by credentials. Please refer to the Esop documentation to learn how credentials resolution works for each supported cloud environment.

Thirdly, we specify snapshotTag under which this backup will be taken and which we need to refer to upon restoration.

Additionally, we specify the entities field, which tells Icarus what keyspaces to back up. Icarus does not back up anything you do not specify, so if you want to back up system_schema, where the table definitions are stored, you have to do that yourself. If you want to back up just one table (or only specific tables), use the ks1.t1,ks2.t2 format.

Last but not least, we say that this request is global by setting globalRequest: true. Once this request is sent to the node1 Icarus, that instance will act as the coordinator of the cluster-wide backup. If we did not set it, only node1 would be backed up.

After submitting this operation, we may check what operations exist by calling GET http://node1:4567/operations (again with a JSON Accept header).

[
    {
        "type": "backup",
        "id": "ce3014a7-6d5c-4bbd-a680-586f9be27435",
        "creationTime": "2020-11-09T11:49:26.178Z",
        "state": "COMPLETED",
        "errors": [],
        "progress": 1.0,
        "startTime": "2020-11-09T11:49:26.224Z",
        "storageLocation": "s3://my-s3-bucket/test-cluster/dc1/bf96d50b-bb7b-4493-9a2b-048f0fd354da",
        "concurrentConnections": 10,
        "metadataDirective": "COPY",
        "cassandraDirectory": "/var/lib/cassandra",
        "entities": "system_schema,test1,test2",
        "snapshotTag": "icarus-snapshot-6f5e9841-4f97-3198-9398-161b445b3954-1604922565811",
        "k8sNamespace": "default",
        "k8sSecretName": "cloud-backup-secrets",
        "globalRequest": false,
        "timeout": 5,
        "insecure": false,
        "createMissingBucket": false,
        "skipBucketVerification": false,
        "schemaVersion": "6f5e9841-4f97-3198-9398-161b445b3954",
        "uploadClusterTopology": false,
        "completionTime": "2020-11-09T11:49:54.628Z"
    },
    {
        "type": "backup",
        "id": "d668d300-b28c-414d-a08e-4e41f6f0cdfc",
        "creationTime": "2020-11-09T11:49:16.485Z",
        "state": "COMPLETED",
        "errors": [],
        "progress": 1.0,
        "startTime": "2020-11-09T11:49:16.491Z",
        "storageLocation": "s3://my-s3-bucket",
        "concurrentConnections": 10,
        "metadataDirective": "COPY",
        "cassandraDirectory": "/var/lib/cassandra",
        "entities": "system_schema,test1,test2",
        "snapshotTag": "icarus-snapshot-6f5e9841-4f97-3198-9398-161b445b3954-1604922565811",
        "k8sNamespace": "default",
        "k8sSecretName": "cloud-backup-secrets",
        "globalRequest": true,
        "timeout": 5,
        "insecure": false,
        "createMissingBucket": false,
        "skipBucketVerification": false,
        "schemaVersion": "6f5e9841-4f97-3198-9398-161b445b3954",
        "uploadClusterTopology": false,
        "completionTime": "2020-11-09T11:50:02.292Z"
    }
]

We see that our Icarus running on node node1 (Cassandra runs there too, sharing the same IP/hostname) has two backup operations. Other fields are returned as well, which the reader can look up in our Swagger spec. The first fields returned are the same for every operation (type, id, creationTime, state, errors, progress, startTime).

The last operation is the global one (globalRequest: true). Internally, Icarus detected the cluster topology and executed an individual backup request against each node in the cluster. This is reflected in there being two backup operations here. If you look closely, you see that globalRequest for the first operation is set to false and its storageLocation has been expanded with the cluster name, data center, and node id.

A similar first operation would be found on the other two nodes too—node2 and node3—with the node-specific fields updated accordingly.

Once an operation finishes or changes its state, we can see it in the state field. Here, we see that the operation is COMPLETED. An operation moves through these states:

  • submitted—operation is submitted for execution but is not yet actively running

  • running—operation is executing its logic

  • completed—operation finished successfully

  • failed—operation finished with an error.

The progress of an operation can be checked via the progress field; it ranges from 0.0 to 1.0. If a job is 80% complete, the progress would be 0.8. It is up to each operation to update its progress.

If an operation fails, whatever exceptions are thrown are captured in the errors field. errors is an array: there might be 5 nodes and each might fail in its own way, so each failure gets a separate entry in the errors array with a different hostname, distinguishing the sources of the errors from one another.

snapshotTag is updated as well: the host id and a timestamp are automatically appended, and the timestamp is equal across all nodes.

Restore Request Submission

After a cluster is backed up, we may want to restore it. The restoration logic is similar to backup when it comes to coordination/orchestration. If you want to restore a keyspace on a running Cassandra cluster, you just send one JSON payload like this to any node:

{
	"type": "restore",
	"storageLocation": "s3://my-s3-bucket/test-cluster/dc1/abcd",
	"snapshotTag": "icarus-snapshot",
	"entities": "test1",
	"globalRequest": true,
	"restorationStrategyType": "HARDLINKS",
	"restorationPhase": "download",
	"import": {
		"type": "import",
		"sourceDir": "/var/lib/cassandra/downloadedsstables"
	}
}

Again, globalRequest is true, and storageLocation will be expanded with the node id; test-cluster and dc1 need to be there. Other fields are self-explanatory—you can consult the Swagger spec to learn more about them.

You may similarly GET operations to see there are multiple individual operations for each respective restoration phase.

The restoration proceeds in phases; phases are passed per node, and the coordinator initiates the next phase only once all nodes have passed the current one successfully.

  • DOWNLOAD—download SSTables from remote location locally

  • IMPORT—hardlinking (for Cassandra 3/4) or importing (for Cassandra 4 only) of SSTables from the local destination on the respective node

  • CLEANUP—optionally (enabled by default) delete the downloaded SSTables

The restored tables are truncated only after the DOWNLOAD phase is over for all nodes.

The import field specifies where the downloaded SSTables will be located before they are hardlinked or imported.

Each global restoration procedure has to start with restorationPhase: download; all other phases are handled automatically. This is the advantage of Icarus: with Esop, one would have to execute each phase individually, while a coordinating Icarus takes care of that transparently. To do a cluster-wide backup and restore on a running cluster, one only ever needs to send a single JSON to any node and mark it as a global request.
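The phase-advancement rule described above can be sketched as follows; the types are illustrative, not Icarus' actual classes:

```java
import java.util.Collection;
import java.util.Optional;

public class PhaseSketch {

    // The three restoration phases, in the order the coordinator runs them.
    enum Phase { DOWNLOAD, IMPORT, CLEANUP }

    // Hypothetical coordinator rule: the next phase starts only once
    // every node has finished the current one.
    static boolean allFinished(Collection<Boolean> nodeDone) {
        return nodeDone.stream().allMatch(Boolean::booleanValue);
    }

    // Returns the following phase, or empty when CLEANUP was the last one.
    static Optional<Phase> next(Phase current) {
        Phase[] all = Phase.values();
        int i = current.ordinal() + 1;
        return i < all.length ? Optional.of(all[i]) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(next(Phase.DOWNLOAD));
    }
}
```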

Note
During backup and restore operations, your whole cluster functions without any disruption. It is even possible to back up and restore while your cluster is serving traffic; there is no downtime for either operation. Please consult the Esop readme to learn more about the nature of backups and restores and their configuration.

Anatomy of an Operation

The interaction with Icarus via REST is conceptually driven around operations.

An operation has these features:

  • it is created by calling POST on /operations

  • upon submission, it returns a UUID which uniquely identifies it

  • such an operation may be checked for its status and progress via the returned UUID on the GET /operations/{uuid} endpoint

  • an operation runs asynchronously

  • in general, multiple operations may run in parallel

  • you may query an operation via its UUID even after it has finished, regardless of success or error

  • finished operations expire from the GET /operations endpoint, by default, after one hour
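As an illustration of this lifecycle, a client-side helper for deciding when to stop polling could look like this; it uses naive regex extraction instead of a JSON library, and all names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperationStateSketch {

    private static final Pattern STATE =
            Pattern.compile("\"state\"\\s*:\\s*\"([A-Za-z]+)\"");

    // Hypothetical helper: extracts the "state" field from the JSON body
    // returned by GET /operations/{uuid}.
    static String stateOf(String operationJson) {
        Matcher m = STATE.matcher(operationJson);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    // COMPLETED and FAILED are terminal, so polling can stop there;
    // SUBMITTED and RUNNING mean the client should keep querying.
    static boolean isTerminal(String state) {
        return "COMPLETED".equalsIgnoreCase(state)
                || "FAILED".equalsIgnoreCase(state);
    }

    public static void main(String[] args) {
        String sample = "{\"id\":\"ce3014a7\",\"state\":\"COMPLETED\"}";
        System.out.println(stateOf(sample) + " terminal=" + isTerminal(stateOf(sample)));
    }
}
```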

What Operations and Endpoints Are Available?

You may consult (and try) all operations via our Swagger spec.

How to Create My Own Operation?

To implement an operation against a Cassandra node, you need to, in general, follow the steps below. We will guide the reader through the cleanup operation, which is very easy to grasp. In this simple example, the operation consists of three classes:

  • CleanupOperation—the implementation of cleanup against Cassandra

// each operation has to extend Operation class
// each operation has its operation request
public class CleanupOperation extends Operation<CleanupOperationRequest> {

    private final CassandraJMXService cassandraJMXService;

    // Injection of necessary resources / instances
    // for the interaction with Cassandra, you will very likely need JMX connection
    // CassandraJMX service encapsulates this logic.
    // request instance is injected there according to body in HTTP POST request upon
    // operation submission
    @Inject
    public CleanupOperation(final CassandraJMXService cassandraJMXService,
                            @Assisted final CleanupOperationRequest request) {
        super(request);
        this.cassandraJMXService = cassandraJMXService;
    }

    // for brevity, constructor for JSON deserialisation is not shown here
    // please consult the source code to know more

    @Override
    protected void run0() throws Exception {
        // implement your operation, you have access to request from this method
        cassandraJMXService.doWithStorageServiceMBean(
                new FunctionWithEx<StorageServiceMBean, Integer>() {
                    @Override
                    public Integer apply(StorageServiceMBean ssMbean) throws Exception {
                        return ssMbean.forceKeyspaceCleanup(/* necessary arguments */);
                    }
                });
    }
}
  • CleanupOperationRequest—this class represents your operation request sent to POST /operations. Feel free to model your operation request as you please, but type field has to be there. Each operation request is validated via Hibernate Validator so you may use javax.validation annotations on fields.

public class CleanupOperationRequest extends OperationRequest {

    @NotEmpty
    public final String keyspace;

    public final Set<String> tables;

    @Min(0)
    public final int jobs;

    @JsonCreator
    public CleanupOperationRequest(
            @JsonProperty("type") String type,
            @JsonProperty("keyspace") String keyspace,
            @JsonProperty("tables") Set<String> tables,
            @JsonProperty("jobs") int jobs) {
        this.jobs = jobs;
        this.keyspace = keyspace;
        this.tables = tables;
        this.type = type;
    }
}
  • Finally, we need a Guice module which is installed upon Icarus start:

public class CleanupsModule extends AbstractModule {

    @Override
    protected void configure() {
        installOperationBindings(binder(),
                                 "cleanup",
                                 CleanupOperationRequest.class,
                                 CleanupOperation.class);
    }
}

Please note:

  • this class extends Guice’s AbstractModule

  • installOperationBindings static method is from com.instaclustr.operations.OperationBindings

  • cleanup is the operation type. You have to specify the type field in your JSON request. Each operation is uniquely known and referred to through this type.

  • Each operation has its implementation and its request, specified as the other arguments to the installation method.

After we have our operation module, we need to install it. This happens via com.instaclustr.icarus.Icarus#operationModules method.

Tests

There are cloud tests which are primarily focused on backup and restore against a cloud destination: GCP, Azure, or S3. Cloud tests are disabled by default; you have to enable them like this:

 mvn clean install -PcloudTests \
  -Dawsaccesskeyid=_enter_ \
  -Dawssecretaccesskey=_enter_ \
  -Dgoogle.application.credentials=/path/to/gcp.json \
  -Dazurestorageaccount=_enter_ \
  -Dazurestoragekey=_enter_


icarus's Issues

[FEATURE] Support for client-side encryption

Is your feature request related to a problem? Please describe.
Problem is putting sensitive data in the cloud is scary.

Another thing is if one has multiple Cassandra clusters being backed up to the same S3 bucket, one might want to be able to use encryption to make sure one cannot restore other data from the bucket. This can obviously also be done with ACL, but encryption can be an additional layer against that.

Describe the solution you'd like
I'd like

  • client-side encryption of sstables before they are uploaded to S3.
  • client-side decryption of sstables after downloaded from S3.

Describe alternatives you've considered
Obviously you can enable server-side encryption in S3, but honestly that's semi-fake security (more compliance) and doesn't really protect against an S3 bucket being accessed from the Internet etc. etc.

Additional context
Does this feature need to live in Esop or Icarus? I assume both.

Has this been considered before? I'd be fine with supporting a single encryption key stored in a configuration file or something.

[FEATURE] add the esop remove-backup function to the REST interface.

Is your feature request related to a problem? Please describe.
We are using Icarus to create backups and restores for our Cassandra database, but an important part of the backup procedure is the cleanup of old backups. We can't use the simple retention date option to delete old backups, because we're using an S3 repository and the skiprefresh option is set to true.
Therefore we can't clean up old backups easily.

Describe the solution you'd like
We would like to use the remove-backup function in Esop for cleaning old backups through the REST interface of Icarus, with all the options available in esop remove-backup. With this option we can remotely clean up old backups.

Describe alternatives you've considered
We are also looking into whether it is possible to use the Esop tooling remotely. At this moment it is unclear whether the Esop tooling also needs to be local when using the remove-backup function.
We are also looking at the possibility of making something ourselves to delete the backups directly from the S3 repository.

Additional context
None.

[BUG] Restore with hardlinks fails in IMPORT phase

Describe the bug
I'm trying to restore a created snapshot which fails with error:

Error{source=cassandra-dev01-ird, message=Hardlinking phase finished with errors, the linking of downloaded SSTables to Cassandra directory has failed., throwable=com.instaclustr.esop.impl.restore.RestorationPhase$RestorationPhaseException: Unable to pass IMPORT phase.}

To Reproduce
make backup
curl --header "Content-Type: application/json" --data '{"type":"backup", "globalRequest":"true", "storageLocation" : "ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/1", "metadataDirective":"REPLACE", "dataDirs":["/icarus/cassandra/data/data"], "skipRefreshing":"true"}' cassandra-dev00-ird:4567/operations

restore backup
curl --header "Content-Type: application/json" --data '{"type":"restore", "globalRequest":"true", "dataDirs":["/icarus/cassandra/data/data"], "snapshotTag":"autosnap-1645605747", "restorationPhase":"INIT", "restorationStrategyType":"HARDLINKS", "storageLocation":"ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/2", "import":{"type":"import", "sourceDir":"/icarus/tmp/"}, "resolveHostIdFromTopology":"true", "cassandraDirectory":"/icarus/cassandra/data/"}' cassandra-dev00-ird:4567/operations

Expected behavior
restore should complete without errors.

Versions (please complete the following information):

  • OS "Ubuntu 20.04.3 LTS (Focal Fossa)"
  • cassandra_version: 3.11.6
  • openjdk version "1.8.0_322"
    OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
    OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
  • icarus version 2.0.2

Additional context
command used:
curl --header "Content-Type: application/json" --data '{"type":"restore", "globalRequest":"true", "dataDirs":["/icarus/cassandra/data/data"], "snapshotTag":"autosnap-1645605747", "restorationPhase":"INIT", "restorationStrategyType":"HARDLINKS", "storageLocation":"ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/2", "import":{"type":"import", "sourceDir":"/icarus/tmp/"}, "resolveHostIdFromTopology":"true", "cassandraDirectory":"/icarus/cassandra/data/"}' cassandra-dev00-ird:4567/operations

Part of icarus logging
[] - 08:53:42.480 [pool-4-thread-3] INFO c.i.e.i.r.RestorationPhase$HardlinkingPhase - Hardlinking phase has started.
[] - 08:53:43.565 [pool-4-thread-3] INFO c.i.e.i.Manifest - Resolved manifest: cassandra_gms_dev/rc3/982ab74c-cd42-4885-a986-b61c3eb186b4/manifests/autosnap-1645605747-73919840-436c-3d25-a9d8-42659f4e5722-1645605756900.json
[] - 08:54:31.818 [pool-4-thread-3] ERROR c.i.e.i.r.RestorationPhase$HardlinkingPhase - Unable to create a hardlink from /icarus/tmp/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db to /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db, skipping the linking of all other resources and deleting already linked ones.
java.nio.file.FileAlreadyExistsException: /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db -> /icarus/tmp/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
at java.nio.file.Files.createLink(Files.java:1086)
at com.instaclustr.esop.impl.restore.RestorationPhase$HardlinkingPhase.execute(RestorationPhase.java:539)
at com.instaclustr.esop.impl.restore.strategy.AbstractRestorationStrategy.restore(AbstractRestorationStrategy.java:93)
at com.instaclustr.esop.impl.restore.coordination.BaseRestoreOperationCoordinator.coordinate(BaseRestoreOperationCoordinator.java:57)
at com.instaclustr.icarus.coordination.IcarusRestoreOperationCoordinator.coordinate(IcarusRestoreOperationCoordinator.java:139)
at com.instaclustr.esop.impl.restore.RestoreOperation.run0(RestoreOperation.java:148)
at com.instaclustr.operations.Operation.run(Operation.java:247)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
[] - 08:54:31.840 [pool-4-thread-3] ERROR c.i.e.i.r.RestorationPhase$HardlinkingPhase - Hardlinking phase has failed: Hardlinking phase finished with errors, the linking of downloaded SSTables to Cassandra directory has failed.

Data on filesystem
ls -al /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db
-rw-r--r-- 2 1001 root 43 Feb 22 14:56 /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db

Backup taken with globalRequest enabled cannot be restored

Hopefully I'm just doing something wrong here. I've tried it several ways to no avail though. I can get the backups to work just fine. They look correct in object storage too. If I just backup a single node, I can restore it w/no issues as well. However if I try to restore (even a single node) from a backup with globalRequest: true, it fails with the following error (which I formatted for ease of reading).

There is not one key which satisfies key filter: [
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/792aa697-ca02-4b44-83c0-1dc6c48ef47f/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='ab34f1c0e87d136f503de0b238049575', 
    size=5158, 
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  },
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec',
    eTag='68a294ccd1abd9104dbbb5e88f451bdc',
    size=1971,
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/a0eb1f45-5f0f-4589-9919-c1d32de2c68d/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='f88d981fad23a80184d421372eb3b21c', 
    size=6100, 
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket',
    key='cassandra/dc2/3852f40d-8bfe-4bc2-b529-3aba699b44d7/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec',
    eTag='90a50752c6813e9d6910d09f2dcbf735', 
    size=4198, lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc2/b5cab591-bee5-4649-ac8d-ad80cef79287/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='c027f679c82e29a21cdffe3f4dd10504', 
    size=1754, 
    lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket',
    key='cassandra/dc2/bcc2d49c-0416-40ff-b00f-ae208513ceeb/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='39babb109aa669aee29eaf0cb21c1aba', 
    size=5149, lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }
]

So you can see that it's picking out 6 directories/nodes that I have backed up (for testing, I just setup a 2 DC cluster with 3 nodes per DC).

Here's the POST body:

        {
          "type": "restore",
          "cassandraDirectory": "/var/lib/cassandra",
          "cassandraConfigDirectory": "/etc/cassandra",
          "storageLocation": "s3://my-bucket",
          "snapshotTag": "hello",
          "k8sNamespace": "craig",
          "k8sSecretName": "c-backup-secret",
          "entities": "system_auth",
          "restorationStrategyType": "import",
          "updateCassandraYaml": true,
          "restorationPhase": "download",
          "import": {
            "type": "import",
            "sourceDir": "/var/lib/cassandra/data/downloadedsstables"
          },
          "globalRequest": false
        }

I've also tried it from the command line with similar results:

java -jar /cassandra-sidecar.jar \
backup-restore \
restore  \
--jmx-service=service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi \
--jmx-user=xxx \
--jmx-password=xxx \
--k8s-namespace=craig \
--k8s-secret-name=c-backup-secret \
--data-directory=/var/lib/cassandra \
--storage-location=s3://my-bucket \
--snapshot-tag=hello \
--restoration-strategy-type=import \
--restoration-phase-type=download \
--import-source-dir=/var/lib/cassandra/data/downloadedsstables

Is there a setting or something that I'm missing here?

EDIT:
Here's the backup POST body that I'm using:

{
    "type": "backup",
    "storageLocation": "s3://my-bucket",
    "snapshotTag": "hello",
    "entities": "system_auth",
    "k8sNamespace": "craig",
    "k8sSecretName": "c-backup-secret",
    "dataDirectory": "/var/lib/cassandra",
    "globalRequest": true
}
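
For reference, a body like the one above can be submitted to a running sidecar over HTTP. The port (4567) and the `/operations` path below are assumptions about Icarus defaults, so adjust both to your deployment:

```shell
# Save the backup request body to a file first:
cat > backup-request.json <<'EOF'
{
    "type": "backup",
    "storageLocation": "s3://my-bucket",
    "snapshotTag": "hello",
    "entities": "system_auth",
    "k8sNamespace": "craig",
    "k8sSecretName": "c-backup-secret",
    "dataDirectory": "/var/lib/cassandra",
    "globalRequest": true
}
EOF

# ...then POST it to the sidecar (hypothetical defaults shown --
# adjust host, port, and path to your deployment):
#   curl -X POST http://127.0.0.1:4567/operations \
#        -H 'Content-Type: application/json' \
#        -d @backup-request.json
```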

Provide SPI for modules so they might be developed independently

Is your feature request related to a problem? Please describe.

I would like to be able to implement modules and use them without actually merging them into the code base.

Describe the solution you'd like

There needs to be an SPI provided / implemented so that merely placing an implementation on the classpath results in an operation module being instantiated and registered upon Icarus starting.

Are JMX auth and SSL supported?

I'm not seeing where you can inject credentials for JMX or truststore/keystore information in the sidecar. Is this supported?

[BUG] Icarus restore fails when dropped tables still have Cassandra snapshots or backups

Describe the bug
We tried to restore a backup, using the Icarus REST interface for both the backup and the restore.
But during the restore we get this error:
ERROR [MutationStage-1] 2022-02-23 08:49:09,695 TruncateVerbHandler.java:44 - Error in truncation
java.lang.IllegalArgumentException: Unknown keyspace/cf pair (gms.incidents_by_label_mv)

This table did exist before, but was dropped a long time ago. This table isn't described in the manifest of the backup.
But on the Cassandra filesystem we see several directories like:
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718/backups
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718/snapshots
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718/snapshots/1590585781812-incidents_by_label_mv
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718/snapshots/1590585781812-incidents_by_label_mv/manifest.json
incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718/snapshots/1590585781812-incidents_by_label_mv/schema.cql

It looks like the restore tool tries to refresh tables which don't exist anymore, because the tool found a directory, with snapshots of the old table, on the file system of Cassandra.

To Reproduce
1. Create a table in Cassandra.
2. Create a snapshot within Cassandra itself.
3. Drop the table.
4. Create a backup with Icarus.
5. Restore the backup with Icarus.

Expected behavior
It should be possible to restore the database without problems.
The restore should probably detect that the table doesn't exist anymore and skip the refresh step (which truncates the table).

Versions (please complete the following information):

  • OS "Ubuntu 20.04.3 LTS (Focal Fossa)"
  • cassandra_version: 3.11.6
  • openjdk version "1.8.0_322"
    OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
    OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
  • icarus version 2.0.2

Additional context
Complete error
ERROR [MutationStage-1] 2022-02-23 08:49:09,695 TruncateVerbHandler.java:44 - Error in truncation
java.lang.IllegalArgumentException: Unknown keyspace/cf pair (gms.incidents_by_label_mv)
at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:198) ~[apache-cassandra-3.11.6.jar:3.11.6]
at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:39) ~[apache-cassandra-3.11.6.jar:3.11.6]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.6.jar:3.11.6]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_252]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165) [apache-cassandra-3.11.6.jar:3.11.6]
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137) [apache-cassandra-3.11.6.jar:3.11.6]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113) [apache-cassandra-3.11.6.jar:3.11.6]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_252]

Possible solution:
Check whether the directory contains table files (*.db) when parsing the CassandraData in
com.instaclustr.esop.impl.CassandraData.parse (line 296)
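
A sketch of that check, shown here in shell for illustration only (the actual fix would live in Esop's Java parsing code): a table directory whose top level contains no *.db SSTable components holds only leftovers of a dropped table and can be skipped.

```shell
# Succeeds when at least one regular *.db file sits directly in the
# table directory, i.e. the table still physically exists.
table_has_sstables() {
  find "$1" -maxdepth 1 -type f -name '*.db' | grep -q .
}

# Demo against a dropped-table layout like the one in this report:
# only snapshots/ and backups/ subdirectories remain, no SSTables.
dir=$(mktemp -d)/incidents_by_label_mv-2edad310a01d11eaa83cb14d6a129718
mkdir -p "$dir/snapshots" "$dir/backups"

if table_has_sstables "$dir"; then
  echo "live table"
else
  echo "dropped table, skip refresh"
fi
```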
