Giter Site home page Giter Site logo

helyim's Introduction

Helyim

CI codecov License

seaweedfs implemented in pure Rust.

Features

Additional Features

  • Can choose no replication or different replication levels, rack and data center aware.
  • Automatic compaction to reclaim disk space after deletion or update.
  • Automatic master servers fail over - no single point of failure (SPOF).
  • Erasure Coding for warm storage Rack-Aware 10.4 erasure coding reduces storage cost.

Usage

By default, the master node runs on port 9333, and the volume nodes run on port 8080. Let's start one master node, and one volume node on port 8080. Ideally, they should be started from different machines. We'll use localhost as an example.

Helyim uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.

1. Start Master Server

cargo run --bin helyim master

2. Start Volume Servers

cargo run --bin helyim volume --port 8080 --folders ./vdata:70 --folders ./v1data:10

3. Write File

To upload a file: first, send a HTTP POST, PUT, or GET request to /dir/assign to get an fid and a volume server URL:

curl http://localhost:9333/dir/assign
{"fid":"6,16b7578a5","url":"127.0.0.1:8080","public_url":"127.0.0.1:8080","count":1,"error":""}

Second, to store the file content, send a HTTP multi-part POST request to url + '/' + fid from the response:

curl -F file=@./sun.jpg http://127.0.0.1:8080/6,16b7578a5
{"name":"sun.jpg","size":1675569,"error":""}

To update, send another POST request with updated file content.

For deletion, send an HTTP DELETE request to the same url + '/' + fid URL:

curl -X DELETE http://127.0.0.1:8080/6,16b7578a5

Failover Master Server

When initiating a Raft cluster, it is necessary to specify the same node sequence when starting the Leader and Follower instances.

You can view the cluster status by accessing http://localhost:9333/cluster/status.

# start master1
cargo run --release --bin helyim -- master --ip 127.0.0.1 --port 9333 \
      --peers 127.0.0.1:9333 \
      --peers 127.0.0.1:9335 \
      --peers 127.0.0.1:9337
      
      
# start master2
cargo run --release --bin helyim -- master --ip 127.0.0.1 --port 9335 \
      --peers 127.0.0.1:9333 \
      --peers 127.0.0.1:9335 \
      --peers 127.0.0.1:9337
      
# start master3
cargo run --release --bin helyim -- master --ip 127.0.0.1 --port 9337 \
      --peers 127.0.0.1:9333 \
      --peers 127.0.0.1:9335 \
      --peers 127.0.0.1:9337

Benchmark

My laptop results on Lenovo IdeaPad Pro 16 (2023) with SSD, CPU: 14 Intel Core i9 5.4GHz.

It seems to be slower than seaweedfs, especially in terms of reading.

โžœ ./weed benchmark -server=localhost:9333
This is SeaweedFS version 0.76 linux amd64

------------ Writing Benchmark ----------
Completed 15199 of 1048576 requests, 1.4% 15198.1/s 15.3MB/s
Completed 31887 of 1048576 requests, 3.0% 16687.9/s 16.8MB/s
Completed 48439 of 1048576 requests, 4.6% 16551.6/s 16.7MB/s
...
Completed 994044 of 1048576 requests, 94.8% 16645.2/s 16.8MB/s
Completed 1010800 of 1048576 requests, 96.4% 16755.8/s 16.9MB/s
Completed 1027412 of 1048576 requests, 98.0% 16612.2/s 16.7MB/s
Completed 1044319 of 1048576 requests, 99.6% 16907.0/s 17.0MB/s

Concurrency Level:      16
Time taken for tests:   63.249 seconds
Complete requests:      1048576
Failed requests:        0
Total transferred:      1106759553 bytes
Requests per second:    16578.50 [#/sec]
Transfer rate:          17088.29 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.1      0.9       29.8      0.4

Percentage of the requests served within a certain time (ms)
   50%      0.9 ms
   66%      1.0 ms
   75%      1.1 ms
   90%      1.3 ms
   95%      1.5 ms
   98%      1.7 ms
   99%      1.8 ms
  100%     29.8 ms

------------ Randomly Reading Benchmark ----------
Completed 89963 of 1048576 requests, 8.6% 89957.6/s 90.5MB/s
Completed 187560 of 1048576 requests, 17.9% 97597.1/s 98.2MB/s
Completed 283486 of 1048576 requests, 27.0% 95925.8/s 96.6MB/s
Completed 382035 of 1048576 requests, 36.4% 98549.4/s 99.2MB/s
Completed 480649 of 1048576 requests, 45.8% 98613.9/s 99.3MB/s
Completed 583585 of 1048576 requests, 55.7% 102933.7/s 103.6MB/s
Completed 683954 of 1048576 requests, 65.2% 100370.9/s 101.0MB/s
Completed 782522 of 1048576 requests, 74.6% 98567.9/s 99.2MB/s
Completed 883504 of 1048576 requests, 84.3% 100982.7/s 101.7MB/s
Completed 987320 of 1048576 requests, 94.2% 103814.3/s 104.5MB/s

Concurrency Level:      16
Time taken for tests:   10.600 seconds
Complete requests:      1048576
Failed requests:        0
Total transferred:      1106777459 bytes
Requests per second:    98925.73 [#/sec]
Transfer rate:          101969.36 [Kbytes/sec]

Connection Times (ms)
              min      avg        max      std
Total:        0.0      0.1       2.3      0.1

Percentage of the requests served within a certain time (ms)
   50%      0.1 ms
   95%      0.2 ms
   98%      0.4 ms
  100%      2.3 ms

Acknowledgments

  • seaweedfs - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
  • zergling - seaweedFS re-implemented in Rust.

helyim's People

Contributors

iamazy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

helyim's Issues

Support async file operations

To facilitate file operations with tokio_uring, it is essential to initially transform synchronous file operations into asynchronous ones, and then envelop the unified file operations of tokio and tokio_uring for integration.

Volume data integrity checking failed after compaction

In this function, once the compaction process is complete, all data will be wiped clean and new data file will be filled with 0.
At present, compaction method in use is compact2, Upon fixing this bug, an option should be provided to allow for the selection of the compaction method employed.

pub fn compact(&self) -> Result<(), VolumeError> {
let filename = self.filename();
self.set_last_compact_index_offset(self.needle_mapper()?.index_file_size()?);
self.set_last_compact_revision(self.super_block.compact_revision());
self.set_readonly(true);
self.copy_data_and_generate_index_file(
format!("{}.{COMPACT_DATA_FILE_SUFFIX}", filename),
format!("{}.{COMPACT_IDX_FILE_SUFFIX}", filename),
)?;
info!("compact {filename} success");
Ok(())
}

Feat: format the startup parameters output, including git commit info

Just like kafka

[main] INFO org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: 
	auto.commit.interval.ms = 5000
	auto.offset.reset = latest
	bootstrap.servers = [test-nameserver.jmq.jd.local:50088]
	check.crcs = true
	client.dns.lookup = default
	client.id = liyue1
	connections.max.idle.ms = 540000
	default.api.timeout.ms = 60000
	enable.auto.commit = true
	exclude.internal.topics = true
	fetch.max.bytes = 52428800
	fetch.max.wait.ms = 500
	fetch.min.bytes = 1
	group.id = liyue1
	heartbeat.interval.ms = 3000
	interceptor.classes = []
	internal.leave.group.on.close = true
	isolation.level = read_uncommitted
	key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
	max.partition.fetch.bytes = 1048576
	max.poll.interval.ms = 300000
	max.poll.records = 500
	metadata.max.age.ms = 300000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 30000
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = null
	sasl.jaas.config = [hidden]
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = PLAIN
	security.protocol = SASL_PLAINTEXT
	send.buffer.bytes = 131072
	session.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
	ssl.endpoint.identification.algorithm = https
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLS
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

[main] INFO org.apache.kafka.common.security.authenticator.AbstractLogin - Successfully logged in.
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.1.1
[main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 21234bee31165527

s3 support assumption

If it support s3 api, it must store some s3 object matedata like bucket, key, fid, ...
Does it require the use of a database? We need to write a lot of s3 logic?
Is there a solution where you can just use this project as the data storage backend and not care about the implementation of S3 metadata and logic?

Proxy to leader

when request to follower with http, it should redirect to leader

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.