cabin's Introduction

Hi there 👋

  • 🔭 I’m currently working on Home
  • 🌱 I’m currently learning Rust

cabin's Issues

Raft Implementation (MIT 6.824 Lab)

Background

This post walks through a simple Raft implementation built on the MIT 6.824 lab assignments, covering three parts: leader election, log replication, and persistence. The heart of the whole exercise is Figure 2 of the Raft paper; that figure reads almost like code and captures the core of the protocol.

Leader Election

State machine
Here is the conceptual state machine for leader election again:
(figure: raft-core state machine)

First, keep the two timeouts straight: electionTimeout and heartbeatTimeout.

  • When a follower initializes, it picks a randomized electionTimeout; when that point is reached, it becomes a candidate and starts requesting votes.

  • The leader maintains heartbeats, sending one every heartbeatTimeout. The state transitions map directly onto Figure 2; here we only look at the leader-election part.
    Follower:

  • Respond to vote requests from candidates and heartbeats from the leader.

  • If neither a heartbeat nor a vote request arrives within electionTimeout, convert to candidate.

Candidate:

  • On becoming a candidate:

increment its currentTerm
vote for itself
reset its electionTimeout

  • Send vote requests to the other nodes.

  • Become leader upon receiving responses from a majority.

  • Fall back to follower upon receiving a heartbeat.

  • If electionTimeout expires, retry the election.
    Leader:

  • Send a heartbeat every heartbeatTimeout.
    The Raft struct:

```go
type Raft struct {
	mu        sync.RWMutex        // Lock to protect shared access to this peer's state
	peers     []*labrpc.ClientEnd // RPC end points of all peers
	persister *Persister          // Object to hold this peer's persisted state
	me        int                 // this peer's index into peers[]
	dead      int32               // set by Kill()

	// Your data here (2A, 2B, 2C).
	// Look at the paper's Figure 2 for a description of what
	// state a Raft server must maintain.
	currentTerm int
	votedFor    int
	logs        []LogEntry

	// Volatile state on all servers
	commitIndex int
	lastApplied int

	// Volatile state on leaders
	nextIndex  []int
	matchIndex []int

	voteCount     int
	state         uint64
	granted       chan struct{}
	AppendEntries chan struct{}
	electWin      chan struct{}
	applyCh       chan ApplyMsg
}
```
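
A few declarations that the snippets below rely on never appear in the post: the state constants, the debug flag, and the getLastIndex/getLastTerm accessors. A minimal sketch of what they presumably look like (the exact definitions are an assumption; getLastIndex assumes the log is seeded with a sentinel entry at index 0, as in the Make sketch further down):

```go
// Assumed declarations -- not shown in the original post.
const (
	StateFollower uint64 = iota
	StateCandidate
	StateLeader
)

// debug controls the timestamped logging helper shown later.
var debug = 1

// ApplyMsg is provided by the 6.824 lab skeleton; roughly:
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

// getLastIndex returns the index of the last log entry
// (valid given a sentinel entry at index 0).
func (rf *Raft) getLastIndex() int {
	return len(rf.logs) - 1
}

// getLastTerm returns the term of the last log entry.
func (rf *Raft) getLastTerm() int {
	return rf.logs[rf.getLastIndex()].Term
}
```
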
The electionTimeout in the state machine is randomized to avoid split votes during elections.

```go
func (rf *Raft) run() {
	for {
		switch rf.state {
		case StateCandidate:
			// vote
			select {
			// timed out; retry the election
			case <-time.After(time.Millisecond * time.Duration(rand.Intn(200)+300)):
				rf.mu.Lock()
				rf.becomeCandidate()
				rf.mu.Unlock()
			case <-rf.AppendEntries:
				rf.mu.Lock()
				rf.becomeFollower("candidate receive heart beat")
				rf.mu.Unlock()
			case <-rf.electWin:
				rf.mu.Lock()
				rf.becomeLeader()
				rf.mu.Unlock()
			}
		case StateFollower:
			select {
			// received a vote request
			case <-rf.granted:
			// received a heartbeat
			case <-rf.AppendEntries:
			case <-time.After(time.Millisecond * time.Duration(rand.Intn(200)+300)):
				rf.mu.Lock()
				rf.becomeCandidate()
				rf.mu.Unlock()
			}
		case StateLeader:
			go rf.sendAppendEntries()
			time.Sleep(time.Millisecond * 100)
		}
	}
}

func (rf *Raft) becomeLeader() {
	rf.debug("changed to Leader, id %d , term %d, logs %v", rf.me, rf.currentTerm, rf.logs)
	rf.state = StateLeader
	rf.nextIndex = make([]int, len(rf.peers))
	rf.matchIndex = make([]int, len(rf.peers))
	for i := range rf.peers {
		rf.nextIndex[i] = rf.getLastIndex() + 1
	}
}

func (rf *Raft) becomeFollower(reason string) {
	rf.debug("changed to Follower, id %d, term %d, reason %s", rf.me, rf.currentTerm, reason)
	rf.state = StateFollower
}

func (rf *Raft) becomeCandidate() {
	rf.debug("changed to Candidate, id %d, term %d, logs %v", rf.me, rf.currentTerm, rf.logs)
	rf.state = StateCandidate
	rf.currentTerm++
	rf.votedFor = rf.me
	rf.voteCount = 1
	rf.persist()
	go rf.sendAllVotesRequests()
}
```
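
The post doesn't show how a peer gets wired up. In the 6.824 skeleton this happens in Make; a plausible sketch that initializes the state above and starts the run loop is below. The channel buffering and the sentinel log entry are assumptions (buffered channels keep the RPC handlers from blocking while they hold the mutex):

```go
func Make(peers []*labrpc.ClientEnd, me int,
	persister *Persister, applyCh chan ApplyMsg) *Raft {
	rf := &Raft{
		peers:     peers,
		persister: persister,
		me:        me,
		applyCh:   applyCh,

		state:    StateFollower,
		votedFor: -1,
		logs:     []LogEntry{{Term: 0}}, // sentinel entry at index 0

		granted:       make(chan struct{}, 64),
		AppendEntries: make(chan struct{}, 64),
		electWin:      make(chan struct{}, 64),
	}

	// restore state that survived a crash (2C)
	rf.readPersist(persister.ReadRaftState())

	// drive the follower/candidate/leader state machine
	go rf.run()
	return rf
}
```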

For easier debugging, I wrote a small timestamped log helper; it makes the output much easier to follow.

```go
func (rf *Raft) debug(msg string, a ...interface{}) {
	if debug < 1 {
		return
	}
	selfMsg := fmt.Sprintf(" [me:%d term:%d, state: %d, log: %d] ",
		rf.me, rf.currentTerm, rf.state, len(rf.logs))
	fmt.Println(strconv.Itoa(int(time.Now().UnixNano())/1000) + selfMsg + fmt.Sprintf(msg, a...))
}
```
Vote requests: RequestVote
RequestVote is the RPC for soliciting votes, and only candidates invoke it. Every node in a Raft cluster exposes this handler to receive vote requests from candidates (a sketch of the receiver side appears after sendRequestVote below).
The candidate sends requests to the other nodes, and response handling also lives in this function.

```go
func (rf *Raft) sendAllVotesRequests() {
	rf.mu.Lock()
	// arguments for the vote request
	args := &RequestVoteArgs{}
	args.Term = rf.currentTerm
	args.CandidateId = rf.me
	// args.LastLogIndex = rf.getLastIndex()
	// args.LastLogTerm = rf.getLastTerm()
	rf.mu.Unlock()

	var wg sync.WaitGroup

	for p := range rf.peers {
		if p != rf.me {
			wg.Add(1)
			go func(p int) {
				defer wg.Done()

				ok := rf.sendRequestVote(p, args, &RequestVoteReply{})
				if !ok {
					rf.debug("send request to p: %d, ok: %v", p, ok)
				}
			}(p)
		}
	}
	wg.Wait()

	rf.mu.Lock()
	// all replies are in; the election is won if a quorum granted their votes
	win := rf.voteCount >= len(rf.peers)/2+1
	// make sure the vote request is valid
	if win && args.Term == rf.currentTerm {
		rf.electWin <- struct{}{}
	}
	rf.debug("vote finished, voteCount: %d, win: %v", rf.voteCount, win)
	rf.mu.Unlock()
}
```
sendRequestVote wraps the RPC in a timeout because the test harness simulates unreachable networks, and letting a request hang forever would only complicate the system. The paper allows unbounded retries, but in a real production environment any external RPC call needs a timeout to guard against resource leaks.

```go
func (rf *Raft) sendRequestVote(server int, args *RequestVoteArgs, reply *RequestVoteReply) bool {
	respCh := make(chan bool)
	ok := false
	go func() {
		respCh <- rf.peers[server].Call("Raft.RequestVote", args, reply)
	}()
	select {
	case <-time.After(time.Millisecond * 60): // 60ms RPC timeout
		return false
	case ok = <-respCh:
	}
	if !ok {
		return false
	}
	rf.mu.Lock()
	defer rf.mu.Unlock()
	defer rf.persist()

	if rf.state != StateCandidate || args.Term != rf.currentTerm {
		return ok
	}
	// our term is stale: step down and adopt the newer term
	if reply.Term > rf.currentTerm {
		rf.becomeFollower("candidate received large term")
		rf.currentTerm = reply.Term
		rf.votedFor = -1
	}

	if reply.VoteGranted {
		rf.voteCount++
	}
	return ok
}
```
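
The original post never shows the receiver side of RequestVote. A minimal sketch of such a handler, consistent with the structs and channels above, might look like this; the reply type is inferred from the fields the sender reads, and the exact grant conditions follow Figure 2 rather than the author's code:

```go
// Assumed reply type -- inferred from the fields the sender reads.
type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	defer rf.persist()

	// reject candidates from an older term
	if args.Term < rf.currentTerm {
		reply.Term = rf.currentTerm
		return
	}
	// a newer term forces us back to follower
	if args.Term > rf.currentTerm {
		rf.currentTerm = args.Term
		rf.votedFor = -1
		if rf.state != StateFollower {
			rf.becomeFollower("vote request with larger term")
		}
	}
	reply.Term = rf.currentTerm

	// grant at most one vote per term, and only to a candidate whose
	// log is at least as up-to-date as ours (Figure 2, section 5.4.1)
	upToDate := args.LastLogTerm > rf.getLastTerm() ||
		(args.LastLogTerm == rf.getLastTerm() && args.LastLogIndex >= rf.getLastIndex())
	if (rf.votedFor == -1 || rf.votedFor == args.CandidateId) && upToDate {
		rf.votedFor = args.CandidateId
		reply.VoteGranted = true
		// tell the run loop to reset its election timer; assumes
		// rf.granted is buffered (see the Make sketch) so this doesn't block
		rf.granted <- struct{}{}
	}
}
```
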
Heartbeats: AppendEntries
Heartbeats are sent every heartbeatTimeout. The function is named AppendEntries because log entries are later shipped through the same call.

```go
func (rf *Raft) sendAppendEntries() {
	var wg sync.WaitGroup

	rf.mu.RLock()
	for p := range rf.peers {
		if p != rf.me {
			args := &RequestAppendEntriesArgs{}
			// the leader's term
			args.Term = rf.currentTerm
			// leader ID
			args.LeaderID = rf.me
			// needed for log replication; unused during plain leader election
			args.PrevLogIndex = rf.nextIndex[p] - 1
			args.LeaderCommit = rf.commitIndex

			if args.PrevLogIndex >= 0 {
				args.PrevLogTerm = rf.logs[args.PrevLogIndex].Term
			}
			// attach entries only if the follower is behind; otherwise
			// Entries stays empty and this is a pure heartbeat
			if rf.nextIndex[p] <= rf.getLastIndex() {
				args.Entries = rf.logs[rf.nextIndex[p]:]
			}
			//rf.debug("send Entries is: %v, index is: %d", args.Entries, p)
			wg.Add(1)

			go func(p int, args *RequestAppendEntriesArgs) {
				defer wg.Done()
				ok := rf.sendRequestAppendEntries(p, args, &RequestAppendEntriesReply{})
				if !ok {
					rf.debug("send %d AppendEntries result:%v", p, ok)
				}
			}(p, args)
		}
	}
	rf.mu.RUnlock()
	wg.Wait()
}
```
The leader's RPCs also carry a timeout, which makes the overall flow easier to control.

```go
func (rf *Raft) sendRequestAppendEntries(server int, args *RequestAppendEntriesArgs, reply *RequestAppendEntriesReply) bool {
	respCh := make(chan bool)
	ok := false
	go func() {
		respCh <- rf.peers[server].Call("Raft.RequestAppendEntries", args, reply)
	}()

	select {
	case <-time.After(time.Millisecond * 60): // 60ms RPC timeout
		return false
	case ok = <-respCh:
	}

	rf.mu.Lock()
	defer rf.mu.Unlock()
	if !ok || rf.state != StateLeader || args.Term != rf.currentTerm {
		return ok
	}
	if reply.Term > rf.currentTerm {
		rf.becomeFollower("leader expired")
		rf.currentTerm = reply.Term
		rf.persist()
		return ok
	}
	return ok
}
```

Processing on the receiving side:

```go
func (rf *Raft) RequestAppendEntries(args *RequestAppendEntriesArgs, reply *RequestAppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	defer rf.persist()

	if args.Term < rf.currentTerm {
		reply.Term = rf.currentTerm
		return
	}

	// signal the run loop that a heartbeat arrived, resetting the timeout
	rf.AppendEntries <- struct{}{}
	if args.Term > rf.currentTerm {
		rf.currentTerm = args.Term
		if rf.state != StateFollower {
			rf.becomeFollower("request append receive large term")
			rf.votedFor = -1
		}
	}
	reply.Success = true
}
```

Log Replication

Log replication is probably the most complex part of the lab; let's briefly recap the paper first.

Once a leader has been elected, it can serve client requests, each carrying a command to be executed by the replicated state machines. The leader appends the command to its log as a new entry, then issues AppendEntries RPCs to the other servers. Once the leader determines the entry is safely replicated (a majority of replicas have written it to their logs), it applies the entry to its state machine and returns the result to the client. If a follower crashes, runs slowly, or the network drops packets, the leader keeps sending that follower AppendEntries RPCs until their logs agree.

Only once an entry is committed may the leader apply it to its state machine. Raft guarantees that a committed log entry is durable and will eventually be executed by every node.

A mechanism is therefore needed for the leader and followers to agree on the log. The leader maintains a nextIndex for each follower: the index of the next log entry the leader will send to that follower, initialized to just past the leader's last entry. The leader's AppendEntries RPC carries (term_id, nextIndex-1), where term_id is the term of the entry in slot nextIndex-1. On receiving it, the follower checks whether such an entry exists in its own log; if not, it replies with a rejection, the leader decrements nextIndex, and the exchange repeats until the AppendEntries RPC is accepted.

For example: nextIndex starts at 11, and the leader sends follower b AppendEntries(6, 10); b finds no entry with term_id 6 in slot 10 of its log and replies with a rejection. The leader decrements nextIndex to 10 and sends AppendEntries(6, 9); b again finds no entry with term_id 6 in slot 9. This loops until the leader sends AppendEntries(4, 4), b finds an entry with term_id 4 in slot 4, and accepts. From then on, the leader can push log entries to b starting at slot 5.

Compared with leader election, Figure 2 adds a few variables; first, what they mean:

```go
type Raft struct {
	logs        []LogEntry
	commitIndex int
	lastApplied int

	nextIndex  []int
	matchIndex []int
	applyCh    chan ApplyMsg
}
```
  • commitIndex is the highest log position this node has committed.
  • lastApplied is the last position applied to the state machine.
  • nextIndex is an array that is only meaningful on the leader: for each peer, the position at which the next log sync should start, initialized to just past the end of the leader's log.
  • matchIndex records, per peer, the highest log position known to be replicated; it is initialized to 0 and is likewise only meaningful on the leader.
  • applyCh is the channel through which entries are applied after commit.

The log can be defined as []LogEntry, where Command is whatever the lab requires; with that, the log definition is complete.

```go
type LogEntry struct {
	Term    int
	Command interface{}
}
```
RequestVoteArgs gains LastLogIndex and LastLogTerm, which let a voter check whether the candidate's log is at least as up-to-date as its own.

```go
type RequestVoteArgs struct {
	LastLogIndex int
	LastLogTerm  int
}
```
RequestAppendEntriesArgs also changes:

```go
type RequestAppendEntriesArgs struct {
	// Your data here (2A, 2B).
	PrevLogIndex int
	PrevLogTerm  int

	Entries []LogEntry
}
```
PrevLogIndex is the peer's entry in the leader's nextIndex array minus one, and PrevLogTerm is the term of the entry at that index.
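
The reply type for AppendEntries never appears in the post; judging by the fields the code reads and sets (reply.Term, reply.Success, reply.RetryIndex), it presumably looks like this (an inference, not the author's code):

```go
// Assumed reply type -- inferred from sendRequestAppendEntries
// and the RequestAppendEntries handler below.
type RequestAppendEntriesReply struct {
	Term       int
	Success    bool
	RetryIndex int // where the leader should retry from on a log mismatch
}
```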

Accepting writes: Start
The only thing to note here is that the lab differs from the Raft paper: Start returns immediately after the local append, without waiting for the other peers to append.

```go
func (rf *Raft) Start(command interface{}) (int, int, bool) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	defer rf.persist()
	if rf.state != StateLeader {
		return 0, 0, false
	}

	index := rf.getLastIndex() + 1
	term := rf.currentTerm
	isLeader := true

	// Your code here (2B).
	// append to current logs
	rf.logs = append(rf.logs, LogEntry{term, command})
	rf.debug("receive start command, logs is :%v", rf.logs)
	return index, term, isLeader
}
```
Sending logs: AppendEntries
Compared with the leader-election version, AppendEntries now also synchronizes the log. The main work is building PrevLogIndex and Entries; if Entries is empty, the call is just a heartbeat.

```go
func (rf *Raft) sendAppendEntries() {
	var wg sync.WaitGroup

	rf.mu.RLock()
	for p := range rf.peers {
		if p != rf.me {
			args := &RequestAppendEntriesArgs{}
			args.Term = rf.currentTerm
			args.LeaderID = rf.me
			args.PrevLogIndex = rf.nextIndex[p] - 1
			args.LeaderCommit = rf.commitIndex

			if args.PrevLogIndex >= 0 {
				args.PrevLogTerm = rf.logs[args.PrevLogIndex].Term
			}
			// attach entries only when the follower is behind;
			// an empty Entries slice means a plain heartbeat
			if rf.nextIndex[p] <= rf.getLastIndex() {
				args.Entries = rf.logs[rf.nextIndex[p]:]
			}
			//rf.debug("send Entries is: %v, index is: %d", args.Entries, p)
			wg.Add(1)

			go func(p int, args *RequestAppendEntriesArgs) {
				defer wg.Done()
				ok := rf.sendRequestAppendEntries(p, args, &RequestAppendEntriesReply{})
				if !ok {
					rf.debug("send %d AppendEntries result:%v", p, ok)
				}
			}(p, args)
		}
	}
	rf.mu.RUnlock()
	wg.Wait()
}
```
Now the sending logic in detail: after every successful reply, the leader updates its local rf.nextIndex and rf.matchIndex and tries to commit. RetryIndex is an optimization explained with the next function.

```go
func (rf *Raft) sendRequestAppendEntries(server int, args *RequestAppendEntriesArgs, reply *RequestAppendEntriesReply) bool {
	respCh := make(chan bool)
	ok := false
	go func() {
		respCh <- rf.peers[server].Call("Raft.RequestAppendEntries", args, reply)
	}()

	select {
	case <-time.After(time.Millisecond * 60): // 60ms RPC timeout
		return false
	case ok = <-respCh:
	}

	rf.mu.Lock()
	defer rf.mu.Unlock()
	if !ok || rf.state != StateLeader || args.Term != rf.currentTerm {
		return ok
	}
	if reply.Term > rf.currentTerm {
		rf.becomeFollower("leader expired")
		rf.currentTerm = reply.Term
		rf.persist()
		return ok
	}
	//rf.debug("rf matchIndex is %v", rf.matchIndex)
	if reply.Success {
		rf.matchIndex[server] = args.PrevLogIndex + len(args.Entries)
		//rf.debug("reply success, server is %d, matchIndex is %d", server, rf.matchIndex[server])
		rf.nextIndex[server] = rf.matchIndex[server] + 1

		go rf.commit()
	} else {
		rf.nextIndex[server] = reply.RetryIndex
	}

	return ok
}
```
The commit logic is simple: scan backwards from the end of the leader's log; if a majority of matchIndex values have reached an index whose entry belongs to the current term, that index can be committed.

```go
func (rf *Raft) commit() {
	// commit runs in its own goroutine, so take the lock
	// before touching shared state
	rf.mu.Lock()
	defer rf.mu.Unlock()
	majority := len(rf.peers)/2 + 1

	for i := rf.getLastIndex(); i > rf.commitIndex; i-- {
		count := 1
		if rf.logs[i].Term == rf.currentTerm {
			for j := range rf.peers {
				if j == rf.me {
					continue
				}
				// this peer has acknowledged the leader's log up to i
				if rf.matchIndex[j] >= i {
					count++
				}
			}
		}

		if count >= majority {
			rf.commitIndex = i
			go rf.applyLog()
			break
		}
	}
}

func (rf *Raft) applyLog() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	// apply changes
	for i := rf.lastApplied + 1; i <= rf.commitIndex; i++ {
		msg := ApplyMsg{CommandIndex: i, Command: rf.logs[i].Command, CommandValid: true}
		rf.debug("send msg is: %v, lastApplied is %d, commitIndex is %d", msg, rf.lastApplied, rf.commitIndex)
		rf.applyCh <- msg
	}

	rf.lastApplied = rf.commitIndex
}
```
On the receiving side, RetryIndex implements the optimization: once the request is accepted as valid, conflicting entries are discarded by truncating to rf.logs[:args.PrevLogIndex+1] before appending, and the commit index is then advanced.

```go
func (rf *Raft) RequestAppendEntries(args *RequestAppendEntriesArgs, reply *RequestAppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	defer rf.persist()

	if args.Term < rf.currentTerm {
		reply.Term = rf.currentTerm
		return
	}

	rf.AppendEntries <- struct{}{}
	if args.Term > rf.currentTerm {
		rf.currentTerm = args.Term
		if rf.state != StateFollower {
			rf.becomeFollower("request append receive large term")
			rf.votedFor = -1
		}
	}

	// the leader is ahead of our log end: ask it to back off to our log end
	if args.PrevLogIndex > rf.getLastIndex() {
		reply.RetryIndex = rf.getLastIndex() + 1
		return
	}
	// RetryIndex is an optimization over the paper: instead of replying
	// false and having the leader decrement nextIndex one step at a time,
	// jump straight past the conflicting term, saving round trips
	if args.PrevLogIndex > 0 && rf.logs[args.PrevLogIndex].Term != args.PrevLogTerm {
		for reply.RetryIndex = args.PrevLogIndex - 1; reply.RetryIndex > 0 &&
			rf.logs[reply.RetryIndex].Term == rf.logs[args.PrevLogIndex].Term; reply.RetryIndex-- {
		}
		return
	}
	rf.logs = append(rf.logs[:args.PrevLogIndex+1], args.Entries...)
	//rf.debug("args.LeaderCommit is :%d, PrevLogIndex %d, commitIndex: %d", args.LeaderCommit, args.PrevLogIndex, rf.commitIndex)
	if args.LeaderCommit > rf.commitIndex {
		rf.commitIndex = min(rf.getLastIndex(), args.LeaderCommit)
		go rf.applyLog()
	}
	reply.Success = true
}
```
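
The handler calls a min helper that never appears in the post; Go versions before 1.21 have no builtin integer min, so presumably something like:

```go
// min returns the smaller of two ints (assumed helper; not shown in the original post).
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}
```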

Persistence

According to the paper, three things must be persisted: currentTerm, votedFor, and log[].

That means every time any of these three fields of the Raft struct changes, it must be persisted. Both persist and readPersist are straightforward.

```go
func (rf *Raft) persist() {
	// Your code here (2C).
	// Example:
	w := new(bytes.Buffer)
	e := labgob.NewEncoder(w)
	e.Encode(rf.currentTerm)
	e.Encode(rf.votedFor)
	e.Encode(rf.logs)
	data := w.Bytes()
	rf.persister.SaveRaftState(data)
}

//
// restore previously persisted state.
//
func (rf *Raft) readPersist(data []byte) {
	if data == nil || len(data) < 1 { // bootstrap without any state?
		return
	}
	r := bytes.NewBuffer(data)
	d := labgob.NewDecoder(r)
	d.Decode(&rf.currentTerm)
	d.Decode(&rf.votedFor)
	d.Decode(&rf.logs)
}
```
As for where persist is called: once the previous two parts are in place, adding the calls is trivial, so that code is omitted here.

Summary

Debugging against the tests was honestly confusing at first; you need to read the test code carefully, and tagging debug output with a timestamp and the current node's state makes things much easier to follow. The implementation itself isn't much code; the real gem is the test suite, which goes from simulated partitions, to split votes, to node network failures. It's well worth a careful read.

References

  • https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf
  • https://zhuanlan.zhihu.com/p/27207160
  • https://github.com/kophy/6.824/blob/master/src/raft/raft.go

Storage Technology Fundamentals: Paper Reading List

  1. In-Storage Processing

  • Cognitive SSD: A Deep Learning Engine for In-Storage Data Retrieval
  • DeepStore: In-Storage Acceleration for Intelligent Queries
  • Insider: Designing In-Storage Computing System for Emerging High-Performance Drive
  • GraphSSD: Graph Semantics Aware SSD
  • FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
  • Biscuit: A Framework for Near-Data Processing of Big Data Workloads
  • GraFBoost: Using Accelerated Flash Storage for External Graph Analytics
  • Query Processing on Smart SSDs: Opportunities and Challenges
  • Willow: A User-Programmable SSD

  2. Solid-State Drives (Open-Channel, Commercial SSD)

  • Extending the Lifetime of Flash-Based Storage through Reducing Write Amplification from File Systems
  • SDF: Software-Defined Flash for Web-Scale Internet Storage Systems
  • ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices
  • LightNVM: The Linux Open-Channel SSD Subsystem
  • FlashBlox: Achieving Both Performance Isolation and Uniform Lifetime for Virtualized SSDs
  • OCStore: Accelerating Distributed Object Storage with Open-Channel SSDs
  • Tiny-Tail Flash: Near-Perfect Elimination of Garbage Collection Tail Latencies in NAND SSDs
  • An Efficient Design and Implementation of LSM-Tree Based Key-Value Store on Open-Channel SSD
  • FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives
  • Read as Needed: Building WiSER, a Flash-Optimized Search Engine

  3. High-Performance IO Stack

  • Barrier-Enabled IO Stack for Flash Storage
  • FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
  • Asynchronous I/O Stack: A Low-Latency Kernel I/O Stack for Ultra-Low Latency SSDs
  • SPDK: A Development Kit to Build High Performance Storage Applications
  • When Poll Is Better than Interrupt
  • KVell: The Design and Implementation of a Fast Persistent Key-Value Store
  • Multi-Queue Fair Queueing
  • SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores
  • Reaping the Performance of Fast NVM Storage with uDepot

  4. Distributed Flash

  • ReFlex: Remote Flash ≈ Local Flash
  • LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs
  • CORFU: A Shared Log Design for Flash Clusters
  • vCorfu: A Cloud-Scale Object Store on a Shared Log
  • Tango: Distributed Data Structures over a Shared Log
  • Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log (NSDI '20)
  • Alleviating Garbage Collection Interference Through Spatial Separation in All Flash Arrays
  • File Systems Unfit as Distributed Storage Backends: Lessons from 10 Years of Ceph Evolution
  • LightStore: Software-Defined Network-Attached Key-Value Drives
  • Hailstorm: Disaggregated Compute and Storage for Distributed LSM-Based Databases

  5. File System & Crash Consistency

  • ReconFS: A Reconstructable File System on Flash Storage
  • Scaling a File System to Many Cores Using an Operation Log
  • TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions
  • Physical Disentanglement in a Container-Based File System
  • The Design and Implementation of a Log-Structured File System
  • Application Crash Consistency and Performance with CCFS
  • SplitFS: Reducing Software Overhead in File Systems for Persistent Memory
  • Performance and Protection in the ZoFS User-Space NVM File System
  • Consistency Without Ordering
  • Optimistic Crash Consistency

  6. Processing in Memory

  • ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
  • PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory
  • LerGAN: A Zero-Free, Low Data Movement and PIM-Based GAN Architecture
  • Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory
  • Processing in Memory: The Terasys Massively Parallel PIM Array
  • Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems
  • HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing
  • PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture
  • Equivalent-Accuracy Accelerated Neural-Network Training Using Analogue Memory

  7. RDMA & DSM

  • Efficient Distributed Memory Management with RDMA and Caching
  • FaRM: Fast Remote Memory
  • Latency-Tolerant Software Distributed Shared Memory
  • Distributed Shared Persistent Memory
  • Turning Centralized Coherence and Distributed Critical-Section Execution on Their Head: A New Approach for Scalable Distributed Shared Memory
  • Implementation and Performance of Munin
  • Memory Coherence in Shared Virtual Memory Systems
  • Software-Extended Coherent Shared Memory: Performance and Cost
  • Fine-Grain Access Control for Distributed Shared Memory
  • TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

  8. NVM (File System + Programming Model + Indexing Structure)

  • NOVA: A Log-Structured File System for Hybrid Volatile/Non-Volatile Main Memories
  • Strata: A Cross Media File System
  • A High Performance File System for Non-Volatile Main Memory
  • System Software for Persistent Memory
  • NV-Heaps: Making Persistent Objects Fast and Safe with Next-Generation, Non-Volatile Memories
  • Mnemosyne: Lightweight Persistent Memory
  • RECIPE: Converting Concurrent DRAM Indexes to Persistent-Memory Indexes
  • Aerie: Flexible File-System Interfaces to Storage-Class Memory
  • Software Wear Management for Persistent Memories
  • Endurable Transient Inconsistency in Byte-Addressable Persistent B+-Tree

  9. Programmable Network

  • NetCache: Balancing Key-Value Stores with Fast In-Network Caching
  • NetChain: Scale-Free Sub-RTT Coordination
  • IncBricks: Toward In-Network Computation with an In-Network Cache
  • Just Say NO to Paxos Overhead: Replacing Consensus with Network Ordering
  • KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC
  • Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control
  • Accelerating Distributed Reinforcement Learning with In-Switch Computing
  • Be Fast, Cheap and in Control with SwitchKV
  • Enabling Programmable Transport Protocols in High-Speed NICs

  10. Data Reliability (SSD-focused; one or two classic HDD papers are also worth reading)

  • Evaluating File System Reliability on Solid State Drives
  • Improving Storage System Reliability with Proactive Error Prediction
  • Flash Reliability in Production: The Expected and the Unexpected
  • Understanding Disk Failure Rates: What Does an MTTF of 1,000,000 Hours Mean to You?
  • A Large-Scale Study of Flash Memory Failures in the Field
  • SSD Failures in Datacenters: What? When? and Why?
  • Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives
  • Lessons and Actions: What We Learned from 10K SSD-Related Storage System Failures
  • Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems
  • A Study of SSD Reliability in Large Scale Enterprise Storage Deployments
