pglock's Introduction

PostgreSQL Lock Client for Go

The PostgreSQL Lock Client for Go is a general-purpose distributed locking library built for PostgreSQL. It supports both fine-grained and coarse-grained locking, because lock keys can be any arbitrary string up to a certain length. Please open issues in the GitHub repository with questions; pull requests are very much welcome.

Recommended PostgreSQL version: 11 or newer

Use cases

A common use case for this lock client is: let's say you have a distributed system that needs to periodically do work on a given campaign (or a given customer, or any other object) and you want to make sure that two boxes don't work on the same campaign/customer at the same time. An easy way to fix this is to write a system that takes a lock on a customer, but fine-grained locking is a tough problem. This library attempts to simplify this locking problem on top of PostgreSQL.

Another use case is leader election. If you only want one host to be the leader, then this lock client is a great way to pick one. When the leader fails, it will fail over to another host within a customizable lease duration that you set.

Getting Started

To use the PostgreSQL Lock Client for Go, you must make sure it is present in $GOPATH or in your vendor directory.

$ go get -u cirello.io/pglock

This package ships a go.mod file for use with Go's module system. If you need to work on this package, point your module at a local checkout with go mod edit -replace=cirello.io/pglock=/path/to/yourlocalcopy.

For your convenience, there is a function in the package called CreateTable that you can use to set up your table, or you may use the schema.sql file. The package level documentation comment has an example of how to use this package. Here is some example code to get you started:

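package main

import (
	"database/sql"
	"flag"
	"log"
	"time"

	"cirello.io/pglock"
	_ "github.com/lib/pq" // registers the "postgres" driver used by sql.Open
)

// dsn holds the PostgreSQL connection string, passed on the command line.
var dsn = flag.String("dsn", "", "PostgreSQL connection string")

func main() {
	flag.Parse()
	db, err := sql.Open("postgres", *dsn)
	if err != nil {
		log.Fatal("cannot connect to test database server:", err)
	}
	c, err := pglock.New(db,
		pglock.WithLeaseDuration(3*time.Second),
		pglock.WithHeartbeatFrequency(1*time.Second),
	)
	if err != nil {
		log.Fatal("cannot create lock client:", err)
	}
	// Set up the locks table before the first acquisition.
	if err := c.CreateTable(); err != nil {
		log.Fatal("cannot create table:", err)
	}
	l, err := c.Acquire("lock-name")
	if err != nil {
		log.Fatal("unexpected error while acquiring 1st lock:", err)
	}
	// Closing the lock releases it.
	defer l.Close()
	// execute the logic
}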

Selected Features

Send Automatic Heartbeats

When you create the lock client, you can specify WithHeartbeatFrequency(time.Duration), as in the example above. The client then spawns a background goroutine that continually updates the record version number on your locks to prevent them from expiring (it does this by calling the SendHeartbeat() method in the lock client). As long as your application is running, your locks will not expire until you call Release() or lockItem.Close().

Read the data in a lock without acquiring it

You can read the data in the lock without acquiring it. Here's how:

lock, err := lockClient.Get("kirk")

Logic to avoid problems with clock skew

The lock client never stores absolute times in PostgreSQL. Locks expire as follows: a call to tryAcquire reads the current lock, notes its record version number, and starts a timer. If the lock still has the same record version number after the lease duration has passed, the client determines that the lock is stale and expires it.

What this means is that, even if two different machines disagree about what time it is, they will still avoid clobbering each other's locks.
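To make the idea concrete, here is a minimal in-memory sketch of staleness detection by record version number (RVN). It is an illustration under simplifying assumptions, not the library's code: store, observer, and isStale are hypothetical names, and elapsed time is modeled as local ticks.

```go
package main

import "fmt"

// store is an in-memory stand-in for the locks table: lock name -> RVN.
type store map[string]int64

// observer remembers the RVN it last saw and for how many of its own local
// ticks that value has stayed unchanged.
type observer struct {
	lastRVN   int64
	unchanged int
}

// isStale records one local tick and reports whether the RVN has stayed the
// same for at least leaseTicks. Only locally measured elapsed time is used,
// so machines never compare absolute timestamps.
func (o *observer) isStale(s store, name string, leaseTicks int) bool {
	rvn := s[name]
	if rvn != o.lastRVN {
		o.lastRVN = rvn
		o.unchanged = 0
		return false
	}
	o.unchanged++
	return o.unchanged >= leaseTicks
}

func main() {
	s := store{"campaign-42": 1}
	o := &observer{lastRVN: 1}
	const lease = 3
	for tick := 1; tick <= 6; tick++ {
		if tick <= 2 {
			s["campaign-42"]++ // the owner's heartbeat bumps the RVN, then it crashes
		}
		fmt.Printf("tick %d stale=%v\n", tick, o.isStale(s, "campaign-42", lease))
	}
}
```

Because each observer only measures how long a value has stayed unchanged on its own clock, a disagreement about wall-clock time between machines does not affect the outcome.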

Go Version Compatibility Promise

This package follows the same guidance as Go itself:

Each major Go release is supported until there are two newer major releases. For example, Go 1.5 was supported until the Go 1.7 release, and Go 1.6 was supported until the Go 1.8 release. We fix critical problems, including critical security problems, in supported releases as needed by issuing minor revisions (for example, Go 1.6.1, Go 1.6.2, and so on).

pglock's Issues

Don't log missed heartbeat as error when context is canceled

We are encountering the following error message in our logs:

heartbeat missed: cannot send heartbeat (kuma-cp-lock): context canceled

This error appears intermittently and, based on our understanding, should not be classified as an error. We believe it's safe to suppress the logging of this message.

We are considering contributing a change to the codebase that would prevent these messages from being logged as errors. We would like to hear your thoughts on this approach and any potential considerations before proceeding.

`TestFailIfLocked` doesn't finish when used with `pgxpool`

I'm not sure if this is about pgxpool or how the context timeouts are handled:

https://github.com/cirello-io/pglock/blob/v1.14.1/client.go#L269

	if actualRVN != rvn {
		l.recordVersionNumber = actualRVN
		return ErrNotAcquired
	}

On the return case here, the transaction is not immediately rolled back. It waits for the ctx (context.WithTimeout(ctx, l.leaseDuration)) to expire (or the function to exit and then defer cancel() runs) but somehow using pgxpool this doesn't happen.

Adding a simple defer tx.Rollback() after BeginTx (maybe here) seems to remedy the issue.

I'm not sure if you're big on supporting connections opened with pglock.UnsafeNew, but adding explicit tx.Rollback() after every BeginTx (3 store* methods -- rather than trusting the context cancellation) might be a good idea.

Lock not marked as released when heartbeat cannot be sent

Hello,

I have a situation where a lock isn't marked as released when the heartbeat cannot be sent.

The situation arises when two callers try to acquire the same lock and the database connection is lost (PostgreSQL is restarted in this case).
As the output below shows, the heartbeat cannot be sent for "lock1", which is expected. "lock2" then gets the lock. But "lock1" is not marked as released, and nothing signals that there is a problem.

I have this code as an example:

lock1, err := c.Acquire("test")
if err != nil {
	logger.Fatal(err)
}

go func() {
	for {
		logger.Infof("lock1 isReleased %v", lock1.IsReleased())
		time.Sleep(time.Second)
	}
}()

lock2, err := c.Acquire("test")
if err != nil {
	logger.Fatal(err)
}

go func() {
	for {
		logger.Infof("lock2 isReleased %v", lock2.IsReleased())
		time.Sleep(time.Second)
	}
}()

This code will output this:

DEBU[0000] storeAcquire in test 1868 [] 0               
DEBU[0000] storeAcquire out test 1868 [] 1868           
DEBU[0000] heartbeat started test                       
INFO[0000] lock1 isReleased false                       
DEBU[0000] storeAcquire in test 1869 [] 0               
DEBU[0000] storeAcquire out test 1869 [] 1868           
DEBU[0000] not acquired, wait: 3s                       
INFO[0001] lock1 isReleased false                       
INFO[0002] lock1 isReleased false                       
INFO[0003] lock1 isReleased false                       
DEBU[0003] storeAcquire in test 1873 [] 1868            
DEBU[0003] storeAcquire out test 1873 [] 1872           
DEBU[0003] not acquired, wait: 3s                       
DEBU[0003] heartbeat missed cannot send heartbeat (test): failed to connect to `host=localhost user=postgres database=postgres`: server error (FATAL: the database system is shutting down (SQLSTATE 57P03)) 
DEBU[0003] heartbeat stopped test                       
INFO[0004] lock1 isReleased false                       
INFO[0005] lock1 isReleased false                       
INFO[0006] lock1 isReleased false                       
DEBU[0006] storeAcquire in test 1874 [] 1872            
DEBU[0006] storeAcquire out test 1874 [] 1874    
DEBU[0006] heartbeat started test                       
INFO[0006] lock2 isReleased false      
INFO[0007] lock1 isReleased false                       
INFO[0007] lock2 isReleased false                       
INFO[0008] lock1 isReleased false                       
INFO[0008] lock2 isReleased false                       
INFO[0009] lock1 isReleased false                       
INFO[0009] lock2 isReleased false                       
INFO[0010] lock1 isReleased false                       
INFO[0010] lock2 isReleased false                       
INFO[0011] lock1 isReleased false                       
INFO[0011] lock2 isReleased false                       
INFO[0012] lock1 isReleased false 

Can you help me by telling me what is wrong with this approach, please? Or how can I find out that the lock has "probably been released because the heartbeat cannot be sent"?

Thanks in advance !

relation "locks" does not exist

Hello!

I stumbled upon this library and would like to use it, but I can't find a description of how to set up the PostgreSQL side, or any documentation of the schema it expects. :)

So now, when I'm trying to use it, I'm getting this error:

pq: relation "locks" does not exist

I already figured out that it needs a locks_rvn sequence, but I have no idea what kind of locks relation it requires.

Cheers.

Introduce configurable logging levels

Hi all!

Some of the logs in this library are quite noisy in our system (e.g. storeAcquire in/out was logged 1.8M times last week). Is this normal or out of the ordinary? If normal, would you be open for a contribution to introduce logging levels to the logger package?

Best regards

Question: dealing with old locks

Hello,

I have an issue dealing with obsolete locks. My situation is simple, and is the following:

Context

I have a long process that uses pglock to restrict access to some API endpoint until it's done. The service has 3 instances, which is why I needed a distributed lock. That lock is only used to reject requests to schedule some work.

Problem

Now, the service that was running the long process fails and crashes. The lock is still in the database.
If I attempt to acquire the lock, it fails because the rvn doesn't match.
If I attempt to release it, it works but I see two issues with that.

  1. I expected the system to automatically detect and internally deal with obsolete locks, so that trying to acquire an obsolete lock doesn't result in an error
  2. I can't even clean it manually by releasing the lock because the error covers two cases: locked and obsolete

I'm starting to think about using the data field to store things like the heartbeat frequency + current date, and having a goroutine in my long-running work update the data periodically, but it feels like that should be happening internally.

So the question is the following: how should I deal with obsolete keys? I can't just try to acquire with fail-on-lock, because that would not help distinguish between a legitimate lock and an obsolete one. And I can't just blindly release that lock after the error, because it could be legitimate.

Last thoughts

If these behaviors do not exist yet and would be welcome as enhancements to the library, I might give it a try in a PR. Otherwise, if there is no existing solution, I suppose I will have to write a wrapper.

Thanks for your attention,

Edit: pressed send early by mistake: finishing the message
Edit 2: done

Support for `pgxpool`?

Hi,

Great project. I was wondering if there will be support for pgxpool.Pool or other Postgres "connectors" in the future. Having *sql.DB as the only acceptable form of database really blocks things in that regard, where the codebase doesn't really need sql.DB specifically. It also goes against Go's "Accept interfaces, return structs" proverb.

CreateTable should be idempotent

At the moment there is no way to find out whether the locks table exists, so it would be nice if this method first checked for the table and created it only if it is missing. It should not return an error if the table already exists.

If this is not acceptable for any reason then it would be nice to introduce a new method which checks for existence of the locks table. This way we could avoid errors in the logs even if we ignore it at CreateTable. Namely, if you turn on pgx logging, there will always be an error reported by pgx.

I could contribute this, but please let me know whether you want me to modify CreateTable to make it idempotent or to create a new method, let's say TableExists or something like that.

Cancelled heartbeat causes release failures

Here in Do() func we are doing this:

defer l.Close()
defer l.heartbeatCancel()

Because deferred calls run in LIFO order, we cancel the heartbeat before releasing the lock. If the lock is heartbeating when ctx is canceled, the lock might be marked as released here. As a result, when we call l.Close() shortly afterwards, we hit this branch and skip actually releasing the lock.

My proposal is to reverse the order of these two defer statements, like this, so that we release the lock before canceling the heartbeats.

defer l.heartbeatCancel()
defer l.Close()

Thoughts?
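The ordering argument can be checked with a small standalone program. runDeferred is a hypothetical helper that only records labels; it does not touch pglock.

```go
package main

import "fmt"

// runDeferred defers two labeled steps in source order (first, then second)
// and returns the order in which they actually executed.
func runDeferred(first, second string) (order []string) {
	defer func() { order = append(order, first) }()
	defer func() { order = append(order, second) }()
	return nil
}

func main() {
	// Current ordering: Close is deferred first, so it runs last.
	fmt.Println(runDeferred("l.Close", "l.heartbeatCancel")) // [l.heartbeatCancel l.Close]
	// Proposed ordering: heartbeatCancel is deferred first, so Close runs first.
	fmt.Println(runDeferred("l.heartbeatCancel", "l.Close")) // [l.Close l.heartbeatCancel]
}
```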

Heartbeat not retried in case of errors

As of today the storeHeartbeat method is wrapped in a retry mechanism. However, any error that occurs during storeHeartbeat immediately flags the lock as released, skipping all subsequent retries.

I think that in general the lock shouldn't be released in case of errors so that the heartbeat can be retried. This is useful for all transient errors like network errors or serialization errors from postgres.

I think it's safe to keep releasing the lock when the affected rows count is zero (here).

Heartbeat not working when using AcquireContext

Hello,

I think the heartbeat stops working when I call AcquireContext with a short context timeout (~10s) while the lock has to be held for at least 1m. This is because the heartbeat goroutine uses the context from AcquireContext, so the heartbeat ends as soon as that context expires. I think it would be better to use context.Background() for the heartbeat goroutine. @ucirello what do you think?

SQLSTATE 40001 error on multiple boxes trying to acquire the same named lock

We are trying to use this library to implement distributed locking across multiple instances of an application. When 2 or more of these instances try to acquire a lock with the same name, we sometimes see the following error:

ERROR: could not serialize access due to concurrent update (SQLSTATE 40001)

We are using postgres 11.11 and pgx.

When acquiring the lock, we use:

		lock, err := client.Acquire("lockName")

The client is being initialized as follows:

		lockdb, err := sql.Open("pgx", connectionString)
                ...
		client, err = pglock.UnsafeNew(lockdb, pglock.WithOwner(instanceName))

We have other locks with different names that could be acquired on other threads on the same machines.

Thoughts on Supporting GetAllLocksMetadata

The data column, together with the ability to fetch it without acquiring the lock, is quite useful as a persistence layer for serving health check queries. However, there is no "Get All" feature, so if a health check agent wants the health status of all lock holders, it has to query the DB for each lock one by one, which is less efficient.

Things like this will be useful:

// LockMetadata wraps metadata of a lock.
type LockMetadata struct {
  Name     string
  Owner    string
  Data     []byte
}

// GetAllLocksMetadata fetches all LockMetadata from the table without trying to hold the lock.
func (c *Client) GetAllLocksMetadata(ctx context.Context) ([]LockMetadata, error) {
   // ...
   // SELECT name, owner, data FROM locks;
}

Thoughts?

ERROR: could not serialize access due to XXX (SQLSTATE 40001)

Hello @ucirello ,

I recently ran into these kinds of errors:

  • ERROR: could not serialize access due to concurrent update (SQLSTATE 40001)
  • ERROR: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001)

It fails at the begin-transaction step, but I don't know why.
I did manage to create a repro case, though.

Here is the code for the repro-case:

package main

import (
	"fmt"
	"strings"

	"cirello.io/pglock"
	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

func main() {
	// Connect to database
	gormDb, err := gorm.Open(postgres.Open("host=localhost port=5432 user=postgres dbname=postgres password=postgres sslmode=disable"))
	// Check if error exists
	if err != nil {
		panic(err)
	}

	// Initialize pg lock
	// Get sql db
	sqlDb, err := gormDb.DB()
	// Check if error exists
	if err != nil {
		panic(err)
	}
	// Create pglock client
	c, err := pglock.UnsafeNew(sqlDb)
	// Check if error exists
	if err != nil {
		panic(err)
	}

	// Now create a lot of go routine with a lock acquire inside
	for i := 0; i < 99; i++ {
		go func() {
			for {
				lock, err := c.Acquire("lock-name")
				if err != nil {
					if strings.Contains(err.Error(), "could not serialize access due to") {
						// Debug here
						fmt.Println("here")
					}

					panic(err)
				}

				fmt.Println("got the lock !")

				err = lock.Close()
				if err != nil {
					panic(err)
				}
			}
		}()
	}
	<-make(chan bool, 1)
	return
}

With this go mod main modules:

require (
	cirello.io/pglock v1.8.1-0.20211117154543-39de3558537f
	gorm.io/driver/postgres v1.3.1
	gorm.io/gorm v1.23.3
)

PG server is running on version 13.6.

Sometimes the error takes a long time to appear, sometimes it shows up quickly.

I think this is related to #26.

Regards,

Oxyno-zeta

EDIT: Added the PG server version

Errors while acquiring/releasing locks

2023-05-14 03:00:11 UTC:10.0.13.153(59140):[25609]:ERROR:  could not serialize access due to read/write dependencies among transactions
2023-05-14 03:00:11 UTC:10.0.13.153(59140):[25609]:DETAIL:  Reason code: Canceled on identification as a pivot, during conflict out checking.
2023-05-14 03:00:11 UTC:10.0.13.153(59140):[25609]:HINT:  The transaction might succeed if retried.
2023-05-14 03:00:11 UTC:10.0.13.153(59140):[25609]:STATEMENT:  
			DELETE FROM
				locks
			WHERE
				"name" = $1
				AND "record_version_number" IS NULL
2023-05-14 03:00:24 UTC:10.0.20.56(36742):[25802]:ERROR:  could not serialize access due to read/write dependencies among transactions
2023-05-14 03:00:24 UTC:10.0.20.56(36742):[25802]:DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-05-14 03:00:24 UTC:10.0.20.56(36742):[25802]:HINT:  The transaction might succeed if retried.
2023-05-14 03:00:24 UTC:10.0.20.56(36742):[25802]:STATEMENT:  
			UPDATE
				locks
			SET
				"record_version_number" = NULL
			WHERE
				"name" = $1
				AND "record_version_number" = $2
		
2023-05-14 03:00:24 UTC:10.0.13.153(33676):[23613]:ERROR:  could not serialize access due to read/write dependencies among transactions
2023-05-14 03:00:24 UTC:10.0.13.153(33676):[23613]:DETAIL:  Reason code: Canceled on conflict out to pivot 1014538, during read.
2023-05-14 03:00:24 UTC:10.0.13.153(33676):[23613]:HINT:  The transaction might succeed if retried.
2023-05-14 03:00:24 UTC:10.0.13.153(33676):[23613]:STATEMENT:  SELECT "record_version_number", "data", "owner" FROM locks WHERE name = $1 FOR UPDATE
2023-05-14 03:01:21 UTC:10.0.20.56(53760):[23615]:ERROR:  could not serialize access due to read/write dependencies among transactions
2023-05-14 03:01:21 UTC:10.0.20.56(53760):[23615]:DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-05-14 03:01:21 UTC:10.0.20.56(53760):[23615]:HINT:  The transaction might succeed if retried.
2023-05-14 03:01:21 UTC:10.0.20.56(53760):[23615]:STATEMENT:  
			UPDATE
				locks
			SET
				"record_version_number" = NULL
			WHERE
				"name" = $1
				AND "record_version_number" = $2

Looking for some help with this issue. We are using pglock version 1.11 with Postgres 14.7.

Add Table schema in README

Thanks for implementing this module!
It would be nice to have the table schema required by this module in the README. Not all of us can use the CreateTable API, and we shouldn't have to look through the source code to get the table schema.

Create Sequence Fails

I'm setting up the migration for pglock, but my DB rejects the migration with ERROR: sequence must have same owner as table it is linked to (SQLSTATE 55000)

My migration

-- +goose Up
CREATE TABLE dif_kin_locks (
	name CHARACTER VARYING(255) PRIMARY KEY,
	record_version_number BIGINT,
	data BYTEA,
	owner CHARACTER VARYING(255)
);

CREATE SEQUENCE dif_kin_locks_rvn CYCLE OWNED BY dif_kin_locks.record_version_number;

-- +goose Down
DROP TABLE dif_kin_locks;
DROP SEQUENCE IF EXISTS dif_kin_locks_rvn;

The failure happens with Postgres 15.4.
On my local machine I have 15.4 as well and can't reproduce 🤷🏾‍♂️