Base64-encoded binary for responses is very inefficient and wastes both CPU and memory resources. There should be a better way to transfer data between GatewayD and plugins without converting it back and forth several times.
This is a multi-repo project that involves simultaneous changes to GatewayD, the SDK, and the plugins.
A SELECT query on a table with 158k rows returns 64.59 MB of binary data to the client. The exact same binary data is base64-encoded into a 304.81 MB cached value, a 4.7x increase in memory usage plus a lot of time lost in processing.
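For context, this is roughly where the base64 conversion comes from. Assuming the plugin messages are built on google.protobuf.Struct (as the notes under no. 4 below suggest), the Go structpb helpers have no bytes kind, so a []byte payload is silently turned into a base64 string. A minimal sketch with an illustrative payload:

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Stand-in for the raw binary bytes of a database response.
	payload := []byte{0x00, 0x01, 0xFE, 0xFF}

	// google.protobuf.Struct has no bytes kind, so structpb stores
	// []byte as a base64-encoded StringValue.
	v, err := structpb.NewValue(payload)
	if err != nil {
		panic(err)
	}

	fmt.Println(v.GetStringValue()) // "AAH+/w==" -- base64, not raw bytes
}
```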
Progress Report
No. 1
I've made some progress toward a message that can contain JSON as well as binary data as a byte array. The JSON encoding and decoding is heavily tangled into the code and affects almost all the repositories, including, but not limited to, the following:
The above changes are still inconsistent and unstable, and they need a lot of work before everything actually works.
No. 2
I made some more progress. Now the gatewayd process and the plugins don't crash, but messaging between the core and the plugins doesn't work as expected. I am fixing the issues as they appear. The code is ugly for now, but it works. The moment I reach feature parity with the old message, with the responses and other binary objects transferred in Message.Blob[x].Value, I'll refactor and fix the ugly parts and release a new version.
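For reference, the shape I'm converging on looks roughly like the sketch below. This is a hand-written illustration, not the actual generated code; only Blob and Value match the names mentioned above, the rest is illustrative.

```go
// Illustrative sketch of the mixed JSON + binary message; not generated code.
type Blob struct {
	Value []byte // raw bytes, sent as-is instead of a base64-encoded string
}

type Message struct {
	Fields map[string]string // JSON-friendly metadata, encoded as before
	Blob   map[string]*Blob  // binary payloads keyed by name, e.g. "response"
}
```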
No. 3
The request/response messages are correctly sent to the plugin and back to the core. The missing part is Redis. Redis cannot handle byte arrays, so the data has to be converted to a string representation, which again means base64-encoding the data, which is not ideal. I also tried the gocache.Marshaler with msgpack, but it adds another conversion layer. I have to either convert the bytes to a string, which increases time complexity, or find another way to deal with this issue. Using gocache.Marshaler with msgpack seems to be the way to go. So far, the separation of JSON from BLOB has proved very useful, both usability- and performance-wise.
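As a rough sketch of the msgpack route (using vmihailenco/msgpack directly here; gocache's Marshaler wraps the same library, if I remember correctly), msgpack keeps []byte as its native bin type, so there is no base64 round trip for the cached value:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

// CachedResponse is an illustrative cache entry holding raw response bytes.
type CachedResponse struct {
	Response []byte `msgpack:"response"`
}

func main() {
	in := CachedResponse{Response: []byte{0x00, 0x01, 0xFE, 0xFF}}

	// msgpack encodes []byte as its native bin type, so no base64 inflation.
	encoded, err := msgpack.Marshal(in)
	if err != nil {
		panic(err)
	}

	var out CachedResponse
	if err := msgpack.Unmarshal(encoded, &out); err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(in.Response, out.Response)) // true
}
```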
Another idea I tried, to quickly work around this performance bottleneck, is to compress/decompress the data before saving it to and restoring it from the cache. I used Snappy, which is quite fast compared to the alternatives. The result is really impressive compared to the current storage requirements of the base64-encoded string: compressing the base64-encoded response from the 158k-row table resulted in 114.06 MB of data being stored in cache, which is a 1.76x increase in size. However, the data transferred between the core and the plugins stays at 4.7x (or 304.81 MB).
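The Snappy round trip itself is only a few lines. A sketch using github.com/golang/snappy with illustrative data, not the actual cached response:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// Stand-in for a large base64-encoded cache value.
	original := bytes.Repeat([]byte("SELECT result row..."), 10000)

	// Compress before writing to Redis, decompress after reading back.
	compressed := snappy.Encode(nil, original)
	restored, err := snappy.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}

	fmt.Printf("original: %d bytes, compressed: %d bytes, equal: %v\n",
		len(original), len(compressed), bytes.Equal(original, restored))
}
```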
No. 4
The obvious: protocolbuffers/protobuf#3078.
This led me to copy struct.proto and extend it with a BytesValue. The resulting output is an extended Struct that contains an extra field type: BytesValue. This removes a lot of the copying and the custom messages with multiple fields for handling bytes, like I did in the use-bytes-array-for-efficiency branches above.
I tried the above, but there are still issues:
This works as expected, except it hangs on large blobs. That is, the data transferred from the DB to the client and its cached value are almost equal in size (almost, as in 99%, probably due to CRC, headers, or memory overhead added by Redis). On the other hand, the code is simpler and one doesn't need to deal with the multiple variations introduced by solution no. 1. If I can fix the large-blob transfer issue, this is the way forward, and it has maximum compatibility with google.protobuf.Struct.
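For illustration, the extended Value from the copied struct.proto conceptually looks like the hand-written Go mirror below. The real type is generated by protoc from the modified .proto file; this is only to show the idea.

```go
// Hand-written mirror of the extended Value; illustrative only, the real type
// is generated from the copied struct.proto. Exactly one field is set at a
// time, like the oneof kind in the original google.protobuf.Value.
type Value struct {
	NullValue   bool
	NumberValue *float64
	StringValue *string
	BoolValue   *bool
	StructValue map[string]*Value
	ListValue   []*Value
	BytesValue  []byte // the extra kind added on top of struct.proto
}
```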
No. 5
The Envoy Proxy approach: https://blog.envoyproxy.io/dynamic-extensibility-and-protocol-buffers-dcd0bf0b8801.
They used a TypedStruct, which contains an arbitrary JSON-serialized protocol buffer message together with a URL that describes the type of the serialized message. This is very similar to google.protobuf.Any, but instead of carrying the protocol buffer binary, it employs a google.protobuf.Struct as the value.
This is not what I want, as it is effectively the same as google.protobuf.Struct and wouldn't support byte arrays.
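For comparison, google.protobuf.Any carries the serialized protobuf bytes plus a type URL; TypedStruct keeps a similar type URL but replaces those bytes with a google.protobuf.Struct. A minimal sketch of the Any side (illustrative, not GatewayD code):

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/anypb"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

func main() {
	// Pack an arbitrary message into an Any: binary payload + type URL.
	a, err := anypb.New(wrapperspb.String("hello"))
	if err != nil {
		panic(err)
	}

	// TypedStruct keeps a comparable type URL but swaps the binary payload
	// for a google.protobuf.Struct, so bytes would still be base64-encoded.
	fmt.Println(a.TypeUrl) // type.googleapis.com/google.protobuf.StringValue
}
```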
No. 6
So far I had been running GatewayD on Debian 11 (on WSL2), and it caused a lot of challenges on both the Microsoft kernel and custom kernels I compiled. I ran it again with the main (stable) branch, without these changes, on Debian 12 (testing) and it behaved differently: 1) faster load times, especially with cached data, and 2) no intermittent unresponsiveness.
Verdict
This should be picked up again at a later stage.
Resources