
mpmilano / mixt


Prototype Mixed-consistency transaction implementation

Home Page: https://mpmilano.github.io/MixT/

License: Apache License 2.0

C++ 89.57% Makefile 2.03% Shell 2.33% PLpgSQL 0.26% Python 0.57% PHP 5.23%
c-plus-plus transactions distributed-systems programming-language domain-specific-language

mixt's Introduction

MixT Prototype: Code

DSL for mixed-consistency transactions: website, paper

Hello! This is the prototype for MixT. This is not intended for distribution or serious end-user experience; there are some very platform-specific assumptions in this code. If you are brave or curious, welcome!

Building MixT

  • The transactions directory is the code's proper home. Build from there.
  • The pg_env.sh script will configure your environment for MixT; source pg_env.sh before attempting to build!
  • clang++-5.0 is the Officially Supported Compiler™; g++ ≥ 7.2 should also work.
  • 64-bit Gentoo Linux is the only tested build and runtime environment for MixT.
  • dev-libs/libpqxx is a required dependency for building MixT with postgres (as is the default).
  • The build does not produce a library; please link the object files explicitly.
  • There is no configure step; just source pg_env.sh; make <target>

Using MixT

MixT's transaction compiler is implemented entirely in compile-time C++ through the use of constexpr and, yes, some templates. To write a MixT transaction, just #include mtl/mtl.hpp. There are some example transactions in the root directory; look for the TRANSACTION(...) invocations. MixT includes bindings for a postgres backend and a simple in-memory backend; use the in-memory backend if you're just trying out some transaction code. The in-memory backend is also a good thing to copy when writing your own bindings.
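To give a flavor of the technique, here is a generic, self-contained illustration (standard C++, not MixT's internals): a constexpr function can analyze DSL source while the C++ compiler runs, which is how a "transaction compiler" can execute entirely at compile time.

    // Generic illustration of compile-time compilation, not MixT code:
    // the analysis below runs inside the C++ compiler via constexpr.
    constexpr int count_statements(const char* src) {
        int n = 0;
        for (; *src != '\0'; ++src)
            if (*src == ';') ++n;   // one ';' per statement
        return n;
    }

    // Evaluated during compilation; a failure here is a compile error.
    static_assert(count_statements("x = y; y = x;") == 2,
                  "the 'compiler pass' ran at compile time");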

Interpreting errors

C++ is famously bad at giving reasonable error messages, especially when those errors involve templates. First, I must strongly recommend clang here; g++ is not quite there yet with error messages. If you are building under clang and MixT gives you an error when compiling your transaction, there are a few common cases to look for:

  • Always build with -ferror-limit=1 set (clang; the gcc equivalent is -fmax-errors=1). clang and gcc both tend to treat type errors as "pretend it's an integer and move on", which makes errors after the first one likely to be spurious or misleading.
  • constexpr variable 'prev' must be initialized by a constant expression indicates an exception has been thrown in constexpr code. Scroll down until you see subexpression not valid, which will tell you the exact exception the code tried to throw. This is usually enough to understand the error.
  • If you're getting an error in constexpr code, you can explicitly call the constexpr function that's throwing the error. It will then execute at runtime and give you normal exception behavior. Both cases are illustrated in the sketch after this list.
  • static_assert failed: errors usually pertain to typechecking failures or flow violations. In either case, you'll get an error message directly.
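A generic illustration of the second and third bullets (standard C++, not MixT code):

    #include <stdexcept>

    constexpr int check_positive(int i) {
        // A throw is not a constant expression, so reaching it during
        // constexpr evaluation aborts compilation with the messages above.
        if (i < 0) throw std::logic_error("negative input");
        return i;
    }

    constexpr int ok = check_positive(1);      // fine: evaluated at compile time
    // constexpr int bad = check_positive(-1); // error: 'bad' must be initialized
    //                                         // by a constant expression; clang
    //                                         // then flags the throw as
    //                                         // "subexpression not valid"

    int main() {
        // Calling the same function in a runtime context instead yields
        // ordinary, debuggable exception behavior.
        try { check_positive(-1); } catch (const std::logic_error&) {}
        return 0;
    }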

Notes on compilers with MixT

This project is actively pushing the boundaries of compiler support for modern C++. Some things to watch out for:

Things clang really doesn't like:

  • concept-style auto, which means the mixt_method syntax only builds on g++ until clang adds support.

Things that g++ really doesn't like:

  • exceptions thrown from constexpr contexts unless guarded by explicit conditionals (see the sketch after this list)
  • anonymous structs used as template parameters
  • deeply nested constexpr; you need a lot of RAM to build large examples under g++. My largest example took 45GB on g++-8.1.
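For the first g++ bullet, the guard pattern looks roughly like this (a minimal sketch of the workaround, not MixT code):

    // g++ is happier when the throw sits behind an explicit conditional,
    // leaving a visibly non-throwing path through the constexpr function.
    constexpr int require_even(int i) {
        return (i % 2 == 0) ? i : throw "odd input rejected at compile time";
    }
    static_assert(require_even(4) == 4, "the guarded path is a constant expression");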


mixt's Issues

Metadata explosion: we are not updating ends

This is a fairly big-deal bug, but it does go a long way towards explaining why the tracking code has been so terribly slow. Here are the two components of tracking:

  • linearizable writes need to explicitly list causal dependencies
  • causal reads need to read from stores that can satisfy causal dependencies.

We satisfy the second point with tombstones - when you encounter the tombstone for a lin. operation, you have access to all its dependencies. This is intuitive, fast, and sound; everybody wins!

We satisfy the first one with a vector clock named "ends". This tracks a global "known throughout the world" clock, which serves to limit the number of causal operations a particular lin. write needs to track. For some reason, we've never actually been updating this clock - not even in the Java prototype code - which means the tracking set grows without bound.

Actually incrementing ends is not hard; the problem is figuring out when to do so. Also, it's probably masking bugs, because that's how this project seems to go.
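To make ends' role concrete, a schematic sketch (placeholder types, not the repo's actual data structures):

    // "ends" is a vector clock of timestamps known everywhere; any
    // dependency at or below it can be pruned from a linearizable
    // write's explicit dependency list.
    #include <map>
    #include <string>

    using Clock = std::map<std::string, unsigned long>;  // replica -> time

    void prune_known(Clock& deps, const Clock& ends) {
        for (auto it = deps.begin(); it != deps.end();) {
            auto e = ends.find(it->first);
            if (e != ends.end() && it->second <= e->second)
                it = deps.erase(it);   // globally known: no need to track
            else
                ++it;                  // still must be listed explicitly
        }
        // If ends is never advanced (the bug), nothing is ever pruned
        // and deps grows without bound.
    }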

Language/runtime: transactionContext association

Right now, TransactionContexts are stored in the various RemoteObjects that are currently undergoing the transaction. This will break if we have concurrent or nested transactions. A better idea is to modify levelCall(...) to take in the transaction context along with the Cache and Store. As a stopgap, we could implement this by storing the TransactionContext in the Cache or Store.

Hygiene: unify contexts

We have two notions of context right now: one is the state of a running transaction, and the other is the manner in which we should be treating encountered handles. We just refactored the code to allow the first to be passed around during transaction execution; the second is still looked up via a "magic number" in the execution Cache. It would be better to make it a member of the transaction context, as in the sketch below.
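A minimal sketch of that unification (all names here are hypothetical placeholders, not the repo's actual types):

    // Fold the handle-treatment notion into the transaction context that
    // already travels with execution, rather than looking it up by magic
    // number in the Cache.
    enum class HandleTreatment { dereference, raw_handle };  // placeholder values

    struct TransactionContext {
        // ... existing per-transaction state ...
        HandleTreatment treatment;   // how encountered handles are treated
    };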

Language/Story: why are we splitting transactions?

It seems like we should be able to accomplish the sort of transactions we're currently supporting without splitting transactions up. Let's explore exactly what benefits can come from splitting transactions.

Notes on VM setup

These aren't really associated with this repo, but I left my notepad in the department, so I'll leave them here regardless.

tc lets you introduce latency, but it'll affect all traffic going out a specific interface, which we don't really want. So what we're going to do is use a USB Ethernet adapter (from the department) to add a new interface to both desktops, put these on the 192.168.1.x subnet, and then route all replication traffic over them. This should simulate serious delays in postgres replication.

re-write temporary bindings (let_ifvalid and let_mutable) for new drefing paradigm

Pursuant to a conversation with ACM, I should replace the let_ifValid and let_Mutable constructs with new constructs more focused on referencing/dereferencing:

  • let_ifValid() becomes a dereferencing bind; the bound variable mutates the remote object when assigned to, and is drefd in free_exprs and operations.
  • let_mutable() becomes a box-bind; the bound variable mutates the pointer itself, and is preserved as a raw handle in free_exprs and operations.

Note: we should likely change the syntax of free_expr; there's no need to "dref" a handle syntactically, as that behavior is now dependent on how it was bound. Note: this should be a large re-write (if done correctly). Will defer until after the current context work and store implementations.

don't forget: forbidden merge function

You just thought of having a default merge function which, in the absence of a static "merge" function, will fall back to forbidding merges in general; i.e., crash the program with an error saying "divergent history for non-mergeable object foo encountered."

It would be really nice if we had a way of statically preventing this, but we honestly don't, other than forbidding the creation of causal objects which fail to have some merge() defined on them - which I honestly don't think I want to do. A sketch of the proposed default follows.
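A minimal sketch of the proposed fallback (names are hypothetical placeholders; C++17):

    #include <cstdio>
    #include <cstdlib>
    #include <type_traits>
    #include <utility>

    // Detect whether T exposes a static merge(T, T).
    template <typename T, typename = void>
    struct has_merge : std::false_type {};
    template <typename T>
    struct has_merge<T, std::void_t<decltype(T::merge(std::declval<T>(),
                                                      std::declval<T>()))>>
        : std::true_type {};

    template <typename T>
    T resolve_divergence(const T& a, const T& b) {
        if constexpr (has_merge<T>::value) {
            return T::merge(a, b);       // user-supplied merge wins
        } else {
            std::fputs("divergent history for non-mergeable object encountered\n",
                       stderr);
            std::abort();                // the "forbidden merge" default
        }
    }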

Language/runtime: HandleCaching review

There's something fishy about the way I just fixed HandleCaching. In particular:

  • we'll fetch too many times, probably
  • we still need lots of testing here.

Ends: causal metadata with native merge

I should really allow a "native" implementation of Ends which is store-specific and can take advantage of the causal store's natural merge semantics. Right now I have an efficient vector-based map for Ends, which is fine, but it means the "natural" merge function is the wrong merge function. The merge Ends actually needs is sketched below.
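Assuming Ends behaves like a conventional vector clock, the merge it needs is a pointwise maximum over per-replica timestamps (a minimal sketch; types are placeholders):

    // The merge Ends needs: pointwise max over per-replica timestamps.
    // A store whose native merge is, say, last-writer-wins would clobber
    // entries wholesale instead of taking the max per replica.
    #include <map>
    #include <string>

    using Ends = std::map<std::string, unsigned long>;  // replica -> timestamp

    void merge_into(Ends& into, const Ends& from) {
        for (const auto& [replica, ts] : from) {
            auto& slot = into[replica];    // default-initializes to 0
            if (ts > slot) slot = ts;      // keep the larger timestamp
        }
    }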

Hygiene: why are EnvironmentExpressions duplicated?

When we search for environment expressions within a transaction, the search comes back with serious duplication. This isn't actually a problem now (so long as it's duplication and not phantom values), but it would be if we allowed variable arity for environment expressions. Probably just got lazy with a tuple_cat somewhere, but it will be annoying to track down.

File under: TODO if we ever need variable arity.
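A generic illustration of the suspected mechanism (not the repo's code): tuple_cat performs no deduplication, so concatenating overlapping sub-search results silently keeps duplicates.

    #include <tuple>

    constexpr auto left  = std::make_tuple(1, 2);       // hits from one subtree
    constexpr auto right = std::make_tuple(2, 3);       // hits from another
    constexpr auto all   = std::tuple_cat(left, right); // (1, 2, 2, 3)

    static_assert(std::get<1>(all) == std::get<2>(all), "the duplicate survives");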

Ends: broken?

We need to write causal metadata alongside each causal write of an object. This is so obvious it's almost a tautology. Right now each object has exactly one metadata slot, which we plop our ends object into. This is wrong, because we might have a stale view of other replicas' timestamps, and doing this could overwrite them.

The best way to fix this would be for Ends to have semantics at the datastore itself. There are probably other ways to fix it too, but they seem slow.

Things were much easier when we just tracked everything by accident.

CrossStore: merging objects

CrossStore's design assumes all objects stored at causal locations have some native ability to be merged. This isn't enforced or exposed right now. Ideas:

  • assume stores support a merge() function on objects
  • always take from most up-to-date remote store (for the specific constraint we're optimizing)
  • forbid storage of objects which do not implement some Mergable interface.

Language/runtime: "context" enum

This should be part of a larger context object which is passed around as a first-class item and includes a pointer to the current store-specific TransactionContext(s).

namespaces

The transactions language should be in a separate namespace from everything else, because of operator overloading. The whole project should be in some non-global namespace.

CrossStore: replicate commit

Right now, commit happens only at the client. Obviously not fault-tolerant.

The design for this was: have a bunch of lightweight workers co-located with the stores. Each of these serves the single purpose of accepting commit messages and carrying them out.

Metadata structures: allow custom implementations

Really just for Ends. If we can implement it with a data structure whose "native" merge semantics match the behavior we want, then everyone wins - especially because I have no idea how Eiger would merge the Ends structure otherwise.

Merge on eiger is unimplemented

I kid you not, and have spent an embarrassing amount of time trying to be wrong about this. It was there in COPS, but it's missing from Eiger. Eiger is a pure last-writer-wins database, which is actually a really poor match for weak consistency in general. Among other things, my demo applications really won't work under this regime.

Last-writer-wins is deterministic based on Eiger's logical timestamping, which is also pretty low resolution; this may admit local history inversions. For example: I design my system so that it makes a decision if a value is over a certain threshold, but due to conflicting updates only one replica site is able to observe that value being over that threshold, despite the fact that no operation intends to decrease the value. A schematic sketch of the mechanism follows.
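A minimal, generic LWW register (not Eiger's code; the threshold of 100 is an arbitrary example):

    #include <cassert>

    // A last-writer-wins register keeps only the highest-timestamped
    // write, so a replica that hears the writes out of order never
    // observes the intermediate state.
    struct LwwRegister {
        long value = 0;
        unsigned long ts = 0;
        void apply(long v, unsigned long t) {
            if (t > ts) { value = v; ts = t; }  // older writes are dropped
        }
    };

    int main() {
        LwwRegister a, b;
        a.apply(150, 2); a.apply(120, 3);  // A observed 150 (> 100) en route
        b.apply(120, 3); b.apply(150, 2);  // B drops the second call and
                                           // never sees the value cross 100
        assert(a.value == b.value);        // converged, but history inverted
    }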

Language: disable type coercion

That thing where we can upgrade the handle's level to ro/strong or wo/causal is broken w.r.t. assumptions made in transaction splitting. Probably should just delete that code, honestly.

Revisiting tracking

See email to Andru, 11/4/2015, for an explanation of the problem.

What would be sound: if objects have vector-clock version numbers, you can only drop the extra causal replica once all objects observed from that replica have equal-or-older versions of those same objects at your local replica. This seems unlikely to happen.

The natural result of the read-from-extra-replicas scheme is swiftly imposing full quorum reads on the entire cluster, thus upgrading the causal store to a linearizable store.

We could always abandon tracking entirely. Then the composite schedule of causal + linearizable would not be causally consistent, even though each fragment of the schedule would be.
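To make the soundness condition above concrete, a schematic check (types are placeholders, not the repo's code):

    #include <map>
    #include <string>

    using Version = unsigned long;
    using ObjectVersions = std::map<std::string, Version>;  // object -> version

    // Replica r may be dropped only once every object observed from r
    // exists locally at an equal-or-newer version.
    bool can_drop_replica(const ObjectVersions& observed_from_remote,
                          const ObjectVersions& local) {
        for (const auto& [name, ver] : observed_from_remote) {
            auto it = local.find(name);
            if (it == local.end() || it->second < ver) return false;
        }
        return true;
    }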

FileStore: nameDecoder is in-memory; cannot currently retrieve existingObject created in previous run of program.

If I create an object via FileStore, I translate the given int name into a string and store a mapping from that int to that string. When calling existingObject(int) later, I check that mapping for the int and assert-fail if I can't find it. That's fine for a single run, but if we try to persist objects across runs they'll be un-nameable.

The obvious solution is to just store this mapping on the filesystem itself and reference it from there (sketched below). But FileStore is for debugging rather than actual use, which means this probably doesn't need to be fixed.
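The obvious solution, sketched (hypothetical helper names; as noted above, probably not worth actually doing):

    #include <fstream>
    #include <map>
    #include <string>

    // Persist the int -> name mapping beside the objects themselves...
    void save_decoder(const std::map<int, std::string>& m, const std::string& path) {
        std::ofstream out(path);
        for (const auto& [id, name] : m)
            out << id << ' ' << name << '\n';   // assumes names have no whitespace
    }

    // ...and rebuild it before serving existingObject(int) in a later run.
    std::map<int, std::string> load_decoder(const std::string& path) {
        std::map<int, std::string> m;
        std::ifstream in(path);
        int id;
        std::string name;
        while (in >> id >> name)
            m.emplace(id, name);
        return m;
    }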

macro simplification: multimethods.

Context: I'm retaining the Level information at the RemoteObject point, not just at the handle point, so that I can specify extra requirements on causal RemoteObjects. This means I need to change the code which resolves calls to operations as multimethods. Surprising nobody, the resolution process for multimethods is really hard to follow. I've made the decision to restrict the parameters that declared operations can take in order to simplify the resolution code.

If I get time (unlikely), I should probably either re-generalize the resolution code or add further constraints to declarations in order to simplify it.
