drewmccormack / ensembles
A synchronization framework for Core Data.
License: MIT License
In the event integrator, it is not necessary to process changes for an object if they all come from the local persistent store.
This is how it should work:
When a device first joins, it generates a baseline event with all of the objects in its database as inserts. This event will have a global count of 0.
When two baseline events are discovered, they need to be combined. If one is found to be a subset of the other, meaning every event it includes is also present in the other, it can be removed.
If neither baseline is a subset of the other, the union must be taken and put into a new baseline, with the original baselines deleted. We simply go through the inserts in each baseline, keeping one for each object, using global count to choose which one to keep, as usual.
Each baseline belongs to a baseline-history. As long as a baseline is just based on a previous baseline, plus any recent changes, it is in the same baseline history. However, if two disjoint baselines are merged with a union, the new baseline belongs to a new baseline-history.
The reason this is significant is that each event store will have to store the baseline-history id that the Core Data store is based on. If the history id of the current baseline differs from that stored in the event store, the Core Data store needs a rebuild from the new baseline. This involves applying the inserts from the baseline, and all newer change sets, to the existing Core Data store. A baseline can only ever include inserts, and a merge can only ever be a superset, so we should not need to figure out what objects in the current Core Data store need to be deleted or anything like that.
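The subset/union rule above can be sketched as follows. This is an illustrative model only, with a baseline represented as a dict mapping global object id to the global count of that object's insert; the real Ensembles types and APIs differ.

```python
def merge_baselines(a, b):
    """Combine two baselines per the rules above (illustrative model:
    each baseline maps global object id -> global count of its insert)."""
    # If one baseline's events are a subset of the other's, it is redundant.
    if set(a.items()) <= set(b.items()):
        return b
    if set(b.items()) <= set(a.items()):
        return a
    # Otherwise take the union, keeping one insert per object and using
    # the global count to choose which one to keep, as usual.
    merged = dict(a)
    for obj_id, count in b.items():
        if obj_id not in merged or count > merged[obj_id]:
            merged[obj_id] = count
    return merged
```

Because a merge of disjoint baselines is always a superset of both inputs, the result can safely replace either original.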
The baseline in a single baseline-history needs to be updated regularly, compacting recent events into it.
To determine which events to include, we can use the approach from this blog post: http://blog.helftone.com/clear-in-the-icloud/
In short, we decide on a global count that will be the cutoff. We determine the supremum of local revisions, and that becomes the set of events included, as well as the revision numbers assigned to the baseline. The global count used for the baseline is the minimum of the global counts on each device’s most recent event.
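The cutoff computation described above could be sketched as below, assuming we know the global counts of each device's events (names are illustrative):

```python
def baseline_global_count(event_counts_by_device):
    """Cutoff for baseline compaction: the minimum, over all devices,
    of each device's most recent (maximum) event global count.
    `event_counts_by_device` maps device id -> list of global counts."""
    return min(max(counts) for counts in event_counts_by_device.values())
```

Any event at or below this count is guaranteed to have been seen by every device, so it is safe to fold into the baseline.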
Introduce random delays in file copy operations and have merge operations occur simultaneously, to simulate unexpected changes
If a save to the monitored store should occur during leeching, data may be lost. Saves should be monitored, and the leech process reinitiated or terminated with an error if a save to the store is observed.
A file should be added for each persistent store identifier to the cloud when leeching. It could just contain the creation date, but it is not so important what is in it.
A check should be made on launch and before a merge, as to whether the file exists for the local store. If not, data corruption has occurred, and a forced deleech should be the result.
The leeched device should be added to the cloud files at the end of the leeching process, when everything is already imported. That way, if a crash occurs before that, it will be flagged as a problem next time a sync is done, and the store automatically deleeched.
To support legacy stores, keep a flag in the store metadata to indicate whether the device check should be carried out.
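The launch-time check described above, with one registration file per persistent store identifier, could look roughly like this. The directory layout and function names are hypothetical:

```python
import os

def device_file_path(cloud_dir, store_id):
    # Directory layout is illustrative; the framework chooses its own names.
    return os.path.join(cloud_dir, "stores", store_id)

def store_is_registered(cloud_dir, store_id, exists=os.path.exists):
    """Check at launch and before a merge. A missing file for the local
    store means data corruption has occurred, and a forced deleech
    should be the result."""
    return exists(device_file_path(cloud_dir, store_id))
```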
The CDEEventBuilder class currently uniques global identifiers across all entities. It would be better to unique on a per-entity basis.
E.g. a Tag entity might use its text as its global identifier (e.g. "Car"). Another entity may take a similar approach, making a collision on the global identifier likely.
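Per-entity uniquing amounts to keying the identifier map on an (entity, id) pair rather than the id alone. A minimal sketch (class and method names are hypothetical, not CDEEventBuilder's API):

```python
class GlobalIdentifierRegistry:
    """Unique global identifiers per (entity, id) pair, so a Tag "Car"
    cannot collide with another entity's "Car"."""
    def __init__(self):
        self._objects = {}

    def register(self, entity_name, global_id, obj):
        key = (entity_name, global_id)          # per-entity uniquing
        if key in self._objects:
            return self._objects[key]           # existing object wins
        self._objects[key] = obj
        return obj
```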
At the moment, CDEObjectChange handles to-many relationship changes by storing sets of added and removed global ids. This would need to be generalized to include an ordering parameter.
The removed set would not need indexes, but the added would. It may be difficult to guarantee proper ordering when there are conflicts.
Add an extra to-many entity to Idiomatic that can be used to store an image for each note.
To ensure that saved changes are not lost, it is important to monitor save notifications during import of a persistent store, and reimport if a save to the monitored store occurs.
In the full sync tests, add conflicting updates to relationships, and ensure all stores end up in the same state. Try all three relationship types.
Object changes from the local device that are first in a set of changes can be ignored.
You should be able to tap a setup button and choose from a list of cloud file systems.
This is because other code may have already modified values after the save, before Ensembles gets to access the objects.
Record a model hash in each event. When merging, check that all new events have hashes in the model versions. If there is one or more not known, the merge should fail with appropriate error code.
The ensemble should keep recording save events, but merges would require upgrading to get the new model version.
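The model-hash gate described above could be sketched as below. The function name, hash values, and error are all illustrative, not the Ensembles API:

```python
def can_merge(new_event_hashes, known_model_hashes):
    """All hashes recorded in new events must correspond to model
    versions known locally; otherwise the merge fails and the app must
    upgrade to the new model version before merging again."""
    unknown = set(new_event_hashes) - set(known_model_hashes)
    if unknown:
        raise ValueError("unknown model versions: %s" % sorted(unknown))
    return True
```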
Test that any fixes are properly captured in the store mod event.
There are several places that create store modification events. Some can take a while (e.g. import), and use a lot of memory.
To allow intermediate saves before the event is fully built, store an id for the currently building event in the event store metadata. If a crash occurs before the event can be finished, this can be detected at next launch, and the event deleted or otherwise handled as appropriate.
Events can be mandatory and non-mandatory. A mandatory event must complete, or the store will not be in a valid state. If a mandatory event is incomplete on startup, deleech and report error.
If a non-mandatory event is incomplete, just delete the event on launch.
Cases where incomplete events should be registered
In CDEPersistentStoreEnsemble, check on init for incomplete events
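The launch-time recovery described above might look like the following sketch, with events modelled as plain dicts and all names hypothetical:

```python
def recover_incomplete_events(metadata, events):
    """On launch, look up the ids of events that were still being built
    (recorded in the event store metadata). Delete incomplete
    non-mandatory events; an incomplete mandatory event means the store
    is not in a valid state, so return True to signal a deleech."""
    must_deleech = False
    for event_id in metadata.get("building_event_ids", []):
        event = events.get(event_id)
        if event is None:
            continue                     # event finished and was cleaned up
        if event["mandatory"]:
            must_deleech = True          # report error and deleech
        else:
            del events[event_id]         # safe to discard
    return must_deleech
```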
NSEnumerator was showing unexpected/buggy behaviour when used with arrays of fetched objects. The fetched arrays are probably not true NSArray objects, and perhaps do not work well with NSEnumerator.
A scan of the framework should be made to remove uses of NSEnumerator.
This should be in the CDEPersistentStoreEnsemble class.
Method could be persistentStoreEnsemble:shouldImportPersistentStoreAtURL:. Returning NO results in no import. Returning YES causes a basic migration to take place.
Manual migration could occur here if desired when returning NO.
This could be useful, though it might be better to write our own code rather than get into technical debt, if the code is not too complex.
At the moment, the whole merge fails. Should we just continue regardless?
I think we should fail, but with a clear error code indicating that the child context modified by the developer could not be saved.
If a tag is selected when creating a new note in idiomatic, add that tag automatically to the new note.
Needs new UI to choose a cloud syncing service, and login.
It is possible for someone to reinsert an object with the same global identifier as one that has previously been deleted. Make a test to ensure that if this happens, the object does end up reinserted.
At the moment, we take the events added since the last merge, and then add to that set the events that are concurrent with those new events.
It is conceivable that this may be inadequate. It may be necessary to recursively keep adding concurrent events until the set no longer changes.
Alternatively, we could move to the approach used by clear, where we include all events that have a global count greater or equal to the smallest global count in the new set of events.
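Both options above can be sketched briefly. The first is a fixpoint closure over concurrency; the second is the Clear-style global-count cutoff. The callback and event representation are illustrative only:

```python
def events_to_merge(new_events, concurrent_with):
    """Fixpoint closure: keep adding events concurrent with the set
    until it stops growing. `concurrent_with(e)` yields the events
    concurrent with e (hypothetical callback, not the real API)."""
    merged = set(new_events)
    while True:
        extra = set()
        for event in merged:
            extra |= set(concurrent_with(event)) - merged
        if not extra:
            return merged
        merged |= extra

def events_to_merge_by_count(all_events, new_events):
    """Clear-style alternative: include every event whose global count
    is >= the smallest global count among the new events. Events are
    (id, global_count) pairs in this sketch."""
    cutoff = min(count for _, count in new_events)
    return {e for e in all_events if e[1] >= cutoff}
```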
In the full sync tests, add a to-many relationship, with an update
Add a property on the ensemble to set a model configuration. Make sure this is used when accessing the model through the framework.
Cells not onscreen when the button is pressed do not show download progress. It would be necessary to add the appropriate observers when creating new cells.
We can add decorator classes to enhance cloud file systems. One option is an encrypting class. Another is a zipping/compressing class. These could even be used together.
Not really an issue, but a question.
I've coded a sample Mac app to test Ensembles, adding an "IdiomaticMac" target to the current sample project.
Since it's a big change, and I don't know whether you already have a Mac version of the test app planned, I thought I'd ask before sending a pull request. It's currently working quite well, and shares notes with the iOS version of the app just fine.
Since it's just for testing, I put most of the code in the app delegate instead of creating separate controllers.
You can have a look at it in this branch of my fork of Ensembles:
https://github.com/erndev/ensembles/tree/IdiomaticMac
Let me know if you find it interesting, and I can send the pull request. Also, any changes/improvements/suggestions are appreciated.
Cheers
Currently, we determine deltas for to-many relationships in the willSave notification. But objects can be changed after this, in the validation and merging phases.
So instead, just store the object ids for the to-many relationships in willSave, and use them to determine deltas in didSave.
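The delta computation itself is a simple set difference between the two snapshots. A minimal sketch (function name is illustrative):

```python
def to_many_delta(ids_at_will_save, ids_at_did_save):
    """Record the relationship's object ids in willSave, then diff them
    against the final membership in didSave, so changes made during
    validation and merging are not missed. Returns (added, removed)."""
    before, after = set(ids_at_will_save), set(ids_at_did_save)
    return after - before, before - after
```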
The Core Data documentation states that all access to a queue-concurrent MOC should be from performBlock:, including setting parent contexts and other initialisation.
Check that this is the case over the whole framework.
Just before the CDEEventIntegrator inserts a new object, it first checks if one exists with that global identifier, and will not regenerate the object if it does. This check should also confirm that the existing object is not already deleted. If it is, a new object should be created.
I'm playing with CDEAsynchronousTaskQueue. It's great, but it would be nice if it exposed some measure of progress, e.g. [tasks count] and, maybe, -currentTaskIndex.
The count seems simple enough. Setting the current index could be done after pulling the task from the enumerator in -startNextTask.
If this seems at all desirable, I'm happy to put together a pull request.
Should have a local cache on each device. Would locate another device using Bonjour, and should go through a pairing procedure. Each device would then store a 4-digit number that is used to handshake.
The sync procedure would simply involve each device sending the files it currently has. The other device would then send back any new files it has. This sync would not necessarily have to happen at the same time as an Ensembles merge. Usually, it would be wise to do the file sync first, and after that trigger a merge.
If a crash occurs at inopportune times during a merge, it is possible that the persistent store does not represent the ensemble events.
To detect this, store a unique token at the ‘point of no return’, when the results of the merge are being committed. Remove this token after a successful merge.
The token could be stored in the metadata of the CDEEventStore. It could track all building events as a set of tokens, and offer a method to add, remove, and retrieve the tokens.
If the token is discovered to exist at launch, it indicates a crash occurred during committing, and a forced deleech should occur.
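The token lifecycle described above, add, remove, and check at launch, could be sketched like this (class and method names are hypothetical):

```python
class CommitTokenStore:
    """Sketch of the 'point of no return' tokens kept in the event
    store metadata."""
    def __init__(self):
        self._tokens = set()

    def add(self, token):
        self._tokens.add(token)       # just before committing the merge

    def remove(self, token):
        self._tokens.discard(token)   # after a successful merge

    def needs_deleech(self):
        # Any token present at launch means a crash occurred mid-commit.
        return bool(self._tokens)
```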
(Turns out this is because ordered properties aren't supported yet)
Write tests to ensure pre- and post-merge delegate methods are called, and that changes made in the delegate method are captured in the merge event.
The idea would be to invoke the ‘decorator’ design pattern to make a class that conforms to CDECloudFileSystem, but also wraps an existing CDECloudFileSystem, and encrypts data as it is uploaded, and decrypts as it downloads. In this way, the rest of the framework would not need to know about the encryption, but the data in the cloud would be encrypted.
The init… method of the class would take another cloud file system, as well as any keys etc it needs to encrypt/decrypt.
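A minimal sketch of the decorator pattern described above. XOR stands in for a real cipher, the method names are illustrative rather than CDECloudFileSystem's, and MemoryFileSystem is a stand-in backend for demonstration:

```python
class MemoryFileSystem:
    """Stand-in backend; holds files in a dict."""
    def __init__(self):
        self.files = {}
    def upload(self, path, data):
        self.files[path] = data
    def download(self, path):
        return self.files[path]

class EncryptingFileSystem:
    """Decorator: wraps another file system, encrypting data as it is
    uploaded and decrypting as it is downloaded. The rest of the
    framework never sees the ciphertext."""
    def __init__(self, wrapped, key):
        self.wrapped = wrapped
        self.key = key

    def _crypt(self, data):
        # XOR is symmetric, so the same routine encrypts and decrypts.
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))

    def upload(self, path, data):
        self.wrapped.upload(path, self._crypt(data))

    def download(self, path):
        return self._crypt(self.wrapped.download(path))
```

Because the decorator conforms to the same interface it wraps, decorators can be stacked, e.g. compression around encryption.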
Include updating an object never inserted
Test double deletion of an object
Test delete when object was never inserted
Test inserting after deletion. Object should survive
If it is possible for properties to change after willSave, we will need to adapt to ensure the true changes are committed, not the preliminary ones.
One thing to consider is whether the objects in the willSave notification have already had the merge policy applied. If not, they may not represent the actual values saved, and we should only trust final values in didSave, using the willSave notification just to store the committed values.
This will require enhancements to all of the classes involved in the merge.
A cancelled merge should produce an error with a ‘cancelled’ code.
Generate a random set of data for a standard model that includes all important attribute types and relationships.
Make random changes, and test that two coupled stores end up in the same state.
The CDESaveMonitor class currently stores updated values from the contextWillSave notification in a dictionary. If the context does not completely save, and then is released, the objects will be retained unnecessarily.
To prevent this, replace the dictionary with a NSMapTable, and make sure the key (context) is weak. (At the moment it is an NSValue.)
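The same weak-key pattern can be illustrated in Python, where `weakref.WeakKeyDictionary` plays the role of an NSMapTable with weak keys. The Context class is a stand-in for NSManagedObjectContext:

```python
import gc
import weakref

class Context:
    """Stand-in for NSManagedObjectContext; illustrative only."""
    pass

# Entries disappear automatically once the key (the context) is
# released, so an unfinished save cannot retain its values forever.
saved_values = weakref.WeakKeyDictionary()

def record_save(context, updated_values):
    saved_values[context] = updated_values   # key weak, values strong

ctx = Context()
record_save(ctx, {"updated": ["note1"]})
del ctx        # context released without completing its save
gc.collect()   # its entry is gone; nothing is retained unnecessarily
```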
I have this in another repository already, and will integrate it.
You can't just check out and build right now, because this file is missing: DBRestClient+OSX.h
It would be useful to have some high-level logging for tracing overall operations. This should not be excessive, but should indicate when the framework starts and finishes tasks.
Migrating a whole store is expensive. It takes a long time, and it uses a lot of memory if the whole store has been loaded into RAM.
To get around this, we will allow guidance from the app developer. They can add keys to the user info in the Core Data model, and these will be used to steer the migration.
The user info keys will basically set a migration priority on entities, and a batch size, with appropriate defaults when the keys are not included.
When no batch size is given, no intermediate saves will occur. The app developer will need to order the entities, and set batch saving options, to ensure that valid saves can occur.
Another similar option is to take the advice of Apple when it comes to migration. Apple recommends large migrations are broken into several smaller ones, or a few entities each. We could do the same by making entity groups (i.e. assign a tag to each entity for the migration group). Each group would be migrated together. But this may not be as granular as the batch option discussed above.
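The priority-plus-batching scheme could be sketched as below. The priorities and batch size would come from the user-info keys on the model; all names and the callback are illustrative:

```python
def migrate_in_batches(objects_by_entity, priorities, batch_size, save):
    """Migrate entities in priority order, saving every `batch_size`
    objects to bound memory use. A batch_size of 0 means no
    intermediate saves, matching the default described above."""
    ordered = sorted(objects_by_entity, key=lambda e: priorities.get(e, 0))
    pending = []
    for entity in ordered:
        for obj in objects_by_entity[entity]:
            pending.append((entity, obj))     # migrate obj here
            if batch_size and len(pending) >= batch_size:
                save(pending)                 # intermediate save
                pending = []
    if pending:
        save(pending)                         # final save
```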
From Apple Core Data Docs:
Multiple Passes—Dealing With Large Datasets
The basic approach shown above is to have the migration manager take two models, and then iterate over the steps (mappings) provided in a mapping model to move the data from one side to the next. Because Core Data performs a "three stage" migration—where it creates all of the data first, and then relates the data in a second stage—it must maintain “association tables" (which tell it which object in the destination store is the migrated version of which object in the source store, and vice-versa). Further, because it doesn't have a means to flush the contexts it is working with, it means you'll accumulate many objects in the migration manager as the migration progresses.
In order to address this, the mapping model is given as a parameter of the migrateStoreFromURL:type:options:withMappingModel:toDestinationURL:destinationType:destinationOptions:error: call itself. What this means is that if you can segregate parts of your graph (as far as mappings are concerned) and create them in separate mapping models, you could do the following:
Get the source and destination data models
Create a migration manager with them
Find all of your mapping models, and put them into an array (in some defined order, if necessary)
Loop through the array, and call migrateStoreFromURL:type:options:withMappingModel:toDestinationURL:destinationType:destinationOptions:error: with each of the mappings
This allows you to migrate "chunks" of data at a time, while not pulling in all of the data at once.
From a "tracking/showing progress” point of view, that basically just creates another layer to work from, so you'd be able to determine percentage complete based on number of mapping models to iterate through (and then further on the number of entity mappings in a model you've already gone through).
E.g. cloudFileSystem:didDetectNewFilesAtPaths:
This method could be used directly by the user code to decide to merge, or it could be used internally by the ensemble to generate a second merge hint delegate call for the user code. The second approach would be interesting, because the ensemble has more information, and can decide whether the existing files are enough to allow a merge to proceed.
Note that iCloud has means of determining when new files arrive, as does Dropbox (http://www.dropbox.com/developers/blog/63).