67 stars · 17 forks · 17 watchers · 88.67 MB

Dataphor Federated Database Management System

License: BSD 3-Clause "New" or "Revised" License

C# 97.45% CSS 0.53% TypeScript 0.01% ASP 0.08% JavaScript 0.35% C 0.87% HTML 0.32% PLpgSQL 0.13% PLSQL 0.08% Batchfile 0.03% C++ 0.11% Objective-C 0.02% TSQL 0.02%

dataphor's Introduction

Dataphor

Dataphor is an open source application development platform designed to streamline the process of designing, developing, and maintaining software applications. Dataphor is not just another set of application framework components, or yet another take on building applications using today’s common patterns, such as MVC/MVP; Dataphor re-approaches the problem from a fresh perspective, building from first principles.

See our GitHub repository for the newest source code. For more information, check out our website.

Installation

To get started running Dataphor as a developer, using its Dataphoria IDE, open Dataphor/Dataphor.sln in MS Visual Studio 2013 or another suitable environment, and select Dataphoria as the startup project.

While Dataphor is mainly used in production as a federation of SQL DBMSs such as MS SQL Server, Oracle, and Postgres, it also has its own built-in database engines, 'MemoryDevice' and 'SimpleDevice', so you can start using Dataphor without first installing any external dependencies.
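For example, once Dataphoria is running, a minimal D4 script like the following (table and data are illustrative) can be executed against the built-in storage, with no external DBMS configured:

```
create table Person { ID : Integer, Name : String, key { ID } };
insert table { row { 1 ID, "Ada" Name } } into Person;
select Person;
```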

Troubleshooting

If Dataphoria fails after startup with an error message citing a missing DLL or the like, try "Rebuild Solution" rather than "Build Solution" in MS Visual Studio; a full rebuild is more reliable at ensuring all the parts of Dataphor have been compiled and none are missing.

Additional troubleshooting help can be found in our Documentation.

dataphor's People

Contributors

brynrhodes, duncand, jpercival, medigit, n8allan, rob-reynolds, samdus, sirjackovich, tangiblezero


dataphor's Issues

Dataphor CI and Releases

Dataphor releases currently aren't available on GitHub. Setting up a CI build (free for open-source projects) that publishes its results to the GitHub Releases area would make it clear when new releases are available.

Use .Net Generics

Much of Dataphor's core engine code was written before .NET generics / generic collections were available. It could be refactored to use the built-in generic lists to reduce the amount of code in the project.
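To illustrate the kind of refactoring involved (a generic sketch; `Schedule` and `ScheduleList` are hypothetical names, not actual Dataphor classes):

```csharp
using System.Collections;
using System.Collections.Generic;

public class Schedule { }

// Before: the pre-generics pattern — one hand-rolled strongly typed
// wrapper class per element type, casting on every access.
public class ScheduleList : ArrayList
{
    public new Schedule this[int index]
    {
        get { return (Schedule)base[index]; }
        set { base[index] = value; }
    }
}

// After: the built-in generic list; no per-type wrapper class is needed.
public static class Example
{
    public static List<Schedule> MakeList() => new List<Schedule>();
}
```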

Incorrect rewrite of column extractor assignment

The following D4

var ARow : row { Price : Money } := row { nil Price };
ARow.Price := nil;

is rewritten by the compiler as:

var ARow : row { Price : Money } := row { nil Price };
var <uniqueobjectname> : generic := nil;
update ARow set { Price := <uniqueobjectname> };

If this rewrite is emitted and then recompiled (due to catalog serialization, A/T rewrite, or client-side catalog caching), the rewritten logic will fail, because the <uniqueobjectname> variable is of type generic and such an assignment is invalid in general. The pattern only works with the nil literal due to special-case handling.

Improving the interpreter

Below are some thoughts on improving the interpreter. Posting mainly for discussion.

Proposal 1 (on nil handling in the interpreter):

  1. add a new characteristic to operators, IsNilPropagating, meaning that if any argument evaluates to nil, the operator will also inevitably evaluate to nil (my guess is that most operators are nil-propagating)
  2. use a new kind of exception (say, NilException) for signalling that nil is the result of evaluating a plan node; this exception would be caught in the following situations:
    1. when storing a nil value into a variable (assignment, or update/insert/delete statements)
    2. when encountering a nil in a restrict (or the like), where it is treated the same as false
    3. when exiting an expression context (e.g. in a program begin Foo(); Bar() end, where Foo() is a function, the call to Foo() is evaluated and its nil result discarded)
    4. when evaluating an if-condition, case-condition, for/while-condition, where-condition (and the like)

The upside is that a lot of repetitive conditional logic is replaced by exception handling, which should also be faster at runtime.
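A rough C# sketch of the idea, with all names hypothetical rather than the actual DAE types:

```csharp
using System;

// Hypothetical sketch of Proposal 1; names are illustrative, not the real DAE API.
public class NilException : Exception
{
    // A single shared instance avoids an allocation every time a nil is signalled.
    public static readonly NilException Instance = new NilException();
}

public abstract class PlanNode
{
    // True if any nil argument makes the whole operator evaluate to nil.
    public bool IsNilPropagating;
    public abstract object Execute(object[] stack);
}

// Assignment is one of the boundaries where NilException would be caught.
public class AssignNode : PlanNode
{
    public PlanNode Source;
    public int VariableIndex;

    public override object Execute(object[] stack)
    {
        try
        {
            stack[VariableIndex] = Source.Execute(stack);
        }
        catch (NilException)
        {
            stack[VariableIndex] = null; // the nil is caught at the storage boundary
        }
        return null;
    }
}
```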

Proposal 2 (avoiding scalar type boxing when calling operators):

The PlanNode.Execute() function works only on boxed types. Avoiding boxing of return types and of arguments should improve performance considerably (by lowering GC stress).

  1. introduce static, non-virtual Evaluate methods
    1. the .NET signature will be along these lines: A (Program, A1, A2, ..., An) if the operator is nil-propagating, and MakeNullable<A> (Program, MakeNullable<A1>, MakeNullable<A2>, ..., MakeNullable<An>) if it is not, where MakeNullable<T> is Nullable<T> if T is a value type, and just T otherwise
  2. for every PlanNode, keep a Type _returnType for the .NET type its Evaluate method evaluates to (or introduce a TypedPlanNode)
  3. at the CallNode, determine whether the operator's defining class declares a typed static Evaluate method (if it does, the CallNode will invoke it via a DynamicMethod)

Note that we could probably dispense with the Nodes array in the plan node by making every node typed on its arguments. For instance, IntegerPowerNode would have members _left and _right, both of type TypedPlanNode<int>. There doesn't seem to be much code that walks the Nodes array in any involved computation.

Is there a characteristic on operators which could help decide whether the Program argument is necessary for an operator to evaluate?

Obviously, some unboxing/boxing will still be necessary (e.g. when reading from/storing into variables).
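A minimal C# sketch of such a typed Evaluate method (all names hypothetical; `Program` is stubbed here):

```csharp
// Stand-in for the DAE Program type.
public class Program { }

public class IntegerPowerNode
{
    // Typed, static, non-virtual: neither the int arguments nor the int
    // result are boxed, unlike the existing object-based Execute() path.
    public static int Evaluate(Program program, int left, int right)
    {
        int result = 1;
        for (int i = 0; i < right; i++)
            result = checked(result * left);
        return result;
    }
}
// A CallNode would detect such a method via reflection and bind to it
// (e.g. through a DynamicMethod), falling back to boxed Execute otherwise.
```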

Proposal 3 (avoid boxing of rows):

This one I haven't figured out yet. Currently, all rows are generically boxed (see Runtime.Data.Row and Runtime.Data.NativeRow).

  1. generate descendants of a Row/NativeRow which implement the existing API and typed getters/setters (currently, values are persisted as objects...)
  2. keep a cache somewhere that will ensure there is only one dynamic type for every DAE type
    http://www.codeproject.com/Articles/13337/Introduction-to-Creating-Dynamic-Types-with-Reflec

I guess we could specialize some interpreter code with the exact row types?

  • for instance, specialize PlanNodes by the respective types of rows they can handle

Dataphor 3.0, library assembly loading while running server in-process

Greetings,

Looking through the code, I discovered that it's possible to easily extend Frontend with custom UI node types. It seems unmentioned in the docs.

In https://github.com/DBCG/Dataphor/blob/3.0/Dataphor/Frontend.Client/Session.cs#L154, there's code that downloads and registers library assemblies containing UI node definitions.
However, this code is only executed for remote connections. When running in-process, neither Dataphoria nor WindowsClient seems to register the library assemblies, leading to node instantiation errors on form deserialization.

I've tried making the assembly register itself on server load, but the issue persists.

Any pointers?

catalog not transactional, at least for operators

The Dataphor catalog does not seem to be transactional; that is, a "create operator" statement is not subject to an enclosing explicit transaction.

BeginTransaction();

create operator Foo() : Integer
begin
    result := 42;
end;

select Foo();

RollbackTransaction();

Running that once succeeds and outputs 42. Running it a second time fails, reporting that Foo() already exists. This means that the creation of operator Foo() was not undone by the transaction rollback, as it should have been.

That behaviour is the same for both "session" and non-"session" creates.

Data definition should be subject to transactions.

README - Dead links

Hi!

I tried to access the Dataphor User’s Guide when I realized the link was dead.
It's a link to the .com domain when it should be .org.

Cheers!

UI feature - distinguishing in progress from done

Sometimes the system running a Dataphor GUI application can be slow. Currently, a form can display and appear to be "done" while showing no records, when in fact the lack of records is just because the system is still processing and preparing to retrieve/display them. So we have a problem: we can't distinguish "done" from "in progress".

This feature request is to add some kind of indicator, either to a form as a whole or to each independently loaded portion of it (such as a record list viewer), that displays while an activity is in progress and then disappears (or explicitly says done) when the waiting is over. It doesn't have to show a fraction of progress, just yes or no, but ideally it would be animated in some way.

Mark MSSqlDevice catalog operators as deterministic

SQL Server is able to optimize query plans when they use deterministic operators, because it can call the operator once and then cache the result.

There are operators in the catalog that could be marked as deterministic but are not. We need to go through, analyze which operators can be marked deterministic, and do so.

There are examples in #42

StreamID not found: Graphic column in a detail table, embedded browse form

When accepting a derived Add form with embedded Browse form for a table that has a column of type Graphic, I get the error "StreamID {id} not found".

I've uploaded a library to reproduce this behavior in Dataphoria:

https://dl.dropboxusercontent.com/u/53672107/Abitech.Test.Image.zip

Just register it, and then please do the following:

  1. open the FaultyForm
  2. enter some data (in particular, add a row into Table3 with an image)
  3. hit Accept

You should see the error message.

I've tracked it down to the process, on being disposed, deallocating all of its owned streams. A temporary work-around is here (patch made against Dataphor 3.0):

https://gist.github.com/ashalkhakov/464c321645fe02602de6

Is this an acceptable work-around?

D4 generic type failures

Dataphoria is having a number of failures in handling generic declared types.

For a baseline, run the following D4 code:

create session operator Foo() : table {x: Integer, y: Integer}
begin
  result := table { row { 42 x, 17 y }, row {29 x, -1 y} };
end;

select Foo();

This correctly produces 2 rows.

Next take this variant which differs only by a more generic declared return type:

create session operator Foo() : table
begin
  result := table { row { 42 x, 17 y }, row {29 x, -1 y} };
end;

select Foo();

The result of running that is "Statement executed successfully, returning 0 rows", which is wrong; the result should be the same 2 rows.

Next take this variant, which changes the declared return type to be even more generic:

create session operator Foo() : generic
begin
  result := table { row { 42 x, 17 y }, row {29 x, -1 y} };
end;

select Foo();

That outputs "<Unknown Result Type: System.Generic> Statement executed successfully", which is likewise wrong; again, the result should be the same 2 rows.

Optimize column extractor assignment

A column extractor assignment such as:

var ARow : row { Price: Money } := row { nil Price };
ARow.Price := nil;

Will be rewritten by the compiler for implementation as:

var ARow : row { Price : Money } := row { nil Price };
var <uniqueobjectname> : Money := nil;
update ARow set { Price := <uniqueobjectname> };

The reason for this rewrite is unclear, and it has a fairly significant run-time cost: it declares a scope to support the variable allocation, performs the allocation, and then performs a read during the actual assignment. All of this overhead is clearly unnecessary in the simple compile-time-literal case, and most likely unnecessary in the general case as well.
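Assuming the temporary really is unnecessary, the literal case could presumably be emitted directly, avoiding the extra scope and variable:

```
var ARow : row { Price : Money } := row { nil Price };
update ARow set { Price := nil };
```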

Catalog ObjectId Overflow

ObjectId in the catalog currently overflows when a server has been running for an extended period of time. A server reboot fixes the issue.

Docker container for Dataphor

Dataphor doesn't have a published docker container. Having a container would make testing / dev / deployment a bit easier.

feature - more reflection tablevars or operators

Dataphor has its own reflection analogies to the SQL INFORMATION_SCHEMA, in some ways more powerful, and in other ways seemingly very lacking.

A key thing that is lacking is an analogy to the Columns() operator that can be used in concert with System.TableVars.

One can use System.TableVars in a query to get a list of tablevars for a database or Dataphor library whose name is determined at runtime, because one can just select from TableVars where Library_Name matches a string, and process the resulting tablevar names et al that are strings.
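For instance, a query along these lines (library name illustrative) returns tablevar names chosen at runtime:

```
select System.TableVars where Library_Name = "MyLibrary" over { Name };
```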

One can use Columns() with an arbitrary table-valued expression as its input, but one still has to hard-code the source expression in one's source code (even if it is as simple as a tablevar name or table literal), and hence fix it by compile time.

It is not currently possible to take the String name of a tablevar, read at runtime (say, from System.TableVars), and feed it to Columns() to get details about that tablevar.

The fact that Columns() and similar functions can't do the above, or that no corresponding System tablevars exist, is a critical weakness for the ability to write arbitrary D4 code to introspect a database in a meaningful manner.

One can kludge together a solution using say System.Objects.ServerData and parsing the D4 tablevar definition source code, but this is complex and error-prone and users shouldn't have to do that for something which should be built-in.

This ticket is about requesting the missing obvious functionality like the above.

Or alternately it is a documentation request to make it more clear how the functionality already exists.

Thank you.

D4 two-arg Concat() failures

Compare the following D4 expressions:

Exhibit A: select System.TableVars add {Name.Value x} group add {Concat(",", x order by {x}) y}

Exhibit B: select System.TableVars add {Name.Value x} group add {Concat(x order by {x}, ",") y}

The first version will produce output like ,Foo,Bar,Baz, while the second version will fail to parse with the error message Application:102104 ---> ")" expected. Syntax error near ","..

Normally I would expect a two-argument form of Concat in which the non-list argument specifies the delimiter between elements of the list, but here it seems either the reverse has happened, or leading and trailing commas have also been added.

The behaviour is counter-intuitive either way. For Exhibit B, I would expect not a syntax error but an error citing no matching operator signature, if that is the case. For Exhibit A, I would expect no leading or trailing comma.

D4 parser internal failure - whitespace sensitivity

The D4 parser as seen from Dataphoria is sensitive to whitespace differences where it shouldn't be.

create operator Foo() : Integer begin result := 42; end;

If this is executed (all on one line as given), the parser dies with an internal exception:

ArgumentOutOfRangeException --->
Index and length must refer to a location within the string.
Parameter name: length
---- Stack Trace ----
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Alphora.Dataphor.DAE.Debug.SourceUtility.CopySection(String script, LineInfo lineInfo) ...

Whereas, if a line break is added before the "end;", the statement parses and the operator is created successfully (and outputs 42 when invoked).

The fix is to make the parser not whitespace sensitive when it shouldn't be.

Fix unit tests

The unit tests use an older version of NUnit that is not supported in VS 2017.

Change tracking support in SQL devices

We have implemented change tracking operators in MSSQL device following the MS SQL Server change tracking support:

  1. CHANGETABLE_CHANGES(table,Long): table (maps to CHANGETABLE(CHANGES...) operator in SQL Server): returns some change tracking information, keyed by the clustering key of the table
  2. CHANGE_TRACKING_MIN_VALID_VERSION(table): Long (maps to operator of the same name in SQL Server): returns the minimal version among all rows in the table
  3. CHANGE_TRACKING_CURRENT_VERSION() (maps to the operator of the same name in SQL Server); this function, when added to a table which has change tracking enabled, will yield the version of every row
    (We haven't implemented CHANGETABLE(VERSION ...) at the moment.)

Refer to: https://technet.microsoft.com/en-us/library/cc280358%28v=sql.105%29.aspx
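For reference, per the linked documentation, enabling SQL Server change tracking requires DDL along these lines on the target database and tables (database/table names illustrative), which is what a reconciliation step would need to emit:

```sql
-- Enable change tracking at the database level.
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Enable change tracking for each tracked table.
ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- Read changes since a previously saved synchronization version.
DECLARE @last_sync_version BIGINT = 0;
SELECT CT.*
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS CT;
```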

A few questions:

  1. Is anybody interested in change tracking for other databases?
  2. Perhaps a unified change tracking API needs to be added to D4? For instance, a new tag Storage.ChangeTrackingType could be used to emit additional DDL statements during reconciliation (for SQL Server change tracking, it is necessary to (a) enable database-wide change tracking and (b) enable change tracking per table; in Postgres, for every tracked table, one has to create insert/update/delete triggers and a tombstone table)
  3. Not sure what it would mean to use change tracking across different devices. Wouldn't that wreak havoc? The timestamps/versions would be incompatible, wouldn't they?

Validating row constraints in adorn node

Dataphor 3.0 only validates column constraints in an adorn node. I guess this is a missing feature.

It would be highly desirable to implement row constraint validation, making it possible to validate, say, custom filters.

Dataphor nuget packages

There aren't NuGet packages available for the core Dataphor components, which makes authoring extensions more difficult, since you have to have the full Dataphor build in order to do it.

feature - attribute names first in row literals

Currently the only literal syntax D4 provides for row values is row { v1 name1, v2 name2, ... } while row type declarations have the format row { name1 : type1, name2 : type2, ... }.

I request a new feature such that D4 also supports having the attribute names on the left in row value literals as well, for example row { name1 : v1, name2 : v2, ... }.

Practically speaking, human-readable data formats and source code representing name-value pairs are much easier to read when the names come first (on the left), especially when there are multiple levels of nesting.

This feature would gain consistency with a majority of other languages and formats, including JSON and XML and C# and many others whose syntax for name-value pairs has the name first.

This feature would also boost internal consistency for D4 itself, such that the name comes first both in row literals and in row type declarations.

It appears that D4 already has a pattern to distinguish names on the left vs on the right, such that a colon is between the pair when it is on the left, while there is no colon when the name is on the right.

Given that, AFAIK, there isn't a lexical context where both a row literal and a type declaration could appear, context should tell the parser whether a row { foo : bar } is a row literal or a row type declaration. If that isn't the case, we could devise another syntax to avoid ambiguity, but the key feature is that the name appears on the left.

Optionally, other areas of syntax that have the name on the right could also gain an alternative where it is on the left, but the row literals are the most important.

Thank you in advance.

feature - trailing commas in lists

Currently D4 is strict about the format of all comma-lists such that they may only appear between list items and may not appear before the first item or after the last item.

This is a request to relax that restriction so that commas may also appear after the last item in a delimited comma-list; the commas would then exist to disambiguate where list items start and end, rather than to indicate the number of items.

For example, this should be valid syntax:

table {
  row {
    42 x,
    "hello" y,
  },
  row {
    -3 x,
    "world" y,
  },
}

A key idea here is that all list items have the same format, rather than the last one having a different format due to forced lack of a trailing comma. This makes writing code manually a lot less error prone as one can easily add or reorder list items without having to make special exceptions.

A further benefit is that code generators can have simpler logic by not having to deal with exceptions of the last list item being different, and can simply put a comma after or before every list item, period.

The rules are also more consistent with semicolon-separated lists.

Examples of common languages already supporting the wider format I indicated include C# (for some kinds of comma-lists) and Perl.

Thank you in advance.
