goldmansachs / obevo

Obevo is a database deployment tool that handles enterprise scale schemas and complexity

License: Apache License 2.0

Java 78.82% SQLPL 0.15% PLSQL 0.34% PLpgSQL 1.54% FreeMarker 0.36% Batchfile 0.07% Shell 0.59% PowerShell 0.12% JavaScript 0.14% Kotlin 12.16% TSQL 5.71%

java database database-migrations deployment in-memory-database sybase stored-procedures maintenance database-management database-schema

obevo's Introduction

Obevo

Database Deployment Tool for Enterprise Scale and Complexity

Deploying tables for a new application?

Or looking to improve the DB Deployment of a years-old system with hundreds (or thousands) of tables, views, stored procedures, and other objects?

Obevo has your use case covered.

Supported platforms: DB2, H2, HSQLDB, Microsoft SQL Server, MongoDB, Oracle, PostgreSQL, Redshift (from Amazon), Sybase ASE, Sybase IQ

NoSQL (MongoDB) platforms also supported!

Obevo can be used for more than just relational databases.

MongoDB support is now available for MongoDB versions 4.1.x and greater (i.e. without relying on the deprecated "eval" API).

This is the first non-RDBMS platform supported, which shows that the object-based management pattern described here can be applicable elsewhere.

Feel free to reach out to us if you'd like to apply our core algorithm onto a new use case.

Quick Links

Getting Started
Documentation
Quickstart Examples (Setup a new database or Onboard an existing database)
NY JavaSIG Presentation - Slides with animations (preferred)
NY JavaSIG Presentation - PDF
InfoQ Publication - Introducing Obevo: Get Your Database SDLC under Control

APIs	Build Integration
Java API	Maven
Command Line API	Gradle

Docker image is available at Docker Hub: shantstepanian/obevo

How to Contribute?

Please see the Contribution Guide.

Why Use Obevo?

Organized maintenance of all your DB object files to handle all use cases

Obevo lets you maintain your DB files per DB object (as you would with classes in application code). This makes DB file maintenance much easier than with DB Deployment tools that require a new file or change definition per migration:

Changes for a particular table can be reviewed in one place
Stateless objects like stored procedures and views can be edited in place without specifying any new incremental change files
All of this is possible without having to define the complete order of your file deployments; Obevo figures it out for you (a la a Java compiler compiling classes)
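
The ordering idea in the last point can be illustrated with a topological sort over object dependencies. This is only a minimal sketch, not Obevo's actual algorithm: the class `DeployOrderSketch`, its `order` method, and the assumption that dependencies are already available as a map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the Obevo code base.
final class DeployOrderSketch {

    /**
     * Returns the objects ordered so that every object appears after the objects it depends on.
     * Each key of dependsOn is an object name; its value lists the objects it needs.
     */
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String name : dependsOn.keySet()) {
            visit(name, dependsOn, done, inProgress, result);
        }
        return result;
    }

    // Depth-first visit: dependencies are emitted before the object that needs them.
    private static void visit(String name, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle involving " + name);
        }
        for (String dep : dependsOn.getOrDefault(name, List.of())) {
            visit(dep, dependsOn, done, inProgress, result);
        }
        inProgress.remove(name);
        done.add(name);
        result.add(name);
    }
}
```

For example, if a view depends on a table, `order` places the table before the view regardless of the order in which the two were declared.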

Click here for more information on how Obevo works and how its algorithm compares to what most other DB Deployment tools do

In-memory and integration testing

How do you test your DDLs before deploying to production?

Obevo provides utilities and build plugins to clean and rebuild your databases so that you can integrate that step into your integration testing lifecycle.

Obevo can take that a step further by converting your DB table and view code into an in-memory database compatible format that you can use in your tests. The conversion is done at runtime, so you do not have to maintain separate DDLs just for in-memory testing.

Easy onboarding of existing systems

Hesitant about getting your existing database schema under SDLC control due to how many objects your application has built up over the years? Obevo has been vetted against many such cases from applications in large enterprises.

Versatile to run

Obevo can be invoked via the Java API, the Command Line API, Maven, or Gradle.

Obevo is designed to allow your DB code to be packaged and deployed alongside your application binaries.

Treat your DB code like you would treat your application code!

Acquiring Obevo

obevo's Issues

Ensure that DB-specific SQLs are logged to info for better client debugging

E.g.

DB2 post deploy actions

Do a search for more such examples in the code

Add entries for pom and tests classifiers in the BOM

Fix DB2 reorg exception detection

Default work folder for command line should write to ${tempdir}/${userid}/.obevo ...

In Linux, we currently use ${tempdir}/.obevo

This causes problems if multiple users try deployments due to folder permissions

We should add userId to the temp folder path to avoid permission issues
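
A minimal sketch of how such a per-user folder could be resolved, assuming the standard JVM system properties `java.io.tmpdir` and `user.name` stand in for `${tempdir}` and `${userid}`. The class `WorkDirSketch` and its methods are hypothetical names for illustration, not Obevo's implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the class and method names are illustrative, not Obevo's API.
final class WorkDirSketch {

    /** Builds <tempDir>/<userId>/.obevo so that each user gets a separate work folder. */
    static Path workDir(String tempDir, String userId) {
        return Paths.get(tempDir, userId, ".obevo");
    }

    /** Resolves the folder from the standard JVM system properties (an assumption of this sketch). */
    static Path defaultWorkDir() {
        return workDir(System.getProperty("java.io.tmpdir"), System.getProperty("user.name"));
    }
}
```

Because the user ID is part of the path, two users deploying from the same machine would no longer share (and contend for permissions on) the same folder.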

Allow users to specify "setup schema" scripts that will be invoked during the SetupEnvironment command

Notes:

use the /usertype type to define SQLs to create these types.
Set includePlatforms to H2 / HSQL / etc to only have those run for those environments
Contribute a pull request back to Obevo to add this to the common code

See additional requirements in #140 and #142

Support Synonym object types (start w/ Oracle as a test)

Add MySQL Support

Ensure that test sources are also uploaded to Maven Central

HSQLDB reverse engineering

Add a "sync-to-source" command to clean up all objects not defined in your source code

This is meant as a convenience utility for users to clean up their existing databases of unwanted objects

Support Obevo-specific comments that won't get passed down to SQL?

Teams may want to add comments to their Obevo code that won't get passed to the underlying SQL execution

I'm open to suggestions here; one idea is something similar to XML or FreeMarker comments, e.g. <!-- or <#--, but maybe with a different marker
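
To make the idea concrete, here is a minimal sketch that strips tool-only comments before the SQL is executed. It assumes the FreeMarker-style `<#-- ... -->` marker purely for illustration, since the issue leaves the marker undecided; the class `ToolCommentSketch` and its method are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the marker below is an assumption, not a decided Obevo syntax.
final class ToolCommentSketch {

    // DOTALL lets a comment span several lines; the reluctant quantifier stops at the first "-->".
    private static final Pattern TOOL_COMMENT = Pattern.compile("<#--.*?-->", Pattern.DOTALL);

    /** Removes tool-only comments and leaves regular SQL comments untouched. */
    static String stripToolComments(String content) {
        return TOOL_COMMENT.matcher(content).replaceAll("");
    }
}
```

Regular SQL comments (`--` or `/* */`) are not matched, so they would still be passed down to the database.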

Clearer error message for when the type provided in system-config.xml doesn't exist

Allow log level to be configured at command line and document the arg

Support a disk-based merge sort (or bulk loading options) for large CSV file loads

Note - allow the JDBC batch size to be controlled

Support "create or replace" paradigm so that we don't automatically require dropping rerunnable objects

RerunnableDbChangeTypeBehavior.deploy() - we drop the object automatically.
Suggestion on how to do this (but I'm open to alternatives): We should have a flag to detect if the drop is needed. Should be on the Change object - note that this class is in the obevo-core module, so it should be named db agnostic. We'd then have to update RerunnableChangeParser to fill in that value
RerunnableChangeTypeCommandCalculator - that is where we parse the dependency tree and re-execute the dependent changes. I believe we may not need that if we have "create or replace"

Test Case (in DB2):
CREATE TABLE MYTABLE(MYCOL VARCHAR(20))
GO
INSERT INTO MYTABLE (MYCOL) VALUES ('ABCD')
GO
CREATE OR REPLACE VIEW MYVIEW AS SELECT * FROM MYTABLE
GO
CREATE OR REPLACE VIEW MYVIEW_DEPENDENT AS SELECT * FROM MYVIEW
GO

// another schema - worth trying to detect this automatically
CREATE OR REPLACE VIEW MYVIEW_DEP_DIFF_SCHEMA AS SELECT * FROM MYVIEW
GO
drop view MYVIEW
go
CREATE OR REPLACE VIEW MYVIEW AS SELECT * FROM MYTABLE
GO

// this will show some invalid views
SELECT VIEWNAME, VIEWSCHEMA, VALID FROM SYSCAT.VIEWS WHERE VIEWNAME like 'MYVIEW%'
go

// however, if AUTO_REVAL = DEFERRED, then it will get fixed when we select from it
// db2 get db cfg for MYSERVER01
select * from MYVIEW_DEPENDENT
GO
select * from MYVIEW_DEPENDENT
GO
select * from MYVIEW_DEP_DIFF_SCHEMA

Allow for scripts to select data and return resultset to console output for logging purposes

Example: to keep track of purging rows from a table that might take a while. We should think about whether to enable this by default for all cases

Consider doing this alongside #59

/* CREATE TABLE */
CREATE TABLE album(
  Id int identity,
  title VARCHAR(100),
  artist VARCHAR(100)
)

/* POPULATE TABLE in 1 minute with 100k rows */
declare @mycount int
select @mycount=1
while  (@mycount < 100000)
begin
insert into album values ('title_'+ str (@mycount) ,'artist_'+ str(@mycount))
select @mycount=@mycount+1
end

/* PURGE TABLE WHERE id < 50000 in 10 batches of 5000*/
declare @rows int
declare @RetCode int
select @rows=1, @RetCode=0
SET ROWCOUNT 5000
WHILE (@rows !=0 and @RetCode=0)
BEGIN
        delete album where Id <= 50000
        select @rows=@@rowcount, @RetCode=@@error
        select 'Album Purge', getdate(), @rows, @RetCode
END

(5000 rows affected)
(1 row affected)                                                                     
 ----------- ------------------------------- ----------- ----------- 
 Album Purge             Jul 20 2017  4:20PM        5000           0 

(1 row affected)
(5000 rows affected)
(1 row affected)                                                                    
 ----------- ------------------------------- ----------- ----------- 
 Album Purge             Jul 20 2017  4:20PM        5000           0 

(1 row affected)
(5000 rows affected)
(1 row affected)                                                                  
 ----------- ------------------------------- ----------- ----------- 
 Album Purge             Jul 20 2017  4:20PM        5000           0 

(1 row affected)
(5000 rows affected)
(1 row affected)                                    
 ----------- ------------------------------- ----------- ----------- 
 Album Purge             Jul 20 2017  4:20PM        5000           0 

(1 row affected)
(5000 rows affected)
(1 row affected)
----------- ------------------------------- ----------- ----------- 
 Album Purge             Jul 20 2017  4:20PM        5000           0

Print a clearer error message at setup time if a schema does not exist in the DB

If a deployment is attempted against a non-existent schema in a DB, we will get an NPE like below

We should instead print out a nicer error message indicating that the schema is missing (if the DBMS type doesn't support creating the schema in the first place)

	at com.gs.obevo.dbmetadata.impl.DbMetadataManagerImpl.getTableInfo(DbMetadataManagerImpl.java:259)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao.getTable(SameSchemaDeployExecutionDao.java:277)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao.isDaoInitialized(SameSchemaDeployExecutionDao.java:256)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao.getDeployExecutions(SameSchemaDeployExecutionDao.java:381)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao.access$600(SameSchemaDeployExecutionDao.java:68)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao$5.safeValueOf(SameSchemaDeployExecutionDao.java:349)
	at com.gs.obevo.db.impl.core.changeauditdao.SameSchemaDeployExecutionDao$5.safeValueOf(SameSchemaDeployExecutionDao.java:346)
	at com.gs.obevo.db.impl.platforms.AbstractSqlExecutor.executeWithinContext(AbstractSqlExecutor.java:76)

Replace obevo-internal-comparer module with Tablasco

Leverage the newly-released Tablasco project for the table comparisons and retire the redundant obevo-internal-comparer code

Set failOnSetupException to false by default and make it opt-in. Include switch in Maven API too

6.3.0 documentation updates

Add a comprehensive list of object types to the doc, e.g. trigger, package, synonym, ...

Correct the FAQ link and add items as needed

Oracle - support package body as a separate change deployment within a package

Provide a clearer error message on static data loads to signal if folks are in CSV mode

Use case:

Teams write sql statements as "insert mytable", which does not fit the current check using "insert into"

A better error message should be given for such use cases to hint that the file format expects CSV

We also may want to feature flag this in the future to only allow .csv to be loaded as CSV for new clients
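
One way to catch this case is a check that treats "into" as optional. The following is a minimal sketch under that assumption; `StaticDataFormatSketch` and `looksLikeSql` are hypothetical names, and the pattern is not the check that Obevo currently uses.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a more lenient SQL-vs-CSV check.
final class StaticDataFormatSketch {

    // Matches "insert into mytable" as well as the shorter "insert mytable" form,
    // at the start of any line and regardless of case.
    private static final Pattern INSERT_STATEMENT = Pattern.compile(
            "^\\s*insert\\s+(?:into\\s+)?\\w+",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns true if the file content appears to contain an INSERT statement. */
    static boolean looksLikeSql(String content) {
        return INSERT_STATEMENT.matcher(content).find();
    }
}
```

A file that fails CSV parsing but passes `looksLikeSql` could then trigger an error message hinting that the file was read as CSV.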

Document the INACTIVE functionality and/or clarify/remove references from the kata

Capture metrics on certain operations

Search for GITHUB#6 in the code for places where we can do this

Detect changes in permission schemes and apply those changes to DB object

Currently, the permission scheme is only applied to objects when they are created. If a scheme is modified, it won't affect existing objects

It would be nice to detect changes in the scheme and to apply them on all existing objects

In addition, a "sync grants" functionality would be useful to go w/ this

MS SQL implementation does not retrieve all overloads of SPs

Search GITHUB#7 in code for examples

Update 2017-11-18: A few thoughts

sp_helptext: Will return the value for an overloaded procedure, but this is not easily queryable
sql_modules or INFORMATION_SCHEMA is the preference according to SQL Server docs; however, it does not work for overloads

select object_name(object_id), definition
from sys.sql_modules as m
join sys.procedures as p
on m.object_id = p.object_id

syscomments does work (similar to Sybase ASE); however, we'd need to update the AbstractDbMetadataDialect class and add a method like searchExtraRoutineInfo (akin to searchExtraViewInfo)

select obj.name name, com.number number, obj.type, com.texttype
--, colid2 colid2
, colid colid, text text
from dbdeploy03..syscomments com
, dbdeploy03..sysobjects obj
, dbdeploy03..sysusers sch
where com.id = obj.id
and obj.uid = sch.uid and sch.name = 'dbo'
and obj.type in ('P','FN')
and com.texttype = 0
order by com.id, number
--, colid2
, colid

Reduce log level for schemacrawler.utility.TypeMap

Sybase prints logs on this line:
schemacrawler.utility.TypeMap.(TypeMap.java:116)

Let's ignore (or consider moving schemacrawler logging to error only)

DROP_TABLE bug fixes

See H2DeployerTest
The re-deploy of step1 should not cause the DROP_TABLE to be re-deployed again

Longer explanation:
Table has 3 changes, the last one being to drop_table
//// CHANGE name=drop_table DROP_TABLE

Upon re-deploy, the entire file is being redeployed

Allow multiple post-deploy executions to be configured and enabled/disabled individually

e.g. the current Db2PostDeployAction bundles a number of steps together that can ideally be separated

Add includeDependencies attribute to TableChangeParser

It was missing from that class, which already has dependencies and excludeDependencies

Add argument to reverse engineering to allow exclusion of certain objects

Implementation:

mimic the MainDeployerArgs.changeInclusionPredicate field and add it to AquaRevengArgs
Use this in RevengWriter

Renaming of concepts in Obevo

Some concepts in Obevo should be renamed to be more compatible with how most DBMSs name things. We will have to do this in a backwards-compatible manner

e.g.
schema -> rename this to catalog, as some DBMSs have schemas within catalogs (e.g. SQL Server, Sybase)

dbServer in DbEnvironment -> should be a dbName instead

6.3.0 documentation corrections

Java API page - the BOM example has the wrong artifact

Add sample "large" project to demo the capabilities of Obevo

i.e. hundreds of objects, many object types

Exclude org.postgresql:postgresql:42.0.0 and com.microsoft.sqlserver:mssql-jdbc:6.1.0.jre8 from schemacrawler deps to avoid downstream conflicts

Confirm if a blank // ROLLBACK section is allowable, and document it if so

Allow users to define certain SQLs that will always be executed on each deploy

We can do this at the environment level.

Optionally, we can define certain hooks that lets this happen at an object-level too

ROLLBACKCONTENT field for audit table in DB2 should use a datatype that allows unlimited variable length

Test Case:

Create extremely long rollback content in DB2 (potentially all deployments) and try it out in deployment

Update build.md and contributing.md files

contributing.md still has remnants from Reladomo, e.g. on the code style

build.md and changelog.md generally need review; changelog.md may not be needed

Support Reverse Engineering using Oracle SQL Developer Data Modeler

TODOs:

Install Oracle SQL Developer Data Modeler
Try reverse-engineering with it against our sample DB

obevo-maven mojos should easily be extensible by clients for their own distributions

Upgrade extra-enforcer-rules to latest version

Add MariaDB Support

Allow templatized objects to read in token parameters from the environment

e.g.
in system-config.xml

<dbEnvironment>
    <tokens>
        <token key="mytok" value="abcdefgh" />
    </tokens>
</dbEnvironment>

and in the file

//// METADATA templateParams="${mytok}"

Use cases to test out:

Being able to define a different # of objects in each environment for the template params
Reusing the same token across multiple environments
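
The substitution step itself could look like the sketch below, which replaces `${name}` placeholders with values from an environment's token map. This is an assumption-laden illustration: `TokenSketch` and `resolve` are hypothetical names, and leaving unknown tokens untouched is a design choice of this sketch, not documented Obevo behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of resolving ${token} placeholders from environment tokens.
final class TokenSketch {

    private static final Pattern TOKEN = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${key} with its value; unknown keys are left as they are. */
    static String resolve(String text, Map<String, String> tokens) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = tokens.get(m.group(1));
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling `resolve` once per environment with that environment's token map would let the same file yield different template parameters in each environment.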

Move build to use Trusty environment in Travis

https://blog.travis-ci.com/2017-07-11-trusty-as-default-linux-is-coming?utm_source=web&utm_medium=banner&&utm_campaign=trusty-default

We switched to precise as it seemed like Trusty was not picking up Java 7 for the build - https://travis-ci.org/goldmansachs/obevo/builds/260209787

Add FAQ section incl. the error message on bad CSV files

Folks may get confused on why such errors occur if they put regular SQLs in the /data folder. Could use some clarification (or simply a fix):

[ERROR] MainDeployer [main] [7-14 13:47:20] - This change failed: Object [mytable]; ChangeName [n/a]; Type [STATICDATA]; LogicalSchema [abc]; PhysicalSchema [abc]; File [mytable.sql]
[ERROR] MainDeployer [main] [7-14 13:47:20] - From exception: java.lang.IllegalStateException: Cannot have group breaks or any breaks other than Field or DataObject - is your primary key defined correctly? com.gs.obevocomparer.compare.breaks.GroupBreak@2ee3065e
at com.gs.obevo.db.impl.core.changetypes.CsvStaticDataDeployer.parseReconChanges(CsvStaticDataDeployer.java:304)
at com.gs.obevo.db.impl.core.changetypes.CsvStaticDataDeployer.getStaticDataChangesForTable(CsvStaticDataDeployer.java:189)
at com.gs.obevo.db.impl.core.changetypes.CsvStaticDataDeployer$1.valueOf(CsvStaticDataDeployer.java:117)

Define a distribution-wide property for the default encoding. Default Obevo distribution will use UTF-8
Allow a "system" encoding as well to keep default property
Define a property in system-config.xml to specify encodings at that level
Optionally, define this at file level; exact way to specify this is yet to be determined

Test it on these example characters with static data loading tests
Ãöé
\u2013
Japanese characters

Where the first character is '\uFEFF' 65279 (i.e. the BOM byte order mark - http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html)
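
For the BOM case, note that Java's UTF-8 decoder keeps the byte order mark as a leading U+FEFF character, so it has to be removed explicitly. A minimal sketch follows; `BomSketch` and its methods are hypothetical names and not part of Obevo.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of handling a UTF-8 file that starts with a byte order mark.
final class BomSketch {

    private static final char BOM = '\uFEFF';

    /** Drops a leading byte order mark (code point 65279) if present. */
    static String stripBom(String text) {
        return !text.isEmpty() && text.charAt(0) == BOM ? text.substring(1) : text;
    }

    /** Decodes the bytes as UTF-8 and removes the byte order mark that the decoder leaves in place. */
    static String decodeUtf8(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Without this step, the first CSV column header would carry an invisible extra character and would not match the expected column name.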

Ensure that dependency detection works if schema is prefixed in front of object

Example:

-- file 1 - VIEW1.sql
... some sql code ...

-- file 2 - DEPENDENT_VIEW.sql
SELECT *
FROM myschema${dbSchemaSuffix}.VIEW1 as myview

A user found a case where this dependency was not being calculated correctly, so VIEW1.sql was not deployed first. Let's ensure that this case deploys in the correct order
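
One simple way to make detection robust to a schema prefix is to split the SQL into identifier tokens and compare each token against the known object names. The sketch below shows that token-matching approach only; it is not Obevo's actual detection logic, and `DependencyScanSketch` and `findDependencies` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of token-based dependency detection.
final class DependencyScanSketch {

    /**
     * Returns the known object names (upper-cased) that appear as identifier tokens in the SQL.
     * Splitting on non-identifier characters separates "VIEW1" from a prefix such as
     * "myschema${dbSchemaSuffix}.".
     */
    static Set<String> findDependencies(String sql, Set<String> knownObjects) {
        Set<String> upperKnown = new LinkedHashSet<>();
        for (String name : knownObjects) {
            upperKnown.add(name.toUpperCase(Locale.ROOT));
        }
        Set<String> found = new LinkedHashSet<>();
        for (String token : sql.split("[^A-Za-z0-9_]+")) {
            String upper = token.toUpperCase(Locale.ROOT);
            if (upperKnown.contains(upper)) {
                found.add(upper);
            }
        }
        return found;
    }
}
```

Applied to the DEPENDENT_VIEW.sql example above with VIEW1 among the known objects, this would report VIEW1 as a dependency even though it is prefixed by the schema and the suffix token.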