azure-samples / smartbulkcopy

High-Speed Bulk Copy tool to move data from one Azure SQL / SQL Server database to another. Smartly uses logical or physical partitions to maximize speed.

License: MIT License

C# 93.28% TSQL 4.90% Dockerfile 0.23% Shell 1.60%
azure-sql-database sql-server bulk-copy azure-sql-server

smartbulkcopy's Issues

Improve TABLOCK usage

Right now TABLOCK is always used, but if the destination table has indexes, this will cause massive lock contention.
The absence of indexes should be checked before copying to make sure TABLOCK can be used effectively.
If indexes are detected, the user should be warned that the best practice is to drop them before copying.

Create logical partition by size

Aside from supporting the option to specify how many logical partitions should be created, also add the ability to specify the size you want the logical partitions to have: the number of partitions will then be calculated automatically.
For example, an 800GB table with 10GB logical partitions will generate 80 partitions. This is much better for a VLDB, as less work has to be redone when a connection needs to be recovered after a disconnection.
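
The arithmetic in the example above can be sketched as follows. This is only an illustration in Python, not the tool's implementation; the function name `logical_partition_count` and the choice to round up (so that a final, smaller partition covers the remainder) are assumptions.

```python
import math


def logical_partition_count(table_size_gb: float, partition_size_gb: float) -> int:
    """Return how many logical partitions of the requested size cover the table.

    Hypothetical helper: rounds up so the remainder gets its own (smaller) partition,
    and always returns at least one partition.
    """
    if table_size_gb < 0 or partition_size_gb <= 0:
        raise ValueError("table size must be >= 0 and partition size must be > 0")
    return max(1, math.ceil(table_size_gb / partition_size_gb))


print(logical_partition_count(800, 10))  # 80, matching the example in this issue
```

With smaller partitions, a dropped connection only forces the partition in flight to be copied again, which is the recovery benefit described above.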

SmartBulkCopy was fast only initially but slowed down for last heavy tables

I am trying to move ~1TB of data from one SQL Server to another. The database has many tables, including 10 heavy tables (more than 700GB in size). I set the config file to 2 threads initially, then later to 8 and then 12 threads. I observed that SmartBulkCopy (SBC) ran quickly at first, with good network speed and all threads in use, and then slowly degraded once most of the small to medium sized tables were finished and ~5 heavy tables were left; the network speed dropped too. In the end it took ~4 hours to migrate ~1TB of data. Why is this happening and how can we speed up the process?
What are your recommendations for migrating heavy tables?
How can we ensure the parallel threads use up the available network speed?

FYI, we disabled foreign key indexes and dropped non-clustered indexes to speed things up.

Add support for computed columns

If computed columns are present, an error is generated:

2019/09/20 17:44:06.317|ERROR> Task 7: The column "IsFinalized" cannot be modified because it is either a computed column or is the result of a UNION operator.
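
One possible approach, sketched here in Python, is to leave computed columns out of the column list used for the copy, since the destination table derives them itself. The `(name, is_computed)` metadata shape is an assumption modeled on the `is_computed` flag in `sys.columns`, and every column name other than `IsFinalized` is made up for illustration.

```python
from typing import List, Tuple


def copyable_columns(columns: List[Tuple[str, bool]]) -> List[str]:
    """Keep only the columns that can be written to, preserving their order.

    Each entry is (column_name, is_computed); computed columns are skipped.
    """
    return [name for name, is_computed in columns if not is_computed]


# Hypothetical table metadata; only IsFinalized comes from the error above.
metadata = [("Id", False), ("Amount", False), ("IsFinalized", True)]
print(copyable_columns(metadata))  # ['Id', 'Amount']
```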

Copying Large Tables Generates Blocking

I am using Smart Bulk Copy to copy a 900GB database from Azure SQL DB to Managed Instance. The MI has 40 vCores and the destination database has a 257GB transaction log. I am running Smart Bulk Copy with 32 tasks and a batch size of 100000. When the process started I was getting ~32 mb/s log transfer. It is now copying 2 tables that are 25GB and 276GB used, and the log transfer rate is hovering around 1 mb/s. When I check blocking, the 276GB table has 22 bulk copy sessions running and is waiting on ASYNC_NETWORK_IO. The 25GB table has 10 bulk copy sessions running, and 1 is blocking the other 9 with LCK_M_X. Is there anything I can do to reduce blocking and speed up the transfer rate?

step by step guide

Can you please provide a step-by-step guide for migrating Azure SQL Elastic Pool databases / Azure SQL VM to Azure SQL Managed Instance using SQL Bulk Copy? I find the given documentation confusing and complex, as I am not from a development background.

Nuget Package

Would it be possible to create a NuGet package for this code? We want to integrate it into our existing processing server stack, so Docker is out of the question for the time being :/

Seems like a great utility and you guys have thought more about this problem than I could -- I'm hesitant to copy / paste the code and then not be able to receive updates if bugs come up, etc.

If you are open to contributions, I would be happy to personally assist in this effort.

Supporting VM Identity or Azure Authentication for Always Encrypted Database scenario

We are currently migrating Azure SQL Databases to Azure SQL Managed Instances for various reasons;
other techniques are too slow to migrate a database.

But how can I access Always Encrypted data when only connection strings are supported and I cannot use the identity of the VM on which I'm running?

Or maybe add the possibility to specify an AppId / AppSecret in the config file to support that kind of scenario.

Add an option to copy identity seed

It should be possible to ask Smart Bulk Copy to update the target table, once the copy is finished, so that IDENTITY will generate the same values as in the source table.
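
A minimal sketch of what such an option could do, assuming the current identity value is read on the source with `IDENT_CURRENT` and then applied on the target with `DBCC CHECKIDENT (..., RESEED, ...)`. The helper name `reseed_statement` and the table used in the example are hypothetical; this Python code only builds the T-SQL text and does not connect to a database.

```python
def reseed_statement(schema: str, table: str, current_identity: int) -> str:
    """Build a DBCC CHECKIDENT statement that reseeds the target table.

    current_identity is assumed to be the value returned by
    SELECT IDENT_CURRENT('schema.table') on the source database.
    """
    # Quote identifiers with brackets, doubling any closing bracket inside the name.
    quoted = ".".join("[" + part.replace("]", "]]") + "]" for part in (schema, table))
    # The table name is passed as a string literal, so single quotes are doubled.
    literal = quoted.replace("'", "''")
    return f"DBCC CHECKIDENT ('{literal}', RESEED, {int(current_identity)});"


print(reseed_statement("dbo", "Orders", 1500))
# DBCC CHECKIDENT ('[dbo].[Orders]', RESEED, 1500);
```

On a table that already contains rows, the next generated value after such a reseed is the given value plus the identity increment, which is what the source table would generate next.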

Add support to readonly database option

Besides reading from a snapshot, as another option it should be possible to check that the source database is in read-only mode. If it is not, there could be two outcomes:

  1. stop the application
  2. set the db in read-only mode

Cannot bulk load. The bulk data stream was incorrectly specified as sorted or the data violates a uniqueness constraint imposed by the target table.

I consistently cannot get this to work on one of my large partitioned tables. I created the target schema (partition function and scheme) by scripting the source, so it's identical.

However, when running the copy, I get the following error:
Cannot bulk load. The bulk data stream was incorrectly specified as sorted or the data violates a uniqueness constraint imposed by the target table. Sort order incorrect for the following two rows: primary key of first row: (2016-11-06, 8406347, 20), primary key of second row: (2016-11-02, 8406348, 1)

Everything looks good from the INFO logs:

Source and destination tables have compatible partitioning logic. Parallel load available
Partition By: Date
Order By: TransactionId,CategoryId
Source and destination clustered rowstore index have same ordering. Enabling ORDER hint.
Parallel load will use 18 physical partition(s)
Analysis result: usePartioning=True, partitionType=Physical, orderHintType=ClusteredIndex

Any ideas ?
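
One way to read the error, offered as a hypothesis rather than a confirmed diagnosis: the two rows in the message are correctly ordered by the logged `Order By` columns (TransactionId, CategoryId), but not by a full key of (Date, TransactionId, CategoryId). The short Python check below assumes the primary key has that three-column layout, as the tuples in the error message suggest.

```python
from datetime import date

# The two rows quoted in the error message, as (Date, TransactionId, CategoryId).
first = (date(2016, 11, 6), 8406347, 20)
second = (date(2016, 11, 2), 8406348, 1)


def in_order(a, b, key):
    """Return True if row a sorts before or equal to row b under the given key."""
    return key(a) <= key(b)


def by_logged_order(row):
    """Ordering reported in the INFO log: TransactionId, CategoryId."""
    return (row[1], row[2])


def by_full_key(row):
    """Assumed full key: Date, TransactionId, CategoryId."""
    return row


print(in_order(first, second, by_logged_order))  # True
print(in_order(first, second, by_full_key))      # False
```

If that assumption holds, a stream sorted only by TransactionId and CategoryId would not satisfy an order hint that also includes Date, which would be consistent with the message.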

Use Clustered Index for logical partitioning

If a source table has a clustered index, it should be used for logical partitioning, as a much cheaper alternative to %%PhysLoc%%, which cannot be pushed into the Storage Engine.
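
As a rough illustration of the idea, the Python sketch below splits an integer leading key of a clustered index into contiguous ranges and turns each range into a predicate that one copy task could use. The helper names, the `TransactionId` column, and the integer-key assumption are all hypothetical; this is not how the tool is known to implement it.

```python
from typing import List, Tuple


def key_ranges(min_key: int, max_key: int, partitions: int) -> List[Tuple[int, int]]:
    """Split [min_key, max_key] into contiguous half-open ranges [lo, hi)."""
    if partitions < 1 or max_key < min_key:
        raise ValueError("need partitions >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    partitions = min(partitions, total)  # never create empty ranges
    base, extra = divmod(total, partitions)
    ranges, start = [], min_key
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size))
        start += size
    return ranges


def range_predicates(column: str, ranges: List[Tuple[int, int]]) -> List[str]:
    """Build one WHERE predicate per range on the leading clustered index column."""
    return [f"[{column}] >= {lo} AND [{column}] < {hi}" for lo, hi in ranges]


for predicate in range_predicates("TransactionId", key_ranges(1, 10, 3)):
    print(predicate)
# [TransactionId] >= 1 AND [TransactionId] < 5
# [TransactionId] >= 5 AND [TransactionId] < 8
# [TransactionId] >= 8 AND [TransactionId] < 11
```

Equal key ranges do not guarantee equal row counts when the key values are unevenly distributed, so a real implementation would probably want to consult the index statistics.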

Add staging support

If target table already exists, create a copy of the table, without any data, into a dedicated staging schema (configurable via config file), with no indexes and constraints.

.NET Core 3.1 is out of support

I am getting the below warning while running the dotnet build command.

warning NETSDK1138: The target framework 'netcoreapp3.1' is out of support and will not receive security updates in the future. Please refer to https://aka.ms/dotnet-core-support for more information about the support policy. [/root/smartbulkcopy/client/SmartBulkCopy.csproj]

The .NET version needs to be updated to .NET 8.0.
