asdfjkl / yahb
Deduplicating File-Copy/Backup Tool (Commandline)
Home Page: https://github.com/asdfjkl/yahb
License: GNU General Public License v3.0
If, by design or erroneously, one calls yahb twice in the same minute, it treats the backup folder both as the source for the hardlink-similarity check and as the target folder. I guess it would be easy to check for that.
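A minimal sketch of such a check, written in Python for illustration only (yahb itself is a .NET program). It assumes backup folders are named with a 12-digit minute timestamp, as in the paths quoted in other reports here (for example `202004031108`); the function name and the selection rule are hypothetical, not yahb's actual logic.

```python
import re
from typing import Iterable, Optional

# Assumed folder naming: 12 digits, year to minute, e.g. 202004031108.
STAMP = re.compile(r"^\d{12}$")


def pick_previous_backup(existing: Iterable[str], target: str) -> Optional[str]:
    """Return the newest timestamped folder that is not the target itself.

    Excluding the target prevents a second run within the same minute from
    comparing the backup folder against itself.
    """
    candidates = sorted(
        name for name in existing if STAMP.match(name) and name != target
    )
    return candidates[-1] if candidates else None
```

If no other folder exists, the function returns `None`, which a caller could treat the same way as "unable to identify a previous backup location".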
It would be nice if more than one source directory could be backed up at the same time, as in rsyncbackup.vbs:
sourceFolders = Array("BITTE TRAGEN SIE DIE QUELLPFADE IM SKRIPT EIN")
(The placeholder string is German for "Please enter the source paths in the script".)
Hello,
I wanted to try yahb, but it did not really work.
Scenario: In Windows 10 a share from my Synology NAS is mounted as J:, it contains a directory named Backups. (Maybe it's important: Windows 10 runs in a virtual box).
I tried
yahb c:\Users\joerg\Documents j:\Backups /s
Output:
creating list of directories ...
ERR:c:\Users\joerg\Documents\Eigene Videos:Access to the path 'c:\Users\joerg\Documents\Eigene Videos' is denied.
ERR:c:\Users\joerg\Documents\Eigene Musik:Access to the path 'c:\Users\joerg\Documents\Eigene Musik' is denied.
ERR:c:\Users\joerg\Documents\Eigene Bilder:Access to the path 'c:\Users\joerg\Documents\Eigene Bilder' is denied.
creating list of directories ... DONE
creating list of files ...
creating list of files ... DONE
unable to identify a previous backup location, copying all
copying files: [ ] 0%
Unhandled Exception: System.DivideByZeroException: Attempted to divide by zero.
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
Result: The backup directory is created as expected:
j:\Backups\202004031108\c__\Users\joerg\Documents
But this contains only the directories of the source, which are all empty. No files were copied.
Am I making a mistake here? Should I change or check something? Or is there a bug?
I hope my comment helps to improve your project,
regards and good health,
Jörg
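The trace above ends in a DivideByZeroException inside doCopy() right after the 0% progress line, so one plausible cause (an assumption, not confirmed by the source) is a progress calculation dividing by a total of zero when the file list is empty. A guarded calculation could look like this Python sketch; yahb itself is .NET and the function name is hypothetical.

```python
def progress_percent(done: int, total: int) -> int:
    """Return progress as an integer percentage without dividing by zero."""
    if total <= 0:
        # Nothing to copy: report completion instead of raising.
        return 100
    return min(100, done * 100 // total)
```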
The full error message, which appears in the output after about 68% of the backup has completed successfully, is below. There is enough space on the drive.
unable to create hardlink, copying instead
Unhandled Exception: System.IO.IOException: Insufficient system resources exist to complete the requested service.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.__ConsoleStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Write(Char[] buffer, Int32 index, Int32 count)
at System.IO.TextWriter.SyncTextWriter.WriteLine(String value)
at System.Console.WriteLine(String value)
at yahb.Config.addToLog(String message)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
This currently still uses the old setting (i.e. errors are reported only with verbose output); the behaviour should be reversed.
I guess the file should be skipped and the Error logged.
If you should fix this error I would be really thankful for a Win7-Release. (Would be a pity if I couldn't use this very nice solution...)
yahb C:\ K:\YAHB\C /vss /+log:K:\YAHB\C.log /s /xf:*.tmp;tmp;temp
copying files: [#### ] 40% ETR: 03:18:53
Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'C:\Users\All Users\Microsoft\Diagnosis\events00.rbs' is denied.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite, Boolean checkHost)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
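The skip-and-log behaviour suggested in this report can be sketched as follows. This is a Python illustration of the idea only, not yahb's code: `copy_all` and the `ERR:` message layout (modelled on the log lines quoted elsewhere in these reports) are assumptions.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def copy_all(pairs: Iterable[Tuple[Path, Path]]) -> List[str]:
    """Copy each (source, destination) pair and collect errors.

    A failure on one file is recorded and the loop moves on to the next file
    instead of aborting the whole backup.
    """
    errors: List[str] = []
    for src, dst in pairs:
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies contents and basic metadata
        except OSError as exc:  # PermissionError and FileNotFoundError are subclasses
            errors.append(f"ERR:{src}:{exc}")
    return errors
```

The returned list can be written to the log file and used to decide the final exit status.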
There is a problem with (incremental?) backups if the source file is flagged read-only.
Even with admin rights an error is generated. If one removes the read-only flag from the source file, everything is fine:
Unhandled Exception: System.UnauthorizedAccessException: Access to the path 'w:\BACKUP\202002182353\d__\test.pdf' is denied.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize)
at System.IO.File.OpenFile(String path, FileAccess access, SafeFileHandle& handle)
at System.IO.File.SetCreationTimeUtc(String path, DateTime creationTimeUtc)
at System.IO.FileSystemInfo.set_CreationTimeUtc(DateTime value)
at System.IO.FileSystemInfo.set_CreationTime(DateTime value)
at yahb.CopyModule.doCopy()
at yahb.Program.Main(String[] args)
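The trace fails while setting the creation time on the copied file, which suggests (as an inference, not something the report confirms) that the read-only flag had already been carried over to the target when the timestamp was written. One way around this is to set timestamps while the target is still writable and apply the read-only flag last. The Python sketch below illustrates that ordering; the function name is hypothetical, and because Python cannot set a creation time, the modification time stands in for it.

```python
import os
import shutil
import stat


def copy_keep_readonly(src: str, dst: str) -> None:
    """Copy a possibly read-only file, applying its read-only flag last."""
    # Copy contents only, so the new file starts out writable.
    shutil.copyfile(src, dst)
    st = os.stat(src)
    # Set timestamps while the target can still be opened for writing.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    # Only now carry over the permission bits (including read-only).
    os.chmod(dst, stat.S_IMODE(st.st_mode))
```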
If something goes wrong it would be helpful to get back an errorlevel <> 0.
greetings gmlltg
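A non-zero exit status is what a calling batch script would see as the errorlevel. The Python sketch below shows one possible mapping; the specific values (0, 1, 2) and the `run` wrapper are assumptions for illustration, not something yahb defines.

```python
import sys
from typing import Callable, List


def run(backup: Callable[[], List[str]]) -> int:
    """Run a backup callable and translate its outcome into an exit code.

    0: no errors, 1: finished with per-file errors, 2: aborted by an exception.
    """
    try:
        errors = backup()
    except Exception as exc:
        print(f"fatal: {exc}", file=sys.stderr)
        return 2
    for line in errors:
        print(line, file=sys.stderr)
    return 1 if errors else 0
```

A command-line entry point would then end with `sys.exit(run(my_backup))`, where `my_backup` is whatever function performs the copy and returns the collected error lines.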
Thank you for this nice piece of code!
Unfortunately, after the second run of a backup, I receive this error message after a few percent:
Unfortunately, the log file contains no information about which file is affected.
Can you help me with this?
Thank you very much in advance!
My drive is NTFS and I'm using Windows 10. Everything looks normal, but the space used shows no hard-links have been used. I used DU by sysinternals to check. Flag -u results in the same count and size as without flag.
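Besides comparing disk usage, two backup copies of the same file can be checked directly. The Python sketch below is a generic check, independent of yahb; the function name is hypothetical. It reports whether two paths refer to the same file on disk and how many links that file has.

```python
import os
from typing import Tuple


def link_info(path_a: str, path_b: str) -> Tuple[bool, int]:
    """Return (same_file, link_count) for two paths.

    same_file is True when both paths are hard links to one file;
    link_count is the number of directory entries pointing at path_a.
    """
    same_file = os.path.samefile(path_a, path_b)
    return same_file, os.stat(path_a).st_nlink
```

For an unchanged file that was hardlinked between two backup runs, one would expect `(True, n)` with `n` of at least 2; `(False, 1)` indicates two independent copies.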
I use yahb to back up the results of an algorithm that runs indefinitely. Right now it has 7 GB of results in a folder, which are refined again and again and keep growing. As loading and saving take additional time, I designed my program to carefully keep ~3 GB of the currently used results in memory.
YAHB takes ~800 MB to back up that folder. Most often this results in >4 GB total memory usage, so Windows starts using the page file. Of course I could tell my program to use less memory, but that would slow it down even more.
Best would be an option to tell YAHB to use a maximum of X MB for operation, like 500 MB in this case.
I accidentally backed up to a new folder, so all files were copied. This took exactly 5 hours. Copying all files with robocopy takes ~30 minutes. So copying is much slower than strictly necessary.
Idea: If it's too difficult making your algorithm more efficient you could let it do all the hardlinks and then copy the remaining files with robocopy.
There is a problem with recent Adobe Acrobat Licensing Service that makes yahb and other backup software break when trying to access the log folders of the software for backup purposes. This has been reported to Adobe and will hopefully be worked on (https://community.adobe.com/t5/illustrator-discussions/com-adobe-dunamis-folder-cannot-be-backed-up-by-backup-solutions/td-p/14559757)
However, it would probably be possible to fix this (or work around it) in yahb as well in case this happens in the future with this or other software.
Here is the output from yahb just before the crash:
Unhandled Exception: System.IO.IOException: The process cannot access the file 'C:\Users\flo\AppData\Roaming\com.adobe.dunamis\d225650b-9c1f-4738-97e3-94e805951049\v1\0' because it is being used by another process.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileSystemEnumerableIterator`1.CommonInit()
at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)
at System.IO.Directory.EnumerateDirectories(String path)
at yahb.CopyModule.createDirectoryList()
at yahb.Program.Main(String[] args)
Whatever Acrobat Licensing Service does here, there is currently no catch for IOException in createDirectoryList(). So maybe add this and keep working on the remaining directories, as is already done for some other exceptions.
Maybe if I get this right we could also try a fix with different EnumerationOptions.
Update: It turns out that, at the point in the .NET code where the exception is thrown, there is no option to prevent it with different EnumerationOptions. I'll leave the analysis here anyway in case someone wants to follow the thought process.
yahb calls EnumerateDirectories() with only one argument, the directory to start from:
Line 75 in 9643599
This means that the unused parameters are filled with default options, with the EnumerationOptions being set to EnumerationOptions.Compatible:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L216
Notably, EnumerationOptions.Compatible means that IgnoreInaccessible is set to false:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/EnumerationOptions.cs#L20-L21
We thus end up calling EnumerateDirectories() with all available parameters internally:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L223-L224
This then calls InternalEnumeratePaths(), defined just above the different EnumerateDirectories() definitions:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Directory.cs#L196-L214
This leads to another internal call, to FileSystemEnumerableFactory.UserDirectories(), defined here:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerableFactory.cs#L128-L140
That creates a new FileSystemEnumerable instance, defined here:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L14-L38
And here is the interesting part: at the end of the constructor we create a DelegateEnumerator, which according to the source code comment ensures that we get possible IO exceptions for the target directory right at the beginning:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L35-L37
This DelegateEnumerator creates a FileSystemEnumerator:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerable.cs#L60-L68
At the end of the FileSystemEnumerator constructor it calls its method Init():
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.cs#L31-L43
This is implemented in the Windows-specific file and creates a directory handle to check for any IO exceptions:
https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.Windows.cs#L48-L50
So unfortunately there is no try/catch here and changing the EnumerationOptions won't help; we just have to catch the IOException in yahb.
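The conclusion above, catching the I/O error per directory and carrying on with the rest, can be sketched as below. This is a Python illustration of the pattern only: yahb's real createDirectoryList() is C# and is not shown in this report, so the function body, its return shape and the `ERR:` message layout are assumptions.

```python
import os
from typing import List, Tuple


def create_directory_list(root: str) -> Tuple[List[str], List[str]]:
    """Collect all subdirectories of root, skipping ones that cannot be read.

    Returns (directories, errors). A directory that fails to open is logged
    and skipped; the traversal continues with the remaining directories.
    """
    dirs: List[str] = []
    errors: List[str] = []
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                children = [
                    e.path for e in entries if e.is_dir(follow_symlinks=False)
                ]
        except OSError as exc:  # access denied, in use by another process, ...
            errors.append(f"ERR:{current}:{exc}")
            continue
        dirs.extend(children)
        pending.extend(children)
    return dirs, errors
```

In the .NET code the equivalent change would be an additional catch for IOException around the EnumerateDirectories() call, next to the exceptions that are already handled there.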
Maybe custom EnumerationOptions would help at locations where IgnoreInaccessible is actually used?
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerator.Windows.cs#L115
I think with IgnoreInaccessible and RecurseSubdirectories set to true in EnumerationOptions we may get the full directory list with much less going back and forth between yahb's createDirectoryList() and the .NET functions.
But that is certainly not related to this bug.
A verbosity level would help to reduce the log file size. Something like:
/verbose -> all operations (as now implemented)
/verbose:1 -> only new
/verbose:2 -> only new and non existent
greetings
This is a wish for one additional feature: I'd like a flag to set a maximum age x. If the last backup copy of a file is older than x, then copy instead of hardlinking.
The reason: I feel unsafe if an important file was written to storage only once, years ago, and has only been hardlinked since then. I'd feel better if it were written anew once in a while. Of course this relies on the original file aging better than the old backup copy. So additionally it would make sense to verify the versions against each other, but I guess that is a lot more work than my proposal above.
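The proposed rule can be stated as a small decision function. This Python sketch is purely illustrative: the function and parameter names are hypothetical, and it assumes the caller knows when the physical copy was last written (a hardlinked file's own timestamps mirror the source, so that date would have to be tracked separately).

```python
from datetime import datetime, timedelta


def choose_action(
    unchanged: bool, last_written: datetime, now: datetime, max_age: timedelta
) -> str:
    """Decide between 'copy' and 'hardlink' for one file.

    A changed file is always copied. An unchanged file is hardlinked unless
    its last physical copy is older than max_age, in which case it is
    written anew.
    """
    if not unchanged:
        return "copy"
    if now - last_written > max_age:
        return "copy"
    return "hardlink"
```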