untitaker / python-atomicwrites

Powerful Python library for atomic file writes.

Home Page: https://python-atomicwrites.readthedocs.org/

License: MIT License

Python 99.41% Makefile 0.59%
python atomic concurrency filesystem posix files

python-atomicwrites's Introduction

python-atomicwrites

Unmaintained

PyPI wants me to enable 2FA just because I maintain this package. Given both that and the mess resulting from a stunt of mine, I thought it'd be a good time to deprecate this package. Python 3 has os.replace and os.rename, which probably do a good enough job for most use cases.


Atomic file writes.

See API documentation for more low-level interfaces.

Features that distinguish it from other similar libraries (see Alternatives and Credit):

  • Race-free assertion that the target file doesn't yet exist. This can be controlled with the overwrite parameter.
  • Windows support, although not well-tested. The MSDN resources are not very explicit about which operations are atomic. I'm basing my assumptions off a comment by Doug Cook, who appears to be a Microsoft employee:

    Question: Is MoveFileEx atomic if the existing and new files are both on the same drive?

    The simple answer is "usually, but in some cases it will silently fall-back to a non-atomic method, so don't count on it".

    The implementation of MoveFileEx looks something like this: [...]

    The problem is if the rename fails, you might end up with a CopyFile, which is definitely not atomic.

    If you really need atomic-or-nothing, you can try calling NtSetInformationFile, which is unsupported but is much more likely to be atomic.

  • Simple high-level API that wraps a very flexible class-based API.
  • Consistent error handling across platforms.

How it works

It uses a temporary file in the same directory as the given path. This ensures that the temporary file resides on the same filesystem.

The temporary file will then be atomically moved to the target location: On POSIX, it will use rename if files should be overwritten, otherwise a combination of link and unlink. On Windows, it uses MoveFileEx through stdlib's ctypes with the appropriate flags.

Note that with link and unlink, there's a time window in which the file is available under two entries in the filesystem: the name of the temporary file and the name of the target file.

Also note that the permissions of the target file may change this way. In some situations a chmod can be issued without any concurrency problems, but since that is not always the case, this library doesn't do it by itself.
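The mechanism described above can be sketched with the standard library alone. This is a minimal, POSIX-oriented illustration and not the library's actual implementation; the function name atomic_write_sketch is hypothetical, and os.replace stands in for the rename call mentioned above.

```python
import os
import tempfile


def atomic_write_sketch(path, data, overwrite=False):
    """Write text `data` to `path` via a temp file in the same directory."""
    directory = os.path.dirname(path) or "."
    # The temp file lives next to the target so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        if overwrite:
            # Atomically replaces an existing target, if there is one.
            os.replace(tmp_path, path)
        else:
            # link() raises FileExistsError if the target already exists,
            # which gives a race-free "must not exist yet" check.
            os.link(tmp_path, path)
            os.unlink(tmp_path)
    except BaseException:
        # Remove the temp file if anything went wrong, then re-raise.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Between os.link and os.unlink the data is reachable under both names, which is the time window mentioned above.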

fsync

On POSIX, fsync is invoked on the temporary file after it is written (to flush file content and metadata), and on the parent directory after the file is moved (to flush filename).

fsync does not take care of disks' internal buffers, but there don't seem to be any standard POSIX APIs for that. On OS X, fcntl is used with F_FULLFSYNC instead of fsync for that reason.

On Windows, _commit is used, but there are no guarantees about disk internal buffers.
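As a rough illustration of the POSIX side of this, the sketch below flushes a file descriptor and then the containing directory. The helper names full_fsync and sync_directory are made up for this example; the fcntl module is only available on POSIX systems, and F_FULLFSYNC is only used when the platform exposes it.

```python
import fcntl
import os
import sys


def full_fsync(fd):
    """Flush `fd` to disk, preferring F_FULLFSYNC on macOS when available."""
    if sys.platform == "darwin" and hasattr(fcntl, "F_FULLFSYNC"):
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    else:
        os.fsync(fd)


def sync_directory(directory):
    """fsync a directory so that a rename inside it is persisted."""
    fd = os.open(directory, os.O_RDONLY)
    try:
        full_fsync(fd)
    finally:
        os.close(fd)
```

A writer would call full_fsync(f.fileno()) after flushing the temporary file, move the file into place, and then call sync_directory on the parent directory.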

Alternatives and Credit

Atomicwrites is directly inspired by the following libraries (and shares a minimal amount of code):

Other alternatives to atomicwrites include:

  • sashka/atomicfile. Originally I considered using that, but at the time it was lacking a lot of features I needed (Windows support, overwrite-parameter, overriding behavior through subclassing).
  • The Boltons library collection features a class for atomic file writes, which seems to have a very similar overwrite parameter. It is lacking Windows support though.

License

Licensed under the MIT License; see LICENSE.

python-atomicwrites's People

Contributors

altendky, cosmichorrordev, darwinawardwinner, glenwalker, hugovk, jugmac00, jwilk, lorengordon, lx, nagesh4193, ret2libc, sbraz, sruggier, untitaker, whynothugo

python-atomicwrites's Issues

Unlimited recursion on Windows with Python 2.7

I bumped to the latest pytest which now needs atomicwrites... and the tests fail to run on Windows.
I am under the impression that atomicwrites is the culprit but I may be wrong?

See https://ci.appveyor.com/project/nexB/license-expression/build/248/job/smx1qy33w1dxrlh5#L531
and https://ci.appveyor.com/project/nexB/license-expression/build/248/job/jpd8c35tgee9mi7y#L532

Things work fine on POSIX and other Python versions on Windows:
https://travis-ci.org/nexB/license-expression/builds/383421178
https://ci.appveyor.com/project/nexB/license-expression/build/248

The exact versions used are here:
https://github.com/nexB/license-expression/tree/fix-symbol-comparison/thirdparty/dev

Use of prefix, suffix and bufsize for tempfile broken by #37

I was using these arguments from a subclass of atomicwrites.AtomicWriter, which broke when upgrading from atomicwrites-1.1.5 to atomicwrites-1.2.1.

I see in #38 that you preferred not to support prefix and suffix, as you hadn't seen a good usecase for changing the defaults, so in the interests of persuading you our usecase is:

  • We are using atomicwrites to update files on large scale network storage. If a process crashes or is killed the temporary file will not be cleaned up. We set prefix and suffix appropriately to ensure we can find these files later, so they can be cleaned up. This happens often enough to be important.
  • We provide an API compatible with https://docs.python.org/2/library/functions.html#open to outside teams, and so need to pass a buffering / bufsize parameter through. (At some point we'll likely move to compatibility with io.open instead, but haven't yet)

Let me know what you think - if you'd prefer to leave support for these arguments out I already have a subclass and can update it to override get_fileobject completely instead of calling the base implementation.
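To make the cleanup use case concrete, here is a small standard-library sketch of marking temporary files with a recognizable prefix and suffix and finding leftovers later. The marker values and function names are hypothetical, and this does not show how AtomicWriter itself creates its temporary file.

```python
import glob
import os
import tempfile

# Hypothetical markers that identify temp files created by our writer.
TMP_PREFIX = "atomic-"
TMP_SUFFIX = ".partial"


def make_marked_tempfile(directory):
    """Create a temp file whose name can be recognized after a crash."""
    return tempfile.mkstemp(prefix=TMP_PREFIX, suffix=TMP_SUFFIX, dir=directory)


def find_stale_tempfiles(directory):
    """List leftover temp files in `directory` that carry our markers."""
    pattern = os.path.join(glob.escape(directory), TMP_PREFIX + "*" + TMP_SUFFIX)
    return sorted(glob.glob(pattern))
```

A periodic job could call find_stale_tempfiles on each storage directory and remove entries older than some threshold.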

Support Python 3.8

Python 3.8 has been out for a while, so it would be nice if atomicwrites supported it.

I ran the tests locally with Python 3.8, no errors, not even a deprecation warning.

If you are ok, I'd create a pull request to add support for Python 3.8.

I'd also clean up the current version mismatches:

$ check-python-versions 
setup.py says:              2.7, 3.4, 3.5, 3.6, 3.7
- python_requires says:     2.7, 3.4, 3.5, 3.6, 3.7, 3.8
tox.ini says:               2.7, 3.4, 3.5, 3.6, 3.7, 3.8, PyPy
.travis.yml says:           2.7, 3.4, 3.5, 3.6, 3.7, PyPy

mismatch!

Also, I noticed, in appveyor you only test up to Python 3.6, and only the 32 bit executables. If you are ok, I'd add also the 64 bit versions.

Thanks for your work!

P.S.: What do you think about dropping support for Python 3.4?

Proper fsync on Windows

See #6. This blogpost covers this issue in great detail. For Windows, this is mentioned:

On Windows, using FlushFileBuffers() is probably the way to go.

Not really a confident statement, more research needed.

Choice of encoding?

The documentation says that the lib opens files in 'w' mode (which is text). I don't see any mention of the encoding, though. Not only is there no explicit default, there isn't even a documented way to choose an encoding. After a long look I finally found a trail showing that one can, in fact, pass the encoding parameter: the passthrough from atomic_write(**cls_kwargs) to AtomicWriter(**open_kwargs) (the latter never mentioned).

The problem with that? Well, most people will probably assume it's UTF-8, but instead it seems to be getdefaultencoding (not even sure, it's hard to find). It seems like that will help to perpetuate the problems described at https://www.python.org/dev/peps/pep-0597/#using-the-default-encoding-is-a-common-mistake
So there are 2 (3?) issues I see:

  • The default encoding is implicit.
    • It's not documented that it's implicit.
  • A way to choose an encoding is not shown.
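The difference between the implicit and an explicit encoding can be shown with the standard library. This sketch does not use atomicwrites; it only illustrates what io.open falls back to when no encoding is given, and what ends up on disk when one is.

```python
import io
import locale
import os
import tempfile

# Encoding that open()/io.open() fall back to when none is given.
# It depends on the platform and locale, so it is not necessarily UTF-8.
implicit_encoding = locale.getpreferredencoding(False)

fd, path = tempfile.mkstemp()
try:
    # Being explicit makes the bytes on disk independent of the locale.
    with io.open(fd, "w", encoding="utf-8") as f:
        f.write(u"caf\u00e9")
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.unlink(path)
```

Going by the passthrough described above, the equivalent with this library would presumably be atomic_write(path, encoding="utf-8"); treat that as an assumption to verify against the installed version.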

Atomicwrites 1.4.0 not found

Atomicwrites 1.4.0 no longer found with poetry:

RuntimeError

  Unable to find installation candidates for atomicwrites (1.4.0)

  at ~\.poetry\lib\poetry\installation\chooser.py:73 in choose_for
       69│             links.append(link)
       70│ 
       71│         if not links:
       72│             raise RuntimeError(
    →  73│                 "Unable to find installation candidates for {}".format(package)
       74│             )
       75│ 
       76│         # Get the best link
       77│         chosen = max(links, key=lambda link: self._sort_key(package, link))

probably linked to #61?

os.link fallback for unsupported environments

Ran into a situation where os.link was not supported as a system call; e.g. os.link failed with a permission error. Happens in Docker with VirtualBox on OS X. This seems to be mentioned in the SetupTools os link bug report.

Python 3.7.9 (default, Sep 10 2020, 17:09:36) 

import os
os.link("/app/tmplik9i5sh.parquet", "asdf-project.parquet")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted: '/app/tmplik9i5sh.parquet' -> 'asdf-project.parquet'
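One possible shape for such a fallback is sketched below. It is an assumption about how a fallback could look, not something the library does: the function name is hypothetical, and the fallback path is not atomic, because other processes can briefly observe an empty placeholder file.

```python
import errno
import os


def move_no_overwrite(src, dst):
    """Move `src` to `dst` without overwriting, even if link() is blocked."""
    try:
        os.link(src, dst)
    except FileExistsError:
        raise
    except OSError as exc:
        if exc.errno != errno.EPERM:
            raise
        # Fallback when hard links are not permitted: claim the name with
        # O_EXCL (fails if `dst` exists), then rename over our placeholder.
        # Readers may see the empty placeholder for a short moment.
        fd = os.open(dst, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        os.replace(src, dst)
    else:
        os.unlink(src)
```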

`pathlib.Path` support

Would it be possible to support passing in pathlib.Path for the path atomic_write(path)?

I noticed that doing so works fine for PosixPaths, but fails for WindowsPaths from this CI run. This can result in someone breaking cross-platform support without being obvious. I notice that the test follows the same solution I went with where you just cast the path to a str.

Allowing atomic_write to accept Paths also helps atomic_write behave more like open as well which also accepts Paths.

I haven't delved too far, but the fix might be as simple as

try:
    from pathlib import Path
    if isinstance(path, Path):
        path = str(path)
except ImportError:
    # Just assume `path` is already a `str`
    pass
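On Python 3.6 and later, an alternative to the isinstance check is os.fspath, which accepts str, bytes and any os.PathLike object. The helper name below is hypothetical, and this sketch would not cover Python 2, which the snippet above still tries to support.

```python
import os
from pathlib import PurePosixPath


def normalize_path(path):
    """Return `path` as plain str/bytes, accepting any os.PathLike object."""
    return os.fspath(path)


as_text = normalize_path(PurePosixPath("data") / "foo.txt")
```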

Some network filesystems don't support `_sync_directory`

Calling fsync on a directory raises OSError: [Errno 22] Invalid argument when the program runs on a VirtualBox shared directory.

  File "/home/vagrant/venv/local/lib/python2.7/site-packages/atomicwrites/__init__.py", line 152, in _open
    self.commit(f)
  File "/home/vagrant/venv/local/lib/python2.7/site-packages/atomicwrites/__init__.py", line 177, in commit
    replace_atomic(f.name, self._path)
  File "/home/vagrant/venv/local/lib/python2.7/site-packages/atomicwrites/__init__.py", line 89, in replace_atomic
    return _replace_atomic(src, dst)
  File "/home/vagrant/venv/local/lib/python2.7/site-packages/atomicwrites/__init__.py", line 46, in _replace_atomic
    _sync_directory(os.path.normpath(os.path.dirname(dst)))
  File "/home/vagrant/venv/local/lib/python2.7/site-packages/atomicwrites/__init__.py", line 40, in _sync_directory
    _proper_fsync(fd)
OSError: [Errno 22] Invalid argument
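A tolerant variant of the directory sync might look like the sketch below. It is one possible approach under the assumption that skipping the directory flush is acceptable on such filesystems; the function name is hypothetical and this is not the library's code.

```python
import errno
import os


def sync_directory_tolerant(directory):
    """fsync a directory, ignoring filesystems that reject it with EINVAL."""
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    except OSError as exc:
        # Some shared or network filesystems do not support fsync on
        # directories; any other error is still reported.
        if exc.errno != errno.EINVAL:
            raise
    finally:
        os.close(fd)
```

The trade-off is that on those filesystems the rename is no longer guaranteed to be durable once the call returns.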

_replace_atomic not working on Windows 10

I write to a file as follows:

with AtomicWriter(data_path, mode="wb", overwrite=True).open() as f:
    np.savez_compressed(f, arr)

On linux this works, but on Windows I receive the following error:

  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\contextlib.py", line 119, in __exit__
    next(self.gen)
  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\site-packages\atomicwrites\__init__.py", line 169, in _open
    self.commit(f)
  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\site-packages\atomicwrites\__init__.py", line 202, in commit
    replace_atomic(f.name, self._path)
  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\site-packages\atomicwrites\__init__.py", line 99, in replace_atomic
    return _replace_atomic(src, dst)
  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\site-packages\atomicwrites\__init__.py", line 81, in _replace_atomic
    _windows_default_flags | _MOVEFILE_REPLACE_EXISTING
  File "C:\Users\Karlson\Anaconda3\envs\chess-tuning-tools\lib\site-packages\atomicwrites\__init__.py", line 76, in _handle_errors
    raise WinError()
PermissionError: [WinError 5] Access is denied

The script is located in the home folder and should not be run with admin privileges.

Using the following versions:

atomicwrites              1.4.0                      py_0
numpy                     1.18.1           py37h93ca92e_0
python                    3.7.6                h60c2a47_2

Clarify whether Atomicwrites supports Python 3.7

Python 3.7 was released well over a year ago and is the default for new versions of most distributions (e.g. Debian 10, Ubuntu 19, Fedora 29, modern Arch, etc.), as well as the official Python.org binaries and Anaconda for other platforms. It would therefore be helpful to either run the test suite on Python 3.7 in your tox/travis config and (presuming it passes) declare compatibility in your PyPI Trove classifiers, or else open an issue listing the bug(s) blocking compatibility so you and/or the community know what to fix. We use it with Spyder on Python 3.7 with no issues, but it's possible there is an edge case the test suite exercises that we do not. If you'd like, I can submit a PR to at least trigger the builds and see if they pass when I get a chance, unless you're aware of issues already. Thanks!

Option to disable fsync

Unsure if anybody needs this. I might need this because in one usecase I'm writing a lot of files (to different filenames), and only need a guarantee that a SIGKILL won't leave a partially written file (at the target location, tmpfiles are irrelevant).

Also this might be a problem with SSDs, as mentioned in #6

option to preserve the permissions

When I overwrite a file, the permissions of the file are changed:

$ touch foo

$ chmod 0123 foo

$ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
Access: (0123/---x-w--wx)  

$ python3
Python 3.7.3rc1 (default, Mar 13 2019, 11:01:15) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from atomicwrites import atomic_write
>>> with atomic_write('foo', overwrite=True) as f:
...  f.write('foo')
... 
3
>>> 

$ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
Access: (0600/-rw-------)  

The normal non-atomic method of overwriting a file does not change the mode:

$ touch foo

$ chmod 0777 foo

$ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
Access: (0777/-rwxrwxrwx)  

$ python3
Python 3.7.3rc1 (default, Mar 13 2019, 11:01:15) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('foo', 'w') as f:
...  f.write('foo')
... 
3
>>> 

$ stat foo | sed -n 's/^\(Access.*\)Uid.*$/\1/p'
Access: (0777/-rwxrwxrwx)  

It would be nice to have an option to preserve the mode of the original file.
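A caller can approximate this today by copying the mode onto the temporary file before it is moved into place. The following is a standard-library sketch for POSIX with a hypothetical function name, not an existing option of the library. It is also subject to the concurrency caveat from the README: a mode change made by another process between the stat and the rename is lost.

```python
import os
import stat
import tempfile


def overwrite_preserving_mode(path, text):
    """Atomically overwrite `path`, keeping the old file's permission bits."""
    old_mode = stat.S_IMODE(os.stat(path).st_mode)
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        # mkstemp creates the file with mode 0600; apply the old mode
        # before the rename so the target never shows up with other bits.
        os.chmod(tmp_path, old_mode)
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```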

corruption of commas to semicolon in ORG field

If I have an ORG field like

ORG:Example\, Inc.;North West;Sales

And modify, it shows up in the editor as

Organisation : Example, Inc., North West, Sales

And gets saved as

ORG:Example;Inc.;North West;Sales

That is, "Example, Inc." became converted to "Example" with a sub org of "Inc."

Add `atomic_folder`

Sometimes I have a bunch of operations which create many files in a folder. The folder should only be there at the end of the operation. Something like this:

with atomic_folder("foo") as folder:
    unzip(something, folder)
    # "foo/" doesn't exist yet.
# now it does
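A rough standard-library sketch of what such a helper could look like follows. It is an assumption about the requested behavior, not an existing atomicwrites API, and it relies on POSIX rename semantics, where the final rename fails if the target is a non-empty directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_folder(path):
    """Yield a temp directory that is renamed to `path` only on success."""
    parent = os.path.dirname(os.path.abspath(path))
    # Create the temp directory next to the target, on the same filesystem.
    tmp_dir = tempfile.mkdtemp(dir=parent)
    try:
        yield tmp_dir
        # Only now does the folder become visible under its final name.
        os.rename(tmp_dir, path)
    except BaseException:
        # On any failure, discard the partially filled directory.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
```

Note that mkdtemp creates the directory with mode 0700, so a caller may want to adjust permissions before the rename.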

Cannot use atomicwrites on filename with no directory component

When running the example in the README with atomicwrites 1.1.0:

from atomicwrites import atomic_write

with atomic_write('foo.txt', overwrite=True) as f:
    f.write('Hello world.')

I get:

Traceback (most recent call last):
  File "/Users/ryan/temp/atomictest.py", line 4, in <module>
    f.write('Hello world.')
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/site-packages/atomicwrites/__init__.py", line 152, in _open
    self.commit(f)
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/site-packages/atomicwrites/__init__.py", line 177, in commit
    replace_atomic(f.name, self._path)
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/site-packages/atomicwrites/__init__.py", line 89, in replace_atomic
    return _replace_atomic(src, dst)
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/site-packages/atomicwrites/__init__.py", line 46, in _replace_atomic
    _sync_directory(os.path.dirname(dst))
  File "/Users/ryan/.pyenv/versions/3.5.2/lib/python3.5/site-packages/atomicwrites/__init__.py", line 38, in _sync_directory
    fd = os.open(directory, 0)
FileNotFoundError: [Errno 2] No such file or directory: ''

The reason is that os.path.dirname('foo.txt') returns '' instead of '.'.

It looks like a bug was introduced here: 4c7de64#diff-530c1a5f2cffcd30669f3bb3ebbafc75R31
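The empty-string behavior is easy to reproduce, and wrapping the result in os.path.normpath (as later tracebacks on this page show the library doing) maps it to the current directory. The helper name below is hypothetical.

```python
import os


def parent_directory(path):
    """Return the directory containing `path`, using '.' for bare filenames."""
    # os.path.dirname('foo.txt') is '', and os.path.normpath('') is '.'.
    return os.path.normpath(os.path.dirname(path))


bare = parent_directory("foo.txt")
nested = parent_directory(os.path.join("a", "b", "foo.txt"))
```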

Atomic writes for 3rd-party libraries that only take a filename

I'm trying to understand whether I can use python-atomicwrites for ensuring that the writes I do via a 3rd-party library are atomic. Unfortunately, the 3rd-party library only takes a filename (not a file object or a file descriptor). It is my understanding that AtomicWriter is of little help in this situation: while I could pass f.name to the 3rd-party library, the library would re-open the file with a new file descriptor, so even if the two file descriptors wouldn't interfere (which I'm not sure about) the sync-mechanisms of AtomicWriter would apply to the wrong descriptor.

Without access to the file descriptor used by the 3rd-party library we cannot use os.fsync. If my understanding is correct, then os.sync would achieve the same goal, although with the potential overhead of flushing other files' caches, too (my understanding of low-level disk caching is limited, so feel free to correct me here). Therefore, something like the following would probably be the best one could hope for in this situation:

from contextlib import contextmanager
import os
from pathlib import Path
import tempfile
from typing import Iterator

from atomicwrites import replace_atomic

@contextmanager
def atomic_filename(filename: Path) -> Iterator[Path]:
    # Create the temporary file next to the target so both share a filesystem.
    with tempfile.NamedTemporaryFile(dir=filename.parent, delete=False) as f:
        temp_filename = Path(f.name)
    try:
        yield temp_filename
        os.sync()  # Force disk write
        replace_atomic(temp_filename, filename)
    finally:
        try:
            temp_filename.unlink()
        except OSError:
            # Already moved by replace_atomic, or there is nothing we can do
            pass

with atomic_filename(Path('foo.txt')) as fn:
    # fn is the temporary filename
    third_party_library.do_some_stuff(fn)
# temporary file has been moved atomically to 'foo.txt'

Is my understanding correct, or is there a better way to achieve this via python-atomicwrites?

(I'm targeting Linux and have little knowledge of how these things work on other platforms)

FEAT: file locking

Making file writing atomic is a great feature. Another feature that might be a good fit is locking files for e.g. appends.

Use Case: I have multiple processes that periodically write (append) to a log file "example_shared_log.txt" and need to prevent them from clobbering each other. One way to accomplish this is with fcntl:

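import fcntl

def locked_write(fd, content):
    # Take an exclusive advisory lock; blocks until other holders release it.
    fcntl.flock(fd, fcntl.LOCK_EX)
    try:
        fd.write(content)
        fd.flush()  # push buffered data out before the lock is released
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        fd.close()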

but that loses atomicity. A better solution would be to combine a lock on the original path with the rotation that atomicwrites uses.

If I put together a PR, would a feature like this be welcome?

Python error on rollback

I get this error when the folder is not writable:

atomicwrites/__init__.py", line 124, in _open
    self.rollback(f)
UnboundLocalError: local variable 'f' referenced before assignment

_open except block accidentally raises the inner exception

I'm talking specifically about this code:

        except:
            try:
                self.rollback(f)
            except Exception:
                pass
            raise

I managed to hit a corner case where rollback throws an exception (f was None at that point), so control flow reached the raise statement on the last line. For some reason, this raised the inner exception, not the outer one.

I can submit a patch to fix this: would you prefer a fix that uses the three-argument raise statement, which only works with Python 2, or should I add a dependency on six, and then call six.reraise, so it works with both 2 and 3?
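For illustration, the pattern can be written so that a failing rollback never masks the original error and the resource variable is always defined. This sketch is Python 3 only and uses hypothetical names; it does not answer the Python 2 question about six.reraise.

```python
def run_with_rollback(action, rollback):
    """Run `action`; on failure call `rollback` and re-raise the original error."""
    # Defined up front so the except block can never hit UnboundLocalError.
    resource = None
    try:
        resource = action()
    except BaseException as original:
        try:
            rollback(resource)
        except Exception:
            # A failing rollback must not replace the original exception.
            pass
        raise original
    return resource
```

Binding the exception to a name and raising it explicitly makes the intent independent of how a bare raise resolves after the nested handler.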

Improve control over line endings

I noticed that the default mode is w, which in tempfile.NamedTemporaryFile() always converts line endings based on os.linesep. If I have data with \n line endings, atomic_write() by default converts them to \r\n on Windows. And vice versa on Linux.

Python 3's open() introduced a new option, newline, to improve control over how line endings are handled. It's available in both Python 3 and Python 2 via io.open().

I can work around the problem by specifying the mode as w+b, but then on Python 3 I also have to make sure the string is encoded as bytes. It would be convenient if this library used io.open and propagated the newline parameter. Setting newline='' then does not perform any translation on line endings.
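The effect of newline='' can be checked with io.open directly. This sketch does not involve atomicwrites and uses a hypothetical helper name; it only shows that text written this way reaches the disk with its line endings untouched.

```python
import io


def write_verbatim(path, text):
    """Write `text` in text mode without translating line endings."""
    # newline="" disables translation, so "\n" and "\r\n" are kept as-is
    # regardless of the platform's os.linesep.
    with io.open(path, "w", newline="", encoding="utf-8") as f:
        f.write(text)
```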

atomicwrites' old versions have been purged from pypi


pypi just told me i had to enable 2fa to keep uploading this package. because I thought that was an annoying and entitled move in order to guarantee SOC2 compliance for a handful of companies (at the expense of my free time), i deleted the package and published a new version, just to see if the warning disappears. it did, so that's great.

what i didn't consider is that this would delete old versions. those are apparently now gone and yet it's apparently not possible for me to re-upload them. i don't think that's sensible behavior by pypi, but either way i'm sorry about that. the API has been the same since the first release anyway.

How do I integrate this with ZipFile?

This is how we use ZipFile.

src = "a.txt"
dst = "a.zip"

with ZipFile(dst, "w") as f:
    f.write(src)

How can I integrate atomic_write with this?
Could you provide me an example that use atomic_write to zip a single large text file?

Thank you
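One way to approach this with only the standard library is to build the archive in a temporary file next to the destination and move it into place once it is complete. The function name is hypothetical and the sketch does not fsync anything, so it guards against partially written archives but not against power loss.

```python
import os
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile


def zip_atomically(src, dst):
    """Compress `src` into the archive `dst`, which appears only when complete."""
    directory = os.path.dirname(dst) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # ZipFile accepts a writable binary file object instead of a filename.
        with os.fdopen(fd, "wb") as raw, ZipFile(raw, "w", ZIP_DEFLATED) as zf:
            zf.write(src, arcname=os.path.basename(src))
        os.replace(tmp_path, dst)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this library, the same idea would presumably be to open the destination with atomic_write(dst, mode="wb", overwrite=True) and pass the resulting file object to ZipFile; that combination is an untested assumption here.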

fsync after rename/move

On POSIX it's possible to fsync the directory, to ensure that the new file names are written to disk.
I think on Windows it's necessary to open the new (target) file and fsync it.

Use os.replace instead of os.rename for python 3.3+

I'm using this lib on Windows 10 with no issues so far, but I'm afraid os.rename, though not well documented for Windows, isn't atomic. As per the documentation:

If you want cross-platform overwriting of the destination, use replace().
https://docs.python.org/3/library/os.html#os.rename

So I suggest something along the lines of:

def _replace_atomic(src, dst):
    try:
        os.replace(src, dst)
    except AttributeError:
        # os.replace does not exist before Python 3.3
        os.rename(src, dst)
    _sync_directory(os.path.normpath(os.path.dirname(dst)))

I'll submit a pull request, and I would appreciate it if you could update your pip entry. Thanks
