Giter Site home page Giter Site logo

rtmigo / filememo_py Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 0.0 131 KB

Stores the results of expensive function calls and returns the cached result when the same inputs occur again

Home Page: https://pypi.org/project/filememo/

License: MIT License

Python 100.00%
python package memoize memoize-decorator memoization-library memoization cache function method file

filememo_py's Introduction

File-based memoization decorator. Caches the results of expensive function calls. Retains the cached results between program restarts.

CI tests are done in Python 3.8, 3.9 and 3.10 on macOS, Ubuntu and Windows.


The function can be expensive because it is slow, or uses a lot of system resources, or literally makes a request to a paid API.

The memoize decorator returns the cached result when the same function called with the same arguments. Thus, the function is expensive only once and inexpensive thereafter.

For example, the simplest cache for downloaded data can be set like this:

@memoize
def downloaded(url):
    return requests.get(url)
    
downloaded("http://example.net/aaa")  # downloads data
downloaded("http://example.net/bbb")  # downloads data
downloaded("http://example.net/aaa")  # gets data from cache   

Data is saved to the file system using pickledir. Even after the program restart, the cached results will be in place.

# gets data from cache after restart
downloaded("http://example.net/aaa")     

Install

$ pip3 install filememo

Use

from filememo import memoize

@memoize
def long_running_function(a, b, c):
    return compute()

# the following line actually computes the value only
# when the program runs for the first time. On subsequent 
# runs, the value is read from the file
x = long_running_function(1, 2, 3)

Function arguments

The results depend on both the function and its arguments. All results are cached separately.

@memoize
def that_function(a, b, c):
    return compute(a, b, c)

@memoize
def other_function(a, b):
    return compute(a, b)

# the following calls will cache three different values 
y1 = that_function(1, 2, 3)  
y2 = that_function(30, 20, 40)
y3 = other_function(1, 2)

# the way the arguments are set is also important, as is their order. 
# Therefore, the following calls are cached as three different ones
y4 = other_function(1, b=2)
y5 = other_function(a=1, b=2)
y6 = other_function(b=2, a=1)

Cache directory

If dir_path is not specified, the cached data is stored in the directory returned by the gettempdir . However, there is a high probability that the cache stored there will not survive a reboot. And even a certain probability that the system does not have a temporary directory, so the current directory will be considered temporary.

To better control the situation, you can set a specific directory for storing caches.

@memoize(dir_path='/var/tmp/myfuncs')
def function(a, b):
    return a+b
    
# it's ok if different functions share the same directory    
@memoize(dir_path='/var/tmp/myfuncs')
def other_func():
    return compute()

Expiration date

The max_age argument sets two conditions at once:

  • if the result is not yet in the cache (and we will add it now), then it will live in the cache no longer than max_age. After that it will be automatically deleted
  • if the result is already in the cache, then we only use it if its age is less than max_age. Otherwise, the function will be run again, and the result will be replaced with a new one
@memoize(max_age = datetime.timedelta(minutes=5))
def function(a, b):
    return compute()

Data version

When you specify version, all results with different versions are considered outdated.

Say you have the following function:

@memoize(version=1)
def function(a, b):
    return a + b

You changed your mind, and now the function should return the product of numbers instead of the sum. But the cache already contains the previous results with the sums. In this case, you can just change version. Previous results will not be returned.

@memoize(version=2)
def function(a, b):
    return a * b

Note that all other than the current version are deprecated, regardless of whether their value is greater or less. If you used version=10, and then started using version=9, then 9 is considered current, and 10 is obsolete.

Exceptions

If the decorated function throws an exception, the error is considered permanent. The exception is stored in the cache and will be raised every time.

from filememo import memoize, FunctionException

@memoize
def divide(a, b):
    return a / b

try:
    # tryng to run the function for the first time
    divide(1, 0)
except FunctionException as e:
    print(f"Error: {e.inner}")      

try:
    # not actually running again, getting error from cache
    divide(1, 0)
except FunctionException as e:
    print(f"Cached error: {e.inner}")      

The exceptions_max_age = None argument will prevent exceptions from being cached. Each error will be considered a one-time error.

@memoize(exceptions_max_age = None)
def download(url):
    return http_get(url)
    
while True:
    try:
        download('http://sample.net/path')
        break
    except FunctionException:
        time.sleep(1)
        # will retry        

You can also set the expiration time for cached exceptions. It may differ from the caching time of the data itself.

# keep downloaded data for a day, remember connection errors for 5 minutes

@memoize(max_age = datetime.timedelta(days: 1)
         exceptions_max_age = datetime.timedelta(minutes: 5))
def download(url):
    return http_get(url)

In-memory caching

Each call to a function decorated with @memoize results in I/O operations. If your absolute priority is performance, then even reading from the disk cache can be considered expensive. Although filememo does not attempt to cache the read data in memory, this functionality is easy to achieve by combining decorators.

from functools import lru_cache
from filememo import memoize

@lru_cache
@memoize
def too_expensive():
    return compute()

In this example, the filememo disk cache will be used to store the results between program runs, while the functools RAM cache will store the results between function calls.

If the data is already in disk cache, and the program is just started, then calling too_expensive() for the first time will read the result from disk. Further calls to too_expensive() will return the result from memory.

filememo_py's People

Contributors

github-actions[bot] avatar rtmigo avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.