Giter Site home page Giter Site logo

pyo3-polars's Introduction

1. Shared library plugins for Polars

This is new functionality and should be preferred over 2. as this will circumvent the GIL and will be the way we want to support extending polars.

Parallelism and optimizations are managed by the default polars runtime. That runtime will call into the plugin function. The plugin functions are compiled separately.

We can therefore keep polars more lean and maybe add support for a polars-distance, polars-geo, polars-ml, etc. Those can then have specialized expressions and don't have to worry as much for code bloat as they can be optionally installed.

The idea is that you define an expression in another Rust crate with a proc_macro polars_expr.

That macro can have the following attributes:

  • output_type -> to define the output type of that expression
  • output_type_func -> to define a function that computes the output type based on input types.

Here is an example of a String conversion expression that converts any string to pig latin:

fn pig_latin_str(value: &str, capitalize: bool, output: &mut String) {
    if let Some(first_char) = value.chars().next() {
        if capitalize {
            for c in value.chars().skip(1).map(|char| char.to_uppercase()) {
                write!(output, "{c}").unwrap()
            }
            write!(output, "AY").unwrap()
        } else {
            let offset = first_char.len_utf8();
            write!(output, "{}{}ay", &value[offset..], first_char).unwrap()
        }
    }
}

#[derive(Deserialize)]
struct PigLatinKwargs {
    capitalize: bool,
}

#[polars_expr(output_type=Utf8)]
fn pig_latinnify(inputs: &[Series], kwargs: PigLatinKwargs) -> PolarsResult<Series> {
    let ca = inputs[0].utf8()?;
    let out: Utf8Chunked =
        ca.apply_to_buffer(|value, output| pig_latin_str(value, kwargs.capitalize, output));
    Ok(out.into_series())
}

On the python side this expression can then be registered under a namespace:

import polars as pl
from polars.utils.udfs import _get_shared_lib_location

lib = _get_shared_lib_location(__file__)


@pl.api.register_expr_namespace("language")
class Language:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def pig_latinnify(self, capatilize: bool = False) -> pl.Expr:
        return self._expr._register_plugin(
            lib=lib,
            symbol="pig_latinnify",
            is_elementwise=True,
            kwargs={"capitalize": capatilize}
        )

Compile/ship and then it is ready to use:

import polars as pl
import expression_lib

df = pl.DataFrame({
    "names": ["Richard", "Alice", "Bob"],
})


out = df.with_columns(
   pig_latin = pl.col("names").language.pig_latinnify()
)

See the full example in [example/derive_expression]: https://github.com/pola-rs/pyo3-polars/tree/main/example/derive_expression

2. Pyo3 extensions for Polars

See the example directory for a concrete example. Here we send a polars DataFrame to rust and then compute a jaccard similarity in parallel using rayon and rust hash sets.

Run example

$ cd example && make install $ venv/bin/python run.py

This will output:

shape: (2, 2)
┌───────────┬───────────────┐
│ list_a    ┆ list_b        │
│ ---       ┆ ---           │
│ list[i64] ┆ list[i64]     │
╞═══════════╪═══════════════╡
│ [1, 2, 3] ┆ [1, 2, ... 8] │
│ [5, 5]    ┆ [5, 1, 1]     │
└───────────┴───────────────┘
shape: (2, 1)
┌─────────┐
│ jaccard │
│ ---     │
│ f64     │
╞═════════╡
│ 0.75    │
│ 0.5     │
└─────────┘

Compile for release

$ make install-release

What to expect

This crate offers a PySeries and a PyDataFrame which are simple wrapper around Series and DataFrame. The advantage of these wrappers is that they can be converted to and from python as they implement FromPyObject and IntoPy.

pyo3-polars's People

Contributors

andysham avatar cmdlineluser avatar dalcde avatar marcogorelli avatar mariushelf avatar messense avatar ritchie46 avatar simong85 avatar stinodego avatar wukan1986 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.