
imstr's Introduction

Immutable Strings


This crate offers a cheaply cloneable and sliceable UTF-8 string type. It is inspired by the bytes crate, which offers zero-copy byte slices, and the im crate which offers immutable copy-on-write data structures. It offers a standard-library String-compatible API.

Internally, the crate uses a standard library string stored in a smart pointer, and a range into that String. This allows for cheap zero-copy cloning and slicing of the string. This is especially useful for parsing operations, where a large string needs to be sliced into a lot of substrings.
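The layout described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual definition: the type name `SharedStr` and all of its methods are hypothetical, and `Arc<String>` is assumed as the smart pointer.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the layout described above:
/// a shared buffer plus the byte range this handle can see.
#[derive(Clone, Debug)]
pub struct SharedStr {
    data: Arc<String>,
    range: Range<usize>,
}

impl SharedStr {
    /// Allocates the buffer once; every later clone or slice reuses it.
    pub fn new(text: &str) -> Self {
        SharedStr {
            data: Arc::new(text.to_string()),
            range: 0..text.len(),
        }
    }

    /// Views the visible part of the shared buffer.
    pub fn as_str(&self) -> &str {
        &self.data.as_str()[self.range.clone()]
    }

    /// Returns a sub-slice that shares the same buffer, or `None` if the
    /// range is out of bounds or does not fall on character boundaries.
    pub fn slice(&self, range: Range<usize>) -> Option<Self> {
        // `str::get` performs the bounds and boundary checks for us.
        self.as_str().get(range.clone())?;
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        Some(SharedStr {
            data: Arc::clone(&self.data), // bumps a reference count, copies no text
            range: start..end,
        })
    }

    /// True if both handles point at the same allocation.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}
```

Slicing a slice only adjusts the range, so a parser can hand out many substrings of one input without copying any of the text.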

TL;DR: This crate offers an ImString type that acts as a String (in that it can be modified and used in the same way), an Arc<String> (in that it is cheap to clone) and an &str (in that it is cheap to slice) all in one, owned type.

Diagram of ImString Internals

This crate offers a safe API that ensures that every string and every string slice is UTF-8 encoded. It does not allow slicing of strings within UTF-8 multibyte sequences. To avoid panics, it offers a try_* function for every operation that can fail. It also uses extensive unit testing with full test coverage to guard against unsoundness.
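To illustrate what a non-panicking slice has to check, here is a small sketch built only on the standard library. The function `try_slice_checked` and the `SliceError` enum are hypothetical names chosen for this example; the crate's own try_* functions and error types may look different.

```rust
use std::ops::Range;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SliceError {
    /// The range start lies after the range end.
    Inverted,
    /// The range extends past the end of the string.
    OutOfBounds,
    /// The range would cut through a multibyte UTF-8 sequence.
    NotCharBoundary,
}

/// Returns the requested sub-slice, or an error instead of panicking.
pub fn try_slice_checked(text: &str, range: Range<usize>) -> Result<&str, SliceError> {
    if range.start > range.end {
        return Err(SliceError::Inverted);
    }
    if range.end > text.len() {
        return Err(SliceError::OutOfBounds);
    }
    if !text.is_char_boundary(range.start) || !text.is_char_boundary(range.end) {
        return Err(SliceError::NotCharBoundary);
    }
    // All checks passed, so this indexing cannot panic.
    Ok(&text[range])
}
```

For the input "héllo", the character é occupies bytes 1 and 2, so the range 0..2 is rejected while 0..3 is accepted.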

Features

Efficient Cloning: The crate's architecture enables low-cost (zero-copy) clone and slice creation, making it ideal for parsing strings that are widely shared.

Efficient Slicing: The crate's architecture enables low-cost (zero-copy) slice creation, making it ideal for parsing operations where one large input string is sliced into many smaller strings.

Copy on Write: Despite being cheap to clone and slice, it allows for mutation using copy-on-write. Strings that are not shared are mutated in place, which is safe and avoids unnecessary copying.

Compatibility: The API is designed to closely resemble Rust's standard library String, facilitating smooth integration and being almost a drop-in replacement. It also integrates with many popular Rust crates, such as serde, peg and nom.

Generic over Storage: The crate is flexible in terms of how the data is stored. It allows for using Arc<String> for multithreaded applications and Rc<String> for single-threaded use, providing adaptability to different storage requirements and avoiding the need to pay for atomic operations when they are not needed.

Safety: The crate enforces that all strings and string slices are UTF-8 encoded. Any methods that might violate this are marked as unsafe. All methods that can fail have a try_* variant that will not panic. Use of safe functions cannot result in unsound behaviour.
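The copy-on-write and generic-storage features listed above can be sketched together using only the standard library. The `Storage` trait and the `push_str_cow` function below are hypothetical and exist only for this illustration; the crate's own storage trait may differ in name and shape.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical storage abstraction over a shared `String`.
pub trait Storage: Clone {
    fn new(value: String) -> Self;
    fn get(&self) -> &String;
    /// Returns `Some` only when this handle is the sole owner.
    fn get_mut(&mut self) -> Option<&mut String>;
}

/// Thread-safe storage: pays for atomic reference counting.
impl Storage for Arc<String> {
    fn new(value: String) -> Self {
        Arc::new(value)
    }
    fn get(&self) -> &String {
        &**self
    }
    fn get_mut(&mut self) -> Option<&mut String> {
        Arc::get_mut(self)
    }
}

/// Single-threaded storage: plain, non-atomic reference counting.
impl Storage for Rc<String> {
    fn new(value: String) -> Self {
        Rc::new(value)
    }
    fn get(&self) -> &String {
        &**self
    }
    fn get_mut(&mut self) -> Option<&mut String> {
        Rc::get_mut(self)
    }
}

/// Appends `suffix`, mutating in place when the buffer is not shared
/// and copying it first when other handles still point at it.
pub fn push_str_cow<S: Storage>(storage: &mut S, suffix: &str) {
    if let Some(unique) = storage.get_mut() {
        // Sole owner: no copy needed.
        unique.push_str(suffix);
        return;
    }
    // Shared: copy the text, modify the copy, and point this handle at it.
    let mut copy = storage.get().clone();
    copy.push_str(suffix);
    *storage = S::new(copy);
}
```

Because the function is generic over the storage, the same copy-on-write logic serves both the `Arc<String>` and the `Rc<String>` case, and other handles to a shared buffer never observe the mutation.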

Example

use imstr::ImString;

// Create new ImString, allocates data.
let mut string = ImString::from("Hello, World");

// Edit: happens in-place (because this is the only reference).
string.push_str("!");

// Clone: this is zero-copy.
let clone = string.clone();

// Slice: this is zero-copy.
let hello = string.slice(0..5);
assert_eq!(hello, "Hello");

// Slice: this is zero-copy.
let world = string.slice(7..12);
assert_eq!(world, "World");

// Here we have to copy only the part that the slice refers to so it can be modified.
let hello = hello + "!";
assert_eq!(hello, "Hello!");

Optional Features

The following optional features can be turned on using feature flags.

| Feature | Description |
| --- | --- |
| serde | Serialize and deserialize ImString fields as strings with the serde crate. |
| peg | Use ImString as the data structure that is parsed with the peg crate. See peg-list.rs for an example. |
| nom | Allow ImString to be used to build parsers with nom. See nom-json.rs for an example. |

Similar

There are several crates similar to this, which are listed in the Rust String Benchmarks. You may want to check the other crates out as well.

This is a comparison of this crate to other, similar crates. The comparison is made on these features:

  • Cheap Clone: is it a zero-copy operation to clone a string?
  • Cheap Slice 🍕: is it possible to cheaply slice a string?
  • Mutable: is it possible to modify strings?
  • Generic Storage: is it possible to swap out the storage mechanism?
  • String Compatible: is it compatible with String?

Here is the data, with links to the crates for further examination:

| Crate | Cheap Clone | Cheap Slice | Mutable | Generic Storage | String Compatible | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| imstr | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | This crate. |
| tendril | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | Complex implementation. API not quite compatible with String, but otherwise closest to what this crate does. |
| immut_string | ✔️ | ❌ | 🟡 (no optimization) | ❌ | ❌ | Simply a wrapper around Arc<String>. |
| immutable_string | ✔️ | ❌ | ❌ | ❌ | ❌ | Wrapper around Arc<str>. |
| arccstr | ✔️ | ❌ | ❌ | ❌ | ❌ | Not UTF-8 (null-terminated C string). Hand-written Arc implementation. |
| implicit-clone | ✔️ | ❌ | ❌ | 🟡 | ✔️ | Immutable string library. Has sync and unsync variants. |
| semistr | ❌ | ❌ | ❌ | ❌ | ❌ | Stores short strings inline. |
| quetta | ✔️ | ✔️ | ❌ | ❌ | ❌ | Wrapper around Arc<String> that can be sliced. |
| bytesstr | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around Bytes. Cannot be directly sliced. |
| fast-str | ✔️ | ❌ | ❌ | ❌ | ❌ | Looks like there could be some unsafety. |
| flexstr | ✔️ | ❌ | ❌ | ✔️ | ❌ | |
| bytestring | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around Bytes. Used by actix. Can be indirectly sliced using slice_ref(). |
| arcstr | ✔️ | ✔️ | ❌ | ❌ | ❌ | Can store string literal as &'static str. |
| cowstr | ✔️ | ❌ | ✔️ | ❌ | ❌ | Reimplements Arc, custom allocation strategy. |
| strck | ❌ | ❌ | ❌ | ✔️ | ❌ | Typechecked string library. |

License

MIT, see LICENSE.md.

imstr's People

Contributors

aminya, benbinford, iamthecarl, ledjolleshaj, shnatsel, stevefan1999-personal, xfbs


imstr's Issues

Option<ImString> not deserializeable

imstr = { version = "0.2.0", features = ["serde"] }

use imstr::ImString;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct ResponseGeneric {
    pub message: Option<ImString>,
}

 pub message: Option<ImString>,
     |                ^^^^^^^^^^^^^^^^ the trait `i18n::_::_serde::Deserialize<'_>` is not implemented for `ImString<Arc<std::string::String>>`
     |
     = help: the following other types implement trait `i18n::_::_serde::Deserialize<'de>`:
               <bool as i18n::_::_serde::Deserialize<'de>>
               <char as i18n::_::_serde::Deserialize<'de>>
               <isize as i18n::_::_serde::Deserialize<'de>>
               <i8 as i18n::_::_serde::Deserialize<'de>>
               <i16 as i18n::_::_serde::Deserialize<'de>>
               <i32 as i18n::_::_serde::Deserialize<'de>>
               <i64 as i18n::_::_serde::Deserialize<'de>>
               <i128 as i18n::_::_serde::Deserialize<'de>>
             and 443 others
     = note: required for `std::option::Option<ImString<Arc<std::string::String>>>` to implement `i18n::_::_serde::Deserialize<'_>`
note: required by a bound in `next_element`

Run cargo fix

I see there are a lot of warnings. Have you considered running cargo fix?

Add `new()` optimisation

To avoid needing an allocation, add an optimisation that will simply use a globally shared Arc with an empty string.

Needs benchmarking to make sure this makes sense.
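A minimal sketch of the idea, using only the standard library. The function name `shared_empty` is hypothetical, and how this would be wired into the crate's `new()` is not shown here.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical helper: hands out clones of one process-wide empty string,
/// so that creating an empty value never allocates after the first call.
pub fn shared_empty() -> Arc<String> {
    static EMPTY: OnceLock<Arc<String>> = OnceLock::new();
    // The Arc is allocated on first use; later calls only bump the count.
    Arc::clone(EMPTY.get_or_init(|| Arc::new(String::new())))
}
```

One trade-off worth including in the benchmark: a globally shared Arc is never uniquely owned, so the first mutation of a string created this way would always take the copy path rather than the in-place path.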

`DetachedSlice` with different content?

I have a use case where I want to manually create a slice into a source string by setting the final string content and the original offset. When I want to refer to the content I would just ignore the original_offset.

Here is the implementation I have come up with:

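use std::ops::Range;

// Assumes the storage trait is exported as `imstr::data::Data`.
use imstr::{data::Data, ImString};

pub struct DetachedImString<S: Data<String>> {
    // Final string content, stored independently of the source string.
    content: S,
    // Byte range this content originally occupied in the source string.
    original_offset: Range<usize>,
}

impl<S: Data<String>> DetachedImString<S> {
    pub fn as_str(&self) -> &str {
        // Assumes `Data<String>` exposes `get()` returning `&String`.
        self.content.get().as_str()
    }
}

pub enum MaybeDetachedImString<S: Data<String>> {
    Attached(ImString<S>),
    Detached(DetachedImString<S>),
}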

I wonder whether this use case would be useful to others, in which case I could submit a PR, and whether you have any notes on the approach.
