
ruby-marshal's Introduction

ruby-marshal

A deserialization library for Ruby's marshalling format.

⚠️ WARNING ⚠️

This crate is really experimental. Not intended for production applications.

It does not support the entirety of Ruby Marshal (yet), it contains a bunch of footguns, test coverage is not really the best and, overall, this crate is Serde but worse. One could call it "bootleg homebrew Serde for Ruby Marshal".

This was intended to be a learning/experimentation project that ultimately got a bit overblown. Use at your own risk!

Getting started

If you want to get deserializing right away, make sure your target Rust types implement FromRubyMarshal and use the convenience method from_bytes to get started.

For simple cases, you can use the provided derive macro (#[derive(FromRubyMarshal)]) to quickly implement FromRubyMarshal on your types:

use ruby_marshal::{self, FromRubyMarshal};

#[derive(Debug, FromRubyMarshal)]
struct Test {
    thing1: i32,
    thing2: Option<String>,
    thing3: Vec<i32>,
}

// ruby: Marshal.dump({:thing1 => 1, :thing2 => "hello".encode("Shift_JIS"), :thing3 => [1,2,3]})
let input: &[u8] = &[
    0x04, 0x08, 0x7b, 0x08, 0x3a, 0x0b, 0x74, 0x68, 0x69, 0x6e, 0x67, 0x31, 0x69, 0x06, 0x3a,
    0x0b, 0x74, 0x68, 0x69, 0x6e, 0x67, 0x32, 0x49, 0x22, 0x0a, 0x68, 0x65, 0x6c, 0x6c, 0x6f,
    0x06, 0x3a, 0x0d, 0x65, 0x6e, 0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x22, 0x0e, 0x53, 0x68,
    0x69, 0x66, 0x74, 0x5f, 0x4a, 0x49, 0x53, 0x3a, 0x0b, 0x74, 0x68, 0x69, 0x6e, 0x67, 0x33,
    0x5b, 0x08, 0x69, 0x06, 0x69, 0x07, 0x69, 0x08,
];
let out = ruby_marshal::from_bytes::<Test>(input).expect("parsing failed");
assert_eq!(out.thing1, 1);
assert_eq!(out.thing2, Some("hello".to_string()));
assert_eq!(out.thing3, vec![1, 2, 3]);

FromRubyMarshal can also be implemented manually.
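
As a reading aid for the byte dump in the example above, here is a minimal sketch of how the Marshal format encodes small integers (the bytes that follow the `i` tag, 0x69). It is not part of this crate's API: the function name `read_marshal_int` is hypothetical, and the sketch only covers the fixnum encoding, not Bignums.

```rust
/// Decodes a Ruby Marshal fixnum body (the bytes that follow the `i` tag).
/// Returns the decoded value and the number of bytes consumed,
/// or `None` if the input is truncated.
/// Hypothetical helper for illustration; not this crate's API.
fn read_marshal_int(bytes: &[u8]) -> Option<(i32, usize)> {
    let first = *bytes.first()? as i8;
    match first {
        // A single zero byte is the value 0.
        0 => Some((0, 1)),
        // 1..=4: that many little-endian bytes follow, holding a positive value.
        1..=4 => {
            let n = first as usize;
            let body = bytes.get(1..1 + n)?;
            let mut value: i32 = 0;
            for (i, b) in body.iter().enumerate() {
                value |= (*b as i32) << (8 * i);
            }
            Some((value, 1 + n))
        }
        // -4..=-1: that many little-endian bytes follow, holding a negative
        // value; the bytes that are not present are treated as 0xff.
        -4..=-1 => {
            let n = (-first) as usize;
            let body = bytes.get(1..1 + n)?;
            let mut value: i32 = -1;
            for (i, b) in body.iter().enumerate() {
                value &= !(0xff << (8 * i));
                value |= (*b as i32) << (8 * i);
            }
            Some((value, 1 + n))
        }
        // Small positive values are stored in one byte, offset by +5.
        5..=127 => Some((first as i32 - 5, 1)),
        // Small negative values are stored in one byte, offset by -5.
        -128..=-5 => Some((first as i32 + 5, 1)),
    }
}
```

With this, the `0x69, 0x06` pair in the input reads as the integer 1. Lengths use the same encoding, so the `0x08` after the hash tag `0x7b` means three entries and the `0x0b` after each `0x3a` means a six-byte symbol.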

Features

  • Deserialize binary Ruby Marshal objects into Rust types.
  • A derive macro to automatically implement FromRubyMarshal, Serde style.
    • Allows deserialization of named structs (e.g. Point { x: 1, y: 2 }) from Ruby hashes and IVAR objects (normal objects soon to follow).
    • Field renaming support.
    • Borrowed data support on types that use the 'de lifetime.

Roadmap

  • Full Ruby Marshal 4.8 support:
    • nil
    • Booleans (true, false).
    • Integers (0, 320).
    • Floating-point numbers (0.2, Math::PI).
    • Symbols (:foo), with support for resolving symlinks.
    • Arrays ([1, 2, 3]).
    • Hashes ({:a => 1, :b => 2}).
    • Byte arrays.
    • IVAR-wrapped objects.
      • Strings with encoding.
      • Regular expressions (not yet stable).
    • Class and module references.
    • Objects, with support for resolving object links.
    • Bignums (numbers outside of the [-2^30, 2^30 - 1] range).
    • Custom marshalled objects:
      • _dump, _load
      • marshal_dump, marshal_load
    • Additional low-level format tags:
      • TYPE_EXTENDED (e)
      • TYPE_UCLASS (C)
      • TYPE_DATA (d)
      • TYPE_USRMARSHAL (U)
      • TYPE_HASH_DEF (})
      • TYPE_MODULE_OLD (M)
  • A derive macro that doesn't suck.
    • Rust type system support.
      • Named structures (struct Point { x: 1, y: 2 }).
      • Enums (enum Variant { A, B }).
      • Borrowed data support (<'de>).
      • Generics (struct Container<T: FromRubyMarshal> { /* ... */ }).
    • Rust types support.
      • Any user type that implements FromRubyMarshal.
      • bool
      • Unsigned integers: u8, u16, u32, u64, u128, usize.
      • Signed integers: i8, i16, i32, i64, i128, isize.
      • Floating-point numbers: f32, f64.
      • Borrowed byte slices: &'de [u8].
      • String, Cow<'de, str>.
      • Box<T>.
      • Option<T>.
      • Vec<T> (arrays), Vec<(T, U)> (hashes).
      • HashMap<K, V>
        • Requires K: Eq + Hash.
        • Any BuildHasher that implements Default is supported.
      • BTreeMap<K, V>
        • Requires K: Ord.
      • ...some other types not yet considered...
    • Marshal format support.
      • Hashes ({:a => 1, :b => 2}).
      • IVAR objects.
      • Objects.
    • Annotation-based functionality.
      • Field renaming (#[marshal(rename = "foo")]).
      • Selecting where to deserialize the boxed data of an IVAR object (#[marshal(ivar_data)]).
      • Pluggable logic to deserialize from custom marshalled objects.
      • ...etc...

ruby-marshal's People

Contributors

nnubes256

ruby-marshal's Issues

Improvement of FromRubyMarshal ergonomics

I think I've found a way to build the deserializer into RubyType while sidestepping the double mutable borrow issues I had with it (see https://discord.com/channels/273534239310479360/1058530213585236019/1058758259428839425 on the Rust community Discord server).

Really the problem with the approach I took before is two-fold:

  • On one hand, there's value in providing an ergonomic API to deserialize elements taking advantage of lifetimes, so having the RubyType be bound to 'de: 'deser, 'deser is not without merit.
  • On the other hand, the internal facilities that use it (such as RubyArrayIter and RubyMapIter, which hold a &'deser mut Deserializer<'de> field) cannot really use any function that returns RubyType<'de, 'deser> without binding the lifetime of & (mut) self to 'deser, which makes a lot of design space just impossible to explore; for example, Drop implementations.

However, this week I thought about it and I think there's a way to solve that particular problem. For this to work, I think you need both a RubyType<'de, 'deser> and a RubyType<'de>; we will call the latter "RawRubyType<'de>".

  • RubyType<'de, 'deser> is what end-users use. There are no footguns here, and this would be the most ergonomic way to perform deserialization.
  • RawRubyType<'de> is what internal facilities use. There are some footguns to keep in mind when using this, but it allows the finest control. A RawRubyType can be upgraded to a RubyType by providing a deserializer.

What this implies is that you have two ways of getting the next element to deserialize:

  • fn next_element<'deser>(&'deser mut self) -> RubyType<'de, 'deser>
  • fn next_raw_element(&mut self) -> RawRubyType<'de>

And implementations can mix and match depending on what they do. Consider RubyArrayIter for example:

  • The public API to advance to the next element would be a call to Deserializer#next_element, because here it's acceptable to bind the lifetime of the returned element to its parent RubyArrayIter.
  • Its Drop implementation would involve calling Deserializer#next_raw_element in a loop until the iterator is exhausted. I mean, you aren't really passing those elements up to the end-user; in fact the whole point is to drop them!
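
To make the idea above concrete, here is a rough, self-contained sketch of the two-type design. Everything in it is a stand-in, not the crate's actual code: the "format" is simplified so that every element is a single byte, the methods return Option instead of whatever error type the real API would use, and the `upgrade` and `offset` methods are hypothetical names. Only the lifetime shapes follow the proposal.

```rust
/// Stand-in deserializer: every "element" is one byte of the input.
struct Deserializer<'de> {
    input: &'de [u8],
    pos: usize,
}

/// Raw element: borrows only from the input, never from the deserializer.
struct RawRubyType<'de> {
    bytes: &'de [u8],
}

/// User-facing element: also keeps the deserializer borrowed for 'deser.
struct RubyType<'de, 'deser> {
    raw: RawRubyType<'de>,
    deserializer: &'deser mut Deserializer<'de>,
}

impl<'de> RawRubyType<'de> {
    /// Upgrades a raw element by providing a deserializer (hypothetical name).
    fn upgrade<'deser>(self, deserializer: &'deser mut Deserializer<'de>) -> RubyType<'de, 'deser> {
        RubyType { raw: self, deserializer }
    }
}

impl<'de, 'deser> RubyType<'de, 'deser> {
    /// Shows that the user-facing type can reach the deserializer.
    fn offset(&self) -> usize {
        self.deserializer.pos
    }
}

impl<'de> Deserializer<'de> {
    /// The result is not tied to the `&mut self` borrow, only to 'de.
    fn next_raw_element(&mut self) -> Option<RawRubyType<'de>> {
        let input: &'de [u8] = self.input;
        let bytes = input.get(self.pos..self.pos + 1)?;
        self.pos += 1;
        Some(RawRubyType { bytes })
    }

    /// The result keeps `self` mutably borrowed for 'deser.
    fn next_element<'deser>(&'deser mut self) -> Option<RubyType<'de, 'deser>> {
        let raw = self.next_raw_element()?;
        Some(raw.upgrade(self))
    }
}

struct RubyArrayIter<'de, 'deser> {
    deserializer: &'deser mut Deserializer<'de>,
    remaining: usize,
}

impl<'de, 'deser> RubyArrayIter<'de, 'deser> {
    /// Public API: the returned element borrows from the iterator itself.
    fn next(&mut self) -> Option<RubyType<'de, '_>> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        self.deserializer.next_element()
    }
}

impl Drop for RubyArrayIter<'_, '_> {
    /// Skips whatever the user did not consume, using only raw elements.
    fn drop(&mut self) {
        while self.remaining > 0 {
            self.remaining -= 1;
            if self.deserializer.next_raw_element().is_none() {
                break;
            }
        }
    }
}
```

In this sketch, creating a RubyArrayIter with remaining set to 3, reading one element and then dropping the iterator leaves the deserializer positioned after the third element, because Drop drains the other two through next_raw_element.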

I think that with this I'd be able to solve all footguns the library has so far:

  • Having a Drop implementation for RubyArrayIter/RubyMapIter that safely skips the iterator on drop would be possible.
  • You could go so far as to "make the Frame built-in" into the returned RubyType (probably through a wrapper and some Deref/DerefMut magic), and therefore get rid of let mut deserializer = deserializer.prepare()? entirely, which right now I believe is a footgun because for it to work correctly you really want to call it every time you take the next element.
