rakaly / jomini

Low level, performance oriented parser for save and game files from EU4, CK3, HOI4, Vic3, Imperator, and other PDS titles.

Home Page: https://crates.io/crates/jomini

License: MIT License

Rust 99.89% R 0.11%
parser eu4 ck3 text binary imperator hoi4 paradox clausewitz

jomini's Issues

Support Parsing Colors in Text Parser

The binary parser already supports parsing RGB color info found in Imperator saves -- the text parser should support RGB at the very least (and potentially HSV).

color = hsv { 0.5 0.2 0.8 }
color = rgb { 100 200 150 }

RGB info can be found in imperator saves.

Include common date object

I've copied the same non-leap-year date object into three different implementations:

This library should include commonly used abstractions or data structures so that I don't have to try and keep them all in sync.

EDIT: while HOI4 has hours, that can remain in that repo (if it's ever created) as HourlyDate
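
A minimal sketch of what such a shared date type could look like, assuming a 365-day calendar with no leap years and non-negative years. The name `Date`, its fields, and the `parse` and `add_days` methods are hypothetical and are not taken from jomini.

```rust
/// Days in each month of a calendar without leap years.
const DAYS_PER_MONTH: [u16; 12] = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

/// Hypothetical date type; years are kept non-negative for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Date {
    year: u16,
    month: u8,
    day: u8,
}

impl Date {
    /// Returns `None` when the month or day is out of range.
    pub fn new(year: u16, month: u8, day: u8) -> Option<Date> {
        let index = usize::from(month).checked_sub(1)?;
        let max_day = *DAYS_PER_MONTH.get(index)?;
        if day == 0 || u16::from(day) > max_day {
            return None;
        }
        Some(Date { year, month, day })
    }

    /// Parses the `Y.M.D` form, e.g. `1444.11.11`.
    pub fn parse(s: &str) -> Option<Date> {
        let mut parts = s.split('.');
        let year: u16 = parts.next()?.parse().ok()?;
        let month: u8 = parts.next()?.parse().ok()?;
        let day: u8 = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // a fourth component (e.g. hours) is not handled here
        }
        Date::new(year, month, day)
    }

    /// Days elapsed since 1 January of year 0.
    fn ordinal(self) -> u32 {
        let before: u32 = DAYS_PER_MONTH[..usize::from(self.month) - 1]
            .iter()
            .map(|&d| u32::from(d))
            .sum();
        u32::from(self.year) * 365 + before + u32::from(self.day) - 1
    }

    /// Returns the date `days` later (no overflow handling in this sketch).
    pub fn add_days(self, days: u32) -> Date {
        let total = self.ordinal() + days;
        let year = (total / 365) as u16;
        let mut rem = total % 365;
        let mut month = 1u8;
        for &len in DAYS_PER_MONTH.iter() {
            if rem < u32::from(len) {
                break;
            }
            rem -= u32::from(len);
            month += 1;
        }
        Date { year, month, day: rem as u8 + 1 }
    }
}
```

Deriving `Ord` on the `year`, `month`, `day` field order gives chronological comparison for free, which is one reason to keep the fields in that order.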

jomini::Value

It would be useful to have a structure that represents every possible value we can deserialize into. Example:

// here's deserializer into a generic Value
let value: HashMap<String, jomini::Value> = jomini::text::de::from_utf8_slice(r#"
    unquoted = 1234
    quoted = "567"
    operator > 0.5
    color = hsv { 0.3 0.4 0.5 }
    sequence = { 1 2 3 4 }
    map = { a = 1 b = 2 }
"#.as_bytes()).unwrap();

// and we can deserialize it into concrete type later
let unquoted = value.get("unquoted").unwrap();
assert_eq!(f32::deserialize(unquoted.clone()).unwrap(), 1234.0);

This is similar in idea to the existing serde_value::Value, serde_json::Value, serde_yaml::Value and others.

  • it should losslessly keep any value from jomini deserializer (i.e. I can deserialize tape into it)
  • it should implement IntoDeserializer (i.e. I can deserialize it into any concrete value)
  • it should be a static type (owned)
  • it should implement Debug
  • it should give user some API to inspect the value (e.g. is it a token or a sequence)

There's ValueKind in jomini right now, but it's a borrowed type, it doesn't implement Debug, and it's not public API.

There's also generic serde_value::Value, but it is not lossless (loses quotes, operators, headers, etc.). And the code above fails with it in multiple ways (e.g. loses hsv header, loses > operator, and unquoted number is transformed into string and can't be parsed within assert statement later).

Why? I want to solve #138 in a more generic way (rather than adding yet another _internal_jomini_property hack) and make low level syntax more accessible in general. It would also allow visual inspection of the file contents via Debug.

Generically support token header

Found these across several PDS games:

color = hsv { 0.58 1.00 0.72 }
color = rgb { 169 242 27 }
color = hsv360{ 25 75 63 }
position = cylindrical{ 150 3 0 }
color5 = hex { aabbccdd }
mild_winter = LIST { 3700 3701
    # ....
}

A few options:

  1. Support them generically: Create a TextToken::Header(Scalar) followed by a TextToken::Array (though it seems equally plausible that it is followed by an object)
  2. Create a unique TextToken for each case (eg: TextToken::Hsv, TextToken::Rgb, TextToken::Hsv360, etc)
  3. Parse similar objects to the same internal structure (eg: hsv, rgb, hsv360, and hex should all be about the same)

Right now I'm leaning towards option 1, as that will allow this mechanism to cover unforeseen tokens that I'm sure will be introduced with each game. With option 1, we should still keep BinaryToken::Rgb, as that is the only color that appears in the binary format (thus far). The downstream breaking change will come with deserialization: for a client to distinguish values, the deserializer will need to expose the token header somehow. The only thing coming to mind is to emulate deserializing a tuple, with the first element being the header (rgb, hsv, etc.) and the second element being the data.
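
To make the tuple idea concrete, here is a standalone sketch that splits a single value into an optional header and its elements. It works on plain strings rather than on jomini's tape; `Headed` and `split_header` are hypothetical names, and nested containers are deliberately left out.

```rust
/// Hypothetical result of reading a value that may carry a header such as `rgb` or `hsv`.
#[derive(Debug, PartialEq)]
pub struct Headed<'a> {
    /// `Some("rgb")` for `rgb { 169 242 27 }`, `None` for a bare `{ ... }`.
    pub header: Option<&'a str>,
    /// Whitespace-separated elements found between the braces.
    pub elements: Vec<&'a str>,
}

/// Splits a value like `hsv360{ 25 75 63 }` into its header and elements.
/// Returns `None` when the input is not a single flat brace-delimited list.
pub fn split_header(value: &str) -> Option<Headed<'_>> {
    let value = value.trim();
    let open = value.find('{')?;
    let body = value[open + 1..].strip_suffix('}')?;
    if body.contains('{') || body.contains('}') {
        return None; // nested containers are out of scope for this sketch
    }
    let header = value[..open].trim();
    Some(Headed {
        header: if header.is_empty() { None } else { Some(header) },
        elements: body.split_whitespace().collect(),
    })
}
```

Because the header is kept as an uninterpreted string, a new header such as `cylindrical` or `LIST` needs no parser change; interpreting it is left to the client, which matches the motivation for option 1.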

Differentiate quoted vs unquoted at high level

This PR added distinction between those in tapes: #55

Can I distinguish between those in custom serde Deserializer? Basically, I'm asking if the following code is possible:

#[derive(Debug, Deserialize)]
struct GameData {
    a: QuotedString,
    b: UnquotedString,
}

let data = r#"
    a = "@test"
    b = @test
"#;

How to deserialize operators?

e.g. a modifier list:

modifier = {
	factor = 2
	has_policy_flag = economic_stance_market
}
modifier = {
	factor = 0
	num_communications < 2
}

I know TextTape can do this, but I don't know how to mix TextTape and struct together. I tried JominiDeserialize and serde's visitor, but neither Operator nor OperatorValue implements Deserialize.

Investigate array arguments instead of slices to binary float flavor

fn visit_f32(&self, data: [u8; 4]) -> f32;

is much more self-explanatory than

fn visit_f32(&self, data: &[u8]) -> f32;

This will need to be tested to ensure there is not a performance regression (potential matchup: unaligned read vs memcpy + shifts + adds).

The implementation would be similar to arrayref:

        let val = data
            .get(..4)
            .map(|x| {
                let arr: &[u8; 4] = unsafe { &*(x.as_ptr() as *const [u8; 4]) };
                self.flavor.visit_f32(*arr)
            })
            .ok_or_else(Error::eof)?;
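
For comparison, the same conversion can be written without `unsafe` by using the standard library's `TryFrom<&[u8]>` implementation for `[u8; 4]`. This is a sketch only: the `FloatFlavor` trait, the `LittleEndian` flavor, and `read_f32` are hypothetical stand-ins, and whether this version performs as well as the pointer cast would still need to be benchmarked.

```rust
use std::convert::TryFrom;

/// Hypothetical flavor trait taking a fixed-size array, as proposed above.
pub trait FloatFlavor {
    fn visit_f32(&self, data: [u8; 4]) -> f32;
}

/// Example flavor that treats the bytes as a little-endian IEEE 754 float.
pub struct LittleEndian;

impl FloatFlavor for LittleEndian {
    fn visit_f32(&self, data: [u8; 4]) -> f32 {
        f32::from_le_bytes(data)
    }
}

/// Reads the first four bytes of `data`, or returns `None` at end of input.
pub fn read_f32<F: FloatFlavor>(flavor: &F, data: &[u8]) -> Option<f32> {
    let bytes = data.get(..4)?;
    // The conversion cannot fail here because the slice is exactly 4 bytes long.
    let arr = <[u8; 4]>::try_from(bytes).ok()?;
    Some(flavor.visit_f32(arr))
}
```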

Introduce API for discovering hidden objects

levels = { 10 0=1 1=2 }

Is encoded in the tapes as

levels = { 10 { 0=1 1=2 } }

I call these hidden objects (though the 10 may be more of an object header -- only time will tell which is correct).

Right now it is impossible for the client to know if the object start or end token they are looking at denotes a hidden object.

Each tape should have a corresponding method for determining which tokens delimit a hidden object.

This will help downstream melters create an equivalent plain text document more easily.
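
A rough sketch of how such a method could be used by a melter. The `Token` and `Tape` types below are toy stand-ins rather than jomini's tapes, and storing the hidden delimiters as a list of indices is only one possible design.

```rust
/// Toy token type, loosely modeled on a flat tape. Not jomini's actual `TextToken`.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Scalar(String),
    Equal,
    Open,
    Close,
}

/// Hypothetical tape that records which `Open`/`Close` tokens the parser synthesized.
pub struct Tape {
    pub tokens: Vec<Token>,
    /// Indices of the `Open` and `Close` tokens that delimit a hidden object.
    pub hidden: Vec<usize>,
}

impl Tape {
    /// True when the token at `index` opens or closes a hidden object.
    pub fn is_hidden_delimiter(&self, index: usize) -> bool {
        self.hidden.contains(&index)
    }

    /// Writes the tape back as text, omitting braces the source never contained.
    pub fn melt(&self) -> String {
        let mut words: Vec<&str> = Vec::new();
        for (i, token) in self.tokens.iter().enumerate() {
            if self.is_hidden_delimiter(i) {
                continue; // these braces were synthesized by the parser
            }
            words.push(match token {
                Token::Scalar(s) => s.as_str(),
                Token::Equal => "=",
                Token::Open => "{",
                Token::Close => "}",
            });
        }
        words.join(" ")
    }
}

/// Builds the tape for `levels = { 10 { 0 = 1 1 = 2 } }`, where the inner
/// braces (token indices 4 and 11) are the hidden object.
pub fn example_tape() -> Tape {
    let text = "levels = { 10 { 0 = 1 1 = 2 } }";
    let tokens = text
        .split_whitespace()
        .map(|w| match w {
            "=" => Token::Equal,
            "{" => Token::Open,
            "}" => Token::Close,
            other => Token::Scalar(other.to_string()),
        })
        .collect();
    Tape { tokens, hidden: vec![4, 11] }
}
```

With this, `example_tape().melt()` yields `levels = { 10 0 = 1 1 = 2 }`, which has the same structure as the original input (whitespace aside).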

Add Enhanced Text Parser

The current text and binary parsers are geared towards performance when parsing large save files. From what I've gathered, save files tend to use a subset of possible tokens. For instance, Stellaris data files use operators other than =, like:

has_level > 2

So it may be beneficial to add another type of parser: an enhanced text parser. It can contain features that are too expensive to have in the base parser:

  • Stream parser
  • Support operators other than equals (and other syntax that is in data files but not save files)
  • Losslessly record tokens position (line and column) -- like an AST

Ref nickbabcock/jomini#9

Support "list" text keyword (coat of arms)

Currently recorded as unsupported syntax, the "list" keyword references a previously defined property:

  simple_cross_flag = {
      pattern = list "christian_emblems_list"
      color1 = list "normal_colors"
  }

Where normal_colors is defined elsewhere like:

normal_colors = {
  30 = "red"
  12 = "blue"
  1 = "green"
  14 = "black"
  0 = "purple"
  # ....
}

With CK3 and Imperator both using this syntax for coat of arms, it is likely that this syntax will continue to be seen (if only in coat of arms).

I dislike the keyword approach as it introduces new syntax unused elsewhere, so it may be tricky to ensure that this doesn't regress on save games that happen to have a list value, all the while maintaining performance.

Add Heterogeneous lists to BinaryTape

CK3 ironman save contains the following:

6f 34 01 00 03 00 0c 00 0a 00 00 00
0c 00 00 00 00 00 01 00 14 00 02 00 00 00 0c 00
01 00 00 00 01 00 14 00 02 00 00 00 04

which translates into

levels={ 10 0=2 1=2 }

While TextTape can force itself through, the BinaryTape fails. So the fix is to interpret it as follows:

levels={ 10 { 0=2 1=2 } }

so it becomes a heterogeneous list with an integer first followed by an object.

Option::unwrap() on a None value

I've managed to trigger this unwrap:

let value = self.value.take().unwrap();

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', C:\Users\user\.cargo\registry\src\index.crates.io-6f17d22bba15001f\jomini-0.24.0\src\text\de.rs:409:43

This is probably my misuse of serde (running next_value() on a map twice), but it should be at least an .expect() with a better error message or an Error::custom().

Code sample:

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Color((u8, u8, u8));

#[derive(Deserialize, Debug)]
enum ColorName {
    Red,
    Green,
    Blue,
}

struct Container;

impl<'de> Deserialize<'de> for Container {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        struct TVisitor;
        impl<'de> serde::de::Visitor<'de> for TVisitor {
            type Value = Container;

            fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
                formatter.write_str("{r g b} or name")
            }

            fn visit_map<A: serde::de::MapAccess<'de>>(self, mut map: A) -> Result<Self::Value, A::Error> {
                while let Some(key) = map.next_key::<String>()? {
                    println!("key: {}", key);
                    if let Ok(color) = map.next_value::<Color>() {
                        println!("color: {:?}", color);
                    } else {
                        let name = map.next_value::<ColorName>()?;
                        println!("name: {:?}", name);
                    }
                }
                Ok(Container)
            }
        }
        deserializer.deserialize_map(TVisitor)
    }
}

fn main() {
    let data = r#"
        color1 = red
        color2 = { 255 0 0 }
    "#;
    let _: Container = jomini::text::de::from_utf8_slice(data.as_bytes()).unwrap();
}
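
The request above, returning an error instead of panicking when a value is read twice, could look roughly like this. `OneShot` and `DeError` are toy stand-ins written against the standard library only; they are not jomini's internals, and a real fix would presumably go through serde's `Error::custom` as suggested.

```rust
use std::fmt;

/// Toy error type standing in for a deserializer error.
#[derive(Debug, PartialEq)]
pub struct DeError(String);

impl fmt::Display for DeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for DeError {}

/// Toy stand-in for a map access that hands out each value exactly once.
pub struct OneShot<T> {
    value: Option<T>,
}

impl<T> OneShot<T> {
    pub fn new(value: T) -> Self {
        OneShot { value: Some(value) }
    }

    /// Returns the pending value, or an error if it was already consumed.
    pub fn next_value(&mut self) -> Result<T, DeError> {
        self.value.take().ok_or_else(|| {
            DeError("next_value called twice for the same key".to_string())
        })
    }
}
```

The second call then surfaces as an ordinary `Err` that the caller's `?` can propagate, instead of a panic inside the library.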

Expose 64bit binary floating point values

There's a bit of an oddity with the binary tokens. There are two floating point values:

BinaryToken::F32_1(f32)
BinaryToken::F32_2(f32)

What's odd is that the second version, F32_2 actually consumes 8 bytes of data, so a better design for this would be:

BinaryToken::F64(f64)

This would not increase the size of BinaryToken as 8 bytes is less than the 16 bytes to store a slice reference.

All these years, I've assumed that the last 4 bytes are unused, so before making this change I'll need to run some tests to make sure this does not change how values are decoded.

Support Parsing Escaped Strings

While not possible (afaik) in EU4, Stellaris allows one to embed quotes in names, which require them to be escaped:

name = "Joe \"Captain\" Rogers"

Ref nickbabcock/jomini#11

Here is a failing test case:

    #[test]
    fn test_escaped_quotes() {
        let data = br#"name = "Joe \"Captain\" Rogers""#;

        assert_eq!(
            parse(&data[..]).unwrap().token_tape,
            vec![
                TextToken::Scalar(Scalar::new(b"name")),
                TextToken::Scalar(Scalar::new(br#"Joe "Captain" Rogers"#)),
            ]
        );
    }

Also consider only unescaping the quotes on to_utf8 so that Scalar can still be a pure slice reference instead of something like a Cow.

custom_name="THE !@#$%^&*( '\"LEGION\"')"
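
A sketch of the deferred-unescape idea: the scalar keeps borrowing the raw bytes, and a copy is made only when a backslash is actually present. The function name `unescape` is hypothetical, and it assumes a backslash simply escapes the following byte, which may not match every game's rules.

```rust
use std::borrow::Cow;

/// Removes the backslash from escape sequences such as `\"` and `\\`.
/// Input without any backslash is returned borrowed, with no allocation.
pub fn unescape(data: &[u8]) -> Cow<'_, [u8]> {
    if !data.contains(&b'\\') {
        return Cow::Borrowed(data);
    }
    let mut out = Vec::with_capacity(data.len());
    let mut iter = data.iter().copied();
    while let Some(byte) = iter.next() {
        if byte == b'\\' {
            // Keep the escaped byte; a trailing lone backslash is kept as-is.
            out.push(iter.next().unwrap_or(b'\\'));
        } else {
            out.push(byte);
        }
    }
    Cow::Owned(out)
}
```

The common case of a name without escapes stays a pure slice reference, so only the rare escaped string pays for an allocation.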

How to ignore hidden object deserialization by using derived macros?

I got Error(InvalidSyntax { msg: "hidden object must start with a key", offset: 3560 }) while trying to deserialize a Stellaris mod technology definition, file contains

tech_xxx {
weight_modifier {
...
    any_owned_planet = {
					OR = {
						has_building = building_mote_harvesters
						building_mote_harvesting_traps_2 // <-- ERROR HERE
...

But I didn't define weight_modifier field in my technology struct

So my questions are:

  1. why should I deal with a hidden object while I simply want to ignore that field? If so, how can I deal with it?
  2. TextTape could be an alternative but it is too complicated to use, especially when dealing with a multi layer nested object. Any real examples using TextTape to deserialize a game file?
  3. Could the parser continue after a syntax error is found and return None instead of an error? It might not be clear, but it's like HTML parsers vs XHTML parsers.

How to debug and handle ScalarError?

When I use the hoi4save parser to parse my HOI4 save file, Jomini returns an "AllDigits Error". This error is likely caused by the fact that the value in the "manpower_pool" field can be a string of numbers with dots, which makes it difficult for the parser to recognize it as valid input.

manpower_pool={
	available=95915
	locked=3379.3.22.15
	total=3782.1.21.20
}
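
One way a client could tolerate such fields is to try the numeric interpretation first and fall back to the raw text. The sketch below is standalone and assumes nothing about what `3379.3.22.15` means; `NumberOrText` and `lenient_number` are hypothetical names, not part of jomini or hoi4save.

```rust
/// Hypothetical field type that accepts a number or keeps the raw text.
#[derive(Debug, PartialEq)]
pub enum NumberOrText {
    Number(f64),
    Text(String),
}

/// Tries to read `raw` as a number, falling back to the original text.
/// Note that `f64`'s parser also accepts forms like `1e5` and `inf`.
pub fn lenient_number(raw: &str) -> NumberOrText {
    match raw.parse::<f64>() {
        Ok(n) => NumberOrText::Number(n),
        Err(_) => NumberOrText::Text(raw.to_string()),
    }
}
```

Here `lenient_number("95915")` gives a number, while `lenient_number("3379.3.22.15")` keeps the text for later inspection instead of failing the whole parse.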

De-serializing enums

I'd like to de-serialize something like this

...
requirements = {
    country = ENG
    prestige = 10
}
...

My struct looks like this

#[derive(Clone, Debug, Deserialize, PartialEq)]
pub struct Event {
    requirements: Vec<Condition>,
}

#[derive(Clone, Debug, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum Condition {
    Country { country: String },
    Prestige { prestige: u32 }
}

But I can't get it to work, with or without named fields in the enums, with or without untagged. I'd like to be able to write something like this

#[derive(Clone, Debug, Deserialize, PartialEq)]
#[serde(untagged)]
pub enum Condition {
    #[serde(rename="country")]
    Country(String),
    #[serde(rename="prestige")]
    Prestige(u32)
}

And have it interpret the 1-field enum alternatives correctly. I wasn't able to use JominiDeserialize with enums either. Is something like this supported?
