Giter Site home page Giter Site logo

Comments (10)

sebglazebrook avatar sebglazebrook commented on June 24, 2024 1

Things that I think you could try is this space are:

  • load data from files
  • use structs to house the data and have function implemented on those structs to wrap/use that data.
  • use constants for data and then it can live outside the function

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024 1

I think that your considerations raise good questions @Bassetts. Before going into the details, my current thinking is that without discarding it in the future, loading the wordlist from a file is not what I would start with. I'll do my best to lay out what I have in mind 🙂

It is probably useful to start by stating that by principle I try to do as little as is necessary at once, keeping in mind that in software we can always do more in a week if we think it's useful, and that an iterative approach, in my experience, often leads to find other useful things to do, with a different perspective, once the problem at hand is solved.

Also, at this moment, I am as interested in training myself to building the right thing as in learning Rust (i.e. what building the thing right means in Rust). 🙂 I don't mind doing in 3 small PRs what could be done in one if there was time pressure.

So, this is what I have currently in mind:

  1. Iterative approach: if someone is interested in using a different wordlist in the future, I would consider it then. Playing devil's advocate: there is not evidence of that at this point. (I am the only user I know of.)
  2. Nature of wordlists: Yes, like you, I do consider the wordlist as an external dependency; with the following characteristics:
    1. by its nature, it is very unlikely to be updated
    2. there aren't many reasons for someone to use multiple wordlists simultaneously
  3. Trust / security [1]:
    1. I agree that reviewablity is key, with the following caveat:
    2. I don't think reviewability of the wordlist taken in isolation brings much trust value, and I would consider the reviewabilty of the wordlist and its surrounding code at once as a metric
      1. as authors unworth of trust: both code and wordlist must be reviewed before using the tool, and I think simpler code surrouding a constant will be simpler, overall, than more complex code around a text file... YMMV?
      2. as trustworthy authors that can misspell a wordlist: reviewing a 7000+ lines of text is unreliable enough in itself to call for an automated test to catch misspellings : )
  4. Usability / Value proposition for this tool / Why is this tool for? From an usability perspective, the practical use cases I can think of [2] would favour:
    1. a simple command (no arguments) over a flexible one (wordlist as argument for example), mainly because of (1.ii)
    2. an atomic / standalone binary over a binary with attached files
    3. a binary as small as possible (I completely agree with you on that)
    4. a program that starts as fast as possible (the task in not important enough to wait IMHO). If embedding the wordlist meant faster user experience (I don't know if that is true), then I'd probably favour that because of (1) and (2)

I am no expert but I believe a text file has more potential for compression than a binary

I'm no expert either, that statement seems intuitively correct to me... I would also check (gut felling) what happens during the compilation steps. I feel that this question can be an interesting pretext to learn things about Rust internals if that's something we / someone wants to do.

With all that in mind, I'd like to reword / refine the objectives for this step from:

Github does a pretty good job, only skipping syntactic coloration for those lines, but otherwise editing the file isn't pleasant. Also, the data is unlikely to change while the logic certainly will.

Objective: move data to a different file, in the form of constants so that it doesn't get mixed with the code that is likely to be improved (and change) over time.

...to:

Github does a pretty good job, only skipping syntactic coloration for those lines, but otherwise editing the file isn't pleasant. Also, the data is unlikely to change while the logic certainly will.

Objective: move data to a different file, in the form of constants so that it doesn't get mixed with the code that is likely to be improved (and change) over time, so that:

  • the dice/wordlist/mod.rs is easier to work with (more colors, less scrolling)...
  • ...without becoming more complex to review (one step at a time)
  • it is clearer that the EFF long wordlist is an external dependency and shouldn't be modified

[1] The trust / security dicussion is an interesting one in itself I believe, and I'm tempted to open a separate issue on that specific topic not to sidetrack this one too much. (Done, see #6 )
[2] Not excluding that there could be other use cases than the ones that I can think of, see (1)

from dice.

oknono avatar oknono commented on June 24, 2024 1

I would like to pick this issue up please :)

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024

Hi @sebglazebrook!

Thank for the pointers!
I think the constants are what I was trying to achieve, and thought that my variable being immutable, then they would be constant... I'll search specifically for constants : )

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024

Yep, there are constants in Rust, I forgot that paragraph it seems, but that makes sense to me. (:bulb: thanks @sebglazebrook)

While loading the data from file would allow to swap easily the wordlists, I don't think there is any need for that currently. (I personally like the long worldlist from the EFF.)

Using a struct could be a next step. For now I think constants are going to fulfill the purpose of making the code more readable.

I'll reword this issue now that the way to go is clear. 🙂


The data of EFF's long wordlist should live in a different file

As it is currently, that long piece of data (7776 numbers -dice throws- and 7776 words) is bloating the dice/wordlist/mod.rs file.

Github does a pretty good job, only skipping syntactic coloration for those lines, but otherwise editing the file isn't pleasant. Also, the data is unlikely to change while the logic certainly will.

Objective: move data to a different file, in the form of constants so that it doesn't get mixed with the code that is likely to be improved (and change) over time.

Challenge: as explained in the link above, constants require a type annotation... what is their type?

Bonus points: I think that probably the only constant that needs to be public is the eff_long_wordlist itself, keeping the other ones private would be nice.

This is a refactoring, so there is no need to add new tests, the tests that are already written are sufficient. The goal is to make them green once the move is over ✔️

Of course, I'm happy to discuss alternatives or help at any stage 🙂 I am learning Rust as well!

from dice.

Bassetts avatar Bassetts commented on June 24, 2024

@gonzalo-bulnes Personally I think that loading the words from a file is the best approach here.

  1. As you said it will mean that different word lists could be used
  2. I would consider the list to be an external depedency. By loading from a file it would mean that if the list were to ever be updated then updating the list for the program is as simply as replacing the existing file with the new one
  3. It would mean the application could, in future, ship with multiple word lists. Users could select between them by passing an argument
  4. It means your users are not trusting you to have recreated the list properly. They can take the list that ships with the application and verify it is the same as the one provided by EFF themselves. Admittedly this one only benefits the truly paranoid, where is my tin foil hat?
  5. It will reduce the size of the produced binary and will likely mean that a release compresses better when creating an archive. I am no expert but I believe a text file has more potential for compression than a binary

Just my two cents 😄

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024

@Bassetts Althought I would be reluctant to use it here for the reasons above, you might be interested in rust-embed for the purpose of working with separate source files while still shipping a single binary. And it is a macro, which could also be interesting. : )

from dice.

Bassetts avatar Bassetts commented on June 24, 2024

@gonzalo-bulnes I was looking into the options for loading a file into the binary at compile time the other day and there is the include_str! macro.

I don't think it is particularly suitable for this application as you would then just have one huge static string embedded into the binary.

I also came across support for code generation by using a build script. That option is very interesting to me as you could have a build step that takes the word list file and generates code for you e.g. making each line into a variable, or putting all the lines into a HashMap.

Doing code generation would mean your word list text file can just be committed to source control, which would give you all the benefits that come with that such as;

  • changes being easier to see using git/github
  • being able to verify the file by doing a straight diff between the file in your repo and the one downloaded directly from the "upstream" (in this case EFF)

All that would then need to be done in terms of an audit is to verify the code generation code doesn't modify the text when generating the code, which I think should be fairly simple.

This also has the benefit that if someone wanted to provide a binary with an alternate word list then they could fork your repo, replace the word list with their own file (as long as it is in the expected format), and then just build. The code generation will take care of the rest.

For the reasons you laid out above this probably isn't the approach to take right now, but I thought it was worth highlighting as I found it very interesting, and we are both learning Rust 🦀 here. I may fire up a branch at some point to play with the code generation as it seems very interesting.

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024

Code generation using a build script is very interesting indeed! Thanks for the pointer @Bassetts!

from dice.

gonzalo-bulnes avatar gonzalo-bulnes commented on June 24, 2024

If you skipped the context above, here is the description of the issue 😉


The data of EFF's long wordlist should live in a different file

As it is currently, that long piece of data (7776 numbers -dice throws- and 7776 words) is bloating the dice/wordlist/mod.rs file.

Github does a pretty good job, only skipping syntactic coloration for those lines, but otherwise editing the file isn't pleasant. Also, the data is unlikely to change while the logic certainly will.

Objective: move data to a different file, in the form of constants so that it doesn't get mixed with the code that is likely to be improved (and change) over time, so that:

  • the dice/wordlist/mod.rs is easier to work with (more colors, less scrolling)...
  • ...without becoming more complex to review (one step at a time)
  • it is clearer that the EFF long wordlist is an external dependency and shouldn't be modified

Challenge: as explained in the link above, constants require a type annotation... what is their type?

Bonus points: I think that probably the only constant that needs to be public is the eff_long_wordlist itself, keeping the other ones private would be nice.

This is a refactoring, so there is no need to add new tests, the tests that are already written are sufficient. The goal is to make them green once the move is over ✔️

Of course, I'm happy to help at any stage 🙂 I am learning Rust as well!

from dice.

Related Issues (4)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.