Giter Site home page Giter Site logo

software-systems-tudelft's Introduction

This is assignment 1, about embedded concurrency. You can find a description of this assignment on the website

Assignment 1 for Software Systems

Baseline requirements

  • Be able to search a file or directory for all occurrences of a regex pattern.
  • Print output while it is searching. (It shouldn't buffer the results and print them at the end)
  • Print the results only using the provided Display implementation. Results should be put into GrepResult structs and printed. This will make sure their formatting is consistent.
  • Make sure the search_ctr numbers during printing are in increasing order.
  • Make sure the output is well-formed.
  • Be concurrent, that is, should use multiple threads.
  • Be thread safe. Any pattern that may work for a limited input, or which depends on being lucky with how the scheduler schedules your threads is considered incorrect.
  • Use at least one Mutex or RwLock in a non-trivial way (i.e. you don't simply construct a Mutex and never use it).
  • Use at least one non-trivial channel.
  • Not use any other libraries besides regex and clap without explicit permissions from a TA.
  • Not use any unsafe code.
  • Compile on stable rust.
  • Use threads, and not rust's support for async and asynchronous programming.
  • Dynamically determine how many threads it should use based on the amount of cores the system has. You must not spawn many times more threads than your system has cores. On large folders, you should not spawn millions of threads.

Extra requirements

If your application is faster than grep: 1.5 points

  • This is easier to achieve than you may expect, since grep is single-threaded.
  • We will test this by running both your tool and grep, and asking them to find the word torvalds in the linux source code. The tests will be run on the same computer, with multiple cores available.

Good error handling: 1.0 points

  • Your application needs to provide a user-friendly output when an unexpected situation occurs.
  • For example, the user may provide an invalid regex, the program may encounter files that it does not have permission to read, or a file is deleted while the program is executing. This is not a full list.
  • The program should not contain any unwraps or expects that may fail. Provide user-friendly outputs.

Implement path filtering: 1.0 points

  • The program should only return results in files for which the file path matches the specified regex.
  • In order to do this, you will need to use the filter parameter in the Args struct.
  • For example, the command cargo run -- -f ".*/file[12].txt" banana ./examples/example2 should only return the result in file 1, not in file 3:

Analyse the performance of your implementation: 1.0 points

  • Submit a document of at most two pages, in which you present a performance analysis of your program.
  • Include a plot that shows the performance of the program over the number of threads that it was given.
  • Explain how you measured the performance, and how you prevented random fluctuations from affecting the result.
  • Include relevant information, such as the number of cores of the machine this was executed on and the dataset that was used.
  • Include the same command executed with grep as a baseline.

Support for binary files: 1.0 points

  • Not all files in the directory may contain utf-8 text. The program should work correctly in these scenarios.
  • Note that a String can only contain utf-8 text, so you will have to use a Vec instead.
  • You need to use the bytes module of the regex crate for this. See https://docs.rs/regex/latest/regex/bytes/index.html for more.
  • When printing, non-utf8 bytes should be replaced by the unicode replacement character. The String::from_utf8_lossy function does this. The Display implementation of GrepResult already uses this, so you don't need to change this.
  • In the example3 directory, you can find an example of a file with non-utf8 bytes. Note that your text editor may not enjoy these bytes. cargo run banana ./examples/example3/ should match the file. The expected result is: >>> (#0) ../template/examples/example3/test.bin �banana� ^^^^^^

Discussion

The analysis is done with a wide range of threads launched in the program. However, by default, the number of threads launched running the program differs among machines. It is equal to the maximum number of threads the machine could handle. In this case, 8 threads for MacBook and 16 threads for Dell.

Besides the baseline, most of the extra requirements are met in our program. However the Good error handling is not certain whether it is fully realized, due to lack of verification. For most functions implemented, error handling with user friendly messages are included.

Another issue regarding the performance of the program in respect to the number of threads launched in the program requires further analysis. The peak performance of the program reaches its peak performance on both machines with different operating systems when 128 and 64 threads are launched respectively. However, according to the CPU hardware parameters, the CPU in two machines should only be able to handle 16 threads and 8 threads at the same time. The answer to this issue remains unknown and requires further analysis.

The only two concurrency primitives used in the program are a counter of type Mutex which counts the numbers of files that match the regex, and a channel sending GrepResult between two threads. The counter of type Mutex is necessary because it guarantees the safety of the value in counter. Due to the fact that Mutex could only be accessed when it is locked, and block threads waiting for the lock to become available. Hence, only one thread is adding to the counter each time. The second concurrency primitive is a channel which sends GrepResult found in every file to another thread for printing out the result. This is due to the requirement of the assignment - Use at least one non-trivial channel.. In our case, two different type of threads are created. One only consists of a thread which loops and prints out the result in a provided format once it receives GrepResult via the channel. The other one consists of several threads which reads a file respectively and sends the GrepResult via the channel once the single file is done reading and matches are found.

Conclusion

Overall, the program is faster than the baseline throughout all our tests (when number of threads launched are the maximum that the machine could handle). The speedup is 136.03% on Dell (with 16 threads launched in the program) and 642.96% on MacBook (with 8 threads launched in the program). Most of the requirements are fully realized. Though answers to certain issues remain unknown, they are discovered and reasonable guesses for the cause are given.

software-systems-tudelft's People

Contributors

jclli avatar jdonszelmann avatar jonathanbrouwer avatar tonyyunyang avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.