Giter Site home page Giter Site logo

Comments (37)

milancurcic avatar milancurcic commented on May 25, 2024 5

I like just File.

In the case of reading and writing formatted data, the trivial implementation doesn't operate on multiple variables. It reads or writes a string, for example:

character(len=:), allocatable :: a
type(File) :: f
f = File('log.txt')
f % read(a) ! reads whole file into string a
f % seek(0) ! rewind
f % readline(a) ! reads a single line into string a

To write:

f = File('log.txt', 'w')
f % write('pi = %.3f' .fmt. 3.141)

similar to Python. The last line would write 'pi = 3.141' to log.txt. This is an example of on-the-fly formatting that I mentioned earlier, which I think is needed if you want to elegantly read and write multiple variables. .fmt. here is our implementation of %.

C-style formatting is something I'd very much like in Fortran. I think it may help some newcomers to the language as well because this kind of formatting is more common in other languages. But that's for another proposal. :)

from stdlib.

cmacmackin avatar cmacmackin commented on May 25, 2024 2

C-style formatting is something I'd very much like in Fortran. I think it may help some newcomers to the language as well because this kind of formatting is more common in other languages. But that's for another proposal. :)

This actually wouldn't be too difficult to implement in a standard library. We'd just write a series of wrappers in C, taking different numbers of void*t arguments. We'd then use interoperability to call these from Fortran and wrap them in a generic block. We could have versions accepting between, say, 1 and 30 arguments (tedious, but could be automatically generated), which should be enough for anyone.

from stdlib.

everythingfunctional avatar everythingfunctional commented on May 25, 2024 2

I've always found it really strange that Fortran's read and write actually muddled the jobs of parsing and formatting together with the I/O. It leads to input files/specifications that are predicated on that functionality, which in my opinion is not a good thing. You're users shouldn't have to care about the peculiarities of the language you're using. I once was porting some Fortran code to Python and had to write something to simulate Fortran's IO in order to make sure old input files would still work properly.

I'm pretty fond of Python's file API. I think we should mimic that as a library and see if it works well for everybody. We need a string type to implement it properly though since readlines returns an array of strings. An on-the-fly formatting library would be really helpful too.

from stdlib.

certik avatar certik commented on May 25, 2024 2

It's about making Fortran easier to use. This:

s = open(filename)
s = open(filename, "w")

is simpler and more consistent than this:

open(newunit=s, file=filename, status="old", action="read")
open(newunit=s, file=filename, status="replace", action="write")

That's all there is to it. Just like Fortran has arrays, which do have similar information as the various C++ array libraries, but they are much easier to use. That's the whole point. So the goal of stdlib is to make Fortran easier to use and more productive, with a standard syntax, agreed upon by a wide community.

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024 2

After some discussions and implementations (see #71, #77, #86, #91), the function open has been added to stdlib_experimental_io:

integer function open(filename, mode, iostat) result(u)
    character(*), intent(in) :: filename
    character(*), intent(in), optional :: mode
    integer, intent(out), optional :: iostat
    ...
end function

This function returns a unit number for a file opened following the provided mode.
The mode can be include the following letters (following the Python open):

  • r (read), w (write and replace the file if it exists), x (write in a new file), a (append)
  • + (readwrite)
  • t (text), b (binary)
    The default mode is rt.

Currently, both text and binary files are opened with access = stream.

So, should this API support the other accesses (sequential and direct)? What are the pros and cons to support them (or not)?
So let 's discuss this API (@milancurcic @zbeekman @certik @cmacmackin @ivan-pi @marshallward @rweed @gronki @everythingfunctional and all others I maybe missed)

Supporting direct access would require to add recl to the API too. IMHO this would defeat the purpose of stdlib function open that aims to propose a more friendly way to deal with files.

A draft PR (#91) has been opened to propose an API to support at least sequential access (default is still stream):

integer function open(filename, mode, iostat, access) result(u)
    character(*), intent(in) :: filename
    character(*), intent(in), optional :: mode
    character(*), intent(in), optional :: access
    integer, intent(out), optional :: iostat
end function

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024 2

I am fine with calling it delete.

Me too. delete would be also in agreement with close(...='delete').
However, if we want to extend that to directories, rm or remove could be more appropriate (with e.g. rmdir to remove directories.

I assume this would go into the io module?

It would make sense to me, since open is already in the io module.

If we want to implement more stuff around file systems #100, we should probably have a new module dedicated to file system stuff (e.g., rm, rmdir, ls, mkdir,...)

from stdlib.

marshallward avatar marshallward commented on May 25, 2024 1

I would also like to see a more POSIX-like interface to the filesystem. I would also like to see this as independent from the Fortran interface, with a focus on data streams rather than the list-directed I/O for which Fortran I/O has been tailored.

I also think that there should be a generic File type, but it would be good to start with complete interfaces to the low level file I/O C functions, such as POSIX open, read, etc. More elegant interfaces can then be build from these.

I like the way Python does this, with a top level os module which provides a commont platform which conditionally wraps to a low level interface: posix module in Linux/OSX, nt for Windows, etc. Perhaps even the Fortran intrinsics could be supported as an additional backend of necessary, along with more exotic platforms (OpenVMS?).

from stdlib.

gronki avatar gronki commented on May 25, 2024 1

I think we do not need a derived type wrapper for Fortran i/o. The current i/o covers many of the general needs. Some missing capabilities can be (and gradualy are) improved from the language side and introducing a wrapper would not help with anything. For a particular file format (such as PPM), anyone can write their own wrapper using derived types. This is literally what derived types are for. They should not be used for introducing another way of doing the same thing (which is universally bad). So I am strongly against this proposal.

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024 1

Yes, perhaps it's better for this to start as a separate project, and after having something concrete to look at we can discuss whether people would like to have it in stdlib or not.

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024 1

Yes, ease of use is the key motivation for me here, and IMO is one of highest priorities for stdlib. I also learn standard Fortran I/O every time I use it. Teaching it is especially tedious.

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024 1

In this comment, @jacobwilliams mentioned:

delete a file or directory (I know for files we can open and then close(..status='delete') but that is just so non-modern).

Would it be an interest to implement a subroutine/function in stlib to delete a file? For example, this would check if the file exists, if it is not opened, and if both are true, would delete the file.
Such a subroutine/function could be implemented in stdlib_experimenatl_io.
Possible interface:

io = delete('file.txt')

or

call delete('file.txt',iostat=io)

with iostat being optional

Such a function/subroutine would avoid the 'traditional':

open(newunit=u, file = 'file.txt', status = 'old')
close(u, status = 'delete')

from stdlib.

certik avatar certik commented on May 25, 2024 1

I think the answer is yes, only we should make the naming consistent. Python has remove, and we should survey other languages.

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024

However, the problem is in defining methods on the file-object; these would need to know the number and type of arguments at compile-time. It would be impractical to produce methods with every conceivable permutation of object types. It would also require variadic functions, which are not available.

@cmacmackin Can you explain why the number and type of arguments aren't known at compile time? How many can there possibly be? I think you'd need just clever formatting of numeric types to characters, but I may be missing the point.

from stdlib.

zbeekman avatar zbeekman commented on May 25, 2024

I think he means that with standard write you can pass an arbitrary number of variables of different types. That makes it hard for a file%write() method to be generic because how do you create a sufficiently generic interface? I may be miss-interpreting his comment, but this is my current understanding.

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024

I see, that didn't even cross my mind. Wouldn't for most purposes one argument (character string for text, numeric or strings for binary) suffice?

The only time I put multiple variables on write or read statements is for list-directed I/O. When I do it, the language-provided write or read are simple enough.

If we had nice on-the-fly formatting like Python does, then this becomes trivial I think.

from stdlib.

certik avatar certik commented on May 25, 2024

@milancurcic can you elaborate a bit why the unit numbers are insufficient? The recommended approach with newunit from Fortran 2008 seems very simple: https://www.fortran90.org/src/best-practices.html#file-input-output. The unit number integer is your file like instance. That's how I always viewed it. The summarize the current approach:

To read from a file:

integer :: u
open(newunit=u, file="log.txt", status="old")
read(u, *) a, b
close(u)

Write to a file:

integer :: u
open(newunit=u, file="log.txt", status="replace")
write(u, *) a, b
close(u)

To append to an existing file:

integer :: u
open(newunit=u, file="log.txt", position="append", status="old")
write(u, *) N, V(N)
close(u)

The only annoying thing to me is that you have to put status="old" when you want to read from a file, you cannot just do open(newunit=u, file="log.txt") because the default status is undefined or something like that (I forgot the exact reason).

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024

Yes, unit numbers work. They were tedious before newunit, now they're OK. I think of them as file handles. Now, I can at best say they're not as elegant as having a file-like instance that carries the unit number abstracted away from the user. So, this is perhaps more of a style preference. I think managing unit numbers for different files (say, if you have multiple files open) may be more error-prone and harder to develop than if using file-like instances, but this is kinda hand-wavy.

With a high-level interface, your examples would be something like this. Read from a file:

type(file_type) :: f
f = file_type('log.txt', 'r')
a = f % read()
f % close()

Write to a file:

type(file_type) :: f
f = file_type('log.txt', 'w')
call f % write(a)
f % close()

Append to an existing file:

type(file_type) :: f
f = file_type('log.txt', 'r+')
call f % write(a)
f % close()

The pros as I see them:

  • A bit more elegant style;
  • Don't need to carry unit numbers, only file instances;
  • Pretty abstraction of status, position, and action attributes;
  • close could be baked into file_type's destructor so that the file is automatically closed when f goes out of scope;

Cons:

  • Would need elegant, on-the-fly formatting system between strings and numeric values;
  • Trivial implementation can write one variable per method call
  • Others?

from stdlib.

certik avatar certik commented on May 25, 2024

I talked with @zjibben about this and he pointed out another advantage of your approach: automatic closing of files when the instance goes out of scope. Also I would suggest that r is the default as in Python. So your first example can become just:

type(file_type) :: f
f = file_type('log.txt')
a = f % read()

Also perhaps the type could be called File, or File_t, we should discuss this in #3.

One issue is that you need to be able to read different types and I don't know if you can override read based on the result type. So it might need to be called as call f%read(a).

from stdlib.

certik avatar certik commented on May 25, 2024

Another issue: how do you distinguish between:

read(u,*) a, b

and

read(u,*) a
read(u,*) b

The first reads two numbers one a line, the second reads a number from two lines --- or are the two equivalent?

from stdlib.

cmacmackin avatar cmacmackin commented on May 25, 2024

I think he means that with standard write you can pass an arbitrary number of variables of different types. That makes it hard for a file%write() method to be generic because how do you create a sufficiently generic interface? I may be miss-interpreting his comment, but this is my current understanding.

Yes, that's what I was trying to communicate, albeit not very well.

from stdlib.

ivan-pi avatar ivan-pi commented on May 25, 2024

I like the idea. As mentioned by @cmacmackin this would be easier to implement if Fortran had support for variadic functions (in fact I mentioned exactly this use case in my comment at j3-fortran/fortran_proposals#76 (comment)).

The idea of wrapping a bunch of C subroutines in a generic block seems like a good idea. I am only worried about a combinatorial explosion. Say I want to read 3 integers, 4 reals, and a character array, etc., from a single line, wouldn't we then need to generate all possible combinations of all intrinsic types? Or would you somehow convert everything internally to a C pointer and use a format specifier? (Hopefully, my idea makes sense.)

type(File) :: f
integer :: a, b, c
real :: d, e, ff, g, h
character(len=1) :: char_arr(5)

call f%open("data.txt","r")
call f%readline(a, b, c, d, e, ff, g, h, char_arr, fmt=" ... some kind of format specifier ...")

Or we could just read lines as an allocatable character string and do the type conversion ourselves:

read(f%readline(),*) a, b, c, d, e, ff, g, h, char_arr ! <-- assuming this works?

I suppose this would work:

character(len=:), allocatable :: line
! ... open file ...
call f%readline(line)
read(line,*) a, b, c, d, e, ff, g, h, char_arr

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024

I would suggest to keep high-level I/O for ASCII files with formatted data (e.g., table of reals; something similar to File suggested by @milancurcic). Maybe some formats, like CSV, could be supported too.

If the user wants something more complex (mix of integers/reals/characters), or binary/stream files, the user can still use newunit (or the readline op proposed by @ivan-pi ; but probably less efficient).

from stdlib.

ivan-pi avatar ivan-pi commented on May 25, 2024

I would suggest to keep high-level I/O for ASCII files with formatted data (e.g., table of reals; something similar to File suggested by @milancurcic). Maybe some formats, like CSV, could be supported too.

So more like np.loadtxt(), but with the possibility to load the individual columns or rows directly into arrays (1d or 2d)?

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024

So more like np.loadtxt(), but with the possibility to load the individual columns or rows directly into arrays (1d or 2d)?

Yes, it was my idea.

I observed that I/O is usually quite a difficult step for most beginners when they come from R/Python/Matblab. Such high-level I/O could maybe help them.
Also, when performance is not an issue, high-level I/O could also simplify programs.

from stdlib.

certik avatar certik commented on May 25, 2024

@ivan-pi, @jvdp1 see #16 for loadtxt.

from stdlib.

milancurcic avatar milancurcic commented on May 25, 2024

Great ideas for C-style formatting implementation. A basic implementation for File can work on just strings at first, and parsing the strings into numeric variables and vice versa can be a separate issue (and module) that File would eventually depend on. I can prepare a prototype for File that works on strings.

@marshallward I'd like a maintained and tested POSIX interface as well. Do you mind opening a proposal for this? (Update: done in #22.)

@cmacmackin We can have a dedicated proposal to just C-style formatting. I like your idea for implementation. Do you mind opening a proposal for this? (Update: done in #19.)

from stdlib.

cmacmackin avatar cmacmackin commented on May 25, 2024

@ivan-pi My suggestion of calls to C was specifically for a printf function. This would avoid the combinatorial explosion because printf works on void* data types. These can be passed in from Fortran using a "deferred-type" argument, type(*). The interface would look something like this:

void printf_wrapper1(const char *format, void *arg1) {
    printf(format, arg1, arg2)
} 

void printf_wrapper2(const char *format, void *arg1, void *arg2) {
    printf(format, arg1, arg2)
} 
interface printf
subroutine printf_wrapper1(format_str, arg1) bind(c)
    character(len=1), dimension(*), intent(in) :: format_str
    type(*), intent(in) :: arg1
end subroutine printf_wrapper1

subroutine printf_wrapper2(format_str, arg1, arg2) bind(c)
    character(len=1), dimension(*), intent(in) :: format_str
    type(*), intent(in) :: arg1, arg2
end subroutine printf_wrapper2
end interface printf

The main complication with this is how to convert between Fortran and C strings. It wouldn't be hard to provide wrapper routines which do this for the Format string, but string arguments to printf could be more of a challenge.

from stdlib.

rweed avatar rweed commented on May 25, 2024

Three things I've implemented that I find useful for IO particularly input data where I create my own keyword based input. One is a File class that basically wraps current Fortran IO (open, read, write etc) but hides the unit number. I have read and write wrappers that allow you to read/write formatted, list-directed or binary of the intrinsic data types from the same subroutine. Another useful class is a deferredLenString class that is a defined type that contains a deferred length string with some routines to convert back and forth to regular strings and do the equivalent of LEN, LEN_TRIM etc. Finally, I define a fileImage class that uses an array of the deferredLenString class to hold all the records in a formatted input file in memory as a "file image". I have filters that can change case, strip delimiters etc from the original file. When processing keyword based input files, using a file image instead of the disk file is a lot faster. Some or all of these might prove useful for implementing IO classes specifically for processing text files etc.

from stdlib.

certik avatar certik commented on May 25, 2024

I generally agree with @gronki on this --- it's similar in #69. Rather than creating our own string_t, let's use the intrinsic, and work with the language and compilers to improve it if needed.

However, as a pragmatic approach, I think the best it to have both: a low level API that uses the language features. People like @gronki and I will use that primarily. And optionally a high level API, that lots of people can also use. I think we can have both. What we should not have is just the high level API without the low level one.

from stdlib.

certik avatar certik commented on May 25, 2024

@milancurcic let's separate the genuine new functionality, which would go into the low level API, from a convenience wrapper, which would go into the high level OO API.

Here are some ideas off top of my head that would go into the low level API:

  • Using the unit handle u (of type integer) directly
  • The standard Fortran functions read, write, close, etc.
  • Reading a string from a file and returning it as properly allocated (related to #69)
  • Reading an array of numbers (real or integer) and returning a properly allocated array back
  • Perhaps some easier function open (so that one can do u = open(filename) and u = open(filename, "w") as in Python etc. instead of the current open(newunit=u, file="log.txt", status="old", action="read") and open(newunit=u, file="log.txt", status="replace", action="write"), which is so complicated that I literally just had to look it up again --- the standard is badly designed that it left both status and action undefined --- we should work on fixing it) See #71.
  • ... what else?

The high level API would then wrap these in a convenience derived type wrapper file_t and methods.


I see a repeated pattern across many of the issues in stdlib ---- the high level "simple" OO interface is motivated by the fact that the low level built-in Fortran language features are insufficient. In order to fix the low level API, which a lot of people would otherwise use, let's use the OO interface as a vehicle to figure out what features we want and need (not being constrained by the limitations of the low level API: we can design our derived type and methods in any way we like). Then extract the genuine new functionality, and put it into the low level API, and the high level API is just a thin wrapper. Then let's create proposals to get some of the stdlib's improvements of the low level API into the Fortran standard itself -- backed by wide support here and our future users using it. And let people choose between the low level API and the high level API. It looks like we will have users for both.

from stdlib.

gronki avatar gronki commented on May 25, 2024

@certik I am not sure if I undestood correctly the last paragraph. Do you mean that the wrapper should be developed as the protype/testbench for the features that will be handled intrinsically in the future?

@everythingfunctional I disagree there is any reason to mimic Python's io library in fortran I/O. Is there anything that can be done in Python that Fortran does not enable to do with its current functionality? I would even argue that Fortran has superior functionality in some aspects and that the lesser brevity in other situations comes from lacks in string manipulation capabilities (for example, having to use write/read to parse strings to numbers).

Again, sorry for being a bitter potato. I am all for changes and criticizing Fortran where it sucks but I also want to make sure that we do not re-invent what's already good just because other languages handle it differently. Please do not take my opinions personally. :)

from stdlib.

certik avatar certik commented on May 25, 2024

@gronki yes, that's what I meant.

Is there anything that can be done in Python that Fortran does not enable to do with its current functionality?

Here is one example: #71. Look how much simpler the new open function is. Opening files in Fortran is so complicated that I literally have to look it up every time. This new function, on the other hand, I can remember, as it would be identical to Python.

from stdlib.

gronki avatar gronki commented on May 25, 2024

I understand your point. But my counter point is: then why not just use Python. :) What I want to say is that "just because it looks like Python" is not an argument because Fortran is not Python.

In this particular issue, the "Python-like" (or actually, C-like, because that's its origin) syntax takes the same amount of information (unit number, filename, access mode), just arranges it differently. It does not change the quality of life in any way. Nor would derived type wrapper of current I/O.

I am all for quality improvements that actually do make I/O more functional. But I disagree any of the ones discussed so far do that. That's only my opinion though and please don't be discouraged. As it was mentioned by someone before, since this is stdlib project (and not standard proposal project), it's probably best to let people make their packages and see if the solution produced ends up actually being better. :)

from stdlib.

everythingfunctional avatar everythingfunctional commented on May 25, 2024

Is there anything that can be done in Python that Fortran does not enable to do with its current functionality?

No, Python doesn't have nearly the number of options as Fortran, but that's the point. Fortran has too many options, and that makes it difficult to use. It would be nice if there was a simpler, more user friendly way to deal with files so you wouldn't have to look it up every time.

from stdlib.

certik avatar certik commented on May 25, 2024

To start: what is the advantage of unformatted sequential, compared to unformatted stream? There is a disadvantage that the format is compiler version specific. But is there any advantage? Could it be perhaps faster, or is that not the case.

from stdlib.

jvdp1 avatar jvdp1 commented on May 25, 2024

To delete a file:

  • Python: remove
  • Julia: rm
  • R: remove
  • Matlab: delete
  • C/C++: remove
    -D: remove
  • Fortran: delete in built-in close

and more examples here

remove (or rm) seems to be quite popular in other languages after a quick scan.

from stdlib.

certik avatar certik commented on May 25, 2024

I am fine with calling it delete. I assume this would go into the io module?

from stdlib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.