Comments (2)

radfordneal commented on July 23, 2024

I've wondered about what to do about 64-bit integers myself, without coming up with any really good solutions.

One approach would be to change current R integers to be 64 bit rather than 32 bit. This would probably be OK for R code - I'd guess that few R programmers are relying on 32-bit integer overflow producing NA, rather than the actual value. It might be OK for most C code too, if only there were a C compiler option to make "int" be 64 bits in size rather than the now-usual 32 bits. But there isn't, that I know of, in any common C compiler.
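
For reference, the overflow-to-NA behaviour mentioned above can be sketched in plain C. This is only an illustration, not the interpreter's actual code: it assumes `int` is 32 bits, and the names `SKETCH_NA_INTEGER` and `add_int32_with_na` are made up here. The one fact borrowed from R is that NA for integers is represented by `INT_MIN`.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for R's NA_INTEGER, which is INT_MIN. The macro name is
   local to this sketch. */
#define SKETCH_NA_INTEGER INT_MIN

/* Add two 32-bit "R integers": NA propagates, and a result that does
   not fit in the non-NA 32-bit range becomes NA instead of wrapping. */
int add_int32_with_na(int a, int b)
{
	if (a == SKETCH_NA_INTEGER || b == SKETCH_NA_INTEGER)
		return SKETCH_NA_INTEGER;

	/* Do the arithmetic in 64 bits so the true sum is always available. */
	int64_t wide = (int64_t)a + (int64_t)b;

	/* INT_MIN itself is reserved for NA, so it counts as out of range. */
	if (wide > INT_MAX || wide <= INT_MIN)
		return SKETCH_NA_INTEGER;

	return (int)wide;
}
```

With 64-bit integers the same check would need either a wider intermediate type or a compiler overflow builtin, since there is no standard 128-bit type to widen into.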

Then there is introducing a new data type, as you propose. This seems inelegant, in that the programmer ideally shouldn't need to deal with this issue. But more practically, though it would work OK for R code, it would require changing lots of C code. C code that uses the asInteger or asReal C functions to coerce a SEXP to a desired numerical value would at least work with 64-bit integers as long as they weren't actually needed, which is something, but of course all the C code that you want to actually use 64 bits would need to be modified. There's also an issue of unknown magnitude with introducing any new basic data type, in that some C code may be assuming that the current data types are the only ones possible. Certainly, lots of code would need to be extended to deal with the new data type in a graceful way.

A third approach, which I tentatively think I would prefer if I actually were to do this, would be to simply allow arithmetic on strings, provided they are syntactically valid numbers (or maybe only integers would be allowed). This would be slow, but would have the advantage that arbitrary-size integers would be supported (not just 64-bit ones). Since there would be no new data type, there would be no problem of R code crashing when it sees an object of a mysterious type. Depending on exactly how the feature is specified, there could be problems with data unexpectedly (to some program) ending up as strings rather than numbers - perhaps when arithmetic on integers overflows, with the result represented as a string.

Another advantage of this approach is that automatically interpreting strings of digits to numbers may be convenient even apart from allowing bigger integers. A disadvantage is that errors where the string really wasn't supposed to be treated as a number may be harder to find and/or fix.
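
To make the numbers-as-strings idea concrete, here is a minimal sketch of addition on decimal strings. It is an illustration under narrow assumptions, not a proposal for how pqR would implement it: only non-negative integers written as plain ASCII digits are accepted, and the function names `is_digit_string` and `add_decimal_strings` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if s is a non-empty string of ASCII digits, else 0. */
int is_digit_string(const char *s)
{
	if (*s == '\0')
		return 0;
	for (; *s; s++)
		if (*s < '0' || *s > '9')
			return 0;
	return 1;
}

/* Add two non-negative decimal strings of any length.
   Returns a malloc'd string that the caller must free, or NULL if
   either input is not a digit string or allocation fails. */
char *add_decimal_strings(const char *a, const char *b)
{
	if (!is_digit_string(a) || !is_digit_string(b))
		return NULL;

	size_t la = strlen(a), lb = strlen(b);
	size_t n = (la > lb ? la : lb) + 1;	/* one extra digit for a final carry */
	char *buf = malloc(n + 1);
	if (buf == NULL)
		return NULL;
	buf[n] = '\0';

	/* Schoolbook addition, least significant digit first. */
	int carry = 0;
	for (size_t k = 0; k < n; k++) {
		int da = k < la ? a[la - 1 - k] - '0' : 0;
		int db = k < lb ? b[lb - 1 - k] - '0' : 0;
		int sum = da + db + carry;
		buf[n - 1 - k] = (char)('0' + sum % 10);
		carry = sum / 10;
	}

	/* Strip leading zeros but keep at least one digit. */
	size_t start = 0;
	while (start < n - 1 && buf[start] == '0')
		start++;
	memmove(buf, buf + start, n - start + 1);
	return buf;
}
```

Calling `add_decimal_strings("9223372036854775807", "1")` yields a string one past the largest signed 64-bit value, which is the arbitrary-size advantage; the digit-by-digit loop is also where the performance cost comes from.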

I'm not clear on what the most common use-cases for wanting 64-bit integers are, though, so I don't know whether the performance disadvantage of numbers-as-strings is crucial.

from pqr.

dcegielka commented on July 23, 2024

> One approach would be to change current R integers to be 64 bit rather than 32 bit.

I think it can give rise to many problems. What about SIMD/SSE/AVX etc.? This can not be easily changed. I think that this direction would be very problematic.

> A third approach, which I tentatively think I would prefer if I actually were to do this, would be to simply allow arithmetic on strings

Unfortunately this will not work if you want to work quickly with indexes (time series), and that is the problem I want to solve. As I wrote here, using double in POSIXct was a wrong decision. It's broken by design. A double-based solution will never be trustworthy: erroneous results will always show up at random.
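
A small sketch of what goes wrong with double-based time, assuming IEEE 754 doubles and a timestamp stored as seconds since the epoch (the POSIXct convention). The function names are made up for the illustration. Near 1.5e9 seconds, adjacent doubles are about 2.4e-7 s apart, so a 1 ns step is simply lost, while an int64 nanosecond count keeps it.

```c
#include <stdint.h>

/* POSIXct-style: seconds since the epoch held in a double.
   Adds ns nanoseconds, expressed as a fraction of a second. */
double posixct_add_ns(double secs, int64_t ns)
{
	return secs + (double)ns * 1e-9;
}

/* Fixed-point alternative: nanoseconds since the epoch in an int64_t. */
int64_t int64_add_ns(int64_t t_ns, int64_t ns)
{
	return t_ns + ns;
}
```

For example, `posixct_add_ns(1529107200.0, 1)` compares equal to `1529107200.0`, whereas `int64_add_ns(1529107200000000000, 1)` differs from its input by exactly one.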

The only correct solution is to use int64. This is how nanotime in R works. The problem, however, is how to distinguish an ordinary REALSXP from a nanotime that uses double as storage. We can check the class of the object:

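#include <string.h>	/* strcmp */
#include <Rdefines.h>	/* GET_CLASS, GET_LENGTH; also pulls in Rinternals.h */

/* Return TRUE if p is a REALSXP whose class attribute contains "nanotime". */
int is_nanotime(SEXP p)
{
	if (TYPEOF(p) != REALSXP)
		return FALSE;
	int i;
	SEXP c; /* class attribute: a character vector, or R_NilValue */
	PROTECT(c = GET_CLASS(p));
	int size = GET_LENGTH(c); /* 0 when there is no class attribute */
	for (i = 0; i < size; i++) {
		const char *s = CHAR(STRING_ELT(c, i));
		if (strcmp(s, "nanotime") == 0) {
			UNPROTECT(1);
			return TRUE;
		}
	}
	UNPROTECT(1);
	return FALSE;
}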

But this is not the most efficient solution. The second problem is that other programs may treat the object as a plain REALSXP and damage the data. This is not a safe solution.
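
The damage is easy to reproduce outside R. The sketch below assumes the int64 value lives in a double-sized slot as a raw bit pattern (no numeric conversion) and that doubles are 64-bit IEEE 754. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Put the bit pattern of an int64_t into a double-sized slot. */
double int64_bits_as_double(int64_t v)
{
	double d;
	memcpy(&d, &v, sizeof d);
	return d;
}

/* Read the bit pattern back out as an int64_t. */
int64_t double_bits_as_int64(double d)
{
	int64_t v;
	memcpy(&v, &d, sizeof v);
	return v;
}

/* What code unaware of the class might do: treat the slot as an
   ordinary double and add 1 to it. */
int64_t naive_add_one(int64_t v)
{
	double slot = int64_bits_as_double(v);
	slot = slot + 1.0;	/* floating-point arithmetic on an integer bit pattern */
	return double_bits_as_int64(slot);
}
```

The round trip through the two helper functions is lossless, but `naive_add_one(1529107200100000000)` does not return the input plus one: read as a double, that bit pattern is a tiny positive number, so the sum is 1.0 and the timestamp is replaced by the bit pattern of 1.0.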

  1. (integer64 type)

> Then there is introducing a new data type, as you propose. This seems inelegant, in that the programmer ideally shouldn't need to deal with this issue. But more practically, though it would work OK for R code, it would require changing lots of C code.

You're absolutely right. From the R programmer's side there should not be two different integer types.

There is also a fourth possibility to solve POSIXct problems: timestamp - a new type assigned to time. This type of approach is popular in the world of databases. For example:

(...)
#define RAWSXP      24    /* raw bytes */
#define S4SXP       25    /* S4, non-vector */
#define TSSXP       26    /* timestamp -> int64 */

#ifdef USE_RINTERNALS
/* (...) */
#define TIMESTAMP(x)    ((int64_t *) DATAPTR_WITH_ALIGNMENT(x))
#else
extern R_NORETURN void Rf_TIMESTAMP_error(SEXP);
static inline int64_t *TIMESTAMP(SEXP x) 
{   if (TYPEOF(x) != TSSXP) Rf_TIMESTAMP_error(x);
    return (int64_t *) DATAPTR_WITH_ALIGNMENT(x);
}
#endif
/* etc */

This solution is safe to implement: because the type is new, it will not break compatibility with other packages etc.

and from R:

x <- timestamp('2018-06-16T00:00:00.100000000+00:00')
# but what if
x <- 1529107200100000000

Again, we have integer64 and we can not use it that way. However, there is another solution. See this link and Kerf's manual, page 19.

They use notations similar to those of complex numbers. This in R could look like this:

x <- 2018.06.16           # '.' because 2018-06-16 -> 1996 :)
x <- T00:00:00.100000000  # 'T' because ':' is a seq operator
x <- 2018.06.16T00:00:00.100000000

It's nice to operate on such an object:

2018.06.16 + 3d     # +3 days
2018.06.16 + 1m5d   # +1 months and 5 days
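
Underneath, such literals could map onto a plain int64 count of nanoseconds since 1970-01-01 UTC. The C sketch below is only one possible reading of that idea, not Kerf's or pqR's implementation. The names `days_from_civil`, `timestamp_ns` and `add_days` are hypothetical, and the day-count formula is the well-known civil-calendar algorithm for the proleptic Gregorian calendar. Adding months is deliberately left out, since it needs calendar arithmetic rather than a fixed offset.

```c
#include <stdint.h>

#define NS_PER_DAY INT64_C(86400000000000)

/* Days from 1970-01-01 to the given proleptic Gregorian date.
   m is 1..12, d is 1..31. */
int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
	y -= m <= 2;	/* count the year as starting on March 1 */
	const int64_t era = (y >= 0 ? y : y - 399) / 400;
	const unsigned yoe = (unsigned)(y - era * 400);	/* year of era, 0..399 */
	const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + (int64_t)doe - 719468;
}

/* A literal like 2018.06.16T00:00:00.100000000 as int64 nanoseconds;
   ns_of_day is the time-of-day part already converted to nanoseconds. */
int64_t timestamp_ns(int64_t y, unsigned m, unsigned d, int64_t ns_of_day)
{
	return days_from_civil(y, m, d) * NS_PER_DAY + ns_of_day;
}

/* "+ 3d": a day offset is a fixed number of nanoseconds. */
int64_t add_days(int64_t t_ns, int64_t days)
{
	return t_ns + days * NS_PER_DAY;
}
```

With these definitions, `timestamp_ns(2018, 6, 16, 100000000)` equals 1529107200100000000, the same integer as in the R example earlier in this comment.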

This of course requires changing the code at the level of R's parser, which is easier with pqR :)

btw. before I wrote this I read this: http://www.cs.utoronto.ca/~radford/RIOT2017-lang.pdf

