Comments (2)
I've wondered about what to do about 64-bit integers myself, without coming up with any really good solutions.
One approach would be to change current R integers to be 64 bit rather than 32 bit. This would probably be OK for R code - I'd guess that few R programmers are relying on 32-bit integer overflow producing NA, rather than the actual value. It might be OK for most C code too, if only there were a C compiler option to make "int" be 64 bits in size rather than the now-usual 32 bits. But there isn't, that I know of, in any common C compiler.
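To make the overflow behaviour concrete, here is a minimal standalone C sketch of 32-bit addition that returns an NA sentinel on overflow, which is what R's integer arithmetic does from the programmer's point of view. This is not R's actual source: `NA_INT32` and `add_int32_or_na` are names invented for this illustration. The sentinel value mirrors R's use of the smallest 32-bit integer as the integer NA.

```c
#include <stdint.h>

/* Stand-in for R's integer NA, which is the smallest 32-bit integer.
   The name NA_INT32 is hypothetical. */
#define NA_INT32 INT32_MIN

/* Add two 32-bit integers; return NA_INT32 if either input is NA
   or if the exact result does not fit in the non-NA 32-bit range. */
int32_t add_int32_or_na(int32_t a, int32_t b)
{
    if (a == NA_INT32 || b == NA_INT32)
        return NA_INT32;
    /* Do the sum in 64 bits so the overflow check itself cannot overflow. */
    int64_t wide = (int64_t) a + (int64_t) b;
    if (wide > INT32_MAX || wide <= INT32_MIN)
        return NA_INT32;
    return (int32_t) wide;
}
```

With 64-bit integers the same sum would simply produce the actual value, which is the behaviour change discussed above.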
Then there is introducing a new data type, as you propose. This seems inelegant, in that the programmer ideally shouldn't need to deal with this issue. But more practically, though it would work OK for R code, it would require changing lots of C code. C code that uses the asInteger or asReal C functions to coerce a SEXP to a desired numerical value would at least work with 64-bit integers as long as they weren't actually needed, which is something, but of course all the C code that you want to actually use 64 bits would need to be modified. There's also an issue of unknown magnitude with introducing any new basic data type, in that some C code may be assuming that the current data types are the only ones possible. Certainly, lots of code would need to be extended to deal with the new data type in a graceful way.
A third approach, which I tentatively think I would prefer if I actually were to do this, would be to simply allow arithmetic on strings, provided they are syntactically valid numbers (or maybe only integers would be allowed). This would be slow, but would have the advantage that arbitrary-size integers would be supported (not just 64-bit ones). Since there would be no new data type, there would be no problem of R code crashing when it sees an object of a mysterious type. Depending on exactly how the feature is specified, there could be problems with data unexpectedly (to some program) ending up as strings rather than numbers - perhaps when arithmetic on integers overflows, with the result represented as a string.
Another advantage of this approach is that automatically interpreting strings of digits to numbers may be convenient even apart from allowing bigger integers. A disadvantage is that errors where the string really wasn't supposed to be treated as a number may be harder to find and/or fix.
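As a rough illustration of what arithmetic on strings could look like at the C level, here is a minimal sketch that adds two non-negative decimal integers stored as digit strings. It is only an assumption about how such a feature might be implemented, not anything from pqR; `add_decimal_strings` is a hypothetical name, and signs, subtraction and non-integers are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Add two non-negative decimal integers given as digit strings.
   Returns a malloc'd string that the caller must free, or NULL if an
   input is empty, contains a non-digit, or allocation fails. */
char *add_decimal_strings(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    if (la == 0 || lb == 0)
        return NULL;
    if (strspn(a, "0123456789") != la || strspn(b, "0123456789") != lb)
        return NULL;

    size_t n = (la > lb ? la : lb) + 1;   /* one spare digit for a carry */
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    out[n] = '\0';

    /* Schoolbook addition, least significant digit first. */
    int carry = 0;
    for (size_t i = 0; i < n; i++) {
        int da = i < la ? a[la - 1 - i] - '0' : 0;
        int db = i < lb ? b[lb - 1 - i] - '0' : 0;
        int sum = da + db + carry;
        out[n - 1 - i] = (char) ('0' + sum % 10);
        carry = sum / 10;
    }

    /* Strip leading zeros, but keep a single "0" for a zero result. */
    size_t k = 0;
    while (k + 1 < n && out[k] == '0')
        k++;
    if (k > 0)
        memmove(out, out + k, n - k + 1);
    return out;
}
```

Adding "1" to "9223372036854775807" (the largest signed 64-bit value) gives "9223372036854775808", which shows both the appeal (no fixed size limit) and the cost (a loop over characters for every operation).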
I'm not clear on what the most common use-cases for wanting 64-bit integers are, though, so I don't know whether the performance disadvantage of numbers-as-strings is crucial.
from pqr.
One approach would be to change current R integers to be 64 bit rather than 32 bit.
I think this could give rise to many problems. What about SIMD/SSE/AVX code, for example? That cannot be changed easily, so I think this direction would be very problematic.
A third approach, which I tentatively think I would prefer if I actually were to do this, would be to simply allow arithmetic on strings
Unfortunately this will not work if you want to work quickly with indexes (time series), and this is the problem that I want to solve. As I wrote here, using double in POSIXct was a wrong decision. It's broken by design. This solution will never be trustworthy: it will always produce occasional erroneous results.
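The precision concern can be checked directly. POSIXct stores seconds since the epoch in a double, which has a 53-bit significand; near 1.5e9 seconds the spacing between adjacent doubles is about 2.4e-7 s, so nanosecond differences disappear. The standalone C sketch below shows this, and also shows that a nanosecond count of this magnitude does not survive conversion to double. Both function names are made up for this illustration.

```c
#include <stdint.h>

/* Convert a nanosecond count to double and back, as would happen if an
   int64 timestamp were converted (not bit-copied) into a double. */
int64_t roundtrip_through_double(int64_t ns)
{
    volatile double d = (double) ns;   /* volatile: force a real 64-bit double */
    return (int64_t) d;
}

/* Return 1 if adding delta seconds to a seconds-since-epoch double
   produces a different double, 0 if the change is rounded away. */
int delta_survives_in_seconds(double secs, double delta)
{
    volatile double shifted = secs + delta;
    return shifted != secs;
}
```

For a timestamp in June 2018 (1529107200 s), `delta_survives_in_seconds(1529107200.0, 1e-9)` returns 0, and `roundtrip_through_double` maps 1529107200100000001 ns back to 1529107200100000000 ns.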
The only correct solution is to use int64. This is how nanotime in R works. The problem, however, is how to distinguish a plain REALSXP from a nanotime that uses double as storage. We can check the class of the object:
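#include <string.h>
#include <Rinternals.h>
#include <Rdefines.h>   /* GET_CLASS, GET_LENGTH */

/* Return TRUE if p is a double vector whose class attribute
   contains "nanotime", FALSE otherwise. */
int is_nanotime(SEXP p)
{
    if (TYPEOF(p) != REALSXP)
        return FALSE;
    unsigned int i;
    SEXP c; /* class ptr */
    PROTECT(c = GET_CLASS(p));
    unsigned int size = GET_LENGTH(c);
    /* Scan every element of the class vector for an exact match. */
    for (i = 0; i < size; i++) {
        const char *s = CHAR(STRING_ELT(c, i));
        if (strcmp(s, "nanotime") == 0) {
            UNPROTECT(1);
            return TRUE;
        }
    }
    UNPROTECT(1);
    return FALSE;
}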
But this is not the most efficient solution. The second problem is that other programs may treat the object as a plain REALSXP and damage the data. This is not a safe solution.
- (integer64 type)
Then there is introducing a new data type, as you propose. This seems inelegant, in that the programmer ideally shouldn't need to deal with this issue. But more practically, though it would work OK for R code, it would require changing lots of C code.
You're absolutely right. From the R programmer's side there should not be two different types of integer.
There is also a fourth possibility for solving the POSIXct problems: timestamp, a new type dedicated to time. This type of approach is popular in the world of databases. For example:
(...)
#define RAWSXP 24 /* raw bytes */
#define S4SXP 25 /* S4, non-vector */
#define TSSXP 26 /* timestamp -> int64 */
#ifdef USE_RINTERNALS
/* (...) */
#define TIMESTAMP(x) ((int64_t *) DATAPTR_WITH_ALIGNMENT(x))
#else
extern R_NORETURN void Rf_TIMESTAMP_error(SEXP);
static inline int64_t *TIMESTAMP(SEXP x)
{
    if (TYPEOF(x) != TSSXP) Rf_TIMESTAMP_error(x);
    return (int64_t *) DATAPTR_WITH_ALIGNMENT(x);
}
#endif
/* etc */
This solution is safe to implement: it will not break compatibility with other packages.
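The safety argument rests on the accessor checking a type tag rather than a class attribute. A minimal standalone C sketch of that idea follows. It does not use R's real SEXP layout; `struct vec`, `VEC_REAL`, `VEC_TIMESTAMP` and the two accessors are hypothetical names, and returning NULL stands in for raising an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; 14 and 26 echo REALSXP and the proposed TSSXP. */
enum vec_type { VEC_REAL = 14, VEC_TIMESTAMP = 26 };

/* A toy vector object: a type tag plus a pointer to the payload. */
struct vec {
    enum vec_type type;
    size_t length;
    void *data;
};

/* Accessor for double vectors; refuses anything with another tag. */
double *vec_real(const struct vec *v)
{
    return v->type == VEC_REAL ? (double *) v->data : NULL;
}

/* Accessor for timestamp vectors; refuses anything with another tag. */
int64_t *vec_timestamp(const struct vec *v)
{
    return v->type == VEC_TIMESTAMP ? (int64_t *) v->data : NULL;
}
```

Code that only knows about doubles gets a refusal from `vec_real` when handed a timestamp vector, instead of silently reinterpreting the int64 payload as doubles.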
and from R:
x <- timestamp('2018-06-16T00:00:00.100000000+00:00')
# but what if
x <- 1529107200100000000
Again, we have integer64 and we cannot use it that way. However, there is another solution. See this link and Kerf's manual, page 19.
They use notations similar to those of complex numbers. This in R could look like this:
x <- 2018.06.16 # '.' because 2018-06-16 -> 1994 :)
x <- T00:00:00.100000000 # 'T' because ':' is a seq operator
x <- 2018.06.16T00:00:00.100000000
It's nice to operate on such an object:
2018.06.16 + 3d # +3 days
2018.06.16 + 1m5d # +1 months and 5 days
This of course requires changing code at the level of R's parser, which is easier with pqR :)
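Whichever surface syntax is chosen, a timestamp string or literal eventually has to become an int64 count of nanoseconds since the epoch. Below is a minimal C sketch of that conversion for strings of the form used in the timestamp() example above. It is an illustration under stated assumptions, not pqR or R code: `parse_timestamp_ns` and `days_from_civil` are hypothetical names, a fractional part and a numeric UTC offset are required, and leap seconds are ignored.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date. */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= (m <= 2);                      /* treat Jan and Feb as months 11, 12 of the prior year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;        /* year of era, 0..399 */
    int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Parse "YYYY-MM-DDThh:mm:ss.f...+hh:mm" (1 to 9 fraction digits) into
   nanoseconds since the epoch, in UTC. Returns 0 on success, -1 on
   input that does not match the expected shape or ranges. */
int parse_timestamp_ns(const char *s, int64_t *out)
{
    int Y, M, D, h, mi, sec, oh, om;
    char frac[10], sign;
    if (sscanf(s, "%4d-%2d-%2dT%2d:%2d:%2d.%9[0-9]%c%2d:%2d",
               &Y, &M, &D, &h, &mi, &sec, frac, &sign, &oh, &om) != 10)
        return -1;
    if (sign != '+' && sign != '-')
        return -1;
    if (M < 1 || M > 12 || D < 1 || D > 31 || h < 0 || h > 23 ||
        mi < 0 || mi > 59 || sec < 0 || sec > 59)
        return -1;

    /* Right-pad the fraction with zeros to exactly nine digits. */
    int64_t nanos = 0;
    size_t n = strlen(frac);
    for (size_t i = 0; i < 9; i++)
        nanos = nanos * 10 + (i < n ? frac[i] - '0' : 0);

    int64_t secs = days_from_civil(Y, M, D) * 86400 + h * 3600 + mi * 60 + sec;
    int64_t offset = (int64_t) oh * 3600 + (int64_t) om * 60;
    /* Local time = UTC + offset, so subtract a positive offset. */
    secs += (sign == '+') ? -offset : offset;

    *out = secs * INT64_C(1000000000) + nanos;
    return 0;
}
```

For '2018-06-16T00:00:00.100000000+00:00' this yields 1529107200100000000, the integer shown in the R example above.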
btw. before I wrote this I read this: http://www.cs.utoronto.ca/~radford/RIOT2017-lang.pdf