jfbu / texdimens
Utilities and documentation related to TeX dimensional units
This is a continuation of #3, after some delay because I managed in the meantime to bring down a server from an Emacs buffer, and decided to flee home after that.
*\message{\texdimenwithunit{0pt}{1pt}}
0.00002
*\message{\the\dimexpr0.00002\dimexpr1pt\relax\relax}
0.00002pt
But 0pt is handled correctly by \texdimenwithunit if the second argument is >1pt. The problem here is that the computer guy applied a math formula proven only for a conversion ratio >1, not =1. In #3, there is a different problem, which applies when the second argument is <1pt. There the output is technically correct (although not quite pleasing).
What an embarrassment. I fixed #4 by applying the <1pt branch to =1pt, as the former now handled dim1=0pt correctly, but this is of course definitely wrong, as it induces a shift of +1sp. Very stupid.
edit: it was also very stupid before that change, when the =1pt case was using the >1pt branch, because this is the same math as <1pt, albeit implemented differently, and the math is wrong for this special =1pt case...
*\message{\texdimenwithunit{1234sp}{1pt}}
0.01884
*\message{\number\dimexpr0.01884pt}
1235
*\message{\texdimenwithunit{1235sp}{1pt}}
0.01886
*\message{\number\dimexpr0.01886pt}
1236
alas... and I already pushed it to CTAN.
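To make the +1sp shift concrete, here is a small Python model of how TeX reads a decimal followed by pt. It is only a sketch and not the package's code: round_decimals and print_scaled transcribe the routines of the same name in tex.web, while parse_pt is a hypothetical helper name. Under this model, the answer for a second argument of exactly 1pt is simply TeX's own printout of the first argument, which round-trips.

```python
UNITY = 65536  # 1pt = 65536sp

def round_decimals(digits: str) -> int:
    """Fraction digits -> multiple of 2^-16, as in tex.web's round_decimals."""
    a = 0
    for d in reversed(digits[:17]):      # TeX keeps at most 17 digits
        a = (a + int(d) * 2 * UNITY) // 10
    return (a + 1) // 2

def parse_pt(decimal: str) -> int:
    """Hypothetical helper: sp value of '<decimal>pt' for non-negative input."""
    integer, _, fraction = decimal.partition(".")
    return int(integer or "0") * UNITY + round_decimals(fraction)

def print_scaled(s: int) -> str:
    """Decimal that TeX prints for s sp (s >= 0), as in tex.web's print_scaled."""
    out = f"{s // UNITY}."
    s = 10 * (s % UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s += 32768 - 50000           # round the last digit
        out += str(s // UNITY)
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            return out

# Outputs reported above, each one 1sp too large once read back:
print(parse_pt("0.00002"), parse_pt("0.01884"), parse_pt("0.01886"))  # 1 1235 1236
# TeX's own printout of 1234sp round-trips:
print(print_scaled(1234), parse_pt(print_scaled(1234)))               # 0.01883 1234
```

The design point is that no formula is needed when the unit is exactly 1pt: stripping the pt from \the of the dimension already gives a decimal that TeX parses back to the same number of sp.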
This is very disconcerting, is your TeX installation perhaps corrupted by some malware? Please fix this if possible!
The motivation of this issue largely comes from CJK (Chinese/Japanese/Korean) typography traditions.
The “large” units (in and cm) are good for describing page elements, but they are too crude when it comes to describing text elements (such as font size). The Japanese Industrial Standard JIS X 4051 defines the unit “Q” to be used when specifying font sizes (it also defines PostScript point bp). One “Q” is a quarter of a millimeter (0.25mm), and 13Q=3.25mm body text (about 9.247 TeX points) is quite commonly used in Japanese publications. Similarly, the China National Standard GB/T 18358-2009 (recommended/non-mandatory) defines various text/page elements based on either the mm or the bp unit.
Thus, it would be nice to provide \texdimenbothbpmm, \texdimenbothmmbp, etc., to give the maximal dimension not exceeding the original input while being attainable in both mm and bp units.
As proven and documented, for in and cm, the attainable U sp has the form U=floor(a*7227/100)=floor(b*7227/254), where a=50*k and b=127*k for some common integer k (nonnegative). Any other a=50*k+r (r=1..49) or b=127*k+r (r=1..126) will lead to a U sp which is not attainable in the other unit.
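The in/cm statement can be checked by enumeration over one period, since both maps advance by 7227 when a grows by 100 or b grows by 254. The following Python sketch does that; the function name attainable is made up for this illustration.

```python
NUM = 7227  # numerator shared by the in (7227/100) and cm (7227/254) ratios

def attainable(den: int, period: int) -> dict:
    """Map each attainable U (in sp) within one period to the input giving it."""
    return {a * NUM // den: a for a in range(period)}

via_in = attainable(100, 100)   # U = floor(a*7227/100), a = 0..99
via_cm = attainable(254, 254)   # U = floor(b*7227/254), b = 0..253

common = sorted(set(via_in) & set(via_cm))
for u in common:
    print(u, via_in[u], via_cm[u])
# 0 0 0
# 3613 50 127
```

Only k=0 and k=1 fall inside the first period, which is consistent with a=50*k and b=127*k.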
The situation is different (and far more complicated) for bp and mm though. I started by looking at inputs in mm unit, so X=floor(a*7227/2540) for some a=2540*k+r (r=0..2539). Since the behavior of X is periodic (mod 2540), we can focus on a=0..2539 only. Now the question is whether this X is attainable from bp or not.
We know X is not attainable from bp if and only if X=267, 535, or 802 (mod 803). We can then brute force through a=0..2539 to see which a leads to one of these three unattainable values.
(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==267 *)
Total[
Table[
If[Mod[Floor[a 7227/2540], 803] == 267, 1, 0],
{a, 0, 2540 - 1}
]
]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=94, 1223, or 1505 *)
(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==535 *)
Total[
Table[
If[Mod[Floor[a 7227/2540], 803] == 535, 1, 0],
{a, 0, 2540 - 1}
]
]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=1035, 1317, or 2446 *)
(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==802 *)
Total[
Table[
If[Mod[Floor[a 7227/2540], 803] == 802, 1, 0],
{a, 0, 2540 - 1}
]
]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=282, 1411, or 1693 *)
Exactly 9 inputs of a (out of every span of 2540 integers) lead to an X sp that is unattainable in bp unit; they are a=2540*k+r, where r=94, 282, 1035, 1223, 1317, 1411, 1505, 1693, and 2446.
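As a cross-check of the Mathematica counts, here is the same brute force as a Python sketch. It also finds the three bp gaps by enumeration, using the fact that the bp ratio 7227/7200 reduces to 803/800. The helper snap_down is hypothetical: it illustrates the idea of decrementing a when it hits one of the nine residues, and is not code from the package.

```python
def x_from_mm(a: int) -> int:
    """X in sp for the mm input a of the text: floor(a*7227/2540)."""
    return a * 7227 // 2540

# Residues mod 803 never produced by floor(c*803/800), found by enumeration.
bp_values = {c * 803 // 800 for c in range(800)}
bp_gaps = set(range(803)) - bp_values

# Residues r of a (mod 2540) whose X falls into one of the bp gaps.
bad_residues = [a for a in range(2540) if x_from_mm(a) % 803 in bp_gaps]

def snap_down(a: int) -> int:
    """Hypothetical helper: largest a' <= a whose X is also attainable from bp."""
    return a - 1 if a % 2540 in bad_residues else a

print(sorted(bp_gaps))   # [267, 535, 802]
print(bad_residues)      # [94, 282, 1035, 1223, 1317, 1411, 1505, 1693, 2446]
```

A single decrement is enough because consecutive values of a give X values that differ by only 2 or 3, while the gaps are at least 267 apart modulo 803, so no two bad residues are adjacent.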
So the algorithm for \texdimenbothmmbp seems straightforward:
- Compute a;
- Check whether a=94, 282, 1035, 1223, 1317, 1411, 1505, 1693, or 2446 (mod 2540).
- If not, use a as is.
- If so, set a := a-1.
Units nc and nd are not available with the XeTeX and e(u)pTeX engines. But the package still provides associated unit conversion macros. Should it keep doing so?
nc/nd units which would be possible only via extra mark-up and mimicking TeX exact behaviour, this is possible using xintexpr.
The only problem here is that \dimexpr\texdimennc{1pt}nc\relax will crash simply from the engine not recognizing the unit, but \texdimennc{1pt} has no issue:
$ rlwrap euptex
This is e-upTeX, Version 3.141592653-p4.1.0-u1.29-230214-2.6 (utf8.uptex) (TeX Live 2024) (preloaded format=euptex)
restricted \write18 enabled.
**texdimens
entering extended mode
(/usr/local/texlive/2023/texmf-dist/tex/generic/texdimens/texdimens.tex)
*\message{\texdimennc{1pt}, \texdimenncup{1pt}, \texdimenncdown{1pt}}
0.07811, 0.07811, 0.0781
*\bye
No pages of output.
Transcript written on texdimens.log.
This looks to me as only a documentation issue.
Relates: latex3/latex3#1217
Recent advances having been leaked to the worldwide press in commit 13555b3, it appears feasible, with not too extreme overhead, to rewrite entirely the \texdimenUU{up,down} macros for those units dd, nc, and in for which the macros are currently not usable at or near \maxdimen.
In the branch of \texdimenwithunit{<dim1>}{<dim2>} handling the dim2=f sp, f<65536 case, there is a tantalizing code comment:
texdimens/texdimens/texdimens.tex
Lines 800 to 806 in 4cbca9b
Now my nephew who is good at maths says she sees various methods, but as she was educated in LaTeX she cannot help with the Plain e-TeX in the code!
Experience with the "up" and "down" macros has shown that it may be more efficient to do one or two more \numexpr than to opt for the (admittedly clever and admired) detour hijacking TeX's built-in dimension input process. So please spell out what you have in mind and improve your software if possible. Thanks.
The user interface macros are named \texdiminbp, \texdiminbpdown, \texdiminbpup, etc...
But the package name is texdimens and some internal macros use the \texdimen prefix, for example \texdimenstrippt (which has a public name).
Also \texdiminin is a mouthful. Wouldn't it be better for the user macros to be named \texdimenbp, \texdimencm, \texdimenin?
I see no problem with 2 extra lines to cover negative <dimen2>:
\def\texdimenwithunit_#1;#2{%
\ifnum#1=\p@\texdimendothis\texdimenwithunit_p@\fi
\ifnum#1>\p@\texdimendothis\texdimenwithunit_A\fi
\ifnum#1=-\p@\texdimendothis\texdimenwithunit_p@\fi % handles f = -65536
\ifnum#1<-\p@\texdimendothis\texdimenwithunit_A\fi % handles f < -65536
\texdimenorthat\texdimenwithunit_B#2#1;% -65536<f<65536; user input f=0 deserves an error
}%
The truncation and rounding directions should behave the way we want them to (unless I missed something?)
The PR #15 is about a \texdimendivide{dim1}{dim2} doing an approximate computation of the mathematical dim1/dim2, to the extent possible in (e)TeX base handling of input and output of dimensions. Indeed the current (0.99) \texdimenwithunit{dim1}{dim2} gives an output R that is guaranteed to let R<dim2> be parsed by TeX into something close to dim1, but the way it does this lets this R diverge somewhat from the mathematical dim1/dim2, and very noticeably for small dimensions.
Indeed as TeX truncates when it multiplies, this R, whose specification is that R<dim2> should be near <dim1>, will by force usually be larger than the mathematical dim1/dim2 ratio.
In the current implementation the divergence is very noticeable for small dimensions. For example:
*\message{\texdimenwithunit{2sp}{3sp}}
0.83333
*\message{\number\dimexpr0.83333\dimexpr3sp}
2
At first one is comforted by the fact that 0.66666 indeed would not work:
*\message{\number\dimexpr0.66666\dimexpr3sp}
1
but the naive expected value 0.66667 does work:
*\message{\number\dimexpr0.66667\dimexpr3sp}
2
This is an indication that perhaps the handling of \texdimenwithunit{dim1}{dim2} is sub-optimal, particularly in the dim2<1pt branch. The chosen formula does work but isn't it a bit too secure?
As explained in #2 (comment):
the condition on N is that it should be at least ceil(U * psi) and at most ceil((U+1) * psi) - 1. No wonder then that round(U * psi) will not always work: if it rounds strictly down, we are doomed. What about the M = round((U+0.5)*psi) approach, will it work? (psi = 1/phi > 1).
and we currently use the round((U+0.5)*psi) formula in this dim2<1pt case. But the admissible value closest to round(U * psi) (which is the best we can do to approximate the mathematical dim1/dim2) is ceil(U*psi).
I propose \texdimenwithunit{dim1}{dim2} should implement, for dim2<1pt, the ceil(U*psi) formula so as to reduce the divergence from the exact mathematical ratio dim1/dim2. This will make #15 unneeded.
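The two formulas can be compared in exact integer arithmetic. The Python sketch below assumes dim1 = U sp, dim2 = f sp with 0<f<65536 and psi = 65536/f. The names candidates and reparse are invented for this illustration, and print_scaled transcribes the tex.web routine of that name; none of this is the package's code.

```python
UNITY = 65536

def candidates(u: int, f: int):
    """Return (ceil(U*psi), round((U+0.5)*psi), ceil((U+1)*psi) - 1)."""
    lo = -(-u * UNITY // f)                      # smallest admissible N
    hi = -(-(u + 1) * UNITY // f) - 1            # largest admissible N
    mid = ((2 * u + 1) * UNITY + f) // (2 * f)   # current formula, round half up
    return lo, mid, hi

def reparse(n: int, f: int) -> int:
    """sp value TeX gets for (n/65536) times f sp: it truncates the product."""
    return n * f // UNITY

def print_scaled(s: int) -> str:
    """Decimal that TeX prints for s sp (s >= 0)."""
    out = f"{s // UNITY}."
    s = 10 * (s % UNITY) + 5
    delta = 10
    while True:
        if delta > UNITY:
            s += 32768 - 50000                   # round the last digit
        out += str(s // UNITY)
        s = 10 * (s % UNITY)
        delta *= 10
        if s <= delta:
            return out

lo, mid, hi = candidates(2, 3)                   # the 2sp / 3sp example
print(lo, mid, hi)                               # 43691 54613 65535
print(print_scaled(lo), print_scaled(mid))       # 0.66667 0.83333
print(reparse(lo, 3), reparse(mid, 3), reparse(lo - 1, 3))  # 2 2 1
```

Both choices parse back to 2sp, but ceil(U*psi) sits at the lower edge of the admissible range and therefore prints the decimal closest to the mathematical ratio 2/3.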
You mentioned here that
Regarding ex and em this is more challenging and costly as
- it would need to hook into font selection
- anyhow there is a problem of principle if those units end up to be less than 1pt.
However, I don’t think these are actual constraints. The observations above are based on explicit conversion ratios from the TeX/pdfTeX sources. But ex and em are simply handled the same way as internal dimensions, which means:
ex and em are font-dependent).
\documentclass{article}
\begin{document}
Recall that the `\verb|1.3|' in `\verb|1.3\dimen0|' is internally represented as
$n+f/2^{16}$, where $n=1$ is the integer part and
$f=\hbox{round}(0.3\times2^{16})=\lfloor0.3\times2^{16}+1/2\rfloor=19661$.
So the input \verb|1.3| `equals' $(2^{16}+19661)/2^{16}=85197/2^{16}$.
\[
\dimen0=606021sp % 13Q or 3.25mm in Japanese typography
\verb|\dimen0=606021sp|
\Rightarrow
\verb|1.3\dimen0|
=\number\dimexpr1.3\dimen0\relax\,\hbox{sp}
=\Bigl\lfloor606021\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor,
\]
whereas $1.3\times606021=787827.3$ (rounds to 787827\,sp, which is 2\,sp short).
Now, let us try \verb|\font\1=cmr10 at 606021sp| and inspect
\verb|1em| and \verb|1.3em|:
\font\1=cmr10 at 606021sp
\[
\hbox{\1\verb|1em| is \number\dimexpr1em\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\1\verb|1.3em| is \number\dimexpr1.3em\relax\,sp}
=\Bigl\lfloor606022\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]
How about \verb|1ex| and \verb|1.3ex| for \verb|\font\2=cmr10 at 1212042sp|?
\font\2=cmr10 at 1212042sp
\[
\hbox{\2\verb|1ex| is \number\dimexpr1ex\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\2\verb|1.3ex| is \number\dimexpr1.3ex\relax\,sp}
=\Bigl\lfloor521851\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]
\end{document}
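The arithmetic in this test document can be replayed outside TeX. The Python sketch below models how a decimal factor is applied to an internal dimension (integer part times the dimension, plus the truncated product with the rounded fraction). round_decimals follows tex.web, times_internal is a hypothetical name, and the value 606022sp for 1em is the one quoted in the document above.

```python
UNITY = 65536

def round_decimals(digits: str) -> int:
    """Fraction digits -> multiple of 2^-16, as in tex.web's round_decimals."""
    a = 0
    for d in reversed(digits[:17]):
        a = (a + int(d) * 2 * UNITY) // 10
    return (a + 1) // 2

def times_internal(decimal: str, dimen_sp: int) -> int:
    """Hypothetical helper: sp value of '<decimal><internal dimen>' (non-negative)."""
    integer, _, fraction = decimal.partition(".")
    n, f = int(integer or "0"), round_decimals(fraction)
    return n * dimen_sp + dimen_sp * f // UNITY   # the fractional part is truncated

print(round_decimals("3"))             # 19661
print(times_internal("1.3", 606021))   # 787829, versus 1.3 x 606021 = 787827.3
print(times_internal("1.3", 606022))   # 787830, the 1.3em case
```

The 2sp excess comes entirely from 0.3 being read as 19661/65536, which is slightly more than 0.3.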
There is not much rationale in keeping the code comments (it seems there are 592 lines starting with a % out of a total of 930 lines) together with texdimens.tex, since they can all be replaced, for example, by a link to this repo where we can put a non-stripped version. Or we could distribute the comments separately so people see them if they want to explore the code offline.
A (very tiny) gain in \input texdimens or \usepackage{texdimens} will enhance the user experience if comments are stripped out. Indeed, so far texdimens has not been incorporated by the TeX formats such as Plain or LaTeX that people use, despite the obvious advantages this would bring the TeX world in general. On the other hand let's recognize that LaTeX3 was for quite a few years loaded separately before making it into the formats, so let's be honest, it is normal that people wait a bit before incorporating texdimens too into all pre-built formats, despite the jaw-dropping effect or "whow"-effect it triggers in all people who encountered it along their TeX journey.
In the code for the bp, dd, and nd units, such as
texdimens/texdimens/texdimens.tex
Lines 217 to 222 in 266b4c2
the \relax\relax could be a single \relax. I don't remember why I used two, except that I like to not rely on e-TeX rules for termination of expressions apart from the \relax one. So perhaps (and now that I added a benchmark file, this is only a matter of doing it), try out with a single \relax and see if it brings any gain.
By the way there are other instances of \relax\relax in the code but those can't be reduced to a single \relax.
For some reason the code multiplies the dimension by 2 in a numexpr. But for the 1pt branch I then operated on this in a dimexpr, adding a division by 2. This will cause overflow:
*\let\m\message
*\m{\texdimenwithunit{1000pt}{1pt}}
1000.0
*\m{\texdimenwithunit{9000pt}{1pt}}
! Dimension too large.
<to be read again>
/
\texdimenwithunit_p@ ...ippt \the \dimexpr #1#3sp/
2\relax
<*> \m{\texdimenwithunit{9000pt}{1pt}
}
? X
Really having a bad day here...
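The overflow threshold follows from simple arithmetic on sp values. In this Python sketch the name doubled_fits is invented, and the assumption (read off the error context above) is that the doubled value is handed to \dimexpr as an sp literal before the division by 2.

```python
MAXDIMEN = 2**30 - 1   # \maxdimen in sp, i.e. 16383.99998pt
UNITY = 65536          # 1pt in sp

def doubled_fits(dim1_pt: int) -> bool:
    """True if twice dim1 (given in whole pt) is still a legal dimension."""
    return 2 * dim1_pt * UNITY <= MAXDIMEN

print(doubled_fits(1000), doubled_fits(9000))   # True False
print(MAXDIMEN // 2)                            # 536870911
```

So under that assumption any first argument above 536870911sp (8191.99998pt) would trigger the error, well before \maxdimen itself.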
For the second argument <1pt branch (and also =1pt as it got merged into it), a certain quantity C = ceil(r * 65536/f) is obtained (indirectly), from which a decimal E is then derived via E pt = C sp. Here 0<=r<f<=65536 and the code comments seem to leave open the possibility that E=1.0.
This seems wrong: E=1.0 means C=65536, i.e. r*65536/f>65535, i.e. r>f - f/65536. But f-1>=r so we get f-1>f-f/65536, i.e. f/65536>1. And this is false.
Am I correct? If yes, please amend your too hastily written code comments.
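The claim is small enough to confirm exhaustively. Since C grows with r, it suffices to try the largest allowed r=f-1 for every f; the Python sketch below (max_c is a made-up name) does so with exact integer arithmetic.

```python
def max_c(f: int) -> int:
    """Largest C = ceil(r*65536/f) over 0 <= r < f, reached at r = f-1."""
    r = f - 1
    return -(-r * 65536 // f)   # integer ceiling division

print(max(max_c(f) for f in range(1, 65537)))   # 65535
```

Equivalently, ceil((f-1)*65536/f) = 65536 - floor(65536/f), and floor(65536/f) is at least 1 whenever f<=65536, so C never reaches 65536 and E never reaches 1.0.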
The cause is that the mathematical explanations assumed first argument non-zero. The tech guy blindly followed the backoffice maths with no real understanding.
I thought I had fixed this already (in b93ed3c, for the 1.0 release) but I must have been grepping for "smaller than" and overlooked "smaller (in absolute value) than". ... too late for 1.1, already released. It seems one can check typos only after releases, what a curse.