jfbu / texdimens

Utilities and documentation related to TeX dimensional units

Makefile 2.64% Shell 5.87% TeX 89.83% Python 1.67%

texdimens's Issues

\texdimenwithunit{0pt}{1pt} plain wrong

This is continuation of #3, after some delay because I managed in the meantime to bring down a server from an Emacs buffer, and decided to flee home after that.

*\message{\texdimenwithunit{0pt}{1pt}}
0.00002
*\message{\the\dimexpr0.00002\dimexpr1pt\relax\relax}
0.00002pt

But 0pt is handled correctly by \texdimenwithunit if second argument is >1pt. The problem here is that the computer guy applied a math formula proven only for conversion ratio >1, not =1. In #3, there is a different problem applying when second argument is <1pt. There the output is technically correct (although not quite pleasing).

0.99b still does not get \texdimenwithunit{dim1}{1pt} right...

What an embarrassment. I fixed #4 by applying the <1pt branch to the =1pt case, since that branch now handled dim1=0pt correctly, but this is of course definitely wrong: it induces a shift of +1sp. Very stupid.

edit: it was also very stupid before that change, when the =1pt case was using the >1pt branch, because this is the same math as for <1pt, albeit implemented differently, and the math is wrong for this special =1pt case...

*\message{\texdimenwithunit{1234sp}{1pt}}
0.01884
*\message{\number\dimexpr0.01884pt}
1235
*\message{\texdimenwithunit{1235sp}{1pt}}
0.01886
*\message{\number\dimexpr0.01886pt}
1236

alas... and I already pushed it to CTAN.
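
For readers who want to replay the sessions above without a TeX engine, here is a minimal Python sketch of how TeX turns a decimal factor into a multiple of 1/65536 and applies it to a dimension. The names `tex_fraction` and `tex_scale` are invented for this illustration and are not part of texdimens. The arithmetic is meant to follow the `round_decimals` and `xn_over_d` routines of tex.web, and only non-negative factors are handled.

```python
def tex_fraction(digits: str) -> int:
    """Convert the digits after the decimal point to a multiple of 1/65536.

    Mirrors TeX's round_decimals: at most 17 digits are used.
    """
    a = 0
    for d in reversed(digits[:17]):
        a = (a + int(d) * 131072) // 10
    return (a + 1) // 2


def tex_scale(factor: str, dim_sp: int) -> int:
    """Value in sp of <factor><dimension>, for a non-negative decimal factor.

    The integer part multiplies exactly; the fractional part is truncated
    towards zero, as TeX does.
    """
    int_part, _, frac_part = factor.partition(".")
    n = int(int_part) if int_part else 0
    f = tex_fraction(frac_part)
    sign = -1 if dim_sp < 0 else 1
    v = abs(dim_sp)
    return sign * (n * v + (v * f) // 65536)


PT = 65536  # 1pt expressed in sp

print(tex_scale("0.01884", PT))  # 1235, not the 1234 that was fed in
print(tex_scale("0.01886", PT))  # 1236, not 1235
print(tex_scale("0.00002", PT))  # 1, which is why 0pt came back as 1sp
```

The three printed values match the `\number\dimexpr` results shown in the sessions, so the +1sp shift is reproduced.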

Why isn't "TeX" rendered with a stylish (sort-of) epsilon letter (nicely shifted downwards) in the pdf documentation?

This is very disconcerting, is your TeX installation perhaps corrupted by some malware? Please fix this if possible!

Add `\texdimenbothbpmm` and friends for conversions between `bp` and `mm`?

Motivation

The motivation of this issue largely comes from CJK (Chinese/Japanese/Korean) typography traditions.

The “large” units (in and cm) are good for describing page elements, but they are too crude when it comes to describing text elements (such as font size). The Japanese Industrial Standard JIS X 4051 defines the unit “Q” to be used when specifying font sizes (it also defines PostScript point bp). One “Q” is a quarter of a millimeter (0.25mm), and 13Q=3.25mm body text (about 9.247 TeX points) is quite commonly used in Japanese publications. Similarly, the China National Standard GB/T 18358-2009 (recommended/non-mandatory) defines various text/page elements based on either mm or bp unit.

Thus, it would be nice to provide \texdimenbothbpmm, \texdimenbothmmbp, etc., giving the maximal dimension that does not exceed the original input and is attainable in both mm and bp units.

My research so far

As proven and documented, for in and cm, the attainable Usp has the form U=floor(a*7227/100)=floor(b*7227/254), where a=50*k and b=127*k for some common integer k (nonnegative). Any other a=50*k+r (r=1..49) or b=127*k+r (r=1..126) will lead to a Usp which is not attainable in the other unit.
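
The in/cm statement above can be checked by brute force. The sketch below is only an illustration under the same model as the paragraph (U=floor(a*7227/100) for in, U=floor(b*7227/254) for cm); the function names are made up here and do not come from the package.

```python
def sp_from_in(a: int) -> int:
    """U = floor(a*7227/100), the in-side model used above."""
    return a * 7227 // 100


def sp_from_cm(b: int) -> int:
    """U = floor(b*7227/254), the cm-side model used above."""
    return b * 7227 // 254


def common_in_cm(limit_a: int) -> list:
    """All a in 0..limit_a whose U is also attainable from the cm side."""
    max_u = sp_from_in(limit_a)
    cm_values = set()
    b = 0
    while sp_from_cm(b) <= max_u:
        cm_values.add(sp_from_cm(b))
        b += 1
    return [a for a in range(limit_a + 1) if sp_from_in(a) in cm_values]


print(common_in_cm(200))  # [0, 50, 100, 150, 200]
```

Only multiples of 50 survive, in agreement with a=50*k and b=127*k.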

The situation is different (and far more complicated) for bp and mm though. I started by looking at inputs in mm unit, so X=floor(a*7227/2540) for some a=2540*k+r (r=0..2539). Since the behavior of X is periodic (mod 2540), we can focus on a=0..2539 only. Now the question is whether this X is attainable from bp or not.

We know X is not attainable from bp if and only if X=267, 535, or 802 (mod 803). We can then brute force through a=0..2539 to see which a leads to one of these three unattainable values.

(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==267 *)
Total[
 Table[
  If[Mod[Floor[a 7227/2540], 803] == 267, 1, 0],
  {a, 0, 2540 - 1}
  ]
 ]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=94, 1223, or 1505 *)

(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==535 *)
Total[
 Table[
  If[Mod[Floor[a 7227/2540], 803] == 535, 1, 0],
  {a, 0, 2540 - 1}
  ]
 ]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=1035, 1317, or 2446 *)

(* Xsp is input via mm, but unattainable via bp because Mod[X,803]==802 *)
Total[
 Table[
  If[Mod[Floor[a 7227/2540], 803] == 802, 1, 0],
  {a, 0, 2540 - 1}
  ]
 ]
(* gives 3 *)
(* X=Floor[(2540*k+r)*7227/2540], r=282, 1411, or 1693 *)

Exactly 9 inputs of a (out of every span of 2540 integers) lead to unattainable Xsp in bp unit; they are a=2540*k+r, where r=94, 282, 1035, 1223, 1317, 1411, 1505, 1693, and 2446.

So the algorithm for \texdimenbothmmbp seems straightforward:

get a;
test if a=94, 282, 1035, 1223, 1317, 1411, 1505, 1693, or 2446 (mod 2540).
if not, leave a as is.
if yes, a := a-1.
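
The brute-force search and the proposed adjustment can be sketched in Python as follows. This is only an illustration of the steps listed above, under the same model (X=floor(a*7227/2540) for mm, floor(b*803/800) for bp, non-negative a); the function names are hypothetical and nothing here is taken from the package code.

```python
def sp_from_mm(a: int) -> int:
    """X = floor(a*7227/2540), the mm-side model used above."""
    return a * 7227 // 2540


def bp_unattainable_residues() -> set:
    """Residues mod 803 never produced by floor(b*803/800)."""
    reached = {b * 803 // 800 for b in range(800)}
    return set(range(803)) - reached


def bad_mm_residues() -> list:
    """Residues r (mod 2540) whose X is unattainable from bp."""
    skipped = bp_unattainable_residues()
    return [r for r in range(2540) if sp_from_mm(r) % 803 in skipped]


BAD = set(bad_mm_residues())


def adjust(a: int) -> int:
    """Step down by one when a hits a bad residue, else leave a as is."""
    return a - 1 if a % 2540 in BAD else a


print(sorted(bp_unattainable_residues()))  # [267, 535, 802]
print(bad_mm_residues())
# [94, 282, 1035, 1223, 1317, 1411, 1505, 1693, 2446]
```

Since no two bad residues are consecutive, a single decrement is enough in this model.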

Should `\texdimennc` and `\texdimennd` and their "up and down" variants be left undefined for `xetex` and `e(u)ptex`?

Units nc and nd are not available with the XeTeX and e(u)pTeX engines. But the package still provides associated unit conversion macros. Should it keep doing so?

con: the units are not available in these engines,
pro: the macros will work correctly and not raise any specific error, so they could be useful for typesetting, with these engines, some article devoted to TeX dimensions or needing to illustrate some conversions. Admittedly, such an article would probably also need to illustrate inputs with nc/nd units, which would be possible only via extra mark-up mimicking TeX's exact behaviour; this is possible using xintexpr.

The only problem here is that \dimexpr\texdimennc{1pt}nc\relax will fail, simply because the engine does not recognize the unit, but \texdimennc{1pt} has no issue:

$ rlwrap euptex
This is e-upTeX, Version 3.141592653-p4.1.0-u1.29-230214-2.6 (utf8.uptex) (TeX Live 2024) (preloaded format=euptex)
 restricted \write18 enabled.
**texdimens
entering extended mode
(/usr/local/texlive/2023/texmf-dist/tex/generic/texdimens/texdimens.tex)
*\message{\texdimennc{1pt}, \texdimenncup{1pt}, \texdimenncdown{1pt}}
0.07811, 0.07811, 0.0781
*\bye
No pages of output.
Transcript written on texdimens.log.

This looks to me like only a documentation issue.

Relates: latex3/latex3#1217

Let the `\texdimenUU{up,down}` be usable even with `\maxdimen` input also for the `dd`, `nc` and `in` units

Recent advances having been leaked to the worldwide press in commit 13555b3, it appears feasible, with not too extreme overhead, to rewrite entirely the \texdimenUU{up,down} macros for those units dd, nc, and in for which the macros are currently not usable at or near \maxdimen.

Please make explicit mysterious "I considered various ways" code comments to \texdimenwithunit

In the branch of \texdimenwithunit{<dim1>}{<dim2>} handling the dim2=f sp, f<65536 case, there is a tantalizing code comment:

texdimens/texdimens/texdimens.tex

Lines 800 to 806 in 4cbca9b

    
    %          Computing C = ceil(r * 65536/f) in \numexpr is the delicate
    %          part, as r can be as large as f-1 hence 65535 and r*65536 would
    %          overflow. We could compute R=round(r*65536/f) ("scaling operation")
    %          then C=R+1 if R*f-65536*r<0, else C=R.
    %
    %          The problem is then: how to get the sign of R*f-65536*r without
    %          overflow? I considered various ways.

Now my niece who is good at maths says she sees various methods but as she was educated in LaTeX she cannot help with the Plain e-TeX in the code!

Experience with the "up" and "down" macros has shown that it may be more efficient to do one or two more \numexpr than opt for the (admittedly clever and admired) detour hijacking TeX's built-in dimension input process. So please spell out what you have in mind and improve your software if possible. Thanks.
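
As a sanity check of the formula quoted in the code comment (C=R+1 if R*f-65536*r<0, else C=R), here is a small Python sketch. It only confirms that the rounding-plus-sign-test recipe yields ceil(r*65536/f). Python integers do not overflow, so it says nothing about how to obtain the sign within \numexpr limits, which is the actual question of this issue. The function names are invented for the illustration.

```python
def round_div(num: int, den: int) -> int:
    """Rounded quotient of non-negative num by positive den, halves going up.

    This is meant to model the rounding of the \\numexpr scaling operation
    for non-negative operands.
    """
    return (2 * num + den) // (2 * den)


def ceil_via_round(r: int, f: int) -> int:
    """C = ceil(r*65536/f) from R = round(r*65536/f) and a sign test."""
    R = round_div(r * 65536, f)
    return R + 1 if R * f - 65536 * r < 0 else R


def ceil_exact(r: int, f: int) -> int:
    """Reference value computed with exact integer arithmetic."""
    return -(-(r * 65536) // f)


print(ceil_via_round(2, 3), ceil_exact(2, 3))  # 43691 43691
```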

Rename all macros to use \texdimen prefix?

The user interface macros are named \texdiminbp, \texdiminbpdown, \texdiminbpup, etc...

But the package name is texdimens and some internal macros use \texdimen prefix, for example \texdimenstrippt (which has a public name).

Also \texdiminin is a mouthful. Wouldn't it be better for the user macros to be named \texdimenbp, \texdimencm, \texdimenin?

Add a CHANGES.md file

Cover negative dimen2 in `\texdimenwithunit`

I see no problem with 2 extra lines to cover negative <dimen2>:

\def\texdimenwithunit_#1;#2{%
        \ifnum#1=\p@\texdimendothis\texdimenwithunit_p@\fi
        \ifnum#1>\p@\texdimendothis\texdimenwithunit_A\fi
        \ifnum#1=-\p@\texdimendothis\texdimenwithunit_p@\fi % handles f = -65536
        \ifnum#1<-\p@\texdimendothis\texdimenwithunit_A\fi  % handles f < -65536
        \texdimenorthat\texdimenwithunit_B#2#1;% -65536<f<65536; user input f=0 deserves an error
}%

The truncation and rounding directions should behave the way we want them to (unless I missed something?)

\texdimenwithunit{dim1}{dim2} could and even should output a ratio R closer to mathematical dim1/dim2

The PR #15 is about a \texdimendivide{dim1}{dim2} doing an approximate computation of the mathematical dim1/dim2, to the extent possible in (e)TeX base handling of input and output of dimensions. Indeed, the current (0.99) \texdimenwithunit{dim1}{dim2} gives an output R that is guaranteed to let R<dim2> be parsed by TeX into something close to dim1, but the way it does this lets R diverge somewhat from the mathematical dim1/dim2, and very noticeably so for small dimensions.

Indeed as TeX truncates when it multiplies, this R, whose specification is that R<dim2> should be near <dim1>, will by force usually be larger than the mathematical dim1/dim2 ratio.

In the current implementation the divergence is very noticeable for small dimensions. For example:

*\message{\texdimenwithunit{2sp}{3sp}}
0.83333
*\message{\number\dimexpr0.83333\dimexpr3sp}
2

At first one is comforted by the fact that 0.66666 indeed would not work:

*\message{\number\dimexpr0.66666\dimexpr3sp}
1

but the naive expected value 0.66667 does work:

*\message{\number\dimexpr0.66667\dimexpr3sp}
2

This is an indication that perhaps the handling of \texdimenwithunit{dim1}{dim2} is sub-optimal, particularly in the dim2<1pt branch. The chosen formula does work, but isn't it a bit too secure?

As explained in #2 (comment):

the condition on N is that it should be at least ceil(U * psi) and at most ceil((U+1) * psi) -1.

No wonder then that round(U * psi) will not always work: if it rounds strictly down, we are doomed.

What about the M = round((U+0.5)*psi) approach, will it work? (psi = 1/phi > 1).

and we currently use the round((U+0.5)*psi) formula in this dim2<1pt case. But the closest to round(U * psi) (which is the best we can do to approximate the mathematical dim1/dim2) is ceil(U*psi).

I propose \texdimenwithunit{dim1}{dim2} should implement, for dim2<1pt, the ceil(U*psi) formula so as to reduce the divergence from exact mathematical ratio dim1/dim2. This will make #15 unneeded.
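
To make the comparison concrete, the following Python sketch computes the admissible range for N quoted above, the currently used round((U+0.5)*psi) choice, and the proposed ceil(U*psi) choice, for dim1=U sp and dim2=f sp. It is an illustration only: the helper names are hypothetical, and `print_scaled` is intended as a transcription of the tex.web routine of that name for non-negative input.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a/b for positive b, in exact integer arithmetic."""
    return -(-a // b)


def n_bounds(U: int, f: int) -> tuple:
    """Smallest and largest N with floor(N*f/65536) == U."""
    return ceil_div(U * 65536, f), ceil_div((U + 1) * 65536, f) - 1


def n_mid(U: int, f: int) -> int:
    """round((U+0.5)*psi) with psi = 65536/f, halves rounded up."""
    return ((2 * U + 1) * 65536 + f) // (2 * f)


def print_scaled(s: int) -> str:
    """Decimal form TeX prints for s sp, without the unit (s >= 0)."""
    out = str(s // 65536) + "."
    s = 10 * (s % 65536) + 5
    delta = 10
    while True:
        if delta > 65536:
            s = s + 32768 - 50000  # round the final digit
        out += str(s // 65536)
        s = 10 * (s % 65536)
        delta *= 10
        if s <= delta:
            return out


U, f = 2, 3  # the 2sp / 3sp example from this issue
lo, hi = n_bounds(U, f)
print(lo, hi)                        # 43691 65535
print(print_scaled(n_mid(U, f)))     # 0.83333 (current output)
print(print_scaled(lo))              # 0.66667 (proposed ceil(U*psi))
```

Both choices lie inside the admissible range, but the ceiling is the one closest to 2/3.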

About `ex` and `em` units

You mentioned here that

Regarding ex and em this is more challenging and costly as

it would need to hook into font selection

anyhow there is a problem of principle if those units end up to be less than 1pt.

However, I don’t think these are actual constraints. The observations above are based on explicit conversion ratios from the TeX/pdfTeX sources. But ex and em are simply handled the same way as internal dimensions, which means:

There is no need to hard-code any inverse of conversion ratio (well, it would be impossible anyway, because ex and em are font-dependent).
The barrier of “at least 1pt” is lifted.

\documentclass{article}
\begin{document}
Recall that the `\verb|1.3|' in `\verb|1.3\dimen0|' is internally represented as
$n+f/2^{16}$, where $n=1$ is the integer part and
$f=\hbox{round}(0.3\times2^{16})=\lfloor0.3\times2^{16}+1/2\rfloor=19661$.
So the input \verb|1.3| `equals' $(2^{16}+19661)/2^{16}=85197/2^{16}$.
\[
\dimen0=606021sp % 13Q or 3.25mm in Japanese typography
\verb|\dimen0=606021sp|
\Rightarrow
\verb|1.3\dimen0|
=\number\dimexpr1.3\dimen0\relax\,\hbox{sp}
=\Bigl\lfloor606021\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor,
\]
whereas $1.3\times606021=787827.3$ (rounds to 787827\,sp, which is 2\,sp short).

Now, let us try \verb|\font\1=cmr10 at 606021sp| and inspect
\verb|1em| and \verb|1.3em|:
\font\1=cmr10 at 606021sp
\[
\hbox{\1\verb|1em| is \number\dimexpr1em\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\1\verb|1.3em| is \number\dimexpr1.3em\relax\,sp}
=\Bigl\lfloor606022\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]

How about \verb|1ex| and \verb|1.3ex| for \verb|\font\2=cmr10 at 1212042sp|?
\font\2=cmr10 at 1212042sp
\[
\hbox{\2\verb|1ex| is \number\dimexpr1ex\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\2\verb|1.3ex| is \number\dimexpr1.3ex\relax\,sp}
=\Bigl\lfloor521851\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]
\end{document}

Move code comments elsewhere?

There is not much rationale for keeping the code comments together with texdimens.tex (it seems there are 592 lines starting with a % out of a total of 930 lines), since they can all be replaced, for example, by a link to this repo where we can put a non-stripped version. Or we could distribute the comments separately, so people see them if they want to explore the code offline.

Stripping the comments would bring a (very tiny) speed gain to \input texdimens or \usepackage{texdimens}, enhancing the user experience. Indeed, so far texdimens has not been incorporated into the TeX formats such as Plain or LaTeX that people use, despite the obvious advantages this would bring to the TeX world in general. On the other hand, let's recognize that LaTeX3 was for quite a few years loaded separately before making it into the formats, so, to be honest, it is normal for people to wait a bit before incorporating texdimens too into all pre-built formats, despite the jaw-dropping "wow" effect it triggers in all people who encounter it along their TeX journey.

Maybe check if removing one `\relax` which is not mandated by expression syntax brings some gain

In the code for the bp, dd, and nd units, such as

texdimens/texdimens/texdimens.tex

Lines 217 to 222 in 266b4c2

    
    % bp 7227/7200 = 803/800
    %
    \def\texdimenbp#1{\expandafter\texdimenstrippt\the\dimexpr\numexpr(%
                      \expandafter\texdimen_bpnddd_signcheck
                      \the\numexpr2*\dimexpr#1\relax\relax)*400/803sp\relax}%
    \def\texdimen_bpnddd_signcheck#1{\texdimengobtilminus#1-1+#1}%

the \relax\relax could be a single \relax. I don't remember why I used two, except that I like not to rely on e-TeX rules for terminating expressions other than the \relax one. So perhaps (and now that I added a benchmark file, this is only a matter of doing it) try it out with a single \relax and see if it brings any gain.

By the way there are other instances of \relax\relax in the code but those can't be reduced to a single \relax.

\texdimenwithunit{dim}{1pt} as "fixed" for #4 and #6 can cause overflow!

For some reason the code multiplies the dimension by 2 in a \numexpr. But for the 1pt branch I then operated on this doubled value in a \dimexpr, adding a division by 2. This will cause overflow:

*\let\m\message

*\m{\texdimenwithunit{1000pt}{1pt}}
1000.0
*\m{\texdimenwithunit{9000pt}{1pt}}
! Dimension too large.
<to be read again> 
                   /
\texdimenwithunit_p@ ...ippt \the \dimexpr #1#3sp/
                                                  2\relax 
<*> \m{\texdimenwithunit{9000pt}{1pt}
                                     }
? X

Really having a bad day here...
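
The arithmetic behind the error can be sketched in Python. This assumes, as the description above suggests, that the doubled value reaches \dimexpr as an sp count, where TeX rejects anything above \maxdimen (2^30-1 sp). The helper names are made up for the illustration.

```python
MAXDIMEN_SP = 2**30 - 1  # \maxdimen = 16383.99998pt expressed in sp


def pt_to_sp(pt: int) -> int:
    """Whole points to scaled points (1pt = 65536sp)."""
    return pt * 65536


def doubled_fits(dim_sp: int) -> bool:
    """True if 2*dim_sp is still a legal TeX dimension."""
    return abs(2 * dim_sp) <= MAXDIMEN_SP


print(doubled_fits(pt_to_sp(1000)))  # True
print(doubled_fits(pt_to_sp(9000)))  # False: 18000pt exceeds \maxdimen
```

Under this assumption, any first argument above roughly 8192pt would trigger "Dimension too large" in the 1pt branch.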

Code comment of `\texdimenwithunit` about some `E` possibly `1.0`, not always `0.ddd...` seems wrong

For the second argument<1pt branch (and also =1pt as it got merged into it), a certain quantity C = ceil(r * 65536/f) is obtained (indirectly) from which a decimal E is then derived via E pt = C sp. Here 0<=r<f<=65536 and the code comments seem to leave open the possibility that E=1.0.

This seems wrong: E=1.0 means C=65536, i.e. r*65536/f>65535 i.e. r>f - f/65536. But f-1>=r so we get f-1>f-f/65536 i.e. f/65536>1. And this is false.

Am I correct? If yes, please amend your too hastily written code comments.
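
The bound can also be confirmed numerically. Since C=ceil(r*65536/f) is non-decreasing in r, it suffices to test the largest allowed r=f-1 for every f from 1 to 65536. The Python sketch below does just that; `max_c` is a name made up for this check.

```python
def max_c(f: int) -> int:
    """Largest C = ceil(r*65536/f) over 0 <= r < f, reached at r = f-1."""
    r = f - 1
    return -(-(r * 65536) // f)


worst = max(max_c(f) for f in range(1, 65537))
print(worst)  # 65535, so C never reaches 65536 and E never equals 1.0
```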

\texdimenwithunit{0pt}{<dim>} should always output 0.0 but it does not

The cause is that the mathematical explanations assumed first argument non-zero. The tech guy blindly followed the backoffice maths with no real understanding.

Documentation in 1.1 release of \texdimenbothbpmm and \texdimenbothmmbp uses "smaller than" but intends "not exceeding"

I thought I had fixed this already (in b93ed3c, for the 1.0 release) but I must have been grepping for "smaller than" and overlooked "smaller (in absolute value) than"... too late, 1.1 is already released. It seems one can check typos only after releases, what a curse.
