ckormanyos / wide-integer

Wide-Integer implements a generic C++ template for uint128_t, uint256_t, uint512_t, uint1024_t, etc.

License: Boost Software License 1.0

C++ 99.48% CMake 0.14% C 0.01% Python 0.38%

arbitrary-precision arbitrary-precision-integer big-integer embedded-systems high-performance multiprecision numerical uint128 uint24 uint256 uint48 uint512

wide-integer's Introduction

Welcome

I'm a software developer, book author and independent researcher with a focus on all aspects of high-performance, portable software. I have a particular interest in embedded-systems software, mathematical software and quality.

My areas of expertise include software implementation, teaching and training, product development, large-scale build systems, continuous integration, quality, team leadership, international development, documentation and providing support roles.

Real-time embedded-systems software
High-performance mathematical software
Large-scale build systems and continuous integration systems
Software quality, performance and portability

wide-integer's Issues

Signed extension planned?

The unsigned integer types are present.
Are there any potential plans for a signed integer type?

GCC with -fsanitize=address indicates problem

When using GCC with -fsanitize=address, there is indication of a problem. It seems like it might stem from the multiplication routine. It arises when the multithreaded add test begins, but not in the tests of the examples.

This is a preliminary report. Not very much is known about this issue and it does not seem to influence numerical correctness. Nonetheless, the address sanitizer does seem to pick up a clear indication of a problem.

Clean up trivial thread sanitize issues in tests

Some trivial thread sanitize issues are found in the test routines. These should be cleaned up and eliminated. There do not seem to be any difficult issues, just some missing synchronization objects.

Create a GCC/GCOV build or shell script

Fix or explain why example007 breaks CI

Example007 breaks CI, so as a workaround the strict test of the randomly generated result is currently not used in verification.

The point of this issue is to try to understand why Example007 behaves differently on the CI server. Is there an issue with byte ordering? Another possibility might be misunderstanding of an initialized creation of an instance of a standard random generator.

Simplify CI with test matrices

It should be possible to simplify CI with test matrices by grouping compiler(s) or language standard(s) together.

Reduce amount of Boost develop branch used in CI

The Boost multiprecision (and if needed math) dependencies have been reduced as of Boost 1.76. The amount of Boost develop branch needed for CI builds and runs can be significantly reduced.

Remove PCG random and use <random> instead

The use of a specialized version of PCG seems non-intuitive and cluttered within the context of modern C++ programming in this particular class library. It will probably be better for this design to simply use the available engines and adapters from the standard library's <random> instead.

uint24_t missing (removed)

Support for uint24_t is missing because it has been removed. The digit checks for higher digit counts disallow a 24-bit type: the half-width type plays an important role in the implementation, and there is no valid half-width type with 12 bits.

At the moment, there is no plan to restore support for the 24-bit type.

Add MinGW builds and runs to CI

Set up CI

Set up continuous integration, preliminarily using Ubuntu/GCC with libboost.

Optimized squaring routine

Make an optimized squaring routine to multiply n*n.
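
One common way to make squaring cheaper than a general n-by-n multiplication is to compute each cross product a[i]*a[j] (with i < j) only once, double the accumulated sum, and then add the diagonal terms a[i]*a[i]. The following is a minimal sketch of that idea, assuming 32-bit limbs stored little-endian in a std::vector. The name square_limbs is hypothetical and is not part of wide-integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of wide-integer): square a little-endian
// array of 32-bit limbs and return the full 2n-limb result.
std::vector<std::uint32_t> square_limbs(const std::vector<std::uint32_t>& a)
{
  const std::size_t n = a.size();
  std::vector<std::uint32_t> r(2U * n, 0U);

  // Step 1: accumulate each cross product a[i] * a[j] with i < j exactly once.
  for(std::size_t i = 0U; i < n; ++i)
  {
    std::uint64_t carry = 0U;

    for(std::size_t j = i + 1U; j < n; ++j)
    {
      const std::uint64_t t =
        (static_cast<std::uint64_t>(a[i]) * a[j]) + r[i + j] + carry;

      r[i + j] = static_cast<std::uint32_t>(t);
      carry    = t >> 32U;
    }

    r[i + n] = static_cast<std::uint32_t>(carry);
  }

  // Step 2: double the sum of cross products (shift left by one bit).
  std::uint32_t top_bit = 0U;

  for(auto& limb : r)
  {
    const std::uint32_t next_top_bit = limb >> 31U;

    limb    = static_cast<std::uint32_t>(limb << 1U) | top_bit;
    top_bit = next_top_bit;
  }

  // Step 3: add the diagonal squares a[i] * a[i] at limb position 2 * i.
  std::uint64_t carry = 0U;

  for(std::size_t i = 0U; i < n; ++i)
  {
    const std::uint64_t lo =
      (static_cast<std::uint64_t>(a[i]) * a[i]) + r[2U * i] + carry;

    r[2U * i] = static_cast<std::uint32_t>(lo);

    const std::uint64_t hi = (lo >> 32U) + r[(2U * i) + 1U];

    r[(2U * i) + 1U] = static_cast<std::uint32_t>(hi);
    carry            = hi >> 32U;
  }

  return r;
}
```

This performs roughly half of the limb multiplications of the schoolbook n-by-n product. Whether it is actually faster for a given limb width and limb count would need to be measured.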

Relax most constraints on binary digit count

Relax most of the constraints on the binary digit count allowed in uintwide_t. Add support for bit counts that also contain fractional parts of a limb such as uint99_t, etc. Still to be decided is whether the storage is left-shifted to the MSB or rests on the LSB.

Remove uintwide_t constructor(s) from array-like aggregates

Remove uintwide_t's constructor(s) from array-like aggregates. These include std::initializer_list, classic C-array and possibly even convenience construction from the so-called representation_type. This issue results from #63.

Optimize unsigned integer sqrt

Try to optimize unsigned integer square root, for instance, with the algorithm SqrtRem known from the MPFR author's literature.
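
For comparison, a plain Newton iteration is a common baseline for integer square root. The sketch below is not the SqrtRem algorithm and is not taken from the library. It operates on a built-in 64-bit type, and the name isqrt_newton is hypothetical.

```cpp
#include <cstdint>

// Hypothetical baseline (not SqrtRem, not library code):
// floor(sqrt(n)) via Newton iteration, starting from above the root.
std::uint64_t isqrt_newton(std::uint64_t n)
{
  if(n == 0U) { return 0U; }

  // Find the index of the most significant set bit of n.
  unsigned msb = 0U;

  for(std::uint64_t t = n; (t >>= 1U) != 0U; ) { ++msb; }

  // Initial guess: a power of two that is guaranteed to be >= sqrt(n).
  std::uint64_t x = static_cast<std::uint64_t>(1U) << ((msb / 2U) + 1U);

  for(;;)
  {
    // Newton step for f(x) = x^2 - n using integer division.
    const std::uint64_t y = (x + (n / x)) / 2U;

    // The iterates decrease until they reach floor(sqrt(n)).
    if(y >= x) { return x; }

    x = y;
  }
}
```

Starting from a power of two above the root keeps x + n / x within 64 bits even for the largest input. A wide-integer version would use the same structure with wide division, which is where a divide-and-conquer scheme might pay off.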

Construct-from float types constexpr

If is_iec559, then it should be possible to write a local constexpr set of floating-point dissection functions allowing for constexpr construct-from built-in floating-point types.

This is a refinement of #47. See also #77 which achieved constexpr cast-to built-in floating-point types.

Add Apple GCC to CI

Add Apple GCC and GCC MinGW builds and runs to CI.

Reduce amount of Git cloning in CI

Reduce amount of Git cloning in CI, which should be possible with the reduced dependencies of Math and (to a lesser extent) Multiprecision in develop branch.

Try/investigate specialized mul 8-by-8 limbs

Try to implement and investigate specialized multiplication (unrolled) of 8-by-8 limbs. Is it faster for common limb widths?

Tests to use manual verify instead of Boost.Test

Tests should use manual verify instead of Boost.Test. This will allow for working around Boost.Test bugs with certain C++20 compilers.

digits now means width

When a type is unsigned, you can get away with using these terms interchangeably. Now, I'd caution against continuing to use the term 'digits' to mean number of binary digits in the storage. Not only is it inconsistent with CNL, but more importantly it doesn't match numeric limits.

The giveaway that this is problematic is the definition of numeric_limits_uintwide_t_base::digits where clearly this word is being used in two different ways within the project.

I'd recommend either

1. changing the first tparam from Digits2 to Width2 or Bits, or
2. using the value to mean something different when type is unsigned.

If you choose (1), I'd consider this issue to be low priority; you're just changing a name. But if you choose the second option, you'll be breaking the API of your signed type going forward so you might want to do this sooner.
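
The mismatch is easy to see with built-in types, where std::numeric_limits<T>::digits counts only the value bits and therefore excludes the sign bit. The short sketch below illustrates this. The helper width_of is hypothetical and is not part of wide-integer or CNL.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// For built-in types, numeric_limits<T>::digits counts value bits only,
// so it excludes the sign bit of a signed type.
static_assert(std::numeric_limits<std::uint32_t>::digits == 32,
              "unsigned: digits equals the width");
static_assert(std::numeric_limits<std::int32_t>::digits == 31,
              "signed: digits is the width minus one");

// Hypothetical helper: the storage width in bits, independent of signedness.
template<typename T>
constexpr int width_of()
{
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

static_assert(width_of<std::int32_t>() == width_of<std::uint32_t>(),
              "width is the same for both");
```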

Strive for constexpr correctness in ctor and numeric_limits

Constructors and subsequently numeric limits could and should make more standards-adherent correct use of constexpr.

Use Boost develop branch in CI

The Boost distros on the runners are quite old. This issue is about using develop branch (in particular Boost.Math and Boost.Multiprecision) on CI runners.

Examples of doing this are in the wide-decimal repository.

Ensure specialized mul 4-by-4 on uint256_t with 64-bit limbs

Ensure that the specialization of 4-by-4 multiplication is picked up for both uint128_t with 32-bit limbs as well as uint256_t with 64-bit limbs.

Unroll operations add/sub/shift-left/shift-right for 4 limbs

Unroll operations add/sub/shift-left/shift-right for 4 limbs.

Template enable_if known from the 4-by-4 multiplication case can be used to isolate these instances of the functions.

Negative shifts

The shift operators should recurse on the negative count, e.g.

-      if     (n <  0) { operator>>=(n); }
+      if     (n <  0) { operator>>=(-n); }
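
A minimal sketch of the intended behavior, using a toy 64-bit wrapper rather than the library's uintwide_t (the type toy_uint is hypothetical): a negative count is negated and forwarded to the opposite shift.

```cpp
#include <cstdint>

// Hypothetical toy type (not the library's uintwide_t) wrapping 64 bits.
// Note: a count equal to INT_MIN is not handled, since -n would overflow.
struct toy_uint
{
  std::uint64_t value;

  toy_uint& operator<<=(int n)
  {
    if     (n <   0) { operator>>=(-n); } // negative count: shift the other way
    else if(n >= 64) { value = 0U; }      // every bit is shifted out
    else             { value <<= n; }

    return *this;
  }

  toy_uint& operator>>=(int n)
  {
    if     (n <   0) { operator<<=(-n); } // negative count: shift the other way
    else if(n >= 64) { value = 0U; }      // every bit is shifted out
    else             { value >>= n; }

    return *this;
  }
};
```

Forwarding the unnegated count, as in the removed line, would pass a negative count to the other operator, which would forward it straight back, and the two operators would call each other without end.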

Numeric limits min() returns wrongly zero for signed

Numeric limits min() wrongly returns zero for signed integer types. Differentiate between signed and unsigned types and return the proper negative value for min on signed types.

Add construct from/cast to built in float types

From a standardization perspective, we will definitely need construction from and cast to the built-in floating-point types float, double and long double.

Implement these. Along the way it might be necessary to resolve more clearly some of the existing (but not quite complete) set of casts to built-in integral types.

Limb size & speedups

I appreciate the speed this library brings - it's very close to the speed of boost's multiprecision library for basic arithmetic.

I am trying to speed up uint256_t and uint512_t arithmetic in any possible way. I see that uint256_t and uint512_t are defined using 32-bit limbs. Is there anything prohibiting the use of 64-bit limbs instead?

I get a bunch of compile errors when I manually try to set the limb type to uint64_t. They seem to be template checks for the most part along with overflow warnings, but I don't know the code well enough to understand if we can work around them or not:

include/math/wide_integer/uintwide_t.h: In instantiation of ‘struct math::wide_integer::detail::uint_type_helper<128, void>’:
include/math/wide_integer/uintwide_t.h:637:125:   required from ‘class math::wide_integer::uintwide_t<256, long unsigned int>’
test_main.cpp:13:39:   required from here
include/math/wide_integer/uintwide_t.h:527:5: error: static assertion failed: Error: uint_type_helper is not intended to be used for this BitCount
     static_assert((   ((BitCount >= 8U) && (BitCount <= 64U))
     ^~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h: In instantiation of ‘class math::wide_integer::uintwide_t<256, long unsigned int>’:
test_main.cpp:13:39:   required from here
include/math/wide_integer/uintwide_t.h:645:5: error: static assertion failed: Error: Please check the characteristics of the template parameters ST and LT
     static_assert((    (std::numeric_limits<limb_type>::is_integer        == true)
     ^~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h: In instantiation of ‘class math::wide_integer::uintwide_t<512, long unsigned int, void>’:
test_main.cpp:15:56:   recursively required by substitution of ‘template<class UnknownUnsignedWideIntegralType, class> math::wide_integer::uintwide_t<256, long unsigned int>::operator math::wide_integer::uintwide_t<256, long unsigned int>::double_width_type<UnknownUnsignedWideIntegralType, <template-parameter-1-2> >() const [with UnknownUnsignedWideIntegralType = <missing>; <template-parameter-1-2> = <missing>]’
test_main.cpp:15:56:   required from here
include/math/wide_integer/uintwide_t.h:645:5: error: static assertion failed: Error: Please check the characteristics of the template parameters ST and LT
include/math/wide_integer/uintwide_t.h: In instantiation of ‘void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_divide_knuth(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&, math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>*) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’:
include/math/wide_integer/uintwide_t.h:984:26:   required from ‘math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>& math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::operator/=(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
test_main.cpp:16:58:   required from here
include/math/wide_integer/uintwide_t.h:2029:80: warning: left shift count >= width of type [-Wshift-count-overflow]
  limb_type(double_limb_type(  double_limb_type(double_limb_type(1U) << std::numeric_limits<limb_type>::digits)
                                                ~~~~~~~~~~~~~~~~~~~~~^~~~~~
include/math/wide_integer/uintwide_t.h:2075:76: warning: left shift count >= width of type [-Wshift-count-overflow]
      const double_limb_type      u_j_j1 = (double_limb_type(uu[uj]) << std::numeric_limits<limb_type>::digits) + uu[uj - 1U];
                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h:2089:61: warning: left shift count >= width of type [-Wshift-count-overflow]
                      <= double_limb_type(double_limb_type(t << std::numeric_limits<limb_type>::digits) + uu[uj - 2U])))
                                                           ~~^~~~~~
include/math/wide_integer/uintwide_t.h:2145:84: warning: left shift count >= width of type [-Wshift-count-overflow]
                     + double_limb_type(double_limb_type(previous_u) << std::numeric_limits<limb_type>::digits));
                                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
include/math/wide_integer/uintwide_t.h: In instantiation of ‘void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_divide_by_single_limb(math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::limb_type, uint_fast32_t, math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>*) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void; math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::limb_type = long unsigned int; uint_fast32_t = long unsigned int]’:
include/math/wide_integer/uintwide_t.h:2021:37:   required from ‘void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_divide_knuth(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&, math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>*) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
include/math/wide_integer/uintwide_t.h:984:26:   required from ‘math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>& math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::operator/=(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
test_main.cpp:16:58:   required from here
include/math/wide_integer/uintwide_t.h:1432:97: warning: left shift count >= width of type [-Wshift-count-overflow]
  - double_limb_type(double_limb_type(short_denominator) * hi_part)) << std::numeric_limits<limb_type>::digits);
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h:1442:141: warning: left shift count >= width of type [-Wshift-count-overflow]
  - double_limb_type(double_limb_type(short_denominator) * hi_part)) << std::numeric_limits<limb_type>::digits);
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h:1444:47: warning: right shift count >= width of type [-Wshift-count-overflow]
         *remainder = limb_type(long_numerator >> std::numeric_limits<limb_type>::digits);
                                ~~~~~~~~~~~~~~~^~~~~~
include/math/wide_integer/uintwide_t.h: In instantiation of ‘ST math::wide_integer::detail::make_hi(const LT&) [with ST = long unsigned int; LT = long unsigned int]’:
include/math/wide_integer/uintwide_t.h:2087:48:   required from ‘void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_divide_knuth(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&, math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>*) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
include/math/wide_integer/uintwide_t.h:984:26:   required from ‘math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>& math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::operator/=(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
test_main.cpp:16:58:   required from here
include/math/wide_integer/uintwide_t.h:591:5: error: static assertion failed: Error: Please check the characteristics of the template parameters ST and LT
     static_assert((    (std::numeric_limits<local_ushort_type>::is_integer == true)
     ^~~~~~~~~~~~~
include/math/wide_integer/uintwide_t.h:598:45: warning: right shift count >= width of type [-Wshift-count-overflow]
     return static_cast<local_ushort_type>(u >> std::numeric_limits<local_ushort_type>::digits);
                                           ~~^~~~~~
include/math/wide_integer/uintwide_t.h: In instantiation of ‘ST math::wide_integer::detail::make_lo(const LT&) [with ST = long unsigned int; LT = long unsigned int]’:
include/math/wide_integer/uintwide_t.h:1617:60:   required from ‘static void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_multiply_n_by_n_to_lo_part(math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::limb_type*, const limb_type*, const limb_type*, uint_fast32_t) [with long unsigned int RePhraseDigits2 = 256; const typename std::enable_if<((std::numeric_limits<LT>::digits * 4) == RePhraseDigits2)>::type* <anonymous> = 0; long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void; math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::limb_type = long unsigned int; uint_fast32_t = long unsigned int]’
include/math/wide_integer/uintwide_t.h:1495:38:   required from ‘static void math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::eval_mul_unary(math::wide_integer::uintwide_t<OtherDigits2, LimbType, AllocatorType>&, const math::wide_integer::uintwide_t<OtherDigits2, LimbType, AllocatorType>&, typename std::enable_if<((OtherDigits2 / std::numeric_limits<LT>::digits) < math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::number_of_limbs_karatsuba_threshold)>::type*) [with long unsigned int OtherDigits2 = 256; long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void; typename std::enable_if<((OtherDigits2 / std::numeric_limits<LT>::digits) < math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::number_of_limbs_karatsuba_threshold)>::type = void]’
include/math/wide_integer/uintwide_t.h:946:23:   required from ‘math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>& math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>::operator*=(const math::wide_integer::uintwide_t<Digits2, LimbType, AllocatorType>&) [with long unsigned int Digits2 = 256; LimbType = long unsigned int; AllocatorType = void]’
test_main.cpp:15:56:   required from here
include/math/wide_integer/uintwide_t.h:569:5: error: static assertion failed: Error: Please check the characteristics of the template parameters ST and LT
     static_assert((    (std::numeric_limits<local_ushort_type>::is_integer == true)
     ^~~~~~~~~~~~~

The reason I ask is that one of the goals of the library is to run on embedded systems, many of which I presume are 32-bit systems. I am running a 64-bit system and would like to know if speed gains are possible using full 64-bit limbs. It looks like Boost uses 32-bit limbs, too, so maybe there's a constraint I don't understand.

Other speedup tips would be very much appreciated, too.
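
The shift-count warnings in the log above suggest that double_limb_type was no wider than the 64-bit limb itself, so a type with twice the limb width seems to be what is missing. The following is a minimal sketch of one way such a mapping could look, assuming a compiler that offers the non-standard unsigned __int128 extension (GCC and Clang on 64-bit targets). The names double_limb, hypothetical_uint128 and mul_limbs are hypothetical and do not reflect the library's uint_type_helper.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical trait (not the library's uint_type_helper): maps a limb type
// to an unsigned type having twice as many bits.
template<typename LimbType> struct double_limb;

template<> struct double_limb<std::uint8_t>  { using type = std::uint16_t; };
template<> struct double_limb<std::uint16_t> { using type = std::uint32_t; };
template<> struct double_limb<std::uint32_t> { using type = std::uint64_t; };

#if defined(__SIZEOF_INT128__)
// Assumption: compiler extension, not standard C++.
__extension__ typedef unsigned __int128 hypothetical_uint128;

template<> struct double_limb<std::uint64_t> { using type = hypothetical_uint128; };
#endif

// Multiply two limbs and split the exact product into low and high halves.
template<typename LimbType>
void mul_limbs(LimbType a, LimbType b, LimbType& lo, LimbType& hi)
{
  using double_type = typename double_limb<LimbType>::type;

  constexpr int limb_bits = std::numeric_limits<LimbType>::digits;

  const double_type p = static_cast<double_type>(a) * static_cast<double_type>(b);

  lo = static_cast<LimbType>(p);
  hi = static_cast<LimbType>(p >> limb_bits);
}
```

On targets without a 128-bit extension the 64-bit specialization is simply absent, and instantiating mul_limbs with a 64-bit limb fails to compile instead of silently shifting by the full width of the type.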

Add apple clang on MacOS to CI

Add apple clang on MacOS to CI via a simple matrix. Can use wide-decimal as a working model.

Specialize div for 128/256-bit uint with 32/64-bit limbs (generalize to 4 limbs)

The popular case of uint128 is not yet specialized or specifically optimized.

Consider specializing mul/div for 128-bit uint with 32-bit limbs

Consider optional allocator type template parameter

Some applications may want to use an allocator instead of local or stack storage for individual uintwide_t instances. This could be facilitated by supporting an optional allocator type template parameter which defaults to void or void* when no allocator is provided. In case of allocated storage, a simple fixed-width container can be used.

Add undefined behavior sanitizer runs to CI

CI could benefit from some builds and runs with the undefined behavior sanitizer (UBSan).

Fix -Wconversion warnings?

This isn't essential. It's just a suggestion. I probably shouldn't even be seeing warnings from that file. (I must have forgotten to use -isystem instead of -I.)

Here's some example output from g++ -Wconversion:

../external/wide-integer/math/wide_integer/uintwide_t.h:768:95: error: conversion from ‘uint_fast32_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
  768 |     static_assert(   (detail::verify_power_of_two_times_granularity_one_sixty_fourth<my_width2>::conditional_value == true)
      |                                                                                               ^
../external/wide-integer/math/wide_integer/uintwide_t.h:786:52: error: conversion from ‘long unsigned int’ to ‘unsigned int’ may change value [-Werror=conversion]
  786 |     using double_width_type = uintwide_t<my_width2 * 2U, limb_type>;
      |                                          ~~~~~~~~~~^~~~
../external/wide-integer/math/wide_integer/uintwide_t.h:1865:99: error: conversion from ‘long unsigned int’ to ‘unsigned int’ may change value [-Werror=conversion]
 1865 |              typename std::enable_if<(uintwide_t<RePhraseWidth2, LimbType, AllocatorType, IsSigned>::number_of_limbs == 4U)>::type const* = nullptr>
      |                                                                                                   ^
../external/wide-integer/math/wide_integer/uintwide_t.h:2249:93: error: conversion from ‘long unsigned int’ to ‘unsigned int’ may change value [-Werror=conversion]
 2249 |              typename std::enable_if<(   (uintwide_t<RePhraseWidth2, LimbType, AllocatorType>::number_of_limbs != 4U)
      |                                                                                             ^
../external/wide-integer/math/wide_integer/uintwide_t.h: In member function ‘bool math::wide_integer::uintwide_t<Width2, LimbType, AllocatorType, IsSigned>::wr_string(char*, uint_fast8_t, bool, bool, bool, uint_fast32_t, char) const’:
../external/wide-integer/math/wide_integer/uintwide_t.h:1421:66: error: conversion from ‘uint_fast32_t’ {aka ‘long unsigned int’} to ‘unsigned int’ may change value [-Werror=conversion]
 1421 |             uintwide_t<my_width2, limb_type, AllocatorType, false> tu(t);
      |                                                                  ^

It does seem like uint_fast32_t is 64 bits on my system which upsets the compiler when it gets converted to uint32_t. I don't think there's very much reason at all to be choosing the _fast variant of <cstdint> types in template parameters. I would highly recommend coming up with an alias, perhaps

namespace math::wide_integer {
  using size_t = std::int64_t;  // if you'd rather unsigned, I understand
}

which you then apply pretty-much everywhere you have a constexpr variable or a template parameter. Not only would this avoid a lot of the conversion errors (which are daft because the compiler knows where any overflow occurs here) but you can change your mind about exactly what the type is without huge churn later on.

Thoughts?

Breaking Change Announcement April 2021

Name changes of header file and namespace are planned in April 2021.
The reason for the changes is to provide better compatibility
with the wide-decimal project.

Change name of header file generic_template_uintwide_t to uintwide_t.h.
Change name of namespace wide_integer::generic_template to math::wide_integer.
No backward compatibility measures are planned at the moment.

Add more compilers/standards to CI

Add more compilers and C++ standards to CI in this repo.
Consider modeling the CI after that in wide-decimal.

Faster division?

Division uses Knuth long division routine.
Is it possible to implement a faster division scheme?

Is a question?

uint128_t n("12312452345");
std::cout << std::boolalpha;
std::cout << std::is_integral<uint128_t>::value << '\n';

output : false
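
The result is expected: std::is_integral is defined by the standard to be true only for the built-in integer types, and programs are not allowed to specialize it for their own classes. std::numeric_limits, by contrast, may be specialized for user-defined types. The sketch below shows the difference with a hypothetical stand-in type, toy_uint128, which is not the library's uint128_t.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a user-defined wide integer (not uintwide_t).
struct toy_uint128
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// User code may specialize std::numeric_limits for its own types ...
namespace std
{
  template<> class numeric_limits<toy_uint128>
  {
  public:
    static constexpr bool is_specialized = true;
    static constexpr bool is_integer     = true;
    static constexpr bool is_signed      = false;
    static constexpr int  digits         = 128;
  };
}

// ... but std::is_integral stays false for every class type.
static_assert(!std::is_integral<toy_uint128>::value,
              "class types are never integral");
static_assert(std::numeric_limits<toy_uint128>::is_integer,
              "query numeric_limits instead");
```

Generic code that needs to accept both built-in and wide integers can therefore test std::numeric_limits<T>::is_integer, provided the type in question supplies such a specialization.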

Add more test cases and strive for higher test coverage

A few basic functions are tested in the test suite. But not yet all functions are tested and not yet all border cases are included in tests. In this issue, we plan to add more test cases and strive for higher test coverage. Boost.Test is expected to be used, optionally in combination with GCOV and/or LCOV.

Optimize Karatsuba Mul possible?

Karatsuba multiplication has been implemented.
There are separate steps for complementing negative subtraction results.
Can these be eliminated via use of alternate ordering in Kara Multiplication algorithm?

Long division broken on AVR 8-bit

Long division is at the moment broken on the AVR 8-bit target. The error is in an unknown location in the Knuth division algorithm and the benchmark test fails on the embedded controller. Other platforms such as PC and ARM do not seem to be influenced by this issue.

Continuous testing

I was wondering though: have you thought of gating PRs on passing tests? (The practice is mentioned here). To do this, I think you'd have to change the YML file like this and make a change to the master branch in the project settings.

The downside of this is that you cannot merge the code until all of the tests have passed, which slows down merges of error-free changes.
The upside of this is that you cannot merge the code until all of the tests have passed, which blocks merges of failing changes.

Even if you don't block merges, it would still be helpful for contributors and reviewers to be able to see their PRs pass tests publicly. If you agree, I'd be happy to submit a PR.

Try support 64-bit limb type

Some high-performance clients would like to try for speedup with a 64-bit native limb_type, where this is available from the compiler/hardware, etc.

literal types

(This may be a dupe of #31, but the types are not literal types, i.e. they are not usable in constant expressions. It's also brought up in #10 but it's a separate concern from signedness so I thought it might be good to organise the threads into two.)

I took a quick look and things like std::fill and std::array may pose a problem in earlier language revisions, or even C++20. Would you be OK with abandoning C++11? Not necessary but very helpful because then you can keep variables and loops.

And would you consider abandoning C++14 and C++17? Even less necessary but still would reduce the amount of change necessary to achieve constexpr.

Digit separators in construction from string?

Conversion from non-limb types

The following doesn't compile:

auto input{math::wide_integer::uintwide_t<320, unsigned, void, true>{1729348762983LL}};

The reasons are to do with the c'tors in uintwide_t and detail::fixed_static_array. I couldn't get to the bottom of them but certainly, it's too easy for a c'tor in fixed_static_array to be chosen during overload resolution. I would recommend far fewer constructors, following the reasoning given here.

A uintwide_t isn't an array, or a string, or a fixed_static_array either, so it shouldn't be constructible from these things. You couldn't do that with int (unless you're making some terrible mistake caused by lax C conversion rules)!

Allowing a number to be initialised by any numeric type is a good idea to aim for. In ascending order of difficulty / descending order of priority:

std::is_integral
std::is_arithmetic
std::numeric_limits::is_specialized

Another bit of advice, try not to be too 'helpful' by providing convenience functions (especially c'tors) which aren't essential. A good class has the fewest member functions possible. Try and aim for the opposite of std::string, which has tonnes of member functions and is a poorly-designed class.

Location of public APIs

I've started writing a Conan recipe in order to cleanly integrate wide-integer into CNL. The plumbing and the conclusions I've come up with are hopefully applicable beyond CNL's concerns so I thought I'd share the following observations/suggestions:

There are two public include directories (the thing that GCC receives as a header search path with options like -I or -isystem):
- ./math/wide-integer/, and
- ./util/utility/.
There are two public global namespaces:
- util, and
- math/wide_integer.
Two of the four headers exposed in public include directories are only necessary for testing, not integration with dependent packages:
- uintwide_t_examples.h, and
- uintwide_t_test.h.

In the CMake or Conan scripts, it is possible to shuffle some of the files around in order to omit the test headers from the install destination. However, because of this line

#include <util/utility/util_dynamic_array.h>

two separate install directories are needed and neither -- at their root -- do very much to indicate that they contain headers specific to the wide-integer library. The namespaces suffer from a similar problem: they are one or two levels deep and given very open-ended names at their root: 'util' and 'math'.

Like namespaces, directories (under usr/include at least) are not for creating taxonomies. (That's a great video to watch generally but that nugget of advice translates very well to C++.) By using 'util' and 'math', you're more likely to risk collisions and it's less likely for users to be able to find and remember where your library is on their system. So please consider some minor rearrangement to your source files and their location -- even if you don't go with the following suggestions...

My recommended changes (which are just one possible solution and which I'd be happy to submit in a PR) are to:

move test-only headers out of the public search path;
decide on a namespace and a directory for the library (e.g. ::wide_integer and /wide-integer/);
move public library headers together and away from the rest of the project files, e.g.:
- ./include/wide-integer/uintwide_t.h, and
- ./include/wide-integer/dynamic_array.h;
move the public definitions into this top-level namespace, e.g. ::wide_integer::uintwide_t;
put non-public (but header-exposed) definitions in a detail sub-namespace, e.g. ::wide_integer::detail;
move util::dynamic_array and its comparison operators to the detail sub-namespace.

I suggest this with virtually no understanding of:

the other libraries that might share dynamic_array (there are ways to keep it hidden in detail for wide-integer but inject it into another, public namespace elsewhere), and
the users of wide-integer who would be disrupted by this change. A less disruptive change is entirely possible but you may find that they appreciate the added clarity -- especially if it is accompanied by build and package management facilities which are simple and idiomatic.

Toom Cook multiplication

Implement (at least) Toom-Cook3 and Toom-Cook4 and possibly additional other order(s) for speed-up of multiplication when the bit count is many thousands of bits.

Specialize mul for 128-bit uint with 32-bit limbs

This has been split from #13 which originally handled 128-bit, four-component mul and div, with now separate mul here.

Handle negative arguments in functions like cbrt, GCD, prime

Handle negative arguments in functions like cbrt, GCD, prime, for instance cbrt of a negative number should work, whereas k'th root should not (except for cube root). Need to decide what to do with primality testing and GCD of signed integers, where Boost behavior could potentially be used for guidance here.