It's interesting to see which codeforces blog posts get traction on HN.
For context, in competitive programming a lot of combinatorial problems (find some formula to count something) require you to output the answer modulo some prime. This is because otherwise the answer would overflow an int, which would make the problem too tedious to be fun and too hard for problem setters to write good problems and checkers for.
So to prove that you still know how to count the thing, you do it in a finite field. If you use integers mod a prime, you still have all the usual arithmetic operations: addition, subtraction, multiplication. And even division is still easy, since you can calculate the multiplicative inverse with Fermat's Little Theorem (a^(p-2) = a^(-1) mod p). The final answer you output is not the real thing you're counting, but just evidence that you had the right formula and did all the right operations.
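(Tiny worked example: with p = 7, 3^(7-2) = 3^5 = 243 = 5 mod 7, and indeed 3 * 5 = 15 = 1 mod 7, so 5 is the inverse of 3.)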
Anyway, just wanted to give context for why competitive programmers care about factorial mod a prime (usually as part of a binomial or multinomial expression). And I'm kind of surprised anyone outside of competitive programming cares about it.
See also:
https://usaco.guide/gold/modular?lang=cpp
https://usaco.guide/gold/combo?lang=cpp
> a^(p-2) = a^(-1) mod p
Tangent, but curious: what made you write it like this? I've only ever seen it written as a^(p-1) = 1 mod p or a^p = a mod p. Is that form somehow more useful in some cases?
It's easier to write code for efficiently computing the inverse in that form, roughly:
    int FastExp(int a, int e, int p)
    {
        if (e == 0) return 1;
        if (e == 1) return a % p;
        // Square-and-multiply: a^e = (a^2)^(e/2), times one extra a when e is odd.
        // Do the products in 64 bits so a*a can't overflow a 32-bit int.
        int sq = (int)((long long)a * a % p);
        if (e % 2 == 0) return FastExp(sq, e / 2, p);
        else return (int)((long long)a * FastExp(sq, e / 2, p) % p);
    }
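And then the inverse is one call away, which is most of what you need for binomial coefficients mod p. A rough sketch (ModInverse and BinomMod are my names, not from the thread; assumes 0 <= k <= n < p with p prime):

    // a^(-1) mod p via Fermat's little theorem.
    int ModInverse(int a, int p) { return FastExp(a, p - 2, p); }

    // C(n, k) mod p without precomputed tables; since n < p, no factor of p appears.
    int BinomMod(int n, int k, int p)
    {
        long long num = 1, den = 1;
        for (int i = 0; i < k; i++) {
            num = num * (n - i) % p;    // n * (n-1) * ... * (n-k+1)
            den = den * (i + 1) % p;    // k!
        }
        return (int)(num * ModInverse((int)den, p) % p);
    }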
In math competitions where you only have pen and paper, you'd instead turn what you wrote into a linear Diophantine equation (a*x + p*y = 1, whose solution x is the inverse) you can solve with the usual method.
a^(-1) mod p is the multiplicative inverse in a finite field. The point of the original comment was to show how to transform the multiplicative inverse into an easier problem.
Just what do you think the running time of modular inverse is?
That's a pretty snarky and unhelpful approach to the conversation.
That said, I'm also a bit surprised to see somebody discuss modular inverses without mentioning the extended Euclidean algorithm, which is a more elementary solution.
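For reference, the extended-Euclid route is also only a few lines. A rough sketch (my code, not from the thread): it finds x with a*x + p*y = gcd(a, p) = 1, so x mod p is the inverse.

    // a^(-1) mod p via extended Euclid; assumes gcd(a, p) == 1.
    int InverseEEA(int a, int p)
    {
        long long old_r = a, r = p;
        long long old_x = 1, x = 0;   // invariant: old_x * a = old_r (mod p)
        while (r != 0) {
            long long q = old_r / r;
            long long tmp = old_r - q * r; old_r = r; r = tmp;   // gcd step
            tmp = old_x - q * x; old_x = x; x = tmp;             // track coefficient of a
        }
        // Now old_r == 1 and old_x * a = 1 (mod p); old_x may be negative.
        return (int)((old_x % p + p) % p);
    }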
Snarky I'll admit, but it was a serious question. Inverse is quasilinear time; the exponentiation is quasiquadratic. Maybe he didn't know about the EEA?
Well, modular multiplication is faster than modular inverse, both asymptotically for large moduli and practically for almost all moduli I can think of. (2, 3, and 4 being notable exceptions!)
The article computes modular inverses of a_1, ..., a_n by:
- Computing (a_1 * ... * a_i) = (a_1 * ... * a_{i-1}) * a_i recursively
- Computing (a_1 * ... * a_n)^(-1) by square-and-multiply
- Computing (a_1 * ... * a_i)^(-1) = (a_1 * ... * a_{i+1})^(-1) * a_{i+1} recursively
- Computing a_i^(-1) = (a_1 * ... * a_i)^(-1) * (a_1 * ... * a_{i-1}) for each i.
The second step is a scalar operation, so its running time is immaterial as long as you aren't doing something too silly.
For my caveman brain, both Fermat's little theorem and square-and-multiply exponentiation are pretty easy to understand. Moreover, the code is going to be "defect-evident": if I've gotten the logic wrong or forgotten integer promotions or modular reductions as in qsort's post, it'll quickly be clear by skimming the code.
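Sketching those steps in code (my sketch, not Colin's; it reuses the FastExp from upthread and assumes p is a prime that fits in 32 bits and every a[i] is nonzero mod p):

    #include <vector>

    // Inverts a[0], ..., a[n-1] mod p with one exponentiation and O(n) multiplications.
    std::vector<long long> BatchInverse(const std::vector<long long>& a, int p)
    {
        int n = (int)a.size();
        std::vector<long long> pre(n + 1, 1);
        for (int i = 0; i < n; i++)
            pre[i + 1] = pre[i] * (a[i] % p) % p;          // prefix products a_1 * ... * a_i
        long long inv = FastExp((int)pre[n], p - 2, p);    // the single modular inversion
        std::vector<long long> res(n);
        for (int i = n - 1; i >= 0; i--) {
            res[i] = inv * pre[i] % p;     // a_i^(-1) = (a_1*...*a_i)^(-1) * (a_1*...*a_{i-1})
            inv = inv * (a[i] % p) % p;    // peel off a_i: inv becomes (a_1*...*a_{i-1})^(-1)
        }
        return res;
    }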
I disagree on two counts:
- having Colin stop by your thread is strictly an opportunity for useful information to flow from a singular source to many people
- you would hear that aloud 100 times a day in any office where serious work was being done by professionals on a deadline and think nothing of it. Bet your ass that places in the world where serious hackers rise only on merit and have the best gear embargoed are saying stuff like that all the time. This nepotism-capture bubble is an outlier in the history of serious engineering.
Defining the rudeness threshold down to the point where cperciva clears it is one part comedy and two parts tragedy with the words Hacker News in bold at the top of the page.
Since we are talking factorials, I wanted to ask: what is the largest factorial that the biggest supercomputer known to man has computed? How long did it take?
The current record is 10^10^9 (about a billion digits) by Peter Luschny in 2021 using a parallel algorithm. For exact factorials, Clifford Stern calculated 170! in 2012, which has over 300 digits.
Probably not very big. Factorials are pretty boring in the sense that it's relatively trivial to compute a factorial almost as big as your hard drive. 99% of the time will happen in a single multiply.
> 99% of the time will happen in a single multiply
Far from it. Asymptotically it's a proportion 2/log(N) of the compute cost.
Yeah, most algorithms that run in O(n polylog(n)) time, including most divide-and-conquer bigint arithmetic, will ultimately be dominated by how much memory you have available. For some experiments a while back, I would create a terabyte-long swapfile to extend the range of what GMP could do, and it still filled up within a couple CPU-weeks at most.
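If anyone wants a sense of scale, something like this rough, untested sketch using GMP's mpz_fac_ui (the choice of 1,000,000! is just for illustration) computes an exact seven-figure factorial in seconds on a laptop:

    #include <stdio.h>
    #include <gmp.h>

    int main(void)
    {
        mpz_t f;
        mpz_init(f);
        mpz_fac_ui(f, 1000000);   // exact 1,000,000!
        // mpz_sizeinbase gives the digit count (possibly one high), which is
        // enough to see the size without printing the whole number.
        printf("1000000! has ~%zu digits\n", mpz_sizeinbase(f, 10));
        mpz_clear(f);
        return 0;
    }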