danbruc 13 hours ago

For calculating the XOR of 1 to n there is a closed form solution, so no need to XOR them together in a loop.

  (n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)
Or a much more readable version

  [ n, 1, n + 1, 0 ][n % 4]
which makes it clear that this function cycles through a pattern of length four.

Why this works can be seen if we start with some n that is divisible by four, i.e. it has the two least significant bits clear, and then keep XORing it with its successors. We start with xxxxxx00, which is our n. Then we XOR it with n + 1, which is xxxxxx01, and that clears all the x's and leaves us with 00000001. Now we XOR it with n + 2, which is xxxxxx10, and that yields xxxxxx11, which is n + 3. The cycle finishes when we now XOR it with n + 3, which yields 00000000. So we get n, 1, n + 3, 0, and then the cycle repeats as we are back at zero and at n + 4, which is again divisible by four.
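
A quick sanity check in Python of both closed forms against the naive loop:

  # Verify both closed forms of XOR(1..n) against a straight loop.
  def xor_loop(n):
      acc = 0
      for i in range(1, n + 1):
          acc ^= i
      return acc

  def xor_bits(n):
      # (n & -1) = n for even n; the second term is bit0 XOR bit1 of n.
      return (n & ((n & 1) - 1)) + ((n ^ (n >> 1)) & 1)

  def xor_table(n):
      return [n, 1, n + 1, 0][n % 4]

  assert all(xor_loop(n) == xor_bits(n) == xor_table(n) for n in range(1, 1000))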

  • sdenton4 5 hours ago

    Nice!

    My offhand solution not using xor is to subtract from the sum of 1 to n, which has a closed-form solution. The closed form roughly halves the execution time, as we then only have to iterate over the input once.

    Good to know there's a similar speedup available on the xor path...

  • tomtomtom777 11 hours ago

    Fascinating. I can see it works, but I still can't really wrap my head around where the magic cycle length of 4 comes from.

    • danbruc 8 hours ago

      Combining two consecutive integers starting with an even one yields one.

        xxxxxxx0 ^ xxxxxxx1 = 00000001 
      
      If we start at a number divisible by four and do this twice, we get one twice.

        xxxxxx00 ^ xxxxxx01 = 00000001
        xxxxxx10 ^ xxxxxx11 = 00000001
      
      And combining the two of course yields zero and we are right back at the start.
    • NickPollard 10 hours ago

      There are essentially two bits of information in the 'state' of this iterated algorithm: (a) are all the non-lowest bits zero, or do they hold the value of the latest N; (b) the value of the lowest bit.

      So the cycle of (N, 1, N+3, 0) corresponds to (A) and (B) being: (0,0), (0,1), (1,1), (1, 0) - i.e. the 4 possible combinations of these states.

    • betasilly 10 hours ago

      Another interesting fact is that each time you make the xor of four consecutive numbers, beginning with an even number, the result is zero. Example in J.

        xor =: (16 + 2b0110) b.
        f =: 3 : 'xor/ y + i. 4'
        f"0 ] 2 * 1 + i. 100
      
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

      Summing a hundred million: +/ f"0 ] 2 * i. 100000000 gives zero (it takes a few seconds). So it seems the stated property holds for every even n.

      • danbruc 8 hours ago

        Yes, because XORing two consecutive integers only differing in the least significant bit yields one.

          xxxxxxx0 ^ xxxxxxx1 = 00000001 
        
        Doing this twice with four consecutive numbers then also cancels the remaining one. That also means that you do not have to use consecutive numbers; you can use two arbitrary pairs

          2m ^ 2m + 1 ^ 2n ^ 2n + 1
        
        and for example

          16 ^ 17 ^ 42 ^ 43
        
        should be zero.
    • HappyPanacea 10 hours ago

      If we generalize the problem to base k (there are k-1 duplicates of each number except the missing number; find the missing one using digit-wise addition mod k), then we can see the cycle is the smallest number such that the digit-wise sum mod k from 1 to that number is zero, and any power of k with that property will form a cycle. I'm not sure if all such numbers are powers of k, if they exist, or if there is an upper bound on them. For example, in base 4 there appears to be no such cycle.

      • HappyPanacea 7 hours ago

        I made an arithmetical mistake in base 4, so I was wrong about that example.

        I think the following is true: for even k the cycle is k^2 long, and for odd k it is k long. Why? Because the units' place of the generalized xor of 1 to k-1 is (k^2-k)/2, and therefore zero mod k if k is odd; if k is even, then repeating it twice gives zero. For the second digit, k repetitions of the same digit always give zero. Thus for odd k we have a zero when n is divisible by k, and for even k we have a zero when n is divisible by 2k, and the smallest power of k divisible by 2k is k^2, so that must be the cycle length.

  • Thorrez 12 hours ago

    In your array-based equation, you say n+1, but in your explanation you say n+3. Is that a mistake?

    • danbruc 10 hours ago

      No, that is correct, those two n represent slightly different things. n + 3 is the value after n XOR (n + 1) XOR (n + 2), so the n in the array index expression is the n + 2 from the explanation, and n + 3 results from (n + 2) + 1. I thought about how I could make this less confusing but it just became more confusing in my mind, so I just used n in both cases.

    • skullt 10 hours ago

      There's a bit of a trick in that solution: n is assumed to have the lower two bits clear so for an arbitrary n the array would really be:

      [(n & ~3), 1, (n & ~3) + 3, 0][n % 4]

      where the (n & ~3) makes sure those lower 2 bits are cleared. But note that we can only ever look at the first element when n % 4 == 0. In that case, (n & ~3) == n already. And further, we can only ever look at the third element when n % 4 == 2. In that case (n & ~3) == n - 2, so (n & ~3) + 3 == n + 1. Hence the array can be simplified to the one given in the other comment.
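
      A quick Python check that the simplified table agrees with the fully general one:

        # The general table clears the low bits explicitly; n % 4 only ever
        # selects the entries where the two tables agree.
        for n in range(1, 1000):
            general = [n & ~3, 1, (n & ~3) + 3, 0][n % 4]
            simple = [n, 1, n + 1, 0][n % 4]
            assert general == simple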

antirez 12 hours ago

About one month ago I applied XOR in a similar (but a bit more complicated) way to the Redis Vector Sets implementation, in the context of sanity-checking a vset value loaded from the RDB file. I believe the way it works is quite interesting and kinda extends the applicability of the trick in the post.

The problem is that in vector sets, the HNSW graph has the invariant that each node has bidirectional links to a set of N nodes. If A links to B, then B links to A. This is unlike most other HNSW implementations. In mine, it is required that links are reciprocal, otherwise you get a crash.

Now, combine this with another fact: for speed concerns, Redis vector sets are not serialized as

    element -> vector

And then reloaded and added back to the HNSW. This would be slow. Instead, what I do is serialize the graph itself: each node with its unique ID and all its links. But when I load the graph back, I must be sure it is "sane" and will not crash my systems, and reciprocal links are one of the things to check. Checking that all the links are reciprocal could be done with a hash table (as in the post's problem), but that would be slower and memory consuming, so how do we use XOR instead? Each time I see a link A -> B, I normalize it, swapping A and B when A > B. So if links are reciprocal I'll see A->B two times, and if I use a register to accumulate the IDs and XOR them, then at the end a register that is NOT zero means trouble: some link may not be reciprocal.

However, in this specific case, there is a problem: collisions. The register may be 0 even if there are non-reciprocal links, in case they are fancy: that is, the non-reciprocal links are few and happen to XOR to 0. So, to fix this part, I use a strong (and large) hash function that makes a collision extremely unlikely.
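
A minimal sketch of the idea in Python (not the actual Redis code; the hash choice and link representation are illustrative assumptions):

    import hashlib

    def links_look_reciprocal(links):
        # links: iterable of (a, b) node-id pairs, one entry per directed link.
        acc = 0
        for a, b in links:
            if a > b:              # normalize so A->B and B->A hash identically
                a, b = b, a
            digest = hashlib.sha256(f"{a}:{b}".encode()).digest()
            acc ^= int.from_bytes(digest, "big")
        return acc == 0            # zero iff every link (almost surely) has its mirror

    assert links_look_reciprocal([(1, 2), (2, 1)])
    assert not links_look_reciprocal([(1, 2), (2, 1), (3, 4)])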

It is nice now to see this post, since I was not aware of this algorithm when I used it a few weeks ago. Sure, at this point I'm old enough that I never pretend I invented something, so I was sure this was already used in the past, but in case it was not used for reciprocal-links testing before, this is a new interview question you may want to use for advanced candidates.

  • hundredwatt 8 hours ago

    A neat trick to make the accumulator both collision-resistant and self-diagnosing.

      For every normalized link id x:
          y = (x << k) | h(x)   # append a k-bit hash to the id
          acc ^= y
    
    If acc is zero, all links are reciprocal (same guarantee as before).

    If acc is non-zero, split it back into (x', h'):

    * Re-compute h(x').

    * If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems.

    This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table.
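
    A runnable sketch of the scheme (K, the hash, and the ids are illustrative assumptions):

      import hashlib

      K = 32                                    # bits reserved for the hash tag

      def h(x):                                 # a K-bit hash of the id
          return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:K // 8], "big")

      def tag(x):
          return (x << K) | h(x)                # append the hash below the id

      acc = 0
      for x in [5, 9, 9, 5, 7]:                 # 7 is the unpaired link id
          acc ^= tag(x)

      if acc:
          x_guess, h_guess = acc >> K, acc & ((1 << K) - 1)
          if h(x_guess) == h_guess:
              print("single unpaired id:", x_guess)   # prints 7
          else:
              print("two or more problems")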

jonathanlydall 14 hours ago

This was a go to interview question to be solved in C# at a place I worked at a while back which had developers allocated to projects working on pretty standard line of business systems.

The XOR solution was a valid answer, but not the only answer we would have happily accepted.

The interview question was chosen such that it's very easy to understand and quick to solve, meaning it would indicate the candidate knew at least the basics of programming in C#. Almost surprisingly, we actually had candidates applying for "senior" level positions who struggled with this.

It could be solved in a multitude of ways, e.g:

- XOR as above

- Use of a HashSet<int>

- Use a for loop and a List holding each number and its count.

- Use LINQ to group the numbers or something similar, and then find the one by its count.

As long as what they did worked, it was a "valid" answer, we could then often discuss the chosen solution with the candidate and see how they reacted when we let them know of other valid solutions.

It was really great for not being a "one clever trick" question and could act as a springboard to slightly deeper discussions into their technical thought processes and understanding.

  • dahcryn 11 hours ago

    you are missing the most obvious one, no? Sum both lists and take the difference; that's the missing number, since the items are guaranteed unique

    • vbezhenar 28 minutes ago

      It is interesting for me to remember my very first programming task. The very first day I was introduced to programming with Pascal (I think I was 14), I was taught variables, assignments, and arithmetic, and was given a task to swap two variables. I quickly solved it using a third variable, but then I was asked to do it without a third variable. It was a very hard task for me; I spent a few hours at home tackling it, but finally I solved it with a trick conceptually similar to XOR:

          a := a + b;
          b := a - b;
          a := a - b;
      
      I'm still proud of little me and I always remember this solution when I encounter XOR tricks. I didn't know about bitwise arithmetic at that time, but sometimes a simple `+` can work just as well.
    • zeroq 8 hours ago

      overflow

      • criddell 6 hours ago

        Would a BigInteger sum still overflow?

        • Arnavion 6 hours ago

          It doesn't matter. Overflow is a non-issue as long as you have wrapping addition and subtraction operators, which C# does - regular `+` and `-` not inside `checked {}`. You don't need to reach for BigInteger.

  • TacticalCoder 12 hours ago

    > "You are given an array A of n - 1 integers"

    It's an array of integers so it fits in memory (otherwise it wouldn't be called an array). As it fits in memory, n cannot be that big. I'd still ask for more requirements, TopCoder problem style: I want to know how big n can be such that the array fits in memory.

    I didn't know that XOR trick. My solution would be a bit array with n bits and two for loops: one to set the bit corresponding to each number, and one to find the missing number.

    And if my bit array doesn't fit in memory, then neither does the array from the problem (and certainly not the HashSet etc.).

    • jonathanlydall 5 hours ago

      In our case the input was a list of around a dozen numbers, so memory was not a concern, again keeping the problem pretty simple.

    • williamdclt 9 hours ago

      You could make the problem harder with "you are given a stream of n - 1 integers". N could then be any number, unbound by available memory.

      That makes the problem harder and therefore more interesting: a lot of the solutions wouldn't work anymore (this isn't necessarily a good interview question though).
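
      The XOR approach still works in that setting; a sketch:

        # O(1) memory: fold the stream and the expected range into one register.
        def find_missing(stream, n):
            acc = 0
            for v in stream:                # single pass over the stream
                acc ^= v
            for i in range(1, n + 1):
                acc ^= i
            return acc

        assert find_missing(iter([3, 1, 4, 5]), 5) == 2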

      • Arnavion 6 hours ago

        Even with the original formulation, the array doesn't have to fit in available memory. mmap exists.

        • cluckindan an hour ago

          You are given a magnetic tape containing a list of n - 1 integers… :-)

tromp 15 hours ago

It's funny how the author fails to apply the XOR trick in the two missing values problem:

> We can thus search for u by applying this idea to one of the partitions and finding the missing element, and then find v by applying it to the other partition.

Since you already have u^v, you need only search for u, which immediately gives you v.

  • ethan_smith an hour ago

    Indeed - once you have u^v, finding u in one partition immediately gives you v = (u^v)^u, eliminating the need for the second search.

  • FabHK 3 hours ago

    How can you find u? That's what the author explains next.

    • Arnavion an hour ago

      The article says to use the "XOR of all elements" method to find u^v, then do the partitioning, then use the "XOR of all elements" method on the first partition to find u, then use the "XOR of all elements" method on the second partition to find v.

      tromp is saying the last step can be simplified. There is no need to use the "XOR of all elements" method on the second partition to find v, since the earlier steps have given us u^v and u, so simply XORing those two values together gives v.
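
      Putting the three steps together in Python (a sketch, using tromp's shortcut for v):

        def find_two_missing(A, n):
            uv = 0                      # XOR of everything gives u ^ v
            for v in A:
                uv ^= v
            for i in range(1, n + 1):
                uv ^= i
            bit = uv & -uv              # any set bit separates u from v
            u = 0                       # XOR over one partition gives u
            for v in A:
                if v & bit:
                    u ^= v
            for i in range(1, n + 1):
                if i & bit:
                    u ^= i
            return u, uv ^ u            # v comes for free

        assert sorted(find_two_missing([2, 4, 5], 5)) == [1, 3]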

      • FabHK 14 minutes ago

        Oh yes, you're right.

praptak 18 hours ago

Fun fact: the xor swap fails when the variables are aliases. This was the trick used in one of the underhanded code competitions.

Basically xor swapping a[i] with a[j] triggered the evil logic when i was equal to j.
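
Easy to reproduce in Python:

  def xor_swap(a, i, j):
      a[i] ^= a[j]
      a[j] ^= a[i]
      a[i] ^= a[j]

  v = [5, 7]
  xor_swap(v, 0, 1)
  print(v)   # [7, 5] -- the swap works
  xor_swap(v, 0, 0)
  print(v)   # [0, 5] -- aliasing zeroes the entry instead of swapping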

  • CodesInChaos 14 hours ago

    The submission by David Wagner, Philipe Biondi at https://bingweb.binghamton.edu/~scraver/underhanded/_page_id...

    The state of RC4 consists of a random permutation of bytes. Whenever it outputs a value, it further permutes the state by swapping some of its bytes. The xor swap trick sets one of these values to zero whenever RC4 attempts to swap an item of the permutation with itself. This gradually zeros out the state, until RC4 outputs the plaintext.

  • vaylian 16 hours ago

    It would set a[i] to zero instead of swapping two values, right?

    • praptak 11 hours ago

      Yes. Now we only need a legit use case for code that swaps values only if they are in different locations, otherwise zeroes the aliased location. Then we can finally do it using the xor swap!

analog31 a day ago

The first thing that occurred to me is that if a number is missing from a list, the sum of that list will fall short. But I like XOR's.

  • meindnoch 15 hours ago

    Sum and xor are the same, but over different fields.

  • anitil a day ago

    It really tickles my brain in a lovely way that it avoids all overflow risk as well

    • repiret 20 hours ago

      There is no overflow risk. The trick works on any Abelian group. N-bit values form an Abelian group with xor, where 0 is the identity and every element is its own inverse. But N-bit values also form an Abelian group under addition with overflow, where 0 is the identity and two's complement is the inverse.

      If you’re working on an architecture where a single multiplication and a bit shift is cheaper than N xor’s, and where xor, add, and sub are all the same cost, then you can get a performance win by computing the sum as N(N+1)/2; and you don’t need a blog post to understand why it works.
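
      Python has no wrapping integers, but a sketch can simulate them with a mask:

        # Sum-based variant with simulated 64-bit wrapping arithmetic.
        MASK = (1 << 64) - 1

        def find_missing_sum(A, n):
            expected = (n * (n + 1) // 2) & MASK
            actual = 0
            for v in A:
                actual = (actual + v) & MASK
            return (expected - actual) & MASK   # wrapping subtraction

        assert find_missing_sum([3, 1, 4, 5], 5) == 2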

      • ikurei 15 hours ago

        I think they meant that XOR avoids the overflow risk, whereas summing the array to figure out the missing number could cause an overflow.

        • jonathrg 12 hours ago

          (wrapping) overflow doesn't affect the final result.

      • lblume 15 hours ago

        You can also calculate the XOR-accumulation of all values between 1 and n in O(1) using a lookup table like this:

            [n, 1, n+1, 0][n%4]
      • OjotCewIo 13 hours ago

        > The trick works on any Abelian group

        (https://en.wikipedia.org/wiki/Abelian_group -- I'll use ⋆ as the Abelian group's operation, and ~ for inversion, below.)

        I believe you are implying:

        (g(1) ⋆ ... ⋆ g(n)) ⋆ ~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = g(m)

        where "m" is the group element index that is not covered by "i".

        However, for this to work, it is required that you can distribute the inversion ~ over the group operation ⋆, like this:

        ~(g(i(1)) ⋆ g(i(2)) ⋆ ... ⋆ g(i(n-1))) = ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))

        because it is only after this step (i.e., after the distribution) that you can exploit the associativity and commutativity of operation ⋆, and reorder the elements in

        g(1) ⋆ ... ⋆ g(n) ⋆ ~g(i(1)) ⋆ ~g(i(2)) ⋆ ... ⋆ ~g(i(n-1))

        such that they pairwise cancel out, and leave only the "unmatched" (missing) element -- g(m).

        However, where is it stated that inversion ~ can be distributed over group operation ⋆? The above wikipedia article does not spell that out as an axiom.

        Wikipedia does mention "antidistributivity":

        https://en.wikipedia.org/wiki/Distributive_property#Antidist...

        (which does imply the distributivity in question here, once we restore commutativity); however, WP says this property is indeed used as an axiom ("in the more general context of a semigroup with involution"). So why is it not spelled out as one for Abelian groups?

        ... Does distributivity of inversion ~ over operation ⋆ follow from the other Abelian group axioms / properties? If so, how?

        • FBT 11 hours ago

          > ... Does distributivity of inversion ~ over operation ⋆ follow from the other Abelian group axioms / properties? If so, how?

          It does. For all x and y:

            (1) ~x ⋆ x = 0 (definition of the inverse)
            (2) ~y ⋆ y = 0 (definition of the inverse)
            (3) (~x ⋆ x) ⋆ (~y ⋆ y) = 0 ⋆ 0 = 0 (from (1) and (2))
            (4) (~x ⋆ ~y) ⋆ (x ⋆ y) = 0 (via associativity and commutativity)
          
          In (4) we see that (~x ⋆ ~y) is the inverse of (x ⋆ y). That is to say, ~(x ⋆ y) = (~x ⋆ ~y). QED.

          • stephencanon 8 hours ago

            Right. Another way to see this is that for a general (possibly non-Abelian) group, the inverse of xy is y⁻¹x⁻¹ (because xyy⁻¹x⁻¹ = x1x⁻¹ = xx⁻¹ = 1 [using "1" for the identity here, as is typical for general groups], or more colloquially, "the inverse operation of putting on your socks and shoes is taking off your shoes and socks"). For an Abelian group, y⁻¹x⁻¹ = x⁻¹y⁻¹, and we're done.

    • analog31 21 hours ago

      True, I hadn't thought of that. I'm spoiled by Python. ;-)

woadwarrior01 12 hours ago

Anyone interested in bit-level tricks like this, should have a copy of Hacker's Delight on their bookshelf.

akovaski 19 hours ago

The partitioning algorithm to find two missing/duplicate numbers is clever, I wouldn't have thought of that. It should also work if you have a list with 1 missing and 1 duplicate, yeah? You'd probably have to do an extra step to actually find out which number is missing and which is a duplicate after you find the two numbers.

> If more than two elements are missing (or duplicated), then analyzing the individual bits fails because there are several combinations possible for both 0 and 1 as results. The problem then seems to require more complex solutions, which are not based on XOR anymore.

If you consider XOR to be a little bit more general, I think you can still use something like the partitioning algorithm. That is to say, considering that XOR on a bit level behaves like XOR_bit(a,b) = (a+b) % 2, you might consider a generalized XOR_bit(a,b,k) = (a+b) % k. With this I think you can decide partitions with up to k missing numbers, but I'm too tired to verify/implement this right now.
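
A sketch of that generalized operator (digit-wise addition mod k, so k = 2 reduces to ordinary XOR):

  from functools import reduce

  def gxor(a, b, k):
      # Combine a and b digit by digit in base k, without carries.
      out, place = 0, 1
      while a or b:
          out += ((a % k + b % k) % k) * place
          a //= k
          b //= k
          place *= k
      return out

  assert gxor(0b1100, 0b1010, 2) == 0b1100 ^ 0b1010
  # With k = 3, three copies of the same number cancel out:
  assert reduce(lambda x, y: gxor(x, y, 3), [7, 7, 7, 42]) == 42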

hsfzxjy 21 hours ago

To derive "The XOR trick" I think both *associativity* and *commutativity* are needed.

That is, one should also prove a ^ (b ^ c) = (a ^ b) ^ c. Intuitive, but non-trivial.

less_less 5 hours ago

Adding to some other comments in the thread: finding missing or extra numbers is closely related to error-correcting codes, especially binary linear codes. In an error-correcting code, you have a string of bits or symbols, with symbol x_i appearing at position i. You choose the code so that valid sequences have a certain mathematical property, and then if one or a few symbols are corrupted, then you can use that property to correct the errors. The property is typically that a certain linear function called the "syndrome" is zero, meaning that sum(x_i * G_i) = 0 where each G_i is some strategically chosen vector, particular to the code. The math for how to correct is particular to the chosen G_i, and it's a really interesting field of study.

In a typical error-correcting code usage, you have an encoder which takes your message, and adds some extra symbols at the end which are calculated so that the syndrome is zero. Then when receiving your message, the receiver calculates the syndrome and if it's not zero, they know that at least one error has occurred. By using the code's decoding algorithm, they can figure out the fewest (and thus hopefully most likely) number of changes which would result in that error syndrome, and use this information to (hopefully) correct the transmission error.

For the missing numbers problem, you can set x_i to "how many times does the number i appear?". Then since the syndrome is sum(x_i * G_i), you can compute the syndrome on an unordered list of the i's. You are expecting the syndrome to be the same as the syndrome of full set 1...n, so when it is not, you can figure out which few x_i's are wrong that would lead to the syndrome you observed. You have an advantage because you know how many numbers are missing, but it's only a slight one.

The author's solution is called the Hamming code: you set G_i = i, and you do the additions by xoring. Error-correcting codes generalize to more missing numbers as well, including using xor, but the math becomes more complicated: you would want to use a fancier code such as a BCH or Goppa code. These also use xor, but in more complicated ways.

Straw 9 hours ago

One can generalize this to k missing numbers the same way as we typically do for the addition case by using finite fields:

XOR is equivalent to addition over the finite field F_2^m. So, in this field, we're calculating the sum. If we have two numbers missing, we calculate the sum and sum of squares, so we know:

x + y

x^2 + y^2

From which we can solve for x and y. (Note all the multiplications are Galois Field multiplications, not integer!)

Similarly for k numbers we calculate sums of higher powers and get a higher order polynomial equation that gives our answer. Of course, the same solution works over the integers and I'd imagine modular arithmetic as well (I haven't checked though).
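
For illustration, the integer-arithmetic analogue for two missing numbers (the finite-field version replaces these operations with GF(2^m) arithmetic and, as the reply points out, different powers):

  import math

  def two_missing(A, n):
      s1 = n * (n + 1) // 2 - sum(A)                                # x + y
      s2 = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in A)   # x^2 + y^2
      xy = (s1 * s1 - s2) // 2                                      # from (x + y)^2
      d = math.isqrt(s1 * s1 - 4 * xy)        # discriminant of z^2 - s1*z + xy
      return (s1 - d) // 2, (s1 + d) // 2

  assert two_missing([2, 4, 5], 5) == (1, 3)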

  • less_less 7 hours ago

    This will depend on the field, and for F_2^m you want odd powers: sum(x), sum(x^3), sum(x^5) etc. Using sum(x^2) won't help because squaring over F_2^m is a field homomorphism, meaning that sum(x^2) = sum(x)^2.

    This is also how BCH error-correction codes work (see https://en.wikipedia.org/wiki/BCH_code): a valid BCH codeword has sum(x^i where bit x is set in the codeword) = 0 for t odd powers i=1,3,5, ... Then if some bits get flipped, you will get a "syndrome" s_i := sum(x^i where bit x was flipped) for those odd powers. Solving from the syndrome to get the indices of the flipped bits is the same problem as here.

    The general decoding algorithm is a bit involved, as you can see in the Wikipedia article, but it's not horribly difficult:

      • First, extend the syndrome: it gives sum(x^i) for odd i, but you can compute the even powers s_2i = s_i^2.
    
      • The syndrome is a sequence of field values s_i, but we can imagine it as a "syndrome polynomial" S(z) := sum(s_i z^i).  This is only a conceptual step, not a computational one.
    
      • We will find a polynomial L(z) which is zero at all errors z=x and nowhere else.  This L is called a "locator" polynomial.  It turns out (can be checked with some algebra) that L(z) satisfies a "key equation" where certain terms of L(z) * S(z) are zero.  The key equation is (almost) linear: solve it with linear algebra (takes cubic time in the number of errors), or solve it faster with the Berlekamp-Massey algorithm (quadratic time instead, maybe subquadratic if you're fancy).
    
      • Find the roots of L(z).  There are tricks for this if its degree is low.  If the degree is high then you usually just iterate over the field.  This takes O(#errors * size of domain) time.  It can be sped up by a constant factor using Chien's search algorithm, or by a logarithmic factor using an FFT or AFFT.
    
    You can of course use a different error-correcting code if you prefer (e.g. binary Goppa codes).

    Edit: bullets are hard.

    Further edit just to note: the "^" in the above text refers to powers over the finite field, not the xor operator.

    • Straw 5 hours ago

      Good catch, thank you!

  • noman-land 8 hours ago

    Can you explain a bit about how and why the higher powers work?

    • less_less 6 hours ago

      If you imagine a polynomial L(z) that's zero at all the missing numbers, you can expand the coefficients out. For example, with 2 missing numbers (x,y), you have:

         L(z) = z^2 - (x+y)z + xy.
      
      You already have x+y, but what's xy? You can compute it as ((x+y)^2 - (x^2 + y^2))/2. This technique generalizes to higher powers, though I forget the exact details: basically you can generate the coefficients of L from the sums of powers with a recurrence.

      Then you solve for the roots of L, either using your finite field's variant of the quadratic formula, or e.g. just by trying everything in the field.

      * But wait, this doesn't actually work! *

      Over fields of small characteristic, such as F_2^m, you need to modify the approach and use different powers. For example, in the equations above, I divided by 2. But over F_2^m in the example shown above, you cannot divide by 2, since 2=0. In fact, you cannot solve for (x,y) at all with only x+y and x^2 + y^2, because

        (x+y)^2   =   x^2 + y^2 + 2xy   =   x^2 + y^2 + 0xy (since 2=0)   =   x^2 + y^2
      
      So having that second polynomial gives you no new information. So you need to use other powers such as cubes (a BCH code), or some other technique (e.g. a Goppa code). My sibling comment to yours describes the BCH case.

iotasilly 15 hours ago

Since J allows you to write short code, here are three examples in J. The first uses iota1000, the second a random permutation, and the third uses matrix notation to create a little guessing game.

Example 1: Find the missing number

  xor =: (16 + 2b0110) b.
  iota1000 =: (i. 1000) 
  missingNumber =: (xor/ iota1000) xor (xor/ iota1000 -. 129) 
  echo 'The missing number is ' , ": missingNumber
This prints 'The missing number is 129'

Example 2: Using a random permutation, find the missing number.

   permuted =: (1000 ? 1000)
   missingNumber =: (xor/ permuted) xor (xor/ permuted -. ? 1000)

 
Example 3: find the missing number in this matrix.

  _ (< 2 2) } 5 5 $ (25 ? 25) 

   12  9  1 20 19
    6 18  3  4  8
   24  7  _ 15 23
   11 21 10  2  5
    0 16 17 22 14
Final test: repeat example 3 ten times (random matrices) and collect the times it takes you to solve them in a list, then compute the linear-regression best fit by

  times %. (1 ,. i. 10)
Did you get better at solving it by playing more times?

I am not affiliated with J, but in case you want to try some J code there is a playground: https://jsoftware.github.io/j-playground/bin/html2/

Edited: It seems I am procrastinating a lot about something I have to do but don't want to.

burnt-resistor 21 hours ago

In ye olden days, bit manip operations were faster than algebraic operations.

And sometimes even faster than a load immediate, hence XOR AX, AX instead of MOV AX, 0.

  • GuB-42 21 hours ago

    "xor ax, ax" is still in use today. The main advantage is that it is shorter, just 2 bytes instead of 3 for the immediate, the difference is bigger in 32 and 64 bit mode as you have to have all these zeroes in the instruction.

    Shorter usually means faster, even if the instruction itself isn't faster.

    • sparkie 18 hours ago

      In long mode, compilers will typically emit `xor eax, eax`, as it only needs 2 bytes: The opcode and modrm byte. `xor ax, ax` takes 3 bytes due to the operand size override prefix (0x66), and `xor rax, rax` takes 3 bytes due to the REX.W prefix. `xor eax, eax` will still clear the full 64-bit register.

      Shorter basically means you can fit more in instruction cache, which should in theory improve performance marginally.

      • Someone 16 hours ago

        Size isn’t everything. You should start by reading the manual for your CPU to see what it advises. The micro-architecture may treat only one of the sequences specially. For modern x64, I think that indeed is the shorter xor sequence, where, internally, the CPU just renames the register to a register that always contains zero, making the instruction independent of any earlier instructions using eax.

        IIRC, Intel said a mov was the way to go for some now ancient x86 CPUs, though.

    • tyfighter 20 hours ago

      Modern x86 implementations don't even do the XOR. It just renames the register to "zero".

    • burnt-resistor 21 hours ago

      Barely. x86 is fading. Arm doesn't do this in GCC or Clang.

      > Shorter usually means faster

      It depends, so spouting generalities doesn't mean anything. Instruction cache line filling vs. cycle reduction vs. reservation station ordering is typically a compiler constraints optimization problem(s).

      • userbinator 20 hours ago

        > Arm doesn't do this in GCC or Clang.

        Because Arm64 has a zero register, and Arm32 has small immediates, and all instructions are uniformly long.

  • heisenbit 12 hours ago

    And in these modern days it matters that an algorithm can use divide and conquer and can be parallelized. XOR plays nicely here. Also, the lack of carry bits and reduced branching help in the crypto space.

ameliaquining 21 hours ago

Ah, my least favorite technical interview question. (I've been asked it, but only after I first read about it online.)

  • anthomtb 7 hours ago

    Horses for courses.

    It's silly to ask a web dev these questions and expect these XOR approaches.

    Low-level developers ("bare metal" as the kids say), on the other hand? They should have a deep enough understanding of binary representation and bitwise operations to approach these problems with logic gates.

  • phendrenad2 21 hours ago

    Indeed, it kind of feels like asking if someone knows what the number 5318008 means.

  • motorest 16 hours ago

    > Ah, my least favorite technical interview question.

    The epitome of turning technical interviews into a trivia contest to make them feel smart. Because isn't that the point of a tech interview?

    • empiko 16 hours ago

      Is there any other field where they give you random brain teasers for an interview? My friends outside of IT were laughing their heads off when they heard about the usual interview process.

      • sfn42 16 hours ago

        I've always had reasonable interview questions. Get some data from an API and display it in a table. Make a class that can store car data and get them by plate number. Make a class that calculates tax based on a bracket system.

        I haven't even read the article so I don't know what this is about really but if an interviewer seriously asked me about some obscure xor trick I'd laugh at them.

    • ur-whale 15 hours ago

      In what way is that question trivia?

      I believe you under-estimate what a good interviewer is trying to do with questions such as these:

      Either you've seen the trick before and you get an opportunity to show the interviewer that you're an honest person by telling him you have. Huge plus and the interview can move on to other topics.

      Or you haven't, and you can demonstrate to the interviewer your analytical skills by dissecting the problem step by step and understanding what the code actually does and how.

      Bonus if you can see the potential aliasing problem when used to swap two variables.

      Not a trivia question at all.

      • commandlinefan 4 hours ago

        I knew a guy who would ask the binary search question in interviews (i.e. "you have an array of sorted values, what's the fastest way to find if an element is in the array?"). I always felt like this was an unfair question to ask somebody as well - it doesn't seem like something you'd be able to come up with on your own if you hadn't seen it _in an interview situation_. OTOH it's a quick way to screen people who actually did a CS degree.

      • shmerl 3 hours ago

        Imagine asking someone to prove a relatively difficult theorem. That's a similar type of question, and it's a waste of time during an interview. Once you know the proof (know the algorithm), the idea might seem trivial, but coming up with such an idea (inventing the algorithm) possibly took people a very long time in the first place.

        You shouldn't expect it to be possible during the course of the interview for those who don't know it already; it makes no sense to expect that.

        At best, the question will check if someone memorized such stuff. But I don't see a lot of value in that.

      • snozolli 7 hours ago

        It has no connection to modern software engineering. It's a clever and irrelevant trick for 99.999% of programming jobs out there.

        Stop asking these asinine questions and ask questions relevant to real-world software engineering. Software engineers are their own worst enemies.

gblargg 10 hours ago

Gray code is something semi-related. For hardware encoders of a position you want only one transition between states, that is, the XOR of the two to have only one bit set. Normal binary has multiple transitions between some values (e.g. three bit changes between 011 and 100). Gray code could be 000, 001, 011, 010, 110, 111, 101, 100.
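
The binary-to-Gray conversion is itself a one-line XOR:

  # n ^ (n >> 1) maps counting order to Gray code order.
  print([format(n ^ (n >> 1), "03b") for n in range(8)])
  # ['000', '001', '011', '010', '110', '111', '101', '100']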

mzs 18 hours ago

I like the 'store prev ^ next' trick for lists that can be walked from the front or from the back.
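
A sketch of the idea in Python, using array slots in place of pointers (slot 0 as the null sentinel):

  # XOR linked list: each node stores prev ^ next, so a single field
  # supports walking the list in either direction.
  nodes = {}  # slot -> (value, prev ^ next)

  def build(values):
      for i, v in enumerate(values, start=1):
          prev = i - 1
          nxt = i + 1 if i < len(values) else 0
          nodes[i] = (v, prev ^ nxt)

  def walk(start, came_from=0):
      prev, cur = came_from, start
      while cur:
          value, link = nodes[cur]
          yield value
          prev, cur = cur, link ^ prev

  build([10, 20, 30])
  print(list(walk(1)))  # [10, 20, 30]; walk(3) yields the reverse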

Findecanor 11 hours ago

I figured out the solution using addition directly. A caveat is that addition grows the number of significant bits needed, and can thus overflow (for large enough values of n).

One aspect of XOR is that it is the same as binary addition without carry, and therefore it does not overflow.

  • gblargg 10 hours ago

    Use unsigned (modulo) and overflow doesn't affect the result.

XeO3 19 hours ago

Apart from these applications of XOR, a favourite one is using Bitwise AND to find Even/Odd numbers.

daitangio 11 hours ago

Very well written article! I have used xor just as a fast way to clear a register :)

  • lsllc 3 hours ago

    Yes! in the old MS-DOS days (circa 286?), it was quicker in terms of cycles to do:

      xor ax, ax
    
    Than:

      mov ax, 0h

st0le a day ago

Another fun trick I've discovered.

`XOR[0...n] = 0 ^ 1 .... ^ n = [n, 1, n + 1, 0][n % 4]`

  • nullc 21 hours ago

    Tables yuck :P, maybe

    XOR[0...x] = (x&1^(x&2)>>1)+x*(~x&1)

    • bsdz 16 hours ago

      ~Is there a simple proof for this type of identity?~

      Actually I found something through Gemini based on the mod-4 table idea in the previous post. Thanks.

  • tialaramex a day ago

    Right, or in summary: no, you don't need to do all that extra work up front.

ZoomZoomZoom 15 hours ago

> XOR is commutative, meaning we can change the order in which we apply XOR. To prove this, we can check the truth table for both x ^ y and y ^ x

This is nonsensical; where does the second truth table come from? Instead you just observe that, by definition, 1^0 == 0^1.

mytailorisrich 9 hours ago

> XOR on the same argument: x ^ x = 0

For those who do/did assembly, this is the common way to set a register to zero in x86 assembly (and probably not only x86), because the instruction does not need an immediate operand, so it is shorter, and it executes in only one cycle.

moron4hire a day ago

Why do people hate traditional for loops so much? In a conversation about petty micro optimizations, we end up performing two loops instead of one, all because sticking three operations in one statement is "yucky"?

  • ToValueFunfetti 20 hours ago

    xor(1..n) = switch(n % 4) { case 0: return n; case 1: return 1; case 2: return n + 1; default: return 0; }

    So you don't actually need the first loop (at least for the set of integers 1..n example), but bringing that up is probably out of scope for this article.

  • devjab 16 hours ago

    I think you raise a good question, but Python doesn't have a traditional for loop. To do it in one loop, you'd either have to simulate a traditional for loop with something like range, or you'd have to build a C/Zig/Rust lib and use it with cffi (or whatever Rust's equivalent is called; I forget the name). Or you're going to do it the "pythonic" way and write two loops, probably with a generator. As far as micro-optimisation goes, I'd argue that it depends on what you want: speed or stable memory consumption? The single loop will be faster (for the most part) but the flip side is that there is a limit on how big a data set it can handle.

    It's all theoretical though. On real world data sets that aren't small I don't see why you wouldn't hand these tasks off to C/Zig/Rust unless you're only running them once or twice.

  • delifue 21 hours ago

    Its main benefit is avoiding an extra data structure (like a hash map) to find the missing or duplicate element, using O(n) time and O(1) space.

    • moron4hire 21 hours ago

      No, again, that's not my point. The code from the article is O(2n) when it could be O(n). I know we're not supposed to care about constant factors, but I've lived in a world where not micro optimizing the ever loving shit out of my software could potentially make people throw up, so this sort of stuff kind of stands out to me.

      • repiret 20 hours ago

        The code in the article is written in Python, whose only for loop is for-each. It is 2N XOR operations, regardless of whether you use one or two loops.

        I probably would have written it with a single loop, using the `enumerate` iterator adapter. But in Python, two loops is almost certainly more efficient.

        • Dylan16807 19 hours ago

          You can loop over the range and then do result ^= i ^ A[i]. If adapters are slow you don't need them here.

          Having only one loop gives you a better memory access pattern, because it's 2 XOR operations in between each memory access. Two loops is the same number of instructions, but it spends one loop ignoring memory and then another loop doing rapid-fire memory accesses. In python there's enough overhead that it's unlikely to matter. But in a faster language running on a busy machine that could make a real difference.

        • sfn42 12 hours ago

          for i in range(n - 1):

          Is pretty much a standard for loop. Between that and

          for n in numbers:

          You can do pretty much the same things as a more conventional language.

          You could also solve it pretty simply like this:

          expected_sum = (n * (n + 1)) // 2

          missing_num = expected_sum - sum(numbers)

          This only iterates the list once. This would probably be my solution if I was given this task in an interview.

      • perfmode 21 hours ago

        Real-world performance will depend on how much of that N fits in cache, and in which cache it fits (L1, L2, L3). Once loaded, you may not pay much cost to access each value a second time.

        • MindSpunk 20 hours ago

          Doing 2 loops over the data means you have to do a full pass over the data twice. If your N doesn’t fit in L3 then you’re going to load N twice instead of once. Loading twice, even out of L1 is still slower than never loading twice at all.

          • moron4hire 20 hours ago

            Exactly. And there's also the fact that sometimes the data stream we're processing is unbounded and ephemeral. For example, reading values from a physical sensor. It may not match up to this specific example, but the point remains that a single "loop" over a data set might be all you get, so pack as much into that loop as you can.

      • NoahZuniga 19 hours ago

        O(2n) doesn't exist. The whole point of big O is that you ignore such "trivial" things as the factor that comes before the n.

        • moron4hire 19 hours ago

          Did I not say that?

          • haskellshill 17 hours ago

            >we end up performing two loops instead of one, all because sticking three operations in one statement is "yucky"

            You seem to believe that "O(2n)"

              for value in range(1, n + 1):
                result ^= value
              for value in A:
                result ^= value
            
            is slower than "O(n2)"

              for value in range(1, n + 1):
                result ^= value
                result ^= A[value-1]
            
            simply because the latter has one "for loop" less. Am I misunderstanding you, or if not, why would this matter for speed?
            • a_t48 17 hours ago

              Unless both loops get unrolled, it's ever so slightly slower due to having to check for the end value twice. Plus potentially a cache miss at the start of the second loop.

            • codebje 15 hours ago

              None of this is as straightforward as it seems.

              A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator, then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.

              "FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.

              A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and returns that, so "for value in A" could in fact be O(n^2).

              Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.

              If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).

              So maybe we should be a bit more Pythonic, and do:

                  for i, value in enumerate(A):
                      result ^= i
                      result ^= value
              
              One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index then call the __next__() method of A's iterator, (approximately) the same work as if we'd done two FOR_ITER, one for an index and one for A.

              Why would this matter for speed? I don't know. Unless 'n' is pretty big a human won't even notice the execution time of any of this code.

            • moron4hire 15 hours ago

              Even assuming python's foreach loop in these cases get optimized down to a very bare for loop, the operations being performed are dominated by the looping logic itself, because the loop body is so simple.

              Each iteration of a for loop performs one index update and one termination comparison. For a simple body that is just an XOR, that's the difference between performing 5 operations (update, exit check, read array, XOR with value, XOR with index) per N elements in the one loop case versus 7 operations (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two loop case. So we're looking at a 29% savings in operations.

              It gets worse if the looping structure does not optimize to a raw, most basic for loop and instead constructs some kind of lazy collection iterator generalized for all kinds of collections it could iterate over.

              The smaller the loop body, the higher the gains from optimizing the looping construct itself.

          • iainmerrick 9 hours ago

            You said the code from the article is O(2n) when it could be O(n), but those are the same thing.

          • codebje 18 hours ago

            You did, but it might not be an effective strategy to mention asymptotic complexity to help forward your argument that one linear implementation is faster than another.

            Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Imperative testing might help, but a new interpreter version could change your results.

            In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.

  • anitil a day ago

    I think it's just an interesting approach to solving particular limited problems. If I needed to solve this I'd end up either using set arithmetic or sorting the list, both of which use more memory and time. Maybe down low in some compiler loop or JVM loop this could be the difference between a sluggish application and a snappy one

    • moron4hire 21 hours ago

      That's not my point. My point is that the exact same code from the original article could be done in a single, traditional for-loop, instead of two for-each loops.

cubefox 9 hours ago

Pet peeve: It is common to describe XOR as a special logical operator ("either or"), but it is arguably easier to just describe it as ≠ (!=, not equal) for Boolean inputs.

However, then it is clearly still easier to just phrase everything in terms of = (equality) instead!

Equality for binary inputs is also called XNOR, biconditional, iff, ↔, etc., which is the negation of XOR. But thinking of it immediately as "=" is much more straightforward.

Another advantage of = over ≠/xor is that equality is not just commutative and associative; it's intuitively obvious that it is associative. The associativity of ≠/xor is less obvious. Moreover, equality is also transitive, unlike inequality/xor.

Overall, equality seems a much more natural concept to reason with, yet I don't know of any languages which have a bitwise equality/XNOR/↔ operator, i.e. one that operates on integers rather than Booleans.
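
Lacking a built-in operator, bitwise XNOR for w-bit values is at least easy to express:

  def xnor(a, b, w=8):
      # Equal bits map to 1, unequal bits to 0, within a w-bit word.
      return ~(a ^ b) & ((1 << w) - 1)

  assert xnor(0b1100, 0b1010, 4) == 0b1001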

ur-whale 15 hours ago

One interesting problem related to the trick (which as pointed out elsewhere in the thread, fails spectacularly when the two variables alias to the same memory location) is to find other dyadic functions of integers that have the same property.

makeset 18 hours ago

Fun fact: you can show that there is another binary operator that performs the same triple assignment swap.

johnea a day ago

Wow! This is a flashback! Hope you're doing well Andy W!

TZubiri 9 hours ago

Tldr. I'm sorting that array all day baby