# Benchmarking an Old Chestnut: Dead Horse Edition

One more pass on the oddly-recurring value puzzle. There’s always a faster method.

I know, I know: the “beating-a-dead-horse” and “old-chestnut” metaphors mean there’s a horse-chestnut joke to be made somewhere, but I haven’t come up with anything worth your time. Instead, here’s an article about gesztenyepüré, the Hungarian chestnut purée of my childhood (just add canned sour cherries for flawless Danny-nostalgia). Enjoy!

## Raymond’s Posit

A few hours after posting my previous article, Raymond Kwan, a classmate from my Lighthouse Labs cohort, reached out with a *potentially even geniuser* solution to the oddly-recurring value puzzle:

> … At the scale that you are testing at got me thinking if removing the second iteration would make any noticeable difference. I believe this can be achieved by instead of tallying, the hashmap just keeps a boolean. e.g. if key (number) does not exist in map then insert; else delete key. This will likely leave you with a hashmap with a size of 1 at the end of your iteration if I understood the constraints correctly. This will increase the number of constant time operations in your initial loop. But wondering if $O(1n + a)$ will be better than $O(2n + b)$; where $a > b$.

As a quick refresher, my solution was to run through the provided array once, keeping a tally of each value encountered, and then iterate through the tallies to find the one with an odd count. That amounts to two linear-time ($O(n)$) operations, with the size of the test array plus the size of the tallies hash (i.e., the number of distinct values in the test array) determining the overall running time.

If I understand Raymond correctly, he figures that since I’m exploring at the scale of multi-million-element test arrays and resultant (potentially) multi-thousand-key tally hashes, there may be meaningful room to optimize the process by eliminating that second iteration over the tallies hash. His solution is beautiful to me: **don’t keep tallies for the values; just flip a bit for each value when you encounter it.**

So, for example, in the array `[ :smurf, :hobbit, :smurf ]`:

- Start the iteration:
- Encounter a `:smurf`, flip the `:smurf` bit to `true`
- Encounter a `:hobbit`, flip the `:hobbit` bit to `true`
- Encounter a `:smurf`, flip the `:smurf` bit to `false`
- Iteration’s done; the only remaining `true` bit belongs to `:hobbit`, so that must be the oddly-frequent value in the array. Distribute celebratory pipe-weed, console Gargamel, etc.

Because evenly-frequent values will ultimately turn themselves off while odd ones won’t, the answer is the last one standing. Raymond suggested using a hash to track our flipping bits by creating and deleting keys for the associated values when encountered: in the end, the hash would be left with just one remaining key. His question is, would the single iteration across the test array ($O(n)$) plus multiple constant-time ($O(1)$) operations of…

- checking addresses to see if the hash is populated, and then
- creating or deleting the keys as appropriate
- plus the final step of finding the hash’s only key and asking for its name…

… be faster than my approach? And, under what circumstances?

## Setting Up the Board

Here’s how I’ve implemented Raymond’s approach:

```
def flip_the_bit(test_array)
  value_bits = { }
  test_array.each do |value|
    if value_bits[value]
      value_bits.delete(value)
    else
      value_bits[value] = true
    end
  end
  value_bits.first[0]
end
```
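To sanity-check the implementation, here’s a quick smoke test on the `[ :smurf, :hobbit, :smurf ]` example from above (the test call is mine, not part of the benchmark):

```
def flip_the_bit(test_array)
  value_bits = { }
  test_array.each do |value|
    if value_bits[value]
      value_bits.delete(value)
    else
      value_bits[value] = true
    end
  end
  value_bits.first[0]
end

# The only value with an odd count should be the last key standing.
p flip_the_bit([ :smurf, :hobbit, :smurf ])  # => :hobbit
```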

And, for reference, here’s *my* approach from the previous article:

```
def tally_and_check(test_array)
  tallies = Hash.new(0)
  test_array.each do |value|
    tallies[value] += 1
  end
  tallies.each do |original_value_from_the_test_array, frequency|
    if frequency.odd?
      return original_value_from_the_test_array
    end
  end
end
```
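For symmetry, a quick check that `#tally_and_check` also lands on `:hobbit` for the example array (again, the test call is mine):

```
def tally_and_check(test_array)
  tallies = Hash.new(0)
  test_array.each do |value|
    tallies[value] += 1
  end
  tallies.each do |original_value_from_the_test_array, frequency|
    if frequency.odd?
      return original_value_from_the_test_array
    end
  end
end

# :smurf tallies to 2 (even); :hobbit tallies to 1 (odd).
p tally_and_check([ :smurf, :hobbit, :smurf ])  # => :hobbit
```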

I’m hypothesizing that the nature of the test array is going to be especially important: it seems to me that the larger the `tallies` hash I need to check, the more significant Raymond’s savings. So, I need to create some new test arrays to better exercise this. Here’s a guide:

Range of Possible Values | Location of Oddly-Recurring Value | Name of Array
---|---|---
`0..1` | Earliest possible index | `best_case_value_poor`
`0..1` | Latest possible index | `worst_case_value_poor`
`1..2000` | Earliest possible index | `best_case_value_mid`
`1..2000` | Random index | `average_case_value_mid`
`1..2000` | Latest possible index | `worst_case_value_mid`
`1..10_000_000` | Earliest possible index | `best_case_value_rich`
`1..10_000_000` | Random index | `average_case_value_rich`
`1..10_000_000` | Latest possible index | `worst_case_value_rich`

And, here’s the implementation, again resulting in arrays that are twenty million and one elements long:

```
# Value-Poor Arrays
best_case_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.prepend(0)

worst_case_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.push(1)

# Value-Middling Arrays
average_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.push(rand(1..2000)).shuffle

best_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.prepend(1)

worst_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.push(2000)

# Value-Rich Arrays
average_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.push(rand(1..10_000_000)).shuffle

best_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.prepend(1)

worst_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.push(10_000_000)
```

It takes my system about 30 seconds and 10% of its RAM to instantiate these examples before the benchmark even starts up.
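The construction can also be sanity-checked at a smaller scale: each value should appear an even number of times, except the one extra pushed element, leaving exactly one oddly-frequent value. Here’s a scaled-down check of the value-middling recipe (sizes shrunk a hundredfold; the variable names are mine):

```
# 20 distinct values, 100 copies each (all even tallies), plus one
# extra random value pushed on, making exactly one tally odd.
sample = (1..20).map do |number|
  Array.new(100, number)
end.flatten.push(rand(1..20)).shuffle

tallies = Hash.new(0)
sample.each { |value| tallies[value] += 1 }
odd_values = tallies.select { |_value, frequency| frequency.odd? }.keys

puts odd_values.length  # prints 1
```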

Penultimately, here’s my benchmark setup:

```
require 'benchmark'

Benchmark.bmbm do |scenario|
  ### Tally and Check Scenarios ###
  scenario.report('TallyCheck: Poor, Best') { tally_and_check(best_case_value_poor) }
  scenario.report('TallyCheck: Poor, Worst') { tally_and_check(worst_case_value_poor) }
  scenario.report('TallyCheck: Mid, Average') { tally_and_check(average_case_value_mid) }
  scenario.report('TallyCheck: Mid, Best') { tally_and_check(best_case_value_mid) }
  scenario.report('TallyCheck: Mid, Worst') { tally_and_check(worst_case_value_mid) }
  scenario.report('TallyCheck: Rich, Average') { tally_and_check(average_case_value_rich) }
  scenario.report('TallyCheck: Rich, Best') { tally_and_check(best_case_value_rich) }
  scenario.report('TallyCheck: Rich, Worst') { tally_and_check(worst_case_value_rich) }

  ### Flip the Bit Scenarios ###
  scenario.report('FlipBit: Poor, Best') { flip_the_bit(best_case_value_poor) }
  scenario.report('FlipBit: Poor, Worst') { flip_the_bit(worst_case_value_poor) }
  scenario.report('FlipBit: Mid, Average') { flip_the_bit(average_case_value_mid) }
  scenario.report('FlipBit: Mid, Best') { flip_the_bit(best_case_value_mid) }
  scenario.report('FlipBit: Mid, Worst') { flip_the_bit(worst_case_value_mid) }
  scenario.report('FlipBit: Rich, Average') { flip_the_bit(average_case_value_rich) }
  scenario.report('FlipBit: Rich, Best') { flip_the_bit(best_case_value_rich) }
  scenario.report('FlipBit: Rich, Worst') { flip_the_bit(worst_case_value_rich) }
end
```

## Results and Discussion

Building the test-arrays was by far the most intense thing my computer had to do during the benchmarking process. After about two minutes, here were the results:

```
                                 user     system      total        real
TallyCheck: Poor, Best       1.718976   0.001031   1.720007 (  1.729708)
TallyCheck: Poor, Worst      1.726422   0.000014   1.726436 (  1.737059)
TallyCheck: Mid, Average     2.217027   0.000038   2.217065 (  2.227950)
TallyCheck: Mid, Best        1.996516   0.000033   1.996549 (  2.006564)
TallyCheck: Mid, Worst       2.003764   0.000009   2.003773 (  2.016251)
TallyCheck: Rich, Average    6.929435   0.355002   7.284437 (  7.314630)
TallyCheck: Rich, Best       4.645296   0.342828   4.988124 (  5.013641)
TallyCheck: Rich, Worst      5.132136   0.340092   5.472228 (  5.496994)
FlipBit: Poor, Best          1.660526   0.000000   1.660526 (  1.670883)
FlipBit: Poor, Worst         1.578161   0.000000   1.578161 (  1.588549)
FlipBit: Mid, Average        2.354981   0.000027   2.355008 (  2.366234)
FlipBit: Mid, Best           1.710901   0.000004   1.710905 (  1.721515)
FlipBit: Mid, Worst          1.631275   0.000000   1.631275 (  1.643420)
FlipBit: Rich, Average       6.524084   0.307339   6.831423 (  6.859530)
FlipBit: Rich, Best          1.754409   0.000009   1.754418 (  1.764125)
FlipBit: Rich, Worst         1.628670   0.000040   1.628710 (  1.638146)

Or, in handy table format:

Scenario | `#tally_and_check` | `#flip_the_bit`
---|---|---
Value-Poor, Best-Case | 1.73 seconds | 1.67 seconds
Value-Poor, Worst-Case | 1.74 seconds | 1.59 seconds
Value-Middling, Average-Case | 2.23 seconds | 2.37 seconds
Value-Middling, Best-Case | 2.01 seconds | 1.72 seconds
Value-Middling, Worst-Case | 2.02 seconds | 1.64 seconds
Value-Rich, Average-Case | 7.31 seconds | 6.86 seconds
Value-Rich, Best-Case | 5.01 seconds | 1.76 seconds
Value-Rich, Worst-Case | 5.50 seconds | 1.64 seconds

### How Right Raymond Was

In nearly¹ all cases, Raymond’s approach achieved faster times. In particular, for all of the best- and worst-case scenarios, the time for `#flip_the_bit` was constant at around 1.7 seconds: as expected, his method doesn’t care how many distinct values the array has; driving to a specific hash key to make changes, and getting the oddly-frequent value at the end of the process, are both constant-time, $O(1)$ operations.

Also as expected, `#tally_and_check` *did* care about the range of possible values in the array, since a wider range resulted in a larger hash to check after the initial run-through. Moreover, even in the value-poor scenarios, where Raymond’s bit-flipping is up against a `tallies` iteration over a hash only two keys large (so, the *best-case* best-case scenario for my method), he still consistently beats me.

I think this provides the clearest answer to his question of whether knocking out an unnecessary linear-time operation at the cost of some additional constant-time operations is still faster than two $O(n)$ operations: for my implementations of this benchmark, that’s a **yup**. Bravo!

### But… Huh.

There was one unexpected pattern that emerged from my experiment: at all levels of value-richness, the *average-case* arrays took *longer* to process than the best- and worst-case equivalents. What I’d anticipated was that the average-case times would fall somewhere between the two extremes; given the counter-intuitive results, I’m worried I’ve set an unfair test.

The most obvious, consistent difference in the scenarios is that the average-case arrays are *shuffled* and the best- and worst-case arrays are *sorted*. Initially, then, I wondered if Ruby attempts to sort the arrays behind the scenes prior to the `#find` iteration in `#tally_and_check`, adding a chunk of time for those scenarios… but `#flip_the_bit` doesn’t use `#find`, yet it exhibits the same slowdown. Could Ruby be sorting the array before even a basic `#each` sweep, just as a matter of course? That seems extravagant.

My other thought was that maybe there’s some extra efficiency in accessing the same physical space in memory over and over for *either* tallying *or* key building/destroying, which might be what happens with a sorted array but not with a shuffled one: in sorted-array scenarios, `tallies` increments a value stored *in the same key over and over again*, and maybe something similar happens to `value_bits`? I’d love to know for sure.
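I don’t have a definitive answer, but the locality hunch is at least easy to poke at: tally a sorted array, then tally its shuffled twin, and compare wall-clock times. A scaled-down probe might look like this (the helper name and sizes are my own; this is a sketch, not proof):

```
require 'benchmark'

# Time a single tallying pass over the given array.
def tally_time(array)
  Benchmark.realtime do
    tallies = Hash.new(0)
    array.each { |value| tallies[value] += 1 }
  end
end

# Same elements, different orderings: 2,000 values x 100 copies each.
sorted   = (1..2_000).map { |number| Array.new(100, number) }.flatten
shuffled = sorted.shuffle

puts format('sorted:   %.4f seconds', tally_time(sorted))
puts format('shuffled: %.4f seconds', tally_time(shuffled))
```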

… But for now, it makes me wonder: could *this* be the edge `#tally_and_check` needs to beat out `#flip_the_bit`? What if our value-poor arrays were pre-shuffled?!

### `#tally_and_check`’s Hail Mary Pass

```
shuffled_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.push(rand(0..1)).shuffle

Benchmark.bmbm do |scenario|
  scenario.report('TallyCheck: Poor, Shuffled') { tally_and_check(shuffled_value_poor) }
  scenario.report('FlipBit: Poor, Shuffled') { flip_the_bit(shuffled_value_poor) }
end
```

```
                                 user     system      total        real
TallyCheck: Poor, Shuffled   1.893808   0.000000   1.893808 (  1.904579)
FlipBit: Poor, Shuffled      1.621908   0.000000   1.621908 (  1.626906)
```

Nope: `#flip_the_bit` is just consistently better, and it leaves me confused about what could be causing the slowdown in the average-case scenarios. If anyone knows and would like to enlighten me, I’d be chuffed.

Meanwhile, thanks for the good clean fun, Raymond!

¹ The Value-Middling, Average-Case scenario (with shuffled arrays) always came out a little in favour of `#tally_and_check`. I have *no* idea why; I tried it about ten times. ↩︎