# Benchmarking an Old Chestnut: Dead Horse Edition

One more pass on the oddly-recurring value puzzle. There’s always a faster method.

I know, I know: the “beating-a-dead-horse” and “old-chestnut” metaphors mean there’s a horse-chestnut joke to be made somewhere, but I haven’t come up with anything worth your time. Instead, here’s an article about gesztenyepüré, the Hungarian chestnut purée of my childhood (just add canned sour cherries for flawless Danny-nostalgia). Enjoy!

## Raymond’s Posit

A few hours after posting my previous article, Raymond Kwan, a classmate from my Lighthouse Labs cohort, reached out with a *potentially even geniuser* solution to the oddly-recurring value puzzle:

> … At the scale that you are testing at got me thinking if removing the second iteration would make any noticeable difference. I believe this can be achieved by instead of tallying, the hashmap just keeps a boolean. e.g. if key (number) does not exist in map then insert; else delete key. This will likely leave you with a hashmap with a size of 1 at the end of your iteration if I understood the constraints correctly. This will increase the number of constant time operations in your initial loop. But wondering if $O(1n + a)$ will be better than $O(2n + b)$; where $a > b$.

As a quick refresher, my solution was to run through the provided array once, keeping a tally of each value encountered, and then iterate through the tallies to find the one with an odd count. That amounts to two linear-time ($O(n)$) operations, with the size of the test array plus the size of the tallies hash (i.e., the number of distinct values in the test array) determining the overall running time.

If I understand Raymond correctly, he figures that since I’m exploring at the scale of multi-million-element test arrays and resultant (potentially) multi-thousand-key tally hashes, there may be meaningful room to optimize the process by eliminating that second iteration over the tallies hash. His solution is beautiful to me: **don’t keep tallies for the values; just flip a bit for each value when you encounter it.**

So, for example, in the array `[ :smurf, :hobbit, :smurf ]`:

- Start the iteration:
- Encounter a `:smurf`, flip the `:smurf` bit to `true`
- Encounter a `:hobbit`, flip the `:hobbit` bit to `true`
- Encounter a `:smurf`, flip the `:smurf` bit to `false`
- Iteration’s done; the only remaining `true` bit belongs to `:hobbit`, so that must be the oddly-frequent value in the array. Distribute celebratory pipe-weed, console Gargamel, etc.

Because evenly-frequent values will ultimately turn themselves off while odd ones won’t, the answer is the last one standing. Raymond suggested using a hash to track our flipping bits by creating and deleting keys for the associated values when encountered: in the end, the hash would be left with just one remaining key. His question is, would the single iteration across the test array ($O(n)$) plus multiple constant-time ($O(1)$) operations of…

- checking addresses to see if the hash is populated, and then
- creating or deleting the keys as appropriate
- plus the final step of finding the hash’s only key and asking for its name…

… be faster than my approach? And, under what circumstances?

## Setting Up the Board

Here’s how I’ve implemented Raymond’s approach:

```
def flip_the_bit(test_array)
  value_bits = { }
  test_array.each do |value|
    if value_bits[value]
      value_bits.delete(value)
    else
      value_bits[value] = true
    end
  end
  value_bits.first[0]
end
```
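To sanity-check the implementation, here’s a quick smoke test on the `[ :smurf, :hobbit, :smurf ]` example from above (the test call is mine, not part of the benchmark):

```
def flip_the_bit(test_array)
  value_bits = { }
  test_array.each do |value|
    if value_bits[value]
      value_bits.delete(value)
    else
      value_bits[value] = true
    end
  end
  value_bits.first[0]
end

# The only value with an odd count should be the last key standing.
p flip_the_bit([ :smurf, :hobbit, :smurf ])  # => :hobbit
```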

And, for reference, here’s *my* approach from the previous article:

```
def tally_and_check(test_array)
  tallies = Hash.new(0)
  test_array.each do |value|
    tallies[value] += 1
  end
  tallies.each do |original_value_from_the_test_array, frequency|
    if frequency.odd?
      return original_value_from_the_test_array
    end
  end
end
```
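For symmetry, a quick check that `#tally_and_check` also lands on `:hobbit` for the example array (again, the test call is mine):

```
def tally_and_check(test_array)
  tallies = Hash.new(0)
  test_array.each do |value|
    tallies[value] += 1
  end
  tallies.each do |original_value_from_the_test_array, frequency|
    if frequency.odd?
      return original_value_from_the_test_array
    end
  end
end

# :smurf tallies to 2 (even); :hobbit tallies to 1 (odd).
p tally_and_check([ :smurf, :hobbit, :smurf ])  # => :hobbit
```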

I’m hypothesizing that the nature of the test array is going to be especially important: it seems to me that the larger the `tallies` hash I need to check, the more significant Raymond’s savings. So, I need to create some new test arrays to better exercise this. Here’s a guide:

Range of Possible Values | Location of Oddly-Recurring Value | Name of Array
---|---|---
`0..1` | Earliest possible index | `best_case_value_poor`
`0..1` | Latest possible index | `worst_case_value_poor`
`1..2000` | Earliest possible index | `best_case_value_mid`
`1..2000` | Random index | `average_case_value_mid`
`1..2000` | Latest possible index | `worst_case_value_mid`
`1..10_000_000` | Earliest possible index | `best_case_value_rich`
`1..10_000_000` | Random index | `average_case_value_rich`
`1..10_000_000` | Latest possible index | `worst_case_value_rich`

And, here’s the implementation, again resulting in arrays that are twenty million and one elements long:

```
# Value-Poor Arrays
best_case_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.prepend(0)

worst_case_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.push(1)

# Value-Middling Arrays
average_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.push(rand(1..2000)).shuffle

best_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.prepend(1)

worst_case_value_mid = (1..2000).map do |number|
  Array.new(10_000, number)
end.flatten.push(2000)

# Value-Rich Arrays
average_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.push(rand(1..10_000_000)).shuffle

best_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.prepend(1)

worst_case_value_rich = (1..10_000_000).map do |number|
  Array.new(2, number)
end.flatten.push(10_000_000)
```

It takes my system about 30 seconds and 10% of its RAM to instantiate these examples before the benchmark even starts up.
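The construction can also be sanity-checked at a smaller scale: each value should appear an even number of times, except the one extra pushed element, leaving exactly one oddly-frequent value. Here’s a scaled-down check of the value-middling recipe (sizes shrunk a hundredfold; the variable names are mine):

```
# 20 distinct values, 100 copies each (all even tallies), plus one
# extra random value pushed on, making exactly one tally odd.
sample = (1..20).map do |number|
  Array.new(100, number)
end.flatten.push(rand(1..20)).shuffle

tallies = Hash.new(0)
sample.each { |value| tallies[value] += 1 }
odd_values = tallies.select { |_value, frequency| frequency.odd? }.keys

puts odd_values.length  # prints 1
```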

Penultimately, here’s my benchmark setup:

```
require 'benchmark'

Benchmark.bmbm do |scenario|
  ### Tally and Check Scenarios ###
  scenario.report('TallyCheck: Poor, Best') { tally_and_check(best_case_value_poor) }
  scenario.report('TallyCheck: Poor, Worst') { tally_and_check(worst_case_value_poor) }
  scenario.report('TallyCheck: Mid, Average') { tally_and_check(average_case_value_mid) }
  scenario.report('TallyCheck: Mid, Best') { tally_and_check(best_case_value_mid) }
  scenario.report('TallyCheck: Mid, Worst') { tally_and_check(worst_case_value_mid) }
  scenario.report('TallyCheck: Rich, Average') { tally_and_check(average_case_value_rich) }
  scenario.report('TallyCheck: Rich, Best') { tally_and_check(best_case_value_rich) }
  scenario.report('TallyCheck: Rich, Worst') { tally_and_check(worst_case_value_rich) }

  ### Flip the Bit Scenarios ###
  scenario.report('FlipBit: Poor, Best') { flip_the_bit(best_case_value_poor) }
  scenario.report('FlipBit: Poor, Worst') { flip_the_bit(worst_case_value_poor) }
  scenario.report('FlipBit: Mid, Average') { flip_the_bit(average_case_value_mid) }
  scenario.report('FlipBit: Mid, Best') { flip_the_bit(best_case_value_mid) }
  scenario.report('FlipBit: Mid, Worst') { flip_the_bit(worst_case_value_mid) }
  scenario.report('FlipBit: Rich, Average') { flip_the_bit(average_case_value_rich) }
  scenario.report('FlipBit: Rich, Best') { flip_the_bit(best_case_value_rich) }
  scenario.report('FlipBit: Rich, Worst') { flip_the_bit(worst_case_value_rich) }
end
```

## Results and Discussion

Building the test-arrays was by far the most intense thing my computer had to do during the benchmarking process. After about two minutes, here were the results:

```
                                 user     system      total        real
TallyCheck: Poor, Best       1.718976   0.001031   1.720007 (  1.729708)
TallyCheck: Poor, Worst      1.726422   0.000014   1.726436 (  1.737059)
TallyCheck: Mid, Average     2.217027   0.000038   2.217065 (  2.227950)
TallyCheck: Mid, Best        1.996516   0.000033   1.996549 (  2.006564)
TallyCheck: Mid, Worst       2.003764   0.000009   2.003773 (  2.016251)
TallyCheck: Rich, Average    6.929435   0.355002   7.284437 (  7.314630)
TallyCheck: Rich, Best       4.645296   0.342828   4.988124 (  5.013641)
TallyCheck: Rich, Worst      5.132136   0.340092   5.472228 (  5.496994)
FlipBit: Poor, Best          1.660526   0.000000   1.660526 (  1.670883)
FlipBit: Poor, Worst         1.578161   0.000000   1.578161 (  1.588549)
FlipBit: Mid, Average        2.354981   0.000027   2.355008 (  2.366234)
FlipBit: Mid, Best           1.710901   0.000004   1.710905 (  1.721515)
FlipBit: Mid, Worst          1.631275   0.000000   1.631275 (  1.643420)
FlipBit: Rich, Average       6.524084   0.307339   6.831423 (  6.859530)
FlipBit: Rich, Best          1.754409   0.000009   1.754418 (  1.764125)
FlipBit: Rich, Worst         1.628670   0.000040   1.628710 (  1.638146)

Or, in handy table format:

Scenario | `#tally_and_check` | `#flip_the_bit`
---|---|---
Value-Poor, Best-Case | 1.73 seconds | 1.67 seconds
Value-Poor, Worst-Case | 1.74 seconds | 1.59 seconds
Value-Middling, Average-Case | 2.23 seconds | 2.37 seconds
Value-Middling, Best-Case | 2.01 seconds | 1.72 seconds
Value-Middling, Worst-Case | 2.02 seconds | 1.64 seconds
Value-Rich, Average-Case | 7.31 seconds | 6.86 seconds
Value-Rich, Best-Case | 5.01 seconds | 1.76 seconds
Value-Rich, Worst-Case | 5.50 seconds | 1.64 seconds

### How Right Raymond Was

In nearly¹ all cases, Raymond’s approach achieved faster times. In particular, for all of the best- and worst-case scenarios, the time for `#flip_the_bit` was constant at around 1.7 seconds: as expected, his method doesn’t care how many distinct values the array has; driving to a specific hash key to make changes, and getting the oddly-frequent value at the end of the process, are both constant-time, $O(1)$ operations.

Also as expected, `#tally_and_check` *did* care about the range of possible values in the array, since a wider range resulted in a larger hash to check after the initial run-through. Moreover, even in the value-poor scenarios, where Raymond’s bit-flipping is up against a `tallies` iteration over a hash only two keys large (so, the *best-case* best-case scenario for my method), he still consistently beats me.

I think this provides the clearest answer to his question of whether knocking out an unnecessary linear-time operation at the cost of some additional constant-time operations is still faster than two $O(n)$ operations: for my implementations of this benchmark, that’s a **yup**. Bravo!

### But… Huh.

There was one unexpected pattern that emerged from my experiment: at all levels of value-richness, the *average-case* arrays took *longer* to process than the best- and worst-case equivalents. What I’d anticipated was that the average-case times would fall somewhere between the two extremes; given the counter-intuitive results, I’m worried I’ve set an unfair test.

The most obvious, consistent difference in the scenarios is that the average-case arrays are *shuffled* and the best- and worst-case arrays are *sorted*. Initially, then, I wondered if Ruby attempts to sort the arrays behind the scenes prior to the `#find` iteration in `#tally_and_check`, adding a chunk of time for those scenarios… but `#flip_the_bit` doesn’t use `#find`, yet it exhibits the same slowdown. Could Ruby be sorting the array before even a basic `#each` sweep, just as a matter of course? That seems extravagant.

My other thought was that maybe there’s some extra efficiency in accessing the same physical space in memory over and over for *either* tallying *or* key building/destroying, which might be what happens with a sorted array but not with a shuffled one: in sorted-array scenarios, `tallies` increments a value stored *in the same key over and over again*, and maybe something similar happens to `value_bits`? I’d love to know for sure.
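I don’t have a definitive answer, but the locality hunch is at least easy to poke at: tally a sorted array, then tally its shuffled twin, and compare wall-clock times. A scaled-down probe might look like this (the helper name and sizes are my own; this is a sketch, not proof):

```
require 'benchmark'

# Time a single tallying pass over the given array.
def tally_time(array)
  Benchmark.realtime do
    tallies = Hash.new(0)
    array.each { |value| tallies[value] += 1 }
  end
end

# Same elements, different orderings: 2,000 values x 100 copies each.
sorted   = (1..2_000).map { |number| Array.new(100, number) }.flatten
shuffled = sorted.shuffle

puts format('sorted:   %.4f seconds', tally_time(sorted))
puts format('shuffled: %.4f seconds', tally_time(shuffled))
```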

… But for now, it makes me wonder: could *this* be the edge `#tally_and_check` needs to beat out `#flip_the_bit`? What if our value-poor arrays were pre-shuffled?!

### `#tally_and_check`’s Hail Mary Pass

```
shuffled_value_poor = (0..1).map do |number|
  Array.new(10_000_000, number)
end.flatten.push(rand(0..1)).shuffle

Benchmark.bmbm do |scenario|
  scenario.report('TallyCheck: Poor, Shuffled') { tally_and_check(shuffled_value_poor) }
  scenario.report('FlipBit: Poor, Shuffled') { flip_the_bit(shuffled_value_poor) }
end
```

```
                                 user     system      total        real
TallyCheck: Poor, Shuffled   1.893808   0.000000   1.893808 (  1.904579)
FlipBit: Poor, Shuffled      1.621908   0.000000   1.621908 (  1.626906)
```

Nope: `#flip_the_bit` is just consistently better, and it leaves me confused about what could be causing the slowdown in the average-case scenarios. If anyone knows and would like to enlighten me, I’d be chuffed.

Meanwhile, thanks for the good clean fun, Raymond!

¹ The Value-Middling, Average-Case scenario (with shuffled arrays) always came out a little in favour of `#tally_and_check`. I have *no* idea why; I tried it about ten times. ↩︎