Split-Testing and Late-Night Coding


Split tests are a powerful way to find out whether a change you are making actually helps or not. If you aren’t familiar with A/B tests, the high-level version is that you show half of your users one thing and the other half another thing, then compare the results from each group.
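For the curious, here is a minimal sketch of how the "half and half" split might be done. It is not IMVU's experiment system; the function name `assign_branch`, the branch labels, and the hashing scheme are all assumptions made up for illustration. The idea is just that hashing the user ID gives every user a stable coin flip, so the same person always sees the same side.

```python
import hashlib


def assign_branch(user_id: int, experiment: str) -> str:
    """Place a user in 'control' or 'treatment', roughly 50/50.

    Hypothetical helper: the name, labels, and hashing scheme are
    illustrative assumptions, not any real system's API.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The parity of the first hash byte acts as a coin flip that never
    # changes for a given (experiment, user) pair.
    return "treatment" if digest[0] % 2 == 0 else "control"


def branch_counts(user_ids, experiment: str) -> dict:
    """Count how many of the given users land on each side."""
    counts = {"control": 0, "treatment": 0}
    for user_id in user_ids:
        counts[assign_branch(user_id, experiment)] += 1
    return counts
```

Calling `branch_counts(range(10000), "new_button")` should give two counts close to 5,000 each, and any single user gets the same answer every time you ask.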


One of the more mundane, human problems you can run into as an engineer is tiredness. Tunnel vision sets in, and you focus on specific tasks. If you’re solving a complex math problem, you make an arithmetic error. If you’re a coder not using TDD, often you’re running into syntax errors or the like.

These things happen.


At IMVU, we deploy code fifty times a day. The code you just wrote goes out to the production cluster without waiting for QE people to sign off on it, or for your manager to approve it.

Engineering under TDD acts like a blanket to shield you from a lot of problems. You know from the get-go if you broke something in a more interesting way (assuming said interesting way was protected by a test). The tests exist to protect you before you get near the production systems. A failed test on your buildbot is a good thing, because that’s a broken thing your customers never had to deal with. As someone who, in previous jobs, was tacitly encouraged to code untested fixes on the fly on the production machines, I consider this a godlike boon.

However, things go wrong, and sandboxes and production environments can still disagree! This is why the final step of pushing code live to a production server is to always, always, always verify your change manually on the live website.

And if it can’t be verified (or if your change breaks things in very unexpected ways, causing die spikes or high database load or the like), you revert to a previous known good state and investigate further.


Now that all of the background for tonight is set up, I’ll skip the actual stupidity, and jump straight to the conclusion:

When testing a feature live that is wrapped in a spiffy-keen augmentation of an experiment system, please remember that you have a 50% chance of landing on the “nothing happens” side of the experiment. This will probably save you a lot of trouble.
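To put numbers on that coin flip, here is a rough sketch. It assumes each test account is bucketed independently with a 50% chance of landing in control; the function names are made up for illustration and are not part of any real experiment system.

```python
def chance_all_in_control(num_accounts: int, control_share: float = 0.5) -> float:
    """Probability that every account you check sees 'nothing happens'.

    Assumes accounts are bucketed independently (an idealization).
    """
    if num_accounts < 0:
        raise ValueError("num_accounts must not be negative")
    return control_share ** num_accounts


def accounts_needed(confidence: float, control_share: float = 0.5) -> int:
    """Smallest number of accounts so that at least one of them sees the
    feature with probability >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if not 0.0 < control_share < 1.0:
        raise ValueError("control_share must be strictly between 0 and 1")
    count = 1
    # Keep adding accounts until the chance of missing the feature on
    # all of them is small enough.
    while 1.0 - control_share ** count < confidence:
        count += 1
    return count
```

With one account, `chance_all_in_control(1)` is a flat 50%, which is exactly the trap described above. Checking a few accounts, or knowing which side your account is on before you start, shrinks that quickly.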

It was a pleasant surprise when I looked up to see a wonderful forest around me. I’d just been staring at this one tree over here.

The other kind of Restful

The fun part is my team’s Quality Engineer, who was familiar with this feature and had tested it in production live before, was also up working. He didn’t catch the obvious either, because it was so obvious.

Say a human has a 10% chance of making a mistake or forgetting something obvious. Say he’s got a pairing partner with the same error rate. Assuming an ideal world where the two of them slip up independently, you’ve reduced your net error rate to 1% (10% of 10%) by having another dude on the case with you. Yay pairing!

If both dudes are tired, their error rates go up, and even when pairing, you still have a higher chance of fucking something up. Boo tired!
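The back-of-the-envelope math, as a tiny sketch. The 10% figure is the one from above; the 30% "tired" rate is a number I made up purely for illustration, and the whole thing assumes the two people miss things independently, which tired people staring at the same tree probably don't.

```python
def pair_miss_rate(individual_rate: float, people: int = 2) -> float:
    """Chance that everyone on the case misses the same obvious thing,
    assuming each person misses it independently at the same rate."""
    if not 0.0 <= individual_rate <= 1.0:
        raise ValueError("individual_rate must be between 0 and 1")
    if people < 1:
        raise ValueError("people must be at least 1")
    return individual_rate ** people


RESTED_RATE = 0.10  # from the paragraph above
TIRED_RATE = 0.30   # hypothetical, chosen only to show the trend

rested_pair = pair_miss_rate(RESTED_RATE)
tired_pair = pair_miss_rate(TIRED_RATE)
```

Under these assumptions the tired pair misses things about as often as one well-rested person working alone, so pairing helps but doesn't cancel out being tired.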

…Perhaps there’s a better moral than the one listed above hidden in here?
