Authentication – from “Programming JavaScript Applications”

I have read a lot of books about how to build applications, but I’ve never seen one that adequately covers the topic of authentication, so I decided to dedicate a section to it in my book, Programming JavaScript Applications. Enjoy this free excerpt.

Authentication

Authentication is the mechanism that confirms the identity of users trying to access a system. In order for a user to be granted access to a resource, they must first prove that they are who they claim to be. Generally this is handled by passing a key with each request (often called an access token). The server verifies that the access token is genuine and that the user does indeed have the required privileges to access the requested resource; only then is the request granted.

There are many ways to grant a user an access token. The most common is a password challenge.

Passwords

Passwords should be stored using a one-way hash, so that even if a malicious intruder obtains access to the user database, they still won’t have access to user passwords.

Hashing should be expensive enough to thwart not just an attack from a single machine, but an attack from a large cluster of machines.

Worms targeting vulnerable versions of popular website platforms such as WordPress and Drupal have become common. Once such a worm takes control of a website and installs its payload, it can recruit all of the site’s traffic into a JavaScript botnet, and, among other things, use visitor CPU power to crack stolen password databases which fail to implement the security precautions outlined here.

There are botnets that exist today with over 90,000 nodes. Such botnets could crack MD5 password hashes at a rate of nine billion per second.

Passwords are vulnerable to the following common attacks:

  • Rainbow tables
  • Brute force
  • Variable time equality
  • Passwords stolen from third parties

Rainbow tables

Rainbow tables are precomputed tables used to look up passwords using stolen hashes. Once bad guys get their hands on user passwords, they’ll attempt to attack popular services such as email and bank accounts — which spells very bad PR for your service.

Rainbow tables exist today that can discover almost every possible password of up to 14 characters. To stay ahead of them, users would need to choose passwords longer than that. Sadly, such passwords are definitely not convenient, particularly on mobile devices. In other words, you should not rely on users to select appropriate passwords.

Rainbow tables trade memory for a dramatic reduction in the time it takes to find a password, and with terabyte hard drives and gigabytes of RAM, that's a trade-off attackers can easily afford. That said, it is possible to protect your service against rainbow table attacks.

Password Salts

One defense you can employ against rainbow tables is password salting. A salt is a sequence of random characters that gets paired with a password during the hashing process. Salts should be cryptographically secure random values of a length equal to the hash size. Salts are not secrets, and can be safely stored in plaintext alongside the user’s other credentials.

Salting can protect passwords in a couple of ways:

First, a uniquely generated salt protects your password database against existing rainbow tables: those tables were computed without your salt, so the salted hashes won't appear in them. However, if you use the same salt for every password, a new rainbow table can be generated to attack the whole database at once.

Second, if every password shares the same salt, two users with the same password will produce the same hash, so one compromised password grants access to both accounts. To prevent that, you must use a unique salt for each password. Doing so makes a rainbow table attack impractical.

Node.js supplies a suitable random generator called crypto.randomBytes(). It returns a buffer, so you’ll need to wrap it to get a suitable salt string:

var crypto = require('crypto');

/**
 * createSalt(keyLength, callback) callback(err, salt)
 *
 * Generates a cryptographically secure random string for
 * use as a password salt using Node's built-in
 * crypto.randomBytes().
 *
 * @param  {Number} keyLength
 * @param  {Function} callback
 * @return {undefined}
 */
var createSalt = function createSalt(keyLength, callback) {
  crypto.randomBytes(keyLength, function (err, buff) {
    if (err) {
      return callback(err);
    }
    callback(null, buff.toString('base64'));
  });
};

The operation is asynchronous because the cryptographically secure random number generator takes time to collect enough entropy to complete the operation.

Brute force

Rainbow tables get all the blogger attention, but Moore’s Law is alive and well, and brute force has become a very real threat. Attackers are employing GPUs, supercomputing clusters that cost less than $2,000, and JavaScript botnets comprised of tens of thousands of browsers visiting infected websites.

A brute force attack tries to crack a password by testing every possible character combination until it finds a match. A simple single-iteration hash can be tested at the rate of millions of candidate passwords per second on modern systems.

One way to thwart brute force attacks is to programmatically lock a user’s account after a handful of failed login attempts. However, that strategy won’t protect passwords if an attacker gains access to the password database.

Key stretching can make brute force attacks impractical by increasing the time it takes to hash the password. This can be done by applying the hash function in a loop. The delay will be relatively unnoticed by a user trying to sign in, but will significantly hamper an attacker attempting to discover a password through brute force.

You should not simply pick a random hash function and apply it in a loop. It’s too easy to unwittingly open up attack vectors. Instead, use an established standard for iterative hashing, such as bcrypt or PBKDF2.

I computed 100 hashes in less than 1ms using a simple MD5 algorithm, and then tried the same thing with Node’s built-in crypto.pbkdf2() function (HMAC-SHA1) set to 80,000 iterations. PBKDF2 took 15.48 seconds. To a user performing a single login attempt per request, the slowdown is barely noticeable, but it slows brute force to a crawl.

Usage is deceptively simple:

// password, salt, iterations, keyLength, and callback are
// assumed to be in scope here:
crypto.pbkdf2(password, salt,
  iterations, keyLength, function (err, hash) {
    if (err) {
      return callback(err);
    }
    callback(null, new Buffer(hash).toString('base64'));
  });

However, there are important considerations that shouldn’t be overlooked, such as generating a unique, cryptographically secure random salt of the right length, and calculating the number of iterations needed to balance user experience and security.
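
One rough way to pick an iteration count is to measure how long a sample run takes on your own hardware and scale up to a target delay. The sketch below assumes Node's synchronous crypto.pbkdf2Sync() on a Node version that accepts the digest argument; the target delay, sample size, key length, and SHA-1 digest are illustrative values, not recommendations:

var crypto = require('crypto');

// Estimate an iteration count that makes one hash take roughly
// targetMs milliseconds on this machine (illustrative sketch).
var calibrateIterations = function calibrateIterations(targetMs) {
  var sampleIterations = 10000,
    salt = crypto.randomBytes(64).toString('base64'),
    start = Date.now(),
    elapsed;

  crypto.pbkdf2Sync('sample password', salt, sampleIterations, 64, 'sha1');
  elapsed = Math.max(Date.now() - start, 1);

  // Scale the sample run up (or down) to hit the target delay.
  return Math.ceil(sampleIterations * (targetMs / elapsed));
};

console.log(calibrateIterations(500)); // iterations for a ~500ms hash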

Variable time equality

If it takes your service longer to say no to a slightly wrong password than a mostly wrong password, attackers can use that data to guess the password, similar to how you guess a word playing hangman. You might think that random time delays and network timing jitter would sufficiently mask those timing differences, but it turns out an attacker just needs to take more timing samples to filter out the noise and obtain statistically relevant timing data:

From Crosby et al. “Opportunities And Limits Of Remote Timing Attacks”:

We have shown that, even though the Internet induces significant timing jitter, we can reliably distinguish remote timing differences as low as 20µs. A LAN environment has lower timing jitter, allowing us to reliably distinguish remote timing differences as small as 100ns (possibly even smaller). These precise timing differences can be distinguished with only hundreds or possibly thousands of measurements.

The best way to beat these attacks is to use a constant time hash equality check, rather than an optimized check. That is easily achieved by iterating through the full hash before returning the answer, regardless of how soon the answer is known.

For more information, see Coda Hale’s “A Lesson in Timing Attacks”.

Here is an example of a constant time string equality algorithm in JavaScript:

/**
 * constantEquals(x, y)
 *
 * Compare two strings, x and y with a constant time
 * algorithm to prevent attacks based on timing statistics.
 */
var constantEquals = function constantEquals(x, y) {
  var result = true,
    length = (x.length > y.length) ? x.length : y.length,
    i;

  for (i=0; i<length; i++) {
    if (x.charCodeAt(i) !== y.charCodeAt(i)) {
      result = false;
    }
  }
  return result;
};

Stolen passwords

By far the biggest threat to password security is the fact that these tactics have already worked against other websites, and users have a tendency to reuse passwords across different sites. Since you don’t have access to the user’s other accounts for verification, there’s little you can do to enforce unique passwords on your website.

As you have seen, passwords alone are an ineffective authentication system, but they can still be useful in combination with other authentication factors.

Credential

I searched for a suitable open source password authentication module in NPM, but I couldn’t find one that met all of the criteria you should consider when you’re implementing password authentication in your applications. This is a critical component of your system security, so it’s important to get it right. I created a library to make it easy.

Install credential:

$ npm install --save credential

.hash():

var pw = require('credential'),
  newPassword = 'I have a really great password.';

pw.hash(newPassword, function (err, hash) {
  if (err) { throw err; }
  console.log('Store the password hash.', hash);
});

.verify():

var pw = require('credential'),
  storedHash = '{"hash":...', // truncated to fit on page
  userInput = 'I have a really great password.';

pw.verify(storedHash, userInput, function (err, isValid) {
  var msg;
  if (err) { throw err; }
  msg = isValid ? 'Passwords match!' : 'Wrong password.';
  console.log(msg);
});

Multi-factor authentication

Because of the threat of stolen passwords, any policy which relies solely on password protection is unsafe. In order to protect your system from intruders, another line of defense is necessary.

Multi-factor authentication is an authentication strategy which requires the user to present proof from two or more authentication factors: the knowledge factor (something the user knows: a password, etc.), the possession factor (something the user has: a mobile phone, etc.), and the inherence factor (something the user is: a fingerprint, etc.).

Knowledge factor

A common secondary security mechanism, widely implemented in the financial industry just a few years ago, is “security questions”. Pairing a password with security questions does not qualify as multi-factor authentication, though, because multi-factor requires the user to pass challenges from two or more different factors. Using multiple knowledge-factor challenges does not prevent a determined snoop from breaking in.

Multi-factor authentication means that an attacker would have to be both a snoop and a thief, for instance.

Possession factor

For corporate and government intranets, it’s common to require some type of physical token or key to grant access to systems. Mechanisms include USB dongles, flash card keys, etc…

OTPs (One-Time Passwords) are short-lived passwords that work only for a single use. They satisfy the possession factor because they’re usually generated by a dedicated piece of hardware or by an app on the user’s mobile phone. The device is paired with the service being authenticated against in a way that cannot be easily spoofed by impostors.

Google released a product called Google Authenticator that generates one-time passwords for mobile devices. There is a Node module called speakeasy that lets you take advantage of Google Authenticator to authenticate users using the possession factor.

Install Speakeasy:

$ npm install --save speakeasy

Then take it for a spin:

var speakeasy = require('speakeasy');

// Returns a key object with ascii, hex, base32 and
// QR code representations (the QR code value is a
// Google image URL):
var key = speakeasy.generate_key({
  length: 20,
  google_auth_qr: true
});

// This should match the number on your phone:
speakeasy.time({key: key.base32, encoding: 'base32'});
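
To check the code a user actually types in, you can generate the expected token from the shared secret with the same speakeasy.time() call and compare it to the user's input. Here is a minimal sketch that reuses the constantEquals() helper from earlier; the verifyOtp name is illustrative, and accepting adjacent time windows to tolerate clock drift is left out:

var verifyOtp = function verifyOtp(base32Key, userToken) {
  // Generate the token the server expects for the current
  // 30-second window, then compare in constant time.
  var expected = speakeasy.time({key: base32Key, encoding: 'base32'});
  return constantEquals(expected, String(userToken));
};

// verifyOtp(key.base32, '123456'); // true only if the code matches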

Polymorphic functions and method dispatch in JavaScript

Don't be afraid of the big words. This stuff is simple. Polymorphism just means that something behaves differently based on context, just as certain words can have different meanings depending on how they're used. For example:

  • “Watch out for that sharp turn in the road!”
  • “Wow, that knife is sharp!”
  • “I know you’re sharp enough to wrap your head around these polymorphic functions.”

We’re going to make our functions behave differently based on the parameters you pass into them. In JavaScript, those parameters are stored in the array-like arguments object, but it’s missing useful array methods.

We can instantiate a real array and borrow its slice method. This technique is called method delegation, not to be confused with event delegation. Notably, [].slice.call is shorter to write than the commonly used Array.prototype.slice.call, at the small cost of creating a throwaway empty array.

A Function that Sorts Parameters
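
A minimal sketch of such a function (the sortArgs name is just for illustration):

var sortArgs = function sortArgs() {
  // Borrow .slice() from a real array to copy the array-like
  // arguments object into a true array.
  var args = [].slice.call(arguments);

  // args is now a real array, so it has a .sort() method.
  return args.sort();
};

sortArgs('b', 'c', 'a'); // ['a', 'b', 'c']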

Let’s break it down. Slice is an easy way to shallow-copy an array (or in this case, an array-like object).

Now that args is a real array, we can use the sort method to sort the contents. Yay!

This is really handy to grab the first argument off the stack:
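
For instance (the logFirst name is made up for illustration), .shift() pulls the first element off the front of the copied arguments:

var logFirst = function logFirst() {
  var args = [].slice.call(arguments),
    first = args.shift(); // remove and return the first argument

  console.log('first:', first, 'rest:', args);
};

logFirst('one', 'two', 'three'); // first: one rest: [ 'two', 'three' ]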

Function Polymorphism

One common use of this pattern is to change the behavior of a function depending on what gets passed into it. This is called a polymorphic function:
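
As a sketch (the greet function and its argument shapes are made up for illustration), the function inspects what it was given and acts accordingly:

var greet = function greet(subject) {
  // Dispatch on the type of the first parameter.
  if (typeof subject === 'string') {
    return 'Hello, ' + subject + '!';
  }
  if (subject && typeof subject === 'object') {
    return 'Hello, ' + subject.name + '!';
  }
  return 'Hello!';
};

greet('world');           // 'Hello, world!'
greet({ name: 'world' }); // 'Hello, world!'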

Method Dispatch for Chainable Modules

This gets really interesting in the context of a module.

The goal is to expose a public API for a chainable module. If the module is part of a larger framework, you might not want to clutter the framework namespace with all of your module’s public methods. Using method dispatch, we can call methods by passing the method name as the first parameter.
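
Here is a sketch of that idea (the car module and its methods are made up for illustration). The module exposes a single function; the first argument names the internal method to run, the rest are forwarded to it, and returning the public function keeps calls chainable:

var car = (function () {
  var api,
    methods = {
      start: function (speed) {
        console.log('Starting. Speed:', speed || 0);
      },
      stop: function () {
        console.log('Stopping.');
      }
    };

  api = function api(methodName) {
    // Everything after the method name gets passed along.
    var args = [].slice.call(arguments, 1);

    if (typeof methods[methodName] !== 'function') {
      throw new Error('Unknown method: ' + methodName);
    }
    methods[methodName].apply(methods, args);
    return api; // allow chained calls
  };

  return api;
}());

car('start', 10)('stop'); // Starting. Speed: 10
                          // Stopping.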

This is a great pattern for jQuery plugins:
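
A sketch following the common jQuery plugin authoring pattern (the highlight plugin and its methods are made up for illustration):

(function ($) {
  var methods = {
    init: function (options) {
      // 'this' is the jQuery collection; returning it keeps
      // the plugin chainable.
      return this.addClass('highlighted');
    },
    destroy: function () {
      return this.removeClass('highlighted');
    }
  };

  $.fn.highlight = function (method) {
    // Dispatch on the first argument: a method name, or an
    // options object (or nothing) for init.
    if (methods[method]) {
      return methods[method].apply(this, [].slice.call(arguments, 1));
    }
    if (typeof method === 'object' || !method) {
      return methods.init.apply(this, arguments);
    }
    $.error('Method ' + method + ' does not exist on jQuery.highlight');
  };
}(jQuery));

// $('.intro').highlight();          // runs init
// $('.intro').highlight('destroy'); // dispatches to destroy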


Basic Website Deployment Strategy

Deployment is a big part of the web application development process that is frequently overlooked, or taken for granted — but when it’s not handled properly, it can cause very expensive problems.

When you’re a single developer creating a small application, it’s not unusual to simply edit your source files, and upload them directly to the server. That’s it. No complex deployment necessary.

However, when you’re working on a website with millions of monthly visitors and hundreds of dollars per minute at stake, that strategy is out of the question.

In order to avoid a lot of lost revenue, a more robust strategy is called for.

Backups

I once worked for a company that hosted hundreds of websites on a proprietary content management system. When I got there, their idea of a deployment strategy was to keep daily backups of the whole system — operating system, database, and all. If anything went wrong, they would restore content from the previous nightly.

The problem came one day when something did go wrong. I didn’t want to just restore from backup. What would happen to all the changes users made that day? They would vanish into thin air. Hours of work, wasted. How would you feel if that happened to your data?

I wanted to fix the bugs without resorting to the backups, but my opinion didn’t matter — that’s the way they had always done things. I wasn’t about to stand in the way of a well established system. They restored from backup, I was flooded with user complaints, and the blame for the system failure was on my head. I gave my notice the next day. I had a feeling that a company with such disregard for customer happiness wasn’t going very far.

I just did a search to see if they’re still around. You can’t make this stuff up: their domain is still active, but the site is currently down. Their traffic rankings are in the toilet. Looks like I made the right call.

Don’t get me wrong — I’m all for data redundancy and nightly backups. However, relying on them to fix your blunders is nothing short of corporate suicide.

Version Control

Version control, AKA revision control, revision management, or version management, is the process of storing a history of every change to the source. You edit your source code like you normally would, but when you’re done with a new feature or bug fix, you commit the changes to your version control system.

Now you have an easy way to fix something if your changes introduce bugs. Just rewind to the previous version and try again. Nothing else gets impacted, and you don’t have to worry about screwing up the operating system, server configuration, or user data changes that occurred in the meantime.

This is all well and good — but what if you have a complex code base with a lot of contributors? Sorting out exactly when and how things got broken is not always as simple as rolling back to the previous version. There is a chance that two different people editing two completely different (but interdependent) files made changes that conflict with each other over the course of several commits.

That could take some time to track down, and if you’re editing against the production code, that could mean some very costly downtime.

Staging Environment

The next step is to deploy a staging environment. Staging is a mirror of the production environment that you can use to do test deployments. Now you develop your new features against the staging environment, and keep a stable build in production.

The problem with this strategy comes when you need to make hotfixes on the live production code. You should make those bug fixes in a copy of the live environment so that you can test your fixes before you commit them back. If you only have one other environment, where do you make your hotfixes?

Branching Strategy

This is where branching comes in. Create a master branch, which is always a stable mirror of the production build. When you need to make a hotfix on a bug in production, you check out the master branch, make your fix, commit it, and push it into production. If the bug fix is screwed up, rewind the change and try again.

New features are added to a development branch. The development branch frequently holds an unstable version of your application, which might break and get fixed several times in a busy development day. While developers are strongly encouraged to test their code before committing to development, sometimes severely broken code gets checked in accidentally. After all, developers are often under pressure and time constraints.

For this reason, the development branch should never get pushed directly into production without first being passed through QA. Whether you’re a one-man developer team, or a large organization, you should not forget to test.
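
As a rough sketch of that flow with git (the branch names and commit message are placeholders):

# Hotfix: branch off the stable master, fix the bug, then merge back.
$ git checkout master
$ git checkout -b hotfix/broken-login
# ...edit and test the fix...
$ git commit -am "Fix broken login redirect"
$ git checkout master
$ git merge hotfix/broken-login

# New feature work goes to the development branch instead.
$ git checkout development
$ git merge feature/new-cart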

Version Tags

When the next version is feature complete, meaning that all of the features described in the specification document (you have one of those, right?) are implemented, the code goes under feature freeze, and all further changes are made for the purpose of fixing bugs and polishing the branch for production release.

Meanwhile, new development can still take place concurrently in a future version branch.

Serious bugs that get discovered and fixed in the course of new development can be selectively back-ported to the development and production branches. Likewise, fixes made in the master branch that also affect the development branches can be merged into them. Distributed version control systems like git make cherry-picking changes a fairly straightforward process, once you wrap your head around it.
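
For example, a single bug-fix commit can be applied to another branch by its hash (the hash and branch name here are placeholders):

$ git checkout development
$ git cherry-pick 3a7bd2c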

Just don’t forget to test!

Testing

Most projects should have a thorough test suite that encompasses both automated unit tests and high-level tests that are verified by visual inspection.

Unit tests are sanity checks that make assertions like myMethodReturnValue == 1. If the assertion holds, it passes and you get a green light. If it doesn’t, it fails, and you know something’s broken in the code.
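
For instance, with Node's built-in assert module (myMethod is a stand-in for whatever you're testing):

var assert = require('assert');

var myMethod = function myMethod() {
  return 1;
};

// Passes silently when the value matches; throws (a red light)
// when it doesn't.
assert.equal(myMethod(), 1, 'myMethod() should return 1');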

High-level tests might make assertions like “Cart item subtotals line up with item descriptions on the printed receipt.” Essentially, they’re a checklist of tests that can’t be automated, which the QA team runs through to make sure that the release is ready to move into production.

Many teams will have a dedicated testing environment for this purpose, but don’t let that stop you if you’re a one-man show. Being thorough about testing can save even a lone coder a lot of lost revenue, because you don’t have to wait for users to find (and maybe report) the bugs. In the testing environment, all the debug switches will be turned on, and the application will do lots of logging. The unit test suite will be used to monitor committed changes, and frequently there will be a dashboard full of green lights (or red lights, if something goes awry). Use your checklist and an automated testing suite to guide you through the application, wizard-style, for the final “all systems go for launch!”

More Resources

I didn’t invent any of this stuff. The concepts of release and hotfix branching have been around for years, and I have used them on a number of projects. However, there are some new tools, and a formalized branching strategy developed specifically for the distributed version control system, Git.

We very recently decided to adopt a new branching model by Vincent Driessen, aided by a tool called Git Flow. I like it so far. Check it out!

A short introduction to Git Flow from Mark Derricutt on Vimeo.

This post owes thanks to Vlad and Douglas at Zumba for introducing me to Git Flow. You can read Vlad’s thoughts on CakePHP at Nuts and Bolts of CakePHP.
