Mostly out of my head; feel free to correct me if you spot an error:
- The most common use for random data in computer security is as a secret: a password, a key, or a session ID for instance.
- Random data is the result of a random process.
- An ideal toss of a coin generates 1 bit of random data. [Update 2009-11-27: Or maybe not.] For n tosses there are 2n different strings of n bits that could be the result.
- Producing random data is hard on a mostly deterministic machine, particularly if large quantities are needed.
- Data that looks random doesn’t have to be. For example, base64-encoded data looks random to some people although it isn’t.
- The output of a random process has certain statistical properties that can be tested.
- A deterministic process can produce output that exposes the same statistical properties as a truly random process for any finite number of bits.
- It is therefore impossible, for any finite string of bits, to prove that it is the result of a random process.
- It is, however, possible to detect and reject as non-random the results of some deterministic processes on the basis of statistical tests.
- A deterministic process designed to produce output that passes some of the statistical randomness tests is called a pseudorandom number generator (PRNG).
- There are different qualities of pseudorandom number generators. The simpler ones are unsuitable for security applications.
- Data from a PRNG that looks random according to statistical test can still be entirely predictable if the PRNG is known and its internal state can be worked out.
- The whole point of using random data in security is to make guessing hard. The idea is that to guess a random secret an attacker will have to search the entire space of possible values in the worst case and half of it in the average case.
- Attackers cannot be prevented from guessing secrets. If there is an easier way than guessing, the attacker is assumed to prefer the easier way.
- The inner state of some PRNGs can be determined from a small number of subsequent output values.
- Up to a certain length and under a number of side conditions, the output of cryptographic primitives such as hash functions or ciphers are expected to expose the statistical properties of random data.
- Pseudorandom number generators for security purposes often combine a source of random data with cryptographic methods to produce an almost arbitrary amount of random data. True random data is used to seed the generator, which then expands the data.
- Potential sources of random data in a deterministic computer system are external events and digitized noise from analog circuits.
- Using digitized noise isn’t easy either. The process might be biased, limiting the randomness obtained.
- Humans are poor random number generators unless they execute a random process, e.g. repeatedly tossing a coin.
- Stream ciphers combine a (pseudo)random key stream with the data stream using bitwise XOR.
- A string of truly random data as a message has maximum entropy. Such random data thus cannot be compressed.
- Random data can also be used as test input into software components. This type of testing is called fuzzing.
- For an ideal block cipher, flipping a single bit in either the clear text or the key flips each bit of the cipher text with probability 0.5.
Note that I’m sloppy in my use of the term random. Randomness can mean a lot of different things and I’m really making a few assumptions.