In practice, however, many PRNGs exhibit design artifacts that can cause them to fail statistical tests. Such artifacts include consecutive values that are correlated and values with different lengths. Moreover, because the outputs are determined solely by the input values, an attacker who knows the algorithm and the seed can calculate or even reproduce the entire sequence of output values [2].
True to Form
These weaknesses lead many to believe that only numbers produced by true random number generators (TRNGs) should be used by mission-critical systems and data processing applications, including those that use distributed ledger technology (DLT). TRNGs, also called non-deterministic random bit generators (NRBGs), use naturally-occurring sources of entropy to generate sequences of binary digits with the following properties [3]:
Randomness [2]
- Uniform distribution(no bias): All numbers are equally likely to appear in a sequence. In terms of binary bits, the probability of a zero or one is exactly 1/2.
- Consistency: The RNG’s behavior is consistent across seeds.
- Scalability: Tests that are applied to a sequence can also be applied to randomly extracted subsequences.
Unpredictability [2]
- Independence: Each generated value is statistically independent and does not convey any information about other values.
- Forward unpredictability: Predicting future output is infeasible with only knowledge of past output.
- Backward unpredictability: Determining the seed is infeasible with only knowledge of past output.
- Irreproducibility: Two of the same generators, given the same starting conditions, will produce different outputs.
Tried and True
The term “true random number generator” often refers to physical RNGs that use dedicated hardware to capture or measure unpredictable physical phenomena [4]. These hardware-dependent solutions usually come in the form of on-chip components (for example, Intel Digital Random Number Generator) [5], special add-on cards (for example, IDQ Quantis) [6], or physical sensors. All form factors tend to be slower and more expensive compared to PRNGs.
TRNGs are based on the idea that true randomness can be obtained only from natural sources such as electromagnetic and quantum phenomena, essentially sources outside of man-made systems. The following are examples of natural sources:
- Thermal noise from semiconductor diodes or resistors: This electrical or radio frequency noise is generated when charge carriers, typically electrons within an electrical conductor, are thermally agitated. The charge carriers vibrate as a result of the temperature–the higher the temperature, the higher the agitation and hence the thermal noise level [7].
Example: Intel Digital Random Number Generator [5] - Atmospheric noise: This radio noise is caused by natural atmospheric processes, primarily lightning discharges in thunderstorms. Around 40 lightning discharges occur around the world per second, which translates to roughly 3.5 million lightning discharges per day [8].
Example: Random.org [9] - Radioactive decay: This is the spontaneous breakdown of an unstable atomic nucleus that results in the release of energy (electromagnetic waves) and/or matter (subatomic particles). Random numbers are usually produced by measuring the elapsed time between particle emissions [10].
Example: HotBits [11]
Other TRNG implementations, called non-physical non-deterministic random bit generators, obtain entropy from sources that do not use dedicated hardware [4]. The behavior of such processes can vary considerably depending on the computer platform and other factors.
- System resources or physical environment of a computer
Examples: Instantaneous values of the system clock; electrical activity or air turbulence within sealed hard drives - Human-machine interactions
Examples: Time between keystrokes; mouse movement
Too Good to be True
The randomness that TRNGs provide come with certain issues.
- Susceptibility to tampering: Many TRNGs are not completely secure against attackers who want to observe or manipulate their internal state in order to predict or influence future output.Physical TRNGs are often built externally to the system using the generated numbers and are therefore susceptible to observation or manipulation by attackers. The most common way to compromise a physical TRNG is to tamper with the hardware. The following are examples of tampering [12].
- Increasing or reducing the drive voltage beyond the allowed levels
- Running the device in temperatures that are outside the specification
- Implanting a hardware trojan or backdoor into the device
Attackers can also observe or manipulate non-physical TRNGs, especially if the data source contains little or no entropy. For instance, if the attacker knows that the system clock is the entropy source and has a rough idea of when a value was generated, he/she can guess the inputs and resulting outputs. While low entropy is not a guarantee of compromise, it does weaken the entire RNG mechanism.
- Throughput and speed: Since true random numbers are based on physical phenomena, throughput is often limited. Large numbers of requests are difficult to service within short periods. Moreover, the speed at which values can be produced is dependent on the underlying physical phenomena being measured [12].
- Output quality: Some entropy sources can result in output that is biased or correlated in some way. Values do not always distribute evenly across the space of all possible values. Some values are more likely to occur than others, and certain values can almost never occur in practice because the input data range may be limited [2].
Because of these defects, some implementations of TRNGs include post-processing of the raw data. One such method is “conditioning,” which involves modifying the bit stream (scrambling the bits) to reduce bias and/or increase the entropy rate (further randomizing the bits) of the resulting output. Conditioning often takes the form of passing samples collected by the entropy source to a cryptographic hashing algorithm [13], which transforms data of various lengths to values of fixed length (often called a hash value). Hashing algorithms are one-way mathematical functions. This means that computing the hash value of a piece of data is easy, but using the same hash value to determine the original piece of data is infeasible [14].
The Hybrid (Enough) Truth
The arguments in favor of TRNG usage are usually based on the assumption that random numbers cannot be computed because computers operate in deterministic ways — they follow a set of instructions provided by humans, executing each instruction precisely. RNGs based solely on deterministic computation cannot be regarded as “true” RNGs in the strictest sense of the word, since their output is inherently predictable if the seed values are known.
To achieve the randomness and throughput required by many real-world applications we must combine the desirable attributes of TRNGs and PRNGs. In one such hybrid approach, multiple TRNGs supply raw random numbers to an entropy pool which, in turn, provides random seed values to a cryptographically secure PRNG (CSPRNG). This CSPRNG then generates authentic random numbers that exhibit the correct statistical properties as well as significant resistance to computational attacks [12]. This is also how Unitychain’s RNGProtocol produces large quantities of unique and verifiable authentic random numbers (aRNs), with a “Proof of Randomness” report provided (more on this soon).
Intel calls RNGs that use this approach “cascade construction random number generators.” This type of RNG presents the following advantages [15]:
- The construction allows the CSPRNG (cryptographically secure PRNG) operator to tap into as many entropy sources as are available. Entropy sources can fail or be compromised by attackers so the more entropy that is supplied to the RNG, the more secure the output is presumed to be.
- Using seed values from high-entropy sources mitigates the effects of the CSPRNG’s deterministic behavior. In other words, random seed values play a big part in making authentic random numbers possible.
- CSPRNGs are capable of generating many random numbers within a short period because they are processor-based. A fast CSPRNG that is seeded by slow entropy sources still outperforms traditional hardware-dependent TRNGs. This is what Unitychain utilizes.
In our next article, we will discuss a variety of statistical tests that measure the quality of RNGs. These tests examine sample output sequences to see if they possess attributes that true random sequences are likely to exhibit. While the results are probabilistic and not definitive, the tests help detect defects in the output that attackers can use to gain information about the RNG and its future output.
References
[1] International Organization for Standardization. Test and analysis methods for random bit generators within ISO/IEC 19790 and ISO/IEC 15408
[2] True Random Number Generators
[3] Menezes, A., Van Oorschot, P., and Vanstone, S. (1996). Handbook of Applied Cryptography – Chapter 5
[4] Barker, E., Kelsey, J. (2015). Recommendation for Random Number Generation Using Deterministic Random Bit Generators
[5] Taylor, G., Cox, G. (2011). Behind Intel’s New Random-Number Generator
[6] IDQ. Quantis Random Number Generator
[7] Poole, I. RF Thermal Noise Tutorial
[9] Haahr, M. (1998). What’s this fuss about true randomness?
[10] NDT Resource Center. What is radioactive decay?
[11] Walker, J. (1996). HotBits: Genuine random numbers, generated by radioactive decay
[12] Guneysu, T., Wild, A., Schneider,T. (2015). Cryptanalytic Tools
[13] Maginnis, L. (2015). Now available from GNU Press, the NeuG True Random Number Generator
[14] Persits Software, Inc. One-way Hash Function
[15] Intel. (2014). Digital Random Number Generator (DRNG) Software Implementation Guide