Overload Journal #116 - August 2013

Title: Hard Upper Limit on Memory Latency

Author: Martin Moene

Date: Mon, 05 August 2013 19:53:37 +01:00

Summary: Achieving very low latency is important. Sergey Ignatchenko asks how low we can go.

Body: 

Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Bunny, and do not necessarily coincide with the opinions of the translator or the Overload editor. Please also keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented providing an exact translation. In addition, both the translators and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.

In [Bunny12], we discussed upper limits on feasible memory size. It was found that even if every single atom of silicon implements one bit of memory, implementing 2¹²⁸ bytes would take 54 cubic kilometers of silicon, and to implement 2²⁵⁶ bytes there would not be enough atoms in the observable universe. Now we will proceed to analyse the upper limit on speed for huge amounts of memory (amounts below the absolute limits mentioned above, but still far above anything in use now). In other words, we’ll try to answer questions like “Is it realistic to expect 2¹⁰⁰-byte RAM to have access times typical of modern DDR3 SDRAM?”

Assumptions

We need to agree on the assumptions on which our analysis will rely. First, let’s assume that RAM is still made of silicon, and that each bit of RAM requires at least one atom of silicon to implement it. This is an extremely generous assumption (current implementations use several orders of magnitude more atoms than that). Second, let’s assume that nothing (including information) can travel faster than the speed of light in vacuum. This is a well-known consequence of the invariance of the speed of light and causality (many would call it a scientific fact rather than an assumption or hypothesis, but we won’t argue about terms here).

Analysis

Let’s consider memory which has B bytes, and assume that each bit is implemented by a single atom of silicon. Then this memory will occupy a minimum possible volume of

$$V_{\min} = \frac{8 B \, V_m^{Si}}{N_A}$$

where $N_A$ is Avogadro’s number (6.02×10²³ mol⁻¹) and $V_m^{Si}$ is the molar volume of silicon (12×10⁻⁶ m³/mol). Therefore, our 2¹⁰⁰-byte RAM (2¹⁰⁰ is approximately 1.27×10³⁰) will take at least 200 cubic meters of silicon. Now let’s assume that whatever device needs access to our RAM has dimensions which are negligible compared to the size of the RAM silicon, so we can consider all accesses to our RAM as coming from a single point (let’s call it the ‘access point’). Now let’s arrange our RAM in a sphere around the access point (a sphere being the optimal shape for our purposes). Such a sphere will have a radius of

$$R_{\min} = \sqrt[3]{\frac{3 V_{\min}}{4\pi}} = \sqrt[3]{\frac{6 B \, V_m^{Si}}{\pi N_A}}$$

Therefore, for our 2¹⁰⁰-byte RAM, a silicon sphere implementing it will have a radius of at least 3.6 meters. Now let’s find out how long it will take an electromagnetic wave to travel the distance $R_{\min}$ there and back (to account for the time it takes a request to reach the location where the data is stored, and for the data to come back):

$$T_{\min} = \frac{2 R_{\min}}{c} = \frac{2}{c}\sqrt[3]{\frac{6 B \, V_m^{Si}}{\pi N_A}} \qquad (*)$$

where c is the speed of light. Strictly speaking, we should use the speed of electromagnetic waves in silicon, but as we are only after a lower bound, we can safely use the (higher) speed of light in vacuum, 3×10⁸ m/s. Substituting, we find that for our example 2¹⁰⁰-byte RAM, $T_{\min}$ is approximately 25 nanoseconds.
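To make the substitution into formula (*) easy to repeat, here is a minimal Python sketch that follows the same chain of reasoning: bytes to minimum silicon volume, volume to sphere radius, radius to round-trip time. The constants are the ones quoted in the text; the function names are our own, chosen for illustration only.

```python
import math

# Constants as quoted in the text (not high-precision values)
N_A = 6.02e23        # Avogadro's number, 1/mol
V_M_SI = 12e-6       # molar volume of silicon, m^3/mol
C = 3e8              # speed of light in vacuum, m/s


def min_volume(num_bytes: float, atoms_per_bit: float = 1.0) -> float:
    """Minimum silicon volume (m^3) if each bit needs `atoms_per_bit` atoms."""
    atoms = 8 * num_bytes * atoms_per_bit
    return atoms / N_A * V_M_SI


def min_radius(volume: float) -> float:
    """Radius (m) of a sphere with the given volume."""
    return (3 * volume / (4 * math.pi)) ** (1 / 3)


def min_latency(num_bytes: float, atoms_per_bit: float = 1.0) -> float:
    """Round-trip light time (s) from the access point to the sphere's surface."""
    return 2 * min_radius(min_volume(num_bytes, atoms_per_bit)) / C


if __name__ == "__main__":
    b = 2.0 ** 100
    print(f"volume:  {min_volume(b):.0f} m^3")
    print(f"radius:  {min_radius(min_volume(b)):.2f} m")
    print(f"latency: {min_latency(b) * 1e9:.1f} ns")
    # The 2^128-byte case from [Bunny12], in cubic kilometers
    print(f"2^128 bytes: {min_volume(2.0 ** 128) / 1e9:.0f} km^3")
```

With these constants the sketch gives roughly 202 m³, 3.64 m and 24.3 ns for 2¹⁰⁰ bytes, and about 54 km³ for 2¹²⁸ bytes.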

It means that (given our assumptions above) there is a hard limit of about 25 nanoseconds on the minimum possible guaranteed latency of 2¹⁰⁰-byte RAM. While the number may look low, modern typical RAM latencies are more than an order of magnitude lower: for example, typical latency for DDR3 SDRAM is 1–1.5 ns (approximately 20 times less than our theoretical limit for 2¹⁰⁰-byte RAM).

Now we can ask another question: “What is the maximum memory size for which we can realistically expect latencies typical of modern DDR3 SDRAM?” Solving formula (*) for B gives

$$B_{\max} = \frac{\pi N_A}{6 V_m^{Si}} \left(\frac{c \, T_{\min}}{2}\right)^3$$

and substituting $T_{\min}$ = 1.5 ns yields approximately 2⁸⁸ bytes (about 2⁸⁶ bytes for $T_{\min}$ = 1 ns). That is, even if each bit is implemented by a single atom of silicon, a RAM of roughly 2⁸⁸ bytes is the largest which can possibly have latencies comparable to modern DDR3 SDRAM.

Generalization

If (as is currently the case) each bit is implemented with N atoms of silicon, formula (*) becomes

$$T_{\min} = \frac{2}{c}\sqrt[3]{\frac{6 N B \, V_m^{Si}}{\pi N_A}}$$

allowing us to calculate latency limits for the technology in use. For example, if for our 2¹⁰⁰-byte RAM every bit is represented with 1000 atoms of silicon (the same order of magnitude as in the technologies used in modern RAM), the best possible latency becomes 250 ns, since latency grows as the cube root of N and the cube root of 1000 is 10. As for the largest memory which can have latencies comparable to modern DDR3 SDRAM (given 1000 atoms per bit), it is approximately 2⁷⁸ bytes, about 1000 (roughly 2¹⁰) times less than with one atom per bit.
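The generalized formula can also be run backwards. The sketch below again uses the constants quoted in the text, with hypothetical function names of our own; it evaluates the N-atoms-per-bit form of formula (*) and its inverse, $B_{\max} = \pi N_A (c T/2)^3 / (6 N V_m^{Si})$, and reports sizes as powers of two.

```python
import math

# Constants as quoted in the text
N_A = 6.02e23        # Avogadro's number, 1/mol
V_M_SI = 12e-6       # molar volume of silicon, m^3/mol
C = 3e8              # speed of light in vacuum, m/s


def min_latency(num_bytes: float, atoms_per_bit: float = 1.0) -> float:
    """Generalized formula (*): lower bound on latency, in seconds."""
    return (2 / C) * (6 * atoms_per_bit * num_bytes * V_M_SI / (math.pi * N_A)) ** (1 / 3)


def max_bytes(latency: float, atoms_per_bit: float = 1.0) -> float:
    """Largest memory (bytes) whose lower-bound latency does not exceed `latency` seconds."""
    radius = C * latency / 2  # farthest distance reachable there and back
    return math.pi * N_A * radius ** 3 / (6 * atoms_per_bit * V_M_SI)


if __name__ == "__main__":
    for n in (1, 1000):
        t = min_latency(2.0 ** 100, n)
        print(f"N={n}: 2^100 bytes -> {t * 1e9:.0f} ns")
        for target_ns in (1.0, 1.5):
            b = max_bytes(target_ns * 1e-9, n)
            print(f"N={n}: {target_ns} ns -> about 2^{math.log2(b):.1f} bytes")
```

For one atom per bit it reports limits of about 2⁸⁸ bytes at 1.5 ns and about 2⁸⁶ bytes at 1 ns; with 1000 atoms per bit each limit drops by log₂ 1000 ≈ 10 powers of two, and the 2¹⁰⁰-byte latency rises to about 243 ns.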

Further considerations

It should be mentioned that latency is not the only parameter which determines memory performance; another very important parameter is memory bandwidth, which our analysis does not constrain. Also, $T_{\min}$ is in fact the minimum latency we can guarantee for all the bits stored (bits stored closer to the access point will have latencies lower than $T_{\min}$). Another practical consideration is caching: our analysis did not take caching into account, and for most common access patterns caching will improve average latencies greatly.
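The remark about closer bits can be quantified with a short sketch, assuming (as in the model above) that bits fill the sphere uniformly around the access point. The share of bits within radius r is (r/R)³, and round-trip time is proportional to r, so the share reachable within time t is (t/$T_{\min}$)³. The function names are hypothetical.

```python
def fraction_within(latency: float, t_min: float) -> float:
    """Fraction of bits whose round-trip time is at most `latency`.

    Assumes bits fill the sphere uniformly, so the fraction within radius r
    is (r / R)^3, and round-trip time is proportional to r.
    """
    if latency <= 0:
        return 0.0
    return min(1.0, (latency / t_min) ** 3)


def latency_for_fraction(fraction: float, t_min: float) -> float:
    """Round-trip time needed to reach the closest `fraction` of all bits."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return fraction ** (1 / 3) * t_min


if __name__ == "__main__":
    t_min = 25e-9  # the ~25 ns bound for 2^100 bytes from the text
    for f in (0.1, 0.5, 0.9):
        print(f"{f:.0%} of bits within {latency_for_fraction(f, t_min) * 1e9:.1f} ns")
```

For instance, half of the bits lie within about 0.79 $T_{\min}$ (the cube root of 0.5), so only the outermost bits actually need the full $T_{\min}$.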

Conclusions

One interesting consequence of our analysis is that current silicon technology has already come very close to hard physical limits (as opposed to the technological limits which dominated electronics for decades), and that even relativistic effects may come into play when trying to improve things further along the lines of Moore’s law. While this is well known to those dealing with bleeding-edge electronics, it is usually ignored by people in the software industry, where it is quite common to extrapolate Moore’s law for centuries to come. On the other hand, approaching such hard physical limits may signal an end to the usual year-on-year growth in the number of cores, RAM size, HDD size, and so on, and such an end may have very significant effects on the future of the software industry. While it is unclear whether it will be a Good Thing or a Bad Thing for people in the industry, what is clear is that such an end would be quite a drastic change for the software development industry as a whole.

References

[Bunny12] “No Bugs” Bunny, ‘640K 2²⁵⁶ Bytes of Memory is More than Anyone Would Ever Need Get’, Overload #112, December 2012

[Loganberry04] David ‘Loganberry’, ‘Frithaes! – an Introduction to Colloquial Lapine!’, http://bitsnbobstones.watershipdown.org/lapine/overview.html

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.
