Memlat is a tiny benchmark program that measures cache and memory access latencies. I wrote it based on an idea from the article "What Every Programmer Should Know About Memory" by Ulrich Drepper. If you find a bug or have a comment, you can email me at minlee (at)



[root@piquet memlat]# ./memlat
Usage: ./memlat size(KB) duration(second) random(0|1)

stride size is 64, random(0|1) is do_shuffle switch, so give 0 for sequential, 1 for random. each run is 1sec, and the final report shows min,max,average for both cycles,performance(unit:Million elements traversed).
For accuracy, make sure stride size (current 64) == your cache line size. Also set affinity to one cpu without running any other process. Note that 2 or 3 cycles are typically measurable minimum due to size of core loop, so for L1 cache you'll see them even if it has actually less latency.
[root@piquet memlat]#
[root@piquet memlat]# ./memlat 512 10 1
64(STRIDE size) * 8192(# of stride) =        512 KB
cycle: 2994414795, count:189324540, so, 15.816306 cycles/memref
cycle: 2992305429, count:189193313, so, 15.816127 cycles/memref
cycle: 2992371714, count:189202611, so, 15.815700 cycles/memref
cycle: 2992375665, count:189144438, so, 15.820585 cycles/memref
cycle: 2992366656, count:189190240, so, 15.816707 cycles/memref
cycle: 2992360986, count:189201527, so, 15.815734 cycles/memref
cycle: 2992376781, count:189206673, so, 15.815387 cycles/memref
cycle: 2992372398, count:189206898, so, 15.815345 cycles/memref
cycle: 2992380669, count:189208316, so, 15.815270 cycles/memref
cycle: 2992382568, count:189207564, so, 15.815343 cycles/memref

summary: cycle 15.815 15.821 15.816 perf 189.144 189.324 189.208
[root@piquet memlat]#
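
Each per-interval line in the run above reports a raw cycle count, the number of elements traversed, and their quotient. As a sanity check, the following Python sketch re-derives cycles/memref from one of those lines. The regular expression and the function name `cycles_per_memref` are illustrative and not part of memlat, and the line format is assumed from this single transcript.

```python
import re

# Pattern for memlat's per-interval lines, assumed from the transcript above.
LINE_RE = re.compile(
    r"cycle:\s*(\d+),\s*count:\s*(\d+),\s*so,\s*([\d.]+)\s*cycles/memref"
)


def cycles_per_memref(line):
    """Return (recomputed, reported) cycles/memref for one interval line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a memlat interval line: {line!r}")
    cycles = int(m.group(1))
    count = int(m.group(2))
    reported = float(m.group(3))
    # The reported figure should simply be cycles divided by elements traversed.
    return cycles / count, reported


# First interval line of the 512 KB random run shown above.
sample = "cycle: 2994414795, count:189324540, so, 15.816306 cycles/memref"
recomputed, reported = cycles_per_memref(sample)
print(f"recomputed {recomputed:.6f}, reported {reported:.6f}")
```

The same division explains the "perf" columns of the summary: a count of 189324540 elements in a one-second interval is about 189.324 million elements, which matches the maximum shown in the summary line.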


Run it with various working set sizes, record the cycle numbers, and plot a graph. I got results similar to the graph above.
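
A sweep like this can be scripted. The Python sketch below runs memlat once per working set size and collects the average cycles/memref from the summary line. It assumes that the binary is at `./memlat`, that it prints its report to standard output, and that the three "cycle" numbers in the summary are min, max, and average in that order, as the usage text describes. The names `parse_summary`, `run_memlat`, and `sweep` are hypothetical helpers, not part of memlat.

```python
import re
import subprocess

# Summary line format assumed from the sample run:
# "summary: cycle <min> <max> <avg> perf <min> <max> <avg>"
SUMMARY_RE = re.compile(
    r"summary:\s*cycle\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
    r"\s+perf\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)

KEYS = ("cycle_min", "cycle_max", "cycle_avg", "perf_min", "perf_max", "perf_avg")


def parse_summary(output):
    """Extract the six summary numbers from memlat's output as a dict."""
    m = SUMMARY_RE.search(output)
    if m is None:
        raise ValueError("no summary line found in memlat output")
    return dict(zip(KEYS, map(float, m.groups())))


def run_memlat(size_kb, duration=10, shuffle=1, binary="./memlat"):
    """Run memlat once and return its stdout (binary path is an assumption)."""
    result = subprocess.run(
        [binary, str(size_kb), str(duration), str(shuffle)],
        capture_output=True,
        text=True,
    )
    return result.stdout


def sweep(sizes_kb, run=run_memlat):
    """Return a list of (size_kb, average cycles/memref) pairs."""
    return [(size, parse_summary(run(size))["cycle_avg"]) for size in sizes_kb]
```

For example, `sweep([2 ** k for k in range(2, 17)])` would cover 4 KB to 64 MB in powers of two; that range is an arbitrary choice. The resulting pairs can be written out as two columns and fed to any plotting tool, with working set size on a logarithmic x axis. The `run` parameter exists so the parsing can be tried on saved output without running the benchmark.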


memlat-0.1.tgz for x86_64.

Last updated: Jan 2012