User:ぺお/sandbox

Runtime memory ordering

In SMP systems

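SMP systems can follow one of several memory consistency models:

  • Sequential consistency (all reads and writes are performed in order)
  • Relaxed consistency (some types of reordering are allowed)
    • Loads can be reordered after other loads (for better cache coherency behavior and scalability)
    • Loads can be reordered after stores
    • Stores can be reordered after other stores
    • Stores can be reordered after loads
  • Weak consistency (reads and writes can be reordered arbitrarily, limited only by explicit memory barriers)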
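The difference between these models shows up when one thread hands data to another through a flag. The following is a minimal sketch using C11 atomics, which the text above does not mention; the names `payload`, `ready`, `publish` and `try_consume` are hypothetical. The release store and acquire load request exactly the ordering that a relaxed or weak model would otherwise not guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic data */
static atomic_bool ready;  /* publication flag, starts out false */

/* Writer: store the data, then set the flag with release ordering so that
   the data store cannot become visible after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: load the flag with acquire ordering so that the data load cannot
   be performed before the flag load. Returns true if data was available. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

On a sequentially consistent machine plain loads and stores would already behave this way; on a relaxed or weak machine the compiler is expected to emit whatever barrier instructions the target needs.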


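On some CPUs:

  • Atomic operations can be reordered with loads and stores.
  • The instruction cache and pipeline may be incoherent, so self-modifying code does not work without special instructions that flush or reload the instruction cache.
  • Dependent loads can be reordered (this is unique to Alpha). Even after the processor has loaded a pointer to some data, it may fetch stale data that is already cached and not yet invalidated instead of the current data the pointer refers to. Allowing this relaxation makes the hardware simpler and faster, but requires memory barriers on both the reader side and the writer side.[1]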
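The dependent-load case can be illustrated with a pointer that one thread publishes and another thread dereferences. This is only a sketch under stated assumptions: it uses C11 atomics, and `struct config`, `slot`, `current`, `publish_config` and `read_threshold` are hypothetical names. C11 also defines `memory_order_consume` for this pattern, but current compilers generally treat it as acquire, so acquire is used here.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config { int threshold; };          /* hypothetical payload type */

static struct config slot;                 /* storage for the published object */
static _Atomic(struct config *) current;   /* shared pointer, starts out NULL */

/* Writer side: initialise the object, then publish the pointer.
   The release store stands in for the writer-side memory barrier. */
void publish_config(int threshold)
{
    slot.threshold = threshold;
    atomic_store_explicit(&current, &slot, memory_order_release);
}

/* Reader side: the acquire load stands in for the reader-side barrier that
   Alpha needs even though the second load depends on the first. */
int read_threshold(int fallback)
{
    struct config *p = atomic_load_explicit(&current, memory_order_acquire);
    return p ? p->threshold : fallback;
}
```

On architectures that never reorder dependent loads, the reader-side ordering typically costs little or nothing; the writer-side ordering is needed on any architecture that can reorder stores.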
Memory ordering in some architectures[2][3] (Y = can occur, - = does not occur)

| Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA-64 | zSeries |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Loads reordered after loads | Y | Y | Y | Y | Y | - | - | - | Y | - | Y | - |
| Loads reordered after stores | Y | Y | Y | Y | Y | - | - | - | Y | - | Y | - |
| Stores reordered after stores | Y | Y | Y | Y | Y | Y | - | - | Y | - | Y | - |
| Stores reordered after loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| Atomic reordered with loads | Y | Y | - | Y | Y | - | - | - | - | - | Y | - |
| Atomic reordered with stores | Y | Y | - | Y | Y | Y | - | - | - | - | Y | - |
| Dependent loads reordered | Y | - | - | - | - | - | - | - | - | - | - | - |
| Incoherent instruction cache pipeline | Y | Y | - | Y | Y | Y | Y | Y | Y | - | Y | Y |

Some older x86 and AMD systems have weaker memory ordering.[4]

SPARC memory ordering modes:

  • SPARC TSO = total store order (default)
  • SPARC RMO = relaxed-memory order (not supported on recent CPUs)
  • SPARC PSO = partial store order (not supported on recent CPUs)

Memory barrier types

Compiler memory barrier

These barriers prevent the compiler from reordering instructions; they do not prevent reordering by the CPU.

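  • With GCC, the GNU inline assembler statement
asm volatile("" ::: "memory");

or, equivalently,

__asm__ __volatile__ ("" ::: "memory");

prevents the compiler from reordering memory reads and writes across it.[5]

  • The Intel C++ compiler provides the intrinsic
__memory_barrier()

which acts as a full compiler fence.[6][7]

  • Microsoft Visual C++ provides the intrinsic[8]
_ReadWriteBarrier()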
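A project that targets several compilers often hides these statements behind one macro. The sketch below is one possible arrangement, not an established API: `COMPILER_BARRIER`, `write_in_program_order` and `sum_written` are hypothetical names, and the C11 `atomic_signal_fence` fallback is an addition that the list above does not mention.

```c
#if defined(__GNUC__) || defined(__clang__)
   /* GCC-compatible compilers: empty asm statement with a "memory" clobber. */
#  define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#elif defined(_MSC_VER)
#  include <intrin.h>
#  define COMPILER_BARRIER() _ReadWriteBarrier()
#else
   /* Standard C11 fallback: a fence that constrains only the compiler. */
#  include <stdatomic.h>
#  define COMPILER_BARRIER() atomic_signal_fence(memory_order_seq_cst)
#endif

static int data_a, data_b;

/* The compiler may not move the store to data_b above the store to data_a.
   The CPU is still free to make the two stores visible in either order. */
void write_in_program_order(int a, int b)
{
    data_a = a;
    COMPILER_BARRIER();
    data_b = b;
}

int sum_written(void)
{
    return data_a + data_b;
}
```

No instruction is emitted for the barrier itself, which is why it is not sufficient for communication between CPUs.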

Hardware memory barrier

Many architectures with SMP support provide special hardware instructions for flushing reads and writes.

  • x86, x86-64:
lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void)[9]
mfence (asm), void _mm_mfence(void)[10]
  • PowerPC:
sync (asm)
  • MIPS:
sync (asm)
  • Itanium:
mf (asm)
  • POWER:
dcs (asm)
  • ARMv7:
dmb (asm)
dsb (asm)
isb (asm)[11]
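These instructions can be issued from C through inline assembly. The following sketch assumes a GCC-compatible compiler; `HW_FENCE_INSN`, `hw_full_fence` and `hw_fence_kind` are hypothetical names, the architecture tests rely on the compiler's predefined macros, and the `ish` option on `dmb` is a choice made for this example.

```c
/* Pick a full-barrier instruction for the target, if one is known here. */
#if defined(__x86_64__)
#  define HW_FENCE_INSN "mfence"
#elif defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
#  define HW_FENCE_INSN "dmb ish"
#elif defined(__powerpc__) || defined(__powerpc64__)
#  define HW_FENCE_INSN "sync"
#endif

/* Issue a full hardware barrier. The "memory" clobber additionally makes
   the statement a compiler barrier. */
void hw_full_fence(void)
{
#ifdef HW_FENCE_INSN
    __asm__ __volatile__(HW_FENCE_INSN ::: "memory");
#else
    __sync_synchronize();  /* GCC/Clang builtin used as a generic fallback */
#endif
}

/* Report which mechanism was selected at compile time. */
const char *hw_fence_kind(void)
{
#ifdef HW_FENCE_INSN
    return HW_FENCE_INSN;
#else
    return "__sync_synchronize";
#endif
}
```

New code would normally prefer the language-level fences over hand-written assembly, since the compiler then selects the instruction for each target.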

Compiler support for hardware memory barriers

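Some compilers support builtins that emit hardware memory barrier instructions:

  • GCC provides __sync_synchronize().[12][13]
  • C11 and C++11 add atomic_thread_fence().
  • Microsoft Visual C++ provides the MemoryBarrier() macro.[14]
  • Oracle Solaris Studio provides __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.[15]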
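A typical use of such a builtin is to prevent a store from being reordered after a later load, the one reordering that every architecture in the table above allows. The sketch below uses the C11 function `atomic_thread_fence`; the names `x`, `y`, `side0` and `side1` are hypothetical. With a GCC that predates C11 support, `__sync_synchronize()` would fill the same role.

```c
#include <stdatomic.h>

static atomic_int x, y;  /* both start at 0 */

/* Each side stores to its own variable, issues a full fence, and then loads
   the other side's variable. */
int side0(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full hardware barrier */
    return atomic_load_explicit(&y, memory_order_relaxed);
}

int side1(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* full hardware barrier */
    return atomic_load_explicit(&x, memory_order_relaxed);
}
```

When `side0` and `side1` run on different threads, the fences rule out the outcome in which both return 0. Without the fences that outcome is possible even on x86, because each store may still sit in a store buffer while the following load executes.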

See also

References

  1. Reordering on an Alpha processor by Kourosh Gharachorloo
  2. Memory Ordering in Modern Microprocessors by Paul McKenney
  3. Memory Barriers: a Hardware View for Software Hackers, Figure 5 on Page 16
  4. Table 1. Summary of Memory Ordering, from "Memory Ordering in Modern Microprocessors, Part I"
  5. GCC compiler-gcc.h
  6. ECC compiler-intel.h
  7. Intel(R) C++ Compiler Intrinsics Reference: "Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data."
  8. Visual C++ Language Reference _ReadWriteBarrier
  9. SFENCE — Store Fence
  10. MFENCE — Memory Fence
  11. Data Memory Barrier, Data Synchronization Barrier, and Instruction Synchronization Barrier.
  12. Atomic Builtins
  13. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
  14. MemoryBarrier macro
  15. Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fence

Further reading
