ā” @ruvector/attention
/docs/ecosystem--integrations/ruvectorattention
Native Rust implementation of Flash Attention for transformer computations:
typescriptimport { FlashAttention } from '@ruvector/attention'; const attention = new FlashAttention({ blockSize: 32, // L1 cache optimized dimensions: 384, temperature: 1.0, useCPUOptimizations: true }); // Compute attention with O(N) memory instead of O(N²) const result = attention.attention(queries, keys, values); console.log(`Computed in ${result.computeTimeMs}ms`); // Benchmark against naive implementation const bench = attention.benchmark(512, 384, 5); console.log(`Speedup: ${bench.speedup}x`); console.log(`Memory reduction: ${bench.memoryReduction}x`);
Key Optimizations:
- Block-wise computation (fits L1 cache)
- 8x loop unrolling for dot products
- Top-K sparse attention (12% of keys)
- Two-stage screening for large key sets
- Online softmax for numerical stability