ruflo

⚔ @ruvector/attention

/docs/ecosystem--integrations/ruvectorattention

Native Rust implementation of Flash Attention for transformer computations:

typescript
import { FlashAttention } from '@ruvector/attention';

const attention = new FlashAttention({
  blockSize: 32,      // L1 cache optimized
  dimensions: 384,
  temperature: 1.0,
  useCPUOptimizations: true
});

// Compute attention with O(N) memory instead of O(N²)
const result = attention.attention(queries, keys, values);
console.log(`Computed in ${result.computeTimeMs}ms`);

// Benchmark against naive implementation
const bench = attention.benchmark(512, 384, 5);
console.log(`Speedup: ${bench.speedup}x`);
console.log(`Memory reduction: ${bench.memoryReduction}x`);

Key Optimizations:

  • Block-wise computation (fits L1 cache)
  • 8x loop unrolling for dot products
  • Top-K sparse attention (12% of keys)
  • Two-stage screening for large key sets
  • Online softmax for numerical stability