BittWare GroqCard™ Accelerator

The BittWare GroqCard™ Accelerator is a double-width PCIe ML accelerator designed for easy integration into existing servers. The GroqWare™ suite takes a software-defined hardware approach, providing straightforward deployment paths for deep learning models trained in PyTorch, TensorFlow, and ONNX. The card scales through nine RealScale™ chip-to-chip connections, which allow multiple cards to be deployed as efficiently as one. Furthermore, an internal software-defined network delivers predictable, repeatable performance with no run-to-run variation. The GroqCard has been qualified for use with the SMC AS-4124GS-TNR and Dell R750xa. The HPE DL385 Gen 10 Plus has been tested, but the full server interop exercise was not completed. In addition, Liqid has qualified the GroqCard in its chassis with up to 16 GroqCards. Using the GroqCard in other server models is at the user’s risk.

GroqChip™ Processor

The fully deterministic GroqChip processor is the core of scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip reduces data movement to deliver predictable, low-latency performance without bottlenecks. This standalone chip provides flexible integration into compute-intensive applications. The architecture is much simpler than a GPU and is designed with a software-first focus, making it easier to program and providing predictable performance with lower latency.

GroqWare™ Suite

GroqWare Suite is a comprehensive and versatile software stack designed to accelerate a variety of HPC and ML workloads. Composed of the Groq™ Compiler, Groq API, and Utilities, the suite eases deployment with an open-source driver/runtime and support for industry-standard AI/ML frameworks. The GroqFlow™ Tool Chain (included in the GroqWare Suite) enables a single line of PyTorch or TensorFlow code to import and transform existing models through a fully automated toolchain to run on Groq hardware.

Features

Fully deterministic processor
Predictable and repeatable performance with no run-to-run variation
End-to-end on-chip protection
Improves uptime and reliability with error-correction code (ECC) protection throughout the entire GroqChip™ data path
230MB of on-die memory
Large globally sharable SRAM for high-bandwidth, low-latency access to model parameters without the need for external memory

9x RealScale chip-to-chip connectors
Near-linear multi-server and multi-rack scalability without the need for external switches
Up to 80TB/s on-die memory bandwidth
Massive concurrency and data parallelism for bandwidth-sensitive applications
PCIe Gen4 x16 interface
Up to 31.5GB/s of bandwidth in each direction over an industry-standard interface for fast device and network connections
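
The 31.5GB/s figure for the PCIe Gen4 x16 interface can be reproduced from the standard PCIe Gen4 link parameters: 16 GT/s per lane with 128b/130b encoding. The short Python sketch below is only an illustration of that arithmetic. The function name and its parameters are hypothetical, and the result is a theoretical per-direction rate that ignores protocol overhead, so delivered throughput will be lower.

```python
def pcie_bandwidth_gb_per_s(transfer_rate_gt_s: float, lanes: int,
                            payload_bits: int = 128, frame_bits: int = 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 10^9 bytes).

    Hypothetical helper for illustration; it models line encoding only,
    not packet headers or flow-control overhead.
    """
    # Each lane carries payload_bits of data for every frame_bits transferred.
    usable_gbit_per_lane = transfer_rate_gt_s * payload_bits / frame_bits
    # Sum over all lanes, then convert bits to bytes.
    return usable_gbit_per_lane * lanes / 8


# PCIe Gen4: 16 GT/s per lane, x16 link, 128b/130b encoding.
gen4_x16 = pcie_bandwidth_gb_per_s(16.0, 16)
print(f"PCIe Gen4 x16: {gen4_x16:.1f} GB/s per direction")  # prints 31.5
```

Under these assumptions the link moves about 31.5GB/s in each direction.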

Applications

Financial
Science and government
Generative AI

Industrial
Oil and gas

Specifications

Dual width, full height, 3/4 length PCI Express Gen4 x16 adapter form factor
Performance of up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900MHz
Memory
- 230MB SRAM per chip
- Up to 80TB/s on-die memory bandwidth

Chip scaling: up to 9x RealScale chip-to-chip connectors
Numerics
- INT8, INT16, INT32 and TruePoint™ technology
- MXM: FP32
- VXM: FP16, FP32
Power – Max: 375W; TDP: 275W; Typical: 240W
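
For rough sizing, the memory and performance figures above can be combined into a quick estimate of how many model parameters fit in one chip's SRAM and how many chips would be needed to hold a given model. The Python sketch below is a back-of-the-envelope illustration only, not a Groq tool. It assumes 1MB = 10^6 bytes, counts weights only (no activations, instructions, or compiler overhead), and uses hypothetical function names.

```python
import math

# Figures taken from the specifications above.
SRAM_BYTES_PER_CHIP = 230_000_000   # 230MB, assuming 1MB = 10^6 bytes
PEAK_INT8_OPS_PER_S = 750e12        # 750 TOPS (INT8)
PEAK_SRAM_BYTES_PER_S = 80e12       # 80TB/s on-die memory bandwidth

# Storage size of one parameter for each numeric format.
BYTES_PER_PARAM = {"INT8": 1, "FP16": 2, "FP32": 4}


def params_per_chip(dtype: str) -> int:
    """Upper bound on parameters that fit in one chip's SRAM (weights only)."""
    return SRAM_BYTES_PER_CHIP // BYTES_PER_PARAM[dtype]


def chips_needed(num_params: int, dtype: str) -> int:
    """Minimum number of chips whose combined SRAM holds the weights."""
    total_bytes = num_params * BYTES_PER_PARAM[dtype]
    return math.ceil(total_bytes / SRAM_BYTES_PER_CHIP)


# Peak INT8 operations available per byte read from on-die memory.
ops_per_byte = PEAK_INT8_OPS_PER_S / PEAK_SRAM_BYTES_PER_S

print(params_per_chip("FP16"))              # 115000000
print(chips_needed(1_000_000_000, "FP16"))  # 9
print(ops_per_byte)                         # 9.375
```

Under these simplifying assumptions, a 1-billion-parameter FP16 model needs the SRAM of at least nine chips. Actual requirements depend on how the compiler maps the model and will generally be higher.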
