Taalas unveils Direct-to-Silicon ASIC for Llama 8B

2049.news · 20.02.2026, 09:23:57

Taalas, a team formed by former Tenstorrent engineers, announced a chip that embeds a model directly into silicon without external memory.

Design and performance

Rather than loading a model into memory, the company implements the model's weights and architecture as the chip itself. This eliminates HBM and complex packaging and simplifies the design of inference hardware.
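Because the weights are part of the chip rather than loaded from external memory, the whole model has to fit on the silicon. A rough estimate of the weight footprint helps show why the low-bit quantization Taalas describes matters. The sketch below assumes a nominal 8 billion parameters, a single bit width for every weight, and no scale factors or other metadata; 16 bits is included only as a common unquantized baseline.

```python
# Rough weight-storage estimate for a model whose weights live on the chip.
# Assumptions (not from Taalas): a nominal 8e9 parameters, every weight
# stored at the same bit width, and no scale factors or metadata counted.
NOMINAL_PARAMS = 8_000_000_000


def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Storage needed for the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 6, 3):
        gb = weight_footprint_gb(NOMINAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```

Under these assumptions the weights shrink from 16 GB at 16 bits to 6 GB at 6 bits and 3 GB at 3 bits.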

Among the performance figures Taalas reports is a throughput of 17,000 tokens per second on Llama 3.1 8B, which the company says is an order of magnitude faster than current state-of-the-art GPUs.
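To put the throughput figure in latency terms, the arithmetic below converts tokens per second into average time per token and total generation time. It assumes the 17,000 tokens/s figure applies to a single generation stream at a constant rate, which the announcement does not specify, and it ignores prompt processing.

```python
# Latency figures derived from the reported 17,000 tokens/s on Llama 3.1 8B.
# Assumption: the rate applies to one generation stream at a constant rate;
# prompt processing time is ignored.
REPORTED_TOKENS_PER_SECOND = 17_000


def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1_000_000 / tokens_per_second


def generation_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant rate, in milliseconds."""
    return num_tokens / tokens_per_second * 1_000


if __name__ == "__main__":
    rate = REPORTED_TOKENS_PER_SECOND
    print(f"{per_token_latency_us(rate):.1f} us per token")
    print(f"{generation_time_ms(1_000, rate):.1f} ms for 1,000 tokens")
```

At the reported rate, each token takes about 59 microseconds on average, so a 1,000-token response would need roughly 59 ms of generation time.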

  • Production cost: Taalas claims the chip costs one-twentieth as much to produce as comparable GPU hardware.
  • Power consumption: the chip reportedly uses one-tenth the energy of those GPUs for the same workload.

Trade-offs and flexibility

Taalas acknowledges technical compromises: the weights baked into the chip are quantized to 3–6-bit precision, and the demo limits context to 1,000 input tokens and 1,000 output tokens.
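To illustrate what quantizing weights to a few bits involves, the sketch below uses a generic symmetric, per-tensor uniform quantizer, a common textbook scheme chosen here only for illustration. Taalas has not said how it maps its weights to 3–6 bits, and the sample weights are made up. Each weight is divided by a shared scale, rounded to the nearest integer code, and clipped to the signed range the bit width allows; dequantizing multiplies the code back by the scale.

```python
# Minimal symmetric, per-tensor uniform quantizer. Illustrative only: this is
# not Taalas's scheme, which has not been described.
def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax], qmax = 2**(bits-1) - 1."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    # One shared scale so the largest-magnitude weight maps to +/- qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    sample = [0.12, -0.50, 0.33, 0.07, -0.21]  # hypothetical weights
    for bits in (3, 6):
        codes, scale = quantize(sample, bits)
        restored = dequantize(codes, scale)
        max_err = max(abs(a - b) for a, b in zip(sample, restored))
        print(f"{bits}-bit codes={codes} max_error={max_err:.4f}")
```

Fewer bits mean coarser steps between representable values, so the rounding error per weight grows as the bit width drops; this is the accuracy cost the quantization trade-off refers to.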

Although the ASIC is tied to a specific model family, the design retains support for LoRA adapters and a variable context window, which preserves some flexibility for fine-tuning.
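LoRA fits a fixed-weight design because it never modifies the base weights: it adds a low-rank correction alongside them. The sketch below shows the standard LoRA computation, y = W x + (alpha / r) · B (A x), where W stands in for weights fixed in silicon and only the small factors A (r × d_in) and B (d_out × r) change per task. How Taalas actually wires adapters into its hardware is not described, so this is a generic illustration with hypothetical function names.

```python
# Standard LoRA forward pass with a frozen base matrix W. Illustrative only:
# W stands in for weights fixed in silicon; A and B are the swappable adapter.
Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x), where r = len(a) is the adapter rank."""
    rank = len(a)
    base = matvec(w, x)              # fixed path: never changes
    delta = matvec(b, matvec(a, x))  # low-rank update: d_in -> r -> d_out
    return [y + (alpha / rank) * d for y, d in zip(base, delta)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 1.0]]
    x = [2.0, 3.0]
    print(lora_forward(w, a, [[0.0], [0.0]], x, alpha=1.0))
    print(lora_forward(w, a, [[1.0], [0.0]], x, alpha=1.0))
```

With B initialized to zeros, as is standard in LoRA, the adapter initially leaves the base output unchanged; training then adjusts only A and B, which is far less data than the full weight matrix.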

Roadmap

The currently available chip, HC1, implements Llama 3.1 8B. Taalas plans to release a chip for a mid-size model with enhanced reasoning capabilities in spring and to demonstrate a frontier model on second-generation silicon by winter.

Practical notes

Taalas says the hardware already exists and has been demonstrated, presenting the product as more than investor slides while openly acknowledging its inherent architectural constraints.

The combination of high throughput, lower production cost and reduced power consumption could reshape edge and on-premises inference deployment for compatible model families.

