AMD releases AMD-Llama-135m, its first small language model

Thursday, September 19, 2024

AMD recently launched its first “small language model”, AMD-Llama-135m, on the Hugging Face platform. The model has drawn widespread industry attention for its use in speculative decoding.

AMD-Llama-135m has about 135 million parameters, was trained on 670 billion tokens, and is released under the Apache 2.0 open source license, which gives users broad freedom to reuse and modify it. According to AMD, the model's main use is “speculative decoding”. The basic principle is that a small draft model generates a set of candidate tokens, and a larger target model then verifies them. Because the target model checks every candidate, the accuracy of the output is preserved, and each forward pass of the target model can yield multiple tokens instead of one, which significantly improves efficiency. AMD also reports a lower RAM footprint than conventional decoding without compromising performance, which makes more efficient use of computational and storage resources.
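The draft-and-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration, not AMD's implementation: `target_next` and `draft_next` are hypothetical toy functions standing in for real models, tokens are plain integers, and verification uses simple greedy matching. In a real system the target model would score all candidates in one batched forward pass; here that is only simulated and counted.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]
VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(seq: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next token for a sequence."""
    return (sum(seq) * 31 + len(seq)) % VOCAB


def draft_next(seq: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target most of the time."""
    guess = target_next(seq)
    return (guess + 1) % VOCAB if len(seq) % 4 == 3 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    draft: NextToken, target: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (sequence, target verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes k candidate tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. The target model verifies the candidates. A real model would do
        #    this in a single batched forward pass; we count it as one pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # keep the target's token either way
            if expected != tok:
                break  # first mismatch: the correction replaces the rest
        else:
            # Every candidate matched, so the same pass yields a bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
spec_out, passes = speculative_decode(draft_next, target_next, prompt, 20)
print(spec_out == baseline, passes)
```

With greedy verification the speculative output is identical to what the target model would produce alone, while the number of target passes is smaller than the number of generated tokens whenever the draft model guesses well.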

The release of AMD-Llama-135m marks an important step for AMD in the field of AI. AMD intends to continue its research, development, and innovation in AI technology to provide smarter, more efficient, and more reliable solutions for users around the world.
