(RTTNews) – As the artificial intelligence race intensifies, Meta Platforms (META) is working on a state-of-the-art multi-modal large language model named Chameleon.
According to the company’s research paper, the proposed LLM can single-handedly perform tasks that previously required separate models, and can integrate information better than its predecessors.
The paper noted that Chameleon uses an ‘early-fusion token-based mixed-modal’ architecture, under which the model learns from a combination of images, code, text, and other inputs. Additionally, it uses a mix of images, text and code tokens to create sequences.
“Chameleon’s unified token space allows it to seamlessly reason over and generate interleaved image and text sequences, without the need for modality-specific components,” the research paper stated.
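The quoted idea, one shared token space for every modality, can be illustrated with a toy sketch. All names and sizes below are hypothetical and this is not Meta's code; it only shows how text tokens and discrete image codes can live in a single vocabulary so the model sees one uniform integer sequence:

```python
# Toy illustration of an early-fusion, token-based mixed-modal sequence.
# Vocabulary sizes are made up for this example; not Meta's actual values.
TEXT_VOCAB_SIZE = 65_536      # hypothetical text vocabulary
IMAGE_CODEBOOK_SIZE = 8_192   # hypothetical discrete image codebook

def text_tokens(text):
    """Stand-in text tokenizer: one id per character, within the text range."""
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def image_tokens(patches):
    """Stand-in image quantizer: maps each patch value to a codebook id,
    offset past the text vocabulary so both modalities share one token space."""
    return [TEXT_VOCAB_SIZE + (p % IMAGE_CODEBOOK_SIZE) for p in patches]

# An interleaved sequence: text, then image codes, then more text,
# all drawn from a single unified vocabulary.
sequence = (
    text_tokens("A photo of a cat: ")
    + image_tokens([17, 302, 4501, 99])   # fake quantized image patches
    + text_tokens(" sitting on a mat.")
)

# Every position is just an integer token; no modality-specific branches.
assert all(isinstance(t, int) for t in sequence)
```

Because the image codes are offset beyond the text range, a single transformer can attend over, and generate, both kinds of tokens without separate encoders or decoders, which is the point the paper's quote makes.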
The latest model is trained in two stages using a dataset of 4.4 trillion tokens of text, image-text combinations, and sequences of interwoven text and images. The researchers trained two versions of Chameleon, one with 7 billion parameters and one with 34 billion, for more than 5 million hours on Nvidia A100 80GB GPUs.
Meanwhile, Meta's competitors have made moves of their own: OpenAI recently launched GPT-4o, and Microsoft (MSFT) introduced its MAI-1 model a few weeks ago.
The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc.