
Meta got caught gaming AI benchmarks with Llama 4

Over the weekend, Meta dropped two new Llama 4 models: a smaller model named Scout, and Maverick, a mid-size model that the company claims can beat GPT-4o and Gemini 2.0 across “a wide range of widely reported benchmarks.”

Maverick quickly secured the number-two spot on LMArena, the AI benchmark site where people compare outputs from different systems and vote on the best one. In its press release, Meta highlighted Maverick’s ELO score of 1417, which placed it above OpenAI’s 4o and just under Gemini 2.5 Pro. (A higher ELO score means the model wins more often in head-to-head matchups against competitors in the arena.)
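As a rough illustration of what those ratings imply, here is a minimal sketch of the standard Elo expected-score formula. The function name and the comparison rating of 1380 are hypothetical; this is not LMArena’s actual code or data.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo formula: the probability that A beats B head-to-head."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Hypothetical example: a model rated 1417 facing one rated 1380
# would be expected to win about 55 percent of head-to-head votes.
print(f"{elo_expected_score(1417, 1380):.2f}")  # ~0.55
```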

The achievement seemed to position Llama 4 as a serious challenger to the state-of-the-art closed models from OpenAI, Anthropic, and Google. Then AI researchers digging through Meta’s documentation discovered something unusual.

In the fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as the one available to the public. According to Meta’s own materials, the company deployed an “experimental chat version” of Maverick that was “optimized for conversationality,” TechCrunch first reported.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. “Meta should have made it clearer that ‘Llama-4-Maverick-03-26-Experimental’ was a customized model optimized for human preference. As a result, we are updating our leaderboard policies to reinforce our commitment to fair, reproducible evaluations so this confusion doesn’t occur in the future.”

Meta spokesperson Ashley Gabriel said in an emailed statement, “We experiment with all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat-optimized version we experimented with that also performs well on LMArena,” Gabriel said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”

While what Meta did with Maverick isn’t explicitly against LMArena’s rules, the site has shared concerns about gaming the system and has taken steps to “prevent overfitting and benchmark leakage.” When companies can submit specially tuned versions of their models for testing while releasing different versions to the public, benchmark rankings like LMArena’s become less meaningful as indicators of real-world performance.

“This is the most respected general benchmark because all of the others suck,” says independent AI researcher Simon Willison. “When Llama 4 came out, the fact that it placed second in the arena, just after Gemini 2.5 Pro, really impressed me, and I’m kicking myself for not reading the small print.”

Shortly after Meta released Maverick and Scout, the AI community began discussing a rumor that Meta had trained its Llama 4 models to perform better on benchmarks while hiding their real limitations. Ahmad Al-Dahle, Meta’s VP of generative AI, addressed the accusations in a post on X: “We’ve also heard claims that we trained on test sets. That’s simply not true, and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.”

“A very confusing release in general”

Some also noticed that Llama 4 was released at an odd time. Saturday is not typically when big AI news drops. When asked on Threads why Llama 4 was released over the weekend, Meta CEO Mark Zuckerberg replied: “That’s when it was ready.”

“It’s a very confusing release in general,” says Willison, who closely follows and documents AI models. “The model score we got there is completely worthless to me. I can’t even use the model that got the high score.”

Meta’s path to releasing Llama 4 wasn’t exactly smooth. According to a recent report from The Information, the company pushed the launch back multiple times because the model failed to meet internal expectations. Those expectations ran especially high after DeepSeek, an open-source AI startup from China, released an open-weight model that generated a ton of buzz.

Ultimately, using an optimized model on LMArena puts developers in a difficult position. When selecting models like Llama 4 for their applications, they naturally look to benchmarks for guidance. But as in Maverick’s case, those benchmarks can reflect capabilities that aren’t actually available in the models the public can access.

As AI development accelerates, this episode shows how benchmarks are becoming battlegrounds. It also shows how eager Meta is to be seen as an AI leader, even if that means gaming the system.

Update, April 7: This story was updated to add Meta’s statement.
