New study accuses LM Arena of gaming its popular AI benchmark

The popular AI vibe test may not be as fair as it seems.

The rapid proliferation of AI chatbots has made it difficult to know which models are actually improving and which are falling behind. Traditional academic benchmarks only tell you so much, which has led many to lean on vibes-based analysis from LM Arena. However, a new study claims this popular AI ranking platform is rife with unfair practices, favoring large companies that just so happen to rank near the top of the index. The site's operators counter that the study draws the wrong conclusions.

LM Arena was created in 2023 as a research project at UC Berkeley. The pitch is simple—users feed a prompt into two unidentified AI models in the "Chatbot Arena" and evaluate the outputs to vote on the one they like more. This data is aggregated in the LM Arena leaderboard that shows which models people like the most, which can help track improvements in AI models.
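To make the voting mechanic concrete, here is a minimal sketch of how head-to-head votes could be turned into a ranking. It assumes an Elo-style rating update, which is one common way to score pairwise comparisons; the article does not say how LM Arena actually aggregates its votes, and the model names, starting rating, and step size below are all hypothetical.

```python
from collections import defaultdict

K = 32          # assumed step size for each rating update
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a model rated r_a beats one rated r_b (Elo model)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_models(votes):
    """Build a leaderboard from (winner, loser) pairs of blind votes."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in votes:
        exp_win = expected_score(ratings[winner], ratings[loser])
        delta = K * (1.0 - exp_win)  # an upset win moves ratings more
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model)
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
leaderboard = rank_models(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.0f}")
```

In a scheme like this, a model's position depends on which matchups it gets and how many votes it collects, not only on how good its answers are.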

Companies are paying more attention to this ranking as the AI market heats up. Google noted when it released Gemini 2.5 Pro that the model debuted at the top of the LM Arena leaderboard, where it remains to this day. Meanwhile, DeepSeek's strong performance in the Chatbot Arena earlier this year helped to catapult it to the upper echelons of the LLM race.

Don’t watermark your legal PDFs with purple dragons in suits

There’s a time and there’s a place. Federal court is neither.

Being a model citizen and a person of taste, you probably don't need this reminder, but some others do: Federal judges do not like it when lawyers electronically watermark every page of their legal PDFs with a gigantic image—purchased for $20 online—of a purple dragon wearing a suit and tie. Not even if your firm's name is "Dragon Lawyers."

Federal Magistrate Judge Ray Kent of the Western District of Michigan was unamused by a recent complaint (PDF) that prominently featured the aubergine wyrm.

"Each page of plaintiff’s complaint appears on an e-filing which is dominated by a large multi-colored cartoon dragon dressed in a suit," he wrote on April 28 (PDF). "Use of this dragon cartoon logo is not only distracting, it is juvenile and impertinent. The Court is not a cartoon."

Epic Games Store moves to attract app developers with 0% developer share on the first $1 million in revenue

Last year Apple responded to a ruling in an antitrust case that required the company to allow app developers to collect payments from users without going through App Store by allowing developers to provide links to alternate payment methods in their apps. But the system that Apple rolled out was pretty onerous. Users who followed those links […]

The post Epic Games Store moves to attract app developers with 0% developer share on the first $1 million in revenue appeared first on Liliputing.

New material may help us build Predator-style thermal vision specs

Films of IR-sensitive material only tens of nanometers thick are tough to make.

Military-grade infrared vision goggles use detectors made of mercury cadmium telluride, a semiconducting material that’s particularly sensitive to infrared radiation. Unfortunately, you need to keep detectors that use this material extremely cool—roughly at liquid nitrogen temperatures—for them to work. “Their cooling systems are very bulky and very heavy,” says Xinyuan Zhang, an MIT researcher and the lead author of a new study that looked for alternative IR-sensitive materials.

Added weight was a sacrifice the manufacturers of high-end night-vision systems were mostly willing to make because cooling-free alternatives offered much worse performance. To fix this, the MIT researchers developed a new ultra-thin material that can sense infrared radiation without any cooling and outperforms cooled detectors at the same time. And they want to use it to turn thermal vision goggles into thermal vision spectacles.

Staying cool

Cooling-free infrared detectors have been around since before World War II and mostly relied on pyroelectric materials like tourmaline that change their temperature upon absorbing infrared radiation. This temperature change, in turn, generates an electric current that can be measured to get a readout from the detector. Although these materials worked, they had their issues. Operating at room temperature caused a lot of random atomic motion in the pyroelectric material, which introduced electrical noise that made it difficult to detect faint infrared signals.

Sen. Susan Collins blasts Trump for cuts to scientific research

New study shows budget cuts to research would significantly hurt the economy in the long run.

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here.

Sen. Susan Collins (R-Maine) kicked off a Wednesday hearing by criticizing the Trump administration for cutting science funding, firing federal scientists, and triggering policy uncertainties that she said threaten to undermine the foundation for America’s global leadership.

Collins, chair of the Senate Appropriations Committee, said the administration’s abrupt cancellation of grants and layoffs of scientists have little or no justification. “These actions put our leadership in biomedical innovation at real risk and must be reversed,” she said.

The 2025 Aston Martin Vantage: Achingly beautiful and thrilling to drive

It took time to get confident with the Vantage, and it did not like rain.

I'm not sure I can remember another car that took as long to get comfortable with as the 2025 Aston Martin Vantage. It's an achingly beautiful machine, from the outside at least. And by the week's end, I had my first glimpses into how it can deliver driver engagement with the best of them. By then I'd also gotten over my disappointment with the interior and, sadly, had the "British cars with crap electronics" stereotype confirmed once more.

Painted the same striking shade of Podium Green as one of Formula 1's safety cars, the Vantage is one of the most eye-catching cars we've tested in a while. In person, that giant front grille dominates things, but all around the car you see the influence of the aerodynamicists and engineers who want to bend the airflow to their needs: cutting drag here, adding downforce there, feeding a cooling duct or venting waste heat. The way the wheel arches stretch out from the doors reminds me of the One-77 supercar from a few years ago, but it's all a thoroughly modern shape here.

That sculpted and vented hood contains the Vantage's 4.0 L twin-turbo V8. With 656 hp (490 kW) and 590 lb-ft (800 Nm), it's the most powerful Vantage to date, eclipsing the time the company bolted some Eaton superchargers to a 2-ton Chesterfield sofa on wheels. ZF's excellent 8HP automatic transmission sends that power and torque to the rear wheels, which arrived wearing Vantage-specific versions of Michelin's latest Pilot Sport 5 tires.

FEVM FA-EX9 mini PC with AMD Strix Halo coming soon

Chinese mini PC maker FEVM has a habit of cramming high-performance hardware into compact designs. Last year the company introduced a Mac Studio clone with up to an Intel Core i9-14900KF processor and NVIDIA RTX 4090M graphics. And now FEVM is preparing to launch a mini PC with an AMD Ryzen AI Max+ 395 Strix […]

The post FEVM FA-EX9 mini PC with AMD Strix Halo coming soon appeared first on Liliputing.

Neanderthals invented their own bone weapon technology by 80,000 years ago

Neanderthals used sleek bone projectiles to hunt big game.

Archaeologists recently unearthed a bone projectile point someone dropped on a cave floor between 80,000 and 70,000 years ago—which, based on its location, means that said someone must have been a Neanderthal.

The point (or in paleoarchaeologist Liubov V. Golovanova and colleagues’ super-technical archaeological terms, “a unique pointy bone artifact”) is the oldest bone tip from a hunting weapon ever found in Europe. It’s also evidence that Neanderthals figured out how to shape bone into smooth, aerodynamic projectiles on their own, without needing to copy those upstart Homo sapiens. Along with the bone tools, jewelry, and even rope that archaeologists have found at other Neanderthal sites, the projectile is one more clue pointing to the fact that Neanderthals were actually pretty sharp.

Getting to the point

Archaeologists found the bone point in Mezmaiskaya Cave, high in the Caucasus Mountains. (Mezmaiskaya is also home to the remains of three Neanderthals who lived around 90,000 years ago; anthropologists sequenced samples of their DNA in earlier studies.) Herbivore teeth from the same layer of sediment dated to around 70,000 years old, and the bone point’s position near the bottom of that layer probably puts its age between 80,000 and 70,000 years. That makes it the oldest bone projectile point ever found in Europe (so far).

Google is quietly testing ads in AI chatbots

Unsurprisingly, an advertising company is finding more places to run ads.

Google has built an enormously successful business around the idea of putting ads in search results. Its most recent quarterly results showed the company made more than $50 billion from search ads, but what happens if AI becomes the dominant form of finding information? Google is preparing for that possibility by testing chatbot ads, but you won't see them in Google's Gemini AI—at least not yet.

A report from Bloomberg describes how Google began working on a plan in 2024 to adapt AdSense ads to a chatbot experience. Usually, AdSense ads appear in search results and are scattered around websites. Google ran a small test of chatbot ads late last year, partnering with select AI startups, including AI search apps iAsk and Liner.

The testing must have gone well because Google is now allowing more chatbot makers to sign up for AdSense. "AdSense for Search is available for websites that want to show relevant ads in their conversational AI experiences," said a Google spokesperson.
