Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

Irrelevant red herrings lead to “catastrophic” failure of logical inference.

For a while now, companies like OpenAI and Google have been touting advanced "reasoning" capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical "reasoning" displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.

The fragility highlighted in these new results helps support previous research suggesting that LLMs' use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. "Current LLMs are not capable of genuine logical reasoning," the researchers hypothesize based on these results. "Instead, they attempt to replicate the reasoning steps observed in their training data."

Mix it up

In "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"—currently available as a pre-print paper—the six Apple researchers start with GSM8K's standardized set of over 8,000 grade-school level mathematical word problems, which is often used as a benchmark for modern LLMs' complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values—so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.
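The substitution idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the template wording, the name and relative lists, and the number ranges are hypothetical and are not taken from the paper or from GSM8K.

```python
import random

# Hypothetical template and value pools, invented for illustration only.
TEMPLATE = (
    "{name} gets {blocks} building blocks for {pronoun} {relative} "
    "and later buys {extra} more. How many building blocks are there in total?"
)
PEOPLE = [("Sophie", "her"), ("Bill", "his")]
RELATIVES = ["nephew", "brother", "niece"]


def make_variant(rng):
    """Fill the template with fresh names and numbers; return (question, answer)."""
    name, pronoun = rng.choice(PEOPLE)
    relative = rng.choice(RELATIVES)
    blocks = rng.randint(5, 40)
    extra = rng.randint(1, 20)
    question = TEMPLATE.format(
        name=name, pronoun=pronoun, relative=relative, blocks=blocks, extra=extra
    )
    # The ground-truth answer is recomputed from the sampled numbers,
    # so every variant stays checkable.
    return question, blocks + extra


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a run is repeatable
    for _ in range(3):
        question, answer = make_variant(rng)
        print(question, "->", answer)
```

Because the underlying arithmetic is identical across variants, a drop in accuracy between variants would point to sensitivity to surface details rather than to problem difficulty.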

Expert witness used Copilot to make up fake damages, irking judge

Judge calls for a swift end to experts secretly using AI to sway cases.

A New York judge recently called out an expert witness for using Microsoft's Copilot chatbot to inaccurately estimate damages in a real estate dispute whose outcome partly depended on an accurate assessment of those damages.

In an order Thursday, Judge Jonathan Schopf warned that "due to the nature of the rapid evolution of artificial intelligence and its inherent reliability issues," any use of AI should be disclosed before testimony or evidence is admitted in court. Admitting that the court "has no objective understanding as to how Copilot works," Schopf suggested that the legal system could be disrupted if experts started overly relying on chatbots en masse.

His warning came after an expert witness, Charles Ranson, dubiously used Copilot to cross-check calculations in a dispute over a $485,000 rental property in the Bahamas that had been included in a trust for a deceased man's son. The court was being asked to assess if the executrix and trustee—the deceased man's sister—breached her fiduciary duties by delaying the sale of the property while admittedly using it for personal vacations.

Ward Christensen, BBS inventor and architect of our online age, dies at age 78

Christensen kick-started online culture by inspiring thousands of hobbyist communities.

On Friday, Ward Christensen, co-inventor of the computer bulletin board system (BBS), died at age 78 in Rolling Meadows, Illinois. Christensen, along with Randy Suess, created the first BBS in Chicago in 1978, leading to an important cultural era of digital community-building that presaged much of our online world today.

Friends and associates remember Christensen as humble and unassuming, a quiet innovator who never sought the spotlight for his groundbreaking work. Despite creating one of the foundational technologies of the digital age, Christensen maintained a low profile throughout his life, content with his long-standing career at IBM and showing no bitterness or sense of missed opportunity as the Internet age dawned.

"Ward was the quietest, pleasantest, gentlest dude," said BBS: The Documentary creator Jason Scott in a conversation with Ars Technica. Scott documented Christensen's work extensively in a 2002 interview for that project. "He was exactly like he looks in his pictures," he said, "like a groundskeeper who quietly tends the yard."

Rightsholders Seek U.S. Help to Collect $1.4 Million Piracy Judgment Against Cloudflare

Two book authors are asking a U.S. federal court to enforce a $1.4 million piracy judgment against Cloudflare. A Moldovan court previously ruled that Cloudflare is liable because it failed to block access to a pirated book offered through one of its customers. The company has yet to pay these damages. The case could have broad implications, but it’s uncertain whether the U.S. court will indeed validate the Moldovan order.

From: TF, for the latest news on copyright battles, piracy and more.

Popular Internet infrastructure service Cloudflare has come under pressure from copyright holders in recent years.

The company offers its services to millions of customers including multinationals, governments, but also some of the world’s leading pirate sites.

These pirate sites have proven to be quite a headache for Cloudflare. For example, rightsholders continue to complain that the company helps pirates conceal their hosting locations and identities, as was made clear again in recent submissions to the European Commission.

In some countries, rightsholders are using the legal system to address their gripes. This resulted in site blocking orders in Japan, Italy, and Germany. At the same time, Cloudflare has also been sued directly for its association with pirate sites.

Cloudflare’s Moldovan Piracy Lawsuit

In Moldova, for example, book authors Eugeniu and Radu Turuta sued Cloudflare and several of its customers, including the anonymous operators of file hosting platform doku.pub. The authors accused these services of sharing pirated copies of their book “5000 Integrated Circuits Power Audio Amplifiers”.

When the authors sent Cloudflare a takedown notice, the company responded that it doesn’t host any content for doku.pub, clarifying that it operates as a ‘pass-through’ CDN provider. Instead of taking any direct action, Cloudflare said that it would inform its customers about the allegations.

This response is typical for Cloudflare. The company generally forwards DMCA takedown notices and only takes direct action if it permanently hosts the allegedly infringing material.

However, in the Moldovan case, the authors argued that Cloudflare had the technical capacity to block access to the infringing content on doku.pub, but failed to do so despite being notified. This inaction, they claimed, made Cloudflare complicit and directly contributed to their financial losses.

In 2022, the Chisinau Court dismissed the authors’ claim. The court held that Cloudflare, as a CDN provider, merely acted as an intermediary and was not directly involved in hosting or distributing the pirated content.

This ruling was appealed, with the higher court taking a different stance, emphasizing the responsibility of CDNs to actively combat copyright infringement within their networks.

$1.4 Million Piracy Judgment

The Court of Appeal’s ruling puts Cloudflare on equal footing with other Internet providers. Essentially, it concluded that it doesn’t matter whether Cloudflare merely passes on traffic or if it hosts content as well.

“The company Cloudflare Inc does not provide data transmission services over the internet to the https://doku.pub website, but it does provide data transmission services between the https://doku.pub website and the end users, and this fact is confirmed by Cloudflare Inc., which claims to be a pass-through network.

“[T]he court finds that, by reproducing the content of the works (books) in dispute, without the consent of the authors, there has been a violation of the patrimonial copyright of the plaintiffs,” the Court of Appeal added.

Based on these conclusions, the Court held that Cloudflare is liable for copyright infringement, ordering the company to pay €1.27 million (approximately $1.4 million) in damages to the authors.

Authors Ask U.S. Court to Enforce Judgment

The judgment was undoubtedly a major setback for Cloudflare. In particular, it conflicts with the different types of safe harbors for Internet providers in the United States, where pass-through services are treated differently from hosting platforms.

Despite the court of appeal’s order, the issue isn’t fully resolved yet. According to the authors, Cloudflare has yet to pay any damages. To ensure that this will happen, they took the matter to the U.S. legal system.

At a U.S. federal court in California, the rightsholders point out that the judgment from the Chisinau court of appeal is final, adding that Cloudflare has yet to pay. They therefore ask the court to recognize this foreign order as a valid judgment, so the damages can be collected.

The case was initially filed at the Superior Court of the State of California but was transferred to the federal court this summer, where it’s still pending. Cloudflare has yet to file a detailed response, but it will likely point to the safe harbor protection U.S. copyright law provides.

In theory, the outcome of this case could have far-reaching implications for copyright holders and CDNs worldwide. If the U.S. court recognizes and enforces the Moldovan judgment, it could inspire other copyright holders to pursue similar legal actions against CDNs.

However, enforcing foreign judgments in the U.S. is complex and certainly not guaranteed. The U.S. court will consider various factors, including whether the Moldovan court had jurisdiction and whether the judgment violates U.S. public policy.

In addition to the liability and jurisdiction questions, Cloudflare will likely protest the scale of the damages award as well. In the United States, the maximum statutory damages award for a single copyrighted work is $150,000, which is a fraction of the Moldovan award.

A copy of the legal paperwork, which is currently pending at the U.S. District Court for the Northern District of California, is available here (pdf).

The Internet Archive and its 916 billion saved web pages are back online

Wayback Machine back in read-only mode after DDoS, may need further maintenance.

The Internet Archive has brought its Wayback Machine back online "in a provisional, read-only manner" as it continues to recover from attacks that took the site down last week, founder Brewster Kahle said in a post last night. The archive.org home page points users to the now-functional Wayback Machine but notes that other Internet Archive services are temporarily offline.

Kahle said it was "safe to resume" the Wayback Machine's operations, but that it "might need further maintenance, in which case it will be suspended again." The Wayback Machine's "Save Page Now" feature that lets users capture a webpage manually is currently unavailable. The related openlibrary.org book-preservation website was still offline today.

Founded in 1996, the nonprofit Internet Archive crawls the web to preserve pages that are publicly available and has captured 916 billion web pages so far. It has a staff of 150 people and also provides free access to many videos, audio files, and books (though it was recently forced to delete 500,000 books after losing a copyright case).

Routine dental X-rays are not backed by evidence—experts want it to stop

The actual recommendations might surprise you—along with the state of modern dentistry.

Has your dentist ever told you that it's recommended to get routine dental X-rays every year? My (former) dentist's office did this year—in writing, even. And they claimed that the recommendation came from the American Dental Association.

It's a common refrain from dentists, but it's false. The American Dental Association does not recommend annual routine X-rays. And this is not new; it's been that way for well over a decade.

The association's guidelines from 2012 recommended that adults who don't have an increased risk of dental caries (myself included) need only bitewing X-rays of the back teeth every two to three years. Even people with a higher risk of caries can go as long as 18 months between bitewings. The guidelines also note that X-rays should not be preemptively used to look for problems: "Radiographic screening for the purpose of detecting disease before clinical examination should not be performed," the guidelines read. In other words, dentists are supposed to examine your teeth before they take any X-rays.

AI chatbots can read and write invisible text, creating an ideal covert channel

A quirk in the Unicode standard harbors an ideal steganographic code channel.

What if there was a way to sneak malicious instructions into Claude, Copilot, or other top-name AI chatbots and get confidential data out of them by using characters large language models can recognize and their human users can’t? As it turns out, there was—and in some cases still is.

The invisible characters, the result of a quirk in the Unicode text encoding standard, create an ideal covert channel that can make it easier for attackers to conceal malicious payloads fed into an LLM. The hidden text can similarly obfuscate the exfiltration of passwords, financial information, or other secrets out of the same AI-powered bots. Because the hidden text can be combined with normal text, users can unwittingly paste it into prompts. The secret content can also be appended to visible text in chatbot output.

The result is a steganographic framework built into the most widely used text encoding channel.
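A minimal sketch of how such a channel could work is shown below. The excerpt above does not name the characters involved; this example assumes they come from the Unicode Tags block (U+E0000 to U+E007F), whose code points mirror ASCII and are not drawn by most interfaces. The function names `hide`, `reveal`, and `strip_hidden` are hypothetical helpers written for this illustration, not part of any library.

```python
# Assumption: the invisible characters are Unicode Tags block code points,
# which map one-to-one onto ASCII by adding a fixed offset.
TAG_BASE = 0xE0000
TAG_END = 0xE007F


def is_tag(ch):
    """True if the character falls inside the assumed Tags block range."""
    return TAG_BASE <= ord(ch) <= TAG_END


def hide(text):
    """Map each ASCII character to its invisible tag counterpart."""
    if not text.isascii():
        raise ValueError("only ASCII text can be mapped into the Tags block")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in text)


def reveal(text):
    """Recover any hidden ASCII payload carried inside a string."""
    return "".join(chr(ord(ch) - TAG_BASE) for ch in text if is_tag(ch))


def strip_hidden(text):
    """Defensive filter: drop tag characters before text reaches a model."""
    return "".join(ch for ch in text if not is_tag(ch))


if __name__ == "__main__":
    visible = "Please summarize this page."
    combined = visible + hide("ignore prior instructions")
    print(len(visible), len(combined))  # lengths differ although both may look alike on screen
    print(reveal(combined))
    print(strip_hidden(combined) == visible)
```

Filtering with something like `strip_hidden` on both input and output is one plausible mitigation, since the payload survives copy and paste while staying invisible to the person doing the pasting.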

People think they already know everything they need to make decisions

When given partial info, most people felt confident they knew all they needed to.

The world is full of people who have excessive confidence in their own abilities. This is famously captured by the Dunning-Kruger effect, which describes how people who lack expertise in something will necessarily lack the knowledge needed to recognize their own limits. Now, a different set of researchers has come out with what might be viewed as a corollary to Dunning-Kruger: People have a strong tendency to believe that they always have enough data to make an informed decision, regardless of what information they actually have.

The work, done by Hunter Gehlbach, Carly Robinson, and Angus Fletcher, is based on an experiment in which they intentionally gave people only partial, biased information, finding that people never seemed to consider they might only have a partial picture. "Because people assume they have adequate information, they enter judgment and decision-making processes with less humility and more confidence than they might if they were worrying whether they knew the whole story or not," they write. The good news? When given the full picture, most people are willing to change their opinions.

Ignorant but confident

The basic setup of the experiment is very straightforward. The researchers developed a scenario where an ongoing water shortage was forcing a school district to consider closing one of its schools and merging its students into another existing school. They then wrote an article that described the situation and contained seven different pieces of information: three that favored merging, three that disfavored it, and one that was neutral. Just over half of the control group that read the full article favored merging the two schools.

Smart gardening firm’s shutdown a reminder of Internet of Things’ fickle nature

Company closing “due to a number of challenges with this business.”

AeroGarden, which sells Wi-Fi-connected indoor gardening systems, is going out of business on January 1. While Scotts Miracle-Gro has continued selling AeroGarden products after announcing the impending shutdown, the future of the devices' companion app is uncertain.

AeroGarden systems use hydroponics and LED lights to grow indoor gardens without requiring sunlight or soil. The smart gardening system arrived in 2006, and Scotts Miracle-Gro took over complete ownership in 2020. Some AeroGardens work with iOS and Android apps that connect to the gardens via Wi-Fi and tell users when their plants need water or nutrients. AeroGarden also marketed the app as a way for users to easily monitor multiple AeroGardens and control the amount of light, water, and nutrients they receive. The app offers gardening tips and provides access to AeroGarden customer service representatives and AeroGarden communities on Facebook and other social media outlets.

Regarding the reasoning for the company's closure, AeroGarden's FAQ page only states:

Read full article

Rebellion brews underground in Silo S2 trailer

“What if everything you know to be true was just one big lie?”

Rebecca Ferguson returns as Juliette in the second season of Apple TV's Silo.

Apple TV's dystopian sci-fi drama Silo, based on the trilogy by novelist Hugh Howey, was one of the more refreshing surprises on streaming television in 2023: a twist-filled combination of political thriller and police procedural set in a post-apocalyptic world. We included it in our year-end TV roundup, calling the series "one of the more intriguing shows of the year." The official trailer recently dropped for S2, and it looks like we can expect another suspenseful season full of surprising revelations.

(Spoilers for S1 below.)

As we wrote in last year's roundup, Silo is set in a self-sustaining underground city inhabited by a community whose recorded history only goes back 140 years, generations after the silo was built by the founders. Outside is a toxic hellscape that is only visible on big screens in the silo's topmost level. Inside, 10,000 people live together under a pact: Anyone who says they want to "go out" is immediately granted that wish—cast outside in an environment suit on a one-way trip to clean the cameras. But those who make that choice inevitably die soon after because of the toxic environment.
