OpenAI’s Sora 2 lets users insert themselves into AI videos with sound

Sora social app launches with deepfake-style “cameos” and feed controls.

On Tuesday, OpenAI announced Sora 2, its second-generation video-synthesis AI model, which can now generate videos in various styles with synchronized dialogue and sound effects, a first for the company. OpenAI also launched a new iOS social app that lets users insert themselves into AI-generated videos through what the company calls "cameos."

OpenAI showcased the new model in an AI-generated video that features a photorealistic version of OpenAI CEO Sam Altman talking to the camera in a slightly unnatural-sounding voice amid fantastical backdrops, like a competitive ride-on duck race and a glowing mushroom garden.

Regarding that voice, the new model can create what OpenAI calls "sophisticated background soundscapes, speech, and sound effects with a high degree of realism." In May, Google's Veo 3 became the first video-synthesis model from a major AI lab to generate synchronized audio as well as video. Just a few days ago, Alibaba released Wan 2.5, an open-weights video model that can generate audio as well. Now OpenAI has joined the audio party with Sora 2.

FCC chairman leads “cruel” vote to take Wi-Fi access away from school kids

FCC Republicans kill funding for Wi-Fi hotspot lending and Wi-Fi on school buses.

The Federal Communications Commission yesterday voted to end funding for two programs designed to help schoolchildren and library patrons access the Internet.

FCC Chairman Brendan Carr claims that Biden-era orders to establish the programs exceeded the FCC's authority. The FCC voted 2-1 to kill the programs, with Republican Olivia Trusty voting with Carr and Democrat Anna Gomez dissenting.

In the previous administration, the FCC expanded the Universal Service Fund's E-Rate program in 2024 to let schools and libraries lend out Wi-Fi hotspots and services that could be used off-premises. The FCC separately decided in 2023 to let the E-Rate program pay for Wi-Fi service on school buses.

Qualcomm’s PC chips are safe from ARM litigation (for now) following court ruling

Last month Qualcomm introduced its 2nd-gen chips for Windows laptops and other PCs, promising big gains in CPU, graphics, and AI performance. But for the past few years there’s been a bit of uncertainty over whether the company would be able to continue making these chips. That’s because chip designer Arm had sued Qualcomm over […]

The post Qualcomm’s PC chips are safe from ARM litigation (for now) following court ruling appeared first on Liliputing.

Judson Althoff: Microsoft places a co-chief at Nadella's side

Strategic restructuring at Microsoft: Judson Althoff takes over day-to-day operations, while Satya Nadella focuses on AI and technical infrastructure. (Microsoft, Business)

Can today’s AI video models accurately model how the real world works?

New research shows highly inconsistent performance on a variety of physical reasoning tasks.

Over the last few months, many AI boosters have grown increasingly interested in generative video models and their apparent ability to show at least limited emergent knowledge of the physical properties of the real world. That kind of learning could underpin a robust version of a so-called "world model," which would represent a major breakthrough in generative AI's practical real-world capabilities.

Recently, researchers at Google DeepMind tried to add some scientific rigor to the question of how well video models can actually learn about the real world from their training data. In the bluntly titled paper "Video Models are Zero-shot Learners and Reasoners," the researchers used Google's Veo 3 model to generate thousands of videos designed to test its abilities across dozens of tasks related to perceiving, modeling, manipulating, and reasoning about the real world.

In the paper, the researchers boldly claim that Veo 3 "can solve a broad variety of tasks it wasn’t explicitly trained for" (that's the "zero-shot" part of the title) and that video models "are on a path to becoming unified, generalist vision foundation models." But digging into the actual results of those experiments, the researchers seem to be grading today's video models on a bit of a curve and assuming future progress will smooth out many of today's highly inconsistent results.
