At MIT, however, they've done just that: researchers can make videos of people apparently saying things that never passed their lips -- in one case, even making someone sing a Japanese song when they don't know a word of the language. That's nothing new: what makes the difference here is that you can't see the join. Placed side by side with videos of the subjects actually saying the words in question, it's impossible to spot the fakes.
As a technologist, I admire the work. All done by computer, of course: between two and four minutes of the subject speaking is analysed, and the positions of the mobile bits of the face for all the components of sounds are extracted. When the computer's given a new sound, it gets the bits of the face in positions corresponding to that sort of sound and superimposes them on the rest of the subject's physog. String enough of those together, and there's your video.
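For the curious, the idea can be reduced to a toy sketch. This is not the MIT researchers' code, and everything in it is an assumption made for illustration: the function names are hypothetical, the sounds are plain labels, and the mouth shapes are strings standing in for image patches. It only shows the two steps described above: file mouth shapes under the sound being made, then look them up for a new soundtrack.

```python
from collections import defaultdict


def build_mouth_library(training_frames):
    """Group recorded mouth shapes by the sound being spoken at the time.

    training_frames is a list of (sound, mouth_shape) pairs; both are
    placeholder strings here, standing in for real audio and image data.
    """
    library = defaultdict(list)
    for sound, mouth_shape in training_frames:
        library[sound].append(mouth_shape)
    return dict(library)


def synthesise(library, new_sounds, base_face):
    """For each new sound, superimpose a stored mouth shape on the base face."""
    frames = []
    for sound in new_sounds:
        if sound not in library:
            # Not enough raw material: the subject never made this sound on tape.
            raise KeyError(f"no footage of the subject making sound {sound!r}")
        mouth = library[sound][0]  # simplest choice: the first recorded example
        frames.append({"face": base_face, "mouth": mouth})
    return frames


# A few seconds of imaginary training footage.
training = [("oo", "rounded"), ("ee", "wide"), ("m", "closed"), ("oo", "rounded-2")]
library = build_mouth_library(training)

# A new soundtrack the subject never actually spoke.
video = synthesise(library, ["m", "oo", "ee"], base_face="neutral")
```

A real system would have to blend between shapes and match lighting and head position; the sketch simply picks the first stored example for each sound, which is why it needs footage of every sound before it can fake anything.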
It has limitations -- you need the soundtrack to start with, and it only works when you have enough raw material of the subject talking to camera, newscaster style. But these aren't significant: the essential magic, that the computer can automatically extract enough of the subject to synthesise and animate an undetectable fake, will in the end be usable in many more situations. Clever stuff.
As a paranoid futurologist, I'm terrified. At first, it doesn't seem that big a deal. We're used to the idea that not only does the camera lie, it can concoct fantasies of Archerian proportions. We're computer-animation savvy, and can admire Hollywood's every digital lurch towards the days when actors really will be mindless, emotionless automata instead of having to pretend. But when we see a video of someone we know saying things we thought they'd never say, our reactions will be more complex.











