The heist is just a preview of how unprepared we are for AI-powered cybercrime.
In what may be the world’s first AI-powered heist, synthetic audio was used to imitate a chief executive’s voice and trick his subordinate into transferring over $240,000 into a secret account, The Wall Street Journal reported last week.
The company’s insurer, Euler Hermes, provided new details to The Washington Post on Wednesday but refused to name the company involved. The company’s managing director was called late one afternoon by a voice that sounded like his superior’s, demanding that he wire money to a Hungarian account to save on “late-payment fines,” with the financial details sent over email during the call. A spokeswoman from Euler Hermes said, “The software was able to imitate the voice, and not only the voice: the tonality, the punctuation, the German accent.”
The thieves behind the voice later called back to demand a second payment, which raised the managing director’s suspicions and led him to call his boss directly. In an email to Euler Hermes, the director said that the synthetic “‘Johannes’ was demanding to speak to me whilst I was still on the phone to the real Johannes!”
Over the past few years, deepfakes have grown increasingly sophisticated. Online platforms fail to detect them, and companies struggle with how to handle the resulting fallout. Because deepfakes constantly evolve, simply detecting them will never be enough: the modern internet guarantees them an audience by monetizing attention and fostering the production of viral content. This past June, convincing deepfakes of Mark Zuckerberg were published to Instagram and left up, shortly after Facebook refused to delete a manipulated video of Nancy Pelosi. There is still no clear consensus on how Facebook should’ve handled that situation or future ones.
All of this is exacerbated by the data monetization models of companies like Facebook and Google. Techno-sociologist Zeynep Tufekci warns that companies like Facebook rely on creating a “persuasion architecture” that “make us more pliable for ads [while] also organizing our political, personal and social information flows.” That core dynamic, combined with the constant evolution of deepfake technology, means this problem will likely get worse across all online platforms unless the companies behind them can be convinced to change their business models.
A Site Faking Jordan Peterson’s Voice Shuts Down After Peterson Decries Deepfakes
The maker of NotJordanPeterson.com, a Jordan Peterson voice simulator that used AI to match his voice to any text input, took the website down after the real Peterson freaked out.
The owner of NotJordanPeterson.com, a website for generating convincing clips of Jordan Peterson saying whatever you want using AI, shut down their creation this week after the real Peterson announced his displeasure and raised the possibility of legal action.
While the site was up, a 21-second recording greeted visitors to the site, saying in Peterson’s voice, “This is not Jordan Peterson. In fact, I’m a neural network designed to sound like Dr. Peterson.”
The clip implored the visitor to type some text into a box; the text would be fed into a neural network trained on hours of Peterson’s actual voice and turned into audio that sounded a lot like the real thing.
Several media outlets tested the program and published the results, making the simulated Peterson recite feminist texts and vulgarities. Aside from the outrageous content, the results sounded a lot like the real thing.
It turns out that Peterson (a controversial Canadian professor known for his lectures defending the patriarchy and denying the existence of white privilege while decrying “postmodern neo-Marxists”) did not find NotJordanPeterson.com flattering.
“Something very strange and disturbing happened to me this week,” Peterson wrote on his website. “If it was just relevant to me, it wouldn’t be that important (except perhaps to me), and I wouldn’t be writing this column about it. But it’s something that is likely more important and more ominous than we can even imagine.”
He then spends over 1,300 words decrying deepfakes (algorithmically generated face-swapped videos, as distinct from fake audio, though sometimes combined with fake voices) as a threat to politics, personal privacy, and the veracity of evidence, and ends with a vague allusion to making fake audio and video illegal. Or, possibly, suing creators.
“Wake up. The sanctity of your voice, and your image, is at serious risk,” he wrote. “It’s hard to imagine a more serious challenge to the sense of shared, reliable reality that keeps us linked together in relative peace. The Deep Fake artists need to be stopped, using whatever legal means are necessary, as soon as possible.”
After Peterson published this blog post, the NotJordanPeterson website shut down operations. “In light of Dr. Peterson’s response to the technology demonstrated by this site … and out of respect for Dr. Peterson, the functionality of the site will be disabled for the time being,” the site owner wrote.
The site owner told Motherboard that, despite the hint at legal action in the blog post, Peterson isn’t suing him; he took NotJordanPeterson down after seeing Peterson’s negative reaction. At the time of publication, Peterson had not responded to Motherboard’s request for comment.
It’s interesting to see a public figure like Peterson address deepfakes so directly. Plenty of other celebrities have been subject to the algorithmic face-swap and fake-audio treatment, including podcast host Joe Rogan, Nicolas Cage, and Elon Musk.
The AI models that generate fake video or audio rely on a huge amount of existing data to analyze and “learn” from. As it happens, refusing to shut the fuck up, as so many powerful men are wont to do, produces great material for training a realistic AI model of someone.
Before Peterson, the closest any powerful man had come to commenting on deepfakes as a phenomenon was Mark Zuckerberg, after an artist created a deepfake of him saying some insidious things. The media coverage of that satirical art project forced his platform to enact policies around handling fake video content.
But what Peterson is implying in this screed—that deepfakes, even as art, should be stopped, banned, and otherwise made illegal—is something legislators and AI ethicists have grappled with since the dawn of deepfakes two years ago. Many experts say that regulating deepfakes is a bad idea, because trying to do so could chill First Amendment rights and free speech online.
Peterson mentions Rep. Yvette Clarke’s proposed DEEPFAKES Accountability Act as a potential solution to his embarrassment, and to what he sees as the dangers of deepfakes as a whole. The Electronic Frontier Foundation notes that in that bill, “while there is an exception for parodies, satires, and entertainment—so long as a reasonable person would not mistake the ‘falsified material activity’ as authentic—the bill fails to specify who has the burden of proof, which could lead to a chilling effect for creators.”
As a big fan of free speech, Peterson of all people should be wary of suggesting we sue the pants off anyone who makes an unflattering mimicry of us online. If he really wants to do something to combat the real dangers of deepfakes, he could start with advocating for improving the legislation that does exist to get help for victims of revenge porn and non-consensual nudes. Those are the people who are really impacted by harassment and intimidation online.
There Is No Tech Solution to Deepfakes
Funding technological solutions to algorithmically-generated fake videos only puts a bandage on the deeper issues of consent and media literacy.
Every day, Google Alerts sends me an email rounding up all the recent articles that mention the keyword “deepfake.” The stories oscillate between suggesting deepfakes could trigger war and covering Hollywood’s latest quirky use of face-swapping technology. It’s a media whiplash that fits right in with the rest of 2018, but this coverage frequently misses what we should actually fear most: A culture where people are fooled en masse into believing something that isn’t real, reinforced by a video of something that never happened.
In the nine months since Motherboard found a guy going by the username “deepfakes” posting face-swapped, algorithmically-generated porn on Reddit, the rest of the world rushed straight for the literal nuclear option: if nerds on the internet can create fake videos of Gal Gadot having sex, then they can also create fake videos of Barack Obama, Donald Trump, and Kim Jong Un that will somehow start an international incident that leads to nuclear war. The political implications of fake videos are so potentially dangerous that the US government is funding research to automatically detect them.
In April, the Media Forensics department of the US Defense Advanced Research Projects Agency (DARPA) awarded nonprofit research group SRI International three contracts to find ways to automatically detect digital video manipulations. Researchers at the University at Albany received funding from DARPA to study deepfakes, and found that analyzing the blinks in videos could be one way to distinguish a deepfake from an unaltered video.
The worry that deepfakes could one day cause a nuclear war is a tantalizing worst-case scenario, but it skips right past current and pressing issues of consent, media literacy, bodily autonomy, and ownership of one’s own digital self. Those issues are not far-fetched or theoretical. They are exacerbated by deepfakes today. Will someone make a fake video of President Donald Trump declaring war against North Korea and get us all killed? Maybe. But the end of humanity is the most extreme end result, and it’s getting more attention than issues around respecting women’s bodies or assessing why the people creating deepfakes felt entitled to use their images without permission to begin with.
Until we grapple with these deeply entrenched societal issues, DARPA’s technical solutions are bandages at best, and there’s no guarantee that they will work anyway.
To make a believable deepfake, you need a dataset made up of hundreds or thousands of photos of the face you’re trying to overlay onto the video. The solution proposed by researchers at the University at Albany assumes that these photos, or “training datasets,” probably don’t include enough images of the person blinking. The end result is a fake video that might look convincing, but where people don’t blink naturally.
But even those researchers concede that this isn’t a totally reliable way to detect deepfakes. Siwei Lyu, a professor at the State University of New York at Albany, told MIT Technology Review that a quality deepfake could get around the eye-blinking detection tool by collecting images in the training dataset that show the person blinking.
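The blink heuristic described above can be illustrated with a minimal Python sketch. It is not the Albany team's detector: it assumes some other tool has already produced a per-frame "eye openness" score between 0 and 1, and every name and threshold in it (`closed_below`, `min_closed_frames`, `min_rate`) is a hypothetical placeholder chosen only for illustration.

```python
def count_blinks(openness, closed_below=0.2, min_closed_frames=2):
    """Count blinks in a list of per-frame eye-openness scores.

    A blink is a run of at least `min_closed_frames` consecutive frames
    whose score is under `closed_below`. Both values are assumptions.
    """
    blinks = 0
    run = 0  # length of the current run of "closed" frames
    for score in openness:
        if score < closed_below:
            run += 1
        else:
            if run >= min_closed_frames:
                blinks += 1
            run = 0
    if run >= min_closed_frames:  # clip ended while the eyes were closed
        blinks += 1
    return blinks


def blinks_per_minute(openness, fps):
    """Convert the blink count into a rate, given the video frame rate."""
    if fps <= 0 or not openness:
        raise ValueError("need a positive frame rate and at least one frame")
    minutes = len(openness) / fps / 60
    return count_blinks(openness) / minutes


def looks_suspicious(openness, fps, min_rate=5.0):
    """Flag a clip whose blink rate falls under an assumed minimum."""
    return blinks_per_minute(openness, fps) < min_rate


if __name__ == "__main__":
    fps = 30
    clip = [0.3] * (fps * 60)          # one minute with eyes always open
    print(looks_suspicious(clip, fps))  # no blinks at all in this clip
```

A check like this also shows why the approach is fragile: a forger who includes enough blinking images in the training data produces a clip that passes it.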
Lyu told MIT Tech Review that his team has an even better technique for detecting deepfakes than blinks, but declined to say what it is. “I’d rather hold off at least for a little bit,” Lyu says. “We have a little advantage over the forgers right now, and we want to keep that advantage.”
This exemplifies the broader problem with trying to find a technical solution to the deepfakes problem: as soon as someone figures out a way to automatically detect a deepfake, someone will find a way around it. Platforms are finding out that combating fake porn on their sites is not as easy as blocking a keyword or banning a forum. Image host Gfycat, for example, thought it could use automated tools to detect algorithmically-generated videos on its platform and kick them off, but months after it announced this effort, we still found plenty of deepfakes hosted there.
The algorithms themselves will, by design, stay locked in a cat-and-mouse game of outdoing each other. When one detection method pops up, like the blinks, the other side will learn from it and match it. We’ve seen this happen with bots that are continually getting better at solving CAPTCHAs, forcing bot-detection systems to make the CAPTCHAs more difficult to solve, which the bots learn to beat, and so on, indefinitely.
This doesn’t mean that we should throw our hands up and stop trying to find tech solutions to complex problems like deepfakes. It means that we need to recognize the limitations of these solutions, and to continue to educate people about technology and media, when to trust what they see, and when to be skeptical.
Florida senator Marco Rubio got it right when he talked about deepfakes at a Heritage Foundation forum last month: “I think the likely outcome would be that [news outlets] run the video, with a quotation at the end saying, by the way, we contacted senator so-and-so and they denied that that was them,” he said, talking about a hypothetical scenario where a deepfake video could spread as a news tip to journalists. “But the vast majority of the people watching that image on television are going to believe it.”
Fake news isn’t new, and malicious AI isn’t new, but the combination of the two, plus a widespread, destabilized trust in media, is only going to erode our sense of reality even more.
This isn’t paranoia. We saw a small glimpse of this with the spoof video that Conservative Review network CRTV made of Alexandria Ocasio-Cortez about a month after she won the Democratic congressional nomination in New York. CRTV cut together a video of Ocasio-Cortez giving an interview to make it seem like she bombed it. This wasn’t a deepfake by any means—it was rudimentary video editing. Still, more than one million people viewed it and some people fell for it. If you already thought poorly of Ocasio-Cortez, the video could reinforce your beliefs.
If people are gullible enough to believe in conspiracy theories—so much so that they show up at Trump rallies with signs and shirts supporting QAnon—we don’t need AI to fool anyone into believing anything.
The first headline we published for a deepfakes story, back in December, said: “AI-Assisted Fake Porn Is Here and We’re All Fucked.” We stand by that. We are still deeply fucked. Not because a deepfake is going to lead to nuclear war, but because we have so many problems we need to solve before we worry about advanced detection of AI-generated video.
We need to figure out how platforms will moderate users spreading malicious uses of AI, and revenge porn in general. We have to solve the problems around consent, and the connection between our bodily selves and our online selves. We need to face the fact that debunking a video as fake, even if it’s proven by DARPA, won’t change someone’s mind if they’re seeing what they already believe. If you want to see a video of Obama saying racist things into a camera, that’s what you’ll see—regardless of whether he blinks.
The Department of Defense can’t save us. Technology won’t save us. Being more critically-thinking humans might save us, but that’s a system that’s a lot harder to debug than an AI algorithm.
This Program Makes It Even Easier to Make Deepfakes
Unlike previous deepfake methods, FSGAN can generate face swaps in real time, with zero training.
A new method for making deepfakes creates realistic face-swapped videos in real time, with no lengthy training needed.
Unlike previous approaches to making deepfakes—algorithmically-generated videos that make it seem like someone is doing or saying something they didn’t in real life—this method works on any two people without any specific training on their faces.
Most of the deepfakes that are shared online are created by feeding an algorithm hundreds or thousands of images of a specific face. The algorithm “trains” on that specific face so it can swap it into the target video. This can take hours or days even with access to expensive hardware, and even longer with consumer-grade PC components. A program that doesn’t need to be trained on each new target is another leap forward in making realistic deepfakes quicker and easier to create.
“Our method can work on any pair of people without specific training,” the researchers said in a video presenting their method. “Therefore, we can produce real-time results on unseen subjects.”
Researchers from Bar-Ilan University in Israel and the Open University of Israel posted their paper, “FSGAN: Subject Agnostic Face Swapping and Reenactment,” to the arXiv preprint server on Friday. On their project page, the researchers write that the open-source code is forthcoming; in the paper, they say they’re publishing the details of this program because suppressing it “would not stop their development,” but would rather leave the public and policymakers in the dark about the potential misuse of these algorithms.
In a video demonstrating the FSGAN program, the researchers show how it can overcome differences in hair and skin tone to swap faces seamlessly.
Similar to how the single-shot method developed by Samsung AI used landmarks on the source and target faces to animate the Mona Lisa’s face and make her “speak,” FSGAN pinpoints facial landmarks, then aligns the source face to the target’s face.
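To give a rough sense of what "aligning" one set of facial landmarks to another can mean, here is a minimal Python sketch. It is not FSGAN's method, which relies on trained neural networks; it is a classical least-squares similarity fit (scale, rotation, and shift) between matching 2D points, and the function names and the four example landmarks are hypothetical.

```python
import cmath
import math


def fit_similarity(source, target):
    """Find the scale+rotation (a) and shift (b) that best map source to target.

    Points are (x, y) pairs, treated as complex numbers so that one complex
    multiplication encodes both rotation and scale. This is the least-squares
    solution of target ~= a * source + b.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two equally sized sets of at least 2 landmarks")
    src = [complex(x, y) for x, y in source]
    dst = [complex(x, y) for x, y in target]
    src_mean = sum(src) / len(src)
    dst_mean = sum(dst) / len(dst)
    num = sum((d - dst_mean) * (s - src_mean).conjugate()
              for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def apply_similarity(a, b, points):
    """Move points with the fitted transform and return (x, y) pairs."""
    moved = [a * complex(x, y) + b for x, y in points]
    return [(z.real, z.imag) for z in moved]


if __name__ == "__main__":
    # Hypothetical landmarks: two eyes, nose tip, mouth center.
    source_face = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (50.0, 80.0)]
    target_face = [(110.0, 95.0), (190.0, 95.0), (150.0, 135.0), (150.0, 175.0)]
    a, b = fit_similarity(source_face, target_face)
    print("scale:", abs(a), "rotation (deg):", math.degrees(cmath.phase(a)))
    print(apply_similarity(a, b, source_face))
```

A real face-swapping pipeline would use many more landmarks and would go on to adjust pose, expression, and blending, none of which this sketch attempts.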
The FSGAN program itself wasn’t cheap or easy to make: The researchers say in their paper that it required eight Nvidia Tesla V100 GPUs, which can cost around $10,000 each for consumers (roughly $80,000 of hardware at that price), to train the generative adversarial network that the program then uses to create deepfakes in real time.
On their project website, the researchers say that the project code will eventually be available on GitHub, a platform for open-source code development. Assuming the researchers make a pre-trained AI model available, it’s likely that using it at home won’t be as resource-intensive as it was to train it from scratch in a lab.
“Our method eliminates laborious, subject specific data collection and model training, making face swapping and reenactment accessible to non-experts,” the researchers wrote. “We feel strongly that it is of paramount importance to publish such technologies, in order to drive the development of technical counter-measures for detecting such forgeries, as well as compel lawmakers to set clear policies for addressing their implications.”