DeepMind’s New AI Dreams Up Videos on Many Topics

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. In the last few years, the pace of progress
in machine learning research has been staggering. Neural network-based learning algorithms are
now able to look at an image and describe what’s seen in this image, or even better,
the other way around, generating images from a written description. You see here a set of results from BigGAN,
a state-of-the-art image generation technique, and can marvel at the fact that all of these images
are indeed synthetic. The GAN part of this technique abbreviates
the term Generative Adversarial Network – this means a pair of neural networks that battle
each other over time to master a task, for instance, to generate realistic-looking images when
given a theme. These detailed images are great, but, what
about generating video? With the Dual Video Discriminator GAN, or DVD-GAN
for short (DeepMind’s naming game is still as strong as ever), it is now possible to create
longer and higher-resolution videos than before. The exact numbers are
256×256 in terms of resolution and 48 frames, which is about 2 seconds. It also learned the concept of changes in
the camera view, zooming in on an object, and understands that if someone draws something
with a pen, the ink has to remain on the paper unchanged. The Dual Discriminator part of the name reveals
one of the key ideas of the paper. In a classical GAN, we have a discriminator
network that looks at the images of the generator network and critiques them. As a result, this discriminator learns to
tell fake and real images apart better, but at the same time, provides ample feedback
for the generator neural network so it can come up with better images. In this work, we have not one, but two discriminators,
one is called the spatial discriminator that looks at just one image and assesses how good
it is structurally, while the second, temporal discriminator critiques the quality of the
movement in these videos. This additional information provides better
teaching for the generator, which will, in return, be able to generate better videos
for us. The paper contains all the details that you
could possibly want to learn about this algorithm, in fact, let me give you two that I found
to be particularly interesting: one, it does not get any additional information about where
the foreground and the background are, and is able to leverage the learning capacity
of these neural networks to learn these concepts by itself. And two, it does not generate the video frame
by frame sequentially, but it creates the entire video in one go. That’s wild. Now, 256×256 is not a particularly high video
resolution, but if you have been watching this series for a while, you are probably
already saying that two more papers down the line, and we may be watching HD videos that
are also longer than we have the patience to watch. And all this through the power of machine
learning research. For now, let’s applaud DeepMind for this
amazing paper, and I can’t wait to have a look at more results and see some follow-up
works on it. What a time to be alive! Thanks for watching and for your generous
support, and I’ll see you next time!
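
The two-critic idea described above can be illustrated with a toy sketch. This is not DeepMind's implementation: the real discriminators are trained neural networks, while the functions `spatial_score`, `temporal_score`, and `generator_feedback` below are hypothetical, hand-written stand-ins, and the tiny 2×2 "frames" are made up for illustration. The sketch only shows why a critic that looks at single frames cannot judge movement, and why a second critic that sees the whole clip adds useful feedback.

```python
import random


def spatial_score(frame):
    """Hypothetical spatial critic: rates ONE frame on its own.

    A real discriminator is a trained network; here "quality" is just
    the contrast of the frame (max pixel minus min pixel).
    """
    pixels = [p for row in frame for p in row]
    return max(pixels) - min(pixels)


def temporal_score(video):
    """Hypothetical temporal critic: rates movement across the whole clip.

    With pixel values in [0, 1], returns 1.0 for a perfectly steady clip
    and lower values the more pixels jump between consecutive frames.
    """
    pixels_per_frame = len(video[0]) * len(video[0][0])
    total_change = 0.0
    for prev, nxt in zip(video, video[1:]):
        for row_a, row_b in zip(prev, nxt):
            total_change += sum(abs(a - b) for a, b in zip(row_a, row_b))
    mean_change = total_change / ((len(video) - 1) * pixels_per_frame)
    return 1.0 - mean_change


def generator_feedback(video, rng):
    """Combine both critiques into one signal for the generator."""
    frame = rng.choice(video)  # the spatial critic sees a single frame
    return spatial_score(frame) + temporal_score(video)


# Two tiny 2x2 "frames" with pixel values in [0, 1] (made-up data).
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0]]

steady = [A, A, A, A]    # nothing moves
flicker = [A, B, A, B]   # every pixel flips on every frame

rng = random.Random(0)
steady_feedback = generator_feedback(steady, rng)
flicker_feedback = generator_feedback(flicker, rng)
print(steady_feedback, flicker_feedback)
```

Every individual frame of the flickering clip scores exactly as well as a frame of the steady clip, so the spatial critic alone cannot tell the two apart. Only the temporal critic lowers the score of the flickering clip.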

  1. As a video editor, I'm excited to imagine the impact this could have on video editing!
    As a Youtuber, I really hope people won't prefer machine-made videos 😅

  2. This scares me a bit tbh. Thinking about how much easier it could become to make artificial videos makes me wonder what could happen if this fell into the wrong hands. It gives people the ability to easily craft and spam videos with fake news using this AI. Imagine what could happen if China artificially made videos of HK protesters acting violently as propaganda to spread to the mainland. It’s not very exciting to me due to all of the possible problems this could create. I don’t want to sound like I’m crazy and that I’m fearing the worst, but it just makes me wonder. If AI can fake faces, scripts, and now video, we could see fake video coverage appear on the internet in the near future.

  3. So cool. Really interesting failure modes too. Like in the top left at 2:06, it gets the falling down and bouncing up of the basketballs, but doesn't understand that each ball has to fall down before it can bounce up again.

  4. I fear we have already invented something which can be compared to an animal brain in complexity, yet with an entirely different architecture and zero personality or abstract thinking.

  5. Imagine, like, a transformer or, as here, a GAN (or some fancy combination of those), which you can hand a movie script and it'll put out a plausible movie with all the fancy shots and fully synthesized audio and everything you might want

  6. In 15 years we could be watching synthetic media, and the distinction between real and fake would be almost totally dissolved. What a time to be alive!

  7. What a time to be alive! But what if the universe we believe we are in was not created sequentially either..? We are just experiencing it as such but in reality it was created all at once by a verrrrry big computer.

  8. Some of the scenes in these GAN video clips are kind of what I imagined the chaos realm in Event Horizon may appear like.

  9. It all looks like mangled versions of real clips. What's the point? And are we sure this isn't just some variation of complex overfitting? 🤔

  10. To us, those videos conjured out of thin air look like magic.
    In 5 years, it'll look like prehistoric technology.
    To think of all the things AI is doing, despite it being in its infancy.
    I doubt there has been a time when the world changed as fast as it does right now. And it's a great time to be alive and watch it all unfold.

    I'm eager for insights in science, genomics and neurology. But it'll have to wait… Probably just a few years.

  11. I fail to see how this is that impressive. All of this stuff is the old "morphing" between 2 or more images I used to do in the early 90s, but it's just a computer doing it. Nothing is "from scratch"; it's all data from existing images and videos just being mashed up in a morph. And the results of these videos are a mess. It's incoherent garbage.

  12. What if this guy here does not even exist and all this channel is just an experiment to challenge a deep neural network to create better and more content?

  13. I came upon a realization. This tech is showing us the hidden states of life existing in between two life forms e.g. duck and dog. This hybrid state is often horrifying and as some people call it "nightmare fuel". If you ever have investigated the occult/paranormal you will hear of stories of entities or spirits. I almost feel that these inbetweener creations that DeepMind and other GANs are generating might be exposing these non-physical demonic abominations. Maybe I'm going too right-brain with this but you know. I'm waiting for this tech to be used in some futuristic horror movie. Somehow it's able to create life forms that are more horrifying than people can create.

  14. Do you think it's possible that everything neural networks (and humans) do can be described either as pattern recognition or pattern generation? And that possibly we just need a sufficiently generalized solution for each (and improved hardware) to have human-level or beyond general AI?

  15. Soon YouTube won't even need any content creators, and the whole censorship problem is sidestepped entirely.

  16. Two years down the line and you will have an online generator of short clips based on uploaded images and a short written description. I used one such site some years ago when coloring BW photographs. Just uploaded them, let the "magic happen", downloaded the low-resolution images. Opened both in PS and imported the colors from the low-resolution image onto the high-resolution BW image to get a high-resolution color image. The results were rather pleasing.

  17. It's cool and all, but to me it's going to cheapen the human experience and gloss it over with a haze of skepticism and distrust.

  18. Amazing. Now we get to understand why, when we dream, weird stuff happens, people have weird or blurry faces, and many times stuff doesn't make sense. The brain is probably generating stuff like this. We also probably have a GPT-2 lying around in our brains, as many times I hear myself in my mind narrating very concisely what's going to happen next in the dream and then I see it happening. Also: Baby Elon XD

  19. I am waiting for someone to use a GAN to generate software. I wonder if that's the first step towards a true strong AI; giving it the tools to improve its own code base?

  20. Makes you wonder … when anyone can generate tens of thousands of hours of high-quality entertainment, for free, at the click of a button, will the people who "need to be entertained" be biologically weeded out? Tons of people will be totally consumed by never-ending AI-generated TV series etc, and the people who are less interested in "entertainment" will thrive. My guess is that humans will behaviorally change when this tech is perfected

  21. Whoa, hang on… am I understanding correctly that this AI creates vids completely from scratch? As in: no base image or other reference? The people and situations in these short vids aren't real?

  22. MAN, this is so creepy! The generated imagery is a combination of being extremely familiar, things where you could actually describe what it’s meant to be, yet unrecognizable if you start to take more than just a glance at it. Amazing technology, but the stuff it creates creeps me out!

  23. The fact that it generates it all at once explains why the resolution is lower than other state-of-the-art methods: it’s not producing 256×256 images, but 256×256×48 videos. So the curse of dimensionality says a small increase in each dimension has huge computational costs.

  24. Imagine AI understood story writing and cinematography. There could always be another episode on netflix, because it would be generated as soon as you click next.

    If you were to feed the AI personal data as well, it would specifically manufacture content for your preferences. Maybe this will happen in the next few decades?

  25. I hope nobody uses it for generating fake videos (along with a transcript provided by GPT-2). And definitely not fake porn.

  26. These videos are so creepy. This scares the hell out of me. Humans moving in freaking weird ways, babies that make demonic face deformations and satanic expressions… What a time to be alive… I'll have nightmares tonight ;-;

  27. I remember a few years ago when AI Winter proponents and armchair critics were commenting on the current advances. “It’s just a linear function fitter”, “Nothing fundamentally new that has not been worked on for decades”, “yes well, of course it can do x. But it will take decades until it masters y, if ever …”
    You don’t see that anymore. And no, it is not just a passing fluke. What is happening now is absolutely crazy, has become crazier every couple of months, and will get a lot crazier soon. Exciting, scary, revolutionary… what a time to be alive.
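
The size argument in comment 23 can be made concrete with a little arithmetic. The resolution and frame count come from the transcript; the three color channels and the 24 frames per second are assumptions (the latter is only implied by "48 frames, which is about 2 seconds"), and `values_for` is a hypothetical helper. This counts output values only and says nothing about the actual compute cost of the model.

```python
HEIGHT = 256
WIDTH = 256
FRAMES = 48
CHANNELS = 3       # assumption: RGB output
ASSUMED_FPS = 24   # assumption implied by "48 frames is about 2 seconds"


def values_for(side, frames, channels=CHANNELS):
    """Number of values the generator must output for one square clip."""
    return side * side * frames * channels


values_per_image = values_for(HEIGHT, 1)
values_per_video = values_for(HEIGHT, FRAMES)
duration_seconds = FRAMES / ASSUMED_FPS

# Doubling the side length AND the frame count multiplies the output by 8.
growth = values_for(2 * HEIGHT, 2 * FRAMES) / values_per_video

print(values_per_image)   # 196608
print(values_per_video)   # 9437184
print(duration_seconds)   # 2.0
print(growth)             # 8.0
```

A single clip generated in one go is 48 times larger than a single image at the same resolution, which is one plausible reason the resolution lags behind image-only methods.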

