From Hayao Miyazaki’s Spirited Away to Satoshi Kon’s Paprika, Japanese anime has made it okay for adults everywhere to enjoy cartoons again. Now, a team of Tsinghua University and Cardiff University researchers has introduced CartoonGAN — an AI-powered technique that simulates the styles of Japanese anime masters from snapshots of real-world scenery.
Anime has distinct aesthetics, and traditional manual transformation techniques for real-world scenes require considerable expertise and expense, as artists must painstakingly draw lines and shade colours by hand to create high-quality scene reproductions.
Meanwhile, existing transformation methods based on non-photorealistic rendering (NPR) or convolutional neural networks (CNN) are also either time-consuming or impractical as they require paired images for model training. Moreover, these methods do not produce satisfactory cartoonization results, as (1) different cartoon styles have unique characteristics involving high-level simplification and abstraction, and (2) cartoon images tend to have clear edges, smooth color shading and relatively simple textures, which present challenges for the texture-descriptor-based loss functions used in existing methods.
CartoonGAN is a GAN framework composed of two CNNs that enables style translation between two unpaired datasets: a generator that maps input photos onto the cartoon manifold, and a discriminator that judges whether an image comes from the target manifold or is synthetic. Residual blocks are introduced in the generator to simplify the training process.
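The residual blocks mentioned above add each block's input back to its output, which eases gradient flow during training. The following is a minimal sketch of that skip connection, assuming plain 3×3 "same" convolutions on a single-channel map stand in for the paper's conv layers; the weights and sizes here are illustrative, not CartoonGAN's actual architecture.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 3x3 same-padding convolution over a single-channel map."""
    h, wid = x.shape
    padded = np.pad(x, 1)                      # zero-pad by 1 on each side
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wid):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """y = x + F(x): the skip connection that eases training."""
    h = np.maximum(conv2d_same(x, w1), 0.0)    # conv + ReLU
    return x + conv2d_same(h, w2)              # conv, then add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w1 = rng.standard_normal((3, 3)) * 0.1
w2 = rng.standard_normal((3, 3)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # spatial size is preserved: (8, 8)
```

Note the design consequence: if the learned residual F(x) is zero, the block reduces to the identity, so stacking many blocks cannot make the mapping harder to learn.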
To avoid slow convergence and obtain high-quality stylization, a dedicated semantic content loss, an edge-promoting adversarial loss, and an initialization phase are integrated into this cartoonization architecture. The content loss is defined using ℓ1 sparse regularization (instead of the ℓ2 norm) on the VGG (Visual Geometry Group) feature maps of the input photo and the generated cartoon image.
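The ℓ1-versus-ℓ2 distinction in the content loss can be sketched as below. This is a hedged illustration: the real loss compares feature maps taken from a pretrained VGG network, while here small random arrays stand in for those features.

```python
import numpy as np

def content_loss_l1(feat_photo, feat_cartoon):
    # l1 (sparse) regularization: tolerates sparse, large feature changes,
    # such as the flat shading and simplified textures of cartoon style
    return np.abs(feat_photo - feat_cartoon).mean()

def content_loss_l2(feat_photo, feat_cartoon):
    # l2 penalizes every deviation quadratically, which over-punishes the
    # sharp local changes cartoonization needs to make
    return ((feat_photo - feat_cartoon) ** 2).mean()

# Illustrative stand-ins for VGG feature maps (not real network outputs)
rng = np.random.default_rng(1)
feat_a = rng.standard_normal((4, 4))
feat_b = feat_a + rng.standard_normal((4, 4)) * 0.1
print(content_loss_l1(feat_a, feat_b), content_loss_l2(feat_a, feat_b))
```

Both losses are zero when the features match exactly; they differ in how strongly they punish large localized deviations, which is why the sparse ℓ1 form suits style changes better.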
An example of a Makoto Shinkai stylization shows the importance of each component in CartoonGAN: the initialization phase converges quickly toward the target manifold; sparse regularization copes with style differences between cartoon images and real-world photos while retaining the original content; and the adversarial loss function creates the clear edges.
Changing components in the CartoonGAN loss function: (a) input photo, (b) without initialization process, (c) using ℓ2 regularization for content loss, (d) removing edge loss, (e) CartoonGAN result.
Both real-world photos and cartoon images are used for model training, while the test data contains only real-world pictures. All training images are resized to 256×256 pixels. Researchers downloaded 6,153 real-world pictures from Flickr, 5,402 of which were for training and the rest for testing. A total of 14,704 cartoon images from popular anime artists Makoto Shinkai, Mamoru Hosoda, Hayao Miyazaki, and Satoshi Kon were used for model training.
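The photo split described above (6,153 Flickr images, 5,402 for training and the remaining 751 for testing) amounts to a simple shuffled partition. A minimal sketch follows; the file names and seed are made up for illustration.

```python
import random

# Hypothetical file list standing in for the 6,153 downloaded Flickr photos
photos = [f"flickr_{i:05d}.jpg" for i in range(6153)]

# Shuffle with a fixed seed, then split: 5,402 train / 751 test
random.Random(42).shuffle(photos)
train, test = photos[:5402], photos[5402:]
print(len(train), len(test))  # 5402 751
```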
Compared to recently proposed CNN-based image transformation frameworks such as CycleGAN and Gatys et al.’s Neural Style Transfer (NST) method, CartoonGAN more successfully reproduces clear edges and smooth shading while accurately retaining the input photo’s original content.
Because NST only uses a single stylization reference image for model training, it cannot deeply learn a particular anime style, especially when there are significant content differences between the stylization reference image and the input images. Improvements can be seen when more training data is introduced. However, even if a large collection of training data is used, stylization inconsistencies may appear between regions within the image.
Although the upgraded CycleGAN+Lidentity model’s identity loss function performs better on input photo content preservation, it is still unable to reproduce Makoto Shinkai or Hayao Miyazaki’s artistic styles as accurately as CartoonGAN does. Moreover, CartoonGAN’s processing time of 1617.69 s is 33 percent faster than CycleGAN and 50 percent faster than CycleGAN+Lidentity.
Comparison of CartoonGAN with other image transformation frameworks for Makoto Shinkai (top) and Hayao Miyazaki (bottom) styles.
The paper’s authors say they will focus on improving cartoon portrait stylization for human faces in their future research, while exploring applications for other image synthesis tasks with designed loss functions. The team also plans to extend the CartoonGAN method to video stylization by adding sequential constraints to the training process.
The paper CartoonGAN: Generative Adversarial Networks for Photo Cartoonization was accepted by last month’s CVPR 2018 (Conference on Computer Vision and Pattern Recognition) in Salt Lake City, USA.
This article is about a side project by Mary Kate MacPherson. We like to do side projects like AI for music videos, and the party button. The idea was to try out an adversarial neural network that generates new anime faces after training on a set of human-created anime faces. We started from an online example. The original code for that work and the anime faces dataset can be found here.
Running training on the original code from here.
As you can see in the image above, the Generative Adversarial Network (GAN) iteratively improves over time at generating realistic images. A GAN contains a generator that makes new images, and a discriminator that gives the generator constructive feedback. Both the generator and discriminator are Convolutional Neural Networks (CNNs).
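The generator-versus-discriminator feedback loop described above can be sketched on toy one-dimensional data rather than images: here the generator is a single affine map z → a·z + b, the discriminator a logistic unit, and both are trained with the standard minimax-style losses. All dimensions, learning rates, and distributions are illustrative assumptions, not the anime project's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 0.0          # generator parameters: fake = a * z + b
w, c = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(w * x + c)
lr = 0.05

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for step in range(500):
    real = rng.normal(3.0, 0.5, 64)   # "real data": samples from N(3, 0.5)
    z = rng.standard_normal(64)
    fake = a * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    dr, df = sigmoid(w * real + c), sigmoid(w * fake + c)
    gw = ((dr - 1) * real).mean() + (df * fake).mean()
    gc = (dr - 1).mean() + df.mean()
    w -= lr * gw
    c -= lr * gc

    # Generator step: push D(fake) toward 1 (non-saturating loss)
    df = sigmoid(w * fake + c)
    ga = ((df - 1) * w * z).mean()    # chain rule through the affine map
    gb = ((df - 1) * w).mean()
    a -= lr * ga
    b -= lr * gb

print(round(b, 2))  # the generator's offset drifts toward the real mean
```

The "constructive feedback" in the article is exactly the generator step: the generator's gradient flows through the discriminator's judgment of the fakes, so improving at fooling the discriminator means producing more realistic samples.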
Here is an example of the training data:
The following image is a nice example of the final output.
Anime faces generated entirely using artificial intelligence.
In the image above you can see that the bottom right character has a bow on her head, and the eyes and mouth are usually in the correct location on the face. Some generated outputs are kind of mushed as we see in the first row second from the left. The images look good in general, although they are not very big (64 by 64 pixels for 3 color channels).
This was a cool result, but there are some known limitations to CNNs. For example, capsule networks (CapsNets) are better at understanding the placement of things in a picture than CNNs. A CNN will happily “see” a face when all the facial features are in the picture but in the wrong places (source here and further reading here). This is a problem capsule networks aim to address.
Now, being an overachiever, Mary Kate MacPherson decided to try this task using a GAN with a capsule network as the discriminator instead of a CNN. The dataset was moved over from the original code base to the capsule network code base. The capsule network code came from here, and the generator from the capsule network code was replaced by the generator from the anime project, so as to make anime girls rather than handwritten numbers.