MHXX GAN: Image Generation with Neural Networks

22 minute read

Intro

  • Working in preproduction and character concept art, I noticed that a lot of the assets we made would never be used. The creative process takes a long time, and meeting clients' expectations gets complicated when more than one person is making the decision.

  • The goal of this project was to see whether generative adversarial networks (GANs) could cut production time and costs, as well as generate good base images for future character concepts or armor models.

  • I started with anime faces to learn about different types of GANs, then moved to my own dataset, built from the popular game Monster Hunter.

Can we create new art or armor design from a dataset we already have?

Computer Requirements vs Google Colab

  • Training a basic GAN or DCGAN at low resolution doesn't actually require a lot of computing power.

  • However, if you want to use Nvidia's StyleGAN, StyleGAN 2, or Google's BigGAN, you'll need a dedicated GPU.

  • Below are the computing requirements:
    • Nvidia GPU with 11 GB of memory or greater
    • CUDA 10.0
    • cuDNN 7.6.5
    • TensorRT 6
  • You could possibly use Google Colab; however, training a StyleGAN requires converting the data to TFRecords, which I will talk a little more about later.

  • I tried this multiple times, but at the 12-hour mark my TFRecords conversion would time out. I would only recommend Google Colab if you are just using a vanilla GAN or a deep convolutional GAN.

Vanilla GAN

  • What exactly is a generative adversarial network, you may ask? It is two neural networks that compete against each other.

  • One is the generator, which generates fake images. Think of it as a master forger trying to pass fake art onto the market.

  • The other is our valiant discriminator, an inspector who looks through fake and real art and, hopefully, picks out the real thing.

  • Over time each of them trains, and their weights get adjusted as one pauses while the other learns. Eventually the generator is producing pretty realistic copies of the real thing.

  • All this pausing and learning is done through backpropagation using gradient descent.

Generator vs. Discriminator
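
  • To make the forger-versus-inspector idea concrete, below is a minimal Keras sketch of the two networks. The layer sizes and the flattened 28x28 image size are illustrative assumptions, not the models from this project.

#A minimal generator and discriminator in Keras
from tensorflow.keras import layers, models

# The forger: turns 100 random numbers into a flattened 28x28 image
generator = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(100,)),
    layers.Dense(784, activation='tanh'),
])

# The inspector: outputs the probability that an image is real
discriminator = models.Sequential([
    layers.Dense(256, activation='relu', input_shape=(784,)),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Pause the inspector while the forger learns: the stacked model
# below updates only the generator's weights
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')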

Backpropagation and Gradient Descent

  • What are backpropagation and gradient descent, you might ask?

  • Together, these two are what minimize the cost function.

  • Learning is accomplished by adjusting the model's parameters so that its predictions gradually converge to the targets, and the cost decreases.

  • At first the generator is frozen so the discriminator can learn. After a few steps, it's the discriminator's turn to be frozen while the generator learns.

  • The goal is for both losses, in this case cross-entropy losses, to decrease. Below we show an image of how gradient descent works.

Gradient Descent
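
  • As a toy example of how gradient descent steps downhill (my own illustration, not code from this project), here are a few lines minimizing the cost f(w) = w**2:

#Toy gradient descent: minimize the cost f(w) = w**2
w = 5.0              # initial parameter value
lr = 0.1             # learning rate (step size)
for step in range(50):
    grad = 2 * w     # derivative of the cost with respect to w
    w -= lr * grad   # step in the direction that decreases the cost
print(w)             # ends up very close to 0, the minimum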

Data Collection

  • I collected data from two different sources.

  • The first was an anime face dataset, which I used to learn three different ways of building a GAN.

  • This dataset contained 85,000 images.

Anime Dataset

  • The second was the Monster Hunter armor dataset.

  • This dataset contained 1,083 images, which is not nearly enough to train on, so data augmentation was used.

MHXX Dataset

Image Augmentation

  • By using data augmentation I increased my dataset from 1,083 to 19,437 images.

  • Step 1 was to separate the images into top-half shots, single images, and double images that were stuck together.

  • I then split the double images in half:

#Splitting the double images in half
import imageio

s = []
for i in range(1081):
    # Read the image and get its dimensions
    img = imageio.imread(filenames[i])
    height, width, color = img.shape

    # Cut the image in half vertically
    width_cutoff = width // 2
    s1 = img[:, :width_cutoff]
    s2 = img[:, width_cutoff:]

    # Keep both halves
    s.append(s1)
    s.append(s2)
  • The single images were then combined with the split double images; the top-half images were not used.

  • To double our image dataset, I mirrored the images.

Mirror

#Flipping the images to get more data
import numpy as np

flip_img = []
for i in range(2172):
    # Mirror each image left-to-right
    flip = np.fliplr(myarray[i])
    flip_img.append(flip)
  • To increase the dataset further, I adjusted the brightness and dimness of the images and took a random sample.

  • This brought the dataset up to 19,437 images.

Brightness and Dimness

#Testing brightness and dimness to increase the dataset
from numpy import expand_dims
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

data = thre_array[1000]
# expand dimensions to make a one-sample batch
samples = expand_dims(data, 0)
# create an image data augmentation generator that varies brightness
datagen = ImageDataGenerator(brightness_range=[0.7, 1.2])
# prepare the iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(15, 10))

for i in range(6):
    # define the subplot
    plt.subplot(330 + 1 + i)
    # generate a batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot the raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
  • I then used Photoshop to increase the resolution so we wouldn't get any blurry images.

  • There are other methods of increasing the image dataset, but I wanted to keep the same pose throughout.

  • Other methods include:
    • Random cropping of the image
    • Taking a piece out of the image
    • Changing the RGB colors of the image
    • Zooming in and out of the image
    • Flipping the image upside down
    • Slight image rotation
  • I then resized the images to 512 x 512, the dimensions StyleGAN requires.
#The files will be saved in a newly created folder called output
import Augmentor

p = Augmentor.Pipeline('mhxx_800px')
# Resize every image to 512 x 512 and randomize the color intensity
p.resize(probability = 1.0, width = 512, height = 512)
p.random_color(probability = 1.0, min_factor = 0.5, max_factor = 0.9)

# Allow PIL to load slightly truncated files, then run the pipeline
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
p.process()
  • I then used fastai to verify the images, to make sure the TFRecords conversion would read them without error.

  • Use dataset_tool.py to convert the images to TFRecords.

  • Please note that when you convert your images to TFRecords, you need to make sure you have enough disk space. Something like 6 GB of images can turn into 40-50 GB of TFRecords, because the tool stores every image at each resolution, from 4x4 and 8x8 all the way up to 1024x1024.
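
  • For reference, the conversion is a single command. The create_from_images mode comes from Nvidia's StyleGAN repo; the folder names here are placeholders.

python dataset_tool.py create_from_images datasets/mhxx mhxx_512px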

Image Augment Github

Convolutional Neural Networks

  • Convolutional layers can be thought of as feature extractors, used instead of fully connected layers for both the discriminator and the generator.

  • The main advantage of this is that it takes much less training time.

  • For example, an image with a resolution of 500 x 500 x 3 fed into a fully connected layer would need 750,000 input units for the whole image.

  • Then you would need to feed that through each network layer. Imagine 1,000 neurons connecting the input to the next layer: that would require 750,000 x 1,000 = 750 million connections. Going layer by layer like this slows down training and runs the risk of overfitting.

  • Also, images nowadays have high resolutions, and running them through a fully connected network would take ages.

  • Below is the structure of a deep convolutional GAN (DCGAN), which we will use later to explore how well our image dataset can be used to create new images.

DCGAN Architecture
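
  • As a rough sketch of that structure (the filter counts and the 32x32 output size are illustrative assumptions), the generator upsamples noise with transposed convolutions while the discriminator downsamples with strided convolutions:

#Sketch of a DCGAN generator and discriminator in Keras
from tensorflow.keras import layers, models

generator = models.Sequential([
    layers.Dense(8 * 8 * 128, input_shape=(100,)),   # project the noise vector
    layers.Reshape((8, 8, 128)),
    layers.Conv2DTranspose(64, 4, strides=2, padding='same', activation='relu'),  # 16x16
    layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh'),   # 32x32 RGB
])

discriminator = models.Sequential([
    layers.Conv2D(64, 4, strides=2, padding='same', input_shape=(32, 32, 3)),     # 16x16
    layers.LeakyReLU(0.2),
    layers.Conv2D(128, 4, strides=2, padding='same'),                             # 8x8
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid'),           # probability the image is real
])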

VGG16 Keras Model Feature Map

  • Another thing about convolutional neural networks is that the feature maps capture the result of applying the filters to an input image.

  • To get a sense of what features might be detected, I used Keras's pretrained VGG-16 model, which is easier than trying to fit a model from scratch.

  • It has 16 learned layers and will show us how the neural network captures features. Filters are simply the weights in the neural network, and these weight values have a spatial relationship to each other.

  • Feature maps closer to the input capture more of the finer detail of the hunter armor avatar; as we progress through the network, they become less and less clear.

VGG16 Feature Maps
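
  • Below is a minimal version of pulling a feature map out of Keras's pretrained VGG-16; the layer index and image path are placeholders.

#Extract an early feature map from the pretrained VGG-16
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet')
# Rebuild the model so it outputs a conv layer instead of class scores
feature_model = Model(inputs=base.input, outputs=base.layers[2].output)

# Load one armor image at VGG-16's expected 224 x 224 input size
img = img_to_array(load_img('hunter.jpg', target_size=(224, 224)))
img = preprocess_input(np.expand_dims(img, axis=0))

feature_maps = feature_model.predict(img)   # shape (1, 224, 224, 64) for this layer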

Feature Map Github

Deep Convolutional GAN

  • The main reason I chose a DCGAN was that it doesn't require too much computing time; getting results for the first 2,000 steps of training took only about 1-2 hours.

  • I explored 4 sets of data:

    • Monster Hunter World armor, top half
    • Monster Hunter Double Cross (MHXX)
    • Anime faces
    • Monster Hunter Palico companions

The Four Datasets

  • For hyperparameters I generally used the same values as the original paper.

  • Original DCGAN Paper

  • The steps to train a DCGAN are the following:

    • Sample a batch of normalized images from the dataset of real images
    • Generate random noise as input for the generator
    • Generate images from that noise using the generator; I also saved the generated fake images every 200 steps
    • We then train the discriminator while the generator is paused, feeding it both fake and real images
    • Then we freeze the discriminator and train the generator on fresh noise, labeling its fakes as real so it learns to fool the discriminator
    • Repeat these steps in a for loop to end up with a good discriminator and generator; however, at some point a DCGAN can only improve the image so much
    • That's where StyleGAN will come in next. A condensed version of this loop is sketched after the image below.

DCGAN Training Results
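
  • Condensed into code, those steps look roughly like this. It is a sketch that assumes generator, discriminator, and stacked gan models like the earlier Keras sketches, plus a real_images array.

#One pass of the DCGAN training loop
import numpy as np

batch_size = 32
for step in range(2000):
    # Steps 1-3: sample real images and generate fakes from random noise
    idx = np.random.randint(0, len(real_images), batch_size)
    real = real_images[idx]
    noise = np.random.normal(0, 1, (batch_size, 100))
    fake = generator.predict(noise)

    # Step 4: train the discriminator on reals (label 1) and fakes (label 0)
    discriminator.train_on_batch(real, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))

    # Step 5: with the discriminator frozen, train the generator to fool it
    gan.train_on_batch(noise, np.ones((batch_size, 1)))

    # Save the generated fakes every 200 steps, as in the write-up
    if step % 200 == 0:
        np.save(f'fakes_{step}.npy', fake)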

DCGAN Github

Image Metrics

  • To see how well your GAN is doing, you usually compare your fakes to your reals. You can also look at your generator loss, discriminator fake loss, and discriminator real loss.

  • However, there are other quantitative measures that people have developed. Below are 3 of the most popular ones.

  • Inception Score uses the Inception v3 model, which was trained on 1,000 image classes and is available in the Keras library. The higher the score, the better, meaning better image quality and diversity.

  • Another method that borrows from Inception Score is Fréchet Inception Distance (FID). It uses the same model but compares normal distributions fit to the real and fake images. The closer the distributions, the better, so a lower score means better image quality and diversity.

  • I got a score of about 7.314 on the anime face dataset, feeding in the StyleGAN fakes and comparing them to the real images.

  • Another method, developed by Nvidia, is perceptual path length (PPL). They found that even with low FID scores, the generator could still produce low-quality images.

  • The idea here is to choose the "perceptually" shortest path.

  • For example, latent variable z1 produces a light brownish palico cat and latent variable z2 produces a black palico cat.

  • Since only the color is being changed, the ideal path is the blue line, whose midpoint is a gray palico cat: "perceptually" the shortest distance.

  • The longest distance "perceptually" is the dinosaur, as it needs to change both shape and color.

  • The lower the PPL, the better.

Perceptual Path Length

Fréchet Inception Distance Github
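
  • For reference, FID has a closed form: fit a Gaussian to the Inception activations of the reals and of the fakes, then measure the Fréchet distance between the two. Below is a minimal numpy/scipy sketch, where act1 and act2 are placeholder activation matrices.

#Fréchet Inception Distance between two sets of Inception activations
import numpy as np
from scipy.linalg import sqrtm

def fid(act1, act2):
    # Fit a Gaussian (mean and covariance) to each set of activations
    mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
    # Fréchet distance between the two Gaussians
    covmean = sqrtm(sigma1.dot(sigma2))
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # drop tiny imaginary parts from sqrtm
    return np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * covmean)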

Style GAN

StyleGAN Architecture

  • StyleGAN uses (traditional) progressive growing to gradually increase the resolution.

  • It generates images from a fixed-value tensor rather than directly from stochastically generated latent variables, as conventional GANs do.

  • Specifically, StyleGAN generates the image from a fixed 4x4x512 tensor.

  • The stochastically generated latent variables are instead used as style vectors.

  • After being nonlinearly transformed by an 8-layer mapping network, these latent variables are applied as styles through adaptive instance normalization (AdaIN) at each resolution. A small sketch of AdaIN follows the training image below.

  • Below is the process of training a StyleGAN on the MHXX dataset.

MHXX StyleGAN Training Progress
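
  • The AdaIN step mentioned above is short to write down: normalize each feature map, then scale and shift it with values derived from the style vector. This is a numpy sketch, not the repo's implementation.

#Adaptive instance normalization (AdaIN) on a batch of feature maps
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    # x: feature maps of shape (batch, height, width, channels)
    # Normalize each channel of each sample to zero mean, unit variance
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mu) / (sigma + eps)
    # The scale and bias come from the style vector produced by the
    # 8-layer mapping network
    return style_scale * normalized + style_bias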

  • For the anime dataset, I trained for 7 days using the original learning rate and minibatch settings. Below are the results using interpolation; a code sketch of the interpolation follows the image.

Anime Interpolation Results
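
  • The interpolation above just walks between two latent vectors and decodes every intermediate point. This is a generic sketch, where generate_image is a placeholder for the trained generator call.

#Linear interpolation between two StyleGAN latent vectors
import numpy as np

z1 = np.random.randn(1, 512)   # StyleGAN latents are 512-dimensional
z2 = np.random.randn(1, 512)

frames = []
for t in np.linspace(0.0, 1.0, num=30):
    z = (1 - t) * z1 + t * z2          # blend the two latents
    frames.append(generate_image(z))   # placeholder for the generator call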

  • I tried to do the same for the MHXX dataset; however, on the fourth day of training we had a mode collapse.

MHXX Mode Collapse

  • What is mode collapse?
    • Mode collapse is when the generator is only capable of generating one, or a small subset, of the different outcomes, or modes.
    • Like our exploding, blurred armor on the right at 4 days of training.
  • Possible reasons for mode collapse:
    • Not enough data compared to our face dataset; with only 19,437 augmented images, we might be overfitting.
    • The generator collapsed to a single point it believes looks real; the discriminator then learns from that single point, and gradient descent is unable to separate the identical outputs.
  • I then used transfer learning from the anime face model pickle.

  • Hyperparameter changes:
    • Decreased the learning rate to 0.001
    • Set minibatch repeat to 1 instead of 5
    • Perhaps giving the discriminator too many minibatch repeats made the model unstable
  • Transfer learning takes a model trained on a large dataset and transfers its knowledge to a smaller dataset.
    • Set resume_kimg to the pickled face model at 7440
    • This lets us take advantage of the anime face model's much bigger dataset
  • Below are the results. We can still see some blurred spots, called artifacts or blurs; these appear when the generator is trying its best to "feel out" how to generate realistic images. The config edits above are sketched after the image below.

MHXX Transfer Learning Results
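
  • For reference, the changes above amount to a few lines in the training configuration. This sketch assumes the layout of Nvidia's StyleGAN train.py and training_loop.py; the exact variable names may differ by version.

#Sketch of the StyleGAN training-config edits (names approximate)
sched.G_lrate_dict = {512: 0.001}   # decreased learning rate to 0.001
sched.D_lrate_dict = {512: 0.001}
sched.minibatch_repeats = 1         # minibatch repeat 1 instead of 5
resume_kimg = 7440                  # resume from the anime face model pickle
resume_run_id = 'anime-faces'       # placeholder ID for the pretrained pickle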

Style GAN Github

TensorBoard Generator and Discriminator Loss

  • r1_penalty (generator loss) is going down, so the generator is producing better images.

  • fake_score (discriminator fake loss): the discriminator is getting better at classifying fake images.

  • real_score (discriminator real loss) is getting higher, so the discriminator is getting worse at classifying real images. This might be due to the dataset size or the artifact blurring.

TensorBoard Loss Curves

Generated Images

  • These are the results of our generated images from the MHXX StyleGAN.

Generated MHXX Images

Generate Image Github

Future Work

  • This was pretty successful. The generated images could be used as base armor ideas for new monsters or new armor designs.

  • To improve this further, I would probably collect more images from mixed-armor fashion communities and more armor data from Monster Hunter World.

  • Some ideas to add to this are:
    • Add a GPT-2 model to generate hunter and armor names
    • Separate helmet from no-helmet images, and male from female, so the model can do a better job
  • Future work could be to generate new Iron Man armor or a new Disney princess.

  • I'm also going to try to build a face-swap GAN web app using existing IP for social media.

Thank you for reading my blog! Slide deck, full write-up, and Github links are below.