I know this is all coming out about a week after I started seeing some of these parameter and layer tests, but I wanted to make my own, both for easy reference and for more precise targeting of layer articulations in Google's GoogLeNet ImageNet model. I modified the original DeepDream script to step systematically through parameter choices and layers. I didn’t test all the layers, as some of the lower-level ones just weren’t that interesting. Finally, I don’t want to host all these pictures here on this post because it would be enormous, so I’m using the gallery-style viewer on Imgur instead.

These ran on my Linux Mint 17 laptop on CPU. Memory requirements and floating point errors kept some of my parameter choices to a fairly narrow range, but hey, I figure most people casually trying DeepDream out might not have GPUs either. I hope this is valuable to someone.

Here’s the original image for everything that follows. I chose it because it’s relatively small at 512 pixels, has some fine detail, a face (and eyes), some blurred areas, some blank space/light gradients, and a small-ish color palette:
In each case, I am varying the value of one parameter and leaving the others at their default values. That is: n_iter = 10; n_octaves = 4; octave_scale = 1.4. Also, each frame beyond the first is zoomed 0.25% towards the center of the image.
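For reference, here is a minimal sketch of that per-frame zoom, borrowed from the zooming loop in the original DeepDream notebook. The function name and the 0.0025 default are mine; it assumes the image is a float32 HxWx3 array, as in that notebook.

```python
import numpy as np
import scipy.ndimage as nd

def zoom_toward_center(frame, s=0.0025):
    """Zoom a HxWx3 float image 100*s percent toward its center (0.25% by default)."""
    h, w = frame.shape[:2]
    # Shrink the output-to-input sampling grid by (1 - s) and recenter it,
    # which magnifies the image slightly about its center.
    return nd.affine_transform(frame, [1 - s, 1 - s, 1],
                               [h * s / 2, w * s / 2, 0], order=1)
```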
Parameters:
Here I step through a series of value increments in each parameter for five different layers, some with simple articulations, some with higher-level ones, to examine the effects of varying those parameters. I feed each output back in as the next image to be processed, saving each time (and zooming in 0.25%), so that I obtain a series of four increasingly modified images for each layer and each parameter set. I felt this was enough to obtain a general idea of the character of each parameter adjustment.
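For anyone who wants to reproduce this, here is a rough sketch of the sweep loop. It assumes the deepdream() function and the GoogLeNet net object from the original notebook are already in scope, and it reuses the zoom_toward_center() helper from the sketch above. The source filename, output paths, and sweep values are mine (taken from the gallery numbering below); note that the Caffe blob names use slashes where the gallery titles use underscores.

```python
import os
import numpy as np
import PIL.Image

base = np.float32(PIL.Image.open('source.jpg'))   # the 512 px source image above

# The five layers used for the parameter sweeps (Caffe blob names).
layers = ['inception_3a/pool', 'inception_3b/3x3_reduce',
          'inception_4c/output', 'inception_4d/5x5', 'inception_5b/3x3']

if not os.path.exists('sweeps'):
    os.makedirs('sweeps')

for layer in layers:
    # The number-of-octaves (1-8) and octave-scale (1.00-2.98) sweeps below
    # work the same way, just varying a different keyword argument.
    for iter_n in range(1, 24, 2):                 # 1, 3, ..., 23, matching the galleries
        frame = base.copy()
        for frame_i in range(4):                   # four feedback passes per setting
            frame = deepdream(net, frame, iter_n=iter_n, end=layer)
            fname = '%s_%d_frame%d.jpg' % (layer.replace('/', '_'), iter_n, frame_i)
            PIL.Image.fromarray(np.uint8(frame)).save(os.path.join('sweeps', fname))
            frame = zoom_toward_center(frame)      # 0.25% zoom before feeding back in
```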
Number of iterations:
inception_3a_pool_1 inception_3a_pool_3 inception_3a_pool_5 inception_3a_pool_7 inception_3a_pool_9 inception_3a_pool_11 inception_3a_pool_13 inception_3a_pool_15 inception_3a_pool_17 inception_3a_pool_19 inception_3a_pool_21 inception_3a_pool_23
inception_3b_3x3_reduce_1 inception_3b_3x3_reduce_3 inception_3b_3x3_reduce_5 inception_3b_3x3_reduce_7 inception_3b_3x3_reduce_9 inception_3b_3x3_reduce_11 inception_3b_3x3_reduce_13 inception_3b_3x3_reduce_15 inception_3b_3x3_reduce_17 inception_3b_3x3_reduce_19 inception_3b_3x3_reduce_21 inception_3b_3x3_reduce_23
inception_4c_output_1 inception_4c_output_3 inception_4c_output_5 inception_4c_output_7 inception_4c_output_9 inception_4c_output_11 inception_4c_output_13 inception_4c_output_15 inception_4c_output_17 inception_4c_output_19 inception_4c_output_21 inception_4c_output_23
inception_4d_5x5_1 inception_4d_5x5_3 inception_4d_5x5_5 inception_4d_5x5_7 inception_4d_5x5_9 inception_4d_5x5_11 inception_4d_5x5_13 inception_4d_5x5_15 inception_4d_5x5_17 inception_4d_5x5_19 inception_4d_5x5_21 inception_4d_5x5_23
inception_5b_3x3_1 inception_5b_3x3_3 inception_5b_3x3_5 inception_5b_3x3_7 inception_5b_3x3_9 inception_5b_3x3_11 inception_5b_3x3_13 inception_5b_3x3_15 inception_5b_3x3_17 inception_5b_3x3_19 inception_5b_3x3_21 inception_5b_3x3_23
Number of octaves:
inception_3a_pool_1 inception_3a_pool_2 inception_3a_pool_3 inception_3a_pool_4 inception_3a_pool_5 inception_3a_pool_6 inception_3a_pool_7 inception_3a_pool_8
inception_3b_3x3_reduce_1 inception_3b_3x3_reduce_2 inception_3b_3x3_reduce_3 inception_3b_3x3_reduce_4 inception_3b_3x3_reduce_5 inception_3b_3x3_reduce_6 inception_3b_3x3_reduce_7 inception_3b_3x3_reduce_8
inception_4c_output_1 inception_4c_output_2 inception_4c_output_3 inception_4c_output_4 inception_4c_output_5 inception_4c_output_6 inception_4c_output_7 inception_4c_output_8
inception_4d_5x5_1 inception_4d_5x5_2 inception_4d_5x5_3 inception_4d_5x5_4 inception_4d_5x5_5 inception_4d_5x5_6 inception_4d_5x5_7 inception_4d_5x5_8
inception_5b_3x3_1 inception_5b_3x3_2 inception_5b_3x3_3 inception_5b_3x3_4 inception_5b_3x3_5 inception_5b_3x3_6 inception_5b_3x3_7 inception_5b_3x3_8
Octave scale:
inception_3a_pool_1.00 inception_3a_pool_1.13 inception_3a_pool_1.27 inception_3a_pool_1.44 inception_3a_pool_1.62 inception_3a_pool_1.83 inception_3a_pool_2.07 inception_3a_pool_2.33 inception_3a_pool_2.64 inception_3a_pool_2.98
inception_3b_3x3_reduce_1.00 inception_3b_3x3_reduce_1.13 inception_3b_3x3_reduce_1.27 inception_3b_3x3_reduce_1.44 inception_3b_3x3_reduce_1.62 inception_3b_3x3_reduce_1.83 inception_3b_3x3_reduce_2.07 inception_3b_3x3_reduce_2.33 inception_3b_3x3_reduce_2.64 inception_3b_3x3_reduce_2.98
inception_4c_output_1.00 inception_4c_output_1.13 inception_4c_output_1.27 inception_4c_output_1.44 inception_4c_output_1.62 inception_4c_output_1.83 inception_4c_output_2.07 inception_4c_output_2.33 inception_4c_output_2.64 inception_4c_output_2.98
inception_4d_5x5_1.00 inception_4d_5x5_1.13 inception_4d_5x5_1.27 inception_4d_5x5_1.44 inception_4d_5x5_1.62 inception_4d_5x5_1.83 inception_4d_5x5_2.07 inception_4d_5x5_2.33 inception_4d_5x5_2.64 inception_4d_5x5_2.98
inception_5b_3x3_1.00 inception_5b_3x3_1.13 inception_5b_3x3_1.27 inception_5b_3x3_1.44 inception_5b_3x3_1.62 inception_5b_3x3_1.83 inception_5b_3x3_2.07 inception_5b_3x3_2.33 (I got floating point exceptions if I went higher for this one)
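My guess at that crash, and it is only a guess: with n_octaves = 4 the smallest octave is the base image shrunk by octave_scale cubed, and by the inception_5b blobs GoogLeNet has downsampled its input by a further factor of about 32 (224 → 7 in the stock network), so a large octave scale leaves that blob with essentially no spatial extent. A quick back-of-the-envelope check:

```python
# Rough arithmetic only; the 512 px edge and the 32x stride to inception_5b are
# my assumptions about why octave scales above ~2.33 crashed for this layer.
for octave_scale in (2.33, 2.64, 2.98):
    smallest_octave = 512 / octave_scale ** 3   # edge of the smallest octave, in pixels
    blob_size = smallest_octave / 32            # rough spatial size of the 5b blobs
    print('%.2f -> octave %.0f px, 5b blob ~%.1f px' % (octave_scale, smallest_octave, blob_size))
# 2.33 -> octave 40 px, 5b blob ~1.3 px   (still ran)
# 2.64 -> octave 28 px, 5b blob ~0.9 px   (plausibly collapses to nothing)
# 2.98 -> octave 19 px, 5b blob ~0.6 px
```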
Layers:
Here, all three parameters are at their default values: n_iter = 10; n_octaves = 4; octave_scale = 1.4. This time, there are ten images in each feedback loop, since I wanted a deeper picture of the dominant articulations that appear in each layer. Of course, I probably miss a lot, because the sample image I chose doesn’t contain every feature that can be recognized and articulated.
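The loop for this section is the same idea, just with more feedback frames and every inception layer. A sketch, again reusing deepdream(), net, base, and zoom_toward_center() from the sketches above; the filter on net.blobs.keys() is my own shortcut for enumerating the inception blobs.

```python
import os
import numpy as np
import PIL.Image

# Defaults everywhere (iter_n=10, octave_n=4, octave_scale=1.4), ten feedback
# frames per layer. Skip the split blobs Caffe inserts automatically.
inception_layers = [k for k in net.blobs.keys()
                    if k.startswith('inception_') and 'split' not in k]

if not os.path.exists('layers'):
    os.makedirs('layers')

for layer in inception_layers:
    frame = base.copy()
    for frame_i in range(10):
        frame = deepdream(net, frame, end=layer)
        PIL.Image.fromarray(np.uint8(frame)).save(
            'layers/%s_%02d.jpg' % (layer.replace('/', '_'), frame_i))
        frame = zoom_toward_center(frame)       # same 0.25% zoom as before
```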
inception_3a_1x1 inception_3a_3x3 inception_3a_3x3_reduce inception_3a_5x5 inception_3a_5x5_reduce inception_3a_output inception_3a_pool inception_3a_pool_proj
inception_3b_1x1 inception_3b_3x3 inception_3b_3x3_reduce inception_3b_5x5 inception_3b_5x5_reduce inception_3b_output inception_3b_pool inception_3b_pool_proj
inception_4a_1x1 inception_4a_3x3 inception_4a_3x3_reduce inception_4a_5x5 inception_4a_5x5_reduce inception_4a_output inception_4a_pool inception_4a_pool_proj
inception_4b_1x1 inception_4b_3x3 inception_4b_3x3_reduce inception_4b_5x5 inception_4b_5x5_reduce inception_4b_output inception_4b_pool inception_4b_pool_proj
inception_4c_1x1 inception_4c_3x3 inception_4c_3x3_reduce inception_4c_5x5 inception_4c_5x5_reduce inception_4c_output inception_4c_pool inception_4c_pool_proj
inception_4d_1x1 inception_4d_3x3 inception_4d_3x3_reduce inception_4d_5x5 inception_4d_5x5_reduce inception_4d_output inception_4d_pool inception_4d_pool_proj
inception_4e_1x1 inception_4e_3x3 inception_4e_3x3_reduce inception_4e_5x5 inception_4e_5x5_reduce inception_4e_output inception_4e_pool inception_4e_pool_proj
inception_5a_1x1 inception_5a_3x3 inception_5a_3x3_reduce inception_5a_5x5 inception_5a_5x5_reduce inception_5a_output inception_5a_pool inception_5a_pool_proj
inception_5b_1x1 inception_5b_3x3 inception_5b_3x3_reduce inception_5b_5x5 inception_5b_5x5_reduce inception_5b_output inception_5b_pool inception_5b_pool_proj
Conclusions:
So, what have we learned? Well, first of all, creating ~220 Imgur galleries for one blog post brings heavy sadness. But more importantly, we can discern some general trends across the parameters and the layer characteristics. One way to see them is to open several galleries in tabs and flip across them as the value of a parameter increases.

The number of iterations is the primary control on the “intensity” of the articulations, but different values for this parameter in the same neighborhood will actually select slightly different features to articulate. For the purpose of creating smooth animations, the number of iterations should probably be kept low, say to 8 or fewer. The number of octaves also controls the intensity of the articulations, but in a different way, affecting the relative size and depth of the features and how far the colors diverge from the input image. Different values of the number of octaves in the same neighborhood also definitely pick out different features to articulate, and a high value for this parameter strongly increases the color saturation of the resulting images. Finally, the octave scale is another control on the relative size of the features that DeepDream brings out; again, a large octave scale seems to affect the color saturation of the resulting images. Lowering the number of iterations and the number of octaves obviously reduces the computational cost.
As for the layers, the name of each layer doesn’t really help much in describing what’s going on. Generally, though, the lower-level layers (3a and 3b) tend to articulate geometric deformations similar to what one would find in a Photoshop filter; the medium-level layers (4a through 4e) tend to articulate different takes on the dogslugs, lizards, pagodas, eyes, and so on; and the high-level layers (5a and 5b) are less recognizable and harder to describe, though I don’t see many of the fundamental characteristics of the layer-4 articulations in them.
Thanks for reading and I hope this helps your Deep Dreaming in some way!