Understand shape inference in deep learning technologies

Run the following code in your Python shell with Keras Python installed,

from keras.layers import Input, Embedding
from keras import backend as K
a = Input((12, ))
x = Embedding(1000, 10, trainable=False, input_length=10, name="word_embedding")(a)
print(K.int_shape(x))
print(x.get_shape().as_list())

And you’ll get suprised. You’ll get two different returns:

(None, 10, 10)
[None, 12, 10]

If you are not sure why, this article is for you!

Static Shapes

In Tensorflow, the static shape is given by .get_shape() method of the Tensor object, which is equivalent to .shape.

The static shape is an object of type tensorflow.python.framework.tensor_shape.TensorShape.

With None instead of an integer, it leaves the possibility for partially defined shapes :

a = Input((None, 10))
print(a.shape)

returns (?, ?, 10) where unknown dimensions None are printed with an question mark ?. Since we are using the Keras Input layer, the first dimension is systematically None for the batch size and cannot be set.

The TensorShape has 2 public attributes, dims and ndims:

print(a.shape.dims) # [Dimension(None), Dimension(None), Dimension(10)]
print(a.shape.ndims) # 3
print(K.ndim(a)) # 3

Dynamic Shapes

Of course, during the run of the graph with input values, all shapes become known. Shape at run time are named dynamic shapes.

To access to their values at run time, you can use either the tensorflow operator tf.shape() or the Keras wrapper K.shape().

As graph operators, they both return a tensorflow.python.framework.ops.Tensor Tensor.

a = Input((None, ))
import numpy as np
f = K.function([a], [K.shape(a)])
print(f( [np.random.random((3,11)) ]))

returns [array([ 3, 11], dtype=int32)]. 3 is the batch size, and 11 the value for the unknown data dimension at graph definition.

The equivalent of a.get_shape().as_list() for static shapes is tf.unstack(tf.shape(a)).

The number of dimensions is returned by tf.rank() which returns a tensor of rank zero (scalar) and for that reason is very different from K.ndim or ndims methods with integer return.

Shape setting for Operators

Most operators have an output shape function that enables to infer the static shape, without running the graph, given the shapes of the operators’ input tensors.

For example, in Tensorflow you can check how to define the Shape functions in C++ for any operator.

Nevertheless, shape functions do not cover all cases correctly, and in some cases, it is impossible to infer the shape without knowing more of your intent

Let’s take an example, where automatic shape inference is not possible.

Let’s define a reshaping operation given a reshaping tensor for which the values are not known at graph definition, but only at run time.

Such a reshaping tensor can be for example depending on input tensor dimensions, or any other dynamic shapes:

a = Input((None,))
reshape_tensor = Input((), dtype="int32")
print(a.shape)

# reshape the first input tensor given the second input tensor
x = K.reshape(a, reshape_tensor)
print(x.shape)

# build the model
f = K.function([a, reshape_tensor], [x])

# eval the model on input data
import numpy as np
f([np.random.random((3,10)), [3, 2,5] ])

prints

(?, ?)
<unknown>

The shape of variable a is of rank 2, but both dimensions are not known: it is named partially known shape. Its value is TensorShape([None, None]).

The shape of variable x is unknown, neither the rank nor the dimensions are known. Its value is TensorShape(None).

That is where, if possible, setting the shape manually can help get a more precise shape than TensorShape([None, None]) or TensorShape(None).

To set the unknown shape dimensions:

a = Input((None, 10))
a.set_shape((10,10,10))
print(a.shape)

Note that shape setting requires to preserve the shape rank and known dimensions. IF I write:

a.set_shape((10,10,10))

it leads to a value error

ValueError: Shapes (?, ?, 10) and (10, 10, 11) are not compatible

Setting the shape enables further operators to compute the shape.

Since Tensorflow does not have a concept of Layers (it is much more based on Operators and Scopes), the set_shape() function is the method for shape inference without running the graph.

Shape setting for Layers

Let’s come back to the initial example, where a layer, the Embedding layer, is a concept involved in the middle of the Keras graph definition.

The concept of layers gives a struture to the neural networks, enabling to run through the layers later on:

from keras.models import Model
a = Input((11,))
x = Embedding(1000, 10, trainable=False, input_length=10, name="word_embedding")(a)
m = Model(a,x)
for l in m.layers:
    print(l.name)

returns the name of each layers: input_1, word_embedding.

Still, when a shape cannot be inferred, it is possible to set it also, so that further layers benefit from their output shape information.

Let’s see in practice, with a simple custom concatenate Layer.

For the purpose, let me introduce an error in the compute_output_shape() function, adding 2 to the last shape dimension, as I did in the Embedding layer at the begining of this article, setting input_length to 10 instead of 12:

from keras.engine.topology import Layer
class MyConcatenateLayer(Layer):

    def call(self, inputs):
        return K.concatenate(inputs)

    def compute_output_shape(self, input_shape):
        print(input_shape) # [(None, 10), (None, 12)]
        return (None, input_shape[0][1] + input_shape[1][1] + 2)

from keras.layers import Input, Embedding
from keras import backend as K
a = Input((10,))
b = Input((12,))
c = MyConcatenateLayer()([a,b])
print(c.shape) # (?, 22)
print(c._keras_shape) # (?, 24)

The code will run without error, as well as the graph evaluation on values for inputs.

As you can see, Keras adds attributes to Tensor such as _keras_shape, to be able to retrieve layers information. This can be useful for layer weight saving for example.

The Keras K.int_shape() method relies on _keras_shape attribute to return the result, leading to propagation of the error.

Since shapes can vary in rank and their values can be None, it is difficult, on such a simple concatenation example, to be sure to cover all cases in the shape inference function and this leads to errors.

A note on CNTK

CNTK distinguishes unknown dimensions into 2 categories:

the inferred dimensions whose value is to be inferred by the system and is printed with -1 instead of the question mark ?. For example, in the matrix multiplication A x B between tensors A and B, the last dimension of A can be inferred by the system given the first dimension of B. See here
the free dimensions whose value is known only when data is bound to the variable and is printed with -3

Well done!

Now, you are aware of the Why of the shape setting, its advantages and its risks.