Reference
AutoGrad
AutoGrad.grad
— Function.Usage:
x = Param([1,2,3]) # user declares parameters
x => P([1,2,3]) # they are wrapped in a struct
value(x) => [1,2,3] # we can get the original value
sum(abs2,x) => 14 # they act like regular values outside of differentiation
y = @diff sum(abs2,x) # if you want the gradients
y => T(14) # you get another struct
value(y) => 14 # which represents the same value
grad(y,x) => [2,4,6] # but also contains gradients for all Params
Param(x) returns a struct that acts like x but marks it as a parameter you want to compute gradients with respect to.
@diff expr evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradient information.
grad(y, x) returns the gradient of y (output by @diff) with respect to any parameter x::Param, or nothing if the gradient is 0.
value(x) returns the value associated with x if x is a Param or the output of @diff, otherwise returns x.
params(x) returns an array of Params found by a recursive search of object x.
Alternative usage:
x = [1 2 3]
f(x) = sum(abs2, x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
Given a scalar valued function f, grad(f,argnum=1) returns another function g which takes the same inputs as f and returns the gradient of the output with respect to the argnum'th argument. gradloss is similar except the resulting function also returns f's output.
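As a concrete illustration of the Param/@diff API, here is a minimal sketch on a synthetic least-squares problem (the data below is made up for illustration; only Param, @diff, value and grad come from AutoGrad):
using AutoGrad
w = Param(randn(1,3))                 # parameter we want gradients for
x, y = randn(3,10), randn(1,10)       # synthetic data
J = @diff sum(abs2, w*x .- y) / 10    # record the loss computation
value(J)                              # the scalar loss value
∇w = grad(J, w)                       # gradient with respect to w, same size as value(w)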
AutoGrad.gradloss
— Function.
gradloss shares its usage docstring with AutoGrad.grad above: given a scalar valued function f, gradloss(f,argnum=1) returns a function which takes the same inputs as f and returns both the gradient of the output with respect to the argnum'th argument and the value of f.
AutoGrad.gradcheck
— Function.gradcheck(f, x...; kwargs...)
Numerically check the gradient of f(x...) and return a boolean result.
Each argument can be a Number, Array, Tuple or Dict which in turn can contain other Arrays etc. Only 10 random entries in each large numeric array are checked by default. If the output of f is not a number, we check the gradient of sum(f(x...)).
Keywords
- args=: : the argument indices to check gradients with respect to. Could be an array or range of indices or a single index. By default all arguments that have a length method are checked.
- kw=(): keyword arguments to be passed to f.
- nsample=10: number of random entries from each numeric array in gradient dw=(grad(f))(w,x...;o...) compared to their numerical estimates.
- atol=rtol=0.01: tolerance parameters. See isapprox for their meaning.
- delta=0.0001: step size for numerical gradient calculation.
- verbose=1: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
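A hedged usage sketch (the function and input below are arbitrary):
using AutoGrad
f(x) = sum(abs2, x)
gradcheck(f, randn(5))              # true if the AutoGrad gradient matches the numerical estimate
gradcheck(f, randn(5); verbose=2)   # print every comparison, not only failing ones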
KnetArray
Knet.KnetArray
— Type.KnetArray{T}(undef,dims)
KnetArray(a::AbstractArray)
Array(k::KnetArray)
Container for GPU arrays that supports most of the AbstractArray interface. The constructor allocates a KnetArray in the currently active device, as specified by gpu()
. KnetArrays and Arrays can be converted to each other as shown above, which involves copying to and from the GPU memory. Only Float32/64 KnetArrays are fully supported.
Important differences from the alternative CudaArray are: (1) a custom memory manager that minimizes the number of calls to the slow cudaMalloc by reusing already allocated but garbage collected GPU pointers. (2) a custom getindex that handles ranges such as a[5:10]
as views with shared memory instead of copies. (3) custom CUDA kernels that implement elementwise, broadcasting, and reduction operations.
Supported functions:
Indexing: getindex, setindex! with the following index types:
- 1-D: Real, Colon, OrdinalRange, AbstractArray{Real}, AbstractArray{Bool}, CartesianIndex, AbstractArray{CartesianIndex}, EmptyArray, KnetArray{Int32} (low level), KnetArray{0/1} (using float for BitArray) (1-D includes linear indexing of multidimensional arrays)
- 2-D: (Colon,Union{Real,Colon,OrdinalRange,AbstractVector{Real},AbstractVector{Bool},KnetVector{Int32}}), (Union{Real,AbstractUnitRange,Colon}...) (in any order)
- N-D: (Real...)
Array operations: ==, !=, cat, convert, copy, copyto!, deepcopy, display, eachindex, eltype, endof, fill!, first, hcat, isapprox, isempty, length, ndims, one, ones, pointer, rand!, randn!, reshape, similar, size, stride, strides, summary, vcat, vec, zero. (cat(x,y,dims=i) supported for i=1,2.)
Math operators: (-), abs, abs2, acos, acosh, asin, asinh, atan, atanh, cbrt, ceil, cos, cosh, cospi, erf, erfc, erfcinv, erfcx, erfinv, exp, exp10, exp2, expm1, floor, log, log10, log1p, log2, round, sign, sin, sinh, sinpi, sqrt, tan, tanh, trunc
Broadcasting operators: (.*), (.+), (.-), (./), (.<), (.<=), (.!=), (.==), (.>), (.>=), (.^), max, min. (Boolean operators generate outputs with same type as inputs; no support for KnetArray{Bool}.)
Reduction operators: countnz, maximum, mean, minimum, prod, sum, sumabs, sumabs2, norm.
Linear algebra: (*), axpy!, permutedims (up to 5D), transpose
Knet extras: relu, sigm, invx, logp, logsumexp, conv4, pool, deconv4, unpool, mat, update! (Only 4D/5D, Float32/64 KnetArrays support conv4, pool, deconv4, unpool)
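A brief sketch of moving data between host memory and the GPU, assuming an active device (gpu() >= 0):
using Knet
a = rand(Float32, 3, 4)
k = KnetArray(a)                 # copy to the currently active GPU device
s = sum(k .* 2f0 .+ 1f0)         # broadcasting and reduction run as CUDA kernels
b = Array(k)                     # copy back to host memory; b == a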
Memory management
Knet models do not overwrite arrays which need to be preserved for gradient calculation. This leads to a lot of allocation and regular GPU memory allocation is prohibitively slow. Fortunately most models use identically sized arrays over and over again, so we can minimize the number of actual allocations by reusing preallocated but garbage collected pointers.
When Julia gc reclaims a KnetArray, a special finalizer keeps its pointer in a table instead of releasing the memory. If an array with the same size in bytes is later requested, the same pointer is reused. The exact algorithm for allocation is:
Try to find a previously allocated and garbage collected pointer in the current device. (0.5 μs)
If not available, try to allocate a new array using cudaMalloc. (10 μs)
If not successful, try running gc() and see if we get a pointer of the right size. (75 ms, but this should be amortized over all reusable pointers that become available due to the gc)
Finally if all else fails, clean up all saved pointers in the current device using cudaFree and try allocation one last time. (25-70 ms, however this causes the elimination of all reusable pointers)
Utilities
Knet.accuracy
— Function.accuracy(scores, answers; dims=1, average=true)
Given an unnormalized scores matrix and an Integer array of correct answers, return the ratio of instances where the correct answer has the maximum score. dims=1 means instances are in columns, dims=2 means instances are in rows. Use average=false to return the number of correct answers instead of the ratio.
accuracy(model, data; dims=1, average=true, o...)
Compute accuracy(model(x; o...), y; dims) for (x,y) in data and return the ratio (if average=true) or the count (if average=false) of correct answers.
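A small sketch with made-up scores for three instances (columns) and three classes (rows):
using Knet
scores  = [0.1 0.9 0.2;
           0.8 0.3 0.1;
           0.1 0.2 0.7]
answers = [2, 1, 3]
accuracy(scores, answers)                  # => 1.0, every column's maximum is at the correct row
accuracy(scores, answers; average=false)   # => 3 correct answers instead of the ratio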
Knet.dir
— Function.Knet.dir(path...)
Construct a path relative to Knet root.
Example
julia> Knet.dir("examples","mnist.jl")
"/home/dyuret/.julia/v0.5/Knet/examples/mnist.jl"
Knet.dropout
— Function.dropout(x, p)
Given an array x and probability 0<=p<=1, just return x if p==0, or return an array y in which each element is 0 with probability p or x[i]/(1-p) with probability 1-p. Use seed::Number to set the random number seed for reproducible results. See (Srivastava et al. 2014) for a reference.
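A hedged sketch of typical use (dropout is normally applied only during training):
using Knet
x = ones(Float32, 4, 5)
y = dropout(x, 0.5)      # roughly half the entries zeroed, the rest scaled to 1/(1-0.5) = 2
dropout(x, 0) == x       # p == 0 returns x unchanged, e.g. at test time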
Knet.gpu
— Function.gpu()
returns the id of the active GPU device or -1 if none are active.
gpu(true) resets all GPU devices and activates the one with the most available memory.
gpu(false) resets and deactivates all GPU devices.
gpu(d::Int) activates the GPU device d if 0 <= d < gpuCount(), otherwise deactivates devices.
gpu(true/false) resets all devices. If there are any allocated KnetArrays their pointers will be left dangling. Thus gpu(true/false) should only be used during startup. If you want to suspend GPU use temporarily, use gpu(-1).
gpu(d::Int) does not reset the devices. You can select a previous device and find its allocated memory preserved. However, trying to operate on arrays of an inactive device will result in an error.
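A hedged sketch of a common startup pattern, assuming at least one GPU is installed (otherwise skip the gpu(true) call and gpu() stays at -1):
using Knet
gpu(true)                                                   # activate the GPU with the most free memory (startup only)
atype = gpu() >= 0 ? KnetArray{Float32} : Array{Float32}
x = convert(atype, rand(Float32, 100, 10))                  # data lives on the GPU when one is active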
Knet.invx
— Function.invx(x) = (1./x)
Knet.gc
— Function.Knet.gc(dev=gpu())
cudaFree all pointers allocated on device dev that were previously allocated and garbage collected. Normally Knet holds on to all garbage collected pointers for reuse. Try this if you run out of GPU memory.
Knet.softmax
— Function.softmax(x; dims=1, algo=1)
The softmax function typically used in classification. Gives the same results as exp.(logp(x, dims)).
If algo=1 computation is more accurate; if algo=0 it is faster.
See also logsoftmax.
Knet.logp
— Function.logp(x; dims=:)
Treat entries in x as unnormalized log probabilities and return normalized log probabilities.
dims is an optional argument: if not specified, the normalization is over the whole x, otherwise the normalization is performed over the given dimensions. In particular, if x is a matrix, dims=1 normalizes columns of x and dims=2 normalizes rows of x.
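A small sketch normalizing the columns of a score matrix:
using Knet
x  = randn(3, 5)                  # 3 classes x 5 instances
lp = logp(x; dims=1)              # normalized log probabilities per column
sum(exp.(lp); dims=1)             # each column now sums to approximately 1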
Knet.logsoftmax
— Function.logsoftmax(x; dims=:)
Equivalent to logp(x; dims=:). See also softmax.
Knet.logsumexp
— Function.logsumexp(x;dims=:)
Compute log(sum(exp(x);dims)) in a numerically stable manner.
dims is an optional argument: if not specified, the summation is over the whole x, otherwise the summation is performed over the given dimensions. In particular, if x is a matrix, dims=1 sums columns of x and dims=2 sums rows of x.
Knet.logistic
— Function.logistic(scores, answers; average=true)
Computes logistic loss given scores (predicted values) and answer labels. Answer values should be in {-1,1}; it returns the mean (if average=true) or sum (if average=false) of log(1 + exp(-answers*scores)). See also bce.
Knet.bce
— Function.bce(scores,answers;average=true)
Computes binary cross entropy given scores (predicted values) and answer labels. Answer values should be in {0,1}; it returns the negative of the mean (if average=true) or sum (if average=false) of answers*log(p) + (1-answers)*log(1-p), where p is equal to 1/(1 + exp(-scores)). See also logistic.
Knet.minibatch
— Function.minibatch(x, [y], batchsize; shuffle, partial, xtype, ytype, xsize, ysize)
Return an iterable of minibatches [(xi,yi)...] given data tensors x, y and batchsize. The last dimension of x and y should match and give the number of instances. y is optional. Keyword arguments:
- shuffle=false: Shuffle the instances before minibatching.
- partial=false: If true, include the last partial minibatch < batchsize.
- xtype=typeof(x): Convert xi in minibatches to this type.
- ytype=typeof(y): Convert yi in minibatches to this type.
- xsize=size(x): Convert xi in minibatches to this shape.
- ysize=size(y): Convert yi in minibatches to this shape.
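A hedged sketch assuming MNIST-like data; xtrn (a 28x28x1x60000 image array) and ytrn (a 60000-element label vector) are placeholders for the user's own data:
dtrn = minibatch(xtrn, ytrn, 100; shuffle=true)   # xtrn, ytrn are hypothetical data arrays
for (x, y) in dtrn
    # each x is 28x28x1x100 and each y holds the 100 matching labels
end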
Knet.nll
— Function.nll(scores, answers; dims=1, average=true)
Given an unnormalized scores matrix and an Integer array of correct answers, return the per-instance negative log likelihood. dims=1 means instances are in columns, dims=2 means instances are in rows. Use average=false to return the sum instead of the per-instance average.
nll(model, data; dims=1, average=true, o...)
Compute nll(model(x; o...), y; dims) for (x,y) in data and return the per-instance average (if average=true) or total (if average=false) negative log likelihood.
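A small sketch with random scores and labels:
using Knet
scores  = randn(10, 32)                 # 10 classes x 32 instances in columns
answers = rand(1:10, 32)                # correct class per instance
nll(scores, answers)                    # per-instance average negative log likelihood
nll(scores, answers; average=false)     # total over the 32 instances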
Knet.relu
— Function.relu(x)
Return max(0,x)
.
Reference: Rectified Linear Units Improve Restricted Boltzmann Machines (https://icml.cc/Conferences/2010/abstracts.html#432).
Knet.seed!
— Function.Knet.seed!(n::Integer)
Run seed!(n) on both cpu and gpu.
Knet.sigm
— Function.sigm(x) = (1./(1+exp(-x)))
Convolution and Pooling
Knet.conv4
— Function.conv4(w, x; kwargs...)
Execute convolutions or cross-correlations using filters specified with w over tensor x.
Currently KnetArray{Float32/64,4/5} and Array{Float32/64,4} are supported as w and x. If w has dimensions (W1,W2,...,I,O) and x has dimensions (X1,X2,...,I,N), the result y will have dimensions (Y1,Y2,...,O,N) where
Yi = 1 + floor((Xi + 2*padding[i] - Wi) / stride[i])
Here I is the number of input channels, O is the number of output channels, N is the number of instances, and Wi,Xi,Yi are spatial dimensions. padding and stride are keyword arguments that can be specified as a single number (in which case they apply to all dimensions), or an array/tuple with entries for each spatial dimension.
Keywords
- padding=0: the number of extra zeros implicitly concatenated at the start and at the end of each dimension.
- stride=1: the number of elements to slide to reach the next filtering window.
- upscale=1: upscale factor for each dimension.
- mode=0: 0 for convolution and 1 for cross-correlation.
- alpha=1: can be used to scale the result.
- handle: handle to a previously created cuDNN context. Defaults to a Knet allocated handle.
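A small sketch checking the output size formula above (CPU Float32 arrays are supported for 4-D convolutions):
using Knet
w = randn(Float32, 3, 3, 3, 16)       # (W1,W2,I,O): 3x3 filters, 3 input and 16 output channels
x = randn(Float32, 28, 28, 3, 8)      # (X1,X2,I,N): 8 instances
y = conv4(w, x; padding=1)            # Yi = 1+floor((28+2-3)/1) = 28, so size(y) == (28, 28, 16, 8)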
Knet.deconv4
— Function.y = deconv4(w, x; kwargs...)
Simulate 4-D deconvolution by using transposed convolution operation. Its forward pass is equivalent to backward pass of a convolution (gradients with respect to input tensor). Likewise, its backward pass (gradients with respect to input tensor) is equivalent to forward pass of a convolution. Since it swaps forward and backward passes of convolution operation, padding and stride options belong to output tensor. See this report for further explanation.
Currently KnetArray{Float32/64,4} and Array{Float32/64,4} are supported as w and x. If w has dimensions (W1,W2,...,O,I) and x has dimensions (X1,X2,...,I,N), the result y will have dimensions (Y1,Y2,...,O,N) where
Yi = Wi + stride[i]*(Xi-1) - 2*padding[i]
Here I is the number of input channels, O is the number of output channels, N is the number of instances, and Wi,Xi,Yi are spatial dimensions. padding and stride are keyword arguments that can be specified as a single number (in which case they apply to all dimensions), or an array/tuple with entries for each spatial dimension.
Keywords
- padding=0: the number of extra zeros implicitly concatenated at the start and at the end of each dimension.
- stride=1: the number of elements to slide to reach the next filtering window.
- mode=0: 0 for convolution and 1 for cross-correlation.
- alpha=1: can be used to scale the result.
- handle: handle to a previously created cuDNN context. Defaults to a Knet allocated handle.
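A small sketch using the output size formula above; a 2x2 kernel with stride 2 doubles the spatial dimensions:
using Knet
w = randn(Float32, 2, 2, 3, 3)        # (W1,W2,O,I)
x = randn(Float32, 14, 14, 3, 8)      # (X1,X2,I,N)
y = deconv4(w, x; stride=2)           # Yi = 2 + 2*(14-1) - 0 = 28, so size(y) == (28, 28, 3, 8)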
Knet.mat
— Function.mat(x)
Reshape x into a two-dimensional matrix.
This is typically used to turn the output of a 4-D convolution into a 2-D input for a fully connected layer. For 1-D inputs it returns reshape(x, (length(x),1)). For inputs with more than two dimensions of size (X1,X2,...,XD), it returns
reshape(x, (X1*X2*...*X[D-1],XD))
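A small sketch of the typical convolution-to-dense transition:
using Knet
y = randn(Float32, 7, 7, 32, 16)      # convolution output: (H, W, C, N)
m = mat(y)                            # size(m) == (7*7*32, 16)
w = randn(Float32, 10, 7*7*32)        # fully connected weights
scores = w * m                        # 10 x 16 scores, one column per instance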
Knet.pool
— Function.pool(x; kwargs...)
Compute pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
Currently 4 or 5 dimensional KnetArrays with Float32 or Float64 entries are supported. If x has dimensions (X1,X2,...,I,N), the result y will have dimensions (Y1,Y2,...,I,N) where
Yi = 1 + floor((Xi + 2*padding[i] - window[i]) / stride[i])
Here I is the number of input channels, N is the number of instances, and Xi,Yi are spatial dimensions. window, padding and stride are keyword arguments that can be specified as a single number (in which case they apply to all dimensions), or an array/tuple with entries for each spatial dimension.
Keywords:
- window=2: the pooling window size for each dimension.
- padding=0: the number of extra zeros implicitly concatenated at the start and at the end of each dimension.
- stride=window: the number of elements to slide to reach the next pooling window.
- mode=0: 0 for max, 1 for average including padded values, 2 for average excluding padded values.
- maxpoolingNanOpt=0: NaN numbers are not propagated if 0, they are propagated if 1.
- alpha=1: can be used to scale the result.
- handle: Handle to a previously created cuDNN context. Defaults to a Knet allocated handle.
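A small sketch (pool currently requires a KnetArray, i.e. an active GPU):
using Knet
x = KnetArray(randn(Float32, 28, 28, 3, 8))
y = pool(x)                    # default window=2, stride=2: size(y) == (14, 14, 3, 8)
z = pool(x; window=4, mode=1)  # 4x4 average pooling: size(z) == (7, 7, 3, 8)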
Knet.unpool
— Function.Unpooling; the reverse of pooling, so that
x == pool(unpool(x;o...); o...)
Recurrent neural networks
Knet.rnninit
Knet.rnnforw
Knet.rnnparam
Knet.rnnparams
Batch Normalization
Knet.bnmoments
— Function.bnmoments(;momentum=0.1, mean=nothing, var=nothing, meaninit=zeros, varinit=ones)
can be used to directly load moments from data. meaninit and varinit are called if mean and var are nothing. The type and size of the mean and var are determined automatically from the inputs in the batchnorm calls. A BNMoments object is returned.
BNMoments
A high-level data structure used to store the running mean and running variance of batch normalization, with the following fields:
- momentum::AbstractFloat: A real number between 0 and 1 used as the scale of the last mean and variance. The existing running mean or variance is multiplied by (1-momentum).
- mean: The running mean.
- var: The running variance.
- meaninit: The function used to initialize the running mean. Should either be nothing or of the form (eltype, dims...)->data. zeros is a good option.
- varinit: The function used to initialize the running variance. Should either be nothing or (eltype, dims...)->data. ones is a good option.
Knet.bnparams
— Function.bnparams(etype, channels)
creates a single 1d array that contains both the scale and bias of batchnorm, where the first half is scale and the second half is bias.
bnparams(channels) calls bnparams with etype=Float64, following the Julia convention.
Knet.batchnorm
— Function.batchnorm(x[, moments, params]; kwargs...)
performs batch normalization on x with an optional scaling factor and bias stored in params.
2d, 4d and 5d inputs are supported. Mean and variance are computed over dimensions (2,), (1,2,4) and (1,2,3,5) for 2d, 4d and 5d arrays, respectively.
moments stores the running mean and variance to be used in testing. It is optional in training mode, but mandatory in test mode. Training and test modes are controlled by the training keyword argument.
params stores the optional affine parameters gamma and beta. The bnparams function can be used to initialize params.
Example
# Initialization, C is an integer
moments = bnmoments()
params = bnparams(C)
...
# size(x) -> (H, W, C, N)
y = batchnorm(x, moments, params)
# size(y) -> (H, W, C, N)
Keywords
- eps=1e-5: The epsilon parameter added to the variance to avoid division by 0.
- training: When training is true, the mean and variance of x are used and the moments argument is modified if it is provided. When training is false, the mean and variance stored in the moments argument are used. The default value is true when at least one of x and params is an AutoGrad.Value, false otherwise.
Optimization methods
Knet.update!
Knet.optimizers
Knet.Adadelta
Knet.Adagrad
Knet.Adam
Knet.Momentum
Knet.Nesterov
Knet.Rmsprop
Knet.Sgd
Hyperparameter optimization
Knet.goldensection
— Function.goldensection(f,n;kwargs) => (fmin,xmin)
Find the minimum of f using concurrent golden section search in n dimensions. See Knet.goldensection_demo() for an example.
f is a function from a Vector{Float64} of length n to a Number. It can return NaN for out of range inputs. Goldensection will always start with a zero vector as the initial input to f, and the initial step size will be 1 in each dimension. The user should define f to scale and shift this input range into a vector meaningful for their application. For positive inputs like learning rate or hidden size, you can use a transformation such as x0*exp(x) where x is a value goldensection passes to f and x0 is your initial guess for this value. This will effectively start the search at x0, then move with multiplicative steps.
I designed this algorithm combining ideas from Golden Section Search and Hill Climbing Search. It essentially runs golden section search concurrently in each dimension, picking the next step based on estimated gain.
Keyword arguments
- dxmin=0.1: smallest step size.
- accel=φ: acceleration rate. The golden ratio φ=1.618... is best.
- verbose=false: use true to print individual steps.
- history=[]: cache of [(x,f(x)),...] function evaluations.
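A hedged sketch of tuning a learning rate and a hidden size; trainloss is a hypothetical user function that trains briefly and returns a validation loss, and the exp/round transforms map the zero-centered search space into positive values as described above:
function tune(v)
    lr     = 0.1 * exp(v[1])                 # search starts at lr = 0.1
    hidden = round(Int, 64 * exp(v[2]))      # search starts at hidden = 64
    return trainloss(lr=lr, hidden=hidden)   # hypothetical; must return a Number (or NaN)
end
fmin, xmin = Knet.goldensection(tune, 2)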
Knet.hyperband
— Function.hyperband(getconfig, getloss, maxresource=27, reduction=3)
Hyperparameter optimization using the hyperband algorithm from (Lisha et al. 2016). You can try a simple MNIST example using Knet.hyperband_demo()
.
Arguments
- getconfig() returns random configurations with a user defined type and distribution.
- getloss(c,n) returns the loss for configuration c and number of resources (e.g. epochs) n.
- maxresource is the maximum number of resources any one configuration should be given.
- reduction is an algorithm parameter (see paper), 3 is a good value.
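A hedged sketch of the two callbacks; the named-tuple configuration and the trainloss function are hypothetical placeholders:
getconfig() = (lr = 10.0^(-4 + 3*rand()), hidden = rand(32:256))
getloss(c, n) = trainloss(lr=c.lr, hidden=c.hidden, epochs=n)   # hypothetical training call
Knet.hyperband(getconfig, getloss, 27, 3)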
Initialization
Knet.bilinear
— Function.Bilinear interpolation filter weights; used for initializing deconvolution layers.
Adapted from https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/surgery.py#L33
Arguments:
- T: Data type
- fw: Width upscale factor
- fh: Height upscale factor
- IN: Number of input filters
- ON: Number of output filters
Example usage:
w = bilinear(Float32,2,2,128,128)
Knet.gaussian
— Function.gaussian(a...; mean=0.0, std=0.01)
Return a Gaussian array with the given mean and standard deviation. The a arguments are passed to randn.
Knet.xavier
— Function.xavier(a...)
Xavier initialization. The a arguments are passed to rand. See (Glorot and Bengio 2010) for a description. Caffe implements this slightly differently. Lasagne calls it GlorotUniform.
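A small sketch initializing a fully connected layer's weights (the layer sizes are arbitrary):
using Knet
w = xavier(Float32, 100, 784)                # scaled uniform Xavier/Glorot weights
b = zeros(Float32, 100, 1)
g = gaussian(Float32, 100, 784; std=0.05)    # alternative: small random normal weights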
AutoGrad (advanced)
AutoGrad.getval
AutoGrad.@primitive
AutoGrad.@zerograd
Function Index
Knet.Adadelta
Knet.Adagrad
Knet.Adam
Knet.KnetArray
Knet.Momentum
Knet.Nesterov
Knet.Rmsprop
AutoGrad.grad
AutoGrad.gradcheck
AutoGrad.gradloss
Knet.accuracy
Knet.batchnorm
Knet.bce
Knet.bilinear
Knet.bnmoments
Knet.bnparams
Knet.conv4
Knet.deconv4
Knet.dir
Knet.dropout
Knet.gaussian
Knet.gc
Knet.goldensection
Knet.gpu
Knet.hyperband
Knet.invx
Knet.logistic
Knet.logp
Knet.logsoftmax
Knet.logsumexp
Knet.mat
Knet.minibatch
Knet.nll
Knet.optimizers
Knet.pool
Knet.relu
Knet.rnnparam
Knet.rnnparams
Knet.seed!
Knet.sigm
Knet.softmax
Knet.unpool
Knet.update!
Knet.xavier
AutoGrad.@primitive
AutoGrad.@zerograd