User-Defined Derivatives
Note
User-defined derivatives are rarely necessary. While they can be as accurate as automatic differentiation and may offer modest speed improvements, calculating them correctly requires significant effort. Most users can safely skip this section.
For most problems, the best choice for calculating derivatives is either automatic differentiation or central difference numerical differentiation. However, there may be cases where the user wants or needs to supply their own derivatives. For example, one could have a problem for which almost all the derivatives are known analytically, but a few must be calculated numerically. This section explains how to provide user-defined derivatives.
The following example illustrates how to set up a problem with user-defined derivatives. We use the classic brachistochrone problem, where the objective is to minimize the time for a particle to slide down a curve between two points. This example shows how to specify derivatives for objectives and constraints manually. (See the Jupyter Lab notebook for additional problem details.) As usual, we start by defining the problem:
from yapss import Problem
from yapss import numpy as np
from yapss.numpy import cos, pi, sin
ocp = Problem(name="Brachistochrone", nx=[3], nu=[1], nq=[0])
ocp.derivatives.method = "user"
ocp.derivatives.order = "second"
g0 = 32.174
def objective(arg):
arg.objective = arg.phase[0].final_time
def continuous(arg):
x, y, v = arg.phase[0].state
(u,) = arg.phase[0].control
arg.phase[0].dynamics[:] = v * cos(u), v * sin(u), g0 * sin(u)
Because we’ve specified that we’re using user-defined, second-order derivatives, we need to define the gradient and Hessian functions for the objective, and the Jacobian and Hessian functions for the continuous functions.
The Continuous Jacobian and Hessian
We’ll start with the continuous function, which generically computes the dynamics function, the integrand array, and the path constraint functions. For our example, the continuous Jacobian function is defined as follows:
def continuous_jacobian(arg):
x, y, v = arg.phase[0].state
(u,) = arg.phase[0].control
jacobian = arg.phase[0].jacobian
jacobian[("f", 0), ("x", 2)] = cos(u)
jacobian[("f", 0), ("u", 0)] = -v * sin(u)
jacobian[("f", 1), ("x", 2)] = sin(u)
jacobian[("f", 1), ("u", 0)] = v * cos(u)
jacobian[("f", 2), ("u", 0)] = g0 * cos(u)
ocp.functions.continuous_jacobian = continuous_jacobian
For each phase p, the jacobian dictionary (arg.phase[p].jacobian) contains entries
representing partial derivatives. The keys follow the pattern:
("f", i): Elementiof the dynamics function.("g", i): Elementiof the integrand.("h", i): Elementiof the path constraint.
Each entry is a partial derivative with respect to a decision variable. The decision variables are:
("x", i): State vector elementi.("u", i): Control vector elementi.("s", i): Parameter vector elementi.("t", 0): Time variable.
The Hessian of the continuous functions is defined similarly:
def continuous_hessian(arg):
x, y, v = arg.phase[0].state
(u,) = arg.phase[0].control
hessian = arg.phase[0].hessian
hessian[("f", 0), ("x", 2), ("u", 0)] = -sin(u)
hessian[("f", 0), ("u", 0), ("u", 0)] = -v * cos(u)
hessian[("f", 1), ("x", 2), ("u", 0)] = cos(u)
hessian[("f", 1), ("u", 0), ("u", 0)] = -v * sin(u)
hessian[("f", 2), ("u", 0), ("u", 0)] = -g0 * sin(u)
ocp.functions.continuous_hessian = continuous_hessian
The main difference with the Hessian is that each entry represents a second partial derivative,
so two keys corresponding to decision variables are required for each partial derivative. Note that
for mixed partial derivatives, the order of the keys is unimportant. For example, if you set
hessian[("f", 0), ("x", 2), ("u", 0)], hessian[("f", 0), ("u", 0), ("x", 2)] should not be set.
The Objective Gradient and Hessian
We’ll start with the derivative functions for the objective. The gradient function is defined as follows:
def objective_gradient(arg):
arg.gradient[0, "tf", 0] = 1
ocp.functions.objective_gradient = objective_gradient
The gradient values are stored in the dictionary arg.gradient. In this example, the key is
(0, "tf", 0), which means the gradient is with respect to the final time of phase 0.
The gradient keys are of the form
(p, "x0", i): Elementiof the initial state vector in phasep.(p, "xf", i): Elementiof the final state vector in phasep.(p, "t0", 0): The initial time of phasep.(p, "tf", 0): The final time of phasep.(0, "s", i): Elementiof the parameter vector.
Because the gradient is constant with respect to the decision variables, the Hessian function is zero, so the Hessian function is simply
def objective_hessian(arg):
return
ocp.functions.objective_hessian = objective_hessian
Had the Hessian been nonzero, each key would have been a tuple of two decision variable keys, so for
example a key of the dictionary arg.hessian in a multiphase problem might be something like
((3, "xf", 1), (0, "s", 2)). More generally, the keys are of the form
(p, var1, i1), (p, var2, i2)
where p is the index of the phase (or 0 for an element of the parameter vector), var1 and
var2 are one of the variables x0, xf, t0, tf, or s, and i1 and i2
are the index of the variable (or 0 for t0 for tf).
The Discrete Jacobian and Hessian
For problems with discrete functions, the discrete Jacobian and discrete Hessian are defined similarly to the objective functions. The keys for the discrete Jacobian and Hessian dictionaries have an additional index for the discrete function:
(k, (p, var, i))
where k is the discrete function index, and p, var, and i are as in
the objective derivative functions. Finally, the arg.hessian keys have the form
(k, (p1, var1, i1), (p2, var2, i2))