p → q Norm
The $p \rightarrow q$ norm of a matrix comes up in approximation algorithms, small set expansion, and analysis. It measures the most that the $q$-norm of the output $Ax$ can exceed the $p$-norm of the input $x$; with expectation norms and $q > p$, this rewards concentrated vectors. In general, the norm of a matrix can be thought of as “how much it blows up an input vector”.
$$ \|A\|_{p \rightarrow q} = \sup_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_p} $$
When solving for this, note that there are two ways in which to take the norm:
1) sum norm, and
2) expectation norm.
Sum Norm
$$ \|x\|_p = \left(\sum_i |x_i|^p \right)^{\frac{1}{p}} $$
Expectation Norm
$$ \|x\|_p = \left(\mathbb{E}_i |x_i|^p\right)^{\frac{1}{p}} $$
Note that as $p$ is increased, it weights the heavy elements of the vector more strongly. This is why $\|\cdot\|_{\infty}$ is simply the largest element of the vector in absolute value.
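As a quick illustration, here is a minimal NumPy sketch (the helper names `sum_norm` and `expectation_norm` are ours, not standard) comparing the two conventions and showing how larger $p$ emphasizes the heaviest entry:

```python
import numpy as np

def sum_norm(x, p):
    """l_p norm with the sum convention: (sum_i |x_i|^p)^(1/p)."""
    return (np.abs(x) ** p).sum() ** (1.0 / p)

def expectation_norm(x, p):
    """l_p norm with the expectation convention: (E_i |x_i|^p)^(1/p)."""
    return (np.abs(x) ** p).mean() ** (1.0 / p)

x = np.array([3.0, 1.0, 0.5, 0.1])
for p in [1, 2, 4, 16]:
    print(p, sum_norm(x, p), expectation_norm(x, p))
# As p grows, both conventions are dominated by the largest entry, |x_i| = 3,
# and both values approach 3 (the infinity norm).
```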
Example 1
Solve for:
$$ \|\mathbb{I}\|_{2 \rightarrow 4} $$
$$ \|\mathbb{I}\|_{2 \rightarrow 4} = \sup_{x \neq 0} \frac{\|\mathbb{I}x\|_4}{\|x\|_2} $$
Let’s contrast the behavior of the $2 \rightarrow 4$ norm (using expectation norms, with $n = 4$) on two input vectors, $x_1$ and $x_2$.
$$ x_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} $$
$$ \frac{\|\mathbb{I}x_1\|_4}{\|x_1\|_2} = \frac{\left[\frac{1}{n}(1^4 + 0 + 0 + 0)\right]^{\frac{1}{4}}}{\left[\frac{1}{n}(1^2 + 0 + 0 + 0)\right]^{\frac{1}{2}}} = \frac{n^{-\frac{1}{4}}}{n^{-\frac{1}{2}}} = n^{\frac{1}{4}} $$
$$ x_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} $$
$$ \frac{\|\mathbb{I}x_2\|_4}{\|x_2\|_2} = \frac{\left[\frac{1}{n}(1^4 + 1^4 + 1^4 + 1^4)\right]^{\frac{1}{4}}}{\left[\frac{1}{n}(1^2 + 1^2 + 1^2 + 1^2)\right]^{\frac{1}{2}}} = 1 $$
It is apparent that, relative to the 2-norm, the 4-norm is largest when the vector is concentrated; the supremum is attained at $x_1$, so $\|\mathbb{I}\|_{2 \rightarrow 4} = n^{\frac{1}{4}}$. Note that with the sum norm the picture reverses: there $\|x\|_4 \le \|x\|_2$ for every $x$ (the ratios for $x_1$ and $x_2$ become $1$ and $n^{-\frac{1}{4}}$), whereas with the expectation norm $\|x\|_4 \ge \|x\|_2$.
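A quick numerical check of the two ratios above, in NumPy (the helper `exp_norm` is ours):

```python
import numpy as np

def exp_norm(x, p):
    # expectation-style l_p norm, as used in Example 1
    return (np.abs(x) ** p).mean() ** (1.0 / p)

n = 4
x1 = np.zeros(n); x1[0] = 1.0   # concentrated vector
x2 = np.ones(n)                 # spread-out vector

print(exp_norm(x1, 4) / exp_norm(x1, 2))   # n**(1/4) ~= 1.414
print(exp_norm(x2, 4) / exp_norm(x2, 2))   # 1.0
```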
Important Result: If the matrix in question preserves the 2-norm, such as a unitary matrix, then its $2 \rightarrow 4$ norm is at most $n^{\frac{1}{4}}$, i.e. at most a factor of $n^{\frac{1}{4}}$ larger than its $2 \rightarrow 2$ norm.
The $2 \rightarrow 2$ norm of a graph’s normalized adjacency matrix (“normalized” means it’s a random walk matrix) is always 1. But the second largest eigenvalue indicates the expansion of the graph, as it provides information about the rate at which a random walk mixes: it tells us how quickly a vector orthogonal to the stationary distribution decays under mixing on the graph.
If the $2 \rightarrow 4$ norm is applied to the adjacency matrix of a graph, it plays a role similar to the $2 \rightarrow 2$ norm, except that the 4-norm favors concentrated vectors, and hence small sets, so it tells us about small set expansion.
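For concreteness, here is a small NumPy sketch (the cycle graph and variable names are just illustrative choices) of the random-walk matrix and the two eigenvalues referred to above:

```python
import numpy as np

# Random-walk (normalized adjacency) matrix of the n-cycle.
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
W = A / 2.0   # the cycle is 2-regular, so divide by the degree

vals = np.sort(np.linalg.eigvalsh(W))[::-1]
print(vals[0])   # 1.0: the top eigenvalue (and the 2->2 norm) is always 1
print(vals[1])   # second eigenvalue; the closer to 1, the slower the walk mixes
```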
Theorem 1
If $A$ is the normalized adjacency matrix of a graph and $P_{\ge \lambda}$ is the projector onto the span of its eigenvectors with eigenvalue at least $\lambda$, then the graph has very good small set expansion, meaning all sets of (relative) size $\delta$ expand greatly, iff the projector has a small $2 \rightarrow 4$ norm:
$$ \Phi_\delta \ge 1 - \lambda^{O(1)} \iff \|P_{\ge \lambda}\|_{2 \rightarrow 4} \le \frac{1}{\delta^{O(1)}} $$
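As a small, self-contained sketch of the object in the theorem, the projector $P_{\ge \lambda}$ can be formed from an eigendecomposition (here for the complete graph $K_4$, an arbitrary illustrative choice); estimating its $2 \rightarrow 4$ norm is the hard part, discussed next:

```python
import numpy as np

# Random-walk matrix of the complete graph K_4 (any regular graph would do).
n = 4
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)

lam = 0.5
vals, vecs = np.linalg.eigh(W)
V = vecs[:, vals >= lam]       # eigenvectors with eigenvalue >= lambda
P = V @ V.T                    # P_{>= lambda}, the projector onto that eigenspace
print(np.allclose(P @ P, P))   # True: P is indeed a projector
```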
Solving for the $2 \rightarrow 4$ Norm
Recognize that, writing $a_i$ for the $i$-th row of $A$ (with $m$ rows in total),
$$ \|Ax\|_4 = \left(\mathbb{E}_i |\langle a_i, x \rangle|^4\right)^{\frac{1}{4}} = \left(\frac{1}{m} \sum_{i=1}^m \langle a_i, x \rangle^4 \right)^{\frac{1}{4}}. $$
Now normalize $x$, replacing it with $x' = \frac{x}{\|x\|_2}$; this lets us take the supremum over the unit sphere:
$$ \sup_{x \neq 0} \frac{\|Ax\|_4}{\|x\|_2} = \left[\max_{\|x\|_2 = 1} \mathbb{E}_i |\langle a_i, x \rangle|^4\right]^{\frac{1}{4}} $$
$$ \therefore \|A\|_{2 \rightarrow 4}^4 = \max_{\|x\|_2=1} \mathbb{E}_i \langle a_i, x \rangle^4 $$
The RHS can be thought of as maximizing a homogeneous degree-4 polynomial in the entries of $x$ over the $l_2$ unit sphere. Maximizing a polynomial over the unit sphere is a non-convex problem and is hard in general. This is where something like the Lasserre hierarchy comes in.
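Before turning to relaxations, note that local search already gives something: the following NumPy sketch (function name and step size are ours) runs projected gradient ascent on $\mathbb{E}_i \langle a_i, x \rangle^4$ over the unit sphere; since the problem is non-convex, it only certifies a lower bound on $\|A\|_{2 \rightarrow 4}$:

```python
import numpy as np

def two_to_four_lower_bound(A, iters=500, step=0.1, seed=0):
    """Heuristic lower bound on ||A||_{2->4}: projected gradient ascent on
    f(x) = E_i <a_i, x>^4 over the unit sphere (non-convex, so no guarantee)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        grad = 4.0 / m * (A.T @ ((A @ x) ** 3))   # gradient of f at x
        x = x + step * grad
        x /= np.linalg.norm(x)                    # project back onto the sphere
    return ((A @ x) ** 4).mean() ** 0.25          # f(x)^(1/4), per the formula above

A = np.random.default_rng(1).standard_normal((20, 5))
print(two_to_four_lower_bound(A))
```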
Multilinear algebra provides the following equation:
$$ \langle a_i, x \rangle^4 = (a_i^T x)^4 = \langle a_i^{\otimes 4}, x^{\otimes 4} \rangle $$
Also, linearity of the inner product in its first argument allows bringing the expected value inside. Thus:
$$ \max_{\|x\|_2=1} \mathbb{E}_i \langle a_i, x \rangle^4 = \max_{\|x\|_2=1} \langle \mathbb{E}_i a_i^{\otimes 4}, x^{\otimes 4} \rangle $$
This separates the objective into something that depends on the problem instance, $\mathbb{E}_i a_i^{\otimes 4}$, and something universal that we maximize over, the vectors $x^{\otimes 4}$ with $\|x\|_2 = 1$; taking the convex hull of the latter preserves this separation.
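A quick NumPy check of the tensor identity $\langle a, x \rangle^4 = \langle a^{\otimes 4}, x^{\otimes 4} \rangle$ used above (dimensions and seeds are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(5)
x = rng.standard_normal(5)

lhs = np.dot(a, x) ** 4
a4 = np.einsum('i,j,k,l->ijkl', a, a, a, a).ravel()   # a tensored with itself 4 times
x4 = np.einsum('i,j,k,l->ijkl', x, x, x, x).ravel()
rhs = np.dot(a4, x4)

print(np.isclose(lhs, rhs))   # True
```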
Aside — Observables
An observable is a Hermitian matrix, $M$, whose expectation value in the state $x$ is $x^\dagger M x$. Because the trace of a scalar is the scalar itself:
$$ x^\dagger M x = \mathrm{Tr}(x^\dagger M x) $$
Because trace holds the cyclic property:
$$ \mathrm{Tr}(x^\dagger M x) = \mathrm{Tr}(M x x^\dagger) $$
From the Frobenius inner product:
$$ \mathrm{Tr}(M x x^\dagger) = \langle M^\dagger, x x^\dagger \rangle $$
Because $M$ is Hermitian, $M = M^\dagger$:
$$ x^\dagger M x = \langle M, x x^\dagger \rangle $$
By writing it like this, we have separated the quantity we are measuring from the state we are measuring. Thus, the quantities that we measure are linear in the state, just as in probability theory:
$$ \mathrm{Tr}(M \rho) = \lambda \mathrm{Tr}(M \rho_1) + (1 - \lambda) \mathrm{Tr}(M \rho_2) \quad \text{if} \quad \rho = \lambda \rho_1 + (1 - \lambda) \rho_2. $$
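A quick NumPy check of the identity $x^\dagger M x = \langle M, x x^\dagger \rangle = \mathrm{Tr}(M x x^\dagger)$ (random Hermitian $M$ and random state $x$, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = (B + B.conj().T) / 2                          # random Hermitian observable
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)                            # random pure state

expectation = x.conj() @ M @ x                    # x† M x
frobenius = np.trace(M @ np.outer(x, x.conj()))   # Tr(M xx†) = <M, xx†>
print(np.isclose(expectation, frobenius))         # True
```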
Aside — Density Matrices
Density matrices can represent mixed states, and the set of density matrices can be thought of as a convex hull:
$$ D(n) = \mathrm{conv}\{x x^\dagger \mid \|x\|_2 = 1\} $$
Why is this true?
A density matrix \(\rho \in \mathbb{C}^{n \times n}\) satisfies:
- \(\rho \succeq 0\)
- \(\mathrm{Tr}(\rho) = 1\)
A convex combination of pure states looks like:
$$ \rho = \sum_i \lambda_i x_i x_i^\dagger $$
where \(\lambda_i \ge 0\) and \(\sum_i \lambda_i = 1\).
This is a statistical mixture of pure states \(x_i\) each with probability \(\lambda_i\). Mathematically, this is what it means to be in the convex hull of pure state projectors. The Krein–Milman theorem in convex analysis tells us that any compact convex set is the convex hull of its extreme points. Thus, every mixed state is a convex combination of pure states.
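A minimal NumPy sketch of this fact: the spectral decomposition of any density matrix exhibits it as a convex combination of pure-state projectors (the random $\rho$ and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho = B @ B.conj().T
rho /= np.trace(rho).real                    # random density matrix: PSD, trace 1

lam, vecs = np.linalg.eigh(rho)              # rho = sum_i lambda_i x_i x_i†
recon = sum(l * np.outer(v, v.conj()) for l, v in zip(lam, vecs.T))
print(np.allclose(rho, recon))               # True: a convex combination of pure states
print(np.isclose(lam.sum(), 1.0), (lam >= -1e-12).all())   # weights form a distribution
```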
Because density matrices combine quantum mechanics and probability, they let us distinguish quantum entanglement from classical correlation. For a pure state, being correlated is the same as being entangled. For mixed states of, say, two particles, one defines the separable states to be the unentangled ones; these are exactly the convex combinations of product states, i.e. all the classically correlated states:
$$ \mathrm{sep}(n,n) = \mathrm{conv}\{x x^\dagger \otimes y y^\dagger \mid \|x\|_2=1, \|y\|_2=1\} $$
Note that a density matrix of rank-1 corresponds to a pure state.
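For concreteness, a NumPy sketch that builds a separable state by construction, as a convex combination of product states (the dimensions and number of terms are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
rho = np.zeros((n * n, n * n), dtype=complex)
weights = rng.dirichlet(np.ones(5))          # lambda_i >= 0 summing to 1
for w in weights:
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n); x /= np.linalg.norm(x)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n); y /= np.linalg.norm(y)
    rho += w * np.kron(np.outer(x, x.conj()), np.outer(y, y.conj()))

# rho is separable by construction, and it is a valid density matrix:
print(np.isclose(np.trace(rho).real, 1.0))
print(np.linalg.eigvalsh(rho).min() >= -1e-12)
```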
Because the sets above are convex, we can define the support function of \(\mathrm{sep}\): given an input \(M\), it is the maximum inner product of \(M\) with anything in the set,
$$ h_{\mathrm{sep}}(M) = \max\{ \langle M, \rho \rangle \mid \rho \in \mathrm{sep} \}. $$
Operationally: someone tells you they’re going to make a measurement, \(M\); prepare for them a quantum system \(\rho\) that maximizes this value. The hardness result for this problem is:
$$ 3\mathrm{SAT}[\log^2(n)] \le_p \left( \text{decide whether } h_{\mathrm{sep}}(M) = 1 \ \text{ or } \ h_{\mathrm{sep}}(M) \le \tfrac{1}{2} \right) $$
for some \(M\) satisfying \(0 \le M \le \mathbb{I}\) and \(M\) of size \(n^2 \times n^2\). Assuming the exponential time hypothesis, this implies a lower bound of \(n^{\Omega(\log n)}\) [3].
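Although computing \(h_{\mathrm{sep}}\) is hard, cheap lower bounds are easy: the maximum of a linear function over a convex set is attained at an extreme point, i.e. at a product state, so sampling product states gives a lower bound. A minimal NumPy sketch (the function name is ours; the example \(M\) is the projector onto a maximally entangled state, for which the true value is \(1/n\)):

```python
import numpy as np

def h_sep_lower_bound(M, n, trials=2000, seed=0):
    """Heuristic lower bound on h_sep(M) by sampling random product states
    xx† ⊗ yy† (the extreme points of sep) and keeping the best value."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n); x /= np.linalg.norm(x)
        y = rng.standard_normal(n) + 1j * rng.standard_normal(n); y /= np.linalg.norm(y)
        xy = np.kron(x, y)
        val = np.real(xy.conj() @ M @ xy)    # <M, xx† ⊗ yy†> = (x⊗y)† M (x⊗y)
        best = max(best, val)
    return best

n = 3
phi = np.eye(n).reshape(-1) / np.sqrt(n)     # maximally entangled state on C^n ⊗ C^n
M = np.outer(phi, phi.conj())
print(h_sep_lower_bound(M, n))               # approaches h_sep(M) = 1/n from below
```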
We can also define the weak membership problem (WMEM): given \(\rho\) with the promise that either \(\rho \in \mathrm{sep}\) or \(\mathrm{dist}(\rho, \mathrm{sep}) > \epsilon\), determine which. Essentially, after you’ve determined every entry of the matrix \(\rho\), have you made something separable or something that is far from separable? Even though writing down every entry of the density matrix already costs time exponential in the number of qubits, the problem may still be intractable.
Because \(\mathrm{sep}\) is a convex set, the two problems above (optimizing over the set and testing membership in it) are roughly equivalent in difficulty.
Monogamy of Entanglement
$$ \rho^{ABC} \in \mathbb{C}^{n^3 \times n^3} $$
A density matrix composed of three subsystems is analogous to a joint probability distribution, which has marginal distributions on subsets of the system, like \(AB\) or \(AC\). For example:
$$ \rho^{AB} = (\mathrm{id}^A \otimes \mathrm{id}^B \otimes \mathrm{Tr}^C)(\rho^{ABC}) $$
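The map \(\mathrm{id}^A \otimes \mathrm{id}^B \otimes \mathrm{Tr}^C\) is the partial trace over \(C\). A small NumPy sketch (the helper name and the maximally mixed example are ours):

```python
import numpy as np

def partial_trace_last(rho, dims):
    """Trace out the last subsystem of rho, e.g. rho^{AB} = Tr_C(rho^{ABC});
    `dims` lists the subsystem dimensions."""
    d_rest = int(np.prod(dims[:-1]))
    d_last = dims[-1]
    rho = rho.reshape(d_rest, d_last, d_rest, d_last)
    return np.einsum('ikjk->ij', rho)        # sum over the C index

n = 2
rho_abc = np.eye(n ** 3) / n ** 3            # maximally mixed state on A ⊗ B ⊗ C
rho_ab = partial_trace_last(rho_abc, (n, n, n))
print(np.allclose(rho_ab, np.eye(n ** 2) / n ** 2))   # True: the marginal is maximally mixed
```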
Facts
- When \(p=\infty\) and \(q=1\), computing the norm is the Grothendieck problem, of which MAXCUT is a special case [1]; a brute-force sketch follows below.
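With the sum-norm convention, \(\|A\|_{\infty \rightarrow 1}\) is a maximization over sign vectors, which the following NumPy sketch computes by brute force (exponential time, so only for tiny matrices; the function name and example matrix are ours):

```python
import numpy as np
from itertools import product

def inf_to_one_norm(A):
    """Brute-force ||A||_{inf->1} = max over x in {-1,+1}^n of ||Ax||_1
    (the optimal y in y^T A x is just sign(Ax))."""
    n = A.shape[1]
    return max(np.abs(A @ np.array(x)).sum() for x in product([-1.0, 1.0], repeat=n))

A = np.array([[1.0, -2.0], [3.0, 0.5], [-1.0, 1.0]])
print(inf_to_one_norm(A))
```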
References
1. Estimating the Matrix \(p \rightarrow q\) Norm. https://math.mit.edu/~urschel/preprints/norm.pdf
2. Aram Harrow. Unique games, the Lasserre hierarchy and monogamy of entanglement. https://www.youtube.com/watch?v=cM7GuAzkFRU
3. Aram W. Harrow, Ashley Montanaro. Testing product states, quantum Merlin-Arthur games and tensor optimisation. https://arxiv.org/abs/1001.0017
4. Hypercontractivity, Sum-of-Squares Proofs, and their Applications. https://arxiv.org/pdf/1205.4484
5. Quantum de Finetti Theorems under Local Measurements with Applications. https://arxiv.org/pdf/1210.6367