Tuesday, March 13, 2018

To What Degree Can We Use Degree?

Guest post by PhD student Ben Rapone.


The $\textbf{degree}$ of a function $f$ defined on a set $\Omega$ and its image $f\left(\Omega\right)$ is an extension of the winding number, which counts the number of times a closed curve travels counterclockwise around a given point. Intuitively, the degree counts the number of times $f$, in some sense, "wraps" $\Omega$ around a point in $f\left(\Omega\right)$. So what additional information does the degree give, and why would you care about it?

The degree was developed as a way to measure, or keep careful count of, the number of solutions to a system of nonlinear equations. By "careful" we mean consistent with respect to some types of perturbation of the system. Degree theory, as we shall see shortly, provides some very nice tools for verifying the existence of solutions to nonlinear systems of equations.

We will keep this conversation light, general, and mostly geared towards building an intuition for what information the degree provides, and for how an application-oriented person not necessarily familiar with higher-level mathematics could use the degree to their advantage. For the curious mathematician wishing for a more in-depth and general derivation and application, there are many summaries, blogs, lecture notes, and theses (check this one out, for instance) awaiting you on the web (Google is always your friend). Here we will not provide any proofs, but will instead refer the reader to those already given in other venues (why reinvent the wheel, I ask?). Let's proceed, then.

To keep the discussion simple, in this post we will concern ourselves with degrees of continuous functions over bounded, "nice" manifolds. In particular we will narrow our focus at the moment to continuous functions over bounded regions in $\mathbb{R}^n$. With that in mind, let us define the following setting over which we will define the degree:
\begin{equation} \label{eq:Vars} \Omega\subset\mathbb{R}^n \text{ open and bounded } \end{equation} \begin{equation} \label{eq:cont} f:\bar{\Omega}\rightarrow \mathbb{R}^n \text{ continuous } \end{equation} \begin{equation} \label{eq:Jdef} y \in f \left(\bar{\Omega}\right) \setminus f\left(\partial\Omega\right) \text{ s.t. the Jacobian, $J_f(x)$, is defined $\forall x\in\Omega$ with $f(x)=y$} \end{equation}
In other words, we will assume $y$ is in the range of $f$ over the closure of $\Omega$ but not in the image of $f$ over the boundary of $\Omega$, and if $x\in\Omega$ is such that $f(x)=y$, then $J_f(x)$, the Jacobian of $f$ at $x$, is defined.

Let's look at two examples to illustrate the setting described in Equations \eqref{eq:Vars}, \eqref{eq:cont}, and \eqref{eq:Jdef}.

Figure 1 represents a continuous mapping from $\mathbb{R}$ to $\mathbb{R}$. In particular, its domain and range are specified as $\Omega \approx (-0.2,5.2)$ and $F(\Omega)\approx(-1.3,1.45)$ (these intervals are shaded in the corresponding colors on the $x$- and $y$-axis, respectively). The values where the degree is not defined are indicated by red x's and correspond to the image of the boundary of $\Omega$, $\{-1.3,1.45\}$, and points where the derivative doesn't exist, i.e., where $y = \pm 1$. All other points of $F(\Omega)$ are fair game including the value $y=0.5$, which we will refer back to for an illustration of how we calculate the degree.

Figure 2 represents a continuous mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$, with the domain $\Omega$ the filled circle, and range $F(\Omega)$ the filled triangle. Here $F$ maps the boundary of the circle to the boundary of the triangle and creates a folding of $\Omega$ along the red and green curves in such a way as to make the Jacobian undefined at each point along them. This folding is highlighted using the embedded arrows to show the movement of space and illustrated by the transformation of the pink curve. It follows then that the degree is not defined along the boundary of the triangle and at any point along the red or green curve. Every other point is a viable option if we assume $F$ is smooth everywhere else.

So how do we calculate the degree? The next definition for the degree under our restrictions will answer this question.


\begin{equation}\label{ddef} \operatorname{deg}\left(f,\Omega,y\right)=\sum\limits_{x\in f^{-1}(y)}\operatorname{sign}\left(J_f(x)\right) \end{equation}
where $\operatorname{sign}\left(J_f(x)\right)$ denotes the sign of the Jacobian of $f$ at $x$, i.e., $$ \operatorname{sign}\left(J_f(x)\right)= \left\{ \begin{array}{ll} -1 & \mbox{if } J_f(x)< 0, \\ \ \ \ 0 & \mbox{if } J_f(x)= 0,~\mbox{ and } \\ \ \ \ 1 & \mbox{if } J_f(x)> 0. \\ \end{array} \right. $$

By restricting ourselves to the settings specified in Equations \eqref{eq:Vars}, \eqref{eq:cont}, and \eqref{eq:Jdef}, we ensure that $\operatorname{sign}\left(J_f(x)\right)$ exists for each $x\in f^{-1}(y)$. Hence, if we can guarantee that the sum $\sum_{x\in f^{-1}(y)}\operatorname{sign}\left(J_f(x)\right)$ converges, then we can guarantee that $\operatorname{deg}\left(f,\Omega,y\right)$ is defined. One way to provide this guarantee is to require the sum to be finite, i.e., require that only finitely many $x\in\Omega$ exist such that $f(x)=y$. In some sense this restriction limits the number of "foldings" that $F$ can do. Okay now, say we have this additional restriction and the degree is defined for the values we wish to check. How can this setting benefit us? Let's illustrate this with our simple example in Figure 1.

As we observed earlier, the degree is defined at $y=0.5$ and so we can compute $\operatorname{deg}\left(f,\Omega,y\right)$ using definition \eqref{ddef} to be $1+(-1)+1+(-1)+1=1$. So how does this conform with our intuition? We can think of $f$ as taking the set $\Omega$, stretching it out and laying it down on $f\left(\Omega\right)$ the way you would lay a long sheet down on a short surface, folding back and forth so it fits on the surface. Any value lying between the images of the boundary points has the beginning of the sheet below it and the end above it, so $f$ lays over it at least one time, with potentially a bunch of additional folds that must come in pairs (something very reminiscent of the intermediate value theorem). This might seem trivial, and it is here, but when $\Omega$ and $f$ are not so nice the degree becomes an invaluable tool with the help of some well-meaning theorems.
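To make the counting concrete, here is a minimal Python sketch of definition \eqref{ddef} in one dimension. Since the function in Figure 1 is not given by a formula, the cubic $f(x)=x^3-3x$ and the helper name degree_1d are stand-ins chosen purely for illustration; the sketch locates solutions by a grid scan followed by bisection and assumes every solution is a simple crossing.

```python
def degree_1d(f, df, a, b, y, samples=1001):
    """Approximate deg(f, (a, b), y) as the sum of sign(f'(x)) over solutions of f(x) = y."""

    def is_negative(x):
        return f(x) - y < 0

    degree = 0
    step = (b - a) / samples
    for i in range(samples):
        left, right = a + i * step, a + (i + 1) * step
        if is_negative(left) == is_negative(right):
            continue  # no sign change, so no simple solution is detected in this cell
        # Bisection: keep a sign change of f - y between lo and hi.
        lo, hi = left, right
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if is_negative(mid) == is_negative(lo):
                lo = mid
            else:
                hi = mid
        slope = df(0.5 * (lo + hi))
        degree += (slope > 0) - (slope < 0)  # sign of the 1-D Jacobian
    return degree


def f(x):  # illustrative stand-in, not the function from Figure 1
    return x**3 - 3 * x


def df(x):
    return 3 * x**2 - 3


print(degree_1d(f, df, -3.0, 3.0, 0.0))  # three solutions: 1 + (-1) + 1 = 1
print(degree_1d(f, df, -3.0, 3.0, 5.0))  # a single solution: 1
print(degree_1d(f, df, -1.0, 1.0, 0.0))  # only the middle solution: -1
```

The first two calls agree because $0$ and $5$ both lie between the images of the boundary points $\pm 3$, while shrinking the domain to $(-1,1)$ changes the answer.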

Theory for Applications

Alright, so what's the use of all this stuff about degree anyway? For some of you, it might all seem like a waste of time because, by definition, in order to calculate the degree we must have knowledge of solutions to the very equation whose solutions we are trying to verify. Despair not, however, for the following theorem will put your concerns to rest. This theorem is quoted directly from this book on Theorems of Leray-Schauder Type And Applications, where the details of a proof are also available.


Let $\Omega\subset\mathbb{R}^n$ be an open bounded subset and $f:\bar{\Omega}\rightarrow\mathbb{R}^n$ be a continuous mapping. If $p\not\in f\left(\partial\Omega\right)$, then there exists an integer $\operatorname{deg}\left(f, \Omega,p\right)$ satisfying the following properties: $ \quad \text{[i] (Normality) $\operatorname{deg}\left(I, \Omega,p\right)=1$ if and only if $p\in\Omega$, where $I$ denotes the identity mapping. } \\ \quad \text{[ii] (Solvability) If $\operatorname{deg}\left(f, \Omega,p\right)\not= 0$, then $f(x)=p$ has a solution in $\Omega$. } \\ \quad \text{[iii] (Homotopy) If $f_t(x):[0,1]\times\bar{\Omega}\rightarrow\mathbb{R}^n$ is continuous and $p\not\in \bigcup\limits_{t\in[0,1]}f_t\left(\partial\Omega\right)$, then} \\ \quad \quad \quad \quad \text{$\operatorname{deg}\left(f_t, \Omega,p\right)$ does not depend on $t\in[0,1]$. } \\ \quad \text{[iv] (Additivity) Suppose that $\Omega_1, \Omega_2$ are two disjoint open subsets of $\Omega$ and} \\ \quad \quad \quad \quad \text{$p\not\in f\left(\bar{\Omega}-\left(\Omega_1\cup\Omega_2\right)\right)$}. \\ \quad \quad \quad \quad \text{Then $\operatorname{deg}\left(f, \Omega,p\right)=\operatorname{deg}\left(f, \Omega_1,p\right)+\operatorname{deg}\left(f, \Omega_2,p\right)$. }\\ \quad \text{[v] $\operatorname{deg}\left(f, \Omega,p\right)$ is a constant on any connected component of $\mathbb{R}^n\setminus f(\partial\Omega)$. } $
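Property [iii] can be checked numerically in the plane. The sketch below relies on the standard fact, assumed here rather than proved, that for $\Omega$ the open unit disk the degree at $p$ equals the winding number about $p$ of the image of the boundary circle. The family $f_t(z)=z^2+t\,(0.5+0.3i)$, written with complex numbers standing in for points of $\mathbb{R}^2$, and the helper name winding_number are illustrative choices only.

```python
import cmath
import math


def winding_number(boundary_image, p, samples=4000):
    """Winding number about p of the closed curve theta -> boundary_image(theta)."""
    total = 0.0
    prev = boundary_image(0.0) - p
    for k in range(1, samples + 1):
        cur = boundary_image(2 * math.pi * k / samples) - p
        total += cmath.phase(cur / prev)  # signed angle swept during this step
        prev = cur
    return round(total / (2 * math.pi))


def f_t(t, z):
    # Hypothetical homotopy: on |z| = 1 we have |f_t(z)| >= 1 - |0.5 + 0.3i| > 0,
    # so p = 0 never lies on the image of the boundary.
    return z**2 + t * (0.5 + 0.3j)


p = 0j
degrees = [
    winding_number(lambda theta, t=t: f_t(t, cmath.exp(1j * theta)), p)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]
print(degrees)  # [2, 2, 2, 2, 2]
```

The count stays at 2 for every sampled $t$, so the harder problem $f_1(z)=0$ inherits two solutions (counted with sign) from the easy one, $z^2=0$.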

One nice advantage this theorem has given us is seen in the homotopy invariance of the degree, property [iii]. As a consequence, we can equate the verification of solutions to one system with that of solutions to another, potentially much simpler, system. For instance, Frommer, Hoxha, and Lang (also see here) were able to develop a test based on interval arithmetic to prove existence of zeros of functions, which depends on the homotopy invariance property of the degree. Of course there is a wide range of theory concerning the computation of the degree beyond the properties found in the theorem, which I invite you to explore. Here are some quick suggestions to take a look at:

  1. On the complexity of isolating real roots and computing with certainty the topological degree by Mourrain, Vrahatis, and Yakoubsohn;
  2. The calculation of the topological degree by quadrature by O'Neil and Thomas (also see here);
  3. and, of course, the book from which we took the theorem.
Additionally though, keep an eye out for my own collaborative research with Dr. Dvijotham and Dr. Krishnamoorthy (the author of this blog) where we take full advantage of the homotopy invariance property to construct optimization based techniques for calculating robustness of solutions to systems of quadratic equations. Check my own website to see more about myself, and new blog posts about my research.

Wednesday, November 15, 2017

Category Theory and Sheaves

Guest post by PhD student Matthew Broussard.


In this post, we plan to explore the basic language of category theory with an eye towards defining sheaves, mathematical constructs which formalize the transition between local and global data on a space. In future posts we will explore the theory and application of sheaves in more detail, but first we need to lay the groundwork for our later discussion.

There is quite a bit of background and vocabulary necessary to make sense of sheaves. We could argue that sheaves are mathematical entities with rich structure in themselves, and thus are of interest to abstract mathematicians. Still, generally if one puts in the effort to learn any mathematical concept technical enough to be called "theory," one wishes to get something from the work. It is disappointing to go to all the work to learn such a discipline only to find that it doesn't actually give you a way to approach new problems!

Sheaves give us powerful tools which make many generalizations possible. For example, homology and cohomology can both be expanded from their purely topological roots into the world of sheaves, and the more general framework allows the topological results to fall out as special cases (as one hopes from a generalization). Viewed over graphs, sheaves show the topological underpinnings of certain graph and network theoretic problems.

But what new avenues do these open? Sheaves give us better insight into the structure of spaces. For instance, Joel Friedman [2] showed that morphisms from graphs $G_i$ to a graph $G$ can be viewed as sheaves $S(G_i)$ over $G$, and that morphisms between $G_i$ and $G_j$ induce morphisms between their respective sheaves. However, there are sheaf morphisms between these induced sheaves which are not the result of graph morphisms. These extra morphisms capture aspects of the structure of $G$ which have not been well captured in graph theory alone. This refinement of structural detection allows sheaf theory to address questions about graphs which have been intractable to normal graph theoretic approaches.

Category Theory

When one studies various structures in mathematics, one often encounters similar patterns in different constructs. For instance, cycles in topological spaces behave in some ways like abelian groups. As another example, some properties of a structure are drawn from the base set on which the structure is imposed. Category theory is a language which makes such correspondences more rigorous, and which provides the tools to turn these correspondences into a mathematical structure of their own.

We will explore an introduction to category theory with a focus in topology, both due to the personal interest of the author and because topology is the historical origin of categories. We will focus particularly on the categories of presheaves and sheaves. These categories are of particular interest to topological data processing and analysis, as we will explore more deeply in a future post.


First, though, we must understand the language of category theory. What exactly is a category? (We mostly follow the construction found in Elements of Algebraic Topology by Munkres [3], though with fewer but more detailed examples.)

We define a category to consist of three things:

  1. A class of objects
  2. For each ordered pair of objects $(X,Y)$, a set $\mathrm{hom}(X,Y)$ of morphisms
  3. A composition function $\mathrm{hom}(X,Y) \times \mathrm{hom}(Y,Z) \rightarrow \mathrm{hom}(X,Z)$, written $(f,g)\mapsto g \circ f$, for all objects $X, Y, Z$, which is associative and has identities. That is,
    • If $f \in \mathrm{hom}(W,X)$, $g \in \mathrm{hom}(X,Y)$, and $h\in \mathrm{hom}(Y,Z)$, then $h \circ (g \circ f)=(h\circ g)\circ f$
    • For each object $X$ there exists $1_{X} \in \mathrm{hom}(X,X)$ such that $1_{X}\circ f=f$ and $g\circ 1_{X}=g$ for all $f \in \mathrm{hom}(W,X)$ and $g \in \mathrm{hom}(X,Y)$

There are several categories which we use in the study of algebraic topology. Most ubiquitous are the category of topological spaces with continuous maps and standard composition and the category of abelian groups under homomorphism (with standard composition, which we will take as implied henceforth unless a particular composition rule is stated). The category of chain complexes and chain maps is also a fairly common sight, though there are others — later, for instance, we will discuss the category of topological spaces with restriction maps as morphisms, as well as the category of finite semimodules with quotient maps.

Usually the objects in a category are the things we study in a particular branch of mathematics (topological spaces, groups, rings, manifolds, etc.) and the morphisms are maps between those objects which preserve some aspect or aspects of their structure. For instance, if we only wish to discuss topological invariants, we could consider the category of topological spaces under homeomorphism. If we cared about properties preserved by continuous maps, we would instead take continuous maps as the morphisms of our category.

Thus far, category theory hasn't given us anything new. It has only provided a slightly different way to talk about maps between structural elements. The theory's utility arises from functors, a type of map between categories.


A functor is a function $G:C \rightarrow D$ where $C$ and $D$ are categories such that

  1. for each object $X$ of $C$, $G(X)$ is an object of $D$;
  2. for each morphism $f:X\rightarrow Y$ of $C$, $G(f):G(X)\rightarrow G(Y)$ is a morphism of $D$;
  3. $G(1_{X})=1_{G(X)}$ for all $X$; and
  4. $G(g\circ f)=G(g)\circ G(f)$ for all $g,f$.
(It is of interest to note that categories, with functors as morphisms, themselves form an admissible category, though we will not use this fact.)
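For readers who think in code, the last two functor conditions can be spot-checked in Python. The sketch below informally treats Python types as objects and functions as morphisms (an assumption made for illustration, not a rigorous category) and uses the "list of" construction as a hypothetical functor. Checking a single sample input is of course evidence, not a proof.

```python
def compose(g, f):
    """Return the composite g after f."""
    return lambda x: g(f(x))


def identity(x):
    return x


def list_functor(f):
    """Send a morphism f: X -> Y to the morphism G(f): list[X] -> list[Y]."""
    return lambda xs: [f(x) for x in xs]


f = lambda n: n + 1    # a morphism int -> int
g = lambda n: str(n)   # a morphism int -> str
sample = [1, 2, 3]

# Condition 3: G(1_X) = 1_{G(X)}
preserves_identity = list_functor(identity)(sample) == identity(sample)

# Condition 4: G(g o f) = G(g) o G(f)
preserves_composition = (
    list_functor(compose(g, f))(sample)
    == compose(list_functor(g), list_functor(f))(sample)
)

print(preserves_identity, preserves_composition)  # True True
```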

There are several basic functors, with the identity functor and the forgetful functor perhaps foremost among them. The identity functor maps from a category to itself and, as with most identity maps, takes objects and morphisms back to themselves. The forgetful functor is a map that takes a structured space to its underlying set and its morphisms to their underlying set maps.

However, there are also some functors which we use regularly in algebraic topology without realizing their functorial nature. Perhaps the most common one we use is homology. $H_n:Top \rightarrow Ab$ is a functor from the category of topological spaces to the category of abelian groups, assigning to each topological space (that is, each object of $Top$) its $n$th homology group (an object of the category of abelian groups). Then given a continuous map $f:X\rightarrow Y$, the following diagram commutes:

$\begin{CD} H_n(X) @>{\rm Pushforward}>> H_n(Y)\\ @AAA @AAA\\ X@>>{f}> Y \end{CD}$

The morphism $ f_*$ between $H_n(X)$ and $H_n(Y)$ is known as the pushforward. Its construction is discussed in detail in [4], but the relevant results for our purposes are that (i) the identity map induces the identity homomorphism, and (ii) for $f:K\rightarrow L$ and $g:L\rightarrow M$, we have $(g\circ f)_*=g_*\circ f_*$.

Clearly the first requirement holds: every space has an $n$th homology group. Likewise, we noted that each map induces a homomorphism, so if we say $H_n$ takes $f$ to $f_*$ the second requirement is fulfilled. The third and fourth requirements follow from the results we noted about induced homomorphisms.

Whether explicitly or implicitly stated, the functoriality of the homology construction is an integral part of the proof that homology is a topological invariant.

Likewise, there is a functor from the category of simplicial complexes and simplicial maps to that of chain complexes and chain maps which assigns $K\rightarrow \mathscr{C} (K)$ and $f\rightarrow f_\#$. Again, every simplicial complex has a chain complex associated with it, and every simplicial map $f$ induces a chain map $f_\#$, so the first two conditions are upheld. The latter two arise from the verification that the identity simplicial map $\iota$ induces the identity chain map and that $(g\circ f)_\#=g_\#\circ f_\#$, both of which follow from the definition of $f_\#$.

Other functors appear in topological studies. Homotopy relies on the fundamental group, for instance, which also exhibits functorial properties. While the fact that these are functors is an important and useful point, these results are generally used independently of category theory. Our interest, however, will lie with a form of functor that is rarely discussed outside of the context of category theory: sheaves.

What is a Sheaf?

N.B.: The information we present here on sheaf theory is largely drawn from [1], with some clarifications from [4].

A sheaf is a means to keep track of data over a space. Data sources are required to agree on comparable data when they overlap in space, and local data is required to be sufficient to recover global data.

Formally, a presheaf is a functor $F:\rm{Open}^{op}(X) \rightarrow \rm{Alg}$ from the category of open sets on the topological space $X$ with restriction morphisms to an algebraic category. A sheaf is a presheaf with additional structure:

  1. Given an open cover $\{U_i\}$ of an open set $U$, if $s,t \in F(U)$ with $s|_{U_i}=t|_{U_i}$ for each $i$, then $s=t$.
  2. Given an open cover $\{U_i\}$ of an open set $U$, with an $s_i\in F(U_i)$ for each $i$ such that $s_i |_{U_i\cap U_j}=s_j|_{U_i\cap U_j}$ for all $i,j$, then there is an element $s\in F(U)$ such that $s|_{U_i}=s_i$ for all $i$.

The first condition (called the locality condition) requires that two pieces of global data that look the same locally are in fact the same globally. The second (called the gluing condition) requires that data which agrees can be glued together into a global structure. We think of $s$ and $t$ (called sections) as particular choices of data with the assigned algebraic structures as all possible data.
(Note that locality demands that $F(\varnothing)=0$: the empty family $\{U_i\}_{i\in \varnothing}$ is an open cover of $\varnothing$, so any two sections over $\varnothing$ agree on every element of the cover and must therefore be equal. This requirement will generally be implied.)

Example 1

Let's look at a specific example of a sheaf built on a space $X$ composed of two disjoint open sets $A$ and $B$.

We want to build a sheaf that works as much like a constant function as we can. Let's define a functor $F:Top(X)\rightarrow Grp$ from the category of open sets of $X$ with restriction maps to the category of groups with group homomorphisms by $F(U)=G$ for each non-empty open set $U$ for a fixed group $G$ and $F(g)=\iota$ for all restriction maps except those to $\varnothing$, which all map to $0$. Is this a sheaf?

That this is a presheaf (that is, that $F$ is a functor) is easy to show. For locality, suppose there is an open set $U$ with open cover $\{U_i\}$. Given two sections $s,t \in G$ with $s\neq t$, since restriction maps induce the identity homomorphism $s=s|_{U_i}$ and $t=t|_{U_i}$. Thus $s|_{U_i}\neq t|_{U_i}$ for some $i$ (indeed for all $i$), so by the contrapositive if $s,t \in F(U)$ with $s|_{U_i}=t|_{U_i}$ for each $i$, then $s=t$, so locality holds.

Finally, the gluing condition. Take $G=\{0,1\}$ and the open cover $A, B$. Choose the local sections $s_1=0\in F(A)$ and $s_2=1\in F(B)$. Since $A$ and $B$ don't intersect, $s_1|_{A\cap B}=0=s_2|_{A\cap B}$, yet there is no $s\in F(X)=G$ with $s|_{A}=s_1$ and $s|_{B}=s_2$, so there is no global section which works.

The closest we can get to what we would think of intuitively as a constant sheaf on this space is to assign $F(U)=G\times G$ if $U$ intersects both $A$ and $B$, $F(U)=G\times \varnothing$ if $U$ only intersects $A$, and $F(U)=\varnothing \times G$ if $U$ only intersects $B$, rendering the following diagram:

$\begin{CD} G @<{\pi_1}<< G\times G @>{\pi_2}>> G\\ @AA{F}A @AA{F}A @AA{F}A\\ V_2@<<{\rm Containment}< V_1 @>>{\rm Containment}>V_3 \end{CD}$

Here, $V_1$ is the set of open subsets of $X$ which intersect $A$ and $B$, $V_2$ those that intersect only $A$, and $V_3$ those that intersect only $B$. Containments within one of these classes induce the identity map, and all containments of the null set induce the zero map, though we don't include these details in the diagram.

Note that again this is a presheaf and locality holds, but now the gluing condition is fulfilled, and hence it is a sheaf. We call this the constant sheaf on this space.
(See [1] section 3.1 for more information on when constant sheaves are possible and when locally constant sheaves are required)
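The failure and the repair of the gluing condition in Example 1 can be enumerated directly. In this minimal sketch the tuple G only lists the elements $\{0,1\}$ (its group structure plays no role in the check), the cover is just $A, B$, and the helper name gluings is hypothetical.

```python
from itertools import product

G = (0, 1)  # elements of the fixed group; only the underlying set matters here


def gluings(global_sections, restrict_to_A, restrict_to_B):
    """For each pair of local sections (s_A, s_B), list the global sections restricting to it."""
    return {
        (s_A, s_B): [
            s for s in global_sections
            if restrict_to_A(s) == s_A and restrict_to_B(s) == s_B
        ]
        for s_A, s_B in product(G, G)
    }


# First attempt: F(X) = G with identity restriction maps.
naive = gluings(G, lambda s: s, lambda s: s)

# Repaired version: F(X) = G x G with the projections as restriction maps.
repaired = gluings(list(product(G, G)), lambda s: s[0], lambda s: s[1])

print(naive[(0, 1)])     # []        no global section glues 0 on A with 1 on B
print(repaired[(0, 1)])  # [(0, 1)]  exactly one global section does
```

In the repaired version every pair of local sections has exactly one gluing, which is the existence demanded by the gluing condition together with the uniqueness demanded by locality.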

Example 2

In the previous example, we explored how locally consistent data can fail to produce a global section, where the data that we allow for the entire space is too restrictive to capture the variety of local data. Now we will explore the opposite problem: what happens when the global data isn't restrictive enough?

Consider the discrete two point space $\{a, b\}$ where we assign $\mathbb{R}^3$ to $\{a,b\}$ and $\mathbb{R}$ to both $\{a\}$ and $\{b\}$, with the restriction map from $\{a,b\}$ to $\{a\}$ given by the projection $\pi_1$ onto its first coordinate and the restriction from $\{a,b\}$ to $\{b\}$ given by the projection $\pi_2$ onto its second coordinate, as shown in the diagram below.

$\begin{CD} \mathbb{R} @<{\pi_1}<< \mathbb{R}^3 @>{\pi_2}>> \mathbb{R}\\ @AA{F}A @AA{F}A @AA{F}A\\ \{a\}@<<{\rm Containment}< \{a,b\} @>>{\rm Containment}>\{b\} \end{CD}$

Clearly any compatible local sections one chooses can be glued together as a global section. Indeed, given $s_1$ the local section over $\{a\}$ and $s_2$ over $\{b\}$, choosing any $(s_1, s_2, s_3)$ will be a global section consistent with both local sections. However, that very ease of choice makes this construction fail to be a sheaf. Two different global sections (any two distinct $s_3$ and $s_3'$ will do) agree on all local sections of the space, a violation of the condition of locality. In order to create a sheaf from this structure, we must either reduce the vector space attached to $\{a,b\}$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ or we must introduce a third point in the space onto which we project the third coordinate from the full space, as illustrated below.

(See [1] section 2.1 for the development of where the original scenario might arise)
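The locality failure in Example 2 can be counted in a toy version. Since $\mathbb{R}$ cannot be enumerated, this sketch replaces it with the two-element set $\{0,1\}$ (an assumption made purely so that the sections can be listed), and the helper name locality_violations is hypothetical.

```python
from itertools import product

K = (0, 1)  # finite stand-in for the real line, used only so we can enumerate


def locality_violations(dim):
    """Pairs of distinct global sections in K^dim with equal restrictions to {a} and {b}."""
    sections = list(product(K, repeat=dim))
    return [
        (s, t)
        for s in sections
        for t in sections
        if s < t and s[0] == t[0] and s[1] == t[1]
    ]


print(len(locality_violations(3)))  # 4: the third coordinate is invisible locally
print(len(locality_violations(2)))  # 0: locality holds once that coordinate is dropped
```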


Thus far we have discussed the basic terms required to understand sheaf theory. However, the intuition behind the construction still isn't clear. We claimed at the beginning that sheaves were a method of, among other things, keeping track of data. One might ask how the construction we've created corresponds with data in the normal sense.

In the next post, we will show how we might understand the way sheaves track data in the context of pictures. Though sheaves are not usually applied in the case of image reconstruction (since it is easy enough to keep track of the involved information without going to the trouble of constructing a sheaf!), it will still give us an intuitive idea of what, exactly, a sheaf does, and what restriction morphisms, the gluing condition, and locality actually mean in a more concrete sense.

Works Cited

[1] Curry, Justin M., "Sheaves, Cosheaves and Applications," University of Pennsylvania, 2014. arXiv
[2] Friedman, Joel, "Sheaves on Graphs, Their Homological Invariants, and a Proof of the Hanna Neumann Conjecture" University of British Columbia, 2011. arXiv
[3] Munkres, James R. "Elements of Algebraic Topology," Perseus Publishing, 1984
[4] R. Ghrist, Elementary Applied Topology, ed. 1.0, Createspace, 2014

Tuesday, October 31, 2017

De Rham Cohomology

Guest post by PhD student Enrique Alvarado

In the following, we will take a look at the motivation for considering \(closed\) and \(exact\) forms on manifolds. This will lead us to look for the closed forms which are \(\it{not}\) exact -- which to put crudely, is what de Rham cohomology studies.

Let's first take an intuitive look at what differential forms are.


\(\color{purple}{\mathbf{Definition.}}\) A differential \(k\)-form on \(\mathbb{R}^3\) is a differentiable mapping, \(\varphi : \mathbb{R}^3 \to \Lambda^k\), that takes a point in 3-space to a \(k\)-covector.

So, what are \(k\)-covectors?

\(\color{purple}{\mathbf{Definition.}}\) A \(k\)-\(\it{covector}\) is a function, \(\lambda : \Lambda_k \to \mathbb{R}\), that takes objects called \(k\)-\(\it{vectors}\) to real numbers.

In other words, \(\Lambda^k\) is the dual space of \(\Lambda_k\).

Now, to understand the vector space of \(k\)-vectors, denoted \(\Lambda_k\), let's take a little trip into Intuitionland by considering the cases for \(k = 0, 1, 2\),  and \(3\).

A \(0\)-vector in \(\mathbb{R}^3\) can be thought to be a real number, a \(1\)-vector in \(\mathbb{R}^3\) can be thought to be a vector in \(\mathbb{R}^3\), and a \(2\)-vector in \(\mathbb{R}^3\) can be pictured as the wedge of two linearly independent vectors, as shown below.

Similarly, a \(3\)-vector in \(\mathbb{R}^3\) can be pictured as a wedge of three linearly independent vectors as shown below.

Now, although there is no geometric difference between \(k\)-vectors and \(k\)-covectors, there is an algebraic one. This difference can be intuitively explained by comparing a \(1\)-vector with a \(1\)-covector, which is simply to say that we are comparing a vector with a covector in 3-space.

If we think of 1-vectors as column vectors, \(\left(\begin{array}{}x_1\\ y_1\\ z_1\\ \end{array}{} \right),\) we can then think about 1-covectors as \(\it{row}\) vectors \(\left(x_2, y_2, z_2\right)\) since we can then operate on the column vectors to get a real number as follows.

\(\begin{array}{}\left(x_2, y_2, z_2\right) \end{array}{}
\left(\begin{array}{} x_1\\ y_1\\ z_1\\ \end{array}{}\right)  = x_1x_2 + y_1y_2 + z_1z_2\).

So what a differential \(k\)-form \(\varphi\) does is associate a \(k\)-covector to every point \(p\) in \(\mathbb{R}^3\). The figure below is a mapping \(p \mapsto \varphi(p) \in \Lambda^3\).

Another way of defining a differential \(k\)-form on \(\mathbb{R}^3\), is by saying that it is a \(k\)-covector field on \(\mathbb{R}^3\). We will denote the space of all \(k\)-forms on a manifold \(M\) as \(\mathbf{C}^k(M)\).

As we have seen, \(0\)-forms can be identified with scalar functions. In \(\mathbb{R}^3\), 1-forms can be identified with vector fields, 2-forms can also be identified with vector fields via the right-hand rule, and 3-forms can be identified with scalar functions via a similar rule. There is a generalization of the gradient operator, called the exterior derivative, that is applied to forms:

\[ d: \mathbf{C}^k(M) \to \mathbf{C}^{k+1}(M) \]

Keeping in mind the ways we can identify 0-forms, 1-forms, and 2-forms, \(\omega \mapsto d\omega\) can then be identified with:

(1) The gradient operator \(\omega \mapsto \nabla \omega\) when \(\omega\) is a 0-form.

(2) The curl operation \(\omega \mapsto \nabla \times \omega\) when \(\omega\) is a 1-form.

(3) The divergence operation \(\omega \mapsto \nabla \cdot \omega\) when \(\omega\) is a 2-form.
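
In coordinates on \(\mathbb{R}^3\), and under the usual identifications, the three cases can be sketched as follows. The component names \(f\), \(P, Q, R\), and \(A, B, C\) are introduced here only for illustration and are assumed to be differentiable functions.

\begin{align*}
df &= \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz,\\
d\left(P\,dx + Q\,dy + R\,dz\right) &= \left(\frac{\partial R}{\partial y}-\frac{\partial Q}{\partial z}\right)dy\wedge dz + \left(\frac{\partial P}{\partial z}-\frac{\partial R}{\partial x}\right)dz\wedge dx + \left(\frac{\partial Q}{\partial x}-\frac{\partial P}{\partial y}\right)dx\wedge dy,\\
d\left(A\,dy\wedge dz + B\,dz\wedge dx + C\,dx\wedge dy\right) &= \left(\frac{\partial A}{\partial x}+\frac{\partial B}{\partial y}+\frac{\partial C}{\partial z}\right)dx\wedge dy\wedge dz.
\end{align*}

The coefficients on the right-hand sides are exactly the components of \(\nabla f\), \(\nabla \times (P, Q, R)\), and \(\nabla \cdot (A, B, C)\), respectively.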


Now, differential forms may be used to give us global information about manifolds, rather than local. For example, let's consider the manifold \(M := \mathbb{R}^2 - B\), where \(B\) is some open ball centered about the origin. If we take any point in \(M\), we can find a sufficiently small open ball that looks identical to some open ball in \(\mathbb{R}^2\). Therefore, all local properties of \(M\) are the same as those in \(\mathbb{R}^2\). But the fact that the origin is missing is a global property.

Certain differential forms are interesting for the purpose of detecting these types of global properties. The interesting ones have their exterior derivative zero. Such differential forms are called closed. That is, a differential form \(\varphi\) is \(\it{closed}\) if \(d\varphi = 0\).

So why are closed forms interesting when trying to investigate global properties? 

Let \(\omega\) be a closed \(k\)-form, and let's integrate it over a closed smooth \(k\)-chain \(C\) (a chain \(C\) is closed if it has no boundary) in a manifold \(M\) that is at least \(k\)-dimensional. If \(C\) is the boundary of an orientable, compact, smooth submanifold \(S\) (i.e., \(\partial S = C\)) of \(M\), then Stokes' Theorem states

\begin{align*}
\int_C \omega &= \int_S d\omega \\
&= \int_S 0 \\
&= 0.
\end{align*}

Therefore, if we have a closed \(k\)-form \(\omega\) on a submanifold \(C\) of \(M\) for which

\[ \int_C\omega \neq 0, \]

we then know that \(C\) must \(\it{not}\) be \(\it{the}\) boundary of any oriented, compact, smooth submanifold of \(M\)! The fact that there exists such a submanifold \(C\) tells us about the global information of the manifold \(M\).

If we want to be able to detect these global properties, we have to find reasonable forms to integrate over \(C\). There might be forms which always integrate to 0, no matter what the closed chain \(C\) is!

Such forms are called \(\it{exact}\).

\(\color{purple}{\mathbf{Definition.}}\) A \(k\)-form \(\omega\) is \(\it{exact}\) if there exists a \((k-1)\)-form \(\varphi\) such that \(\omega = d\varphi\).

Note that \(d:\mathbf{C}^{k}\to \mathbf{C}^{k+1}\) is an operator that takes k-forms and gives us (k+1)-forms, so the above definition makes sense.

Integrating exact forms over closed chains will always evaluate to 0. Let's prove this result for exact 1-forms and 1-chains.

Let's consider a 1-form \(\omega\) and a 0-form \(\varphi\) for which \(\omega = d\varphi\), and let \(C\) be any closed 1-chain. If we pick any two points \(p, q \in C\), we may then say that \(C = A + B\) where \(A\) is the curve that goes from \(p\) to \(q\), and \(B\) is the curve that goes from \(q\) to \(p\).

 We can now compute the following,

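\begin{align*}
\int_C\omega &= \int_{A + B}\omega\\
&= \int_{B}\omega + \int_{A}\omega\\
&= \int_{B}d\varphi + \int_{A}d\varphi\\
&= \int_{p - q}\varphi + \int_{q - p}\varphi\\
&= \left(\varphi(p) - \varphi(q)\right) + \left(\varphi(q) - \varphi(p)\right)\\
&= 0.
\end{align*}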

So yes, integrating an exact 1-form over a closed 1-chain always gives us zero, and this result holds in general as well. You may however say that the only reason that we were able to find global properties of a manifold was by applying Stokes' Theorem to closed forms. So this would only be bad if all exact forms are closed. This is in fact true!

\(\color{red}{\mathbf{Theorem.}}\) If \(\omega \in \mathbf{C}^k\) is exact, then it is also closed. That is, for any differential form \(\varphi\), \(d\circ d\varphi = 0\).
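
As a quick sanity check in the simplest case, assume \(\varphi\) is a twice continuously differentiable \(0\)-form on \(\mathbb{R}^2\) (this worked case is only an illustration, not a proof of the theorem). Then \(d\circ d\varphi = 0\) is just the equality of mixed partial derivatives:

\begin{align*}
d\varphi &= \frac{\partial \varphi}{\partial x}dx + \frac{\partial \varphi}{\partial y}dy,\\
d(d\varphi) &= \frac{\partial^2 \varphi}{\partial y\,\partial x}\,dy\wedge dx + \frac{\partial^2 \varphi}{\partial x\,\partial y}\,dx\wedge dy\\
&= \left(\frac{\partial^2 \varphi}{\partial x\,\partial y} - \frac{\partial^2 \varphi}{\partial y\,\partial x}\right)dx\wedge dy\\
&= 0.
\end{align*}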

What this means is that we cannot just integrate any closed form. We must choose closed forms which are not exact. Do there exist such closed forms? What does exactness depend on?

To investigate these questions a little further, let's take a look at a 1-form on the punctured plane, and then a 1-form on the half plane. 

\(\color{blue}{\mathbf{Example.}}\) Let \(M := \mathbb{R}^2 - \{0\}\), and consider the 1-form on \(M\)

\begin{align*}
\omega &= \frac{x\,dy - y\,dx}{x^2 + y^2}.
\end{align*}

Let \(\gamma : [0, 2\pi] \to M\) be the curve defined by \(\gamma (t) = (\cos{t}, \sin{t})\), whose trace is the unit circle. By substituting \(x = \cos{t}\) and \(y = \sin{t}\) everywhere in the formula for \(\omega\), we get that

\begin{align*}
\int_\gamma\omega &= \int_{[0,2\pi]}\frac{\cos{t}(\cos{t}\ dt) - \sin{t}(-\sin{t}\ dt)}{\sin^2{t} + \cos^2{t}} \\
&= \int_0^{2\pi}dt \\
&= 2\pi.
\end{align*}

This implies that \(\omega\) is not exact; because if it were, then integrating it over any closed curve would give us \(0\).
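
A numerical version of this computation, as a minimal sketch: the helper names below are hypothetical, and the ellipse is an extra curve (not part of the example above) added to suggest that the value \(2\pi\) does not depend on the particular loop around the origin.

```python
import math

def omega(x, y):
    # Coefficients (P, Q) of omega = (x dy - y dx) / (x^2 + y^2) = P dx + Q dy.
    r2 = x * x + y * y
    return -y / r2, x / r2

def integrate_closed(path, velocity, n=20000):
    # Midpoint-rule approximation of the integral of omega over a closed
    # curve parametrized by t in [0, 2*pi].
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = path(t)
        p, q = omega(x, y)
        dx, dy = velocity(t)
        total += (p * dx + q * dy) * h
    return total

# The unit circle gamma(t) = (cos t, sin t) from the example.
unit_circle = integrate_closed(
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
)

# A hypothetical second loop around the origin: the ellipse (2 cos t, sin t).
ellipse = integrate_closed(
    lambda t: (2.0 * math.cos(t), math.sin(t)),
    lambda t: (-2.0 * math.sin(t), math.cos(t)),
)

print(unit_circle, ellipse, 2.0 * math.pi)
```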

However, \(\omega\) \(\it{is}\) exact on some smaller domains such as the right half-plane \(H := \{(x, y) \in \mathbb{R}^2 : x > 0\}\). In the right half-plane, we get that \(\omega = d(\tan^{-1}({y/x}))\). In polar coordinates, we would get that \(\omega = d\theta\).
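
The claim \(\omega = d(\tan^{-1}(y/x))\) on \(H\) can be spot-checked numerically. This is only a sketch under stated assumptions: the sample points and the test loop (a circle of radius 1 centered at \((3, 0)\), which stays inside \(H\)) are arbitrary choices, and the gradient is approximated by central finite differences.

```python
import math

def theta(x, y):
    # Candidate potential on the right half-plane (requires x > 0).
    return math.atan(y / x)

def omega(x, y):
    # Coefficients (P, Q) of omega = (x dy - y dx) / (x^2 + y^2).
    r2 = x * x + y * y
    return -y / r2, x / r2

def numerical_gradient(f, x, y, h=1e-6):
    # Central finite differences for the partial derivatives of f.
    df_dx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return df_dx, df_dy

# Compare d(theta) with omega at a few arbitrary points with x > 0.
sample_points = [(0.5, -2.0), (1.0, 0.0), (2.0, 3.0), (0.1, 0.7)]
max_error = 0.0
for x, y in sample_points:
    gx, gy = numerical_gradient(theta, x, y)
    p, q = omega(x, y)
    max_error = max(max_error, abs(gx - p), abs(gy - q))

def loop_integral(cx, cy, r, n=20000):
    # Midpoint-rule integral of omega over the circle of radius r at (cx, cy).
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = cx + r * math.cos(t), cy + r * math.sin(t)
        p, q = omega(x, y)
        total += (p * (-r * math.sin(t)) + q * (r * math.cos(t))) * h
    return total

# A closed loop that stays inside the right half-plane: the integral should vanish.
half_plane_loop = loop_integral(3.0, 0.0, 1.0)

print(max_error, half_plane_loop)
```

The loop inside \(H\) integrates to (numerically) zero, in contrast to the \(2\pi\) obtained for the unit circle around the puncture.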

This in fact is true in general, as the following theorem describes.

\(\color{red}{\mathbf{Theorem.}}\) Let \(M\) be a smooth manifold with or without boundary. Each point of \(M\) has a neighborhood on which every closed form is exact.

What this tells us is that exactness of a closed form is not a local property: locally, every closed form is exact. So if our objective is to investigate global properties of manifolds, we must then find out which closed forms are \(\it{not}\) exact. This is precisely what de Rham cohomology studies!

The way we do this is by constructing the following equivalence relation among closed \(k\)-forms.

Two closed \(k\)-forms \(\varphi\) and \(\omega\) are \(\it{equivalent}\) if their difference \(\omega - \varphi\) is an exact form. That is, if

\begin{align*}
\omega - \varphi &= d\phi
\end{align*}

for some \(\phi \in \mathbf{C}^{k-1}(M)\).

This will partition our space of closed \(k\)-forms into equivalence classes. So instead of talking about a specific form \(\omega\), we will consider the equivalence class

\([\omega] := \{\varphi \in \mathbf{C}^k : \omega - \varphi = d\phi\) for some \(\phi \in \mathbf{C}^{k-1}\}\).

We can then say that \([\omega] = [\varphi]\) for such forms \(\varphi\).

Notice that constructing this equivalence relation is exactly what we get when we define the following quotient group.

Let's first define a couple of subspaces of \(\mathbf{C}^k(M)\) when \(M\) is a smooth manifold with or without boundary.

\begin{align*}
\mathcal{Z}^k(M) &:= \{{\rm closed } \ k{\rm -forms \ on } \ M\},\\
\mathcal{B}^k(M) &:= \{{\rm exact } \ k{\rm -forms \ on } \ M\}.
\end{align*}

Because \(d: \mathbf{C}^k(M) \to \mathbf{C}^{k+1}(M)\) is linear, its kernel and image are linear subspaces. Together with the fact that every exact form is closed, we may define the de Rham cohomology group in degree \(k\) as the following.

\(\color{purple}{\mathbf{Definition.}\ (de\ Rham\ cohomology\ group)}\) We define the \(k\)th de Rham group of \(M\) to be the quotient vector space

\begin{align*}
H^k_{dR}(M) &= \mathcal{Z}^k(M)/\mathcal{B}^k(M).
\end{align*}

Notice that the difference between an exact form and the form which always returns the number \(0\) is an exact form. Thus, if \(\omega \in \mathbf{C}^k\) is exact, then \(\omega\) is equivalent to zero in \(H_{dR}^k(M)\).

To get our hands a little dirty, let's try to reason out what we can get for \(H^0(M)\) when \(M\) is a smooth manifold. So let's begin by asking, when is a \(0\)-form closed? Recall that \(0\)-forms on \(M\) are just functions \(\varphi: M \to \mathbb{R}\) which assign to every point in \(M\) a real number.

Thus, in local coordinates,

\begin{align*}
d\varphi &= \frac{\partial \varphi}{\partial x_1}dx_1 + \cdots + \frac{\partial\varphi}{\partial x_n}dx_n.
\end{align*}

Hence, a \(0\)-form \(\varphi\) is closed if and only if its first partial derivatives vanish, that is, if it is locally constant. The only way for a locally constant function on \(M\) to not be constant on \(M\) is for \(M\) to have multiple connected components, say, \(M_1, M_2, ..., M_m\). The most general such function is a sum of functions \(\varphi_i\), where \(\varphi_i\) takes the constant value \(c_i\) on \(M_i\) and \(0\) elsewhere, for each \(1\leq i \leq m\).

Now, if we are trying to find exact \(0\)-forms, we must be able to have \((-1)\)-forms, which do not exist! Therefore we have \(\mathcal{B}^0(M) = \{0\}\), the trivial group.


\begin{align*}
H^0_{dR}(M) &= \mathcal{Z}^0(M)/\mathcal{B}^0(M)\\
&= \mathcal{Z}^0(M)/\{0\}\\
&\simeq \mathcal{Z}^0(M).
\end{align*}

This implies that \(H^0_{dR}(M)\) is isomorphic (i.e., algebraically the same) to the space of locally constant functions on \(M\). This space has dimension equal to the number of components \(M\) has; in our case, dimension \(m\).
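
A discrete analogue may help build intuition. In the sketch below we assume (purely for illustration) that \(M\) is replaced by a finite graph, and that "locally constant" means "takes equal values at the two ends of every edge". The function names and the example graph are hypothetical. The indicator functions of the connected components then play the role of the functions \(\varphi_i\), and their number plays the role of \(\dim H^0_{dR}(M)\).

```python
def connected_components(num_vertices, edges):
    # Union-find over the vertices 0, ..., num_vertices - 1.
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

def is_locally_constant(values, edges):
    # Discrete stand-in for "d(values) = 0": equal values across every edge.
    return all(values[u] == values[v] for u, v in edges)

# Hypothetical example: 6 vertices forming a path 0-1-2, an edge 3-4,
# and an isolated vertex 5.
num_vertices = 6
edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(num_vertices, edges)

# One indicator function per component: a basis of the locally constant functions.
basis = [[1 if v in comp else 0 for v in range(num_vertices)] for comp in components]

print(len(components), basis)
```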

Then, how do we know if a closed \(k\)-form is exact if it depends on the underlying space? We do have the following wonderful theorem!

\(\color{red}{\mathbf{Theorem.}}\) Let \(S\) be a \(k\)-dimensional manifold, \(M\) a smooth manifold, and let \(\omega\) be a differential \(k\)-form on \(M\).  If

\[
\int_S\phi^\ast\omega = 0
\]

for every map \(\phi: S\to M\), then \(\omega\) is exact.

Let's investigate what this theorem is saying by first looking at a slight variation of its statement for \(k = 1\). Let \(\varphi\) be a 1-form on \(M\). If \(\int_{\gamma}\varphi = 0\) for all closed curves \(\gamma\), then \(\varphi\) is exact.

If we consider some \(k\)-dimensional smooth manifold \(S\) and a smooth manifold \(M\), what the theorem is saying is that \(\omega\) is exact whenever its \(\it{pullbacks}\) by all maps \(\phi:S\to M\) integrate to zero.

The pullback by \(\phi\) is a \(k\)-form \(\phi^\ast \omega\) on \(S\). The intuition for considering different \(\phi\)'s is that they move \(S\) around in \(M\), and considering \(\phi^\ast\omega\) lets us look at the form \(\omega\) on these different images of \(S\) in \(M\). This is very much like considering all different \(k\)-dimensional manifolds \(S\) in \(M\), and looking at what \(\omega\) integrates to over all these different manifolds.

Wednesday, January 6, 2016

Dissemination of Math in the internet age

I'm at the Joint Mathematics Meetings in Seattle. One of the first talks (8:00 AM!) of the first day was given by Prof. Tim Gowers on How might Mathematics be better disseminated (slight change in the title from what was originally published). Prof. Gowers highlighted several themes on better ways of doing mathematical research as well as better ways of disseminating the same, which he has been working on, and popularizing on the web, in recent years. Here are some of my own interpretations of what he talked about.

In the current setting of mathematical research, most emphasis is on being the "first to prove the theorem", and the basic unit of discourse is the peer-reviewed journal article. There are more than a few things wrong with this setup, the main one being that the wheel is reinvented repeatedly! Here is a type of result which could be very useful. If Lemma A is true, then BIG RESULT B is true, which would be fantastic news! But, I have a counterexample for Lemma A :-(. Unfortunately, such a negative result cannot be "published" in a peer-reviewed journal article. Hence, others fumble around and reinvent the same result!

Another major drawback of the current system is that mathematical conversation happens at an inordinately slow pace. Someone proves a theorem, which appears in the journal in two years' time. Then someone else reads it, comes up with a modification or a simpler proof, which appears in another journal two more years later! But in the current internet age, it's only natural to expect conversations occurring at a much faster pace (live tweets, anyone?).

While electronic publication has made all research easily searchable, the search capacity is strong only in one direction, so to speak. If you know what you're looking for in terms of keywords, then it's easy to find it by search, e.g., if you want to know what Szemerédi's theorem states, a quick search pulls up multiple relevant web pages. What would be very useful is a way to search "in reverse" using some limited keywords or partial statements (and not the name itself!) to see if such a result is already known. In other words, the community needs a mathematics research database that allows semantic search. Today, forums such as MathOverflow often get us accurate answers to questions of the form "has this been done before?".

In current times, mathematics research should use the internet - both for conducting it as well as to disseminate it. But someone having a high rating on the Math StackExchange for posting numerous answers, or who writes popular blog articles on otherwise difficult to understand math papers, is not rewarded for these activities in the current system. To get tenure, you better have the required number of papers in the top(-enough!) journals! One could have a potentially huge impact by writing easy-to-understand expository blog posts on an otherwise hard-to-read set of mathematical papers written by other authors. These blog posts could in turn spur new contributions from others, who would not have had the inclination otherwise to digest the original papers. As such, this effort could potentially be worth much more than publishing a paper with a new theorem! But then again, the current system has no means of rewarding such expository efforts.

In follow-up conversation, Prof. Gowers agreed that senior/reputed mathematicians such as himself could afford to spend more time and effort on such endeavors on the internet without worrying about the rewards or evaluations. For tenure-track faculty or other junior researchers, the best course might well be to do both - publish via the conventional means, but also spend some effort on internet-based activities. Further down the line, we as a community would want to be able to judge how "impactful" a series of blog posts has been, so as to reward the same. But we must be careful not to go down the same path as overusing journal impact factors (so, don't judge the impact of a blog post by the number of times it has been re-tweeted, or +1-ed!).

In the latter part of the talk, Prof. Gowers highlighted four of his personal efforts in this regard - a reform of the journal system (Discrete Analysis), informal mathematical writing (via his blog), polymath projects, and automating proof discovery. He also presented his ideas of what mathematics research would be like in 20-30 years from now, and then in 50-60 years from now. Not surprisingly, computers would be expected to do most of the heavy (and light:-) lifting in the future.

A question from the audience inquired about the place of real, i.e., face-to-face, conversations on mathematics that happen at conferences. Would there be less of a place for them in mathematics research in the future? While agreeing that such conversations have their place, Prof. Gowers observed that maybe the back-and-forth postings on a polymath project (with time-stamps!) allow the posters the time to understand the subject better, and think through before posting what they want to post. An instant face-to-face conversation would not give that luxury!

In summary, much needs to change for mathematics research to be done right and be impactful in the internet age. Individual researchers need to be a bit bold, and not worry too much about rewards and ratings when spending time and effort on the internet. The more people who do so, the sooner the inevitable transition would happen!

Wednesday, November 4, 2015

Discrete optimization @ Oaxaca - II

Some snippets from days 2 and 3 at the BIRS-CMO Workshop on Discrete Optimization in Oaxaca.

Thomas Rothvoss presented his work on constructive discrepancy minimization for convex sets (slides are available). The basic problem is that of assigning one of two colors \(\chi(i) \in \{-1,+1\}\) to each of \(n\) items \(i \in [n]:=\{1,\dots,n\}\) such that, for a system of sets \(\cal{S} = \{S_1,\dots,S_m\}\) with \(S_i \subseteq [n]\), we minimize the maximum "mismatch", defined as the discrepancy:

\( \rm{disc}(\cal{S}) = \min\limits_{\chi(i) = \pm 1} \max\limits_{S \in \cal{S}} \left| \sum_{i \in S} \chi(i) \right| \).
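
To make the definition concrete, here is a minimal brute-force sketch that evaluates \(\rm{disc}(\cal{S})\) by enumerating all \(2^n\) colorings. It is only meant for tiny instances and has nothing to do with the constructive techniques from the talk; the function name and the example set system are hypothetical, and items are indexed from \(0\) rather than \(1\).

```python
from itertools import product

def discrepancy(n, sets):
    # Enumerate every coloring chi in {-1, +1}^n and keep the one whose
    # worst-case imbalance over the set system is smallest.
    best_value, best_coloring = None, None
    for coloring in product((-1, 1), repeat=n):
        worst = max(abs(sum(coloring[i] for i in s)) for s in sets)
        if best_value is None or worst < best_value:
            best_value, best_coloring = worst, coloring
    return best_value, best_coloring

# Hypothetical set system on the items {0, 1, 2, 3}.
set_system = [{0, 1, 2}, {1, 2, 3}, {0, 3}, {0, 1, 2, 3}]
value, coloring = discrepancy(4, set_system)

print(value, coloring)
```

Any set of odd size forces a discrepancy of at least \(1\), so the value \(1\) found for this small system is the best possible.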

I found the techniques developed fairly deep, and expect them to find use in lots of applications (i.e., for proving results on other optimization problems; Thomas talked about an application to bin packing). We had previously looked at the somewhat related problem of number partitioning. There, we assign a set of integers \(\{a_1, \dots, a_n\}\) to two sets such that the sums of the numbers over the two sets are as close to each other as possible (the difference between these two sums is the discrepancy here). At the same time, the corresponding alternative definition of discrepancy given as
\( \rm{disc'}(\cal{S}) = \min\limits_{\chi(i) = \pm 1} \sum\limits_{S \in \cal{S}} \left| \sum_{i \in S} \chi(i) \right|\)

would make the problem sort of "easy" here. With that objective, one could prove that a random assignment of \(\pm 1\) would perhaps do as well as we can. Nonetheless, an appropriately defined notion of "weighted" discrepancy, where each element now has, say, a nonnegative weight, would be interesting to consider. A generalization to more than two colors also appears interesting, though establishing the building-block results may be tricky.

Tamon Stephen talked about a variant of the Hirsch conjecture using circuit diameter of polyhedra, instead of the default graph diameter. See the preprint for an illustration of circuit distance between vertices of a polyhedron - unlike the graph distance, it's not symmetric. The idea is that one is allowed to take "shortcuts" along the interior of the polyhedron along with the walks along the edges. In that sense, it's mixing the ideas of the default simplex method and the interior point method for solving linear programs (LPs). The authors show that the most basic counterexample to the Hirsch conjecture, the Klee-Walkup polyhedron, in fact satisfies the Hirsch bound in this circuit setting. It would be interesting for software programs that solve LPs to be able to seamlessly and intelligently switch back and forth between interior point and simplex methods (a basic ability to do so is already provided by some of the state-of-the-art solvers).

Juan Pablo Vielma talked about when Minkowski sums are good/bad in the context of formulations for unions of polyhedra, and for unions of convex sets in general (the slides should be up soon here; but other versions are already available). For unions of polyhedra, aggregated formulations are often "short", but are not as tight as disaggregated ones (also termed extended formulations). The latter formulations are sharp/ideal, but are often too large in size (see the excellent review on MIP formulation techniques by JP Vielma, or lectures 5-7 from my IP class for a shorter overview). This interesting line of work tries to find a better middle ground by finding the sharpest formulations that are not extended, i.e., without having to add extra variables. Things get very interesting when one considers unions of convex sets (in place of polyhedra)!

Monday, November 2, 2015

Discrete optimization @ Oaxaca - I

I'm at the BIRS-CMO Workshop on Discrete Optimization in beautiful Oaxaca (in Mexico). Unlike other typical workshops, the organizers have tried hard to encourage lots of discussion and interactions among the participants - big props to Jesus De Loera and Jon Lee! I'm hoping to write snippets on talks/discussion that I found particularly interesting (yes, it'll be a biased view :-).

The meeting started with an apt talk by Dan Bienstock on LP formulations for polynomial optimization problems (based on this paper; Dan has also posted the slides). He started with a motivating problem - the optimal power flow (OPF) problem, which motivates the use of the treewidth of the intersection graph of the constraints as the parameter which controls the complexity of the proposed reformulation operator. And real-life power grids often have small treewidths. The reformulation operator produces linear programming approximations that attain provable bounds for mixed integer polynomial problems where the variables are either binary, or require \(0 \leq x_j \leq 1\).

Informally, the intersection graph of a system of constraints has one vertex for each variable \(x_j\), and edge \((x_i,x_j)\) is present when both \(x_i\) and \(x_j\) appear in some constraint. The simple example of a subset sum (or knapsack) problem was insightful. With the single constraint being \(x_1+\dots+x_n \leq \beta\), a constraint graph could have \(n+1\) vertices \(\{0,1,\dots,n\}\), with the \(n\) edges \((0,j)\) for each \(j\) corresponding to \(x_j\) (node \(0\) is a "dummy" node here). The treewidth of this "star" graph is \(1\). The other key trick employed is the use of binary variables to approximate a continuous variable \(0 \leq x \leq 1\) (attributed originally to Glover). For a given error term \(0 < \gamma < 1\), we can approximate \(x \approx \sum_{h=1}^L \left(1/2^h\right) y_h\), where \(y_h\) are binary variables. With \(L = \lceil \log_2 (1/\gamma) \rceil \), we can get \(\sum_{h=1}^L \left(1/2^h\right) y_h \leq x \leq \sum_{h=1}^L \left(1/2^h\right) y_h + \gamma\). This step helps to get pure binary problems in place of the mixed integer problems. The treewidth gets blown up by \(L\), but things still work out nicely. This paper seems to have lots of nice and deep "tricks" (I hope to study it in detail).
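
The binary approximation step can be sketched in a few lines. This is an illustration only, not the construction from the paper: the function names are hypothetical, and the digits are picked greedily (truncating the binary expansion of \(x\)), so the approximation here always rounds down and satisfies \(\sum_{h=1}^L 2^{-h} y_h \leq x \leq \sum_{h=1}^L 2^{-h} y_h + \gamma\).

```python
import math

def binary_digits(x, gamma):
    # Greedy truncated binary expansion of x in [0, 1] using
    # L = ceil(log2(1 / gamma)) binary digits y_1, ..., y_L.
    num_digits = math.ceil(math.log2(1.0 / gamma))
    digits = []
    remainder = x
    for h in range(1, num_digits + 1):
        weight = 2.0 ** (-h)
        if remainder >= weight:
            digits.append(1)
            remainder -= weight
        else:
            digits.append(0)
    return digits

def reconstruct(digits):
    # Value of sum_{h=1}^{L} y_h / 2^h.
    return sum(y * 2.0 ** (-h) for h, y in enumerate(digits, start=1))

gamma = 0.01
for x in (0.0, 0.3, 0.5, 0.999, 1.0):
    y = binary_digits(x, gamma)
    approx = reconstruct(y)
    # The gap x - approx stays between 0 and gamma.
    print(x, y, approx, x - approx)
```

With \(\gamma = 0.01\) this uses \(L = 7\) binary variables per continuous variable, which is the factor by which the treewidth grows.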

There were several other interesting talks, and a very interactive problem session to conclude the day. Oh, and I learned a new term used in the power industry from Shabbir Ahmed's talk: a prosumer is someone who both produces and consumes power. I'm wondering why that is more apt than a conducer...

There was some lively discussion over dinner about how the optimization and operations research communities have failed to sell themselves as well as the CS community (as a whole, or even the CS theory community by itself). Large membership sizes of ACM vs INFORMS and similarly large NSF budgets for CS vs OR were cited as indicators. There was also an anecdote mentioned about how back in the 1980s when NSF funding for algorithms/CS theory was on the decline, a group of several top big names from that field submitted a memo/petition to the lawmakers in DC, and also convinced them in person that "algorithms/theory is as fundamental as cosmology" (needless to say, I'm paraphrasing to a huge extent here!). And yes, they managed to restore the funding flow. The optimization community tried a bit of the same trick with Karmarkar's interior point algorithm for LP. Maybe we optimizers should try harder - not just from the point of view of securing funding, but also from the point of view of our students getting better industry jobs!

Saturday, September 12, 2015

Math of Data Science @ ICERM

Note: This is a re-post of the blog post I wrote originally on WordPress last month. It appears that Blogger is a much better platform for the kind of posts I'd like to make. Yes, I'm still learning ...

I recently attended the topical workshop on Mathematics in Data Sciences at ICERM. The attendance was a good mix of students, postdocs, and researchers/faculty from academic institutions and national labs along with a sizable number of industry folks. The line of talks involved a similar mixture as well (abstracts/slides from many of the talks are available from the workshop page linked above). In particular from the industry side, there were talks from data scientists (or "engineers" with similar roles) from Ayasdi, LinkedIn (formerly), Netflix, New York Times, and Schlumberger-Doll, to name a few. Indeed, I found this diversity a direct indicator of the young age of the discipline in question, i.e., data science. And yes, the usual jokes about data science/big data were not spared, including the one about how big data is like teenage sex, how big data is very much a man's game since you usually hear men boasting "my data is bigger than yours", and how data scientists are mostly data janitors!

Coming from the academic side, what I found most interesting at this workshop were the panel (and open) discussion sessions. In the first such discussion, the group tried to come up with a (not so short!) list of topics that a program in data science should train the (undergraduate/masters) student in. After starting with the usual suspects such as calculus and linear algebra, probability and statistics, algorithms, machine learning, and databases, the group expanded the scope. Next came high dimensional geometry, information theory, data visualization/exploration, experimental design, and communication/business "skills". But many in the audience appeared to be surprised by the suggestion of a class on inverse problems, and electromagnetism (yes!). The topics and then associated skills to be taught soon filled up two large panels of white board (recall the "teenage sex" joke, anyone?). To wrap up the session, it was suggested that the student be trained (at least) in Python, GitHub, SQL (or something similar), all from the point of view of industry readiness. As far as the mathematicians are concerned, it was suggested that they could start by making a wish list of all results (related to data science) one wants to see as theorems, and such a list will keep them busy for more than a lifetime. But to see any such effort make huge impact, one should ideally work with a domain expert. One particular subtopic of much importance in this context (no pun intended!) is that of textual data - very important for data science, and as yet not well explored by mathematicians.

The panel discussion about careers in data science was quite popular as well. A majority of the panelists were junior (read "young") data scientists from the industry, and were able to shed a lot of light on what a typical work day looks like for a data scientist. One aspect of their work that particularly appealed to me (coming from traditional academia) is how quickly and directly they are able to see the impact of their ideas and work. For instance, a data scientist in a social media company could brainstorm for 2 hours, write the code in 2 more hours, and see thousands of users enjoying the benefits before the end of the day! On the other hand, academicians often wait months, if not years, to just count citations of their papers.

If there was one take home message from the data scientists to (young and old) aspirants, it was to just play with data - of many types and from many sources, and not to worry so much about all the different classes/training (or proving theorems). Be ever-ready to dive into any data that you come across, manipulate/analyze it quickly, and get the first insights.

I'm not attempting to list any summary/thoughts on the Mathematics involved (as meant in the title of the workshop). The list of relevant Math/Stat/CS topics has been huge already, and is not getting any shorter in this era of data science. I doubt if we're going to precisely define what data science is any time soon!