Wednesday, November 4, 2015

Discrete optimization @ Oaxaca - II

Some snippets from days 2 and 3 at the BIRS-CMO Workshop on Discrete Optimization in Oaxaca.

Thomas Rothvoss presented his work on constructive discrepancy minimization for convex sets (slides are available). The basic problem is that of assigning one of two colors \(\chi(i) \in \{-1,+1\}\) to each of \(n\) items \(i \in [n]:=\{1,\dots,n\}\) such that, for a system of sets \(\mathcal{S} = \{S_1,\dots,S_m\}\) with \(S_j \subseteq [n]\), we minimize the maximum “mismatch”, defined as the discrepancy:

\( \mathrm{disc}(\mathcal{S}) = \min\limits_{\chi(i) = \pm 1} \max\limits_{S \in \mathcal{S}} \left| \sum_{i \in S} \chi(i) \right| \).
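To make the definition concrete, here is a minimal brute-force sketch in Python (my own illustration, not from the talk) that computes \(\mathrm{disc}(\mathcal{S})\) on a toy instance by enumerating all \(2^n\) colorings - purely illustrative, since the enumeration is exponential in \(n\):

```python
from itertools import product

def disc(n, sets):
    """Brute-force disc(S): minimize, over all 2^n colorings chi: [n] -> {-1,+1},
    the maximum |sum_{i in S} chi(i)| over the sets S in the system."""
    best = float("inf")
    for chi in product((-1, 1), repeat=n):
        worst = max(abs(sum(chi[i] for i in S)) for S in sets)
        best = min(best, worst)
    return best

# A toy system on n = 4 items (0-indexed); the set {1, 2, 3} has odd size,
# so the discrepancy here is at least 1.
print(disc(4, [{0, 1}, {1, 2, 3}, {0, 2}]))
```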

I found the techniques developed fairly deep, and expect they will find use in lots of applications (e.g., for proving results on other optimization problems; Thomas talked about an application to bin packing). We had previously looked at the somewhat related problem of number partitioning. There, we assign a set of integers \(\{a_1, \dots, a_n\}\) to two sets such that the sums of the numbers over the two sets are as close to each other as possible (the difference between these two sums plays the role of the discrepancy). At the same time, the corresponding alternative definition of discrepancy, given as
\( \mathrm{disc'}(\mathcal{S}) = \min\limits_{\chi(i) = \pm 1} \sum\limits_{S \in \mathcal{S}} \left| \sum_{i \in S} \chi(i) \right|, \)

would make the problem sort of “easy” here. With that objective, one can argue that a random assignment of \(\pm 1\) does roughly as well as one can hope. Nonetheless, an appropriately defined notion of “weighted” discrepancy, where each element carries, say, a nonnegative weight, would be interesting to consider. A generalization to more than two colors also appears interesting, though even establishing the basic building-block results could be tricky.
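As a toy illustration of that random baseline (my own sketch, with made-up numbers), here is the number-partitioning flavor: try many random \(\pm 1\) assignments and keep the smallest signed-sum discrepancy seen.

```python
import random

def best_random_signing(numbers, trials=10_000, seed=0):
    """Number partitioning via random coloring: assign a random sign chi(i)
    to each a_i and track the smallest |sum_i chi(i) * a_i| observed."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trials):
        best = min(best, abs(sum(rng.choice((-1, 1)) * a for a in numbers)))
    return best

# These six numbers admit a perfect split (31 + 15 = 4 + 7 + 13 + 22 = 46),
# so random trials quickly find discrepancy 0 on this tiny instance.
print(best_random_signing([4, 7, 13, 15, 22, 31]))
```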

Tamon Stephen talked about a variant of the Hirsch conjecture using the circuit diameter of polyhedra instead of the usual graph diameter. See the preprint for an illustration of the circuit distance between vertices of a polyhedron - unlike the graph distance, it is not symmetric. The idea is that one is allowed to take “shortcuts” through the interior of the polyhedron in addition to walks along the edges. In that sense, it mixes the ideas of the simplex method and the interior point method for solving linear programs (LPs). The authors show that the most basic counterexample to the Hirsch conjecture, the Klee-Walkup polyhedron, in fact satisfies the Hirsch bound with respect to the circuit diameter. It would be interesting for software programs that solve LPs to be able to seamlessly and intelligently switch back and forth between interior point and simplex methods (a basic ability to do so is already provided by some of the state-of-the-art solvers).

Juan Pablo Vielma talked about when Minkowski sums are good/bad in the context of formulations for unions of polyhedra, and for unions of convex sets in general (the slides should be up soon here; but other versions are already available). For unions of polyhedra, aggregated formulations are often “short”, but are not as tight as disaggregated ones (also termed extended formulations). The latter formulations are sharp/ideal, but are often too large in size (see the excellent review on MIP formulation techniques by JP Vielma, or lectures 5-7 from my IP class for a shorter overview). This interesting line of work tries to find a better middle ground: the sharpest formulations that are not extended, i.e., that do not add extra variables. Things get very interesting when one considers unions of convex sets (in place of polyhedra)!
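For reference, here is the standard disaggregated (extended) formulation, due to Balas, for a union of polyhedra \(P_i = \{x : A^i x \leq b^i\}\), \(i = 1,\dots,k\) - this is textbook material rather than anything specific to the new results in the talk. One introduces a copy \(x^i\) of the variables for each piece:

\( A^i x^i \leq \lambda_i\, b^i \ \ (i = 1,\dots,k), \quad x = \sum_{i=1}^k x^i, \quad \sum_{i=1}^k \lambda_i = 1, \quad \lambda \geq 0. \)

For bounded \(P_i\), the projection of this system onto the \(x\) variables is exactly \(\mathrm{conv}(P_1 \cup \dots \cup P_k)\) (so the formulation is sharp), and requiring \(\lambda_i \in \{0,1\}\) models the union itself; the copies \(x^i\) are precisely the extra variables that the aggregated approach tries to avoid.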

Monday, November 2, 2015

Discrete optimization @ Oaxaca - I

I'm at the BIRS-CMO Workshop on Discrete Optimization in beautiful Oaxaca (in Mexico). Unlike many typical workshops, the organizers have tried hard to encourage lots of discussion and interaction among the participants - big props to Jesus De Loera and Jon Lee! I'm hoping to write snippets on talks/discussions that I found particularly interesting (yes, it'll be a biased view :-).

The meeting started with an apt talk by Dan Bienstock on LP formulations for polynomial optimization problems (based on this paper; Dan has also posted the slides). He started with a motivating problem - the optimal power flow (OPF) problem - which motivates using the treewidth of the intersection graph of the constraints as the parameter controlling the complexity of the proposed reformulation operator. And real-life power grids often have small treewidths. The reformulation operator produces linear programming approximations that attain provable bounds for mixed integer polynomial problems in which each variable is either binary or satisfies \(0 \leq x_j \leq 1\).

Informally, the intersection graph of a system of constraints has one vertex for each variable \(x_j\), and edge \((x_i,x_j)\) is present when \(x_i\) and \(x_j\) both appear in some common constraint. The simple example of a subset sum (or knapsack) problem was insightful. With the single constraint being \(x_1+\dots+x_n \leq \beta\), a constraint graph could have \(n+1\) vertices \(\{0,1,\dots,n\}\), with the \(n\) edges \((0,j)\) for each \(j\) corresponding to \(x_j\) (node \(0\) is a “dummy” node here). The treewidth of this “star” graph is \(1\). The other key trick employed is the use of binary variables to approximate a continuous variable \(0 \leq x \leq 1\) (attributed originally to Glover). For a given error term \(0 < \gamma < 1\), we can approximate \(x \approx \sum_{h=1}^L \left(1/2^h\right) y_h\), where the \(y_h\) are binary variables. With \(L = \lceil \log_2 (1/\gamma) \rceil \), we can get \(\sum_{h=1}^L \left(1/2^h\right) y_h \leq x \leq \sum_{h=1}^L \left(1/2^h\right) y_h + \gamma\). This step helps turn the mixed integer problems into pure binary ones. The treewidth gets blown up by a factor of \(L\), but things still work out nicely. This paper seems to have lots of nice and deep “tricks” (I hope to study it in detail).
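Here is a small sketch of that approximation step (my own illustration; the greedy bit selection below is one simple way to realize the guarantee via a truncated binary expansion):

```python
import math

def binary_approx(x, gamma):
    """Greedily pick binary y_1, ..., y_L so that the truncated binary expansion
    sum_{h=1..L} y_h / 2^h satisfies  sum <= x <= sum + gamma,
    where L = ceil(log2(1/gamma))."""
    L = math.ceil(math.log2(1.0 / gamma))
    y, approx = [], 0.0
    for h in range(1, L + 1):
        bit = 1 if approx + 2.0 ** (-h) <= x else 0
        y.append(bit)
        approx += bit * 2.0 ** (-h)
    return y, approx

# Example: gamma = 0.01 gives L = 7 bits; the error x - approx is at most gamma.
y, approx = binary_approx(0.73, gamma=0.01)
print(y, approx, 0.73 - approx)
```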

There were several other interesting talks, and a very interactive problem session to conclude the day. Oh, and I learned a new term used in the power industry from Shabbir Ahmed's talk: a prosumer is someone who both produces and consumes power. I'm wondering why that is more apt than a conducer...

There was some lively discussion over dinner about how the optimization and operations research communities have failed to sell themselves as well as the CS community has (as a whole, or even the CS theory community by itself). The large membership of ACM vs. INFORMS, and the similarly large NSF budgets for CS vs. OR, were cited as indicators. There was also an anecdote about how, back in the 1980s when NSF funding for algorithms/CS theory was on the decline, a group of top names from that field submitted a memo/petition to the lawmakers in DC, and also convinced them in person that “algorithms/theory is as fundamental as cosmology” (needless to say, I'm paraphrasing to a huge extent here!). And yes, they managed to restore the funding flow. The optimization community tried a bit of the same trick with Karmarkar's interior point algorithm for LP. Maybe we optimizers should try harder - not just to secure funding, but also to help our students get better industry jobs!

Saturday, September 12, 2015

Math of Data Science @ ICERM


Note: This is a re-post of the blog post I wrote originally on WordPress last month. It appears that Blogger is a much better platform for the kind of posts I'd like to make. Yes, I'm still learning ...

I recently attended the topical workshop on Mathematics in Data Sciences at ICERM. The attendance was a good mix of students, postdocs, and researchers/faculty from academic institutions and national labs, along with a sizable number of industry folks. The lineup of talks involved a similar mixture as well (abstracts/slides from many of the talks are available from the workshop page linked above). In particular, from the industry side there were talks from data scientists (or “engineers” with similar roles) from Ayasdi, LinkedIn (formerly), Netflix, New York Times, and Schlumberger-Doll, to name a few. Indeed, I found this diversity a direct indicator of the young age of the discipline in question, i.e., data science. And yes, the usual jokes about data science/big data were not spared, including the one about how big data is like teenage sex, how big data is very much a man's game since you usually hear men boasting “my data is bigger than yours”, and how data scientists are mostly data janitors!

Coming from the academic side, what I found most interesting at this workshop were the panel (and open) discussion sessions. In the first such discussion, the group tried to come up with a (not so short!) list of topics that a program in data science should train the (undergraduate/masters) student in. After starting with the usual suspects such as calculus and linear algebra, probability and statistics, algorithms, machine learning, and databases, the group expanded the scope. Next came high dimensional geometry, information theory, data visualization/exploration, experimental design, and communication/business “skills”. But many in the audience appeared to be surprised by the suggestion of a class on inverse problems, and on electromagnetism (yes!). The topics and the associated skills to be taught soon filled up two large panels of whiteboard (recall the “teenage sex” joke, anyone?). To wrap up the session, it was suggested that the student be trained (at least) in Python, GitHub, and SQL (or something similar), all from the point of view of industry readiness. As far as mathematicians are concerned, it was suggested that they could start by making a wish list of all the results (related to data science) they want to see as theorems; such a list will keep them busy for more than a lifetime. But to see any such effort make a huge impact, one should ideally work with a domain expert. One particular subtopic of much importance in this context (no pun intended!) is that of textual data - very important for data science, and as yet not well explored by mathematicians.

The panel discussion about careers in data science was quite popular as well. A majority of the panelists were junior (read “young”) data scientists from industry, and they were able to shed a lot of light on what a typical work day looks like for a data scientist. One aspect of their work that particularly appealed to me (coming from traditional academia) is how quickly and directly they are able to see the impact of their ideas and work. For instance, a data scientist at a social media company could brainstorm for 2 hours, write the code in 2 more hours, and see thousands of users enjoying the benefits before the end of the day! On the other hand, academics often wait months, if not years, just to count citations of their papers.

If there was one take-home message from the data scientists to (young and old) aspirants, it was to just play with data - of many types and from many sources - and not to worry so much about all the different classes/training (or proving theorems). Be ever ready to dive into any data that you come across, manipulate/analyze it quickly, and get the first insights.

I'm not attempting to list any summary/thoughts on the Mathematics involved (as meant in the title of the workshop). The list of relevant Math/Stat/CS topics is already huge, and it is not getting any shorter in this era of data science. I doubt we're going to precisely define what data science is any time soon!