Wednesday, January 6, 2016

Dissemination of Math in the internet age

I'm at the Joint Mathematics Meetings in Seattle. One of the first talks (8:00 AM!) of the first day was given by Prof. Tim Gowers on How might Mathematics be better disseminated (slight change in the title from what was originally published). Prof. Gowers highlighted several themes on better ways of doing mathematical research as well as better ways of disseminating the same, which he has been working on, and popularizing on the web, in recent years. Here are some of my own interpretations of what he talked about.

In the current setting of mathematical research, most emphasis is on being the "first to prove the theorem", and the basic unit of discourse is the peer-reviewed journal article. There are more than a few things wrong with this set up, the main one being that the wheel is reinvented repeatedly! Here is a type of result which could be very useful. If Lemma A is true, then BIG RESULT B is true, which would be fantastic news!. But, I have a counterexample for Lemma A :-(. Unfortunately, such a negative result cannot be "published" in a peer-reviewed journal article. Hence, others fumble around and reinvent the same result!

Another major drawback of the current system is that mathematical conversation happens at an inordinately slow pace. Someone proves a theorem, which appears in the journal in two years time. Then someone else reads it, comes up with a modification or a simpler proof, which appears in another journal two more years later! But in the current internet age, it's only natural to expect conversations occurring at a much faster pace (live tweets, any one?).

While electronic publication has made all research easily searchable, the search capacity is strong only in one direction, so to speak. If you know what you're looking for in terms of keywords, then it's easy to find it by search, e.g., you want to know what Szemer├ędi's theorem states, a quick search pulls up multiple relevant web pages. What would be very useful is a way to search "in reverse" using some limited keywords or partial statements (and not the name itself!) to see if such a result is already known. In other words, the community needs a mathematics research database that allows semantic search. Today, forums such as the mathoverflow often gets us accurate answers to questions of the form "has this been done before?".

In current times, mathematics research should use the internet - both for conducting it as well as to disseminate it. But someone having a high rating on the Math StackExchange for posting numerous answers, or who writes popular blog articles on otherwise difficult to understand math papers, is not rewarded for these activities in the current system. To get tenure, you better have the required number of papers in the top(-enough!) journals! One could have a potentially huge impact by writing easy-to-understand expository blog posts on otherwise hard-to-read set of mathematical papers written by other authors. These blog posts could in turn spur new contributions from others, who would not have had the inclination otherwise to digest the original papers. As such, this effort could potentially be worth much more than publishing a paper with a new theorem! But then again, the current system has no means of rewarding such expository efforts.

In follow-up conversation, Prof. Gowers agreed that senior/reputed mathematicians such as himself could afford to spend more time and effort on such endeavors on the internet without worrying about the rewards or evaluations. For tenure-track faculty or other junior researchers, the best course might well be to do both - publish via the conventional means, but also spend some effort on internet-based activities. Further down the line, we as a community would want to be able to judge how "impactful" a series of blog posts have been, so as to reward the same. But we must be careful not to go down the same path as overusing journal impact factors (so, don't judge the impact of a blog post by the number of times it has been re-tweeted, or +1-ed!).

In the latter part of the talk, Prof. Gowers highlighted four of his personal efforts in this regard - a reform of the journal system (Discrete Analysis), informal mathematical writing (via his blog), polymath projects, and automating proof discovery. He also presented his ideas of what mathematics research would be like in 20-30 years from now, and then in 50-60 years from now. Not surprisingly, computers would be expected to do most of the heavy (and light:-) lifting in the future.

A question from the audience inquired about the place of real, i.e., face-to-face, conversations on mathematics that happen at conferences. Would there be less of a place for them in mathematics research in the future? While agreeing that such conversations have their place, Prof. Gowers observed that may be the back-and-forth postings on a polymath project (with time-stamps!) allows the posters the time to understand the subject better, and think through before posting what they want to post. An instant face-to-face conversation would not give that luxury!

In summary, much need to be changed for mathematics research to be done right and be impactful in the internet age. Individual researchers need to be a bit bold, and not worry too much about rewards and ratings when spending time and effort on the internet. The more people who do so, the sooner the inevitable transition would happen!

Wednesday, November 4, 2015

Discrete optimization @ Oaxaca - II

Some snippets from days 2 and 3 at the BIRS-CMO Workshop on Discrete Optimization in Oaxaca.

Thomas Rothvoss presented his work on constructive discrepancy minimization for convex sets (slides are available). The basic problem is that of assigning one of two colors \(\chi(i)\) to each \(i \in [n]:=\{1,\dots,n\}\) of \(n\) items represented by \(\{-1,+1\}\) such that for a system of sets \(\cal{S} = \{S_1,\dots,S_m\}\) with \(S_i \in [n]\) we minimize the maximum "mismatch", defined as the discrepancy:

\( \rm{disc}(\cal{S}) = \min\limits_{\chi(i) = \pm 1} \max\limits_{S \in \cal{S}} \left| \sum_{i \in S} \chi(i) \right| \).

I found the techniques developed fairly deep, and would find use in lots of applications (i.e., for proving results on other optimization problems; Thomas talked about an application to bin packing). We had previously looked at the somewhat related problem of number partitioning. There, we assign a set of integers \(\{a_1, \dots, a_n\}\) to two sets such that the sums of the numbers over the two sets are as close to each other as possible (the difference between these two sums is the discrepancy here). At the same time, the corresponding alternative definition of discrepancy given as
\( \rm{disc'}(\cal{S}) = \min\limits_{\chi(i) = \pm 1} \sum\limits_{S \in \cal{S}} \left| \sum_{i \in S} \chi(i) \right|\)

would make the problem sort of "easy" here. With that objective, one could prove that a random assignment of \(\pm 1\) would perhaps do as well as we can. Nonetheless, an appropriately defined notion of "weighted" discrepancy, where each element now has, say, nonnegative weights, would be interesting to consider. It appears a generalization to more than two colors would be interesting, but perhaps tricky to establish the building blocks of results.

Tamon Stephen talked about a variant of the Hirsch conjecture using circuit diameter of polyhedra, instead of the default graph diameter. See the preprint for an illustration of circuit distance between vertices of a polyhedron - unlike the graph diameter, it's not symmetric. The idea is that one is allowed to take "shortcuts" along the interior of the polyhedron along with the walks along the edges. In that sense, it's mixing the ideas of the default simplex method and the interior point method for solving linear programs (LPs). The authors show that the most basic counterexample to the Hirsch conjecture, the Klee-Walkup polyhedron, in fact satisfies the Hirsch bound. It would be interesting for software programs that solve LPs to be able to seamlessly and intelligently switch back and forth between interior point and simplex methods (a basic ability to do so is already provided by some of the state-of-the-art solvers)

Juan Pablo Vielma talked about when are Minkowski sums good/bad in the context of formulations for unions of polyhedra, and for unions of convex sets in general (the slides should be up soon here; but other versions are already available). For unions of polyhedra, aggregated formulations are often "short", but are not as tight as disaggregated ones (also termed extended formulations). The latter formulations are sharp/ideal, but are often too large in size (see the excellent review on MIP formulation techniques by JP Vielma, or lectures 5-7 from my IP class for a shorter overview). This interesting line of work tries to find a better middle ground by finding the sharpest formulations that are not extended, i.e., without having to add extra variables. Things get very interesting when one considers unions of convex sets (in place of polyhedra)!

Monday, November 2, 2015

Discrete optimization @ Oaxaca - I

I'm at the BIRS-CMO Workshop on Discrete Optimization in beautiful Oaxaca (in Mexico). Unlike other typical workshops, the organizers have tried hard to encourage lots of discussion and interactions among the participants - big props to Jesus De Loera and Jon Lee! I'm hoping to write snippets on talks/discussion that I found particularly interesting (yes, it'll be a biased view :-).

The meeting started with an apt talk by Dan Bienstock on LP formulations for polynomial optimization problems (based on this paper; Dan has also posted the slides). He started with a motivating problem - the optimal power flow (OPF) problem, which motivates the use of the treewidth of the intersection graph of the constraints as the parameter which controls the complexity of the proposed reformulation operator. And real-life power grids often have small treewidths. The reformulation operator produces linear programming approximations that attain provable bounds for mixed integer polynomial problems where the variables are either binary, or require \(0 \leq x_j \leq 1\).

Informally, the intersection graph of a system of constraints has one vertex for each variable \(x_j\), and edge \((x_i,x_j)\) is present when both \(x_i\) and \(x_j\) appear in some constraint. The simple example of a subset sum (or knapsack) problem was insightful. With the single constraint being \(x_1+\dots+x_n \leq \beta\), a constraint graph could have \(n+1\) vertices \({0,1,\dots,n}\), with the \(n\) edges \((0,j)\) for each \(j\) corresponding to \(x_j\) (node \(0\) is a "dummy" node here). The treewidth of this "star" graph is \(1\). The other key trick employed is the use of binary variables to approximate a continuous variable \(0 \leq x \leq 1\) (attributed originally to Glover). For a given error term \(0 < \gamma < 1\), we can approximate \(x \approx \sum_{j=1}^L \left(1/2^h\right) y_h\), where \(y_h\) are binary variables. With \(L = \lceil \log_2 (1/\gamma) \rceil \), we can get \(x \leq \sum_{j=1}^L \left(1/2^h\right) y_h \leq x + \gamma\). This step helps to get pure binary problems in place of the mixed integer problems. The treewidth gets blown up by \(L\), but things still work out nicely. This paper seems to have lots of nice and deep "tricks" (I hope to study it in detail).

There were several other interesting talks, and a very interactive problem session to conclude the day. Oh, and I learned a new terminology used in the power industry from Shabbir Ahmed's talk: a prosumer is someone who both produces and consumes power. I'm wondering why that is more apt than a conducer...

There was some lively discussion over dinner about how the optimization and operations research communities have failed to sell itself as well as the CS community (as a whole, or even the CS theory community by itself). Large membership sizes of ACM vs INFORMS and similarly large NSF budgets for CS vs OR were cited as indicators. There was also an anecdote mentioned about how back in the 1980s when NSF funding for algorithms/CS theory was on the decline, a group of several top big names from that field submitted a memo/petition to the lawmakers in DC, and also convinced them in person that "algorithms/theory is as fundamental as cosmology" (needless to say, I'm paraphrasing to a huge extent here!). And yes, they managed to restore the funding flow. The optimization community tried a bit of the same trick with Karmarkar's interior point algorithm for LP. May be we optimizers should try harder - not just from the point of view of securing funding, but also from the point of view of our students getting better industry jobs!

Saturday, September 12, 2015

Math of Data Science @ ICERM

Note: This is a re-post of the blog post I wrote originally on WordPress last month. It appears that Blogger is a much better platform for the kind of posts I'd like to make. Yes, I'm still learning ...

I recently attended the topical workshop on Mathematics in Data Sciences at ICERM. The attendance was good mix of students, postdocs, and researchers/faculty from academic institutions and national labs along with a sizable number of industry folks. The line of talks involved a similar mixture as well (abstracts/slides from many of the talks are available from the workshop page linked above). In particular from the industry side, there were talks from data scientists (or "engineers" with similar roles) from Ayasdi, LinkedIn (formerly), Netflix, New York Times, and Schlumberger-Doll, to name a few. Indeed, I found this diversity a direct indicator of the young age of the discipline in question, i.e., data science. And yes, the usual jokes about data science/big data were not spared, including the one about how big data is like teenage sex, how big data is very much a man's game since you usually hear men boasting "my data is bigger than yours", and how data scientists are mostly data janitors!

Coming from the academic side, what I found most interesting at this workshop were the panel (and open) discussion sessions. In the first such discussion, the group tried to come up with a (not so short!) list of topics that a program in data science should train the (undergraduate/masters) student in. After starting with the usual suspects such as calculus and linear algebra, probability and statistics, algorithms, machine learning, and databases, the group expanded the scope. Next came high dimensional geometry, information theory, data visualization/exploration, experimental design, and communication/business "skills". But many in the audience appeared to be surprised by the suggestion of a class on inverse problems, and electromagnetism (yes!). The topics and then associated skills to be taught soon filled up two large panels of white board (recall the "teenage sex" joke, any one?). To wrap up the session, it was suggested that the student be trained (at least) in Python, GitHub, Sql (or something similar), all from the point of view of industry readiness. As far as the mathematicians are concerned, it was suggested that they could start by making a wish list of all results (related to data science) one wants to see as theorems, and such a list will keep them busy for more than a life time. But to see any such effort make huge impact, one should ideally work with a domain expert. One particular subtopic of much importance in this context (no pun intended!) is that of textual data - very important for data science, and as yet not well explored by mathematicians.

The panel discussion about careers in data science was quite popular as well. A majority of the panelists were junior (read "young") data scientists from the industry, and were able to shed a lot of light on what a typical work day looks like for a data scientist. One aspect of their work that particularly appealed to me (coming from traditional academia) is how quickly and directly they are able to see the impact of their ideas and work. For instance, a data scientist in a social media company could brainstorm for 2 hours, write the code in 2 more hours, and see thousands of users enjoying the benefits before the end of the day! On the other hand, academicians often wait years, if not months, to just count citations of their papers.

If there was one take home message from the data scientists to (young and old) aspirants, it was to just play with data - of many types and from many sources, and not to worry so much about all the different classes/training (or proving theorems). Be ever-ready to dive into any data that you come across, manipulate/analyze it quickly, and get the first insights.

I'm not attempting to list any summary/thoughts on the Mathematics involved (as meant in the title of the workshop). The list of relevant Math/Stat/CS topics has been huge already, and is not getting any shorter in this era of data science. I doubt if we're going to precisely define what data science is any time soon!