[MUSIC] Welcome back to the course on approximation algorithms. In this chapter, we will study the set cover problem. This will be an opportunity for us to explore a new technique in the design of approximation algorithms using the solution of a linear program and rounding it using randomized rounding. This will be our first randomized algorithm in this course. But first, let us give the definition of the set cover problem. Input and output. What is the input? The input to the set cover problem is collection of elements. Here we have one, two, three, four, five, six elements in a universe. And then, a collection of subsets of these elements. For example, the blue subset contains both the orange and the green elements. Each subset has a cost. For example, what kind of situation does this correspond to? Perhaps you want to buy a certain set of items. You want to buy some various ingredients for cooking, let's say. And then, you know that if you go to this store, you will get this particular set of elements. If you go to that store, you'll get another set of foods. That store, another set of foods. With certain prices in each set. And then you have to decide which stores to visit, which bunches of items to buy. So that in the end you have everything you need. So, you have these elements, subsets of costs and the output is a collection of subsets. You choose certain of the subsets of the input, not all of them but a few, so as to cover all the elements. Every element in the universe must be covered, must be in at least one of the subsets that you choose. And your goal is to minimize the cost of the sets that you buy. For example here, you bought three sets perhaps, maybe, the three sets that are slightly shaded. And these three sets have cost 4, 2, and 2, for a total cost of 8. And you can verify that every one of the elements is covered by at least one of the sets. Now, does this ring a bell? Isn't this kind of familiar? Yes, of course. Even if you don't understand anything to what I'm saying, you recognize one word, cover. We've seen this word before. Vertex cover, vertex cover, set cover. Indeed, these two problems are closely related, and what is the relation? The vertex cover is a special case of the set cover problem. The set cover problem is a generalization of the vertex cover problem. Let us see why. This is an instance of the vertex cover problem. Graph vertices, edges, you have to pick vertices to cover every edge. It was the one that was in the picture with Charlie Brown in the first chapter. We want to cover all edges with the fewest vertices. Cover edges, so, the edges are the elements. Using vertices, so that vertices are the sets. What is the set associated to a vertex? It is the set of all the edges adjacent to that vertex. All the edges that touch that vertex. What is the cost of a set? If you just want the fewest vertices, then each set has cost one and you want to minimize the total cost, ie, minimize the total number of sets that you choose. So in this way, you could reformulate the vertex cover problem as a a set cover problem. The set cover problem is more general. So it is at least as difficult as the vertex cover problem. And so, we will now study an algorithm to solve approximately the set cover problem. How did we solve vertex cover? We wrote a linear programming relaxation and solved it by rounding. So, they suggest that here we should use a similar approach, integer program. Our integer program will be a natural extension of the integer program for vertex cover. So, let me now give you an integer programming formulation of the set cover problem. Integer program, we need variables. In the case of vertex cover, we had one variable for each vertex, which can be interpreted to mean whether or not we take this vertex. In the set cover problem, we now have one variable for each possible subset s given in the input. The interpretation is, x sub s will be 0 or 1. 1 if s is put in the cover, and 0 if it isn't constraint. As in the vertex cover problem, there's one constraint for each element. In the vertex cover problem there was one constraint for each edge. Seeing that for each edge you had to choose one or the other or both of its endpoints and put them in the cover. Here, we have one constraint for each element, saying the element must be covered. That is, if you look at all the x sub s variables, for all the sets containing s, at least one of them must be in your cover. At least one of them must be equal to 1. If you sum the x sub s over all those sets, the sum must be at least 1. Every vector x sub s that satisfies these constraints is a set cover. For every set cover, there is a corresponding vector where each x sub s is 0 or 1. That satisfies all the constraints. So with those variables and constraints, we have captured every possible feasible solution. And what is the objective? The objective is to minimize the cost. Every set has a cost. Let's call it c sub s. We want to minimize the sum of s of c sub s, x sub s. This, when x sub s is 0 or 1, is exactly the sum of the costs of all the sets, such that x sub s is 1. It is the sum of all the costs, of all the sets that are in the cover. So with this, we have an integer program for the vertex cover problem. Now let us continue imitating our solution for vertex cover. Linear programming relaxation. It is the same as the integer program except that instead of positing that x sub s has to be 0 or 1, we write a constraint that x sub s has to be a real number in the closed interval 0, 1. That is linear programming relaxation for set cover. Next step, if we try to extend vertex cover, rounding up. How did we round in vertex cover? We said for every variable x sub u associated to vertex u, we round x sub u to 1. If and only if, in the solution of the linear program, the real variable x sub u is at least 1/2. What happens if we do this here? We can do the same. We can say, write the linear programming for set cover. Solve it, and then each x sub s, if it's at least 1/2 we round it to 1. But that fails. It fails in the sense that it fails to give a solution that corresponds to a set cover. It may fail. Let me show this to you by an example. We take the same input that I discussed earlier. This input. I claimed that this vector of a variable x sub s is a solution to the linear program. It satisfies all the constraints. Why? Look at the yellow element. It's covered 1/3 by this set, 1/3 by that set, 1/3 by that set. Total, 100%. Green, 2/3, 1/3, 1/2. Total, more than 100%. It's covered and well covered. Yellow, 1/3, 1/3, 1/3. It's covered to 100%. Orange, 1/3 and 2/3. We're good. Blue, 1/3, 1/3, and 1/3. Again, we're good. Purple, and so on. For every element the sum of the x sub s values is at least one. And so, we a solution to the linear program here. Now what happens when we round it to 1? Look at the values of x sub s. 2/3 is rounded up to 1. 1/2 is rounded up to 1. All the other variables, they're all at 1/3, they all get rounded down to 0. What's going to happen now? Look at the yellow element. It belonged to three sets. The values of the three sets were 1/3, 1/3, 1/3. This gets rounded down to zero, zero, zero. None of these three sets belongs to the output cover. The yellow element will not be covered. Using this rounding here, we will not get a feasibly set cover if we round this vector x sub s of x sub s values. So what can we do? Let's lower the threshold, to what? To 1/3, obviously here. If we lower the threshold to 1/3, then all of the variables here, they do get rounded to one. And so the result is a feasible set cover. However, it is easy to imagine another input, where there's a feasible solution where every x sub s has value one-quarter. For example, let's say that we have a square, four elements. Or maybe even a pentagon, five elements. All possible subsets of size two. And each set, let's say that in x sub s it has value one-quarter. Five elements, all possible subsets of size two. Each x sub s has value one-quarter. Then each element belongs to four subsets. Each of them has value one-quarter. One-quarter plus one-quarter plus one-quarter plus one-quarter, that's one. Every element if covered, but every variable has value one-quarter. So if you threshold is one-third, then, everything will be rounded to zero and this will not work again. If you try to lower the threshold, you have the same problem again and again and again. For every choice of a threshold, there exists an instance of set cover and a vector x sub s. [INAUDIBLE] rounding using that threshold will fail. Will give you something which is not a set cover, which does not satisfy all the constraints, which does not cover all the elements. So, we need to devise a better rounding now. We need to devise a rounding that will not have a fixed threshold, that will somehow adapt to our output of the linear program. How do we do that? We use randomized rounding. We have seen the definition of set cover. We have seen a linear programming relaxation for set cover. We have seen how the rounding used for the cover does not extend to solve this problem. Now we are going to see a randomized rounding solution for a set cover.