Monday, June 16, 2014

Implemented Treap for the first time

As a way to solve Problem E from ZeptoLab Code Rush 2014, I decided to implement Treap for the first time. Treap is a randomized binary search tree that the maximum height of the tree is about O(log n) (similar to AVL tree). It's easier to implement than a normal AVL tree. Why do I have to implement it? I need to perform these operations in O(log n): Add, Remove, Find the sum of the k smallest elements. My implementation was added here.

Note that this problem E can also be solved by other data structures: Binary Indexed Tree, Cartesian tree or Priority Queue, because this problem only requires calling this operation, "Find the sum of the k smallest elements", multiple times with the non-ascending value of k.

Thursday, June 5, 2014

Topcoder | SRM 623

I will skip Div1-300, because it can be solvable after carefully thinking/coding.


I would like to introduce a technique used to solve this problem. The technique is to apply a transformation (rotate the graph 45-degree clockwise) to the original graph. With the original graph, if we are at (0, 0) we can move to get any points that drop from the area above the blue lines. After applying the transformation, the claim stays true. So what is the maximum number of points we can get? Of course, we can only get the points that are in the 1stquadrant.

The transformation matrix is [[1, 1], [-1, 1]]. What can we do with this new graph? First, we can notice that if we are at (x, y), the next fruit position that we can get to must be (u, v), where u >= x, v >= y. This means an optimal sequence of pieces of fruits we get will be ascending by x-coordinates. We can think of this problem as a different problem that all the points stay still, but we are the one who moves right or up to collect the points. So let's sort all the points by their x-coordinate first. Then we name a multiset to store y-cooridnates and loop through all the points in the ascending order and do the following:

  • If the point p is not in the 1stquadrant, we ignore (continue).
  • We put p.y (y-coordinate) into a multiset, then we remove the first element (if exists in the set) which is greater than p.y.
  • After finish looping, the size of this set will be the maximum number of pieces of fruit we can get.
Why does this give us an optimal solution? Consider this example, suppose now our optimal sequence is (1, 1), (2, 3) (our multiset will be [1, 3]) and the next point is (4, 5). From (2, 3), we can get the fruit at (4, 5) next. So we can put this new point into our map. The other case is that the next point is (4, 2) (its y-coordinate is less than the current position's). If we are at (2, 3), we cannot get the fruit at (4, 2). We need to pick either (4, 2) or (2, 3), but we can't really tell which point we should go right now. What we know is that the number of piece of fruit will not increase. We choose to add 2 into our multiset and remove 3 because (4, 2) will allow more spaces we can go next. In other words, we pick (4, 2) as a 'shadow' of (2, 3). If sometimes later, we know that (2, 3) is not a good choice, this will be fine. At (4, 2), we can go to all the points we can go from (2, 3), so there is no harm to pick (4, 2) instead. But if (2, 3) is a good choice, we would already remove 3 (the y-coordinate of (2, 3)) from our set which is what we expect. It can be more clear if you draw some pictures of both cases.

Here are the problems: SRM 623 | Problems

Tuesday, June 3, 2014

Heavy Light Decomposition technique

I have heard about this technique so many times recently, but I haven't gotten a chance to use it during a programming contest. I think this technique is very interesting and pretty cool, so I decide to implement it to solve a problem, QTREE. The problem is that you're given a tree of N nodes and a number of operations. Each query can be one of the following:

  1. CHANGE a b: update the cost of the ath edge to b
  2. QUERY a b: print the maximum edge cost on the path from a to b
So you're asked to perform these operations. This problem would be very easy to solve if the graph is just a chain (not a tree). In that case, you can easily solve it by using a segment tree that can perform each operation in O(log N). I claim that this problem can also be solved by segment trees. This is how Heavy Light Decomposition (HLD) technique comes into play. Before I talk about HLD, I will explain how HLD can solve this problem. Even though the given graph is a tree, we still decompose it into several chains. If we can do this, we can perform each operation on separate chains and combine the results together. We already know that performing an operation on a chain takes O(log N) by using a segment tree, so we need to decompose the tree into some number of chains. And HLD can do this and guarantee that the number of chains is O(log N). So it will take us O(log N)2 to perform each operation which is fast enough to solve this problem. A tree after applied HLD will look like this:

The colors indicate different chains and the numbers in the nodes indicate the chain heads. I won't talk about the implementation detail (I will post my code I wrote to solve this QTREE problem on the bottom. The idea is that we do one Depth-First search to gather information (subtree's size, each node's parent), build HLD, then build a segment tree over all the chains. So if we want to answer QUERY u v, we can just first find the lowest common ancestor between u and v which is node 1, then we find the maximum edge cost on the path from u to node 1 and the maximum edge cost on the path from v to node 1. To find the maximum edge cost on each path, we can just crawl up the chains. For example, we can crawl up from node u to node 1 by visiting the pink chain, red chain, and green chain. Upon visiting each chain, we can ask our segment tree what the maximum edge cost between me and the head of this chain. We take the maximum among those and that will give us the answer. This is a basic application of HLD. You can imagine that we can do any kinds of operations we can do on a chain. We can just make a different functionality of a segment tree to handle those operation. HLD is just a technique to decompose a tree into chains. HLD should be able to tell us how to crawl up the tree chain-by-chain as well.

There is a similar technique to HLD. For example, we can divide an array into O(√N) parts. Each part contains O(√N) elements. When we're asked to perform some operation on the array, we can just do it separately on each part (which should be easier somehow depending on a problem). We might be able to reduce time to perform each operation from O(N) to O(√N).

Here is the link to my code for QTREE

Monday, June 2, 2014

Codeforces Round #250 | Div. 1

Problem A

I will just give a hint for A. The idea is that disconnecting the graph requires cutting all the edges. Cutting an edge e(v1, v2) can cost either v1.value or v2.value. Of course, we want the cost to be the lower one, and we can always force it by some cutting sequence.

Problem B

This problem is very interesting for me. It took me a large amount of time in order to solve it. So suppose that x is the minimum number of animals among the routes areas from p to q. If we set the cost of each edge to be the minimum value between its ends, x will be equal to the minimum edges' cost along the path. f(p, q) is the maximum value of x's among all simple routes between p and q. It looks really hard if we consider all the routes. We have to calculate the sum value of f(p, q) for all pairs p, q (p ≠ q). But actually this sum is equal to c(e1) + c(e2) + ... + c(em) where c(ei) is ei.cost ✕ the number of all pairs p, q whose f(p, q) is ei.cost. To find c(ei), we need to start with the largest edge. If e(u, v) is the largest edge, f(u, v) is surely equal to e.cost. Then we put u, v into the same set. Then move to the second largest edge and so on. Notice that for each edge e(u, v), f(p, q) = e.cost where p ∈ the set of u and q ∈ the set of v, so c(e) = e.cost ✕ |u| ✕ |v|. Then we union u and v. Doing this to all the edges in descending order of their weights will give you the sum value of f(p, q) for all pairs p, q (p ≠ q). And the best data structure that can do this very efficiently is Disjoin Set.

Problem C

One can easily realize that this problem is a dp problem. The question is: what is the best way to do dp? Well let dp[i][j] = the number of ways to split the polygon p[i], p[i + 1], ... ,p[j] into non-degenerate triangles. Note that i can be greater than j and that will represent a polygon p[i], p[i + 1], ..., p[n], p[1], p[2], ..., p[j]. How can we calculate dp[i][j]? We can try splitting the polygon(i, j) by a non-degenerate triangle(i, k, j) where k is between i and j. We need to check that triangle(i, k, j) doesn't cross the entire polygon. So dp[i][j] will be the sum of dp[i][k] ✕ dp[k][j] for all valid triangles(i, k, j). By doing this, we don't have to encounter double-counting problem, because we split the polygon by different triangles(i, k, j).

Problem ​D

This problem is basically a classic problem that can be solved by Segment tree if we don't have the second type of operation. Let's think about the second type of operation (mod). Does it really increase the difficulty of this problem? If we handle the second operation by simply mod-ing down the tree, we can stop when the maximum value of all the values in a subtree is less than the mod value. If x < mod, x % mod = x. Another fact is that mod-ing significantly decrease a value. Even if we mod everything down the tree, their values will become less than some mod value pretty fast. By these two facts, we can just solve this problem straightforwardly by using the normal segment tree. Each tree node keeps two values: sum (the sum of this subtree) and max (the maximum over all the leaves of this subtree).

Problems: Codeforces Round #250 | Div. 1 | Problems

Lesson learnt:

  • Checking that a line connecting between two corners doesn't cross the entire polygon and is inside the polygon can be easily done by checking if the sum of the areas' parts separated by this line is still equal to the area of the entire polygon
  • Getting a polygon(start, end) can be handled by the following code (this is a nice way to deal with a circular array):
    vector<Point> v;
    for (int i = start; i != end; i = (i + 1) % N) {
  • When you find that you're stuck in a thought process or you keep repeatedly thinking about the same idea, you should stop thinking, step back. You might want to re-read the problem statement again, think slowly about the big idea to solve the problem, and consider the simplest case. By doing this, you might end up with a new working solution. I noticed that sometimes I can fall into my own thought process and can be misled by something that won't ever direct me to the intended approach to solve a problem. Also, it's worth taking a short 10-second break.

Sunday, June 1, 2014

GCJ 2014 Round 2 Follow-up

Yesterday I left the post with 2 GCJ14 Round 2 unsolved problems. Here are the solutions to those problems

Problem C: Don't Break The Nile: as I said, we can use Maximum flow algorithm to solve this problem, but actually we don't have to run the full straightforward MaxFlow algorithm. We can apply the idea that MaxFlow = Min-cut to solve this problem. A way that we can find the min-cut to the graph is to draw the min-cut curve going from the west to the east side. Consider the following picture (from GCJ2014 sample test):

The purple line is a minimum cut on this graph (the network flow graph is not shown). Basically, we try to make a new graph such that all builds are represented as vertices, and there are 2 more vertices for the west and east sides. We added an edge between each pair of vertices (For simplicity, some edges are not shown in the picture). The weight of each edge is the distance between two buildings. In analogy, the cut's cost between two buildings is the amount of water that can go through the gaps between the buildings. So let's think a bit of how we can find the minimum cut? --------- Look like a shortest path problem, right? and yes, the cost of the minimum cut is just the shortest path from the west to the east side. That's it. The size of the entire grid doesn't matter here. I really like this problem!

About implementation, finding the distance between two buildings are somewhat tricky though. My method is like there are four points on each rectangle. Let's take the minimum distance between the 16 pairs of points between the two buildings (The distance between two points here is max(abs(p1.x - p2.x), abs(p1.y - p2.y)) - 1). This is not all. We have to consider 4 more cases where two buildings are located side-by-side :)

Now move to the hardest problem: Problem D. Trie Sharding. There are two parts of this problem: 1) Find the maximum total number of nodes of all possible group arrangements. 2) How many group arrangements that results in that maximum number of nodes. To answer the first question, let's build a trie of all the strings. Consider the following picture for the first sample input ("AAA", "AAB", "AB", "B"):

The original trie can be composed of these two tries ({"AAA", "B"}, {"AAB", "AB"}). Considering the original trie, the second number on each node indicates how many times this node can appear on different tries (Of course, the number can't be greater than N). The way that we can calculate all these numbers is to starting from leaves to the root. We can be sure that each node that represents the end of a string can appear only once on a trie, so we can set the second number to 1. For the other nodes, their second number are the sum of the second numbers of all of their children. If the sum is greater than N, set it to N. Finally, the sum of all the second numbers are the total number of the nodes. Answering the second question is harder. In different interpretation, the second number is actually the number of trie subtrees rooted at each node. We claim that that the number of group arrangements is the multiplication of the number of ways to make each subtree. Consider the root subtree as an example. There are 2 subtrees from the first child and 1 subtree from the second child. We have to count the number of ways to combine these 3 subtrees into 2 tries. So the subproblem is that given a list of numbers x(#subtrees on each child), and a number kk = min(N, total #subtree), count how many ways we can combine all the subtrees into kk tries (trees). The order doesn't matter and two subtrees on the same child can't be in the same resulting subtree.

For example, x = {2, 1}, kk = 2, there are 2 ways: ({X, Y}, {X}) and ({X}, {X, Y}), X is a subtree from the first child and Y is a subtree from the second child. Notice that two x's need to be in the different trees.

This can be solved by Dynamic Programming:

long long count(vector<int>&x, int kk) {
    // dp[i] = the number of ways to combine x into i tries
    // C[i][j] = (i choose j)
    int sz = x.size();
    for (int i = 1; i <= kk; i++) {
        dp[i] = 1;
        for (int j = 0; j < sz; j++)
            dp[i] = (dp[i] * C[i][x[j]]) % MOD; // there are i tries, choose x[j] tries for x[j] subtrees 
        for (int j = 1; j < i; j++)
            // we subtract dp[i] by the number of ways that we don't use all i tries.
            dp[i] = (((dp[i] - dp[j] * C[i][j]) % MOD) + MOD) % MOD;
    return dp[kk];
The end!!!