hyPiRion

FBHC 2013 Intro Round in Clojure

posted

Everyone these days uses C++ whenever participating in programming competitions. Even I, a person who prefers to use Java in competitions, see the appeal of using C++ in competitions:

  • It is less verbose than Java.
  • It has the data structures you need.
  • For people with an understanding of how data structures work, it’s easier to create data structures you need.
  • Even if you have the “wrong” approach with a worse runtime than what the problem author expects (say you’ve implemented O(n n ) instead of O(n log n), you may still get away with it.)
  • It’s easy to read input.

However, at times it’s impossibly hard to solve a problem without immutable data structures: Sometimes you have to do recursion and memoization, and doing so with mutable structures seems to be a recipe for disaster. And whenever your code turns more than 200 lines long you tend to lose control and end up debugging some issue, which wouldn’t be there if you did it functionally.

As such, maybe there’s time for a change. Not only because C++ lacks some features, but also because people need to see things from a different perspective: Too many people these days use imperative languages without knowing about functional ones. “Forcing” people to learn how to program functionally will make them able to utilize functional tricks, even if they’re still going to write their applications in imperative languages.

So why not try out Clojure in programming competitions? For non-functional programmers, you’ll feel crippled for some time due to the lack of mutability. For people familiar with both worlds, the main issue is how one does “the mutable stuff” one tends to do with data structures in imperative languages. It really requires thought, but when it’s done often, it gives you the power to design persistent data structures—a valuable skill in our world with ever-increasing concurrent computations.

Enter Facebook Hacker Cup

The qualification round of Hacker Cup has ended, but there’s still Round 1 for those interested in working in new ways. Usually, Round 1 requires you to solve all 3 problems correctly, but without time limit. If you feel confident enough that you’ll solve every single one of them, I encourage you to challenge yourself to use Clojure or some other functional programming language.

If you’re not sure how you should use Clojure to solve these problems, read on: I’ll show how I would have solved the problems in the Qualification round if I were using Clojure. Mind you, this is not an introduction to Clojure, so don’t treat it as such.

First of all, be sure you have Clojure in your environment. Installing it on Linux-based systems should be as easy as installing the clojure or clojure1.4 package with your preferred package manager. And with that, we’re off.

Beautiful Strings

The easiest problem in the qualification round had to do with strings, and the problem text was as follows1:

When John was a little kid he didn’t have much to do. There was no internet, no Facebook, and no programs to hack on. So he did the only thing he could… he evaluated the beauty of strings in a quest to discover the most beautiful string in the world.

Given a string s, little Johnny defined the beauty of the string as the sum of the beauty of the letters in it.

The beauty of each letter is an integer between 1 and 26, inclusive, and no two letters have the same beauty. Johnny doesn’t care about whether letters are uppercase or lowercase, so that doesn’t affect the beauty of a letter. (Uppercase F is exactly as beautiful as lowercase f, for example.)

You’re a student writing a report on the youth of this famous hacker. You found the string that Johnny considered most beautiful. What is the maximum possible beauty of this string?

Input

The input file consists of a single integer m followed by m lines.

Output

Your output should consist of, for each test case, a line containing the string “Case #x: y” where x is the case number (with 1 being the first case in the input file, 2 being the second, etc.) and y is the maximum beauty for that test case.

Constraints

5 ≤ m ≤ 50
2 ≤ length of s ≤ 500

Example input:Example output:

5
ABbCcc
Good luck in the Facebook Hacker Cup this year!
Ignore punctuation, please :)
Sometimes test cases are hard to make up.
So I just go consult Professor Dalves

Case #1: 152
Case #2: 754
Case #3: 491
Case #4: 729
Case #5: 646

The problem and Its Solution

Restating the problem, we can say that we want to give every letter of the alphabet a value from 1 to 26, and that no two letters have the same value. Given a string s, we want to find a way to assign letters a value

Intuitively, we can see that we should give the highest value to the most frequent letter and the lowest value to the least frequent letter. Notice that since we’re asked to find the sum, it doesn’t really matter which letter we give what value. As such, knowing the frequencies is enough to solve this problem.

Since we can sort the frequencies, our problem becomes fairly straightforward: Just find the frequencies, sort them highest to lowest, and give the first value 26, second 25, etc.

Implementation

To start up, we must have some way of running these things. To make this blog post easier to read, I’ll avoid Leiningen and go with bare Clojure. However, if you want to utilize libraries and don’t hassle too much with the classpath, make a project and use lein run -m namespace.name.

Create a file and make it executable (chmod +x foo.clj), and put in the following:

#!/usr/bin/env clojure

(println "Hello, World!")

Then execute it as any other binary (./foo.clj) and ensure that you get Hello, World! in the console. Don’t be afraid if it seems horribly slow: It’s only the startup time of Clojure, not the code itself which is slow.

Now, our first “problem” would be to read input from stdin. With Clojure, that’s usually easy: Use (read) when you need to read in integers or doubles, and (read-line) if you need to read a line. For this round, this suffices, but if you need to read words, you have to wrap the read with a str like so: (str (read))2.

With that and the Clojure cheatsheet up, something like this shouldn’t be impossible for Clojurists to generate:

#!/usr/bin/env clojure

(defn best-solution []
  (->> (read-line)
       .toLowerCase
       (filter #(Character/isLetter %))
       frequencies
       vals
       (sort >)
       (map * (iterate dec 26))
       (reduce +)))

(defn pprint-result [pos res]
  (println (format "Case #%d: %d" (inc pos) res)))

(defn main []
  (let [T (read)]
    (read-line) ;; Finish off the first line.
    (dotimes [t T]
      (->> (best-solution)
           (pprint-result t)))))
(main)

best-solution should be rather easy to follow: Read the line, make it lowercase, filter the letters (keep only letters), count the frequencies, take the values (the numbers), sort them higher to lower, multiply each frequency by its optimal value, and sum them together. Every comma (,) represents a line in the best-solution function, and it explains better how instead of why. A common argument functional programmers use, perhaps justified this time.

Balanced Smileys

The second problem is about balancing parentheses, a suitable problem for the Lisp dialect we’re working with:

Your friend John uses a lot of emoticons when you talk to him on Messenger. In addition to being a person who likes to express himself through emoticons, he hates unbalanced parenthesis so much that it makes him go :(

Sometimes he puts emoticons within parentheses, and you find it hard to tell if a parenthesis really is a parenthesis or part of an emoticon.

A message has balanced parentheses if it consists of one of the following:

  • An empty string “”
  • One or more of the following characters: a to z, ` ` (a space) or : (a colon)
  • An open parenthesis (, followed by a message with balanced parentheses, followed by a close parenthesis ).
  • A message with balanced parentheses followed by another message with balanced parentheses.
  • A smiley face :) or a frowny face :(

Write a program that determines if there is a way to interpret his message while leaving the parentheses balanced.

Input

The first line of the input contains a number T (1 ≤ T ≤ 50), the number of test cases.
The following T lines each contain a message of length s that you got from John.

Output

For each of the test cases numbered in order from 1 to T, output Case #i: followed by a string stating whether or not it is possible that the message had balanced parentheses. If it is, the string should be YES, else it should be NO (all quotes for clarity only)

Constraints

1 ≤ length of s ≤ 100

Example input:Example output:
5
:((
i am sick today (:()
(:)
hacker cup: started :):)
)(

Case #1: NO
Case #2: YES
Case #3: YES
Case #4: YES
Case #5: NO

The Problem and Its Solution

Restating the problem, we can say that our goal is to figure out whether we can interpret the message in a way that balances the parentheses. Handling ( and ) is easy enough, but :( can be treated as :( or as : followed by a (. As such, we need to find a way to represent this.

One solution would be to have a set of possible states instead of a single state. A state represents how many parentheses we have opened, for example, 3 would mean we have opened three parentheses. Our set of states when we start are thus #{0}, as we’ve opened zero parentheses.

Whenever we hit upon a (, we increment every value in the set by one, and when we hit upon a ), we decrement them by one. Whenever we hit upon :(, we take the set and increment every value by one. But instead of keeping that specific set, we merge it with the old set, having both possible states within a single set. We do the same for :), except that we decrement every value in the set. When all elements are consumed, we check if the set contains the value zero, and report YES or NO based upon that.

As a minor notice: We remove every negative state in the set when we decrement, as it is not a legal position. (It’s impossible to open a negative amount of parentheses!)

Implementation

Reading in the values and spitting out a solution can be done in more or less the exact same fashion as the previous problem, so let’s keep that part of the code. We start off by removing irrelevant data, and group the different combinations ((, ), :( and :)) in a list through the magic of regular expressions.

#!/usr/bin/env clojure

(defmulti update-state (fn [state type] type))

(defmethod update-state "(" [state _]
  (set (map inc state)))

(defmethod update-state ")" [state _]
  (->> (map dec state)
       (remove neg?)
       set))

(defmethod update-state :default [state smiley]
  (into state (update-state state
                            (subs smiley 1))))

(defn remove-irrelevant [string]
  (re-seq #"\(|\)|:\(|:\)" string))

(defn possibly-balanced? []
  (let [relevant (remove-irrelevant (read-line))]
    (-> (reduce update-state #{0} relevant)
        (contains? 0))))

(defn pprint-result [pos res]
  (println (format "Case #%d: %s" (inc pos)
                   (if res "YES" "NO"))))

(defn main []
  (let [T (read)]
    (read-line) ;; Finish off the first line.
    (dotimes [t T]
      (->> (possibly-balanced?)
           (pprint-result t)))))
(main)

We then perform a reduction, using update-state as the reducing function, #{0} as the initial state, and using the different combinations we’ve filtered out as the collection to reduce over. We then check that the final state contains zero or not.

update-state is a multimethod, which for OO-programmers can roughly be translated to polymorphism without encapsulation and/or inheritance involved. Whenever we hit on either ( and ), we update the state as specified in the written solution above. Those should be rather straightforward to find. For :( and :), we dispatch on the :default method and take the current state and merge it with the state we get by using either ( or ).

Find the Min

The last and hardest problem is related to finding the maximal

After sending smileys, John decided to play with arrays. Did you know that hackers enjoy playing with arrays? John has a zero-based index array, m, which contains n non-negative integers. However, only the first k values of the array are known to him, and he wants to figure out the rest.

John knows the following: for each index i, where ki < n, m[i] is the minimum non-negative integer which is *not* contained in the previous *k* values of m.

For example, if k = 3, n = 4 and the known values of m are [2, 3, 0], he can figure out that m[3] = 1.

John is very busy making the world more open and connected, as such, he doesn’t have time to figure out the rest of the array. It is your task to help him. Given the first k values of m, calculate the nth value of this array. (i.e. m[n - 1]).

Because the values of n and k can be very large, we use a pseudo-random number generator to calculate the first k values of m. Given positive integers a, b, c and r, the known values of m can be calculated as follows:

m[0] = a
m[i] = (b * m[i - 1] + c) % r, 0 < i < k

Input

The first line contains an integer T (T ≤ 20), the number of test cases.
This is followed by T test cases, consisting of 2 lines each.
The first line of each test case contains 2 space separated integers,
n, k (1 ≤ k ≤ 105, k < n ≤ 109).
The second line of each test case contains 4 space separated integers
a, b, c, r (0 ≤ a, b, c ≤ 109, 1 ≤ r ≤ 109).

Output

For each test case, output a single line containing the case number and the nth element of m.

Example input:Example output:
5
97 39
34 37 656 97
186 75
68 16 539 186
137 49
48 17 461 137
98 59
6 30 524 98
46 18
7 11 9 46


Case #1: 8

Case #2: 38

Case #3: 41

Case #4: 40

Case #5: 12

Problems

The first thing to notice is that b * m[i] + c may overflow if using the wrong numeric types. With longs—as is standard for Clojure from 1.4—this result is contained within the restrictions given. As such, this is not a worry in Clojure.

A more troublesome worry—or should I say the troublesome worry—is the size of n. It seems merely impossible to run this in time without doing any work at all. As such, it’s likely that we’re going to do some modulo on the result. We’ll have to do some analysis later on to find out.

To restate the problem, we have an array m where the k first elements in this list have been given, and the rest (n - k) of the values are defined as the lowest non-negative integer which not found in the k elements in front of this element.

The first thing one may notice is that if n is sufficiently large, then the values generated in the sequence may not be larger than k: As we can only peek back at the k elements behind us, this seems reasonable. In addition, as we can only peek at those k elements, it seems reasonable to believe that we at some point end up taking the value of the k+1th element behind us. The question is just when we do so.

To figure out, let’s have a look at an example

     +-+-+-+-+ +=+=+=+=+=+-+-+-+-+-+
m -> |0|2|0|1| |3|4|2|0|1|3|4|2|0|1|...->n
     +-+-+-+-+ +=+=+=+=+=+-+-+-+-+-+
         k=4

So we see that there will be a repeating sequence of k+1 numbers, found immediately after we’ve generated the k first elements. As such, we can skip k+1 * in-k elements and still calculate the correct element.

Implementation

Generating the random numbers seems to be fairly straightforward:

(defn lc-generator [a b c r]
  (iterate #(mod (+ (* b %) c) r) a))

This creates an infinite sequence of random numbers generated by the linear congruential generator specified in this exercise. Picking the k first elements of this lazy sequence gives us what we want.

The next part is to generate the k+1 next elements in the array. The only “hard” part about this is to find the lowest element in the sequence of values we got. Having a sorted set which we fill up with the first k values, remove the elements in the array and put back the k+1th element after we’ve picked the lowest value should suffice. The only thing we have to be careful with is that there may be multiple similar values in the k first elements, so we keep a hash map with the position of the last element with that value in order to avoid using it too early.

We then just modulo, and pick the nth element in the seq. It’s not as hard as you may think, but watch out for off-by-one errors.

#!/usr/bin/env clojure

(defn lc-generator [a b c r]
  (iterate #(mod (+ (* b %) c) r) a))

(defn last-position-table [lc-seq k]
  (->> (take k lc-seq)
       (map-indexed (fn [a b] [b a]))
       (into {})))

(defn candidate-set [lc-seq k]
  (let [k+1-set (into (sorted-set) (range (inc k)))]
    (reduce disj k+1-set
            (take k lc-seq))))

(defn k+1 [lc-seq k]
  (let [hm (last-position-table lc-seq k)
        ss (candidate-set lc-seq k)]
    (letfn [(generate [i [m_i & r] ss]
              (if (= i (inc k))
                nil
                (lazy-seq
                 (let [ss* (if (<= (hm m_i 0) i)
                             (conj ss m_i)
                             ss)
                       lowest (first ss)]
                   (cons
                    lowest
                    (generate (inc i) r
                              (disj ss* lowest)))))))]
      (generate 0 lc-seq ss))))

(defn find-last []
  (let [[n k a b c r] (doall (repeatedly 6 read))
        lc-seq (lc-generator a b c r)]
    (let [next-k+1 (k+1 lc-seq k)]
      (nth next-k+1 (mod (- n k 1) (inc k))))))

(defn pprint-result [pos res]
  (println (format "Case #%d: %d" (inc pos) res)))

(defn main []
  (let [T (read)]
    (read-line) ;; Finish off the first line.
    (dotimes [t T]
      (->> (find-last)
           (pprint-result t)))))
(main)

last-position-table returns a hash map/lookup table which tells us which index element e has within the k first elements, or nil if it’s not there. candidate-set returns a sorted set of candidates; a set with the k+1 first non-negative integers, but without those which is used in the k first elements.

Perhaps the hardest part to understand within this chunk is the generate function. It generates the k+1 elements after the k first ones lazily, using the sorted set to find the min fast and the hash map to avoid putting in integers which is still contained in the k first elements. However, it’s not very magical nor unknown, just a bit different for imperative and object-oriented programmers.

Improving Ancient Skills

And that’s how easy the Facebook Hacker Cup Qualification round is in Clojure. Mind you, I’m not suggesting that experienced C++ algorists should stop using C++ and just use functional languages in programming competitions. Programming competitions are currently designed to be solvable with some mutable state and code running sequentially, at which C++ and Java excel. There are some exceptions: Some of the easier problems are in fact way easier to solve in a functional programming language. I wouldn’t be surprised if there are some in the upcoming round in Hacker Cup, and with 24 hours to do them you have the time to try and fail some. It also gives you a relatively easy introduction to functional programming. As such, I would challenge you to try it out.

However, I can understand why people prefer to use C++ and Java for these competitions. The problems are designed around these languages. I cannot think of any hard programming competition problem I’ve worked on which would be easier to solve with persistent data structures instead of transient ones. To make matters worse, very few competitions accept threading, and in e.g. TopCoder and ICPC you’re only allowed to use C, C++, and Java. While it gives you great knowledge of data structures, algorithms and great programming skills in general, maybe the competitions should be modernized. We’re living in a world where we most of us create real living systems. They often use more than one thread, and sometimes they even run over multiple computers. Creating programs which accept input and spit out some output is not how we write our programs anymore. Maybe the competitions we create should reflect that change, and not hone our abilities in data structures unsuitable for the concurrent world.

  1. These problems are licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. They are available at Facebook Hacker Cup 2013 Qualification Round

  2. Don’t use laziness when reading input from stdin, it may appear out-of-order if you try to do things in parallel. Wrap every map, for and repeatedly with a doall: (doall (repeatedly 10 #(str (read)))) will for instance eagerly read 10 words and return them in a seq.