Translating to Clojure: a learning task (Part 1)

The Experiment

My colleague and (ex) fellow Tucsonan, Rick Hightower, recently set himself a learning task: write a 62-based encoder/decoder in Java, translate it into Scala, add functional programming (FP) features, and then translate the result into Clojure, a functional programming language in the Lisp family of languages. Having programmed professionally in both Java and Scala, and being a big fan of Clojure, I was intrigued by Rick's article describing his programming journey.

Because Java and Scala are "kissing cousins", the translation to Scala was pretty direct and the Scala code is probably fairly easy for a Java programmer to understand, even if they do not "speak" Scala. Clojure, however, is a very different language and, as Rick admits in the article, a direct translation "is not the best way to do this per se". In my experience, translating imperative or object-oriented code to Clojure works best if one "steps back" and considers the overall logic and the code sections at a slightly more abstract level. Understanding the goals of each section often allows us to simplify, or even enhance, the code during the translation process.

I haven't written any Clojure code in a couple of years, so I was enthused to try Rick's learning task as a way to get back into the language. I've posted my translation of the Scala code to Clojure in a gist. Clojure is a powerful language and code in any functional programming language can be pretty dense. Just looking at this code doesn't provide any insight into the steps (and mis-steps) that went into the translation process. In this article I aim to provide some of that detail and, hopefully, illuminate some of the concepts behind Clojure that make it fun to use.

By the way, I'm not saying that my Clojure code is the best Clojure code in the world. A more experienced Clojure programmer might do better on one or more metrics of speed, efficiency, readability, etc. but, hopefully, my code can serve to illustrate some of the principles of functional programming.

A Word on the Programming Process

Programming in Clojure is a highly interactive process. Most Clojure programmers I know build their programs a fragment at a time, testing each little piece in a simple console interface called a REPL (for Read-Eval-Print-Loop). The individual working fragments are then composed into larger and larger pieces. (The use of composition is an important aspect of most programming languages but is fundamental to functional programming languages). Most major editors support Clojure and most of them interact facilely with a REPL on your behalf; allowing you to edit, test, and debug from within the editor.

Interactive Development

Using a REPL speeds up development and catches mistakes early. With a direct, low-level translation of Scala to Clojure, Rick ended up with this definition of the DIGITS array:

(def DIGITS (clojure.string/join "" [
              0  1   2   3   4   5  6   7   8   9
              'A, 'B 'C 'D 'E 'F 'G 'H 'I 'J 'K 'L
              'M 'N 'O 'P 'Q 'R 'S 'T 'U 'V 'W 'X 'Y 'Z
              'a 'b 'c 'd 'e 'f 'g 'h 'i 'j 'k 'l
              'm 'n 'o 'p 'q 'r 's 't 'u 'v 'w 'x 'y 'z
              ])
  )

Evaluating this expression at the REPL reveals that it yields a string. Thus, we can simplify the definition of DIGITS by just defining it as a Clojure string literal (user=> is the REPL prompt):

Clojure 1.11.1
user=> (def DIGITS (clojure.string/join "" [
              0  1   2   3   4   5  6   7   8   9
              'A, 'B 'C 'D 'E 'F 'G 'H 'I 'J 'K 'L
              'M 'N 'O 'P 'Q 'R 'S 'T 'U 'V 'W 'X 'Y 'Z
              'a 'b 'c 'd 'e 'f 'g 'h 'i 'j 'k 'l
              'm 'n 'o 'p 'q 'r 's 't 'u 'v 'w 'x 'y 'z
              ])
  )

user=> DIGITS
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

user=> (def DIGITS "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

Clojure is Lisp on Java

Clojure is "built on" Java. Its datatypes are based on abstractions, which are specified by Java interfaces and implemented by Java classes. Several of Clojure's basic datatypes map directly to Java datatypes. For examples: natural numbers default to Java Longs, Clojure Characters are Java Characters, and Clojure strings are Java Strings. We can see an illustration of this in the REPL using the type function:

user=> (type DIGITS)
java.lang.String

Clojure's deep connection with Java is possible because it, too, is a Java Virtual Machine hosted language, like Java and Scala. It utilizes the JVM type system, garbage collector, threads, and other JVM components and it has great interoperability with Java classes. Experienced Clojure programmers try to avoid "re-inventing the wheel", by using existing Java library capabilities where possible.

Clojure code is organized into namespaces, which are a context for names and a container for variables. When a new namespace is created it will, by default, contain mappings for the classnames in java.lang. For our translation exercise, this means that a routine like doPow in the Java or Scala code can be replaced, in Clojure, by an inter-op call to java.lang.Math.pow and a subsequent cast to a long integer, as illustrated in the REPL:

user=> (defn doPow [exponent, base]
  (if (== exponent  0)  1
    (* (doPow (- exponent 1)  base) base)
  )
)
user=> (doPow 16 2)
65536
user=> (long (Math/pow 2 16))
65536

However, since the capability to raise a number to a power is used several times in the Scala code, it makes sense to retain this as separate function. Therefore, I initially translated this to Clojure (and tested it in the REPL) like so:

user=> (defn powerOf [base exponent]
  (long (Math/pow base exponent)) )
user=> (powerOf 2 16)
65536
user=> (powerOf 2 0)
1

The power of Clojure goes to my head

Once I created powerOf, I realized that Clojure would easily allow me to generalize it to a function which created a whole family of functions, each raising a different base to an exponent. I say "easy" because one of the hallmarks of functional programming is the use of High-order functions (HOFs): functions which take other functions as arguments and/or return functions. In Clojure, the HOF partial takes a function and some of its required arguments and returns a new function with the arguments "partially filled in". We can use partial to redefine powerOf as a function generating function:

user=> (defn powerOf [base]
  (partial (fn [base exponent] (long (Math/pow base exponent)))
           base) )
user=> (def pow2 (powerOf 2))

Notice that powerOf has been redefined as a function taking only one argument: the base. Our previous powerOf function, with base and exponent arguments, is now an anonymous function (denoted by the fn) and it is the first argument to the HOF partial. The second argument to partial is base, the item to be filled into the two-argument anonymous function.

Whew...if this is the first time you've seen HOFs in action, this can be a little mind-boggling, but the end result is that our new powerOf function takes a single argument (base) and uses it to "partially fill in" the arguments of the inner, two-argument, anonymous function, and then returns that function. The returned function has no name so we have to bind it to a name (with def) in order to be able to use it more than once.

In the following example, we use powerOf to generate a "powers of 2" function, which we name pow2. Note that pow2 is a function with a "built-in" base of 2. We can call it as many times as we want, each time giving it the single exponent argument that is the power to raise 2 to.

user=> (def pow2 (powerOf 2))
user=> (pow2 16)
65536
user=> (pow2 5)
32

But wait, there's more! The same powerOf function can be used to generate multiple functions, each with a different "built-in" base. In this example, we generate another one: a "base 16 powerOf" function:

user=> (def raise16To (powerOf 16))
user=> (raise16To 2)
256
user=> (raise16To 4)
65536

Java is loopy

Another thing to notice about the Java and Scala source code is that there is a lot of iteration, which is not uncommon for either language. Functional programmers, on the other hand, tend to think of data more abstractly as sequences; often of unknown, or even infinite, length.

Sequences are a fundamental abstraction in Clojure and many Clojure programs are written to consume, transform, and generate sequences of data. In my experience, translating iterative object-oriented code to good Clojure code often entails that the programmer examines the intent of the loop and tries to replace it with code which operates on a sequence of elements, rather than processing the data one element at a time. Luckily, operations on sequences are Clojure's forte and using these functions can eliminate a lot of indices, counters, pointers, and magic numbers which often occur in Java and Scala code.

For example, in the convertToLong method of the Java code there is a variable index, looping over the characters of the digit string and a separate position counter, looping over the indices of the digit string. Inside the loop, computeValue is called which calls findIndexOfDigitInTable to loop over the DIGITS string (a fixed magic number of times). That's a lotta loops...with other calculations and operations going on inside each of them, repeated for each character in the input string. To me, that mode of processing obscures the algorithm and is difficult to debug.

The Algorithm using Sequences

The algorithm for convertToLong is relatively easy but it took me a long time to "play computer" and walk through all the interactions in the Java (and Scala) version. Basically, you have a string of digit characters in some base. Each digit character represents a factor you multiply by the power of the base at that position. The factor is computed by indexing into the string of all possible digit characters. Then you sum the multiplications to get the value of the number in base-10. For example the base-62 number '384' represents (3 * 62^2) + (8 * 62^1) + (4 * 62^0).

Again, in my opinion, the algorithm is much easier to see in my Clojure version of convertToLong, as it is mirrored in the structure of the function:

1. powersAt = (...compute a sequence of the powers of the base for all digit positions)
2. factors = (...compute a sequence of the factors for all digit positions)
3. return the sum of (the multiplication of the powers sequence times the factors sequence)

But HOW is it doing it? (Step 1)

To compute the sequence of "powers of the base", powersAt is using one of the core FP operations: map. map takes a function (thus, map is a HOF) and applies the function to one or more argument sequences, returning a sequence of the results. The function that we want map to repeat is a function which raises a specific base to a power (e.g., a base of 62). We saw above that we can generate such a function using powerOf by saying: (powerOf 62).

But how many times do we want to repeat our function? Well, we only need to repeat it once for each digit in the number to be converted. The range and count functions work together to generate a sequence from 0 to (the size of the input string - 1). So, for example, if the input string is "H2O" then map produces our desired sequence of powers (but backwards, so we want to reverse them for the upcoming multiplication):

user=> (map (powerOf 62) (range (count "H2O")))
(1 62 3844)
user=> (reverse (map (powerOf 62) (range (count "H2O"))))
(3844 62 1)
user=> (def powersAt (reverse (map (powerOf 62) (range (count "H2O")))))

But HOW is it doing it? (Step 2)

In the second step of the algorithm, we need to compute the multiplicative factors for each of the powers. Each factor can be quickly determined by finding the zero-based index of each digit character within the string of all the possible digit characters. We could write a function to loop through the string of all possible digit characters but, since the set of all possible digit characters doesn't change, it will be more efficient to use a "direct access" data structure.

In order to relate the individual digit characters to their indices we can use a Clojure Map data structure (not to be confused with the map function). We can use the zipmap function to create such a map from a sequence of keys and a sequence of values. In our case, the keys are the individual digit characters and the values are their integer indices. We'll save the map in a variable digitIndex:

user=> (zipmap DIGITS (range (count DIGITS)))
{\A 10, \a 36, \B 11, \b 37, \C 12, \c 38, \D 13, \d 39, \E 14, \e 40, \F 15, \f 41, \G 16, \g 42, \H 17, \h 43, \I 18, \i 44, \J 19, \j 45, \K 20, \k 46, \L 21, \l 47, \M 22, \m 48, \N 23, \n 49, \O 24, \o 50, \0 0, \P 25, \p 51, \1 1, \Q 26, \q 52, \2 2, \R 27, \r 53, \3 3, \S 28, \s 54, \4 4, \T 29, \t 55, \5 5, \U 30, \u 56, \6 6, \V 31, \v 57, \7 7, \W 32, \w 58, \8 8, \X 33, \x 59, \9 9, \Y 34, \y 60, \Z 35, \z 61}
user=> (def digitIndex (zipmap DIGITS (range (count DIGITS))))

It is important to note that a string in Clojure is considered to be a sequence of characters, so that the DIGITS string satisfies the requirement that zipmap's arguments be sequences.

Now any individual digit character can be "looked up" in the map to find its index:

user=> (digitIndex \H)
17
user=> (digitIndex \z)
61
user=> (digitIndex \0)
0

This incantation works because Clojure Maps implement a lookup function for their keywords, such that (aMap key) => value (or nil, if there is no value for the given key).

As we want the indices of all the digit characters in the input number, we can, once again, turn to the map function, computing the factors sequence of Step 2:

user=> (map digitIndex "H2O")
(17 2 24)
user=> (def factors (map digitIndex "H2O"))

But HOW is it doing it? (Step 3)

Having computed the sequence of "powers of the base 62" (Step 1) and the sequence of multiplicative factors (Step 2), all that's left is to multiply and sum them to calculate the base-10 number:

user=> (reduce + (map * powersAt factors))
65496

We use map once more, this time with multiplication (*) as the function and our sequences as the arguments, producing a new sequence of pairwise products. The reduce function then sums the products to create the final base-10 result. eh...Voilá!

But that's only half the program!

Less than half probably, as I've only discussed the code for converting from a base-62 number string to a base-10 number. In the Part 2, I'll explain the code for the inverse: converting a base-10 number to a base-62 number string. Also, for those of you who have been looking at the program code in the gist, you will have noticed that it does not quite match what we've been developing in Part 1. The posted code has been generalized to enable conversion to and from any base from 2 to 62. I'll explain those differences in Part 2 also.

For now, I'd like to reiterate the critical importance and convenience of interactive Clojure development using a REPL and composition. In this article, we were able to try out, understand, and validate each little fragment of the program and then compose those pieces to produce greater and greater functionality. This may seem odd to Java programmers who are used to "batch compiling" files and then trying to debug them but to Clojure programmers the REPL allows us to explore, understand, construct, and debug as we go.