Friday, December 28, 2012

lein-resource 0.2.0 hits clojars

I just deployed lein-resource to clojars. What is lein-resource? A Leiningen plugin that copies files from multiple source directories to a target directory while preserving the sub-directories. Each file is also transformed using stencil. The map that is passed to stencil contains a combination of:
  • The project map
  • The system properties (with .prop added to the name)
  • Additional values (currently only :timestamp)
  • Values set in the project.clj using :resource :extra-values
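
As a rough illustration of the last item, a project.clj fragment might look like the sketch below. Only the plugin name, the 0.2.0 version, and the :resource :extra-values key come from this post; the project name and the :build-label entry are made up, and the keys that select the source and target directories are deliberately left out.

```clojure
;; Sketch only: "example-app" and :build-label are hypothetical.
(defproject example-app "0.1.0-SNAPSHOT"
  :plugins [[lein-resource "0.2.0"]]
  ;; Values under :extra-values are merged into the map handed to stencil.
  :resource {:extra-values {:build-label "nightly"}})
```

Since stencil renders mustache templates, a copied file could then refer to the value with a tag such as {{build-label}}.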

Wednesday, December 19, 2012

Hammock driven development pays off with a ClojureScript pipe function - ok not really - See core.async instead

Again, using only the think method, here is the ClojureScript version of the pipe function.
Thanks to "Are pipe dreams made of promises?" for writing it and "Clojure - From Callbacks to Sequences" for pointing it out to me. EDIT: While still a good idea, futures and promises are not ClojureScript approved.

Another EDIT:  See core.async for the right way to do this.

Clojure - From Callbacks to Sequences

I have been spending some hammock time on the problem of converting a stream of events into a sequence.  And Lo and Behold, the answer falls in my lap.  I may need to see if I can implement the pipe function in ClojureScript.  Maybe if I spend more time in the hammock, someone will have already done it.

Be sure to read Clojure - From Callbacks to Sequences for a nice explanation and examples.

Here is the function that does the magic:
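
A minimal sketch of such a pipe function, assuming a java.util.concurrent.LinkedBlockingQueue as the buffer between the callback side and the sequence side. The names put!, EOQ and NIL are illustrative, and this is not necessarily the exact code from the linked post.

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

(defn pipe
  "Returns [s put!]: a lazy seq fed by a blocking queue, and the function
   that feeds it. (put! x) enqueues x; (put!) marks the end of the stream."
  []
  (let [q    (LinkedBlockingQueue.)
        EOQ  (Object.)   ; end-of-queue sentinel
        NIL  (Object.)   ; stand-in for nil, which the queue does not accept
        step (fn step []
               (lazy-seq
                 (let [x (.take q)]          ; blocks until an item is available
                   (when-not (identical? EOQ x)
                     (cons (when-not (identical? NIL x) x)
                           (step))))))]
    [(step)
     (fn put!
       ([] (.put q EOQ))
       ([x] (.put q (if (nil? x) NIL x))))]))

;; Usage: the callback side calls put!, the consumer just reads a seq.
(let [[events put!] (pipe)]
  (put! :click)
  (put! :keypress)
  (put!)                 ; signal the end so the seq terminates
  (vec events))
;; => [:click :keypress]
```

Reading from the seq blocks until something has been put, so the producer normally runs on another thread or in a callback.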

Wednesday, February 1, 2012

Clojure: lazy seq + database = bad

In my work on topoged-hibernate I naively thought that it would be great to return a lazy-seq of the results of a query like:


However, this has several problems, not the least of which is that the underlying session is closed and the data is inaccessible.  When the with-session macro finishes, the session and transaction are closed and then the lazy-seq is returned.  However, it no longer has a connection to the database, so an exception is thrown once the sequence is read.  This is just one problem with accessing data from a database via a lazy-seq.

The same problem applies to any attempt to use a lazy-seq to access a database.  The problem is that there is no way to know when the seq is no longer in use, and therefore no way to know when to close the connection.  One could easily add a check to the seq that detects when the end of the dataset has been reached.  At that point, it could close the connection.

However, a lot of the time the entire dataset need not be read, as in (take 10 dataset).  The seq would never get to the end and would leave the connection open.  This is not really a Clojure problem, but a problem with Java.  In Java, there is no way to know when an Object is done being used.  Of course there is the finalizer, but that has proven to have so many problems that its use is discouraged.

One solution is to read all the data before returning.  This is a common way of handling data and works for small datasets (small enough to fit into memory).  It is an easy and efficient way to get the desired results and still manage the database connections.
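
The closed-session failure and the read-everything fix can be sketched without a real database. Everything here is a stand-in: the session is just an atom, and fetch-row and call-with-session are hypothetical names, not part of topoged-hibernate.

```clojure
(defn fetch-row
  "Pretends to load row n; fails once the session has been closed."
  [session n]
  (if (:open? @session)
    {:id n}
    (throw (IllegalStateException. "session is closed"))))

(defn call-with-session
  "Hypothetical function form of with-session: opens a session,
   calls (f session), and always closes the session afterwards."
  [f]
  (let [session (atom {:open? true})]
    (try
      (f session)
      (finally (swap! session assoc :open? false)))))

;; Lazy: map does no work yet, so the rows would only be read
;; after the session has already been closed.
(def lazy-rows
  (call-with-session (fn [s] (map #(fetch-row s %) (range 3)))))

;; Eager: doall realizes every row while the session is still open.
(def eager-rows
  (call-with-session (fn [s] (doall (map #(fetch-row s %) (range 3))))))
```

Reading eager-rows works as expected, while the first attempt to read lazy-rows throws the "session is closed" exception.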

Larger datasets, that cannot be stored in memory, have to be processed as the data is read, thusly:


Now the data is processed within a closure that will close the connection once the processing is completed.  Of course, one must take care to not accidentally return a lazy-seq by returning a map or filter of the data.
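
A sketch of that pattern, again with a simulated connection: the rows are handed to a function while the connection is open, and only a fully realized result leaves the closure. The names read-rows and with-rows are hypothetical.

```clojure
(defn read-rows
  "Lazy seq of n fake rows; every read fails once conn has been closed."
  [conn n]
  (map (fn [i]
         (when-not (= :open @conn)
           (throw (IllegalStateException. "connection is closed")))
         {:id i :amount (* 10 i)})
       (range n)))

(defn with-rows
  "Passes the lazy rows to f while the connection is open, then closes it.
   f must return a fully realized value, not another lazy seq."
  [n f]
  (let [conn (atom :open)]
    (try
      (f (read-rows conn n))
      (finally (reset! conn :closed)))))

;; reduce consumes the rows one at a time inside the closure,
;; so only the final number escapes.
(def total
  (with-rows 5 (fn [rows] (reduce + (map :amount rows)))))
;; amounts are 0 10 20 30 40, so total is 100

;; The accidental version: filter is lazy, so nothing is read until later.
(def leaked
  (with-rows 5 (fn [rows] (filter #(pos? (:amount %)) rows))))
```

Reading leaked fails with the closed-connection exception, which is exactly the mistake the warning above is about.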

What I wish we could do is to return a seq that could then close its own connection once it is no longer in use.  There are some huge issues with that including:
  • Java is not helpful and only figures out if an Object is no longer in use at garbage collection which is not guaranteed to happen for any given Object.
  • The onus then falls on Clojure to keep track of each resultset and "determine" when it is no longer in use.  This might entail writing a secondary garbage collector in Clojure.
  • Another problem is that the seq might never be cleaned up, like if it were "def"ed.
In conclusion, while using lazy-seqs to process datasets might seem like a good idea, you will quickly find it is not worth the trouble.

Thursday, January 26, 2012

Hexlify in Clojure

Looking at this gist, I have created functionality similar to EMACS hexlify-buffer. In EMACS, it reads a binary file and presents two views by breaking the file into 16-byte sections.  Each section is shown on the same line, as in the example below:


The first column is the offset into the file, in hex.
The next 8 columns are the hex representation of the 16 bytes represented in the line.
The last column is a printable representation of the 16 bytes, with unprintable characters represented as a period.
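
The three-part layout described above can be sketched as a small function that renders one text line per 16 bytes. The name hexl-lines and the exact spacing are assumptions of this sketch rather than a copy of what EMACS prints, and a space is kept as a space in the last column here.

```clojure
(require '[clojure.string :as str])

(defn hexl-lines
  "Returns one string per 16 bytes of s: offset, 8 hex columns, printable text."
  [s]
  (map-indexed
    (fn [i chunk]
      (let [hex  (map #(format "%02x" %) chunk)
            ;; pair the bytes up so each column holds two of them
            cols (map #(apply str %) (partition-all 2 hex))
            ;; anything outside printable ASCII becomes a period
            text (apply str (map #(if (< 31 % 127) (char %) \.) chunk))]
        ;; 8 columns of 4 characters plus 7 separators = 39 characters
        (format "%08x: %-39s  %s" (* 16 i) (str/join " " cols) text)))
    (partition-all 16 (map #(bit-and (int %) 0xff) s))))

(doseq [line (hexl-lines "The quick brown  dog jumped over the lazy fox")]
  (println line))
```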

The hexlify clojure function accepts a seq of bytes or chars and returns a sequence of vectors; each vector represents up to 16 bytes of the input.

For example:

(hexlify "The quick brown  dog jumped over the lazy fox") 
 
=> ([("54" "68" "65" "20" "71" "75" "69" "63" "6b" "20" "62" "72" "6f" "77" "6e" "20")
  ("T" "h" "e" "." "q" "u" "i" "c" "k" "." "b" "r" "o" "w" "n" ".")
  (84 104 101 32 113 117 105 99 107 32 98 114 111 119 110 32)]

 [("20" "64" "6f" "67" "20" "6a" "75" "6d" "70" "65" "64" "20" "6f" "76" "65" "72")
  ("." "d" "o" "g" "." "j" "u" "m" "p" "e" "d" "." "o" "v" "e" "r")
  (32 100 111 103 32 106 117 109 112 101 100 32 111 118 101 114)]

 [("20" "74" "68" "65" "20" "6c" "61" "7a" "79" "20" "66" "6f" "78")
  ("." "t" "h" "e" "." "l" "a" "z" "y" "." "f" "o" "x")
  (32 116 104 101 32 108 97 122 121 32 102 111 120)])

The first element of each vector is the hex values, one for each byte in the partition (16 per partition, fewer in the last one).  The second element in the vector is the printable character representation and the last element in the vector is the numeric representation.
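
A minimal sketch that produces the structure shown above, written from the description rather than copied from the gist. Two details are assumptions: each value is masked to its low 8 bits so that negative bytes print as two hex digits, and anything outside 33 to 126 (including a space, as in the example output) is shown as a period.

```clojure
(defn- printable
  "Single-character string for b, or \".\" when b is not a visible ASCII char."
  [b]
  (if (< 32 b 127) (str (char b)) "."))

(defn hexlify
  "Splits a seq of bytes or chars into partitions of 16 and returns, for each
   partition, a vector of [hex-strings printable-strings numeric-values]."
  [xs]
  (for [chunk (partition-all 16 (map #(bit-and (int %) 0xff) xs))]
    [(map #(format "%02x" %) chunk)
     (map printable chunk)
     chunk]))

;; Works on strings and on byte arrays alike.
(first (hexlify (.getBytes "The quick brown  dog" "US-ASCII")))
```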

This allows inspection of byte arrays within clojure quickly and easily.


The code is repeated below: