2022-05-24 #apache_csv #apache_math #clustering #csv #jscheme #kawa #scheme #sisc

Examples of how three implementations of Scheme work with the JVM. I look at JScheme, SISC and Kawa.

With examples:

  • hello world: simplest access to static member and method

  • data analysis: using Apache Commons CSV and Math libraries to read a CSV file, compute some statistics, and cluster the data. (Replicates /220408-read-csv/, /220409-descriptive-statistics/ and /220413-kmeans-example.)

Takeaway:

  • JScheme’s dot notation is very tidy, and keeps things close to Java notation.

  • SISC is cumbersome. Unless I’m doing something wrong, everything you use from the Java side must be "acknowledged" on the Scheme side, and all values must be converted.

  • Kawa is by far the most powerful and flexible - classes can be defined in Kawa, and instances of them passed back to Java functions.

JScheme

Created by Peter Norvig for Java 1.1 (see http://norvig.com/jscheme.html), and taken over by Ken Anderson and Tim Hickey in 1998. The last released version, 7.2, was modified around 2002.

JScheme implements R4RS, except that:

  • strings are immutable (JVM limitation)

  • continuations are limited to being escape procedures

Hello World

Using System.out means getting hold of the static object out, and then calling the println method on it:

(.println System.out$ "Hello")

Calling a static method is direct:

(display (Math.sqrt 10.0))

Note that JScheme knows about the java.lang namespace.
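
As a rough summary of the dot notation, here is a minimal sketch using only standard JDK classes. It assumes JScheme's conventions as I understand them: a trailing dot for a constructor, a leading dot for an instance method, Class.method for a static method, and a trailing $ for a field. The path is just an illustrative string; no file is opened.

```scheme
;; JScheme dot notation, sketched with JDK classes only
(import "java.io.*")

(define f (File. "data/iris.data"))             ; constructor: class name + trailing dot
(display (.getName f)) (newline)                ; instance method: leading dot
(display (Integer.parseInt "42")) (newline)     ; static method: Class.method
(display Integer.MAX_VALUE$) (newline)          ; static field: trailing $
```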

Data Analysis

The full file "csv-jscheme.scm" is:

;; Reading CSV files from JScheme

(import "java.io.*")                                                        ; 1
(import "org.apache.commons.csv.*")
(import "org.apache.commons.math3.stat.descriptive.*")

;; converts a CSV record to a list, with the first four entries
;; converted to real numbers
;; returns empty list if invalid
(define (csvrecord->list record)                                            ; 2
  (if (= (.size record) 5)
      (let ((sepal-length (string->number (.get record 0)))
            (sepal-width (string->number (.get record 1)))
            (petal-length (string->number (.get record 2)))
            (petal-width (string->number (.get record 3))))
        (if (and (number? sepal-length)
                 (number? sepal-width)
                 (number? petal-length)
                 (number? petal-width))
            (list sepal-length sepal-width
                  petal-length petal-width
                  (.get record 4))
            ())) ; return empty list if first four entries are not numbers
      ())) ; return empty list if record is not of correct size

;; reads CSV data for Iris dataset from file, and converts to list of lists
(define (read-data filename)
  (tryCatch                                                                 ; 3
    (let* ((input-reader (BufferedReader. (FileReader. filename)))          ; 4
           (records (.iterator (.parse CSVFormat.RFC4180$ input-reader))))  ; 5
      (let loop ((result '()))
        (if (.hasNext records)
            (let ((item (csvrecord->list (.next records))))                 ; 6
              (loop (if (null? item) ; ignore empty list
                        result
                        (cons item result))))
            (reverse result))))
    (lambda (exn)                                                           ; 7
      (display "Error: in reading data from ") (display filename) (newline)
      (System.exit -1))))

;; return an instance of DescriptiveStatistics filled with values using
;; given accessor function on the dataset
(define (statistics dataset accessor-fn)
  (let ((ds (DescriptiveStatistics.)))
    (for-each (lambda (instance)
                (.addValue ds (accessor-fn instance)))
              dataset)
    ds))

;; given an attribute name and DescriptiveStatistics instance,
;; display some interesting information
(define (display-statistics name ds)
  (display name) (newline)
  (display "-- minimum: ") (display (.getMin ds)) (newline)
  (display "-- maximum: ") (display (.getMax ds)) (newline)
  (display "-- mean:    ") (display (.getMean ds)) (newline)
  (display "-- stddev:  ") (display (.getStandardDeviation ds)) (newline))

(define dataset (read-data "iris.data"))

(display "Size of dataset: ") (display (length dataset)) (newline)
(display-statistics "Sepal length" (statistics dataset car))
(display-statistics "Sepal width" (statistics dataset cadr))
(display-statistics "Petal length" (statistics dataset caddr))
(display-statistics "Petal width" (statistics dataset cadddr))
1 Imports the java libraries.
2 csvrecord->list follows the Java version very closely, except we do not have exceptions, and use () as an error value.
3 tryCatch runs an expression, catching any exceptions.
4 Opens a buffered reader, using very natural syntax.
5 Gets at the csv-records, using an iterator.
6 Use .next, the Java iterator in action.
7 Handles any exception.
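
The tryCatch form at callout 3 can be tried out on something smaller. A minimal sketch, assuming tryCatch takes an expression followed by a one-argument handler, as in read-data; parse-int-or is a made-up helper name:

```scheme
;; returns the integer parsed from text, or fallback if Java throws
(define (parse-int-or text fallback)
  (tryCatch
    (Integer.parseInt text)          ; may throw NumberFormatException
    (lambda (exn) fallback)))        ; handler receives the exception object

(display (parse-int-or "42" 0)) (newline)
(display (parse-int-or "forty-two" 0)) (newline)
```

The second call should fall through to the handler and give back the fallback value rather than stopping the script.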

Running the script gives the expected results; remember to include all the library jar files on the classpath:

> java -cp "commons-csv-1.9.0.jar;commons-math3-3.6.1.jar;jscheme-7.2.jar" jscheme.REPL .\csv-jscheme.scm
Size of dataset: 150
Sepal length
-- minimum: 4.3
-- maximum: 7.9
-- mean:    5.843333333333334
-- stddev:  0.8280661279778628
Sepal width
-- minimum: 2.0
-- maximum: 4.4
-- mean:    3.054
-- stddev:  0.43359431136217386
Petal length
-- minimum: 1.0
-- maximum: 6.9
-- mean:    3.758666666666666
-- stddev:  1.7644204199522626
Petal width
-- minimum: 0.1
-- maximum: 2.5
-- mean:    1.1986666666666665
-- stddev:  0.763160741700841

NOTE: The last step, creating a Clusterable object, is not possible purely in JScheme. A separate Java class would have to be created, compiled, and included separately.

SISC

Implemented by Scott Miller and Matthias Radestock, over the period 2002 - 2007.

SISC implements R5RS.

Hello World

Java classes and methods must be "acknowledged" by the Scheme side.

(import s2j)                                                          ; 1

(define-java-class <java.lang.System>)                                ; 2
(define println (generic-java-method '|println|))                     ; 3
(define getout (generic-java-field-accessor '|out|))                  ; 4
(println (getout (java-null <java.lang.System>)) (->jstring "hello")) ; 5
1 The scheme-java bridge library.
2 Pick out the java.lang.System class.
3 Identify println as a generic method.
4 And out as a field accessor.
5 Finally, call println. Notice a null instance of java.lang.System is needed for getout, and the string must be converted into a Java string.

Similarly, to access sqrt:

(define-java-class <java.lang.Math>)
(define msqrt (generic-java-method '|sqrt|))
(msqrt (java-null <java.lang.Math>) (->jdouble 10.0))
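
Since the result comes back as a Java value, it is convenient to hide the conversions in both directions behind an ordinary Scheme procedure. A small sketch, using the s2j conversions ->jdouble and ->number; scheme-sqrt is a name I made up:

```scheme
(import s2j)

(define-java-class <java.lang.Math>)
(define msqrt (generic-java-method '|sqrt|))

;; takes and returns Scheme numbers; the conversions stay inside
(define (scheme-sqrt x)
  (->number (msqrt (java-null <java.lang.Math>) (->jdouble x))))

(display (scheme-sqrt 16.0)) (newline)
```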

Data Analysis

;; CSV file reading and analysis using SISC

(import s2j)                                                                    ; 1

(define-java-class <buffered-reader> |java.io.BufferedReader|)                  ; 2
(define-java-class <file-reader> |java.io.FileReader|)
(define-java-class <jl-system> |java.lang.System|)
(define-java-class <csv-format> |org.apache.commons.csv.CSVFormat|)
(define-java-class <descriptive-statistics> |org.apache.commons.math3.stat.descriptive.DescriptiveStatistics|)
(define get (generic-java-method '|get|))
(define add-value (generic-java-method '|addValue|))
(define get-min (generic-java-method '|getMin|))
(define get-max (generic-java-method '|getMax|))
(define get-mean (generic-java-method '|getMean|))
(define get-standard-deviation (generic-java-method '|getStandardDeviation|))
(define exit (generic-java-method '|exit|))
(define iterator (generic-java-method '|iterator|))
(define has-next (generic-java-method '|hasNext|))
(define next (generic-java-method '|next|))
(define parse (generic-java-method '|parse|))
(define size (generic-java-method '|size|))
(define getrfc (generic-java-field-accessor '|RFC4180|))

;; converts a CSV record to a list, with the first four entries
;; converted to real numbers
;; returns empty list if invalid
(define (csvrecord->list record)
  (if (= (->number (size record)) 5)                                            ; 3
      (let ((sepal-length (string->number (->string (get record (->jint 0)))))  ; 4
            (sepal-width (string->number (->string (get record (->jint 1)))))
            (petal-length (string->number (->string (get record (->jint 2)))))
            (petal-width (string->number (->string (get record (->jint 3))))))
        (if (and (number? sepal-length)
                 (number? sepal-width)
                 (number? petal-length)
                 (number? petal-width))
            (list sepal-length sepal-width
                  petal-length petal-width
                  (->string (get record (->jint 4))))
            ())) ; return empty list if first four entries are not numbers
      ())) ; return empty list if record is not of correct size

;; reads CSV data for Iris dataset from file, and converts to list of lists
(define (read-data filename)
  (with-failure-continuation                                                    ; 5
    (lambda (m e)
      (display "Error: in reading data from ") (display filename) (newline)
      (exit (java-null <jl-system>) (->jint -1)))
    (lambda ()
      (let* ((input-reader (java-new <buffered-reader> (java-new <file-reader> (->jstring filename))))
             (records (iterator (parse (getrfc (java-null <csv-format>)) input-reader))))
        (let loop ((result '()))
          (if (->boolean (has-next records))
              (let ((item (csvrecord->list (next records))))
                (loop (if (null? item) ; ignore empty list
                          result
                          (cons item result))))
              (reverse result)))))))

;; return an instance of DescriptiveStatistics filled with values using
;; given accessor function on the dataset
(define (statistics dataset accessor-fn)
  (let ((ds (java-new <descriptive-statistics>)))
    (for-each (lambda (instance)
                (add-value ds (->jdouble (accessor-fn instance))))
              dataset)
    ds))

;; given an attribute name and DescriptiveStatistics instance,
;; display some interesting information
(define (display-statistics name ds)
  (display name) (newline)
  (display "-- minimum: ") (display (->number (get-min ds))) (newline)
  (display "-- maximum: ") (display (->number (get-max ds))) (newline)
  (display "-- mean:    ") (display (->number (get-mean ds))) (newline)
  (display "-- stddev:  ") (display (->number (get-standard-deviation ds))) (newline))

(define dataset (read-data "iris.data"))

(display "Size of dataset: ") (display (length dataset)) (newline)

(display-statistics "Sepal length" (statistics dataset car))
(display-statistics "Sepal width" (statistics dataset cadr))
(display-statistics "Petal length" (statistics dataset caddr))
(display-statistics "Petal width" (statistics dataset cadddr))
1 A Scheme-to-Java bridge library.
2 Every Java class and method must be "acknowledged" on the Scheme side: this does have the advantage of providing them with Scheme-natural names.
3 Each type needs to be converted, e.g. primitives are different in Scheme and Java.
4 Strings need converting too.
5 Error handling - catches any Java exceptions.
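
The error handling at callout 5 can be tried in isolation. A sketch, assuming (as in read-data) that the handler passed to with-failure-continuation receives an error record and an error continuation, and assuming that the handler's return value becomes the value of the whole expression; parse-int-or and <jl-integer> are names chosen for illustration:

```scheme
(import s2j)

(define-java-class <jl-integer> |java.lang.Integer|)
(define parse-int (generic-java-method '|parseInt|))

;; returns the integer parsed from text, or fallback if Java throws
(define (parse-int-or text fallback)
  (with-failure-continuation
    (lambda (m e) fallback)            ; m: error record, e: error continuation
    (lambda ()
      (->number (parse-int (java-null <jl-integer>) (->jstring text))))))

(display (parse-int-or "42" 0)) (newline)
(display (parse-int-or "forty-two" 0)) (newline)
```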

TODO: So far, I could not complete the cluster part of the implementation due to an error I could not resolve.

Kawa

Created by Per Bothner, who has developed and maintained it continuously for over 25 years. Kawa implements several related languages and features, but importantly covers R7RS-small. The latest version is 3.1.1. The only restriction is that:

  • tail-call optimisation is a compile-time flag, and usually off
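
To see what this means in practice, here is a small sketch with two mutually recursive procedures. My understanding is that Kawa turns simple self-recursive loops into jumps regardless, so the flag matters mostly for calls between different procedures. The flag name --full-tailcalls is taken from the Kawa documentation; the file name tailcalls.scm is made up.

```scheme
;; tailcalls.scm: mutual recursion in tail position
;; run with: java -cp kawa.jar kawa.repl --full-tailcalls tailcalls.scm
;; without the flag, a deep enough call chain may overflow the Java stack
(define (my-even? n)
  (if (= n 0) #t (my-odd? (- n 1))))

(define (my-odd? n)
  (if (= n 0) #f (my-even? (- n 1))))

(display (my-even? 1000000)) (newline)
```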

Hello World

The structure here is: class-name + static member + method

(java.lang.System:out:println "Hello from Kawa")

So, for example, using sqrt from the Math class would be:

(display (java.lang.Math:sqrt 10.0))

Kawa needs the full namespace, although names can be aliased:

(define-alias jlMath java.lang.Math)
(display (jlMath:sqrt 10.0))
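
The same colon notation covers constructors, instance methods and static fields. A minimal sketch using only JDK classes, assuming the forms used in the data analysis listing later in this post (calling a class name like a function to construct an instance, and object:method for instance methods); the path is just an illustrative string and no file is opened.

```scheme
(define-alias File java.io.File)

(define f (File "data/iris.data"))                      ; constructor
(display (f:getName)) (newline)                         ; instance method
(display (java.lang.Integer:parseInt "42")) (newline)   ; static method
(display java.lang.Integer:MAX_VALUE) (newline)         ; static field
```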

Data Analysis

The full file "csv-kawa.scm" is:

;; CSV file reading and analysis using Kawa scheme

(import (class java.io                                                    ; 1
               BufferedReader FileReader)
        (class org.apache.commons.csv
               CSVFormat)
        (class org.apache.commons.math3.stat.descriptive
               DescriptiveStatistics)
        (class org.apache.commons.math3.ml.clustering
               Clusterable KMeansPlusPlusClusterer))

(define-simple-class IrisInstance (Clusterable)                           ; 2
                     (sepal-length)
                     (sepal-width)
                     (petal-length)
                     (petal-width)
                     (label)
                     (point)
                     ((getPoint)                                          ; 3
                      point))

;; converts a CSV record to an IrisInstance instance, with the first four fields
;; converted to real numbers
;; returns empty list if invalid
(define (csvrecord->iris-instance record)                                 ; 4
  (if (= (record:size) 5)
      (let ((sepal-length (string->number (record:get 0)))
            (sepal-width (string->number (record:get 1)))
            (petal-length (string->number (record:get 2)))
            (petal-width (string->number (record:get 3))))
        (if (and (number? sepal-length)
                 (number? sepal-width)
                 (number? petal-length)
                 (number? petal-width))
            (make IrisInstance                                            ; 5
                  sepal-length: sepal-length
                  sepal-width: sepal-width
                  petal-length: petal-length
                  petal-width: petal-width
                  label: (record:get 4)
                  point: (double[] sepal-length sepal-width petal-length petal-width))
            ())) ; return empty list if first four entries are not numbers
      ())) ; return empty list if record is not of correct size

;; reads CSV data for Iris dataset from file, and converts to list of IrisInstance
(define (read-data filename)
  (with-exception-handler                                                 ; 6
    (lambda (exn)                                                         ; 7
      (display "Error: in reading data from ") (display filename) (newline)
      (display exn)
      (java.lang.System:exit -1))
    (lambda ()
      (let* ((input-reader (BufferedReader (FileReader filename)))        ; 8
             (records ((CSVFormat:RFC4180:parse input-reader):iterator))) ; 9
        (let loop ((result '()))
          (if (records:hasNext)
              (let ((item (csvrecord->iris-instance (records:next))))     ; 10
                (loop (if (null? item) ; ignore empty list
                          result
                          (cons item result))))
              (reverse result)))))))

;; return an instance of DescriptiveStatistics filled with values using
;; given accessor function on the dataset
(define (statistics dataset accessor-fn)
  (let ((ds (DescriptiveStatistics)))
    (for-each (lambda (instance)
                (ds:addValue (accessor-fn instance)))
              dataset)
    ds))

;; given an attribute name and DescriptiveStatistics instance,
;; display some interesting information
(define (display-statistics name ds)
  (display name) (newline)
  (display "-- minimum: ") (display (ds:getMin)) (newline)
  (display "-- maximum: ") (display (ds:getMax)) (newline)
  (display "-- mean:    ") (display (ds:getMean)) (newline)
  (display "-- stddev:  ") (display (ds:getStandardDeviation)) (newline))

(define dataset (read-data "iris.data"))

(display "Size of dataset: ") (display (length dataset)) (newline)

(display-statistics "Sepal length" (statistics dataset (lambda (instance) instance:sepal-length)))  ; 11
(display-statistics "Sepal width" (statistics dataset (lambda (instance) instance:sepal-width)))
(display-statistics "Petal length" (statistics dataset (lambda (instance) instance:petal-length)))
(display-statistics "Petal width" (statistics dataset (lambda (instance) instance:petal-width)))

(let* ((model (KMeansPlusPlusClusterer 3))
       (clusters (model:cluster dataset))                                 ; 12
       (iterator (clusters:iterator)))
  (let loop ()
    (if (iterator:hasNext)
        (let ((cluster (iterator:next)))
          (display "Cluster: ") (display ((cluster:getCenter):getPoint)) (newline)
          (display "Cluster has: ") (display ((cluster:getPoints):size)) (display " points") (newline)
          (loop)))))
1 Imports the java libraries.
2 The IrisInstance class implements the Clusterable interface, required by the library.
3 The interface requires a getPoint method, which returns a double[]
4 csvrecord->iris-instance follows the Java version very closely, except we do not have exceptions, and use () as an error value.
5 Making an instance of IrisInstance requires passing in values for all the fields, including the double[] for the point.
6 with-exception-handler runs a thunk, passing any exception raised to the handler.
7 Handles any exception.
8 Opens a buffered reader around a file reader.
9 Gets at the csv-records, using an iterator.
10 Use :next, the Java iterator in action.
11 The accessor function for the dataset is a field lookup.
12 Instances of the class can be passed to the clustering algorithm.
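
The same trick should work with any Java interface. A smaller sketch, using java.lang.Runnable from the JDK so no extra jars are needed; Greeter is a made-up class, and I am assuming the same define-simple-class and make forms as in the listing, plus a ::void return type on the method:

```scheme
;; a Kawa class implementing a JDK interface
(define-simple-class Greeter (java.lang.Runnable)
  (name)
  ((run) ::void
   (display "Hello, ") (display name) (newline)))

(define greeter (make Greeter name: "Kawa"))
(define thread (java.lang.Thread greeter))   ; Java accepts it as a Runnable
(thread:start)
(thread:join)                                ; wait for run to finish
```

As with IrisInstance, the point is that the Java side only sees an object implementing the interface it asked for.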

The code is run as follows, including the libraries. Notice the --no-warn-unknown-member flag - this makes Kawa output a bit quieter.

> java -cp "kawa.jar;commons-csv-1.9.0.jar;commons-math3-3.6.1.jar" kawa.repl --no-warn-unknown-member .\csv-kawa.scm
Size of dataset: 150
Sepal length
-- minimum: 4.3
-- maximum: 7.9
-- mean:    5.843333333333334
-- stddev:  0.8280661279778628
Sepal width
-- minimum: 2.0
-- maximum: 4.4
-- mean:    3.054
-- stddev:  0.43359431136217386
Petal length
-- minimum: 1.0
-- maximum: 6.9
-- mean:    3.758666666666666
-- stddev:  1.7644204199522626
Petal width
-- minimum: 0.1
-- maximum: 2.5
-- mean:    1.1986666666666665
-- stddev:  0.763160741700841
Cluster: [5.005999999999999 3.4180000000000006 1.464 0.2439999999999999]
Cluster has: 50 points
Cluster: [6.853846153846153 3.0769230769230766 5.715384615384615
          2.053846153846153]
Cluster has: 39 points
Cluster: [5.88360655737705 2.740983606557377 4.388524590163935
          1.4344262295081966]
Cluster has: 61 points