Primes results for x86 vs. PPC vs. ARM

2006-12-22 16:03:06 PST


Well, since I got Debian installed on my GP2X, the first thing I did was run some benchmarks. As usual, I fell back on my "raw and simple math computation" benchmark suite, Primes, a collection of prime number finders written in many languages. For the rest of the article we will thus be evaluating languages purely on math computation speed and nothing else.

I had already gathered data from some of my computers from when I bought Bast, my G3, so I just ran the benchmarks on the GP2X and added them in for comparison.

Machines

Name	Arch	CPU	Speed	OS
Inferno	x86	Athlon	1500MHz	Gentoo Linux
Nika	x86	Pentium-M	1500MHz	Gentoo Linux
Kvasir	x86	Pentium 4	2800MHz	Gentoo Hardened Linux
Bast	PPC	G3	350MHz	Gentoo Linux
GP2X	ARM	920T	200MHz	Debian Linux

Results

Time in seconds to find all prime numbers between 1 and one million

	Inferno	Nika	Kvasir	Bast	GP2X
C	1.19	0.79	0.45	2.83	35.1
ObjC	1.19	0.8		2.83	43.4
C++	1.93	1.06	1.1	4.76	50.1
Java	3.59	1.63	2.14	40.3
C#	3.69	1.87		10.5	140
Awk	32.1	27.1	30	199	2065
Perl	38.2	21	23.3	145	1280
PHP	15.1	8.89	13.4	64.9	758
Python	54	38	43.8	211	1526
Lisp	10.4	5.19		36.3	2674
Primes - x86 vs. PPC vs. ARM - Unscaled

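As you can see, the 200MHz ARM GP2X was, of course, destroyed by everyone else. On its own that tells us very little, since the raw times mostly reflect how fast each machine is rather than how well each language runs on each architecture.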

Clearly scaling needed to be introduced so that we could see how the languages fared on the different architectures. All being equal, I assumed that GCC would give the best optimization per platform and that the C results would be the least skewed, so I chose to represent all the results as multiples of the C time: every time is divided by the time the C prime number finder took on that same computer. It yielded the following results:

Time in multiples of C time to find all prime numbers between 1 and one million

	Inferno	Nika	Kvasir	Bast	GP2X
C	1.00	1.00	1.00	1.00	1.00
ObjC	1.00	1.01		1.00	1.24
C++	1.62	1.34	2.44	1.68	1.43
Java	3.02	2.06	4.76	14.24
C#	3.10	2.37		3.71	3.99
Awk	26.97	34.30	66.67	70.32	58.83
Perl	32.10	26.58	51.78	51.24	36.47
PHP	12.69	11.25	29.78	22.93	21.60
Python	45.38	48.10	97.33	74.56	43.48
Lisp	8.74	6.57		12.83	76.18
Primes - x86 vs. PPC vs. ARM - Scaled
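
For anyone who wants to redo the scaling, here is a minimal sketch in Python. It only copies four rows of the unscaled table as sample data; the names `RAW` and `scale_to_c` are made up for this example and are not part of the Primes suite.

```python
# Raw times in seconds, copied from four rows of the unscaled table.
# None marks a benchmark that was not run on that machine.
MACHINES = ["Inferno", "Nika", "Kvasir", "Bast", "GP2X"]
RAW = {
    "C":    [1.19, 0.79, 0.45, 2.83, 35.1],
    "C++":  [1.93, 1.06, 1.1, 4.76, 50.1],
    "Java": [3.59, 1.63, 2.14, 40.3, None],
    "PHP":  [15.1, 8.89, 13.4, 64.9, 758],
}


def scale_to_c(raw):
    """Divide every time by the C time measured on the same machine."""
    c_times = raw["C"]
    scaled = {}
    for lang, times in raw.items():
        scaled[lang] = [
            None if t is None else round(t / c, 2)
            for t, c in zip(times, c_times)
        ]
    return scaled


scaled = scale_to_c(RAW)
print("", *MACHINES, sep="\t")
for lang, row in scaled.items():
    cells = ["" if v is None else f"{v:.2f}" for v in row]
    print(lang, *cells, sep="\t")
```

The missing GP2X Java result stays missing instead of turning into a zero, which would otherwise look like an impossibly good score.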

Analysis

x86 machines

You would think we could use these three machines to establish a baseline, but we find a fairly large variance in results here. First of all, inexplicably, Inferno is the only machine to have a worse result for Perl than Awk. Also, we can see that when it comes to the interpreted languages, Kvasir is at or near the worst in all cases. I would attribute this to Hardened Gentoo adding very visible overhead to the interpreters. It seems to show there is a definite cost in efficiency for the added security.

PPC

The most notable result here is Java, which is markedly bad at 14.24 times the C time, against roughly 2 to 5 times on the x86 machines. Java for the PPC could use some optimization. C#, care of Mono, is on the other hand competitive. The rest of the interpreted languages also lag behind on the PPC, although Lisp (SBCL) comes pretty close to the x86 scaled results.

ARM

And now what you've been waiting for, the ARM results for the GP2X. Sadly, I just couldn't really get Java for the GP2X. Sun Java appeared to be unavailable. I could have used something like Kaffe or Jikes, but it probably wouldn't have been too fair, and also, on Debian, they insisted on pulling in some parts of Xorg as dependencies, and I didn't have a lot of space to play around with on the GP2X. Kind of lame, oh well. Again, as with the PPC, C#, care of Mono, is very respectable. Other than that, the Lisp result was way off by a staggering amount, but that can be explained by SBCL not being available for ARM. I had to resort to the clearly slower CLisp compiler/'interpreter.'

Languages

As much hype as there is around Python these days, it appears to be fairly slow across the board compared to other interpreted languages. PHP is in fact the fastest of Perl, PHP, and Python on all tested architectures, so if you need speed and yet still want the flexibility of an interpreted language, you might seriously want to use the CLI version of PHP. It seems the interpreter is very well optimized across the board.

If you happen to be on an architecture supported by a good Lisp compiler like SBCL, it is also a very attractive and viable option, but unfortunately for ARM, it looks like CLisp isn't, speed-wise anyway.

As for Java, hopefully the GPLing of it will allow it to a) be further optimized for alternate architectures like PPC, and b) be fully ported to 'new' architectures like ARM. If you're looking for speed but want the middle-ground flexibility of a VM language, then in the meantime C# in the form of Mono is a fantastic-looking choice.

Note: Ruby has again been omitted because until 2.0 is released, which includes a real VM, it is sadly not even remotely competitive in this area (math computation). It is about an order of magnitude slower than any of the other interpreted languages. Ruby 1.9 CVS last I checked (half a year ago) was competitive with Python :)

And that's it from me on the Primes front (at least until I acquire a Sparc box :P ...)

Crude benchmarks of a G3 compared to some x86 boxes

2006-11-21 20:51:08 PST


Well, now that Bast was up and running I had to start testing it. I mean that's why I bought a Mac. So I tossed on some of my old code, specifically, VM-Proto, Fast-Lang, and Primes.

VM-proto was a tiny virtual machine and compiler I wrote several years ago to learn about basic compilation to byte code, and execution of said byte code. I also then made it very portable in order to learn about low level portability issues like endian safe code. I was pleased when vm-proto compiled, was able to compile source (a prime number finder) to object code and execute that object code. Better yet, object code vm-proto compiled on Bast (PPC) was executable by vm-proto on Inferno (x86), and object code compiled by vm-proto on Inferno, was executable on Bast. A complete success.
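
To illustrate what "endian safe" buys you, here is a small Python sketch. The instruction layout is purely hypothetical (one opcode byte plus a signed 32-bit operand) and is not vm-proto's real object format; the point is only that the byte order is fixed in the file format instead of being inherited from whichever CPU wrote it.

```python
import struct

# Hypothetical instruction layout, NOT vm-proto's real format:
# one opcode byte followed by a signed 32-bit operand.
# "<" forces little-endian, unpadded fields regardless of the host CPU.
INSTRUCTION = struct.Struct("<Bi")


def encode(program):
    """Serialize (opcode, operand) pairs into portable bytes."""
    return b"".join(INSTRUCTION.pack(op, arg) for op, arg in program)


def decode(blob):
    """Rebuild the (opcode, operand) pairs on any host."""
    return list(INSTRUCTION.iter_unpack(blob))


program = [(1, 1000000), (2, -3)]
blob = encode(program)
print(blob.hex())
print(decode(blob))
```

Because the writer and the reader agree on the byte order up front, a file produced on a big-endian PPC decodes to the same values on a little-endian x86 box, and the other way around.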

Next up was fast-lang, a simpler interpreter that I wrote this summer just to see how fast I could make a simple language. It compiled and was able to execute a simple stack test program I wrote for it, but failed on the more complex prime number finder with a segfault. Clearly not so portable, although that was never a goal with it; I just wanted it to be fast on what I had, which was x86.

Finally came my Primes suite, which is a collection of prime number finders I wrote in about 37 languages and a simple perl program to run them all and store benchmarks on how fast they ran. It was a fun way to get introduced to a lot of languages and also fun to race them against each other.
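
The real runner is a Perl script, so the following is only a rough Python sketch of the idea of timing one external program and reporting it. The function names are made up for this example, and the output line merely imitates the style of the result listings in these posts.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once, discard its output, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def report(label, argv):
    """Format one result line for the command that was just timed."""
    seconds = time_command(argv)
    return f"Execute [{label}]: {seconds:.2f} seconds"


# Placeholder workload: the current Python interpreter summing a range.
print(report("python", [sys.executable, "-c", "print(sum(range(10**6)))"]))
```

Throwing the program's output away matters here: printing tens of thousands of primes to a terminal can easily cost more time than finding them.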

I've often used Primes as a crude benchmark of languages' computational power and now I was going to use it to see how well languages worked on a Mac as compared to x86.

I ran a subset of tests in C, Objective-C, C++, Java, C# (Mono), Awk, Perl, PHP, Python, and Common Lisp (SBCL). I ran the tests on four of my computers for comparison: Inferno, a 1.5GHz Athlon-XP; Kvasir, a 2.8GHz Pentium 4; Nika, a 1.5GHz Pentium M; and of course Bast, my new 350MHz PPC G3. The results, in seconds, are below:

	Inferno	Nika	Kvasir	Bast
C	1.19	0.79	0.45	2.83
ObjC	1.19	0.8		2.83
C++	1.93	1.06	1.1	4.76
Java	3.59	1.63	2.14	40.3
C#	3.69	1.87		10.5
Awk	32.1	27.1	30	199
Perl	38.2	21	23.3	145
PHP	15.1	8.89	13.4	64.9
Python	54	38	43.8	211
Lisp	10.4	5.19		36.3
Primes Test Results

There were some surprising results in there, but let's start with Bast, the G3. C, Objective-C, and C++ were about par for the course, but then look at the spike in time for Java. For the record, Nika and Inferno are both using Sun's JDK 1.5, and Bast is using IBM's JDK 1.5. There really shouldn't be that much of a difference. Clearly Java is a lot less optimized for the PPC. (Hopefully now that it's GPLed that can get some attention?) Next up was C#, care of Mono, which looks good comparatively. Then a huge spike with Awk, which looks like it could be missing some PPC love. It's a bit hard to tell with Perl, but it looks reasonable when compared with the PHP and Python results. Comparatively speaking, PHP is looking pretty good on the PPC. Python is slowest, but it's slowest on x86 too, so no surprise. And finally Common Lisp, care of SBCL, looking good and beating Java on the PPC. Impressive.

So what have we learned? Java and Awk need some PPC love. (I didn't bother to include Ruby results, because until 1.9 becomes the stable 2.0, Ruby doesn't officially have a VM implementation and is incomparably slow. Ruby 1.9/2.0 with YARV is competitive, but we'll leave that for another day.)

Now to my surprise, aside from C, Kvasir lost to Nika on all tests, even though Nika runs at about half the clock speed. Nika also beat out Inferno, and they should be much more comparable. Also, on Inferno you can see that Perl actually performs worse than Awk. So I'll try to guess at an explanation. Inferno and Kvasir both use GCC 3.4 and compiled all the languages with it, while Nika uses GCC 4.1. Inferno and Nika both use -O3 optimization while Kvasir uses -O2, and Kvasir is using the hardened tool chain with a hardened kernel. So GCC 4.1 seems to provide a solid edge over 3.4 (as seen from Nika beating Inferno), and GCC 3.4 + the hardened tool chain + a hardened kernel gives a massive performance hit compared to vanilla GCC 4.1, as seen from Nika winning everything but C against the nominally nearly twice as fast Kvasir. I'm a bit shocked to find that my laptop, which I would have guessed would be the clear loser at interpreted-language number crunching, won out. Neat to know.

So now I've learned a few things.

You can grab Primes from my git server

git clone git://git.mindstab.net/git/primes

But be warned, the autoTest.pl code is old and beyond hideous. But it kind of gets the job done.

(And please no complaints that these tests aren't fair. I'm not claiming they are anything remotely definitive. They are what they are: prime number finders in a bunch of languages)

Edit: I forgot to mention two things:

1. Getting Java for PPC is still a pain. Sun-SDK in portage isn't available for PPC. All there is is the IBM version, which is still fetch restricted. You have to sign up with IBM to get it, which involves telling them all about your business, whether you have one or not, and all about yourself, with full addresses for both, plus a phone number and email address. Extremely lame, and I hope the new GPL Java fixes this stat.

2. I didn't include Fortran results because GCC 4.1 seems to have a new and improved Fortran compiler, my terrible old code doesn't compile anymore, and I wasn't in the mood to go brush up on Fortran to figure out why. Would have been neat though.

Better Lisp code

2006-07-20 23:16:51 PST


So. Yes, my Primes suite is a bad benchmark at times. Crude, certainly. But also unfair to languages that I don't know well. Remember when I was surprised at how "slow" the Lisp results were? Well, my bad. I got Paul Graham's "ANSI Common Lisp" for my birthday on the 12th. Been doing some reading. I'm starting to get the idea behind Lisp, and actually Haskell too. Functional programming, programming without side effects. Lots of recursion. Which normally I only use in certain circumstances but not in place of loops, because it's inefficient. So my Lisp code used a loop. Well, little did I know that functional languages like these have something called "tail calls" (the recursive call is the last thing a function does), and they can be optimized into goto-style loops or something. So I wrote a new Primes in Lisp.

primes2.cl

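;; Trial-divide max by the odd numbers i, i+2, ... up to sq (its square root).
;; Prints max if no divisor is found. The recursive call is in tail position.
(defun check (max i sq)
  (if (> i sq)
    (format t "~d~%" max)
    (if (not (= (mod max i) 0))
      (check max (+ i 2) sq))))

;; Test every odd candidate i, i+2, ... below max.
(defun _primes (max i)
  (if (< i max)
    (progn
      (check i 3 (sqrt i))
      (_primes max (+ i 2)))))

;; Entry point: starts at 3, so 2 itself is never printed.
(defun primes (max)
  (_primes max 3))

results.1000000.txt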

(22/36) Common Lisp results:
Execute [sbcl]:         5.21 seconds

That's better looking. And I'm still just a total beginner. Cool though.
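
For comparison, here is roughly what those two tail-recursive functions boil down to once each tail call is turned into a jump back to the top. This is a hand-written Python sketch, not the output of any compiler, and the function names are my own. Like the Lisp version, it starts at 3 and so never reports 2.

```python
import math


def is_odd_prime(n):
    """Loop form of `check`: try odd divisors 3, 5, 7, ... up to sqrt(n)."""
    i = 3
    limit = math.sqrt(n)
    while i <= limit:
        if n % i == 0:
            return False
        i += 2  # same step as the recursive call (check max (+ i 2) sq)
    return True


def primes(limit):
    """Loop form of `_primes`: walk the odd candidates 3, 5, 7, ... below limit."""
    found = []
    n = 3
    while n < limit:
        if is_odd_prime(n):
            found.append(n)
        n += 2
    return found


print(primes(30))
```

Each `while` loop replaces one self-call: the arguments of the recursive call become the updated loop variables, so nothing piles up on the stack.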

Food for afterthought: I took a look at a Haskell intro linked to on reddit, and for the first time ever stuff about Haskell actually made sense to me. Being a functional language like Lisp, and having a list data type, like Lisp, it's actually got surprisingly strong similarities. Huh.
Also, sadly, Ruby currently doesn't have tail recursion optimization. I've heard it said that YARV (the Ruby VM) in the next version of Ruby will support it.

Been playing

2006-07-01 23:20:17 PST


So I found this (bcompile) on reddit. It's a tutorial on bootstrapping/writing a compiler starting only with a hex-to-binary compiler.

So I was thinking about languages again. And it's the summer (maybe that time of year again?). So today on this holiday weekend I bashed out a language that was as fast as I could make it (rather crappy as a side effect ;)).

ftp://ftp.mindstab.net/lang-2006.07.01.tar.gz

README

Unnamed language.
<haplo@mindstab.net>
www.mindstab.net

I wrote this 'language' in a day.  Its only goal was to be as fast as I
could make it.  To that end it has little to no error correction, and
statically sized buffers.  Still, it's pretty fast.

Compared to vm-proto, the 'language' I wrote last year:

vm-proto:
        assembly like syntax
        compiles to endian safe object code
        ~2.5x slower at calculating primes than unnamed lang
        has some registers
        written in c++
        usage:
                compile foo.vms
                vm foo.vmo

Unnamed lang:
        assembly like syntax
        compiles and executes code in memory
        (possibly endian safe but untested)
        about  1/3 code size of vm-proto
        has some registers
        has a working stack (small, but can be enlarged in the code by a define)
        written in dirty c (function and variable names aren't the sanest)
        usage:
                lang < foo.src

Considering this is about as fast as I can think to make a language, I honestly
don't know how languages that are so much more fully featured like PHP are
written considering PHP is only marginally slower (and faster than vm-proto),
and I assume VM's like mono and java compile their byte code to architecture
machine code at some point. (?)
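
To give a feel for the "parse into memory, then execute" design with a few registers and a small stack, here is a toy Python sketch. The instruction set is hypothetical and invented for this example; it is not the syntax or opcode list of vm-proto or of the unnamed language, and being Python it says nothing about their speed.

```python
# Hypothetical instruction set for illustration only.
def run(source, stack_size=64):
    """Parse assembly-like text into tuples in memory, then execute them."""
    program = [tuple(line.split()) for line in source.splitlines() if line.strip()]
    regs = {"a": 0, "b": 0, "c": 0, "d": 0}
    stack = []
    out = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "set":      # set REG VALUE
            regs[args[0]] = int(args[1])
        elif op == "add":    # add DST SRC  (DST += SRC)
            regs[args[0]] += regs[args[1]]
        elif op == "push":   # push REG
            if len(stack) >= stack_size:
                raise OverflowError("stack full")
            stack.append(regs[args[0]])
        elif op == "pop":    # pop REG
            regs[args[0]] = stack.pop()
        elif op == "print":  # print REG (collected instead of written out)
            out.append(regs[args[0]])
        elif op == "jlt":    # jlt A B TARGET  (jump to TARGET if A < B)
            if regs[args[0]] < regs[args[1]]:
                pc = int(args[2])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return out


COUNT_TO_THREE = """
set a 0
set b 3
set c 1
print a
add a c
jlt a b 3
"""
print(run(COUNT_TO_THREE))
```

All the parsing happens once, before the loop starts, so the hot path is just a fetch and a chain of comparisons. A VM that compiles to machine code removes even that dispatch step.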

I have also been running some benchmarks with my old Primes suite.

1000000

(1/36) C results:
Compile [gcc]:          0.10 seconds
Execute:                0.88 seconds

(4/36) x86 Assembly results:
Compile [gcc]:          0.04 seconds
Execute:                0.76 seconds

(13/36) C# (Mono) results:
Compile [mcs]:          1.03 seconds
Execute [mono]:         1.74 seconds

(16/36) AWK results:
Execute [awk]:          23.6 seconds

(17/36) PERL results:
Execute [perl]:         22.4 seconds

(18/36) PHP results:
Execute [php]:          9.65 seconds

(19/36) Python results:
Execute [python]:       41.3 seconds

(21/36) Ruby results:
Execute [ruby]:         227. seconds

(22/36) Common Lisp results:
Execute [sbcl]:         10.5 seconds

Compile [gcl]:          0.00 seconds
Execute [gcl]:          23.7 seconds

(26/36) Fourth results:
Execute [gforth-fast]:  2.15 seconds

haplo@nika ~/src/my/lang/vm-proto $ time ./vm primes.vmo > /dev/null
real    0m25.325s
user    0m25.226s
sys     0m0.052s

haplo@nika ~/src/my/lang/fast-lang $ time ./lang < primes.src >/dev/null
real    0m7.558s
user    0m7.412s
sys     0m0.024s

[Update: 2006.07.02]
optimized lang
haplo@nika ~/src/my/lang/fast-lang $ time ./lang < primes.src  > /dev/null
real    0m3.753s
user    0m3.736s
sys     0m0.008s

As you can see, PHP5 is really fast for its class (interpreted language). I was surprised. Ruby is distressingly slow. Mono is rather close to native compiled speed. And my newest language beats all the fancy interpreted languages! (I'm pretty sure Forth has a pretty clear machine code compile path :)).

[Update: I was surprised to find out that GCL is slower by far than SBCL even though GCL compiles Lisp.]

[last published version of primes can still be found at ftp://ftp.mindstab.net/primes with both a .tar.gz available for download and all the source online for browsing.]
