Useless Microoptimizations Homepage Forum
Don't get confused, this is just my homepage, not really a message board. I implemented it as a forum for reasons you can find here.
 

crabench - a benchmarking suite for Unix (with results for different computers)

 
Useless Microoptimizations
Site Admin


Joined: 09 Feb 2005
Posts: 114
Location: Boston, MA, USA

Posted: Fri Sep 30, 2005 1:31 pm

This is a very early version of my benchmark suite.

Detailed information on the suite and the tests I run can be found in the FAQ at
http://www.cons.org/cracauer/crabench/FAQ

[Figure: crabench result graphs]

The user CPU time results are fairly reliable, but some of the wall-clock results are still rough. That is particularly true for the tests that don't use much CPU time, for example the pure file transfer http tests. Their times vary greatly, and I haven't yet had time to run them often enough.
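The difference between the two measurements can be illustrated with a minimal Python sketch. It is not crabench's code: `time_command` and `summarize` are hypothetical helpers that run a command, read the child's user CPU time from `os.times()`, and report the relative spread over repeated runs. A large spread in the wall-clock column is a sign that more runs are needed.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return (wall_seconds, child_user_cpu_seconds)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # waits for the child to exit
    wall = time.perf_counter() - start
    after = os.times()
    # children_user only covers children that have been waited for,
    # which subprocess.run() has done by the time it returns (Unix only).
    return wall, after.children_user - before.children_user


def summarize(samples):
    """Return (mean, relative standard deviation) of a list of timings."""
    mean = statistics.mean(samples)
    if len(samples) < 2 or mean <= 0:
        return mean, 0.0
    return mean, statistics.stdev(samples) / mean


if __name__ == "__main__":
    # Hypothetical CPU-bound test command.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(1_000_000))"]
    runs = [time_command(cmd) for _ in range(5)]
    wall_mean, wall_rsd = summarize([w for w, _ in runs])
    user_mean, user_rsd = summarize([u for _, u in runs])
    print(f"wall {wall_mean:.3f}s +/-{wall_rsd:.1%}, user {user_mean:.3f}s +/-{user_rsd:.1%}")
```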

The file with the raw results can be found here (gzip'ed text):
http://www.cons.org/cracauer/crabench/results.txt.gz
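The layout of `results.txt` is not described here, so the following Python sketch only decompresses the file and returns its non-empty lines. `results_lines` and `read_results` are hypothetical helper names; any further parsing would depend on the actual format.

```python
import gzip


def results_lines(compressed: bytes) -> list[str]:
    """Decompress gzip'ed text and return its non-empty lines."""
    text = gzip.decompress(compressed).decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line.strip()]


def read_results(path: str) -> list[str]:
    """Read a local copy of the gzip'ed results file."""
    with open(path, "rb") as f:
        return results_lines(f.read())
```

After downloading the file, `read_results("results.txt.gz")` returns its lines for inspection.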

%%

Results specialized on memory timings and cache size are listed in this thread:
http://cracauer-forum.cons.org/forum/viewtopic.php?t=27


Last edited by Useless Microoptimizations on Wed Nov 02, 2005 7:12 pm; edited 8 times in total

Posted: Fri Sep 30, 2005 1:46 pm

I am currently running these benchmarks and updating the graphs several times a day.

I will make a note in this thread when major updates happen. If you subscribe to this thread you can get notifications by mail.

Problems or questions - mail me.

Posted: Sat Oct 01, 2005 8:51 am

The graphs have been updated with a run of my 3800+ with its memory clocked down to simulate the Opteron's PC2100 RAM.

That way you can see what effect the Opteron's bigger cache has, along with other differences between the architectures (the numbers differ a little from what you would expect from cache size alone).

Overall, the results say that the bigger cache is pretty much useless except for low-quality video. Not even the big C++ compilations in the Mozilla subdirectories show much of an effect. Money is obviously better spent elsewhere.

The Linux kernel compilation shows some improvement from the bigger cache, which is interesting because the FreeBSD kernel build does not. Still, a 7% saving is all the bigger cache buys you. For the same money you can usually buy a 10% higher-clocked CPU, which will show the same improvement for this test but much better improvements for the others.
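The cache-versus-clock comparison is simple arithmetic. A minimal sketch, assuming the kernel build time scales exactly inversely with clock speed; real builds also wait on memory and disk, so the clock figure is an upper bound:

```python
def savings_from_clock(clock_increase):
    """Fraction of run time saved by a faster clock, assuming run time
    scales inversely with clock speed (best case for a CPU-bound job)."""
    return 1 - 1 / (1 + clock_increase)


cache_savings = 0.07  # Linux kernel build with the bigger cache, from the post above
clock_savings = savings_from_clock(0.10)
print(f"bigger cache: {cache_savings:.1%}, +10% clock: at most {clock_savings:.1%}")
# prints: bigger cache: 7.0%, +10% clock: at most 9.1%
```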

The different performance characteristics of the Linux and FreeBSD kernel builds are probably rooted in the include file structure. FreeBSD's include files are much more straightforward; Linux's are deeply nested, and actual language constructs expand via long detours through macros and typedefs.
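One rough way to compare how deeply two header trees nest is to follow the `#include` directives and record the maximum depth. The Python sketch below does that for an in-memory map of file contents. `include_stats` is a hypothetical helper, the two trees are made-up toy examples rather than real FreeBSD or Linux headers, and the sketch ignores macros, conditional compilation, and include search paths, all of which a real preprocessor handles.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def include_stats(sources, root):
    """Return (max_depth, distinct_headers) for the include tree under root.

    sources maps a file name to its text. Includes not in the map (e.g.
    system headers) are ignored. Each header is expanded only the first
    time it is seen, as if it had include guards.
    """
    seen = set()
    max_depth = 0

    def walk(name, depth):
        nonlocal max_depth
        max_depth = max(max_depth, depth)
        for inc in INCLUDE_RE.findall(sources.get(name, "")):
            if inc in sources and inc not in seen:
                seen.add(inc)
                walk(inc, depth + 1)

    walk(root, 0)
    return max_depth, len(seen)


# Hypothetical toy trees: one flat, one deeply nested.
flat = {"main.c": '#include "types.h"\n#include "proc.h"\n',
        "types.h": "typedef int pid_t;\n",
        "proc.h": "struct proc;\n"}
nested = {"main.c": '#include "a.h"\n', "a.h": '#include "b.h"\n',
          "b.h": '#include "c.h"\n', "c.h": "typedef int pid_t;\n"}
print(include_stats(flat, "main.c"))    # (1, 2)
print(include_stats(nested, "main.c"))  # (3, 3)
```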

I should delete the old Opteron run (the 2 GB one); it is erratic. It was taken when the suite was on FreeBSD-6.0-beta2, and while I didn't see a difference in timings at the time, it is now apparent that subsequent small tweaks threw these numbers off.

Posted: Sat Oct 01, 2005 10:32 am

Results for a half-verified stable overclock of the 3200+ at 2.7 GHz with 1:1 RAM are coming in; the first results are in the charts. [ETA: not stable]



Now that I have cleaned up the Opteron results and the results of the 3800+ simulating the Opteron's RAM, some differences become apparent:
  • the real Opteron outperforms the 3800+ on some of the tests where the foreground runs a CPU eater (Lisp) and the background runs plain http.
  • however, the X2 3800+, despite its super-slow RAM and its 512 KB cache, shreds the real Opteron when a huge number of plain http processes (no CPU eaters at all) run in both the foreground and the background.


An interpretation:
  • the bigger cache helps a lot to "defend" a foreground CPU eater against I/O-intensive, light-CPU background work.
  • however, something in the architecture of the X2, or in the VIA socket 939 chipset, really helps extreme multitasking with lots of I/O and little CPU load. (Note, however, that most I/O here is simulated, since I use sparse files on disk and the localhost interface for http.)
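Reads from the holes of a sparse file return zeros without touching the disk, which is why such I/O is mostly simulated. A minimal sketch of creating one on a Unix system; `make_sparse_file` is a hypothetical helper, and how crabench itself sets up its files is not described here:

```python
import os


def make_sparse_file(path, size):
    """Create a file of `size` bytes without writing any data blocks.

    On file systems that support sparse files, the unwritten range is a
    hole: it reads back as zeros but occupies almost no disk space.
    Returns (logical_size, allocated_bytes).
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file without writing data
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux and the BSDs.
    return st.st_size, st.st_blocks * 512
```

On a file system with sparse-file support, `make_sparse_file("/tmp/sparse.bin", 1 << 30)` reports a 1 GiB logical size with an allocated size close to zero.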


I will add a run where 16 and 256 background plain http fetchers are running with the Lisp in the foreground. That should shed some light on this.
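The planned experiment, a CPU-bound foreground job timed while N background processes fetch documents over the localhost interface, could look roughly like the Python sketch below. It is not crabench's harness: `cpu_eater` stands in for the Lisp job, and the 64 KB payload, the function names, and the use of Python's `http.server` are all assumptions.

```python
import multiprocessing as mp
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"x" * (64 * 1024)  # hypothetical 64 KB document


class Server(ThreadingHTTPServer):
    request_queue_size = 128  # listen() backlog for many concurrent fetchers


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep per-request logging out of the measurement


def serve(port_queue):
    server = Server(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    port_queue.put(server.server_address[1])
    server.serve_forever()


def fetcher(port, stop, counter):
    """Background 'plain http' load: fetch the document until told to stop."""
    url = f"http://127.0.0.1:{port}/"
    while not stop.is_set():
        try:
            with urllib.request.urlopen(url) as resp:
                resp.read()
        except OSError:
            continue  # e.g. a connection reset under heavy load; retry
        with counter.get_lock():
            counter.value += 1


def cpu_eater(n):
    """Stand-in for the Lisp foreground job: pure computation, no I/O."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_mix(n_background, work=2_000_000):
    """Time cpu_eater(work) while n_background processes fetch over HTTP.

    Returns (foreground_wall_seconds, completed_background_fetches).
    """
    ctx = mp.get_context("fork")  # Unix-only, like the suite itself
    port_queue = ctx.Queue()
    server = ctx.Process(target=serve, args=(port_queue,), daemon=True)
    server.start()
    port = port_queue.get(timeout=10)

    stop = ctx.Event()
    counter = ctx.Value("i", 0)
    fetchers = [ctx.Process(target=fetcher, args=(port, stop, counter))
                for _ in range(n_background)]
    for p in fetchers:
        p.start()

    start = time.perf_counter()
    cpu_eater(work)
    wall = time.perf_counter() - start

    stop.set()
    for p in fetchers:
        p.join(timeout=10)
        if p.is_alive():
            p.terminate()
    server.terminate()
    server.join()
    return wall, counter.value


if __name__ == "__main__":
    for n in (0, 16):
        wall, fetches = run_mix(n)
        print(f"{n:3d} background fetchers: foreground {wall:.2f}s, {fetches} fetches")
```

Comparing the foreground wall time for `run_mix(0)`, `run_mix(16)`, and `run_mix(256)` on one machine mirrors the planned comparison; absolute numbers from a Python loop say nothing about the Lisp results.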

Posted: Mon Oct 03, 2005 11:02 am

Results added:
  • Full Pentium-M 1.3 GHz run
  • Pentium-4 2.8 Northwood
  • Pentium-4 2.8 Northwood with hyperthreading
  • Results for a working run of the 3800+ at 2.4 GHz in the Asus board with an SMP kernel


The Pentium-4 results are mostly as expected, but the hyperthreaded variant shows a huge advantage in the tests that don't involve any CPU eaters and just push huge numbers of data transactions through http in parallel (the "plain http" tests). You need the "wall clock" charts to see this, not the user CPU time.

If you mix CPU eaters and plain http, you get a big overall advantage from Hyperthreading, but only for the CPU eater: the background transactions are slower with Hyperthreading than without. The advantage for the foreground CPU eater is huge, though.

[Note that there is a typo in the chart labels at this time. The run with Hyperthreading off is the one labeled "1 CPU". Although it says "HT on", it has Hyperthreading turned off in the BIOS and runs the single-CPU kernel. I'll correct the label ASAP.]

Posted: Wed Oct 05, 2005 3:52 pm

New results for a dual 2.4 GHz Opteron with DDR400 in a Thunder K8W board.

New results for a 2.8 GHz Xeon with Hyperthreading (= 4 logical CPUs) are coming in. It looks very decent for some of the non-CPU-intensive workloads, like many simultaneous plain http connections.

Chart reorganization:
  • New reference machine is X2 3800+ SMP in the Asus board.
  • Omitted the incomplete runs of the 3800+ in the Asus board. I will get new results comparing memory bandwidth and timings soon; I have to do that in the DFI board, since the Asus board behaves too erratically when fiddling with the RAM.

Posted: Thu Oct 06, 2005 7:24 pm

I reorganized the naming of the machines so that they are easier to recognize and come out in a useful order.

The new reference machine is now the 3800+ at 2.4 GHz with slow RAM in the Asus board.

Over the next few days I will collect results that put this one in direct comparison with:
  • the same components in the DFI SLI-DR
  • reduce RAM timings, first 1T, then 2-2-2-5
  • increase RAM clock to 250 MHz at 2T 3-3-3-8
  • 250 MHz 1T 2.5-5-5-7 or whatever the TCCD comes up with
  • max out RAM speed
  • figure out which timing does what exactly
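The RAM timings in this list are given in memory-clock cycles, so their absolute latency depends on the clock. A minimal sketch of the conversion, assuming the usual CL-tRCD-tRP-tRAS order and 200 MHz as the memory clock of DDR400; `timings_ns` is a hypothetical helper:

```python
def timings_ns(mem_clock_mhz, timings):
    """Convert a DDR timing string such as "2-2-2-5" (memory-clock cycles)
    into nanoseconds at the given memory clock. The 1T/2T command rate is
    a separate setting and is not part of the string."""
    return tuple(float(c) * 1000 / mem_clock_mhz for c in timings.split("-"))


for clock, timings in [(200, "2-2-2-5"), (250, "3-3-3-8"), (250, "2.5-5-5-7")]:
    print(clock, "MHz", timings, "->", timings_ns(clock, timings))
```

By this arithmetic, 250 MHz at CL3 has a longer CAS latency in nanoseconds (12 ns) than 200 MHz at CL2 (10 ns), even though it offers more bandwidth.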

Keep in mind each run takes a day, so patience please :)

Posted: Tue Oct 11, 2005 4:27 pm

A lot of different timings for X2 3800+ @ 2.4 GHz and 200 MHz RAM coming in.

Short analysis: spend your money on things other than fancy RAM. Flowers to overclock your girlfriend have a higher chance of making your computer noticeably faster.

I will post a writeup with actual RAM recommendations soon.

Posted: Sat Oct 15, 2005 10:17 pm

Results coming in for the X2 at 2.5 GHz.

This includes a run with the TCCD RAM at 312 MHz, just to illustrate that this kind of RAM tuning is useless.

Posted: Wed Oct 19, 2005 8:10 pm

Cleaned up the graphs: removed redundant tests and stopped showing so many memory details for my A64s.

If you want to see all the detailed memory timing results, see the separate memory timings page.

Posted: Wed Nov 02, 2005 10:14 pm

Results for Opteron socket 939 at 2.9 GHz are in.

Posted: Thu Jun 22, 2006 7:52 pm

Results for an Opteron 875 x4 (4 dual-core CPUs = 8 cores), a Pentium-M at 1.73 GHz (Inspiron laptop), and a Sempron (to show how much the smaller cache hurts) are in.

Posted: Wed Jun 28, 2006 12:55 pm

Somebody was friendly enough to run my suite on a Conroe:

Userlevel CPU time:
http://www.cons.org/cracauer/crabench/core2.user.html

Chart including multithreading/multiprocessing wall clock times:
http://www.cons.org/cracauer/crabench/core2.wall.html

All times are GMT - 5 Hours.