KTE - Linux Cluster Research Report



ABSTRACT

This research was conducted to build and evaluate a clustering
environment running an Oracle database on the Linux operating system, as a
platform for the construction of a Virtual Reality Environment.
The research addresses the stability and performance of software
applications - MPI, LAM, Java and the Oracle database - installed on a
single machine versus a multiprocessor or high-speed networked machine,
with emphasis on a clustered computing environment.

The project was divided into phases that were dynamically modified to keep
pace with the fast-moving world of software development.

The first phase was to build and test a Linux cluster on the Red Hat 6.2
distribution, supported and certified by Oracle Corporation for use with
the latest database release (8.1.7).

The second phase was to configure and optimize a diskless Linux cluster
that would allow the computing resources of classroom workstations to be
used after operating hours without disruption to their existing operating
systems.

The third phase was to test parallel development tools (MPI, LAM, Java)
and run applications (Oracle) on the Linux cluster. In this phase the
research was extended to the SuSE 7.2 distribution, at that point the only
Linux platform certified by Oracle Corporation for the latest database
release (9i).

The fourth phase was to merge the existing Linux cluster with the
SHARC-net development, joining clusters at various academic institutions
into a super-cluster that will give participating researchers access to
the computing power of all connected Linux clusters.

The fifth phase was to design and engineer the construction of a Virtual
Reality Environment.

The sixth phase was to study the degree of accelerated learning this
environment provides.

The first three phases were successfully accomplished. Diskless cluster
systems allowing rapid construction of a Linux cluster with multiple nodes
were built on Red Hat 6.2 and SuSE 7.2. Performance tests were carried out
on the Linux cluster with numerous nodes (up to 28), and various
applications were installed and tested. Installation and testing of Oracle
Real Application Clusters was delayed and not completed, as research funds
were not sufficient to buy the required hardware.

The fourth phase is delayed until Sheridan College receives the needed
equipment and a gigabit optical network connection (ORION).

The fifth phase started with an evaluation of a Java-based Web
collaboratory system jointly created by WebWisdom.com and the Northeast
Parallel Architectures Center at Syracuse University. This freeware comes
with an unlimited license to use it in research projects, in academia and
education, and in corporate in-house projects, and it can be enhanced by
developing and linking additional collaborative modules.
http://www.collabworx.com/legacy/tango/

The sixth phase cannot be conducted until the development of the
fifth-phase project is completed.



Definitions and Terms

ORION - Ontario Research and Innovation Optical Network - a gigabit
optical network that connects universities, colleges, regional advanced
networks in Ontario, and public-sector research and education institutions
to CANET3 and the Internet.
ext2 - the standard Linux file system
reiserfs - a journaling file system for Linux
LVM - Logical Volume Management


LITERATURE REVIEW



METHODOLOGY AND PROCEDURES

In the first phase of the project, research was conducted to choose the
right tools for creating a cluster on the Red Hat 6.2 Linux distribution.
Four different sets of clustering tools were considered for construction
of the cluster:
  1. SCMS - Smile Cluster Management System 1.2.2
  2. OSCAR - Open Source Cluster Application Resources
  3. SCE - Scalable Cluster Environment
  4. Scyld Beowulf
Each of these tool sets was designed and tested on the Red Hat Linux
distribution. The Smile Cluster Management System 1.2.2 was chosen for the
following reasons:
  1. It is the most suitable for a diskless cluster.
  2. It has a fast, automated installation procedure.
  3. It provides an extensive set of cluster management tools, including
remote web administration.
  
Red Hat 6.2 was installed on a Dell Precision 420 MT workstation and
upgraded to the current kernel (2.4.6). SCMS was installed and configured
to support up to forty nodes. This machine is called the master, and it is
the only machine with files installed locally. Cluster nodes are booted
from a floppy disk containing a custom-made kernel image, and use a
directory structure created on the master (in the /tftpboot directory) via
the Network File System.
Corrections to the files created by SCMS were made in the /tftpboot
directory (/etc/hosts, /etc/hosts.equiv, /root/.rhosts and
/etc/pam.d/rlogin).
The cluster was initially tested using the master and four Dell 420 MT
workstations in the Convergence lab (s419). All machines are connected to
a twenty-four-port Cisco Catalyst 3500 Series XL Ethernet switch.
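The corrected host and trust files follow the standard layout; for
illustration only (the hostnames and node addresses shown here are
hypothetical, not the lab's actual entries), they contain entries such as:

    # /etc/hosts - name resolution for the master and each node
    192.168.1.1    master
    192.168.1.11   node01
    192.168.1.12   node02

    # /etc/hosts.equiv and /root/.rhosts - trust entries allowing
    # rsh-based cluster commands to run between members without passwords
    master
    node01
    node02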

 
In the second phase of the research, various cluster monitoring and
parallel processing tools were installed and tested. Small programs were
developed to test the parallel environment using MPICH. For benchmarking,
the ray-tracing suite POV-Ray was installed. The cluster was benchmarked
in room s421 using the master and twenty Dell 420 MT workstations
connected to five Cisco Catalyst 3500 Series XL Ethernet switches linked
by fiber optics, and in room s142 using the master and twenty-eight Dell
420 MT workstations connected to a single Cisco Catalyst 3500 Series XL
Ethernet switch. Monitoring tools were optimized and the scripts allowing
remote node rebooting were corrected.
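The small MPICH test programs mentioned above were of roughly the
following form; this listing is an illustrative sketch, not one of the
original project programs:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                /* start the MPI environment  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes  */
        MPI_Get_processor_name(name, &len);    /* node the process runs on   */

        printf("Hello from process %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and started with, for example, mpirun -np 8 hello,
such a program prints one line per process, confirming that jobs are
actually being scheduled across the cluster nodes.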

In the third phase, the SuSE 7.2 Linux distribution was installed on a
Dell Precision 420 MT workstation. There are not many tools for automated
creation of a cluster on the SuSE distribution, so all files and services
needed for cluster operation were created manually, as described below:
The Etherboot package was used to create a boot image supporting the 3Com
905 network card, for booting a diskless node from a floppy disk.
The IP address scheme was planned using one of the reserved Class C
networks, giving the possibility of having 244 workstations associated
with a single server. The master was assigned the IP address 192.168.1.2
(the address 192.168.1.1 was already used by the Red Hat cluster), and the
nodes were assigned addresses from 192.168.1.10 to 192.168.1.254.
The kernel image for the nodes was compiled with support for a root file
system on NFS and automatic kernel-level IP configuration.
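On a 2.4 kernel, that combination corresponds roughly to the following
configuration options (an illustrative excerpt, not the full configuration
used):

    CONFIG_IP_PNP=y          # kernel-level autoconfiguration of the IP address
    CONFIG_IP_PNP_DHCP=y     # obtain that address via DHCP at boot
    CONFIG_NFS_FS=y          # NFS client built into the kernel, not as a module
    CONFIG_ROOT_NFS=y        # allow the root file system to be mounted over NFS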
The DHCP server on the master was configured to allow dynamic registration
of any node in the chosen address range (the nodes' network card MAC
addresses were not used) and to point each node to the location of the
kernel image.
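A minimal sketch of the kind of dhcpd.conf entry this describes (the
kernel image name shown is a placeholder, not the project's actual file):

    subnet 192.168.1.0 netmask 255.255.255.0 {
        range dynamic-bootp 192.168.1.10 192.168.1.254;  # any node in this range may register
        next-server 192.168.1.2;                         # the master serves the image over TFTP
        filename "/tftpboot/vmlinuz.node";               # kernel image location (placeholder name)
    }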
The inetd service was configured to support the TFTP service for remote
installation of the Linux kernel image.
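Enabling TFTP in this way usually amounts to activating a single line in
/etc/inetd.conf of roughly the following form (the daemon path may differ
between distributions):

    tftp  dgram  udp  wait  root  /usr/sbin/in.tftpd  in.tftpd -s /tftpboot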
The /tftpboot directory structure was created, and a custom-made file
system for the nodes was built. The master distribution file system
occupies 5.9 GB; the node file system takes 101 MB for each node plus
1.1 GB of common files. Due to hard drive space restrictions, the file
system was populated for four nodes only.
The Network File System service on the master was configured in the
/etc/exports file, giving each node access to the appropriate directories.
The files /etc/hosts and /etc/hosts.equiv were edited to allow network
communication between the nodes and the master.
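The export entries follow the usual exports(5) syntax; the paths and
options below are illustrative, not the project's exact entries:

    # /etc/exports on the master
    /tftpboot/192.168.1.10   192.168.1.10(rw,no_root_squash)               # per-node root file system
    /tftpboot/common         192.168.1.0/255.255.255.0(ro,no_root_squash)  # files shared by all nodes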
The files /etc/pam.d/rlogin and /etc/pam.d/rsh on the master and on all
nodes were modified to relax security for the purpose of running parallel
programs and commands.
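A common form of this relaxation is to let rhosts-based trust alone
authenticate rlogin and rsh sessions; a sketch of such a PAM stack is
shown below (module names and paths vary between distributions, and this
is not necessarily the exact change made in the project):

    # /etc/pam.d/rlogin (illustrative)
    auth     sufficient   /lib/security/pam_rhosts_auth.so   # trust .rhosts/hosts.equiv alone
    auth     required     /lib/security/pam_unix.so          # otherwise fall back to a password
    account  required     /lib/security/pam_unix.so
    session  required     /lib/security/pam_unix.so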
The following clustering tools and parallel libraries were installed and
tested or evaluated:
1. bWatch - a cluster performance monitor
2. clusterit - a set of parallel commands
3. ganglia - a cluster reporting and monitoring toolkit
4. heartbeat - the heartbeat subsystem for High-Availability Linux
5. lam - Local Area Multicomputer
6. procstatd - a /proc monitoring daemon for Beowulf clusters
7. pvm - Parallel Virtual Machine
8. pvmpov - POV-Ray with PVM support
9. vacm - VA Cluster Manager
10. xmpi - a graphical user interface for MPI program development
11. xmtv - a graphics server for LAM/MPI
12. xpvm - a graphical console and monitor for PVM

 
Oracle 9i was installed; standard and limited clustered (*1) versions of
the database were created and initially tested for performance on the ext2
and reiserfs file systems.
The database was created first on the ext2 file system and then on the
reiserfs file system. The following was tested on both installations (a
sketch of the kind of commands involved follows the list):
1. Copying a single large (340 MB) file
2. Copying a directory containing many files
3. SQL SELECT statement - for database read performance
4. Import - for database read and write performance
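A rough sketch of such a test run (file names, credentials and the table
name below are placeholders, not the actual test objects):

    # 1-2: file copy tests, timed first on ext2 and then on reiserfs
    time cp bigfile.dbf /copytest/
    time cp -r datafiles/ /copytest/

    # 3: read performance - a full-table SELECT issued from SQL*Plus
    echo "set timing on
    select count(*) from test_table;" | sqlplus scott/tiger

    # 4: read/write performance - import of a previously exported dump
    imp system/manager file=test.dmp full=y log=import.log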

*1 Outside nodes were not participating in database data processing.
Oracle9i Enterprise Edition Release 9.0.1.0.0 - Production
With the Partitioning and Real Application Clusters options
JServer Release 9.0.1.0.0 - Production

RESEARCH AND FINDINGS



SUMMARY, CONCLUSIONS and RECOMMENDATIONS

Of the two Linux distributions the cluster was tested on, Red Hat 6.2 is
better suited for initial construction of the cluster, as many scripts and
programs for building a cluster have been developed exclusively on Red
Hat. On the other hand, a set of clustering tools for managing the cluster
is packaged right into the SuSE 7.2 distribution and is ready to use
immediately after Linux installation, while such tools for Red Hat must be
downloaded from the web and then compiled.
Because the distributions were installed on different hardware
configurations, there is no data to compare the influence of the
distribution on the performance of the cluster; both systems proved to be
very stable for the whole period of the tests performed.
As expected, the performance of the cluster system increases with the
number of nodes added. Programs run using MPICH, however, reached their
highest performance with up to 16 nodes in the cluster and showed a
significant loss beyond that point, while programs run using PVM showed a
progressive performance increase over the whole 28-node range tested.
Installation of the Oracle 9i database with Real Application Clusters was
not fully successful.
A full Oracle Real Application Clusters installation requires raw devices.
A raw device is simply another access method to disks connected to the
machine via IDE, SCSI or Fibre Channel (and other, less common methods);
no network is involved. Such a cluster needs a storage box that all
cluster nodes are connected to - not via the network (it is a storage box,
i.e. it contains disks; it is not a computer with a mainboard), but via
SCSI or Fibre Channel - and all nodes must have equal access to every part
of the disk system.
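For reference, on Linux a raw device is set up by binding one of the
/dev/raw character devices to a block device; a hypothetical binding for a
single shared partition might look like this (device names are
placeholders):

    raw /dev/raw/raw1 /dev/sda2   # bind raw device 1 to the shared partition
    raw -qa                       # list current raw device bindings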
The Oracle Real Application Clusters database was installed with raw
devices on the master computer. Such a database could be mounted as Real
Application Clusters only locally; it was not possible to mount it from
additional nodes, as the nodes could not get access to the raw devices.
The Reiser journaling file system slowed the performance of the database
when compared with the standard Linux ext2 file system, especially when
multiple read-write operations were performed.

Recommendations:
A shared storage device for a full installation of Oracle Real Application
Clusters is strongly recommended. More tests should be run to evaluate
file system performance. In addition to the file systems already tested,
the new Linux ext3 file system, IBM's clustering file system and SGI's XFS
should be examined.