Date: June 16, 2003 Author: Glen Corneau,
gcorneau@us.ibm.com
Disclaimer: This is a quick-and-dirty checklist for implementing GPFS 2.1 in an RSCT Peer Domain (rpd) environment. For full details, please see the proper documentation:
- RSCT Administration Guide, SA22-7889-02 (also in PDF format)
  http://www.ibm.com/servers/eserver/pseries/library/clusters/aix_guide.html
- GPFS Concepts, Planning and Installation Guide, GA22-7895-01 (only in PDF format)
  http://www-1.ibm.com/servers/eserver/pseries/library/gpfs.html#aix_clusters
- GPFS FAQ
  http://www.ibm.com/servers/eserver/pseries/software/sp/gpfs_faq.html
Implementation of GPFS 2.1 in an rpd environment
has two major steps (assuming you've got the RSCT and GPFS
code installed already):
- Implementing the RSCT Peer Domain
- Configuring the GPFS cluster and filesystem in that Peer
Domain.
The steps are shown by example on two p630 systems (cler01 and cler02) running AIX 5.1 ML4 and sharing an SSA disk subsystem.
RSCT Peer Domain
- Remote shell
You must have remote shell and remote copy capability between the nodes in the cluster, so go ahead and configure that here.
root@cler01 > cat /.rhosts
cler01.dfw.ibm.com root
cler02.dfw.ibm.com root
It's the same on cler02. You could also use OpenSSH (or technically, any method that uses standard rsh/rcp flags and syntax).
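Before moving on, it can be worth confirming that the remote shell actually works from each node to every other node. The following is a minimal sketch, not part of RSCT or GPFS: the function name and the REMOTE_SHELL variable are made up for illustration, and the check only proves that a connection can be made.

```shell
#!/bin/sh
# Sketch: confirm that a remote command can be run on every node.
# REMOTE_SHELL is a hypothetical override (not a GPFS or RSCT setting);
# it defaults to rsh, matching the /.rhosts setup above.
check_remote_shell() {
    failed=0
    for node in "$@"; do
        # Run a harmless command; a non-zero status means the
        # connection could not be made.
        if ${REMOTE_SHELL:-rsh} "$node" date >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it on each node as "check_remote_shell cler01 cler02". If you use OpenSSH instead, set REMOTE_SHELL=ssh first.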
- Prepare RSCT Security Keys
Execute the following command on each of the nodes to properly set up the RSCT security keys. You must list all the nodes in the domain! The documentation implies that you don't need to include the node you're running on; this is false.
root@cler01 > preprpnode cler01 cler02
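Since the same command has to be run on every node with the full node list, a small helper can print the commands so nothing is left out. This is only a sketch: the function name is hypothetical, and running preprpnode through rsh (rather than typing it on each node) is an assumption that relies on the remote shell setup from the previous step.

```shell
#!/bin/sh
# Sketch: print one preprpnode command per node in the domain.
# Every node receives the full node list, including itself.
# Wrapping the command in rsh is an assumption; the commands can
# just as well be typed by hand on each node.
preprpnode_commands() {
    for node in "$@"; do
        echo "rsh $node preprpnode $*"
    done
}
```

"preprpnode_commands cler01 cler02" prints two lines, one per node. Review them, then pipe the output to sh if you want to execute them.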
- Create the Peer Domain
Now that you've set up the proper security keys, you only need to run this command on one of the nodes. List all the nodes in the domain.
root@cler01 > mkrpdomain GPFS cler01 cler02
- Bring the Peer Domain Online
root@cler01 > startrpdomain GPFS
root@cler01 > lsrpdomain
Name Opstate        RSCTActiveVersion MixedVersions TSPort GSPort
GPFS Pending online 2.2.1.30          No            12347  12348
(now wait a minute, try again and you should see)
root@cler01 > lsrpdomain
Name Opstate        RSCTActiveVersion MixedVersions TSPort GSPort
GPFS Online         2.2.1.30          No            12347  12348
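The "wait a minute, try again" step can be scripted by polling lsrpdomain until the Opstate column reads Online. The sketch below assumes the whitespace-separated column layout shown in the output above; the function names and the LSRPDOMAIN and POLL_SECONDS variables are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/bin/sh
# Sketch: print the Opstate of peer domain $1 from lsrpdomain
# output read on stdin. Opstate may be two words ("Pending online"),
# so it is taken as everything between the name and the last four
# columns (RSCTActiveVersion, MixedVersions, TSPort, GSPort).
domain_opstate() {
    awk -v name="$1" '
        $1 == name {
            state = $2
            for (i = 3; i <= NF - 4; i++) state = state " " $i
            print state
        }'
}

# Poll until the domain is Online or the attempts run out.
# usage: wait_for_domain NAME [TRIES]
# LSRPDOMAIN and POLL_SECONDS are hypothetical overrides.
wait_for_domain() {
    name=$1
    tries=${2:-12}
    while [ "$tries" -gt 0 ]; do
        state=$(${LSRPDOMAIN:-lsrpdomain} | domain_opstate "$name")
        [ "$state" = "Online" ] && return 0
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-5}"
    done
    return 1
}
```

For example, "wait_for_domain GPFS && echo ready" checks up to 12 times, five seconds apart, before giving up.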
You've finished the first part, configuring the RSCT Peer Domain (that wasn't so bad, now was it?). Moving right along to...
Configuring GPFS
- Create the Cluster
You will choose the remote
shell and remote copy commands here, as well as define the
nodes in the cluster. The node list file should contain the
IP address/hostname of the interface you intend to use for
the GPFS token management traffic. The default remote
commands (rsh and rcp) do not need to be specified.
root@cler01 > cat /tmp/hosts
cler01.dfw.ibm.com
cler02.dfw.ibm.com
root@cler01 > mmcrcluster -t rpd -n /tmp/hosts -p cler01 -s cler02
Mon Jun 9 15:18:05 CDT 2003: mmcrcluster: Processing node cler01.dfw.ibm.com
Mon Jun 9 15:18:06 CDT 2003: mmcrcluster: Processing node cler02.dfw.ibm.com
mmcrcluster: Command successfully completed
mmcrcluster: Propagating the changes to all affected nodes.
This is an asynchronous process.
root@cler01 > mmlscluster
GPFS cluster information
========================
  Cluster id:  gpfs030609201804
  Remote shell command:  /usr/bin/rsh
  Remote file copy command:  /usr/bin/rcp

GPFS cluster data repository servers:
-------------------------------------
  Primary server:    cler01.dfw.ibm.com
  Secondary server:  cler02.dfw.ibm.com
root@cler01 >
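To double-check that the "-p" and "-s" choices took effect, the server names can be pulled out of the mmlscluster output. This is a rough sketch that assumes the "Primary server:" and "Secondary server:" labels appear exactly as in the output above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Primary or Secondary cluster data repository
# server from mmlscluster output read on stdin.
# usage: mmlscluster | repo_server Primary
repo_server() {
    awk -v role="$1" '
        BEGIN { pat = "^[ \t]*" role " server:" }
        $0 ~ pat {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label
            sub(/[ \t]+$/, "")          # drop trailing blanks
            print
        }'
}
```

On the example cluster, "mmlscluster | repo_server Secondary" should print cler02.dfw.ibm.com.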
- Create GPFS Nodeset
You can have multiple nodesets as subsets of the GPFS cluster. In a two-node cluster, it's going to be both nodes. The "-C nodesetid" is limited to 8 characters. The "-A" flag specifies that the GPFS daemons start automatically, and "-U yes" specifies that we want single-node quorum enabled (i.e. if we lose one node, the other node still has access to the GPFS filesystems). There are limitations to this mode (snq), so I suggest looking at the GPFS FAQ listed above.
You can also set the pagepool (GPFS pinned kernel memory cache) with the "-p" flag here if you want.
root@cler01 > mmconfig -a -A -C bigfsn -U yes
mmconfig: Command successfully complete
mmconfig: Propagating the changes to all affected nodes.
This is an asynchronous process.
root@cler01 >
- Create Disk Descriptor File
You need to create a file that lists the disks that are going to be part of the GPFS filesystem. The documentation discusses failure groups (basically, paths to a disk that comprise a single point of failure) and putting that information into the disk descriptor file. If you're not doing either data or metadata replication, GPFS-style, then this doesn't really come into play. Most customers have multiple paths to disk as well as highly available disk subsystems, making GPFS-style replication unnecessary.
root@cler01 > cat /tmp/disk
hdisk4:::dataAndMetadata:
hdisk5:::dataAndMetadata:
hdisk6:::dataAndMetadata:
hdisk7:::dataAndMetadata:
hdisk8:::dataAndMetadata:
hdisk9:::dataAndMetadata:
hdisk10:::dataAndMetadata:
hdisk11:::dataAndMetadata:
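With more than a handful of disks, typing the descriptor file by hand gets tedious. The sketch below generates lines in the same shape as the example above for a consecutive range of hdisk numbers. The function name is hypothetical, and the empty fields are simply left empty as in the example; check the GPFS documentation before filling them in.

```shell
#!/bin/sh
# Sketch: print one disk descriptor line per hdisk in a numeric range,
# copying the "hdiskN:::dataAndMetadata:" shape used above.
# usage: disk_descriptors FIRST LAST [USAGE]
disk_descriptors() {
    first=$1
    last=$2
    usage=${3:-dataAndMetadata}
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "hdisk$i:::$usage:"
        i=$((i + 1))
    done
}
```

"disk_descriptors 4 11 > /tmp/disk" would produce the eight-line file used in this example.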
- Prepare the disks.
The disks must have PVIDs; if they don't, you can always add them via "chdev -l hdiskX -a pv=yes". The mmcrlv command below will create the volume groups on the nodes, create the LVs and make sure they are imported to all systems in the cluster as necessary.
root @ cler01 => mmcrlv -y -F /tmp/disk
-----------------------------------------------------------------------
Step 0: Setting up environment.
-----------------------------------------------------------------------
Step 1: Making volume groups and logical volumes on the local node.
cler01.dfw.ibm.com: gpfs0vg
cler01.dfw.ibm.com: gpfs0lv
cler01.dfw.ibm.com: gpfs1vg
cler01.dfw.ibm.com: gpfs1lv
cler01.dfw.ibm.com: gpfs2vg
cler01.dfw.ibm.com: gpfs2lv
cler01.dfw.ibm.com: gpfs3vg
cler01.dfw.ibm.com: gpfs3lv
cler01.dfw.ibm.com: gpfs4vg
cler01.dfw.ibm.com: gpfs4lv
cler01.dfw.ibm.com: gpfs5vg
cler01.dfw.ibm.com: gpfs5lv
cler01.dfw.ibm.com: gpfs6vg
cler01.dfw.ibm.com: gpfs6lv
cler01.dfw.ibm.com: gpfs7vg
cler01.dfw.ibm.com: gpfs7lv
Writing new descriptor file for use by subsequent GPFS disk commands.
-----------------------------------------------------------------------
Logical volume(s) have now been created and recorded in
the GPFS cluster configuration file.
Beginning post-processing of created logical volume(s) . . .
Varying off volume groups on the local node.
Importing volume groups on any remote nodes.
cler02.dfw.ibm.com: gpfs0vg
cler02.dfw.ibm.com: gpfs1vg
cler02.dfw.ibm.com: gpfs2vg
cler02.dfw.ibm.com: gpfs3vg
cler02.dfw.ibm.com: gpfs4vg
cler02.dfw.ibm.com: gpfs5vg
cler02.dfw.ibm.com: gpfs6vg
cler02.dfw.ibm.com: gpfs7vg
Varying on volume groups on the local node.
Post-processing of created logical volume(s) has completed.
root @ cler01 =>
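The PVID requirement mentioned at the start of this step can be checked before running mmcrlv. The sketch below filters lspv output for disks that have no PVID yet. It assumes lspv prints the disk name in the first column and the literal word "none" in the second column for such disks; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list disks that have no PVID, reading lspv output on stdin.
# Assumes column 1 is the disk name and column 2 is the PVID,
# shown as "none" when the disk does not have one.
# usage: lspv | disks_without_pvid
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}
```

Each disk it prints is a candidate for the "chdev -l hdiskX -a pv=yes" command mentioned above.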
- Start the GPFS daemons
It's not clear from the documentation that this is the next step, but it is; otherwise the creation of the filesystem will fail.
root@cler01 > mmstartup -a
cler01.dfw.ibm.com: 0513-059 The mmfs Subsystem has been started. Subsystem PID is 28068.
cler02.dfw.ibm.com: 0513-059 The mmfs Subsystem has been started. Subsystem PID is 12510.
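Because the filesystem creation depends on the daemons being up, it may help to confirm that every node reported a start before continuing. This sketch scans saved mmstartup output for the 0513-059 message shown above. It assumes each node prints one "hostname: 0513-059 ..." line, and both function names are hypothetical. It only shows that the subsystem was started, not that GPFS is fully ready.

```shell
#!/bin/sh
# Sketch: print the nodes that reported message 0513-059
# ("The mmfs Subsystem has been started"), reading mmstartup
# output on stdin.
started_nodes() {
    awk -F': ' '/0513-059/ { print $1 }'
}

# Succeed only if every node named on the command line reported a start.
# usage: mmstartup -a | all_started cler01.dfw.ibm.com cler02.dfw.ibm.com
all_started() {
    started=$(started_nodes)
    for node in "$@"; do
        echo "$started" | grep -Fqx "$node" || return 1
    done
}
```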
- Create the GPFS filesystem
Look at the documentation to see the variety of flags. The basic information is the filesystem name/mountpoint ("/gpfs/bigfs"), the device ("biglv"), the disk descriptor file ("/tmp/disk"), the nodeset name ("bigfsn"), whether or not the filesystem automatically mounts with the GPFS daemon start ("-A yes") and the filesystem block size ("-B 16K"). The last option ("-v no") is really only used if you're creating a GPFS filesystem in a cluster where disks have previously been used for other GPFS filesystems but aren't now (and are still defined to GPFS).
root@cler01 > mmcrfs /gpfs/bigfs biglv -F /tmp/disk -C bigfsn -A yes -B 16K -v no

The following disks of biglv will be formatted on node cler01.dfw.ibm.com:
    gpfs0lv: size 2199552 KB
    gpfs1lv: size 2199552 KB
    gpfs2lv: size 2199552 KB
    gpfs3lv: size 2199552 KB
    gpfs4lv: size 2199552 KB
    gpfs5lv: size 2199552 KB
    gpfs6lv: size 2199552 KB
    gpfs7lv: size 2199552 KB
Formatting file system ...
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
  56 % complete on Mon Jun 9 16:06:21 2003
 100 % complete on Mon Jun 9 16:06:25 2003
Flushing Allocation Maps
Completed creation of file system /dev/biglv.
mmcrfs: Propagating the changes to all affected nodes.
This is an asynchronous process.
root @ cler01 => mmlsgpfsdisk
File system   Disk name   Primary node          Backup node
---------------------------------------------------------------------------
biglv         gpfs0lv     (directly attached)
biglv         gpfs1lv     (directly attached)
biglv         gpfs2lv     (directly attached)
biglv         gpfs3lv     (directly attached)
biglv         gpfs4lv     (directly attached)
biglv         gpfs5lv     (directly attached)
biglv         gpfs6lv     (directly attached)
biglv         gpfs7lv     (directly attached)
root @ cler01 =>
- Mount the filesystem
Do this on both nodes in
the cluster and "voila!" you're done!
root@cler01 > mount /gpfs/bigfs
root@cler01 > df -k /gpfs/bigfs
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/biglv       17596416  17543744    1%       10     1% /gpfs/bigfs
root@cler01 >
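As a quick sanity check, the size that df reports should line up with the disks that mmcrfs formatted: eight logical volumes of 2199552 KB each add up to 17596416 KB, which is the 1024-blocks figure above. The sketch below does that sum from saved mmcrfs output. The function name is hypothetical, and it assumes the "size N KB" wording shown in the mmcrfs output earlier.

```shell
#!/bin/sh
# Sketch: add up every "size N KB" value found in mmcrfs output
# read on stdin and print the total in KB.
total_disk_kb() {
    awk '{
             for (i = 1; i < NF; i++)
                 if ($i == "size") sum += $(i + 1)
         }
         END { printf "%d\n", sum }'
}
```

Save the mmcrfs output to a file, feed it to total_disk_kb, and compare the result with the 1024-blocks column of "df -k /gpfs/bigfs".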