GPFS 2.1 Quick Configuration Guide

Date: June 16, 2003
Author: Glen Corneau, gcorneau@us.ibm.com

Disclaimer: This is a quick-and-dirty checklist for implementing a simple GPFS 2.1 configuration in an RSCT Peer Domain (rpd) environment. For full details, please see the proper documentation:

  • RSCT Administration Guide, SA22-7889-02 (also in PDF format)
    http://www.ibm.com/servers/eserver/pseries/library/clusters/aix_guide.html
  • GPFS Concepts, Planning and Installation Guide, GA22-7895-01 (only in PDF format)
    http://www-1.ibm.com/servers/eserver/pseries/library/gpfs.html#aix_clusters
  • GPFS FAQ
    http://www.ibm.com/servers/eserver/pseries/software/sp/gpfs_faq.html

Implementation of GPFS 2.1 in an rpd environment has two major steps (assuming you've got the RSCT and GPFS code installed already):
  1. Implementing the RSCT Peer Domain
  2. Configuring the GPFS cluster and filesystem in that Peer Domain.
These steps are shown by example on two p630 systems (cler01 and cler02) running AIX 5.1 ML4 and sharing an SSA disk subsystem.

RSCT Peer Domain
  1. Remote shell
    You must have remote shell and remote copy capability between the nodes in the cluster, so go ahead and configure that first.

    root@cler01 > cat /.rhosts
    cler01.dfw.ibm.com root
    cler02.dfw.ibm.com root

    It's the same on cler02. You could also use OpenSSH (or technically, any method that uses standard rsh/rcp flags and syntax).
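Before moving on, it can help to confirm that the remote shell actually works from each node to every other node. The following is only a minimal sketch and not part of the original procedure: check_nodes is a hypothetical helper name, and RSH is an assumed variable that defaults to rsh but could be pointed at ssh or another compatible command.

```shell
# Hypothetical helper: run a trivial command on every node through the
# remote shell named in RSH (assumed default: rsh) and report the result.
RSH=${RSH:-rsh}

check_nodes() {
    failed=0
    for node in "$@"; do
        if $RSH "$node" true >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return $failed
}

# Example (run it on each node in turn):
#   check_nodes cler01.dfw.ibm.com cler02.dfw.ibm.com
```

A FAILED line usually points back at the .rhosts file (or the equivalent key setup if you chose OpenSSH) on the node named in that line.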

  2. Prepare RSCT Security Keys
    Execute the following command on each of the nodes to properly set up the RSCT security keys.
    You must list all the nodes in the domain! The documentation implies that you don't need to include the node you're running on; this is false.

    root@cler01 > preprpnode cler01 cler02
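Since the same command has to be run on every node with the full node list, a small generator can reduce typing mistakes. This is a sketch under assumptions: preprpnode_cmds is a hypothetical helper, RSH is an assumed variable defaulting to rsh, and it presumes preprpnode is on the PATH of a non-interactive remote shell. It only prints the commands; it does not run them.

```shell
# Hypothetical helper: print (not run) the preprpnode command for each node.
# Every node receives the full node list, including itself.
RSH=${RSH:-rsh}

preprpnode_cmds() {
    for node in "$@"; do
        echo "$RSH $node preprpnode $*"
    done
}

# Example:
#   preprpnode_cmds cler01 cler02
```

Review the printed lines and run each one by hand once they look right.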

  3. Create the Peer Domain
    You only need to run this command on one of the nodes, now that you've set up the proper security keys.
    List all the nodes in the domain.

    root@cler01 > mkrpdomain GPFS cler01 cler02

  4. Bring the Peer Domain Online

    root@cler01 > startrpdomain GPFS
    root@cler01 > lsrpdomain
    Name  Opstate        RSCTActiveVersion MixedVersions TSPort GSPort
    GPFS  Pending online 2.2.1.30          No            12347  12348

    (now wait a minute, try again and you should see)
    root@cler01 > lsrpdomain
    Name  Opstate        RSCTActiveVersion MixedVersions TSPort GSPort
    GPFS  Online         2.2.1.30          No            12347  12348
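The "wait a minute and try again" step can be scripted by checking the Opstate column of the lsrpdomain output. The sketch below assumes the column layout shown above (domain name in the first field, Opstate starting in the second); rpd_is_online is a hypothetical helper name.

```shell
# Hypothetical helper: read lsrpdomain output on stdin and succeed only if
# the named domain's Opstate column reads "Online".
rpd_is_online() {
    awk -v domain="$1" '
        $1 == domain && $2 == "Online" { online = 1 }
        END { exit !online }
    '
}

# Example polling loop (the 10-second interval is an arbitrary choice):
#   until lsrpdomain | rpd_is_online GPFS; do sleep 10; done
```

A state of "Pending online" puts "Pending" in the second field, so the helper keeps reporting failure until the domain is fully online.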

You've finished the first part, configuring the RSCT Peer Domain (that wasn't that bad now was it?).
Moving right along to....

Configuring GPFS

  1. Create the Cluster
    You will choose the remote shell and remote copy commands here, as well as define the nodes in the cluster. The node list file should contain the IP address/hostname of the interface you intend to use for the GPFS token management traffic. The default remote commands (rsh and rcp) do not need to be specified.

    root@cler01 > cat /tmp/hosts
    cler01.dfw.ibm.com
    cler02.dfw.ibm.com
    root@cler01 > mmcrcluster -t rpd -n /tmp/hosts -p cler01 -s cler02
    Mon Jun 9 15:18:05 CDT 2003: mmcrcluster: Processing node cler01.dfw.ibm.com
    Mon Jun 9 15:18:06 CDT 2003: mmcrcluster: Processing node cler02.dfw.ibm.com
    mmcrcluster: Command successfully completed
    mmcrcluster: Propagating the changes to all affected nodes.
    This is an asynchronous process.
    root@cler01 > mmlscluster
    GPFS cluster information
    ========================
      Cluster id: gpfs030609201804
      Remote shell command:      /usr/bin/rsh
      Remote file copy command:  /usr/bin/rcp

    GPFS cluster data repository servers:
    -------------------------------------
      Primary server:    cler01.dfw.ibm.com
      Secondary server:  cler02.dfw.ibm.com
    root@cler01 >

  2. Create GPFS Nodeset
    You can have multiple nodesets as subsets of the GPFS cluster. In a two-node cluster, the nodeset is going to be both nodes. The "-C nodesetid" value is limited to 8 characters. The "-A" flag specifies that the GPFS daemons start automatically, and "-U yes" specifies that we want single-node quorum enabled (i.e. if we lose one node, the other node still has access to the GPFS filesystems). There are limitations to this mode (snq), so I suggest looking at the GPFS FAQ listed above. You can also set the pagepool (GPFS pinned kernel memory cache) with the "-p" flag here if you want.

    root@cler01 > mmconfig -a -A -C bigfsn -U yes
    mmconfig: Command successfully complete
    mmconfig: Propagating the changes to all affected nodes.
    This is an asynchronous process.
    root@cler01 >

  3. Create Disk Descriptor File
    You need to create a file that lists the disks that are going to be part of the GPFS filesystem. The documentation discusses failure groups (basically, paths to a disk that comprise a single point of failure) and putting that information into the disk descriptor file. If you're not doing GPFS-style replication of either data or metadata, then this doesn't really come into play. Most customers have multiple paths to disk as well as highly available disk subsystems, making GPFS-style replication unnecessary.

    root@cler01 > cat /tmp/disk
    hdisk4:::dataAndMetadata:
    hdisk5:::dataAndMetadata:
    hdisk6:::dataAndMetadata:
    hdisk7:::dataAndMetadata:
    hdisk8:::dataAndMetadata:
    hdisk9:::dataAndMetadata:
    hdisk10:::dataAndMetadata:
    hdisk11:::dataAndMetadata:
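With more than a handful of disks, the descriptor file is easier to generate than to type. The following sketch reproduces exactly the line format used in the example file above (disk name and usage filled in, the other fields left empty); make_descriptors is a hypothetical helper name.

```shell
# Hypothetical helper: print one descriptor line per disk, in the same
# form as the example file (only the disk name and usage fields filled in).
make_descriptors() {
    for disk in "$@"; do
        echo "${disk}:::dataAndMetadata:"
    done
}

# Example:
#   make_descriptors hdisk4 hdisk5 hdisk6 hdisk7 > /tmp/disk
```

If you do use failure groups or GPFS-style replication, the empty fields need real values, so consult the documentation rather than this shortcut.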

  4. Prepare the Disks
    The disks must have PVIDs; if they don't, you can always add them via "chdev -l hdiskX -a pv=yes". The mmcrlv command below will create the volume groups and the LVs, and make sure they are imported to all systems in the cluster as necessary.

    root @ cler01 => mmcrlv -y -F /tmp/disk

    -----------------------------------------------------------------------
    Step 0: Setting up environment.

    -----------------------------------------------------------------------
    Step 1: Making volume groups and logical volumes on the local node.

    cler01.dfw.ibm.com: gpfs0vg
    cler01.dfw.ibm.com: gpfs0lv
    cler01.dfw.ibm.com: gpfs1vg
    cler01.dfw.ibm.com: gpfs1lv
    cler01.dfw.ibm.com: gpfs2vg
    cler01.dfw.ibm.com: gpfs2lv
    cler01.dfw.ibm.com: gpfs3vg
    cler01.dfw.ibm.com: gpfs3lv
    cler01.dfw.ibm.com: gpfs4vg
    cler01.dfw.ibm.com: gpfs4lv
    cler01.dfw.ibm.com: gpfs5vg
    cler01.dfw.ibm.com: gpfs5lv
    cler01.dfw.ibm.com: gpfs6vg
    cler01.dfw.ibm.com: gpfs6lv
    cler01.dfw.ibm.com: gpfs7vg
    cler01.dfw.ibm.com: gpfs7lv

    Writing new descriptor file for use by subsequent GPFS disk commands.

    -----------------------------------------------------------------------

    Logical volume(s) have now been created and recorded in the
    GPFS cluster configuration file.

    Beginning post-processing of created logical volume(s) . . .

    Varying off volume groups on the local node.
    Importing volume groups on any remote nodes.
    cler02.dfw.ibm.com: gpfs0vg
    cler02.dfw.ibm.com: gpfs1vg
    cler02.dfw.ibm.com: gpfs2vg
    cler02.dfw.ibm.com: gpfs3vg
    cler02.dfw.ibm.com: gpfs4vg
    cler02.dfw.ibm.com: gpfs5vg
    cler02.dfw.ibm.com: gpfs6vg
    cler02.dfw.ibm.com: gpfs7vg
    Varying on volume groups on the local node.

    Post-processing of created logical volume(s) has completed.

    root @ cler01 =>
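The PVID requirement mentioned at the start of this step can be checked before running mmcrlv. The sketch below assumes that lspv prints the disk name in the first field and the word "none" in the second field for a disk without a PVID; disks_without_pvid is a hypothetical helper name.

```shell
# Hypothetical helper: read lspv output on stdin and print the names of
# disks whose PVID column (assumed to be the second field) is "none".
disks_without_pvid() {
    awk '$2 == "none" { print $1 }'
}

# Example: list the disks that still need a PVID.
#   lspv | disks_without_pvid
```

Compare the result against your disk descriptor file and run the chdev command only on the disks you intend to give to GPFS.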

  5. Start the GPFS daemons
    The documentation doesn't make it clear that this is the next step, but it is. Otherwise, the creation of the filesystem will fail.

    root@cler01 > mmstartup -a
    cler01.dfw.ibm.com: 0513-059 The mmfs Subsystem has been started. Subsystem PID is 28068.
    cler02.dfw.ibm.com: 0513-059 The mmfs Subsystem has been started. Subsystem PID is 12510.

  6. Create the GPFS filesystem
    See the GPFS documentation for the full variety of flags. The basic information is the filesystem name/mountpoint ("/gpfs/bigfs"), the device ("biglv"), the disk descriptor file ("/tmp/disk"), the nodeset name ("bigfsn"), whether or not the filesystem automatically mounts when the GPFS daemon starts ("-A yes") and the filesystem block size ("-B 16K"). The last option ("-v no") is really only needed if you're creating a GPFS filesystem in a cluster where the disks were previously used for other GPFS filesystems but aren't now (and are still defined to GPFS).

    root@cler01 > mmcrfs /gpfs/bigfs biglv -F /tmp/disk -C bigfsn -A yes -B 16K -v no

    The following disks of biglv will be formatted on node cler01.dfw.ibm.com:
    gpfs0lv: size 2199552 KB
    gpfs1lv: size 2199552 KB
    gpfs2lv: size 2199552 KB
    gpfs3lv: size 2199552 KB
    gpfs4lv: size 2199552 KB
    gpfs5lv: size 2199552 KB
    gpfs6lv: size 2199552 KB
    gpfs7lv: size 2199552 KB
    Formatting file system ...
    Creating Inode File
    Creating Allocation Maps
    Clearing Inode Allocation Map
    Clearing Block Allocation Map
    56 % complete on Mon Jun 9 16:06:21 2003
    100 % complete on Mon Jun 9 16:06:25 2003
    Flushing Allocation Maps
    Completed creation of file system /dev/biglv.
    mmcrfs: Propagating the changes to all affected nodes.
    This is an asynchronous process.
    root @ cler01 => mmlsgpfsdisk

     File system   Disk name    Primary node           Backup node
    ---------------------------------------------------------------------------
     biglv         gpfs0lv      (directly attached)
     biglv         gpfs1lv      (directly attached)
     biglv         gpfs2lv      (directly attached)
     biglv         gpfs3lv      (directly attached)
     biglv         gpfs4lv      (directly attached)
     biglv         gpfs5lv      (directly attached)
     biglv         gpfs6lv      (directly attached)
     biglv         gpfs7lv      (directly attached)

    root @ cler01 =>

  7. Mount the filesystem
    Do this on both nodes in the cluster and "voila!" you're done!

    root@cler01 > mount /gpfs/bigfs
    root@cler01 > df -k /gpfs/bigfs
    Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
    /dev/biglv       17596416  17543744    1%      10     1%  /gpfs/bigfs
    root@cler01 >
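As a sanity check, the total reported by df should equal the sum of the logical volume sizes printed by mmcrfs: 8 x 2199552 KB = 17596416 KB, which matches the output above. A minimal sketch of that arithmetic, assuming the "name: size N KB" line format shown in the mmcrfs output (sum_lv_sizes is a hypothetical helper name):

```shell
# Hypothetical helper: read saved mmcrfs output on stdin and print the
# sum, in KB, of every "<name>: size <N> KB" line.
sum_lv_sizes() {
    awk '$2 == "size" && $4 == "KB" { total += $3 }
         END { printf "%d\n", total }'
}

# Example, with the mmcrfs output saved to a file of your choosing:
#   sum_lv_sizes < /tmp/mmcrfs.out
```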