[Opensource] how to set up a lab of linux boxen
Glen Turner
gdt at gdt.id.au
Wed Jun 11 12:52:34 EST 2008
victor rajewski wrote:
> I'm toying with the idea of setting up a lab full of linux boxes. It
> would be really nice to be able to use something like Ghost Solution
> Suite to handle imaging each machine. Has anyone done anything like
> this? GSS _can_ handle linux partitions, but I think it does it on a
> block-by-block basis, and won't change individual files across the
> images, so it can't change the hostname - each imaged machine will
> have the same hostname as the source image (also ssh keys and maybe
> other crypto stuff would need to be changed). There are some apps that
> seem to be able to handle this - FOG, G4L (Ghost for Linux) and
> clonezilla - has anyone used any of these? Does anyone have any other
> solutions to this problem? Is anyone running a lab full of linux
> machines?
Unix has a long history in university computing labs and Linux
builds on that. Linux is the OS of choice for huge deployments.
As a result, there are tools which make it easy to run a few
hundred machines and not too hard to run a few hundred thousand
machines.
The way that is done is very different to the way you run labs
of Windows machines. The main difference is that you don't use
disk imaging; instead you automate the installation of the operating
system and use automated package managers, automated configuration
managers and the built-in authentication and sharing tools.
INSTALLATION
Red Hat and Ubuntu have PXE-based automated installers. You boot
the machine, the BIOS runs PXE, the DHCP server issues an IP address
and an image to load using TFTP, that image is the installer, and it
installs Linux to the disk from an HTTP server on your local network.
Installation takes about 20 minutes and can be run in parallel across
a lab full of new machines, simply by turning them all on at the same
time.
For more info Google: redhat pxe kickstart
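As a rough sketch, the kickstart file behind such an install might
look like the following (the URL, timezone, partitioning and
password line are placeholders for whatever your site uses, not
recommendations):

    install
    url --url http://install.example.school/centos/os/i386
    lang en_AU.UTF-8
    keyboard us
    timezone Australia/Adelaide
    rootpw --iscrypted ...paste a crypted root password hash here...
    clearpart --all --initlabel
    autopart
    reboot

    %packages --nobase
    @core

    %post
    yum -y update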
PACKAGE MANAGER
Both Ubuntu and Red Hat use package managers to control the operating
system's files. This varies greatly from Windows, where each application
is responsible for its own installation and update.
The initial PXE installation will allow you to list packages to install.
Rather than do that, use a minimal installation (so the same installer
script can be used for all machines at the school).
Packages have dependencies -- other packages which must be installed
before this package. You can use this mechanism to control the
software load on a machine: create a new, empty package whose
dependencies are the list of packages you want loaded.
yum install exampleschool-typicallab
will then install all of those dependencies -- that is, the software
load for that class of machine.
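Such an empty "meta" package is just an RPM spec file with no files
in it. A sketch, with made-up package names as the dependencies:

    Name:      exampleschool-typicallab
    Version:   1.0
    Release:   1
    Summary:   Software load for a typical Example School lab machine
    License:   GPL
    Group:     Applications/System
    BuildArch: noarch
    Requires:  openoffice.org, firefox, gimp, inkscape

    %description
    Empty package whose dependencies pull in the lab software load.

    %files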
The PXE installer script will usually allow arbitrary commands to be
run after installation. This often looks like:
yum -y update yum
yum -y update
yum -y install exampleschool-basics
yum -y install exampleschool-typicallab
For more information see Marc Merlin's presentation at linux.conf.au
2004. Marc was a sysadmin at Google and knows something about
sysadmining tens of thousands of machines.
CONFIGURATION MANAGER
Now that you have the software loaded, you need to configure it.
The two popular configuration managers are puppet and cfengine.
Puppet is the best choice for a new installation.
Put simply, you place each machine into a class, then give
configurations that apply across the entire class. So making
a change across the 250 machines which AARNet controls takes
me about, oh, five seconds.
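For a flavour of what that looks like, here is a small made-up
fragment of a puppet manifest (the class name, file contents and
packages are only examples):

    class labmachine {
        package { "rsync": ensure => installed }

        file { "/etc/motd":
            content => "Example School lab machine\n",
            owner   => "root",
            group   => "root",
            mode    => "644",
        }
    }

    node default {
        include labmachine
    }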
It is traditional to use Subversion to control the Puppet
master configurations. This allows errors to be recovered from
easily (you roll back to the previous revision), automatically
records the date and time of each change, and gives you all the
add-on tools for Subversion (such as viewvc for web access, and
e-mails to a mailing list for each change).
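Day to day that workflow is nothing more than (assuming the
manifests live in a Subversion working copy -- the path and commit
message below are made up):

    cd ~/puppet-config
    vi manifests/site.pp
    svn diff                  # review the change before it goes live
    svn commit -m "Point the year 12 labs at the new print server"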
The last step of the PXE installation script runs puppet
or cfengine to configure the machine. When the machine
reboots after installation it is a completely updated,
completely configured, ready-to-go machine. Note that
none of the installed software actually runs during the
installation, so there's no chance for the machine to be
subverted from the Internet whilst the software is being
installed and configured.
AUTHENTICATION
Now you have machines, you want people to be able to use them.
You could simply use Puppet to control /etc/passwd, /etc/group
and friends, but that would quickly become painful if you
have a whole school of accounts. Usually you use cfengine/puppet
to set up a minimal set of local users (such as root, so you
can log into the machine after you've taken it apart on the
workbench).
The usual way is to use a remote authentication protocol.
Unix has a choice: LDAP or Kerberos. LDAP is a directory
which can be used to authenticate users; Kerberos is single
sign-on. You can actually use both simultaneously.
Windows Active Directory can act as both a Kerberos
and an LDAP server for Linux systems, but you can log in
to a Windows domain from Linux directly and save yourself
the configuration nightmare on the Windows side.
It is traditional in a lab environment to place every
user into the UNIX group "users" and to place each student
in a subject into a group named after the subject. This allows
teachers and students within a subject to share files via an
NFS directory mounted read-write (eg, /srv/y12-english with
owner "teachers", group "y12-english" and mode rwxr-----, so
the owner can write and group members can read).
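Creating such a directory on the file server is a few commands; a
sketch, assuming the owner and group already exist (note that a
directory needs the group execute bit before group members can
enter it, hence 0750 rather than 0740):

    mkdir -p /srv/y12-english
    chown teachers:y12-english /srv/y12-english
    chmod 0750 /srv/y12-english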
Linux also supports file quotas to control disk usage
per user.
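For example, once quotas are enabled on the filesystem holding the
home directories, a per-user limit can be set with something like
(the username and limits are illustrative; blocks are 1KB):

    setquota -u jsmith 1000000 1100000 0 0 /home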
LDAP
There are two choices: OpenLDAP and Fedora Directory Server.
I'd suggest FDS for a school.
http://directory.fedoraproject.org/
You'll also need a schema -- a directory design. See
http://middleware.internet2.edu/dir/docs/ldap-recipe.htm
You typically set up authentication so that it checks
for local users first, then for remote users. This makes
no difference for lab machines (since the Puppet-controlled
/etc/passwd will only contain a password for root) but
is very useful for laptops.
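That ordering is expressed in /etc/nsswitch.conf, roughly:

    passwd: files ldap
    shadow: files ldap
    group:  files ldap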
KERBEROS
There are umpteen documents on setting up a Kerberos
Key Distribution Center (KDC). It's not hard. The main
downside of Kerberos is that applications need to
be programmed to understand Kerberos, but most applications
on Linux already have this done.
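For the lab machines themselves the client configuration is small;
a sketch of /etc/krb5.conf, assuming a realm called EXAMPLE.SCHOOL
and a KDC at kdc.example.school:

    [libdefaults]
        default_realm = EXAMPLE.SCHOOL

    [realms]
        EXAMPLE.SCHOOL = {
            kdc = kdc.example.school
            admin_server = kdc.example.school
        }

    [domain_realm]
        .example.school = EXAMPLE.SCHOOL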
SHARE POINTS
When it comes to user-owned files you have two choices:
- people keep files on the machine where they are created
- people keep files on a server
Differing choices are correct for differing classes of machine
(use a server for a lab, use local disk for a laptop).
Unixen usually use NFS for keeping remote files. These days,
run NFS version 3 on both the client and the server (that is,
explicitly deny use of NFSv2). If you have a complex network
(such as one NFS server serving two sites) then use NFS over
TCP.
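Concretely, the server's /etc/exports and the matching client
fstab line might look like this (the subnet and hostname are
placeholders):

    # server: /etc/exports
    /srv    192.168.1.0/24(rw,sync,root_squash)

    # client: /etc/fstab
    fileserver.example.school:/srv  /srv  nfs  vers=3,proto=tcp,hard,intr  0 0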
autofs or pam_mount can be used to automatically mount a
user's files as they log in.
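A sketch of the autofs half of that, assuming home directories
live under /export/home on fileserver.example.school:

    # /etc/auto.master
    /home    /etc/auto.home    --timeout=300

    # /etc/auto.home
    *    -fstype=nfs,vers=3,proto=tcp    fileserver.example.school:/export/home/&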
Linux can also mount files from a Windows server. Again,
pam_mount can be used to do this automatically for each
user as they log in.
Teachers might wish to share files from the server. The
way to do this is to NFS-mount /srv from the
server and make a directory on the server for each group
of files. Use Unix group permissions to control who can
write the files, so that teachers can copy files up themselves
rather than come looking for you. Running NFS and Samba
(which looks like a Windows share) against the same directory
on the server is fine.
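A sketch of the Samba side of that (the share name and group are
examples; the NFS export of /srv was shown above):

    # smb.conf on the server -- the same tree as a Windows share;
    # only members of the teachers group may use it, read-write
    [shared]
        path = /srv
        valid users = @teachers
        read only = no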
In general "share" in a directory name indicates a point
where an NFS share can be mounted read-only. You might
distribute read-only resources from /usr/local/share
(eg, school forms, SVG of school logo, etc).
REMOTE ACCESS
Unix uses SSH for remote access to machines. In a lab
scenario you don't want the average user to be able to
SSH to the machine. So you limit usage of the SSH server
to members of the wheel group (ie, system administrators).
You may also restrict usage to public keys rather than
passwords and set the public key directory to somewhere
users cannot update but puppet/cfengine can.
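The relevant part of /etc/ssh/sshd_config (pushed out by puppet, of
course) might read:

    # only system administrators may log in remotely
    AllowGroups wheel

    # keys only, no passwords
    PasswordAuthentication no
    PubkeyAuthentication yes

    # keys live where puppet writes them, not in users' home directories
    AuthorizedKeysFile /etc/ssh/authorized_keys/%u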
Then system administrators can log in to the lab machines
remotely. Better still, if they use an ssh-agent at their
end they can log in without typing a user id and password each time.
This makes it very easy to log into a machine or run a
command against a whole lab of machines, although
there's really nothing you can do that way that you can't
do better via puppet. However, you do want to be able to
SSH in as this is the simplest way to look at hardware
problems (eg, run a SMART report to check the disk).
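For instance, a quick disk health check across a lab from your own
workstation might be (the hostnames and device name are assumptions
about your naming scheme):

    for n in $(seq -w 1 30); do
        ssh root@lab-pc-$n.example.school smartctl -H /dev/sda
    done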
SERVICES
The UNIX approach to e-mail is to provide it from a server.
Clients then connect with the server and store nothing locally.
So providing e-mail to lab machines comes down to configuring
IMAP-SSL and SMTP Submission on your mail server then pushing
a configuration to use that into users' directories (which are
on the NFS server, so simply copy the files).
The same holds for Instant Messaging, etc.
For Firefox you might want to pre-install some Bookmarks
and some extensions (such as AdBlock Plus).
CAPACITY PLANNING AND ANTICIPATING TROUBLE
Linux logs a lot of data. This goes to disk by default.
Change syslog on the lab machines to send this to a
central logger running syslog-ng, which places each
machine's logs into its own directory. You can make
these directories web browsable so you can easily
remotely look at the logs. Logwatch can be used to
extract the interesting information from the logs
for a quick look each day.
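The lab side is one extra line in /etc/syslog.conf, and on the
central box a syslog-ng fragment files each host separately; a
sketch (the loghost name is a placeholder):

    # lab machine /etc/syslog.conf -- forward everything to the loghost
    *.*                             @loghost.example.school

    # loghost /etc/syslog-ng/syslog-ng.conf
    source s_net { udp(ip(0.0.0.0) port(514)); };
    destination d_perhost {
        file("/var/log/hosts/$HOST/messages" create_dirs(yes));
    };
    log { source(s_net); destination(d_perhost); };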
Linux supports an SNMP daemon. Configure this to
respond to queries from an SNMP network manager
(such as one running the Cacti SNMP poller and grapher). This allows
CPU usage, memory usage, disk usage, logged in users,
run queue lengths, disk error counters, network
counters and so on to be plotted. This makes it
really, really easy to have a history of a machine
and to readily compare it with similar machines
and to answer capacity planning questions (eg, are
applications running satisfactorily, or is more RAM
or a faster CPU desirable?).
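On the lab machines /etc/snmp/snmpd.conf can be very small; a
sketch (the community string and the manager's address are
placeholders you must choose yourself):

    # allow read-only queries from the monitoring host only
    rocommunity  labstats  192.168.1.10
    syslocation  Room 12, Example School
    syscontact   sysadmin@example.school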
You can also use Nagios to track machine availability
and raise alarms. You can use this to alarm if important
machines become unavailable or if the temperature or fan
speeds of any machine become troublesome. Using Nagios
puts you in the situation of having been automatically
told of faults, and having corrected them, before they are
noticed by users and thus become critical.
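Each machine is one small object definition on the Nagios server,
something like the following (better generated by puppet or a
script than typed by hand; the names and address are examples):

    define host {
        use        linux-server     ; host template from the sample config
        host_name  lab-pc-01
        alias      Lab PC 01
        address    192.168.1.101
    }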
Linux has a time service called NTP. You can sync all
machines to a central time server (which in turn can
be synced to the Internet or GPS). A consistent and
accurate network time turns out to be useful and removes
a minor annoyance.
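On each lab machine this is a single line in /etc/ntp.conf (the
server name is whatever you call your central time server):

    server ntp.example.school iburst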
Linux is good for providing network services. Your Linux
servers are the best choice for running DHCP and DNS
forwarding. They will also run a transparent HTTP proxy (iptables
with Squid), which is useful for schools when students all
hit the same site in a class.
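A sketch of the transparent proxy, assuming the lab traffic arrives
on eth1 of the server and Squid listens on port 3128 on the same
box (Squid 2.x syntax):

    # squid.conf
    http_port 3128 transparent

    # redirect the lab's web traffic into Squid
    iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
             -j REDIRECT --to-ports 3128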
MAINTENANCE
There will be a package which runs yum/apt-get and puppet/cfengine
periodically. Typically the package manager will run daily
and the configuration manager will run each half hour.
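Under the hood this is ordinary cron plus puppet's own daemon; a
sketch of the cron side (the time is arbitrary, and puppetd's
default check-in interval is already 30 minutes):

    # /etc/cron.d/exampleschool-update -- nightly package update
    30 2 * * *    root    yum -y update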
To install new software, put the package on the web server
and add it as a dependency of the exampleschool-...
packages. Yum will install that software overnight
when it checks for updates to the exampleschool-...
package.
To update a configuration, change the file on the
configuration server and check it into subversion.
To add a new user, add them to the directory/kerberos
and create their home directory on the server.
If you need immediate access to the machines, use SSH. Let's
say you need to install some software immediately (some
teacher forgot to tell you). Add the software to the
exampleschool-... package, then use expect and SSH to
log into every lab machine as root and run
"yum -y update" on each one rather than waiting for
it to run overnight. The important thing here is that
satisfying this urgent task hasn't undermined the
long-run maintainability of the lab.
Operating system upgrades are usually done via the package
manager. An upgrade takes about four hours.
CONCLUSION
A Linux lab takes a lot of up-front configuration, which has to
be done when you understand Linux the least. Once you
get over this hurdle a Linux lab runs with very little
effort. In a lab environment Linux machines use about a
tenth of the ongoing sysadmin resources that Windows machines do.
Depending on how much monitoring you care to do, you can get
in front of the reactiveness curve -- replacing disks as
they report pre-failure errors rather than when they fail, taking
hot machines out of service before they die. It is typical
of Linux system administration that there is less blackbox
mystery: hardware errors are found rather than left to cause ongoing
flakiness, software failures have a cause, and the remedy
for that cause is understood.
You should sysadmin Linux machines using the tools created
for Linux by sysadmins of huge installations. Using tools
designed for Windows (such as disk imaging tools) will leave
you swimming against the current and unable to gain assistance
from the Linux mainstream (such as the SAGE-AU mailing list).
Given the high startup costs followed by the huge economies of scale
in Linux sysadmin there's a lot to be said for sector-wide
administration rather than per-school administration. Note
that this suggestion doesn't preclude per-school customisations
(you can see that each school would have its own puppet/cfengine
class and own software load lists).