LON-CAPA SQL Database Documentation

Scott Harrison

Last updated: 02/12/2001

This file describes issues associated with LON-CAPA and a SQL database.

Latest HOWTO

Current status of documentation

I am going to begin documentation by inserting what notes I have into this file. I will be subsequently rearranging them and editing them based on the tests that I conduct. I am trying to make sure that documentation, installation, and run-time issues are all consistent and correct. The current status of everything is that it works and has been minimally tested, but things need to be cleaned up and checked again!

Current status of implementation

Need to

Right now, a lot of "feasibility" work has been done. Recipes for manual installation and configuration have been gathered. Network connectivity of lond->lonsql->lond->lonc type tests have been performed. A binary installation has been compiled in an RPM (LON-CAPA-mysql, with perl components a part of LON-CAPA-systemperl). The most lacking test in terms of feasibility has been looking at benchmarks to analyze the load at which the SQL database can efficiently allow many users to make simultaneous requests of the metadata database.

Documentation has been pieced together over time. But, as mentioned in the previous section, it needs an overhaul.

The binary installation has some quirks associated with it. Some of the user permissions are wrong, although this is benign. Also, other options of binary installation (such as using binary RPMs put together by others) were dismissed given the difficulty of getting differing combinations of these external RPMs to work together.

Most configuration questions have been initially worked out to the point of getting this SQL software component working, however there may be more optimal approaches than currently exist.

Purpose within LON-CAPA

LON-CAPA is meant to distribute A LOT of educational content to A LOT of people. It is ineffective to directly rely on contents within the ext2 filesystem to be speedily scanned for on-the-fly searches of content descriptions. (Simply put, it takes a cumbersome amount of time to open, read, analyze, and close thousands of files.)

The solution is to hash-index various data fields that are descriptive of the educational resources on a LON-CAPA server machine. Descriptive data fields are referred to as "metadata". The question then arises as to how this metadata is handled in terms of the rest of the LON-CAPA network without burdening client and daemon processes. I now answer this question in the format of Problem and Solution below.

PROBLEM SITUATION:

  If Server A wants data from Server B, Server A uses a lonc process to
  send a database command to a Server B lond process.
    lonc= loncapa client process    A-lonc= a lonc process on Server A
    lond= loncapa daemon process

                 database command
    A-lonc  --------TCP/IP----------------> B-lond

  The problem emerges that A-lonc and B-lond are kept waiting for the
  MySQL server to "do its stuff", or in other words, perform the conceivably
  sophisticated, data-intensive, time-sucking database transaction.  By tying
  up a lonc and lond process, this significantly cripples the capabilities
  of LON-CAPA servers. 

  While commercial databases have a variety of features that ATTEMPT to
  deal with this, freeware databases are still experimenting and exploring
  with different schemes with varying degrees of performance stability.

THE SOLUTION:

  A separate daemon process was created that B-lond works with to
  handle database requests.  This daemon process is called "lonsql".

  So,
                database command
  A-lonc  ---------TCP/IP-----------------> B-lond =====> B-lonsql
         <---------------------------------/                |
           "ok, I'll get back to you..."                    |
                                                            |
                                                            /
  A-lond  <-------------------------------  B-lonc   <======
           "Guess what? I have the result!"

  Of course, depending on success or failure, the messages may vary,
  but the principle remains the same where a separate pool of children
  processes (lonsql's) handle the MySQL database manipulations.

Dependencies

I believe (but am not 100% confident) that the following RPMs are necessary (in addition to the current ones in rpm_list.txt) to run MySQL. Basically I discovered these dependencies while trying to do external RPM based installs. I assume, and sometimes found, that these dependencies apply to tarball-based distributions too. (So to play it on the safe side, I am going to include these RPMs as part of the core, minimal RPM set.)

Installation

Installation of the LON-CAPA SQL database normally occurs by default when using the LON-CAPA installation CD (see http://install.lon-capa.org). It is installed as the LON-CAPA-mysql RPM. This RPM encodes for the MySQL engine. Related perl interfaces (Perl::DBI, Perl::Msql-Mysql) are encoded in the LON-CAPA-systemperl RPM.

The three components of a MySQL installation for the LON-CAPA system are further described immediately below.
Perl::DBI module- the API "front-end"...
database interface module for organizing generic database commands which are independent of specific database implementation (such as MySQL, mSQL, Postgres, etc).
Perl::MySQL module- the API "mid-section"...
the module to directly interface with the actual MySQL database engine
MySQL database engine- the "back-end"...
the binary installation (compiled either from source or pre-compiled file listings) which provides the actual MySQL functionality on the system

Installation from source

Note: the mysql site recommends that Linux users install by using the MySQL RPMs (MySQL-client, MySQL, MySQL-shared, etc). While these RPMs work, I was unsuccessful at integrating this RPM-installed database with perl modules from www.cpan.org. Hence, I strongly recommend that, when installing from "source", MySQL and the perl components be in fact installed from their tarballs (.tar.gz, .tgz). (Perl components, when installed from RPMs, also wound up in incorrect locations on the disk.) Do not coordinate a source install with externally made RPMs! It is, of course, okay to use LON-CAPA RPMs such as LON-CAPA-systemperl and LON-CAPA-mysql since we, in fact, made these RPMs correctly :).

old notes in green

The following set of tarballs was found to work together properly on a LON-CAPA RedHat 6.2 system:

Installation was simply a matter of following the instructions and typing the several "make" commands for each

Configuration (automated)

Not yet developed. This will be part of an interface present on LON-CAPA systems that can be launched by entering the command /usr/sbin/loncapaconfig.

Manual configuration

This is not complete.

Starting the mysql daemon: Login on the Linux system as user 'www'. Enter the command /usr/local/bin/safe_mysqld &

Set a password for 'root': /usr/local/bin/mysqladmin -u root password 'new-password'

Adding a user: Start the mysql daemon. Login to the mysql system as root (mysql -u root -p mysql) and enter the right password (for instance 'newmysql'). Add the user www

INSERT INTO user (Host, User, Password)
VALUES ('localhost','www',password('newmysql'));

Granting privileges to user 'www':

GRANT ALL PRIVILEGES ON *.* TO www@localhost;
FLUSH PRIVILEGES;

Set the SQL server to start upon system startup: Copy support-files/mysql.server to the right place on the system (/etc/rc.d/...).

The Perl API

   $dbh = DBI->connect(	"DBI:mysql:loncapa",
			"www",
			"SOMEPASSWORD",
			{ RaiseError =>0,PrintError=>0});

There is an obvious need to CONNECT to the database, and in order to do
this, there must be:
  a RUNNING mysql daemon;
  a DATABASE named "loncapa";
  a USER named "www";
  and an ABILITY for LON-CAPA on one machine to access
       SQL database on another machine;
  
So, here are some notes on implementing these configurations.

** RUNNING mysql daemon (safe_mysqld method)

The recommended way to run the MySQL daemon is as a non-root user
(probably www)...

so, 1) login as user www on the linux machine
    2) start the mysql daemon as /usr/local/bin/safe_mysqld &

safe_mysqld only works if the local installation of MySQL is set to the
right directory permissions which I found to be:
chown www:users /usr/local/var/mysql
chown www:users /usr/local/lib/mysql
chown -R www:users /usr/local/mysql
chown www:users /usr/local/include/mysql
chown www:users /usr/local/var

** DATABASE named "loncapa"

As user www, run this command
    mysql -u root -p mysql
enter the password as SOMEPASSWORD

This allows you to manually enter MySQL commands.
The MySQL command to generate the loncapa DATABASE is:

CREATE DATABASE 'loncapa';

** USER named "www"

As user www, run this command
    mysql -u root -p mysql
enter the password as SOMEPASSWORD

To add the user www to the MySQL server, and grant all
privileges on *.* to www@localhost identified by 'SOMEPASSWORD'
with grant option;

INSERT INTO user (Host, User, Password)
VALUES ('localhost','www',password('SOMEPASSWORD'));

GRANT ALL PRIVILEGES ON *.* TO www@localhost;

FLUSH PRIVILEGES;

** ABILITY for LON-CAPA machines to communicate with SQL databases on
   other LON-CAPA machines

An up-to-date lond and lonsql.

Testing

** TEST the database connection with my current tester.pl code
which mimics what command will eventually be sent through lonc.

$reply=reply(
    "querysend:SELECT * FROM general_information WHERE Id='AAAAA'",$lonID);

Example sections of code relevant to LON-CAPA

Here are excerpts of code which implement the above handling:

**LONSQL
A subroutine from "lonsql" which establishes a child process for handling
database interactions.

sub make_new_child {
    my $pid;
    my $sigset;
    
    # block signal for fork
    $sigset = POSIX::SigSet->new(SIGINT);
    sigprocmask(SIG_BLOCK, $sigset)
        or die "Can't block SIGINT for fork: $!\n";
    
    die "fork: $!" unless defined ($pid = fork);
    
    if ($pid) {
        # Parent records the child's birth and returns.
        sigprocmask(SIG_UNBLOCK, $sigset)
            or die "Can't unblock SIGINT for fork: $!\n";
        $children{$pid} = 1;
        $children++;
        return;
    } else {
        # Child can *not* return from this subroutine.
        $SIG{INT} = 'DEFAULT';      # make SIGINT kill us as it did before
    
        # unblock signals
        sigprocmask(SIG_UNBLOCK, $sigset)
            or die "Can't unblock SIGINT for fork: $!\n";
	
	
        #open database handle
	# making dbh global to avoid garbage collector
	unless (
		$dbh = DBI->connect("DBI:mysql:loncapa","www","SOMEPASSWORD",{ RaiseError =>0,PrintError=>0})
		) { 
	            my $st=120+int(rand(240));
		    &logthis("WARNING: Couldn't connect to database  ($st secs): $@");
		    print "database handle error\n";
		    sleep($st);
		    exit;

	  };
	# make sure that a database disconnection occurs with ending kill signals
	$SIG{TERM}=$SIG{INT}=$SIG{QUIT}=$SIG{__DIE__}=\&DISCONNECT;

        # handle connections until we've reached $MAX_CLIENTS_PER_CHILD
        for ($i=0; $i < $MAX_CLIENTS_PER_CHILD; $i++) {
            $client = $server->accept()     or last;
            
            # do something with the connection
	    $run = $run+1;
	    my $userinput = <$client>;
	    chomp($userinput);
	    	    
	    my ($conserver,$querytmp)=split(/&/,$userinput);
	    my $query=unescape($querytmp);

            #send query id which is pid_unixdatetime_runningcounter
	    $queryid = $thisserver;
	    $queryid .="_".($$)."_";
	    $queryid .= time."_";
	    $queryid .= $run;
	    print $client "$queryid\n";
	    
            #prepare and execute the query
	    my $sth = $dbh->prepare($query);
	    my $result;
	    unless ($sth->execute())
	    {
		&logthis("WARNING: Could not retrieve from database: $@");
		$result="";
	    }
	    else {
		my $r1=$sth->fetchall_arrayref;
		my @r2; map {my $a=$_; my @b=map {escape($_)} @$a; push @r2,join(",", @b)} (@$r1);
		$result=join("&",@r2) . "\n";
	    }
            &reply("queryreply:$queryid:$result",$conserver);

        }
    
        # tidy up gracefully and finish
	
        #close the database handle
	$dbh->disconnect
	   or &logthis("WARNING: Couldn't disconnect from database  $DBI::errstr ($st secs): $@");
    
        # this exit is VERY important, otherwise the child will become
        # a producer of more and more children, forking yourself into
        # process death.
        exit;
    }
}

** LOND enabling of MySQL requests
This code is part of every lond child process in the way that it parses command request syntax sent to it from lonc processes. Based on the diagram above, querysend corresponds to B-lonc sending the result of the query. queryreply corresponds to B-lond indicating that it has received the request and will start the database transaction (it returns "ok" to A-lonc ($client)).

# ------------------------------------------------------------------- querysend
                   } elsif ($userinput =~ /^querysend/) {
                       my ($cmd,$query)=split(/:/,$userinput);
		       $query=~s/\n*$//g;
                     print $client sqlreply("$hostid{$clientip}\&$query")."\n";
# ------------------------------------------------------------------ queryreply
                   } elsif ($userinput =~ /^queryreply/) {
                       my ($cmd,$id,$reply)=split(/:/,$userinput); 
		       my $store;
                       my $execdir=$perlvar{'lonDaemons'};
                       if ($store=IO::File->new(">$execdir/tmp/$id")) {
			   print $store $reply;
			   close $store;
			   print $client "ok\n";
		       }
		       else {
			   print $client "error:$!\n";
		       }