This document is a work in progress. Feel free to modify or update it.

[[TableOfContents]]

== Abstract ==

This memo is written for new GForge developers who want to see at a glance how GForge works and how its DBMS and LDAP data are structured. It is based on the GForge Debian packages at http://gforge.free.fr/debian/stable.

DForge

== Quick Installation ==

On [Debian], {{{apt-get install gforge}}} handles everything automatically. On other distributions such as RedHat, you will need to add and edit files manually.

Currently, the GForge package is included in [Debian] unstable (a.k.a. sid). To install it on [Debian] stable (a.k.a. woody), you need to add a few lines to your /etc/apt/sources.list. For more information, see http://gforge.free.fr/.

== Overview ==

GForge uses PostgreSQL to store all information, including users, projects, messages, etc. GForge uses LDAP to support mail forwarding and PAM-based logins such as SSH and CVS. On top of these two data stores, GForge works with the PHP web interface, CVS, Mailman, Exim, SSH, FTP, etc. An overview of GForge is depicted below:

Uploads:overview.jpg

Basically, users and administrators store data in the DBMS and LDAP. Cron jobs (the crontab works) run at fixed intervals and create directories such as user directories and group directories. For example, the update-user-group-cvs.sh script runs once an hour and creates directories for any new or updated users or groups. The detailed functions are described in the Crontab-works section.

== Data Structure ==

In this section, we describe the database and LDAP data structures in detail.

=== DATABASE ===

GForge has almost one hundred database tables, so it is a fairly complicated system. Let's look at the three most important tables first: users, groups, and user_group. The users table stores user information such as user name, password, email, etc.
The groups table stores group (project) data such as group name, group description, license, etc. It also holds many options such as use_cvs, use_forum, use_survey, etc. The user_group table stores the relationship between users and groups: user membership and user roles within a given group.

|| Tip: In the default GForge configuration, users get SSH access to the server. This can be a serious security issue because the default GForge installation does not have a decent chroot environment. To prevent a user from logging in to the system, set that user's shell to something other than /bin/bash. Setting it to /bin/false effectively limits what the user can do. To apply this to every new user by default, modify the default value of "shell" in the users table. ||

The status field in the users and groups tables indicates the status of the user or the group. When a user registers a new project or creates a new account, the status is pending; after the user's email is confirmed or the request is approved, the status becomes active. The crontab works use this field to decide when to create directories or copy the appropriate files. The basic diagram of these three tables is shown below:

Uploads:gforgedb1.jpg

=== LDAP ===

LDAP (Lightweight Directory Access Protocol) is well suited to storing directory-structured data such as the users and groups in GForge. LDAP data consists of a distinguished name (dn), an object, and its attributes. [More about LDAP introduction] LDAP is an open protocol, and there are many LDAP-aware applications including Sendmail, Exim, PAM/login, etc.

GForge uses LDAP to integrate login/PAM and mail aliases. The system looks up the username and password in the LDAP server when a user tries to log in to GForge using SSH or FTP. Exim queries LDAP for alias information, and Mailman also uses LDAP when processing mailing list operations.
LDAP stores the aliases for the mailing lists.

The LDAP and DBMS synchronization falls into two categories: user/group data and mail aliases. User/group LDAP data are updated in real time by the PHP web interface. For example, whenever a user changes user/group information such as the password, shell, or group members, the PHP web interface updates the DBMS and LDAP together. For mailing list aliases, however, the PHP web interface only updates the DBMS from user input. A crontab script reads all the DBMS data and generates LDAP data in LDIF (LDAP Data Interchange Format), and another program reads the LDIF and updates LDAP. These scripts run periodically.

Since LDAP is an optional data store in GForge, you can disable it by setting the variable $sys_use_ldap=0; in /etc/gforge/local.inc. For more LDAP-related configuration, see the files in /etc/ldap and /etc/exim, and /etc/nsswitch.conf.

=== Tools ===

In this section, we introduce several tools for manipulating the database and LDAP. PostgreSQL's own tools can be used to monitor the database, and slapcat can be used to view all data in LDAP. We recommend GUI-based browsing tools:

 * [http://www.iit.edu/~gawojar/ldap/index.html LDAP Browser/Editor]
 * [http://www.pgexplorer.com/ PG Explorer GUI development tool for postgres]

Since using these GUI tools is straightforward, we skip the explanation. See the related manual pages for more information.

== Configuration ==

=== local.inc ===

This file holds the local configuration options.

== PHP interface ==

PHP handles the key parts of user interaction and database updates. We describe some important PHP programs in this section.

=== Login ===

/usr/share/gforge/www/account/login.php shows the login form and verifies the id and password using the session functions. The common session-related functions are defined in /usr/share/gforge/common/include/session.php. The following SQL query shows how the given loginname and passwd are verified against the users table.
Note that the password is stored as an MD5 hash.

{{{
SELECT user_id,status,unix_pw
FROM users
WHERE user_name='$loginname'
AND user_pw='".md5($passwd)."'
}}}

If the login is successful, the login script creates a cookie and stores the cookie and session information in the session table. From that point on, the login script just compares the cookie with the session data in the session table. The following SQL shows the insert statement for the session table:

{{{
INSERT INTO session (session_hash, ip_addr, time, user_id)
VALUES (
  '".session_get_session_cookie_hash($cookie)."',
  '".$GLOBALS['REMOTE_ADDR']."',
  '".time()."',
  $user_id
)
}}}

This is a fairly common technique for supporting database-backed login.

== Crontab-works ==

The crontab works generate statistics, create directories, copy files, and so on. In this section we describe them. Most of the GForge crontab files are located in the /etc/cron.d directory.

=== update-user-group-cvs ===

Every hour, cron runs the /usr/lib/gforge/bin/update-user-group-cvs.sh script to update users, groups, and CVS. Basically, it reads all entries in the database and checks whether anything has changed. If there are changes, it performs the appropriate actions, such as creating a directory or copying files. The script calls the following three important scripts:

 * /usr/lib/gforge/bin/user_dump_update.pl - update user
 * /usr/lib/gforge/bin/group_dump_update.pl - update group
 * /usr/lib/gforge/bin/ssh_dump_update.pl - update ssh info

=== Mailman ===

Mailman is one of the best mailing list programs. When the admin of a group creates a new mailing list, a PHP page takes the data from the user and stores it in the mail_group_list table shown below:

[Can I add a figure reference?]

Uploads:gforge_db_mail_group_list.png

When a user creates a mailing list, the password is randomly generated by md function in PHP and is sent to the mailing list creator by email.
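As a rough illustration of the random-password step, the following Python sketch produces a random hexadecimal password. It is not GForge's PHP code: the function name `generate_list_password`, the use of Python's `secrets` module, and the default length of 16 hexadecimal characters are all assumptions made for this example.

```python
import secrets


def generate_list_password(num_bytes: int = 8) -> str:
    """Return a random hexadecimal password (two hex characters per byte).

    Hypothetical helper for illustration only; GForge generates the
    password in PHP, and its exact method is not shown in this memo.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Prints a different 16-character hexadecimal string on every run.
    print(generate_list_password())
```

The generated value would then be handed to the list-creation step and mailed to the list creator.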
Unfortunately, if a user changes the Mailman admin password through the Mailman web interface, GForge has no way of learning the changed password. Tighter integration with Mailman may be necessary.

Initially, the state field is set to 1. Then, every hour, the mail crontab work invokes the create-mailing-lists.pl script in /usr/lib/gforge/bin. This script retrieves all records whose state is 1 from mail_group_list and creates the lists using the Mailman tools. If the script completes successfully, it updates the state field to 3.

The script calls several Mailman tools to create a list. The following example shows how the script invokes them:

{{{
/usr/sbin/newlist -q haha-test admin@users.moon.cse.ucsc.edu cdb6f1750938f613 >/dev/null 2>&1
/usr/lib/mailman/bin/config_list -o /tmp/t7JZLi haha-test
/usr/lib/mailman/bin/config_list -i /tmp/t7JZLi haha-test
/usr/lib/mailman/bin/withlist -l -r fix_url haha-test -u lists.moon.cse.ucsc.edu
}}}

newlist creates the actual mailing list, and config_list configures it. For more information about each script, see the man pages or visit the Mailman project homepage. On Debian, all the Mailman-related commands are stored in /var/lib/mailman/bin.

If Mailman is not working, try running create-mailing-lists.pl manually. It is also a good idea to check the Mailman configuration file, mm_cfg.py, in the /etc/mailman directory.

{{{
#######################################################
# Here's where we get the distributed defaults.
#
from Defaults import *
##############################################################
# Put YOUR site-specific configuration below, in mm_cfg.py .
#
# See Defaults.py for explanations of the values.
#
DEFAULT_URL = 'http://moon.cse.ucsc.edu/cgi-bin/mailman/'
DEFAULT_EMAIL_HOST = 'moon.cse.ucsc.edu'
DEFAULT_URL_HOST = 'moon.cse.ucsc.edu'
IMAGE_LOGOS = '/images/mailman/'
USE_ENVELOPE_SENDER = 0
DEFAULT_SEND_REMINDERS = 0
}}}

When you add a mailing list, you need to set up some email aliases so that the list can receive and distribute emails. These are typical aliases for the 'kldp' mailing list:

{{{
## kldp mailing list
kldp: "|/usr/local/mailman/mail/mailman post kldp"
kldp-admin: "|/usr/local/mailman/mail/mailman admin kldp"
kldp-bounces: "|/usr/local/mailman/mail/mailman bounces kldp"
kldp-confirm: "|/usr/local/mailman/mail/mailman confirm kldp"
kldp-join: "|/usr/local/mailman/mail/mailman join kldp"
kldp-leave: "|/usr/local/mailman/mail/mailman leave kldp"
kldp-owner: "|/usr/local/mailman/mail/mailman owner kldp"
kldp-request: "|/usr/local/mailman/mail/mailman request kldp"
kldp-subscribe: "|/usr/local/mailman/mail/mailman subscribe kldp"
kldp-unsubscribe: "|/usr/local/mailman/mail/mailman unsubscribe kldp"
}}}

Usually the aliases are saved in the /etc/aliases file. This means that when an MTA (Mail Transfer Agent) receives an email for 'kldp', it looks up the aliases and sends the message to the mailman program through a pipe. The mailman program, /usr/local/mailman/mail/mailman, then handles the email.

With Mailman version 2.1.2 or higher, you should start the qrunners daemon by invoking {{{/etc/init.d/mailman start}}} if you use Debian. If you don't have a list named "mailman", however, you cannot run the qrunners, so be sure to create a list named "mailman" first.

GForge uses LDAP to store alias information rather than the /etc/aliases file. Several scripts are used to store the data in LDAP; they read data from the DBMS and update LDAP. The sql2ldif.pl script in the $SCRIPT_DIR directory reads the DBMS data and dumps it in LDIF format. The install-ldap.sh script calls sql2ldif.pl and updates LDAP using LDAP client commands such as ldapadd and ldapmodify.
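To make the DBMS-to-LDIF step more concrete, here is a minimal Python sketch of the kind of transformation sql2ldif.pl might perform for mailing list aliases. It is not the actual script. The function names, the base DN, the Mailman wrapper path, and the quoted pipe value are assumptions for illustration; only ou=mailingList and the attribute name debGforgeListPostaddress come from the Exim configuration shown in this memo. Real entries also need objectClass lines, which this memo does not list, so they are omitted here.

```python
def list_to_ldif(list_name, base_dn, wrapper="/usr/local/mailman/mail/mailman"):
    """Build one LDIF entry for a mailing list's posting alias.

    Hypothetical helper: the wrapper path and value format are assumed,
    and objectClass lines are deliberately left out.
    """
    lines = [
        f"dn: cn={list_name},ou=mailingList,{base_dn}",
        f"cn: {list_name}",
        f'debGforgeListPostaddress: "|{wrapper} post {list_name}"',
    ]
    return "\n".join(lines) + "\n"


def dump_ldif(list_names, base_dn):
    """Join the entries; LDIF separates entries with a blank line."""
    return "\n".join(list_to_ldif(name, base_dn) for name in list_names)


if __name__ == "__main__":
    # In GForge the list names would come from the database;
    # here they are hard-coded sample values.
    print(dump_ldif(["kldp", "haha-test"], "dc=gforge,dc=example,dc=org"), end="")
```

A file in this shape is what an LDAP client command such as ldapadd would consume in the second half of the update.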
The install-ldap.sh script is invoked from update-user-group-cvs.sh, which in turn is run by cron.

GForge uses Exim as its default MTA, and Exim is configured to read data from LDAP when handling emails. For example, exim.conf in /etc/exim/ includes the following configuration to forward emails for mailing lists.

{{{
forward_for_gforge_lists:
  domains = lists.dforge.cse.ucsc.edu
  driver = aliasfile
  pipe_transport = address_pipe
  query = "ldap:///cn=$local_part,ou=mailingList,dc=gforge,dc=moon,dc=cse,dc=ucsc,dc=edu?debGforgeListPostaddress"
  search_type = ldap
  user = nobody
  group = nogroup
}}}

=== CVS tarball ===

Every day, $SCRIPT_DIR/tarball.sh generates cvsroot.tar.gz, which contains the whole CVS root directory of each project. The file is copied into the /var/lib/gforge/cvstarballs directory, and a project admin can download the project's CVS tarball from the admin web page.

=== DB update/statistics related work ===

Many DB update and statistics jobs are performed by cron. Here we only list the jobs, taken from the comments in /etc/cron.d/gforge-db-postgresql. For more information, see the cron file.

{{{
# Grab projects from trove map and put into foundry_projects table
# Recalculate user popularity metric
# Daily recalculate of the sums under the trove map
# Daily deletion of sessions, closing jobs, etc
# Daily crunching of survey data and other associated ratings
# Daily crunching of project summary data (counts)
# Daily close pending artifacts
# Daily project metrics
# Weekly project metrics
# Database vacuuming
# Daily rotation of the activity_log
# Daily aggregating of the numbers
# Daily sweep of the HTTP log files for stats information
# Daily sweep of the stats into final tables
# Daily sweep of the HTTP log files for project activity
}}}

=== install-ftp ===

Every hour, $SCRIPT_DIR/install-ftp.sh is called with the update argument. It simply reads all the group directories and creates the corresponding FTP group pub directories in the FTP root directory.
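The install-ftp step can be pictured with the following Python sketch. It only illustrates the idea described above and is not the real install-ftp.sh: the function name `create_ftp_pub_dirs` and the `pub/<group>` layout under the FTP root are assumptions, and the demo runs in a temporary directory rather than on real GForge paths.

```python
from pathlib import Path
import tempfile


def create_ftp_pub_dirs(groups_root: Path, ftp_root: Path) -> list:
    """Create ftp_root/pub/<group> for every directory found in groups_root.

    Returns the sorted names of the groups whose pub directory was newly
    created. The pub/<group> layout is an assumption for illustration.
    """
    created = []
    for group_dir in sorted(groups_root.iterdir()):
        if not group_dir.is_dir():
            continue  # skip stray files in the groups directory
        pub_dir = ftp_root / "pub" / group_dir.name
        if not pub_dir.exists():
            pub_dir.mkdir(parents=True)
            created.append(group_dir.name)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        groups = Path(tmp) / "groups"
        ftp = Path(tmp) / "ftp"
        for name in ("kldp", "java"):
            (groups / name).mkdir(parents=True)
        ftp.mkdir()
        print(create_ftp_pub_dirs(groups, ftp))  # first run creates both directories
        print(create_ftp_pub_dirs(groups, ftp))  # second run finds nothing new
```

Because existing directories are skipped, running the job every hour is harmless: only groups added since the previous run produce new directories.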
=== Update LDAP ===

As mentioned in the Mailman section, every hour the $SCRIPT_DIR/install-ldap.sh script is called with the update argument; it reads data from the DBMS and updates LDAP.

=== Apache ===

An admin can add a virtual host for a GForge project site. For example, let's assume that a group homepage is java.kldp.org and the admin wants to use kachi.com to access the project homepage. First the admin should change the kachi.com DNS entry. Then the admin has to add a virtual host for GForge so that kachi.com is directed to the right directory for the project. To support this, GForge provides a web interface, which takes the user input and stores the host name in the DBMS. Every hour, $SCRIPT_DIR/create-vhosts.sh creates the vhost configuration for Apache and restarts the GForge Apache server. Personally, I think this is a pretty bad idea. Developing a dynamic virtual hosting Apache module that reads the host name from the DB might be a compelling alternative.

== File Layout ==

Description of the important directories and files used by GForge.
=== Configuration ===

 * General GForge configuration: /etc/gforge
 * Apache configuration: /etc/apache
 * Exim configuration: /etc/exim
 * Bind (DNS) configuration: /etc/bind

=== GForge executables ===

 * PHP interface: /usr/share/gforge/www
 * Languages: /usr/share/gforge/www/include/languages
 * Theme: /usr/share/gforge/www/themes
 * Scripts: /usr/lib/gforge/bin

==== www/tracker/ ====

 * '''browse.php''' - screen for viewing summaries of tracker items
 * '''detail.php''' - screen for viewing/changing tracker item data
 * '''dtracker.php''' - file to control which screen's file is included

=== Cron related directories ===

 * Cron daemon: /etc/cron.d
 * Daily cron files: /etc/cron.daily
 * Weekly cron files: /etc/cron.weekly
 * Monthly cron files: /etc/cron.monthly

=== Chroot related directories ===

 * User directory: /var/lib/gforge/chroot/users
 * Group directory: /var/lib/gforge/chroot/groups
 * CVS directory: /var/lib/gforge/chroot/cvsroot

=== Cache/Log directories ===

 * Dump (Cache): /var/lib/gforge/dumps
 * Log: /var/log/gforge
 * Apache: /var/log/apache