Documentation


Updating dell bios from linux

http://linux.dell.com/biosdisk/

Here's how I used it:

* Download and unpack the .tar.gz (currently 0.75-2).

* I had to hack a line in the spec template in the unpacked
biosdisk tree for CentOS 5 to be happy (this was for
biosdisk 0.65 - might be fixed in the 0.75 release):

biosdisk-mkrpm-redhat-template.spec:
Change "Copyright: GPL" to "License: GPL"

* Download the BIOS for your Dell desktop/laptop. I find the
quickest way is to use the service tag
(as root: "dmidecode | grep 'Serial Number' | head -1") and
plug it into the Dell Support site (i.e. http://support.dell.com/
then "Search for Drivers" green button and then select
"Choose by Service Tag" and then enter your Service Tag).

* Select the Bios item and then download the (usually .exe)
file. I rename the .exe to something that fits an 8.3
filename and represents the version number in some way.
E.g. for an E521's 1.1.11 BIOS, I called the file 1_1_11.exe
- I'll call that .exe from now on.

* As root in the unpacked biosdisk tree (where you've also
put the .exe file):
./biosdisk mkpkg $PWD/.exe
mv /var/lib/biosdisk/rpm/RPMS/noarch/*.rpm /tmp
rpm -Uvh /tmp/*.rpm
(Note I've moved the rpm to /tmp here - if you want
to keep it, store it somewhere that won't get cleaned out)

* Reboot your machine and you should get a "BIOS"
option in your grub menu. Select that and you can flash your
BIOS - no USB keys/CDs/virtual machines needed!

Syncing perl modules between servers/architectures

This probably should be automated....

Configure CPAN (enter the CPAN shell, e.g. "perl -MCPAN -e shell"):
cpan> o conf prerequisites_policy follow
cpan> o conf commit

On source host:
cpan> autobundle

Open the Bundle file. Remove the header and footer sections, leaving just the module lines in between. Copy the bundle file over to the dest host as /tmp/modulelist.

On dest host:
PERL_MM_USE_DEFAULT=1 /sites/utils/bin/perl -MCPAN -e 'open(F, "/tmp/modulelist"); while (<F>) { my @f = split(" ", $_); next if (!$f[0]); print "INSTALLING: $f[0]\n"; CPAN::Shell->notest(install => $f[0]); }'

Finding files deleted but still in use by a process

find /proc -links 0 -type f -size +XXXXc -ls
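
If lsof is available, it gives a similar view; this assumes your lsof build supports the +L1 option (list open files with a link count below 1, i.e. deleted but still held open):

lsof +L1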

php 5.2.6 extra extensions

Installing PHP 5.2.6 on CentOS 5 - extra extensions

Update 23rd May 2009 - It appears dbase, readline, json and filter were included either compiled into php or as a shared module in php-common, so I’ve removed them from the spec.

When installing PHP 5.2.6 on CentOS, I noticed that a few modules included in CentOS's extras repository were missing; they are nowhere to be found in the testing repository.

I found an easy way to build the RPMs by downloading both the testing php source RPM (SRPM) and the extras php-extras-5.1.6 SRPM, rewriting the php-extras spec file so it incorporated the same patches, and building it as the php-5.2.6 version.

The php-extras include the dbase, readline, mcrypt, mhash, tidy and mssql extensions. In my spec file, I have removed dbase and readline, as they have moved to php-common. I have also included the newly added extensions to php 5.2; json and filter. In future versions of CentOS, these should hopefully be updated in the extras or the base repository.

The steps that are needed:

Set up a source RPM build environment

useradd mockbuild
mkdir /usr/src/redhat
yum install rpm-build

Download and install the SRPMs, so that their build files are in the environment

wget http://mirror.centos.org/centos/5.3/extras/SRPMS/php-extras-5.1.6-15.el5...
wget http://dev.centos.org/centos/5/testing/SRPMS/php-5.2.6-2.el5s2.src.rpm

rpm -i php-extras-5.1.6-15.el5.centos.1.src.rpm
rpm -i php-5.2.6-2.el5s2.src.rpm

Save this php-extras.spec file to /usr/src/redhat/SPECS/php-extras.spec. The changes I have made reference the patches contained in the php SRPM, remove dbase and readline, and also include the json and filter extensions so they are built into RPMs as well.

Now run the build process

rpmbuild -ba /usr/src/redhat/SPECS/php-extras.spec

You will probably get an error at the start e.g.

error: Failed build dependencies:
freetds-devel is needed by php-extras-5.2.6-1.1.i386
libtidy-devel is needed by php-extras-5.2.6-1.1.i386

Install all of the packages mentioned, e.g.

yum install -y freetds-devel libtidy-devel

Now run the build process again, and by the end, you should have both an SRPM and the RPMs of the php extras.

You can put these in a repository or install them directly, for example:

rpm -i /usr/src/redhat/RPMS/i386/php-tidy-5.2.6-1.1.i386.rpm

nagios ndomod data_processing_options calculator

NDO data processing options Calculator

Event broker options Calculator

Common MySQL queries

Common MySQL Queries

Topics: Aggregates, JOIN, Sequences, Join or subquery?, Schedules, Spherical geometry, DDL, NULLs, Statistics without aggregates, Aggregates and Statistics, Ordering resultsets, Data comparison, Stored procedures, Pagination, Pivot tables, Date and time, Strings, Frequencies, Graphs and Hierarchies, Relational division.

Basic aggregation

This is the simplest grouping query pattern. For column foo, display the first (smallest), last (largest) or average value of column bar.

SELECT foo, MIN(bar) AS bar

FROM tbl

GROUP BY foo



To return the highest value, and order top to bottom by that value:

SELECT foo, MAX(bar) AS Count

FROM tbl

GROUP BY foo

ORDER BY Count DESC;



Ditto for AVG(), COUNT() etc. It is easily extended for multiple grouping column expressions.
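
For example, a sketch with a second (assumed) grouping column baz and two aggregates at once:

SELECT foo, baz, COUNT(*) AS Cnt, AVG(bar) AS AvgBar
FROM tbl
GROUP BY foo, baz
ORDER BY foo, baz;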





Aggregates across multiple joins

Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums which are exactly twice as large as they should be. In this recent example from the MySQL General Discussion list:

CREATE TABLE packageItem (

 packageItemID INT, 

 packageItemName CHAR(20), 

 packageItemPrice DECIMAL(10,2)

);

INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00);

CREATE TABLE packageCredit (

 packageCreditID INT, 

 packageCreditItemID INT, 

 packageItemType CHAR(10), 

 packageCreditAmount DECIMAL(10,2)

);  

INSERT INTO packageCredit VALUES

(1,1,'Deposit',25.00),

(2,1,'Balance',92.00);

CREATE TABLE packageItemTax (

 packageItemTaxID INT, 

 packageItemTaxItemID INT, 

 packageItemTaxName CHAR(5), 

 packageItemTaxAmount DECIMAL(10,2)

);

INSERT INTO packageItemTax VALUES

(1,1,'GST',7.00),

(2,1,'HST',10.00);



The query ...

SELECT 

  i.packageItemID,

  i.packageItemName,

  i.packageItemPrice,

  SUM(t.packageItemTaxAmount) as Tax,

  SUM(c.packageCreditAmount) as Credit

FROM packageItem i

LEFT JOIN packageCredit c ON i.packageItemID=c.packageCreditItemID

LEFT JOIN packageItemTax t ON i.packageItemID=t.packageItemTaxItemID

GROUP BY i.packageItemID;



returns ...

+---------------+-----------------+------------------+-------+--------+

| packageItemID | packageItemName | packageItemPrice | Tax   | Credit |

+---------------+-----------------+------------------+-------+--------+

|             1 | Delta Hotel     |           100.00 | 34.00 | 234.00 |

+---------------+-----------------+------------------+-------+--------+



With three child tables, the sums are tripled. Why? Because the query aggregates across each join. How then to get the correct results? With correlated subqueries:

SELECT 

  packageItemID,

  SUM(packageItemPrice) AS PriceSum,

  ( SELECT SUM(c.packageCreditAmount) 

    FROM packageCredit c 

    WHERE c.packageCreditItemID = packageItemID

  ) AS CreditSum,

  ( SELECT SUM(t.packageItemTaxAmount) 

    FROM packageItemTax t 

    WHERE t.packageItemTaxItemID = packageItemID

  ) AS TaxSum

FROM packageItem

GROUP BY packageItemID;

+---------------+----------+-----------+--------+

| packageItemID | PriceSum | CreditSum | TaxSum |

+---------------+----------+-----------+--------+

|             1 |   100.00 |    117.00 |  17.00 |

+---------------+----------+-----------+--------+

If subqueries are unavailable or too slow, replace them with temp tables.
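
For instance, a temp-table version of the query above might look like this (a sketch; the names creditsums and taxsums are made up):

CREATE TEMPORARY TABLE creditsums
  SELECT packageCreditItemID AS itemID, SUM(packageCreditAmount) AS CreditSum
  FROM packageCredit
  GROUP BY packageCreditItemID;
CREATE TEMPORARY TABLE taxsums
  SELECT packageItemTaxItemID AS itemID, SUM(packageItemTaxAmount) AS TaxSum
  FROM packageItemTax
  GROUP BY packageItemTaxItemID;
SELECT i.packageItemID, i.packageItemName, i.packageItemPrice, t.TaxSum, c.CreditSum
FROM packageItem i
LEFT JOIN creditsums c ON c.itemID = i.packageItemID
LEFT JOIN taxsums t ON t.itemID = i.packageItemID;
DROP TEMPORARY TABLE creditsums, taxsums;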





Aggregates excluding leaders

You have a table of grouped ranks ...

DROP TABLE IF EXISTS grps,ranks;

CREATE TABLE grps (grp int);

INSERT INTO grps VALUES(1),(2),(3),(4);

CREATE TABLE ranks(grp int,rank int);

INSERT INTO ranks VALUES(1, 4 ),(1, 7 ),(1, 9 ),(2, 2 ),(2, 3 ),(2, 5 ),(2, 6 ),(2, 8 ),(3, 1 ),(4,11 ),(4,12 ),(4,13 );



and you wish to list ranks by group omitting the leading rank in each group. The simplest query for group leaders is ...

SELECT grp, MIN(rank) as top 

FROM ranks r2

GROUP BY grp

+------+------+

| grp  | top  |

+------+------+

|    1 |    4 |

|    2 |    2 |

|    3 |    1 |

|    4 |   11 |

+------+------+



The simplest way to get a result that omits these is an exclusion join from the ranks table to the above result:

SELECT r1.grp, r1.rank

FROM ranks r1

LEFT JOIN (

  SELECT grp, MIN(rank) as top 

  FROM ranks r2

  GROUP BY grp

) AS r2 ON r1.grp=r2.grp AND r1.rank = r2.top

WHERE r2.grp IS NULL

ORDER BY grp, rank;

+------+------+

| grp  | rank |

+------+------+

|    1 |    7 |

|    1 |    9 |

|    2 |    3 |

|    2 |    5 |

|    2 |    6 |

|    2 |    8 |

|    4 |   12 |

|    4 |   13 |

+------+------+






Aggregates of specified size

Find the values of a table column c1 for which there are a specified number of listed values in another column c2.

To get an overview of the values of c2 for each value of c1:

SELECT

  c1, 

  GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values'

FROM table

GROUP BY c1;



To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of items in the list ...

SELECT c1 

FROM table

WHERE c2 IN (1,2,3,4)

GROUP BY c1

HAVING COUNT(c2)=4;


This is easy to generalise to multiple column expressions, and a HAVING clause specifying any number of items from the IN list.
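
For example, to list c1 values having at least two of the four listed c2 values, relax the HAVING condition (a sketch on the same generic table):

SELECT c1
FROM table
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(DISTINCT c2) >= 2;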

To list c1 values that have exactly one instance of each c2 value, add DISTINCT to the count criterion:

SELECT c1 

FROM table

WHERE c2 IN (1,2,3,4)

GROUP BY c1

HAVING COUNT(DISTINCT c2)=4;






Avoiding repeat aggregation

In a good introductory tutorial on MySQL subqueries, Jeremy Cole developed a triply nested query to retrieve the name, population, head of state, and number of official languages in countries with the most official languages. The query uses two tables from the MySQL world database:

CREATE TABLE country (

  Code char(3) NOT NULL DEFAULT '' PRIMARY KEY,

  Name char(52) NOT NULL DEFAULT '',

  Population int(11) NOT NULL DEFAULT '0',

  HeadOfState char(60) DEFAULT NULL,

  ... other columns ...

);

CREATE TABLE countrylanguage (

  CountryCode char(3) NOT NULL DEFAULT '' PRIMARY KEY,

  Language char(30) NOT NULL DEFAULT '',

  IsOfficial enum('T','F') NOT NULL DEFAULT 'F',

  Percentage float(4,1) NOT NULL DEFAULT '0.0'

);



The query needs to aggregate language counts by country twice: once for all language counts by country, and once again to identify countries with the highest number of languages:

SELECT name, population, headofstate, top.num

FROM Country

JOIN ( 

  SELECT countrycode, COUNT(*) AS num

  FROM CountryLanguage

  WHERE isofficial='T'

  GROUP BY countrycode

  HAVING num = (

    SELECT MAX(summary.nr_official_languages)

    FROM  (

      SELECT countrycode, COUNT(*) AS nr_official_languages

      FROM CountryLanguage 

      WHERE isofficial='T' 

      GROUP BY countrycode

    ) AS summary

  )

) as top ON Country.code=top.countrycode;

+--------------+------------+-------------+-----+

| name         | population | headofstate | num |

+--------------+------------+-------------+-----+

| Switzerland  |    7160400 | Adolf Ogi   |   4 |

| South Africa |   40377000 | Thabo Mbeki |   4 |

+--------------+------------+-------------+-----+



In addition, one of the nested subqueries is buried in a HAVING clause. This is fine with small tables, but if the table being aggregated is very large and the aggregation is complex, performance may be unsatisfactory. Substituting a temporary table for the double nesting can improve performance in two ways:

  • the aggregation needs to be done just once
  • we can use an exclusion join, which is usually faster than a HAVING clause, to find countries with the maximum counts:

DROP TABLE IF EXISTS top;

CREATE TABLE top ENGINE=MEMORY

  SELECT countrycode, COUNT(*) AS num

  FROM CountryLanguage l1

  WHERE isofficial='T'

  GROUP BY countrycode;

SELECT name,population,headofstate,t3.num

FROM country c

JOIN (

  SELECT t1.countrycode, t1.num

  FROM top t1

  LEFT JOIN top t2 ON t1.num < t2.num

  WHERE t2.countrycode IS NULL

) AS t3 ON c.code=t3.countrycode;

+--------------+------------+-------------+-----+

| name         | population | headofstate | num |

+--------------+------------+-------------+-----+

| Switzerland  |    7160400 | Adolf Ogi   |   4 |

| South Africa |   40377000 | Thabo Mbeki |   4 |

+--------------+------------+-------------+-----+

DROP TABLE top;


You notice that we haven't actually used a TEMPORARY table? Indeed we haven't, because of the MySQL limitation that temporary tables cannot be referenced multiple times in a query. Until that's lifted, we get almost as much speed improvement from using a MEMORY table as a temporary table.





Cascading aggregates

When you have parent-child-grandchild tables, eg companies, users, actions, and your query requirement is for per-parent aggregates from the child table and per-child aggregates from the grandchild table, then cascading joins yield spuriously multiplied counts, and correlated subqueries fail because the second correlated subquery cannot find a visible joining column.

One solution is to use derived tables. Assuming ...

CREATE TABLE companies (id int, name char(10));

CREATE TABLE users (id INT,companyid INT);

CREATE TABLE actions (id INT, userid INT, date DATE);


then...

  • Join companies & users once to establish a derived company-user table.
  • Join them a second time, this time aggregating on users.id to generate user counts per company.
  • Join the first derived table to the actions table, aggregating on actions.id to report actions per user per company:

Here is the SQL:

SELECT cu1.cid, cu1.cname, cu2.cid, cu2.uCnt, ua.aCnt

FROM (

  SELECT c.id AS cid, c.name AS cname, u1.id AS uid 

  FROM companies c

  INNER JOIN users u1 ON u1.companyid=c.id

) AS cu1

INNER JOIN (

  SELECT c.id AS cid, COUNT(u2.id) AS uCnt

  FROM companies c

  INNER JOIN users u2 ON u2.companyid=c.id

  GROUP BY c.id

) AS cu2 ON cu1.cid=cu2.cid

INNER JOIN (

  SELECT u3.id AS uid, COUNT(a.id) AS aCnt

  FROM users u3

  INNER JOIN actions a ON a.userid=u3.id

  GROUP BY u3.id

) AS ua ON ua.uid=cu1.uid;






Cross-aggregates

Given the table authorbook(authid INT, bookid INT), what query finds the books whose authors have more than one book in the table?

Even one level of recursion can induce a mild trance. Escape the trance by taking the problem one step at a time. First write the query that finds the authors with multiple books. Then join an outer query to that on authorid, and have the outer query select bookid:

SELECT a1.bookid

FROM authorbook a1

INNER JOIN (

  SELECT authid,count(bookid)

  FROM authorbook a2

  GROUP BY authid

  HAVING COUNT(bookid)>1

) AS a3 ON a1.authid=a3.authid;







Group data by datetime periods

To group rows by a time period whose length in minutes divides evenly into 60, use this formula:

GROUP BY ((60/periodMinutes) * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / periodMinutes ))



where thistime is the TIME column and periodMinutes is the period length in minutes. So to group by 15-min periods, write ...

SELECT ...

GROUP BY ( 4 * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / 15 ))

...



The same logic works for months ...

GROUP BY ((12/periodMonths) * YEAR( thisdate ) + FLOOR( MONTH( thisdate ) / periodMonths ))



It could be made to work for weeks with a function that maps the results of WEEK() to the range 1...52.
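
Following the same pattern, a rough sketch for two-week periods (WEEK() returns 0-53, so weeks at a year boundary may need the mapping adjustment just mentioned):

SELECT ...
GROUP BY ((52/2) * YEAR( thisdate ) + FLOOR( WEEK( thisdate ) / 2 ))
...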

When the desired grouping period is a value returned by a MySQL date-time function, matters become simpler: just group by the desired value. Thus to group by weeks, write ..

SELECT ...

GROUP BY WEEK( datecol)

...


If there is no MySQL date-time function that returns the desired grouping period, you will need to write your own stored function.

You can also Group By an expression like

  CEIL( TIME_TO_SEC( TIMEDIFF( timestamp1, timestamp2 )) / (60*60) )



modifying the denominator to suit.





League table

Here is a simple soccer league table setup that was developed in the MySQL Forum by J Williams and a contributor named "Laptop Alias". The teams table tracks team ID and name, the games table tracks home and away team IDs and goal totals for each game. The query for standings is built by aggregating a UNION of home team and away team game results:

DROP TABLE IF EXISTS teams, games;

CREATE TABLE teams(id int primary key auto_increment,tname char(32));

CREATE TABLE games(id int primary key auto_increment, date datetime, 

                   hteam int, ateam int, hscore tinyint,ascore tinyint);

INSERT INTO teams VALUES(1,'Wanderers'),(2,'Spurs'),(3,'Celtics'),(4,'Saxons');

INSERT INTO games VALUES

(1,'2008-1-1 20:00:00',1,2,1,0),(2,'2008-1-1 20:00:00',3,4,0,2),

(3,'2008-1-8 20:00:00',1,3,1,1),(4,'2008-1-8 20:00:00',2,4,2,1);

SELECT * FROM teams;

+----+-----------+

| id | tname     |

+----+-----------+

|  1 | Wanderers |

|  2 | Spurs     |

|  3 | Celtics   |

|  4 | Saxons    |

+----+-----------+

SELECT * FROM games;

+----+---------------------+-------+-------+--------+--------+

| id | date                | hteam | ateam | hscore | ascore |

+----+---------------------+-------+-------+--------+--------+

|  1 | 2008-01-01 20:00:00 |     1 |     2 |      1 |      0 |

|  2 | 2008-01-01 20:00:00 |     3 |     4 |      0 |      2 |

|  3 | 2008-01-08 20:00:00 |     1 |     3 |      1 |      1 |

|  4 | 2008-01-08 20:00:00 |     2 |     4 |      2 |      1 |

+----+---------------------+-------+-------+--------+--------+

-- Standings query:

SELECT 

  tname AS Team, Sum(P) AS P,Sum(W) AS W,Sum(D) AS D,Sum(L) AS L,

  SUM(F) as F,SUM(A) AS A,SUM(GD) AS GD,SUM(Pts) AS Pts 

FROM(

  SELECT 

    hteam Team, 

    1 P,

    IF(hscore > ascore,1,0) W,

    IF(hscore = ascore,1,0) D,

    IF(hscore < ascore,1,0) L,

    hscore F,

    ascore A,

    hscore-ascore GD,

    CASE WHEN hscore > ascore THEN 3 WHEN hscore = ascore THEN 1 ELSE 0 END PTS

  FROM games

  UNION ALL

  SELECT 

    ateam,

    1,

    IF(hscore < ascore,1,0),

    IF(hscore = ascore,1,0),

    IF(hscore > ascore,1,0),

    ascore,

    hscore,

    ascore-hscore GD,

    CASE WHEN hscore < ascore THEN 3 WHEN hscore = ascore THEN 1 ELSE 0 END

  FROM games

) as tot

JOIN teams t ON tot.Team=t.id 

GROUP BY Team 

ORDER BY SUM(Pts) DESC ;

+-----------+------+------+------+------+------+------+------+------+

| Team      | P    | W    | D    | L    | F    | A    | GD   | Pts  |

+-----------+------+------+------+------+------+------+------+------+

| Wanderers |    2 |    1 |    1 |    0 |    2 |    1 |    1 |    4 |

| Spurs     |    2 |    1 |    0 |    1 |    2 |    2 |    0 |    3 |

| Saxons    |    2 |    1 |    0 |    1 |    3 |    2 |    1 |    3 |

| Celtics   |    2 |    0 |    1 |    1 |    1 |    3 |   -2 |    1 |

+-----------+------+------+------+------+------+------+------+------+






Sales commissions: double aggregation

Employees' sales commission rates increase as sales totals increase, according to specified bands of sales total amounts—like a graduated income tax in reverse. How do we automate sales commission calculations?

DROP TABLE IF EXISTS tblsales, tblcom;

CREATE TABLE tblsales(employeeID int,sales int);

INSERT INTO tblsales VALUES(1,2),(1,5),(1,7),(2,9),(2,15),(2,12);

SELECT * FROM tblsales;

+------------+-------+

| employeeID | sales |

+------------+-------+

|          1 |     2 |

|          1 |     5 |

|          1 |     7 |

|          2 |     9 |

|          2 |    15 |

|          2 |    12 |

+------------+-------+

CREATE TABLE tblcom(

  comstart DECIMAL(6,2),

  commend DECIMAL(6,2),

  comfactor DECIMAL(6,2),

  pct INT

);

INSERT INTO tblcom VALUES

(1.00,10.00,0.10,10),(11.00,20.00,0.20,20),(21.00,30.00,0.30,30),(31.00,40.00,0.40,40);

SELECT * FROM tblcom;

+----------+---------+-----------+------+

| comstart | commend | comfactor | pct  |

+----------+---------+-----------+------+

|     1.00 |   10.00 |      0.10 |   10 |

|    11.00 |   20.00 |      0.20 |   20 |

|    21.00 |   30.00 |      0.30 |   30 |

|    31.00 |   40.00 |      0.40 |   40 |

+----------+---------+-----------+------+



The first problem is to work out how commission ranges map to sales totals to determine base amounts for calculation of each part-commission. We assume the ranges are inclusive, ie a range that starts at 1 euro is meant to include that first euro:

  • if amt < comstart, base amount = 0
  • if amt <= commend, base amount = amt-comstart+1
  • if amt > commend, base amount = commend - comstart+1

This is a nested IF():

IF(s.amt<c.comstart,0,IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1))



The second problem is how to apply every commission range row to every employee sales sum. That's a CROSS JOIN between aggregated sales and commissions:

SELECT *

FROM (

  SELECT employeeID,SUM(sales) AS amt

  FROM tblSales

  GROUP BY employeeID

) AS s

JOIN tblcom

ORDER BY s.employeeID;

+------------+------+----------+---------+-----------+------+

| employeeID | amt  | comstart | commend | comfactor | pct  |

+------------+------+----------+---------+-----------+------+

|          1 |   14 |     1.00 |   10.00 |      0.10 |   10 |

|          1 |   14 |    11.00 |   20.00 |      0.20 |   20 |

|          1 |   14 |    21.00 |   30.00 |      0.30 |   30 |

|          1 |   14 |    31.00 |   40.00 |      0.40 |   40 |

|          2 |   36 |    31.00 |   40.00 |      0.40 |   40 |

|          2 |   36 |     1.00 |   10.00 |      0.10 |   10 |

|          2 |   36 |    11.00 |   20.00 |      0.20 |   20 |

|          2 |   36 |    21.00 |   30.00 |      0.30 |   30 |

+------------+------+----------+---------+-----------+------+



Now check how the formula applies on every commission band for every sales total:

SELECT 

  s.employeeID,s.amt,c.comstart,c.commend,

  IF(s.amt<=c.comstart,0, 

     IF(s.amt<c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)

  ) AS base,

  c.comFactor AS ComPct,

  IF(s.amt<=c.comstart,0,

    IF(s.amt<c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)

  ) * comFactor AS Comm

FROM (

  SELECT employeeID,SUM(sales) AS amt

  FROM tblSales

  GROUP BY employeeID

) AS s

JOIN tblcom c

ORDER BY s.employeeID,comstart;

+------------+------+----------+---------+-------+--------+--------+

| employeeID | amt  | comstart | commend | base  | ComPct | Comm   |

+------------+------+----------+---------+-------+--------+--------+

|          1 |   14 |     1.00 |   10.00 | 10.00 |   0.10 | 1.0000 |

|          1 |   14 |    11.00 |   20.00 |  4.00 |   0.20 | 0.8000 |

|          1 |   14 |    21.00 |   30.00 |  0.00 |   0.30 | 0.0000 |

|          1 |   14 |    31.00 |   40.00 |  0.00 |   0.40 | 0.0000 |

|          2 |   36 |     1.00 |   10.00 | 10.00 |   0.10 | 1.0000 |

|          2 |   36 |    11.00 |   20.00 | 10.00 |   0.20 | 2.0000 |

|          2 |   36 |    21.00 |   30.00 | 10.00 |   0.30 | 3.0000 |

|          2 |   36 |    31.00 |   40.00 |  6.00 |   0.40 | 2.4000 |

+------------+------+----------+---------+-------+--------+--------+



Finally, SUM the formula results to aggregate commissions on aggregated sales:

SELECT 

  s.employeeID,

  s.amt,

  SUM(IF(s.amt<=c.comstart,0,

        IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)) * c.pct/100

  ) AS Comm

FROM (

  SELECT employeeID,SUM(sales) AS amt

  FROM tblSales

  GROUP BY employeeID

) AS s

JOIN tblcom c

GROUP BY s.employeeID;

+------------+------+----------+

| employeeID | amt  | Comm     |

+------------+------+----------+

|          1 |   14 | 1.800000 |

|          2 |   36 | 8.400000 |

+------------+------+----------+






Show only one child row per parent row

Given tables parent(id int not null primary key, etc...) and child (id int not null primary key, pid int not null references parent (id), etc...), how do we write a query that retrieves only one child row per pid even when the child table has multiple matching rows? MySQL permits use of GROUP BY even when the SELECT list specifies no aggregate function, so this will work:

select p.id, c.id 

from parent p 

join child c on p.id=c.pid

group by p.id;



But is it accurate? No, because it displays only the first c.id value it happens to find. For further discussion see Within-group aggregates.
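
If any one child row per parent will do but the result should be deterministic, aggregate the child column explicitly, for example taking the lowest child id per parent:

select p.id, min(c.id) as child_id
from parent p
join child c on p.id=c.pid
group by p.id;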





Skip repeating values

You want to report all unique values of a column and skip all rows repeating any of these values.

SELECT col

FROM foo

GROUP BY col
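
SELECT DISTINCT returns the same set of values:

SELECT DISTINCT col
FROM foo;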






Within-group aggregates

You have a products table with columns item, supplier, price. Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.

DROP TABLE IF EXISTS products;

CREATE TABLE products(item int,supplier int,price decimal(6,2));

INSERT INTO products VALUES(1,1,10),(1,2,15),(2,2,20),(2,1,21),(2,2,18);

SELECT * FROM products;

+------+----------+-------+

| item | supplier | price |

+------+----------+-------+

|    1 |        1 | 10.00 |

|    1 |        2 | 15.00 |

|    2 |        2 | 20.00 |

|    2 |        1 | 21.00 |

|    2 |        2 | 18.00 |

+------+----------+-------+



Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.

This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous at best, so we think better names for it are subaggregates, inner aggregates, or within-group aggregates.

It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in an aggregate query. Suppose you write ...

SELECT item,supplier,MIN(price)

FROM products

GROUP BY item;



Will this reliably return the correct supplier per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary. To retrieve the correct supplier for each item, you need more logic.

The simplest and often best-performing solution to the within-aggregates problem is an outer self exclusion join...

SELECT p1.item,p1.supplier,p1.price

FROM products AS p1

LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price

WHERE p2.item IS NULL;



...because in the resultset built by joining on left item = right item and left price > right price, the left-sided rows for which there is no greater right-sided price are precisely the per-item rows with the smallest prices.

You can also accomplish this by building a table of aggregated minimum prices. Before MySQL 4.1, it has to be a temporary table:

CREATE TEMPORARY TABLE tmp (

  item INT,

  minprice DECIMAL DEFAULT 0.0

);

LOCK TABLES products READ;

INSERT INTO tmp 

  SELECT item, MIN(price) 

  FROM products 

  GROUP BY item;



to which you then join the products table:

SELECT products.item, supplier, products.price 

FROM products 

JOIN tmp ON products.item = tmp.item

WHERE products.price=tmp.minprice;

UNLOCK TABLES;

DROP TABLE tmp;


From MySQL 4.1 on, the temporary table can be a correlated subquery. This is the most intuitively obvious syntax for the problem. Often it's also the slowest solution:

SELECT item, supplier, price

FROM products AS p1

WHERE price = (

  SELECT MIN(p2.price)

  FROM products AS p2

  WHERE p1.item = p2.item

);



Another solution, sometimes the fastest of all, is to move the aggregating subquery from the WHERE clause to the FROM clause:

SELECT p.item, p.supplier, p.price

FROM products AS p

JOIN (

  SELECT item, MIN(price) AS minprice

  FROM products

  GROUP BY item

) AS pm ON p.item = pm.item AND p.price = pm.minprice;


Finally, here is a rather different approach which can find both within-group minima and within-group maxima in a single query. The idea is to aggregate the concatenated within-group grouped column value and the within-group grouping column name in a single string, then use SUBSTR() to break them apart in the result:

SELECT 

  item,

  SUBSTR( MIN( CONCAT( LPAD(price,6,0),supplier) ), 7)   AS MinSupplier,

    LEFT( MIN( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MinPrice,

  SUBSTR( MAX( CONCAT( LPAD(price,6,0),supplier) ), 7)   AS MaxSupplier,

    LEFT( MAX( CONCAT( LPAD(price,6,0),supplier) ), 6)+0 AS MaxPrice

FROM  products

GROUP BY item;

+------+-------------+----------+-------------+----------+

| item | MinSupplier | MinPrice | MaxSupplier | MaxPrice |

+------+-------------+----------+-------------+----------+

|    1 | 1           |       10 | 2           |       15 |

|    2 | 2           |       18 | 1           |       21 |

+------+-------------+----------+-------------+----------+



Try all solutions to find which is fastest for your version of the problem.

To find the top or bottom N per group, you might think the LIMIT clause would work, but LIMIT is limited in subqueries. See Within-group quotas.





Within-group aggregates with a wrinkle

We have a wages table holding wage rates by waiter and startdate, and a tips table which tracks hours worked and tips received per waiter per day. The requirement is to report wages and concurrent tips per waiter per day.

DROP TABLE IF EXISTS wages,tips;

CREATE TABLE wages( id int, waiter int, start date, rate decimal(6,2));

INSERT INTO wages VALUES

( 1, 4, '2005-01-01', 5.00 ),

( 2, 4, '2005-03-01', 6.00 ),

( 3, 5, '2007-01-05', 7.00 ),

( 4, 5, '2008-03-20', 8.00 ),

( 5, 5, '2008-04-01', 9.00 );

CREATE TABLE tips(

  id int, 

  date date, 

  waiter int, 

  hours_worked smallint, 

  tabs smallint, 

  tips decimal(6,2)

);

INSERT INTO tips VALUES

( 1, '2008-02-29', 4, 6.50, 21, 65.25 ),

( 2, '2008-03-06', 5, 6.00, 15, 51.75 ),

( 3, '2008-03-21', 4, 2.50, 5, 17.85 ),

( 4, '2008-03-22', 5, 5.25, 10, 39.00 );

SELECT * FROM wages;

+------+--------+------------+------+

| id   | waiter | start      | rate |

+------+--------+------------+------+

|    1 |      4 | 2005-01-01 | 5.00 |

|    2 |      4 | 2005-03-01 | 6.00 |

|    3 |      5 | 2007-01-05 | 7.00 |

|    4 |      5 | 2008-03-20 | 8.00 |

|    5 |      5 | 2008-04-01 | 9.00 |

+------+--------+------------+------+

SELECT * FROM tips;

+------+------------+--------+--------------+------+-------+

| id   | date       | waiter | hours_worked | tabs | tips  |

+------+------------+--------+--------------+------+-------+

|    1 | 2008-02-29 |      4 |            7 |   21 | 65.25 |

|    2 | 2008-03-06 |      5 |            6 |   15 | 51.75 |

|    3 | 2008-03-21 |      4 |            3 |    5 | 17.85 |

|    4 | 2008-03-22 |      5 |            5 |   10 | 39.00 |

+------+------------+--------+--------------+------+-------+



For the above dataset, the result which correctly matches wages and tips would be:

+------+------------+------+------+-------+------+--------+------+------------+

| tid  | Date       | Hrs  | tabs | tips  | wid  | waiter | rate | start      |

+------+------------+------+------+-------+------+--------+------+------------+

|    1 | 2008-02-29 |    7 |   21 | 65.25 |    2 |      4 | 6.00 | 2005-03-01 |

|    2 | 2008-03-06 |    6 |   15 | 51.75 |    3 |      5 | 7.00 | 2007-01-05 |

|    3 | 2008-03-21 |    3 |    5 | 17.85 |    2 |      4 | 6.00 | 2005-03-01 |

|    4 | 2008-03-22 |    5 |   10 | 39.00 |    4 |      5 | 8.00 | 2008-03-20 |

+------+------------+------+------+-------+------+--------+------+------------+



Why is this different from an ordinary within-groups aggregate? The correct wage rate for a tips row is not the wages row for that waiter with the latest date; it is the wages row having the latest date before the date in the given tips row.

One way to proceed is to build a temporary table from a join of wages to tips on waiter and wages.start < tips.date, then exclusion-join that result to itself to remove all rows except those with the latest wage rate per tips row. A two-step:

-- wages-tips join removing later wage changes:

DROP TABLE IF EXISTS tmp;

CREATE TABLE tmp

SELECT 

  t.id AS tid, t.date AS Date, t.hours_worked AS Hrs,t.tabs,t.tips,

  w.id AS wid, w.waiter, w.rate, w.start

FROM tips t

JOIN wages w ON w.waiter=t.waiter AND w.start<=t.date;

-- self-exclusion join to remove obsolete wage rows:

SELECT t1.*

FROM tmp t1

LEFT JOIN tmp t2 ON t1.tid=t2.tid and t1.start<t2.start

WHERE t2.waiter is null

ORDER BY t1.Date;

+------+------------+------+------+-------+------+--------+------+------------+

| tid  | Date       | Hrs  | tabs | tips  | wid  | waiter | rate | start      |

+------+------------+------+------+-------+------+--------+------+------------+

|    1 | 2008-02-29 |    7 |   21 | 65.25 |    2 |      4 | 6.00 | 2005-03-01 |

|    2 | 2008-03-06 |    6 |   15 | 51.75 |    3 |      5 | 7.00 | 2007-01-05 |

|    3 | 2008-03-21 |    3 |    5 | 17.85 |    2 |      4 | 6.00 | 2005-03-01 |

|    4 | 2008-03-22 |    5 |   10 | 39.00 |    4 |      5 | 8.00 | 2008-03-20 |

+------+------------+------+------+-------+------+--------+------+------------+

DROP TABLE tmp;



That's fine, but can we skip the temp table? Yes—by adding the condition wages.start <= tips.date to each side of the exclusion join:

SELECT 

  t.id AS tid, t.date, t.hours_worked AS Hrs,t.tabs,t.tips,

  w.id AS wid, w.waiter, w.rate, w.start

FROM tips t

JOIN wages w ON w.waiter=t.waiter AND w.start <= t.date

LEFT JOIN wages w2 ON w.waiter=w2.waiter AND w2.start<=t.date AND w.start<w2.start

WHERE w2.id IS NULL

ORDER BY t.date;



Much simpler, and it gives the same result as the two-step.





Within-group quotas (Top N per group)

A table has multiple rows per key value, and you need to retrieve, say, the first or earliest two rows per key.

If the groups are fairly small, this can be done efficiently with a self-join and counts. For example the following table (based on a tip by Rudy Limeback) has three small data groups:

DROP TABLE IF EXISTS test;

CREATE TABLE test (

  id INT, 

  entrydate DATE 

);

INSERT INTO test VALUES

( 1, '2007-5-01' ),

( 1, '2007-5-02' ),

( 1, '2007-5-03' ),

( 1, '2007-5-04' ),

( 1, '2007-5-05' ),

( 1, '2007-5-06' ),

( 2, '2007-6-01' ),

( 2, '2007-6-02' ),

( 2, '2007-6-03' ),

( 2, '2007-6-04' ),

( 3, '2007-7-01' ),

( 3, '2007-7-02' ),

( 3, '2007-7-03' );



The first two rows per ID are the rows which, for a given ID, have two or fewer rows with earlier dates. If we use an inequality join with the COUNT(*) function to find the earlier rows per ID ...

SELECT t1.id, t1.entrydate, COUNT(*) AS earlier

FROM test AS t1

JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate

GROUP BY t1.id, t1.entrydate

+------+------------+---------+

| id   | entrydate  | earlier |

+------+------------+---------+

|    1 | 2007-05-01 |       1 |

|    1 | 2007-05-02 |       2 |

|    1 | 2007-05-03 |       3 |

|    1 | 2007-05-04 |       4 |

|    1 | 2007-05-05 |       5 |

|    1 | 2007-05-06 |       6 |

|    2 | 2007-06-01 |       1 |

|    2 | 2007-06-02 |       2 |

|    2 | 2007-06-03 |       3 |

|    2 | 2007-06-04 |       4 |

|    3 | 2007-07-01 |       1 |

|    3 | 2007-07-02 |       2 |

|    3 | 2007-07-03 |       3 |

+------+------------+---------+



... then we get our result immediately by removing rows where the 'earlier' count exceeds 2:

SELECT t1.id, t1.entrydate, count(*) AS earlier

FROM test AS t1

JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate

GROUP BY t1.id, t1.entrydate

HAVING earlier <= 2;

+------+------------+---------+

| id   | entrydate  | earlier |

+------+------------+---------+

|    1 | 2007-05-01 |       1 |

|    1 | 2007-05-02 |       2 |

|    2 | 2007-06-01 |       1 |

|    2 | 2007-06-02 |       2 |

|    3 | 2007-07-01 |       1 |

|    3 | 2007-07-02 |       2 |

+------+------------+---------+



This works beautifully with smallish aggregates. But the query algorithm compares every within-group row to every other within-group row. As the size N of a group increases, execution time increases by N*N. If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups of 16,000. The solution does not scale.

What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:

DROP TEMPORARY TABLE IF EXISTS earliers;

CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);

INSERT INTO earliers 

  SELECT id,entrydate FROM test WHERE id=1 ORDER BY entrydate LIMIT 2;

INSERT INTO earliers 

  SELECT id,entrydate FROM test WHERE id=2 ORDER BY entrydate LIMIT 2;

INSERT INTO earliers 

  SELECT id,entrydate FROM test WHERE id=3 ORDER BY entrydate LIMIT 2;


You need one INSERT statement per grouping value. To print the result, just query the earliers table:

SELECT * FROM earliers

ORDER BY id, entrydate;

+------+------------+

| id   | entrydate  |

+------+------------+

|    1 | 2007-05-01 |

|    1 | 2007-05-02 |

|    2 | 2007-06-01 |

|    2 | 2007-06-02 |

|    3 | 2007-07-01 |

|    3 | 2007-07-02 |

+------+------------+

DROP TEMPORARY TABLE earliers;



Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:

DROP PROCEDURE IF EXISTS listearliers;

DELIMITER |

CREATE PROCEDURE listearliers()

BEGIN

  DECLARE curdone, vid INT DEFAULT 0;

  DECLARE idcur CURSOR FOR SELECT DISTINCT id FROM test;

  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET curdone = 1;

  DROP TEMPORARY TABLE IF EXISTS earliers;

  CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);

  SET @sql = 'INSERT INTO earliers SELECT id,entrydate FROM test WHERE id=? ORDER BY  entrydate LIMIT 2';

  OPEN idcur;

  REPEAT

    FETCH idcur INTO vid;

    IF NOT curdone THEN

      BEGIN

        SET @vid = vid;

        PREPARE stmt FROM @sql;

        EXECUTE stmt USING @vid;

        DROP PREPARE stmt;

      END;

    END IF;

  UNTIL curdone END REPEAT;

  CLOSE idcur;

  SELECT * FROM earliers ORDER BY id,entrydate;

  DROP TEMPORARY TABLE earliers;

END;

|

DELIMITER ;

CALL listearliers();






Average the top 50% values per group

Each row of a games table records one game score for a team:

DROP TABLE IF EXISTS games;

CREATE TABLE games(id INT, teamID INT, score INT);

INSERT INTO games VALUES 

  (1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6),

  (6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7);


How would we write a query that returns the average of the top 50% of scores per team?

The per-team median value is its middle value--lower than the highest 50% and higher than the lowest 50% of values for that team--so a shortcut is to query the team medians, then aggregate on a join that selects per-team scores above the medians.

How to find per-team medians? If a resultset has an odd number of rows, at least one row has the true median score. If it has an even number of rows, the median score is an average of two central values. The following query adapts Joe Celko's formula in "SQL for Smarties" averaging "low" and "high" medians:

DROP TABLE IF EXISTS medians;

CREATE TABLE medians

SELECT p1.teamid, AVG(P1.score) AS median

FROM games AS P1, games AS P2

WHERE p1.teamid=p2.teamid

GROUP BY p1.teamid

HAVING (

  SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)

  AND 

  SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)

)

OR (

  SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)

  AND 

  SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)

);

+--------+--------+

| teamid | median |

+--------+--------+

|      1 | 4.5000 |

|      2 | 6.8333 |

+--------+--------+



Now join games to medians accepting only top-half values:

SELECT g.teamid, AVG(g.score) AS Top50Avg

FROM games g

JOIN medians m ON g.teamid = m.teamid AND g.score >= m.median

GROUP BY g.teamid

ORDER BY Top50Avg DESC;

+--------+----------+

| teamid | Top50Avg |

+--------+----------+

|      2 |   7.2500 |

|      1 |   5.5000 |

+--------+----------+

DROP TABLE medians;



Yes, all the logic can be moved into one query:

SELECT g.teamid, AVG(g.score) AS Top50Avg

FROM games g

JOIN (

  SELECT p1.teamid, AVG(P1.score) AS median

  FROM games AS P1, games AS P2

  WHERE p1.teamid=p2.teamid

  GROUP BY p1.teamid

  HAVING (

    SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)

    AND 

    SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)

  )

  OR (

    SUM(CASE WHEN P2.score >= P1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2)

    AND 

    SUM(CASE WHEN P2.score <= P1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1)

  )

) AS m ON g.teamid = m.teamid AND g.score >= m.median

GROUP BY g.teamid

ORDER BY Top50Avg DESC; 






Averages from bands of values

To count and average scores in bands of 10, ie 0-9,10-19 etc:

create table scores(score int);

insert into scores values(5),(15),(25),(35);

SELECT 10  * FLOOR( score / 10  ) AS  Bottom,

       10  * FLOOR( score / 10  ) + 9 AS Top, 

       Count( score ),

       Avg( score ) 

FROM scores

GROUP BY 10  * FLOOR( score / 10  );

+--------+------+----------------+--------------+

| Bottom | Top  | Count( score ) | Avg( score ) |

+--------+------+----------------+--------------+

|      0 |    9 |              1 |       5.0000 |

|     10 |   19 |              1 |      15.0000 |

|     20 |   29 |              1 |      25.0000 |

|     30 |   39 |              1 |      35.0000 |

+--------+------+----------------+--------------+






Count unique values of one column

SELECT col_name, COUNT(*) AS frequency 

FROM tbl_name

GROUP by col_name

ORDER BY frequency DESC;





Median

Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:

SELECT l1.hours As Median

FROM BulbLife As l1, bulbLife AS l2

GROUP BY l1.Hours

HAVING SUM(CASE WHEN l2.hours <= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)+1) / 2 

   AND SUM(CASE WHEN l2.hours >= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)/2) + 1;


An anonymous reader pointed out that this will cost O(N*N), ie it does not scale, so we posted a MySQL implementation of Torben Mogensen's algorithm for calculating the median (http://ndevilla.free.fr/median/median/node20.html), which is said to be amongst the fastest. It also proved too slow. Now Joe Wynne has offered an algorithm which appears to be correct, and which does scale. Here it is as a MySQL stored procedure:

DROP PROCEDURE IF EXISTS Median;

DELIMITER |

CREATE PROCEDURE Median( tbl CHAR(64), col CHAR(64), OUT res DOUBLE )

BEGIN

  DECLARE arg CHAR(64);

  SET @sql = CONCAT( 'SELECT ((COUNT(*))/2) INTO @c FROM ', tbl );

  PREPARE stmt FROM @sql;

  EXECUTE stmt;

  DROP PREPARE stmt;

  SET @a = CONVERT(FLOOR(@c), SIGNED);

  IF @a = @c THEN 

    BEGIN

      SET @a = @a-1;

      SET @b = 2;

      SET arg = CONCAT( 'AVG(', col, ')' );

    END;

  ELSE

    BEGIN

      SET @b = 1;

      SET arg = col;

    END;

  END IF;

  SET @sql = CONCAT('SELECT ', arg, ' INTO @res FROM (SELECT ', col, ' FROM ', tbl, 

                    ' ORDER BY ', col, ' LIMIT ?,?) as tmp');

  PREPARE stmt FROM @sql;

  EXECUTE stmt USING @a, @b;

  DROP PREPARE stmt;

  SET res=@res;

END;

|

DELIMITER ;
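
A quick usage sketch, borrowing the scores table from the 'Averages from bands of values' example above:

CALL Median('scores', 'score', @median);
SELECT @median;  -- 20 for the values 5,15,25,35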



Why don't we make it a function? Because MySQL functions do not (yet?) allow dynamic SQL.





Mode

Statistically, the mode is the most frequently occurring value. Given tables parent(id int) and child(pid int, cid int), where child.pid references parent.id as a foreign key, what query finds the parent.id most often represented in the child id, that is, the modal count of child.pid?

SELECT pid, COUNT(*) AS frequency

FROM child

GROUP BY pid

ORDER BY frequency DESC

LIMIT 1;
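
Note that LIMIT 1 picks one value arbitrarily when several pids tie for the top count. A sketch that returns every tied mode instead:

SELECT pid, COUNT(*) AS frequency
FROM child
GROUP BY pid
HAVING frequency = (
  SELECT COUNT(*) AS c
  FROM child
  GROUP BY pid
  ORDER BY c DESC
  LIMIT 1
);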






Rank order

Without MSSQL's RANK() aggregate function, how do we display rank order in a MySQL query, for example from a table like this?

CREATE TABLE votes( name CHAR(10), votes INT );

INSERT INTO votes VALUES

  ('Smith',10),('Jones',15),('White',20),('Black',40),('Green',50),('Brown',20);



The query is a two-step:
1. Join the table to itself on the value to be ranked, handling ties
2. Group and order the result of the self-join on rank:

SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank

FROM votes v1

JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)

GROUP BY v1.name, v1.votes

ORDER BY v1.votes DESC, v1.name DESC;

+-------+-------+------+

| name  | votes | Rank |

+-------+-------+------+

| Green |    50 |    1 |

| Black |    40 |    2 |

| White |    20 |    3 |

| Brown |    20 |    3 |

| Jones |    15 |    5 |

| Smith |    10 |    6 |

+-------+-------+------+


Suppose you want to know the vote count for White and the names and tallies for the next highest and next lowest counts:

SELECT b.name, b.votes, b.rank, b.`Near Ranks`

FROM (

  SELECT

  s1.name, s1.votes, COUNT(s2.name) rank, 

  IF(s1.name = 'White','<-',' ') AS 'Near Ranks'

FROM votes s1

JOIN votes s2 ON s1.votes < s2.votes or (s1.votes=s2.votes and s1.name = s2.name)

GROUP BY s1.name, s1.votes

ORDER BY rank

) a

LEFT JOIN (

  SELECT 

    s1.name, s1.votes, COUNT(s2.name) rank,

    IF(s1.name = 'White', '<-',' ') AS 'Near Ranks'

  FROM votes s1

  JOIN votes s2 ON s1.votes < s2.votes or (s1.votes=s2.votes and s1.name = s2.name)

  GROUP BY s1.name, s1.votes

  ORDER BY rank

) b ON a.rank BETWEEN b.rank-1 AND b.rank+1

WHERE a.name = 'White';

+-------+-------+------+------------+

| name  | votes | rank | Near Ranks |

+-------+-------+------+------------+

| Black |    40 |    2 |            |

| White |    20 |    3 | <-         |

| Brown |    20 |    3 |            |

+-------+-------+------+------------+






Backslashes in data

Backslashes multiply weirdly:

SELECT 'a\b' RLIKE 'a\b'; 


returns 1, as does...

SELECT 'a\\b' RLIKE 'a\\\\b'; 



because in a pair of backslashes, the second is not escaped by the first, so to compare two literals you double each backslash in the RLIKE argument. But if you are querying a table for such a string from the MySQL client, this doubling happens twice--once in the client, and once in the database--so to find a column value matching 'a\\b', you need to write...

SELECT `desc` FROM xxx WHERE `desc` RLIKE 'a\\\\\\\\b';



That's eight backslashes to match two!





Compare data in two tables

This query UNIONs queries for matching column names from two tables, and keeps just those rows which occur once in the union. Those are the rows unique to one table or the other. Customise your column list { id, col1, col2, col3 ...} as desired.

SELECT 

  MIN(TableName) as TableName, id, col1, col2, col3, ...

FROM (

  SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...

  FROM a

  UNION ALL

  SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...

  FROM b

) AS tmp

GROUP BY id, col1, col2, col3, ...

HAVING COUNT(*) = 1

ORDER BY ID;






Show rows where column value changed

SQL is set-oriented, but it can solve row-by-row problems. Suppose you need to retrieve only the rows that differ from immediately previous rows given some ordering spec:

drop table if exists t;

create table t (

  p char(3),

  d date

);

insert into t values

('50%','2008-05-01'),

('30%','2008-05-02'),

('30%','2008-05-03'),

('50%','2008-05-04'),

('50%','2008-05-05'),

('20%','2008-05-06'),

('20%','2008-05-07'),

('50%','2008-05-08'),

('70%','2008-05-09'),

('70%','2008-05-10');

select * from t order by d;

+------+------------+

| p    | d          |

+------+------------+

| 50%  | 2008-05-01 | *

| 30%  | 2008-05-02 | *

| 30%  | 2008-05-03 |

| 50%  | 2008-05-04 | *

| 50%  | 2008-05-05 |

| 20%  | 2008-05-06 | *

| 20%  | 2008-05-07 |

| 50%  | 2008-05-08 | *

| 70%  | 2008-05-09 | *

| 70%  | 2008-05-10 |

+------+------------+



We want to retrieve only rows whose `p` values differ from immediately previous values (marked by * above). As with running sums, we get the desired listing by tracking row-to-row value changes with user variables:

set @p='';

set @d='';

select p 'Pct Changed',d Date from (

  select 

    p,

    if( p<>@p, d, @d ) as d,

    @p:=p,

    @d:=d 

  from t 

  order by d

) as t 

group by d; 

+-------------+------------+

| Pct Changed | Date       |

+-------------+------------+

| 50%         | 2008-05-01 |

| 30%         | 2008-05-02 |

| 50%         | 2008-05-04 |

| 20%         | 2008-05-06 |

| 50%         | 2008-05-08 |

| 70%         | 2008-05-09 |

+-------------+------------+






Age in years

Given a birthdate in @dob, here is a simple formula for age in years:

DATE_FORMAT(FROM_DAYS(TO_DAYS(now()) - TO_DAYS(@dob)), '%Y') + 0;



and here is one for age in years to two decimal places (ignoring day of month):

ROUND((((YEAR(now()) - YEAR(@dob)))*12 + (((MONTH(now()) - MONTH(@dob)))))/12, 2)
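
On MySQL 5.0 and later, TIMESTAMPDIFF() gives the whole-year age directly:

SELECT TIMESTAMPDIFF(YEAR, @dob, NOW()) AS age_in_years;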






Appointments available

Given a clinic of physicians, patients and appointments, how to find an available appointment time for a given physician?

This is a variant of the [Not] Exists query pattern. Though we can write it with subqueries, performance will be crisper with a join. But finding data that is not there requires a join to data which is there. So in addition to tables for appointments, doctors and patients, we need a table of all possible appointment datetimes. Here's a schema illustrating the idea ...

CREATE TABLE a_dt (         -- POSSIBLE APPOINTMENT DATES AND TIMES

  d DATE,

  t TIME

);

CREATE TABLE a_drs (        -- DOCTORS

  did INT                   -- doctor id

);

CREATE TABLE a_pts (        -- PATIENTS

  pid INT

);

CREATE TABLE a_appts (      -- APPOINTMENTS

  aid INT,                  -- appt id

  did INT,                  -- doctor id

  pid INT,                  -- patient id

  d DATE,

  t TIME

);



Now we can apply the [Not] Exists query pattern. To find free appointment datetimes for a given doctor in a given datetime range, we left join possible appointments to existing appointments on date and time and doctor, add Where conditions for desired appointment datetimes, and finally add a Where condition that the appointment slot be null, i.e. free...

SELECT d.did, a.d, a.t

FROM a_dt AS a

JOIN a_drs AS d ON d.did = 1

LEFT JOIN a_appts AS ap

   ON a.d = ap.d

  AND a.t = ap.t

  AND ap.did = d.did

WHERE a.d BETWEEN desired_start_date AND desired_end_date

  AND a.t BETWEEN desired_start_time AND desired_end_time

  AND ap.aid IS NULL;





Count business days between two dates

Given a table named dates with date columns d1,d2, if the two dates are in the same year, the solution is simply the date difference in days minus two weekend days for each week boundary crossed (twice the week-number difference):

SELECT d1, d2, DATEDIFF(d2, d1) - (WEEK(d2) - WEEK(d1)) * 2 AS BizDays

FROM dates

ORDER BY d1, d2;



For dates that span different years, week numbers won't work. The answer is the number of raw days, minus twice the number of whole weeks (because there are two weekend days/week), minus the number of weekend days in any remainder part-week. This algorithm works when the start and stop dates are themselves business days (but needs refinement to work when passed weekend dates--anybody want to try?):

SET @d1='2007-1-1';

SET @d2='2007-3-31';

SET @dow1 = DAYOFWEEK(@d1);

SET @dow2 = DAYOFWEEK(@d2);

SET @days = DATEDIFF(@d2,@d1);

SET @wknddays = 2 * FLOOR( @days / 7 ) +

                IF( @dow1 = 1 AND @dow2 > 1, 1,                               

                    IF( @dow1 = 7 AND @dow2 = 1, 1,               

                        IF( @dow1 > 1 AND @dow1 > @dow2, 2,       

                            IF( @dow1 < 7 AND @dow2 = 7, 1, 0 )   

                          )

                      )

                  );

SELECT FLOOR(@days - @wkndDays) AS BizDays;



The algorithm is easily encapsulated in a function:

DROP FUNCTION IF EXISTS BizDayDiff;

DELIMITER |

CREATE FUNCTION BizDayDiff( d1 DATE, d2 DATE ) 

RETURNS INT

DETERMINISTIC

BEGIN

  DECLARE dow1, dow2, days, wknddays INT;

  SET dow1 = DAYOFWEEK(d1);

  SET dow2 = DAYOFWEEK(d2);

  SET days = DATEDIFF(d2,d1);

  SET wknddays = 2 * FLOOR( days / 7 ) +

                 IF( dow1 = 1 AND dow2 > 1, 1,                               

                     IF( dow1 = 7 AND dow2 = 1, 1,               

                         IF( dow1 > 1 AND dow1 > dow2, 2,       

                             IF( dow1 < 7 AND dow2 = 7, 1, 0 )   

                           )

                       )

                   );

  RETURN FLOOR(days - wkndDays);

END;

|

DELIMITER ;



To include time in the difference, you would probably adopt the convention of returning a string like N days hh:mm:ss where N is the date difference calculated above, minus one if the time portion of d1 is later than that of d2:

DROP FUNCTION IF EXISTS BizDateTimeDiff;

DELIMITER |

CREATE FUNCTION BizDateTimeDiff( d1 DATETIME, d2 DATETIME ) 

RETURNS CHAR(30)

DETERMINISTIC

BEGIN

  DECLARE dow1, dow2, days, wknddays INT;

  DECLARE tdiff CHAR(10);

  SET dow1 = DAYOFWEEK(d1);

  SET dow2 = DAYOFWEEK(d2);

  SET tdiff = TIMEDIFF( TIME(d2), TIME(d1) );

  SET days = DATEDIFF(d2,d1);

  SET wknddays = 2 * FLOOR( days / 7 ) +

                 IF( dow1 = 1 AND dow2 > 1, 1,                               

                     IF( dow1 = 7 AND dow2 = 1, 1,               

                         IF( dow1 > 1 AND dow1 > dow2, 2,       

                             IF( dow1 < 7 AND dow2 = 7, 1, 0 )   

                           )

                       )

                   );

  SET days = FLOOR(days - wkndDays) - IF( ASCII(tdiff) = 45, 1, 0 );

  SET tdiff = IF( ASCII(tdiff) = 45, TIMEDIFF( '24:00:00', SUBSTRING(tdiff,2)), TIMEDIFF( tdiff, '00:00:00' ));

  RETURN CONCAT( days, ' days ', tdiff );

END;

|

DELIMITER ; 

SELECT BizDateTimeDiff( '2007-1-1 00:00:00', '2007-3-31 00:00:00' ) AS dtdiff;

+------------------+

| dtdiff           |

+------------------+

| 64 days 00:00:00 |

+------------------+

SELECT BizDateTimeDiff( '2007-1-1 11:00:00', '2007-3-31 00:00:00' ) AS dtdiff;

+------------------+

| dtdiff           |

+------------------+

| 63 days 13:00:00 |

+------------------+

SELECT BizDateTimeDiff( '2007-1-1 12:00:00', '2007-3-31 13:00:00' ) AS dtdiff;

+------------------+

| dtdiff           |

+------------------+

| 64 days 01:00:00 |

+------------------+

SELECT BizDateTimeDiff( '2007-1-1 00:00:00', '2007-3-31 11:00:00' ) AS dtdiff;

+------------------+

| dtdiff           |

+------------------+

| 64 days 11:00:00 |

+------------------+



To factor in national and religious holidays, you need a holidays table and a stored procedure that adds in the number of holidays between d1 and d2.
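A minimal sketch of that idea, assuming a user-maintained holidays table and building on the BizDayDiff() function above (the table and function names here are illustrative; adjust the boundary tests to your own convention):

CREATE TABLE holidays( holiday DATE PRIMARY KEY );   -- one row per non-working holiday

DROP FUNCTION IF EXISTS BizDayDiffExHolidays;
DELIMITER |
CREATE FUNCTION BizDayDiffExHolidays( d1 DATE, d2 DATE )
RETURNS INT
READS SQL DATA
BEGIN
  -- business days from BizDayDiff(), minus holidays that fall on a weekday in the range
  RETURN BizDayDiff( d1, d2 ) -
         ( SELECT COUNT(*) FROM holidays
           WHERE holiday > d1 AND holiday <= d2
             AND DAYOFWEEK(holiday) NOT IN (1,7) );
END;
|
DELIMITER ;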





Count Tuesdays between two dates

Date arithmetic is deceptively hard. One way to appreciate the difficulties is to read Chapter 21 in our book. Another is to try to calculate the number of Tuesdays (or another weekday) between two dates. It's not a back-of-the-napkin problem.

An earlier formula we had for this problem sometimes gave incorrect results. As a debugging aid, we wrote a brute force calculator for the problem:

SET GLOBAL log_bin_trust_function_creators=1;

DROP FUNCTION IF EXISTS DayCount;

DELIMITER |

CREATE FUNCTION DayCount( d1 DATE, d2 DATE, daynum SMALLINT ) RETURNS INT

BEGIN

  DECLARE days INT DEFAULT 0;

  IF D1 IS NOT NULL AND D2 IS NOT NULL THEN

    WHILE D1 <= d2 DO

      BEGIN

        IF DAYOFWEEK(d1) = daynum THEN

          SET days=days+1;

        END IF;

        SET d1 = ADDDATE(d1, INTERVAL 1 DAY);

      END;

    END WHILE;

  END IF;

  RETURN days;

END;

|

DELIMITER ;

select 

  daycount('2008-3-16','2008-3-28',7) 

  AS 'Sats from 2008-3-16 through 2008-3-28';



You would not want to use that function on long date spans in a big table, but it will do for testing.

Now, how to count the number of Tuesdays, say, between two dates? The basic logic is:

1. Count weeks between the two dates.

2. If beginning and ending weekdays are the same, then if they're Tuesday, the answer is weeks+1, otherwise it's just weeks.

3. Otherwise, if the beginning weekday <= the ending weekday, then if Tuesday is between them, the answer is weeks+1, otherwise it's just weeks.

4. Otherwise the ending weekday is less than the starting weekday; if Tuesday >= the starting weekday or <= the ending weekday, the answer is weeks+1, otherwise it's just weeks.

For a convenient datasource, we'll use the two date columns orderdate and shippeddate in the orders table of the NorthWind database, and we'll use our brute force function DayCount() to check results:

SET @day = 3;

SELECT 

  DATE_FORMAT(orderdate,'%y%m%d') AS OrdDt,

  DATE_FORMAT(shippeddate,'%y%m%d') AS ShipDt,

  LEFT(DAYNAME(orderdate),3) AS D1,

  LEFT(DAYNAME(shippeddate),3) AS D2,

  @dow1 := DAYOFWEEK(orderdate) AS 'dw1',

  @dow2 := DAYOFWEEK(shippeddate) AS 'dw2',

  @days := DATEDIFF(shippeddate,orderdate) AS Days,

  @wks  := FLOOR( @days / 7 ) AS Wks,

  FLOOR( IF( @dow1 = @dow2, IF( @day = @dow1, @wks+1, @wks),

             IF( @dow1 < @dow2, IF( @day BETWEEN @dow1 AND @dow2, @wks+1, @wks),

                 IF( @day >= @dow1 OR @day <= @dow2, @wks+1, @wks ) 

               )

           )

       ) AS Res,

  DayCount(DATE(orderdate),DATE(shippeddate),@day) AS Chk

FROM orders

HAVING !ISNULL(res-chk) AND res-chk <> 0;

Empty set (0.00 sec)



No errors. We get the same result for @day = 1, 2, 4, 5, 6 and 7.

But the formula is buried in the specifics of one table, so abstract it to a reusable function:

DROP FUNCTION IF EXISTS NamedDaysBetween;

DELIMITER |

CREATE FUNCTION NamedDaysBetween( d1 DATE, d2 DATE, daynum SMALLINT ) 

RETURNS INT

BEGIN

  DECLARE dow1, dow2, wks, days INT;

  IF !ISNULL(d1) AND !ISNULL(d2) THEN

    SET dow1 = DAYOFWEEK( d1 );

    SET dow2 = DAYOFWEEK( d2 );

    SET days = DATEDIFF( d2, d1 );

    SET wks  = FLOOR( days / 7 );

    SET days = IF( dow1 = dow2, IF( daynum = dow1, wks+1, wks),

                   IF( dow1 < dow2, IF( daynum BETWEEN dow1 AND dow2, wks+1, wks),

                       IF( daynum >= dow1 OR daynum <= dow2, wks+1, wks ) 

                     )

                 );

  END IF;

  RETURN days;

END;

|

DELIMITER ;



Again check it against lots of date value pairs:

SELECT 

  nameddaysbetween(orderdate,shippeddate,3) - daycount(orderdate,shippeddate,3) 

  AS diff

FROM orders

HAVING !ISNULL(diff) AND diff <> 0;

Empty set (0.00 sec)






Date of first Friday of next month

Assuming a calendar table calendar(date DATE) with one row per date through the relevant period...

SET @d = NOW();

SELECT MIN(date) AS 1stFridayOfMonth

FROM calendar 

WHERE YEAR(date) = IF( MONTH(@d) = 12, 1+YEAR(@d), YEAR(@d) )

  AND MONTH(date) = IF( MONTH(@d) = 12, 1, MONTH(@d) + 1 )

  AND WEEKDAY(date)=4;






Date of Monday in a given week of the year

The week number of a given date @d, when weeks are defined as starting on Mondays and when we agree to number weeks of the year from 1 through 53, is given by WEEK(@d, 2). Here is a way to get the date of Monday in that week:

set @d='2008-1-31';

select makedate( left(yearweek(@d),4),week( @d, 2 ) * 7 ) as 1stdayOfWeek;

+--------------+

| 1stdayOfWeek |

+--------------+

| 2008-01-28   |

+--------------+

set @d='2008-7-15';

select makedate( left(yearweek(@d),4),week( @d, 2 ) * 7 ) as 1stdayOfWeek;

+--------------+

| 1stdayOfWeek |

+--------------+

| 2008-07-14   |

+--------------+






Datetime difference

Find the difference between two datetime values in seconds, minutes, hours or days. If dt1 and dt2 are datetime values of the form 'yyyy-mm-dd hh:mm:ss', the number of seconds between dt1 and dt2 is

UNIX_TIMESTAMP( dt2 ) - UNIX_TIMESTAMP( dt1 )



To get the number of minutes divide by 60, for the number of hours divide by 3600, and for the number of days, divide by 3600 * 24.
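For example, with two literal datetime values (chosen arbitrarily):

SET @dt1 = '2007-01-01 00:00:00', @dt2 = '2007-01-02 06:00:00';
SELECT
  UNIX_TIMESTAMP(@dt2) - UNIX_TIMESTAMP(@dt1)                    AS secs,   -- 108000
  ( UNIX_TIMESTAMP(@dt2) - UNIX_TIMESTAMP(@dt1) ) / 3600         AS hours,  -- 30.0000
  ( UNIX_TIMESTAMP(@dt2) - UNIX_TIMESTAMP(@dt1) ) / (3600 * 24)  AS days;   -- 1.2500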





Find available reservation periods

Given a bookings table where each row specifies one reservation period for one property, find the unbooked periods for a given property:

CREATE TABLE bookings( ID int, propertyID int, startDate date, endDate date );

INSERT INTO bookings VALUES 

  (1,1,'2007-1-1','2007-1-15'),

  (2,1,'2007-1-20','2007-1-31'),

  (3,1,'2007-2-10','2007-2-17');

SELECT * FROM bookings; 

+------+------------+------------+------------+

| ID   | propertyID | startDate  | endDate    |

+------+------------+------------+------------+

|    1 |          1 | 2007-01-01 | 2007-01-15 |

|    2 |          1 | 2007-01-20 | 2007-01-31 |

|    3 |          1 | 2007-02-10 | 2007-02-17 |

+------+------------+------------+------------+



Reservation systems usually adopt the closed-open convention of representing when reservations begin and end. For example, if you book a hotel room for 22 May through 24 May, the hotel expects you to stay overnight on 22 May and 23 May, but not on 24 May. Apart from that difference, this is the same pattern as Finding missing numbers in a sequence.

SELECT

  a.enddate AS 'Available From',

  Min(b.startdate) AS 'To'

FROM bookings AS a 

JOIN bookings AS b ON a.propertyID=b.propertyID AND a.enddate < b.startdate

WHERE a.propertyID=1

GROUP BY a.enddate

HAVING a.enddate < MIN(b.startdate);

+----------------+------------+

| Available From | To         |

+----------------+------------+

| 2007-01-15     | 2007-01-20 |

| 2007-01-31     | 2007-02-10 |

+----------------+------------+



This query cannot see reservation dates earlier than the first existing reservation date, or later than the last. Usually, you would want a calendar table to provide those limits, but you can fake them with a union. If the allowable reservation period is 1 Dec 2006 through 1 Jul 2007, union the left side of the join with a made-up row for 1 Dec 2006, and union the right side of the join with a made-up row for 1 Jul 2007:

SELECT

  a.enddate AS 'Available From',

  Min(b.startdate) AS 'To'

FROM (

  SELECT 0,1 as propertyID,'2006-12-01' as startdate,'2006-12-01' as enddate

  UNION

  SELECT * FROM bookings

) AS a

JOIN (

  SELECT * FROM bookings

  UNION

  SELECT 0,1,'2007-07-01' as startdate,'2007-07-02' as enddate

) AS b ON  a.propertyID=b.propertyID AND a.enddate < b.startdate

WHERE a.propertyID=1

GROUP BY a.enddate

HAVING a.enddate < MIN(b.startdate);

+----------------+------------+

| Available From | To         |

+----------------+------------+

| 2006-12-01     | 2007-01-01 |

| 2007-01-15     | 2007-01-20 |

| 2007-01-31     | 2007-02-10 |

| 2007-02-17     | 2007-07-01 |

+----------------+------------+





Find overlapping periods

You have a table of visits, and you would like to display the time periods during which there were visit time overlaps.

drop table if exists visits;

create table visits(id int primary key,start datetime,end datetime);

insert into visits values

(1, '2008-09-01 15:01', '2008-09-01 15:04'),

(2, '2008-09-01 15:02', '2008-09-01 15:09'),

(3, '2008-09-01 15:12', '2008-09-01 15:15'),

(4, '2008-09-01 16:11', '2008-09-01 16:23'),

(5, '2008-09-01 16:19', '2008-09-01 16:25'),

(6, '2008-09-01 17:52', '2008-09-01 17:59'),

(7, '2008-09-01 18:18', '2008-09-01 18:22');


There are two overlap periods in this data:

1     |-----|

2        |-----|

3                 |--|

4                       |-----|

5                          |-----|

6                                  |---|

7                                        |---|



One solution is to use a View to identify starting and stopping events, then define an Overlaps View:

CREATE VIEW events AS

SELECT start AS time, 1 AS value, id FROM visits

UNION

SELECT end AS time, -1 AS value, id FROM visits;

CREATE VIEW overlaps AS

SELECT time t, (SELECT SUM(value) FROM events WHERE time <=t ) as visitcount

FROM events;

SELECT t, visitcount

FROM overlaps

WHERE visitcount=(SELECT MAX(visitcount) FROM overlaps);

+---------------------+------------+

| t                   | visitcount |

+---------------------+------------+

| 2008-09-01 15:02:00 |          2 |

| 2008-09-01 16:19:00 |          2 |

+---------------------+------------+



But that doesn't show us when overlap periods end. There is a fuller and more straightforward solution: join visits to itself on the criteria that

(i) the first of each joined pair of visits started no earlier than the second,
(ii) the first visit started before the second ended, and
(iii) the second visit started before the first ended:

SELECT v1.id,v1.start,v2.id,v2.end

FROM visits v1

JOIN visits v2 ON v1.id <> v2.id and v1.start >= v2.start and v1.start < v2.end and v2.start < v1.end;

+------+---------------------+------+---------------------+

| id   | start               | id   | end                 |

+------+---------------------+------+---------------------+

| 2    | 2008-09-01 15:02:00 | 1    | 2008-09-01 15:04:00 |

| 5    | 2008-09-01 16:19:00 | 4    | 2008-09-01 16:23:00 |

+------+---------------------+------+---------------------+






Find sequenced duplicates

A table that tracks time periods may require period uniqueness. That means it has no sequenced duplicates.

If a table has columns processID, start_date and end_date, those three columns are period unique if there exists no pair of rows with the same processID and overlapping start_date and end_date values. If there is such a pair of rows, the table exhibits sequenced duplication.

Another way of saying it: if an instant is the smallest datetime unit of start_date and end_date columns, then if there are no sequenced duplicates, there is exactly one processID value at any instant.

Here is a query to find sequenced duplicates for those columns:

SELECT t.processid 

FROM tbl t 

WHERE EXISTS ( 

  SELECT * FROM tbl AS t3 

  WHERE t3.processid IS NULL 

) 

OR EXISTS ( 

  SELECT * FROM tbl AS t1 

  WHERE 1 < ( 

    SELECT COUNT(processid) 

    FROM tbl AS t2 

    WHERE t1.processid = t2.processid 

      AND t1.start_date < t2.end_date 

      AND t2.start_date < t1.end_date 

  ) 

);






Is a given booking period available?

You rent vacation properties, tracking bookings with a table like this:

CREATE TABLE bookings( ID int, propertyID int, startDate date, endDate date );

INSERT INTO bookings VALUES (1,1,'2007-1-1','2007-1-15'),(2,1,'2007-1-20','2007-1-31');

SELECT * FROM bookings;

+------+------------+------------+------------+

| ID   | propertyID | startDate  | endDate    |

+------+------------+------------+------------+

|    1 |          1 | 2007-01-01 | 2007-01-15 |

|    2 |          1 | 2007-01-20 | 2007-01-31 |

+------+------------+------------+------------+



You need a query indicating whether a given property is available for a given period of time.

Hotels & property renters usually adopt what is called the 'closed-open' convention for bookings, eg a booking from 22 May through 24 May means you sleep there the nights of 22 and 23 May. To show that property P is available for the desired closed-open period dStart to dEnd, you need to prove there is no booked period for P that overlaps dStart through dEnd. Until you're used to thinking about periods, it's easier to analyse graphically. There are four ways a booked reservation can overlap the desired date range ...

             dStart        dEnd

             |----------------|

        startDate            endDate

        |--------------------------|       

        |------| 

                   |----| 

                            |------|



but there are just two ways a booked reservation can not overlap:

             dStart        dEnd

             |----------------|

       |-----|                |-----|

     |-----|                    |-----|



So the period dStart through dEnd is available if there is no row where ...

!(endDate <= dStart OR startDate >= dEnd)



or equivalently ...

endDate > dStart AND startDate < dEnd



Here is a simple stored procedure for testing the query:

DROP PROCEDURE IF EXISTS isavailable;

DELIMITER |

CREATE PROCEDURE isavailable( iProperty int, dStart date, dEnd date )

SELECT IF( COUNT(1),'No','Yes' ) AS Available

FROM bookings

WHERE propertyID = iProperty 

  AND startDate < dEnd 

  AND endDate > dStart;

|

DELIMITER ;

CALL isavailable(1,'2006-12-27','2007-1-20');

+-----------+

| Available |

+-----------+

| No        |

+-----------+

CALL isavailable(1,'2007-1-10' ,'2007-1-16');

+-----------+

| Available |

+-----------+

| No        |

+-----------+

CALL isavailable(1,'2007-1-16' ,'2007-1-17');

+-----------+

| Available |

+-----------+

| Yes       |

+-----------+

CALL isavailable(1,'2007-1-22' ,'2007-1-23');

+-----------+

| Available |

+-----------+

| No        |

+-----------+

CALL isavailable(1,'2007-1-22' ,'2007-2-2');

+-----------+

| Available |

+-----------+

| No        |

+-----------+

CALL isavailable(1,'2007-2-1' ,'2007-2-2');

+-----------+

| Available |

+-----------+

| Yes       |

+-----------+

CALL isavailable(1,'2006-12-1' ,'2007-2-1');

+-----------+

| Available |

+-----------+

| No        |

+-----------+

1 row in set (0.00 sec)






Julian date

( UNIX_TIMESTAMP( datetimevalue ) / (60*60*24) ) + 2440587.5
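UNIX_TIMESTAMP() interprets its argument in the session time zone, so, assuming the session time zone is UTC, a quick check against a known value:

SELECT ( UNIX_TIMESTAMP('2000-01-01 12:00:00') / (60*60*24) ) + 2440587.5 AS julian;
-- 2451545.0, the Julian date of noon UTC on 1 Jan 2000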





Last business day before a reference date

Given a date value in datetimecol ...

SELECT 

  @refday := datetimecol,

  @dow := DAYOFWEEK(@refday) AS DOW,

  @subtract := IF( @dow = 1, 2, IF( @dow = 2, 3, 1 )) AS MINUS, 

  @refday - INTERVAL @subtract DAY AS LastBizDay

FROM ... etc
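For example, with a literal reference date instead of a table column (2007-01-08 is a Monday, so the last business day before it is Friday the 5th):

SET @refday = '2007-01-08';
SET @dow = DAYOFWEEK(@refday);
SET @subtract = IF( @dow = 1, 2, IF( @dow = 2, 3, 1 ));
SELECT @refday - INTERVAL @subtract DAY AS LastBizDay;   -- 2007-01-05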






Make a calendar table

You need a calendar table for joins to datetime data in other tables:

create table calendar (

  dt datetime primary key

);



An elegant method of generating any desired number of sequential values, posted by Giuseppe Maxia on his blog, is ...

  • Create three dummy rows in a View.
  • Cross join them to make 10 dummy rows.
  • Cross join those to make 100, 1,000 or however many you need.

So to give the calendar table a million rows at one-hour intervals starting on 1 Jan 1970:

create view v3 as select 1 n union all select 1 union all select 1;

create view v as select 1 n from v3 a, v3 b union all select 1;

set @n = 0;

insert into calendar select '1970-1-1 00:00:00'+interval @n:=@n+1 hour 

from v a, v b, v c, v d, v e, v;



If your MySQL version does not support Views, or if you prefer to avoid writing the View joins, make a table of integers 1 through 10...

create table ints(i int);

insert into ints values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);



and join them as we did with the Views above.

Here is a variation on Giuseppe's method (written about also by Baron Schwartz and Rob Wultsch) to make a table of a thousand dates starting today:

CREATE TABLE digits(i INT PRIMARY KEY);

INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

SELECT CURDATE() + INTERVAL t.i*100 + u.i*10 + v.i DAY AS Date

FROM digits AS t 

JOIN digits AS u

JOIN digits AS v

WHERE ( t.i*100 + u.i*10 + v.i ) < 1000;



You can use such a SELECT as a View, as an inline derived table, or as input to a CREATE TABLE statement.

A slightly more elaborate method, giving the calendar table an auto-increment key that can also be used as a surrogate for datetime interval calculations:

  CREATE TABLE calendar (

    id INT AUTO_INCREMENT PRIMARY KEY,

    date DATE,

    UNIQUE days (date)

  );



Calculate the number of days needed in the calendar, eg

  SELECT DATEDIFF('2010-12-31','1989-12-31');   # 7670, or 21*365 plus 5


Find a table with that many rows, 7670 in this case. Add a row to the calendar table for every day in the range:

  INSERT INTO calendar (id) 

  SELECT NULL FROM [name of table with 7670 rows] LIMIT 7670;



Populate the date column by incrementing the starting date:

  UPDATE calendar SET date = ADDDATE('1989-12-31',id);



The calendar table now has one row for each day from 1990-01-01 through 2010-12-31. Keep the auto_increment ID column for quick day counts in the range, or drop the column if you don't need that.

To make the calendar table a diary, make the period one leap year, add month, day and text columns, update month and day values with MONTH(date) and DAYOFMONTH(date) respectively, and if the diary is to be used from year to year, drop the date field.
[Based on a builder.com SQL Tip by Arthur Fuller and a MySQL list tip by Michael Stassen]

To automate all this, write a stored procedure, for example:

CREATE TABLE times (

  date_hour DATETIME,

  KEY ( date_hour ) 

);

DROP PROCEDURE IF EXISTS timespopulate;

DELIMITER |

CREATE PROCEDURE timespopulate( startdate DATETIME, num INT )

BEGIN

  DECLARE ctr INT DEFAULT 0;

  WHILE ctr < num DO

    BEGIN

      INSERT INTO times VALUES ( DATE_ADD( startdate, INTERVAL ctr HOUR) );

      SET ctr = ctr + 1;

    END;

  END WHILE;

END;

|

DELIMITER ;

CALL timespopulate( '2007-1-1', 31*24 );



Or, you can have the sproc do your counting:

DROP PROCEDURE IF EXISTS calendar; 

DELIMITER |

CREATE PROCEDURE calendar( pstart datetime, pstop datetime, pminutes int )

DETERMINISTIC

BEGIN

  DECLARE thisdate datetime;

  DROP TABLE IF EXISTS cal;

  CREATE TABLE cal( dt datetime ); 

  SET thisdate=pstart;

  INSERT INTO cal VALUES(pstart);

  WHILE thisdate < pstop DO

    SET thisdate = adddate( thisdate, INTERVAL pminutes MINUTE );

    INSERT INTO cal VALUES( thisdate );

  END WHILE;

END |

DELIMITER ;

-- make cal for January 2007, 20-min intervals:

CALL calendar('2007-1-1 00:00:00', '2007-2-1 00:00:00', 20);






Peak visit counts by datetime period

You have a visits table (id int, start datetime, end datetime), and you wish to track peak visit counts. A simple solution is to self-join on non-matching IDs and overlapping visit times, group by ID, then order by the resulting counts:

SELECT a.id,group_concat(b.id) as Overlaps, count(b.id)+1 as OverlapCount

FROM visits a

JOIN visits b on a.id < b.id and a.start < b.end and b.start < a.end

GROUP BY a.id

ORDER BY OverlapCount DESC;
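With the sample visits data from "Find overlapping periods" above, the query should return something like this, flagging the two overlapping pairs:

+------+----------+--------------+
| id   | Overlaps | OverlapCount |
+------+----------+--------------+
|    1 | 2        |            2 |
|    4 | 5        |            2 |
+------+----------+--------------+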






Sum for time periods

A table tracks attendance at some location:

drop table if exists t;

create table t(interval_id int,start datetime,end datetime, att int);

insert into t values

(1,'2007-01-01 08:00:00','2007-01-01 12:00:00',5 ),

(2,'2007-01-01 13:00:00','2007-01-01 17:00:00',10),

(3,'2007-01-01 10:00:00','2007-01-01 15:00:00',15),

(4,'2007-01-01 14:00:00','2007-03-07 19:00:00',20);

select * from t;

+-------------+---------------------+---------------------+------+

| interval_id | start               | end                 | att  |

+-------------+---------------------+---------------------+------+

|           1 | 2007-01-01 08:00:00 | 2007-01-01 12:00:00 |    5 |

|           2 | 2007-01-01 13:00:00 | 2007-01-01 17:00:00 |   10 |

|           3 | 2007-01-01 10:00:00 | 2007-01-01 15:00:00 |   15 |

|           4 | 2007-01-01 14:00:00 | 2007-03-07 19:00:00 |   20 |

+-------------+---------------------+---------------------+------+



In this table, att is a delta: it tracks entrances. Actual attendance is SUM(att) at any given moment. For example, if the attendance sum is x at a given moment, then after we add a row with att=y and start/end datetimes embracing that moment, attendance will be x+y. So to retrieve total attendance at 1330h on 1 Jan 2007, we write:

SELECT SUM(att) 

FROM t 

WHERE t.start <= '2007-01-01 13:30:00' AND t.end >= '2007-01-01 13:30:00';

+----------+

| SUM(att) |

+----------+

|       25 |

+----------+



Then how would we extract maximum attendance during a given period, for example, maximum attendance between 1300h and 1700h?

SQL does not deal efficiently with time. Some SQL dialects offer time series enhancements to the language; MySQL does not.

And, querying time series data for aggregate statistics gets complicated very quickly.

It gets a bit simpler with a calendar table that has a row for every possible datetime value. For our example, assume a granularity of one hour and a query period of one day. Naturally a real system would require a range of dates and perhaps a finer time granularity:

create table cal(id int,dt datetime);

insert into cal values(1,'2007-1-1 01:00:00');

insert into cal values(2,'2007-1-1 02:00:00');

insert into cal values(3,'2007-1-1 03:00:00');

insert into cal values(4,'2007-1-1 04:00:00');

insert into cal values(5,'2007-1-1 05:00:00');

insert into cal values(6,'2007-1-1 06:00:00');

insert into cal values(7,'2007-1-1 07:00:00');

insert into cal values(8,'2007-1-1 08:00:00');

insert into cal values(9,'2007-1-1 09:00:00');

insert into cal values(10,'2007-1-1 10:00:00');

insert into cal values(11,'2007-1-1 11:00:00');

insert into cal values(12,'2007-1-1 12:00:00');

insert into cal values(13,'2007-1-1 13:00:00');

insert into cal values(14,'2007-1-1 14:00:00');

insert into cal values(15,'2007-1-1 15:00:00');

insert into cal values(16,'2007-1-1 16:00:00');

insert into cal values(17,'2007-1-1 17:00:00');

insert into cal values(18,'2007-1-1 18:00:00');

insert into cal values(19,'2007-1-1 19:00:00');

insert into cal values(20,'2007-1-1 20:00:00');

insert into cal values(21,'2007-1-1 21:00:00');

insert into cal values(22,'2007-1-1 22:00:00');

insert into cal values(23,'2007-1-1 23:00:00');

insert into cal values(24,'2007-1-2 00:00:00');


To accumulate the maximum attendance sum, collect target values for defined periods in an inner query, and sum them from the outer query:

SELECT SUM( att ) 

FROM (

  SELECT

    t.start AS PeriodStart,

    t.end AS PeriodEnd,

    MIN(cal.dt) + INTERVAL 1 HOUR AS CountBegin,

    MAX(cal.dt) AS CountEnd,

    t.att                   

  FROM t

  JOIN cal ON cal.dt >= t.start AND cal.dt < t.end

  GROUP BY PeriodStart, PeriodEnd

  HAVING CountBegin < '2007-01-01 17:00:00' AND CountEnd > '2007-01-01 11:00:00'

) AS periods;

+------------+

| SUM( att ) |

+------------+

|         45 |

+------------+



If the data is more complicated, eg if we also need to track exits, the period logic needs refinement but the principle remains the same.





Sum time values

SELECT SEC_TO_TIME( SUM( TIME_TO_SEC( time_col ))) AS total_time 

FROM tbl;



Summing values like '12:65:23' produces meaningless results.





The date of next Thursday

Given a date and its weekday number (1=Sunday, ..., 7=Saturday), there are three possibilities:
1. Today is Thursday: then next Thursday is 7 days from now.
2. Today is before Thursday in the week: then next Thursday is (5 minus today's weekday number) days from now.
3. Today is after Thursday in the week: then next Thursday is (12 minus today's weekday number) days from now.

set @d=curdate();

set @n = dayofweek(curdate());

select 

  @d:=adddate(curdate(),0) as date, 

  @n:=dayofweek(adddate(curdate(),0)) as weekday, 

  adddate(@d,if(@n=5,7,if(@n<5,5-@n,12-@n) ) ) as nextthurs;

+------------+---------+------------+

| date       | weekday | nextthurs  |

+------------+---------+------------+

| 2008-03-10 |       2 | 2008-03-13 |

+------------+---------+------------+






Game schedule

List all possible home-away encounters of teams listed in a table.

SELECT t1.name AS Visiting,

       t2.name AS Home

FROM teams AS t1

STRAIGHT_JOIN teams AS t2

WHERE t1.ID <> t2.ID;






Pivot table schedule

You have a schedule table (period, day, subject, room) with a primary key period,day to avoid duplicate bookings. You wish to display the schedule as periods, subjects and rooms in rows, and days of the week in columns.

SELECT 

  period,

  MAX(IF(day=1, CONCAT(subject,' ',room), '')) AS Mon,

  MAX(IF(day=2, CONCAT(subject,' ',room), '')) AS Tue,

  MAX(IF(day=3, CONCAT(subject,' ',room), '')) AS Wed,

  MAX(IF(day=4, CONCAT(subject,' ',room), '')) AS Thu,

  MAX(IF(day=5, CONCAT(subject,' ',room), '')) AS Fri

FROM schedule

GROUP BY period



MAX() chooses existing over blank entries, and GROUP BY lines everything up on the same row.





Add auto-incrementing primary key to a table

The steps are: (i) recreate the table, populating a new column from an incrementing user variable, then (ii) alter the table to add auto_increment and primary key properties to the new column. So given table t with columns named `dt` and `observed`...

DROP TABLE IF EXISTS t2;

SET @id=0;

CREATE TABLE t2

  SELECT @id:=@id+1 AS id, dt, observed FROM t ORDER BY dt;

ALTER TABLE t2 

  MODIFY id INT AUTO_INCREMENT PRIMARY KEY;

DROP TABLE t;

RENAME TABLE t2 TO t;






Auto-increment: reset next value

ALTER TABLE tbl AUTO_INCREMENT=val;





Change or drop a foreign key

To change a foreign key, first drop it, then declare the new, revised foreign key. The syntax for declaring a foreign key is ...

[CONSTRAINT [constraint_name]] 

FOREIGN KEY [key_name] (keycol_name,...) reference_definition



and the syntax for dropping one is ...

DROP FOREIGN KEY constraint_name



Notice that you can omit the CONSTRAINT when you declare a foreign key, but the only way to DROP a foreign key is to reference it by the constraint_name which you probably never specified!

There should be a circle of hell reserved for designers who build inconsistencies like this into their tools. The only way round this one is to run SHOW CREATE TABLE to find out what the foreign key's constraint_name is, so you can write the DROP statement. Here is a wee test case:

drop table if exists a,b;
create table a(i int primary key)engine=innodb;
create table b(i int,foreign key(i) references a(i)) engine=innodb;
show create table b\G

CREATE TABLE `b` (
  `i` int(11) DEFAULT NULL,
  KEY `i` (`i`),
  CONSTRAINT `b_ibfk_1` FOREIGN KEY (`i`) REFERENCES `a` (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

-- drop and recreate the FK:
alter table b drop foreign key b_ibfk_1;
alter table b add foreign key(i) references a(i) on update cascade;
show create table b\G

Create Table: CREATE TABLE `b` (
  `i` int(11) DEFAULT NULL,
  KEY `i` (`i`),
  CONSTRAINT `b_ibfk_1` FOREIGN KEY (`i`) REFERENCES `a` (`i`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

drop table a,b;


Compare structures of two tables

To compare columns by name and ordinal position in tables test.t1 and test.t2:

SELECT
  MIN(TableName) AS 'Table',
  column_name AS 'Column',
  ordinal_position AS 'Position'
FROM (
  SELECT
    't1' as TableName,
    column_name,
    ordinal_position
  FROM information_schema.columns AS i1
  WHERE table_schema='test' AND table_name='t1'
  UNION ALL
  SELECT
    't2' as TableName,
    column_name,
    ordinal_position
  FROM information_schema.columns AS i2
  WHERE table_schema='test' AND table_name='t2'
) AS tmp
GROUP BY column_name
HAVING COUNT(*) = 1
ORDER BY ordinal_position;

For MySQL 5.0.2 or later, here is a query that lists all table structure differences between any two tables. It selects all information_schema.columns rows for one table, does the same for the second table, UNIONs these two queries, then uses HAVING to pick only those rows where the COUNT(*) in the union is 1, that is, where any column of one table differs from its mate.

To avoid having to cut and paste database and table names, save it as a stored procedure in any database (other than information_schema):

DROP PROCEDURE IF EXISTS CompareTableStructs;
-- Uncomment if MySQL version is 5.0.6-5.0.15:
-- SET GLOBAL log_bin_trust_routine_creators=TRUE;
DELIMITER |
CREATE PROCEDURE CompareTableStructs (
  IN db1 CHAR(64), IN tbl1 CHAR(64), IN db2 CHAR(64), IN tbl2 CHAR(64)
)
SELECT
  MIN(TableName) AS TableName,
  column_name,
  ordinal_position,
  column_default,
  is_nullable,
  data_type,
  character_maximum_length,
  numeric_precision,
  numeric_scale,
  character_set_name,
  collation_name,
  column_type,
  column_key,
  extra,
  privileges,
  column_comment
FROM (
  SELECT
    tbl1 as TableName,
    column_name,
    ordinal_position,
    column_default,
    is_nullable,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale,
    character_set_name,
    collation_name,
    column_type,
    column_key,
    extra,
    privileges,
    column_comment
  FROM information_schema.columns AS i1
  WHERE table_schema=db1 AND table_name=tbl1
  UNION ALL
  SELECT
    tbl2 as TableName,
    column_name,
    ordinal_position,
    column_default,
    is_nullable,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale,
    character_set_name,
    collation_name,
    column_type,
    column_key,
    extra,
    privileges,
    column_comment 
  FROM information_schema.columns AS i2
  WHERE table_schema=db2 AND table_name=tbl2
) AS tmp
GROUP BY column_name
HAVING COUNT(*) = 1
ORDER BY column_name ;
|
DELIMITER ;
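
For example, to compare the structures of test.t1 and test.t2 (the database and table names here are just placeholders):

CALL CompareTableStructs( 'test', 't1', 'test', 't2' );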



Compare two databases

One of EF Codd's rules for relational databases is the no-back-door rule: all info about tables should be accessible only by a query on tables. Since version 5, the MySQL implementation of information_schema (I_S) helps meet Codd's requirement. I_S supplies metadata in tables, so it's the first place to look for how to compare the structures of two databases.

Elsewhere on this page there is a simple query template for comparing data in two structurally similar tables:

SELECT MIN(TableName) as TableName, id, col1, col2, col3, ...
FROM (
  SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...
  FROM a
  UNION ALL
  SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...
  FROM b
) AS tmp
GROUP BY id, col1, col2, col3, ...
HAVING COUNT(*) = 1
ORDER BY ID; 

To apply this logic to the comparison of two database structures:
  • write temp tables collecting desired I_S metadata on each database
  • map the compare-data query template to those two metadata tables
This logic is easiest to re-use when it is parameterised in a stored procedure, in a system database:

USE sys;
DROP PROCEDURE IF EXISTS CompareDBs;
DELIMITER |
CREATE PROCEDURE CompareDBs( vdb1 VARCHAR(64), vdb2 VARCHAR(64) )
BEGIN

  DROP TEMPORARY TABLE IF EXISTS desc1,desc2;
  CREATE TEMPORARY TABLE desc1
  SELECT
    t1.table_schema,
    t1.table_name,
    t1.table_type,
    t1.engine,
    c1.column_name,
    c1.ordinal_position,
    c1.column_type,
    c1.column_default,
    c1.is_nullable,
    c1.column_key
  FROM information_schema.tables t1
  JOIN information_schema.columns c1 USING (table_schema,table_name)
  WHERE t1.table_schema=vdb1 
  ORDER BY t1.table_name,c1.column_name;

  CREATE TEMPORARY TABLE desc2
  SELECT
    t1.table_schema,
    t1.table_name,
    t1.table_type,
    t1.engine,
    c1.column_name,
    c1.ordinal_position,
    c1.column_type,
    c1.column_default,
    c1.is_nullable,
    c1.column_key
  FROM information_schema.tables t1
  JOIN information_schema.columns c1 USING (table_schema,table_name)
  WHERE t1.table_schema=vdb2 
  ORDER BY t1.table_name,c1.column_name;

  SELECT
    TableName,column_name,MIN(SchemaName),table_type,engine,
    ordinal_position,column_type,column_default,is_nullable,column_key
  FROM (
    SELECT 
      a.table_schema AS SchemaName,a.table_name AS TableName,a.table_type,a.engine,
      a.column_name,a.ordinal_position,a.column_type,a.column_default,a.is_nullable,a.column_key
    FROM desc1 a
    UNION ALL
    SELECT 
      b.table_schema AS SchemaName,b.table_name AS TableName,b.table_type,b.engine,
      b.column_name,b.ordinal_position,b.column_type,b.column_default,b.is_nullable,b.column_key
    FROM desc2 b
  ) AS tmp
  GROUP BY TableName,table_type,engine,column_name,ordinal_position,column_type,column_default,is_nullable,column_key
  HAVING COUNT(*) = 1
  ORDER BY TableName,column_name,SchemaName;   

  DROP TEMPORARY TABLE desc1, desc2;

END |
DELIMITER ;

Call it as follows:

CALL compareDBs('db1','db2');

MEMORY tables would speed it up, but unfortunately MySQL MEMORY tables do not support BLOB/TEXT columns.


Find child tables

Starting with MySQL 5, you can find all tables which contain foreign key references to a given table with this information_schema query:

SELECT c.table_schema,u.table_name,u.column_name,u.referenced_column_name
FROM information_schema.table_constraints AS c
INNER JOIN information_schema.key_column_usage AS u
USING( constraint_schema, constraint_name )
WHERE c.constraint_type = 'FOREIGN KEY' 
 AND u.referenced_table_schema='db'
 AND u.referenced_table_name = 'table'
ORDER BY c.table_schema,u.table_name; 

Unfortunately it performs slowly, not because of how it is written, but because of how MySQL has implemented information_schema. Indeed a bug report page is devoted to the issue. It says MySQL AB will not be speeding up information_schema query performance any time soon.

In theUsual we recently solved this problem by writing a PHP function that queries information_schema if that is required, but by default parses the results of iterative SHOW TABLES commands. The SHOW TABLES method logic is simple, and will port readily to another application language. It runs 10-50 times faster than the equivalent information_schema query.

The following version looks for child tables in one database; it slows down a bit when modified to search all server DBs, but even then it is much faster than its information_schema equivalent. It assumes an available connection object $conn:

function childtables( $db, $table, $via_infoschema=FALSE ) {
  GLOBAL $conn;
  $ret = array();
  if( $via_infoschema ) {
    $res = mysql_query( childtablesqry( $db, $table )) or die( mysql_error() );
    if( !is_bool( $res )) 
      while( $row = mysql_fetch_row( $res )) 
        $ret[] = $row;
  }
  else {
    $tables = array();
    $res = mysql_query( "SHOW TABLES" );
    while( $row = mysql_fetch_row( $res )) $tables[] = $row[0];
    $res = mysql_query( "SELECT LOCATE('ANSI_QUOTES', @@sql_mode)" );
    $ansi_quotes = $res ? mysql_result( $res, 0 ) : 0;
    $q = $ansi_quotes ? '"' : "`";
    $sref = ' REFERENCES ' . $q . $table . $q . ' (' . $q;
    foreach( $tables as $referringtbl ) {
      $res = mysql_query( "SHOW CREATE TABLE $referringtbl" );
      $row = mysql_fetch_row( $res );
      if(( $startref = stripos( $row[1], $sref )) > 0 ) {
        $endref = strpos( $row[1], $q, $startref + strlen( $sref ));
        $referencedcol = substr( $row[1], $startref+strlen($sref), 
                                 $endref-$startref-strlen($sref) );
        $endkey = $startref; 
        while( substr( $row[1], $endkey, 1 ) <> $q ) $endkey--;
        $startkey = --$endkey;
        while( substr( $row[1], $startkey, 1 ) <> $q ) $startkey--;
        $referencingcol = substr( $row[1], $startkey+1, $endkey - $startkey );
        $ret[] = array( $db, $referringtbl, $referencingcol, $referencedcol );
      }
    }
  }
  return $ret;
}

function childtablesqry( $db, $table ) {
  return "SELECT c.table_schema,u.table_name,u.column_name,u.referenced_column_name " .
         "FROM information_schema.table_constraints AS c " .
         "INNER JOIN information_schema.key_column_usage AS u " .
         "USING( constraint_schema, constraint_name ) " .
         "WHERE c.constraint_type = 'FOREIGN KEY' " .
         "AND u.referenced_table_schema='$db' " .
         "AND u.referenced_table_name = '$table' " .
         "ORDER BY c.table_schema,u.table_name";
}



Find parent tables

List tables which are referenced by foreign key constraints in a given table. This is a simple query on two information_schema tables: table_constraints and key_column_usage. It is easy to parameterise, so we show it in stored procedures. The first sproc lists all foreign key references in a database. The second lists all foreign key references for a table.

DELIMITER |
CREATE PROCEDURE ListParentsForDb( pdb CHAR(64) )
BEGIN
  SELECT 
   u.table_schema AS 'Schema',
   u.table_name AS 'Table',
   u.column_name AS 'Key',
   u.referenced_table_schema AS 'Parent Schema',
   u.referenced_table_name AS 'Parent table',
   u.referenced_column_name AS 'Parent key'
  FROM information_schema.table_constraints AS c
  INNER JOIN information_schema.key_column_usage AS u
  USING( constraint_schema, constraint_name )
  WHERE c.constraint_type = 'FOREIGN KEY'
    AND c.table_schema = pdb
  ORDER BY u.table_schema,u.table_name,u.column_name;
END |
DELIMITER ;

DELIMITER |
CREATE PROCEDURE ListParentsForTable( pdb CHAR(64), ptable CHAR(64) )
BEGIN
  SELECT 
   u.table_schema AS 'Schema',
   u.table_name AS 'Table',
   u.column_name AS 'Key',
   u.referenced_table_schema AS 'Parent Schema',
   u.referenced_table_name AS 'Parent table',
   u.referenced_column_name AS 'Parent key'
  FROM information_schema.table_constraints AS c
  INNER JOIN information_schema.key_column_usage AS u
  USING( constraint_schema, constraint_name )
  WHERE c.constraint_type = 'FOREIGN KEY'
    AND c.table_schema = pdb
    AND u.referenced_table_name = ptable
  ORDER BY u.table_schema,u.table_name,u.column_name;
END |
DELIMITER ;
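
Typical calls, with placeholder database and table names:

CALL ListParentsForDb( 'test' );
CALL ListParentsForTable( 'test', 'orders' );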



Find primary key of a table

To retrieve primary keys of db.tbl...

SELECT k.column_name 
FROM information_schema.table_constraints t 
JOIN information_schema.key_column_usage k 
USING (constraint_name,table_schema,table_name)
WHERE t.constraint_type='PRIMARY KEY' 
  AND t.table_schema='db'
  AND t.table_name='tbl'

For pre-5 versions of MySQL:

SHOW INDEX FROM tbl
WHERE key_name='primary';



Find the size of all databases on the server

This is based on a query Mark Leith posted to the MySQL General Discussion list.

DROP VIEW IF EXISTS dbsize;
CREATE VIEW dbsize AS
SELECT 
  s.schema_name AS 'Schema',
  SUM(t.data_length)/1024/1024 AS 'Data Mb',
  SUM(t.index_length)/1024/1024 AS 'Index Mb',
  (SUM(t.data_length) + SUM(t.index_length))/1024/1024 AS 'Mb Used',
  IF(SUM(t.data_free)=0,'',SUM(t.data_free)/1024/1024) AS 'Mb Free',
  IF( SUM(t.data_free)=0,
      '',
      100 * (SUM(t.data_length) + SUM(t.index_length)) / (SUM(t.data_length) + SUM(t.index_length) + SUM(IFNULL(t.data_free,0)))
    ) AS 'Pct Used',
  COUNT(table_name) AS Tables
FROM information_schema.schemata s
LEFT JOIN information_schema.tables t ON s.schema_name = t.table_schema
GROUP BY s.schema_name 
WITH ROLLUP;
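
Then query the view; the WITH ROLLUP row at the bottom totals the whole server:

SELECT * FROM dbsize;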



List differences between two databases

Given two database names in the user variables @db1 and @db2:
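
For example, point the two variables at the schemas to compare (the names here are placeholders):

SET @db1 = 'db1', @db2 = 'db2';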

SELECT 
  MIN(SchemaName) as SchemaName, 
  table_name,column_name,
  ordinal_position,column_default,is_nullable,
  data_type,character_maximum_length,character_octet_length,
  numeric_precision,numeric_scale,character_set_name,
  collation_name,column_type,column_key,
  extra,privileges,column_comment
FROM (
  SELECT table_schema as SchemaName, 
  table_name,column_name,
  ordinal_position,column_default,is_nullable,
  data_type,character_maximum_length,character_octet_length,
  numeric_precision,numeric_scale,character_set_name,
  collation_name,column_type,column_key,
  extra,privileges,column_comment
  FROM information_schema.columns c1
  WHERE table_schema=@db1
  UNION ALL
  SELECT table_schema as SchemaName, 
  table_name,column_name,
  ordinal_position,column_default,is_nullable,
  data_type,character_maximum_length,character_octet_length,
  numeric_precision,numeric_scale,character_set_name,
  collation_name,column_type,column_key,
  extra,privileges,column_comment
  FROM information_schema.columns c2
  WHERE table_schema=@db2
) AS tmp
GROUP BY table_name,column_name,
  ordinal_position,column_default,is_nullable,
  data_type,character_maximum_length,character_octet_length,
  numeric_precision,numeric_scale,character_set_name,
  collation_name,column_type,column_key,
  extra,privileges,column_comment
HAVING COUNT(*) = 1
ORDER BY table_name,column_name;



List users of a database


DROP PROCEDURE IF EXISTS ListDbUsers;
DELIMITER |
CREATE PROCEDURE ListDbUsers( dbname CHAR(64) )
  SELECT host,user
  FROM mysql.user
  WHERE Select_priv = 'Y' 
       OR Insert_priv = 'Y' 
     OR Update_priv = 'Y' 
     OR Delete_priv = 'Y' 
     OR Create_priv = 'Y' 
     OR Drop_priv = 'Y' 
     OR Reload_priv = 'Y' 
     OR Shutdown_priv = 'Y' 
     OR Process_priv = 'Y' 
     OR File_priv = 'Y' 
     OR Grant_priv = 'Y' 
     OR References_priv = 'Y' 
     OR Index_priv = 'Y' 
     OR Alter_priv = 'Y' 
     OR Show_db_priv = 'Y' 
     OR Super_priv = 'Y' 
     OR Create_tmp_table_priv = 'Y' 
     OR Lock_tables_priv = 'Y' 
     OR Execute_priv = 'Y' 
     OR Repl_slave_priv = 'Y' 
     OR Repl_client_priv = 'Y' 
     OR Create_view_priv = 'Y' 
     OR Show_view_priv = 'Y' 
     OR Create_routine_priv = 'Y' 
     OR Alter_routine_priv = 'Y' 
     OR Create_user_priv = 'Y' 
     OR Event_priv = 'Y' 
     OR Trigger_priv = 'Y' 
  UNION
  SELECT host,user
  FROM mysql.db
  WHERE db=dbname
    AND (
     Select_priv = 'Y' 
     OR Insert_priv = 'Y' 
     OR Update_priv = 'Y' 
     OR Delete_priv = 'Y' 
     OR Create_priv = 'Y' 
     OR Drop_priv = 'Y' 
     OR Grant_priv = 'Y' 
     OR References_priv = 'Y' 
     OR Index_priv = 'Y' 
     OR Alter_priv = 'Y' 
     OR Create_tmp_table_priv = 'Y' 
     OR Lock_tables_priv = 'Y' 
     OR Create_view_priv = 'Y' 
     OR Show_view_priv = 'Y' 
     OR Create_routine_priv = 'Y' 
     OR Alter_routine_priv = 'Y' 
     OR Execute_priv = 'Y' 
     OR Event_priv = 'Y' 
     OR Trigger_priv = 'Y' 
  )
  UNION
  SELECT host,user
  FROM mysql.tables_priv
  WHERE db=dbname
  UNION
  SELECT host,user
  FROM mysql.columns_priv
  WHERE db=dbname;
|
DELIMITER ;
CALL ListDbUsers( 'test' );



Rename Database

It's sometimes necessary to rename a database. MySQL 5.0 has no command for it. Simply bringing down the server to rename a database directory is not safe. MySQL 5.1.7 introduced a RENAME DATABASE command, but it left several database objects behind unmoved and was found to lose data, so it was dropped in 5.1.23.

It seems a natural for a stored procedure using dynamic (prepared) statements. PREPARE supports CREATE | RENAME TABLE. As precautions:
  • Before calling the sproc, the new database must have been created.
  • The procedure refuses to rename the mysql database.
  • The old database is left behind, minus what was moved.

DROP PROCEDURE IF EXISTS RenameDatabase;
DELIMITER |
CREATE PROCEDURE RenameDatabase(
  IN oldname CHAR (64), IN newname CHAR(64)
)
BEGIN
  DECLARE version CHAR(32);
  DECLARE sname CHAR(64) DEFAULT NULL;
  DECLARE rows INT DEFAULT 1;
  DECLARE changed INT DEFAULT 0;
  IF STRCMP( oldname, 'mysql' ) <> 0 THEN
    REPEAT
      SELECT table_name INTO sname
      FROM information_schema.tables AS t
      WHERE t.table_type='BASE TABLE'
        AND t.table_schema = oldname
      LIMIT 1;
      SET rows = FOUND_ROWS();
      IF rows = 1 THEN
        SET @scmd = CONCAT( 'RENAME TABLE ', oldname, '.', sname,
                           ' TO ', newname, '.', sname );
        PREPARE cmd FROM @scmd;
        EXECUTE cmd;
        DEALLOCATE PREPARE cmd;
        SET changed = 1;
      END IF;
    UNTIL rows = 0 END REPEAT;
    IF changed > 0 THEN
      SET @scmd = CONCAT( "UPDATE mysql.db SET Db = '",
                          newname,
                          "' WHERE Db = '", oldname, "'" );
      PREPARE cmd FROM @scmd;
      EXECUTE cmd;
      DROP PREPARE cmd;
      SET @scmd = CONCAT( "UPDATE mysql.proc SET Db = '",
                          newname,
                          "' WHERE Db = '", oldname, "'" );
      PREPARE cmd FROM @scmd;
      EXECUTE cmd;
      DROP PREPARE cmd;
      SELECT version() INTO version;
      IF version >= '5.1.7' THEN
        SET @scmd = CONCAT( "UPDATE mysql.event SET db = '",
                            newname,
                            "' WHERE db = '", oldname, "'" );
        PREPARE cmd FROM @scmd;
        EXECUTE cmd;
        DROP PREPARE cmd;
      END IF;
      SET @scmd = CONCAT( "UPDATE mysql.columns_priv SET Db = '",
                          newname,
                          "' WHERE Db = '", oldname, "'" );
      PREPARE cmd FROM @scmd;
      EXECUTE cmd;
      DROP PREPARE cmd;
      FLUSH PRIVILEGES;
    END IF;
  END IF;
END;
|
DELIMITER ;
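
Remembering the first precaution above, create the target database before calling the procedure (the names here are placeholders):

CREATE DATABASE newdb;
CALL RenameDatabase( 'olddb', 'newdb' );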



Show Create Trigger

As yet, MySQL has no Show Create Trigger command. Here is a stored procedure which behaves as a Show Create Trigger command should, listing all triggers defined on one table in one database.

SET GLOBAL log_bin_trust_routine_creators=TRUE; 
DROP PROCEDURE IF EXISTS ShowCreateTrigger; 
DELIMITER | 
CREATE PROCEDURE ShowCreateTrigger( IN db CHAR(64), IN tbl CHAR(64) ) 
  BEGIN 
    SELECT  
      CONCAT( 
        'CREATE TRIGGER ',trigger_name, CHAR(10),  
        action_timing,' ', event_manipulation, CHAR(10), 
        'ON ',event_object_schema,'.',event_object_table, CHAR(10), 
        'FOR EACH ROW', CHAR(10), 
        action_statement, CHAR(10) 
      ) AS 'Triggers' 
    FROM information_schema.triggers 
    WHERE event_object_schema = db 
      AND event_object_table = tbl; 
  END; 
|
DELIMITER ;

and here is a stored proc which lists all triggers in a database:

SET GLOBAL log_bin_trust_routine_creators=TRUE; 
DROP PROCEDURE IF EXISTS ListTriggers; 
DELIMITER | 
CREATE PROCEDURE ListTriggers( IN db CHAR(64) ) 
  BEGIN 
    SELECT  
      trigger_name AS 'Trigger',  
      event_object_table AS 'Table' 
    FROM information_schema.triggers 
    WHERE event_object_schema = db; 
  END; 
|
DELIMITER ; 
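
For example, with placeholder database and table names:

CALL ShowCreateTrigger( 'test', 'mytable' );
CALL ListTriggers( 'test' );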

If you have a collection of generic stored procs like these, it's most convenient to keep them in one place for easy accessibility. We keep ours in a sys database.


Show Tables

The MySQL SHOW TABLES command is fine, but sometimes we want a little more information.

This simple stored procedure lists the table name, engine type, version, collation and rowcount for every table in a database. (Individual databases come and go, so we keep all such database-wide stored routines in a 'system' database.)

DROP PROCEDURE IF EXISTS ShowTables;
DELIMITER |
CREATE PROCEDURE ShowTables( IN dbname CHAR(64) )
BEGIN
  SELECT 
    table_name,
    engine,
    version,
    table_collation AS collation,
    table_rows AS rows
  FROM information_schema.tables
  WHERE table_schema=dbname;
END;
|
DELIMITER ;
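
For example, for the test database:

CALL ShowTables( 'test' );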



Display column values which occur N times


SELECT id
FROM tbl
GROUP BY id
HAVING COUNT(*) = N;

Change the HAVING condition to >1 to list duplicate values, etc.


Display every Nth row


MySQL earlier than version 4.1:

SELECT id
FROM tbl
GROUP BY id 
HAVING MOD(id,N) = 0;

Since MySQL 4.1:

SELECT *
FROM tbl
WHERE ( id, 0 ) IN ( 
  SELECT id, MOD( id, N ) FROM tbl 
);
 
SELECT *
FROM (
 SELECT id FROM tbl 
) AS tmp
WHERE MOD( tmp.id, N ) = 0;




Trees, networks and parts explosions in MySQL

http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html


Dijkstra’s shortest path algorithm

Given a table of source-to-destination paths, each of whose nodes references a row in a nodes table, how do we find the shortest path from one node to another?

One answer is Dijkstra's algorithm (http://en.wikipedia.org/wiki/Dijkstra's_algorithm). Peter Larsson has posted a SQL Server implementation of it on the SQL Team Forum. Here is a MySQL implementation.

The DDL:

DROP TABLE IF EXISTS dijnodes,dijpaths;
CREATE TABLE dijnodes (
  nodeID int PRIMARY KEY AUTO_INCREMENT NOT NULL,
  nodename varchar (20) NOT NULL,
  cost int NULL,
  pathID int NULL,
  calculated tinyint NOT NULL 
);

CREATE TABLE dijpaths (
  pathID int PRIMARY KEY AUTO_INCREMENT,
  fromNodeID int NOT NULL ,
  toNodeID int NOT NULL ,
  cost int NOT NULL
);

Here is a stored procedure to populate valid nodes and paths:

DROP PROCEDURE IF EXISTS dijAddPath;
DELIMITER |
CREATE PROCEDURE dijAddPath( 
  pFromNodeName VARCHAR(20), pToNodeName VARCHAR(20), pCost INT 
)
BEGIN
  DECLARE vFromNodeID, vToNodeID, vPathID INT;
  SET vFromNodeID = ( SELECT NodeID FROM dijnodes WHERE NodeName = pFromNodeName );
  IF vFromNodeID IS NULL THEN
    BEGIN
      INSERT INTO dijnodes (NodeName,Calculated) VALUES (pFromNodeName,0);
      SET vFromNodeID = LAST_INSERT_ID();
    END;
  END IF;
  SET vToNodeID = ( SELECT NodeID FROM dijnodes WHERE NodeName = pToNodeName );
  IF vToNodeID IS NULL THEN
    BEGIN
      INSERT INTO dijnodes(NodeName, Calculated) 
      VALUES(pToNodeName,0);
      SET vToNodeID = LAST_INSERT_ID();
    END;
  END IF;
  SET vPathID = ( SELECT PathID FROM dijpaths 
                  WHERE FromNodeID = vFromNodeID AND ToNodeID = vToNodeID 
                );
  IF vPathID IS NULL THEN
    INSERT INTO dijpaths(FromNodeID,ToNodeID,Cost) 
    VALUES(vFromNodeID,vToNodeID,pCost);
  ELSE
    UPDATE dijpaths SET Cost = pCost  
    WHERE FromNodeID = vFromNodeID AND ToNodeID = vToNodeID;
  END IF;
END; 
|
DELIMITER ;

Use dijAddpath() to populate the tables:

call dijaddpath( 'a', 'b',  4 );
call dijaddpath( 'a', 'd',  1 );
call dijaddpath( 'b', 'a', 74 );
call dijaddpath( 'b', 'c',  2 );
call dijaddpath( 'b', 'e', 12 );
call dijaddpath( 'c', 'b', 12 );
call dijaddpath( 'c', 'f', 74 );
call dijaddpath( 'c', 'j', 12 );
call dijaddpath( 'd', 'e', 32 );
call dijaddpath( 'd', 'g', 22 );
call dijaddpath( 'e', 'd', 66 );
call dijaddpath( 'e', 'f', 76 );
call dijaddpath( 'e', 'h', 33 );
call dijaddpath( 'f', 'i', 11 );
call dijaddpath( 'f', 'j', 21 );
call dijaddpath( 'g', 'd', 12 );
call dijaddpath( 'g', 'h', 10 );
call dijaddpath( 'h', 'g',  2 );
call dijaddpath( 'h', 'i', 72 );
call dijaddpath( 'i', 'f', 31 );
call dijaddpath( 'i', 'j',  7 );
call dijaddpath( 'i', 'h', 18 );
call dijaddpath( 'j', 'f',  8 );

SELECT * FROM dijnodes;
+--------+----------+------+--------+------------+
| nodeID | nodename | cost | pathID | calculated |
+--------+----------+------+--------+------------+
|      1 | a        | NULL |   NULL |          0 |
|      2 | b        | NULL |   NULL |          0 |
|      3 | d        | NULL |   NULL |          0 |
|      4 | c        | NULL |   NULL |          0 |
|      5 | e        | NULL |   NULL |          0 |
|      6 | f        | NULL |   NULL |          0 |
|      7 | j        | NULL |   NULL |          0 |
|      8 | g        | NULL |   NULL |          0 |
|      9 | h        | NULL |   NULL |          0 |
|     10 | i        | NULL |   NULL |          0 |
+--------+----------+------+--------+------------+
SELECT * FROM dijpaths;
+--------+------------+----------+------+
| pathID | fromNodeID | toNodeID | cost |
+--------+------------+----------+------+
|      1 |          1 |        2 |    4 |
|      2 |          1 |        3 |    1 |
|      3 |          2 |        1 |   74 |
|      4 |          2 |        4 |    2 |
|      5 |          2 |        5 |   12 |
|      6 |          4 |        2 |   12 |
|      7 |          4 |        6 |   74 |
|      8 |          4 |        7 |   12 |
|      9 |          3 |        5 |   32 |
|     10 |          3 |        8 |   22 |
|     11 |          5 |        3 |   66 |
|     12 |          5 |        6 |   76 |
|     13 |          5 |        9 |   33 |
|     14 |          6 |       10 |   11 |
|     15 |          6 |        7 |   21 |
|     16 |          8 |        3 |   12 |
|     17 |          8 |        9 |   10 |
|     18 |          9 |        8 |    2 |
|     19 |          9 |       10 |   72 |
|     20 |         10 |        6 |   31 |
|     21 |         10 |        7 |    7 |
|     22 |         10 |        9 |   18 |
|     23 |          7 |        6 |    8 |
+--------+------------+----------+------+

Now for the stored procedure, a 6-step:
  • null out path columns in the nodes table
  • find the nodeIDs referenced by input params
  • loop through all uncalculated one-step paths, calculating costs in each
  • if a node remains uncalculated, the graph is invalid, so quit
  • write the path sequence to a temporary table
  • query the temp table to show the result
DROP PROCEDURE IF EXISTS dijResolve;
DELIMITER |
CREATE PROCEDURE dijResolve( pFromNodeName VARCHAR(20), pToNodeName VARCHAR(20) )
BEGIN
  DECLARE vFromNodeID, vToNodeID, vNodeID, vCost, vPathID INT;
  DECLARE vFromNodeName, vToNodeName VARCHAR(20);
  -- null out path info in the nodes table
  UPDATE dijnodes SET PathID = NULL,Cost = NULL,Calculated = 0;
  -- find nodeIDs referenced by input params
  SET vFromNodeID = ( SELECT NodeID FROM dijnodes WHERE NodeName = pFromNodeName );
  IF vFromNodeID IS NULL THEN
    SELECT CONCAT('From node name ', pFromNodeName, ' not found.' ); 
  ELSE
    BEGIN
      -- start at src node
      SET vNodeID = vFromNodeID;
      SET vToNodeID = ( SELECT NodeID FROM dijnodes WHERE NodeName = pToNodeName );
      IF vToNodeID IS NULL THEN
        SELECT CONCAT('From node name ', pToNodeName, ' not found.' );
      ELSE
        BEGIN
          -- calculate path costs till all are done
          UPDATE dijnodes SET Cost=0 WHERE NodeID = vFromNodeID;
          WHILE vNodeID IS NOT NULL DO
            BEGIN
              UPDATE 
                dijnodes AS src
                JOIN dijpaths AS paths ON paths.FromNodeID = src.NodeID
                JOIN dijnodes AS dest ON dest.NodeID = Paths.ToNodeID
              SET dest.Cost = CASE
                                WHEN dest.Cost IS NULL THEN src.Cost + Paths.Cost
                                WHEN src.Cost + Paths.Cost < dest.Cost THEN src.Cost + Paths.Cost
                                ELSE dest.Cost
                              END,
                  dest.PathID = Paths.PathID
              WHERE 
                src.NodeID = vNodeID
                AND (dest.Cost IS NULL OR src.Cost + Paths.Cost < dest.Cost)
                AND dest.Calculated = 0;
       
              UPDATE dijnodes SET Calculated = 1 WHERE NodeID = vNodeID;

              SET vNodeID = ( SELECT nodeID FROM dijnodes
                              WHERE Calculated = 0 AND Cost IS NOT NULL
                              ORDER BY Cost LIMIT 1
                            );
            END;
          END WHILE;
        END;
      END IF;
    END;
  END IF;
  IF EXISTS( SELECT 1 FROM dijnodes WHERE NodeID = vToNodeID AND Cost IS NULL ) THEN
    -- problem,  cannot proceed
    SELECT CONCAT( 'Node ', vToNodeID, ' missed.' );
  ELSE
    BEGIN
      -- write itinerary to map table
      DROP TEMPORARY TABLE IF EXISTS Map;
      CREATE TEMPORARY TABLE Map (
        RowID INT PRIMARY KEY AUTO_INCREMENT,
        FromNodeName VARCHAR(20),
        ToNodeName VARCHAR(20),
        Cost INT
      ) ENGINE=MEMORY;
      WHILE vFromNodeID <> vToNodeID DO
        BEGIN
          SELECT 
            src.NodeName,dest.NodeName,dest.Cost,dest.PathID
            INTO vFromNodeName, vToNodeName, vCost, vPathID
          FROM 
            dijnodes AS dest
            JOIN dijpaths AS Paths ON Paths.PathID = dest.PathID
            JOIN dijnodes AS src ON src.NodeID = Paths.FromNodeID
          WHERE dest.NodeID = vToNodeID;
          
          INSERT INTO Map(FromNodeName,ToNodeName,Cost) VALUES(vFromNodeName,vToNodeName,vCost);
          
          SET vToNodeID = (SELECT FromNodeID FROM dijPaths WHERE PathID = vPathID);
        END;
      END WHILE;
      SELECT FromNodeName,ToNodeName,Cost FROM Map ORDER BY RowID DESC;
      DROP TEMPORARY TABLE Map;
    END;
  END IF;
END;
|
DELIMITER ;
CALL dijResolve( 'a','i');
+--------------+------------+------+
| FromNodeName | ToNodeName | Cost |
+--------------+------------+------+
| a            | b          |    4 |
| b            | c          |    6 |
| c            | j          |   18 |
| j            | f          |   26 |
| f            | i          |   37 |
+--------------+------------+------+

Back to the top Browse the book Buy the book Feedback

Trees of known depth

A tree is a hierarchy where each node except the root has one parent. A parent-child link is an edge. Edges in a tree of known depth can be queried with n-1 self-joins where n is the number of edges from top to bottom. Here is a simple example of depth 2:

drop table if exists t;
create table t(id int, parent int, ord int, title char(20));
insert into t values
(1, 0, 0, 'Root'),
(2, 1, 0, 'Home'),
(3, 1, 1, 'Projects'),
(5, 1, 2, 'Secret area'),
(4, 1, 3, 'Tutorials'),
(8, 1, 4, 'Example'),
(6, 4, 0, 'Computing'),
(7, 4, 1, 'Life');

To list children under their parents, self-join on left parent = right id and order on left parent, left ord:

SELECT t1.id,t1.parent,t1.ord,t1.title,t2.title as parentTitle
FROM t t1
LEFT JOIN t t2 ON t1.parent=t2.id
ORDER BY t1.parent,t1.ord;
+------+--------+------+-------------+-------------+
| id   | parent | ord  | title       | parentTitle |
+------+--------+------+-------------+-------------+
|    1 |      0 |    0 | Root        | NULL        |
|    2 |      1 |    0 | Home        | Root        |
|    3 |      1 |    1 | Projects    | Root        |
|    5 |      1 |    2 | Secret area | Root        |
|    4 |      1 |    3 | Tutorials   | Root        |
|    8 |      1 |    4 | Example     | Root        |
|    6 |      4 |    0 | Computing   | Tutorials   |
|    7 |      4 |    1 | Life        | Tutorials   |
+------+--------+------+-------------+-------------+

For trees of unknown depth see http://artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html

Back to the top Browse the book Buy the book Feedback

Approximate joins

There are two main ways to reconcile payments against charges:
  • Open Item: match payments against individual charges, typically by carrying the charge number in the payments table, or
  • Statement: list and sum all charges and all payments, and show the difference as the outstanding balance.
The Open Item method needs a foolproof way to match payments to charges, but what if the customer neglected to return a copy of the invoice, or to write the invoice number on the cheque? Reconciliation staff spend much of their time resolving such problems.

Can we help? Yes! It won't be entirely foolproof, but it will drastically cut down the onerous work of reconciliation.

Here is DDL for a test case:

CREATE SCHEMA approx;
USE approx;
CREATE TABLE charges (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL(10,2) NOT NULL
);
CREATE TABLE payments (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL( 10,2) NOT NULL
);

Both tables carry a custID column to identify whose charge or payment it is, but there is no foreign key linking payments to specific charges--that is the link we are going to approximate.

Now populate the tables with a few rows of sample charges and payments for customer #1, ensuring that you have a variety of payments – some that match a charge exactly, some that fall slightly short, and some that are slight overpayments.

INSERT INTO approx.charges VALUES 
(NULL,1,100),(NULL,1,12),(NULL,1,56),(NULL,1,43),(NULL,1,59),(NULL,1,998);
INSERT INTO approx.payments VALUES 
(NULL,1,99),(NULL,1,62),(NULL,1,40),(NULL,1,50),(NULL,1,12),(NULL,1,1000);

SELECT * FROM charges;
+----+--------+--------+
| ID | custID | amount |
+----+--------+--------+
|  1 |      1 | 100.00 |
|  2 |      1 |  12.00 |
|  3 |      1 |  56.00 |
|  4 |      1 |  43.00 |
|  5 |      1 |  59.00 |
|  6 |      1 | 998.00 |
+----+--------+--------+
SELECT * FROM payments;
+----+--------+---------+
| ID | custID | amount  |
+----+--------+---------+
|  1 |      1 |   99.00 |
|  2 |      1 |   62.00 |
|  3 |      1 |   40.00 |
|  4 |      1 |   50.00 |
|  5 |      1 |   12.00 |
|  6 |      1 | 1000.00 |
+----+--------+---------+

The first thing to do is define an approximation threshold: how close must the amount paid be to the amount charged before we conclude that the amounts are related? For this example we define the proximity threshold as 2. In a real-world example, it might be 10, or 50, or perhaps a percentage of the charge. It all depends on the nature of the organisation and the typical total purchase. A house builder may make frequent purchases valued at $1000 and more. You scale the threshold to the typical situation.

Since the amount paid might be more or less or even equal to the amount charged, to link a payment to a charge we need not an equi-join but a theta-join that tests a range both below and above the charge amount. That might suggest a BETWEEN clause. Here is a better idea: use the ABS() function:

SET  @proximity = 2;   -- change this value to suit your situation
SELECT
  c.ID AS ChargeNo,
  c.Amount AS Charge,
  p.ID AS PaymentNo,
  p.Amount AS Payment
FROM charges c
JOIN payments p
  ON c.custID = p.custID 
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;

Before you run this query, look at the data to anticipate the result.

Here it is:

+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        6 | 998.00 |         6 | 1000.00 |
+----------+--------+-----------+---------+

The solution is correct, as far as it goes, but it doesn’t go far enough. We correctly identified the three situations: underpayment, exact payment and overpayment, but we suppressed all charges that don’t have a matching payment. Reconciliation staff are probably interested in a bigger picture of the situation. Fix this by changing the INNER JOIN to a LEFT JOIN:

SET @proximity = 2;
SELECT 
  c.ID AS ChargeNo, 
  c.amount AS Charge, 
  p.ID AS PaymentNo, 
  p.amount AS Payment
FROM
  charges c
LEFT JOIN payments p
  ON c.custID = p.custID 
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;
+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        3 |  56.00 |      NULL |    NULL |
|        4 |  43.00 |      NULL |    NULL |
|        5 |  59.00 |      NULL |    NULL |
|        6 | 998.00 |         6 | 1000.00 |
+----------+--------+-----------+---------+

Much better! The reconciliation people now know that three charges have no matching payment.

What if the customer mistakenly pays for something twice? Add a row to the Payments table with an amount of $1000, then re-run the last query:
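
One way to add that row (it takes the next auto-increment ID, 7):

INSERT INTO approx.payments VALUES (NULL,1,1000);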

+----------+--------+-----------+---------+
| ChargeNo | Charge | PaymentNo | Payment |
+----------+--------+-----------+---------+
|        1 | 100.00 |         1 |   99.00 |
|        2 |  12.00 |         5 |   12.00 |
|        3 |  56.00 |      NULL |    NULL |
|        4 |  43.00 |      NULL |    NULL |
|        5 |  59.00 |      NULL |    NULL |
|        6 | 998.00 |         6 | 1000.00 |
|        6 | 998.00 |         7 | 1000.00 |
+----------+--------+-----------+---------+

How convenient! We can see at once that charge number 6 was paid for twice.

Somebody in the reconciliation department owes you lunch.

Back to the top Browse the book Buy the book Feedback

Cascading JOINs

Show parents, children and grandchildren including parents without children

SELECT parent.id AS ParentID,
       IFNULL(child.parent_id,'') AS ChildParentID,
       IFNULL(child.id,'') AS ChildID,
       IFNULL(grandchild.child_id,'') AS GrandchildChildID
FROM parent
  LEFT JOIN child ON parent.id=child.parent_id
  LEFT JOIN grandchild ON child.id=grandchild.child_id;


Back to the top Browse the book Buy the book Feedback

Data-driven joins

Data-driven table relationships are hard to maintain, but sometimes they cannot be avoided. How do we build joins for them? One way is to use a CASE statement in the SELECT list to handle the joining possibilities. In this example, the parent.linktable column determines the name of the table where a particular parent row's data is. The method is fine when the number of child tables is small:

USE test;
DROP TABLE IF EXISTS parent, child1, child2;

CREATE TABLE parent (
  id INT UNSIGNED PRIMARY KEY, 
  linktable CHAR(64) NOT NULL
);
INSERT INTO parent VALUES (1, 'child1'), (2, 'child2');

CREATE TABLE child1 (
  id INT UNSIGNED PRIMARY KEY, 
  data CHAR(10)
);
INSERT INTO child1 VALUES (1, 'abc');

CREATE TABLE child2 (
  id INT UNSIGNED PRIMARY KEY, 
  data CHAR(10)
);
INSERT INTO child2 VALUES (2, 'def');

To retrieve all child data for all parents, include in the SELECT list a CASE statement which handles all child table possibilities:

SELECT
    p.id,
    p.linktable,
    CASE linktable
      WHEN 'child1' THEN c1.data
      WHEN 'child2' THEN c2.data
      ELSE 'Error'
    END AS Data
FROM parent AS p
LEFT JOIN child1 AS c1 ON p.id=c1.id
LEFT JOIN child2 AS c2 ON p.id=c2.id;
+----+-----------+------+
| id | linktable | Data |
+----+-----------+------+
|  1 | child1    | abc  |
|  2 | child2    | def  |
+----+-----------+------+

When the number of child tables is too large for a convenient CASE statement, PREPARE the query in a stored procedure.
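
For illustration, here is a minimal sketch of that approach against the parent/child1/child2 tables above (the procedure name get_child_data is made up): it looks up the child table named by parent.linktable for one parent row, then builds and PREPAREs the join:

DROP PROCEDURE IF EXISTS get_child_data;
DELIMITER |
CREATE PROCEDURE get_child_data( pParentID INT )
BEGIN
  DECLARE vTable CHAR(64);
  -- which child table holds this parent row's data?
  SET vTable = ( SELECT linktable FROM parent WHERE id = pParentID );
  SET @sql = CONCAT( 'SELECT p.id, p.linktable, c.data AS Data FROM parent AS p JOIN ',
                     vTable, ' AS c ON p.id=c.id WHERE p.id=', pParentID );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
END;
|
DELIMITER ;
CALL get_child_data(2);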

(Based on a MySQL Forum post by Felix Geerinckx)

Back to the top Browse the book Buy the book Feedback

Full Outer Join

A FULL OUTER join between tables a and b retrieves:
  • all rows from a, with matching rows or nulls from b, and
  • all rows from b, with matching rows or nulls from a
so for these tables:

DROP TABLE IF EXISTS a,b;
CREATE TABLE a(id int,name char(1));
CREATE TABLE b(id int,name char(1));
INSERT INTO a VALUES(1,'a'),(2,'b');
INSERT INTO b VALUES(2,'b'),(3,'c');
SELECT * FROM a;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
+------+------+
SELECT * FROM b;
+------+------+
| id   | name |
+------+------+
|    2 | b    |
|    3 | c    |
+------+------+

a full outer join returns:

+------+------+------+------+
| id   | name | id   | name |
+------+------+------+------+
|    1 | a    | NULL | NULL |
|    2 | b    |    2 | b    |
| NULL | NULL |    3 | c    |
+------+------+------+------+

MySQL does not support FULL OUTER JOIN. How to emulate it? If the joining keys of each table are unique, you can just UNION left and right joins:

SELECT * FROM a LEFT JOIN b ON a.id=b.id
UNION
SELECT * FROM a RIGHT JOIN b ON a.id=b.id;

But suppose the tables to be joined have duplicate rows, and you wish your result to preserve them. For example, add a duplicate row to table a:

INSERT INTO a VALUES(1,'a');

Now UNION removes the duplicate row you want preserved in the result. How to get back the desired duplicates? A FULL OUTER JOIN consists of:
  • an INNER JOIN between a and b to catch row matches between a and b,
  • a LEFT EXCLUSION JOIN from a to b to catch rows that are in a and not in b,
  • a RIGHT EXCLUSION JOIN from b to a to catch rows in b that are not in a.
In SQL:

SELECT * FROM a INNER JOIN b ON a.id=b.id
UNION ALL
SELECT * FROM a LEFT JOIN b ON a.id=b.id WHERE b.id IS NULL
UNION ALL
SELECT * FROM a RIGHT JOIN b ON a.id=b.id WHERE a.id IS NULL

But the first two joins—the inner join, and the left exclusion join—are logically equivalent to a left outer join, so we can write:

SELECT * FROM a LEFT JOIN b ON a.id=b.id
UNION ALL
SELECT * FROM a RIGHT JOIN b ON a.id=b.id WHERE a.id IS NULL;
+------+------+------+------+
| id   | name | id   | name |
+------+------+------+------+
|    1 | a    | NULL | NULL |
|    2 | b    |    2 | b    |
|    1 | a    | NULL | NULL |
| NULL | NULL |    3 | c    |
+------+------+------+------+

Why doesn't MySQL implement FULL OUTER JOIN syntax for this? We don't know.

Back to the top Browse the book Buy the book Feedback

Intersection and difference

MySQL implements UNION, but does not directly implement INTERSECTION or DIFFERENCE.

INTERSECTION is just an INNER JOIN on all columns:

drop table if exists a,b;
create table a(i int,j int);
create table b like a;
insert into a values(1,1),(2,2);
insert into b values(1,1),(3,3);
select * from a join b using(i,j);
+------+------+
| i    | j    |
+------+------+
|    1 |    1 |
+------+------+

Get the DIFFERENCE between tables a and b by UNIONing exclusion joins from a to b, and from b to a:

select * from a left join b using(i,j) where b.i is null
union
select * from b left join a using(i,j) where a.i is null;
+------+------+
| i    | j    |
+------+------+
|    2 |    2 |
|    3 |    3 |
+------+------+


Back to the top Browse the book Buy the book Feedback

Many-to-many joins

We have a collection of articles and users' scores of them. How to report statistics on these scores? We need three tables--one for articles, one for users, and a bridge table where each row represents one score on one article by one user:

DROP TABLE IF EXISTS art_articles;
CREATE TABLE art_articles (
  ID INT AUTO_INCREMENT PRIMARY KEY,
  title CHAR(30),
  txt TEXT,
  UNIQUE KEY (title)
);
INSERT INTO art_articles VALUES (1,'abc',''),(2,'def',''),(3,'ghi',''),(4,'jkl','');

DROP TABLE IF EXISTS art_users;
CREATE TABLE art_users(
  ID INT AUTO_INCREMENT PRIMARY KEY,
  name CHAR(20)
);
INSERT INTO art_users VALUES (1,'A'),(2,'B');

DROP TABLE IF EXISTS art_scores;
CREATE TABLE art_scores (
  id INT AUTO_INCREMENT PRIMARY KEY,
  articleID INT NOT NULL, -- references art_articles.ID
  userID INT NOT NULL,    -- references art_users.ID
  score DECIMAL(6,2)
);
INSERT INTO art_scores VALUES (1,1,1,80),(2,1,2,90),(3,2,2,60);

-- find average score for article titled 'abc'
SELECT a.title, AVG( s.score ) AS Art1Avg
FROM art_articles a
JOIN art_scores s ON a.id=s.articleID
WHERE a.title='abc'
GROUP BY a.title;

-- find average score submitted by user 1
SELECT u.name, AVG( s.score ) AS User1Avg
FROM art_users u
JOIN art_scores s ON u.id=s.userID
WHERE u.id = 1
GROUP BY u.name;


Back to the top Browse the book Buy the book Feedback

What else did buyers of X buy?

We often wish to know which purchases are associated with which other purchases, ie "people who bought this item also bought ..." In a real-world database, the table that summarises this information might be a View that encapsulates joins from customers to orders to orderitems to products, perhaps scoped on a recent date range. Here we'll ignore all such detail, focussing only on the logic of three typical problems of this type:

DROP TABLE IF EXISTS userpurchases;
CREATE TABLE userpurchases(
  custID INT UNSIGNED,
  prodID INT UNSIGNED
);
INSERT INTO userpurchases 
VALUES (1,1),(1,2),(2,4),(3,1),(3,2),(4,2),(4,3),(5,1),(5,2),(5,3);
SELECT custID, GROUP_CONCAT(prodID ORDER BY prodID) AS PurchaseList
FROM userpurchases
GROUP BY custID;
+--------+--------------+
| custID | PurchaseList |
+--------+--------------+
|      1 | 1,2          |
|      2 | 4            |
|      3 | 1,2          |
|      4 | 2,3          |
|      5 | 1,2,3        |
+--------+--------------+

The basic idea is to self-join on the product ID as often as need be to get the answer. For example, to list all products bought by customers who'd already bought at least one other product, join userpurchases to itself on matching custIDs and non-matching prodIDs:

SELECT DISTINCT p2.prodid 
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID = p2.custID AND p1.prodID <> p2.prodID;
+--------+
| prodid |
+--------+
|      1 |
|      2 |
|      3 |
+--------+

To find what else buyers of product 1 bought, copy the above join and group by customer ID:

SELECT p1.custID,GROUP_CONCAT(p2.prodid) as 'Buyers of #1 Also bought'
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID=p2.custID AND p1.prodID <> p2.prodID 
WHERE p1.prodID = 1
GROUP BY p1.custID;
+--------+--------------------------+
| custID | Buyers of #1 Also bought |
+--------+--------------------------+
|      1 | 2                        |
|      3 | 2                        |
|      5 | 2,3                      |
+--------+--------------------------+

What customers bought both product 1 and product 2?

SELECT DISTINCT p1.custID
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID=p2.custID AND p1.prodID=1 AND p2.prodID=2;
+--------+
| custID |
+--------+
|      1 |
|      3 |
|      5 |
+--------+

Here is a variation on the pattern. You have tables people(id, name), membership(id, name) and people_membership(id, membership_id, people_id). You need to find names of all persons and all their memberships where one of the memberships has id=X. It's another two-step:

1. Assemble the people who have a membership with id=X

2. Join from that derived result to people_memberships to get their other memberships, then to people to get their personal info:

SELECT 
  GROUP_CONCAT(m.name ORDER BY m.id SEPARATOR ', ') AS Memberships,
  pX.id,
  pX.name
FROM (
  SELECT p.id, p.name
  FROM people p
  JOIN (
    SELECT people_id
    FROM people_membership 
    WHERE membership_id=X
  ) pmX ON p.id=pmX.people_id
) pX
JOIN people_membership pm ON pX.id=pm.people_id
JOIN membership m ON pm.membership_id=m.id
GROUP BY pX.id, pX.name
ORDER BY pX.name;


Back to the top Browse the book Buy the book Feedback

Join or subquery?

Usually, a JOIN is faster than an uncorrelated subquery. For example in the sakila test database, customer is a parent of rental (via customer_id) which in turn is a parent of payment (via rental_id). The subquery version of a query for whether a customer has made payments and rentals...

SELECT DISTINCT c.customer_id
FROM customer c
WHERE c.customer_id IN (
   SELECT r.customer_id 
   FROM rental r
   JOIN payment p USING (rental_id)
)
AND c.customer_id = 599;

is eight times slower than the join version...
 
SELECT DISTINCT c.customer_id
FROM customer c
JOIN rental r USING (customer_id)
JOIN payment p USING (rental_id) 
WHERE c.customer_id = 599;

Running EXPLAIN on the two queries reveals why: the subquery version has to read most customer rows, while the join version proceeds inside out and discovers it needs to read just one customer row.

Back to the top Browse the book Buy the book Feedback

Parents without children

Given tables parent(id INT), child(id INT,parent_id INT), how to find parents that have no children? This is a simple version of the Not Exists query pattern, which can be written as an exclusion join...

SELECT parent.id
FROM parent 
LEFT JOIN child ON parent.id = child.parent_id 
WHERE child.parent_id IS NULL;

or with a NOT EXISTS subquery, which is logically equivalent to the exclusion join, but usually much slower:

SELECT parent.id AS ParentID
FROM parent
WHERE NOT EXISTS (
  SELECT 1
  FROM child
  WHERE child.parent_id = parent.id
);


Back to the top Browse the book Buy the book Feedback

Parties who have contracts with one another

You have a parties table that holds info on peoples' names etc, and a contracts table where each row has clientID and contractorID value pointing at a parties.partyID value--that is, each contracts row points at two parties rows. You want to list the names of all contractors and their clients.

SELECT clientpartyID, 
       pCli.name AS Client, 
       contractorpartyID, 
       pCon.name AS Contractor
FROM contracts
  INNER JOIN parties AS pCli 
    ON contracts.clientpartyID = pCli.partyID
  INNER JOIN parties AS pCon 
    ON contracts.contractorpartyID = pCon.partyID;


Back to the top Browse the book Buy the book Feedback

The unbearable slowness of IN()

IN() subqueries can be impossibly slow. This query for orders from NorthWind customers who have made multiple orders...

SELECT orderID, discount
FROM orderdetails
WHERE orderID IN (
  SELECT orderID FROM orderdetails 
  GROUP BY orderID 
  HAVING COUNT(orderID)>1
);

takes half again as long to execute as this logically equivalent query using a correlated subquery in an EXISTS() clause ...

SELECT od.orderID, od.discount
FROM orderdetails od
WHERE EXISTS (
  SELECT orderID FROM orderdetails 
  WHERE orderID = od.orderID
  GROUP BY orderID 
  HAVING COUNT(orderID)>1
);

and is 50 times slower than this logically equivalent query which moves the subquery to the FROM clause:

SELECT o.orderID, discount
FROM orderdetails AS o
INNER JOIN (
  SELECT orderID 
  FROM orderdetails
  GROUP BY orderID
  HAVING COUNT(1) > 1
) AS t ON o.orderID=t.orderID;

Why? Both the IN() and EXISTS() queries have to execute a table scan for each row in the table. Performance degrades as the square of the number of rows. The JOIN version builds its derived table on one table scan, and quickly picks off its resultset from that.

For NOT IN(...), use a straight exclusion join. The basic pattern is that ...

SELECT ...
FROM a
WHERE a.x NOT IN (
  SELECT y FROM b
);

becomes ...

SELECT ...
FROM a 
LEFT JOIN b ON a.x=b.y
WHERE b.y IS NULL;


Back to the top Browse the book Buy the book Feedback

The [Not] Exists query pattern

Given a table employee( employeeID INT, mgr_employeeID INT, salary DECIMAL(10,2)), find the managers who earn less than at least one of their subordinates.

We can write this query directly from the logic of its spec...

SELECT DISTINCT e.employeeID
FROM employee AS e
WHERE EXISTS ( 
  SELECT m.employeeID
  FROM employee AS m
  WHERE m.mgr_employeeID = e.employeeID AND m.salary > e.salary
);

...but a JOIN (or decorrelated) version of the logic is usually much faster. This query pattern is simple:
  • Inner join the table (t1) to itself (t2) on the grouping key.
  • Add the condition on which you wish to find existing rows to the Join clause.

SELECT DISTINCT m.employeeID
FROM employee AS e
INNER JOIN employee AS m ON e.mgr_employeeID = m.employeeID AND e.salary > m.salary;

The correlated subquery version of the Not Exists pattern merely inserts a strategic NOT:

SELECT DISTINCT e.employeeID
FROM employee AS e
WHERE NOT EXISTS ( 
  SELECT m.employeeID
  FROM employee AS m
  WHERE m.mgr_employeeID = e.employeeID AND m.salary > e.salary
);

The decorrelated version of the Not Exists version of this query uses an exclusion join--a LEFT JOIN with an IS NULL condition imposed on the right side of the join:
  • Left join the table (t1) to itself (t2) on the grouping key.
  • Add the condition on which you wish to find existing rows to the Join clause.
  • For the condition on which you wish to find missing rows in t2,
  •   (a) add the t2 value condition to the Join clause, and
  •   (b) add 't2 grouping key is null' to the Where clause:

SELECT DISTINCT e.employeeID
FROM employee AS e
LEFT JOIN employee AS m ON m.mgr_employeeID = e.employeeID AND m.salary > e.salary
WHERE m.employeeID IS NULL;


Back to the top Browse the book Buy the book Feedback

What exams did a student not register for?

We have a students table, an exams table, and a registrations (bridge) table that records which students are registered for which exams. How do we find the exams for which a particular student is not registered?

CREATE TABLE students (
  sid int(10) unsigned PRIMARY KEY auto_increment,
  firstname varchar(45) NOT NULL default '',
  lastname varchar(45) NOT NULL default ''
);
INSERT INTO students VALUES 
(1, 'Jack', 'Malone'),(2, 'Hiro', 'Nakamura'),(3, 'Bree', 'Van de Kamp'),
(4, 'Susan', 'Mayer'),(5, 'Matt', 'Parkman'),(6, 'Claire', 'Bennet');

CREATE TABLE exams (
  eid int(10) unsigned PRIMARY KEY auto_increment,
  exam_name varchar(45) NOT NULL default '',
  active smallint(5) unsigned NOT NULL default '0'
);
INSERT INTO exams VALUES 
(1, 'Javascript Expert', 1),(2, 'Lost Survival Course', 0),(3, 'Zend PHP Certification', 1),
(4, 'Superhero Advanced Skills', 1),(5, 'Desperation Certificate', 1);

CREATE TABLE registrations (
  registration_id int(11) PRIMARY KEY auto_increment,
  eid int(10) unsigned NOT NULL default '0',
  sid int(10) unsigned NOT NULL default '0',
  registration_date datetime NOT NULL default '0000-00-00 00:00:00'
);
INSERT INTO registrations (registration_id, eid, sid, registration_date) VALUES 
(1, 5, 14, '2007-10-25 00:00:00'),(2, 5, 3, '0000-00-00 00:00:00'),
(3, 5, 4, '2007-10-23 00:00:00'),(4, 4, 2, '2007-10-16 00:00:00'),
(5, 4, 5, '2007-10-22 00:00:00'),(6, 4, 6, '2007-10-23 00:00:00'), 
(7, 5, 2, '2007-10-23 00:00:00');

It's a variation of the [Not] Exists query pattern, but there is a wrinkle: we might expect to join registrations to students to get student info into the result, yet the registrations table will be the object of the exclusion join, so how do we retrieve the required student info? One solution is to left join exams to (students left join registrations), permitting us to retrieve the required student info while imposing the registration_id IS NULL condition on registrations to show only the exams he is not taking:

SELECT e.exam_name 
FROM exams e
LEFT JOIN (
  students s 
  LEFT JOIN registrations r 
         ON s.sid=r.sid AND s.firstname='Hiro' AND s.lastname='Nakamura'
) ON e.eid=r.eid
WHERE r.registration_id IS NULL;
+------------------------+
| exam_name              |
+------------------------+
| Javascript Expert      |
| Lost Survival Course   |
| Zend PHP Certification |
+------------------------+

We can prove that this logic is correct by modifying the query to show Hiro's registration or not for all exams:

SELECT e.exam_name,IF(s.sid IS NULL, 'No', 'Yes') AS 'Hiro registered'
FROM exams e
LEFT JOIN (
  students s 
  LEFT JOIN registrations r 
    ON s.sid=r.sid 
    AND s.firstname='Hiro' 
    AND s.lastname='Nakamura'
) ON e.eid=r.eid;
+---------------------------+-----------------+
| exam_name                 | Hiro registered |
+---------------------------+-----------------+
| Javascript Expert         | No              |
| Lost Survival Course      | No              |
| Zend PHP Certification    | No              |
| Superhero Advanced Skills | Yes             |
| Desperation Certificate   | Yes             |
+---------------------------+-----------------+

Thanks to Pascal Mitride for the example.

Back to the top Browse the book Buy the book Feedback

List NULLs at end of query output

If ordering by col...

... ORDER BY IF(col IS NULL, 1, 0 ), col ...
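
For example, with a hypothetical table people(name CHAR(20)) in which name may be NULL:

SELECT name
FROM people
ORDER BY IF(name IS NULL, 1, 0), name;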


Back to the top Browse the book Buy the book Feedback

Parents with and without children

You have parties and contracts tables. Every contracts row has a contractorpartyID value which references a row in parties, and a clientpartyID value which also references a row in parties. How to list all parties and their contracts, showing blanks as empty strings rather than NULLs?

SELECT parties.partyID, 
       IFNULL(contractorpartyID,'') AS contractor, 
       IFNULL(clientpartyID,'') AS client
FROM parties 
LEFT JOIN contracts ON partyID=contractorpartyID
ORDER BY partyID;
+---------+------------+--------+
| partyID | contractor | client |
+---------+------------+--------+
|       1 |            |        |
|       2 | 2          | 1      |
|       3 |            |        |
+---------+------------+--------+


Back to the top Browse the book Buy the book Feedback

Next row

You have a table of names, you have retrieved a row with name $name, and you want the row for the next name in name order. MySQL LIMIT syntax makes this very easy:

SELECT *
FROM tbl
WHERE name > $name
ORDER BY name
LIMIT 1
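
For example, if the row just fetched has name 'Kallinen' (the value your application would substitute for $name):

SELECT *
FROM tbl
WHERE name > 'Kallinen'
ORDER BY name
LIMIT 1;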


Back to the top Browse the book Buy the book Feedback

Order by a column containing digits and letters

To have column values 1abc,10abc,8abc appear in the expected order 1abc,8abc,10abc, take advantage of a trick built into MySQL string parsing ...

SELECT '3xyz'+0;
+----------+
| '3xyz'+0 |
+----------+
|        3 |
+----------+

to write ...

SELECT ...
...
ORDER BY colname+0, colname;
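
For example, against a hypothetical table tbl(code CHAR(10)) holding values like '1abc', '8abc' and '10abc':

SELECT code
FROM tbl
ORDER BY code+0, code;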


Back to the top Browse the book Buy the book Feedback

Order by month name

The MySQL FIELD(str,str1,str2,...,strN) function returns 1 if str=str1, 2 if str=str2, etc., so ...

SELECT ...
ORDER BY FIELD(month,'JAN','FEB','MAR',...,'NOV','DEC') ...

will order query output from a legacy table in month-number order.
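
For example, against a hypothetical legacy table sales(month CHAR(3), amount DECIMAL(10,2)):

SELECT month, amount
FROM sales
ORDER BY FIELD(month,'JAN','FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC');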

Back to the top Browse the book Buy the book Feedback

Order by numerics then alphas

Given a column type with values 1,2,3,a,b,c, how to get the order 3,2,1,c,b,a?

ORDER BY type RLIKE '^[0-9]+$' DESC, `type` DESC 
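
For example, with a hypothetical table items(type CHAR(10)) holding those values:

SELECT type
FROM items
ORDER BY type RLIKE '^[0-9]+$' DESC, type DESC;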


Back to the top Browse the book Buy the book Feedback

Suppress repeating ordering values

You have tables tracking authors and their books, for example:

CREATE TABLE author (
  id int(4) NOT NULL auto_increment PRIMARY KEY,
  name text NOT NULL
);
INSERT INTO author (id, name) 
VALUES (1,'Brad Phillips'),(2,'Don Charles'),(3,'Kur Silver');
CREATE TABLE book (
  id int(4) NOT NULL auto_increment PRIMARY KEY,
  name text NOT NULL
);
INSERT INTO book (id, name) 
VALUES (1,'MySQL in a bucket '),(2,'Databases for Delinquents'),
       (3,'Design Patterns'),(4,'PHP Professional'),(5,'Java Script Programming');
CREATE TABLE book_author (
  book_id int(4) NOT NULL default '0',
  author_id int(4) NOT NULL default '0'
);
INSERT INTO book_author (book_id, author_id) 
VALUES (1,1), (1,2), (2,3), (4,1), (3,1), (5,2); 

You want to list authors' books while suppressing repeating authors' names. A simple solution is to use MySQL's extremely useful GROUP_CONCAT() function to group books by author:

SELECT 
  a.name AS Author,
  GROUP_CONCAT(b.name ORDER BY b.name) AS Books
FROM book_author AS ba
JOIN book AS b ON ba.book_id=b.id
JOIN author AS a ON ba.author_id=a.id
GROUP BY a.name;

For a neater-looking result:
1. Retrieve authors and their books.
2. Order them
3. Use a variable to remember and suppress repeating author names:

SET @last='';
SELECT 
  IF(r.author=@last,'',@last:=r.author) AS Author,
  r.book AS Book
FROM (
  SELECT DISTINCT a.name AS author,b.name AS book
  FROM book_author AS ba
  JOIN book AS b ON ba.book_id=b.id
  JOIN author AS a ON ba.author_id=a.id
  ORDER BY a.name,b.name
) AS r;
+---------------+---------------------------+
| Author        | Book                      |
+---------------+---------------------------+
| Brad Phillips | Design Patterns           |
|               | MySQL in a bucket         |
|               | PHP Professional          |
| Don Charles   | Java Script Programming   |
|               | MySQL in a bucket         |
| Kur Silver    | Databases for Delinquents |
+---------------+---------------------------+


Back to the top Browse the book Buy the book Feedback

Pagination

Suppose you have a phone book of names, addresses, etc. You are looking at page 100 and want to now see page 99 ... How do you do this knowing only what page you are on and the name at the top of the page?

You don't need the first name on the page. Assuming...
  • 1-based page numbers
  • you are on page P
  • each page shows N rows
then a SELECT statement that reproduces your page, assuming no rows have changed, ends with
   LIMIT (P-1)*N, N
so you could retrieve page 99 with a SELECT statement which ends with
   LIMIT 98*N, N
(compute the offset value before building the statement; LIMIT arguments must be integer constants).
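
For example, a minimal sketch assuming a phonebook(name, address) table and N = 25 rows per page; page 99 starts at offset (99-1)*25 = 2450:

SELECT name, address
FROM phonebook
ORDER BY name
LIMIT 2450, 25;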

Back to the top Browse the book Buy the book Feedback

Automate the writing of pivot table queries

You have a sales table listing product, salesperson and amount:

DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
  id int(11) default NULL,
  product char(5) default NULL,
  salesperson char(5) default NULL,
  amount decimal(10,2) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO sales VALUES 
  (1,'radio','bob','100.00'),
  (2,'radio','sam','100.00'),
  (3,'radio','sam','100.00'),
  (4,'tv','bob','200.00'),
  (5,'tv','sam','300.00'),
  (6,'radio','bob','100.00');
SELECT * FROM sales;
+------+---------+-------------+--------+
| id   | product | salesperson | amount |
+------+---------+-------------+--------+
|    1 | radio   | bob         | 100.00 |
|    2 | radio   | sam         | 100.00 |
|    3 | radio   | sam         | 100.00 |
|    4 | tv      | bob         | 200.00 |
|    5 | tv      | sam         | 300.00 |
|    6 | radio   | bob         | 100.00 |
+------+---------+-------------+--------+

If you are asked to tabulate sales amount against salesperson and product, you write a pivot table query:

SELECT
  product,
  SUM( CASE salesperson WHEN 'bob' THEN amount ELSE 0 END ) AS 'Bob',
  SUM( CASE salesperson WHEN 'sam' THEN amount ELSE 0 END ) AS 'Sam',
  SUM( amount ) AS Total
FROM sales
GROUP BY product WITH ROLLUP;
+---------+--------+--------+--------+
| product | Bob    | Sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
| NULL    | 400.00 | 500.00 | 900.00 |
+---------+--------+--------+--------+

The query generates one product per row and one column per salesperson. The pivoting CASE expressions assign values of sales.amount to the matching salesperson's column. For two products and two salespersons, it's a snap once you've done it a few times. When there are dozens of products and salespersons, though, writing the query becomes tiresome and error-prone.

Some years ago Giuseppe Maxia published a little query that automates writing the pivot expressions. His idea was to embed the syntax for lines like the SUM( CASE ...) lines above in a query for the DISTINCT values. At the time Giuseppe was writing, MySQL did not support stored procedures. Now that it does, we can further generalise Giuseppe's idea by parameterising it in a stored procedure.

Admittedly, it's a little daunting. To write a query with variable names rather than the usual literal table and column names, we have to write PREPARE statements. What we propose to do here is to write SQL that writes PREPARE statements.

Code which writes code which writes code. Not a job for the back of a napkin.

It's easy enough to write the sproc shell. We keep generic queries in a sys database, so the routine needs parameters specifying database, table, pivot column and (in some cases) the aggregating column. Then what? What worked for us was to proceed from back to front:
  • Write the pivot expressions for a specific case.
  • Write the PREPARE statement that generates those expressions.
  • Parameterise the result of #2.
  • Put the result of #3 in an sproc.
Further complicating matters, we soon found that different summary aggregations, for example COUNT and SUM, require different sprocs. Here is the routine for generating COUNT pivot expressions:

USE sys;
DROP PROCEDURE IF EXISTS writecountpivot;
DELIMITER |
CREATE PROCEDURE writecountpivot( db CHAR(64), tbl CHAR(64), col CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  DECLARE comma CHAR(1) DEFAULT ',';
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote,
                     ',SUM(IF(', col, ' = ', datadelim, singlequote, comma,
                     col, comma, singlequote, datadelim, comma, '1,0)) AS `', 
                     singlequote, comma, col, comma, singlequote, '`', singlequote, 
                     ') AS countpivotarg FROM ', db, '.', tbl,
                     ' WHERE ', col, ' IS NOT NULL' );
  -- UNCOMMENT TO SEE THE MIDLEVEL CODE:
  -- SELECT @sql; 
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL sys.writecountpivot('test','sales','salesperson');

This generates the SQL ...

SELECT DISTINCT 
  CONCAT(',SUM(IF(salesperson = "',salesperson,'",1,0)) AS `',salesperson,'`') 
  AS countpivotarg 
FROM test.sales 
WHERE salesperson IS NOT NULL

and returns...

+--------------------------------------------+
| countpivotarg                              |
+--------------------------------------------+
| ,SUM(IF(salesperson = "bob",1,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",1,0)) AS `sam` |
+--------------------------------------------+

which we plug into ...

SELECT 
  product
  ,SUM(IF(salesperson = "bob",1,0)) AS `bob` 
  ,SUM(IF(salesperson = "sam",1,0)) AS `sam`
  ,COUNT(*) AS Total
FROM test.sales
GROUP BY product WITH ROLLUP;
+---------+------+------+-------+
| product | bob  | sam  | Total |
+---------+------+------+-------+
| radio   |    2 |    2 |     4 |
| tv      |    1 |    1 |     2 |
| NULL    |    3 |    3 |     6 |
+---------+------+------+-------+

Not overwhelming for two columns, very convenient if there are 20. (Yes, it could also be written with COUNT(IF(..., 1, NULL)).)

One point to notice is that the two levels of code generation create quotemark nesting problems. To make the double quotemark '"' available for data value delimiting, we turn off ANSI_QUOTES during code generation, and put it back afterwards.

SUM pivot queries need different syntax:

USE sys;
DROP PROCEDURE IF EXISTS writesumpivot;
DELIMITER |
CREATE PROCEDURE writesumpivot( db CHAR(64), tbl CHAR(64), pivotcol CHAR(64), sumcol CHAR(64) )
BEGIN
  DECLARE datadelim CHAR(1) DEFAULT '"';
  DECLARE comma CHAR(1) DEFAULT ',';
  DECLARE singlequote CHAR(1) DEFAULT CHAR(39);
  SET @sqlmode = (SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( 'SELECT DISTINCT CONCAT(', singlequote, 
                     ',SUM(IF(', pivotcol, ' = ', datadelim, singlequote, comma,
                     pivotcol, comma, singlequote, datadelim, comma, sumcol, ',0)) AS `', 
                     singlequote, comma, pivotcol, comma, singlequote, '`', singlequote,  
                     ') AS sumpivotarg FROM ', db, '.', tbl, 
                     ' WHERE ', pivotcol, ' IS NOT NULL' );
   -- UNCOMMENT TO SEE THE MIDLEVEL SQL:
   -- SELECT @sql;
   PREPARE stmt FROM @sql;
   EXECUTE stmt;
   DROP PREPARE stmt;
   SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
CALL writesumpivot('test','sales','salesperson','amount');
+-------------------------------------------------+
| sumpivotarg                                     |
+-------------------------------------------------+
| ,SUM(IF(salesperson = "bob",amount,0)) AS `bob` |
| ,SUM(IF(salesperson = "sam",amount,0)) AS `sam` |
+-------------------------------------------------+

which forms the guts of our report query:

SELECT
 product
 ,SUM(IF(salesperson = "bob",amount,0)) AS `bob`
 ,SUM(IF(salesperson = "sam",amount,0)) AS `sam`
 ,SUM(amount) AS Total
FROM test.sales
GROUP BY product;
+---------+--------+--------+--------+
| product | bob    | sam    | Total  |
+---------+--------+--------+--------+
| radio   | 200.00 | 200.00 | 400.00 |
| tv      | 200.00 | 300.00 | 500.00 |
+---------+--------+--------+--------+

There are higher levels of generality beckoning: say, a routine that generates a complete pivot table query, not just the pivot expressions.

Back to the top Browse the book Buy the book Feedback

Column value associations

From table tbl(id INT, lastname CHAR(20), firstname CHAR(20)), you want to list every lastname in the table, each on one line, with a comma-separated list of every firstname it is associated with.

The simplest solution is to create the pivot table with a self-join and GROUP_CONCAT():

SELECT 
  t1.lastname, 
  GROUP_CONCAT(t2.firstname ORDER BY t2.firstname SEPARATOR ', ')
FROM tbl AS t1 INNER JOIN tbl AS t2 USING (id)
GROUP BY t1.lastname;

This is easily generalised to n-tuples. For a personnel table ...

CREATE TABLE tbl(id INT,colID INT,value CHAR(20));
INSERT INTO tbl VALUES
  (1,1,'Sampo'),(1,2,'Kallinen'),(1,3,'Office Manager'),
  (2,1,'Jakko'),(2,2,'Salovaara'),(2,3,'Vice President');

where colID=1,2,3 mean, respectively, first name,last name,job title...

id  colID  value
1   1      Sampo
1   2      Kallinen
1   3      Office Manager
2   1      Jakko
2   2      Salovaara
2   3      Vice President

you can pivot first name, last name and job title on ID with this query:

SELECT 
  id, 
  GROUP_CONCAT(if(colID = 1, value, NULL)) AS 'First Name',
  GROUP_CONCAT(if(colID = 2, value, NULL)) AS 'Last Name',
  GROUP_CONCAT(if(colID = 3, value, NULL)) AS 'Job Title'
FROM tbl
GROUP BY id;


Back to the top Browse the book Buy the book Feedback

Group column statistics in rows

A pivot (or crosstab, or contingency) table aggregates sets of column values into rows of statistics, and pivots target value statistics on partitioning criteria defined by any available data.

Spreadsheet applications have intuitive point-and-click interfaces for generating pivot tables. RDBMSs generally do not. The task looks difficult in SQL, though, only until you have coded a few.

If you ported the Microsoft sample database Northwind to your MySQL database (as described in chapter 11 of Get It Done with MySQL), you can execute this example step by step. Even if you haven't ported Northwind, the example is easy to follow.

Amongst the tables in the Northwind database are:

employees(employeeID, lastname, firstname, ...)
orders(orderID, customerId, employeeID, orderdate, ...) 

There are nine employees, and 830 orders dated from 1996 through 1998. Each order points to an employeeID. Suppose we wish to report counts of orders taken by employees pivoted on year--how would we proceed?

We do the pivot table 3-step:

1. Write the basic aggregating query, a GROUP BY query to aggregate the data on desired variables (in this case, employee and year).

2. Write the pivoting query as an outer query that creates a column for each pivot value (year) from #1 written as an inner subquery. (Sometimes it is more efficient to write the results of #1 to a temp table and write #2 to refer to the temp table.)

3. Fix a ROLLUP display glitch by encapsulating #2 in a new outer query that labels the ROLLUP row meaningfully.

Here are the three steps in more detail:

1. Group the joined counts by the two criteria, employee and order year, yielding one result row per employee per year:

SELECT
  CONCAT(firstname,' ',lastname) AS 'Employee',
  YEAR(OrderDate) AS col,
  COUNT(*) AS Data
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.employeeID, YEAR(o.OrderDate);
+------------------+------+------+
| Employee         | col  | Data |
+------------------+------+------+
| Nancy Davolio    | 1996 |   26 |
| Nancy Davolio    | 1997 |   55 |
| Nancy Davolio    | 1998 |   42 |
| Andrew Fuller    | 1996 |   16 |
| Andrew Fuller    | 1997 |   41 |
| Andrew Fuller    | 1998 |   39 |
| Janet Leverling  | 1996 |   18 |
| Janet Leverling  | 1997 |   71 |
| Janet Leverling  | 1998 |   38 |
| Margaret Peacock | 1996 |   31 |
| Margaret Peacock | 1997 |   81 |
| Margaret Peacock | 1998 |   44 |
| Steven Buchanan  | 1996 |   11 |
| Steven Buchanan  | 1997 |   18 |
| Steven Buchanan  | 1998 |   13 |
| Michael Suyama   | 1996 |   15 |
| Michael Suyama   | 1997 |   33 |
| Michael Suyama   | 1998 |   19 |
| Robert King      | 1996 |   11 |
| Robert King      | 1997 |   36 |
| Robert King      | 1998 |   25 |
| Laura Callahan   | 1996 |   19 |
| Laura Callahan   | 1997 |   54 |
| Laura Callahan   | 1998 |   31 |
| Anne Dodsworth   | 1996 |    5 |
| Anne Dodsworth   | 1997 |   19 |
| Anne Dodsworth   | 1998 |   19 |
+------------------+------+------+

Nine employees for three years yield 27 aggregated rows.

2. We want one summary row per employee, and one count column for each year in which an employee took an order. We pivot the rows of the above resultset on year by querying it, defining a column for every year found, for example:

  SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',

grouping the result by row WITH ROLLUP to provide a row of column sums at the bottom. This gives the following query:

SELECT 
  Employee,
  SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
  SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
  SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
  SUM( data ) AS Total    -- sums across years by employee
FROM (
  SELECT                  -- the query from step #1
    CONCAT(firstname,' ',lastname) AS 'Employee', 
    YEAR(OrderDate) AS 'col', 
    COUNT(*) AS Data
  FROM Employees e
  JOIN Orders o ON e.EmployeeID = o.EmployeeID 
  GROUP BY e.employeeID, YEAR(o.OrderDate)
) AS stats
GROUP BY employee WITH ROLLUP;
+------------------+------+------+------+-------+
| Employee         | 1996 | 1997 | 1998 | Total |
+------------------+------+------+------+-------+
| Andrew Fuller    |   16 |   41 |   39 |    96 |
| Anne Dodsworth   |    5 |   19 |   19 |    43 |
| Janet Leverling  |   18 |   71 |   38 |   127 |
| Laura Callahan   |   19 |   54 |   31 |   104 |
| Margaret Peacock |   31 |   81 |   44 |   156 |
| Michael Suyama   |   15 |   33 |   19 |    67 |
| Nancy Davolio    |   26 |   55 |   42 |   123 |
| Robert King      |   11 |   36 |   25 |    72 |
| Steven Buchanan  |   11 |   18 |   13 |    42 |
| NULL             |  152 |  408 |  270 |   830 |
+------------------+------+------+------+-------+

3. The result of #2 is correct except that sums ought not to be reported as NULL! We fix that bit of weirdness by writing query #2 as a derived table, and having the new outer query alias the yearly sums row:

SELECT 
  IFNULL( employee, 'Sums') AS Employee, `1996`, `1997`, `1998`, Total
FROM (
  SELECT 
    Employee,
    SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
    SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
    SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
    SUM( data ) AS Total
  FROM (
    SELECT 
      CONCAT(firstname,' ',lastname) AS 'Employee', 
      YEAR(OrderDate) AS 'col', 
      COUNT(*) AS Data
    FROM Employees e
    JOIN Orders o ON e.EmployeeID = o.EmployeeID 
    GROUP BY e.employeeID, YEAR(o.OrderDate)
  ) AS stats
  GROUP BY employee WITH ROLLUP
) AS stats2;
+------------------+------+------+------+-------+
| Employee         | 1996 | 1997 | 1998 | Total |
+------------------+------+------+------+-------+
| Andrew Fuller    |   16 |   41 |   39 |    96 |
| Anne Dodsworth   |    5 |   19 |   19 |    43 |
| Janet Leverling  |   18 |   71 |   38 |   127 |
| Laura Callahan   |   19 |   54 |   31 |   104 |
| Margaret Peacock |   31 |   81 |   44 |   156 |
| Michael Suyama   |   15 |   33 |   19 |    67 |
| Nancy Davolio    |   26 |   55 |   42 |   123 |
| Robert King      |   11 |   36 |   25 |    72 |
| Steven Buchanan  |   11 |   18 |   13 |    42 |
| Sums             |  152 |  408 |  270 |   830 |
+------------------+------+------+------+-------+

With multiple statistics and pivot layers, a pivot table query can get complex, but following this 3-step will keep things clear.

Back to the top Browse the book Buy the book Feedback

Pivot table using math tricks

http://en.wikibooks.org/wiki/Programming:MySQL/Pivot_table

Back to the top Browse the book Buy the book Feedback

Pivot table with CONCAT

Here is a MySQL pivot table query for room bookings by weekday:

SELECT slot
  , max(if(day=1, concat(subject,' ',room), '')) as day1
  , max(if(day=2, concat(subject,' ',room), '')) as day2
  , max(if(day=3, concat(subject,' ',room), '')) as day3
  , max(if(day=4, concat(subject,' ',room), '')) as day4
  , max(if(day=5, concat(subject,' ',room), '')) as day5
from schedule
group by slot

MAX(...) decides between an entry and a blank (the entry will win if one exists) while the group by lines everything up on the same row. Friendly caution: If more than one entry exists for the same day and time, you will only see the one that is alphabetically "greater".

To see how many classes are scheduled by day for each slot (to check for conflicts) try:

SELECT slot
  , sum(if(day=1,1,0)) as day1
  , sum(if(day=2,1,0)) as day2
  , sum(if(day=3,1,0)) as day3
  , sum(if(day=4,1,0)) as day4
  , sum(if(day=5,1,0)) as day5
from schedule
group by slot

There is a pattern:
  • Columns you want as "row headers" are listed both in the SELECT _and_ in the GROUP BY clauses
  • Values you want as columns are selectively chosen by IF() functions to return something or nothing so that one of the aggregate functions (MIN, MAX, SUM,AVG, etc) can condense those rows and columns into single values
What to count in any column can be as complex as you like. If you don't like the IF() function you can use CASE statements or anything else to help you provide the correct set of values for the aggregate function you are using on that column. You can mix and match aggregate functions to be what you need. Let's say you join the appropriate tables together to form a data set that looks like:

day, slot, subject, student, grade

and you wanted to build a chart showing the statistics of grades vs. days and subject. You could use this type of query to work that out:

SELECT day, subject
  , AVG(grade) as average
  , MIN(grade) as lowest
  , MAX(grade) as highest
from <necessary tables>
group by day, subject

Now while one needn't choose values for columns, to "pivot" that table by days (each column representing statistics for just one day) change the query to:

SELECT subject
  , AVG(IF(day=1, grade,null)) as D1_average
  , MIN(IF(day=1, grade, null)) as D1_lowest
  , MAX(IF(day=1,grade,null)) as D1_highest
  , AVG(IF(day=2, grade,null)) as D2_average
  , MIN(IF(day=2, grade, null)) as D2_lowest
  , MAX(IF(day=2,grade,null)) as D2_highest
  , .... (repeat for rest of the days)
FROM <necessary tables>
GROUP BY subject

The IF ... NULL test prevents AVG() from counting all other grades for the same subject from different days. The same trick works for MIN and MAX functions.

Back to the top Browse the book Buy the book Feedback

Pivot table without GROUP_CONCAT

Data designs often require flexibility in numbers and names of data points per instance row: instead of saving all the data points belonging to a key value in a single row, you save each data point as a name-value pair in its own row.

Thus given table user_class(user_id INT, class_id CHAR(20), class_value CHAR(20)) with these rows:

user_id  class_id   class_value
1        firstname  Rogier
1        lastname   Marat
2        firstname  Jean
2        lastname   Smith

and you wish a resultset that links first names to last names for each ID...

user_id  firstname  lastname
1        Rogier     Marat
2        Jean       Smith

the following query accomplishes the required pivot via an INNER SELF-JOIN:

SELECT 
  u1.user_ID,
  class_value AS firstname,
  u2.lastname 
FROM user_class AS u1
INNER JOIN (
  SELECT 
    user_ID,
    class_value AS lastname 
  FROM user_class
  WHERE class_id='lastname'
) AS u2 
ON u1.user_ID=u2.user_ID AND u1.class_id='firstname'


Back to the top Browse the book Buy the book Feedback

All possible recipes with given ingredients

We have tables for recipes (r), ingredients (i), and recipe-ingredient combinations (ri). The ri table implements a many-many relationship between recipes and ingredients, where each row specifies one ingredient for one recipe. A recipe can use any number of ingredients, and an ingredient can occur in any number of recipes:

drop table if exists r,i,ri;
create table r(id int);
insert into r values(1),(2);
create table i(id int);
insert into i values(1),(2),(3);
create table ri(rid int,iid int);
insert into ri values (1,1),(1,2),(2,1),(2,4),(3,5);
select * from r;
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
select * from i;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
select * from ri;
+------+------+
| rid  | iid  |
+------+------+
|    1 |    1 |
|    1 |    2 |
|    2 |    1 |
|    2 |    4 |
|    3 |    5 |
+------+------+

Given our ingredients, what recipes can we make? Inspection shows the answer will be recipe #1.

SQL has no universal quantifier, so how do we proceed? 'All A is B' is logically equivalent to the double negative 'there is no A that is not B', so we can reformulate the requirement ...

list the recipes for which we have all ingredients

into terms SQL can handle ...

list the recipes for which there is no ingredient we do not have

A double negative, so a double query. One inner query, one outer. Tackle the inner one first: find the recipes for which we are missing an ingredient.

That's a straight exclusion join, i.e., a left join on ingredient from 'required' to 'available', plus a where clause that restricts the resultset to nulls on the right ('available') side of the join:

SELECT DISTINCT rid
FROM ri
LEFT JOIN i ON ri.iid=i.id
WHERE i.id IS NULL;
+------+
| rid  |
+------+
|    2 |
|    3 |
+------+

Now the outer query has to find the recipes which are not in this list. That's another exclusion join, this time from recipes to the above derived table:

SELECT r.id
FROM r
LEFT JOIN (
  SELECT DISTINCT rid
  FROM ri
  LEFT JOIN i ON ri.iid=i.id
  WHERE i.id IS NULL
) AS rno ON r.id = rno.rid
WHERE rno.rid IS NULL;
+------+
| id   |
+------+
|    1 |
+------+

It's an example of relational division, one of Codd's eight basic relational operations. Dividing a divisor table into a dividend table yields a quotient or results table:

dividend ÷ divisor = quotient

As in arithmetic, multiplication reverses it:

divisor * quotient = dividend

                               +-----------+
      +-----+     +------+     | table AxB |
      |  A  |     |  B   |     +-----+-----+
      +-----+     +------+     |key_a|key_b|
      |key_a|     |key_b |     +-----+-----+
      +-----+     +------+     |  2  |  1  |
      |  2  |     |  1   |     |  2  |  7  |
      |  4  |     |  7   |     |  2  |  3  |
      +-----+     |  3   |     |  4  |  1  |
                  +------+     |  4  |  7  |
                               |  4  |  3  |
                               +-----+-----+

When we multiply (CROSS JOIN) tables A and B to yield AxB, AxB gets a row combining every row of A with every row of B, and all the columns from A and B. When we reverse that operation, dividing AxB by B, we get back A: the distinct key_a values that are paired in AxB with every row of B.
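
As a minimal sketch using the tables in the diagram (with made-up names axb(key_a, key_b) and b(key_b)), the quotient is the set of key_a values that are paired with every row of b:

SELECT key_a
FROM axb
JOIN b USING (key_b)
GROUP BY key_a
HAVING COUNT(DISTINCT key_b) = (SELECT COUNT(*) FROM b);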

Back to the top Browse the book Buy the book Feedback

Parties with candidates in all districts

You have candidates (name, district, party), parties (party), and districts (district) tables. Every candidate belongs to a party. Which parties have candidates in all districts?

SQL has no universal quantifier, so the solution is to ask for the parties for which there is no district for which there is no candidate. It's a form of relational division: we obtain the desired result (the quotient) by dividing the candidates table (the dividend) by the list of districts (the divisor).

SELECT DISTINCT party FROM parties
WHERE NOT EXISTS (
   SELECT * FROM districts 
   WHERE NOT EXISTS (
      SELECT * FROM candidates
      WHERE candidates.party=parties.party
      AND candidates.district=districts.district
   )
);

One way to get this done without subqueries in the WHERE clause is to JOIN and GROUP BY, then check each party's district count with a HAVING subquery:

SELECT party
FROM candidates
INNER JOIN districts USING (district)
GROUP BY party
HAVING COUNT(DISTINCT district) = (SELECT COUNT(*) FROM districts);


Back to the top Browse the book Buy the book Feedback

Who makes all the parts for a given assembly?

One way to arrive at the answer is by asking: What are the assembly-supplier pairs such that no part of the assembly is not made by the supplier? That's relational division again, formulated for two tables by Stephen Todd. Given assemblyparts(assembly,part) and partsuppliers(part,supplier) tables, here is a query that Joe Celko credits to Pierre Mullin.

SELECT DISTINCT 
  AP1.assembly, 
  SP1.supplier
FROM AssemblyParts AS AP1, PartSuppliers AS SP1
WHERE NOT EXISTS (
  SELECT *
  FROM AssemblyParts AS AP2
  WHERE AP2.assembly = AP1.assembly
  AND NOT EXISTS (
    SELECT SP2.part
    FROM PartSuppliers AS SP2
    WHERE SP2.part = AP2.part AND SP2.supplier = SP1.supplier
  )
);



Find blocks of unused numbers

In a table of sequential numbers, some are used and some are not. Find the blocks, if any, of unused IDs:

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(id INT,used BOOL);
INSERT INTO tbl VALUES(1,1),(2,0),(3,0),(4,1),(5,0),(6,0);
SELECT * FROM tbl;
+------+------+
| id   | used |
+------+------+
|    1 |    1 |
|    2 |    0 |
|    3 |    0 |
|    4 |    1 |
|    5 |    0 |
|    6 |    0 |
+------+------+

To list blocks of unused numbers, join the table to itself three times. Call these four representations of the table a, b, c, d, and condition each join on used=0 on both sides:
  • do an exclusion join from a to b on a.id immediately preceding b.id; thus a.id values are unused id values having no immediate unused predecessor,
  • left join a to c on a.id <= c.id, to remove backward sequences and force a stop at the last value,
  • do an exclusion join from c to d on c.id immediately succeeding d.id; thus MIN(c.id) is the first unused id in the current block without an unused successor,
and GROUP BY a.id. In the result, a.id begins a missing block and MIN(c.id) ends a missing block:

SELECT a.id AS Start, MIN( c.id ) AS End
FROM tbl AS a
LEFT JOIN tbl AS b ON a.id=b.id + 1 AND a.used=0 AND b.used=0
LEFT JOIN tbl AS c ON a.id<=c.id AND a.used=0 AND c.used=0
LEFT JOIN tbl AS d ON c.id=d.id-1 AND c.used=0 AND d.used=0
WHERE b.id IS NULL
  AND c.id IS NOT NULL
  AND d.id IS NULL
GROUP BY a.id;
+-------+------+
| Start | End  |
+-------+------+
|     2 |    3 |
|     5 |    6 |
+-------+------+ 



Find missing numbers in a sequence

You have a table tbl(id int) with values (1,2,4,18,19,20,21), and you wish to find the first missing number in its sequence of id values:

SELECT t1.id+1 AS Missing
FROM tbl AS t1
LEFT JOIN tbl AS t2 ON t1.id+1 = t2.id
WHERE t2.id IS NULL
ORDER BY t1.id LIMIT 1;
+---------+
| Missing |
+---------+
|       3 |
+---------+

For all the gaps, including gaps of more than 1 value, you need something a little more baroque...

SELECT
  a.id+1 AS 'Missing From',
  MIN(b.id) - 1 AS 'To'
FROM tbl AS a, tbl AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING `Missing From` < MIN(b.id);
+--------------+------+
| Missing From | To   |
+--------------+------+
|            3 |    3 |
|            5 |   17 |
+--------------+------+

We often need such lists, so the query is a natural for a stored procedure that finds missing sequence values in any table:

DROP PROCEDURE IF EXISTS MissingInSeq;
DELIMITER |
CREATE PROCEDURE MissingInSeq( db VARCHAR(64), tbl VARCHAR(64), col VARCHAR(64) )
BEGIN
  SET @sql = CONCAT( "SELECT  a.", col, 
                     "+1 AS 'Missing From',", 
                     "MIN(b.", 
                     col, 
                     ") - 1 AS 'To' FROM ",
                     db, 
                     ".", 
                     tbl, 
                     " AS a,", 
                     db, 
                     ".", 
                     tbl, 
                     " AS b WHERE a.", 
                     col, 
                     " < b.", 
                     col, 
                     " GROUP BY a.", 
                     col, 
                     " HAVING a.", 
                     col, 
                     " < MIN(b.", 
                     col, 
                     ") - 1" 
                   ); 
  -- SELECT @sql;
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
END;
|
DELIMITER ;



Find previous and next values in a sequence

Here is an algorithm by Baron Schwartz (xaprb.com) for retrieving the previous and next column values in a sequence, given a particular column value thisvalue. The previous value is the maximum value less than thisvalue, and the next value is the minimum value greater than thisvalue:

SELECT 
  IF(col > thisvalue,'next','prev') AS Direction,
  IF(col > thisvalue,MIN(col),MAX(col)) AS 'Prev/Next' 
FROM tablename 
WHERE col <> thisvalue
GROUP BY SIGN(col - thisvalue);

So, to find the previous and next order ids in the Northwind database table orders (nwib.orders), starting from order number 10800:

SELECT 
  IF(orderid > 10800,'next','prev') AS Direction,
  IF(orderid > 10800,MIN(orderid),MAX(orderid)) AS 'Prev/Next' 
FROM nwib.orders 
WHERE orderid <> 10800
GROUP BY SIGN(orderid - 10800);
+-----------+-----------+
| Direction | Prev/Next |
+-----------+-----------+
| prev      | 10799     |
| next      | 10801     |
+-----------+-----------+

This is a natural for a stored procedure:

DROP PROCEDURE IF EXISTS PrevNext;
DELIMITER |
CREATE PROCEDURE PrevNext( 
  IN db CHAR(64), IN tbl CHAR(64), IN col CHAR(64), IN seq INT 
)
BEGIN
  IF db IS NULL OR db = '' THEN
    SET db = SCHEMA();
  END IF;
  SET @sql = CONCAT( "SELECT ", 
                     " IF(", col, " > ", seq,",'next','prev') AS Direction,",
                     " IF(", col, " > ", seq, ",MIN(", col, "),MAX(", col, ")) AS 'Prev/Next'",
                     " FROM ", db, ".", tbl, 
                     " WHERE ", col, " <> ", seq,
                     " GROUP BY SIGN(", col, " - ", seq, ")" );
  PREPARE stmt FROM @sql;
  EXECUTE  stmt;
  DEALLOCATE PREPARE stmt;
END;
|
DELIMITER ;

Or, it can be embedded in the FROM clause of another query, for example ...

SELECT o2.OrderID,o2.Value,o.customerid
FROM orders o
JOIN (
  SELECT 'This' AS 'OrderId', 10800 AS 'Value'
  UNION
  SELECT 
    IF( orderid > 10800, 'Next', 'Prev') AS Which,
    IF( orderid > 10800, MIN(orderid), MAX(orderid )) AS 'Value' 
  FROM orders 
  WHERE orderid <> 10800
  GROUP BY SIGN( orderid - 10800 )
) AS o2 ON o.orderid=o2.value 
ORDER BY o.orderid;
+---------+-------+------------+
| OrderID | Value | customerid |
+---------+-------+------------+
| Prev    | 10799 | KOENE      |
| This    | 10800 | SEVES      |
| Next    | 10801 | BOLID      |
+---------+-------+------------+



Find row with next value of specified columns

Sometimes we need the next value of a column under a given row ordering. Oracle has a LEAD(...) OVER(...) construct which simplifies this query. MySQL does not. The logic is:

1. Form a resultset consisting of all relevant rows joined with all relevant rows that have greater values in the ordering columns. For example, if the table has these rows:

+------+
|    2 |
|    4 |
|    6 |
|    8 |
|   10 |
+------+

Then the resultset is

+------+------+
|    2 |    4 |
|    2 |    6 |
|    2 |    8 |
|    2 |   10 |
|    4 |    6 |
|    4 |    8 |
|    4 |   10 |
|    6 |    8 |
|    6 |   10 |
|    8 |   10 |
+------+------+

2. For each row in this resultset, find the row with the lowest ordering value amongst the higher values. For the example the result of this from the above resultset is

+-----+-----+
|   2 |   4 |
|   4 |   6 |
|   6 |   8 |
|   8 |  10 |
+-----+-----+

The fastest way to do step #2 is a self-exclusion join (see self-exclusion join examples elsewhere on this page).

Put it together:

drop table if exists t;
create table t(id int);
insert into t values(2),(4),(6),(8),(10);
select x.aid as id,x.bid as nextvalue
from (
  select a.id as aid,b.id as bid
  from t a
  join t b on a.id<b.id
) x
left join (
  select a.id as aid,b.id as bid
  from t a
  join t b on a.id<b.id
) y on x.aid=y.aid and x.bid>y.bid
where y.bid is null
order by x.aid,x.bid;
+------+-----------+
| id   | nextvalue |
+------+-----------+
|    2 |         4 |
|    4 |         6 |
|    6 |         8 |
|    8 |        10 |
+------+-----------+

The algorithm can be modified to suit the next lowest value and so on; the sketch below simply reverses the comparisons to find the previous value.
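
Here is that variant against the same table t; nothing beyond the table above is assumed:

select x.aid as id,x.bid as prevvalue
from (
  select a.id as aid,b.id as bid
  from t a
  join t b on a.id>b.id
) x
left join (
  select a.id as aid,b.id as bid
  from t a
  join t b on a.id>b.id
) y on x.aid=y.aid and x.bid<y.bid
where y.bid is null
order by x.aid,x.bid;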


Find sequence starts and ends

Find the first and last values of column value sequences. Assuming table tbl and numeric column id, an exclusion join on the previous sequential value finds the first value of each sequence, and taking the minimum value from a left join, combined with an exclusion join on the next sequential value, finds the last value of each sequence:

SELECT 
  a.id AS Start, 
  MIN( c.id ) AS End 
FROM tbl AS a
LEFT OUTER JOIN tbl AS b ON a.id = b.id + 1
LEFT OUTER JOIN tbl AS c ON a.id <= c.id
LEFT OUTER JOIN tbl AS d ON c.id = d.id - 1
WHERE b.id IS NULL 
  AND c.id IS NOT NULL
  AND d.id IS NULL
GROUP BY a.id;

Thanks to Scott Noyes for noticing that a.id<c.id fails to pick up sequences of 1 followed by skips of 1, but a.id<=c.id does. A variant of the problem: when some IDs are used and some are not, find blocks of unused IDs:

DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(id INT,used BOOL);
INSERT INTO tbl VALUES(1,1),(2,0),(3,0),(4,1),(5,0),(6,0);
SELECT a.id AS Start, MIN( c.id ) AS End
FROM tbl AS a
LEFT JOIN tbl AS b ON a.id=b.id + 1 AND a.used=0 AND b.used=0
LEFT JOIN tbl AS c ON a.id<=c.id AND a.used=0 AND c.used=0
LEFT JOIN tbl AS d ON c.id=d.id-1 AND c.used=0 AND d.used=0
WHERE b.id IS NULL
  AND c.id IS NOT NULL
  AND d.id IS NULL
GROUP BY a.id;
+-------+------+
| Start | End  |
+-------+------+
|     2 |    3 |
|     5 |    6 |
+-------+------+



Find specific sequences

You have a table which tracks hits on named web pages...

CREATE TABLE hits (
  id INT NOT NULL DEFAULT 0,
  page CHAR(128) DEFAULT '',
  time TIMESTAMP NOT NULL DEFAULT 0,
  PRIMARY KEY(id, time)
)

where id is unique to a session. Here is a bit of sample data:

INSERT INTO hits VALUES
  (1, 'A', TIMESTAMPADD(SECOND,10,NOW())),
  (1, 'B', TIMESTAMPADD(SECOND,20,NOW())),
  (2, 'A', TIMESTAMPADD(SECOND,40,NOW())),
  (1, 'A', TIMESTAMPADD(SECOND,50,NOW())),
  (1, 'C', TIMESTAMPADD(SECOND,60,NOW())),
  (3, 'A', TIMESTAMPADD(SECOND,110,NOW())),
  (3, 'A', TIMESTAMPADD(SECOND,120,NOW())),
  (3, 'C', TIMESTAMPADD(SECOND,130,NOW())),
  (2, 'C', TIMESTAMPADD(SECOND,90,NOW())),
  (2, 'A', TIMESTAMPADD(SECOND,100,NOW()));


You desire a count of the number of sessions where a user moved from one particular page directly to another, for example from 'A' to 'C'.

To find the next hit in a given session, scope on id, order by time, and limit the output to one row. Then simply count the rows meeting the page criteria:

SELECT
  COUNT(DISTINCT h1.id) AS 'Moves from A to C'
FROM hits AS h1
WHERE
  h1.page = 'A'
  AND 'C' = (
    SELECT h2.page
    FROM hits AS h2
    WHERE h2.id = h1.id 
      AND h2.time > h1.time
    ORDER BY h2.time LIMIT 1
  );
+-------------------+
| Moves from A to C |
+-------------------+
|                 3 |
+-------------------+




Gaps in a time series

Advanced time series analysis generally requires custom software, but straightforward SQL queries can answer simple time series questions. You have a jobtimes table with columns ID, job, machine, start_time, and stop_time. You wish to know which machines have had gaps between activity periods. Here is a query that shows the start times following breaks in activity for a given machine:

SELECT 
  j1.id,
  j1.machine AS thismachine,
  j1.start_time AS StartAfterGap
FROM jobtimes AS j1
WHERE j1.id > 1 AND NOT EXISTS (
  SELECT j2.stop_time 
  FROM jobtimes AS j2
  WHERE j2.machine = j1.machine
    AND j2.start_time < j1.start_time 
    AND j2.stop_time >= j1.start_time 
);



Make values of a column sequential

You have a table tbl with an integer primary key column keycol which is not a key in another table, and which you wish to make perfectly sequential starting with 1.

SET @i=0;
UPDATE tbl SET keycol=(@i:=@i+1); 
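
Without an ORDER BY, the rows are renumbered in whatever order MySQL happens to scan them. If the new numbering should follow the existing key order, a sketch of the same update with an explicit ordering:

SET @i=0;
UPDATE tbl SET keycol=(@i:=@i+1) ORDER BY keycol;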



Track stepwise project completion

A master table has one row for each project, and the number of sequential steps required to complete each project. A detail table has one row per project per completed step:

DROP TABLE IF EXISTS t1 ;
CREATE TABLE t1 (
  id INT, projectname CHAR(2), projectsteps INT
);
INSERT INTO t1 VALUES 
(1, 'xx', 3), 
(2, 'yy', 3),
(3, 'zz', 5);

DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (
  id INT, projectID INT, xid INT
);
INSERT INTO t2 VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 1),
(4, 1, 3),
(5, 3, 2),
(6, 1, 2),
(7, 2, 1),
(8, 2, 1);

The requirement is for a query which, for every project, reports 'OK' if there is at least one detail row for every project step, or otherwise reports the number of the last sequential completed step:

Here is one way to build such a query:

1. Join t1 to t2 on projectID.

2. Left Join t2 to itself on projectID and integer succession.

3. Add a WHERE condition which turns the left self-join into an exclusion join that finds the first missing sequential xid value.

4. To the SELECT list add this item:

  IF( a.xid < p.projectsteps, a.xid, 'OK' ) AS CompletionState

so when the first sequential missing xid is not less than the number of project steps, display 'OK'; otherwise display the xid value before the first missing xid value.

5. Remove dupes with a GROUP BY clause.

SELECT 
  p.projectname,p.projectsteps,a.xid,
  IF(a.xid < p.projectsteps, a.xid, 'OK') AS CompletionState
FROM t1 p
JOIN t2 a ON p.id = a.projectID
LEFT JOIN t2 AS b ON a.projectID = b.projectID AND a.xid+1 = b.xid
WHERE b.xid IS NULL 
GROUP BY p.projectname;
+-------------+--------------+------+-----------------+
| projectname | projectsteps | xid  | CompletionState |
+-------------+--------------+------+-----------------+
| xx          |            3 |    3 | OK              |
| yy          |            3 |    1 | 1               |
| zz          |            5 |    2 | 2               |
+-------------+--------------+------+-----------------+



Winning Streaks

Given a table of IDs and won-lost results, how do we find the longest winning streak?

drop table if exists results;
create table results(id int,result char(1));
insert into results values
(1,'w'),(2,'l'),(3,'l'),(4,'w'),(5,'w'),(6,'w'),(7,'l'),(8,'w'),(9,'w');
select * from results;
+------+--------+
| id   | result |
+------+--------+
|    1 | w      |
|    2 | l      |
|    3 | l      |
|    4 | w      |
|    5 | w      |
|    6 | w      |
|    7 | l      |
|    8 | w      |
|    9 | w      |
+------+--------+

We can find streaks of two with a left join on a.id=b.id+1. To count streak lengths, initialise a counter to 0, then increment it each time a row's result matches the previous row's, resetting it to 1 when it does not:

set @count=0;
select a.id, a.result, b.result, @count := IF(a.result = b.result, @count + 1, 1) as Streak
from results a 
left join results b on a.id = b.id + 1 
where a.result = 'w';

The longest winning streak is the longest such streak found:

set @count=0;
select MAX(@count:=IF(a.result = b.result, @count + 1, 1)) as LongestStreak
from results a 
left join results b on a.id = b.id + 1 
where a.result = 'w';
+---------------+
| LongestStreak |
+---------------+
|             3 |
+---------------+

That solution is from a response by Jon Roshko to a question by Ed Ball on the MySQL Newbie Forum. Scott Noyes points out that our query pattern for sequence starts and ends also works for winning streaks:

SELECT MIN( c.id ) - a.id + 1 as LongestStreak
FROM results AS a
LEFT JOIN results AS b ON a.id = b.id + 1 AND b.result = 'w'
LEFT JOIN results AS c ON a.id <= c.id AND c.result = 'w'
LEFT JOIN results AS d ON c.id = d.id - 1 AND d.result = 'w'
WHERE
  a.result = 'w'
  AND b.id IS NULL
  AND c.id IS NOT NULL
  AND d.id IS NULL
GROUP BY a.id
ORDER BY LongestStreak DESC LIMIT 1;



Great circle distance

Find the distance in kilometres between two points on the surface of the earth. This is just the sort of problem stored functions were made for. For a first order approximation, ignore deviations of the earth's surface from the perfectly spherical. Then the distance in radians is given by a number of trigonometric formulas. ACOS and COS behave reasonably:

             COS(lat1-lat2)*(1+COS(lon1-lon2)) - COS(lat1+lat2)*(1-COS(lon1-lon2))
rads = ACOS( --------------------------------------------------------------------- )
                                              2

We need to convert degrees latitude and longitude to radians, and we need to know the length in km of one radian on the earth's surface, which is 6378.388. The function:

set log_bin_trust_function_creators=TRUE;

DROP FUNCTION IF EXISTS GeoDistKM;
DELIMITER |
CREATE FUNCTION GeoDistKM( lat1 FLOAT, lon1 FLOAT, lat2 FLOAT, lon2 FLOAT ) RETURNS float
BEGIN
  DECLARE pi, q1, q2, q3 FLOAT;
  DECLARE rads FLOAT DEFAULT 0;
  SET pi = PI();
  SET lat1 = lat1 * pi / 180;
  SET lon1 = lon1 * pi / 180;
  SET lat2 = lat2 * pi / 180;
  SET lon2 = lon2 * pi / 180;
  SET q1 = COS(lon1-lon2);
  SET q2 = COS(lat1-lat2);
  SET q3 = COS(lat1+lat2);
  SET rads = ACOS( 0.5*((1.0+q1)*q2 - (1.0-q1)*q3) ); 
  RETURN 6378.388 * rads;
END;
|
DELIMITER ;

-- toronto to montreal (505km):
select geodistkm(43.6667,-79.4167,45.5000,-73.5833);
+----------------------------------------------+
| geodistkm(43.6667,-79.4167,45.5000,-73.5833) |
+----------------------------------------------+
|                           505.38836669921875 |
+----------------------------------------------+

(Setting log_bin_trust_function_creators is the most convenient way to step round determinacy conventions implemented since 5.0.6.)


Moving average

Given a table of dates and daily values, retrieve their moving 5-day average:

DROP TABLE IF EXISTS t;
CREATE TABLE t (dt DATE, qty INT);
INSERT INTO t VALUES ('2007-1-1',5),
                     ('2007-1-2',6),
                     ('2007-1-3',7),
                     ('2007-1-4',8),
                     ('2007-1-5',9),
                     ('2007-1-6',10),
                     ('2007-1-7',11),
                     ('2007-1-8',12),
                     ('2007-1-9',13);
SELECT 
  t1.dt, 
  ( SELECT SUM(t2.qty) / COUNT(t2.qty)
    FROM t AS t2
    WHERE DATEDIFF(t1.dt, t2.dt) BETWEEN 0 AND 4
  ) AS '5dayMovingAvg'
FROM t AS t1
ORDER BY t1.dt;
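
As written, the first four dates are averaged over fewer than five days. If the average should only appear once a full five-day window exists, one way is to require a matching count (a sketch against the same table t):

SELECT 
  t1.dt, 
  ( SELECT SUM(t2.qty) / COUNT(t2.qty)
    FROM t AS t2
    WHERE DATEDIFF(t1.dt, t2.dt) BETWEEN 0 AND 4
  ) AS '5dayMovingAvg'
FROM t AS t1
WHERE ( SELECT COUNT(*)
        FROM t AS t2
        WHERE DATEDIFF(t1.dt, t2.dt) BETWEEN 0 AND 4 ) = 5
ORDER BY t1.dt;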



Multiple sums across a join

You have a parties table that holds info on people's names etc, and a contracts table in which each row defines one contract, identifying a client as clientpartyID and a contractor as contractorpartyID, each of these a foreign key referencing parties.partyID. You want a list of parties showing how many contracts they have participated in as client, and how many they've participated in as contractor.

SELECT 
  p.partyID,
  p.name,
  (SELECT COUNT(*) FROM contracts c1 WHERE c1.clientpartyID = p.partyID)
  AS ClientDeals,
  (SELECT COUNT(*) FROM contracts c2 WHERE c2.contractorpartyID = p.partyID)
  AS ContractorDeals
FROM parties p
ORDER BY partyID;



Percentiles

Assuming a table tbl(id INT, val FLOAT), retrieve a percentile ranking of all vals.

SELECT 
  a.id ,
  ROUND( 100.0 * ( SELECT COUNT(*) FROM tbl AS b WHERE b.val <= a.val ) / total.cnt, 1 ) 
  AS percentile
FROM tbl AS a 
CROSS JOIN ( 
  SELECT COUNT(*) AS cnt 
  FROM tbl 
) AS total
ORDER BY percentile DESC;



Random row selection

When your web page loads it is to provide a randomly selected Murphy's Law from your murphy table (id INT, law TEXT):

SELECT law 
FROM murphy 
ORDER BY RAND() 
LIMIT 1;



Running Sum

Calculate and display a per-row cumulative sum of column values

SET @total=0;
SELECT value, @total:=@total+value AS RunningSum
FROM tbl
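
Note that without an explicit ordering the sum accumulates in whatever order the rows happen to be read. A sketch of the usual workaround, which fixes the order in a derived table first (MySQL does not formally guarantee user-variable evaluation order, so treat this as a sketch):

SET @total=0;
SELECT value, @total:=@total+value AS RunningSum
FROM (SELECT value FROM tbl ORDER BY value) AS ordered;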



Sum across categories

You often need to sum across several categories to total customer purchase amounts, salesperson sales amounts, political party election spending, etc.

For this example assume three tables: candidates, parties and ridings. You want to get the total amount spent in all ridings by every party in one output row. Here is the schema:

CREATE TABLE candidates (
  id int(11) NOT NULL default '0',
  `name` char(10) ,
  riding char(12) ,
  party char(12) ,
  amt_spent decimal(10,0) NOT NULL default '0',
  PRIMARY KEY  (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO candidates 
  VALUES (1,'Anne Jones','Essex','Liberal','5000'),
         (2,'Mary Smith','Malton','Liberal','7000'),
         (3,'Sara Black','Riverdale','Liberal','15000'),
         (4,'Paul Jones','Essex','Socialist','3000'),
         (5,'Ed While','Essex','Conservative','10000'),
         (6,'Jim kelly','Malton','Liberal','9000'),
         (7,'Fred Price','Riverdale','Socialist','4000');

CREATE TABLE ridings (
  riding char(10) NOT NULL default '',
  PRIMARY KEY  (riding)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO ridings VALUES ('Essex'),('Malton'),('Riverdale');

CREATE TABLE parties (
  party char(12) NOT NULL default '',
  PRIMARY KEY  (party)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO parties VALUES ('Conservative'),('Liberal'),('Socialist');

And here is the query that does it:

SELECT 
  SUM(amt_spent) AS Total,
  (SUM(amt_spent)-SUM(CASE WHEN data.party='Conservative' THEN NULL ELSE amt_spent END)) AS Cons,
  (SUM(amt_spent)-SUM(CASE WHEN data.party='Liberal' THEN NULL ELSE amt_spent END)) AS Lib,
  (SUM(amt_spent)-SUM(CASE WHEN data.party='Socialist' THEN NULL ELSE amt_spent END)) AS Soc
FROM  
  (SELECT * FROM candidates
  INNER JOIN parties ON candidates.party=parties.party
  INNER JOIN ridings ON candidates.riding=ridings.riding) AS data

+-------+-------+-------+------+
| Total | Cons  | Lib   | Soc  |
+-------+-------+-------+------+
| 53000 | 10000 | 36000 | 7000 |
+-------+-------+-------+------+
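
If per-party totals as separate rows (rather than one wide row) would do, a plain GROUP BY with ROLLUP against the candidates table above gives the same figures, the NULL party row being the grand total (a sketch):

SELECT party, SUM(amt_spent) AS Total
FROM candidates
GROUP BY party WITH ROLLUP;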



Top ten

We often want to know the top 1, 2, 10 or whatever values from a query. This is dead simple in MySQL. However many JOINs and WHEREs the query has, simply ORDER BY the column(s) whose highest values are sought, and LIMIT the resultset:

  SELECT (somecolumn), (othercolumns) ...
  FROM (some tables) ...
  ORDER BY somecolumn DESC
  LIMIT 10;



A cursor if necessary, but not necessarily a cursor

You have photos (id INT, photo BLOB, tally INT) and votes (userID INT, photoID INT) tables. You wish to update photos.tally values from counts per photo in the votes table. You can use a cursor to walk the photos table, updating the tally as you go:

DROP TABLE IF EXISTS photos;
CREATE TABLE photos (id INT, photo BLOB, tally INT); 
INSERT INTO photos VALUES(1,'',0),(2,'',0);
DROP TABLE IF EXISTS votes;
CREATE TABLE votes( userID INT, photoID INT);
INSERT INTO votes VALUES (1,1),(2,1),(2,2);

DROP PROCEDURE IF EXISTS updatetallies;
DELIMITER //
CREATE PROCEDURE updatetallies()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE pid INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM photos;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;
  OPEN cur1;
  FETCH cur1 INTO pid;
  WHILE done = 0 DO
    UPDATE photos 
      SET tally = (SELECT COUNT(*) FROM votes WHERE photoid = pid ) 
      WHERE id = pid;
    FETCH cur1 INTO pid;
  END WHILE;
  CLOSE cur1;
  SELECT id,tally FROM photos;
END //
DELIMITER ;
CALL updatetallies();
+------+-------+
| id   | tally |
+------+-------+
|    1 |     2 |
|    2 |     1 |
+------+-------+

but a simple join does exactly the same job at much less cost:

UPDATE photos 
SET tally = (
  SELECT COUNT(*) FROM votes WHERE votes.photoid = photos.id
);

Before you burden your app with a cursor, see if you can simplify the processing to a straightforward join.
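
Another join-based sketch aggregates the votes once in a derived table and updates from it; COALESCE keeps the tally at 0 for photos with no votes:

UPDATE photos
LEFT JOIN (
  SELECT photoID, COUNT(*) AS cnt
  FROM votes
  GROUP BY photoID
) AS v ON v.photoID = photos.id
SET photos.tally = COALESCE(v.cnt, 0);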


Emulate sp_exec

Sometimes it is desirable to call multiple stored procedures in one command. In SQL Server this can be done with sp_exec. In MySQL we can easily write such an sproc that calls as many sprocs as we please, for example...

USE sys;
DROP PROCEDURE IF EXISTS sp_exec;
DELIMITER |
CREATE PROCEDURE sp_exec( p1 CHAR(64), p2 CHAR(64) )
BEGIN
  -- permit doublequotes to delimit data
  SET @sqlmode=(SELECT @@sql_mode);
  SET @@sql_mode='';
  SET @sql = CONCAT( "CALL ", p1 );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @sql = CONCAT( "CALL ", p2 );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @@sql_mode=@sqlmode;
END;
|
DELIMITER ;
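
A call might then look like this, where proc_one and proc_two are hypothetical sprocs taking no arguments:

CALL sys.sp_exec( 'proc_one()', 'proc_two()' );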



Variable-length argument for query IN() clause

To have an sproc accept a variable-length parameter list for an IN(...) clause in a query, code the sproc to PREPARE the query statement:

DROP PROCEDURE IF EXISTS passInParam;
DELIMITER |
CREATE PROCEDURE passInParam( IN qry VARCHAR(100), IN param VARCHAR(1000) )
BEGIN
  SET @qry = CONCAT( qry, param, ')' );
  PREPARE stmt FROM @qry;
  EXECUTE stmt;
  DROP PREPARE stmt;
END;
|
DELIMITER ;

For this example, the query string should be of the form:

SELECT ... FROM ... WHERE ... IN ( 

but so long as it has those elements, it can be as complex as you like. When you call the sproc:
1. Quote each argument with a pair of single quotes,
2. Separate these quoted arguments with commas,
3. Surround the whole param string with another set of single quotes:

CALL passInParam( 'SELECT * FROM tbl WHERE colval IN (', '''abc'',''def'',''ghi''' );



Count delimited substrings

Here is a function to count substrings delimited by a constant delimiting string:

DROP FUNCTION IF EXISTS strcount;
SET GLOBAL log_bin_trust_function_creators=1;
DELIMITER |
CREATE FUNCTION strCount( pDelim VARCHAR(32), pStr TEXT) RETURNS int(11)
BEGIN
  DECLARE n INT DEFAULT 0;
  DECLARE pos INT DEFAULT 1;
  DECLARE strRemain TEXT;
  SET strRemain = pStr;
  SET pos = LOCATE( pDelim, strRemain );
  WHILE pos != 0 DO
    SET n = n + 1;
    SET pos = LOCATE( pDelim, strRemain );
    SET strRemain = SUBSTRING( strRemain, pos+1 );
  END WHILE;
RETURN n;
END |
DELIMITER ;

-- example call:
SET @str  = "The quick brown fox jumped over the lazy dog"; 
SET @delim = " "; 
SELECT strCount(@delim,@str);




Count substrings

To count instances of a search string in a target string ...
  • in the target string, replace the search string with a single character,
  • subtract the length of the modified target string from the length of the original target string,
  • divide that by the length of the search string:

SET @str  = "The quick brown fox jumped over the lazy dog"; 
SET @find = "the";
SELECT ROUND(((LENGTH(@str) - LENGTH(REPLACE(LCASE(@str), @find, '')))/LENGTH(@find)),0) 
AS COUNT; 
+-------+
| COUNT |
+-------+
|     2 |
+-------+

Note that REPLACE() does a case-sensitive search; to get a case-insensitive result you must coerce target and search strings to one case.

To remove decimals from the result:

SELECT CAST((LENGTH(@str) - LENGTH(REPLACE(LCASE(@str), @find, '')))/LENGTH(@find) AS SIGNED) AS COUNT;



Levenshtein distance

The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation may be insertion, deletion or substitution of one character. Jason Rust published this MySQL algorithm for it at http://www.codejanitor.com/wp/.

CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
  RETURNS INT
  DETERMINISTIC
  BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR;
    -- max strlen=255
    DECLARE cv0, cv1 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
      RETURN 0;
    ELSEIF s1_len = 0 THEN
      RETURN s2_len;
    ELSEIF s2_len = 0 THEN
      RETURN s1_len;
    ELSE
      WHILE j <= s2_len DO
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
      END WHILE;
      WHILE i <= s1_len DO
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
        WHILE j <= s2_len DO
          SET c = c + 1;
          IF s1_char = SUBSTRING(s2, j, 1) THEN 
            SET cost = 0; ELSE SET cost = 1;
          END IF;
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
          IF c > c_temp THEN SET c = c_temp; END IF;
            SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
            IF c > c_temp THEN 
              SET c = c_temp; 
            END IF;
            SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
        END WHILE;
        SET cv1 = cv0, i = i + 1;
      END WHILE;
    END IF;
    RETURN c;
  END;

Helper function:
 
CREATE FUNCTION levenshtein_ratio( s1 VARCHAR(255), s2 VARCHAR(255) )
  RETURNS INT
  DETERMINISTIC
  BEGIN
    DECLARE s1_len, s2_len, max_len INT;
    SET s1_len = LENGTH(s1), s2_len = LENGTH(s2);
    IF s1_len > s2_len THEN 
      SET max_len = s1_len; 
    ELSE 
      SET max_len = s2_len; 
    END IF;
    RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
  END;
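
A quick sanity check using the classic 'kitten'/'sitting' pair, whose edit distance is 3; if the port is faithful, the first column should come back as 3:

SELECT 
  levenshtein('kitten','sitting') AS distance,
  levenshtein_ratio('kitten','sitting') AS ratio;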



Proper case

The basic idea is...
  • lower-case the string
  • upper-case the first character if it is a-z, and any other a-z character that follows a punctuation character
Here is the function. To make it work with strings longer than 128 characters, change its input and return declarations accordingly:

DROP FUNCTION IF EXISTS proper;
SET GLOBAL  log_bin_trust_function_creators=TRUE;
DELIMITER |
CREATE FUNCTION proper( str VARCHAR(128) )
RETURNS VARCHAR(128)
BEGIN
  DECLARE c CHAR(1);
  DECLARE s VARCHAR(128);
  DECLARE i INT DEFAULT 1;
  DECLARE bool INT DEFAULT 1;
  DECLARE punct CHAR(17) DEFAULT ' ()[]{},.-_!@;:?/';
  SET s = LCASE( str );
  WHILE i <= LENGTH( str ) DO 
    BEGIN
      SET c = SUBSTRING( s, i, 1 );
      IF LOCATE( c, punct ) > 0 THEN
        SET bool = 1;
      ELSEIF bool=1 THEN 
        BEGIN
          IF c >= 'a' AND c <= 'z' THEN 
            BEGIN
              SET s = CONCAT(LEFT(s,i-1),UCASE(c),SUBSTRING(s,i+1));
              SET bool = 0;
            END;
          ELSEIF c >= '0' AND c <= '9' THEN
            SET bool = 0;
          END IF;
        END;
      END IF;
      SET i = i+1;
    END;
  END WHILE;
  RETURN s;
END;
|
DELIMITER ;



Retrieve octets from IP addresses

If ip is of the form N.N.N.N where N may be 1, 2 or 3 digits, how do we group and count by just the first three octets, i.e. the IP class?

SELECT 
  LEFT(ip, CHAR_LENGTH(ip) - LOCATE('.', REVERSE(ip))) as ipclass,
  COUNT(*)
FROM tbl
GROUP BY ipclass; 

Hamilton Turner notes we can find the first octet with LEFT(ip,LOCATE('.',ip)-1).
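
An equivalent way to take the first three octets is SUBSTRING_INDEX, which returns everything before the Nth occurrence of the delimiter (a sketch against the same tbl):

SELECT 
  SUBSTRING_INDEX(ip, '.', 3) AS ipclass,
  COUNT(*)
FROM tbl
GROUP BY ipclass;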


Return digits or alphas from a string

Return only the digits from a string:

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS digits;
DELIMITER |
CREATE FUNCTION digits( str CHAR(32) ) RETURNS CHAR(32)
BEGIN
  DECLARE i, len SMALLINT DEFAULT 1;
  DECLARE ret CHAR(32) DEFAULT '';
  DECLARE c CHAR(1);
  SET len = CHAR_LENGTH( str );
  REPEAT
    BEGIN
      SET c = MID( str, i, 1 );
      IF c BETWEEN '0' AND '9' THEN 
        SET ret=CONCAT(ret,c);
      END IF;
      SET i = i + 1;
    END;
  UNTIL i > len END REPEAT;
  RETURN ret;
END |
DELIMITER ;
SELECT digits('123ab45cde6789fg');
+----------------------------+
| digits('123ab45cde6789fg') |
+----------------------------+
| 123456789                  |
+----------------------------+

Return only the alphabetic characters from a string:

DROP FUNCTION IF EXISTS alphas;
DELIMITER |
CREATE FUNCTION alphas( str CHAR(32) ) RETURNS CHAR(32)
BEGIN
  DECLARE i, len SMALLINT DEFAULT 1;
  DECLARE ret CHAR(32) DEFAULT '';
  DECLARE c CHAR(1);
  SET len = CHAR_LENGTH( str );
  REPEAT
    BEGIN
      SET c = MID( str, i, 1 );
      IF c REGEXP '[[:alpha:]]' THEN
        SET ret=CONCAT(ret,c);
      END IF;
      SET i = i + 1;
    END;
  UNTIL i > len END REPEAT;
  RETURN ret;
END |
DELIMITER ;
SELECT alphas('123ab45cde6789fg');
+----------------------------+
| alphas('123ab45cde6789fg') |
+----------------------------+
| abcdefg                    |
+----------------------------+



Strip HTML tags

Ported from a T-SQL function by Robert Davis:

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC 
BEGIN
  DECLARE iStart, iEnd, iLength int;
  WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
    BEGIN
      SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
      SET iLength = ( iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = Insert( Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END;
|
DELIMITER ;

SELECT fnStripTags('this <html>is <b>a test</b>, nothing more</html>') AS Test;
+------------------------------+
| Test                         |
+------------------------------+
| this is a test, nothing more |
+------------------------------+


firefox and flash 10

Flash 10 Tips, IMPORTANT if you want it to actually work
1. If you are running 64bit Fedora or RHEL5, you must install nspluginwrapper.i386 and nspluginwrapper.x86_64
for it to work. 64bit Fedora does not install nspluginwrapper.i386 by default.

2. You also need alsa-lib.i386 and alsa-plugins-pulseaudio.i386.

3. A bug in firefox/xulrunner causes the browser to crash (even with nspluginwrapper) with "windowless"
WMODE plugins like Flash Player 10. This was fixed in firefox-3.0.2. You really should upgrade firefox
anyway because it contains security fixes.

4. Add the following line to /etc/adobe/mms.cfg:
WindowlessDisable=true

It is recommended that you use this workaround because there remain other WMODE bugs between
firefox, nspluginwrapper and the plugin. This makes things a little more stable.

5. nspluginwrapper-1.1.0 introduced WMODE support for the first time, but it was broken. You must
upgrade to at least nspluginwrapper-1.1.2 which fixes a few WMODE problems and other bugs that
would make Flash 10 crash. Here are packages for Fedora.

6. Flash 10 requires either libcurl.so.3 or libcurl.so.4 or it will refuse to load. Be sure libcurl.i386 is
installed on your system.

yum install libcurl.i386

7. Fedora 8+ requires an updated libcurl to fix a problem with curl's NSS implementation that caused Flash
crashes. Install all updates and this particular problem will go away.

8. Remove libflashsupport as it is no longer needed for anything with Flash 10, and it might even be the cause of crashes.

FAQ
Also check out Adobe's Flash Player Support Center for help.

1. Where is Flash for x86-64 or PowerPC Linux?
After years of waiting, it sounds like one of their engineers is working on it at least unofficially.
Contact Adobe and express your opinion if you truly care about this.

2. What if I use Debian?

Install flashplugin-nonfree from Debian contrib unstable.
If this fails, you must seek help from the Debian community.

3. What if I use Gentoo?
emerge netscape-flash
If this fails, you must seek help from the Gentoo community.

4. What if I use Red Hat Enterprise Linux?
RHEL distributes flash-plugin on the Extras CD, which is essentially the same
package from Adobe.
RHEL3's flash-plugin package contains libflashsupport in order to output sound via the operating system's
OSS drivers, since flash-plugin itself is only capable of ALSA sound.

5. As reported on Adobe's Linux blog in May 2008, Flash 10 has native 3D hardware acceleration,
except due to driver bugs they blacklisted anything with a client glx vendor string of SGI. Unfortunately, Intel
seems to report "SGI" here. I was told that mms.cfg option "OverrideGPUValidation = 1" will override their blacklist.
Adobe wrote this new blog entry about this.
See the Adobe Flash Player Administration Guide to learn how to use /etc/adobe/mms.cfg.

Why Does The Segmentation Fault Occur on Linux / UNIX Systems?

According to wikipedia:

A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

Usually signal #11 (SIGSEGV), defined in the header file signal.h, is sent to the process. The default action for a program upon receiving SIGSEGV is abnormal termination. This action will end the process, but may generate a core file (also known as a core dump) to aid debugging, or perform some other platform-dependent action. A core dump is the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally.

A segmentation fault can also occur under the following circumstances:

a) A buggy program or command, which can only be fixed by applying a patch.

b) Accessing an array beyond its end in a C program.

c) Inside a chrooted jail, when a critical shared library, config file or /dev/ entry is missing.

d) Faulty hardware, memory or a driver can also create the problem.

e) Overheating can also generate this problem; maintain the suggested environment for all computer equipment.

Suggestions to debug Segmentation Fault errors

To debug this kind of error try one or all of the following techniques :

* Use gdb to track down the exact source of the problem.
* Make sure the correct hardware is installed and configured.
* Always apply all patches and keep the system updated.
* Make sure all dependencies are installed inside the jail.
* Turn on core dumping for supported services such as Apache.
* Use strace, which is a useful diagnostic, instructional, and debugging tool.
* Google to find out if there is a solution to the problem.
* Fix your C program for logical errors such as pointer misuse, null pointers, array overruns, and so on.
* Analyze the core dump file generated by your system using gdb.

-------------------------------------------------------------------------------
Example Debugging Session: Segmentation Fault Example

We are going to use gdb to figure out why the following program causes a segmentation fault. The program is meant to read in a line of text from the user and print it. However, we will see that in its current state it doesn't work as expected...

1 : #include <stdio.h>
2 : #include <stdlib.h>
3 :
4 : int main(int argc, char **argv)
5 : {
6 :   char *buf;
7 :
8 :   buf = malloc(1<<31);
9 :
10:   fgets(buf, 1024, stdin);
11:   printf("%s\n", buf);
12:
13:   return 1;
14: }

The first step is to compile the program with debugging flags:

prompt> gcc -g segfault.c

Now we run the program:

prompt > a.out
Hello World!
Segmentation fault
prompt >

This is not what we want. Time to fire up gdb:

prompt > gdb a.out
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb)

We'll just run it and see what happens:

(gdb) run
Starting program: /home/dgawd/cpsc/363/a.out
test string

Program received signal SIGSEGV, Segmentation fault.
0x4007fc13 in _IO_getline_info () from /lib/libc.so.6

So we received the SIGSEGV signal from the operating system. This means that we tried to access an invalid memory address. Let's take a backtrace:

(gdb) backtrace
#0 0x4007fc13 in _IO_getline_info () from /lib/libc.so.6
#1 0x4007fb6c in _IO_getline () from /lib/libc.so.6
#2 0x4007ef51 in fgets () from /lib/libc.so.6
#3 0x80484b2 in main (argc=1, argv=0xbffffaf4) at segfault.c:10
#4 0x40037f5c in __libc_start_main () from /lib/libc.so.6

We are only interested in our own code here, so we want to switch to stack frame 3 and see where the program crashed:

(gdb) frame 3
#3 0x80484b2 in main (argc=1, argv=0xbffffaf4) at segfault.c:10
10 fgets(buf, 1024, stdin)

We crashed inside the call to fgets. In general, we can assume that library functions such as fgets work properly (if this isn't the case, we are in a lot of trouble). So the problem must be one of our arguments. You may not know that 'stdin' is a global variable that is created by the stdio libraries. So we can assume this one is ok. That leaves us with 'buf':

(gdb) print buf
$1 = 0x0

The value of buf is 0x0, which is the NULL pointer. This is not what we want - buf should point to the memory we allocated on line 8. So we're going to have to find out what happened there. First we want to kill the currently-running invocation of our program:

(gdb) kill
Kill the program being debugged? (y or n) y

Now set a breakpoint on line 8:

(gdb) break segfault.c:8
Breakpoint 1 at 0x8048486: file segfault.c, line 8.

Now run the program again:

(gdb) run
Starting program: /home/dgawd/cpsc/363/a.out

Breakpoint 1, main (argc=1, argv=0xbffffaf4) at segfault.c:8
8 buf = malloc(1<<31);

We're going to check the value of buf before the malloc call. Since buf wasn't initialized, the value should be garbage, and it is:

(gdb) print buf
$2 = 0xbffffaa8 "Иъяї#\177\003@t`\001@\001"

Now step over the malloc call and examine buf again:

(gdb) next
10 fgets(buf, 1024, stdin);
(gdb) print buf
$3 = 0x0

After the call to malloc, buf is NULL. If you were to go check the man page for malloc, you would discover that malloc returns NULL when it cannot allocate the amount of memory requested. So our malloc must have failed. Let's go back and look at it again:

8 :   buf = malloc(1<<31);

Well, the value of the expression 1 << 31 (the integer 1 left-shifted 31 times) is 2147483648, or 2GB (gigabytes). Very few machines have this kind of memory - mine only has 256MB. So of course malloc would fail. Furthermore, we are only reading in 1024 bytes in the fgets call. All that extra space would be wasted, even if we could allocate it. Change the 1<<31 to 1024 (or 1<<10), and the program will work as expected:

prompt > a.out
Hello World!
Hello World!

prompt >

So now you know how to debug segmentation faults with gdb. This is extremely useful (I use it more often than I care to admit). The example also illustrated another very important point: ALWAYS CHECK THE RETURN VALUE OF MALLOC! Have a nice day.

cpan tips

Life with CPAN
perl Upgrade Bug | Active FTP versus Firewalls | Preferences Questions | Reconfiguring CPAN | Usage Tips | Debugging Module Build Problems | Using CPAN with a non-root account

The Comprehensive Perl Archive Network (CPAN) hosts Perl modules and scripts. The CPAN module provides an interface to query and install modules hosted on CPAN. These pages document the setup and usage of CPAN, with attention to common tasks and pitfalls along the way.

Perl modules used in production must be saved locally, either under version control, a package system, or software depot. Doing so ensures the modules can be installed on new systems, or rebuilt on new platforms. This supports reproducible module installs over time, as CPAN modules may change or be removed without warning.

Use CPAN::Reporter to submit module test results. Try SmokeAuto for automated module testing.
Alternatives

Other means of installing perl modules from CPAN include the following, which are not discussed in detail here.

* CPANPLUS. “CPAN++ (also referred to and pronounced as CPANPLUS) is a new and flexible method of perl module management and installation using the CPAN. It aims to be a rewrite, and in time a replacement to the current CPAN module. In addition to fixing some long-standing problems, CPAN++ includes new features, such as module uninstall.”

* PAR - Cross-Platform Packaging and Deployment tool.

* Perl Package Manager (PPM) - Included with ActiveState Perl.

* Other. Some vendors make perl modules available via other methods, such as ActiveState’s PPM, the FreeBSD ports system, or similar package software. On Mac OS X, install XCode, or use one of the various package systems available. These offer tight integration with the system in question, but may include outdated modules or use different naming conventions and filesystem layouts.

For example, p5-libwww is the FreeBSD port of the module distribution libwww-perl on CPAN, and LWP::UserAgent a module found in that distribution. The following commands are equivalent means to install libwww-perl via the FreeBSD ports system, yum on RedHat systems, or the cpan command included with modern versions of CPAN.

# portinstall p5-libwww
# yum -y install perl-libwww-perl
# cpan -i LWP::UserAgent

To lookup the documentation for a module, use http://search.cpan.org/perldoc/Module::Name. The documentation should also show the distribution a module belongs to, for example Data::Dumper. Other search interfaces include the CPAN Search Site or the CPAN shell search features.
* Conversion scripts. cpan2rpm helps convert Perl modules into RedHat Package Manager (RPM) files. The resulting *.rpm packages can be installed via a local Yellowdog Updater, Modified (YUM) server. On Debian, try dh-make-perl --install --cpan Module::Name.

perl Upgrade Bug

The first time CPAN is invoked, it will run through a set of questions, such as which CPAN mirror site to use. Following the questions, CPAN may recommend that you install Bundle::CPAN or Bundle::libnet. Avoid installing these bundles!

Older versions of CPAN have a bug whereby a new version of perl itself may be installed. To avoid this bug, install the latest version of CPAN first. Setting the FTP_PASSIVE=1 environment variable avoids another common problem where active File Transfer Protocol (FTP) connections are blocked by a local firewall.

# perl -MCPAN -e '$ENV{FTP_PASSIVE} = 1; install CPAN'

If already in the CPAN shell following the preferences questions, use:

cpan> install CPAN

For reference, the bug is that Bundle::CPAN or Bundle::libnet may list Data::Dumper as a dependency. Data::Dumper is a built-in perl module, so when old versions of CPAN lookup Data::Dumper, they find a new version of perl to install. This bug has been fixed in current versions of CPAN.
Active FTP versus Firewalls

Network problems may prevent access to CPAN mirror sites. The chief problem will be utilities set to use active FTP, which most firewalls block. Peruse Active FTP vs. Passive FTP, a Definitive Explanation for more details. I recommend enabling passive FTP by default, though this setting will need to be done for a variety of utilities CPAN might run, such as wget or Net::FTP. Alternatives include disabling the client firewall to allow active FTP connections back from the FTP server, or configuring a local FTP proxy.

Set the FTP_PASSIVE environment variable before running cpan.

# env FTP_PASSIVE=1 cpan -i Net::FTP

During the install of the libnet distribution, enable passive FTP.


Ah, I see you already have installed libnet before.

Do you want to modify/update your configuration (y|n) ? [no] y

Should all FTP connections be passive (y|n) ? [no] y

If Net::FTP is already installed, edit the libnet.cfg configuration file to use passive FTP.

$ perl -le 'for (@INC) { $_ .= "/".$ARGV[0]; print if -f }' Net/libnet.cfg
/usr/local/lib/perl5/5.8.1/Net/libnet.cfg
$ grep passive /usr/local/lib/perl5/5.8.1/Net/libnet.cfg
'ftp_int_passive' => 1,

If Net::FTP does not work, or certain utilities take too long to time out, consider the following options.

* Disable Net::FTP.

If Net::FTP does not work, try adding the following line to the CPAN/Config.pm preferences file. More information on dontload_hash posted to the perl.perl5.porters newsgroup. Also consider installing LWP::UserAgent, which is more capable than Net::FTP.

'dontload_hash' => { 'Net::FTP' => 1 },
* Disable utilities that do not work

To reconfigure CPAN to not call certain utilities, replace the path to the utility with a single space.

$ grep ncftp ~/.cpan/CPAN/MyConfig.pm
'ncftp' => q[ ],
'ncftpget' => q[ ],

Preferences Questions

CPAN will ask for preferences settings any time the Config.pm preferences file does not exist or is out of date. The default choice should be acceptable for most.

* CPAN build and cache directory? [/root/.cpan]

I use /var/spool/cpan, as the root filesystem tends to be small on my systems, and the spool area makes more sense to me.
* Policy on building prerequisites (follow, ask or ignore)? [ask]

Leave this setting set to ask when upgrading CPAN the first time, to prevent dependency lookups automatically installing the latest stable perl. With ask set, one can say no to queries that ask to install Bundle::CPAN, Bundle::libnet, or Data::Dumper until the latest version of CPAN is installed.
* Where is your * program? [*]

This series of questions lets one configure (or disable) paths to various utilities CPAN may use to download or unpack modules. It is safe to leave entries blank, as long as CPAN is left with at least one working program that can download modules and another to expand them.

To disable a particular utility, use a single space instead of a file path.
* Parameters for the 'make install' command?

Set this option to include UNINST=1 if installing as root. This properly cleans out files from older versions of a module being installed. For user-specific setup, I never set the UNINST option, to avoid remove warnings of unremovable system files.
* Now we need to know where your favorite CPAN sites are located.

Choose several CPAN mirror sites. Use traceroute or bandwidth testing tools to determine which mirrors are best for your system. Or, mirror CPAN to a local system.

The mirrors list may need to be updated periodically, if a previous mirror site stops hosting CPAN. Symptoms of this will be slow downloads as CPAN fails through the mirror list.

Reconfiguring CPAN

To alter the CPAN preferences, either edit the Config.pm configuration file manually, or use the following command in the CPAN shell.

cpan> o conf init

The init configuration option runs through all the configuration questions, which may be time consuming. Alternatively, other o conf commands can be used to list, remove, and add mirror sites, and then to save the changes to disk, for example:

cpan> o conf urllist
urllist
ftp://ftp.kernel.org/pub/CPAN/
Type 'o conf' to view configuration edit options

cpan> o conf urllist shift

cpan> o conf urllist push ftp://ftp-mirror.internap.com/pub/CPAN/

cpan> o conf urllist
urllist
ftp://ftp-mirror.internap.com/pub/CPAN/
Type 'o conf' to view configuration edit options

cpan> o conf commit
commit: wrote /usr/local/lib/perl5/5.6.1/CPAN/Config.pm

To manually edit the existing configuration file, either open the user-specific ~/.cpan/CPAN/MyConfig.pm directly, or locate the system-wide configuration file (stored somewhere under the perl @INC path list) to edit with the following command.

$ perl -le 'for (@INC) { $_ .= $ARGV[0]; print if -f }' /CPAN/Config.pm
/System/Library/Perl/CPAN/Config.pm
Usage Tips
Invocation | Case Sensitive Searches | Install a Specific Module Version | autobundle | Old Modules | Mac OS X | Manual Build | sudo | Uninstalling | Upgrading

Once the latest version of CPAN has been installed, it is safe to install Bundle::CPAN (as older versions of CPAN would attempt to upgrade perl itself). If already in the CPAN shell, run reload cpan first.

cpan> reload cpan

cpan> install Bundle::CPAN

The following sections outline various uses and caveats with CPAN.
Invocation

The latest version of CPAN should install a cpan command somewhere on the system, which is easier to use than the traditional means of calling CPAN. With an up to date version of CPAN, the following sets of commands are equivalent. Hereafter, the shorter cpan method will be used.

# enter the CPAN shell
# perl -MCPAN -e shell
# cpan

# install the Acme::Bleach module
# perl -MCPAN -e install Acme::Bleach
# cpan -i Acme::Bleach

Depending on the shell in question, one may need to issue the hash -r or rehash command to make the new cpan command appear in the search path after the latest CPAN is installed. Consult your shell’s documentation for more information on whether this is required.
Case Sensitive Searches

The i /SEARCH/ command will search without case sensitivity. To perform a case sensitive search, disable the default case insensitivity via i /(?-i)SEARCH/. This and other regular expression options are detailed in perlre.
Install a Specific Module Version

The CPAN documentation covers how to install a specific module version. This is done by installing the specific distribution file for the module version:

$ cpan
cpan> i /Term::CallEditor/

Module id = Term::CallEditor
CPAN_USERID JMATES (Jeremy Mates )
CPAN_VERSION 0.11
CPAN_FILE J/JM/JMATES/Term-CallEditor-0.11.tar.gz
INST_FILE (not installed)

cpan> install J/JM/JMATES/Term-CallEditor-0.11.tar.gz
Running make for J/JM/JMATES/Term-CallEditor-0.11.tar.gz

Note that the author could delete a specific module version at any time. Therefore, do not rely on specific module versions being available on CPAN over time. If specific module versions are required, copy these into a local software depot or package system.
autobundle

The CPAN autobundle can be used to ease perl upgrades, by creating a special bundle containing all the installed modules of the current version of perl. This bundle can then be installed once the new version of perl is installed.

cpan> autobundle

Wrote bundle file
/var/spool/cpan/Bundle/Snapshot_2003_10_01_00.pm

cpan> quit

# upgrade perl here …

# cpan
cpan> install Bundle::Snapshot_2003_10_01_00

Autobundle appears to install the modules in alphabetical order; setting the prerequisites_policy option to ask should help. Multiple install runs may be needed to install everything properly. Module build failures will lead to much recursion on the part of CPAN.
Old Modules

Some modules found by CPAN are out of date, and will not compile. An updated module may be available on CPAN, just not linked to as “current” by CPAN. Modules known to have this problem include the following. The links below should point to the current version of the modules.

* Mail::Cclient

* OpenGL

Mac OS X

Problems with CPAN on Mac OS X.
Manual Build

On occasion one may need to manually build or test a module, or consult the module directory to read the documentation on any special needs the module may have. This can be done with the CPAN shell look command.

cpan> look Net::SSLeay

Working directory is /var/spool/cpan/build/Net_SSLeay.pm-1.25
# exit
cpan>
sudo

If CPAN is installed in your home directory, the system-wide cpan may cause problems when run via sudo(8), as by default sudo will invoke CPAN using the ~/.cpan/CPAN/MyConfig.pm configuration, but as root, which causes no end of trouble. To avoid this problem, reset the HOME environment variable with the -H option to sudo when invoking CPAN.

$ sudo -H cpan
Uninstalling

CPAN itself offers no uninstall method. Several of the alternatives to CPAN do, such as CPANPLUS or vendor package/port systems. If the module has been fully installed along with a .packlist file for the module, the ExtUtils::Packlist perl module documentation includes code for a modrm command to cleanly remove an installed module.

Modules built with other tools, such as Module::Build, may not write out a .packlist file for use by ExtUtils::Packlist.
Upgrading

How to upgrade all the modules on the system like apt-get is actually documented in the CPAN manual; however, the question comes up often enough that I answer it here as well.

# install everything that is outdated on my disk:
# perl -MCPAN -e 'CPAN::Shell->install(CPAN::Shell->r)'

Problems with this include new versions of modules that may break old module behavior, or various modules that cannot be upgraded due to build failures or system incompatibilities. Best to subject a test system to this sort of mass module upgrade before attempting it on a production system.
Debugging Module Build Problems

See the Debugging CPAN Build Problems guide for more information.
Using CPAN with a non-root account

Notes on using CPAN with a non-root account, or installing Perl modules to directories besides the default system areas, such as a software depot or under $HOME. Consider also local::lib to bootstrap custom Perl module installation directories.

RHEL Cluster

Once RH Cluster is installed, you'll find the tools in /usr/sbin. Here's the breakdown of available tools:

RH Cluster Administration Tools:
* /usr/sbin/clustat: Display the current status of the cluster. (Sun Cluster equiv: scstat)
* /usr/sbin/clusvcadm: Change the state of the services, such as enabling, disabling, relocating, etc. (Sun Cluster equiv: scswitch)
* /usr/sbin/system-config-cluster: A Cluster Configuration GUI. It simplifies the creation of a cluster.conf as well as acting as a GUI management interface.

There are two things to note if you're new to Red Hat Cluster. Firstly, you need to use a Fence Device (you can go without one, but it's highly frowned upon and unsupported by many vendors). Secondly, you do not require shared storage. The device typically used as a Fence Device is an APC MasterSwitch: in the event that a node is unresponsive, a surviving node can (don't laugh) power cycle its partner. This method is also apparently used in some failover situations to ensure that the node wasn't doing anything it shouldn't be doing prior to failover. In other clusters a quorum device is typically needed, but not in RH Cluster (new in version 4 apparently), which means you don't require shared storage for cluster operation at all, which can be a benefit if you don't actually need to store anything on shared storage.

The cluster configuration is stored in a single file as XML: /etc/cluster/cluster.conf. You can configure a new cluster by either creating the cluster.conf by hand, using a pre-existing one, or using the /usr/sbin/system-config-cluster GUI tool. Using the GUI is, of course, the supported method.

Cluster configuration consists of the following components:

* Cluster Nodes: Nodes that are members of the cluster; also specified here are the number of votes each node has and which fence device port controls that node.
* Fence Devices: One or more fence devices, such as an APC MasterSwitch, including the IP address, username and password used to log in to and control the fence device.
* Failover Domains: Defines a logical grouping of nodes which can fail over to each other.
* Shared Resources: A resource used by a cluster service, such as GFS or another shared filesystem, an IP address, an NFS resource, a script, or a Samba service.
* Services: An HA service provided by the cluster, which combines shared resources within a failover domain, utilizing one or more nodes and their associated fence devices.

Perhaps the most important of these is the "Script" shared resource. This script is a standard RC script (such as those in /etc/init.d) that accepts at least 3 arguments: start, stop, and status (or monitor). When a cluster service is started the appropriate node is selected, and the shared resources are given to it, such as mounting a shared filesystem and assuming a shared IP address. It then runs the script to start the service. Then, every 30 seconds, it runs the script with the "status" argument to monitor whether or not the service is indeed still online. In the event of a graceful failover the stop argument is given to the script to close it down before moving all the resources to the new node and starting it there.
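Here is a minimal, hypothetical skeleton of such a script; the daemon name is a placeholder, not anything RH Cluster ships:

#!/bin/bash
# skeleton cluster service script; rgmanager only ever calls it with
# start, stop or status in this kind of setup
case "$1" in
    start)
        /usr/sbin/mydaemon
        ;;
    stop)
        killall mydaemon
        ;;
    status|monitor)
        # exit 0 if the service is healthy, non-zero otherwise
        pgrep mydaemon > /dev/null
        ;;
    *)
        echo "usage: $0 {start|stop|status}"
        exit 1
        ;;
esac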

The whole setup is pretty flimsy in comparison to other HA suites such as IBM's HACMP and Sun's SunCluster. It's akin to tying dental floss between two nodes. Using a network PDU is like holding a gun to the head of each node: answer me or else. You'll notice that there are no explicit interconnects.

[root@zimbra4 cluster]# clustat
Member Status: Quorate

Member Name Status
------ ---- ------
zimbra4.XX Online, Local, rgmanager
zimbra5.XX Online, rgmanager
zimbra6.XX Online, rgmanager

Service Name Owner (Last) State
------- ---- ----- ------ -----
webmail1.XX zimbra4.XX started
webmail2.XX zimbra5.XX started

Although it might be flimsy, it does work well in some situations. Because you don't need explicit interconnects and don't require a shared quorum device, very little pre-planning is needed for a simple cluster setup, so long as you've got a MasterSwitch handy. If you, for instance, wanted to set up an HA Apache service, you'd just use the /etc/init.d/httpd script, add a shared IP, share your htdocs/ on, say, an NFS mount point which is set up as a shared resource, edit your httpd.conf for the right htdocs/ directory, and you're basically done. Of course, when doing this, make sure you don't allow Apache to start up on boot by itself (chkconfig httpd off).
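A very rough sketch of the node-side preparation for that example (the service and mount point names are placeholders; the shared IP, NFS mount and script resources themselves are defined in cluster.conf, not here):

# on each node, stop init from starting Apache; the cluster will do it
chkconfig httpd off
# make sure the shared htdocs mount point exists on every node
mkdir -p /var/www/shared-htdocs
# once the service is defined in cluster.conf, enable it on a node and check it
clusvcadm -e httpd-svc -m node1
clustat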

So for small services it might work well, but would I run Oracle or DB2 on it? Not a chance in hell. Here are my gripes:

1. Shared IP's don't show up as IP aliases in ifconfig. This has got to be a bug. If a shared IP is present, I should see its address in ifconfig as eth0:1 or something, but you don't. This makes checking the current location of the address difficult (ie: telnet/ssh to it and see where you end up.) This seems to be due to the fact that RH Cluster doesn't tie shared IP's to specific interfaces, which in and of itself, is problematic imho. Either way, it still would be nice if it showed up as like "clu1" or something.
2. Shared IP Address "Drift". I have run into numerous problems with the shared IP just drifting to its failover partner. The shared storage doesn't move and the service itself doesn't move, just the IP, which means that the service is effectively down, although the cluster is totally unaware of the problem (as checked with clustat). To resolve the issue I've got to disable the service completely and then restart it on the appropriate node (ie: clusvcadm -d mysvc && clusvcadm -e mysvc -m node1).
3. Unexpected shutdown of a service: Things are humming along fine and then I get a call from QA, service is down. If it wasn't IP drift it would be an unexpected failover or shutdown of the service. clustat may or may not know what's going on in these cases, and often in the case of a failover it reported that the service was still running on the previous (pre-failover) node when in fact it was not.

I just can't find anything to like about Red Hat Cluster Suite. If I wanted a light cluster solution I'd opt for something that's tried and true and enterprise grade, such as Veritas Cluster Suite. If you want a totally integrated and comprehensive clustering solution, Sun Cluster is the way to go, hands down, but that requires Solaris, and thus doesn't really apply here.

I'm aware that some of these issues listed above may be unresolved bugs, some may be monitor issues, etc. But this is supposed to be an enterprise-ready suite that I paid a lot of money for, and it just doesn't act like one. Some of these issues are possibly due to Zimbra's monitoring scripts, but regardless, I'm bothered that RH Cluster doesn't have a way to deal with these situations like a true solution (say, Sun Cluster or HACMP) does. Couple this with the fact that the documentation is some of the worst I've ever seen. Flip through the docs here.

UPDATE: I've been digging around the source for RH Cluster this afternoon. Apparently, although ifconfig won't show you the shared IP, ip (yes, that's a command) will. Example:

[root@zimbra4 cluster]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:50:8B:D3:8D:51
inet addr:10.10.0.144 Bcast:10.10.3.255 Mask:255.255.252.0
inet6 addr: fe80::250:8bff:fed3:8d51/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:25040869 errors:0 dropped:0 overruns:0 frame:0
TX packets:18583752 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1373798465 (1.2 GiB) TX bytes:893112790 (851.7 MiB)

eth1 Link encap:Ethernet HWaddr 00:50:8B:D3:8D:5B
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2757771 errors:0 dropped:0 overruns:0 frame:0
TX packets:2757771 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:296459762 (282.7 MiB) TX bytes:296459762 (282.7 MiB)

sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

[root@zimbra4 cluster]# ip addr list
1: lo: mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:50:8b:d3:8d:51 brd ff:ff:ff:ff:ff:ff
inet 10.10.0.144/22 brd 10.10.3.255 scope global eth0
inet 10.10.0.147/32 scope global eth0 <--- That's the Shared IP

inet6 fe80::250:8bff:fed3:8d51/64 scope link
valid_lft forever preferred_lft forever
3: eth1: mtu 1500 qdisc noop qlen 1000
link/ether 00:50:8b:d3:8d:5b brd ff:ff:ff:ff:ff:ff
4: sit0: mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0

As for the drifting IP address problem... I started to wonder if it might be because of the way Red Hat Cluster monitors the interface. If it was doing a ping test, it would explain what I've been seeing, because the address would in fact be online, it just isn't on the right system. Looking at rgmanager/src/resources/ip.sh it appears that this is exactly the problem. Why it's drifting in the first place, I can't say, but clearly Red Hat Cluster's method of monitoring the links is open to some serious issues.
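Given that, a quick way to see which node really holds the address (using the shared IP from the example above) is just to ask ip on each node:

# run on each cluster node; only the node actually holding the shared IP prints a line
ip addr list dev eth0 | grep 10.10.0.147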

using suexec for apache

The challenge with securing a shared hosting server is how to secure the website from attack both from the outside and from the inside. PHP has built-in features to help, but ultimately it’s the wrong place to address the problem.

So what can Apache do to help?

It turns out that there are quite a few alternative ways that Apache can help. This article will look at what we can do with stock Apache, and the next few articles will look at what we can do with some interesting third-party Apache modules.

* suexec: Running CGI Programs As A Specified User
* Configuring Apache With PHP/CGI
* Configuring suexec With PHP/CGI
* Configuring suexec For Shared Servers
* Some Benchmarks
* Other Considerations
* Conclusions

suexec: Running CGI Programs As A Specified User

To secure a shared hosting server, we want to be able to run PHP as the user who owns that particular website. One way to do this with stock Apache is with suexec.

suexec is a standard Apache module which allows you to run a CGI executable as a specified user and group. CGI executables date back to the very early days of the web, back when we all had to use Perl to create dynamic websites. Although PHP is commonly run as an Apache module, it still provides support for CGI.

Check with your Linux vendor to make sure that you have PHP/CGI installed on your box.
Configuring Apache With PHP/CGI

The first step for getting suexec working is to configure Apache to run PHP as a CGI executable, instead of using mod_php. Add the following configuration to your httpd.conf file:

ScriptAlias /php5-cgi /usr/bin/php-cgi
Action php5-cgi /php5-cgi
AddHandler php5-cgi .php
AddDirectoryIndex index.php

… and add the following line to your virtual host:

AddHandler php5-cgi .php

In your httpd.conf file (or in one of the files that httpd.conf includes), there will be a <Directory> entry for the directory on disk where your virtual host is stored. Inside that entry, there should be an “Options” line, which might look like this:

Options Indexes FollowSymLinks

Add “ExecCGI” to the end of your Options line.

Make sure to comment out mod_php from Apache. Then, restart Apache, and do some testing to make sure that PHP 5 is working.

For reference, here is the Apache config from my test system:

ScriptAlias /php5-cgi /usr/bin/php-cgi
Action php5-cgi /php5-cgi
AddHandler php5-cgi .php
AddDirectoryIndex index.php index.phtml

<VirtualHost *:80>
    DocumentRoot /var/www/localhost/htdocs

    <Directory /var/www/localhost/htdocs>
        Options Indexes FollowSymLinks ExecCGI
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>

    AddHandler php5-cgi .php
</VirtualHost>

Configuring suexec For PHP/CGI

With Apache now running PHP as a CGI executable, we’re ready to get Apache running PHP as the owner of each website.

In your test virtual host, add the following:

SuexecUserGroup stuart users

Replace “stuart” with the user who owns the website, and replace “users” with the group that the user belongs to. This sets the privileges that PHP will run as.

To ensure the security of your server, suexec is very particular about what conditions must be met before it will execute your PHP scripting engine. A full list of conditions can be found in the Apache docs. To make sense of the conditions, you’ll need to know what settings your copy of suexec has been compiled with. Run the command suexec -V to find out your system’s settings. This is the output from my Seed Linux LAMP Server system:

belal vhosts.d # suexec -V
-D AP_DOC_ROOT="/var/www"
-D AP_GID_MIN=100
-D AP_HTTPD_USER="apache"
-D AP_LOG_EXEC="/var/log/apache2/suexec_log"
-D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
-D AP_SUEXEC_UMASK=077
-D AP_UID_MIN=1000
-D AP_USERDIR_SUFFIX="public_html"

The first condition (and one that isn’t obvious from the Apache manual!) is that the PHP CGI executable must be installed under AP_DOC_ROOT. Chances are that it isn’t installed there at the moment, so go ahead and copy it there.

mkdir /var/www/localhost/cgi-bin
cp /usr/bin/php-cgi /var/www/localhost/cgi-bin

The second condition is that the PHP CGI executable must be owned by the same user and group you listed in the SuexecUserGroup statement earlier. This causes problems for shared hosting; I’ll show you how to fix that later in this article.

chown stuart:users /var/www/localhost/cgi-bin/php-cgi

Update your Apache httpd.conf file to use this copy of PHP:

ScriptAlias /php5-cgi /var/www/localhost/cgi-bin/php-cgi

Restart Apache, and test to make sure that PHP 5 is still working. You should also start to see log messages appearing in AP_LOG_EXEC. This is the first place to look if PHP isn’t working (although the log messages can be a little terse and cryptic).

For reference, here is the Apache config from my test system:

ScriptAlias /php5-cgi /var/www/localhost/cgi-bin/php-cgi
Action php5-cgi /php5-cgi
AddHandler php5-cgi .php
AddDirectoryIndex index.php index.phtml

<VirtualHost *:80>
    DocumentRoot /var/www/localhost/htdocs

    <Directory /var/www/localhost/htdocs>
        Options Indexes FollowSymLinks ExecCGI
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>

    SuexecUserGroup stuart users
    AddHandler php5-cgi .php
</VirtualHost>

Configuring suexec For Shared Servers

I mentioned earlier that there was a problem with using suexec + PHP/CGI on shared servers - the very environment where suexec is needed the most :( In one of the steps above, we created a copy of the PHP CGI executable, and changed its ownership on disk to match the ownership of the website.

chown stuart:users /var/www/localhost/cgi-bin/php-cgi

What happens when we have two websites, each owned by a different user? Or five, or ten, or hundreds? Apache’s suexec will refuse to re-use this copy of the PHP CGI executable for each of the websites, because it isn’t owned by the right user and group.

Each website needs its own copy of the PHP CGI executable, owned by the user and group that owns the website itself. We don’t want to create hundreds of copies of the actual PHP CGI executable (it’s a large waste of space, and a pain for managing PHP upgrades), so instead we can point each website at its own copy of a simple bash script:

#!/bin/bash

/usr/bin/php-cgi "$@"

This script simply executes our central copy of the PHP CGI executable, passing through whatever parameters Apache has called the bash script with.
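Setting one of these up for a site might look roughly like this; the user, group and paths are purely illustrative:

# create the per-site wrapper and hand it over to the site owner
mkdir -p /var/www/example.com/cgi-bin
cat > /var/www/example.com/cgi-bin/php-wrapper <<'EOF'
#!/bin/bash
/usr/bin/php-cgi "$@"
EOF
chown alice:users /var/www/example.com/cgi-bin/php-wrapper
chmod 755 /var/www/example.com/cgi-bin/php-wrapper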

To configure Apache to use this script, simply move the ScriptAlias statement from outside the VirtualHost config to inside.
Some Benchmarks

Because Apache is having to execute a new suexec process every page hit (and suexec executes a new PHP CGI process every page hit), it’s going to be slower than running mod_php. But how much slower? To find out, I used Apache’s ab benchmarking program to load a phpinfo() page 1000 times. I ran the benchmark five times and averaged out the results.
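The individual runs were along the lines of the following ab invocation (the page name is an assumption; any page that calls phpinfo() will do):

ab -n 1000 http://localhost/info.php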

* suexec: average of 127.219 seconds
* suexec + bash script: average of 134.836 seconds
* mod_php: average of 3.753 seconds

suexec on its own is some 34 times slower than using mod_php; suexec + the bash script needed for shared hosting environments is even worse, at 36 times slower than using mod_php.

This benchmark doesn’t provide the full picture. Once you take into account the extra memory used by the suexec method, and the extra memory and CPU (and process context switches!) required to transfer output from PHP/CGI to Apache to send back to the website’s user, the final cost of using suexec + PHP/CGI will be substantially higher.
Other Considerations

Performance isn’t the only thing to think about when evaluating suexec + PHP/CGI.

* suexec + PHP/CGI does solve the security challenge, without requiring your application to support safe_mode.
* HTTP authentication is only supported by mod_php, not PHP/CGI. If your application relies on this, then suexec + PHP/CGI is not for you.

Conclusions

Apache’s suexec mechanism does secure a shared hosting server from attack from within. However, this is achieved at a heavy performance cost, which inevitably will translate into needing lots of extra servers - which is expensive.

So, if Apache itself doesn’t come with a solution that’s worth a damn, maybe there are third-party solutions out there that can do a better job? The next article in the series will take a look at what others have done to try and plug this gap.

debugging sendmail

Debugging sendmail problems

There are many tools available for debugging problems with
sendmail. For the most part, standard types of configurations
shouldn't present any problems. However, for complex configurations
requiring the rewriting of address parsing rules, problems may arise
and debugging may be necessary.

Verbose Delivery

First of all, the standard mail utility, mail (or
Mail, or mailx) can be invoked with the -v flag
to turn on verbose delivery. Verbose delivery shows the steps that
sendmail takes in delivering the mail message. For example, the
following is a transcript from an unsuccessful mail connection:


fard@uwsg.indiana.edu... Connecting to herbie.ucs.indiana.edu. via smtp...
220 herbie.ucs.indiana.edu ESMTP Sendmail 8.7.5/8.6.12 ready at Thu, 18 Jul 1996 09:16:24 -0500 (EST)
>>> EHLO outland.uwsg.indiana.edu
250-herbie.ucs.indiana.edu Hello outland.uwsg.indiana.edu [129.79.17.234], pleased to meet you
250-EXPN
250-8BITMIME
250-SIZE
250-DSN
250-VERB
250-ONEX
250 HELP
>>> MAIL From:<brier@outland.uwsg.indiana.edu> SIZE=28
250 <brier@outland.uwsg.indiana.edu>... Sender ok
>>> RCPT To:<fard@uwsg.indiana.edu>
550 <fard@uwsg.indiana.edu>... User unknown
fard@uwsg.indiana.edu... User unknown

>>> RSET
250 Reset state
/home/brier/dead.letter... Saved message in /home/brier/dead.letter
/home/brier/dead.letter... Closing connection to herbie.ucs.indiana.edu.
>>> QUIT
221 herbie.ucs.indiana.edu closing connection

This transcript provides a lot of information about what happened
with this mail message. The first line shows that the message was
intended for the address fard@uwsg.indiana.edu, but that the
SMTP connection was made to the machine herbie.ucs.indiana.edu.
A little investigation with nslookup shows that
uwsg.indiana.edu is a mail exchange record in the Domain Name
System (DNS) for the machine herbie.ucs.indiana.edu, so all
mail sent to uwsg.indiana.edu is actually delivered to
herbie.ucs.indiana.edu.

Most of the remaining lines show the SMTP conversation between the
two hosts. Much of this conversation is standard, but at line 15
there is an error: "550 <fard@uwsg.indiana.edu>... User unknown".
Here is our problem. There is no user fard on
herbie.ucs.indiana.edu, and so the mail was not delivered.

Invoking sendmail In Verify Mode

One useful command line option of sendmail is -bv.
This option puts sendmail into verify mode; it verifies
addresses only, it does not deliver them. This is a good method for
validating users or mailing lists.

For example, the following invocation:

$ /usr/lib/sendmail -bv brier
brier... deliverable: mailer local, user brier


shows us that the address brier is a valid address and
will be delivered locally. The next invocation:

$ /usr/lib/sendmail -bv ombuds
ombuds... deliverable: mailer smtp, host indiana.edu, user ombuds@indiana.edu

shows that the address ombuds is also a valid address, but
instead of being delivered locally, it will be delivered to
ombuds@indiana.edu.

Invoking sendmail In Test Mode

sendmail has a test mode which only performs and displays
address rewriting, it does not actually deliver any mail. This test
mode is an excellent resource for researching what sendmail

does to an address during it's rewriting stage, and where mail sent to
that address ends up being transported. Test mode can also be used to
test changes made to a configuration file, especially changes to the
address rewriting rules.

Test mode is invoked with the -bt option to
sendmail. Since it is used often with experimental .cf
files, it is often invoked in conjunction with the -C option
which allows specification of an alternate .cf file. For
example:

/usr/lib/sendmail -bt -Ciu-standalone.cf

would invoke sendmail in test mode, using the configuration
file iu-standalone.cf in the current working directory, rather
than the default /etc/sendmail.cf.

Using sendmail in test mode requires some knowledge about
how sendmail rewrites addresses to allow them to be delivered.
Briefly, the .cf file contains a number of rules for
rewriting addresses. These rules are gathered up into rulesets
which perform some task. So, a ruleset is a collection of rules
responsible for a specific purpose in the address rewriting process,
for example, canonifying names. These sendmail rulesets are
numbered (0-99 in V8). The details of rewriting rules are a bit
outside the scope of this course. For more information, check out the
optional reference, "Explosion in a Punctuation Factory" (Open
Computing "Hands-On" Tutorial, January 1994) by Bryan Costales, or
Bryan's book, sendmail, also called "the bat book".

In test mode, sendmail requires the user to specify an
address to rewrite, and which rulesets through which that address
should be sent. Without knowing a whole lot about rulesets and a mail
message's flow through them, it is a little hard to know which
rulesets to use. For general purposes, rulesets 3 and 0 give good
information. Both of these rulesets are run on every message that goes
through sendmail (ruleset 3 is the very first one, and ruleset
0 is the very last). For example, here is a sample transcript of a
sendmail in test mode:

$ /usr/lib/sendmail -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>

> 3,0 uwsg@uwsg
rewrite: ruleset  3   input: uwsg @ uwsg
rewrite: ruleset 96   input: uwsg < @ uwsg >
rewrite: ruleset 96 returns: uwsg < @ herbie . ucs . indiana . edu . >
rewrite: ruleset  3 returns: uwsg < @ herbie . ucs . indiana . edu . >
rewrite: ruleset  0   input: uwsg < @ herbie . ucs . indiana . edu . >

rewrite: ruleset 98   input: uwsg < @ herbie . ucs . indiana . edu . >
rewrite: ruleset 98 returns: uwsg < @ herbie . ucs . indiana . edu . >
rewrite: ruleset 95   input: < > uwsg < @ herbie . ucs . indiana . edu . >
rewrite: ruleset 95 returns: uwsg < @ herbie . ucs . indiana . edu . >

rewrite: ruleset  0 returns: $# smtp $@ herbie . ucs . indiana . edu . $: uwsg < @ herbie . ucs . indiana . edu . >

Ruleset 3 canonifies the address, invoking ruleset 96 to do part of
its work. First it puts angle brackets around the host part of the
address (this notation is for sendmail's internal use; they are
stripped off before sending), then it calls ruleset 96 to look up the
hostname and convert it, if needed, to a fully qualified domain name.
Note that ruleset 96 also returns the true name of the host. In this
case, internally, uwsg became uwsg.indiana.edu, which
was discovered to be an MX record and became the true hostname,
herbie.ucs.indiana.edu.

Ruleset 0 is in charge of doing final parsing of addresses. In this
example, ruleset 0 calls ruleset 98 (which is empty by default, so
does nothing), then calls ruleset 95, which resolves addresses to the
triple that sendmail expects as a final result. This triple
consists of the mailer to use (in this case smtp, as denoted
by $#), the address to which the message should be
transported (in this case herbie.ucs.indiana.edu, as denoted by
$@), and the address to put on the message (in this case
uwsg < @ herbie . ucs . indiana . edu . >, as denoted by $:).

Table 1: 8.7 Test Mode Extensions

/canon host            Canonify host name.
/mx host               Look up MX records.
/map mapname key       Do map lookup.
/parse address         Parse address.
/try mailer address    Externalize address.
/tryflags [HE][SR]     Tweak try.
.Dx value              Define macro.
.Cc value              Extend class.
=Sruleset              Print ruleset.
=M                     Print mailers.
$m                     Dump macro.
$=c                    Dump class.
-ddebugflags           Set debug flags.

To summarize, with this sendmail configuration, a mail sent
to uwsg@uwsg will have its address rewritten to
uwsg@herbie.ucs.indiana.edu and will be delivered to
herbie.ucs.indiana.edu by the smtp mailer.

Beginning with version 8.7, sendmail supports a set of
extensions to test mode for making certain debugging operations
easier. These options are summarized in Table 1.


sendmail Command Line Debugging Options

Table 2: Useful sendmail debugging flags

-d21.2    Shows rewriting of addresses.
-d21.12   Shows more detailed rewriting of addresses.
-d60      Shows map lookups.
-d11      Shows mailer invocations.
-d8.7     Shows DNS chat.
-d27.4    Shows alias, forward, and include traffic.
-d28.4    Shows user database traffic.
-d12      Shows remote name editing.
-d0.1     Shows compile flags and system identity.
-d0.4     Shows probing for local names.
-d0.10    Shows OS-dependent compile flags.

sendmail has a -d flag for debugging options. For
essential information on sendmail's command line debugging
options, see the "Debugging" section of the Sendmail Installation
and Operation Guide.

The information reported by these debugging options is often very
low-level and may not always be helpful without the source at hand.
However, Table 2 shows a list of some of the more useful debugging
flags. The debugging options are most often used in tandem with the
-v option, which turns on verbose delivery for sendmail
(the same as it does for mail).
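For example, to watch DNS lookups and detailed address rewriting while verifying a single address (the flags are from Table 2; the address itself is just an example):

/usr/lib/sendmail -v -d8.7 -d21.12 -bv someuser@indiana.edu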




Terms used: SMTP, mail exchange, DNS, sendmail, sendmail.cf.

perl special variables quick reference

$_ The default or implicit variable.
@_ Subroutine parameters.
$a
$b sort comparison routine variables.
Regular Expressions
$1, $2, ... Regexp parenthetical capture holders.
$& Last successful match (degrades performance).
$` Prematch for last successful match string (degrades performance).
$' Postmatch for last successful match string (degrades performance).
$+ Last paren match.
$^N Last closed paren match.
@+ Offsets of ends of successful submatches in scope.
@- Offsets of starts of successful submatches in scope.
$* Boolean for multi-line matching. Deprecated. Use /s and /m.
$^R Last regexp (?{code}) result.
IO and Separators
$. Current line number (or record number) of most recent filehandle.
$/ Input record separator.
$| Output autoflush. 1=autoflush, 0=default
$, Output field separator (lists)
$\ Output record separator.
$" Output list separator. (interpolated lists)
$; Subscript separator. (Use a real multidimensional array instead.)
Formats
$# Output format for printed numbers (deprecated).
$% Page number for currently selected output channel.
$= Current page length.
$- Number of lines left on page.
$~ Format name.
$^ Name of top-of-page format.
$: Format line break characters
$^L Form feed (default "\f").
$^A Format Accumulator
Status Reporting
$? Child error. Status code of most recent system call or pipe.
$! Operating System Error. (What just went 'bang'?)
%! Error number hash
$^E Extended Operating System Error (Extra error explanation).
$@ Eval error.
ID's and Process Information
$$ Process ID
$< Real user id of process.
$> Effective user id of process.
$( Real group id of process.
$) Effective group id of process.
$0 Program name.
$^O Operating System name.
Perl Status Info
$] Version and patch number of perl interpreter.
$^C Current value of flag associated with -c switch.
$^D Current value of debugging flags
$^F Maximum file descriptor.
$^I Value of the -i (inplace edit) switch.
$^M Emergency Memory pool.
$^P Internal variable for debugging support.
$^R Last regexp (?{code}) result.
$^S Exceptions being caught. (eval)
$^T Base time of program start.
$^V Perl version.
$^W Status of -w switch
$^X Perl executable name.
Command Line Args
ARGV Filehandle iterates over files from command line (see also <>).
$ARGV Name of current file when reading <>
@ARGV List of command line args.
ARGVOUT Output filehandle for -i switch
Miscellaneous
@F Autosplit (-a mode) recipient.
@INC List of library paths.
%INC Keys are filenames, values are paths to modules included via use, require, or do.
%ENV Hash containing current environment variables
%SIG Signal handlers.
$[ Array and substr first element (Deprecated!).
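A couple of these in action, as a quick shell one-liner (it only uses variables from the list above):

# $, is the output field separator, $\ the output record separator,
# and $$ is the process ID
perl -e '$, = ", "; $\ = "\n"; print "a", "b", "c"; print "pid is $$";'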

lilo

LILO - Generic Boot Loader for Linux ("LInux LOader") by Werner Almesberger
===========================================================================

Version 21-4 (release) -- John Coffman

Minor changes suggested by Werner to show that this release is derived
from the source code to his version 21. Added VERSION_MAJOR and
VERSION_MINOR to replace VERSION. The file VERSION is replaced by
'version.h'.

The first and second stage loaders have been modified so that ONLY
'lba32' will use EDD packet calls. 'linear' will now always ask the
BIOS for the disk geometry, and then use C:H:S addressing.

Version 22 (beta) -- John Coffman

This version was created to allow booting on disks larger than 8.4Gb using
the Enhanced BIOS call (int 0x13, AH=0x42) and the packet-call interface.
These calls are supported on post-1998 systems, and through software BIOS
extensions such as EZ-DRIVE(tm).

My primary objective in making these changes to LILO, is not to break any-
thing. Hence, the 32-bit direct addressing of sectors is supported by a
new keyword in the 'lilo.conf' file: "lba32". This keyword is mutually
exclusive with the keyword "linear". On the command line, these options
are invoked with the new "-L" switch, or the old "-l" switch, respectively.

Apologies to Werner Almesberger for not communicating these changes/addi-
tions to him directly; but I have not been able to contact him by e-mail
at the address he provided in the latest 'lilo.lsm' file. I do not want
versions of LILO to get out-of-step.

Disk sector addresses are conveyed from the Map Installer (lilo executable)
to the boot loaders, first- and second-stages through a 5 byte structure:

sector [1..63] plus 2 high bits of Cylinder
cylinder [0..1023] low eight bits in this byte
device [0..3] for floppies, [0x80..0x8F] for hard disks
head [0..254] no, the max is 254, not 255
count [1...] number of sectors to transfer

The first two bytes are normally loaded into the CX register, the second
two bytes into the DX register, and the last byte into the AL register.
This is the call used for the C:H:S addressing scheme of the original
IBM-PC BIOS. LILO checks that the 64Kb DMA boundary is never crossed by a
call, and that the count never exceeds the sector count of one track on
the disk.

When 'linear' is specified, a 24-bit, 0-based addressing scheme is
employed. The low 8 bits are in 'sector'; the middle 8 bits in 'cylinder';
and the high 8 bits are in 'head'. To flag this as a 'linear' address, the
'device' byte has bit 6 set (mask 0x40 or'ed in). The count field is a
maximum of 128, since anything greater would cross a 64Kb DMA boundary.
It is up to the loader code (second stage only) to check that no track
boundary is crossed.

The situation is more complicated with 'lba32' addressing. It is flagged
with bit 5 of the 'device' byte being set (mask 0x20 or'ed in). The 32-bit
address is spread over the 'sector', 'cylinder', 'head', and 'count'
fields, from low byte to high byte, respectively. Whenever this full
32-bit address is specified explicitly, the actual sector count to
transfer is implied to be one (1), and this fact is flagged in the
'device' byte by setting bit 4 (mask 0x10 or'ed in). When a multi-sector
transfer is called for, the high 8 bits of the address are NOT specified
explicitly, but are assumed to be the same as in the previous transfer,
the 'count' of sectors to transfer will be in the range [2..127], and
bit 4 (mask 0x10) of the 'device' byte will be clear.

The first-stage loader uses single sector transfers only, so it uses a
simplified disk read routine, always assumes a sector transfer count of 1,
and always assumes that the full 32-bit address of the sector is specified
when 'lba32' mode is detected in the 'device' byte. However, the second-
stage loader is capable of multi-sector transfers when map-compaction has
been used (-c switch, or 'compact' global option), so it uses the fully
capable read routine to load the -initrd- image, and the -kernel- image.

Both 'linear' and 'lba32' will use the Enhanced BIOS packet calls, if they
are available. Otherwise, the disk address is converted to C:H:S, using
the disk geometry returned by (int 0x13, AH=8). If cylinder overflow
occurs -- i.e., cylinder > 1023, then error code '9f' is issued.

The BIOS calls used to implement large disk booting conform to the Enhanced
Disk Drive Specification, version 3.0, rev 0.8, dated March 12, 1998. This
document is available on-line from Phoenix Technologies Ltd., at:

http://www.phoenix.com/products/specs.html

Known bugs:
The chain loader, 'os2_d.b', still will not boot OS2 4.0 from a secondary
partition on my "D:" drive. Boot Manager can, and I am still working on
the problem.
----------------------------------------------------------------------------

Version 21

Important: The file INCOMPAT contains vital (in)compatibility information
for this release of LILO. Read it before proceeding.

Installing boot loaders is inherently dangerous. Be sure to have some means
to boot your system from a different media if you install LILO on your hard
disk.

There is also a LaTeX version of this document in the "doc" directory.
It is much nicer to read than pure ASCII.

Installation
------------

Please read the file INCOMPAT for compatibility notes.

The installation procedure is described in the section "Normal first-time
installation". Please read "Booting basics" for the whole story.

*** QUICK INSTALLATION ***

If you want to install LILO on your hard disk and if you don't want
to use all its features, you can use the quick installation script.
Read QuickInst for details.

LILO is a versatile boot loader for Linux. It does not depend on a
specific file system, can boot Linux kernel images from floppy disks and
from hard disks and can even act as a "boot manager" for other operating
systems.*

* PC/MS-DOS, DR DOS, OS/2, Windows 95, Windows NT, 386BSD, SCO UNIX,
Unixware, ...

One of up to sixteen different images can be selected at boot time. Various
parameters, such as the root device, can be set independently for each
kernel. LILO can even be used as the master boot record.

This document introduces the basics of disk organization and booting,
continues with an overview of common boot techniques and finally describes
installation and use of LILO in greater detail. The troubleshooting section
at the end describes diagnostic messages and contains suggestions for most
problems that have been observed in the past.

Please read at least the sections about installation and configuration if
you're already using an older version of LILO. This distribution is
accompanied by a file named INCOMPAT that describes further
incompatibilities to older versions.

For the impatient: there is a quick-installation script to create a simple
but quite usable installation. See section "Quick installation" for
details.

But wait ... here are a few easy rules that will help you to avoid most
problems people experience with LILO:

- _Don't panic._ If something doesn't work, try to find out what is
wrong, try to verify your assumption and only then attempt to fix it.
- Read the documentation. Especially if what the system does doesn't
correspond to what you think it should do.
- Make sure you have an emergency boot disk, that you know how to use it,
and that it is always kept up to date.
- Run /sbin/lilo _whenever_ the kernel or any part of LILO, including its
configuration file, has changed. When in doubt, run it. You can't run
/sbin/lilo too many times.
- If performing a destructive upgrade and/or erasing your Linux
partitions, de-install LILO _before_ that if using it as the MBR.
- Don't trust setup scripts. Always verify the /etc/lilo.conf they create
before booting.
- If using a big disk, be prepared for inconveniences: you may have to
use the LINEAR option.

System overview
---------------

LILO is a collection of several programs and other files:

The map installer is the program you run under Linux to put all files
belonging to LILO at the appropriate places and to record information
about the location of data needed at boot time. This program normally
resides in /sbin/lilo. It has to be run to refresh that information
whenever any part of the system that LILO knows about changes, e.g.
after installing a new kernel.
Various files contain data LILO needs at boot time, e.g. the boot
loader. Those files normally reside in /boot. The most important files
are the boot loader (see below) and the map file (/boot/map), where the
map installer records the location of the kernel(s).* Another important
file is the configuration file, which is normally called /etc/lilo.conf.
The boot loader is the part of LILO that is loaded by the BIOS and that
loads kernels or the boot sectors of other operating systems. It also
provides a simple command-line interface to interactively select the
item to boot and to add boot options.

* LILO does not know how to read a file system. Instead, the map
installer asks the kernel for the physical location of files (e.g. the
kernel image(s)) and records that information. This allows LILO to work
with most file systems that are supported by Linux.

LILO primarily accesses the following parts of the system:

The root file system partition is important for two reasons: first, LILO
sometimes has to tell the kernel where to look for it. Second, it is
frequently a convenient place for many other items LILO uses, such as
the boot sector, the /boot directory, and the kernels.
The boot sector contains the first part of LILO's boot loader. It loads
the much larger second-stage loader. Both loaders are typically stored
in the file /boot/boot.b.
The kernel is loaded and started by the boot loader. Kernels typically
reside in the root directory or in /boot.

Note that many of the files LILO needs at boot time have to be accessible
with the BIOS. This creates certain restrictions, see section "BIOS
restrictions".

Introduction
============

The following sections describe how PCs boot in general and what has to be
known when booting Linux and using LILO in particular.

Disk organization
-----------------

When designing a boot concept, it is important to understand some of the
subtleties of how PCs typically organize disks. The most simple case are
floppy disks. They consist of a boot sector, some administrative data (FAT
or super block, etc.) and the data area. Because that administrative data
is irrelevant as far as booting is concerned, it is regarded as part of the
data area for simplicity.

+---------------------------+
|Boot sector| |
|-----------+ |
| |
| Data area |
| |
| |
+---------------------------+

The entire disk appears as one device (e.g. /dev/fd0) on Linux.

The MS-DOS boot sector has the following structure:

+------------------------+
0x000 |Jump to the program code|
|------------------------|
0x003 | |
| Disk parameters |
| |
|------------------------|
0x02C/0x03E | |
| Program code |
| |
| |
|------------------------|
0x1FE | Magic number (0xAA55) |
+------------------------+

LILO uses a similar boot sector, but it does not contain the disk
parameters part. This is no problem for Minix, Ext2 or similar file
systems, because they don't look at the boot sector, but putting a LILO
boot sector on an MS-DOS file system would make it inaccessible for MS-DOS.

Hard disks are organized in a more complex way than floppy disks. They
contain several data areas called partitions. Up to four so-called primary
partitions can exist on an MS-DOS hard disk. If more partitions are needed,
one primary partition is used as an extended partition that contains
several logical partitions.

The first sector of each hard disk contains a partition table, and an
extended partition and _each_ logical partition contains a partition table
too.

+--------------------------------------------+
| Partition table /dev/hda |
| +------------------------------------------|
| | Partition 1 /dev/hda1 |
| | |
| |------------------------------------------|
| | Partition 2 /dev/hda2 |
| | |
+--------------------------------------------+

The entire disk can be accessed as /dev/hda, /dev/hdb, /dev/sda, etc. The
primary partitions are /dev/hda1 ... /dev/hda4.

+--------------------------------------------+
| Partition table /dev/hda |
| +------------------------------------------|
| | Partition 1 /dev/hda1 |
| | |
| |------------------------------------------|
| | Partition 2 /dev/hda2 |
| | |
| |------------------------------------------|
| | Extended partition /dev/hda3 |
| | +----------------------------------------|
| | | Extended partition table |
| | |----------------------------------------|
| | | Partition 3 /dev/hda5 |
| | | |
| | |----------------------------------------|
| | | Extended partition table |
| | |----------------------------------------|
| | | Partition 4 /dev/hda6 |
| | | |
+--------------------------------------------+

This hard disk has two primary partitions and an extended partition that
contains two logical partitions. They are accessed as /dev/hda5 ...

Note that the partition tables of logical partitions are not accessible as
the first blocks of some devices, while the main partition table, all boot
sectors and the partition tables of extended partitions are.

Partition tables are stored in partition boot sectors. Normally, only the
partition boot sector of the entire disk is used as a boot sector. It is
also frequently called the master boot record (MBR). Its structure is as
follows:

+------------------------+
0x000 | |
| Program code |
| |
| |
|------------------------|
0x1BE | Partition table |
| |
|------------------------|
0x1FE | Magic number (0xAA55) |
+------------------------+

The LILO boot sector is designed to be usable as a partition boot sector.
(I.e. there is room for the partition table.) Therefore, the LILO boot
sector can be stored at the following locations:

- boot sector of a Linux floppy disk. (/dev/fd0, ...)
- MBR of the first hard disk. (/dev/hda, /dev/sda, ...)
- boot sector of a primary Linux file system partition on the first hard
disk. (/dev/hda1, ...)
- partition boot sector of an extended partition on the first hard disk.
(/dev/hda1, ...)*

* Most FDISK-type programs don't believe in booting from an extended
partition and refuse to activate it. LILO is accompanied by a simple
program (activate) that doesn't have this restriction. Linux fdisk also
supports activating extended partitions.

It _can't_ be stored at any of the following locations:

- boot sector of a non-Linux floppy disk or primary partition.
- a Linux swap partition.
- boot sector of a logical partition in an extended partition.*
- on the second hard disk. (Unless for backup installations, if the
current first disk will be removed or disabled, or if some other boot
loader is used, that is capable of loading boot sectors from other
drives.)

* LILO can be forced to put the boot sector on such a partition by using
the -b option or the BOOT variable. However, only few programs that
operate as master boot records support booting from a logical
partition.

Although LILO tries to detect attempts to put its boot sector at an invalid
location, you should not rely on that.

Booting basics
--------------

When booting from a floppy disk, the first sector of the disk, the
so-called boot sector, is loaded. That boot sector contains a small program
that loads the respective operating system. MS-DOS boot sectors also
contain a data area, where disk and file system parameters (cluster size,
number of sectors, number of heads, etc.) are stored.

When booting from a hard disk, the very first sector of that disk, the
so-called master boot record (MBR) is loaded. This sector contains a loader
program and the partition table of the disk. The loader program usually
loads the boot sector, as if the system was booting from a floppy.

Note that there is no functional difference between the MBR and the boot
sector other than that the MBR contains the partition information but
doesn't contain any file system-specific information (e.g. MS-DOS disk
parameters).

The first 446 (0x1BE) bytes of the MBR are used by the loader program. They
are followed by the partition table, with a length of 64 (0x40) bytes. The
last two bytes contain a magic number that is sometimes used to verify that
a given sector really is a boot sector.

There is a large number of possible boot configurations. The most common
ones are described in the following sections.

MS-DOS alone
- - - - - -

+-------------------------------------------------------+
| Master Boot Record Boot sector Operating system |
|-------------------------------------------------------|
| DOS-MBR ------------> MS-DOS ------> COMMAND.COM |
+-------------------------------------------------------+

This is what usually happens when MS-DOS boots from a hard disk: the
DOS-MBR determines the active partition and loads the MS-DOS boot sector.
This boot sector loads MS-DOS and finally passes control to COMMAND.COM.
(This is greatly simplified.)

LOADLIN
- - - -

+------------------------------------------------------------+
| Master Boot Record Boot sector Operating system |
|------------------------------------------------------------|
| DOS-MBR ------------> MS-DOS ------> COMMAND.COM |
| ---> LOADLIN ------> Linux |
+------------------------------------------------------------+

A typical LOADLIN setup: everything happens like when booting MS-DOS, but
in CONFIG.SYS or AUTOEXEC.BAT, LOADLIN is invoked. Typically, a program
like BOOT.SYS is used to choose among configuration sections in CONFIG.SYS
and AUTOEXEC.BAT. This approach has the pleasant property that no boot
sectors have to be altered.

Please refer to the documentation accompanying the LOADLIN package for
installation instructions and further details.

LILO started by DOS-MBR
- - - - - - - - - - - -

+-------------------------------------------------------+
| Master Boot Record Boot sector Operating system |
|-------------------------------------------------------|
| DOS-MBR ------------> LILO --------> Linux |
| ---> other OS |
+-------------------------------------------------------+

This is a "safe" LILO setup: LILO is booted by the DOS-MBR. No other boot
sectors have to be touched. If the other OS (or one of them, if there are
several other operating systems being used) should be booted without using
LILO, the other partition has to be marked "active" with fdisk or activate.

Installation:

- install LILO with its boot sector on the Linux partition.
- use fdisk or activate to make that partition active.
- reboot.

Deinstallation:

- make a different partition active.
- install whatever should replace LILO and/or Linux.

Several alternate branches
- - - - - - - - - - - - -

+------------------------------------------------------------+
| Master Boot Record Boot sector Operating system |
|------------------------------------------------------------|
| DOS-MBR ------------> MS-DOS ------> COMMAND.COM |
| ---> LOADLIN ------> Linux |
| ---> LILO --------> Linux |
| ---> MS-DOS --- ... |
+------------------------------------------------------------+

An extended form of the above setup: the MBR is not changed and both
branches can either boot Linux or MS-DOS. (LILO could also boot other
operating systems.)

LILO started by BOOTACTV*
- - - - - - - - - - - - -

* Other, possibly better known boot switchers, e.g. OS/2 BootManager
operate in a similar way. The installation procedures typically vary.

+-------------------------------------------------------+
| Master Boot Record Boot sector Operating system |
|-------------------------------------------------------|
| BOOTACTV -----------> LILO --------> Linux |
| ---> other OS |
+-------------------------------------------------------+

Here, the MBR is replaced by BOOTACTV (or any other interactive boot
partition selector) and the choice between Linux and the other operating
system(s) can be made at boot time. This approach should be used if LILO
fails to boot the other operating system(s).*

* And the author would like to be notified if booting the other
operating system(s) doesn't work with LILO, but if it works with an
other boot partition selector.

Installation:

- boot Linux.
- make a backup copy of your MBR on a floppy disk, e.g.
dd if=/dev/hda of=/fd/MBR bs=512 count=1
- install LILO with the boot sector on the Linux partition.
- install BOOTACTV as the MBR, e.g.
dd if=bootactv.bin of=/dev/hda bs=446 count=1
- reboot.

Deinstallation:

- boot Linux.
- restore the old MBR, e.g.
dd if=/MBR of=/dev/hda bs=446 count=1
or FDISK /MBR under MS-DOS.

If replacing the MBR appears undesirable and if a second Linux partition
exists (e.g. /usr, _not_ a swap partition), BOOTACTV can be merged with the
partition table and stored as the "boot sector" of that partition. Then,
the partition can be marked active to be booted by the DOS-MBR.

Example:

# dd if=/dev/hda of=/dev/hda3 bs=512 count=1
# dd if=bootactv.bin of=/dev/hda3 bs=446 count=1

_WARNING:_ Whenever the disk is re-partitioned, the merged boot sector on
that "spare" Linux partition has to be updated too.

LILO alone
- - - - -

+----------------------------------------+
| Master Boot Record Operating system |
|----------------------------------------|
| LILO ---------------> Linux |
| ---> other OS |
+----------------------------------------+

LILO can also take over the entire boot procedure. If installed as the MBR,
LILO is responsible for either booting Linux or any other OS. This approach
has the disadvantage, that the old MBR is overwritten and has to be
restored (either from a backup copy, with FDISK /MBR on recent versions of
MS-DOS or by overwriting it with something like BOOTACTV) if Linux should
ever be removed from the system.

You should verify that LILO is able to boot your other operating system(s)
before relying on this method.

Installation:

- boot Linux.
- make a backup copy of your MBR on a floppy disk, e.g.
dd if=/dev/hda of=/fd/MBR bs=512 count=1
- install LILO with its boot sector as the MBR.
- reboot.

Deinstallation:

- boot Linux.
- restore the old MBR, e.g.
dd if=/fd/MBR of=/dev/hda bs=446 count=1

If you've installed LILO as the master boot record, you have to explicitly
specify the boot sector (configuration variable BOOT=...) when updating the
map. Otherwise, it will try to use the boot sector of your current root
partition, which will usually work, but it will probably leave your system
unbootable.
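
For example, with LILO installed as the MBR of /dev/hda, the map update might be run as shown below; this just uses the command-line form of the BOOT=... variable mentioned above:

# refresh the map, pointing LILO at the MBR rather than at the boot
# sector of the current root partition
/sbin/lilo -b /dev/hda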

Names
- - -

The following names have been used to describe boot sectors or parts of
operating systems:

"DOS-MBR" is the original MS-DOS MBR. It scans the partition table for a
partition that is marked "active" and loads the boot sector of that
partition. Programs like MS-DOS' FDISK, Linux fdisk or activate
(accompanies LILO) can change the active marker in the partition table.
"MS-DOS" denotes the MS-DOS boot sector that loads the other parts of
the system (IO.SYS, etc.).
"COMMAND.COM" is the standard command interpreter of MS-DOS.
"LOADLIN" is a program that loads a Linux kernel image from an MS-DOS
partition into memory and executes it. It is usually invoked from
CONFIG.SYS and is used in combination with a CONFIG.SYS configuration
switcher, like BOOT.SYS.*
"LILO" can either load a Linux kernel or the boot sector of any other
operating system. It has a first stage boot sector that loads the
remaining parts of LILO from various locations.**
"BOOTACTV" permits interactive selection of the partition from which the
boot sector should be read. If no key is pressed within a given
interval, the partition marked active is booted. BOOTACTV is included
in the pfdisk package. There are also several similar programs, like
PBOOT and OS-BS.***

* LOADLIN is available for anonymous FTP from
ftp://tsx-11.mit.edu/pub/linux/dos_utils/lodlin.tar.gz
ftp://sunsite.unc.edu/pub/Linux/system/boot/dualboot/lodlin.tgz
BOOT.SYS is available for anonymous FTP from
ftp://ftp.funet.fi/pub/Linux/tools/boot142.zip

** LILO can be found in
ftp://tsx-11.mit.edu/pub/linux/packages/lilo/lilo-.tar.gz
ftp://sunsite.unc.edu/pub/Linux/system/boot/lilo/lilo-.tar.gz
ftp://lrcftp.epfl.ch/pub/linux/local/lilo/lilo-.tar.gz

*** pfdisk is available for anonymous FTP from
ftp://sunsite.unc.edu/pub/Linux/utils/disk-management/pfdisk.tar.Z or
ftp://ftp.funet.fi/pub/Linux/tools/pfdisk.tar.Z
PBOOT can be found in
ftp://ftp.funet.fi/pub/Linux/tools/pboot.zip

Choosing the "right" boot concept
-----------------------------------

Although LILO can be installed in many different ways, the choice is
usually limited by the present setup and therefore, typically only a small
number of configurations which fit naturally into an existing system
remains. The following sections describe various possible cases. See also
section "BIOS restrictions".

The configuration file /etc/lilo.conf for the examples could look like
this:

boot = /dev/hda2
compact
image = /vmlinuz
image = /vmlinuz.old
other = /dev/hda1
  table = /dev/hda
  label = msdos

It installs a Linux kernel image (/vmlinuz), an alternate Linux kernel
image (/vmlinuz.old) and a chain loader to boot MS-DOS from /dev/hda1. The
option COMPACT on the second line instructs the map installer to optimize
loading.

In all examples, the names of the IDE-type hard disk devices (/dev/hda...)
are used. Everything applies to other disk types (e.g. SCSI disks;
/dev/sda...) too.

BIOS restrictions
- - - - - - - - -

Nowadays, an increasing number of systems is equipped with comparably large
disks or even with multiple disks. At the time the disk interface of the
standard PC BIOS has been designed (about 16 years ago), such
configurations were apparently considered to be too unlikely to be worth
supporting.

The most common BIOS restrictions that affect LILO are the limitation to
two hard disks and the inability to access more than 1024 cylinders per
disk. LILO can detect both conditions, but in order to work around the
underlying problems, manual intervention is necessary.

The drive limit does not exist in every BIOS. Some modern motherboards and
disk controllers are equipped with a BIOS that supports more (typically
four) disk drives. When attempting to access the third, fourth, etc. drive,
LILO prints a warning message but continues. Unless the BIOS really
supports more than two drives, the system will _not_ be able to boot in
that case.*

* However, if only "unimportant" parts of the system are located on the
"high" drives, some functionality may be available.

The cylinder limit is a very common problem with IDE disks. There, the
number of cylinders may already exceed 1024 if the drive has a capacity of
more than 504 MB. Many SCSI driver BIOSes present the disk geometry in a
way that makes the limit occur near 1 GB. Modern disk controllers may even
push the limit up to about 8 GB. All cylinders beyond the 1024th are
inaccessible for the BIOS. LILO detects this problem and aborts the
installation (unless the LINEAR option is used, see section "Global
options").

Note that large partitions that only partially extend into the "forbidden
zone" are still in jeopardy even if they appear to work at first, because
the file system does not know about the restrictions and may allocate disk
space from the area beyond the 1024th cylinder when installing new kernels.
LILO therefore prints a warning message but continues as long as no
imminent danger exists.

There are four approaches to solving such problems:

- use of a different partition which is on an accessible disk and which
does not exceed the 1024 cylinder limit. If there is only a DOS
partition which fulfills all the criteria, that partition can be used
to store the relevant files. (See section "/boot on a DOS partition".)
- rearranging partitions and disks. This is typically a destructive
operation, so extra care should be taken to make good backups.
- if the system is running DOS or Windows 95, LOADLIN can be used instead
of LILO.
- if all else fails, installation of a more capable BIOS, a different
controller or a different disk configuration.

LILO depends on the BIOS to load the following items:

- /boot/boot.b
- /boot/map (created when running /sbin/lilo)
- all kernels
- the boot sectors of all other operating systems it boots
- the startup message, if one has been defined

Normally, this implies that the Linux root file system should be in the
"safe" area. However, it is already sufficient to put all kernels into
/boot and to either mount a "good" partition on /boot or to let /boot be a
symbolic link pointing to or into such a partition.
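As an illustration (a sketch only; /dev/hda3 and the mount point /mnt are
assumptions, not part of the example configuration above), moving the boot
files onto a "good" partition and mounting it on /boot might look like this:

# mount /dev/hda3 /mnt
# mv /boot/* /mnt
# umount /mnt
# mount /dev/hda3 /boot      (and add a matching entry to /etc/fstab)
# /sbin/lilo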

See also /usr/src/linux/Documentation/ide.txt (or
/usr/src/linux/drivers/block/README.ide in older kernels) for a detailed
description of problems with large disks.

One disk, Linux on a primary partition
- - - - - - - - - - - - - - - - - - -

If at least one primary partition of the first hard disk is used as a Linux
file system (/, /usr, etc. but _not_ as a swap partition), the LILO boot
sector should be stored on that partition and it should be booted by the
original master boot record or by a program like BOOTACTV.

+--------------------------+
| MBR /dev/hda |
| +------------------------|
| | MS-DOS /dev/hda1 |
| |------------------------|
--> | | Linux / /dev/hda2 |
+--------------------------+

In this example, the BOOT variable could be omitted, because the boot
sector is on the root partition.

One disk, Linux on a logical partition
- - - - - - - - - - - - - - - - - - -

If no primary partition is available for Linux, but at least one logical
partition of an extended partition on the first hard disk contains a Linux
file system, the LILO boot sector should be stored in the partition sector
of the extended partition and it should be booted by the original master
boot record or by a program like BOOTACTV.

+--------------------------+
| MBR /dev/hda |
| +------------------------|
| | MS-DOS /dev/hda1 |
| |------------------------|
--> | | Extended /dev/hda2 |
| | +----------------------|
| | | Linux /dev/hda5 |
| | |----------------------|
| | | ... /dev/hda6 |
+--------------------------+

Because many disk partitioning programs refuse to make an extended
partition (in our example /dev/hda2) active, you might have to use
activate, which comes with the LILO distribution.

OS/2 BootManager should be able to boot LILO boot sectors from logical
partitions. The installation on the extended partition itself is not
necessary in this case.

Two disks, Linux (at least partially) on the first disk
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

This case is equivalent to the configurations where only one disk is in the
system. The Linux boot sector resides on the first hard disk and the second
disk is used later in the boot process.

Only the location of the boot sector matters - everything else
(/boot/boot.b, /boot/map, the root file system, a swap partition, other
Linux file systems, etc.) can be located anywhere on the second disk,
provided that the constraints described in section "BIOS restrictions" are
met.

Two disks, Linux on second disk, first disk has an extended partition
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

If there is no Linux partition on the first disk, but there is an extended
partition, the LILO boot sector can be stored in the partition sector of
the extended partition and it should be booted by the original master boot
record or by a program like BOOTACTV.

FIRST DISK SECOND DISK
+--------------------------+ +--------------------------+
| MBR /dev/hda | | MBR /dev/hdb |
| +------------------------| | +------------------------|
| | MS-DOS /dev/hda1 | | | Linux /dev/hdb1 |
| |------------------------| | |------------------------|
--> | | Extended /dev/hda2 | | | ... /dev/hdb2 |
| | +----------------------| | | |
| | | ... /dev/hda5 | | | |
| | |----------------------| | | |
| | | ... /dev/hda6 | | | |
+--------------------------+ +--------------------------+

The program activate, which accompanies LILO, may have to be used to set the
active marker on an extended partition, because MS-DOS' FDISK and some
older versions of Linux fdisk refuse to do that. (Which is generally a good
idea.)
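If activate has to be used, it is invoked with the disk device and the
number of the partition to mark active, roughly like this (a sketch; check
the usage message of your activate binary, since the exact syntax may differ
between LILO releases):

# activate /dev/hda 2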

Two disks, Linux on second disk, first disk has no extended partition
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

If there is neither a Linux partition nor an extended partition on the
first disk, then there's only one place left, where a LILO boot sector
could be stored: the master boot record.

In this configuration, LILO is responsible for booting all other operating
systems too.

FIRST DISK SECOND DISK
+--------------------------+ +--------------------------+
--> | MBR /dev/hda | | MBR /dev/hdb |
| +------------------------| | +------------------------|
| | MS-DOS /dev/hda1 | | | Linux /dev/hdb1 |
| |------------------------| | |------------------------|
| | ... /dev/hda2 | | | ... /dev/hdb2 |
+--------------------------+ +--------------------------+

You should back up your old MBR before installing LILO and verify that LILO
is able to boot your other operating system(s) before relying on this
approach.

The line boot = /dev/hda2 in /etc/lilo.conf would have to be changed to
boot = /dev/hda in this example.

More than two disks
- - - - - - - - - -

On systems with more than two disks, typically only the first two can be
accessed. The configuration choices are therefore the same as with two
disks.

When attempting to access one of the extra disks, LILO displays a warning
message ( Warning: BIOS drive 0x<num> may not be accessible ) but does
not abort. This is done in order to allow the lucky few whose BIOS (or
controller BIOS) does support more than two drives to make use of this
feature. Everybody else should treat this warning as a fatal error.

Note that the two disks restriction is only imposed by the BIOS. Linux
normally has no problems using all disks once it is booted.

/boot on a DOS partition
- - - - - - - - - - - -

Recent kernels support all the functions LILO needs to map files on
MS-DOS (or UMSDOS) file systems as well. Since DOS partitions tend to occupy
exactly the places where the BIOS restrictions (see section "BIOS
restrictions") do not apply, they're an ideal location for /boot if the
native Linux file systems can't be used because of BIOS problems.

In order to accomplish this, the DOS partition is mounted read-write, a
directory (e.g. /dos/linux) is created, all files from /boot are moved to
that directory, /boot is replaced by a symbolic link to it, the kernels are
also moved to the new directory, their new location is recorded in
/etc/lilo.conf, and finally /sbin/lilo is run.
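Under the example names used above (/dos as the mount point of the DOS
partition and /dos/linux as the new directory; the device name below is an
assumption), the whole procedure might look like this sketch:

# mount -t msdos /dev/hda1 /dos
# mkdir /dos/linux
# mv /boot/* /dos/linux
# rmdir /boot
# ln -s /dos/linux /boot
# mv /vmlinuz /dos/linux               (repeat for any other kernels)
# vi /etc/lilo.conf                    (point image= lines at /dos/linux/...)
# /sbin/lilo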

From then on, new kernels must always be copied into that directory on the
DOS partition before running /sbin/lilo, e.g. when recompiling a kernel,
the standard procedure changes from

# make zlilo

to

# make zImage
# mv /dos/linux/vmlinuz /dos/linux/vmlinuz.old
# mv arch/i386/boot/zImage /dos/linux/vmlinuz
# /sbin/lilo

_WARNING:_ De-fragmenting such a DOS partition is likely to make Linux or
even the whole system unbootable. Therefore, the DOS partition should
either not be de-fragmented, or a Linux boot disk should be prepared (and
tested) to bring up Linux and to run /sbin/lilo after the
de-fragmentation.*

* Setting the "system" attribute from DOS on the critical files (e.g.
everything in C:\LINUX) may help to protect them from being rearranged.
However, the boot floppy should still be ready, just in case.

The boot prompt
===============

Immediately after it's loaded, LILO checks whether one of the following is
happening:

- any of the [Shift], [Control] or [Alt] keys is pressed.
- [CapsLock] or [ScrollLock] is set.

If this is the case, LILO displays the boot: prompt and waits for the name
of a boot image (i.e. Linux kernel or other operating system). Otherwise,
it boots the default boot image* or - if a delay has been specified - waits
for one of the listed activities until that amount of time has passed.

* The default boot image is either the first boot image, the image
specified with the DEFAULT variable, or the image that has been
selected at the boot prompt.

At the boot prompt, the name of the image to boot can be entered. Typing
errors can be corrected with [BackSpace], [Delete], [Ctrl U] and [Ctrl X].
A list of known images can be obtained by pressing [?] or [Tab].

If [Enter] is pressed and no file name has been entered, the default image
is booted.

Boot command-line options
-------------------------

LILO is also able to pass command-line options to the kernel. Command-line
options are words that follow the name of the boot image and that are
separated by spaces.

Example:

boot: linux single root=200

This document only gives an overview of boot options. Please consult Paul
Gortmaker's BootPrompt-HOWTO for a more complete and more up to date list.
You can get it from
ftp://sunsite.unc.edu/pub/Linux/docs/HOWTO/BootPrompt-HOWTO.gz or from one
of the many mirror sites.

Standard options
- - - - - - - -

Recent kernels recognize a large number of options, among them are debug ,
no387 , no-hlt , ramdisk=<size> , reserve=<base>,<size>,... ,
root=<device> , ro , and rw . All current init programs also recognize
the option single . The options lock and vga are processed by the boot
loader itself. Boot command-line options are always case-sensitive.

single boots the system in single-user mode. This bypasses most system
initialization procedures and directly starts a root shell on the console.
Multi-user mode can typically be entered by exiting the single-user shell
or by rebooting.

root=<device> changes the root device. This overrides settings that may
have been made in the boot image and on the LILO command line. <device> is
either the hexadecimal device number or the full path name of the device,
e.g. /dev/hda3.*

* The device names are hard-coded in the kernel. Therefore, only the
"standard" names are supported and some less common devices may not be
recognized. In those cases, only numbers can be used.
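For example, to boot the image labeled linux with a different root device
and a read-only root (the label and device are only illustrations):

boot: linux root=/dev/hda3 ro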

reserve=<base>,<size>,... reserves IO port regions. This can be used to
prevent device drivers from auto-probing addresses where other devices are
located that would get confused by the probing.

ro instructs the kernel to mount the root file system read-only. rw
mounts it read-write. If neither ro nor rw is specified, the setting
from the boot image is used.

no-hlt avoids executing a HLT instruction whenever the system is idle.
HLT normally significantly reduces power consumption and therefore also
heat dissipation of the CPU, but may not work properly with some clone
CPUs. no387 disables using the hardware FPU even if one is present.

debug enables more verbose console logging.

Recent kernels also accept the options init=<program> and noinitrd . init=
specifies the name of the init program to execute. Therefore, if single
mode cannot be entered because init is mis-configured, one may still be
able to reach a shell using init=/bin/sh. noinitrd disables automatic
loading of the initial RAM disk. Instead, its content is then available on
/dev/initrd.

vga=<mode> alters the VGA mode set at startup. The values normal ,
extended , ask or a decimal number are recognized. (See also "Booting
kernel images from a file".)

kbd=<code>,... preloads a sequence of keystrokes in the BIOS keyboard
buffer. The keystrokes have to be entered as 16 bit hexadecimal numbers,
with the upper byte containing the scan code and the lower byte containing
the ASCII code. Note that most programs only use the ASCII code, so the
scan code can frequently be omitted. Scan code tables can be found in many
books on PC hardware. Note that scan codes depend on the keyboard layout.
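As an illustration (the codes below are assumptions derived from the
standard US layout and should be checked against a scan code table), the
keystrokes "1" followed by [Enter] could be preloaded with:

boot: linux kbd=0x0231,0x1c0d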

Finally, lock stores the current command-line as the default
command-line, so that LILO boots the same image with the same options
(including lock ) when invoked the next time.

Device-specific options
- - - - - - - - - - - -

There is also a plethora of options to specify certain characteristics
(e.g. IO and memory addresses) of devices. Some common ones are ether ,
floppy , hd , bmouse , and sound . The usage of these options is
<option>=<number>,... . Please consult the corresponding FAQs and HOWTOs
for details. For an overview of all available options, consult the file
init/main.c in the kernel source tree.

Other options
- - - - - - -

Options of the type <variable>=<value> which are neither standard options
nor device-specific options, cause the respective variables to be set in
the environment passed to init. The case of the variable name is preserved,
i.e. it isn't automatically converted to upper case.

Note that environment variables passed to init are typically available in
system initialization scripts (e.g. /etc/rc.local), but they're not visible
from ordinary login sessions, because the login program removes them from
the user's environment.

Repeating options
- - - - - - - - -

The effect of repeating boot command-line options depends on the options.*
There are three possible behaviours:

* Options are frequently repeated when a string defined with APPEND or
LITERAL is prepended to the parameters typed in by the user. Also, LILO
implicitly prepends the options ramdisk , ro , root , or rw when
RAMDISK, READ-ONLY, READ-WRITE, or ROOT, respectively, are set in the
configuration file. ( lock and vga are handled by a different
internal mechanism.)

Options that only enable or disable a certain functionality can be repeated
any number of times. debug , lock , no-hlt , and no387 fall into this
category.

Other options change a global setting whenever they appear, so only the
value or presence of the last option matters. The antagonists ro and rw
are such options. Also, ramdisk , root , and vga work this way.
Example: ro rw would mount the root file system read-write.

Finally, when reserve and many device-specific options are repeated, each
occurrence has its own meaning, e.g. hd=... hd=... would configure two
hard disks, and reserve=0x300,8 reserve=0x5f0,16 would reserve the ranges
0x300 to 0x307 and 0x5f0 to 0x5ff (which is equivalent to writing
reserve=0x300,8,0x5f0,16 ).

Implicit options
- - - - - - - -

LILO always passes the string BOOT_IMAGE=<name> to the kernel, where
<name> is the name by which the kernel is identified (e.g. the label). This
variable can be used in /etc/rc to select a different behaviour, depending
on the kernel.

When booting automatically, i.e. without human intervention, the word auto
is also passed on the command line. This can be used by init to suppress
interactive prompts in the boot phase.

Boot image selection
--------------------

The details of selecting the boot image are somewhat complicated. The
following tables illustrate them. First, if neither PROMPT is set nor a
shift key is being pressed:

Externally   Command    | Auto-   Booted image
provided     line in    | matic
cmd. line*   map file** | boot***
---------------------------------------------------------
No           No         | Yes     Default image
Yes          -          | Yes     Specified by external
                        |         command line
No           Yes        | Yes     Specified by command line
                        |         in map file

* Externally provided command lines could be used to add front-ends to
LILO. They would pass the respective command string to LILO, which
would then interpret it like keyboard input. This feature is currently
not used.

** This command line is set by invoking the map installer with the -R
option, by using the boot command-line option lock , or if a fallback
command line is set (with FALLBACK).

*** I.e. the keyword auto is added.

If PROMPT is not set and a shift key is being pressed:

Input    Empty   Extern.  Cmd.l. | Auto-  Booted image
timeout  cmd.l.  cmd.l.   in map | matic
                          file   | boot
-----------------------------------------------------------
No       No      -        -      | No     Specified by the user
No       Yes     -        -      | No     Default image
Yes      n/a     -        -      | Yes    Default image

Finally, if the configuration variable PROMPT is set:

Input    Empty   Extern.  Cmd.l. | Auto-  Booted image
timeout  cmd.l.  cmd.l.   in map | matic
                          file   | boot
-----------------------------------------------------------
No       No      No       No     | No     Specified by the user
No       Yes     No       No     | No     Default image
Yes      n/a     No       No     | Yes    Default image
n/a      n/a     Yes      -      | Yes    Specified by external
                                 |        command line
n/a      n/a     No       Yes    | Yes    Specified by command
                                 |        line in map file

Note that LILO pauses for the amount of time specified in DELAY when at the
end of a default command line. The automatic boot can then be interrupted
by pressing a modifier key ([Shift], [Ctrl], etc.).

The default image is the first image in the map file or the image specified
with the DEFAULT variable. However, after an unsuccessful boot attempt, the
respective image becomes the default image.

Map installer
=============

The map installer program /sbin/lilo updates the boot sector and creates
the map file. If the map installer detects an error, it terminates
immediately and does not touch the boot sector and the map file.

Whenever the map installer updates a boot sector, the original boot sector
is copied to /boot/boot.<nnnn>, where <nnnn> is the hexadecimal device
number. If such a file already exists, no backup copy is made. Similarly, a
file /boot/part.<nnnn> is created if LILO modifies the partition table.
(See "General per-image options".)

Command-line options
--------------------

The LILO map installer can be invoked in the following ways:

Show current installation
- - - - - - - - - - - - -

The currently mapped files are listed. With -v , also many parameters are
shown.

/sbin/lilo [ -C <config_file> ] -q [ -m <map_file> ] [ -v ... ]

-C <config_file>
Specifies the configuration file that is used by the map installer
(see section "Configuration"). If -C is omitted, /etc/lilo.conf is
used.
-m <map_file>
Specifies an alternate map file. See also sections "Options
corresponding to configuration variables" and "Global options".
-q
Lists the currently mapped files.
-v ...
Increase verbosity. See also sections "Options corresponding to
configuration variables" and "Global options".
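For example, to list the currently mapped files together with their main
parameters:

# /sbin/lilo -q -v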

Create or update map
- - - - - - - - - -

A new map is created for the images described in the configuration file
/etc/lilo.conf and they are registered in the boot sector.

/sbin/lilo [ -C <config_file> ] [ -b <boot_device> ] [ -c ] [ -l ]
[ -i <boot_sector> ] [ -f <disk_file> ] [ -m <map_file> ] [ -d <dsec> ]
[ -v ... ] [ -t ] [ -s <save_file> | -S <save_file> ]
[ -P fix | -P ignore ] [ -r <root_directory> ]

-b <boot_device>
Specifies the boot device. See also sections "Options corresponding to
configuration variables" and "Global options".
-c
Enables map compaction. See also sections "Options corresponding to
configuration variables" and "Global options".
-C <config_file>
Specifies an alternate configuration file. See also section "Show
current installation".
-d <dsec>
Sets the delay before LILO boots the default image. Note that the
delay is specified in _tenths_ of a second. See also sections "Options
corresponding to configuration variables" and "Global options".
-D <name>
Specifies the default image. See also sections "Options corresponding
to configuration variables" and "Global options".
-f <disk_file>
Specifies a disk parameter table file. See also sections "Options
corresponding to configuration variables" and "Global options".
-i <boot_sector>
Specifies an alternate boot file. See also sections "Options
corresponding to configuration variables" and "Global options".
-l
Enables linear sector addresses. See also sections "Options
corresponding to configuration variables" and "Global options".
-m <map_file>
Specifies an alternate map file. See also sections "Options
corresponding to configuration variables" and "Global options".
-P
Specifies how invalid partition table entries should be handled. See
also sections "Options corresponding to configuration variables" and
"Global options".
-r <root_directory>
Chroots to the specified directory before doing anything else. This is
useful when running the map installer while the normal root file system
is mounted somewhere else, e.g. when recovering from an installation
failure with a recovery disk. The -r option is implied if the
environment variable ROOT is set.* The current directory is changed
to the new root directory, so using relative paths may not work.
-s <save_file>
Specifies an alternate boot sector save file. See also sections
"Options corresponding to configuration variables" and "Global
options".
-S <save_file>
Like -s , but overwrites old save files.
-t
Test only. This performs the entire installation procedure except
replacing the map file, writing the modified boot sector and fixing
partition tables. This can be used in conjunction with the -v option
to verify that LILO will use sane values.
-v ...
Increase verbosity. See also sections "Options corresponding to
configuration variables" and "Global options".

* E.g. if your root partition is mounted on /mnt, you can update the map
by simply running ROOT=/mnt /mnt/sbin/lilo

Change default command line
- - - - - - - - - - - - - -

Changes LILO's default command line. See also section "Boot image
selection".

/sbin/lilo [ -C <config_file> ] [ -m <map_file> ] -R [ <word> ... ]

-C <config_file>
Specifies an alternate configuration file. See also section "Show
current installation".
-m <map_file>
Specifies an alternate map file. See also sections "Options
corresponding to configuration variables" and "Global options".
-R <word> ...
Stores the specified words in the map file. The boot loader uses those
words as the default command line when booting the next time. That
command line is removed from the map file by the boot loader by
overwriting the sector immediately after reading it. The first word has
to be the name of a boot image. If -R is not followed by any words,
the current default command line in the map file is erased.* If the
command line isn't valid, the map installer issues an error message and
returns a non-zero exit code.

* -R is typically used in reboot scripts, e.g.
#!/bin/sh
cd /
/sbin/lilo -R "$*" && reboot

Kernel name translation
- - - - - - - - - - - -

Determines the path of the kernel.

/sbin/lilo [ -C <config_file> ] -I <label> [ <options> ]

-C <config_file>
Specifies an alternate configuration file. See also section "Show
current installation".
-I <label> [ <options> ]
Translates the specified label name to the path of the corresponding
kernel image and prints that path on standard output. This can be used
to synchronize files that depend on the kernel (e.g. the ps database).
The image name can be obtained from the environment variable
BOOT_IMAGE . An error message is issued and a non-zero exit code is
returned if no matching label name can be found. The existence of the
image file is verified if the option character v is added.
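A script that needs the path of the currently running kernel might use
something like this sketch (assuming BOOT_IMAGE has been exported by the
boot scripts):

kernel=`/sbin/lilo -I "$BOOT_IMAGE"`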

De-installation
- - - - - - - -

Restores the boot sector that was used before the installation of LILO.
Note that this option only works properly if LILO's directories (e.g.
/boot) have not been touched since the first installation. See also section
"LILO de-installation".

/sbin/lilo [ -C <config_file> ] [ -s <save_file> ] -u | -U [ <device> ]

-C <config_file>
Specifies an alternate configuration file. See also section "Show
current installation".
-s <save_file>
Specifies an alternate boot sector save file. See also sections
"Options corresponding to configuration variables" and "Global
options".
-u [ <device> ]
Restores the backup copy of the specified boot sector. If no device is
specified, the value of the boot variable is used. If this one is also
unavailable, LILO uses the current root device. The name of the backup
copy is derived from the device name. The -s option or the backup
variable can be used to override this. LILO validates the backup copy
by checking a time stamp.
-U [ <device> ]
Like -u, but does not check the time stamp.
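For example, to restore the boot sector that was saved when LILO was first
installed on /dev/hda (the device name is only an illustration):

# /sbin/lilo -u /dev/hda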

Print version number
- - - - - - - - - -

/sbin/lilo -V

-V
Print the version number and exit.

Options corresponding to configuration variables
- - - - - - - - - - - - - - - - - - - - - - - -

There are also many command-line options that correspond to configuration
variables. See section "Global options" for a description.

Command-line option        | Configuration variable
---------------------------------------------------
-b <boot_device>           | boot=<boot_device>
-c                         | compact
-d <dsec>                  | delay=<dsec>
-D <name>                  | default=<name>
-i <boot_sector>           | install=<boot_sector>
-l                         | linear
-m <map_file>              | map=<map_file>
-P fix                     | fix-table
-P ignore                  | ignore-table
-s <save_file>             | backup=<save_file>
-S <save_file>             | force-backup=<save_file>
-v ...                     | verbose=<level>

Configuration
-------------

The configuration information is stored in the file /etc/lilo.conf and
consists of variable assignments.

Syntax
- - -

The following syntax rules apply:

- flag variables consist of a single word and are followed by whitespace
or the end of the file.
- string variables consist of the variable name, optional whitespace, an
equal sign, optional whitespace, the value and required whitespace, or
the end of the file.
- a non-empty sequence of blanks, tabs, newlines and comments counts as
whitespace.
- variable names are case-insensitive. Values are usually case-sensitive,
but there are a few exceptions. (See below.)
- tabs and newlines are special characters and may not be part of a
variable name or a value. The use of other control characters and
non-ASCII characters is discouraged.
- blanks and equal signs may only be part of a variable name or a value
if they are escaped by a backslash or if the value is embedded in
double quotes. An equal sign may not be the only character in a name or
value.
- an escaped tab is converted to an escaped blank. An escaped newline is
removed from the input stream. An escaped backslash (i.e. two
backslashes) is converted to a backslash. Inside quoted strings, only
double quotes, backslashes, dollar signs, and newlines can be escaped.
- quoted strings can be continued over several lines by ending each
incomplete line with a backslash. A single space is inserted in the
string for the line end and all spaces or tabs that follow immediately
on the next line.
- environment variables can be used by specifying them in the form
$<name> or ${<name>}. Dollar signs can be escaped.
- comments begin with a number sign and end with the next newline. All
characters (including backslashes) until the newline are ignored.

Example:

boot = $FD
install = $MNT/boot.b
map = $MNT/map
compact
read-only
append = "nfsroot=/home/linux-install/root \
nfsaddrs=128.178.156.28:128.178.156.24::255.255.255.0:lrcinst"
image = $MNT/zImage

Global options
- - - - - - -

/etc/lilo.conf begins with a possibly empty global options section. Many
global options can also be set from the command line, but storing permanent
options in the configuration file is more convenient.

The following global options are recognized:

BACKUP=<backup_file> Copy the original boot sector to <backup_file>
(which may also be a device, e.g. /dev/null) instead of
/boot/boot.<nnnn>
BOOT=<boot_device> Sets the name of the device (e.g. a hard disk
partition) that contains the boot sector. If BOOT is omitted, the boot
sector is read from (and possibly written to) the device that is
currently mounted as root.
CHANGE-RULES Defines partition type numbers. See section "Partition type
change rules" for details.
COMPACT Tries to merge read requests for adjacent sectors into a single
read request. This drastically reduces load time and keeps the map
smaller. Using COMPACT is especially recommended when booting from a
floppy disk. COMPACT may conflict with LINEAR, see section "Other
problems".
DEFAULT=<name> Uses the specified image as the default boot image. If
DEFAULT is omitted, the image appearing first in the configuration file
is used.
DELAY=<dsec> Specifies the number of _tenths_ of a second LILO should
wait before booting the first image. This is useful on systems that
immediately boot from the hard disk after enabling the keyboard. LILO
doesn't wait if DELAY is omitted or if DELAY is set to zero.
DISK=<device_name> Defines non-standard parameters for the specified
disk. See section "Disk geometry" for details.
FIX-TABLE Allows LILO to adjust 3D addresses in partition tables. Each
partition entry contains a 3D (sector/head/cylinder) and a linear
address of the first and the last sector of the partition. If a
partition is not track-aligned and if certain other operating systems
(e.g. PC/MS-DOS or OS/2) are using the same disk, they may change the
3D address. LILO can store its boot sector only on partitions where
both address types correspond. LILO re-adjusts incorrect 3D start
addresses if FIX-TABLE is set.

_WARNING:_ This does not guarantee that other operating systems may
not attempt to reset the address later. It is also possible that this
change has other, unexpected side-effects. The correct fix is to
re-partition the drive with a program that does align partitions to
tracks. Also, with some disks (e.g. some large EIDE disks with address
translation enabled), under some circumstances, it may even be
unavoidable to have conflicting partition table entries.
FORCE-BACKUP=<backup_file> Like BACKUP, but overwrite an old backup copy
if it exists. BACKUP=<backup_file> is ignored if FORCE-BACKUP appears
in the same configuration file.
IGNORE-TABLE Tells LILO to ignore corrupt partition tables and to put
the boot sector even on partitions that appear to be unsuitable for
that.
INSTALL=<boot_sector> Install the specified file as the new boot sector.
If INSTALL is omitted, /boot/boot.b is used as the default.
KEYTABLE=<table_file> Re-map the keyboard as specified in this file. See
section "Keyboard translation" for details.
LINEAR Generate linear sector addresses instead of sector/head/cylinder
addresses. Linear addresses are translated at run time and do not
depend on disk geometry. Note that boot disks may not be portable if
LINEAR is used, because the BIOS service to determine the disk geometry
does not work reliably for floppy disks. When using LINEAR with large
disks, /sbin/lilo may generate references to inaccessible disk areas
(see section "BIOS restrictions"), because 3D sector addresses are not
known before boot time. LINEAR may conflict with COMPACT, see section
"Other problems".
MAP=<map_file> Specifies the location of the map file. If MAP is
omitted, a file /boot/map is used.
MESSAGE=<message_file> Specifies a file containing a message that is
displayed before the boot prompt. No message is displayed while waiting
for a modifier key ([Shift], etc.) after printing "LILO ". In the
message, the FF character ([Ctrl L]) clears the local screen. The
size of the message file is limited to 65535 bytes. The map file has to
be rebuilt if the message file is changed or moved.
NOWARN Disables warnings about possible future dangers.
PROMPT Forces entering the boot prompt without expecting any prior
key-presses. Unattended reboots are impossible if PROMPT is set and
TIMEOUT isn't.
SERIAL=<parameters> Enables control from a serial line. The specified
serial port is initialized and LILO is accepting input from it and from
the PC's keyboard. Sending a break on the serial line corresponds to
pressing a shift key on the console in order to get LILO's attention.
All boot images should be password-protected if the serial access is
less secure than access to the console, e.g. if the line is connected
to a modem. The parameter string has the following syntax:
<port>,<bps><parity><bits>
The components <bps>, <parity> and <bits> can be omitted. If a
component is omitted, all following components have to be omitted too.
Additionally, the comma has to be omitted if only the port number is
specified.

<port> the number of the serial port, zero-based. 0 corresponds to
COM1 alias /dev/ttyS0, etc. All four ports can be used (if
present).
<bps> the baud rate of the serial port. The following baud rates are
supported: 110, 300, 1200, 2400, 4800, 9600, 19200, and 38400 bps.
Default is 2400 bps.
<parity> the parity used on the serial line. LILO ignores input
parity and strips the 8th bit. The following (upper or lower case)
characters are used to describe the parity: n for no parity, e
for even parity and o for odd parity.
<bits> the number of bits in a character. Only 7 and 8 bits are
supported. Default is 8 if parity is "none", 7 if parity is "even"
or "odd".

If SERIAL is set, the value of DELAY is automatically raised to 20.

Example: serial=0,2400n8 initializes COM1 with the default parameters.
TIMEOUT=<dsec> Sets a timeout (in tenths of a second) for keyboard
input. If no key is pressed for the specified time, the first image is
automatically booted. Similarly, password input is aborted if the user
is idle for too long. The default timeout is infinite.
VERBOSE=<level> Turns on lots of progress reporting. Higher numbers give
more verbose output. If -v is additionally specified on the command
line, <level> is increased accordingly. The following verbosity levels
exist:

<0 only warnings and errors are shown
0 prints one line for each added or skipped image
1 mentions names of important files and devices and why they are
accessed. Also displays informational messages for exceptional but
harmless conditions and prints the version number.
2 displays statistics and processing of temporary files and devices
3 displays disk geometry information and partition table change
rules
4 lists sector mappings as they are written into the map file (i.e.
after compaction, in a format suitable to pass it to the BIOS)
5 lists the mapping of each sector (i.e. before compaction, raw)

When using the -q option, the levels have a slightly different
meaning:

0 displays only image names
1 also displays all global and per-image settings
2 displays the address of the first map sector

Additionally, the kernel configuration parameters APPEND, INITRD, RAMDISK,
READ-ONLY, READ-WRITE, ROOT and VGA, and the general per-image options
FALLBACK, LOCK, OPTIONAL, PASSWORD, RESTRICTED, and SINGLE-KEY can be set
in the global options section. They are used as defaults if they aren't
specified in the configuration sections of the respective images. See below
for a description.

The plethora of options may be intimidating at first, but in "normal"
configurations, hardly any options but BOOT, COMPACT, DELAY, ROOT, and VGA
are used.
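A typical minimal /etc/lilo.conf might therefore look roughly like this
sketch (the devices and the kernel path are assumptions):

boot = /dev/hda
compact
delay = 50
vga = normal
root = /dev/hda2
read-only
image = /vmlinuz
  label = linux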

General per-image options
- - - - - - - - - - - - -

The following options can be specified for all images, independent of their
type:

ALIAS=<name> Specifies a second name for the current entry.
FALLBACK=<command_line> Specifies a string that is stored as the default
command line if the current image is booted. This is useful when
experimenting with kernels which may crash before allowing interaction
with the system. If using the FALLBACK option, the next reboot (e.g.
triggered by a manual reset or by a watchdog timer) will load a
different (supposedly stable) kernel. The command line stored by the fallback
mechanism is cleared by removing or changing the default command line
with the -R option, see "Change default command line".
LABEL=<name> By default, LILO uses the main file name (without its path)
of each image specification to identify that image. A different name
can be used by setting the variable LABEL.
LOCK Enables automatic recording of boot command lines as the defaults
for the following boots. This way, LILO "locks" on a choice until it is
manually overridden.
OPTIONAL Omit this image if its main file is not available at map
creation time. This is useful to specify test kernels that are not
always present.
PASSWORD=<password> Ask the user for a password when trying to load this
image. Because the configuration file contains unencrypted passwords
when using this option, it should only be readable for the super-user.
Passwords are always case-sensitive.
RESTRICTED Relaxes the password protection by requiring a password only
if parameters are specified on the command line (e.g. single).
RESTRICTED can only be used together with PASSWORD.
SINGLE-KEY Enables booting the image by hitting a single key, without
the need to press [Enter] afterwards. SINGLE-KEY requires that either
the image's label or its alias (or both) is a single character.
Furthermore, no other image label or alias may start with that
character, e.g. an entry specifying a label linux and an alias l is
not allowed with SINGLE-KEY. Note that you can't specify command-line
parameters for an entry for which only SINGLE-KEYed names exist.

All general per-image options, with the exception of LABEL and ALIAS, can
also be set in the global options section as defaults for all images.

Example:

password = Geheim
single-key
image = /vmlinuz
  label = linux
  alias = 1
  restricted
other = /dev/hda1
  label = dos
  alias = 2

Per-image options for kernels
- - - - - - - - - - - - - - -

Each (kernel or non-kernel) image description begins with a special
variable (see section "Booting kernel images from a device") which is
followed by optional variables. The following variables can be used for all
image descriptions that describe a Linux kernel:

APPEND=<string> Appends the options specified in <string> to the
parameter line passed to the kernel. This is typically used to specify
parameters of hardware that can't be entirely auto-detected, e.g.
append = "hd=64,32,202"
INITRD=<name> Specifies the file that will be loaded at boot time as the
initial RAM disk.
LITERAL=<string> Like APPEND, but removes all other options (e.g.
setting of the root device). Because vital options can be removed
unintentionally with LITERAL, this option cannot be set in the global
options section.
RAMDISK=<size> Specifies the size of the optional RAM disk. A value of
zero indicates that no RAM disk should be created. If this variable is
omitted, the RAM disk size configured into the boot image is used.
READ-ONLY Specifies that the root file system should be mounted
read-only. Typically, the system startup procedure re-mounts the root
file system read-write later (e.g. after fsck'ing it).
READ-WRITE specifies that the root file system should be mounted
read-write.
ROOT=<device> Specifies the device that should be mounted as root.
If the special name CURRENT is used, the root device is set to the
device on which the root file system is currently mounted. If the root
has been changed with -r , the respective device is used. If the
variable ROOT is omitted, the root device setting contained in the
kernel image is used. It can be changed with the rdev program.
VGA=<mode> Specifies the VGA text mode that should be selected when
booting. The following values are recognized (case is ignored):

NORMAL select normal 80x25 text mode.
EXTENDED select 80x50 text mode. The word EXTENDED can be
abbreviated to EXT.
ASK stop and ask for user input (at boot time).
<number> use the corresponding text mode. A list of available modes
can be obtained by booting with vga=ask and pressing [Enter].

If this variable is omitted, the VGA mode setting contained in the
kernel image is used. rdev supports manipulation of the VGA text mode
setting in the kernel image.
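As an illustration (a sketch; /vmlinuz and /dev/hda2 are assumptions, and
the exact option syntax and mode values should be checked against the rdev
man page of your system), the defaults stored in a kernel image could be
inspected and changed like this:

# rdev /vmlinuz                  (show the current root device)
# rdev /vmlinuz /dev/hda2        (set the root device)
# rdev -v /vmlinuz -1            (set the VGA mode to "normal")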

All kernel per-image options but LITERAL can also be set in the global
options section as defaults for all kernels.

If one of RAMDISK, READ-ONLY, READ-WRITE, ROOT, or VGA is omitted in the
configuration file and the corresponding value in the kernel image is
changed, LILO or the kernel will use the new value.

It is perfectly valid to use different settings for the same image, because
LILO stores them in the image descriptors and not in the images themselves.

Example:

image = /vmlinuz
  label = lin-hd
  root = /dev/hda2
image = /vmlinuz
  label = lin-fd
  root = /dev/fd0

Boot image types
----------------

LILO can boot the following types of images:

- kernel images from a file.
- kernel images from a block device. (E.g. a floppy disk.)
- the boot sector of some other operating system.

The image type is determined by the name of the initial variable of the
configuration section.

The image files can reside on any media that is accessible at boot time.
There's no need to put them on the root device, although this certainly
doesn't hurt.

Booting kernel images from a file
- - - - - - - - - - - - - - - - -

The image is specified as follows: IMAGE=<pathname>

Example:

image = /linux

See sections "Per-image options for kernels" and "Boot image types" for the
options that can be added in a kernel image section.

Booting kernel images from a device
- - - - - - - - - - - - - - - - - -

The range of sectors that should be mapped has to be specified. Either a
range ( <start>-<end> ) or a start and a distance ( <start>+<length> ) have
to be specified. <start> and <end> are zero-based. If only the start is
specified, only that sector is mapped.

The image is specified as follows: IMAGE=<device_name> Additionally, the
RANGE variable must be set.

Example:

image = /dev/fd0
  range = 1+512

All kernel options can also be used when booting the kernel from a device.

Booting a foreign operating system
- - - - - - - - - - - - - - - - -

LILO can even boot other operating systems, e.g. MS-DOS. To boot another
operating system, the name of a loader program, the device or file that
contains the boot sector and the device that contains the partition table
have to be specified.

The boot sector is merged with the partition table and stored in the map
file.

Currently, the loaders chain.b and os2_d.b exist. chain.b simply starts the
specified boot sector.* os2_d.b is a variant of chain.b that can boot OS/2
from the second hard disk. The MAP-DRIVE option has to be used with os2_d.b
to actually swap the drives.

* The boot sector is loaded by LILO's secondary boot loader before
control is passed to the code of chain.b.

The image is specified as follows: OTHER=<device_name> or OTHER=<path_name>

In addition to the options listed in section "Per-image options for
kernels", the following variables are recognized:

CHANGE Change the partition table according to the rules specified in
this CHANGE section. This option is intended for booting systems which
find their partitions by examining the partition table. See section
"Partition type changes" for details.
LOADER=<chain_loader> Specifies the chain loader that should be used. If
it is omitted, /boot/chain.b is used.
MAP-DRIVE=<bios_device_code> Instructs chain.b to install a resident
driver that re-maps the floppy or hard disk drives. This way, one can
boot any operating system from a hard disk different from the first
one, as long as that operating system uses _only_ the BIOS to access
that hard disk.* This is known to work for PC/MS-DOS.

* So you should be very suspicious if the operating system requires
any specific configuration or even drivers to use the disk it is
booted from. Since there is a general trend to use optimized
drivers to fully exploit the hardware capabilities (e.g.
non-blocking disk access), booting systems from the second disk may
become increasingly difficult.

MAP-DRIVE is followed by the variable TO=<bios_device_code> which
specifies the drive that should effectively be accessed instead of the
original one. The list of mappings is only searched until the first
match is found. It is therefore possible to "swap" drives, see the
second example below.
TABLE=<device> Specifies the device that contains the partition table.
LILO does not pass partition information to the booted operating system
if this variable is omitted. (Some operating systems have other means
to determine from which partition they have been booted. E.g. MS-DOS
usually stores the geometry of the boot disk or partition in its boot
sector.) Note that /sbin/lilo must be re-run if a partition table
referenced with TABLE is modified.
UNSAFE Do not access the boot sector at map creation time. This disables
some sanity checks, including a partition table check. If the boot
sector is on a fixed-format floppy disk device, using UNSAFE avoids the
need to put a readable disk into the drive when running the map
installer. UNSAFE and TABLE are mutually incompatible.

None of these options can be set in the global options section.

Examples:

other = /dev/hda2
  label = dos
  table = /dev/hda

other = /dev/hdb2
  label = os2
  loader = /boot/os2_d.b
  map-drive = 0x80
     to = 0x81
  map-drive = 0x81
     to = 0x80
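A floppy-based DOS entry using UNSAFE might be added along these lines (a
sketch; the device and label are assumptions):

other = /dev/fd0
  label = dosflop
  unsafe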

Disk geometry
-------------

For floppies and most hard disks, LILO can obtain the disk geometry
information from the kernel. Unfortunately, there are some exotic disks or
adapters which may either not supply this information or which may even
return incorrect information.

If no geometry information is available, LILO reports either the error
geo_query_dev HDIO_GETGEO (dev 0x<number>)
or
Device 0x<number>: Got bad geometry <sec>/<hd>/<cyl>

If incorrect information is returned, booting may fail in several ways,
typically with a partial "LILO" banner message. In this document, that is
called a "geometry mismatch".

The next step should be to attempt setting the LINEAR configuration
variable or the -l command-line option. If this doesn't help, the entire
disk geometry has to be specified explicitly. Note that LINEAR doesn't
always work with floppy disks.

Another common use of disk sections is if an (E)IDE and a SCSI drive are
used in the same system and the BIOS is configured to use the SCSI drive as
the first drive. (Normally, the (E)IDE drive would be the first drive and
the SCSI drive would be the second one.) Since LILO doesn't know how the
BIOS is configured, it needs to be told explicitly about this arrangement.
(See the second example below.)

Obtaining the geometry
- - - - - - - - - - -

The disk geometry parameters can be obtained by booting MS-DOS and running
the program DPARAM.COM with the hexadecimal BIOS code of the drive as its
argument, e.g. dparam 0x80 for the first hard disk. It displays the number
of sectors per track, the number of heads per cylinder and the number of
cylinders. All three numbers are one-based.

Alternatively, the geometry may also be determined by reading the
information presented by the "setup" section of the ROM-BIOS or by using
certain disk utilities under operating systems accessing the disk through
the BIOS.

Specifying the geometry
- - - - - - - - - - - -

Disk geometry parameters are specified in the options section of the
configuration file. Each disk parameter section begins with
DISK=<device_name>, similar to the way boot images are specified. It is
suggested to group disk parameter sections together, preferably at the
beginning or the end of the options section.

For each disk, the following variables can be specified:

BIOS=<bios_device_code> Is the number the BIOS uses to refer to that
device. Normally, it's 0x80 for the first hard disk and 0x81 for
the second hard disk. Note that hexadecimal numbers have to begin with
"0x". If BIOS is omitted, LILO tries to "guess" that number.
SECTORS=<number> and
HEADS=<number> specify the number of sectors per track and the number of
heads, i.e. the number of tracks per cylinder. Both parameters have to
be either specified together or they have to be entirely omitted. If
omitted, LILO tries to obtain that geometry information from the
kernel.
CYLINDERS=<number> Specifies the number of cylinders. This value is
only used for sanity checks. If CYLINDERS is omitted, LILO uses the
information obtained from the kernel if geometry information had to be
requested in order to determine some other parameter. Otherwise,* it
just assumes the number of cylinders to be 1024, which is the cylinder
limit imposed by the BIOS.
INACCESSIBLE Marks the device as inaccessible (for the BIOS). This is
useful if some disks on the system can't be read by the BIOS, although
LILO "thinks" they can. If one accidentally tries to use files located
on such disks for booting, the map installer won't notice and the
system becomes unbootable. The most likely use of INACCESSIBLE is to
prevent repetition after experiencing such a situation once. No other
variables may be specified if a device is configured as INACCESSIBLE.

* I.e. if the BIOS device code, the number of sectors, the number of
heads and the partition start are specified. Note that the number of
cylinders may appear to vary if CYLINDERS is absent and only some of
the partition starts are specified.

Additionally, partition subsections can be added with
PARTITION=<partition_device>. Each partition section can contain only one
variable:

START=<sector_offset> Specifies the zero-based number of the start
sector of that partition. The whole disk always has a partition offset
of zero. The partition offset is only necessary when using devices for
which the kernel does not provide that information, e.g. CD-ROMs.

Examples:

disk = /dev/sda
  bios = 0x80
  sectors = 32
  heads = 64
  cylinders = 632
  partition = /dev/sda1
    start = 2048
  partition = /dev/sda2
    start = 204800
  partition = /dev/sda3
    start = 500000
  partition = /dev/sda4
    start = 900000

disk = /dev/sda
  bios = 0x80
disk = /dev/hda
  bios = 0x81

Partition table manipulation
----------------------------

Some non-Linux operating systems obtain information about their partitions
(e.g. their equivalent of the root file system) from the partition table.
If more than one such operating system is installed on a PC, they may have
conflicting interpretations of the content of the partition table. Those
problems can be avoided by changing the partition table, depending on which
operating system is being booted.

Partition table changes are specified in a CHANGE section in the
configuration file section describing the foreign operating system. Note
that CHANGE sections are only accepted if the build-time option
REWRITE_TABLE is set.

The CHANGE section contains subsections for each partition whose table
entry needs to be modified. Partitions are specified with
PARTITION=<device_name>

Changes are applied in the sequence in which they appear in the
configuration file. Configurations containing changes that are redundant
either by repeating a previous change or by changing its result further are
invalid and refused by the map installer.

Internally, all changes are expressed as rules which specify the location
(disk and offset in the partition table), the value this location must
contain before the change, and the value that has to be stored. As a safety
measure, the rule is ignored if the previous value is found to be
different.

Partition activation
- - - - - - - - - -

This option is intended for booting systems which determine their boot
partition by examining the active flag in the partition table. The flag is
enabled with ACTIVATE and disabled with DEACTIVATE. Note that only the
current partition is affected. LILO does not automatically change the
active flags of other partitions and it also allows more than one partition
to be active at the same time.

Example:

other = /dev/sda4
  label = sco
  change
    partition = /dev/sda4
      activate
    partition = /dev/sda3
      deactivate

Partition type change rules
- - - - - - - - - - - - - -

Partition type changes are normally a transition between two possible
values, e.g. a typical convention is to set the lowest bit in the upper
nibble of the partition type (i.e. 0x10) in order to "hide", and to clear
it to "unhide" a partition. LILO performs these changes based on a set of
rules. Each rule defines the name of a partition type, its normal value,
and the value when hidden. Those rules are defined in the options section
of the configuration file. The section defining them begins with
CHANGE-RULES.

The following options and variables can appear in the section:

RESET Removes all previously defined rules. This is needed if a user
doesn't wish to use the pre-defined rules (see below).
TYPE=<name> Adds a rule for the type with the specified name. Type names
are case-insensitive. The values are defined with NORMAL=<byte> and
HIDDEN=<byte>. Values can be specified as decimal or as hexadecimal
numbers with a leading 0x . If only one of the values is present, the
other value is assumed to be the same number, but with the most
significant bit inverted.

LILO pre-defines rules for the three partition types of DOS partitions. The
following example removes the pre-defined rules and creates them again:

change-rules
  reset
  type = DOS12
    normal = 0x01
    hidden = 0x11
  type = DOS16_small
    normal = 4          # hidden is 0x14
  type = DOS16_big
    hidden = 0x16

Partition type changes
- - - - - - - - - - -

Partition type changes are specified in the partition section as
SET=<name>_<state>, where <name> is the name of the partition type, and
<state> is its state, i.e. NORMAL or HIDDEN.

Example:

other = /dev/sda3
  label = dos
  change
    partition = /dev/sda2
      set = dos16_big_normal
    partition = /dev/sda3
      activate
      set = DOS16_big_normal

Only one SET variable is allowed per partition section. In the rare event
that more than one SET variable is needed, further partition sections can
be used.

Keyboard translation
--------------------

The PC keyboard emits so-called scan codes, which are basically key
numbers. The BIOS then translates those scan codes to the character codes
of the characters printed on the key-caps. By default, the BIOS normally
assumes that the keyboard has a US layout. Once an operating system is
loaded, this operating system can use a different mapping.

At boot time, LILO only has access to the basic services provided by the
BIOS and therefore receives the character codes for a US keyboard. It
provides a simple mechanism to re-map the character codes to what is
appropriate for the actual layout.*

* The current mechanism isn't perfect, because it sits on top of the
scan code to character code translation performed by the BIOS. This
means that key combinations that don't produce any useful character on
the US keyboard will be ignored by LILO. The advantage of this approach
is its simplicity.

Compiling keyboard translation tables
- - - - - - - - - - - - - - - - - - -

LILO obtains layout information from the keyboard translation tables Linux
uses for the text console. They are usually stored in
/usr/lib/kbd/keytables. LILO comes with a program keytab-lilo.pl that reads
those tables and generates a table suitable for use by the map installer.
keytab-lilo.pl invokes the program loadkeys to print the tables in a format
that is easy to parse.*

* On some systems, only root can execute loadkeys. It is then necessary
to run keytab-lilo.pl as root too.

keytab-lilo.pl is used as follows:

keytab-lilo.pl [ -p <old_code>=<new_code> ] ...
    [ [<path>]<default_layout>[.<extension>] ]
    [<path>]<kbd_layout>[.<extension>]

-p <old_code>=<new_code>
Specifies corrections ("patches") to the mapping obtained from the
translation table files. E.g. if pressing the upper case "A" should
yield an at sign, -p 65=64 would be used. The -p option can be
repeated any number of times. The codes can also be given as
hexadecimal or as octal numbers if they are prefixed with 0x or 0,
respectively.
<path> The directory in which the file resides. The default path is
/usr/lib/kbd/keytables.
<extension> Usually the trailing .map, which is automatically added if
the file name doesn't contain dots.
<default_layout> Is the layout which specifies the translation by the
BIOS. If none is specified, us is assumed.
<kbd_layout> Is the actual layout of the keyboard.

keytab-lilo.pl writes the resulting translation table as a binary string to
standard output. Such tables can be stored anywhere with any name, but the
suggested naming convention is /boot/<layout>.ktl ("Keyboard Table for
Lilo"), where <layout> is the name of the keyboard layout.

Example:

keytab-lilo.pl de >/boot/de.ktl

Using keyboard translation tables
- - - - - - - - - - - - - - - - -

The keyboard translation table file is specified with the global
configuration option keytable=<table_file> . The complete name of the file
has to be given.

Example:

keytable = /boot/de.ktl

Installation and updates
========================

Installation
------------

This section describes the installation of LILO. See section "LILO
de-installation" for how to uninstall LILO.

Compatibility
- - - - - - -

The kernel header files have to be in /usr/include/linux and the kernel
usually has to be configured by running make config before LILO can be
compiled.

/bin/sh has to be a real Bourne shell. bash is sufficiently compatible, but
some ksh clones may cause problems.

A file named INCOMPAT is included in the distribution. It describes
incompatibilities to older versions of LILO and may also contain further
compatibility notes.

Quick installation
- - - - - - - - -

If you want to install LILO on your hard disk and if you don't want to use
all its features, you can use the quick installation script. Read QuickInst
for details.

QuickInst can only be used for first-time installations or to entirely
replace an existing installation, _not_ to update or modify an existing
installation of LILO. Be sure you've extracted LILO into a directory that
doesn't contain any files of other LILO installations.

Files
- - -

Some of the files contained in lilo-21.tar.gz:

lilo/README
This documentation in plain ASCII format. Some sections containing
complex tables are only included in the LaTeX version in doc/user.tex
lilo/INCOMPAT
List of incompatibilities to previous versions of LILO.
lilo/CHANGES
Change history.
lilo/VERSION
The version number of the respective release.
lilo/QuickInst
Quick installation script.
lilo/lilo-<version>.lsm
The LSM ("Linux Software Map") entry of the respective LILO release.
lilo/Makefile
Makefile to generate everything else.
lilo/*.c, lilo/*.h
LILO map installer C source and common header files.
lilo/*.S
LILO boot loader assembler source.
lilo/activate.c
C source of a simple boot partition setter.
lilo/dparam.s
Assembler source of a disk parameter dumper.
lilo/mkdist
Shell script used to create the current LILO distribution.
lilo/keytab-lilo.pl
Perl script to generate keyboard translation tables.
lilo/doc/README
Description of how to generate the documentation.
lilo/doc/Makefile
Makefile used to convert the LaTeX source into either DVI output or
the plain ASCII README file.
lilo/doc/user.tex
LaTeX source of LILO's user's guide (this document).
lilo/doc/tech.tex
LaTeX source of LILO's technical overview.
lilo/doc/*.fig
Various xfig pictures used in the technical overview.
lilo/doc/fullpage.sty
Style file to save a few square miles of forest.
lilo/doc/rlatex
Shell script that invokes LaTeX repeatedly until all references have
settled.
lilo/doc/t2a.pl
Perl script to convert the LaTeX source of the user's guide to plain
ASCII.

Files created after make in lilo/ (among others):

lilo/boot.b
Combined boot sector. make install puts this file into /boot
lilo/chain.b
Generic chain loader. make install puts this file into /boot
lilo/os2_d.b
Chain loader to load OS/2 from the second hard disk. make install
puts this file into /boot
lilo/lilo
LILO (map) installer. make install puts this file into /sbin
lilo/activate
Simple boot partition setter.
lilo/dparam.com
MS-DOS executable of the disk parameter dumper.

Normal first-time installation
- - - - - - - - - - - - - - -

First, you have to install the LILO files:

- extract all files from lilo-<version>.tar.gz in a new directory.*
- configure the Makefile (see section "Build-time configuration")
- run make to compile and assemble all parts.
- run make install to copy all LILO files to the directories where
they're installed. /sbin should now contain the file lilo, /usr/sbin
should contain keytab-lilo.pl, and /boot should contain boot.b,
chain.b, and os2_d.b.

* E.g. /usr/src/lilo

If you want to use LILO on a non-standard disk, you might have to determine
the parameters of your disk(s) and specify them in the configuration file.
See section "Disk geometry" for details. If you're using such a
non-standard system, the next step is to test LILO with the boot sector on
a floppy disk:

- insert a blank (but low-level formatted) floppy disk into /dev/fd0.
- run echo image=<name of kernel image> |
/sbin/lilo -C - -b /dev/fd0 -v -v -v
If you've already installed LILO on your system, you might not want to
overwrite your old map file. Use the -m option to specify an
alternate map file name.
- reboot. LILO should now load its boot loaders from the floppy disk and
then continue loading the kernel from the hard disk.

Now you have to decide which boot concept you want to use. Let's assume
you have a Linux partition on /dev/hda2 and you want to install your LILO
boot sector there. The DOS-MBR loads the LILO boot sector.

- get a working boot disk, e.g. an install or recovery disk. Verify that
you can boot with this setup and that you can mount your Linux
partition(s) with it.
- if the boot sector you want to overwrite with LILO is of any value
(e.g. it's the MBR or if it contains a boot loader you might want to
use if you encounter problems with LILO), you should mount your boot
disk and make a backup copy of your boot sector to a file on that
floppy, e.g. dd if=/dev/hda of=/fd/boot_sector bs=512 count=1
- create the configuration file /etc/lilo.conf, e.g. (a minimal example is
sketched after this list)

...
Be sure to use absolute paths for all files. Relative paths may cause
unexpected behaviour when using the -r option.
- now, you can check what LILO would do if you were about to install it
on your hard disk:
/sbin/lilo -v -v -v -t
- if you need some additional boot utility (i.e. BOOTACTV), you should
install that now
- run /sbin/lilo to install LILO on your hard disk
- if you have to change the active partition, use fdisk or activate to do
that
- reboot
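
As a sketch of the configuration file mentioned in the list above: for the
scenario described (boot sector on the Linux partition /dev/hda2), a minimal
/etc/lilo.conf might look like the following. The kernel path and device
names are assumptions and must be adapted to your system:

boot = /dev/hda2
map = /boot/map
install = /boot/boot.b
prompt
timeout = 50
image = /boot/vmlinuz
    label = linux
    root = /dev/hda2
    read-only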

Build-time configuration
- - - - - - - - - - - -

Certain build-time parameters can be configured. They can either be edited
in the top-level Makefile or they can be stored in a file
/etc/lilo.defines. Settings in the Makefile are ignored if that file
exists.

The following items can be configured:

BEEP Enables beeping after displaying "LILO". This is useful on
machines which don't beep at the right time when booting and when
working over a serial console. This option is disabled by default.
IGNORECASE Makes image name matching case-insensitive, i.e. "linux"
and "Linux" are identical. This option is enabled by default. Note that
password matching is always case-sensitive.
LARGE_EBDA Loads LILO at a lower address in order to leave more space
for the EBDA (Extended BIOS Data Area). This is necessary on some
recent MP systems. Note that enabling LARGE_EBDA reduces the maximum
size of "small" images (e.g. "Image" or "zImage").
NO1STDIAG Do not generate diagnostics on read errors in the first
stage boot loader. This avoids possibly irritating error codes if the
disk controller has transient read problems. This option is disabled by
default.
NODRAIN The boot loader empties the keyboard buffer when starting,
because it may contain garbage on some systems. Draining the keyboard
buffer can be disabled by setting the NODRAIN option. NODRAIN is
disabled by default.
NOINSTDEF If the option INSTALL is omitted, don't install a new boot
sector, but try to modify the old boot sector instead. This option is
disabled by default.
ONE_SHOT Disables the command-line timeout (configuration variable
TIMEOUT) if any key is pressed. This way, very short timeouts can be
used if PROMPT is set. ONE_SHOT is disabled by default.
READONLY Disallows overwriting the default command line sector of the
map file. This way, command lines set with -R stay in effect until
they are explicitly removed. READONLY also disables LOCK, FALLBACK,
and everything enabled by REWRITE_TABLE . This option is disabled by
default.
REWRITE_TABLE Enables rewriting the partition table at boot time. This
may be necessary to boot certain operating systems who expect the
active flag to be set on their partition or who need changes in
partition types. See also section "Partition table manipulation". This
option is _dangerous_ and it is disabled by default.
USE_TMPDIR Use the directory indicated in the TMPDIR environment
variable when creating temporary device files. If TMPDIR is not set
or if LILO is compiled without USE_TMPDIR , temporary device files are
created in /tmp.* This option is disabled by default.
VARSETUP Enables the use of variable-size setup segments. This option
is enabled by default and is only provided to fall back to fixed-size
setup segments in the unlikely case of problems when using prehistoric
kernels.
XL_SECS=<sectors> Enables support for extra large (non-standard) floppy
disks. The number of sectors is set in the BIOS disk parameter table to
the specified value. Note that this hack may yield incorrect behaviour
on some systems. This option is disabled by default.

* Note that, while honoring TMPDIR is the "right" thing to do, the
fact that LILO has to create temporary device files at all may indicate
that the operating environment is not completely set up, so TMPDIR
may point to an invalid location.

/etc/lilo.defines should be used if one wishes to make permanent
configuration changes. The usual installation procedures don't touch that
file. Example:

-DIGNORECASE -DONE_SHOT

After changing the build-time configuration, LILO has to be recompiled with
the following commands:

make spotless
make

Floppy disk installation
- - - - - - - - - - - -

In some cases*, it may be desirable to install LILO on a floppy disk in a
way that it can boot a kernel without accessing the hard disk.

* E.g. if no hard disk is accessible through the BIOS.

The basic procedure is quite straightforward (see also section "BIOS
restrictions"):

- a file system has to be created on the floppy disk
- the kernel and boot.b have to be copied to the floppy disk
- /sbin/lilo has to be run to create the map file

This can be as easy as

/sbin/mke2fs /dev/fd0
[ -d /fd ] || mkdir /fd
mount /dev/fd0 /fd
cp /boot/boot.b /fd
cp /zImage /fd
echo image=/fd/zImage label=linux |
/sbin/lilo -C - -b /dev/fd0 -i /fd/boot.b -c -m /fd/map
umount /fd

The command line of /sbin/lilo is a little tricky. -C - takes the
configuration from standard input (naturally, one could also write the
configuration into a file), -b /dev/fd0 specifies that the boot sector is
written to the floppy disk, -i /fd/boot.b takes the first and second
stage loaders from the floppy, -c speeds up the load process, and -m
/fd/map puts the map file on the floppy too.

Updates
-------

LILO is affected by updates of kernels, the whole system and (trivially) of
LILO itself. Typically, only /sbin/lilo has to be run after any of those
updates and everything will be well again (at least as far as LILO is
concerned).

LILO update
- - - - - -

Before updating to a new version of LILO, you should read at least the file
INCOMPAT which describes incompatibilities with previous releases.

After that, the initial steps are the same as for a first time
installation: extract all files, configure the Makefile, run make to
build the executables and run make install to install the files.

The old versions of boot.b, chain.b, etc. are automatically renamed to
boot.old, chain.old, etc. This is done to ensure that you can boot even if
the installation procedure does not finish. boot.old, chain.old, etc. can
be deleted after the map file is rebuilt.

Because the locations of boot.b, chain.b, etc. have changed and because the
map file format may be different too, you have to update the boot sector
and the map file. Run /sbin/lilo to do this.

Kernel update
- - - - - - -

Whenever any of the kernel files that are accessed by LILO is moved or
overwritten, the map has to be re-built.* Run /sbin/lilo to do this.

* It is advisable to keep a second, stable, kernel image that can be
booted if you forget to update the map after a change to your usual
kernel image.

The kernel has a make target "zlilo" that copies the kernel to /vmlinuz and
runs /sbin/lilo.
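
For example (assuming the kernel source tree lives in /usr/src/linux):

cd /usr/src/linux
make zlilo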

System upgrade
- - - - - - -

Normally, system upgrades (i.e. installation or removal of packages,
possibly replacement of a large part of the installed binaries) do not
affect LILO. Of course, if a new kernel is installed in the process, the
normal kernel update procedure has to be followed (see section "Kernel
update"). Also, if kernels are removed or added, it may be necessary to
update the configuration file.

If LILO is updated by this system upgrade, /sbin/lilo should be run before
booting the upgraded system. It is generally a good idea not to rely on the
upgrade procedure to perform this essential step automatically.

However, system upgrades which involve removal and re-creation of entire
partitions (e.g. /, /usr, etc.) are different. First, they should be
avoided, because they bear a high risk of losing other critical files, e.g.
the /etc/XF86Config you've spent the last week fiddling with. If an upgrade
really has to be performed in such a brute-force way, this is equivalent to a
total removal of LILO, followed by a new installation. Therefore, the
procedures described in the sections "LILO de-installation" and "LILO
update" have to be performed. If you've forgotten to make a backup copy of
/etc/lilo.conf before the destructive upgrade, you might also have to go
through section "Normal first-time installation" again.

LILO de-installation
--------------------

In order to stop LILO from being invoked when the system boots, its boot
sector has to be either removed or disabled. All other files belonging to
LILO can be deleted _after_ removing the boot sector, if desired.*

* Backup copies of old boot sectors may be needed when removing the boot
sector. They are stored in /boot.

Again, _when removing Linux, LILO must be de-installed before (!) its files
(/boot, etc.) are deleted._ This is especially important if LILO is
operating as the MBR.

LILO 14 (and newer) can be de-installed with lilo -u. If LILO 14 or newer
is currently installed, but the first version of LILO installed was older
than 14, lilo -U may work. When using -U, the warning at the end of this
section applies.

If LILO's boot sector has been installed on a primary partition and is
booted by the "standard" MBR or some partition switcher program, it can be
disabled by making a different partition active. MS-DOS' FDISK, Linux fdisk
or LILO's activate can do that.

If LILO's boot sector is the master boot record (MBR) of a disk, it has to
be replaced with a different MBR, typically MS-DOS' "standard" MBR. When
using MS-DOS 5.0 or above, the MS-DOS MBR can be restored with FDISK /MBR.
This only alters the boot loader code, not the partition table.

LILO automatically makes backup copies when it overwrites boot sectors.
They are named /boot/boot.<device number>, where the device number is given
in hex, e.g. 0300 is /dev/hda, 0800 is /dev/sda, etc. Those backups can
be used to restore the old MBR if no easier method is available. The
commands are
dd if=/boot/boot.0300 of=/dev/hda bs=446 count=1 or
dd if=/boot/boot.0800 of=/dev/sda bs=446 count=1
respectively.

_WARNING:_ Some Linux distributions install boot.<device number> files from the
system where the distribution was created. Using those files may yield
unpredictable results. Therefore, the file creation date should be
carefully checked.

Installation of other operating systems
---------------------------------------

Some other operating systems (e.g. MS-DOS 6.0) appear to modify the MBR in
their install procedures. It is therefore possible that LILO will stop
working after such an installation and Linux has to be booted from floppy
disk. The original state can be restored by either re-running /sbin/lilo
(if LILO is installed as the MBR) or by making LILO's partition active (if
it's installed on a primary partition).

It is generally a good idea to install LILO after the other operating
systems have been installed. E.g. OS/2 is said to cause trouble when
attempting to add it to an existing Linux system. (However, booting from
floppy and running /sbin/lilo should get around most interferences.)

Typically, the new operating system then has to be added to LILO's
configuration (and /sbin/lilo has to be re-run) in order to boot it.

See also section "Other problems" for a list of known problems with some
other operating systems.

Troubleshooting
===============

All parts of LILO display some messages that can be used to diagnose
problems.

Map installer warnings and errors
---------------------------------

Most messages of the map installer (/sbin/lilo) should be self-explanatory.
Some messages that indicate common errors are listed below. They are
grouped into fatal errors and warnings (non-fatal errors).

Fatal errors
- - - - - -

Boot sector of <device name> doesn't have a boot signature
Boot sector of <device name> doesn't have a LILO signature
The sector from which LILO should be uninstalled doesn't appear to be
a LILO boot sector.
Can't put the boot sector on logical partition
An attempt has been made to put LILO's boot sector on the current root
file system partition which is on a logical partition. This usually
doesn't have the desired effect, because common MBRs can only boot
primary partitions. This check can be bypassed by explicitly specifying
the boot partition with the -b option or by setting the configuration
variable BOOT.
Checksum error
The descriptor table of the map file has an invalid checksum. Refresh
the map file _immediately_ !
Device 0x<number>: Configured as inaccessible.
There is a DISK section entry indicating that the device is
inaccessible from the BIOS. You should check carefully that all files
LILO tries to access when booting are on the right device.
Device 0x<number>: Got bad geometry <sectors>/<heads>/<cylinders>
The device driver for your SCSI controller does not support geometry
detection. You have to specify the geometry explicitly (see section
"Disk geometry").
Device 0x<number>: Invalid partition table, entry <number>
The 3D and linear addresses of the first sector of the specified
partition don't correspond. This is typically caused by partitioning a
disk with a program that doesn't align partitions to tracks and later
using PC/MS-DOS or OS/2 on that disk. LILO can attempt to correct the
problem, see "General per-image options".
Device 0x<number>: Partition type 0x<type> does not seem suitable for
a LILO boot sector
The location where the LILO boot sector should be placed does not seem
to be suitable for that. (See also section "Disk organization").
You should either adjust the partition type to reflect the actual use
or put the boot sector on a different partition. This consistency check
only yields a warning (i.e. LILO continues) if the option IGNORE-TABLE
is set.
<device name> is not a valid partition device
The specified device is either not a device at all, a whole disk, or a
partition on a different disk than the one in whose section its entry
appears.
<device name> is not a whole disk device
Only the geometry of whole disks (e.g. /dev/hda, /dev/sdb, etc.) can
be redefined when using DISK sections.
DISKTAB and DISK are mutually exclusive
You cannot use a disktab file and disk geometry definitions in the
configuration file at the same time. Maybe /etc/disktab was
accidentally used, because that's the default for
backward-compatibility. You should delete /etc/disktab after completing
the transition to DISK sections.
Duplicate entry in partition table
A partition table entry appears twice. The partition table has to be
fixed with fdisk.
Duplicate geometry definition for <device name>
A disk or partition geometry definition entry for the same device
appears twice in the configuration file. Note that you mustn't write a
partition section for the whole disk - its start sector is always the
first sector of the disk.
First sector of <device name> doesn't have a valid boot signature
The first sector of the specified device does not appear to be a valid
boot sector. You might have confused the device name.*
geo_comp_addr: Cylinder beyond end of media (<number>)
A file block appears to be located beyond the last cylinder of the
disk. This probably indicates an error in the disk geometry
specification (see section "Disk geometry") or a file system
corruption.
geo_comp_addr: Cylinder number is too big (<number> > 1023)
Blocks of a file are located beyond the 1024th cylinder of a hard
disk. LILO can't access such files, because the BIOS limits cylinder
numbers to the range 0...1023. Try moving the file to a different
place, preferably a partition that is entirely within the first 1024
cylinders of the disk.
Hole found in map file (<location>)
The map installer is confused about the disk organization. Please
report this error.
<item> doesn't have a valid LILO signature
The specified item has been located, but is not part of LILO.
<item> has an invalid stage code (<number>)
The specified item has probably been corrupted. Try re-building LILO.
<item> is version <number>. Expecting version <number>.
The specified entity is either too old or too new. Make sure all parts
of LILO (map installer, boot loaders and chain loaders) are from the
same distribution. **
Kernel is too big
The kernel image (without the setup code) is bigger than 512 kbytes
(or 448 kbytes, if built with LARGE_EBDA). LILO would overwrite
itself when trying to load such a kernel. This limitation only applies
to old kernels which are loaded below 0x10000 (e.g. "Image" or
"zImage"). Try building the kernel with "bzImage". If this is
undesirable for some reason, try removing some unused drivers and
compiling the kernel again. This error may also occur if the kernel
image is damaged or if it contains trailing "junk", e.g. as the result
of copying an entire boot floppy to the hard disk.
LOCK and FALLBACK are mutually exclusive
Since LOCK and FALLBACK both change the default command line, they
can't be reasonably used together.
Map is not a regular file.
This is probably the result of an attempt to omit writing a map file,
e.g. with -m /dev/null . The -t option should be used to accomplish
this.
Must specify SECTORS and HEADS together
It is assumed that disks with a "strange" number of sectors will also
have a "strange" number of heads. Therefore, it's all or nothing.
No geometry variables allowed if INACCESSIBLE
If a device is configured as INACCESSIBLE (see section "Specifying the
geometry"), its DISK section must not contain any geometry variables.
No image is defined
The command line specified either with the -R option or with
FALLBACK does not contain the name of a valid image. Note that optional
images which have not been included in the map file are not considered
as valid.
Partition entry not found
The partition from which another operating system should be booted
isn't listed in the specified partition table. This either means that
an incorrect partition table has been specified or that you're trying
to boot from a logical partition. The latter usually doesn't work. You
can bypass this check by omitting the partition table specification
(e.g. omitting the variable TABLE).
Single-key clash: "<name>" vs. "<name>"
The specified image labels or aliases conflict because one of them is
a single character and has the SINGLE-KEY option set, and the other
name begins with that character.
Sorry, don't know how to handle device 0x<number>
LILO uses files that are located on a device for which there is no
easy way to determine the disk geometry. Such devices have to be
explicitly described, see section "Disk geometry".
This LILO is compiled READONLY and doesn't support ...
If LILO is not allowed to write to the disk at boot time (see section
"Build-time configuration"), options like LOCK and FALLBACK are
unavailable.
This LILO is compiled without REWRITE_TABLE and doesn't support ...
If LILO is not allowed to rewrite partition tables at boot time (see
section "Partition table manipulation"), options like ACTIVATE and SET
(in a CHANGE section) are unavailable. You may also get this error if
LILO is compiled with READONLY enabled.
Timestamp in boot sector of <device> differs from date of <backup file>
The backup copy of the boot sector does not appear to be an ancestor
of the current boot sector. If you are absolutely sure that the boot
sector is indeed correct, you can bypass this check by using -U
instead of -u .
Trying to map files from unnamed device 0x<number> (NFS ?)
This is probably the same problem as described below, only with the
root file system residing on NFS.
Trying to map files from your RAM disk. Please check -r option or ROOT
environment variable.
Most likely, you or some installation script is trying to invoke LILO
in a way that some of the files it has to access reside on the RAM
disk. Normally, the ROOT environment variable should be set to the
mount point of the effective root device if installing LILO with a
different root directory. See also sections "Create or update map" and
"Normal first-time installation".
VGA mode presetting is not supported by your kernel.
Your kernel sources appear to be very old ('93 ?). LILO may work on
your system if you remove the VGA option.
write <file name>: <error reason>
The disk is probably full or mounted read-only.

* Because different partition programs may display the partitions in a
different order, it is possible that what you think is your first
partition isn't /dev/hda1, etc. A good method to verify the content of
a partition is to try to mount it.

** The expected version number may be different from the version number
of the LILO package, because file version numbers are only increased
when the file formats change.

Warnings
- - - -

Messages labeled with "Warning" can be turned off with the NOWARN option.

FIGETBSZ <file name>: <error_reason>
The map installer is unable to determine the block size of a file
system. It assumes a block size of two sectors (1kB).
Ignoring entry '<variable name>'
The command-line option corresponding to the specified variable is
set. Therefore, the configuration file entry is ignored.
Setting DELAY to 20 (2 seconds)
Because accidentally booting the wrong kernel or operating system may
be very inconvenient on systems that are not run from a local display,
the minimum delay is two seconds if the SERIAL variable is set.
(temp) <file name>: <error reason>
Deleting a temporary file has failed for the specified reason.
Warning: BIOS drive 0x<number> may not be accessible
Because most BIOS versions only support two floppies and two hard
disks, files located on additional disks may be inaccessible. This
warning indicates that some kernels or even the whole system may be
unbootable.
Warning: COMPACT may conflict with LINEAR on some systems
Please see section "Other problems" for a description of this problem.
Warning: <config file> should be owned by root
In order to prevent users from compromising system integrity, the
configuration file should be owned by root and write access for all
other users should be disabled.
Warning: <config file> should be readable only for root if using
PASSWORD
Users should not be allowed to read the configuration file when using
the PASSWORD option, because then, it contains unencrypted passwords.
Warning: <config file> should be writable only for root
See "Warning: <config file> should be owned by root".
Warning: device 0x<number> exceeds 1024 cylinder limit
A disk or partition exceeds the 1024 cylinder limit imposed by the
BIOS. This may result in a fatal error in the current installation run
or in later installation runs. See "geo_comp_addr: Cylinder number is
too big (<number> > 1023)" for details.
Warning: <device> is not on the first disk
The specified partition is probably not on the first disk. LILO's boot
sector can only be booted from the first disk unless some special boot
manager is used.
WARNING: The system is unbootable !
One of the last installation steps has failed. This warning is
typically followed by a fatal error describing the problem.

Boot loader messages
--------------------

The boot loader generates three types of messages: progress and error
messages while it is loading, messages indicating disk access errors, and
error messages in response to invalid command-line input. Since messages of
the latter type are usually self-explanatory, only the two other categories
are explained.

LILO start message
- - - - - - - - -

When LILO loads itself, it displays the word "LILO". Each letter is printed
before or after performing some specific action. If LILO fails at some
point, the letters printed so far can be used to identify the problem. This
is described in more detail in the technical overview.

Note that some hex digits may be inserted after the first "L" if a
transient disk problem occurs. Unless LILO stops at that point, generating
an endless stream of error codes, such hex digits do not indicate a severe
problem.

(nothing) No part of LILO has been loaded. LILO either isn't installed
or the partition on which its boot sector is located isn't active.
L <error> ... The first stage boot loader has been loaded and started,
but it can't load the second stage boot loader. The two-digit error
codes indicate the type of problem. (See also section "Disk error
codes".) This condition usually indicates a media failure or a geometry
mismatch (e.g. bad disk parameters, see section "Disk geometry").
LI The first stage boot loader was able to load the second stage boot
loader, but has failed to execute it. This can either be caused by a
geometry mismatch or by moving /boot/boot.b without running the map
installer.
LIL The second stage boot loader has been started, but it can't load
the descriptor table from the map file. This is typically caused by a
media failure or by a geometry mismatch.
LIL? The second stage boot loader has been loaded at an incorrect
address. This is typically caused by a subtle geometry mismatch or by
moving /boot/boot.b without running the map installer.
LIL- The descriptor table is corrupt. This can either be caused by a
geometry mismatch or by moving /boot/map without running the map
installer.
LILO All parts of LILO have been successfully loaded.

Disk error codes
- - - - - - - -

If the BIOS signals an error when LILO is trying to load a boot image, the
respective error code is displayed. The following BIOS error codes are
known:

0x00 "Internal error". This code is generated by the sector read
routine of the LILO boot loader whenever an internal inconsistency is
detected. This might be caused by corrupt files. Try re-building the
map file. Another possible cause for this error are attempts to access
cylinders beyond 1024 while using the LINEAR option. See section "BIOS
restrictions" for more details and for how to solve the problem.
0x01 "Illegal command". This shouldn't happen, but if it does, it may
indicate an attempt to access a disk which is not supported by the
BIOS. See also "Warning: BIOS drive 0x<number> may not be accessible"
in section "Warnings".
0x02 "Address mark not found". This usually indicates a media problem.
Try again several times.
0x03 "Write-protected disk". This should only occur on write
operations.
0x04 "Sector not found". This typically indicates a geometry mismatch.
If you're booting a raw-written disk image, verify whether it was
created for disks with the same geometry as the one you're using. If
you're booting from a SCSI disk or a large IDE disk, you should check,
whether LILO has obtained correct geometry data from the kernel or
whether the geometry definition corresponds to the real disk geometry.
(See section "Disk geometry".) Removing COMPACT may help too. So may
adding LINEAR.
0x06 "Change line active". This should be a transient error. Try
booting a second time.
0x07 "Invalid initialization". The BIOS failed to properly initialize
the disk controller. You should control the BIOS setup parameters. A
warm boot might help too.
0x08 "DMA overrun". This shouldn't happen. Try booting again.
0x09 "DMA attempt across 64k boundary". This shouldn't happen. Try
omitting the COMPACT option.
0x0C "Invalid media". This shouldn't happen and might be caused by a
media error. Try booting again.
0x10 "CRC error". A media error has been detected. Try booting several
times, running the map installer a second time (to put the map file at
some other physical location or to write "good data" over the bad
spot), mapping out the bad sectors/tracks and, if all else fails,
replacing the media.
0x11 "ECC correction successful". A read error occurred, but was
corrected. LILO does not recognize this condition and aborts the load
process anyway. A second load attempt should succeed.
0x20 "Controller error". This shouldn't happen.
0x40 "Seek failure". This might be a media problem. Try booting again.
0x80 "Disk timeout". The disk or the drive isn't ready. Either the
media is bad or the disk isn't spinning. If you're booting from a
floppy, you might not have closed the drive door. Otherwise, trying to
boot again might help.
0xBB "BIOS error". This shouldn't happen. Try booting again. If the
problem persists, removing the COMPACT option or adding/removing LINEAR
might help.

If the error occurred during a write operation, the error code (two hex
digits) is prefixed with a "W". Although write errors don't affect the boot
process, they might indicate a severe problem, because they usually imply
that LILO has tried to write to an invalid location. If spurious write
errors occur on a system, it might be a good idea to configure LILO to run
read-only (see section "Build-time configuration").

Generally, invalid geometry and attempts to use more than two disks without
a very modern BIOS may yield misleading error codes. Please check carefully
if /sbin/lilo doesn't emit any warnings. Then try using the LINEAR option
(see section "Global options").

Other problems
--------------

This section contains a collection of less common problems that have been
observed. See also section "Installation of other operating systems" for
general remarks on using LILO with other operating systems. Some of the
problems are obscure and so are the work-arounds.

- If LILO doesn't go away even if you erase its files, format your Linux
partition, etc., you've probably installed LILO as your MBR and you've
forgotten to deinstall it before deleting its files. See section "LILO
de-installation" for what you can do now.
- For yet unknown reasons, LILO may fail on some systems with AMI BIOS if
the "Hard Disk Type 47 RAM area" is set to "0:300" instead of "DOS 1K".
- Some disk controller BIOSes perform disk geometry/address translations
that are incompatible with the way the device's geometry is seen from
Linux, i.e. without going through the BIOS. Particularly, large IDE
disks and some PCI SCSI controllers appear to have this problem. In
such cases, either the translated geometry has to be specified in a
DISK section or the sector address translation can be deferred by using
the LINEAR option. In a setup where floppies are not normally used for
booting, the LINEAR approach should be preferred, because this avoids
the risk of specifying incorrect numbers.
- OS/2 is said to be bootable from a logical partition with LILO acting
as the primary boot selector if LILO is installed on the MBR, the OS/2
BootManager is on an active primary partition and LILO boots
BootManager. Putting LILO on an extended partition instead is said to
crash the OS/2 FDISK in this scenario.

Note that booting LILO from BootManager (so BootManager is the primary
selector) or booting OS/2 directly from a primary partition (without
BootManager) should generally work. See also section "Installation of
other operating systems".
- Windows NT is reported to be bootable with LILO when LILO acts as the
MBR and the Windows NT boot loader is on the DOS partition. However,
NT's disk manager complains about LILO's MBR when trying to edit the
partition table.
- Some PC UNIX systems (SCO and Unixware have been reported to exhibit
this problem) depend on their partition being active. See section
"Partition table manipulation" for how this can be accomplished.
- Future Domain TMC-1680 adapters with the BIOS versions 3.4 and 3.5
assign BIOS device numbers in the wrong order, e.g. on a two-disk
system, /dev/sda becomes 0x81 and /dev/sdb becomes 0x80 . This can
be fixed with the following DISK section:
disk=/dev/sda bios=0x81
disk=/dev/sdb bios=0x80
Note that this is only valid for a two-disk system. In three-disk
systems, /dev/sdc would become 0x80 , etc. Also, single-disk systems
don't have this problem (and the "fix" would break them).
- Some BIOSes don't properly recognize disks with an unusual partition
table (e.g. without any partition marked active) and refuse to boot
from them. This can also affect the second hard disk and the problem
may only occur if the system is booted in a particular way (e.g. only
after a cold boot).
- On some systems, using LINEAR and COMPACT together leads to a boot
failure. The exact circumstances under which this happens are still
unknown.
- If the kernel crashes after booting on a multi-processor system, LILO
may have overwritten data structures set up by the BIOS. Try the option
LARGE_EBDA in this case.

Recover Data from a dead hard drive using ddrescue

Like dd, dd_rescue copies data from one file or block device to another. dd_rescue is a tool to help you save data from a crashed partition: it tries to read, and if a read fails, it goes on with the next sectors, where tools like dd would give up. If the copying process is interrupted by the user, it is possible to continue at any position later. It can also copy backwards.
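
A typical invocation (the device and target path here are only illustrative)
copies a failing partition into an image file on good storage:

dd_rescue -v /dev/hda1 /mnt/backup/hda1.img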


dd_rescue Advantages

NAT solution with QOS

15.10. Example of a full nat solution with QoS

Here I'm describing a common setup where we have lots of users in a private network connected to the Internet through a Linux router with a public IP address that is doing network address translation (NAT). I use this QoS setup to give Internet access to 198 users in the university dorm where I live and act as netadmin. The users here make heavy use of peer-to-peer programs, so proper traffic control is a must.

Writing Custom udev Rules



Here's a link to writing udev rules; udev can be used to keep a device name persistent so it doesn't change.

For example, we use st0; add another st device and suddenly st0 becomes st1. To prevent this problem, we give st0 a persistent name, /dev/exabyte, i.e.:



BUS="scsi", SYSFS{model}="EXB-8500-85Qanx0", NAME="exabyte", OWNER="root", GROUP="blaster", MODE="0664"

To get the SYSFS{model} variable, go to /sys/class/scsi_tape or run: udevinfo -a -p /sys/class/scsi_tape/st2.

After writing rules, run udevtest /sys/class/scsi_tape/st2, and to activate the change, run: udevstart

rpm notes

rpm checksum:
rpm --verify packagename

list of files in rpm package:
rpm -qlp httpd-2.0.53-3.3.i386.rpm

extract rpm:
cd /tmp
mkdir rpmtmp
cd rpmtmp
rpm2cpio {FullNameOfRPM} | cpio -iumd {FullNameOfFileToExtract}
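
For example, to pull just the main config file out of the httpd package
mentioned above (the path inside the package is an assumption; cpio member
names start with "./"):

rpm2cpio httpd-2.0.53-3.3.i386.rpm | cpio -iumd ./etc/httpd/conf/httpd.conf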

Iptables and limiting traffic

Configuring the Quality of Service Classes for Limiting Peer-to-Peer Clients
Enter the Quality of Service menu and choose the option to configure Quality of Service rules. Enter the following commands in the router's Quality of Service configuration file:

#Set up the Bandwidth Management for eth0
#(traffic transmitted TO users coming FROM the WAN)
tc qdisc add dev eth0 root handle 1: htb

# Setup the root class for all Traffic classes
# rate is set to 100 Mbps.
#The 'ceil' ceiling value is omitted,
#and will be set to the same value as the rate (100 Mbps)
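
The root class command itself is not shown above; with standard htb syntax
it would look roughly like this (the 100 Mbps rate follows the comment
above):

tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit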

Kickstart Notes

Performing a Kickstart


  • Boot the machine from a Red Hat CD or floppy

  • At the boot prompt type:
    linux ks=nfs:lcs.dst.ca.boeing.com:/kickstart/kickstart-file

  • Or on internal Network:

Request Tracker Installation

Intro
RT, the helpdesk system, uses MySQL, Apache, and sendmail together.
RTFM (Request Tracker FAQ Manager) is an add-on to RT that lets you write documentation quickly and easily.

Mailing List/Request Tracker Web Page
http://lists.bestpractical.com/cgi-bin/htsearch
http://www.bestpractical.com/

Requirements
Perl
Mysql
Apache with mod_perl
sendmail

Installation Instructions

SSL Cert Quicknotes

Very brief introduction to create a CA and a CERT

To make certificate authority:

mkdir CA
cd CA
mkdir certs crl newcerts private
echo "01" > serial
cp /dev/null index.txt
cp /usr/local/openssl/openssl.cnf.sample openssl.cnf
vi openssl.cnf (set values)
openssl req -new -x509 -keyout private/cakey.pem -out cacert.pem -days 365 -config openssl.cnf

To make a new certificate:

cd CA (same directory created above)
openssl req -nodes -new -x509 -keyout newreq.pem -out newreq.pem -days 365 -config openssl.cnf
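
The command above produces a self-signed certificate (because of -x509). If
you instead want a certificate signed by the CA created earlier, a common
pattern (the file names here are assumptions) is to generate a plain request
and sign it with openssl ca:

openssl req -nodes -new -keyout newkey.pem -out newreq.pem -config openssl.cnf
openssl ca -config openssl.cnf -out newcert.pem -infiles newreq.pem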

SQL Quicknotes

# Mysql interaction subroutine
# NOTE: the original note ended after the "Parse date" comment; the INSERT
# below is a sketch based on the example statements in the comments.
use DBI;

sub sql($$$$$$)
{
my ($username, $computer, $service, $date, $status, $rawline) = @_;

my $dbh = DBI->connect("DBI:mysql:authlog", "authlog");

# create table LOG (date Timestamp, service Char(40), computer Char(40), username Char(40), status Char(40));
# INSERT into LOG VALUES ({ts '2002-06-04 13:25:45'}, 'ssh', 'maxwell', 'bhatt', 'logon');
# select * FROM LOG where {ts '2002-06-05 00:00:00'} < {ts '2002-06-04 00:00:00'};
# delete from LOG where username = 'lawrence';

# Parse date (parsing omitted here; $date is inserted as passed in)

# Insert the record; bind values avoid quoting problems
$dbh->do("INSERT INTO LOG (date, service, computer, username, status) VALUES (?, ?, ?, ?, ?)",
undef, $date, $service, $computer, $username, $status);

$dbh->disconnect();
}
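
A call from a log-parsing loop might look like this (the values are purely
illustrative):

sql('bhatt', 'maxwell', 'ssh', '2002-06-04 13:25:45', 'logon', $line);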

Solaris Tips




SUN Free Software

Installing PCNFS

Installing Solaris
with OpenWindows (graphics card)

How to Backup a System

Sendmail Testing

Testing sendmail.cf files



If you don't have a machine to play around with, it is well worth doing some
testing before installing your new sendmail and the associated config file.
You can invoke sendmail in a `test mode'. The idea is to let you see how
it sees addresses, how it transforms them, and to see each step along the
way. That is very useful if you're trying to debug a hand built config file.
But if you ignore the details it gives you, it can also be a very convenient
way to see if it does roughly the right thing. If you want to understand all
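
As a sketch, test mode is entered with sendmail -bt (use -C to point at the
config file under test; the file name below is an assumption). Addresses are
then fed through ruleset 3 followed by 0 to see how they would resolve:

sendmail -bt -C/etc/mail/test-sendmail.cf
> 3,0 user@example.com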

Linux Boot Parameters

Installing grub manually:
boot into rescue cd, chroot to disk
run grub-install /dev/diskname
or
run grub and:
grub> root (hd2,0)
grub> kernel /boot/vmlinuz-2.6.13-2 root=/dev/hde1
grub> setup (hd2)
grub> exit

The Linux kernel accepts boot time parameters as it starts to boot the system. These are used to inform the kernel about various hardware parameters. You need boot time parameters to:

* Troubleshoot the system
* Supply hardware parameters that the kernel would not be able to determine on its own
* Force the kernel to override the default hardware parameters in order to increase performance
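
For example, extra parameters can simply be appended to the kernel line at
the grub prompt shown above (the values here are only illustrative):

grub> kernel /boot/vmlinuz-2.6.13-2 root=/dev/hde1 single mem=512M
grub> boot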

nslookup commands

Using nslookup, dig, and host





nslookup, dig, and host are useful commands that allow you to perform DNS
queries and to test out your DNS configuration.


You can use the nslookup command interactively to enter a shell from which
you can change servers, set query options, and debug DNS.
You can also use nslookup
non-interactively