THE BUDDHIST SCRIPTURES ON CD-ROM
Dr. Supachai Tangwongsan
Dr. Damras Wongsawang
Miss Jiraporn Kiatpibool
INTRODUCTION
Many people have said, "This software would be another brave new
world." It is the first of its kind in the international study of
Buddhism.
Mahidol University Computing Center (MUCC) is very proud to
present the world's first complete digital edition of the Buddhist
scriptures: the Tipitaka, the collected teachings and sayings of the
Buddha, together with its commentary, the Atthakatha. The Tipitaka is
important as the root and basic reference for all teachings and
explanations of Buddhism; the standard against which teachings presented
as Buddhism are measured; a record of the beliefs, religions, traditions
and events of many centuries past; and an invaluable source of reference
material for other fields of knowledge.
Since the beginning of the Buddhist era over 2500 years ago, there
have been continuous efforts to preserve and maintain the Tipitaka so that
it remains a religious heritage for coming generations. Various media,
depending on the technology of the age, have been used to preserve the
contents of the Tipitaka: first "Mukhapatha", that is, memorization and
transmission by word of mouth; later, engraved stone, palm leaf, cloth,
paper and so on. With more advanced technology, various computer storage
devices have been used to store the contents, e.g. hard disks and optical
disks, and finally the CD-ROM, a disc 120 mm in diameter and 1.2 mm
thick. What is more, a single CD-ROM carries the entire Thai and
Romanized Pali texts of the Tipitaka (45 volumes), the Atthakatha
(55 volumes) and special scriptures (15 volumes), totalling more than
450 million characters. The CD-ROM is small, light, relatively
inexpensive and needs very little care. Also, data on a CD-ROM is safe
from computer viruses, which are a problem with other computer media.
BUDSIR (BUDdhist Scriptures Information Retrieval) on CD-ROM, as the
University has named it, is the first of its kind and will be available
globally, making the study and research of Buddhism virtually boundless.
MUCC also developed the program BUDSIR, which aids in the
search and retrieval of the contents of the digital edition of the scriptures
and their commentary. Development of BUDSIR began as a
project to develop a computerized version of the Tipitaka in honour of His
Majesty the King's Ratchamangklaphisek Ceremony (the celebration of
the Longest Royal Enthronement Anniversary) and the celebration of His
Majesty the King's 60th birthday. BUDSIR II, the first Romanized version
of the Tipitaka, was developed in September 1989, providing another
channel through which the study of Buddhism is accessible to the
international community. BUDSIR III was developed in April 1990,
allowing more complex search queries using Boolean algebra. His Majesty
King Bhumibhol Adulyadej the Great continued to support the
computerization of the Buddhist scriptures and their commentary, and
BUDSIR IV was developed in November 1991. It includes 45 volumes of the
Tipitaka and 70 volumes of the Atthakatha and related scriptures, in both
Thai and Romanized Pali, and is thus the most complete version.
BUDSIR IV stored the scriptures on a hard disk, which proved prone to
virus attacks and often caused loss of information. BUDSIR on CD-ROM
was therefore developed and was completed in July 1994.
BUDSIR's internal structure is built on mature, efficient
information retrieval techniques commonly used in large databases, and
the program is designed to be easy to use for users at all levels of
competence.
OBJECTIVES
Anyone pursuing a particular subject in the Tipitaka and Atthakatha,
which contain tremendous amounts of information, must overcome not only
the barrier of the Pali language but also an overwhelming amount of
information scattered widely under a variety of headings within a
volume. Hence it is extremely difficult to retrieve the information in
question accurately and exhaustively. An attempt has therefore been made
to store the entire Tipitaka and Atthakatha in digital form so that any
research needing access to this huge database will be greatly
facilitated.
BUDSIR is unique in its accuracy, speed and completeness. It can
retrieve any word (including compounds), phrase or stretch of text that
can be found in the Buddhist Scriptures. Moreover, this digital edition can
search both the Tipitaka and Atthakatha simultaneously, showing the
results in two separate windows so that they can be studied and compared.
THE DATA CONTENTS
The Buddhist Scriptures included in the Digital Tipitaka and Atthakatha
consist of 115 volumes (50,189 pages of text). The data can be divided
into two groups as follows:
1. The Pali Tipitaka in Thai script, Siamrattha version: 45 volumes with
a total of 24 million characters. After computerized transliteration into
Romanized script, the size becomes 31 million characters.
2. The Atthakatha (commentaries) and other important scriptures: 70
volumes with a total of 37 million characters, comprising:
a. the Atthakatha: 55 volumes;
b. 15 volumes of the texts used in the Thai monastic Pali examinations
and two essential scriptures, the Milindapanha and the Bhikkhu
Patimokkha-Pali.
After computerized transliteration into Romanized script, the size
increases to 47 million characters.
The data was prepared with a Pali text editor developed by MUCC.
The data for each volume was entered twice and checked by a computer
program that pinpointed any discrepancies between the two versions;
these were corrected until the two versions were identical. The work was
done by eighty typists, each working at a rate of thirty Pali words a
minute, or on average 15 pages a day.
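The double-entry check described above can be sketched in Python as follows. This is a minimal illustration, not MUCC's actual program: it assumes the two typed copies of a volume are compared word by word, and the function name `find_discrepancies` is hypothetical. It relies on the standard library's `difflib.SequenceMatcher` to list every region where the copies disagree; an empty result means the copies are identical and the volume can be accepted.

```python
import difflib


def find_discrepancies(first: str, second: str) -> list[tuple[int, str, str]]:
    """Compare two independently typed copies of the same text, word by word.

    Returns (word_index_in_first_copy, words_in_first, words_in_second)
    for every region where the copies differ. An empty list means the
    two copies are identical.
    """
    a, b = first.split(), second.split()
    # autojunk=False: common Pali words must not be treated as "junk".
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete or insert
            problems.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return problems
```

For example, `find_discrepancies("evam me sutam ekam samayam bhagava", "evam me sutam ekam samyam bhagava")` returns `[(4, 'samayam', 'samyam')]`, flagging the mistyped fifth word so that a typist can correct it and the comparison can be rerun.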
THE BUDSIR DATABASE
The database structure of the Digital Edition of Buddhist Scriptures is
essentially an inverted file, similar to that of the STAIRS system on the
IBM mainframe. The system is composed of three main groups of data
files: (1) the Text-block file, (2) the Dictionary file, and (3) the Inverted
file.
The Text-block file is a computerized collection of all the data from
115 printed volumes of the Tipitaka and Atthakatha.
The Dictionary file is a collection of all lexical items found in the
Tipitaka and Atthakatha. The lexical items are arranged in a B-tree
structure, with pointers linking each node to the nodes below it in the
tree's hierarchy.
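To make the Dictionary file's organization concrete, the sketch below shows a minimal in-memory B-tree in Python that supports search and insertion. It is an illustration under stated assumptions, not BUDSIR's on-disk format: the class names, the minimum degree `t`, and the idea that each lexical item maps to an integer offset into the Inverted file are all hypothetical. Each node holds sorted keys, and its `children` list holds the pointers to the next level down; a full node is split before the insertion descends into it, so an insert needs only a single pass from the root.

```python
from bisect import bisect_left
from typing import Optional


class BTreeNode:
    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: list[str] = []              # sorted lexical items
        self.values: list[int] = []            # hypothetical Inverted-file offsets
        self.children: list["BTreeNode"] = []  # pointers to the next level down


class BTree:
    """Minimal B-tree of minimum degree t: every node except the root
    holds between t-1 and 2t-1 keys. Supports search and insert only."""

    def __init__(self, t: int = 2) -> None:
        if t < 2:
            raise ValueError("minimum degree must be at least 2")
        self.t = t
        self.root = BTreeNode(leaf=True)

    def search(self, key: str) -> Optional[int]:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return node.values[i]
            if node.leaf:
                return None
            node = node.children[i]  # follow the pointer one level down

    def insert(self, key: str, value: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split, so the tree grows by one level at the top.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key, value)

    def _split_child(self, parent: BTreeNode, i: int) -> None:
        """Split the full child parent.children[i]; its median key moves up."""
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        parent.keys.insert(i, full.keys[t - 1])
        parent.values.insert(i, full.values[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, full.keys = full.keys[t:], full.keys[:t - 1]
        right.values, full.values = full.values[t:], full.values[:t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]

    def _insert_nonfull(self, node: BTreeNode, key: str, value: int) -> None:
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # word already present: update its entry
                return
            if node.leaf:
                node.keys.insert(i, key)
                node.values.insert(i, value)
                return
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if node.keys[i] == key:  # the key was the promoted median
                    node.values[i] = value
                    return
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
```

For example, inserting `"dhamma"`, `"buddha"`, `"sangha"` and `"nibbana"` into `BTree(t=2)` splits the root on the fourth insertion, after which `search("nibbana")` follows one child pointer from the new root. A disk-based dictionary would instead size each node to a disk block, so that a lookup reads only a handful of blocks even for a large vocabulary; keeping the number of slow disk reads small is the usual reason for choosing a B-tree.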
The Inverted file is a list of the occurrences of all the words found
in the Text-block file, and each word's entry is referenced from the
Dictionary file. Each occurrence code consists of the volume number, page
number, line number and word number and, when applicable, a flag marking
the last word of a line or of a page. This facilitates searching,
particularly for adjacent words, and will also support searching with
Boolean operators in future versions.
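The role of the occurrence code, and of its last-word-of-line/page flag, in adjacent-word searching can be sketched as follows. This is a hypothetical Python model, not BUDSIR's actual record layout: it assumes words are numbered from 1 within each line and lines from 1 within each page, so the flag tells the search where the next word lives when a phrase runs across a line or page break. The names `Occurrence`, `build_inverted_file`, `phrase_search` and `pages_with` are illustrative only. Boolean queries are shown at page granularity, by intersecting, uniting or subtracting sets of (volume, page) pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One occurrence code: volume, page, line and word number, plus the
    (assumed) flags marking the last word of a line or of a page."""
    volume: int
    page: int
    line: int
    word: int
    end_of_line: bool = False
    end_of_page: bool = False

    @property
    def pos(self) -> tuple[int, int, int, int]:
        return (self.volume, self.page, self.line, self.word)

    def next_pos(self) -> tuple[int, int, int, int]:
        """Position of the following word; the flags let a phrase
        continue across a line break or a page break."""
        if self.end_of_page:
            return (self.volume, self.page + 1, 1, 1)
        if self.end_of_line:
            return (self.volume, self.page, self.line + 1, 1)
        return (self.volume, self.page, self.line, self.word + 1)


def build_inverted_file(volume: int, pages: list[list[list[str]]]) -> dict[str, list[Occurrence]]:
    """Map each word to its occurrences. `pages` is a list of pages, each a
    list of lines, each a list of words (a hypothetical input format)."""
    index: dict[str, list[Occurrence]] = defaultdict(list)
    for p, page in enumerate(pages, start=1):
        for ln, line in enumerate(page, start=1):
            for w, word in enumerate(line, start=1):
                last_in_line = w == len(line)
                index[word].append(Occurrence(
                    volume, p, ln, w,
                    end_of_line=last_in_line,
                    end_of_page=last_in_line and ln == len(page),
                ))
    return dict(index)


def phrase_search(index: dict[str, list[Occurrence]], phrase: str) -> list[Occurrence]:
    """Return the occurrence of the first word of every match of `phrase`."""
    words = phrase.split()
    if not words:
        return []
    # For each later word, a lookup table from position to occurrence.
    tables = [{o.pos: o for o in index.get(w, [])} for w in words[1:]]
    hits = []
    for start in index.get(words[0], []):
        current = start
        for table in tables:
            current = table.get(current.next_pos())
            if current is None:
                break  # the chain of adjacent words is broken
        else:
            hits.append(start)
    return hits


def pages_with(index: dict[str, list[Occurrence]], word: str) -> set[tuple[int, int]]:
    """(volume, page) pairs containing `word`; combine with & (AND), | (OR), - (NOT)."""
    return {(o.volume, o.page) for o in index.get(word, [])}


if __name__ == "__main__":
    sample = [
        [["evam", "me", "sutam"], ["ekam", "samayam", "bhagava"]],  # page 1
        [["savatthiyam", "viharati"]],                                 # page 2
    ]
    index = build_inverted_file(1, sample)
    print([o.pos for o in phrase_search(index, "sutam ekam")])  # crosses a line break
    print(pages_with(index, "evam") | pages_with(index, "viharati"))
```

In the sample, `phrase_search(index, "sutam ekam")` finds the phrase even though it is split across two lines, and `"bhagava savatthiyam viharati"` is found across the page break; without the flag, the search would look for word 4 of the same line and miss both.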
BUDSIR IV - FEATURES SUMMARY
1. Inherent B-Tree Architecture
Because the B-tree is well known as one of the most efficient
structures for heavily accessed databases, BUDSIR is built on this
architecture.
2. Two Efficient Search Methods
BUDSIR features two efficient search methods: users can launch a
search by word or phrase keyword, or by volume/page/item
reference.
3. Dual Windows Display
BUDSIR displays the Tipitaka and the Atthakatha independently
in separate windows. Users can freely select which text is
displayed in which window.
4. Works Fully in a Graphical Environment
BUDSIR runs entirely in graphics mode, so there is no need to
modify the video graphics adapter to display the characters.
5. Pull-Down Menus and Mouse Support
Any feature can be accessed using hot keys, pull-down menus or
a mouse.
6. Printing
BUDSIR supports all de facto standard 9-pin and 24-pin dot-matrix
printers, as well as HP LaserJet and compatible printers.
7. Saving a Scripture Passage to Disk
BUDSIR allows users to save any passage displayed on the
screen to disk for private use. The saved text file can be
edited with any general text editor.
HARDWARE REQUIREMENTS
To perform well, BUDSIR requires equipment meeting the following
specifications:
1. An IBM PC, AT or PS/2 computer, or a true compatible, with an Intel
80386 or 80486 microprocessor,
2. At least 2 MB of RAM,
3. A Super VGA color graphics adapter and a matching monitor,
4. A standard CD-ROM drive for reading data on a CD-ROM,
5. A hard disk drive with at least 5 MB available for BUDSIR's
temporary working area,
6. A keyboard and a Microsoft-compatible mouse,
7. A floppy disk drive,
8. A printer,
9. MS-DOS version 5 or higher.
BUDSIR IV can also run on Macintosh computers, e.g. the Mac II,
LC, Classic, Quadra and PowerPC-based models, using the SoftWindows
(or SoftAT or SoftPC) emulator program under System 7.0
or higher.
___________________________________________________________
Authors' Address : Mahidol University Computing Center, Faculty of Science,
Rama VI Rd., Bangkok 10400, THAILAND
Tel : (662) 247-0333, FAX : (662) 246-7308,
Email : budsir@mahidol.ac.th