File Organizations in Computer

File Organizations

A file is a collection of data, usually stored on disk. As a logical entity, a file enables you to divide your data into meaningful groups. For example, you can use one file to hold all of a company's product information and another to hold all of its personnel information. As a physical entity, a file should be considered in terms of its organization.

File Organizations - The term "file organization" refers to the way in which data is stored in a file and, consequently, the method(s) by which it can be accessed. This COBOL system supports three file organizations: sequential, relative and indexed.

Sequential Files - A sequential file is one in which the individual records can only be accessed sequentially, that is, in the same order as they were originally written to the file. New records are always added to the end of the file. Three types of sequential file are supported by this COBOL system: record sequential, line sequential and printer sequential.

Record Sequential Files - Record sequential files are nearly always referred to simply as sequential files because when you create a file and specify the organization as sequential, a record sequential file is created by default. To define a file as record sequential, specify ORGANIZATION IS RECORD SEQUENTIAL in the SELECT statement for the file in your COBOL program, for example:

select recseq assign to "recseq.dat"
organization is record sequential.

Because record sequential is the default for sequential files, you don't actually need to specify ORGANIZATION IS RECORD SEQUENTIAL; you could simply use ORGANIZATION IS SEQUENTIAL (as long as the Compiler directive SEQUENTIAL has not been set).

Line Sequential Files - The primary use of line sequential files (which are also known as "text files" or "ASCII files") is for display-only data. Most PC editors, for example Notepad, produce line sequential files. In a line sequential file, each record in the file is separated from the next by a record delimiter. The record delimiter, which comprises the carriage return (x"0D") and the line feed (x"0A") characters, is inserted after the last non-space character in each record. A WRITE statement removes trailing spaces from the data record and appends the record delimiter. A READ statement removes the record delimiter and, if necessary, pads the data record (with trailing spaces) to the record size defined by the program reading the data. To define a file as line sequential, specify ORGANIZATION IS LINE SEQUENTIAL in the SELECT statement for the file in your COBOL program, for example:

select lineseq assign to "lineseq.dat"
organization is line sequential.

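The WRITE and READ behaviour described above can be seen in a small program. This is a minimal sketch rather than a definitive implementation: the program name, the 40-character record size, the ls-status field and the sample text are all assumptions made for illustration, and the source is laid out in fixed format (code starting in column 8).

```cobol
       identification division.
       program-id. lineseq-demo.
       environment division.
       input-output section.
       file-control.
           select lineseq assign to "lineseq.dat"
               organization is line sequential
               file status is ls-status.
       data division.
       file section.
       fd  lineseq.
       01  lineseq-rec            pic x(40).
       working-storage section.
       01  ls-status              pic xx.
       procedure division.
       main-para.
      *    WRITE drops the trailing spaces and adds the delimiter.
           open output lineseq
           move "short line" to lineseq-rec
           write lineseq-rec
           move "a somewhat longer line" to lineseq-rec
           write lineseq-rec
           close lineseq
      *    READ removes the delimiter and pads to 40 characters.
           open input lineseq
           perform until ls-status not = "00"
               read lineseq
               if ls-status = "00"
                   display "[" lineseq-rec "]"
               end-if
           end-perform
           close lineseq
           stop run.
```

The brackets in the DISPLAY statement make the padding visible: each record comes back at the full 40 characters even though the file itself holds only the non-space text plus the delimiter.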
Printer Sequential Files - Printer sequential files are files which are destined for a printer, either directly, or by spooling to a disk file. They consist of a sequence of print records with zero or more vertical positioning characters (such as line-feed) between records. A print record consists of zero or more printable characters and is terminated by a carriage return (x"0D"). With a printer sequential file, the OPEN statement causes a x"0D" to be written to the file to ensure that the printer is located at the first character position before printing the first print record. The WRITE statement causes trailing spaces to be removed from the print record before it is written to the printer with a terminating carriage return (x"0D"). The BEFORE or AFTER clause can be specified in the WRITE statement to cause one or more line-feed characters (x"0A"), a form-feed character (x"0C"), or a vertical tab character (x"0B") to be sent to the printer before or after writing the print record. Printer sequential files should not be opened for INPUT or I/O. You can define a file as printer sequential by specifying ASSIGN TO LINE ADVANCING FILE or ASSIGN TO PRINTER in the SELECT statement, for example:

select printseq assign to line advancing file "printseq.dat".
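
As an illustration of the BEFORE and AFTER clauses, the following sketch writes three print records with different vertical positioning. The ASSIGN TO LINE ADVANCING FILE form is taken from the text above and is specific to this COBOL system; the program name, the 80-character record and the report text are hypothetical.

```cobol
       identification division.
       program-id. printseq-demo.
       environment division.
       input-output section.
       file-control.
           select printseq
               assign to line advancing file "printseq.dat".
       data division.
       file section.
       fd  printseq.
       01  print-rec              pic x(80).
       procedure division.
       main-para.
      *    Printer sequential files are opened for OUTPUT only.
           open output printseq
      *    AFTER ADVANCING PAGE requests a form feed first.
           move "Monthly Report" to print-rec
           write print-rec after advancing page
      *    AFTER ADVANCING 2 LINES leaves one blank line.
           move "First detail line" to print-rec
           write print-rec after advancing 2 lines
           move "Second detail line" to print-rec
           write print-rec after advancing 1 line
           close printseq
           stop run.
```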

Relative Files - A relative file is a file in which each record is identified by its ordinal position within the file (record 1, record 2 and so on). This means that records can be accessed randomly as well as sequentially. For sequential access, you simply execute a READ or WRITE statement to access the next record in the file. For random access, you must define a data-item as the relative key and then specify, in the data-item, the ordinal number of the record that you want to READ or WRITE.

Because records can be accessed randomly, access to relative files is fast. If you need to save disk space, however, you should avoid them: although you can declare variable length records for a relative file, the system assumes the maximum record length for all WRITE statements to the file and pads the unused character positions. This is done so that the COBOL file handling routines can quickly calculate the physical location of any record given its record number within the file. As relative files always contain fixed length records, no space is saved by specifying data compression. In fact, if data compression is specified for a relative file, it is ignored by the Micro Focus File Handler.

Each record in a relative file is followed by a two-byte record marker which indicates the current status of the record. The status of a record can be:

x"0D0A" - record present
x"0D00" - record deleted or never written

When you delete a record from a relative file, the record's record marker is updated to show that it has been deleted, but the contents of the deleted record physically remain in the file until a new record is written. If, for security reasons, you want to ensure that the actual data does not exist in the file, you must overwrite the record (for example with space characters) using REWRITE before you delete it.

To define a relative file, specify ORGANIZATION IS RELATIVE in the SELECT statement for the file in your COBOL program. If you want to be able to access records randomly, you must also:

Specify ACCESS MODE IS RANDOM or ACCESS MODE IS DYNAMIC in the SELECT statement for the file.
Define a relative key in the working-storage section of your program.

For example:

select relfil assign to "relfil.dat"
organization is relative
access mode is random
relative key is relfil-key.
working-storage section.
01 relfil-key pic 9(8) comp-x.
The example code above defines a relative file. The access mode is random and so a relative key is defined, relfil-key. For random access, you must always supply a record number in the relative key, before attempting to read a record from the file. If you specify ACCESS MODE IS DYNAMIC, you can access the file both sequentially and randomly.
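
To show the relative key in use, here is a sketch of a complete program built around the same SELECT statement. It writes one record by number, reads it back, and then overwrites it with spaces before deleting it, following the security advice given earlier. The 20-character record, the relfil-status field and the record numbers chosen are assumptions for illustration; comp-x is kept from the fragment above and is an extension of this COBOL system.

```cobol
       identification division.
       program-id. relfil-demo.
       environment division.
       input-output section.
       file-control.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key
               file status is relfil-status.
       data division.
       file section.
       fd  relfil.
       01  relfil-rec             pic x(20).
       working-storage section.
       01  relfil-key             pic 9(8) comp-x.
       01  relfil-status          pic xx.
       procedure division.
       main-para.
           open output relfil
      *    Write record 5 directly; slots 1-4 are never written.
           move 5 to relfil-key
           move "fifth record" to relfil-rec
           write relfil-rec
               invalid key display "write failed " relfil-status
           end-write
           close relfil

           open i-o relfil
      *    Random read: set the key first, then READ.
           move 5 to relfil-key
           read relfil
               invalid key display "record 5 not found"
               not invalid key display "read: " relfil-rec
           end-read
      *    Overwrite the data before deleting the record.
           move spaces to relfil-rec
           rewrite relfil-rec
               invalid key display "rewrite failed"
           end-rewrite
           delete relfil record
               invalid key display "delete failed"
           end-delete
      *    A slot that was never written gives INVALID KEY.
           move 3 to relfil-key
           read relfil
               invalid key display "record 3 not found"
           end-read
           close relfil
           stop run.
```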

Indexed Files - An indexed file is a file in which each record includes a primary key. To distinguish one record from another, the value of the primary key must be unique for each record. Records can then be accessed randomly by specifying the value of the record's primary key. Indexed file records can also be accessed sequentially. As well as a primary key, indexed files can contain one or more additional keys known as alternate keys. The value of a record's alternate key(s) does not have to be unique. To define a file as indexed, specify ORGANIZATION IS INDEXED in the SELECT statement for the file in your COBOL program. You must also specify a primary key using the RECORD KEY clause:

select idxfile assign to "idx.dat"
organization is indexed
record key is idxfile-record-key.

Most types of indexed file actually comprise two separate files: the data file (containing the record data) and the index file (containing the index structure). Where this is the case, the name that you specify in your COBOL program is given to the data file and the name of the associated index file is produced by adding an .idx extension to the data file name. You should avoid using the .idx extension in other contexts. The index is built up as an inverted tree structure that grows as records are added. With indexed files, the number of disk accesses required to locate a randomly selected record depends primarily on the number of records in the file and the length of the record key. File I/O is faster when reading the file sequentially. We strongly recommend that you take regular backups of all file types but there are situations with indexed files (for example, media corruption) that can lead to only one of the two files becoming unusable. If the index file is lost in this way it is possible, using the Rebuild utility, to recover the index from the data file and so reduce the time lost due to a failure.
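
The following sketch shows the primary key at work: records are written in arbitrary key order, a second record with an existing key value is rejected, and a record is then fetched directly by key. The record layout (a 6-character key followed by 30 characters of data), the idxfile-status field and the sample values are hypothetical; only the SELECT clauses come from the text above.

```cobol
       identification division.
       program-id. idxfile-demo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is random
               record key is idxfile-record-key
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
           05  idxfile-record-key pic x(6).
           05  idxfile-data       pic x(30).
       working-storage section.
       01  idxfile-status         pic xx.
       procedure division.
       main-para.
           open output idxfile
           move "P00020" to idxfile-record-key
           move "widget" to idxfile-data
           write idxfile-record
               invalid key display "write failed " idxfile-status
           end-write
           move "P00010" to idxfile-record-key
           move "gadget" to idxfile-data
           write idxfile-record
               invalid key display "write failed " idxfile-status
           end-write
      *    The primary key must be unique, so this is rejected.
           move "P00010" to idxfile-record-key
           move "another gadget" to idxfile-data
           write idxfile-record
               invalid key display "duplicate key " idxfile-status
           end-write
           close idxfile

           open input idxfile
      *    Random read: place the key value in the record area.
           move "P00020" to idxfile-record-key
           read idxfile
               invalid key display "not found"
               not invalid key display "found: " idxfile-data
           end-read
           close idxfile
           stop run.
```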

Duplicate Keys - Some types of indexed file contain a duplicate occurrence record in the data file (look up Indexed file, Types in the online help file for a full list of indexed file types and their characteristics). In these files, each record in the data file is followed by a system record holding, for each duplicate key in that record, the occurrence number of the key. This number is a counter of the number of times that key value has been used during the history of the file; it is not reduced when a record is deleted. This means that it is possible to reach the maximum number of duplicate values even if some of the records holding those values have already been deleted. The presence of the duplicate occurrence record makes REWRITE and DELETE operations on a record with many duplicates much faster, but causes the data records of such files to be larger than those of a standard file.

To enable duplicate values to be specified for alternate keys, use WITH DUPLICATES in the ALTERNATE RECORD KEY clause in the SELECT statement:

file-control.
select idxfile assign to "idx.dat"
organization is indexed
record key is idxfile-record-key
alternate record key is idxfile-alt-key with duplicates.
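
A sketch of how a program might retrieve every record that shares one alternate key value is given below. It positions the file with START on the alternate key and then reads forward until the key value changes. The record layout, the department-style sample values, and the wanted-alt-key and done-flag items are assumptions for illustration only.

```cobol
       identification division.
       program-id. altkey-demo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
                   with duplicates
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
           05  idxfile-record-key pic x(6).
           05  idxfile-alt-key    pic x(10).
       working-storage section.
       01  idxfile-status         pic xx.
       01  wanted-alt-key         pic x(10) value "SALES".
       01  done-flag              pic x value "N".
       procedure division.
       main-para.
           open output idxfile
           move "E00001" to idxfile-record-key
           move "SALES" to idxfile-alt-key
           perform write-record
           move "E00002" to idxfile-record-key
           move "FINANCE" to idxfile-alt-key
           perform write-record
           move "E00003" to idxfile-record-key
           move "SALES" to idxfile-alt-key
           perform write-record
           close idxfile

           open input idxfile
      *    Position on the first record with the wanted value.
           move wanted-alt-key to idxfile-alt-key
           start idxfile key is equal to idxfile-alt-key
               invalid key move "Y" to done-flag
           end-start
      *    READ NEXT now follows the alternate key sequence.
           perform until done-flag = "Y"
               read idxfile next record
               if idxfile-status(1:1) = "0"
                  and idxfile-alt-key = wanted-alt-key
                   display idxfile-record-key
               else
                   move "Y" to done-flag
               end-if
           end-perform
           close idxfile
           stop run.

       write-record.
           write idxfile-record
               invalid key display "write failed " idxfile-status
           end-write.
```

The loop tests only the first character of the file status because a successful operation that involves a duplicate alternate key value returns "02" rather than "00".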

Sparse Keys - A sparse key is a key for which no index entry is stored for a given value of that key. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the part of the record it occupies contains only space characters. Only alternate keys can be sparse. Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your saving of disk space. To enable sparse keys, use SUPPRESS WHEN ALL in the ALTERNATE RECORD KEY clause in the SELECT statement:

file-control.
select idxfile assign to "idx.dat"
organization is indexed
record key is idxfile-record-key
alternate record key is idxfile-alt-key with duplicates suppress when all "A".

In this example, if a record is written for which the value of the alternate key is all A's, the actual key value is not stored in the index file.
Indexed File Access - Both the primary and alternate keys can be used to read records from an indexed file, either directly (random access) or in key sequence (sequential access). The access mode can be:

SEQUENTIAL - This is the default. Records are accessed in order of ascending (or descending) record key value.
RANDOM - The value of the record key indicates the record to be accessed.
DYNAMIC - Your program can switch between sequential and random access, by using the appropriate forms of I/O statement.

The method of accessing an indexed file is defined using the ACCESS MODE IS clause in the SELECT statement, for example:

file-control.
select idxfile assign to "idx.dat"
organization is indexed
access mode is dynamic
record key is idxfile-record-key
alternate record key is idxfile-alt-key.
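
With ACCESS MODE IS DYNAMIC, the form of the I/O statement selects the kind of access. The sketch below uses a plain READ for random access and START followed by READ NEXT for sequential access from a chosen point. To keep it short it declares only a primary key; the 4-character key, the ws-key counter and the sample data are hypothetical.

```cobol
       identification division.
       program-id. dynamic-demo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
           05  idxfile-record-key pic x(4).
           05  idxfile-data       pic x(20).
       working-storage section.
       01  idxfile-status         pic xx.
       01  ws-key                 pic 9(4).
       procedure division.
       main-para.
      *    Create records with keys 0010, 0020, ... 0050.
           open output idxfile
           perform varying ws-key from 10 by 10
                   until ws-key > 50
               move ws-key to idxfile-record-key
               move "sample data" to idxfile-data
               write idxfile-record
           end-perform
           close idxfile

           open input idxfile
      *    Random form: READ with the key already set.
           move "0030" to idxfile-record-key
           read idxfile
               invalid key display "key 0030 not found"
               not invalid key display "random: " idxfile-record-key
           end-read
      *    Sequential form: START positions, READ NEXT advances.
           move "0035" to idxfile-record-key
           start idxfile key is not less than idxfile-record-key
           perform until idxfile-status not = "00"
               read idxfile next record
               if idxfile-status = "00"
                   display "next:   " idxfile-record-key
               end-if
           end-perform
           close idxfile
           stop run.
```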

Fixed and Variable Length Records - A file can contain fixed length records (all the records are exactly the same length) or variable length records (the length of each record varies). Using variable length records may enable you to save disk space. For example, if your application generates many short records with occasional long ones and you use fixed length records, you need to make the fixed record length equal to the length of the longest record. This wastes a lot of disk space, so using variable length records would be a great advantage.

The type of record is determined as follows:

If the RECORDING MODE IS V clause is specified, the file will contain variable length records. If the RECORDING MODE IS F clause is specified, the file will contain fixed length records.
If neither of the above is true: If the RECORD IS VARYING clause is specified, the file will contain variable length records. If the RECORD CONTAINS n CHARACTERS clause is specified, the file will contain fixed length records.
If none of the above is true: If the RECMODE"V" Compiler directive is specified, the file will contain variable length records. If the RECMODE"F" Compiler directive is specified, the file will contain fixed length records.
If none of the above is true: If the RECMODE"OSVS" Compiler directive is specified, the file will contain variable length records if either RECORD CONTAINS n TO m CHARACTERS is specified, or more than one record area is specified and the record areas have different lengths.
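
As an example of the RECORD IS VARYING rule, the sketch below declares a sequential file with variable length records and uses a DEPENDING ON item to set the length on WRITE and to receive it on READ. The file name varfile.dat, the 1 to 80 character range and the varfile-len and varfile-status items are assumptions for illustration.

```cobol
       identification division.
       program-id. varrec-demo.
       environment division.
       input-output section.
       file-control.
           select varfile assign to "varfile.dat"
               organization is sequential
               file status is varfile-status.
       data division.
       file section.
       fd  varfile
           record is varying in size from 1 to 80 characters
               depending on varfile-len.
       01  varfile-rec            pic x(80).
       working-storage section.
       01  varfile-len            pic 9(4) comp.
       01  varfile-status         pic xx.
       procedure division.
       main-para.
           open output varfile
      *    Only the first VARFILE-LEN characters are written.
           move "short" to varfile-rec
           move 5 to varfile-len
           write varfile-rec
           move "a noticeably longer record" to varfile-rec
           move 26 to varfile-len
           write varfile-rec
           close varfile

           open input varfile
      *    After each READ, VARFILE-LEN holds the record length.
           perform until varfile-status not = "00"
               read varfile
               if varfile-status = "00"
                   display varfile-len ": "
                       varfile-rec(1:varfile-len)
               end-if
           end-perform
           close varfile
           stop run.
```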

File Headers - A file header is a block of 128 bytes at the start of the file. Indexed files, record sequential files with variable length records and relative files with variable length records all contain file headers. In addition, each record in these files is preceded by a 2- or 4-byte record header. Further detail on file and record headers and the structure of files with headers is available in the online help file. Look under Structure, files with headers in the help file index.

Sandeep Ghatuary
