Welcome to the Reiser4 Wiki, the Wiki for users and developers of the ReiserFS and Reiser4 filesystems.

For now, most of the documentation is just a snapshot of the old Namesys site (archive.org, 2007-09-29).

There was also a Reiser4 Wiki (archive.org, 2007-07-06) once on pub.namesys.com.


From Reiser4 FS Wiki
Revision as of 18:50, 27 June 2009

Mongo is the main benchmark script we use for comparing ReiserFS variations.

Untar the archive in a directory and read the Introduction to Mongo Testsuites.


Introduction to the Mongo Testsuites

Mongo is a set of programs for testing Linux filesystems for performance and functionality. The main program is the mongo.pl script, which produces a set of statistics for the file system variations specified by mongo options. The mongo_parser.pl script parses those statistics and builds a comparative HTML table from them.

The mongo.pl script

 # ./mongo.pl opt11=val11 opt12=val12 ... \
   RUN [opt21=val21 opt22=val22 ... \
   RUN opt31=val31 opt32=val32 ... \
   RUN ... | @<file to include>],

where opt1j (j = 1, 2, ...) are the required mongo options, possibly together with other mongo options, and optij (i = 2, 3, ...; j = 1, 2, ...) are any mongo options.

The expression optij=valij means that the mongo option optij is set to the value valij. The acceptable values of all mongo options are described below:

Required mongo options

  • FSTYPE - filesystem type (e.g. ext3)
  • DEV - device file name (e.g. /dev/hda9)
  • DIR - mount-point for the filesystem (e.g. /mnt/testfs)
  • FILE_SIZE - file size in bytes (e.g. 10000) used in reiser_fract_tree; it is passed to the main generator function determine_size() (see below).
  • BYTES - total size in bytes (e.g. 250000000) of the file set created by all instances of reiser_fract_tree in one pass. To keep the results free from buffer cache influence, it must satisfy BYTES * REP_COUNTER > ramsize.
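
The buffer cache condition on BYTES can be checked before starting a session. The following is a minimal sketch, not part of Mongo: the function names are hypothetical, REP_COUNTER is the number of passes described under the other options, and ramsize is assumed to mean the physical RAM size in bytes, read here through sysconf.

```python
import os


def defeats_buffer_cache(bytes_per_pass, rep_counter, ram_bytes):
    """Return True if BYTES * REP_COUNTER > ramsize holds."""
    return bytes_per_pass * rep_counter > ram_bytes


def min_bytes(rep_counter, ram_bytes):
    """Smallest BYTES value that satisfies the condition."""
    return ram_bytes // rep_counter + 1


def physical_ram_bytes():
    """Physical RAM in bytes (assumption: sysconf names as on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
```

For example, defeats_buffer_cache(250000000, 3, physical_ram_bytes()) tells whether the example value of BYTES with the default REP_COUNTER is large enough on the current machine.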

Other mongo options

  • MKFS - path to the executable that creates the filesystem under test (e.g. mkreiserfs). By default (if the filesystem is not reiserfs or ext2) mongo.pl tries to create it with the command mkfs.filesystem_name, so make sure it is available.
  • MOUNT_OPTIONS - list of mount options separated as usual by commas (e.g. rw,notail).
  • NPROC - number of processes running simultaneously (3 by default).
  • REP_COUNTER - number of passes of each mongo phase (3 by default). Each mongo statistic is the average of REP_COUNTER results, so using REP_COUNTER > 1 reduces dispersion and improves the statistics.
  • SYNC - takes one of two strings: "on" or "off" ("off" by default). "on" forces syncing of regular files in the create, copy, append, and modify phases.
  • WRITE_BUFFER - read/write buffer size in bytes for mongo utilities (4096 by default).
  • GAMMA - the exponent of the core file size distribution of the random value generator determine_size() used in mongo_fract_tree() (see below). GAMMA values are in [0,1] (e.g. 0.2, default value is 0.0).
  • JOURNAL_DEV - journal device name. This is an option only for reiserfs with non-standard journal support. By default mkreiserfs creates journal on main device (DEV).
  • JOURNAL_SIZE - journal size in blocks including journal header (e.g. 513). This is an option only for reiserfs with non-standard journal support. By default mkreiserfs creates journal of standard size (8193).
  • DD_MBCOUNT - size in megabytes of the large file to be read (written) by the dd(1) program. If this option is specified, mongo executes two special phases, dd_reading_largefile and dd_writing_largefile (see the description of mongo phases below).

Special options

  • LOG - the name of the file in which to store the statistics result tree that mongo.pl creates for each mongo run (see below). Regardless of this option, mongo.pl writes all the results to stdout, but we recommend specifying it for each file system variation you want to compare, as this lets you create a comparative HTML table with the mongo_parser.pl script.
  • INFO_R4 - a string describing the benchmarked Reiser4 version. This option is required if FSTYPE=reiser4 is set.

Mongo phases settings options

(see mongo phases description below).

  • PHASE_CREATE - setting for create phase: on/off (on by default).
  • PHASE_COPY - setting for copy phase. This option requires one of the following values: off/cp/list. In "cp" mode, cp(1) is invoked to copy files. In "list" mode (default), mongo_copy is used to copy files. See mongo_copy.c for details.
  • PHASE_APPEND - setting for append phase: on/off (on by default).
  • PHASE_MODIFY - setting for modify phase: on/off (on by default).
  • PHASE_READ - setting for read phase. This option requires one of the following values: off/find/list. In "find" mode, find(1) is used to read the files. In "list" mode (default), mongo_read is used. See mongo_read.c for details.
  • PHASE_STATS - setting for stats phase: on/off (on by default).
  • PHASE_DELETE - setting for delete phase. This option requires one of the following values: off/rm/list. In "rm" mode, rm(1) is used to delete the working file set. In "list" mode (default), mongo_delete is used. See mongo_delete.c for details.

Special required command

  • RUN - defines one mongo run (the whole command line defines one mongo session). A run starts all default and possibly some special mongo phases (see below), as defined by the options specified before this command. Mongo options keep their values (specified or default) for the whole mongo.pl session unless you specify new ones.


 # ./mongo.pl LOG=/tmp/logfile1 file_size=10000 \
   bytes=10000000 fstype=reiserfs dev=/dev/hda9 \
   dir=/mnt/testfs RUN log=/tmp/logfile2 \
   mount_options=notail RUN
  • <file_to_include> - We recommend specifying all the mongo options you want in a file instead of on the command line, since a file is more convenient to edit than a command line. Each specification must occupy one line in this file. For example, the previous command can be rewritten if you place the options together with the first "RUN" in the file "mongo.opts":
 # ./mongo.pl log=/tmp/logfile1 @mongo.opts \
   log=/tmp/logfile2 mount_options=notail RUN

mongo.pl executes one or more mongo runs defined by specified options. For each run mongo.pl creates the tree of mongo statistics (statistics result tree).
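
How RUN splits a session into runs, with option values carried over from one run to the next, can be sketched as follows. This is an illustration only and not the parsing code of mongo.pl: the function name split_runs is hypothetical, option names are folded to upper case on the assumption (suggested by the mixed-case examples above) that case does not matter, and @file includes are not handled.

```python
def split_runs(args):
    """Split a mongo.pl-style argument list into one option dict per RUN."""
    current = {}   # option values in effect so far
    runs = []
    for arg in args:
        if arg == "RUN":
            # Snapshot the options; later runs inherit them unless respecified.
            runs.append(dict(current))
        elif "=" in arg:
            name, value = arg.split("=", 1)
            current[name.upper()] = value   # assumption: names are case-insensitive
        else:
            raise ValueError(f"unrecognised argument: {arg!r}")
    return runs


session = ("LOG=/tmp/logfile1 file_size=10000 bytes=10000000 "
           "fstype=reiserfs dev=/dev/hda9 dir=/mnt/testfs RUN "
           "log=/tmp/logfile2 mount_options=notail RUN").split()

for number, options in enumerate(split_runs(session), start=1):
    print(number, options)
```

The second run keeps FILE_SIZE, BYTES, FSTYPE, DEV and DIR from the first one and only changes LOG and MOUNT_OPTIONS.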

WARNING: mongo.pl will format each specified device DEV with mkfs.xxx and mount it at the directory DIR.

The mongo_parser.pl script

 # ./mongo_parser.pl log1 [log2 log3 ...] > comparative_table.html

where log1, log2, log3, ... are the names of files that contain statistics result trees created by mongo.pl. Each of these files should contain only one statistics result tree.

WARNING: The result trees of all specified files log1, log2, log3, ... must be mutually phase-isomorphic.

Example: The result trees of logfile1 and logfile2 from the example above are phase-isomorphic. On the other hand, log1, log2, and log3 from the following example cannot be passed to mongo_parser.pl together, since the result trees of log2 and log3 are non-isomorphic (different file_size):

 # ./mongo.pl log=log1 file_size=10000 bytes=10000000 \
   fstype=reiserfs dev=/dev/hda9 dir=/mnt/testfs RUN \
   log=log2 mount_options=notail RUN log=log3 file_size=20000 RUN

mongo_parser.pl creates a comparative html-table of specified result trees.

What is Mongo doing?

For each run Mongo executes 8 default phases and possibly some special phases. In each phase Mongo runs NPROC processes (the parent with NPROC - 1 children) of the appropriate mongo utility and produces a set of mongo statistics.

Currently mongo supports three kinds of statistics: REAL_TIME, CPU_TIME, and DF. REAL_TIME and CPU_TIME are timing statistics for the run of the NPROC processes of the given phase. REAL_TIME is the elapsed real time (in seconds) between invocation and termination. CPU_TIME is the system CPU time (in CPU-seconds): the sum of the tms_stime and tms_cstime values in a struct tms as returned by times(2). DF is the space usage statistic of the specified device DEV. For default phases, DF is the disk space usage in bytes after all the previous phases including the current one. For the special dd_writing_largefile and dd_reading_largefile phases, DF is the size in bytes of the file created during the phase. The default mongo phases model basic user processes that use the file API. To run the special mongo phases you must specify the special phase options.
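
The two timing statistics can be illustrated with a short sketch. It is not Mongo's own measurement code: the function name measure is hypothetical, and Python's os.times() is used because it exposes the same fields as times(2), with system and children_system corresponding to tms_stime and tms_cstime.

```python
import os
import time


def measure(work):
    """Run work() and return (REAL_TIME, CPU_TIME) in seconds.

    CPU_TIME is the system CPU time of this process plus that of its
    waited-for children, i.e. the increase of tms_stime + tms_cstime.
    """
    before = os.times()
    start = time.monotonic()
    work()
    real_time = time.monotonic() - start
    after = os.times()
    cpu_time = ((after.system - before.system)
                + (after.children_system - before.children_system))
    return real_time, cpu_time


real_time, cpu_time = measure(lambda: os.listdir("."))
print(f"REAL_TIME={real_time:.6f} CPU_TIME={cpu_time:.6f}")
```

User-mode CPU time (tms_utime, tms_cutime) is deliberately left out, matching the definition of CPU_TIME given above.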

Currently Mongo supports 8 default and 2 special phases. Each phase is defined by the appropriate mongo utility:

Create phase

The reiser_fract_tree program creates files in a tree of random depth and branching (optionally fsyncing each file):

  # ./reiser_fract_tree <bytes_to_consume> <median_file_size> \
  <max_file_size> <median_dir_nr_files> <max_directory_nr_files> \
  <median_dir_branching> <max_dir_branching> <write_buffer_size> \
  <testfs_mount_point> <print_stats_flag> <max_fname> <flist_name> \
  <sync_flag> <gamma_exponent>

Files vary in size randomly according to the core file size generator (off_t determine_size(off_t F, off_t max_size)) used in reiser_fract_tree. This generator is built from random variables that have uniform distributions (see fig. 1).

FIGURE 1: The distribution function of the main generator determine_size().

Each of these variables is obtained by mapping the standard GNU pseudo-random generator rand(), defined on [0, RAND_MAX], onto [A, B] for suitable A, B, using the high-order bits. The file sizes of the first uniform chunk are in [0, F], and P(file_size in [0, F]) = 1-gamma.

The area of each subsequent uniform chunk depends exponentially on its number with exponent gamma, and the size of its stride depends exponentially on its number with exponent scale (we use scale=10). F is the range of the first uniform chunk in bytes (the value of the FILE_SIZE option of mongo.pl). Median file size is hypothesized to be proportional to the average per-file space wastage. (Notice that this implies that, with a more efficient filesystem, file size usage patterns will in the long term move to a lower median file size.) File size is capped at max_file_size.

Directories vary in size according to the same distribution function, but with separate parameters controlling the median and maximum of the number of files within them and of the number of subdirectories within them. The program prunes some empty subdirectories in a manner that causes the parents of leaf directories to branch less than median_dir_branching.
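
The mapping of rand() onto a range [A, B] by its high-order bits can be sketched as below. This shows the commonly used scaling idiom and is only an assumption about what determine_size() does internally; the function name map_to_range is hypothetical, RAND_MAX is set to the glibc value, and random.randint stands in for the C rand() call.

```python
import random

RAND_MAX = 2**31 - 1  # value of RAND_MAX in glibc


def map_to_range(r, a, b):
    """Map r from [0, RAND_MAX] onto an integer in [a, b].

    Scaling r / (RAND_MAX + 1.0) makes the result depend on the
    high-order bits of r, unlike r % (b - a + 1), which depends on the
    low-order bits.
    """
    return a + int((b - a + 1) * (r / (RAND_MAX + 1.0)))


F = 10000  # range of the first uniform chunk, as set by FILE_SIZE
r = random.randint(0, RAND_MAX)  # stand-in for rand()
print(map_to_range(r, 0, F))
```

With gamma = 0.0 (the default) every file size falls into the first chunk, so a draw like the one above would cover the whole distribution.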

To avoid having one large file distort the results such that you have to benchmark many times, set max_file_size to not more than bytes_to_consume/10. If the maximum/median is a small integer, then randomness will be very poor.

For isolating the performance consequences of design variations on particular file or directory size ranges, try setting their median_size and max_size to both equal the max size of the file size range you want to test.

To provide the same conditions in the following phases for all file systems under test, mongo_fract_tree creates in /var/tmp a list of all files, sorted in the order in which they were created.

Copy phase

The mongo_copy program copies the files created by reiser_fract_tree in the specified order (optionally fsyncing each new file). The files and their order are specified by flist:

 # ./mongo_copy <source_dir> <dest_dir> <writebuffer_size> <flist> <sync_flag>

Append phase

The mongo_append program reads filenames from stdin, appends (filesize * append_factor) bytes to each file, and optionally fsyncs it:

 # ./mongo_append <append_factor> <writebuffer_size> <sync_flag>
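
The work done per file in the append phase can be sketched as follows. This is an approximation of the description above and not a port of mongo_append: the function name is hypothetical, and the appended bytes are assumed to be zeros, since the text does not say what data is written.

```python
import os


def append_to_file(path, append_factor, write_buffer=4096, sync=False):
    """Append int(filesize * append_factor) bytes to path.

    Data is written in chunks of at most write_buffer bytes.
    Returns the number of bytes appended.
    """
    to_append = int(os.path.getsize(path) * append_factor)
    chunk = b"\0" * write_buffer          # assumption: content is irrelevant
    remaining = to_append
    with open(path, "ab") as f:
        while remaining > 0:
            n = min(remaining, write_buffer)
            f.write(chunk[:n])
            remaining -= n
        if sync:
            f.flush()
            os.fsync(f.fileno())          # corresponds to sync_flag
    return to_append
```

A driver in the spirit of the real utility would call append_to_file() once for every filename read from stdin.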

Modify phase

The mongo_modify program reads filenames from stdin, modifies (filesize * modify_factor) bytes of each file starting at a random position, and optionally fsyncs it:

 # ./mongo_modify <modify_factor> <writebuffer_size> <sync_flag>

Overwrite phase

This phase uses the mongo_modify program with modify_factor = 1, so it modifies filesize bytes, i.e. overwrites the whole file (and optionally fsyncs it).
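
The modify and overwrite phases differ only in modify_factor, which the following sketch makes explicit. It is not a port of mongo_modify: the function name is hypothetical, the written bytes are assumed to be 0xff, and the random start position is assumed to be chosen so that the modified region fits inside the file.

```python
import os
import random


def modify_file(path, modify_factor, write_buffer=4096, sync=False):
    """Rewrite int(filesize * modify_factor) bytes of path in place.

    Returns (start, length) of the modified region.
    """
    size = os.path.getsize(path)
    length = min(size, int(size * modify_factor))
    # Assumption: the region must fit, so start is drawn from [0, size - length].
    start = random.randint(0, size - length)
    chunk = b"\xff" * write_buffer
    remaining = length
    with open(path, "r+b") as f:
        f.seek(start)
        while remaining > 0:
            n = min(remaining, write_buffer)
            f.write(chunk[:n])
            remaining -= n
        if sync:
            f.flush()
            os.fsync(f.fileno())
    return start, length
```

With modify_factor = 1 the only possible start position is 0, so the call overwrites the entire file, which is the overwrite phase.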

Read phase

The mongo_read program reads files created by mongo_fract_tree in specified order.

Stats phase

We run find -type f on the partition under test. Zam believes that this should be enough to stat all the files.

Delete phase

We do "rm -r" on all files and directories.

dd_writing_largefile phase

This is a special mongo phase which requires the option DD_MBCOUNT to be specified. We run "dd if=/dev/zero of=DIR/largefile bs=1M count=DD_MBCOUNT".

dd_reading_largefile phase

This is a special mongo phase which requires the option DD_MBCOUNT to be specified. We run "dd if=DIR/largefile of=/dev/null bs=1M count=DD_MBCOUNT".

Look at the source code if you need more information than this introduction contains.

Mongo output

The main purpose of Mongo is to compare file system variations. The following mongo options (fs-options) specify these variations:


Note that SYSTEM is a "fake" fs-option which records the kernel version that mongo was run under. For example:

 SYSTEM = linux-2.4.19-rc1+01-relocation4.patch+02-commit_super-8-relocation.patch+03-data-logging-24.patch.

For the same file system variation Mongo defines one or more phase variations by the following mongo phase-options:


These options specify the parameters which are passed to the mongo utilities. For each file system variation mongo.pl prepares statistics for one or more phase variations.

The mongo_parser.pl script prepares a comparative table for one or more file system variations. First prepare the mongo output file for each variation by using the mongo.pl script. Make sure that all these files contain phase-isomorphic result trees (see above). Then pass these filenames to mongo_parser.pl (see the usage above), which will create the comparative HTML table (on stdout by default).

The file system variations are represented in this table by columns of statistics marked by the letters A, B, C, etc., in the order in which they were specified to mongo_parser.pl. The table header specifies each variation by the set of mongo fs-options whose values differ between the variations. If an fs-option is absent, it had its default value. If the table represents more than one file system variation, we assume by default that A is the main variation and B, C, ... are A-relative variations: all the statistics of B, C, ... are divided by the corresponding statistics of A.
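
The A-relative presentation can be sketched as follows. This is a simplification and not the code of mongo_parser.pl: the function name is hypothetical, each column is reduced to a flat mapping from statistic name to value instead of a full result tree, and the numbers are made up purely for illustration.

```python
def relative_to_first(columns):
    """Keep column A as is and divide every other column by column A.

    columns is a list of dicts (A, B, C, ...) with identical keys.
    A statistic of zero in column A would raise ZeroDivisionError.
    """
    base = columns[0]
    result = [dict(base)]
    for column in columns[1:]:
        result.append({name: column[name] / base[name] for name in base})
    return result


# Made-up sample values, not measurements.
a = {"REAL_TIME": 10.0, "CPU_TIME": 4.0}
b = {"REAL_TIME": 15.0, "CPU_TIME": 2.0}
print(relative_to_first([a, b]))
```

In this reading, a value above 1 in column B means that B took more time (or space) than A for that statistic, and a value below 1 means it took less.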

The options that have identical values for all the file system variations are placed in a special header. The statistics of each phase variation are introduced by a subheading (numbered #1, #2, ...).

Here is an example of a Mongo comparative table.
