If you are like me, you grew up on the command line, and you care about data integrity. To keep track of my backups, which live on different disks and in different places, I wrote a shell script called mapdir that maps out directory structures and file hierarchies. Whenever data is copied to external storage or transmitted over the network, there is a risk that it gets lost or corrupted, or that sectors of the disk are broken.
The idea behind the script is as follows: run it once on the original file or directory to record its structure and the files it contains. Mapdir saves a report of what it found in the user's home directory (~/) in a file called "mapdir_path_to_folder_mapped_date.txt", where path_to_folder is the file or folder mapdir was invoked with and date is the date of the run. If no argument is passed to the script, mapdir examines the current directory. After copying your files or directories to another machine or drive, run it once more there, which produces a second report file. The two report files can then be passed to the diff utility to check data integrity. If diff exits with a status of 0 (check with echo $?), the directory structures are in sync. Otherwise, further investigation is necessary.
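The exit status check described above can be wrapped in a small helper. The following is only a sketch: compare_reports is a hypothetical function name and is not part of mapdir. It relies on the documented behaviour of diff, which exits with 0 when the files are identical, 1 when they differ and a value greater than 1 when something went wrong.

```shell
#!/bin/sh
# compare_reports is a hypothetical helper, not part of mapdir.
# Usage: compare_reports report_a report_b
compare_reports() {
    diff -- "$1" "$2" > /dev/null 2>&1
    cr_rc=$?
    case $cr_rc in
        0) echo "in sync" ;;
        1) echo "reports differ" ;;
        *) echo "diff could not compare the files" ;;
    esac
    # Hand the original diff status back to the caller.
    return "$cr_rc"
}
```

Called with the two report files from the home directory, it prints one of the three messages and returns the diff status, so it can also be used directly in an if statement.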
As we are primarily interested in regular files, the size and md5 checksum are logged next to the filename and type only for this kind of file. All other common UNIX file types, such as directories, pipes and links, are logged with their filename and type only.
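To make the logging rule concrete, here is a minimal sketch of how a single entry could be described. The function name describe_file and the variables inside it are assumptions made for illustration; the real script distinguishes more file types and handles its options. The sketch assumes that the md5sum utility is installed.

```shell
#!/bin/sh
# describe_file is a hypothetical helper, not part of mapdir.
# Regular files get size and checksum; everything else only gets a type.
describe_file() {
    if [ -d "$1" ]; then
        echo "$1 - directory"
    elif [ -f "$1" ]; then
        if [ -r "$1" ]; then
            # wc -c may pad its output with blanks on some systems.
            df_size=$(wc -c < "$1" | tr -d ' ')
            df_sum=$(md5sum < "$1" | awk '{ print $1 }')
            echo "$1 - regular file - Size: $df_size bytes - md5sum: $df_sum"
        else
            echo "$1 - regular file - not readable"
        fi
    else
        echo "$1 - other file type"
    fi
}
```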
So much for the mission statement.
And here goes mapdir:
#!/usr/bin/env bash
# Note: the script relies on bash features (arrays, shopt, read -t, ${var:i:1}).
# Copyright (c) 2015-2020 Oliver Mahmoudi
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted providing that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
# mapdir - A utility to map files and directory hierarchies
# Functions:
# readfile()
# check_excludes()
# process_new_dir()
# log_entry()
# get_file_stats()
# get_filename()
# pretty_output()
# check_savefile_existence()
# usage()
# Global variables:
READLINK=
CHECKSUM=
DATE=$(date +%m%d%Y)
DEPTH=0
SAVEFILE=
STRLEN=
EXCLUDES=
# Global variables that are used for the statistics at the end of the program.
DIRS=0
UNREADABLE_DIRS=0
FILES=0
UNREADABLE_FILES=0
BLOCK_SPECIALS=0
CHARACTER_SPECIALS=0
PIPES=0
SOCKETS=0
SYMBOLIC_LINKS=0
UNKNOWN=0
TOTAL_FILES=0
SAVEIFS=$IFS # save the current Internal Field Separator
IFS=$(echo -en "\n\b") # and set a new one
export LC_COLLATE=C
# Global flags for getopts:
d_flag=0 # dotglob flag
e_flag=0 # exclude files flag
e_list= # list of files to be excluded
f_flag=0 # omit startdir flag
h_flag=0 # use sha256sum flag
n_flag=0 # no savefile flag
p_flag=0 # display output as directory tree
s_flag=0 # alternate savefile flag
s_file= # the alternate savefile itself
t_flag=0 # print statistics flag
x_flag=0 # excludes from file flag
x_file= # the excludes file itself
#
# This is the main function.
#
readfile()
{
local _file _depth _psymbol _pnoes _pbinstring
_file=$1
_depth=$3
_psymbol=$4
_pnoes=$5
_pbinstring=$6
# Continue?
$(check_excludes $_file)
if [ $? -eq 1 ] ; then
return
fi
if [ -d $_file ]; then
if [ -r $_file ] && [ -x $_file ]; then
DIRS=$((DIRS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file directory fullpath
process_new_dir $_file
else
if [ $p_flag -eq 1 ]; then
pretty_output $_file $_psymbol $_pbinstring
process_new_dir $_file $_depth
elif [ $2 -eq 0 ]; then
log_entry $_file directory filename_only
process_new_dir $_file
elif [ $2 -eq 1 ]; then
process_new_dir $_file
fi
fi
else
UNREADABLE_DIRS=$((UNREADABLE_DIRS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file "directory is not readable" fullpath
else
if [ $p_flag -eq 1 ]; then
pretty_output $_file $_psymbol $_pbinstring
elif [ $2 -eq 0 ]; then
log_entry $_file "directory is not readable" filename_only
fi
fi
fi
elif [ -b $_file ]; then
BLOCK_SPECIALS=$((BLOCK_SPECIALS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file "block special file" fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file "block special file" filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
elif [ -c $_file ]; then
CHARACTER_SPECIALS=$((CHARACTER_SPECIALS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file "character special file" fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file "character special file" filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
elif [ -L $_file ]; then
SYMBOLIC_LINKS=$((SYMBOLIC_LINKS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file "symbolic link" fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file "symbolic link" filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
elif [ -f $_file ]; then
if [ -r $_file ]; then # it's a readable file
FILES=$((FILES+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
get_file_stats $_file readable fullpath
else
if [ $p_flag -eq 0 ]; then
get_file_stats $_file readable filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
else # unreadable file
UNREADABLE_FILES=$((UNREADABLE_FILES+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
get_file_stats $_file unreadable fullpath
else
if [ $p_flag -eq 0 ]; then
get_file_stats $_file unreadable filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
fi
elif [ -p $_file ]; then
PIPES=$((PIPES+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file pipe fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file pipe filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
elif [ -S $_file ]; then
SOCKETS=$((SOCKETS+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file socket fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file socket filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
else
UNKNOWN=$((UNKNOWN+1))
TOTAL_FILES=$((TOTAL_FILES+1))
if [ $f_flag -eq 0 ]; then
log_entry $_file "unknown file type" fullpath
else
if [ $p_flag -eq 0 ]; then
log_entry $_file "unknown file type" filename_only
else
pretty_output $_file $_psymbol $_pbinstring
fi
fi
fi
}
#
# Check for files that are to be excluded from the search process. Passed via -e and/or -x.
#
check_excludes()
{
local _file_to_check
_file_to_check=$1
# We need the "standard" IFS to parse the array. Otherwise it won't work...
IFS=$SAVEIFS
for i in $EXCLUDES ; do
if [ "$_file_to_check" = "$i" ] ; then
IFS=$(echo -en "\n\b")
return 1
fi
done
IFS=$(echo -en "\n\b")
return 0
}
#
# In case we encounter a new directory, this function gets called.
# It checks whether or not the new directory has contents and calls
# readfile again.
#
process_new_dir()
{
local _dir _contents _depth _entry _newpath _noes _pcontents _pdir _pnoes _psstring
# For the pretty_output function, first examine all the parent directories,
# to find out what type of entries we have and set binary flags accordingly.
_depth=$2
if [ $p_flag -eq 1 ] && [ $_depth -ge 1 ]; then
_pdir=$1
while [ $_depth -ge 0 ]; do
cd $_pdir
cd ..
_pcontents=*
_pcontents=($_pcontents)
_pnoes=${#_pcontents[@]}
_pdir=$(get_filename $_pdir)
if [ "${_pcontents[$_pnoes-1]}" = "$_pdir" ]; then
_psstring="0$_psstring"
else
_psstring="1$_psstring"
fi
_pdir=$(pwd)
_depth=$(($_depth-1))
done
fi
# Now process the directory that actually got passed to the function.
_dir=$1
_depth=$2
cd $_dir
_contents=*
_noes=($_contents)
_noes=${#_noes[@]}
if [ $_noes -ne 0 ]; then
_depth=$(($_depth+1))
for _entry in $_contents ; do
_newpath="$_dir/$_entry"
if [ $_noes -eq 1 ]; then
readfile $_newpath 0 $_depth lastentry $_pnoes $_psstring
else
readfile $_newpath 0 $_depth middleentry $_pnoes $_psstring
fi
_noes=$(($_noes-1))
done
fi
}
#
# Log filetype to stdout and to SAVEFILE if desired. This function logs all
# filetypes except for regular files, which need a little more handling based
# on OS and accessibility.
#
log_entry()
{
local _entry _filetype _logtype
_filetype=$2
_logtype=$3
if [ "$_logtype" = "fullpath" ]; then
_entry=$1
elif [ "$_logtype" = "filename_only" ]; then
_entry=$(get_filename $1)
fi
echo $_entry - $_filetype
if [ $n_flag -eq 0 ]; then
echo $_entry - $_filetype >> $SAVEFILE
fi
}
#
# This function can be considered to be the log_entry function for regular files.
#
get_file_stats()
{
local _file
_file=$1
if [ "$2" = "readable" ] && [ "$3" = "fullpath" ]; then
echo $_file - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: `${CHECKSUM} $_file | awk '{ print $1 }'`
if [ $n_flag -eq 0 ]; then
echo $_file - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: `${CHECKSUM} $_file | awk '{ print $1 }'` >> $SAVEFILE
fi
elif [ "$2" = "readable" ] && [ "$3" = "filename_only" ]; then
echo $(get_filename $_file) - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: `${CHECKSUM} $_file | awk '{ print $1 }'`
if [ $n_flag -eq 0 ]; then
echo $(get_filename $_file) - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: `${CHECKSUM} $_file | awk '{ print $1 }'` >> $SAVEFILE
fi
elif [ "$2" = "unreadable" ] && [ "$3" = "fullpath" ]; then
echo $_file - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: not readable
if [ $n_flag -eq 0 ]; then
echo $_file - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: not readable >> $SAVEFILE
fi
elif [ "$2" = "unreadable" ] && [ "$3" = "filename_only" ]; then
echo $(get_filename $_file) - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: not readable
if [ $n_flag -eq 0 ]; then
echo $(get_filename $_file) - regular file - Size: `ls -l $_file | awk '{ print $5 }'` bytes - \
${CHECKSUM}: not readable >> $SAVEFILE
fi
fi
}
#
# When invoking mapdir with the -f switch, we only log the filename.
#
get_filename()
{
local _filename
# Use awk to get the last "/" character and extract the part to the right of it.
_filename=$(awk -v filename=$1 'BEGIN {
n = split(filename, a, "/");
print a[n];
}')
echo $_filename
}
#
# Print the structure of the argument passed to mapdir to stdout as a pretty output tree.
# Invoked with the -p switch. Needs -f to work.
#
pretty_output()
{
local _entry _space _symboltype _binstring _str _box_rh _box_hl _box_vl _box_mh
# http://jrgraphix.net/r/Unicode/2500-257F
_box_rh=$(echo -e "\u2514") # └
_box_hl=$(echo -e "\u2500") # ─
_box_vl=$(echo -e "\u2502") # │
_box_mh=$(echo -e "\u251C") # ├
_entry=$(get_filename $1)
_symboltype=$2
_binstring=$3
_space=" "
_str=""
# Examine the binstring and prepare the middle part of the entry.
if [ ! -z $_binstring ]; then
for i in $(seq 0 1 $((${#_binstring}-1))) ; do
if [ $i -eq 0 ]; then
continue
elif [ "${_binstring:i:1}" = "0" ]; then
_str="$_str "
elif [ "${_binstring:i:1}" = "1" ]; then
_str="$_str$_box_vl"
fi
done
fi
# Now prettyprint the entry.
if [ $TOTAL_FILES -eq 1 ]; then
echo "$_box_hl$_box_hl$_entry"
if [ $n_flag -eq 0 ]; then
echo "$_box_hl$_box_hl$_entry" >> $SAVEFILE
fi
elif [ "$_symboltype" = "middleentry" ]; then
echo "$_space$_str$_box_mh$_entry"
if [ $n_flag -eq 0 ]; then
echo "$_space$_str$_box_mh$_entry" >> $SAVEFILE
fi
elif [ "$_symboltype" = "lastentry" ]; then
echo "$_space$_str$_box_rh$_entry"
if [ $n_flag -eq 0 ]; then
echo "$_space$_str$_box_rh$_entry" >> $SAVEFILE
fi
fi
}
check_savefile_existence()
{
local choice
if [ -e $1 ] ; then # Confirm
echo -n "The savefile: $1 already exists. Do you want to overwrite it Yes/No? "
read -t 30 choice # We got 30 seconds to make a choice
case $choice in
[Yy][Ee][Ss] | [Yy] )
return
;;
[Nn][Oo] | [Nn] )
echo 'Aborted!'
exit 1
;;
*)
echo "No input received. Terminating."
exit 1
;;
esac
fi
}
usage()
{
echo "usage: mapdir [-dfhnpt] [-e excludes] [-s savefile] \
[-x excludes_file] [file]||[directory]"
exit 1
}
### Point of entry ###
while getopts ":de:fhnps:tx:" opt ; do
case $opt in
d)
d_flag=1
;;
e)
e_flag=1
e_list=$OPTARG
;;
f)
f_flag=1
;;
h)
h_flag=1
;;
n)
n_flag=1
;;
p)
p_flag=1
;;
s)
s_flag=1
s_file=$OPTARG
;;
t)
t_flag=1
;;
x)
x_flag=1
x_file=$OPTARG
;;
\?)
echo "unknown flag: -$OPTARG."
usage
exit
;;
esac
done
shift $((OPTIND-1))
# If -p == 1 => -f == 1
if [ $p_flag -eq 1 ] && [ $f_flag -eq 0 ] ; then
echo "The -p option can only be used with the -f option."
exit
fi
# Allowing (-p && -e) || (-p && -x) would break the formatting of the output tree.
if ([ $p_flag -eq 1 ] && [ $e_flag -eq 1 ]) || ([ $p_flag -eq 1 ] && [ $x_flag -eq 1 ]) ; then
echo "The -p option cannot be used with the -e or -x options."
exit
fi
# Process the other options
if [ $d_flag -eq 0 ]; then
shopt -s dotglob nullglob
else
shopt -s nullglob
fi
if [ $h_flag -eq 1 ]; then # Change CHECKSUM to sha256sum
CHECKSUM=sha256sum
else
CHECKSUM=md5sum
fi
if [ $e_flag -eq 1 ]; then
for i in $e_list ; do
EXCLUDES="$i $EXCLUDES"
done
fi
if [ $x_flag -eq 1 ]; then
while read line
do
EXCLUDES="$line $EXCLUDES"
done < "$x_file"
fi
#
# If an argument is given, take it, otherwise process the current directory.
#
if [ $# -eq 1 ]; then
READLINK=$(readlink -f $1)
if [ ! -e $READLINK ]; then
echo "The file: $READLINK doesn't exist."
usage
fi
if [ $n_flag -eq 0 ]; then
if [ $s_flag -eq 0 ]; then
SAVEFILE=~/mapdir$(readlink -f $1 | sed s#/#_#g)_$DATE.txt
check_savefile_existence $SAVEFILE
: > $SAVEFILE
else
SAVEFILE=~/${s_file}
check_savefile_existence $SAVEFILE
: > $SAVEFILE
fi
fi
else
READLINK=$(readlink -f ./)
if [ $n_flag -eq 0 ]; then
if [ $s_flag -eq 0 ]; then
SAVEFILE=~/mapdir$(pwd | sed s#/#_#g)_$DATE.txt
check_savefile_existence $SAVEFILE
:> $SAVEFILE
else
SAVEFILE=~/${s_file}
check_savefile_existence $SAVEFILE
: > $SAVEFILE
fi
fi
fi
# When calling the readfile function for the first time, we pass a second argument
# of "1" to it. This serves the purpose of pleasing the diff utility when invoking
# mapdir with the -f switch and having a directory as the first argument. If we
# mapped the starting directory to the $SAVEFILE and later compared it with
# another $SAVEFILE, the diff utility would obviously exit with a return value
# other than 0, even though the contents of the directories may be truly equivalent.
# Consider for example the folders:
#
# /media/filesystem_a and /media/filesystem_b that both have the same content.
#
# The logic is as follows: if the file is a folder, then the readfile function detects
# this in the "is directory" part and skips mapping its occurrence to the $SAVEFILE.
# For subsequent calls to readfile we will pass a second argument of "0" to the
# function, which this time maps it.
# Start processing the file/folder...
readfile $READLINK 1 $DEPTH
#
# At this point, we are done parsing. Now print statistics if desired as per the -t flag.
#
if [ $t_flag -eq 1 ]; then
echo
if [ $n_flag -eq 0 ]; then
echo >> $SAVEFILE
fi
STRLEN="########## Statistics for $READLINK ##########"
echo $STRLEN
if [ $n_flag -eq 0 ]; then
if [ $f_flag -eq 0 ]; then
echo $STRLEN >> $SAVEFILE
else
echo "########## Statistics ##########" >> $SAVEFILE
fi
fi
if [ $DIRS -ne 0 ]; then
echo Number of directories: $DIRS
if [ $n_flag -eq 0 ]; then
echo Number of directories: $DIRS >> $SAVEFILE
fi
fi
if [ $UNREADABLE_DIRS -ne 0 ]; then
echo Number of unreadable directories: $UNREADABLE_DIRS
if [ $n_flag -eq 0 ]; then
echo Number of unreadable directories: $UNREADABLE_DIRS >> $SAVEFILE
fi
fi
if [ $FILES -ne 0 ]; then
echo Number of regular files: $FILES
if [ $n_flag -eq 0 ]; then
echo Number of regular files: $FILES >> $SAVEFILE
fi
fi
if [ $UNREADABLE_FILES -ne 0 ]; then
echo Number of unreadable files: $UNREADABLE_FILES
if [ $n_flag -eq 0 ]; then
echo Number of unreadable files: $UNREADABLE_FILES >> $SAVEFILE
fi
fi
if [ $BLOCK_SPECIALS -ne 0 ]; then
echo Number of block special files: $BLOCK_SPECIALS
if [ $n_flag -eq 0 ]; then
echo Number of block special files: $BLOCK_SPECIALS >> $SAVEFILE
fi
fi
if [ $CHARACTER_SPECIALS -ne 0 ]; then
echo Number of character special files: $CHARACTER_SPECIALS
if [ $n_flag -eq 0 ]; then
echo Number of character special files: $CHARACTER_SPECIALS >> $SAVEFILE
fi
fi
if [ $PIPES -ne 0 ]; then
echo Number of pipes: $PIPES
if [ $n_flag -eq 0 ]; then
echo Number of pipes: $PIPES >> $SAVEFILE
fi
fi
if [ $SOCKETS -ne 0 ]; then
echo Number of sockets: $SOCKETS
if [ $n_flag -eq 0 ]; then
echo Number of sockets: $SOCKETS >> $SAVEFILE
fi
fi
if [ $SYMBOLIC_LINKS -ne 0 ]; then
echo Number of symbolic links: $SYMBOLIC_LINKS
if [ $n_flag -eq 0 ]; then
echo Number of symbolic links: $SYMBOLIC_LINKS >> $SAVEFILE
fi
fi
if [ $UNKNOWN -ne 0 ]; then
echo Number of unknown files: $UNKNOWN
if [ $n_flag -eq 0 ]; then
echo Number of unknown files: $UNKNOWN >> $SAVEFILE
fi
fi
if [ $TOTAL_FILES -ne 0 ]; then
echo Total number of files: $TOTAL_FILES
if [ $n_flag -eq 0 ]; then
echo Total number of files: $TOTAL_FILES >> $SAVEFILE
fi
fi
# Formatted output
STRLEN=${#STRLEN}
while [ $STRLEN -gt 0 ]
do
echo -n "#"
if [ $n_flag -eq 0 ]; then
if [ $f_flag -eq 0 ]; then
echo -n "#" >> $SAVEFILE
fi
fi
STRLEN=$((STRLEN-1))
done
echo
if [ $n_flag -eq 0 ]; then
if [ $f_flag -eq 0 ]; then
echo >> $SAVEFILE
else
echo "################################" >> $SAVEFILE
fi
fi
fi # t_flag
IFS=$SAVEIFS # reset the old IFS
exit 0
The following script, mapdircmp, can be used to verify the consistency of two directory trees that are assumed to be equal. It runs mapdir on both trees and compares the resulting savefiles with diff.
#!/usr/bin/env bash
# Note: the script relies on bash features such as read -t.
# Copyright (c) 2020 Oliver Mahmoudi
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted providing that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
### mapdircmp - a comparison utility for mapdir(1)
k_flag=0 # keep savefiles flag
v_flag=0 # verbose flag
x_flag=0 # excludes
x_file=
exit_status=0
check_savefile_existence()
{
local _savefile _choice
_savefile=$1
if [ -e $_savefile ] ; then # Confirm
echo -n "The savefile: $_savefile already exists. Do you want to overwrite it Yes/No? "
read -t 30 _choice # We got 30 seconds to make a choice
case $_choice in
[Yy][Ee][Ss] | [Yy] )
if [ $v_flag -eq 1 ] ; then
rm -v $_savefile
else
:> $_savefile
fi
;;
[Nn][Oo] | [Nn] )
echo 'Aborted!'
exit 1
;;
*)
echo "No input received. Terminating."
exit 1
;;
esac
fi
}
process_files()
{
local _savefile _directory _excludes_file
_savefile=$1
_directory=$2
check_savefile_existence ~/$_savefile
if [ $v_flag -eq 1 ] ; then
if [ $x_flag -eq 1 ] ; then
_excludes_file=$3
mapdir -fht -s $_savefile -x $_excludes_file $_directory
else
mapdir -fht -s $_savefile $_directory
fi
else
if [ $x_flag -eq 1 ] ; then
_excludes_file=$3
mapdir -fht -s $_savefile -x $_excludes_file $_directory > /dev/null 2>&1
if [ $? -ne 0 ] ; then
echo "Error while running: mapdir -fht -s $_savefile -x $_excludes_file $_directory"
exit -1
fi
else
mapdir -fht -s $_savefile $_directory > /dev/null 2>&1
if [ $? -ne 0 ] ; then
echo "Error while running: mapdir -fht -s $_savefile $_directory"
exit -1
fi
fi
fi
}
check_files()
{
local _file1 _file2 _retval
_file1=~/$1
_file2=~/$2
if [ $v_flag -eq 1 ] ; then
diff $_file1 $_file2
else
diff $_file1 $_file2 > /dev/null 2>&1
fi
_retval=$?
if [ $_retval -eq 0 ] && [ $v_flag -eq 1 ] ; then
echo "files are equal"
elif [ $_retval -ne 0 ] && [ $v_flag -eq 1 ] ; then
echo "files differ"
exit_status=100
elif [ $_retval -ne 0 ] && [ $v_flag -eq 0 ] ; then
exit_status=100
fi
}
usage()
{
echo "usage: mapdircmp [-hkv] [-x excludes_file] savefile1 dirtree1 savefile2 dirtree2"
echo " -h: print usage information and exit"
echo " -k: keep savefiles"
echo " -v: be more verbose"
echo " -x excludes_file: files and folders to be excluded"
}
# Point of entry
while getopts ":hkvx:" opt ; do
case $opt in
h)
usage
exit $exit_status
;;
k)
k_flag=1 # keep savefiles flag
;;
v)
v_flag=1 # be more verbose
;;
x)
x_flag=1 # excludes from file flag
x_file=$OPTARG
;;
\?)
echo "unknown flag: -$OPTARG."
usage
exit 1
;;
esac
done
shift $((OPTIND-1))
if [ $# -ne 4 ] ; then
usage
exit -1
fi
# 1st run
if [ $x_flag -eq 1 ] ; then
process_files $1 $2 $x_file
else
process_files $1 $2
fi
# 2nd run
if [ $x_flag -eq 1 ] ; then
process_files $3 $4 $x_file
else
process_files $3 $4
fi
# Check for equality
check_files $1 $3
if [ $k_flag -eq 0 ] ; then
rm ~/$1 ~/$3
fi
exit $exit_status
The above scripts should work on most UNIX-like systems that provide bash and the utilities they call, such as md5sum, sha256sum and readlink -f.
Let's watch it in action:
First, let's create a directory structure and a few files:
[om@pc192-168-2-119 ~]$ mkdir testdir
[om@pc192-168-2-119 ~]$ cd testdir/
[om@pc192-168-2-119 testdir]$ mkdir a_dir
[om@pc192-168-2-119 testdir]$ mkdir b_dir
[om@pc192-168-2-119 testdir]$ mkdir c_dir
[om@pc192-168-2-119 testdir]$ cd a_dir/
[om@pc192-168-2-119 a_dir]$ touch file_1.txt
[om@pc192-168-2-119 a_dir]$ echo "hello world" > file_2.txt
[om@pc192-168-2-119 a_dir]$ cd ../b_dir/
[om@pc192-168-2-119 b_dir]$ touch file_3.txt
[om@pc192-168-2-119 b_dir]$ echo "hello world number 2" > file_4.txt
[om@pc192-168-2-119 b_dir]$ cd ../c_dir/
[om@pc192-168-2-119 c_dir]$ echo "more stuff in this file" > file_5.txt
[om@pc192-168-2-119 c_dir]$ echo "yet more stuff here" > file_6.txt
Now we can run mapdir on the testdir directory:
[om@pc192-168-2-119 ~]$ mapdir testdir
Mapping structure of: /home/om/testdir
/home/om/testdir - directory
/home/om/testdir/a_dir - directory
/home/om/testdir/a_dir/file_1.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
/home/om/testdir/a_dir/file_2.txt - regular file - Size: 12 bytes - MD5: 6f5902ac237024bdd0c176cb93063dc4
/home/om/testdir/b_dir - directory
/home/om/testdir/b_dir/file_3.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
/home/om/testdir/b_dir/file_4.txt - regular file - Size: 21 bytes - MD5: 1d64c0e7aa142fe642b94eac89c52388
/home/om/testdir/c_dir - directory
/home/om/testdir/c_dir/file_5.txt - regular file - Size: 24 bytes - MD5: 05a0df1800afb8f3fc30460c74ac21a3
/home/om/testdir/c_dir/file_6.txt - regular file - Size: 20 bytes - MD5: 66df841f6f18d6deb62f767ba1ae884a
########## Statistics for /home/om/testdir ##########
Number of directories: 4
Number of regular files: 6
#####################################################
Done!
[om@pc192-168-2-119 ~]$
This created the following report file in our home directory:
[om@pc192-168-2-119 ~]$ ll mapdir_home_om_testdir_01112016.txt
-rw-rw-r--. 1 om om 974 Jan 11 21:50 mapdir_home_om_testdir_01112016.txt
[om@pc192-168-2-119 ~]$
One thing to note here is that the md5 sum of a file depends only on the file's contents, not on its name: /home/om/testdir/a_dir/file_1.txt and /home/om/testdir/b_dir/file_3.txt are two different files, but because both are empty they have the same md5 sum.
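This is easy to reproduce in a scratch directory. The sketch below assumes that md5sum and mktemp are available; checksum_of is a hypothetical helper name and is not part of mapdir.

```shell
#!/bin/sh
# checksum_of is a hypothetical helper: it prints only the md5 checksum of a file.
checksum_of() {
    md5sum < "$1" | awk '{ print $1 }'
}

demo_dir=$(mktemp -d)
: > "$demo_dir/file_1.txt"                  # empty file
: > "$demo_dir/file_3.txt"                  # another empty file with a different name
echo "hello world" > "$demo_dir/file_2.txt"

# Same contents, different names: the checksums are equal.
[ "$(checksum_of "$demo_dir/file_1.txt")" = "$(checksum_of "$demo_dir/file_3.txt")" ] \
    && echo "file_1.txt and file_3.txt: same checksum"

# Different contents: the checksums differ.
[ "$(checksum_of "$demo_dir/file_1.txt")" != "$(checksum_of "$demo_dir/file_2.txt")" ] \
    && echo "file_1.txt and file_2.txt: different checksum"

rm -rf "$demo_dir"
```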
However, this kind of report file is not suitable for passing to diff when the same directory structure is located in another folder or under another mount point, because every line contains the full path. For this reason mapdir provides the -f switch. When invoked with -f, the script does essentially the same thing, but logs only the name of each file instead of its entire path.
[om@pc192-168-2-119 ~]$ rm mapdir_home_om_testdir_01112016.txt
[om@pc192-168-2-119 ~]$ mapdir -f testdir
Mapping structure of: /home/om/testdir
a_dir - directory
file_1.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
file_2.txt - regular file - Size: 12 bytes - MD5: 6f5902ac237024bdd0c176cb93063dc4
b_dir - directory
file_3.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
file_4.txt - regular file - Size: 21 bytes - MD5: 1d64c0e7aa142fe642b94eac89c52388
c_dir - directory
file_5.txt - regular file - Size: 24 bytes - MD5: 05a0df1800afb8f3fc30460c74ac21a3
file_6.txt - regular file - Size: 20 bytes - MD5: 66df841f6f18d6deb62f767ba1ae884a
########## Statistics for /home/om/testdir ##########
Number of directories: 4
Number of regular files: 6
#####################################################
Done!
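The filename-only entries printed with -f can be derived from a full path in several ways. mapdir's get_filename function does it with awk; the following sketch uses shell parameter expansion instead. The helper name filename_only is an assumption for illustration and is not part of mapdir.

```shell
#!/bin/sh
# filename_only is a hypothetical alternative to mapdir's awk-based get_filename.
filename_only() {
    fo_path=${1%/}                  # drop one trailing slash, if there is one
    printf '%s\n' "${fo_path##*/}"  # remove everything up to the last "/"
}

filename_only /home/om/testdir/a_dir/file_1.txt   # prints file_1.txt
filename_only /home/om/testdir/a_dir/             # prints a_dir
```

Parameter expansion avoids starting an awk process for every entry, which may matter on large trees.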
Now let's create another folder with the same directory structure:
[om@pc192-168-2-119 ~]$ mkdir testdir2
[om@pc192-168-2-119 ~]$ cd testdir
[om@pc192-168-2-119 testdir]$ cp -r * ../testdir2
And run mapdir on that directory as well:
[om@pc192-168-2-119 testdir]$ cd ~/
[om@pc192-168-2-119 ~]$ mapdir -f testdir2
Mapping structure of: /home/om/testdir2
a_dir - directory
file_1.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
file_2.txt - regular file - Size: 12 bytes - MD5: 6f5902ac237024bdd0c176cb93063dc4
b_dir - directory
file_3.txt - regular file - Size: 0 bytes - MD5: d41d8cd98f00b204e9800998ecf8427e
file_4.txt - regular file - Size: 21 bytes - MD5: 1d64c0e7aa142fe642b94eac89c52388
c_dir - directory
file_5.txt - regular file - Size: 24 bytes - MD5: 05a0df1800afb8f3fc30460c74ac21a3
file_6.txt - regular file - Size: 20 bytes - MD5: 66df841f6f18d6deb62f767ba1ae884a
########## Statistics for /home/om/testdir2 ##########
Number of directories: 4
Number of regular files: 6
######################################################
Done!
Now we can run the diff utility on the two report files to check whether or not they, and with them the two directory structures, are truly equivalent:
[om@pc192-168-2-119 ~]$ diff mapdir_home_om_testdir_01112016.txt mapdir_home_om_testdir2_01112016.txt
[om@pc192-168-2-119 ~]$ echo $?
0
This is the exit status we wanted: it confirms that the two directory structures are equivalent.
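For comparison, the core idea of mapping two trees and comparing the results can be sketched with find and md5sum alone. This is not how mapdir is implemented, and it is far less thorough: the sketch only looks at regular files and ignores empty directories, links, pipes and other file types. The function names tree_manifest and trees_match are hypothetical.

```shell
#!/bin/sh
# tree_manifest is a hypothetical helper: it lists every regular file below
# the directory $1 as "checksum  ./relative/path", sorted bytewise by path.
tree_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# trees_match returns 0 when both trees produce the same manifest,
# 1 when the manifests differ and 2 when a directory cannot be entered.
trees_match() {
    tm_a=$(tree_manifest "$1") || return 2
    tm_b=$(tree_manifest "$2") || return 2
    [ "$tm_a" = "$tm_b" ]
}
```

Because the paths in the manifest are relative to the directory being mapped, two copies under different mount points produce identical manifests, which is the same reason mapdir offers the -f switch.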
Finally, to map the entire system and obtain as much information as possible in the process, run "mapdir /" as the root user.
The code shown above is for the latest release:
Version 1.2.1:
File: mapdir-1.2.1.tar.gz
sha256sum: 3d18ab2f7fbd94fcd20c4a7d2b548b6235470c0b9cb92be499993e4a58f8db89
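The published sha256sum can be used to check a downloaded archive before unpacking it. The following is a minimal sketch that assumes the sha256sum utility is installed; verify_sha256 is a hypothetical helper name.

```shell
#!/bin/sh
# verify_sha256 is a hypothetical helper.
# Usage: verify_sha256 file expected_checksum
verify_sha256() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1"
        return 2
    fi
    vs_actual=$(sha256sum < "$1" | awk '{ print $1 }')
    if [ "$vs_actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1"
        return 1
    fi
}
```

For the release above, the call would be: verify_sha256 mapdir-1.2.1.tar.gz 3d18ab2f7fbd94fcd20c4a7d2b548b6235470c0b9cb92be499993e4a58f8db89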
If you, however, prefer to obtain the source code from GitHub, you can clone it like this:
git clone https://github.com/olimah/mapdir