Safest and the least vulnerable way to add a user to the sudoers with full rights


demouser ALL=(ALL) PASSWD:ALL, !/bin/su root, !/bin/su -, !/bin/sudo su -, !/bin/su – root, !/bin/sudo su root, !/bin/su, !/usr/bin/passwd, !/usr/bin/passwd root, !/bin/vi /etc/sudoers, !/usr/bin/sudo vi /etc/sudoers, !/usr/sbin/visudo, !/usr/sbin/sudo visudo, !/bin/chmod 777 /etc/sudoers, !/bin/chmod ugo+rwx /etc/sudoers



Monitoring Commands in Linux – Intermediate Level

1: top – Process Activity Command

The top program provides a dynamic real-time view of a running system i.e. actual process activity. By default, it displays the most CPU-intensive tasks running on the server and updates the list every five seconds.

Commonly Used Hot Keys

The top command provides several useful hot keys:

Hot Key Usage
t Displays summary information off and on.
m Displays memory information off and on.
A Sorts the display by top consumers of various system resources. Useful for quick identification of performance-hungry tasks on a system.
f Enters an interactive configuration screen for top. Helpful for setting up top for a specific task.
o Enables you to interactively select the ordering within top.
r Issues renice command.
k Issues kill command.
z Turn on or off color/mono
=> Related: How do I Find Out Linux CPU Utilization?

2: vmstat – System Activity, Hardware and System Information

The command vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity.
# vmstat 3
Sample Outputs:

procs ———–memory———- —swap– —–io—- –system– —–cpu——
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 2540988 522188 5130400 0 0 2 32 4 2 4 1 96 0 0
1 0 0 2540988 522188 5130400 0 0 0 720 1199 665 1 0 99 0 0
0 0 0 2540956 522188 5130400 0 0 0 0 1151 1569 4 1 95 0 0
0 0 0 2540956 522188 5130500 0 0 0 6 1117 439 1 0 99 0 0
0 0 0 2540940 522188 5130512 0 0 0 536 1189 932 1 0 98 0 0
0 0 0 2538444 522188 5130588 0 0 0 0 1187 1417 4 1 96 0 0
0 0 0 2490060 522188 5130640 0 0 0 18 1253 1123 5 1 94 0 0
Display Memory Utilization Slabinfo

# vmstat -m

Get Information About Active / Inactive Memory Pages

# vmstat -a
=> Related: How do I find out Linux Resource utilization to detect system bottlenecks?

3: w – Find Out Who Is Logged on And What They Are Doing

w command displays information about the users currently on the machine, and their processes.
# w username
# w vivek
Sample Outputs:

17:58:47 up 5 days, 20:28, 2 users, load average: 0.36, 0.26, 0.24
root pts/0 14:55 5.00s 0.04s 0.02s vim /etc/resolv.conf
root pts/1 17:43 0.00s 0.03s 0.00s w

4: uptime – Tell How Long The System Has Been Running

The uptime command can be used to see how long the server has been running. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
# uptime

18:02:41 up 41 days, 23:42, 1 user, load average: 0.00, 0.00, 0.00
1 can be considered as optimal load value. The load can change from system to system. For a single CPU system 1 – 3 and SMP systems 6-10 load value might be acceptable.

5: ps – Displays The Processes

ps command will report a snapshot of the current processes. To select all processes use the -A or -e option:
# ps -A
Sample Outputs:

1 ? 00:00:02 init
2 ? 00:00:02 migration/0
3 ? 00:00:01 ksoftirqd/0
4 ? 00:00:00 watchdog/0
5 ? 00:00:00 migration/1
6 ? 00:00:15 ksoftirqd/1
4881 ? 00:53:28 java
4885 tty1 00:00:00 mingetty
4886 tty2 00:00:00 mingetty
4887 tty3 00:00:00 mingetty
4888 tty4 00:00:00 mingetty
4891 tty5 00:00:00 mingetty
4892 tty6 00:00:00 mingetty
4893 ttyS1 00:00:00 agetty
12853 ? 00:00:00 cifsoplockd
12854 ? 00:00:00 cifsdnotifyd
14231 ? 00:10:34 lighttpd
14232 ? 00:00:00 php-cgi
54981 pts/0 00:00:00 vim
55465 ? 00:00:00 php-cgi
55546 ? 00:00:00 bind9-snmp-stat
55704 pts/1 00:00:00 ps
ps is just like top but provides more information.

Show Long Format Output

# ps -Al
To turn on extra full mode (it will show command line arguments passed to process):
# ps -AlF

To See Threads ( LWP and NLWP)

# ps -AlFH

To See Threads After Processes

# ps -AlLm

Print All Process On The Server

# ps ax
# ps axu

Print A Process Tree

# ps -ejH
# ps axjf
# pstree

Print Security Information

# ps -eo euser,ruser,suser,fuser,f,comm,label
# ps axZ
# ps -eM

See Every Process Running As User Vivek

# ps -U vivek -u vivek u

Set Output In a User-Defined Format

# ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
# ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
# ps -eopid,tt,user,fname,tmout,f,wchan

Display Only The Process IDs of Lighttpd

# ps -C lighttpd -o pid=
# pgrep lighttpd
# pgrep -u vivek php-cgi

Display The Name of PID 55977

# ps -p 55977 -o comm=

Find Out The Top 10 Memory Consuming Process

# ps -auxf | sort -nr -k 4 | head -10

Find Out top 10 CPU Consuming Process

# ps -auxf | sort -nr -k 3 | head -10

6: free – Memory Usage

The command free displays the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel.
# free
Sample Output:

total used free shared buffers cached
Mem: 12302896 9739664 2563232 0 523124 5154740
-/+ buffers/cache: 4061800 8241096
Swap: 1052248 0 1052248
=> Related: :

Linux Find Out Virtual Memory PAGESIZE
Linux Limit CPU Usage Per Process
How much RAM does my Ubuntu / Fedora Linux desktop PC have?

7: iostat – Average CPU Load, Disk Activity

The command iostat report Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions and network filesystems (NFS).
# iostat
Sample Outputs:

Linux 2.6.18-128.1.14.el5 ( 06/26/2009
avg-cpu: %user %nice %system %iowait %steal %idle
3.50 0.09 0.51 0.03 0.00 95.86
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 22.04 31.88 512.03 16193351 260102868
sda1 0.00 0.00 0.00 2166 180
sda2 22.04 31.87 512.03 16189010 260102688
sda3 0.00 0.00 0.00 1615 0
=> Related: : Linux Track NFS Directory / Disk I/O Stats

8: sar – Collect and Report System Activity

The sar command is used to collect, report, and save system activity information. To see network counter, enter:
# sar -n DEV | more
To display the network counters from the 24th:
# sar -n DEV -f /var/log/sa/sa24 | more
You can also display real time usage using sar:
# sar 4 5
Sample Outputs:

Linux 2.6.18-128.1.14.el5 ( 06/26/2009
06:45:12 PM CPU %user %nice %system %iowait %steal %idle
06:45:16 PM all 2.00 0.00 0.22 0.00 0.00 97.78
06:45:20 PM all 2.07 0.00 0.38 0.03 0.00 97.52
06:45:24 PM all 0.94 0.00 0.28 0.00 0.00 98.78
06:45:28 PM all 1.56 0.00 0.22 0.00 0.00 98.22
06:45:32 PM all 3.53 0.00 0.25 0.03 0.00 96.19
Average: all 2.02 0.00 0.27 0.01 0.00 97.70
=> Related: : How to collect Linux system utilization data into a file

9: mpstat – Multiprocessor Usage

The mpstat command displays activities for each available processor, processor 0 being the first one. mpstat -P ALL to display average CPU utilization per processor:
# mpstat -P ALL
Sample Output:

Linux 2.6.18-128.1.14.el5 ( 06/26/2009
06:48:11 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
06:48:11 PM all 3.50 0.09 0.34 0.03 0.01 0.17 0.00 95.86 1218.04
06:48:11 PM 0 3.44 0.08 0.31 0.02 0.00 0.12 0.00 96.04 1000.31
06:48:11 PM 1 3.10 0.08 0.32 0.09 0.02 0.11 0.00 96.28 34.93
06:48:11 PM 2 4.16 0.11 0.36 0.02 0.00 0.11 0.00 95.25 0.00
06:48:11 PM 3 3.77 0.11 0.38 0.03 0.01 0.24 0.00 95.46 44.80
06:48:11 PM 4 2.96 0.07 0.29 0.04 0.02 0.10 0.00 96.52 25.91
06:48:11 PM 5 3.26 0.08 0.28 0.03 0.01 0.10 0.00 96.23 14.98
06:48:11 PM 6 4.00 0.10 0.34 0.01 0.00 0.13 0.00 95.42 3.75
06:48:11 PM 7 3.30 0.11 0.39 0.03 0.01 0.46 0.00 95.69 76.89
=> Related: : Linux display each multiple SMP CPU processors utilization individually.

10: pmap – Process Memory Usage

The command pmap report memory map of a process. Use this command to find out causes of memory bottlenecks.
# pmap -d PID
To display process memory information for pid # 47394, enter:
# pmap -d 47394
Sample Outputs:

47394: /usr/bin/php-cgi
Address Kbytes Mode Offset Device Mapping
0000000000400000 2584 r-x– 0000000000000000 008:00002 php-cgi
0000000000886000 140 rw— 0000000000286000 008:00002 php-cgi
00000000008a9000 52 rw— 00000000008a9000 000:00000 [ anon ]
0000000000aa8000 76 rw— 00000000002a8000 008:00002 php-cgi
000000000f678000 1980 rw— 000000000f678000 000:00000 [ anon ]
000000314a600000 112 r-x– 0000000000000000 008:00002
000000314a81b000 4 r—- 000000000001b000 008:00002
000000314a81c000 4 rw— 000000000001c000 008:00002
000000314aa00000 1328 r-x– 0000000000000000 008:00002
000000314ab4c000 2048 —– 000000000014c000 008:00002
00002af8d48fd000 4 rw— 0000000000006000 008:00002
00002af8d490c000 40 r-x– 0000000000000000 008:00002
00002af8d4916000 2044 —– 000000000000a000 008:00002
00002af8d4b15000 4 r—- 0000000000009000 008:00002
00002af8d4b16000 4 rw— 000000000000a000 008:00002
00002af8d4b17000 768000 rw-s- 0000000000000000 000:00009 zero (deleted)
00007fffc95fe000 84 rw— 00007ffffffea000 000:00000 [ stack ]
ffffffffff600000 8192 —– 0000000000000000 000:00000 [ anon ]
mapped: 933712K writeable/private: 4304K shared: 768000K
The last line is very important:

mapped: 933712K total amount of memory mapped to files
writeable/private: 4304K the amount of private address space
shared: 768000K the amount of address space this process is sharing with others
=> Related: : Linux find the memory used by a program / process using pmap command

11: netstat and ss – Network Statistics

The command netstat displays network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. ss command is used to dump socket statistics. It allows showing information similar to netstat. See the following resources about ss and netstat commands:

ss: Display Linux TCP / UDP Network and Socket Information
Get Detailed Information About Particular IP address Connections Using netstat Command

12: iptraf – Real-time Network Statistics

The iptraf command is interactive colorful IP LAN monitor. It is an ncurses-based IP LAN monitor that generates various network statistics including TCP info, UDP counts, ICMP and OSPF information, Ethernet load info, node stats, IP checksum errors, and others. It can provide the following info in easy to read format:

Network traffic statistics by TCP connection
IP traffic statistics by network interface
Network traffic statistics by protocol
Network traffic statistics by TCP/UDP port and by packet size
Network traffic statistics by Layer2 address

13: tcpdump – Detailed Network Traffic Analysis

The tcpdump is simple command that dump traffic on a network. However, you need good understanding of TCP/IP protocol to utilize this tool. For.e.g to display traffic info about DNS, enter:
# tcpdump -i eth1 ‘udp port 53’
To display all IPv4 HTTP packets to and from port 80, i.e. print only packets that contain data, not, for example, SYN and FIN packets and ACK-only packets, enter:
# tcpdump ‘tcp port 80 and (((ip[2:2] – ((ip[0]&0xf)<<2)) – ((tcp[12]&0xf0)>>2)) != 0)’
To display all FTP session to, enter:
# tcpdump -i eth1 ‘dst and (port 21 or 20’
To display all HTTP session to
# tcpdump -ni eth0 ‘dst and tcp and port http’
Use wireshark to view detailed information about files, enter:
# tcpdump -n -i eth1 -s 0 -w output.txt src or dst port 80

14: strace – System Calls

Trace system calls and signals. This is useful for debugging webserver and other server problems. See how to use to trace the process and see What it is doing.

15: /Proc file system – Various Kernel Statistics

/proc file system provides detailed information about various hardware devices and other Linux kernel information. See Linux kernel /proc documentations for further details. Common /proc examples:
# cat /proc/cpuinfo
# cat /proc/meminfo
# cat /proc/zoneinfo
# cat /proc/mounts

Process Affinity – Linux

  • 1. Introduction
  • 2. Types of Thread Scheduling
    • 2.1. Compact Scheduling
    • 2.2. Round-Robin Scheduling
    • 2.3. Stupid Scheduling
  • 3. Defining Affinity
    • 3.1. The Linux-Portable Way (taskset)
    • 3.2. The Other Linux-Portable Way (numactl)
    • 3.3. Using OpenMP Runtime Extensions
    • 3.4. getfreesocket

1. Introduction

Although a compute node or workstation may appear to have 16 cores and 64 GB of DRAM, these resources are not uniformly accessible to your applications. The best application performance is usually obtained by keeping your code’s parallel workers (e.g., threads or MPI processes) as close to the memory on which they are operating as possible. While you might like to think that the Linux thread scheduler would do this automatically for you, the reality is that most HPC applications benefit greatly from a little bit of help in manually placing threads on different processor cores.

To get an idea of what your multithreaded application is doing while it is running, you can use the pscommand.

Assuming your executable is called application.x, you can easily see what cores each thread is using by issuing the following command in bash:

$ for i in $(pgrep application.x); do ps -mo pid,tid,fname,user,psr -p $i;done

The PSR field is the OS identifier for the core each TID (thread id) is utilizing.

2. Types of Thread Scheduling

Certain types of unevenly loaded applications can experience serious performance degradation caused by the Linux scheduler treating high-performance application codes in the same way it would treat a system daemon that might spend most of its time idle.

These sorts of scheduling issues are best described with diagrams. Let’s assume we have compute nodes with two processor sockets, and each processor has four cores:

topology of a dual-socket, quad-core node

When you run a multithreaded application with four threads (or even four serial applications), Linux will schedule those threads for execution by assigning each one to a CPU core. Without being explicitly told how to do this scheduling, Linux may decide to

  1. run thread0 to thread3 on core0 to core3 on socket0
  2. run thread0 and thread1 on core0 and core1 on socket0, and run thread2 and thread3 on socket1
  3. run thread0 and thread1 on core0 only, run thread2 on core1, run thread3 on core2, and leave core3 completely unutilized
  4. any number of other nonsensical allocations involving assigning multiple threads to a single core while other cores sit idle

It should be obvious that option #3 and #4 are very bad for performance, but the fact is that Linux will happily schedule your multithreaded job (or multiple single-thread jobs) this way if your threads behave in a way that is confusing to the operating system.

compact scheduling

2.1. Compact Scheduling

Option #1 is often referred to as “compact” scheduling and is depicted in the diagram to the right. It keeps all of your threads running on a single physical processor if possible, and this is what you would want if all of the threads in your application need to repeatedly access different parts of a large array. This is because all of the cores on the same physical processor can access the memory banks associated with (or “owned by”) that processor at the same speed. However, cores cannot access memory stored on memory banks owned by a different processor as quickly; this is phenomenon is called NUMA (non-uniform memory access). If your threads all need to access data stored in the memory owned by one processor, it is often best to put all of your threads on the processor who owns that memory.

2.2. Round-Robin Scheduling

scatter or round-robin scheduling

Option #2 is called “scatter” or “round-robin” scheduling and is ideal if your threads are largely independent of each other and don’t need to access a lot of memory that other threads need. The benefit to round-robin thread scheduling is that not all threads have to share the same memory channel and cache, effectively doubling the memory bandwidth and cache sizes available to your application. The tradeoff is that memory latency becomes higher as threads have to start accessing memory that might be owned by another processor.

2.3. Stupid Scheduling

stupid scheduling

Option #3 and #4 are what I call “stupid” scheduling (see diagram to the right) and can often be the default behavior of the Linux thread scheduler if you don’t tell Linux where your threads should run. This happens because in traditional Linux server environments, most of the proceses that are running at any given time aren’t doing anything. To conserve power, Linux will put a lot of these quiet processes on the same processor or cores, then move them to their own dedicated core when they wake up and have to start processing.

If your application is running at full bore 100% of the time, Linux will probably keep it on its own dedicated CPU core. However, if your application has an uneven load (e.g., threads are mostly idle while the last thread finishes), Linux will see that the application is mostly quiet and pack all the quiet threads (e.g., t0 and t1 in the diagram to the right) on to the same CPU core. This wouldn’t be so bad, but the cost of moving a thread from one core to another requires context switches which get very expensive when done hundreds or thousands of times a minute.

3. Defining affinity

3.1. The Linux-Portable Way (taskset)

If you want to launch a job (e.g., simulation.x) on a certain set of cores (e.g., core0, core2, core4, and core6), issue

$ taskset -c 0,2,4,6 simulation.x

If your process is already running, you can define thread affinity while in flight. It also lets you bind specific TIDs to specific processors at a level of granularity greater than specifying -c 0,2,4,6because Linux may still schedule two threads on core2 and nothing on core0. For example,

$ for i in $(pgrep application.x);do ps -mo pid,tid,fname,user,psr -p $i;done
21654     - applicat glock      -
    - 21654 -        glock      0
    - 21655 -        glock      2
    - 21656 -        glock      2
    - 21657 -        glock      6
    - 21658 -        glock      4
$ taskset -p -c 0 21654
$ taskset -p -c 0 21655
$ taskset -p -c 2 21656
$ taskset -p -c 4 21657
$ taskset -p -c 6 21658

This sort of scheduling will happen under certain conditions, so specifying a set of cpus to a set of threads without specifically assigning each thread to a physical core may not always behave optimally.

3.2. The Other Linux-Portable Way (numactl)

The emerging standard for easily binding processes to processors on Linux-based supercomputers isnumactl. It can operate on a coarser-grained basis (i.e., CPU sockets rather than individual CPU cores) than taskset (only CPU cores) because it is aware of the processor topology and how the CPU cores map to CPU sockets. Using numactl is typically easier–after all, the common goal is to confine a process to a numa pool (or “cpu node”) rather than specific CPU cores. To that end, numactl also lets you bind a processor’s memory locality to prevent processes from having to jump across NUMA pools (called “memory nodes” in numactl parlance).

Whereas if you wanted to bind a specific process to one processor socket with taskset you would have to

$ taskset -c 0,2,4,6 simulation.x

the same operation is greatly simplified with numactl:

$ numactl --cpunodebind=0 simulation.x

If you want to also restrict simulation.x’s memory use to the numa pool associated with cpu node 0, you can do

$ numactl --cpunodebind=0 --membind=0 simulation.x

or just

$ numactl -C 0 -N 0 simulation.x

You can see what cpu nodes and their corresponding memory nodes are available on your system by using numactl -H:

$ numactl -H
available: 2 nodes (0-1)
node 0 size: 32728 MB
node 0 free: 12519 MB
node 1 size: 32768 MB
node 1 free: 16180 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10

numactl also lets you supply specific cores (like taskset) with the –physcpubind or -C. Unlike taskset, though, numactl does not appear to let you change the CPU affinity of a process that is already running.

An alternative syntax to numactl -C is something like

$ numactl -C +0,1,2,3 simulation.x

By prefixing your list of cores with a +, you can have numactl bind to relative cores. When combined with cpusets (which are enabled by default for all jobs on Gordon), the above command will use the 0th, 1st, 2nd, and 3rd core of the job’s given cpuset instead of literally core 0,1,2,3.

3.3. Using OpenMP Runtime Extensions

OpenMP 4.0 now includes standardized controls for binding threads to cores. I haven’t caught up with these changes but I will document them here once I do.

Multithreaded programs compiled with Intel Compilers can utilize Intel’s Thread Affinity Interface for OpenMP applications. Set and export the KMP_AFFINITY env variable to express binding preferences.KMP_AFFINITY has three principal binding strategies:

  • compact fills up one socket before allocating to other sockets
  • scatter evenly spreads threads across all sockets and cores
  • explicit allows you define exactly which cores/sockets to use

Using KMP_AFFINITY=compact will preferentially bind all your threads, one per core, to a single socket before it tries binding them to other sockets. Unfortunately, it will start at socket0 regardless of if other processes (such as another SMP job) is already bound to that socket. You can explicitly specify an offset to force the job to bind to a specific socket, but you need to know exactly what is running on what cores and sockets on your node in order to specify this in your submit script.

You can also explicitly define which cores your job should use. Combined with a little knowledge of your system’s CPU topology ([Intel’s Processor Topology Enumeration tool][intel’s processor enumeration tool] is great for this). If you wanted to run on cores 0, 2, 4, and 6, you would do

export KMP_AFFINITY='proclist=[0,2,4,6],explicit'

GNU’s implementation of OpenMP has a environment variable similar to KMP_AFFINITY calledGOMP_CPU_AFFINITY. Incidentally, Intel’s OpenMP supports GOMP_CPU_AFFINITY, so using this variable may be a relatively portable way to specify thread affinity at runtime. The equivalent GOMP_CPU_AFFINITY for the KMP_AFFINITY I gave above would be:

export GOMP_CPU_AFFINITY='0,2,4,6'

3.4. getfreesocket

I wrote a small perl script called getfreesocket that uses KMP_AFFINITY=explicit (or GOMP_CPU_AFFINITY) and some probing of the Linux OS at runtime to bind SMP jobs to free processor sockets. It should be invoked in a job script something like this:



nprocs=$(grep '^physical id' /proc/cpuinfo  | sort -u | wc -l)
ncores=$(grep '^processor' /proc/cpuinfo | sort -u | wc -l)

freesock=$(getfreesocket -explicit=${NPROCS})
if [ "z$freesock" == "z" ]
  echo "Not enough free processors!  aborting"
  exit 1
  GOMP_CPU_AFFINITY="$(echo $freesock | sed -e 's/,/ /g')"



This was a very simple solution to get single-socket jobs to play nicely on the shared batch system we were using at the Interfacial Molecular Science Laboratory. While numactl is an easier way to accomplish some of this, it still requires that you know what other processes are sharing your node and on what CPU cores they are running. I’ve experienced problems with Linux’s braindead thread scheduling so this getfreesocket finds completely unused sockets that can be fed into taskset,KMP_AFFINITY, or numactl.

This is not as great an issue if your resource manager supports launching jobs within cpusets. Your resource manager will provide a cpuset, and using relative specifiers for numactl cores (e.g., numactl -C +0-3) will bind to the free socket provided by the batch environment. Of course, this will not specifically bind one thread to one core, so using KMP_AFFINITY or GOMP_CPU_AFFINITY may remain necessary.

Concepts Of Linux Programming – Files and Filesystem

The file is the most basic and fundamental abstraction in Linux. Linux follows the everything-is-a-file philosophy (although not as strictly as some other systems, such as Plan 9).Consequently, much interaction occurs via reading of and writing to files, even when the object in question is not what you would consider a normal file.

In order to be accessed, a file must first be opened. Files can be opened for reading, writing, or both. An open file is referenced via a unique descriptor, a mapping from the metadata associated with the open file back to the specific file itself. Inside the Linux kernel, this descriptor is handled by an integer (of the C type int) called the file descriptor, abbreviated fd. File descriptors are shared with user space, and are used directly by user programs to access files. A large part of Linux system programming consists of opening, manipulating, closing, and otherwise using file descriptors.

Regular files

What most of us call “files” are what Linux labels regular files. A regular file contains bytes of data, organized into a linear array called a byte stream. In Linux, no further organization or formatting is specified for a file. The bytes may have any values, and they may be organized within the file in any way. At the system level, Linux does not enforce a structure upon files beyond the byte stream. Some operating systems, such as VMS, provide highly structured files, supporting concepts such as records. Linux does not.

Any of the bytes within a file may be read from or written to. These operations start at a specific byte, which is one’s conceptual “location” within the file. This location is called the file position or file offset. The file position is an essential piece of the metadata that the kernel associates with each open file. When a file is first opened, the file position is zero. Usually, as bytes in the file are read from or written to, byte-by-byte, the file position increases in kind. The file position may also be set manually to a given value, even a value beyond the end of the file. Writing a byte to a file position beyond the end of the file will cause the intervening bytes to be padded with zeros. While it is possible to write bytes in this manner to a position beyond the end of the file, it is not possible to write bytes to a position before the beginning of a file. Such a practice sounds nonsensical, and, indeed, would have little use. The file position starts at zero; it cannot be negative. Writing a byte to the middle of a file overwrites the byte previously located at that offset. Thus, it is not possible to expand a file by writing into the middle of it. Most file writing occurs at the end of the file. The file position’s maximum value is bounded only by the size of the C type used to store it, which is 64 bits on a modern Linux system.

The size of a file is measured in bytes and is called its length. The length, in other words, is simply the number of bytes in the linear array that make up the file. A file’s length can be changed via an operation called truncation. A file can be truncated to a new size smaller than its original size, which results in bytes being removed from the end of the file. Confusingly, given the operation’s name, a file can also be “truncated” to a new size larger than its original size. In that case, the new bytes (which are added to the end of the file) are filled with zeros. A file may be empty (that is, have a length of zero), and thus contain no valid bytes. The maximum file length, as with the maximum file position, is bounded only by limits on the sizes of the C types that the Linux kernel uses to manage files. Specific filesystems, however, may impose their own restrictions, imposing a smaller ceiling on the maximum length.

A single file can be opened more than once, by a different or even the same process. Each open instance of a file is given a unique file descriptor. Conversely, processes can share their file descriptors, allowing a single descriptor to be used by more than one process. The kernel does not impose any restrictions on concurrent file access. Multiple processes are free to read from and write to the same file at the same time. The results of such concurrent accesses rely on the ordering of the individual operations, and are generally unpredictable. User-space programs typically must coordinate amongst themselves to ensure that concurrent file accesses are properly synchronized.

Although files are usually accessed via filenames, they actually are not directly associated with such names. Instead, a file is referenced by an inode (originally short for information node), which is assigned an integer value unique to the filesystem (but not necessarily unique across the whole system). This value is called the inode number, often abbreviated as i-number or ino. An inode stores metadata associated with a file, such as its modification timestamp, owner, type, length, and the location of the file’s data—but no filename! The inode is both a physical object, located on disk in Unix-style filesystems, and a conceptual entity, represented by a data structure in the Linux kernel.

Directories and links

Accessing a file via its inode number is cumbersome (and also a potential security hole), so files are always opened from user space by a name, not an inode number. Directories are used to provide the names with which to access files. A directory acts as a mapping of human-readable names to inode numbers. A name and inode pair is called a link. The physical on-disk form of this mapping—for example, a simple table or a hash—is implemented and managed by the kernel code that supports a given filesystem. Conceptually, a directory is viewed like any normal file, with the difference that it contains only a mapping of names to inodes. The kernel directly uses this mapping to perform name-to-inode resolutions.

When a user-space application requests that a given filename be opened, the kernel opens the directory containing the filename and searches for the given name. From the filename, the kernel obtains the inode number. From the inode number, the inode is found. The inode contains metadata associated with the file, including the on-disk location of the file’s data.

Initially, there is only one directory on the disk, the root directory. This directory is usually denoted by the path /. But, as we all know, there are typically many directories on a system. How does the kernel know whichdirectory to look in to find a given filename?

As mentioned previously, directories are much like regular files. Indeed, they even have associated inodes. Consequently, the links inside of directories can point to the inodes of other directories. This means directories can nest inside of other directories, forming a hierarchy of directories. This, in turn, allows for the use of the pathnames with which all Unix users are familiar—for example,/home/blackbeard/concorde.png.

When the kernel is asked to open a pathname like this, it walks each directory entry (called a dentry inside of the kernel) in the pathname to find the inode of the next entry. In the preceding example, the kernel starts at /, gets the inode for home, goes there, gets the inode for blackbeard, runs there, and finally gets the inode for concorde.png. This operation is called directory or pathname resolution. The Linux kernel also employs a cache, called the dentry cache, to store the results of directory resolutions, providing for speedier lookups in the future given temporal locality.

A pathname that starts at the root directory is said to be fully qualified, and is called an absolute pathname. Some pathnames are not fully qualified; instead, they are provided relative to some other directory (for example, todo/plunder). These paths are called relative pathnames. When provided with a relative pathname, the kernel begins the pathname resolution in the current working directory. From the current working directory, the kernel looks up the directory todo. From there, the kernel gets the inode for plunder. Together, the combination of a relative pathname and the current working directory is fully qualified.

Although directories are treated like normal files, the kernel does not allow them to be opened and manipulated like regular files. Instead, they must be manipulated using a special set of system calls. These system calls allow for the adding and removing of links, which are the only two sensible operations anyhow. If user space were allowed to manipulate directories without the kernel’s mediation, it would be too easy for a single simple error to corrupt the filesystem.

Hard links

Conceptually, nothing covered thus far would prevent multiple names resolving to the same inode. Indeed, this is allowed. When multiple links map different names to the same inode, we call them hard links.

Hard links allow for complex filesystem structures with multiple pathnames pointing to the same data. The hard links can be in the same directory, or in two or more different directories. In either case, the kernel simply resolves the pathname to the correct inode. For example, a specific inode that points to a specific chunk of data can be hard linked from /home/bluebeard/treasure.txtand /home/blackbeard/to_steal.txt.

Deleting a file involves unlinking it from the directory structure, which is done simply by removing its name and inode pair from a directory. Because Linux supports hard links, however, the filesystem cannot destroy the inode and its associated data on every unlink operation. What if another hard link existed elsewhere in the filesystem? To ensure that a file is not destroyed until all links to it are removed, each inode contains a link count that keeps track of the number of links within the filesystem that point to it. When a pathname is unlinked, the link count is decremented by one; only when it reaches zero are the inode and its associated data actually removed from the filesystem.

Symbolic links

Hard links cannot span filesystems because an inode number is meaningless outside of the inode’s own filesystem. To allow links that can span filesystems, and that are a bit simpler and less transparent, Unix systems also implementsymbolic links (often shortened to symlinks).

Symbolic links look like regular files. A symlink has its own inode and data chunk, which contains the complete pathname of the linked-to file. This means symbolic links can point anywhere, including to files and directories that reside on different filesystems, and even to files and directories that do not exist. A symbolic link that points to a nonexistent file is called a broken link.

Symbolic links incur more overhead than hard links because resolving a symbolic link effectively involves resolving two files: the symbolic link and then the linked-to file. Hard links do not incur this additional overhead—there is no difference between accessing a file linked into the filesystem more than once and one linked only once. The overhead of symbolic links is minimal, but it is still considered a negative.

Symbolic links are also more opaque than hard links. Using hard links is entirely transparent; in fact, it takes effort to find out that a file is linked more than once! Manipulating symbolic links, on the other hand, requires special system calls. This lack of transparency is often considered a positive, as the link structure is explicitly made plain, with symbolic links acting more as shortcutsthan as filesystem-internal links.

Special files

Special files are kernel objects that are represented as files. Over the years, Unix systems have supported a handful of different special files. Linux supports four: block device files, character device files, named pipes, and Unix domain sockets. Special files are a way to let certain abstractions fit into the filesystem, continuing the everything-is-a-file paradigm. Linux provides a system call to create a special file.

Device access in Unix systems is performed via device files, which act and look like normal files residing on the filesystem. Device files may be opened, read from, and written to, allowing user space to access and manipulate devices (both physical and virtual) on the system. Unix devices are generally broken into two groups: character devices and block devices. Each type of device has its own special device file.

A character device is accessed as a linear queue of bytes. The device driver places bytes onto the queue, one by one, and user space reads the bytes in the order that they were placed on the queue. A keyboard is an example of a character device. If the user types “peg,” for example, an application would want to read from the keyboard device the p, the e, and, finally, the g, in exactly that order. When there are no more characters left to read, the device returns end-of-file (EOF). Missing a character, or reading them in any other order, would make little sense. Character devices are accessed via character device files.

A block device, in contrast, is accessed as an array of bytes. The device driver maps the bytes over a seekable device, and user space is free to access any valid bytes in the array, in any order—it might read byte 12, then byte 7, and then byte 12 again. Block devices are generally storage devices. Hard disks, floppy drives, CD-ROM drives, and flash memory are all examples of block devices. They are accessed via block device files.

Named pipes (often called FIFOs, short for “first in, first out”) are aninterprocess communication (IPC) mechanism that provides a communication channel over a file descriptor, accessed via a special file. Regular pipes are the method used to “pipe” the output of one program into the input of another; they are created in memory via a system call and do not exist on any filesystem. Named pipes act like regular pipes but are accessed via a file, called a FIFO special file. Unrelated processes can access this file and communicate.

Sockets are the final type of special file. Sockets are an advanced form of IPC that allow for communication between two different processes, not only on the same machine, but even on two different machines. In fact, sockets form the basis of network and Internet programming. They come in multiple varieties, including the Unix domain socket, which is the form of socket used for communication within the local machine. Whereas sockets communicating over the Internet might use a hostname and port pair for identifying the target of communication, Unix domain sockets use a special file residing on a filesystem, often simply called a socket file.

Filesystems and namespaces

Linux, like all Unix systems, provides a global and unified namespace of files and directories. Some operating systems separate different disks and drives into separate namespaces—for example, a file on a floppy disk might be accessible via the pathname A:\plank.jpg, while the hard drive is located atC:\. In Unix, that same file on a floppy might be accessible via the pathname/media/floppy/plank.jpg or even via /home/captain/stuff/plank.jpg, right alongside files from other media. That is, on Unix, the namespace is unified.

A filesystem is a collection of files and directories in a formal and valid hierarchy. Filesystems may be individually added to and removed from the global namespace of files and directories. These operations are calledmounting and unmounting. Each filesystem is mounted to a specific location in the namespace, known as a mount point. The root directory of the filesystem is then accessible at this mount point. For example, a CD might be mounted at/media/cdrom, making the root of the filesystem on the CD accessible at/media/cdrom. The first filesystem mounted is located in the root of the namespace, /, and is called the root filesystem. Linux systems always have a root filesystem. Mounting other filesystems at other mount points is optional.

Filesystems usually exist physically (i.e., are stored on disk), although Linux also supports virtual filesystems that exist only in memory, and network filesystems that exist on machines across the network. Physical filesystems reside on block storage devices, such as CDs, floppy disks, compact flash cards, or hard drives. Some such devices are partionable, which means that they can be divided up into multiple filesystems, all of which can be manipulated individually. Linux supports a wide range of filesystems—certainly anything that the average user might hope to come across—including media-specific filesystems (for example, ISO9660), network filesystems (NFS), native filesystems (ext4), filesystems from other Unix systems (XFS), and even filesystems from non-Unix systems (FAT).

The smallest addressable unit on a block device is the sector. The sector is a physical attribute of the device. Sectors come in various powers of two, with 512 bytes being quite common. A block device cannot transfer or access a unit of data smaller than a sector and all I/O must occur in terms of one or more sectors.

Likewise, the smallest logically addressable unit on a filesystem is the block. The block is an abstraction of the filesystem, not of the physical media on which the filesystem resides. A block is usually a power-of-two multiple of the sector size. In Linux, blocks are generally larger than the sector, but they must be smaller than the page size (the smallest unit addressable by the memory management unit, a hardware component). Common block sizes are 512 bytes, 1 kilobyte, and 4 kilobytes.

Historically, Unix systems have only a single shared namespace, viewable by all users and all processes on the system. Linux takes an innovative approach and supports per-process namespaces, allowing each process to optionally have a unique view of the system’s file and directory hierarchy. By default, each process inherits the namespace of its parent, but a process may elect to create its own namespace with its own set of mount points and a unique root directory.

Simple utility to allocate memory on a Linux Machine

1. What can I use this for?

  • Test swap
  • Test behaviors on a machine when there is little memory available


2. Usage



cd /tmp
vim memtest.c
<enter the contents in the file and save it>
vim Makefile
<enter the contents in the file and save it>
sudo make install



all: memtest.c
$(CC) memtest.c -o memtest

install: memtest
install -m 0755 memtest $(PREFIX)/bin/

rm -rf *o memtest


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <unistd.h>

#if defined(_SC_PHYS_PAGES) && defined(_SC_AVPHYS_PAGES) && defined(_SC_PAGE_SIZE)

size_t getTotalSystemMemory(){
long pages = sysconf(_SC_PHYS_PAGES);
long page_size = sysconf(_SC_PAGE_SIZE);
return pages * page_size;

size_t getFreeSystemMemory(){
long pages = sysconf(_SC_AVPHYS_PAGES);
long page_size = sysconf(_SC_PAGE_SIZE);
return pages * page_size;

bool eat(long total,int chunk){
long i;
short *buffer=malloc(sizeof(char)*chunk);
return false;
return true;

int main(int argc, char *argv[]){

printf(“Currently total memory: %zd\n”,getTotalSystemMemory());
printf(“Currently avail memory: %zd\n”,getFreeSystemMemory());

int i;
char *arg=argv[i];
if(strcmp(arg, “-h”)==0 || strcmp(arg,”-?”)==0  || argc==1){
printf(“Usage: eatmemory <size>\n”);
printf(“Size can be specified in megabytes or gigabytes in the following way:\n”);
printf(“#          # Bytes      example: 1024\n”);
printf(“#M         # Megabytes  example: 15M\n”);
printf(“#G         # Gigabytes  example: 2G\n”);
printf(“#%%         # Percent    example: 50%%\n”);
}else if(i>0){
int len=strlen(arg);
char unit=arg[len – 1];
long size=-1;
int chunk=1024;
if(!isdigit(unit) ){
if(unit==’M’ || unit==’G’){
size=atol(arg) * (unit==’M’?1024*1024:1024*1024*1024);
else if (unit==’%’) {
size = (atol(arg) * (long)getFreeSystemMemory())/100;
printf(“Invalid size format\n”);
printf(“Eating %ld bytes in chunks of %d…\n”,size,chunk);
printf(“Done, press any key to free the memory\n”);
printf(“ERROR: Could not allocate the memory”);




memtest <size>

Size is in number of bytes, megabytes or gigabytes.



memtest 1024
memtest 10M
memtest 4G


Linux Concepts – File/Directory Permissions

Although there are already a lot of good security features built into Linux-based systems, one very important potential vulnerability can exist when local access is granted – – that is file permission based issues resulting from a user not assigning the correct permissions to files and directories. So based upon the need for proper permissions, I will go over the ways to assign permissions and show you some examples where modification may be necessary.

Basic File Permissions

Permission Groups

Each file and directory has three user based permission groups:

  • owner – The Owner permissions apply only the owner of the file or directory, they will not impact the actions of other users.
  • group – The Group permissions apply only to the group that has been assigned to the file or directory, they will not effect the actions of other users.
  • all users – The All Users permissions apply to all other users on the system, this is the permission group that you want to watch the most.

Permission Types

Each file or directory has three basic permission types:

  • read – The Read permission refers to a user’s capability to read the contents of the file.
  • write – The Write permissions refer to a user’s capability to write or modify a file or directory.
  • execute – The Execute permission affects a user’s capability to execute a file or view the contents of a directory.

Viewing the Permissions

You can view the permissions by checking the file or directory permissions in your favorite GUI File Manager (which I will not cover here) or by reviewing the output of the \”ls -l\” command while in the terminal and while working in the directory which contains the file or folder.

The permission in the command line is displayed as: _rwxrwxrwx 1 owner:group

  1. User rights/Permissions
    1. The first character that I marked with an underscore is the special permission flag that can vary.
    2. The following set of three characters (rwx) is for the owner permissions.
    3. The second set of three characters (rwx) is for the Group permissions.
    4. The third set of three characters (rwx) is for the All Users permissions.
  2. Following that grouping since the integer/number displays the number of hardlinks to the file.
  3. The last piece is the Owner and Group assignment formatted as Owner:Group.

Modifying the Permissions

When in the command line, the permissions are edited by using the command chmod. You can assign the permissions explicitly or by using a binary reference as described below.

Explicitly Defining Permissions

To explicity define permissions you will need to reference the Permission Group and Permission Types.

The Permission Groups used are:

  • u – Owner
  • g – Group
  • o or a – All Users

The potential Assignment Operators are + (plus) and – (minus); these are used to tell the system whether to add or remove the specific permissions.

The Permission Types that are used are:

  • r – Read
  • w – Write
  • x – Execute

So for an example, lets say I have a file named file1 that currently has the permissions set to _rw_rw_rw, which means that the owner, group and all users have read and write permission. Now we want to remove the read and write permissions from the all users group.

To make this modification you would invoke the command: chmod a-rw file1
To add the permissions above you would invoke the command: chmod a+rw file1

As you can see, if you want to grant those permissions you would change the minus character to a plus to add those permissions.

Using Binary References to Set permissions

Now that you understand the permissions groups and types this one should feel natural. To set the permission using binary references you must first understand that the input is done by entering three integers/numbers.

A sample permission string would be chmod 640 file1, which means that the owner has read and write permissions, the group has read permissions, and all other user have no rights to the file.

The first number represents the Owner permission; the second represents the Group permissions; and the last number represents the permissions for all other users. The numbers are a binary representation of the rwx string.

  • r = 4
  • w = 2
  • x = 1

You add the numbers to get the integer/number representing the permissions you wish to set. You will need to include the binary permissions for each of the three permission groups.

So to set a file to permissions on file1 to read _rwxr_____, you would enter chmod 740 file1.

Owners and Groups

I have made several references to Owners and Groups above, but have not yet told you how to assign or change the Owner and Group assigned to a file or directory.

You use the chown command to change owner and group assignments, the syntax is simple chown owner:group filename, so to change the owner of file1 to user1 and the group to family you would enter chown user1:family file1.

Advanced Permissions

The special permissions flag can be marked with any of the following:

  • _ – no special permissions
  • d – directory
  • l – The file or directory is a symbolic link
  • s – This indicated the setuid/setgid permissions. This is not set displayed in the special permission part of the permissions display, but is represented as a s in the read portion of the owner or group permissions.
  • t – This indicates the sticky bit permissions. This is not set displayed in the special permission part of the permissions display, but is represented as a t in the executable portion of the all users permissions

Setuid/Setgid Special Permissions

The setuid/setguid permissions are used to tell the system to run an executable as the owner with the owner\’s permissions.

Be careful using setuid/setgid bits in permissions. If you incorrectly assign permissions to a file owned by root with the setuid/setgid bit set, then you can open your system to intrusion.

You can only assign the setuid/setgid bit by explicitly defining permissions. The character for the setuid/setguid bit is s.

So do set the setuid/setguid bit on you would issue the command chmod g+s

Sticky Bit Special Permissions

The sticky bit can be very useful in shared environment because when it has been assigned to the permissions on a directory it sets it so only file owner can rename or delete the said file.

You can only assign the sticky bit by explicitly defining permissions. The character for the sticky bit is t.

To set the sticky bit on a directory named dir1 you would issue the command chmod +t dir1.

When Permissions Are Important

To some users of Mac- or Windows-based computers you don’t think about permissions, but those environments don’t focus so aggressively on user based rights on files unless you are in a corporate environment. But now you are running a Linux-based system and permission based security is simplified and can be easily used to restrict access as you please.

So I will show you some documents and folders that you want to focus on and show you how the optimal permissions should be set.

  • home directories – The users\’ home directories are important because you do not want other users to be able to view and modify the files in another user\’s documents of desktop. To remedy this you will want the directory to have the drwx______ (700) permissions, so lets say we want to enforce the correct permissions on the user user1\’s home directory that can be done by issuing the command chmod 700 /home/user1.
  • bootloader configuration files – If you decide to implement password to boot specific operating systems then you will want to remove read and write permissions from the configuration file from all users but root. To do you can change the permissions of the file to 700.
  • system and daemon configuration files – It is very important to restrict rights to system and daemon configuration files to restrict users from editing the contents, it may not be advisable to restrict read permissions, but restricting write permissions is a must. In these cases it may be best to modify the rights to 644.
  • firewall scripts – It may not always be necessary to block all users from reading the firewall file, but it is advisable to restrict the users from writing to the file. In this case the firewall script is run by the root user automatically on boot, so all other users need no rights, so you can assign the 700 permissions.