Main, Network, Oracle

What is a Gossip Protocol


 

Computer nodes are not far behind humans and viruses when it comes to gossiping, and the fundamentals are the same: information spreads from peer to peer within the same world. When you get to know that your manager has put down her papers, you break the news to a colleague on a smoke break. That colleague likes the information and passes it on to another colleague, while you share it with yet another colleague of yours. In no time, every person in the company is aware of the “gossip” about your manager leaving her job. Actually, “in no time” is not quite right; the time taken for everyone to know is of the order of the logarithm of the number of employees in the company. That is exactly how computer nodes gossip with each other.

Consider a network of computer nodes. Let’s say node N1 receives a new piece of information. N1 then randomly selects a peer (say, node N2) and shares the information. N1 and N2 then each pick a random peer (say, nodes N3 and N4) and share the information. The process continues in this fashion until the information has reached all connected nodes. Typically, nodes also store the time of the information exchange. In the example above, in the first exchange N2 would store the details of N1, the information itself, and the time at which N2 learned about it.
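To make the exchange concrete, here is a minimal Python sketch of push-style gossip over a set of nodes; the node numbering, the round structure, and the fan-out of one peer per round are illustrative assumptions rather than a description of any particular implementation.

import math
import random

def gossip_rounds(num_nodes: int, seed: int = 42) -> int:
    """Count the rounds until every node has heard the information."""
    random.seed(seed)
    informed = {0}                               # node 0 receives the new information
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for node in informed:
            peer = random.randrange(num_nodes)   # pick a random peer
            if peer != node:
                newly_informed.add(peer)         # share the information (and, in a real system, the time)
        informed |= newly_informed
    return rounds

if __name__ == "__main__":
    n = 10_000
    print(f"{n} nodes informed in {gossip_rounds(n)} rounds; log2(n) is about {math.log2(n):.1f}")

Running it for a few different values of n shows the number of rounds growing roughly with log2(n), which is the logarithmic spreading time described above.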

Where is Gossip Protocol Used?

Gossip protocols work beautifully in a decentralized network of nodes; they are a decentralized way of exchanging information. Rules can be built on top of these nodes to determine the truthfulness of a piece of information. For example, a network running a gossip protocol might hold the rule that when two-thirds of the nodes return the same information, that information is considered the truth. In this process, all nodes are treated equally; it does not matter whether a node is more powerful than its peers. The only thing that matters is the network bandwidth.
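As a rough illustration of the two-thirds rule above (the threshold check and the response values are made up for the example), a node could decide on the truth like this:

from collections import Counter

def agreed_value(responses):
    """Return the value reported by at least two-thirds of the nodes, or None."""
    value, votes = Counter(responses).most_common(1)[0]
    return value if votes * 3 >= len(responses) * 2 else None

print(agreed_value(["A", "A", "A", "B"]))   # 'A' -- three of four nodes agree
print(agreed_value(["A", "B", "C"]))        # None -- no two-thirds majority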

Hudson Jenkins

NodeJS on Jenkins with Slack integration

Hi Guys,

 

I was feeling bored a couple of hours back, so I thought of creating a Jenkins pipeline and integrating Slack with the Node jobs in it.

Please watch the video and suggest some use cases that I can work on for you all.

Main, Operating System, Redhat / CentOS / Oracle Linux, Ubuntu

Process Management in Linux

Process Types

Before we start talking about Linux process management, we should review process types. There are five common types of processes:

  • Parent process
  • Child process
  • Orphan Process
  • Daemon Process
  • Zombie Process

A parent process is a process that creates other processes by calling the fork() system call. Every process except process 0 has a parent process.

A child process is a process created by a parent process.

An orphan process keeps running after its parent process has terminated or finished.

A daemon process is a background process detached from any terminal; it is usually created by forking a child and letting the parent exit.

A zombie process has terminated but still has an entry in the process table because its parent has not yet read its exit status.

Note the difference between orphans and zombies: an orphan process is still executing after its parent has died, and it does not become a zombie because it is adopted and eventually reaped by init.

Memory Management

In server administration, memory management is one of the responsibilities you should care about as a system administrator.

One of the most used commands in Linux process management is the free command:

$ free -m

The -m option shows the values in megabytes.


Our main concern is the buff/cache column.

In the output of the free command here, 536 megabytes are used while 1221 megabytes are available.

The second line is the swap. Swapping occurs when memory becomes crowded.

The first value is the total swap size which is 3070 megabytes.

The second value is the used swap which is 0.

The third value is the available swap for usage which is 3070.

From the above results, you can say that the memory status is good since no swap is used. While we are talking about swap, let’s see what the /proc directory tells us about it.

$ cat /proc/swaps


This command shows the swap size and how much of it is used. Next, check the swappiness setting:

$ cat /proc/sys/vm/swappiness

This command shows a value between 0 and 100. The higher the value, the more eagerly the kernel will move pages to swap; a value of 70, for example, means the system is quite willing to swap.

Note: the default value on most distros is between 30 and 60. You can modify it like this:

$ echo 50 > /proc/sys/vm/swappiness

Or use the sysctl command like this:

$ sudo sysctl -w vm.swappiness=50

Changing the swappiness value with the above commands is not permanent; to persist it, write it to the /etc/sysctl.conf file like this:

$ nano /etc/sysctl.conf

vm.swappiness=50


Cool!!

The swappiness value controls how likely the kernel is to move pages from memory to swap.

Choosing the right swappiness value for your system requires some experimentation to find what works best for your server.
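If you prefer to check the value from a script rather than with cat, a small sketch like the following (Python 3 on Linux, reading the same /proc file shown above) will do:

def current_swappiness() -> int:
    """Read the kernel's current swappiness setting from /proc."""
    with open("/proc/sys/vm/swappiness") as f:
        return int(f.read().strip())

print("vm.swappiness is currently", current_swappiness())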

Managing virtual memory with vmstat

Another important command in Linux process management is vmstat. The vmstat command gives summary reports about memory, processes, and paging.

$ vmstat -a

The -a option shows active and inactive memory.


These are the important columns in the output of this command:

si: How much swapped in from disk.

so: How much swapped out to disk.

bi: Blocks received from (read in from) block devices.

bo: Blocks sent to (written out to) block devices.

us: The user time.

sy: The system time.

id: The idle time.

Our main concern is the (si) and (so) columns, where (si) column shows page-ins while (so) column provides page-outs.

A better way to look at these values is by viewing the output with a delay option like this:

$ vmstat 2 5


Where 2 is the delay in seconds and 5 is the number of times vmstat is called. It shows five updates of the command and all data is presented in kilobytes.

Page-in (si) happens when you start an application and its data is paged in from swap; page-out (so) happens when the kernel is freeing up memory by writing pages to swap.
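The si and so columns come from the kernel's cumulative swap counters, which you can also sample yourself. The sketch below reads the pswpin and pswpout counters from /proc/vmstat twice, two seconds apart, mirroring the vmstat 2 5 example; treat it as an illustration rather than a replacement for vmstat.

import time

def swap_counters() -> dict:
    """Return the cumulative pages-swapped-in/out counters from /proc/vmstat."""
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            if key in ("pswpin", "pswpout"):
                counters[key] = int(value)
    return counters

before = swap_counters()
time.sleep(2)
after = swap_counters()
print("pages swapped in over 2s: ", after["pswpin"] - before["pswpin"])
print("pages swapped out over 2s:", after["pswpout"] - before["pswpout"])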

System Load & top Command

In Linux process management, the top command gives you a list of the running processes and how they are using CPU and memory; the output is real-time data.

If you have a dual-core system, the first core may be at 40 percent and the second at 70 percent; in this case, top may show a combined result of 110 percent, but you will not see the individual values for each core.

$ top -c


We use the -c option to show the command line or the executable path behind each process.

You can press the 1 key while watching the top statistics to show the status of each individual CPU.


Keep in mind that certain programs spawn child processes, so you will see multiple processes for the same program, such as httpd and php-fpm.

You shouldn’t rely on the top command alone; review other resources before taking any final action.
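For cross-checking what top reports, a short sketch with the third-party psutil package (an assumption: install it with pip install psutil) can print the five busiest processes by CPU:

import time
import psutil

# The first cpu_percent() call sets a baseline; the second call, after a short
# sleep, returns the usage measured over that interval.
procs = list(psutil.process_iter(["pid", "name"]))
for p in procs:
    try:
        p.cpu_percent(None)
    except psutil.NoSuchProcess:
        pass

time.sleep(1)

usage = []
for p in procs:
    try:
        usage.append((p.cpu_percent(None), p.info["pid"], p.info["name"]))
    except psutil.NoSuchProcess:
        pass

for cpu, pid, name in sorted(usage, reverse=True)[:5]:
    print(f"{pid:>7}  {name:<25} {cpu:5.1f}%")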

Monitoring Disk I/O with iotop

A system can become slow as a result of heavy disk activity, so it is important to monitor disk activity and figure out which processes or users are causing it.

The iotop command in Linux process management helps us to monitor disk I/O in real-time. You can install it if you don’t have it:

$ yum install iotop

Running iotop without any options will list all processes.

To view only the processes that cause disk activity, use the -o option:

$ iotop -o


You can easily know what program is impacting the system.
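If iotop is not available, the third-party psutil package (again an assumption that it is installed) can at least show system-wide disk throughput over a short interval, though without the per-process attribution iotop gives you:

import time
import psutil

before = psutil.disk_io_counters()
time.sleep(2)
after = psutil.disk_io_counters()

print(f"read:  {(after.read_bytes - before.read_bytes) / 1024:.1f} KiB in 2s")
print(f"write: {(after.write_bytes - before.write_bytes) / 1024:.1f} KiB in 2s")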

ps command

We’ve talked about ps command before on a previous post and how to order the processes by memory usage and CPU usage.

Monitoring System Health with iostat and lsof

The iostat command gives you a CPU utilization report; use the -c option to display only that report.

$ iostat -c

The output is easy to understand, but if the system is busy you will see %iowait increase, which means the server is spending time waiting on disk I/O, for example while transferring or copying a lot of files.

With this command you can check read and write operations, so you can build a solid picture of what is keeping your disk busy and take the right decision.

Additionally, the lsof command is used to list open files. Its output shows which executable is using each file, the process ID, the user, and the name of the opened file.

Calculating the system load

Calculating system load is very important in Linux process management. The system load is the amount of work the system is currently doing or waiting to do. It is not a perfect way to measure system performance, but it gives you useful evidence.

The load is calculated like this:

Actual Load = Total Load (uptime) / No. of CPUs

You can see the load averages with the uptime command or the top command:

$ uptime

$ top

The server load is shown as averages over 1, 5, and 15 minutes.

As you can see, the average load is 0.00 over the last minute, 0.01 over the last five minutes, and 0.05 over the last fifteen minutes.

When the load increases, processes are queued for CPU time, and if there are multiple processor cores, the load is distributed across the server’s cores to balance the work.

You can say that a good load average is about 1 per core. Exceeding that does not automatically mean there is a problem, but if you see higher numbers for a sustained period, the load is too high and there is a problem.
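The per-core calculation above is easy to reproduce programmatically; this small sketch uses Python's os.getloadavg() and os.cpu_count() to print the 1, 5, and 15 minute averages normalised by the number of CPUs:

import os

cpus = os.cpu_count() or 1
for minutes, load in zip((1, 5, 15), os.getloadavg()):
    print(f"{minutes:>2} min: total load {load:.2f} -> {load / cpus:.2f} per core")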

pgrep and systemctl

You can get the process ID using pgrep command followed by the service name.

$ pgrep servicename

This command shows the process ID or PID.

Note: if this command shows more than one process ID, as with httpd or SSH, the smallest process ID is usually the parent process ID.

On the other hand, you can use the systemctl command to get the main PID like this:

$ systemctl status <service_name>.service

There are more ways to obtain the required process ID or parent process ID, but this one is easy and straightforward.
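The smallest-PID heuristic mentioned above can be scripted as well; the sketch below shells out to pgrep and picks the minimum PID, with "httpd" used purely as an illustrative service name:

import subprocess

def candidate_parent_pid(service: str):
    """Run pgrep for the service and return the smallest PID, or None."""
    result = subprocess.run(["pgrep", service], capture_output=True, text=True)
    pids = [int(line) for line in result.stdout.split()]
    return min(pids) if pids else None

print(candidate_parent_pid("httpd"))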

Managing Services with systemd

If we are going to talk about Linux process management, we should take a look at systemd. The systemd is responsible for controlling how services are managed on modern Linux systems like CentOS 7.

Instead of using chkconfig command to enable and disable a service during the boot, you can use the systemctl command.

systemd also ships with its own version of the top command; to show the processes associated with a specific service (control group), you can use the systemd-cgtop command like this:

$ systemd-cgtop

As you can see, it shows the associated control group path, the number of tasks, the percentage of CPU used, the memory allocation, and the related input and output.

Another command, systemd-cgls, outputs a recursive list of control group contents like this:

$ systemd-cgls

This command gives us very useful information that can be used to make your decision.

Nice and Renice Processes

The nice value is a number attached to a process that indicates how it competes for the CPU.

A high nice value indicates a low priority for your process, i.e. how nice you are going to be to other users, which is where the name comes from.

The nice range is from -20 to +19.

nice command sets the nice value for a process at creation time, while renice command adjusts the value later.

$ nice -n 5 ./myscript

This command increases the nice value by 5, which means a lower priority.

$ sudo renice 5 2213

This command sets the nice value of the running process to 5, where the number (2213) is the PID.

A regular user can increase a process’s nice value (lower its priority) but cannot decrease it (raise its priority), while the root user can do both.
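The same adjustment can be made from inside a program. Python's os.nice() adds an increment to the calling process's own nice value and returns the new value; as noted above, an unprivileged process can only move the value upward (toward lower priority):

import os

print("nice value before:", os.nice(0))   # an increment of 0 just reads the current value
print("nice value after :", os.nice(5))   # be nicer: lower this process's priority by 5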

Sending the kill signal

To kill a service or application that causes a problem, you can issue a termination signal (SIGTERM). You can review the previous post about signals and jobs.

$ kill <process ID>

This method is called a safe kill. However, depending on your situation, you may need to send a hang-up (SIGHUP) signal to make a service reload, like this:

$ kill -1 <process ID>

Sometimes the safe kill and the reload fail to do anything; in that case you can send the SIGKILL signal by using the -9 option, which is called a forced kill.

$ kill -9 <process ID>

There are no cleanup operations or safe exit with this signal, so it is not preferred. Alternatively, you can target processes by name with the pkill command:

$ pkill -9 serviceName

And you can use the pgrep command to verify that all associated processes have been killed:

$ pgrep serviceName
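The same signals can be sent from code with os.kill(); the PID below is a placeholder, so take a real one from pgrep before trying this sketch:

import os
import signal

pid = 2213                          # illustrative PID only

try:
    os.kill(pid, signal.SIGTERM)    # safe kill: lets the process clean up and exit
    # os.kill(pid, signal.SIGHUP)   # hang-up: many daemons reload their configuration
    # os.kill(pid, signal.SIGKILL)  # forced kill: no cleanup, keep it as a last resort
except ProcessLookupError:
    print("no process with PID", pid)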

I hope you now have a good idea of Linux process management and how to take the right actions to keep your system healthy.

Thank you

Main, RabbitMQ

RabbitMQ :- rabbitmqctl

NAME

rabbitmqctl — command line for managing a RabbitMQ broker

SYNOPSIS

rabbitmqctl [-q] [-l] [-n node] [-t timeout] command [command_options]

DESCRIPTION

RabbitMQ is a multi-protocol open source messaging broker.

rabbitmqctl is a command line tool for managing a RabbitMQ broker. It performs all actions by connecting to one of the broker’s nodes.

Diagnostic information is displayed if the broker was not running, could not be reached, or rejected the connection due to mismatching Erlang cookies.

OPTIONS

-n node
Default node is “rabbit@server”, where server is the local host. On a host named “myserver.example.com”, the node name of the RabbitMQ Erlang node will usually be “rabbit@myserver” (unless RABBITMQ_NODENAME has been set to some non-default value at broker startup time). The output of “hostname -s” is usually the correct suffix to use after the “@” sign. See rabbitmq-server(8) for details of configuring the RabbitMQ broker.
-q, --quiet
Quiet output mode is selected. Informational messages are suppressed when quiet mode is in effect.
--dry-run
Do not run the command. Only print an informational message.
-t timeout, --timeout timeout
Operation timeout in seconds. Only applicable to “list” commands. Default is infinity.
-l, --longnames
Use long names for Erlang distribution. If the RabbitMQ broker uses long node names for Erlang distribution, this option must be specified.
--erlang-cookie cookie
Erlang distribution cookie. If the RabbitMQ node is using a custom Erlang cookie value, the cookie value must be set with this parameter.

COMMANDS

help [-l] [command_name]
Prints usage for all available commands.

-l, --list-commands
List command usages only, without parameter explanation.
command_name
Prints usage for the specified command.

Application Management

force_reset
Forcefully returns a RabbitMQ node to its virgin state.

The force_reset command differs from reset in that it resets the node unconditionally, regardless of the current management database state and cluster configuration. It should only be used as a last resort if the database or cluster configuration has been corrupted.

For reset and force_reset to succeed the RabbitMQ application must have been stopped, e.g. with stop_app.

For example, to reset the RabbitMQ node:

rabbitmqctl force_reset
hipe_compile directory
Performs HiPE-compilation and caches resulting .beam-files in the given directory.

Parent directories are created if necessary. Any existing .beam files from the directory are automatically deleted prior to compilation.

To use these precompiled files, you should set the RABBITMQ_SERVER_CODE_PATH environment variable to the directory specified in the hipe_compile invocation.

For example, to HiPE-compile modules and store them to /tmp/rabbit-hipe/ebin directory:

rabbitmqctl hipe_compile /tmp/rabbit-hipe/ebin
reset
Returns a RabbitMQ node to its virgin state.

Removes the node from any cluster it belongs to, removes all data from the management database, such as configured users and vhosts, and deletes all persistent messages.

For reset and force_reset to succeed the RabbitMQ application must have been stopped, e.g. with stop_app.

For example, to reset the RabbitMQ node:

rabbitmqctl reset
rotate_logs
Instructs the RabbitMQ node to perform internal log rotation.

Log rotation is performed according to lager settings specified in configuration file.

Note that there is no need to call this command in case of external log rotation (e.g. from logrotate(8)), because lager detects renames and automatically reopens log files.

For example, this command starts internal log rotation process:

rabbitmqctl rotate_logs

Rotation is performed asynchronously, so there is no guarantee that it will be completed when this command returns.

shutdown
Shuts down the Erlang process on which RabbitMQ is running. The command is blocking and will return after the Erlang process exits. If RabbitMQ fails to stop, it will return a non-zero exit code.

Unlike the stop command, the shutdown command:

  • does not require a pid_file to wait for the Erlang process to exit
  • returns a non-zero exit code if RabbitMQ node is not running

For example, to shut down the Erlang process on which RabbitMQ is running:

rabbitmqctl shutdown
start_app
Starts the RabbitMQ application.

This command is typically run after performing other management actions that required the RabbitMQ application to be stopped, e.g. reset.

For example, to instruct the RabbitMQ node to start the RabbitMQ application:

rabbitmqctl start_app
stop [pid_file]
Stops the Erlang node on which RabbitMQ is running. To restart the node follow the instructions for “Running the Server” in the installation guide.

If a pid_file is specified, also waits for the process specified there to terminate. See the description of the wait command for details on this file.

For example, to instruct the RabbitMQ node to terminate:

rabbitmqctl stop
stop_app
Stops the RabbitMQ application, leaving the Erlang node running.

This command is typically run prior to performing other management actions that require the RabbitMQ application to be stopped, e.g. reset.

For example, to instruct the RabbitMQ node to stop the RabbitMQ application:

rabbitmqctl stop_app
wait pid_file | wait --pid pid
Waits for the RabbitMQ application to start.

This command will wait for the RabbitMQ application to start at the node. It will wait for the pid file to be created if pid_file is specified, then for a process with a pid specified in the pid file or the --pid argument, and then for the RabbitMQ application to start in that process. It will fail if the process terminates without starting the RabbitMQ application.

If the specified pid file is not created or the Erlang node is not started within --timeout, the command will fail. The default timeout is 10 seconds.

A suitable pid file is created by the rabbitmq-server(8) script. By default this is located in the Mnesia directory. Modify the RABBITMQ_PID_FILE environment variable to change the location.

For example, this command will return when the RabbitMQ node has started up:

rabbitmqctl wait /var/run/rabbitmq/pid

Cluster Management

join_cluster clusternode [--ram]
clusternode
Node to cluster with.
--ram
If provided, the node will join the cluster as a RAM node.

Instructs the node to become a member of the cluster that the specified node is in. Before clustering, the node is reset, so be careful when using this command. For this command to succeed the RabbitMQ application must have been stopped, e.g. with stop_app.

Cluster nodes can be of two types: disc or RAM. Disc nodes replicate data in RAM and on disc, thus providing redundancy in the event of node failure and recovery from global events such as power failure across all nodes. RAM nodes replicate data in RAM only (with the exception of queue contents, which can reside on disc if the queue is persistent or too big to fit in memory) and are mainly used for scalability. RAM nodes are more performant only when managing resources (e.g. adding/removing queues, exchanges, or bindings). A cluster must always have at least one disc node, and usually should have more than one.

The node will be a disc node by default. If you wish to create a RAM node, provide the --ram flag.

After executing the join_cluster command, whenever the RabbitMQ application is started on the current node it will attempt to connect to the nodes that were in the cluster when the node went down.

To leave a cluster, reset the node. You can also remove nodes remotely with the forget_cluster_node command.

For more details see the Clustering guide.

For example, this command instructs the RabbitMQ node to join the cluster that “hare@elena” is part of, as a ram node:

rabbitmqctl join_cluster hare@elena --ram
cluster_status
Displays all the nodes in the cluster grouped by node type, together with the currently running nodes.

For example, this command displays the nodes in the cluster:

rabbitmqctl cluster_status
change_cluster_node_type type
Changes the type of the cluster node.

The type must be one of the following:

  • disc
  • ram

The node must be stopped for this operation to succeed, and when turning a node into a RAM node the node must not be the only disc node in the cluster.

For example, this command will turn a RAM node into a disc node:

rabbitmqctl change_cluster_node_type disc
forget_cluster_node [--offline] clusternode
--offline
Enables node removal from an offline node. This is only useful in the situation where all the nodes are offline and the last node to go down cannot be brought online, thus preventing the whole cluster from starting. It should not be used in any other circumstances since it can lead to inconsistencies.

Removes a cluster node remotely. The node that is being removed must be offline, while the node we are removing from must be online, except when using the --offline flag.

When using the --offline flag, rabbitmqctl will not attempt to connect to a node as normal; instead it will temporarily become the node in order to make the change. This is useful if the node cannot be started normally. In this case the node will become the canonical source for cluster metadata (e.g. which queues exist), even if it was not before. Therefore you should use this command on the latest node to shut down if at all possible.

For example, this command will remove the node “rabbit@stringer” from the node “hare@mcnulty”:

rabbitmqctl -n hare@mcnulty forget_cluster_node rabbit@stringer
rename_cluster_node oldnode1 newnode1 [oldnode2 newnode2 …]
Supports renaming of cluster nodes in the local database.

This subcommand causes rabbitmqctl to temporarily become the node in order to make the change. The local cluster node must therefore be completely stopped; other nodes can be online or offline.

This subcommand takes an even number of arguments, in pairs representing the old and new names for nodes. You must specify the old and new names for this node and for any other nodes that are stopped and being renamed at the same time.

It is possible to stop all nodes and rename them all simultaneously (in which case old and new names for all nodes must be given to every node) or stop and rename nodes one at a time (in which case each node only needs to be told how its own name is changing).

For example, this command will rename the node “rabbit@misshelpful” to the node “rabbit@cordelia”

rabbitmqctl rename_cluster_node rabbit@misshelpful rabbit@cordelia
update_cluster_nodes clusternode
clusternode
The node to consult for up-to-date information.

Instructs an already clustered node to contact clusternode to cluster when waking up. This is different from join_cluster since it does not join any cluster – it checks that the node is already in a cluster with clusternode.

The need for this command is motivated by the fact that clusters can change while a node is offline. Consider the situation in which node A and B are clustered. A goes down, C clusters with B, and then B leaves the cluster. When A wakes up, it’ll try to contact B, but this will fail since B is not in the cluster anymore. The following command will solve this situation:

rabbitmqctl -n A update_cluster_nodes C
force_boot
Ensures that the node will start next time, even if it was not the last to shut down.

Normally when you shut down a RabbitMQ cluster altogether, the first node you restart should be the last one to go down, since it may have seen things happen that other nodes did not. But sometimes that’s not possible: for instance if the entire cluster loses power then all nodes may think they were not the last to shut down.

In such a case you can invoke force_boot while the node is down. This will tell the node to unconditionally start next time you ask it to. If any changes happened to the cluster after this node shut down, they will be lost.

If the last node to go down is permanently lost then you should use forget_cluster_node --offline in preference to this command, as it will ensure that mirrored queues which were mastered on the lost node get promoted.

For example, this will force the node not to wait for other nodes next time it is started:

rabbitmqctl force_boot
sync_queue [-p vhost] queue
queue
The name of the queue to synchronise.

Instructs a mirrored queue with unsynchronised slaves to synchronise itself. The queue will block while synchronisation takes place (all publishers to and consumers from the queue will block). The queue must be mirrored for this command to succeed.

Note that unsynchronised queues from which messages are being drained will become synchronised eventually. This command is primarily useful for queues which are not being drained.

cancel_sync_queue [-p vhost] queue
queue
The name of the queue to cancel synchronisation for.

Instructs a synchronising mirrored queue to stop synchronising itself.

purge_queue [-p vhost] queue
queue
The name of the queue to purge.

Purges a queue (removes all messages in it).

set_cluster_name name
Sets the cluster name to name. The cluster name is announced to clients on connection, and used by the federation and shovel plugins to record where a message has been. The cluster name is by default derived from the hostname of the first node in the cluster, but can be changed.

For example, this sets the cluster name to “london”:

rabbitmqctl set_cluster_name london

User Management

Note that rabbitmqctl manages the RabbitMQ internal user database. Users from any alternative authentication backend will not be visible to rabbitmqctl.

add_user username password
username
The name of the user to create.
password
The password the created user will use to log in to the broker.

For example, this command instructs the RabbitMQ broker to create a (non-administrative) user named “tonyg” with (initial) password “changeit”:

rabbitmqctl add_user tonyg changeit
delete_user username
username
The name of the user to delete.

For example, this command instructs the RabbitMQ broker to delete the user named “tonyg”:

rabbitmqctl delete_user tonyg
change_password username newpassword
username
The name of the user whose password is to be changed.
newpassword
The new password for the user.

For example, this command instructs the RabbitMQ broker to change the password for the user named “tonyg” to “newpass”:

rabbitmqctl change_password tonyg newpass
clear_password username
username
The name of the user whose password is to be cleared.

For example, this command instructs the RabbitMQ broker to clear the password for the user named “tonyg”:

rabbitmqctl clear_password tonyg

This user now cannot log in with a password (but may be able to through e.g. SASL EXTERNAL if configured).

authenticate_user username password
username
The name of the user.
password
The password of the user.

For example, this command instructs the RabbitMQ broker to authenticate the user named “tonyg” with password “verifyit”:

rabbitmqctl authenticate_user tonyg verifyit
set_user_tags username [tag …]
username
The name of the user whose tags are to be set.
tag
Zero, one or more tags to set. Any existing tags will be removed.

For example, this command instructs the RabbitMQ broker to ensure the user named “tonyg” is an administrator:

rabbitmqctl set_user_tags tonyg administrator

This has no effect when the user logs in via AMQP, but can be used to permit the user to manage users, virtual hosts and permissions when the user logs in via some other means (for example with the management plugin).

This command instructs the RabbitMQ broker to remove any tags from the user named “tonyg”:

rabbitmqctl set_user_tags tonyg
list_users
Lists users. Each result row will contain the user name followed by a list of the tags set for that user.

For example, this command instructs the RabbitMQ broker to list all users:

rabbitmqctl list_users

Access Control

Note that rabbitmqctl manages the RabbitMQ internal user database. Permissions for users from any alternative authorisation backend will not be visible to rabbitmqctl.

add_vhost vhost
vhost
The name of the virtual host entry to create.

Creates a virtual host.

For example, this command instructs the RabbitMQ broker to create a new virtual host called “test”:

rabbitmqctl add_vhost test
delete_vhost vhost
vhost
The name of the virtual host entry to delete.

Deletes a virtual host.

Deleting a virtual host deletes all its exchanges, queues, bindings, user permissions, parameters and policies.

For example, this command instructs the RabbitMQ broker to delete the virtual host called “test”:

rabbitmqctl delete_vhost test
list_vhosts [vhostinfoitem …]
Lists virtual hosts.

The vhostinfoitem parameter is used to indicate which virtual host information items to include in the results. The column order in the results will match the order of the parameters. vhostinfoitem can take any value from the list that follows:

name
The name of the virtual host with non-ASCII characters escaped as in C.
tracing
Whether tracing is enabled for this virtual host.

If no vhostinfoitem are specified then the vhost name is displayed.

For example, this command instructs the RabbitMQ broker to list all virtual hosts:

rabbitmqctl list_vhosts name tracing
set_permissions [-p vhost] user conf write read
vhost
The name of the virtual host to which to grant the user access, defaulting to “/”.
user
The name of the user to grant access to the specified virtual host.
conf
A regular expression matching resource names for which the user is granted configure permissions.
write
A regular expression matching resource names for which the user is granted write permissions.
read
A regular expression matching resource names for which the user is granted read permissions.

Sets user permissions.

For example, this command instructs the RabbitMQ broker to grant the user named “tonyg” access to the virtual host called “/myvhost”, with configure permissions on all resources whose names start with “tonyg-”, and write and read permissions on all resources:

rabbitmqctl set_permissions -p /myvhost tonyg "^tonyg-.*" ".*" ".*"
clear_permissions [-p vhost] username
vhost
The name of the virtual host to which to deny the user access, defaulting to “/”.
username
The name of the user to deny access to the specified virtual host.

Clears user permissions.

For example, this command instructs the RabbitMQ broker to deny the user named “tonyg” access to the virtual host called “/myvhost”:

rabbitmqctl clear_permissions -p /myvhost tonyg
list_permissions [-p vhost]
vhost
The name of the virtual host for which to list the users that have been granted access to it, and their permissions. Defaults to “/”.

Lists permissions in a virtual host.

For example, this command instructs the RabbitMQ broker to list all the users which have been granted access to the virtual host called “/myvhost”, and the permissions they have for operations on resources in that virtual host. Note that an empty string means no permissions granted:

rabbitmqctl list_permissions -p /myvhost
list_user_permissions username
username
The name of the user for which to list the permissions.

Lists user permissions.

For example, this command instructs the RabbitMQ broker to list all the virtual hosts to which the user named “tonyg” has been granted access, and the permissions the user has for operations on resources in these virtual hosts:

rabbitmqctl list_user_permissions tonyg
set_topic_permissions [-p vhost] user exchange write read
vhost
The name of the virtual host to which to grant the user access, defaulting to “/”.
user
The name of the user the permissions apply to in the target virtual host.
exchange
The name of the topic exchange the authorisation check will be applied to.
write
A regular expression matching the routing key of the published message.
read
A regular expression matching the routing key of the consumed message.

Sets user topic permissions.

For example, this command instructs the RabbitMQ broker to let the user named “tonyg” publish and consume messages going through the “amq.topic” exchange of the “/myvhost” virtual host with a routing key starting with “tonyg-”:

rabbitmqctl set_topic_permissions -p /myvhost tonyg amq.topic "^tonyg-.*" "^tonyg-.*"

Topic permissions support variable expansion for the following variables: username, vhost, and client_id. Note that client_id is expanded only when using MQTT. The previous example could be made more generic by using “^{username}-.*”:

rabbitmqctl set_topic_permissions -p /myvhost tonyg amq.topic "^{username}-.*" "^{username}-.*"
clear_topic_permissions [-p vhost] username [exchange]
vhost
The name of the virtual host to which to clear the topic permissions, defaulting to “/”.
username
The name of the user to clear topic permissions to the specified virtual host.
exchange
The name of the topic exchange to clear topic permissions, defaulting to all the topic exchanges the given user has topic permissions for.

Clear user topic permissions.

For example, this command instructs the RabbitMQ broker to remove topic permissions for user named “tonyg” for the topic exchange “amq.topic” in the virtual host called “/myvhost”:

rabbitmqctl clear_topic_permissions -p /myvhost tonyg amq.topic
list_topic_permissions [-p vhost]
vhost
The name of the virtual host for which to list the users topic permissions. Defaults to “/”.

Lists topic permissions in a virtual host.

For example, this command instructs the RabbitMQ broker to list all the users which have been granted topic permissions in the virtual host called “/myvhost:”

rabbitmqctl list_topic_permissions -p /myvhost
list_user_topic_permissions username
username
The name of the user for which to list the topic permissions.

Lists user topic permissions.

For example, this command instructs the RabbitMQ broker to list all the virtual hosts to which the user named “tonyg” has been granted access, and the topic permissions the user has in these virtual hosts:

rabbitmqctl list_user_topic_permissions tonyg

Parameter Management

Certain features of RabbitMQ (such as the federation plugin) are controlled by dynamic, cluster-wide parameters. There are 2 kinds of parameters: parameters scoped to a virtual host and global parameters. Each vhost-scoped parameter consists of a component name, a name and a value. The component name and name are strings, and the value is an Erlang term. A global parameter consists of a name and value. The name is a string and the value is an Erlang term. Parameters can be set, cleared and listed. In general you should refer to the documentation for the feature in question to see how to set parameters.

set_parameter [-p vhost] component_name name value
Sets a parameter.

component_name
The name of the component for which the parameter is being set.
name
The name of the parameter being set.
value
The value for the parameter, as a JSON term. In most shells you are very likely to need to quote this.

For example, this command sets the parameter “local_username” for the “federation” component in the default virtual host to the JSON term “guest”:

rabbitmqctl set_parameter federation local_username '"guest"'
clear_parameter [-p vhost] component_name name
Clears a parameter.

component_name
The name of the component for which the parameter is being cleared.
name
The name of the parameter being cleared.

For example, this command clears the parameter “local_username” for the “federation” component in the default virtual host:

rabbitmqctl clear_parameter federation local_username
list_parameters [-p vhost]
Lists all parameters for a virtual host.

For example, this command lists all parameters in the default virtual host:

rabbitmqctl list_parameters
set_global_parameter name value
Sets a global runtime parameter. This is similar to set_parameter but the key-value pair isn’t tied to a virtual host.

name
The name of the global runtime parameter being set.
value
The value for the global runtime parameter, as a JSON term. In most shells you are very likely to need to quote this.

For example, this command sets the global runtime parameter “mqtt_default_vhosts” to the JSON term {"O=client,CN=guest":"/"}:

rabbitmqctl set_global_parameter mqtt_default_vhosts '{"O=client,CN=guest":"/"}'
clear_global_parameter name
Clears a global runtime parameter. This is similar to clear_parameter but the key-value pair isn’t tied to a virtual host.

name
The name of the global runtime parameter being cleared.

For example, this command clears the global runtime parameter “mqtt_default_vhosts”:

rabbitmqctl clear_global_parameter mqtt_default_vhosts
list_global_parameters
Lists all global runtime parameters. This is similar to list_parameters but the global runtime parameters are not tied to any virtual host.

For example, this command lists all global parameters:

rabbitmqctl list_global_parameters

Policy Management

Policies are used to control and modify the behaviour of queues and exchanges on a cluster-wide basis. Policies apply within a given vhost, and consist of a name, pattern, definition and an optional priority. Policies can be set, cleared and listed.

set_policy [-p vhost] [--priority priority] [--apply-to apply-to] name pattern definition
Sets a policy.

name
The name of the policy.
pattern
The regular expression which, when it matches a given resource, causes the policy to apply.
definition
The definition of the policy, as a JSON term. In most shells you are very likely to need to quote this.
priority
The priority of the policy as an integer. Higher numbers indicate greater precedence. The default is 0.
apply-to
Which types of object this policy should apply to. Possible values are:

  • queues
  • exchanges
  • all

The default is all.

For example, this command sets the policy “federate-me” in the default virtual host so that built-in exchanges are federated:

rabbitmqctl set_policy federate-me ^amq. '{"federation-upstream-set":"all"}'
clear_policy [-p vhost] name
Clears a policy.

name
The name of the policy being cleared.

For example, this command clears the “federate-me” policy in the default virtual host:

rabbitmqctl clear_policy federate-me
list_policies [-p vhost]
Lists all policies for a virtual host.

For example, this command lists all policies in the default virtual host:

rabbitmqctl list_policies
set_operator_policy [-p vhost] [--priority priority] [--apply-to apply-to] name pattern definition
Sets an operator policy that overrides a subset of arguments in user policies. Arguments are identical to those of set_policy.

Supported arguments are:

  • expires
  • message-ttl
  • max-length
  • max-length-bytes
clear_operator_policy [-p vhost] name
Clears an operator policy. Arguments are identical to those of clear_policy.
list_operator_policies [-p vhost]
Lists operator policy overrides for a virtual host. Arguments are identical to those of list_policies.

Virtual Host Limits

It is possible to enforce certain limits on virtual hosts.

set_vhost_limits [-p vhost] definition
Sets virtual host limits.

definition
The definition of the limits, as a JSON term. In most shells you are very likely to need to quote this.

Recognised limits are:

  • max-connections
  • max-queues

Use a negative value to specify “no limit”.

For example, this command limits the max number of concurrent connections in vhost “qa_env” to 64:

rabbitmqctl set_vhost_limits -p qa_env '{"max-connections": 64}'

This command limits the max number of queues in vhost “qa_env” to 256:

rabbitmqctl set_vhost_limits -p qa_env '{"max-queues": 256}'

This command clears the max number of connections limit in vhost “qa_env”:

rabbitmqctl set_vhost_limits -p qa_env '{"max-connections": -1}'

This command disables client connections in vhost “qa_env”:

rabbitmqctl set_vhost_limits -p qa_env '{"max-connections": 0}'
clear_vhost_limits [-p vhost]
Clears virtual host limits.

For example, this command clears vhost limits in vhost “qa_env”:

rabbitmqctl clear_vhost_limits -p qa_env
list_vhost_limits [-p vhost] [--global]
Displays configured virtual host limits.

--global
Show limits for all vhosts. Suppresses the -p parameter.

Server Status

The server status queries interrogate the server and return a list of results with tab-delimited columns. Some queries (list_queues, list_exchanges, list_bindings and list_consumers) accept an optional vhost parameter. This parameter, if present, must be specified immediately after the query.

The list_queues, list_exchanges and list_bindings commands accept an optional virtual host parameter for which to display results. The default value is “/”.

list_queues [-p vhost] [--offline | --online | --local] [queueinfoitem …]
Returns queue details. Queue details of the “/” virtual host are returned if the -p flag is absent. The -p flag can be used to override this default.

Displayed queues can be filtered by their status or location using one of the following mutually exclusive options:

--offline
List only those durable queues that are not currently available (more specifically, their master node isn’t).
--online
List queues that are currently available (their master node is).
--local
List only those queues whose master process is located on the current node.

The queueinfoitem parameter is used to indicate which queue information items to include in the results. The column order in the results will match the order of the parameters. queueinfoitem can take any value from the list that follows:

name
The name of the queue with non-ASCII characters escaped as in C.
durable
Whether or not the queue survives server restarts.
auto_delete
Whether the queue will be deleted automatically when no longer used.
arguments
Queue arguments.
policy
Policy name applying to the queue.
pid
Id of the Erlang process associated with the queue.
owner_pid
Id of the Erlang process representing the connection which is the exclusive owner of the queue. Empty if the queue is non-exclusive.
exclusive
True if queue is exclusive (i.e. has owner_pid), false otherwise.
exclusive_consumer_pid
Id of the Erlang process representing the channel of the exclusive consumer subscribed to this queue. Empty if there is no exclusive consumer.
exclusive_consumer_tag
Consumer tag of the exclusive consumer subscribed to this queue. Empty if there is no exclusive consumer.
messages_ready
Number of messages ready to be delivered to clients.
messages_unacknowledged
Number of messages delivered to clients but not yet acknowledged.
messages
Sum of ready and unacknowledged messages (queue depth).
messages_ready_ram
Number of messages from messages_ready which are resident in ram.
messages_unacknowledged_ram
Number of messages from messages_unacknowledged which are resident in ram.
messages_ram
Total number of messages which are resident in ram.
messages_persistent
Total number of persistent messages in the queue (will always be 0 for transient queues).
message_bytes
Sum of the size of all message bodies in the queue. This does not include the message properties (including headers) or any overhead.
message_bytes_ready
Like message_bytes but counting only those messages ready to be delivered to clients.
message_bytes_unacknowledged
Like message_bytes but counting only those messages delivered to clients but not yet acknowledged.
message_bytes_ram
Like message_bytes but counting only those messages which are in RAM.
message_bytes_persistent
Like message_bytes but counting only those messages which are persistent.
head_message_timestamp
The timestamp property of the first message in the queue, if present. Timestamps of messages only appear when they are in the paged-in state.
disk_reads
Total number of times messages have been read from disk by this queue since it started.
disk_writes
Total number of times messages have been written to disk by this queue since it started.
consumers
Number of consumers.
consumer_utilisation
Fraction of the time (between 0.0 and 1.0) that the queue is able to immediately deliver messages to consumers. This can be less than 1.0 if consumers are limited by network congestion or prefetch count.
memory
Bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures.
slave_pids
If the queue is mirrored, this gives the IDs of the current slaves.
synchronised_slave_pids
If the queue is mirrored, this gives the IDs of the current slaves which are synchronised with the master – i.e. those which could take over from the master without message loss.
state
The state of the queue. Normally “running”, but may be “{syncing, message_count}” if the queue is synchronising.

Queues which are located on cluster nodes that are currently down will be shown with a status of “down” (and most other queueinfoitem will be unavailable).

If no queueinfoitem are specified then queue name and depth are displayed.

For example, this command displays the depth and number of consumers for each queue of the virtual host named “/myvhost”

rabbitmqctl list_queues -p /myvhost messages consumers
list_exchanges [-p vhost] [exchangeinfoitem …]
Returns exchange details. Exchange details of the “/” virtual host are returned if the -p flag is absent. The -p flag can be used to override this default.

The exchangeinfoitem parameter is used to indicate which exchange information items to include in the results. The column order in the results will match the order of the parameters. exchangeinfoitem can take any value from the list that follows:

name
The name of the exchange with non-ASCII characters escaped as in C.
type
The exchange type, such as:

  • direct
  • topic
  • headers
  • fanout
durable
Whether or not the exchange survives server restarts.
auto_delete
Whether the exchange will be deleted automatically when no longer used.
internal
Whether the exchange is internal, i.e. cannot be directly published to by a client.
arguments
Exchange arguments.
policy
Policy name for applying to the exchange.

If no exchangeinfoitem are specified then exchange name and type are displayed.

For example, this command displays the name and type for each exchange of the virtual host named “/myvhost”:

rabbitmqctl list_exchanges -p /myvhost name type
list_bindings [-p vhost] [bindinginfoitem …]
Returns binding details. By default the bindings for the “/” virtual host are returned. The -p flag can be used to override this default.

The bindinginfoitem parameter is used to indicate which binding information items to include in the results. The column order in the results will match the order of the parameters. bindinginfoitem can take any value from the list that follows:

source_name
The name of the source of messages to which the binding is attached. With non-ASCII characters escaped as in C.
source_kind
The kind of the source of messages to which the binding is attached. Currently always exchange. With non-ASCII characters escaped as in C.
destination_name
The name of the destination of messages to which the binding is attached. With non-ASCII characters escaped as in C.
destination_kind
The kind of the destination of messages to which the binding is attached. With non-ASCII characters escaped as in C.
routing_key
The binding’s routing key, with non-ASCII characters escaped as in C.
arguments
The binding’s arguments.

If no bindinginfoitem are specified then all above items are displayed.

For example, this command displays the exchange name and queue name of the bindings in the virtual host named “/myvhost”

rabbitmqctl list_bindings -p /myvhost exchange_name queue_name
list_connections [connectioninfoitem …]
Returns TCP/IP connection statistics.

The connectioninfoitem parameter is used to indicate which connection information items to include in the results. The column order in the results will match the order of the parameters. connectioninfoitem can take any value from the list that follows:

pid
Id of the Erlang process associated with the connection.
name
Readable name for the connection.
port
Server port.
host
Server hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was disabled.
peer_port
Peer port.
peer_host
Peer hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was not enabled.
ssl
Boolean indicating whether the connection is secured with SSL.
ssl_protocol
SSL protocol (e.g. “tlsv1”).
ssl_key_exchange
SSL key exchange algorithm (e.g. “rsa”).
ssl_cipher
SSL cipher algorithm (e.g. “aes_256_cbc”).
ssl_hash
SSL hash function (e.g. “sha”).
peer_cert_subject
The subject of the peer’s SSL certificate, in RFC4514 form.
peer_cert_issuer
The issuer of the peer’s SSL certificate, in RFC4514 form.
peer_cert_validity
The period for which the peer’s SSL certificate is valid.
state
Connection state; one of:

  • starting
  • tuning
  • opening
  • running
  • flow
  • blocking
  • blocked
  • closing
  • closed
channels
Number of channels using the connection.
protocol
Version of the AMQP protocol in use; currently one of:

  • {0,9,1}
  • {0,8,0}

Note that if a client requests an AMQP 0-9 connection, we treat it as AMQP 0-9-1.

auth_mechanism
SASL authentication mechanism used, such as “PLAIN”.
user
Username associated with the connection.
vhost
Virtual host name with non-ASCII characters escaped as in C.
timeout
Connection timeout / negotiated heartbeat interval, in seconds.
frame_max
Maximum frame size (bytes).
channel_max
Maximum number of channels on this connection.
client_properties
Informational properties transmitted by the client during connection establishment.
recv_oct
Octets received.
recv_cnt
Packets received.
send_oct
Octets sent.
send_cnt
Packets sent.
send_pend
Send queue size.
connected_at
Date and time this connection was established, as timestamp.

If no connectioninfoitem are specified then user, peer host, peer port, time since flow control and memory block state are displayed.

For example, this command displays the send queue size and server port for each connection:

rabbitmqctl list_connections send_pend port
list_channels [channelinfoitem …]
Returns information on all current channels, the logical containers executing most AMQP commands. This includes channels that are part of ordinary AMQP connections, and channels created by various plug-ins and other extensions.

The channelinfoitem parameter is used to indicate which channel information items to include in the results. The column order in the results will match the order of the parameters. channelinfoitem can take any value from the list that follows:

pid
Id of the Erlang process associated with the connection.
connection
Id of the Erlang process associated with the connection to which the channel belongs.
name
Readable name for the channel.
number
The number of the channel, which uniquely identifies it within a connection.
user
Username associated with the channel.
vhost
Virtual host in which the channel operates.
transactional
True if the channel is in transactional mode, false otherwise.
confirm
True if the channel is in confirm mode, false otherwise.
consumer_count
Number of logical AMQP consumers retrieving messages via the channel.
messages_unacknowledged
Number of messages delivered via this channel but not yet acknowledged.
messages_uncommitted
Number of messages received in an as yet uncommitted transaction.
acks_uncommitted
Number of acknowledgements received in an as yet uncommitted transaction.
messages_unconfirmed
Number of published messages not yet confirmed. On channels not in confirm mode, this remains 0.
prefetch_count
QoS prefetch limit for new consumers, 0 if unlimited.
global_prefetch_count
QoS prefetch limit for the entire channel, 0 if unlimited.

If no channelinfoitem are specified then pid, user, consumer_count, and messages_unacknowledged are assumed.

For example, this command displays the connection process and count of unacknowledged messages for each channel:

rabbitmqctl list_channels connection messages_unacknowledged
list_consumers [-p vhost]
Lists consumers, i.e. subscriptions to a queue’s message stream. Each line printed shows, separated by tab characters, the name of the queue subscribed to, the id of the channel process via which the subscription was created and is managed, the consumer tag which uniquely identifies the subscription within a channel, a boolean indicating whether acknowledgements are expected for messages delivered to this consumer, an integer indicating the prefetch limit (with 0 meaning “none”), and any arguments for this consumer.
status
Displays broker status information such as the running applications on the current Erlang node, RabbitMQ and Erlang versions, OS name, memory and file descriptor statistics. (See the cluster_status command to find out which nodes are clustered and running.)

For example, this command displays information about the RabbitMQ broker:

rabbitmqctl status
node_health_check
Health check of the RabbitMQ node. Verifies the rabbit application is running, list_queues and list_channels return, and alarms are not set.

For example, this command performs a health check on the RabbitMQ node:

rabbitmqctl node_health_check -n rabbit@stringer
environment
Displays the name and value of each variable in the application environment for each running application.
report
Generate a server status report containing a concatenation of all server status information for support purposes. The output should be redirected to a file when accompanying a support request.

For example, this command creates a server report which may be attached to a support request email:

rabbitmqctl report > server_report.txt
eval expr
Evaluate an arbitrary Erlang expression.

For example, this command returns the name of the node to which rabbitmqctl has connected:

rabbitmqctl eval “node().”

Miscellaneous

close_connection connectionpid explanation
connectionpid
Id of the Erlang process associated with the connection to close.
explanation
Explanation string.

Instructs the broker to close the connection associated with the Erlang process id connectionpid (see also the list_connections command), passing the explanation string to the connected client as part of the AMQP connection shutdown protocol.

For example, this command instructs the RabbitMQ broker to close the connection associated with the Erlang process id “<rabbit@tanto.4262.0>”, passing the explanation “go away” to the connected client:

rabbitmqctl close_connection “<rabbit@tanto.4262.0>” “go away”
close_all_connections [-p vhost] [--global] [--per-connection-delay delay] [--limit limit] explanation
-p vhost
The name of the virtual host for which connections should be closed. Ignored when --global is specified.
--global
Close connections for all vhosts. Overrides -p.
--per-connection-delay delay
Time in milliseconds to wait after each connection is closed.
--limit limit
Number of connections to close. Only works per vhost. Ignored when --global is specified.
explanation
Explanation string.

Instructs the broker to close all connections for the specified vhost or entire RabbitMQ node.

For example, this command instructs the RabbitMQ broker to close 10 connections on “qa_env” vhost, passing the explanation “Please close”:

rabbitmqctl close_all_connections -p qa_env --limit 10 'Please close'

This command instructs broker to close all connections to the node:

rabbitmqctl close_all_connections --global
trace_on [-p vhost]
vhost
The name of the virtual host for which to start tracing.

Starts tracing. Note that the trace state is not persistent; it will revert to being off if the server is restarted.

trace_off [-p vhost]
vhost
The name of the virtual host for which to stop tracing.

Stops tracing.

set_vm_memory_high_watermark fraction
fraction
The new memory threshold fraction at which flow control is triggered, as a floating point number greater than or equal to 0.
set_vm_memory_high_watermark absolute memory_limit
memory_limit
The new memory limit at which flow control is triggered, expressed in bytes as an integer number greater than or equal to 0 or as a string with memory units (e.g. 512M or 1G). Available units are:

k, kiB
kibibytes (2^10 bytes)
M, MiB
mebibytes (2^20 bytes)
G, GiB
gibibytes (2^30 bytes)
kB
kilobytes (10^3 bytes)
MB
megabytes (10^6 bytes)
GB
gigabytes (10^9 bytes)
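
For example, these commands set the threshold to 40% of installed RAM or to an absolute limit of 1 GiB (the values here are purely illustrative):

rabbitmqctl set_vm_memory_high_watermark 0.4

rabbitmqctl set_vm_memory_high_watermark absolute 1G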
set_disk_free_limit disk_limit
disk_limit
Lower bound limit as an integer in bytes or a string with memory units (see vm_memory_high_watermark), e.g. 512M or 1G. Once free disk space reaches the limit, a disk alarm will be set.
set_disk_free_limit mem_relative fraction
fraction
Limit relative to the total amount of available RAM, as a non-negative floating point number. Values lower than 1.0 can be dangerous and should be used carefully.
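
For example, these commands set an absolute free disk space limit of 2 GB, or a limit of 1.5 times the installed RAM (illustrative values):

rabbitmqctl set_disk_free_limit 2GB

rabbitmqctl set_disk_free_limit mem_relative 1.5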
encode value passphrase [--cipher cipher] [--hash hash] [--iterations iterations]
value passphrase
Value to encrypt and passphrase.

For example:

rabbitmqctl encode '<<"guest">>' mypassphrase
--cipher cipher --hash hash --iterations iterations
Options to specify the encryption settings. They can be used independently.

For example:

rabbitmqctl encode --cipher blowfish_cfb64 --hash sha256 --iterations 10000 '<<"guest">>' mypassphrase
decode value passphrase [--cipher cipher] [--hash hash] [--iterations iterations]
value passphrase
Value to decrypt (as produced by the encode command) and passphrase.

For example:

rabbitmqctl decode '{encrypted, <<"…">>}' mypassphrase
--cipher cipher --hash hash --iterations iterations
Options to specify the decryption settings. They can be used independently.

For example:

rabbitmqctl decode --cipher blowfish_cfb64 --hash sha256 --iterations 10000 '{encrypted,<<"…">>}' mypassphrase
list_hashes
Lists hash functions supported by encoding commands.

For example, this command instructs the RabbitMQ broker to list all hash functions supported by encoding commands:

rabbitmqctl list_hashes
list_ciphers
Lists cipher suites supported by encoding commands.

For example, this command instructs the RabbitMQ broker to list all cipher suites supported by encoding commands:

rabbitmqctl list_ciphers

PLUGIN COMMANDS

RabbitMQ plugins can extend the rabbitmqctl tool with new commands when they are enabled. The currently available commands can be found in the rabbitmqctl help output. The following commands are added by the RabbitMQ plugins available in the default distribution:

Shovel plugin

shovel_status
Prints a list of configured shovels.
delete_shovel [-p vhost] name
Instructs the RabbitMQ node to delete the configured shovel by name.
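
For example, this command deletes a shovel named "my_shovel" on the default vhost (the shovel name is illustrative):

rabbitmqctl delete_shovel -p / my_shovel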

Federation plugin

federation_status [--only-down]
Prints a list of federation links.

--only-down
Only list federation links which are not running.
restart_federation_link link_id
Instructs the RabbitMQ node to restart the federation link with specified link_id.
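
For example, these commands list the federation links that are not running and restart one of them (the link id is a placeholder):

rabbitmqctl federation_status --only-down

rabbitmqctl restart_federation_link <link_id>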

AMQP-1.0 plugin

list_amqp10_connections [amqp10_connectioninfoitem …]
Similar to the list_connections command, but returns fields which make sense for AMQP-1.0 connections. amqp10_connectioninfoitem parameter is used to indicate which connection information items to include in the results. The column order in the results will match the order of the parameters. amqp10_connectioninfoitem can take any value from the list that follows:

pid
Id of the Erlang process associated with the connection.
auth_mechanism
SASL authentication mechanism used, such as “PLAIN”.
host
Server hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was disabled.
frame_max
Maximum frame size (bytes).
timeout
Connection timeout / negotiated heartbeat interval, in seconds.
user
Username associated with the connection.
state
Connection state; one of:

  • starting
  • waiting_amqp0100
  • securing
  • running
  • blocking
  • blocked
  • closing
  • closed
recv_oct
Octets received.
recv_cnt
Packets received.
send_oct
Octets sent.
send_cnt
Packets sent.
ssl
Boolean indicating whether the connection is secured with SSL.
ssl_protocol
SSL protocol (e.g. “tlsv1”).
ssl_key_exchange
SSL key exchange algorithm (e.g. “rsa”).
ssl_cipher
SSL cipher algorithm (e.g. “aes_256_cbc”).
ssl_hash
SSL hash function (e.g. “sha”).
peer_cert_subject
The subject of the peer’s SSL certificate, in RFC4514 form.
peer_cert_issuer
The issuer of the peer’s SSL certificate, in RFC4514 form.
peer_cert_validity
The period for which the peer’s SSL certificate is valid.
node
The node name of the RabbitMQ node to which connection is established.

MQTT plugin

list_mqtt_connections [mqtt_connectioninfoitem]
Similar to the list_connections command, but returns fields which make sense for MQTT connections. mqtt_connectioninfoitem parameter is used to indicate which connection information items to include in the results. The column order in the results will match the order of the parameters. mqtt_connectioninfoitem can take any value from the list that follows:

host
Server hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was disabled.
port
Server port.
peer_host
Peer hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was not enabled.
peer_port
Peer port.
protocol
MQTT protocol version, which can be one of the following:

  • {'MQTT', N/A}
  • {'MQTT', 3.1.0}
  • {'MQTT', 3.1.1}
channels
Number of channels using the connection.
channel_max
Maximum number of channels on this connection.
frame_max
Maximum frame size (bytes).
client_properties
Informational properties transmitted by the client during connection establishment.
ssl
Boolean indicating whether the connection is secured with SSL.
ssl_protocol
SSL protocol (e.g. “tlsv1”).
ssl_key_exchange
SSL key exchange algorithm (e.g. “rsa”).
ssl_cipher
SSL cipher algorithm (e.g. “aes_256_cbc”).
ssl_hash
SSL hash function (e.g. “sha”).
conn_name
Readable name for the connection.
connection_state
Connection state; one of:

  • starting
  • running
  • blocked
connection
Id of the Erlang process associated with the internal amqp direct connection.
consumer_tags
A tuple of consumer tags for QOS0 and QOS1.
message_id
The last Packet ID sent in a control message.
client_id
MQTT client identifier for the connection.
clean_sess
MQTT clean session flag.
will_msg
MQTT Will message sent in CONNECT frame.
exchange
Exchange to route MQTT messages configured in rabbitmq_mqtt application environment.
ssl_login_name
SSL peer cert auth name
retainer_pid
Id of the Erlang process associated with retain storage for the connection.
user
Username associated with the connection.
vhost
Virtual host name with non-ASCII characters escaped as in C.

STOMP plugin

list_stomp_connections [stomp_connectioninfoitem]
Similar to the list_connections command, but returns fields which make sense for STOMP connections. stomp_connectioninfoitem parameter is used to indicate which connection information items to include in the results. The column order in the results will match the order of the parameters. stomp_connectioninfoitem can take any value from the list that follows:

conn_name
Readable name for the connection.
connection
Id of the Erlang process associated with the internal amqp direct connection.
connection_state
Connection state; one of:

  • running
  • blocking
  • blocked
session_id
STOMP protocol session identifier
channel
AMQP channel associated with the connection
version
Negotiated STOMP protocol version for the connection.
implicit_connect
Indicates if the connection was established using implicit connect (without CONNECT frame)
auth_login
Effective username for the connection.
auth_mechanism
STOMP authorization mechanism. Can be one of:

  • config
  • ssl
  • stomp_headers
port
Server port.
host
Server hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was not enabled.
peer_port
Peer port.
peer_host
Peer hostname obtained via reverse DNS, or its IP address if reverse DNS failed or was not enabled.
protocol
STOMP protocol version, which can be one of the following:

  • {'STOMP', 0}
  • {'STOMP', 1}
  • {'STOMP', 2}
channels
Number of channels using the connection.
channel_max
Maximum number of channels on this connection.
frame_max
Maximum frame size (bytes).
client_properties
Informational properties transmitted by the client during connection establishment.
ssl
Boolean indicating whether the connection is secured with SSL.
ssl_protocol
SSL protocol (e.g. “tlsv1”).
ssl_key_exchange
SSL key exchange algorithm (e.g. “rsa”).
ssl_cipher
SSL cipher algorithm (e.g. “aes_256_cbc”).
ssl_hash
SSL hash function (e.g. “sha”).

Management agent plugin

reset_stats_db [--all]
Reset management stats database for the RabbitMQ node.

--all
Reset stats database for all nodes in the cluster.
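
For example, this command resets the management stats database on every node in the cluster:

rabbitmqctl reset_stats_db --all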
Main

Docker vs. Kubernetes vs. Apache Mesos: Why What You Think You Know is Probably Wrong

 

There are countless articles, discussions, and lots of social chatter comparing Docker, Kubernetes, and Mesos. If you listen to the partially informed, you'd think that the three open source projects are in a fight to the death for container supremacy. You'd also believe that picking one over the other is almost a religious choice, with true believers espousing their faith and burning heretics who would dare to consider an alternative.

That’s all bunk.

While all three technologies make it possible to use containers to deploy, manage, and scale applications, in reality they each solve for different things and are rooted in very different contexts. In fact, none of these three widely adopted toolchains is completely like the others.

Instead of comparing the overlapping features of these fast-evolving technologies, let’s revisit each project’s original mission, architectures, and how they can complement and interact with each other.

Let’s start with Docker…

Docker, Inc. started as a Platform-as-a-Service startup named dotCloud. The dotCloud team found that managing dependencies and binaries across many applications and customers required significant effort. So they combined some of the capabilities of Linux cgroups and namespaces into a single, easy-to-use package so that applications can run consistently on any infrastructure. This package is the Docker image, which provides the following capabilities:

  • Packages the application and the libraries in a single package (the Docker Image), so applications can consistently be deployed across many environments;
  • Provides Git-like semantics, such as "docker push" and "docker commit", to make it easy for application developers to quickly adopt the new technology and incorporate it in their existing workflows;
  • Defines Docker images as immutable layers, enabling immutable infrastructure. Committed changes are stored as individual read-only layers, making it easy to re-use images and track changes. Layers also save disk space and network traffic by transporting only the updates instead of entire images (see the short command sketch after this list);
  • Runs Docker containers by instantiating the immutable image with a writable layer that can temporarily store runtime changes, making it easy to deploy and scale multiple instances of the application quickly.
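
A quick illustration of those Git-like semantics and image layers (the image and container names below are made up for the example):

docker run -it --name demo ubuntu:16.04 bash      # start a container from a base image
docker commit demo myrepo/demo:v1                 # capture the container's changes as a new read-only layer
docker push myrepo/demo:v1                        # publish the image to a registry
docker history myrepo/demo:v1                     # list the image's individual layers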

Docker grew in popularity, and developers started to move from running containers on their laptops to running them in production. Additional tooling was needed to coordinate these containers across multiple machines, known as container orchestration. Interestingly, one of the first container orchestrators that supported Docker images (June 2014) was Marathon on Apache Mesos (which we'll describe in more detail below). That year, Solomon Hykes, founder and CTO of Docker, recommended Mesos as "the gold standard for production clusters". Soon after, many container orchestration technologies in addition to Marathon on Mesos emerged: Nomad, Kubernetes and, not surprisingly, Docker Swarm (now part of Docker Engine).

As Docker moved to commercialize the open source file format, the company also started introducing tools to complement the core Docker file format and runtime engine, including:

  • Docker Hub for public storage of Docker images;
  • Docker Registry for storing images on-premises;
  • Docker Cloud, a managed service for building and running containers;
  • Docker Datacenter as a commercial offering embodying many Docker technologies.

Docker

Source: http://www.docker.com

Docker's insight to encapsulate software and its dependencies in a single package has been a game changer for the software industry, the same way MP3s helped to reshape the music industry. The Docker file format became the industry standard, and leading container technology vendors (including Docker, Google, Pivotal, Mesosphere and many others) formed the Cloud Native Computing Foundation (CNCF) and Open Container Initiative (OCI). Today, CNCF and OCI aim to ensure interoperability and standardized interfaces across container technologies and to ensure that any Docker container, built using any tools, can run on any runtime or infrastructure.

Enter Kubernetes

Google recognized the potential of the Docker image early on and sought to deliver container orchestration “as-a-service” on the Google Cloud Platform. Google had tremendous experience with containers (they introduced cgroups in Linux) but existing internal container and distributed computing tools like Borg were directly coupled to their infrastructure. So, instead of using any code from their existing systems, Google designed Kubernetes from scratch to orchestrate Docker containers. Kubernetes was released in February 2015 with the following goals and considerations:

  • Empower application developers with a powerful tool for Docker container orchestration without having to interact with the underlying infrastructure;
  • Provide standard deployment interface and primitives for a consistent app deployment experience and APIs across clouds;
  • Build on a Modular API core that allows vendors to integrate systems around the core Kubernetes technology.

In March 2016, Google donated Kubernetes to the CNCF, and it remains today the lead contributor to the project (followed by Red Hat, CoreOS and others).

Kubernetes

Source: wikipedia

Kubernetes was very attractive for application developers, as it reduced their dependency on infrastructure and operations teams. Vendors also liked Kubernetes because it provided an easy way to embrace the container movement and provide a commercial solution to the operational challenges of running your own Kubernetes deployment (which remains a non-trivial exercise). Kubernetes is also attractive because it is open source under the CNCF, in contrast to Docker Swarm which, though open source, is tightly controlled by Docker, Inc.

Kubernetes’ core strength is providing application developers powerful tools for orchestrating stateless Docker containers. While there are multiple initiatives to expand the scope of the project to more workloads (like analytics and stateful data services), these initiatives are still in very early phases and it remains to be seen how successful they may be.

Apache Mesos

Apache Mesos started as a UC Berkeley project to create a next-generation cluster manager, and to apply the lessons learned from cloud-scale, distributed computing infrastructures such as Google's Borg and Facebook's Tupperware. While Borg and Tupperware had a monolithic architecture and were closed-source proprietary technologies tied to physical infrastructure, Mesos introduced a modular architecture, an open source development approach, and was designed to be completely independent from the underlying infrastructure. Mesos was quickly adopted by Twitter, Apple (Siri), Yelp, Uber, Netflix, and many leading technology companies to support everything from microservices, big data and real-time analytics, to elastic scaling.

As a cluster manager, Mesos was architected to solve for a very different set of challenges:

  • Abstract data center resources into a single pool to simplify resource allocation while providing a consistent application and operational experience across private or public clouds;
  • Colocate diverse workloads on the same infrastructure, such as analytics, stateless microservices, distributed data services and traditional apps, to improve utilization and reduce cost and footprint;
  • Automate day-two operations for application-specific tasks such as deployment, self healing, scaling, and upgrades; providing a highly available fault tolerant infrastructure;
  • Provide evergreen extensibility to run new application and technologies without modifying the cluster manager or any of the existing applications built on top of it;
  • Elastically scale the application and the underlying infrastructure from a handful, to tens, to tens of thousands of nodes.

Mesos has a unique ability to individually manage a diverse set of workloads — including traditional applications such as Java, stateless Docker microservices, batch jobs, real-time analytics, and stateful distributed data services. Mesos’ broad workload coverage comes from its two-level architecture, which enables “application-aware” scheduling. Application-aware scheduling is accomplished by encapsulating the application-specific operational logic in a “Mesos framework” (analogous to a runbook in operations). Mesos Master, the resource manager, then offers these frameworks fractions of the underlying infrastructure while maintaining isolation. This approach allows each workload to have its own purpose-built application scheduler that understands its specific operational requirements for deployment, scaling and upgrade. Application schedulers are also independently developed, managed and updated, allowing Mesos to be highly extensible and support new workloads or add more operational capabilities over time.

Mesos two-level scheduler

Take, for example, how a team manages upgrades. Stateless applications can benefit from a "blue/green" deployment approach, where another complete version of the app is spun up while the old one is still live; traffic switches to the new app when ready and the old app is destroyed. But upgrading a data workload like HDFS or Cassandra requires taking the nodes offline one at a time, preserving local data volumes to avoid data loss, performing the upgrade in-place in a specific sequence, and executing special checks and commands on each node type before and after the upgrade. Each of these steps is app- or service-specific, and may even be version-specific. This makes it incredibly challenging to manage data services with a conventional container orchestration scheduler.

Mesos’ ability to manage each workload the way it wants to be treated has led many companies to use Mesos as a single unified platform to run a combination of microservices and data services together. A common reference architecture for running data-intensive applications is the “SMACK stack”.

A Moment of Clarity

Notice that we haven’t said anything about container orchestration to describe Apache Mesos. So why do people automatically associate Mesos with container orchestration? Container orchestration is one example of a workload that can run on Mesos’ modular architecture, and it’s done using a specialized orchestration “framework” built on top of Mesos called Marathon. Marathon was originally developed to orchestrate app archives (like JARs, tarballs, ZIP files) in cgroup containers, and was one of the first container orchestrators to support Docker containers in 2014.

So when people compare Docker and Kubernetes to Mesos, they are actually comparing Kubernetes and Docker Swarm to Marathon running on Mesos.

Why does this matter? Because Mesos frankly doesn’t care what’s running on top of it. Mesos can elastically provide cluster services for Java application servers, Docker container orchestration, Jenkins CI Jobs, Apache Spark analytics, Apache Kafka streaming, and more on shared infrastructure. Mesos could even run Kubernetes or other container orchestrators, though a public integration is not yet available.

Mesos Workloads

Source: Apache Mesos Survey 2016

Another consideration for Mesos (and why it’s attractive for many enterprise architects) is its maturity in running mission critical workloads. Mesos has been in large scale production (tens of thousands of servers) for more than 7 years, which is why it’s known to be more production ready and reliable at scale than many other container-enabling technologies in the market.

What does this all mean?

In summary, all three technologies have something to do with Docker containers and give you access to container orchestration for application portability and scale. So how do you choose between them? It comes down to choosing the right tool for the job (and perhaps even different ones for different jobs). If you are an application developer looking for a modern way to build and package your application, or to accelerate microservices initiatives, the Docker container format and developer tooling is the best way to do so.

If you are a dev/devops team and want to build a system dedicated exclusively to Docker container orchestration, and are willing to get your hands dirty integrating your solution with the underlying infrastructure (or rely on public cloud infrastructure like Google Container Engine or Azure Container Service), Kubernetes is a good technology for you to consider.

If you want to build a reliable platform that runs multiple mission critical workloads including Docker containers, legacy applications (e.g., Java), and distributed data services (e.g., Spark, Kafka, Cassandra, Elastic), and want all of this portable across cloud providers and/or datacenters, then Mesos (or our own Mesos distribution, Mesosphere DC/OS) is the right fit for you.

Whatever you choose, you’ll be embracing a set of tools that makes more efficient use of server resources, simplifies application portability, and increases developer agility. You really can’t go wrong.

Source :- https://mesosphere.com
kubernetes (k8s), Main

k8s – Concepts & Components (from kubernetes.io)

Master Components

Master components provide the cluster's control plane. Master components make global decisions about the cluster (for example, scheduling) and detect and respond to cluster events (such as starting up a new pod when a replication controller's 'replicas' field is unsatisfied).

Master components can be run on any machine in the cluster. However, for simplicity, set up scripts typically start all master components on the same machine, and do not run user containers on this machine. See Building High-Availability Clusters for an example multi-master-VM setup.
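
On a kubeadm-provisioned cluster (an assumption, not stated in the original text) the master components themselves run as pods in the kube-system namespace, so a quick way to see them is:

kubectl get pods -n kube-system -o wide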

kube-apiserver

Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.

It is designed to scale horizontally – that is, it scales by deploying more instances. See Building High-Availability Clusters.

etcd

Consistent and highly-available key value store used as Kubernetes’ backing store for all cluster data.

Always have a backup plan for etcd’s data for your Kubernetes cluster. For in-depth information on etcd, see etcd documentation.

kube-scheduler

Component on the master that watches newly created pods that have no node assigned, and selects a node for them to run on.

Factors taken into account for scheduling decisions include individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference and deadlines.

kube-controller-manager

Component on the master that runs controllers.

Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.

These controllers include:

  • Node Controller: Responsible for noticing and responding when nodes go down.
  • Replication Controller: Responsible for maintaining the correct number of pods for every replication controller object in the system.
  • Endpoints Controller: Populates the Endpoints object (that is, joins Services & Pods).
  • Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces.

cloud-controller-manager

cloud-controller-manager runs controllers that interact with the underlying cloud providers. The cloud-controller-manager binary is an alpha feature introduced in Kubernetes release 1.6.

cloud-controller-manager runs cloud-provider-specific controller loops only. You must disable these controller loops in the kube-controller-manager. You can disable the controller loops by setting the --cloud-provider flag to external when starting the kube-controller-manager.
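
A minimal sketch of that flag, assuming the kube-controller-manager is launched from a systemd unit or static pod manifest (all other flags omitted for brevity):

kube-controller-manager --cloud-provider=external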

cloud-controller-manager allows cloud vendors' code and the Kubernetes core to evolve independently of each other. In prior releases, the core Kubernetes code was dependent upon cloud-provider-specific code for functionality. In future releases, code specific to cloud vendors should be maintained by the cloud vendors themselves, and linked to cloud-controller-manager while running Kubernetes.

The following controllers have cloud provider dependencies:

  • Node Controller: For checking the cloud provider to determine if a node has been deleted in the cloud after it stops responding
  • Route Controller: For setting up routes in the underlying cloud infrastructure
  • Service Controller: For creating, updating and deleting cloud provider load balancers
  • Volume Controller: For creating, attaching, and mounting volumes, and interacting with the cloud provider to orchestrate volumes

Node Components

Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment.

kubelet

An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.

The kubelet takes a set of PodSpecs that are provided through various mechanisms and ensures that the containers described in those PodSpecs are running and healthy. The kubelet doesn’t manage containers which were not created by Kubernetes.
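
One illustrative way to see a PodSpec that a kubelet acts on is to create a simple pod and dump its spec (the pod name and image here are made up):

kubectl run nginx-test --image=nginx --restart=Never

kubectl get pod nginx-test -o yaml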

kube-proxy

kube-proxy enables the Kubernetes service abstraction by maintaining network rules on the host and performing connection forwarding.

Container Runtime

The container runtime is the software that is responsible for running containers. Kubernetes supports two runtimes: Docker and rkt.

Addons

Addons are pods and services that implement cluster features. The pods may be managed by Deployments, ReplicationControllers, and so on. Namespaced addon objects are created in the kube-system namespace.

Selected addons are described below, for an extended list of available addons please see Addons.

DNS

While the other addons are not strictly required, all Kubernetes clusters should have cluster DNS, as many examples rely on it.

Cluster DNS is a DNS server, in addition to the other DNS server(s) in your environment, which serves DNS records for Kubernetes services.

Containers started by Kubernetes automatically include this DNS server in their DNS searches.
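
A common way to check that cluster DNS is working from inside a pod (an illustrative test, assuming the busybox image is reachable):

kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup kubernetes.default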

Web UI (Dashboard)

Dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage and troubleshoot applications running in the cluster, as well as the cluster itself.

Container Resource Monitoring

Container Resource Monitoring records generic time-series metrics about containers in a central database, and provides a UI for browsing that data.

Cluster-level Logging

Cluster-level logging mechanism is responsible for saving container logs to a central log store with search/browsing interface.

kubernetes (k8s), Main

K8s – Installation & Configuration

Hello Guys,

 

I know it is quite difficult to install Kubernetes in a proxy-prone environment.

Therefore I decided to take the pain and install Kubernetes in my proxy-prone environment.

I would like to share my steps.

For Both Master and Worker Node :- 

vi .bashrc

# Set Proxy
function setproxy()
{
export {http,https,ftp}_proxy="http://<proxy_ip>:<port>"
export no_proxy="localhost,10.96.0.0/12,*.<company_domain_Name>,<internal_ip>"
}

# Unset Proxy
function unsetproxy()
{
unset {http,https,ftp}_proxy
}

function checkproxy()
{
env | grep proxy
}

vi /etc/yum.conf

proxy=http://<proxy_ip>:<port>

proxy=https://<proxy_ip>:<port>

vi /etc/hosts

<ip1-master>  kubernetes-1

<ip2-worker>  kubernetes-2

<ip3-worker>  kubernetes-3

 

mkdir -p /etc/systemd/system/docker.service.d/

 

vi /etc/systemd/system/docker.service.d/http-proxy.conf

 

[Service]

Environment=HTTP_PROXY=http://<proxy_ip>:<port>/

Environment=HTTPS_PROXY=https://<proxy_ip>:<port>/

Environment=NO_PROXY=<ip1-master>,<ip2-worker>,<ip3-worker>
cat <<EOF > /etc/yum.repos.d/kubernetes.repo

[kubernetes]

name=Kubernetes

baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64

enabled=1

gpgcheck=1

repo_gpgcheck=1

gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

EOF

 

setenforce 0

 

yum install -y kubelet kubeadm kubectl

systemctl enable kubelet && systemctl start kubelet

 

sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

 

systemctl daemon-reload

systemctl restart kubelet

 

export no_proxy="localhost,10.96.0.0/12,*.<company domain>,<ip1-master>,<ip2-worker>,<ip3-worker>"

 

export KUBECONFIG=/etc/kubernetes/admin.conf

 

Calico is recommended for amd64; Flannel also works but needs the pod network CIDR to be 10.244.0.0/16.

kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml

 

Master Node :-

kubeadm init
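
Note (an assumption based on typical kubeadm setups, not part of the original steps): when a pod network add-on such as Calico or Flannel is used, kubeadm init is usually given the matching pod CIDR, for example:

kubeadm init --pod-network-cidr=192.168.0.0/16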

 

Worker Node :-

kubeadm join --token <token received from master node> <master ip>:6443 --discovery-token-ca-cert-hash sha256:<master-hash>

Master Node :-

Check in the master

kubectl get nodes

output-kuber

Main, Operating System, Redhat / CEntOS / Oracle Linux

Standard Linux Tuning

Hello Bloggers,

The majority of applications these days are deployed on (Debian / Redhat) Linux as the base operating system.

I would like to share some generic tuning that can be done before deploying any application on it.

Index Component Question / Test / Reason  
Network
  These are some checks to validate the network setup.
Network Are the switches redundant?
Unplug one switch.
Fault-tolerance.
 
  Network Is the cabling redundant?
Pull cables.
Fault-tolerance.
 
  Network Is the network full-duplex?
Double check setup.
Performance.
 
       
Network adapter (NIC) Tuning
  It is recommended to consult with the network adapter provider on recommended Linux TCP/IP settings for optimal performance and stability on Linux.

There are also quite a few TCP/IP tuning sources on the Internet, such as http://fasterdata.es.net/TCP-tuning/linux.html

  NIC Are the NIC fault-tolerant (aka. auto-port negotiation)?
Pull cables and/or disable network adapter.
Fault-tolerance.
 
  NIC Set the transmission queue depth to at least 1000.

txqueuelen <length>

'cat /proc/net/softnet_stat'

Performance and stability (packet drops).
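
For example (the device name is illustrative), the queue length can be set with either of:

ip link set dev eth0 txqueuelen 1000

ifconfig eth0 txqueuelen 1000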

 
  NIC Enable TCP/IP offloading (aka. Generic Segment Offloading (GSO)) which was added in kernel 2.6.18

See: http://www.linuxfoundation.org/en/Net:GSO

Check: ethtool -k eth0

Modify: ethtool -K <DevName> <feature> on|off
Performance.

Note: I recommend enabling all supported TCP/IP offloading capabilities on an EMS host to free CPU resources.

 
  NIC Enable Interrupt Coalescence (aka. Interrupt Moderation or Interrupt Blanking).

See: http://kb.pert.geant.net/PERTKB/InterruptCoalescence

Check: ethtool -c eth0

Modify: ethtool -C <DevName>

Performance.

Note: The configuration is system dependent, but the goal is to reduce the number of interrupts per second at the 'cost' of slightly increased latency.

 
       
TCP/IP Buffer Tuning
For a low-latency or high-throughput messaging system, TCP/IP buffer tuning is important.
Thus, instead of tuning the default values, one should first check whether the current settings (sysctl -a) provide large enough buffers. The values can be changed via the command sysctl -w <name>=<value>.

The below values and comments were taken from a TIBCO support FAQ (FAQ1-6YOAA) and serve as a guideline towards "large enough" buffers, i.e., if your system configuration has lower values it is suggested to raise them to the values below.

  TCP/IP Maximum OS receive buffer size for all connection types.

sysctl -w net.core.rmem_max=8388608

Default: 131071
Performance.

 
  TCP/IP Default OS receive buffer size for all connection types.

sysctl -w net.core.rmem_default=65536

Default: 126976
Performance.

 
  TCP/IP Maximum OS send buffer size for all connection types.

sysctl -w net.core.wmem_max=8388608

Default: 131071
Performance.

 
  TCP/IP Default OS send buffer size for all types of connections.

sysctl -w net.core.wmem_default=65536

Default: 126976
Performance.

 
  TCP/IP Is TCP/IP window scaling enabled?

sysctl net.ipv4.tcp_window_scaling

Default: 1

Performance.
Note: As applications set buffer sizes explicitly, this 'disables' TCP/IP window scaling on Linux. Thus there is no point in enabling it, though there should be no harm in leaving the default (enabled). [This is my understanding / what I have been told, but I never double-checked and it could vary with kernel versions]

 
  TCP/IP TCP auto-tuning setting:

sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'

Default: 1966087 262144 393216
Performance.

The tcp_mem variable defines how the TCP stack should behave when it comes to memory usage:

–          The first value specified in the tcp_mem variable tells the kernel the low threshold. Below this point, the TCP stack does not bother at all about putting any pressure on the memory usage by different TCP sockets.

–          The second value tells the kernel at which point to start pressuring memory usage down.

–          The final value tells the kernel how many memory pages it may use maximally. If this value is reached, TCP streams and packets start getting dropped until we reach a lower memory usage again. This value includes all TCP sockets currently in use.

 
  TCP/IP TCP auto-tuning (receive) setting:

sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'

Default: 4096 87380 4194304
Performance.

The tcp_rmem variable defines how the TCP stack should behave when it comes to memory usage:

–          The first value tells the kernel the minimum receive buffer for each TCP connection, and this buffer is always allocated to a TCP socket, even under high pressure on the system.

–          The second value specified tells the kernel the default receive buffer allocated for each TCP socket. This value overrides the /proc/sys/net/core/rmem_default value used by other protocols.

–          The third and last value specified in this variable specifies the maximum receive buffer that can be allocated for a TCP socket.

 
  TCP/IP TCP auto-tuning (send) setting:

sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'

Default: 4096 87380 4194304
Performance.

This variable takes three different values which hold information on how much TCP send buffer memory space each TCP socket has to use. Every TCP socket has this much buffer space to use before the buffer is filled up.  Each of the three values are used under different conditions:

–          The first value in this variable tells the minimum TCP send buffer space available for a single TCP socket.

–          The second value in the variable tells us the default buffer space allowed for a single TCP socket to use.

–          The third value tells the kernel the maximum TCP send buffer space.

 
  TCP/IP This will ensure that immediately subsequent connections use these values.

sysctl -w net.ipv4.route.flush=1

Default: Not present

 
       
TCP Keep Alive
  In order to detect ungracefully closed sockets either the TCP keep-alive comes into play or the EMS client-server heartbeat. Which setup or which combination of parameters works better depends on the requirements and test scenarios.

As the EMS daemon does not explicitly enable TCP keep-alive on sockets, the TCP keep-alive settings (net.ipv4.tcp_keepalive_intvl, net.ipv4.tcp_keepalive_probes, net.ipv4.tcp_keepalive_time) do not play a role.

  TCP How many times to retry before killing an alive TCP connection. RFC 1122 says that the limit should be longer than 100 seconds, which is a very small number. The default value of 15 corresponds to 13-30 minutes, depending on the retransmission timeout (RTO).

sysctl -w net.ipv4.tcp_retries2=<test> (7 preferred)

Default: 15

Fault-Tolerance (EMS failover)

The default (15) is often considered too high and a value of 3 is often felt as too ‘edgy’ thus customer testing should establish a good value in the range between 4 and 10.

 
       
Linux System Settings
  System limits (ulimit) are used to establish boundaries for resource utilization by individual processes and thus protect the system and other processes. A too high or unlimited value provides zero protection but a too low value could hinder growth or cause premature errors.
  Linux Is the number of file descriptors at least 4096?

ulimit -n

Scalability

Note: It is expected that the number of connected clients and thus the number of connections is going to increase over time, and this setting allows for greater growth and also provides greater safety room should some application have a connection leak. Also note that the number of open connections can decrease system performance due to the way the OS handles the select() API. Thus care should be taken, if the number of connected clients increases over time, that all SLAs are still met.
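
A minimal sketch for making such a limit persistent, assuming PAM-based limits and a hypothetical "ems" user running the daemon:

# /etc/security/limits.conf (illustrative entries)
ems  soft  nofile  4096
ems  hard  nofile  8192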

 
  Linux Limit maximum file size for EMS to 2/5 of the disk space if the disk space is shared between EMS servers.

ulimit -f

Robustness: Contain the damage of a very large backlog.

 
  Linux Consider limiting the maximum data segment size for EMS daemons in order to avoid one EMS monopolizing all available memory.

ulimit -d

Robustness:  Contain the damage of a very large backlog.

Note: It should be tested if such a limit operates well with (triggers) the EMS reserved memory mode.

 
  Linux Limit the number of child processes to X to contain a rogue application (fork bomb).

ulimit -u

Robustness: Contain the damage a rogue application can do.
See: http://www.cyberciti.biz/tips/linux-limiting-user-process.html

This is just an example of a Linux system setting that is unrelated to TIBCO products. It is recommended to consult with Linux experts for recommended settings.

 
       
Linux Virtual Memory Management
There are a couple of virtual memory related settings that play a role in how likely Linux is to swap out memory pages and how Linux reacts to out-of-memory conditions. Both aspects are not important under "normal" operating conditions but are very important under memory pressure and thus for the system's stability under stress.

 

A server running EAI software, and even more so a server running a messaging server like EMS, should rarely have to resort to swap space for obvious performance reasons. However, considerations due to malloc/sbrk high-water-mark behavior, the behavior of the different over-commit strategies and the price of storage lead to the above recommendation: even with the below tuning of the EMS server towards larger malloc regions[1], the reality is that the EMS daemon is still subject to the sbrk() high-water-mark and is potentially allocating a lot of memory pages that could be swapped out without impacting performance. Of course the EMS server instance must eventually be bounced, but the recommendations in this section aim to provide operations with a larger window to schedule the maintenance.

 

As these values operate as a bundle, they must be changed together, or any variation must be well understood.

  Linux Swap-Space:                1.5 to 2x the physical RAM (24-32 GB )

Logical-Partition:        One of the first ones but after the EMS disk storage and application checkpoint files.

Physical-Partition:     Use a different physical partition than the one used for storage files, logging or application checkpoints to avoid competing disk IO.

 

 
  Linux Committing virtual memory:

sysctl -w vm.overcommit_memory=2

$ cat /proc/sys/vm/overcommit_memory

Default: 0

Robustness

 

Note: The recommended setting uses a new heuristic that only commits as much memory as available, where available is defined as swap-space plus a portion of RAM. The portion of RAM is defined in the overcommit_ratio. See also: http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting and http://www.centos.org/docs/5/html/5.2/Deployment_Guide/s3-proc-sys-vm.html

 
  Linux Committing virtual memory II:

sysctl -w vm.overcommit_ratio=25 (or less)

$ cat /proc/sys/vm/overcommit_ratio

Default: 50

Robustness

Note: This value specifies how much percent of the RAM Linux will add to the swap space in order to calculate the “available” memory. The more the swap space exceeds the physical RAM the lower values might be chosen. See also: http://www.linuxinsight.com/proc_sys_vm_overcommit_ratio.html

 
  Linux Swappiness

sysctl -w vm.swappiness=25 (or less)

$ cat /proc/sys/vm/swappiness
Default: 60

Robustness

 

Note: The swappiness defines how likely memory pages will be swapped in order to make room for the file buffer cache.

 

Generally speaking an enterprise server should not need to swap out pages in order to make room for the file buffer cache or other processes which would favor a setting of 0. 

 

On the other hand it is likely that applications have at least some memory pages that almost never get referenced again and swapping them out is a good thing.

 
  Linux Exclude essential processes (Application) from being killed by the out-of-memory (OOM) daemon.

echo "-17" > /proc/<pid>/oom_adj

Default: NA

Robustness

See: http://linux-mm.org/OOM_Killer and http://lwn.net/Articles/317814/

 

Note: With any configuration but overcommit_memory=2 and overcommit_ratio=0, the Linux Virtual Memory Management can commit more memory than available. If that memory must then be provided, Linux engages the out-of-memory kill daemon to kill processes based on "badness". In order to exclude essential processes from being killed one can set their oom_adj to -17.

 
  Linux 32bit Low memory area –  32bit Linux only

# cat /proc/sys/vm/lower_zone_protection
# echo "250" > /proc/sys/vm/lower_zone_protection

(NOT APPLICABLE)
To set this option on boot, add the following to /etc/sysctl.conf:
vm.lower_zone_protection = 250

 

See: http://linux.derkeiler.com/Mailing-Lists/RedHat/2007-08/msg00062.html

 
       
Linux CPU Tuning (Processor Binding & Priorities)
  This level of tuning is seldom required for Any Application solution. The tuning options are mentioned in case there is a need to go an extra mile. 
  Linux IRQ-Binding

Recommendation: Leave default
Note: For real-time messaging the binding of interrupts to a certain exclusively used CPU allows reducing jitter and thus improves the system characteristics as needed by ultra-low-latency solutions.

The default on Linux is IRQ balancing across multiple CPUs, and Linux offers two solutions in that area (kernel and daemon), of which at most one should be enabled.

 
  Linux Process Base Priority
Recommendation: Leave default

 

Note: The process base priority is determined by the user running the process instance, and thus running processes as root (chown and set the sticky bit) increases the process's base priority.

A root user can further raise the priority of the application to real-time scheduling, which can further improve performance, particularly in terms of jitter. However, in 2008 we observed that doing so actually decreased the performance of EMS in terms of the number of messages per second. That issue was researched with Novell at that time, but I am not sure of its outcome.

 
  Linux Foreground and Background Processes

Recommendation: TBD

 

Note: Linux assigns foreground processes a better base priority than background processes, but whether it really matters and, if so, how to change the start-up scripts is still to be determined.

 
  Linux Processor Set

Recommendation: Don’t bother

 

Note: Linux allows defining a processor set and limiting a process to only use cores from that processor set. This can be used to increase cache hits and cap the CPU resource for a particular process instance.

 

[1] If larger memory regions are allocated, the malloc() in the Linux glibc library uses mmap() instead of sbrk() to provide the memory pages to the process.

Memory mapped regions (mmap()) are better in the way they release memory back to the OS, and thus the high-water-mark effect is avoided for these regions.

Network

CIDR Table – Basic Reference (From Wikipedia)

Address Format | Difference to last address | Mask | Addresses (Decimal / 2^n) | Relative to class A, B, C | Typical use
a.b.c.d / 32 | +0.0.0.0 | 255.255.255.255 | 1 / 2^0 | 1/256 C | Host route
a.b.c.d / 31 | +0.0.0.1 | 255.255.255.254 | 2 / 2^1 | 1/128 C | Point to point links (RFC 3021)
a.b.c.d / 30 | +0.0.0.3 | 255.255.255.252 | 4 / 2^2 | 1/64 C | Point to point links (glue network)
a.b.c.d / 29 | +0.0.0.7 | 255.255.255.248 | 8 / 2^3 | 1/32 C | Smallest multi-host network
a.b.c.d / 28 | +0.0.0.15 | 255.255.255.240 | 16 / 2^4 | 1/16 C | Small LAN
a.b.c.d / 27 | +0.0.0.31 | 255.255.255.224 | 32 / 2^5 | 1/8 C
a.b.c.d / 26 | +0.0.0.63 | 255.255.255.192 | 64 / 2^6 | 1/4 C
a.b.c.d / 25 | +0.0.0.127 | 255.255.255.128 | 128 / 2^7 | 1/2 C | Large LAN
a.b.c.0 / 24 | +0.0.0.255 | 255.255.255.0 | 256 / 2^8 | 1 C
a.b.c.0 / 23 | +0.0.1.255 | 255.255.254.0 | 512 / 2^9 | 2 C
a.b.c.0 / 22 | +0.0.3.255 | 255.255.252.0 | 1,024 / 2^10 | 4 C
a.b.c.0 / 21 | +0.0.7.255 | 255.255.248.0 | 2,048 / 2^11 | 8 C | Small ISP / large business
a.b.c.0 / 20 | +0.0.15.255 | 255.255.240.0 | 4,096 / 2^12 | 16 C
a.b.c.0 / 19 | +0.0.31.255 | 255.255.224.0 | 8,192 / 2^13 | 32 C | ISP / large business
a.b.c.0 / 18 | +0.0.63.255 | 255.255.192.0 | 16,384 / 2^14 | 64 C
a.b.c.0 / 17 | +0.0.127.255 | 255.255.128.0 | 32,768 / 2^15 | 128 C
a.b.0.0 / 16 | +0.0.255.255 | 255.255.0.0 | 65,536 / 2^16 | 256 C = B
a.b.0.0 / 15 | +0.1.255.255 | 255.254.0.0 | 131,072 / 2^17 | 2 B
a.b.0.0 / 14 | +0.3.255.255 | 255.252.0.0 | 262,144 / 2^18 | 4 B
a.b.0.0 / 13 | +0.7.255.255 | 255.248.0.0 | 524,288 / 2^19 | 8 B
a.b.0.0 / 12 | +0.15.255.255 | 255.240.0.0 | 1,048,576 / 2^20 | 16 B
a.b.0.0 / 11 | +0.31.255.255 | 255.224.0.0 | 2,097,152 / 2^21 | 32 B
a.b.0.0 / 10 | +0.63.255.255 | 255.192.0.0 | 4,194,304 / 2^22 | 64 B
a.b.0.0 / 9 | +0.127.255.255 | 255.128.0.0 | 8,388,608 / 2^23 | 128 B
a.0.0.0 / 8 | +0.255.255.255 | 255.0.0.0 | 16,777,216 / 2^24 | 256 B = A | Largest IANA block allocation
a.0.0.0 / 7 | +1.255.255.255 | 254.0.0.0 | 33,554,432 / 2^25 | 2 A
a.0.0.0 / 6 | +3.255.255.255 | 252.0.0.0 | 67,108,864 / 2^26 | 4 A
a.0.0.0 / 5 | +7.255.255.255 | 248.0.0.0 | 134,217,728 / 2^27 | 8 A
a.0.0.0 / 4 | +15.255.255.255 | 240.0.0.0 | 268,435,456 / 2^28 | 16 A
a.0.0.0 / 3 | +31.255.255.255 | 224.0.0.0 | 536,870,912 / 2^29 | 32 A
a.0.0.0 / 2 | +63.255.255.255 | 192.0.0.0 | 1,073,741,824 / 2^30 | 64 A
a.0.0.0 / 1 | +127.255.255.255 | 128.0.0.0 | 2,147,483,648 / 2^31 | 128 A
0.0.0.0 / 0 | +255.255.255.255 | 0.0.0.0 | 4,294,967,296 / 2^32 | 256 A
Adapters

TIBCO Adapter Error (AER3-910005) – Exception: “JMS error: “Not allowed to create destination tracking

If you encounter the following error in your adapter logs :-

Error AER3-910005 Exception: “JMS error: “Not allowed to create destination tracking=#B0fo–uT5-V4zkYM9A/UbWgUzas#

The following are the possibilities and pointers to be checked :-

  1. Please check that the JMS connection configuration of your adapter is correct.
  2. Ensure the JMS user you used has enough permissions to create a receiver on the destination.
  3. Check whether dynamic creation is ON or not in your EMS configuration.
  4. If your destination is a queue then check in "queues.conf", and if it is a topic then in the "topics.conf" file.
  5. And if you do not want to turn ON dynamic creation, then you must manually create the destinations required by the adapter before starting the adapter.
  6. Finally, kill the BW process and the adapter service, then first start the adapter service and then the BW service.

Cause

  • Check the repository settings.
Adapters

TIBCO Adapters – Received read Advisory Error (JMS Related)

While testing for failover we found that the adapter does not fail over properly to the secondary EMS server when the primary is down. The adapter logs show the error below. The adapter does not pick up any messages when this error occurs.

Advisory: _SDK.ERROR.JMS.RECEIVE_FAILED : { {ADV_MSG, M_STRING, “Consumer receive failed. JMS Error: Illegal state, SessionName: TIBCOCOMJmsTerminatorSession, Destination: Rep.adcom.Rep-COMAdapter_Rep_v1.exit” } {^description^, M_STRING, “” } }.

The only way to resolve this is to restart the adapter so that it reconnects to the ems server. Then it picks up the messages.

 

“JMS Error: Illegal state” usually happens when a JMS call or request occurs in an inappropriate context. For example, a consumer is trying to receive message while the JMS server is down.  In your case you are saying that this is happening during EMS failover from machine1 to machine2.

One thing to keep in mind is that, depending on the number of outstanding messages, connections, and other resources managed by EMS, there may be a brief period before the secondary server is ready to accept connections.

Clients that disconnect will typically attempt to reconnect, however there is a limit to the number of reconnection attempts (as well as the interval between attempts).   These are specified at the connection factory level in factories.conf.  Here are some of the applicable settings:

 

reconnect_attempt_count – After losing its server connection, a client program configured with more than one server URL attempts to reconnect, iterating through its URL list until it re-establishes a connection with an EMS server. This property determines the maximum number of iterations. When absent, the default is 4.

reconnect_attempt_delay – When attempting to reconnect, the client sleeps for this interval (in milliseconds) between iterations through its URL list. When absent, the default is 500 milliseconds.

reconnect_attempt_timeout – When attempting to reconnect to the EMS server, you can set this connection timeout period to abort the connection attempt after a specified period of time (in milliseconds).
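
A minimal sketch of how these could look in factories.conf (the factory name, URLs and values here are assumptions for illustration, not taken from the original post):

[FTConnectionFactory]
type = generic
url = tcp://emshost1:7222,tcp://emshost2:7222
reconnect_attempt_count = 10
reconnect_attempt_delay = 1000
reconnect_attempt_timeout = 1000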

It may also be helpful to specify heartbeats between the adapter and the EMS server.  This way if the EMS server is brought down either gracefully or ungracefully the connection will be reset when the configured number of heartbeats is missed.  This should then trigger the reconnection attempts described above.  The heartbeat settings are defined in the tibemsd.conf.  Here are some relevant settings:

client_heartbeat_server – Specifies the interval clients are to send heartbeats to the server.

server_timeout_client_connection – Specifies the period of time server will wait for a client heartbeat before terminating the client connection.

server_heartbeat_client – Specifies the interval this server is to send heartbeats to all of its clients.

client_timeout_server_connection – Specifies the period of time a client will wait for a heartbeat from the server before terminating the connection.
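
A sketch of the corresponding heartbeat settings in tibemsd.conf (the interval and timeout values are illustrative; timeouts are usually several times the heartbeat interval):

client_heartbeat_server = 10
server_timeout_client_connection = 35
server_heartbeat_client = 10
client_timeout_server_connection = 35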

 

Docker

Docker – Commands to Manipulate the Containers

Parent command

Command Description
docker container Manage containers
Command Description
docker container attach Attach local standard input, output, and error streams to a running container
docker container commit Create a new image from a container’s changes
docker container cp Copy files/folders between a container and the local filesystem
docker container create Create a new container
docker container diff Inspect changes to files or directories on a container’s filesystem
docker container exec Run a command in a running container
docker container export Export a container’s filesystem as a tar archive
docker container inspect Display detailed information on one or more containers
docker container kill Kill one or more running containers
docker container logs Fetch the logs of a container
docker container ls List containers
docker container pause Pause all processes within one or more containers
docker container port List port mappings or a specific mapping for the container
docker container prune Remove all stopped containers
docker container rename Rename a container
docker container restart Restart one or more containers
docker container rm Remove one or more containers
docker container run Run a command in a new container
docker container start Start one or more stopped containers
docker container stats Display a live stream of container(s) resource usage statistics
docker container stop Stop one or more running containers
docker container top Display the running processes of a container
docker container unpause Unpause all processes within one or more containers
docker container update Update configuration of one or more containers
docker container wait Block until one or more containers stop, then print their exit codes
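
A short, illustrative sequence using a few of these subcommands (the image and container names are made up):

docker container run -d --name web -p 8080:80 nginx
docker container logs web
docker container stats --no-stream web
docker container stop web
docker container rm web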
Docker

Docker – Add Proxy to Docker Daemon

I am gonna cut the chatter and hit the platter.

Proxy Recommendation :- To download images from Docker Hub, we need internet connectivity.

I'll show you the steps to configure the proxy for the Docker daemon.

  1. Check the OS in which the docker-ce or docker-ee is installed.

ubuntu@docker:~$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

2. Check the Docker version

ubuntu@docker:~$ sudo docker -v
Docker version 17.05.0-ce, build 89658be

3. Create a directory

sudo mkdir -p /etc/systemd/system/docker.service.d

4. Create a Proxy Conf

vim /etc/systemd/system/docker.service.d/http-proxy.conf

[Service]
Environment="HTTP_PROXY=http://<proxy-ip>:<port>/"

Environment="HTTPS_PROXY=https://<proxy-ip>:<port>/"

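Before logging in, the daemon usually needs to be restarted so it picks up the new proxy environment (a small sketch assuming systemd):

sudo systemctl daemon-reload
sudo systemctl restart docker
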
5. Now try to login to docker

ubuntu@docker:~$ sudo docker login
Login with your Docker ID to push and pull images from Docker Hub. If you don’t have a Docker ID, head over to https://hub.docker.com to create one.
Username: <username>
Password:
Login Succeeded
ubuntu@docker:~$

 

Docker, Main

Docker – Cheat Sheet

Hello Bloggers,

One of the most important things when learning a new tool quickly is going through a cheat sheet.

I love to go through cheat sheets for a quick reference,

therefore I thought of consolidating some of the cheat sheets available online into my blog, for a quick ref.


Main, Storm

Apache Storm – Introduction

  • Apache Storm is a distributed real-time big data-processing system.
  • Storm is designed to process vast amounts of data in a fault-tolerant and horizontally scalable manner.
  • It is a streaming data framework capable of very high ingestion rates.
  • Though Storm is stateless, it manages distributed environment and cluster state via Apache Zookeeper.
  • It is simple and you can execute all kinds of manipulations on real-time data in parallel.
  • Apache Storm is continuing to be a leader in real-time data analytics.

Storm is easy to set up and operate, and it guarantees that every message will be processed through the topology at least once.

  • Basically Hadoop and Storm frameworks are used for analysing big data.
  • Both of them complement each other and differ in some aspects.
  • Apache Storm does all the operations except persistency, while Hadoop is good at everything but lags in real-time computation.
  • The following table compares the attributes of Storm and Hadoop.
Storm Hadoop
Real-time stream processing Batch processing
Stateless Stateful
Master/Slave architecture with ZooKeeper based coordination. The master node is called as nimbus and slaves are supervisors. Master-slave architecture with/without ZooKeeper based coordination. Master node is job tracker and slave node is task tracker.
A Storm streaming process can access tens of thousands of messages per second on a cluster. Hadoop Distributed File System (HDFS) uses the MapReduce framework to process vast amounts of data, which takes minutes or hours.
Storm topology runs until shutdown by the user or an unexpected unrecoverable failure. MapReduce jobs are executed in a sequential order and completed eventually.
Both are distributed and fault-tolerant
If nimbus / supervisor dies, restarting makes it continue from where it stopped, hence nothing gets affected. If the JobTracker dies, all the running jobs are lost.

 

Apache Storm Benefits

Here is a list of the benefits that Apache Storm offers −

  • Storm is open source, robust, and user friendly. It could be utilized in small companies as well as large corporations.
  • Storm is fault tolerant, flexible, reliable, and supports any programming language.
  • Allows real-time stream processing.
  • Storm is unbelievably fast because it has enormous processing power.
  • Storm can keep up its performance even under increasing load by adding resources linearly. It is highly scalable.
  • Storm performs data refresh and end-to-end delivery response in seconds or minutes, depending upon the problem. It has very low latency.
  • Storm has operational intelligence.
  • Storm provides guaranteed data processing even if any of the connected nodes in the cluster die or messages are lost.

 

Container, Docker

Docker – Basic Installation & Configuration

Youtube Video :-

Command :-

sudo yum install -y yum-utils \

  device-mapper-persistent-data \

  lvm2

sudo yum-config-manager \

    --add-repo \

    https://download.docker.com/linux/centos/docker-ce.repo

sudo yum install docker-ce

yum list docker-ce --showduplicates | sort -r

sudo systemctl start docker

sudo docker run hello-world

docker volume create portainer_data

docker run -d -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer

docker service create \
  --name portainer \
  --publish 9000:9000 \
  --replicas=1 \
  --constraint 'node.role == manager' \
  --mount type=bind,src=//var/run/docker.sock,dst=/var/run/docker.sock \
  portainer/portainer \
  -H unix:///var/run/docker.sock
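
If the service came up correctly, the Portainer UI should be reachable on the published port 9000. A quick sanity check (output will vary with your setup):

sudo docker ps
curl -I http://localhost:9000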

Ansible, DevOps, TIBCO

Ansible for TIBCO (Stop-Start TIBCO Suite)

WHY ANSIBLE?

Working in IT, you’re likely doing the same tasks over and over. What if you could solve problems once and then automate your solutions going forward? Ansible is here to help.

COMPLEXITY KILLS PRODUCTIVITY

Every business is a digital business. Technology is your innovation engine, and delivering your applications faster helps you win. Historically, that required a lot of manual effort and complicated coordination. But today, there is Ansible – the simple, yet powerful IT automation engine that thousands of companies are using to drive complexity out of their environments and accelerate DevOps initiatives.

ANSIBLE LOVES THE REPETITIVE WORK YOUR PEOPLE HATE

No one likes repetitive tasks. With Ansible, IT admins can begin automating away the drudgery from their daily tasks. Automation frees admins up to focus on efforts that help deliver more value to the business by speeding time to application delivery, and building on a culture of success. Ultimately, Ansible gives teams the one thing they can never get enough of: time. Allowing smart people to focus on smart things.

Ansible is a simple automation language that can perfectly describe an IT application infrastructure. It’s easy-to-learn, self-documenting, and doesn’t require a grad-level computer science degree to read. Automation shouldn’t be more complex than the tasks it’s replacing.

COMMUNICATION IS THE KEY TO DEVOPS

Unless automation is designed for teams, it’s just another tool. For it to serve people, automation needs to be smarter and simpler.

Simplicity grows more important the more people it impacts. That’s why Ansible is automation designed with everyone in mind.

TIBCO WITH ANSIBLE

Imagine you have a TIBCO Suite in Linux and you have a monthly maintenance, wherein you are supposed to stop the entire TIBCO Suite to give your servers some momentary rest and start them all again, just like a power nap.

You have to open multiple SSH sessions to kill all the services manually, and when the servers come back up you have to manually start each TIBCO component. Ansible removes this drudgery.

I have created a playbook to stop and start the entire TIBCO Suite.

You can customize the playbook as required; a minimal sketch of the idea follows the link below.

Please find my GitHub URL for the playbook and instructions:

https://github.com/chriszones2000/Ansible-Playbooks
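
For illustration only, here is a minimal sketch of what such a stop/start playbook could look like. The inventory group, script paths, and task layout below are hypothetical placeholders; the real playbook lives in the repository above.

---
- name: Stop the TIBCO suite
  hosts: tibco_servers          # hypothetical inventory group
  become: yes
  tasks:
    - name: Stop all TIBCO components (script path is a placeholder)
      shell: /opt/tibco/scripts/stop_all.sh

- name: Start the TIBCO suite
  hosts: tibco_servers
  become: yes
  tasks:
    - name: Start all TIBCO components (script path is a placeholder)
      shell: /opt/tibco/scripts/start_all.sh

In practice each component (EMS, BW engines, Hawk agents, and so on) would get its own task so that everything stops and starts in the right order.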

 

 

 

Redhat / CEntOS / Oracle Linux, Tips and Tricks

How to Delete all files except a Pattern in Unix

Good Morning To All My TECH Ghettos,

Today I'm going to show you all a command to delete all files except those matching a pattern.

You can use it in a script or straight from the command line. Life gets a whole lot easier!

find . -type f ! -name '<pattern>' -delete

A Live Example

Before


After the following Command

find . -type f ! -name '*.gz' -delete

After
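
Tip: if you want to preview what would be removed before actually deleting anything, the same expression works with -print instead of -delete:

find . -type f ! -name '*.gz' -print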

Operating System, Redhat / CEntOS / Oracle Linux, Ubuntu

How To Patch and Protect Linux Kernel Stack Clash Vulnerability CVE-2017-1000364 [ 19/June/2017 ]

A very serious security problem has been found in the Linux kernel, called “The Stack Clash.” It can be exploited by attackers to corrupt memory and execute arbitrary code. An attacker could leverage this with another vulnerability to execute arbitrary code and gain administrative/root account privileges. How do I fix this problem on Linux?

The Qualys Research Labs discovered various problems in the dynamic linker of the GNU C Library (CVE-2017-1000366) which allow local privilege escalation via stack clashing; this includes the Linux kernel. This bug affects Linux, OpenBSD, NetBSD, FreeBSD and Solaris, on i386 and amd64. It can be exploited by attackers to corrupt memory and execute arbitrary code.

What is CVE-2017-1000364 bug?

From RHN:

A flaw was found in the way memory was being allocated on the stack for user space binaries. If heap (or different memory region) and stack memory regions were adjacent to each other, an attacker could use this flaw to jump over the stack guard gap, cause controlled memory corruption on process stack or the adjacent memory region, and thus increase their privileges on the system. This is a kernel-side mitigation which increases the stack guard gap size from one page to 1 MiB to make successful exploitation of this issue more difficult.

As per the original research post:

Each program running on a computer uses a special memory region called the stack. This memory region is special because it grows automatically when the program needs more stack memory. But if it grows too much and gets too close to another memory region, the program may confuse the stack with the other memory region. An attacker can exploit this confusion to overwrite the stack with the other memory region, or the other way around.

A list of affected Linux distros

  1. Red Hat Enterprise Linux Server 5.x
  2. Red Hat Enterprise Linux Server 6.x
  3. Red Hat Enterprise Linux Server 7.x
  4. CentOS Linux Server 5.x
  5. CentOS Linux Server 6.x
  6. CentOS Linux Server 7.x
  7. Oracle Enterprise Linux Server 5.x
  8. Oracle Enterprise Linux Server 6.x
  9. Oracle Enterprise Linux Server 7.x
  10. Ubuntu 17.10
  11. Ubuntu 17.04
  12. Ubuntu 16.10
  13. Ubuntu 16.04 LTS
  14. Ubuntu 12.04 ESM (Precise Pangolin)
  15. Debian 9 stretch
  16. Debian 8 jessie
  17. Debian 7 wheezy
  18. Debian unstable
  19. SUSE Linux Enterprise Desktop 12 SP2
  20. SUSE Linux Enterprise High Availability 12 SP2
  21. SUSE Linux Enterprise Live Patching 12
  22. SUSE Linux Enterprise Module for Public Cloud 12
  23. SUSE Linux Enterprise Build System Kit 12 SP2
  24. SUSE Openstack Cloud Magnum Orchestration 7
  25. SUSE Linux Enterprise Server 11 SP3-LTSS
  26. SUSE Linux Enterprise Server 11 SP4
  27. SUSE Linux Enterprise Server 12 SP1-LTSS
  28. SUSE Linux Enterprise Server 12 SP2
  29. SUSE Linux Enterprise Server for Raspberry Pi 12 SP2

Do I need to reboot my box?

Yes, as most services depend upon the dynamic linker of the GNU C Library, and the kernel itself needs to be reloaded in memory.

How do I fix CVE-2017-1000364 on Linux?

Type the commands as per your Linux distro. You need to reboot the box. Before you apply patch, note down your current kernel version:
$ uname -a
$ uname -mrs

Sample outputs:

Linux 4.4.0-78-generic x86_64

Debian or Ubuntu Linux

Type the following apt command/apt-get command to apply updates:
$ sudo apt-get update && sudo apt-get upgrade && sudo apt-get dist-upgrade
Sample outputs:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  libc-bin libc-dev-bin libc-l10n libc6 libc6-dev libc6-i386 linux-compiler-gcc-6-x86 linux-headers-4.9.0-3-amd64 linux-headers-4.9.0-3-common linux-image-4.9.0-3-amd64
  linux-kbuild-4.9 linux-libc-dev locales multiarch-support
14 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/62.0 MB of archives.
After this operation, 4,096 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Reading changelogs... Done
Preconfiguring packages ...
(Reading database ... 115123 files and directories currently installed.)
Preparing to unpack .../libc6-i386_2.24-11+deb9u1_amd64.deb ...
Unpacking libc6-i386 (2.24-11+deb9u1) over (2.24-11) ...
Preparing to unpack .../libc6-dev_2.24-11+deb9u1_amd64.deb ...
Unpacking libc6-dev:amd64 (2.24-11+deb9u1) over (2.24-11) ...
Preparing to unpack .../libc-dev-bin_2.24-11+deb9u1_amd64.deb ...
Unpacking libc-dev-bin (2.24-11+deb9u1) over (2.24-11) ...
Preparing to unpack .../linux-libc-dev_4.9.30-2+deb9u1_amd64.deb ...
Unpacking linux-libc-dev:amd64 (4.9.30-2+deb9u1) over (4.9.30-2) ...
Preparing to unpack .../libc6_2.24-11+deb9u1_amd64.deb ...
Unpacking libc6:amd64 (2.24-11+deb9u1) over (2.24-11) ...
Setting up libc6:amd64 (2.24-11+deb9u1) ...
(Reading database ... 115123 files and directories currently installed.)
Preparing to unpack .../libc-bin_2.24-11+deb9u1_amd64.deb ...
Unpacking libc-bin (2.24-11+deb9u1) over (2.24-11) ...
Setting up libc-bin (2.24-11+deb9u1) ...
(Reading database ... 115123 files and directories currently installed.)
Preparing to unpack .../multiarch-support_2.24-11+deb9u1_amd64.deb ...
Unpacking multiarch-support (2.24-11+deb9u1) over (2.24-11) ...
Setting up multiarch-support (2.24-11+deb9u1) ...
(Reading database ... 115123 files and directories currently installed.)
Preparing to unpack .../0-libc-l10n_2.24-11+deb9u1_all.deb ...
Unpacking libc-l10n (2.24-11+deb9u1) over (2.24-11) ...
Preparing to unpack .../1-locales_2.24-11+deb9u1_all.deb ...
Unpacking locales (2.24-11+deb9u1) over (2.24-11) ...
Preparing to unpack .../2-linux-compiler-gcc-6-x86_4.9.30-2+deb9u1_amd64.deb ...
Unpacking linux-compiler-gcc-6-x86 (4.9.30-2+deb9u1) over (4.9.30-2) ...
Preparing to unpack .../3-linux-headers-4.9.0-3-amd64_4.9.30-2+deb9u1_amd64.deb ...
Unpacking linux-headers-4.9.0-3-amd64 (4.9.30-2+deb9u1) over (4.9.30-2) ...
Preparing to unpack .../4-linux-headers-4.9.0-3-common_4.9.30-2+deb9u1_all.deb ...
Unpacking linux-headers-4.9.0-3-common (4.9.30-2+deb9u1) over (4.9.30-2) ...
Preparing to unpack .../5-linux-kbuild-4.9_4.9.30-2+deb9u1_amd64.deb ...
Unpacking linux-kbuild-4.9 (4.9.30-2+deb9u1) over (4.9.30-2) ...
Preparing to unpack .../6-linux-image-4.9.0-3-amd64_4.9.30-2+deb9u1_amd64.deb ...
Unpacking linux-image-4.9.0-3-amd64 (4.9.30-2+deb9u1) over (4.9.30-2) ...
Setting up linux-libc-dev:amd64 (4.9.30-2+deb9u1) ...
Setting up linux-headers-4.9.0-3-common (4.9.30-2+deb9u1) ...
Setting up libc6-i386 (2.24-11+deb9u1) ...
Setting up linux-compiler-gcc-6-x86 (4.9.30-2+deb9u1) ...
Setting up linux-kbuild-4.9 (4.9.30-2+deb9u1) ...
Setting up libc-l10n (2.24-11+deb9u1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up libc-dev-bin (2.24-11+deb9u1) ...
Setting up linux-image-4.9.0-3-amd64 (4.9.30-2+deb9u1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.9.0-3-amd64
cryptsetup: WARNING: failed to detect canonical device of /dev/md0
cryptsetup: WARNING: could not determine root device from /etc/fstab
W: initramfs-tools configuration sets RESUME=UUID=054b217a-306b-4c18-b0bf-0ed85af6c6e1
W: but no matching swap device is available.
I: The initramfs will attempt to resume from /dev/md1p1
I: (UUID=bf72f3d4-3be4-4f68-8aae-4edfe5431670)
I: Set the RESUME variable to override this.
/etc/kernel/postinst.d/zz-update-grub:
Searching for GRUB installation directory ... found: /boot/grub
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-4.9.0-3-amd64
Found kernel: /boot/vmlinuz-3.16.0-4-amd64
Updating /boot/grub/menu.lst ... done

Setting up libc6-dev:amd64 (2.24-11+deb9u1) ...
Setting up locales (2.24-11+deb9u1) ...
Generating locales (this might take a while)...
  en_IN.UTF-8... done
Generation complete.
Setting up linux-headers-4.9.0-3-amd64 (4.9.30-2+deb9u1) ...
Processing triggers for libc-bin (2.24-11+deb9u1) ...

Reboot your server/desktop using reboot command:
$ sudo reboot

Oracle/RHEL/CentOS/Scientific Linux

Type the following yum command:
$ sudo yum update
$ sudo reboot

Fedora Linux

Type the following dnf command:
$ sudo dnf update
$ sudo reboot

Suse Enterprise Linux or Opensuse Linux

Type the following zypper command:
$ sudo zypper patch
$ sudo reboot

SUSE OpenStack Cloud 6

$ sudo zypper in -t patch SUSE-OpenStack-Cloud-6-2017-996=1
$ sudo reboot

SUSE Linux Enterprise Server for SAP 12-SP1

$ sudo zypper in -t patch SUSE-SLE-SAP-12-SP1-2017-996=1
$ sudo reboot

SUSE Linux Enterprise Server 12-SP1-LTSS

$ sudo zypper in -t patch SUSE-SLE-SERVER-12-SP1-2017-996=1
$ sudo reboot

SUSE Linux Enterprise Module for Public Cloud 12

$ sudo zypper in -t patch SUSE-SLE-Module-Public-Cloud-12-2017-996=1
$ sudo reboot

Verification

You need to make sure your kernel version number changed after issuing the reboot command:
$ uname -a
$ uname -r
$ uname -mrs

Sample outputs:

Linux 4.4.0-81-generic x86_64
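
To double-check that the newest installed kernel is the one actually running, compare the running kernel with the installed kernel packages (Debian/Ubuntu shown; on RHEL/CentOS use rpm -qa kernel instead):
$ uname -r
$ dpkg --list | grep linux-image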
Main, Operating System, Redhat / CEntOS / Oracle Linux, Ubuntu

Cpustat – Monitors CPU Utilization by Running Processes in Linux

Main

Apache Kafka – The New Beginning for Messaging

Introduction

Apache Kafka is a popular distributed message broker designed to handle large volumes of real-time data efficiently. A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a pub/sub messaging system, a lot of organizations also use it for log aggregation because it offers persistent storage for published messages.

In this tutorial, you will learn how to install and use Apache Kafka 0.8.2.1 on Ubuntu 16.04.

Prerequisites

To follow along, you will need:

  • Ubuntu 16.04 Droplet
  • At least 4GB of swap space

Step 1 — Create a User for Kafka

As Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your Ubuntu machine should the Kafka server be compromised.

Note: After setting up Apache Kafka, it is recommended that you create a different non-root user to perform other tasks on this server.

As root, create a user called kafka using the useradd command:

useradd kafka -m

Set its password using passwd:

passwd kafka

Add it to the sudo group so that it has the privileges required to install Kafka’s dependencies. This can be done using the adduser command:

adduser kafka sudo

Your Kafka user is now ready. Log into it using su:

su - kafka

Step 2 — Install Java

Before installing additional packages, update the list of available packages so you are installing the latest versions available in the repository:

sudo apt-get update

As Apache Kafka needs a Java runtime environment, use apt-get to install the default-jre package:

sudo apt-get install default-jre

Step 3 — Install ZooKeeper

Apache ZooKeeper is an open source service built to coordinate and synchronize configuration information of nodes that belong to a distributed system. A Kafka cluster depends on ZooKeeper to perform—among other things—operations such as detecting failed nodes and electing leaders.

Since the ZooKeeper package is available in Ubuntu’s default repositories, install it using apt-get.

sudo apt-get install zookeeperd

After the installation completes, ZooKeeper will be started as a daemon automatically. By default, it will listen on port 2181.

To make sure that it is working, connect to it via Telnet:

telnet localhost 2181

At the Telnet prompt, type in ruok and press ENTER.

If everything’s fine, ZooKeeper will say imok and end the Telnet session.
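
If telnet is not installed, the same four-letter command can be sent non-interactively with netcat (assuming nc is available on the machine):

echo ruok | nc localhost 2181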

Step 4 — Download and Extract Kafka Binaries

Now that Java and ZooKeeper are installed, it is time to download and extract Kafka.

To start, create a directory called Downloads to store all your downloads.

mkdir -p ~/Downloads

Use wget to download the Kafka binaries.

wget "http://mirror.cc.columbia.edu/pub/software/apache/kafka/0.8.2.1/kafka_2.11-0.8.2.1.tgz" -O ~/Downloads/kafka.tgz

Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation.

mkdir -p ~/kafka && cd ~/kafka

Extract the archive you downloaded using the tar command.

tar -xvzf ~/Downloads/kafka.tgz --strip 1

Step 5 — Configure the Kafka Server

The next step is to configure the Kafka server.

Open server.properties using vi:

vi ~/kafka/config/server.properties

By default, Kafka doesn’t allow you to delete topics. To be able to delete topics, add the following line at the end of the file:

~/kafka/config/server.properties

delete.topic.enable = true

Save the file, and exit vi.

Step 6 — Start the Kafka Server

Run the kafka-server-start.sh script using nohup to start the Kafka server (also called Kafka broker) as a background process that is independent of your shell session.

nohup ~/kafka/bin/kafka-server-start.sh ~/kafka/config/server.properties > ~/kafka/kafka.log 2>&1 &

Wait for a few seconds for it to start. You can be sure that the server has started successfully when you see the following messages in ~/kafka/kafka.log:

excerpt from ~/kafka/kafka.log

...
[2015-07-29 06:02:41,736] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2015-07-29 06:02:41,776] INFO [Kafka Server 0], started (kafka.server.KafkaServer)

You now have a Kafka server which is listening on port 9092.

Step 7 — Test the Installation

Let us now publish and consume a test message to make sure that the Kafka server is behaving correctly.

To publish messages, you should create a Kafka producer. You can easily create one from the command line using the kafka-console-producer.sh script. It expects the Kafka server’s hostname and port, along with a topic name as its arguments.

Publish the string "Wassup Playas" to a topic called HariTopic by typing in the following:

echo "Wassup Playas" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic HariTopic > /dev/null

As the topic doesn’t exist, Kafka will create it automatically.
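
If you prefer to create the topic explicitly rather than relying on auto-creation, the bundled kafka-topics.sh script can do it (flags as per the 0.8.x tooling):

~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic HariTopic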

To consume messages, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the ZooKeeper server’s hostname and port, along with a topic name as its arguments.

The following command consumes messages from the topic we published to. Note the use of the --from-beginning flag, which is present because we want to consume a message that was published before the consumer was started.

~/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic HariTopic --from-beginning

If there are no configuration issues, you should see Wassup Playas in the output now.

The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer’s output instantly.

When you are done testing, press CTRL+C to stop the consumer script.

Step 8 — Install KafkaT (Optional)

KafkaT is a handy little tool from Airbnb which makes it easier for you to view details about your Kafka cluster and also perform a few administrative tasks from the command line. As it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to be able to build the other gems it depends on. Install them using apt-get:

sudo apt-get install ruby ruby-dev build-essential

You can now install KafkaT using the gem command:

sudo gem install kafkat --source https://rubygems.org --no-ri --no-rdoc

Use vi to create a new file called .kafkatcfg.

vi ~/.kafkatcfg

This is a configuration file which KafkaT uses to determine the installation and log directories of your Kafka server. It should also point KafkaT to your ZooKeeper instance. Accordingly, add the following lines to it:

~/.kafkatcfg

{
  "kafka_path": "~/kafka",
  "log_path": "/tmp/kafka-logs",
  "zk_path": "localhost:2181"
}

You are now ready to use KafkaT. For a start, here’s how you would use it to view details about all Kafka partitions:

kafkat partitions

You should see the following output:

output of kafkat partitions

Topic           Partition   Leader   Replicas   ISRs
TutorialTopic   0           0        [0]        [0]

To learn more about KafkaT, refer to its GitHub repository.

Step 9 — Set Up a Multi-Node Cluster (Optional)

If you want to create a multi-broker cluster using more Ubuntu 16.04 machines, you should repeat Step 1, Step 3, Step 4 and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file in each of them:

  • the value of the broker.id property should be changed such that it is unique throughout the cluster
  • the value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance

If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.

Step 10 — Restrict the Kafka User

Now that all installations are done, you can remove the kafka user’s admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

To remove the kafka user’s admin privileges, remove it from the sudo group.

sudo deluser kafka sudo

To further improve your Kafka server’s security, lock the kafka user’s password using the passwd command. This makes sure that nobody can directly log into it.

sudo passwd kafka -l

At this point, only root or a sudo user can log in as kafka by typing in the following command:

sudo su - kafka

In the future, if you want to unlock it, use passwd with the -u option:

sudo passwd kafka -u

Conclusion

You now have a secure Apache Kafka running on your Ubuntu server. You can easily make use of it in your projects by creating Kafka producers and consumers using Kafka clients which are available for most programming languages. To learn more about Kafka, do go through its documentation.

Finally for GUI Download

http://www.kafkatool.com/download/kafkatool.sh

Youtube Video Link [Watch Here]

 

Operating System, Redhat / CEntOS / Oracle Linux, Ubuntu

Linux security alert: Bug in sudo’s get_process_ttyname() [ CVE-2017-1000367 ]

There is a serious vulnerability in sudo command that grants root access to anyone with a shell account. It works on SELinux enabled systems such as CentOS/RHEL and others too. A local user with privileges to execute commands via sudo could use this flaw to escalate their privileges to root. Patch your system as soon as possible.

It was discovered that Sudo did not properly parse the contents of /proc/[pid]/stat when attempting to determine its controlling tty. A local attacker in some configurations could possibly use this to overwrite any file on the filesystem, bypassing intended permissions, or to gain a root shell.
From the description

We discovered a vulnerability in Sudo’s get_process_ttyname() for Linux:
this function opens “/proc/[pid]/stat” (man proc) and reads the device number of the tty from field 7 (tty_nr). Unfortunately, these fields are space-separated and field 2 (comm, the filename of the command) can
contain spaces (CVE-2017-1000367).

For example, if we execute Sudo through the symlink “./ 1 “, get_process_ttyname() calls sudo_ttyname_dev() to search for the non-existent tty device number “1” in the built-in search_devs[].

Next, sudo_ttyname_dev() calls the function sudo_ttyname_scan() to search for this non-existent tty device number “1” in a breadth-first traversal of “/dev”.

Last, we exploit this function during its traversal of the world-writable “/dev/shm”: through this vulnerability, a local user can pretend that his tty is any character device on the filesystem, and
after two race conditions, he can pretend that his tty is any file on the filesystem.

On an SELinux-enabled system, if a user is Sudoer for a command that does not grant him full root privileges, he can overwrite any file on the filesystem (including root-owned files) with his command’s output,
because relabel_tty() (in src/selinux.c) calls open(O_RDWR|O_NONBLOCK) on his tty and dup2()s it to the command’s stdin, stdout, and stderr. This allows any Sudoer user to obtain full root privileges.

A list of affected Linux distros

  1. Red Hat Enterprise Linux 6 (sudo)
  2. Red Hat Enterprise Linux 7 (sudo)
  3. Red Hat Enterprise Linux Server (v. 5 ELS) (sudo)
  4. Oracle Enterprise Linux 6
  5. Oracle Enterprise Linux 7
  6. Oracle Enterprise Linux Server 5
  7. CentOS Linux 6 (sudo)
  8. CentOS Linux 7 (sudo)
  9. Debian wheezy
  10. Debian jessie
  11. Debian stretch
  12. Debian sid
  13. Ubuntu 17.04
  14. Ubuntu 16.10
  15. Ubuntu 16.04 LTS
  16. Ubuntu 14.04 LTS
  17. SUSE Linux Enterprise Software Development Kit 12-SP2
  18. SUSE Linux Enterprise Server for Raspberry Pi 12-SP2
  19. SUSE Linux Enterprise Server 12-SP2
  20. SUSE Linux Enterprise Desktop 12-SP2
  21. OpenSuse, Slackware, and Gentoo Linux

How do I patch sudo on Debian/Ubuntu Linux server?

To patch Ubuntu/Debian Linux, run the apt-get command or apt command:
$ sudo apt update
$ sudo apt upgrade

How do I patch sudo on CentOS/RHEL/Scientific/Oracle Linux server?

Run yum command:
$ sudo yum update

How do I patch sudo on Fedora Linux server?

Run dnf command:
$ sudo dnf update

How do I patch sudo on Suse/OpenSUSE Linux server?

Run zypper command:
$ sudo zypper update

How do I patch sudo on Arch Linux server?

Run pacman command:
$ sudo pacman -Syu

How do I patch sudo on Alpine Linux server?

Run apk command:
# apk update && apk upgrade

How do I patch sudo on Slackware Linux server?

Run upgradepkg command:
# upgradepkg sudo-1.8.20p1-i586-1_slack14.2.txz

How do I patch sudo on Gentoo Linux server?

Run emerge command:
# emerge --sync
# emerge --ask --oneshot --verbose ">=app-admin/sudo-1.8.20_p1"
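
Whichever distro you are on, you can confirm the fix afterwards by checking the installed sudo version; upstream 1.8.20p1, or a distro build that backports the patch, should be reported:
$ sudo -V | head -n 1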

Kernel Programming, Operating System, Redhat / CEntOS / Oracle Linux, Ubuntu

Impermanence in Linux – Exclusive (By Hari Iyer)

Impermanence, also called Anicca or Anitya, is one of the essential doctrines and a part of the three marks of existence in Buddhism. The doctrine asserts that all of conditioned existence, without exception, is “transient, evanescent, inconstant”.

On Linux, the root of all randomness is something called the kernel entropy pool. This is a large (4,096 bit) number kept privately in the kernel’s memory. There are 2^4096 possibilities for this number so it can contain up to 4,096 bits of entropy. There is one caveat – the kernel needs to be able to fill that memory from a source with 4,096 bits of entropy. And that’s the hard part: finding that much randomness.

The entropy pool is used in two ways: random numbers are generated from it and it is replenished with entropy by the kernel. When random numbers are generated from the pool the entropy of the pool is diminished (because the person receiving the random number has some information about the pool itself). So as the pool’s entropy diminishes as random numbers are handed out, the pool must be replenished.

Replenishing the pool is called stirring: new sources of entropy are stirred into the mix of bits in the pool.

This is the key to how random number generation works on Linux. If randomness is needed, it’s derived from the entropy pool. When available, other sources of randomness are used to stir the entropy pool and make it less predictable. The details are a little mathematical, but it’s interesting to understand how the Linux random number generator works as the principles and techniques apply to random number generation in other software and systems.

The kernel keeps a rough estimate of the number of bits of entropy in the pool. You can check the value of this estimate through the following command:

cat /proc/sys/kernel/random/entropy_avail

A healthy Linux system with a lot of entropy available will return close to the full 4,096 bits of entropy. If the value returned is less than 200, the system is running low on entropy.
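
If you want to watch the estimate change in real time, the same file can simply be polled (assuming the watch utility is installed):

watch -n 1 cat /proc/sys/kernel/random/entropy_avail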

The kernel is watching you

I mentioned that the system takes other sources of randomness and uses this to stir the entropy pool. This is achieved using something called a timestamp.

Most systems have precise internal clocks. Every time a user interacts with the system, the value of the clock at that moment is recorded as a timestamp. Even though the year, month, day and hour are generally guessable, the millisecond and microsecond are not, and therefore the timestamp contains some entropy. Timestamps obtained from the user’s mouse and keyboard, along with timing information from the network and disk, each have different amounts of entropy.

How does the entropy found in a timestamp get transferred to the entropy pool? Simple, use math to mix it in. Well, simple if you like math.

Just mix it in

A fundamental property of entropy is that it mixes well. If you take two unrelated random streams and combine them, the new stream cannot have less entropy. Taking a number of low entropy sources and combining them results in a high entropy source.

All that’s needed is the right combination function: a function that can be used to combine two sources of entropy. One of the simplest such functions is the logical exclusive or (XOR). This truth table shows how bits x and y coming from different random streams are combined by the XOR function.
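
For reference, the XOR combinations are:

x   y   x XOR y
0   0   0
0   1   1
1   0   1
1   1   0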

Even if one source of bits does not have much entropy, there is no harm in XORing it into another source. Entropy always increases. In the Linux kernel, a combination of XORs is used to mix timestamps into the main entropy pool.

Generating random numbers

Cryptographic applications require very high entropy. If a 128-bit key is generated with only 64 bits of entropy then it can be guessed in 2^64 attempts instead of 2^128 attempts. That is the difference between needing a thousand computers running for a few years to brute force the key versus needing all the computers ever created running for longer than the history of the universe to do so.

Cryptographic applications require close to one bit of entropy per bit. If the system’s pool has fewer than 4,096 bits of entropy, how does the system return a fully random number? One way to do this is to use a cryptographic hash function.

A cryptographic hash function takes an input of any size and outputs a fixed size number. Changing one bit of the input will change the output completely. Hash functions are good at mixing things together. This mixing property spreads the entropy from the input evenly through the output. If the input has more bits of entropy than the size of the output, the output will be highly random. This is how highly entropic random numbers are derived from the entropy pool.

The hash function used by the Linux kernel is the standard SHA-1 cryptographic hash. By hashing the entire pool and doing some additional arithmetic, 160 random bits are created for use by the system. When this happens, the system lowers its estimate of the entropy in the pool accordingly.

Above I said that applying a hash like SHA-1 could be dangerous if there isn’t enough entropy in the pool. That’s why it’s critical to keep an eye on the available system entropy: if it drops too low, the output of the random number generator could have less entropy than it appears to have.

Running out of entropy

One of the dangers of a system is running out of entropy. When the system’s entropy estimate drops to around the 160-bit level, the length of a SHA-1 hash, things get tricky, and how this affects programs and performance depends on which of the two Linux random number generators is used.

Linux exposes two interfaces for random data that behave differently when the entropy level is low. They are /dev/random and /dev/urandom. When the entropy pool becomes predictable, both interfaces for requesting random numbers become problematic.

When the entropy level is too low, /dev/random blocks and does not return until the level of entropy in the system is high enough. This guarantees high entropy random numbers. If /dev/random is used in a time-critical service and the system runs low on entropy, the delays could be detrimental to the quality of service.

On the other hand, /dev/urandom does not block. It continues to return the hashed value of its entropy pool even though there is little to no entropy in it. This low-entropy data is not suited for cryptographic use.
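
A quick way to see the two devices side by side (od is only used here to print the bytes in hex; on a system starved of entropy the /dev/random read may block):

head -c 16 /dev/urandom | od -An -tx1
head -c 16 /dev/random | od -An -tx1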

The solution to the problem is to simply add more entropy into the system.

Hardware random number generation to the rescue?

Intel’s Ivy Bridge family of processors has an interesting feature called “secure key.” These processors contain a special piece of hardware that generates random numbers. The single assembly instruction RDRAND returns allegedly high-entropy random data derived on the chip.

It has been suggested that Intel’s hardware number generator may not be fully random. Since it is baked into the silicon, that assertion is hard to audit and verify. As it turns out, even if the numbers generated have some bias, it can still help as long as this is not the only source of randomness in the system. Even if the random number generator itself had a back door, the mixing property of randomness means that it cannot lower the amount of entropy in the pool.

On Linux, if a hardware random number generator is present, the Linux kernel will use the XOR function to mix the output of RDRAND into the hash of the entropy pool. This happens here in the Linux source code (the XOR operator is ^ in C).

Third party entropy generators

Hardware number generation is not available everywhere, and the sources of randomness polled by the Linux kernel itself are somewhat limited. For this situation, a number of third party random number generation tools exist. Examples of these are haveged, which relies on processor cache timing, audio-entropyd and video-entropyd which work by sampling the noise from an external audio or video input device. By mixing these additional sources of locally collected entropy into the Linux entropy pool, the entropy can only go up.
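
On Debian or Ubuntu, for example, haveged can be installed and started like this (package and service names may differ on other distributions):

sudo apt-get install haveged
sudo systemctl enable haveged
sudo systemctl start haveged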

TIBCO

TIBCO Universal Installer – Unix – The installer is unable to run in graphical mode. Try running the installer with the -console or -silent flag (SOLVED)

Many a time, when you try to install TIBCO Rendezvous / TIBCO EMS or even certain BW plugins (which are 32-bit binaries) on a 64-bit JVM-based UNIX system (Linux / Solaris / AIX / UX / FreeBSD), you typically encounter an error like this:

Capture

 

Well, many people aren't aware of the real fix for this issue.

After much research with permutations and combinations, there turns out to be a solution:

Follow the steps mentioned below for RHEL 6.x systems (because I haven't tried other *NIX platforms yet):

  1. sudo yum -y install libXtst*i686*
  2. sudo yum -y install libXext*i686*
  3. sudo yum -y install libXrender*i686*

I am damn sure it'll work for the GUI mode of installation.

BusinessWorks, TIBCO

java.sql.SQLRecoverableException: IO Error: Connection reset ( Designer / BWEngine / interfaceName )

Sometimes, when you create a JDBC Connection in your Designer, or when you configure a JDBC Connection in your EAR, You might end up with an error like this :-

Designer :-

Capture

Runtime :-

java.sql.SQLRecoverableException: IO Error: Connection reset

(In your trace file)

This happens because of urandom

/dev/random is a random number generator often used to seed cryptography functions for better security.  /dev/urandom likewise is a (pseudo) random number generator.  Both are good at generating random numbers.  The key difference is that /dev/random has a blocking function that waits until entropy reaches a certain level before providing its result.  From a practical standpoint, this means that programs using /dev/random will generally take longer to complete than /dev/urandom.

As for why /dev/urandom vs /dev/./urandom: that is something unique to Java versions 5 and later, and it resulted from problems with /dev/urandom on Linux systems back in 2004. The easy fix at the time was to force /dev/urandom to be treated as /dev/random. However, it doesn’t appear that Java will be updated to let /dev/urandom actually behave as /dev/urandom. So, the workaround is to fake Java out by obscuring /dev/urandom as /dev/./urandom, which is functionally the same thing but looks different to Java’s check.

Therefore, add the following property to bwengine.tra and designer.tra (or your individual track's .tra file), restart the BWEngine or Designer, and it works like a Magic Johnson dunk.

java.extended.properties -Djava.security.egd=file:///dev/urandom

Main, Tuning

Interrupt Coalescence (also called Interrupt Moderation, Interrupt Blanking, or Interrupt Throttling)

A common bottleneck for high-speed data transfers is the high rate of interrupts that the receiving system has to process – traditionally, a network adapter generates an interrupt for each frame that it receives. These interrupts consume signaling resources on the system’s bus(es), and introduce significant CPU overhead as the system transitions back and forth between “productive” work and interrupt handling many thousand times a second.

To alleviate this load, some high-speed network adapters support interrupt coalescence. When multiple frames are received in a short timeframe (“back-to-back”), these adapters buffer those frames locally and only interrupt the system once.

Interrupt coalescence together with large-receive offload can roughly be seen as doing on the “receive” side what transmit chaining and large-send offload (LSO) do for the “transmit” side.

Issues with interrupt coalescence

While this scheme lowers interrupt-related system load significantly, it can have adverse effects on timing, and make TCP traffic more bursty or “clumpy”. Therefore it would make sense to combine interrupt coalescence with on-board timestamping functionality. Unfortunately that doesn’t seem to be implemented in commodity hardware/driver combinations yet.

The way that interrupt coalescence works, a network adapter that has received a frame doesn’t send an interrupt to the system right away, but waits for a little while in case more packets arrive. This can have a negative impact on latency.

In general, interrupt coalescence is configured such that the additional delay is bounded. On some implementations, these delay bounds are specified in units of milliseconds, on other systems in units of microseconds. It requires some thought to find a good trade-off between latency and load reduction. One should be careful to set the coalescence threshold low enough that the additional latency doesn’t cause problems. Setting a low threshold will prevent interrupt coalescence from occurring when successive packets are spaced too far apart. But in that case, the interrupt rate will probably be low enough so that this is not a problem.

Configuration

Configuration of interrupt coalescence is highly system dependent, although there are some parameters that are more or less common over implementations.

Linux

On Linux systems with additional driver support, the ethtool -C command can be used to modify the interrupt coalescence settings of network devices on the fly.
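
As a rough sketch, the current settings of an interface can be inspected with -c and adjusted with -C (eth0 and the values here are purely illustrative; which parameters are actually supported depends on the driver):

ethtool -c eth0
ethtool -C eth0 rx-usecs 100 rx-frames 64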

Some Ethernet drivers in Linux have parameters to control Interrupt Coalescence (Interrupt Moderation, as it is called in Linux). For example, the e1000 driver for the large family of Intel Gigabit Ethernet adapters has the following parameters according to the kernel documentation:

InterruptThrottleRate
limits the number of interrupts per second generated by the card. Values >= 100 are interpreted as the maximum number of interrupts per second. The default value used to be 8’000 up to and including kernel release 2.6.19. A value of zero (0) disables interrupt moderation completely. Above 2.6.19, some values between 1 and 99 can be used to select adaptive interrupt rate control. The first adaptive modes are “dynamic conservative” (1) and dynamic with reduced latency (3). In conservative mode (1), the rate changes between 4’000 interrupts per second when only bulk traffic (“normal-size packets”) is seen, and 20’000 when small packets are present that might benefit from lower latency. In the more aggressive mode (3), “low-latency” traffic may drive the interrupt rate up to 70’000 per second. This mode is supposed to be useful for cluster communication in grid applications.
RxIntDelay
specifies, in multiples of 1’024 microseconds, the time after reception of a frame to wait for another frame to arrive before sending an interrupt.
RxAbsIntDelay
bounds the delay between reception of a frame and generation of an interrupt. It is specified in units of 1’024 microseconds. Note that InterruptThrottleRate overrides RxAbsIntDelay, so even when a very short RxAbsIntDelay is specified, the interrupt rate should never exceed the rate specified (either directly or by the dynamic algorithm) by InterruptThrottleRate
RxDescriptors
specifies the number of descriptors to store incoming frames on the adapter. The default value is 256, which is also the maximum for some types of E1000-based adapters. Others can allocate up to 4’096 of these descriptors. The size of the receive buffer associated with each descriptor varies with the MTU configured on the adapter. It is always a power-of-two number of bytes. The number of descriptors available will also depend on the per-buffer size. When all buffers have been filled by incoming frames, an interrupt will have to be signaled in any case.

Solaris

As an example, see the Platform Notes: Sun GigaSwift Ethernet Device Driver. It lists the following parameters for that particular type of adapter:

rx_intr_pkts
Interrupt after this number of packets have arrived since the last packet was serviced. A value of zero indicates no packet blanking. (Range: 0 to 511, default=3)
rx_intr_time
Interrupt after this number of 4.5-microsecond ticks have elapsed since the last packet was serviced. A value of zero indicates no time blanking. (Range: 0 to 524287, default=1250)