“Discovery” is the initial phase of a process’s connection to a Metaspace, in which the process discovers which other nodes are already members of the Metaspace and establishes connections to them. Discovery and connection are Metaspace-level operations, separate from joining and leaving spaces.
- A process can be connected to a specific Metaspace only once. If any thread of a process has already called connect() for a Metaspace name, no further calls to connect() can be made for that same name (a process can, however, hold simultaneous connections to different Metaspaces).
- Discovery and connection are not fast operations: depending on the type of discovery and the number of nodes already connected, the whole process can sometimes take many seconds.
- Because connect() cannot be called again to obtain a copy of an already connected Metaspace object, use the convenience function ASCommon.getMetaspace() instead: it takes a Metaspace name and, if the process is already connected to that Metaspace, returns a copy of the Metaspace object for that connection.
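The connect-once rule and the role of getMetaspace() can be modelled with a per-process registry. The sketch below is a minimal Python illustration, not the ActiveSpaces API: `Metaspace`, `connect`, `get_metaspace` and `AlreadyConnectedError` are hypothetical stand-ins, and returning `None` when the process is not connected is an assumption (the text does not say what getMetaspace() does in that case).

```python
import copy
import threading
from typing import Dict, Optional


class Metaspace:
    """Hypothetical stand-in for a connected Metaspace object."""

    def __init__(self, name: str) -> None:
        self.name = name


class AlreadyConnectedError(Exception):
    """Raised when connect() is called twice for the same metaspace name."""


_lock = threading.Lock()
_connections: Dict[str, Metaspace] = {}


def connect(name: str) -> Metaspace:
    """Hypothetical connect(): allowed only once per metaspace name per process."""
    with _lock:
        if name in _connections:
            raise AlreadyConnectedError(f"already connected to metaspace {name!r}")
        # A real connect() would run discovery here, which can take seconds.
        metaspace = Metaspace(name)
        _connections[name] = metaspace
        return metaspace


def get_metaspace(name: str) -> Optional[Metaspace]:
    """Hypothetical analogue of ASCommon.getMetaspace(): a copy of the existing
    connection's object, or None (assumed) if this process is not connected."""
    with _lock:
        metaspace = _connections.get(name)
        return copy.copy(metaspace) if metaspace is not None else None
```

A thread that may run after another thread has already connected would call `get_metaspace()` first and fall back to `connect()` only when it gets `None`. Because those two calls are not atomic in this sketch, two threads racing to connect could still see `AlreadyConnectedError`.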
Multicast versus Unicast Discovery
Discovery can be one of the following types: Unicast (TCP) or Multicast (TIBPGM or RV). In general, multicast discovery is somewhat easier to use because it requires less configuration (which is why it is the default), and it tends to be used during development. TCP discovery requires (and offers) more control through configuration and, unlike multicast, works in all network environments, so it tends to be used in production. There is, however, no hard and fast rule about which discovery method to use, and all discovery mechanisms are supported equally.
Some things to remember:
- There is no loss of functionality depending on the kind of discovery that is used.
- Only one type of discovery can be used for a particular Metaspace. But a process could use unicast discovery to connect to one Metaspace, and multicast discovery to connect to another.
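The one-mechanism-per-Metaspace rule can be checked mechanically before deployment. This is a hypothetical Python sketch: it assumes discovery URLs begin with a `tcp`, `tibpgm` or `rv` scheme (only `tcp://` URLs appear in this document), and `check_deployment` takes, for each Metaspace, the discovery URLs configured by all of its members.

```python
from typing import Dict, List

# Assumed URL schemes: "tcp" for unicast, "tibpgm" and "rv" for multicast.
DISCOVERY_TYPES = {"tcp": "unicast", "tibpgm": "multicast", "rv": "multicast"}


def discovery_scheme(url: str) -> str:
    """Return the scheme part of a discovery URL (a bare "tibpgm" is accepted)."""
    return url.split("://", 1)[0].lower()


def discovery_type(url: str) -> str:
    """Classify a discovery URL as "unicast" or "multicast"."""
    scheme = discovery_scheme(url)
    if scheme not in DISCOVERY_TYPES:
        raise ValueError(f"unknown discovery scheme in {url!r}")
    return DISCOVERY_TYPES[scheme]


def check_deployment(urls_by_metaspace: Dict[str, List[str]]) -> None:
    """Raise if the members of any one metaspace use more than one mechanism.

    Different metaspaces may use different mechanisms; only mixing within
    a single metaspace is rejected.
    """
    for metaspace, urls in urls_by_metaspace.items():
        schemes = {discovery_scheme(url) for url in urls}
        if len(schemes) > 1:
            raise ValueError(
                f"metaspace {metaspace!r} mixes discovery mechanisms: {sorted(schemes)}"
            )
```

For example, `check_deployment({"ms_a": ["tcp://IP1:60000"], "ms_b": ["tibpgm"]})` passes, mirroring a process that uses unicast for one Metaspace and multicast for another.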
To be able to use multicast discovery, the following conditions must be met:
- UDP packets must be able to flow bidirectionally between all the Metaspace members.
- If the Metaspace members are on separate subnets, multicast routing must be enabled between those subnets.
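The second condition can be checked from the member addresses alone; the first (bidirectional UDP) has to be verified on the network itself. The sketch below is a hypothetical Python helper that assumes every host uses the same prefix length (a /24 by default, chosen only for illustration).

```python
import ipaddress
from typing import Iterable, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def member_subnets(member_ips: Iterable[str], prefix_len: int = 24) -> Set[Network]:
    """Map each member IP to its subnet, assuming one prefix length for all hosts."""
    return {ipaddress.ip_interface(f"{ip}/{prefix_len}").network for ip in member_ips}


def needs_multicast_routing(member_ips: Iterable[str], prefix_len: int = 24) -> bool:
    """True if members span more than one subnet, so multicast must be routed."""
    return len(member_subnets(member_ips, prefix_len)) > 1
```

With a /24, members at 10.0.1.5 and 10.0.2.9 are on different subnets and would need multicast routing between them; with a /16 they share one subnet.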
When choosing between the two multicast discovery options (PGM or RV):
- There is no functional difference between using the built-in PGM reliable multicast protocol stack or using TIBCO Rendezvous as the discovery protocol.
- TIBCO Rendezvous needs to be installed on a machine only if you want to use it for multicast discovery. If you use only PGM, RV does not need to be installed at all.
- Using TIBCO Rendezvous as the discovery protocol can give you a little more flexibility in your deployment (for example, connecting to remote RV daemons or leveraging Rendezvous routing daemons, RVRDs).
With Unicast discovery, all communication between Metaspace members occurs solely over TCP connections (no UDP or multicast). The best practices for unicast discovery are:
- ALL of the Metaspace members MUST use exactly the same Discovery URL string.
- If you want fault-tolerance for the discovery service, then you must specify more than one IP:PORT in the discovery URL.
- If all of the processes for all of the IP:PORTs listed in the discovery URL disappear, the Metaspace temporarily stops working (but data is not necessarily lost) until one of those processes is restarted.
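The three best practices above can be turned into a pre-flight check. This hypothetical Python sketch parses a discovery URL of the form `tcp://IP1:60000;IP2:60000`, flags members whose URL strings differ or that list only one IP:PORT, and models availability as "at least one listed endpoint is alive". That availability rule is a simplification of the last bullet, not a description of ActiveSpaces internals.

```python
from typing import Dict, List, Set, Tuple

Endpoint = Tuple[str, int]


def parse_tcp_discovery(url: str) -> List[Endpoint]:
    """Split "tcp://IP1:60000;IP2:60000" into [("IP1", 60000), ("IP2", 60000)]."""
    prefix = "tcp://"
    if not url.startswith(prefix):
        raise ValueError(f"not a tcp discovery URL: {url!r}")
    body = url[len(prefix):].split("?", 1)[0]  # ignore any query such as ?remote=true
    endpoints = []
    for item in body.split(";"):
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected IP:PORT, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def check_unicast_members(urls_by_member: Dict[str, str]) -> List[str]:
    """Return warnings for the best practices; an empty list means none were violated."""
    warnings = []
    distinct = sorted(set(urls_by_member.values()))
    if len(distinct) > 1:
        # The rule is about the exact string, so a reordered list also counts.
        warnings.append("members do not all use exactly the same discovery URL string")
    for url in distinct:
        if len(parse_tcp_discovery(url)) < 2:
            warnings.append(f"{url!r} lists a single IP:PORT, so discovery is not fault tolerant")
    return warnings


def metaspace_available(url: str, alive: Set[Endpoint]) -> bool:
    """Simplified model: the metaspace works while any listed endpoint is alive."""
    return any(endpoint in alive for endpoint in parse_tcp_discovery(url))
```

Note that `tcp://IP2:60000;IP1:60000` names the same endpoints as `tcp://IP1:60000;IP2:60000` but is still flagged, because the rule requires exactly the same string.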
In practice, to use unicast discovery you designate some machines as the “servers” of the cache service. On each of those nodes, start at least one ActiveSpaces process and make sure it is restarted if it stops (for example, by running it as a service or using Hawk). These processes can be as-agents but do not have to be; they only have to stay connected to the Metaspace to keep it alive, whether or not they seed anything. Those processes are the “well known” Metaspace members that you list in the discovery URL.
Give those “well known” processes a listen port other than the default, for example 60000, so that you can start them with a Listen URL of “tcp://:60000”:
Host IP1: as-agent -listen "tcp://:60000" -discovery "tcp://IP1:60000;IP2:60000"
Host IP2: as-agent -listen "tcp://:60000" -discovery "tcp://IP1:60000;IP2:60000"
where IP1 and IP2 are either IP addresses or hostnames that resolve to IP addresses.
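When there are more than a couple of well-known hosts, generating the command lines avoids typos that would make the discovery strings differ. This hypothetical Python helper only builds the `as-agent` strings shown in the example (same `-listen` and `-discovery` options); it does not start anything.

```python
import shlex
from typing import Dict, List


def well_known_commands(hosts: List[str], port: int = 60000) -> Dict[str, str]:
    """Build one as-agent command line per well-known host.

    Every host gets the identical discovery URL, as unicast discovery requires.
    """
    discovery = "tcp://" + ";".join(f"{host}:{port}" for host in hosts)
    listen = f"tcp://:{port}"
    return {
        host: shlex.join(["as-agent", "-listen", listen, "-discovery", discovery])
        for host in hosts
    }
```

For `["IP1", "IP2"]` each entry is `as-agent -listen tcp://:60000 -discovery 'tcp://IP1:60000;IP2:60000'`. `shlex.join` (Python 3.8+) quotes the discovery URL because an unquoted semicolon would end the shell command.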
Remote Client Connection
Remote clients are ActiveSpaces processes that connect to the metaspace indirectly through a directly connected “proxy” member of the metaspace.
- There is NO loss of functionality for the client application whether it is directly or remotely connected to the metaspace.
- If there is ANY one-way firewall or ANY Network Address Translation between two machines, the ActiveSpaces processes on those machines will NOT be able to connect directly to each other. In this case, your only option is to deploy the processes on one of the machines as remote clients connecting to the processes on the other machine(s).
- Remote clients initiate a single TCP connection to the proxy member they are connecting to.
- Directly connected members of a Metaspace have a single TCP connection to every other directly connected member, which may be initiated from either end.
- A Metaspace scales to a much larger number of remotely connected members than directly connected members.
- If you have a lot of “pure client” processes that never seed on anything, consider deploying them as remote clients.
- Remotely connected clients do, however, see higher average response times for space operations than directly connected processes (although remote client throughput is not necessarily lower).
- Although a remotely connected process can never seed on any space, the “seeded scope” is still meaningful: it is the scope of what is seeded by the proxy member that the remote client is connected to.
- This means that Get requests from a remote client on any key that its proxy seeds or replicates are serviced with the lowest latency (on the order of one network round trip).
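The scaling difference follows from the connection bullets: direct members form a full mesh of n(n-1)/2 TCP connections, while each remote client adds exactly one. The Python function below is a simplified model of those two bullets, not a measurement of a real deployment.

```python
def tcp_connection_count(direct_members: int, remote_clients: int) -> int:
    """Member-to-member TCP connections under the simplified model above."""
    if direct_members < 0 or remote_clients < 0:
        raise ValueError("counts must be non-negative")
    if remote_clients and direct_members == 0:
        raise ValueError("remote clients need at least one directly connected proxy")
    # Full mesh: each pair of direct members shares exactly one connection.
    mesh = direct_members * (direct_members - 1) // 2
    # Each remote client opens a single connection to its proxy.
    return mesh + remote_clients
```

For 100 processes, connecting all of them directly gives 4,950 connections, whereas 10 direct members plus 90 remote clients gives 45 + 90 = 135.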
The simplest way to make a remote client connection is to use a Discovery URL in the form: “tcp://IP:Port?remote=true”
where IP is the address of the machine on which a proxy is running and Port is the port on which that proxy accepts remote client connections. You can start an as-agent that provides remote connectivity by using the -remote_listen parameter:
On host IP1 enter:
as-agent -remote_listen "tcp://:55555"
and then use the Discovery URL “tcp://IP1:55555?remote=true” in any application.
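Applications that choose between direct and remote connections at startup can build or inspect the URL in the form shown above. The Python helpers below are hypothetical and only handle the string form `tcp://IP:Port?remote=true`; ActiveSpaces performs its own parsing, and these helpers treat only the exact value `true` as a remote request.

```python
from urllib.parse import parse_qs, urlsplit


def remote_discovery_url(proxy_host: str, port: int) -> str:
    """Discovery URL for a remote client: tcp://IP:Port?remote=true."""
    return f"tcp://{proxy_host}:{port}?remote=true"


def is_remote_discovery(url: str) -> bool:
    """True if a tcp discovery URL carries remote=true in its query."""
    parts = urlsplit(url)
    if parts.scheme != "tcp":
        return False
    return "true" in parse_qs(parts.query).get("remote", [])
```

With the agent above listening on port 55555 of host IP1, `remote_discovery_url("IP1", 55555)` yields the URL to configure in the client application, while a plain unicast URL such as `tcp://IP1:60000;IP2:60000` is recognized as a direct connection.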