TIBCO ActiveSpaces – Best Practices – 10. Physical Considerations

Physical Considerations

Network Considerations

  • In general, just as with many other distributed systems, the network can have a large influence on performance: in many use cases and deployment scenarios there is a very good chance that a Space operation will need to wait for a network round trip time to elapse before it completes. Therefore, the faster the network between the ActiveSpaces nodes, the lower the latency of each space operation.
  • There can be no IP network address translation or one-way firewall between any two directly connected members of a Metaspace.
    • If one of those conditions exists, the process can still connect to the metaspace by making a remote client connection.
  • If multicast discovery is used and not all the members of the metaspace are attached to the same subnet, then either multicast routing must be enabled between the subnets or TCP discovery must be used.
  • Unless the link is both low latency and redundant (for example, it is a MAN link), it is not advisable (although it is technically possible) to extend a single Metaspace between two separate sites linked by a WAN link.
    • You can instead have a Metaspace at each site and write a small application that listens to changes on one site and replicates them on the other.

Physical versus Virtual Considerations

Although running in a virtual environment is certainly supported, it is supported only as long as the following conditions are met:

  • No “snapshots” or any other kind of operation that suspends or pauses (or moves) a virtual machine are run on the virtual machine.
  • Multicast discovery is NOT used

However, it is a best practice to use physical hosts rather than virtual hosts when possible (at least for seeders). ActiveSpaces does not benefit from being deployed in a virtual environment, because it already provides its own virtual data store functionality, and deploying multiple virtual machines, each with ActiveSpaces processes, on a single physical server would:

  • Potentially degrade the fault tolerance of the replication mechanism (if the physical machine goes down, all of the virtual machines it contains go down at the same time).
  • Probably result in more of the physical CPU being used for the overhead of the virtualization environment without providing any functional advantage (ActiveSpaces already pools together the RAM and CPUs of those machines).

Also remember that in many cases network bandwidth to the seeders is going to be the overall limiting factor, and multiple virtual machines usually have to share a physical network interface. You would therefore get more overall bandwidth by using many small physical machines, each with its own network interface, than by using one large physical host with fewer physical network interfaces than virtual machines. And even when the number of physical interfaces matches the number of virtual machines, you still have the overhead of the virtualization layer's internal (software) network switch.

Sizing Guidelines

Note: These values and sizing guidelines are valid for version 2.0.2 of ActiveSpaces; they have changed in the past and will change again in the future depending on the version of ActiveSpaces.

Sizing of ActiveSpaces is along two independent axes: the amount of memory required to store the records and the indexes, and the number of CPU cores required to support a certain number of operations per second.

  • For the number of operations per second, a (very conservative) estimate is to expect 10,000 "per-key operations" (i.e., put, get, take, lock, unlock, and so on; not browser creation or remote code invocation) per second per "seeder core" on a single space.
  • For memory sizing, the amount of RAM required by a seeder to store a single record comes from the following sources:
    • There is around 400 bytes of internal overhead (including the key field index that is automatically created for each space) per record stored in the space.
    • The amount of space it takes to store the fields of each record and associated overhead, according to the following list:
      • Short 2 bytes
      • Int   4 bytes
      • Long 8 bytes
      • Float 4 bytes
      • Double 8 bytes
      • String 8 bytes + UTF-8 encoded string (1 byte per char for basic ASCII) + 1 byte for null terminator
      • Blob 8 bytes + byte length
      • Datetime 9 bytes
    • The replication factor
    • A 30% to 50% factor to account for runtime memory requirements (buffers and internal metaspace-related data structures, plus "headroom" to account for possible memory fragmentation when data is constantly updated)

This results in the following formula (for example, using a 42% headroom factor):

Memory_of_record_in_bytes = (400 bytes + payload_bytes) x (1 + replication_count) x 1.42
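
As a minimal sketch, the per-record overhead, the per-field sizes, and the formula above can be combined into a small sizing helper. This is plain Java arithmetic with no ActiveSpaces API involved; the example payload and the 42% headroom factor are just the values used in this section:

    /** Rough per-record memory estimate for a space (ActiveSpaces 2.0.2 figures). */
    public final class RecordSizing {
        private static final int PER_RECORD_OVERHEAD = 400; // internal overhead + key index
        private static final double HEADROOM = 1.42;        // 42% runtime headroom (example)

        /** payloadBytes: sum of the per-field sizes listed above, e.g.
         *  long = 8, double = 8, string = 8 + UTF-8 length + 1, blob = 8 + length. */
        public static long bytesPerRecord(long payloadBytes, int replicationCount) {
            return (long) ((PER_RECORD_OVERHEAD + payloadBytes)
                    * (1 + replicationCount) * HEADROOM);
        }

        public static void main(String[] args) {
            // Example: one long key (8) + one double (8) + a 100-char ASCII string (109).
            long payload = 8 + 8 + 109;
            System.out.println(bytesPerRecord(payload, 1) + " bytes per record");
        }
    }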

  • If you are using indexes (beyond the key fields index), you also need to account for a per-entry overhead of around 400 bytes on average (the exact size is almost impossible to calculate, as it depends on the type of index, the distribution of the data, and the total number of entries being indexed).
    • For example, the memory requirements for HASH index types do not grow exactly linearly, because hash tables grow by doubling their size as needed.

Conversely, a TREE index's memory requirements grow in part with the depth of the tree, and some data sets generate trees of different depths depending on how the possible key space is used.


TIBCO ActiveSpaces – Best Practices – 9. Persistence

Persistence

Shared-all Persistence

  • You can register more than one instance of the implementation of the shared-all persistence interface on the same space.
    • When more than one instance of the implementation of the shared-all persistence interface is registered, then onWrite and onRead operations are distributed amongst those instances.
      • The most efficient deployment mode is when every seeder on the space registers an implementation of the shared-all persistence interface. In this case, the instance registered by a seeder will handle the onWrite and onRead events for the keys that the seeder seeds.
    • Regardless of the number of persistence instances registered in a space, in shared-all persistence mode only one of those instances receives the onLoad event.

Shared-nothing Persistence

There is no need to implement any interface in order to use shared-nothing persistence; it is built into ActiveSpaces.

  • When using shared-nothing persistence you MUST ensure that all of the seeders on the space have a unique member name and that this member name remains the same between instantiations of the seeder(s).
  • In the shared-nothing mode of persistence, the re-loading of the data is distributed (unlike with shared-all persistence), but you must make sure that enough seeders (including the seeders that were the last members of the space when it went down) are connected to the metaspace before triggering recovery if you do not want to miss any persisted data during recovery.

Usage Patterns

Storing “Lookup Data”

You can use ActiveSpaces to store "lookup data": typically data that does not change very often and is not very large (such that a copy of the whole data set can fit in the memory of each node that needs access to it), but that you need to be able to read (on any key) as fast as possible.

  • In this case, you can use ActiveSpaces to store the data in a space with a replication degree of ALL, and make the applications that need access to this data be seeders on the space.
  • Because a seeder can service read requests not only for seeded entries but also for replicated entries at "in-process speed," a read request on any key in the space is serviced very fast.

Storing Documents

While ActiveSpaces stores tuples, and tuples are flat containers of fields, you can still use ActiveSpaces to store structured data.

  • For example, you can use a space as a map of objects in Java or .NET by creating a space with the key and value fields being both of the Blob type and storing serialized versions of the objects in those fields.
  • A Tuple object itself comes with very fast and efficient serialization and deserialization methods and can also be serialized and stored into a Blob field of another tuple.

You can also store (and then query) unstructured data (for example, XML or JSON documents) with ActiveSpaces, as long as you are willing to write the bit of code required to extract from the document the key field(s) as well as the field(s) you want to query on (and have indexes built on).

  • You then store those key and indexable fields as separate fields in the tuple, followed by the whole document stored in a single String or Blob field.
  • You could also use TIBCO BusinessWorks not just to do this extraction of the key and indexed fields from the document, but also to implement a SOAP or REST interface for the applications wanting to store and query those documents in ActiveSpaces.
  • It would also be very efficient to store the document itself not as one very large String, but to first compress it into a byte array (using something like zlib, for example, which is known to compress XML/JSON documents very well) and then store this compressed data in a Blob field (see the sketch after this list).
    • This would result not only in much reduced memory requirements (unless you need to query every single field in the document, it is likely to allow you to be able to store more data than you have RAM available), but also higher throughput and lower latency of the operations since less time is spent serializing and sending data over the network when it is compressed.
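
As a minimal sketch of that compression step, the JDK's built-in zlib support (java.util.zip) is enough; the resulting byte array would then be stored in the Blob field of the tuple (pure-JDK code, no ActiveSpaces API involved):

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    public final class DocCompression {
        /** Compress an XML/JSON document with zlib before storing it in a Blob field. */
        public static byte[] compress(String document) {
            byte[] input = document.getBytes(StandardCharsets.UTF_8);
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length / 4 + 16);
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            return out.toByteArray();
        }

        /** Inverse operation when reading the Blob field back. */
        public static String decompress(byte[] compressed) throws DataFormatException {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 4);
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }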

Workload Distribution

There are multiple ways you can use ActiveSpaces to distribute workload, either as a workload distribution and process synchronization mechanism, or using it more as a distributed computation grid.

Space as a Queue

  • You can very simply use a space as a distributed (but unordered) queue by having any number of applications create “take browsers” on the space.
    • ActiveSpaces automatically takes care of all race conditions and corner cases such that a particular tuple is only "taken" (consumed) by one of those browsers. This provides you with "demand-driven distribution" of the consumption of the tuples stored in the space (just like a "shared queue").
    • You can even make this consumption more "fail-safe" by using a "lock browser" rather than a "take browser": lock rather than consume the tuple, and if (and only if) the processing of the tuple is successful, do a take with the unlock option set to consume it.
  • Using "take browsers" to consume data from a Space provides "demand-based distribution": if one of the consuming processes is faster than the others, it will call next() more often on its take browser and will automatically consume more entries than the others.
  • Take a look at the request/reply example provided with ActiveSpaces to see how to use a space as a queue for distribution of requests.
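
Below is a hedged sketch of such a consumer loop in Java. The take browser, next(), and stop() calls are the operations named in this section, but the package names, class names, and exact signatures are assumptions to verify against the ActiveSpaces API reference, and the space name is hypothetical:

    import com.tibco.as.space.*;
    import com.tibco.as.space.browser.*;

    // Consume work items from a "requests" space used as an unordered, shared queue.
    // NOTE: the browse() call and BrowserDef/BrowserType names are assumptions
    // modeled on the concepts in this section; check the API reference.
    public final class QueueConsumer {
        public static void consume(Metaspace metaspace) throws ASException {
            Space space = metaspace.getSpace("requests");
            // A TAKE browser atomically consumes each tuple it returns, so a given
            // tuple is delivered to exactly one consumer. A generous prefetch value
            // (see the Browsers and Listeners section later) improves throughput.
            Browser browser = space.browse(BrowserDef.BrowserType.TAKE, BrowserDef.create());
            try {
                Tuple request;
                // next() semantics (blocking vs. timeout) depend on the BrowserDef.
                while ((request = browser.next()) != null) {
                    process(request); // application-specific work (hypothetical)
                }
            } finally {
                browser.stop(); // always stop browsers when done with them
            }
        }
        private static void process(Tuple request) { /* ... */ }
    }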

Remote Invocation

You can use remote invocation to either direct an invocation request to a single process, or to trigger this invocation in parallel on a set of processes.

Remotely invoked classes can be passed a "context tuple" and can return a "result tuple" back to the invoking code. These tuples are "free form", meaning that they do not need to conform to any specific schema and can contain any fields (of any types) the programmers want.

Remote invocation comes in two forms:

  • Directed invocation
  • Broadcasted invocation

Directed Invocation

Directed invocation can be aimed either at a specific metaspace member or, leveraging ActiveSpaces' distribution algorithm, at one of the space's seeders according to the value(s) of the key fields:

  • You can use Space and Metaspace membership listeners to keep a list of Space or Metaspace members in your application and then invoke classes directly on any member.
  • You can distribute invocations amongst the seeders of a space by invoking on a key tuple on that space.
  • ActiveSpaces then uses its internal distribution algorithm to direct the request to the seeder that seeds the key in question.
    • In the invoked code, you are guaranteed that a read on the key being invoked on will be local to the seeder.
      • You can use directed invocation to do "in-place data updating." For example, if you want to increment a single field in a large tuple, rather than getting the whole tuple from the Space, incrementing the field, and doing a put of the whole tuple, it can be more efficient to invoke an "increment" class on the key, which means that the getting of the record, incrementation of the field, and subsequent updating of the record all happen locally on the seeder for the record in question (see the sketch after this list).
      • The invocation happens regardless of data being actually stored in the space for the key being used.
    • You can even define a space on which you store no data at all, and which is used only to distribute invocation requests amongst the seeders of the space. In this case you only need to define the key field(s) that you will use as your distribution field(s).
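
A hedged sketch of invoking on a key: the invoke call's signature and the IncrementInvocable class name are assumptions (modeled on the invocation concepts described in this section), so treat this as pseudocode to check against the API reference:

    import com.tibco.as.space.*;

    // In-place update via directed invocation: the get + increment + put all
    // happen locally on the seeder that owns the key.
    // NOTE: invoke()'s signature and the invocable class are assumptions.
    public final class CounterClient {
        public static void increment(Space space, long id) throws ASException {
            Tuple key = Tuple.create();
            key.putLong("key", id); // only the key field(s) are needed for routing
            // Routed by ActiveSpaces' distribution algorithm to the seeder of this
            // key; a class named IncrementInvocable (hypothetical) runs there.
            space.invoke(key, "com.example.IncrementInvocable", null, null);
        }
    }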

Broadcasted Invocation (and Map/reduce Style Processing)

Broadcasted invocation triggers the execution of a class on multiple space members at the same time; the invocation can be sent to all the members of a Space, to all the seeders of a Space, or even to all remotely connected client applications.

  • Broadcasted invocation does not require (or make use of) IP broadcast (or multicast) packets over the network. While the invocation request is effectively "broadcast" to a number of hosts, this "broadcasting" happens over the individual TCP connections established between those members.
  • Unlike directed invocation, which returns a single result tuple, broadcasted invocations return a collection of results (one per invoked member), each result possibly containing a result tuple.

Broadcasted invocation can be used to do fast parallelized processing of large data sets (map/reduce style processing).

  • In this case, "large data sets" means "all or most of the records stored in the space".

Let’s look at a practical example: calculating the average value of a field in all (or most) of the records stored in a Space:

  • Create a class implementing the MemberInvocable interface; in this class create a browser on the space (with a filter if needed) with a distribution scope of “SEEDED.”
    • Make sure the class is in the path of all the seeders for the Space.
  • Use this browser to iterate through all the (matching) records and to calculate the average value of the field.
  • Return the average value (and number of matching records).
    • This invocable class is effectively your “map” function.
  • On the invoking side, use InvokeSeeders to trigger the execution of this class in parallel on all the seeders of the Space, and compute the overall average from the partial averages (and numbers of matching records) returned by the invocation (see the sketch after this list).
    • This final calculation that happens on the invoker is effectively your “reduce” function.
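
A hedged sketch of the two sides of this pattern: MemberInvocable, InvokeSeeders, and the SEEDED distribution scope are the names used above, but the exact signatures, result-accessor names, and the space/field names ("trades", "price") are assumptions to verify against the API reference:

    import com.tibco.as.space.*;
    import com.tibco.as.space.browser.*;
    import com.tibco.as.space.remote.*;

    // "Map" side: runs on each seeder, averaging over locally seeded records only.
    public class AverageInvocable implements MemberInvocable { // signature assumed
        public Tuple invoke(Metaspace metaspace, Tuple context) throws ASException {
            Space space = metaspace.getSpace("trades");
            BrowserDef def = BrowserDef.create()
                    .setDistributionScope(BrowserDef.DistributionScope.SEEDED);
            Browser browser = space.browse(BrowserDef.BrowserType.GET, def);
            double sum = 0; long count = 0;
            Tuple record;
            while ((record = browser.next()) != null) {
                sum += record.getDouble("price");
                count++;
            }
            browser.stop();
            Tuple result = Tuple.create();
            result.putDouble("sum", sum);
            result.putLong("count", count);
            return result;
        }
    }

    // "Reduce" side (on the invoker): aggregate the partial results per seeder.
    //     InvokeResultList results = space.invokeSeeders("AverageInvocable", null, null);
    //     double sum = 0; long count = 0;
    //     for (InvokeResult r : results) {
    //         Tuple partial = r.getResult(); // accessor name is an assumption
    //         sum += partial.getDouble("sum");
    //         count += partial.getLong("count");
    //     }
    //     double average = (count == 0) ? 0 : sum / count;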

This method is the fastest way to compute the average because:

  • The processing is distributed, with each seeder doing part of the processing.
    • There is no need for the programmer to worry about how many processes seed on the space.
  • The processing at each seeder will be fast because it happens only on "local data".
    • When creating a browser with a distribution scope of "SEEDED", the browser will iterate only over records that are stored in the process itself (and is therefore very fast).

Distributed Process Synchronization

Broadcasted invocation can also be used for synchronizing distributed processing:

  • You can take a look at the ASPerf example to see how it uses a space (the "control space") not to store any data but only to synchronize processing amongst a number of processes.

Locking of keys on a space can also be used to synchronize distributed processing:

  • Since the scope of a lock (i.e., the "owner" of a lock) is a thread, you can use locking on a space to synchronize processing not only between threads in a process but also between processes (and, by extension, between threads in different processes).
  • Locking of a key in a space is not linked to data being stored in the space at that key (other than protecting against modification of the data): you can lock a key even if there is no record stored in the space for the key.
    • You can use a space not to store any data but only for locking and unlocking keys in order to synchronize distributed processing between threads and processes.

TIBCO ActiveSpaces – Best Practices – 8. Seeding and Leeching

Seeding and Leeching

  • Remember that seeding and leeching is a “per space” state: a process can seed on one space and leech on another.
  • Also remember that a leech is not limited in functionality compared to a seeder; it can make use of all of the Space operations.
  • A Seeder contributes some of its resources to the scalability of the space, and as a “side effect” of being a seeder can experience faster response time on operations on the space(s) it seeds on:
  • If the key value is either seeded or replicated, then any operation to read it will be faster because it will not involve interacting with another seeder process (and therefore no network round trip is required).
  • If the entry is seeded, any operation to modify the entry (insert, update, delete) is faster.
    • If the replication degree for the space is greater than zero, the operation will involve sending the replication request to another seeder process.
    • If there is synchronous replication, however, the operation will not return until it has been replicated, meaning a replication request is sent to the replicating seeder(s) and an acknowledgement is received.
  • Every time a seeder joins or leaves a space, it triggers redistribution (in the case of a join) or re-replication (in the case of a leave) of some of the entries in the space.
    • If a process joins a space as a seeder, it should be relatively stable over time (i.e., should not join and leave the space repeatedly).
    • To minimize the impact of redistribution (or re-replication) on ongoing operations on the space, redistribution occurs as a background operation, which means that it can take some time to complete.

TIBCO ActiveSpaces – Best Practices – 7. API Considerations

API Considerations

Threading

Many ActiveSpaces API objects are thread safe (meaning that multiple threads can use them at the same time):

  • The Metaspace object is thread safe:
    • A Metaspace connection is actually a process-wide resource.
    • Use getMetaspace() to get more copies of the Metaspace object as needed.
  • Space objects are thread safe:
    • Multiple threads can use the same Space object at the same time.
    • Operations by multiple threads on the same Space object are parallelized. This is a very efficient way to increase throughput!
  • Browser objects (used for queries and iteration) are thread safe: you can have more than one thread calling the next() method of a single browser object.

Java/.NET Specific Things to Remember

  • Remember to always stop your browsers when you are finished with them.
    • There are resources associated with a browser (possibly including a new thread) that will not be garbage collected if you don't call stop().
    • Since you have to try/catch the creation and use of the browser, the finally block, for example, would be a good place to do it.
  • Remember to keep a reference to your Space object; if it gets garbage collected, your listeners and browsers will stop working!
    • The best way to do this is to remember to call close() at the end of the relevant part of the code for each corresponding getSpace() call.
    • Multiple calls to getSpace() return reference-counted copies of the Space object that is created by the first getSpace() (or browse() or metaspace.listen()) call. Those copies could be garbage collected at different times and therefore cause a process to go from seeder to leech on the space.

C-Specific Things to Remember

  • You can pass NULL as the oldValue parameter to your put/putEx calls
  • You can pass NULL as a filter string.

Shared-Nothing persistence

If you use shared-nothing persistence, you MUST ensure that your seeders all have a specified member name, and that this member name is consistent from one instance of the seeder to another over time.

Sizing Guidelines

Virtual Environments

While ActiveSpaces does support deployments in virtual environments, it does so only when the following conditions are met:

  • TCP discovery only:
    • Some virtualized environments do not support multicasting (notably public clouds).
    • Even in virtual environments that support multicasting, our experience is that those environments do not do a very good job of it (lots of packet drops).
  • No VMware snapshots:
    • "Snapshots" in VMware can result in TCP connections being dropped (and, if the snapshots last long enough, in timeouts), which means processes are kicked out of the metaspace while the snapshot is happening.

TIBCO ActiveSpaces – Best Practices – 6. Space Operations

Space Operations

Operations on Keys

You can use options in some space operations in order to do two things:

  • Optimize response times and resource usage by:
    • Combining operations:
      • You can do a put plus a lock or unlock operation in a single call (and possibly in half the time) if you do a put with the lock or unlock option set.
      • The two operations are also executed atomically.
      • For example, a transactional update on a record is usually implemented as a lock, followed by a get, followed by a put and an unlock. With ActiveSpaces this can be done in two operations (see the sketch after this list):
        • A lock (without the forget option set) returns the tuple that is stored at the key that gets locked.
        • A put with the unlock option set (and the forget option set, because you do not care to receive back the old value you are updating) does the update and clears the lock.
    • Indicating that you do not care to capture what the space operation would normally return:
      • A put operation normally returns the record that was overwritten (if any). Set the forget option on your put to save some resources by indicating that you do not care to receive this old value back.
      • To clear a record in a space, use a take operation with the forget option set.
      • A lock operation normally returns the record stored in the space at the key that was locked (essentially a get and lock). Set the forget option if you do not care to get this value from the space.
  • Override some space definition settings for the record being written or for the operation:
    • Set a specific TTL for a record being put in the space.
    • Set a specific LockWait time for a put, lock, or unlock.
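
A hedged sketch of that two-operation update: the LockOptions/PutOptions classes and their setForget/setUnlock setters are assumptions modeled on the options described above, and the field name is hypothetical:

    import com.tibco.as.space.*;

    // Transactional-style update in two atomic operations (sketch; verify the
    // exact option class and setter names against the ActiveSpaces API reference).
    public final class LockedUpdate {
        public static void incrementCounter(Space space, Tuple key) throws ASException {
            // 1. Lock the key; without the forget option, the current tuple is
            //    returned (this sketch assumes a record exists at the key).
            Tuple current = space.lock(key, LockOptions.create());
            current.putLong("counter", current.getLong("counter") + 1);
            // 2. Put the updated tuple, releasing the lock in the same call and
            //    skipping the return of the old value.
            space.put(current, PutOptions.create().setUnlock(true).setForget(true));
        }
    }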

Batch and Asynchronous Operations

Space operations execute synchronously by default:

  • If your application is a leech, this means that read and write operations will always require at least one network round trip of latency before returning.
  • If your application is a seeder, the greater the number of seeders on the space the greater the chance that a write operation will require a network round trip to execute.
  • At current network speeds, the network round trip time can make up 90% of the latency of the execution of an operation.
  • Even the "loopback interface" can experience large variations in network latency and can sometimes average a round trip time equivalent to that of 1 Gigabit Ethernet networks (though the loopback interface is usually quite "high bandwidth").
  • The network round trip is therefore usually what has the greatest effect on the latency of each space call and, if the application is issuing individual space operations back-to-back, it has a direct impact on throughput (the number of operations per second that the space can execute).

You can get higher throughput by parallelizing operations over the space(s) using one of the following methods:

  • Batch operations (putAll, getAll, takeAll, and so on) should be used when you need to execute more than one operation of the same type on a single space.
    • For example, rather than doing “puts in a loop,” use putAll.
    • Remember that operations in a batch are executed in no particular order!
  • Asynchronous operations return as soon as the request is sent over the network, without waiting for an answer or acknowledgement from the node the request was sent to.
  • You can use asynchronous operations to parallelize a bunch of operations on a bunch of spaces.
  • Consider using asynchronous operations when doing “writes,” since your application logic may not care about the returned values from write operations.
  • You can combine asynchronous operations with batch operations (doing asynchronous batch operations).
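
A hedged sketch of the batch alternative to "puts in a loop": putAll is the operation named above, but its exact signature is an assumption, and the field names are hypothetical:

    import com.tibco.as.space.*;
    import java.util.ArrayList;
    import java.util.List;

    // Write a batch of records in one exchange instead of one put per record.
    // (Sketch: verify putAll's signature against the ActiveSpaces API reference.)
    public final class BatchWriter {
        public static void writeBatch(Space space, List<String> keys) throws ASException {
            List<Tuple> batch = new ArrayList<Tuple>();
            for (String key : keys) {
                Tuple t = Tuple.create();
                t.putString("key", key);
                t.putLong("updated", System.currentTimeMillis());
                batch.add(t);
            }
            // One batched request for all entries; remember that operations in a
            // batch are executed in no particular order.
            space.putAll(batch, PutOptions.create().setForget(true));
        }
    }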

Browsers and Listeners

Browser prefetch can have a very large impact on performance: use the largest prefetch value you can afford (up to -1, which means "prefetch everything") in order to get the best performance out of your browsers.

It is better (more efficient) to register a single listener that has handlers for all types of events than to register multiple listeners, each having a handler for only one kind of event.

Get, take, lock, unlock and invoke

Take and lock:

  • The take and lock operations also do an implicit "get", since they both return the tuple (if any) stored at the key that was just cleared or locked.
  • This makes the take operation a “consume” operation: if two processes try to “take” the same key at the same time you have a guarantee that only one of them will be able to take the tuple.
  • Remember that you can create "take browsers" and "lock browsers" as well as the default "get browsers".
  • When a process leaves (gracefully or not) the metaspace, all the locks it had created are automatically cleared.

Lock Scope:

  • If a space has a lock scope of process, all threads in the process share the lock.
  • However, be careful that the scope of transactions remains a thread; therefore any writes in a pending transaction would mean that an attempt to unlock any of the keys in question could fail, or block until the transaction commits or rolls back, depending on the space’s LockWait.
    • You can control this behavior by using a LockWait option with the unlock operation.

Lock Wait:

  • If you assume your locks will be transient and don’t want to have to implement retry logic in your code, you can leverage the space’s LockWait attribute to have locking operations automatically block until they can acquire the lock.
  • However, if you combine a LockWait value of "forever" with a LockTTL value of "forever," you might create deadlock situations.

Transactions

  • Transactions can span as many spaces as you want, but are limited to a single Metaspace.
  • When a process leaves (gracefully or not) the Metaspace, any of its pending transactions are automatically rolled back.

Space operations within a transaction are effectively prepared when the operation is invoked: if all the space operations invoked after you called beginTransaction executed without an error, you can safely call Commit; otherwise you should call Rollback.

You can adjust the scope of the transaction from “thread” to “process” and you can pass a transaction context from one thread to another.
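
A hedged sketch of that prepare-as-you-go pattern: the beginTransaction/Commit/Rollback operations are named above, but their exact method names and placement on the Metaspace object are assumptions, and the space name is hypothetical:

    import com.tibco.as.space.*;

    // Transaction pattern (sketch; verify the exact transaction method names
    // against the ActiveSpaces API reference).
    public final class TransferExample {
        public static void transfer(Metaspace metaspace, Tuple debit, Tuple credit)
                throws ASException {
            Space accounts = metaspace.getSpace("accounts");
            metaspace.beginTransaction();
            try {
                // Each operation is effectively prepared as it is invoked.
                accounts.put(debit);
                accounts.put(credit);
                metaspace.commitTransaction();   // all operations succeeded
            } catch (ASException e) {
                metaspace.rollbackTransaction(); // any error: undo the pending writes
                throw e;
            }
        }
    }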

TIBCO ActiveSpaces – Best Practices – 5. Joining as a Seeder or a Leech?

Joining as a Seeder or a Leech?

Things to remember:

  • There is no change in functionality regardless of whether your application seeds or acts as a leech on a space.
  • However, there is an impact on the resource usage of your application: a seeder contributes memory to store records and CPU cycles to service requests to read/write/consume/process those records.
  • Seeding or leeching is a per-space state; an application can be a seeder on some spaces and a leech on others.
  • There is, however, a possible difference in the latency of space operations experienced by an application that is a leech on the space (on the order of one "network round trip" for every operation on the space) versus an application that seeds on the space (some operations will complete without incurring a "network round trip", at "in-process speed").
  • Leveraging this “side effect” of a seeding application being able to perform some space operations at “in-process speed” is key to getting high throughput in some scenarios.
    • In some cases (with a small dataset, many more reads than writes) you can leverage the replication degree of ALL (where all the records are replicated on all of the seeders) to ensure that reads on any key are always serviced at the lowest latency of “in-process speed.” For example, you can do this for lookup data or for some types of joins.
  • When a seeder joins or leaves a space, this triggers redistribution and re-replication of some of the records:
  • ActiveSpaces is designed to minimize the number of records that need to be redistributed and re-replicated through its use of the Consistent Hashing Algorithm for distribution of the records:
    • Redistribution and re-replication happens “in the background” and is spread over time to minimize its impact:
      • There is no interruption in service of reads (or writes) during redistribution and re-replication.
      • There is minimal impact in the latency of ongoing operations during redistribution and re-replication.
      • Because of this, redistribution and re-replication will not happen instantly but rather over time.
    • However, even though the impact of redistribution and re-replication is minimized, it is still there:
      • CPU and Network utilization are increased during re-distribution and re-replication.
      • Some of the operations might experience a latency increase of a few milliseconds while redistribution and re-replication are happening.
    • It is therefore recommended to keep the number of seeders of a space relatively stable over time:
      • If an application is only intermittently connected to the metaspace it is probably not a good candidate for being a seeder.
  • When a process is a seeder on a space, it is very easy to perform queries or iterate over just the subset of the records that are seeded by the process.
  • These queries and iterations are very fast (and scale very well) because they happen only locally, at "in-process speed".
    • Simply specify DISTRIBUTION_SCOPE_SEEDED in your BrowserDef or ListenerDef; there is no need to change anything else in the code.
      • If the application is connected as a remote client, DISTRIBUTION_SCOPE_SEEDED means the seeding scope of the proxy metaspace member the application is connected to.
    • Consider leveraging SCOPE_SEEDED to distribute processing over all or most of the records in the space (just like a Map of Map/Reduce) using remote invocation to trigger execution of your code in parallel on all of the seeders of the space.
  • Creating a Listener or an EventBrowser with DISTRIBUTION_SCOPE_SEEDED means you only receive events about changes to the records the process seeds.
    • In that case you can get two extra event types:
      • onSeed: because of redistribution (seeder leaving) you are now seeding the record.
      • onUnseed: because of redistribution (seeder joining) you are no longer seeding the record.
      • In this case, onPut events are also an implied onSeed, and onTake events are an implied onUnseed.
      • You can use those events to keep track of which records you seed in your application logic.
    • You can use this to create “event-driven map/reduce” style processing:
      • Rather than the processing being batch oriented and triggered by some process using remote invocation, the processing of each record when it is created or updated (or even consumed) is triggered automatically.
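
A hedged sketch of tracking seeded keys from such a listener: the ListenerDef, the SEEDED distribution scope, and the onSeed/onUnseed/onPut/onTake events are named above, but the listener interface, event type, and registration call below are assumptions to verify against the ActiveSpaces API reference, and the key field is hypothetical:

    import com.tibco.as.space.*;
    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    // Keeps the set of keys this process currently seeds (sketch; SpaceListener,
    // SpaceEvent, and listen() are assumptions modeled on the events above).
    public class SeededTracker implements SpaceListener {
        private final Set<String> seededKeys =
                Collections.synchronizedSet(new HashSet<String>());

        public void onSeed(SpaceEvent event)   { seededKeys.add(key(event)); }    // redistribution in
        public void onUnseed(SpaceEvent event) { seededKeys.remove(key(event)); } // redistribution out
        public void onPut(SpaceEvent event)    { seededKeys.add(key(event)); }    // implied onSeed
        public void onTake(SpaceEvent event)   { seededKeys.remove(key(event)); } // implied onUnseed

        private String key(SpaceEvent event) {
            return event.getTuple().getString("key"); // hypothetical key field
        }

        public void register(Space space) throws ASException {
            ListenerDef def = ListenerDef.create()
                    .setDistributionScope(ListenerDef.DistributionScope.SEEDED);
            space.listen(this, def); // registration call is an assumption
        }
    }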

TIBCO ActiveSpaces – Best Practices – 4. Fault-tolerance and Persistence

Fault-tolerance and Persistence

Replication is what provides you with fault-tolerance.

If your process is a seeder or replicator of the key for a get operation, the operation will be serviced at the very low latency of "in-process" speed.

  • If the size of the dataset is small, you can even ensure that reads on ALL keys will be at in-process speed if you set the replication count of the space to ALL.
  • Replicate ALL (the numeric value used in configuration for REPLICATE_ALL is -1 ) is in general very useful for smaller data sets where data is read a lot more often than it is modified:
    • lookup data
    • joins
    • and so on
  • With asynchronous replication, the write request is sent to the seeder for the key (probably another process), and then synchronously acknowledged back to the process issuing the write while simultaneously being sent to the replicators asynchronously.
  • With synchronous replication, the write request is sent to the seeder and all the replicators with acknowledgement back to the process issuing the write.
    • Synchronous replication does NOT mean that at the network level a single node sends the replication requests to all of the other nodes, because replication happens in a more distributed way in ActiveSpaces.
  • Persistence is what provides you with the ability to shut down all of the seeders or the entire Metaspace and then recover the data that was stored in the space when the system went down:
    • Shared-nothing persistence typically scales better than shared-all persistence.
    • Shared-nothing persistence extends the replication degree of the space:
      • A shared-nothing persisted space with a replication degree of one means you won’t lose any data even if you lose all of the persistence files from one of the seeders.
      • However, there is no “automated recovery” of spaces in shared-nothing persistence. You must ensure that you have started enough of your processes doing seeding on the space before administratively triggering the recovery.
      • Also pay attention to the order in which the persisters were brought down if some write operations occurred after some of the persisters went down.

TIBCO ActiveSpaces – Best Practices – 3. Space Definition

Space Definition

Schema Considerations

To get the fastest speed out of ActiveSpaces you might need to tailor the way you store information in it depending on how you are going to access it.

  • The fastest and most scalable way to access data in ActiveSpaces is through the use of the key operations: put/get/take.
    • When reading a record, get is always faster than creating a browser and doing a query.
  • You can store structured data, objects, or documents in serialized format in Blob fields:
    • You can use a Tuple object as a map of fields, and then store the serialized Tuple in a blob field; Tuple serialization is very fast and optimized.
    • Some very large objects (for example: XML documents, JSON documents, and so on), can be dramatically reduced in size by using standard compression techniques. You can fit many more of these objects into memory if you store them in compressed format.
  • If you have a Java or .NET object that you want to store in ActiveSpaces:
    • You can implement your own very fast serialization to Tuple routine:
      • Implement a toTuple() method and a constructor from a Tuple, and map your variables to fields (see the sketch after this list).
      • You can serialize non-primitive variables to Blob types, or store serialized tuples in Blob fields.
    • You can also do queries by using the browser/listener calls, but you can only query one space at a time.
      • Storing data in a serialized blob does not necessarily mean you cannot query it:
        • Just extract from those structures the values that you are going to be querying on and store them as top-level fields in the Tuple, in addition to their being contained in the serialized object.
        • When implementing the toTuple() serialization mentioned above, you can then implement any “value extractor” you want to extract values you may want to later query on.
        • If data is exposed as top level Tuple fields, you can then have indexes on those extracted fields.
      • All fields (including key fields) can be nullable.
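
A minimal sketch of such a hand-rolled Tuple mapping: Tuple.create() and the typed put/get accessors are assumptions about the Tuple API to verify against the reference, and the class and field names are hypothetical:

    import com.tibco.as.space.ASException;
    import com.tibco.as.space.Tuple;

    // Fast, hand-rolled POJO <-> Tuple mapping (sketch; verify the Tuple
    // accessor names against the ActiveSpaces API reference).
    public class Order {
        private final String orderId;  // key field
        private final double amount;   // extracted so it can be queried/indexed
        private final byte[] details;  // serialized (possibly compressed) payload

        public Order(String orderId, double amount, byte[] details) {
            this.orderId = orderId;
            this.amount = amount;
            this.details = details;
        }

        /** Constructor from a Tuple read back from the space. */
        public Order(Tuple tuple) throws ASException {
            this(tuple.getString("orderId"), tuple.getDouble("amount"),
                 tuple.getBlob("details"));
        }

        /** Serialize to a Tuple, exposing queryable values as top-level fields. */
        public Tuple toTuple() throws ASException {
            Tuple tuple = Tuple.create();
            tuple.putString("orderId", orderId);
            tuple.putDouble("amount", amount); // "value extractor" for queries/indexes
            tuple.putBlob("details", details);
            return tuple;
        }
    }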

Indexing

The ActiveSpaces query engine can leverage indexes to speed up query processing. On larger datasets indexes can improve response times significantly.

  • Index facts:
    • You can define any number of indexes you want.
    • Indexes can be on a single field or on multiple fields (composite).
    • You can have more than one index per field.
  • However indexes are not free:
    • Indexes will cost you between 400 and 800 bytes of memory per record being indexed.
    • Indexes must be maintained, and they result in increased CPU usage when processing write operations.
  • There is always an index defined on the key fields; by default, it is a HASH index, but you can change it to TREE.
    • For example, if you have queries on a subset of the key fields, a key field index of type TREE could be leveraged for some of those queries.
  • You can get details about how the query was planned and whether indexes were used by the seeders or not by using the Browser's getQueryLog().
  • HASH indexes have the following advantages and disadvantages:
    • They are fastest for equality tests (for example, “field = value”).
    • Slightly less memory is required per entry indexed, but almost all of the entries will have their own index entry.
    • The same amount of memory is required per entry regardless of the number of fields in the index.
  • TREE indexes have the following advantages and disadvantages:
    • They are fastest for range tests (for example, “field > value”).
      • However, TREE indexes are still leveraged to accelerate equality tests as well: if you are going to do both range and equality testing in your queries you only need to define a single TREE index in most cases.
    • More memory is required per entry, but if the value set has some structure to it (for example continuous ranges) the TREE index can end up being more memory efficient than HASH.
    • The amount of memory increases with the number of fields composing the index.
    • Composed TREE indexes can often also be leveraged for tests on individual or subsets of the fields composing the index.
      • For example, a TREE index on fields “A,B” could also be leveraged for “A>value” or “B>value”, or “A=value”, or “B=value” and combinations thereof.
    • The order of the list of fields in a composed TREE index can have an influence on performance (notably write operations).
      • To be more efficient, if you have a TREE index on fields A and B and you know that the number of distinct values of "A" is greater than the number of distinct values of "B", then you should define the index fields in the order "A","B" rather than the other way around.
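
A hedged sketch of defining such a composite TREE index at space-definition time: the SpaceDef/FieldDef/IndexDef classes and their setters are assumptions about the admin API, to be checked against the reference, and the space and field names are hypothetical:

    import com.tibco.as.space.*;

    // Define a space with a composite TREE index on fields A and B (sketch;
    // the SpaceDef/FieldDef/IndexDef calls are assumptions to verify).
    public final class SpaceSetup {
        public static SpaceDef defineSpace() throws ASException {
            SpaceDef spaceDef = SpaceDef.create("orders");
            spaceDef.addFieldDef(FieldDef.create("key", FieldDef.FieldType.LONG));
            spaceDef.addFieldDef(FieldDef.create("A", FieldDef.FieldType.STRING));
            spaceDef.addFieldDef(FieldDef.create("B", FieldDef.FieldType.STRING));
            spaceDef.setKey("key"); // key fields always get an index (HASH by default)
            // Put the higher-cardinality field first ("A" has more distinct values).
            spaceDef.addIndexDef(IndexDef.create("ab_index")
                    .setIndexType(IndexDef.IndexType.TREE)
                    .setFieldNames("A", "B"));
            return spaceDef;
        }
    }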

Space Attributes

  • See the “Fault-tolerance and Persistence” section for information on the replication and persistence attributes.
  • Min Seeders. If the number of seeders for the space goes below this threshold then the space stops being in READY state, and all operations on the space are suspended.
    • If you are using shared-nothing persistence, you may want to use this attribute to control when applications will not be able to make any more modifications to the data that could then not be recovered.

For example, if you have four seeders with a degree of replication of 1, set min seeders to 3.

TIBCO ActiveSpaces – Best Practices – 2. Discovery URL

Discovery URL

"Discovery" is used in the initial phase of a process's connection to a Metaspace, to discover which other nodes are already members of the metaspace and to establish connections to them. Discovery and connection are Metaspace-level operations (separate from joining and leaving spaces).

  • A single process can only be connected once to a specific Metaspace. This means that if any thread of a process has already called connect() for a metaspace name, then no further calls to connect for the same metaspace name can be made (however, a process can have simultaneous connections to different Metaspaces).
  • Discovery and connection is not a very fast operation; it sometimes can take many seconds for a process to go through the whole discovery and connection process (depending on the type of discovery and the number of nodes already connected).
  • Because connect() cannot be called repeatedly for the same Metaspace in order to get a copy of an already connected Metaspace object, the programmer can use the convenience function ASCommon.getMetaspace(): this function takes a metaspace name and, if the process is already connected to that Metaspace, returns a copy of the Metaspace object for that connection (see the sketch below).
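
A hedged sketch of the connect-once pattern using the calls named above (Metaspace.connect and ASCommon.getMetaspace); the exact signatures, and the assumption that getMetaspace returns null when not yet connected, should be verified against the API reference:

    import com.tibco.as.space.*;

    // Connect once per process; reuse the process-wide connection elsewhere.
    // (Sketch: verify Metaspace.connect and ASCommon.getMetaspace signatures.)
    public final class MetaspaceHolder {
        public static Metaspace connectOrGet(String name, String discoveryUrl)
                throws ASException {
            // Reuse an existing process-wide connection if there is one
            // (null return when not connected is an assumption).
            Metaspace ms = ASCommon.getMetaspace(name);
            if (ms == null) {
                MemberDef def = MemberDef.create();
                def.setDiscovery(discoveryUrl);
                ms = Metaspace.connect(name, def);
            }
            return ms;
        }
    }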

Multicast versus Unicast Discovery

Discovery can be one of the following types: Unicast (TCP) or Multicast (TIBPGM or RV). In general, Multicast discovery is somewhat easier to use because it requires less configuration (which is why it is the default) and tends to be used during the development phase, while TCP discovery requires (and offers) a bit more control through configuration and, unlike multicast, works in all network environments, and therefore tends to be used in production. But there is no hard and fast rule for which of those methods of discovery is recommended, and all types of discovery mechanism are supported equally.

Some things to remember:

  • There is no loss of functionality depending on the kind of discovery that is used.
  • Only one type of discovery can be used for a particular Metaspace. But a process could use unicast discovery to connect to one Metaspace, and multicast discovery to connect to another.

Multicast Discovery

To be able to use multicast discovery, the following conditions must be met:

  • UDP packets must be able to flow bidirectionally between all the Metaspace members.
  • If the Metaspace members are on separate subnets, multicast routing must be enabled between those subnets.

When it comes to choosing between the two available choices for multicast discovery (PGM or RV):

  • There is no functional difference between using the built-in PGM reliable multicast protocol stack or using TIBCO Rendezvous as the discovery protocol.
  • You only need to have TIBCO Rendezvous installed on your machine if you want to use it for multicast delivery. If you only use PGM multicast delivery then RV does not even need to be installed on the machine.
  • Using TIBCO Rendezvous as the discovery protocol can give you a little more flexibility in your deployment mode (for example making remote daemon connections or leveraging RVRDs).

Unicast Discovery

With Unicast discovery, all communication between Metaspace members occurs solely over TCP connections (no UDP or multicast). The best practices for unicast discovery are:

  • ALL of the Metaspace members MUST use exactly the same Discovery URL string.
  • If you want fault-tolerance for the discovery service, then you must specify more than one IP:PORT in the discovery URL.
    • If all of the processes for all of the IP:PORTs listed in the discovery URL disappear, the Metaspace temporarily stops working (but data is not necessarily lost) until one of those processes is restarted.

In practice, to use Unicast discovery, you will designate some machines as the "servers" of the cache service. You will want to start (and restart if needed) at least one ActiveSpaces process on each of those nodes (as a service or using Hawk, for example). These processes can be, but do not have to be, as-agents; they just have to connect to the Metaspace to keep it alive, regardless of whether they seed anything or not. Those processes will be the "well known" Metaspace members that you will use in the discovery URL.

Use a different listen port than the default for those “well known” processes, for example, 60000. This way you can start the specific processes you want using a Listen URL of “tcp://:60000”.

Host IP1: as-agent -listen "tcp://:60000" -discovery "tcp://IP1:60000;IP2:60000"

Host IP2: as-agent -listen "tcp://:60000" -discovery "tcp://IP1:60000;IP2:60000"

where IP1 and IP2 can be any hostnames that resolve to IP addresses, or the IP addresses themselves.

Remote Client Connection

Remote clients are ActiveSpaces processes that connect to the metaspace indirectly through a directly connected “proxy” member of the metaspace.

  • There is NO loss of functionality for the client application whether it is directly or remotely connected to the metaspace.
  • If there is ANY one-way firewall or ANY Network Address Translation happening between any two machines, the ActiveSpaces processes on those machines will NOT be able to connect to each other directly. In this case, you have no choice but to deploy the processes on one of the machines as remote clients connecting to the processes on the other machine(s).
    • Remote clients initiate a single TCP connection to the proxy member they are connecting to.
    • Directly connected members of a metaspace, by contrast, have a single TCP connection to every other member, which might be initiated from either end.
  • A Metaspace scales to a much larger number of remotely connected members than directly connected members.
    • If you have a lot of “pure client” processes that never seed on anything, consider deploying them as remote clients.
    • Remotely connected clients, however, experience on average higher space operation response time than directly connected processes (although remote client throughput will not necessarily be lower).
  • Although a remotely connected process can never seed on any space, the "seeded scope" is still meaningful: it is then the scope of what the proxy member the remote client is connected to seeds.
    • This means that get requests from a remote client on any key that its proxy seeds or replicates will be serviced with the lowest latency (on the order of a single network round trip)!

The simplest way to make a remote client connection is to use a Discovery URL in the form: “tcp://IP:Port?remote=true”

where IP and Port identify the machine and port where a proxy is running. You can start an as-agent and specify that it provides remote connectivity by using the -remote_listen parameter:

On host IP1 enter:

as-agent -remote_listen "tcp://:55555"

and then use the Discovery URL "tcp://IP1:55555?remote=true" on any application.

TIBCO ActiveSpaces – Best Practices – 1. Exposing Metaspace Connection Attributes

Exposing Metaspace Connection Attributes

When creating ActiveSpaces applications, developers should make sure that they expose the metaspace connection attributes so that they can be adjusted by an administrator at deployment time. The Metaspace connection attributes are contained in the MemberDef object that is passed to Metaspace.connect(). At a minimum, the following MemberDef attributes should be exposed (see the sketch after this list):

  • Member name:
    • If specified, this should be unique in the Metaspace.
    • If not specified, a unique name is generated automatically.
    • If you intend to use the process as a seeder on a shared-nothing persisted space, you MUST specify a name for the process.
  • Metaspace name:
    • If not specified, defaults to "ms".
  • Discovery URL:
    • If not specified, defaults to "tibpgm".
  • Listen URL
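
A hedged sketch of exposing these attributes through system properties: the MemberDef setter names are assumptions to verify against the ActiveSpaces API reference, and the property names are hypothetical:

    import com.tibco.as.space.*;

    // Let an administrator adjust the connection attributes at deployment time,
    // e.g. via -Das.member=..., -Das.discovery=..., -Das.listen=..., -Das.metaspace=...
    // (Sketch: the MemberDef setter names are assumptions.)
    public final class ConfigurableConnect {
        public static Metaspace connect() throws ASException {
            MemberDef def = MemberDef.create();
            String member = System.getProperty("as.member");
            if (member != null) {
                def.setMemberName(member); // unique, stable name (required for
                                           // shared-nothing persisted seeders)
            }
            def.setDiscovery(System.getProperty("as.discovery", "tibpgm"));
            String listen = System.getProperty("as.listen");
            if (listen != null) {
                def.setListen(listen);
            }
            return Metaspace.connect(System.getProperty("as.metaspace", "ms"), def);
        }
    }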

In some cases (typically “server” applications) it is recommended to expose these attributes:

  • DataStore:
    • The directory to use to store shared-nothing persistence files. Only useful if the application will be seeding on shared-nothing persistent spaces.
  • RemoteDiscovery:
    • Only needed if you intend the process to be a remote client proxy and be able to accept and service connections from remote clients.
  • WorkerThreadCount:
    • Adjusts the size of the worker thread pool used by remote invocation. Only useful if the application will service some remote invocation requests.

Optionally, the following attributes could be exposed as well (if one wants to give the administrator the ability to adjust more “low level” settings):

  • Timeout
    • Internal ActiveSpaces protocol timeout for operations (the default is 60 seconds).
  • RxBufferSize
    • Adjusts the amount of memory allocated as a receive buffer per TCP connection. If the metaspace will contain a large number of directly connected members, you may want to adjust this value down from the default 2 MB in order to reduce memory requirements.
    • For example, if the application is only going to be a leech that reads and writes small records, it will probably not need the full 2 MB of buffer space.
    • The downside of making this parameter too small is increased CPU usage.
  • TcpThreadCount
    • Adjusts the size of the thread pool used to distribute incoming TCP data from other members. Only useful for applications that seed data.

The final attribute of the MemberDef is the Context tuple. The context tuple is provided to help the application programmer identify classes of applications connecting to the metaspace or to spaces when they are monitored by another application using a MetaspaceMemberListener or a SpaceMemberListener, because this context tuple is passed to the listeners inside the membership event.

TIBCO ActiveSpaces – Basic Concept

ActiveSpaces applications are programs that use ActiveSpaces software to work collaboratively over a shared data grid.
The data grid comprises one or more tuple spaces.
An ActiveSpaces distributed application system is a set of ActiveSpaces programs that cooperate to fulfil a mission (either using the administrative CLI tool, or the ActiveSpaces API calls).
Tuples are distributed, rather than "partitioned", across seeders: members that are configured to contribute memory and processing resources to a space.
ActiveSpaces automatically redistributes tuples when seeders join and leave the space.
Unlike a horizontally partitioned database, where the allocation of items to nodes is fixed and can only be changed through manual reconfiguration, ActiveSpaces data is automatically redistributed across the nodes of the data grid and rebalanced transparently using a "minimal redistribution" algorithm.
Replication allows the distribution of data replicates on different peers for fault tolerance.
ActiveSpaces’ data access optimization feature uses a replicate if one is locally available.
If a seeder suddenly fails, the replicate is immediately promoted to seeder, and the new seeder creates new replicates.

This optimizes system performance.

 

TIBCO ActiveSpaces – Create a Metaspace, Space and a Tuple

  • This video covers the following points:
    Create a Metaspace.
    Create a Space.
    Add a tuple using Designer.
    Add a tuple using
    There is also quite a bit of idle filler in the video, which you can watch if you come across it or simply skip.

TIBCO Activespaces – Prerequisites

FOR WINDOWS 32 BIT OS.

  1. Pre-requisites
    • jdk-8u25-windows-i586
    • TIB_rv_8.3.2_win_x86_vc8
  2. Binaries Used
    • TIB_activespaces_2.1.2_win_x86
    • TIB_activespaces-addon_2.1.2_win_x86
    • TIB_bwpluginactivespaces_2.0.0_win_x86
    • Spacebar-2.1-win32.win32.x86
  3. Installation Directory
    • E:\tibco
  4. Resources Tested on
    • 32 bit architecture
    • 4 cores
    • 24 GB RAM
    • 40GB HDD
  5. OS Details (Installed with administrator user)
    • Server: <ip-address>
    • Administrator user password: <password>

TIBCO ActiveSpaces

Refer to this document: TIBCO ActiveSpaces Usage and Use-cases.
Note: The installation video will be uploaded soon.