TIBCO ActiveSpaces – Best Practices – 5. Joining as a Seeder or a Leech?

Things to remember:

  • There is no change in functionality regardless of whether your application seeds or acts as a leech on a space.
  • However, there is an impact on the resource usage of your application: a seeder contributes memory to store records and CPU cycles to service requests to read/write/consume/process those records.
  • Seeding or leeching is a per-space state; an application can be a seeder on some spaces and a leech on others.
  • There is, however, a possible difference in the latency of space operations: an application that is a leech on the space incurs on the order of one “network round trip” for every operation on the space, whereas an application that seeds on the space completes some operations at “in-process speed” without incurring a network round trip.
  • Leveraging this “side effect” of a seeding application being able to perform some space operations at “in-process speed” is key to getting high throughput in some scenarios.
    • In some cases (with a small dataset, many more reads than writes) you can leverage the replication degree of ALL (where all the records are replicated on all of the seeders) to ensure that reads on any key are always serviced at the lowest latency of “in-process speed.” For example, you can do this for lookup data or for some types of joins.
  • When a seeder joins or leaves a space, this triggers redistribution and re-replication of some of the records.
  • ActiveSpaces is designed to minimize the number of records that need to be redistributed and re-replicated by using a consistent hashing algorithm to distribute the records:
    • Redistribution and re-replication happen “in the background” and are spread over time to minimize their impact:
      • There is no interruption in service of reads (or writes) during redistribution and re-replication.
      • There is minimal impact on the latency of ongoing operations during redistribution and re-replication.
      • Because of this, redistribution and re-replication will not happen instantly but rather over time.
    • However, even though the impact of redistribution and re-replication is minimized, it is still there:
      • CPU and network utilization increase during redistribution and re-replication.
      • Some operations might experience a latency increase of a few milliseconds while redistribution and re-replication are happening.
    • It is therefore recommended to keep the number of seeders of a space relatively stable over time:
      • If an application is only intermittently connected to the metaspace, it is probably not a good candidate for being a seeder.
  • When a process is a seeder on a space, it is very easy to perform queries or iterate over just the subset of the records that are seeded by the process.
  • These queries and iterations are very fast (and scale very well) because they happen only locally and at “in-process speed”.
    • Simply specify DISTRIBUTION_SCOPE_SEEDED in your BrowserDef or ListenerDef; there is no need to change anything else in the code.
      • If the application is connected as a remote client, DISTRIBUTION_SCOPE_SEEDED means the seeding scope of the proxy metaspace member the application is connected to.
    • Consider leveraging DISTRIBUTION_SCOPE_SEEDED to distribute processing over all or most of the records in the space (much like the map phase of Map/Reduce), using remote invocation to trigger execution of your code in parallel on all of the seeders of the space.
  • Creating a Listener or an EventBrowser with DISTRIBUTION_SCOPE_SEEDED means you only receive events about changes to the records the process seeds.
    • In that case you can get two extra event types:
      • onSeed: because of redistribution (seeder leaving) you are now seeding the record.
      • onUnseed: because of redistribution (seeder joining) you are no longer seeding the record.
      • In this case, onPut events are also an implied onSeed, and onTake events are an implied onUnseed.
      • You can use those events to keep track of which records you seed in your application logic.
    • You can use this to create “event-driven map/reduce” style processing:
      • Rather than the processing being batch oriented and triggered by some process using remote invocation, the processing of each record when it is created or updated (or even consumed) is triggered automatically.
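
The bookkeeping described above, where onPut and onSeed events add a record to the set the process seeds and onTake and onUnseed events remove it, can be sketched in plain Java. This is only an illustration under stated assumptions, not ActiveSpaces API code: the SeededKeyTracker class, the EventType enum, and the onEvent method are hypothetical names invented for this example. In a real application the notifications would arrive through a Listener or EventBrowser created with DISTRIBUTION_SCOPE_SEEDED.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of ActiveSpaces): tracks which record keys
// the local process currently seeds, based on the four event kinds
// described in the text.
class SeededKeyTracker {

    // Hypothetical stand-in for the event kinds a seeded-scope listener sees.
    enum EventType { PUT, TAKE, SEED, UNSEED }

    private final Set<String> seededKeys = new HashSet<>();

    // PUT implies SEED and TAKE implies UNSEED, so each pair is handled alike.
    void onEvent(EventType type, String key) {
        switch (type) {
            case PUT:
            case SEED:
                seededKeys.add(key);      // the record is now seeded locally
                break;
            case TAKE:
            case UNSEED:
                seededKeys.remove(key);   // the record is no longer seeded locally
                break;
        }
    }

    // Read-only view of the keys currently seeded by this process.
    Set<String> seededKeys() {
        return Collections.unmodifiableSet(seededKeys);
    }

    public static void main(String[] args) {
        SeededKeyTracker tracker = new SeededKeyTracker();
        tracker.onEvent(EventType.PUT, "order-1");     // created locally
        tracker.onEvent(EventType.SEED, "order-2");    // a seeder left
        tracker.onEvent(EventType.UNSEED, "order-1");  // a seeder joined
        tracker.onEvent(EventType.TAKE, "order-2");    // record consumed
        tracker.onEvent(EventType.PUT, "order-3");
        System.out.println(tracker.seededKeys());      // prints [order-3]
    }
}
```

In an event-driven map/reduce style design, the same dispatch point is where the per-record processing would be triggered, so the tracker and the processing logic stay consistent with what the process actually seeds.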