TIBCO Activespaces – Best Practices – 3. Space Definition

Space Definition

Schema Considerations

To get the fastest speed out of ActiveSpaces you might need to tailor the way you store information in it depending on how you are going to access it.

  • The fastest and most scalable way to access data in ActiveSpaces is through the use of the key operations: put/get/take.
    • When reading a record, get is always be faster than creating a browser and doing a query.
  • You can store structured data, objects or documents in serialized format into blob fields
    • You can use a Tuple object as a map of fields, and then store the serialized Tuple in a blob field; Tuple serialization is very fast and optimized.
    • Some very large objects (for example: XML documents, JSON documents, and so on), can be dramatically reduced in size by using standard compression techniques. You can fit many more of these objects into memory if you store them in compressed format.
  • If you have a Java or .NET object that you want to store in ActiveSpaces:
    • You can implement your own very fast serialization to Tuple routine:
      • Implement a toTuple() method and a constructor from a Tuple, and map your variables to fields
      • You can serialize non-primitive variables to blob types or use serialized tuples into blob fields
    • You can also do queries, by using the browser/listener calls, but you can only query on one space at a time.
      • Storing data in a serialized blob does not necessarily mean you cannot query it:
        • Just extract from those structures the values that you are going to be querying on and store them as fields in the Tuple as well as contained in the serialized object.
        • When implementing the toTuple() serialization mentioned above, you can then implement any “value extractor” you want to extract values you may want to later query on.
        • If data is exposed as top level Tuple fields, you can then have indexes on those extracted fields.
      • All fields (including key fields) can be nullable.

Indexing

The ActiveSpaces query engine can leverage indexes to speed up query processing. On larger datasets indexes can improve response times significantly.

  • Index facts:
    • You can define any number of indexes you want.
    • Indexes can be on a single field or on multiple fields (composite).
    • You can have more than one index per field.
  • However indexes are not free:
    • Indexes will cost you between 400 and 800 bytes of memory per record being indexed.
    • Indexes must be maintained, and they result in increased CPU usage when processing write operations.
  • There is always an index defined on the key fields; by default, it is a HASH index, but you can change it to TREE.
    • For example, if you have queries on a subset of the key fields, a key field index of type TREE could be leveraged for some of those queries.
  • You can get details about how the query was planned and whether indexes were used by the seeders or not using the Browser’s getQueryLog()
  • HASH indexes have the following advantages and disadvantages:
    • They are fastest for equality tests (for example, “field = value”).
    • Slightly less memory is required per entry indexed, but almost all of the entries will have their own index entry.
    • The same amount of memory is required per entry regardless of the number of fields in the index.
  • TREE indexes have the following advantages and disadvantages:
    • They are fastest for range tests (for example, “field > value”).
      • However, TREE indexes are still leveraged to accelerate equality tests as well: if you are going to do both range and equality testing in your queries you only need to define a single TREE index in most cases.
    • More memory is required per entry, but if the value set has some structure to it (for example continuous ranges) the TREE index can end up being more memory efficient than HASH.
    • The amount of memory increases with the number of fields composing the index.
    • Composed TREE indexes can often also be leveraged for tests on individual or subsets of the fields composing the index.
      • For example, a TREE index on fields “A,B” could also be leveraged for “A>value” or “B>value”, or “A=value”, or “B=value” and combinations thereof.
    • The order of the list of fields in a composed TREE index can have an influence on performance (notably write operations).
      • To be more efficient, if you have a TREE index on fields A and B and you know that the number of distinct values of “A” > the number of distinct values of “B, ”then you should define the index fields in the order “A”,”B” rather than the other way around.

Space Attributes

  • See the “Fault-tolerance and Persistence” section for information on the replication and persistence attributes.
  • Min Seeders. If the number of seeders for the space goes below this threshold then the space stops being in READY state, and all operations on the space are suspended.
    • If you are using shared-nothing persistence, you may want to use this attribute to control when applications will not be able to make any more modifications to the data that could then not be recovered.

For example if you have four seeders with a degree of replication of 1, set min seeders to 3.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s