By Peter Holditch
November 6, 2003 03:24 PM EST
This month's article is again inspired by an interesting design discussion posted on the weblogic.developer.transaction newsgroup. (Ever get the feeling I'm running short of inspiration? Ideas for new articles always welcome!)
Since the problem described is a common one in transactional design, I thought it might be valuable to review the design, the problems with it, and some solutions.
The problem was stated on the newsgroup thus:
We have a Session Bean method (with a
"Required" TX attribute) that creates an entity
Bean, and then fires a JMS message that indicates
that it was created. There is an MDB that listens
for this message. When it hears it, it looks up the
entity bean.
The problem is that sometimes the lookup of this
entity bean will throw an ObjectNotFoundException.
We have ensured that the JMS message firing
uses the transaction context of the method, so that
the creation of the entity bean and the firing of the
message all takes place within the same transaction
(we did this by using the "javax.jms.TopicConnectionFactory", and using a JMS session that was not transacted). Also, we have verified that the
entity that gets created exists in the database (at least it does sometime after the lookup by the MDB fails).
So, what's going on here? The creation of the entity bean and the sending of the JMS message are in the same transaction, and we know therefore that the message will not be dispatched until the transaction is committed, so why can't the logic in the MDB see the new entity bean? The transaction manager is broken, right?
Well, no. In order to understand this situation, you need to take a step back and think about what the transaction manager's implementation actually does. From the 10,000 ft level, things should be working: the devil must be in the detail... Let's go diving!
Let's Dive for the Devil
A transaction encompasses the entity creation
and the sending of the JMS message, so they will
complete as an atomic unit - either message sent
and entity created or total failure, that's what the
transaction manager is giving us. However, from
an implementation perspective, we need to look
more closely at exactly when the transaction is
complete. It can't be when the application (or the
EJB container) calls commit - we know that this
just initiates a set of dialogues between the transaction
manager and the resource managers,
which is bound to take some time. The completion
will happen some time later, when these dialogues
are done. Diving even deeper, you may
recall that these dialogues fall into two categories
- the two phases of the transaction (it's called
two-phase commit, after all). Looking at the XA
specification, you'll find that once a resource
manager has replied affirmatively to a prepare, it
has undertaken to make whatever
updates were in the scope of the transaction at
some time in the future. Now we're getting somewhere
- we've found a period of time over which
things will be happening behind the scenes;
maybe these asynchronous things are causing
our problem. From a high-level perspective,
given the XA guarantee, the transaction can be
assumed to be complete once the prepare calls
have all succeeded. From an implementation
level, until the commit calls are processed by all
the resource managers we cannot be certain that
we will be able to access the updated database
state, and we have no way of knowing exactly
when these commits will happen - commit processing
is going on in the background and the
time it takes to perform a commit will vary
depending on factors such as system load,
resource-manager locality, the order the transaction
manager sent out commit messages in, and
so on. (This ignores completely the possibility of
failure; imagine the database manager crashing
after a prepare. The commit can't be processed
until it is brought back online. How long will that
take? Well, it depends on how long it takes to fix
the problem - if the crash is caused by a faulty
power supply in a machine, then it could
take days waiting for a spare part. This
whole parenthetic discussion then leads
into one of my favorite subjects, the transaction
abandonment timeout.)
So, the moral of this story is that you cannot rely on an atomic transaction being truly atomic in time - it will complete as a logically atomic unit, sure, but there will always be some amount of timing jitter involved in making its results visible across all the resources it touched.
Danger: Mixed Synchronicity!
It is clear now what the problem is with
the design stated on the newsgroup. The
assumption has been made that this asynchronous
transaction processing doesn't
happen. A race has been set up between the
JMS and the database resource managers to
commit the transaction. When JMS wins,
the message-processing logic assumes the
database has committed too, but it hasn't -
the commit processing is still going on in
the background, and the ObjectNotFoundException
is thrown.
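The race can be reproduced in miniature with plain Java. The sketch below is only a toy model under stated assumptions: `Database`, `MessageQueue`, and `twoPhaseCommit` are hypothetical stand-ins, not JTA or WebLogic APIs, and the coordinator commits the resources strictly one after another so that the ordering is deterministic. A real transaction manager gives no ordering guarantee at all, which is exactly the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of the commit race; every name here is hypothetical.
class TwoPhaseRaceSketch {

    /** A resource manager that keeps work pending until commit() makes it visible. */
    interface Resource {
        boolean prepare();   // phase 1: promise that a later commit() will succeed
        void commit();       // phase 2: make the pending work visible
    }

    /** Stand-in for the database: rows are invisible until phase 2 runs. */
    static class Database implements Resource {
        private final Map<String, String> pending = new HashMap<>();
        private final Map<String, String> visible = new HashMap<>();
        void insert(String key, String value) { pending.put(key, value); }
        boolean exists(String key) { return visible.containsKey(key); }
        public boolean prepare() { return true; }
        public void commit() { visible.putAll(pending); pending.clear(); }
    }

    /** Stand-in for JMS: committing hands each message to the listener at once. */
    static class MessageQueue implements Resource {
        private final List<String> pending = new ArrayList<>();
        private final Consumer<String> listener;
        MessageQueue(Consumer<String> listener) { this.listener = listener; }
        void send(String body) { pending.add(body); }
        public boolean prepare() { return true; }
        public void commit() { pending.forEach(listener); pending.clear(); }
    }

    /** Runs both phases over the resources, committing in list order. */
    static void twoPhaseCommit(List<Resource> resources) {
        for (Resource r : resources) {
            if (!r.prepare()) throw new IllegalStateException("prepare failed");
        }
        for (Resource r : resources) {
            r.commit();      // commits land one at a time, never simultaneously
        }
    }

    /** Returns whether the listener could see the entity when the message arrived. */
    static boolean listenerFoundEntity(boolean commitJmsFirst) {
        Database db = new Database();
        boolean[] found = new boolean[1];
        MessageQueue queue = new MessageQueue(id -> found[0] = db.exists(id));
        db.insert("order-1", "new");
        queue.send("order-1");
        List<Resource> order = commitJmsFirst
                ? Arrays.<Resource>asList(queue, db)
                : Arrays.<Resource>asList(db, queue);
        twoPhaseCommit(order);
        return found[0];
    }

    public static void main(String[] args) {
        System.out.println("JMS commits first: found=" + listenerFoundEntity(true));
        System.out.println("DB commits first:  found=" + listenerFoundEntity(false));
    }
}
```

When the queue is committed first, the listener runs before the database's phase 2 and the lookup fails; reverse the order and it succeeds. Both runs are the same logically atomic transaction.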
So much for the theory; how can we fix the design? There are (as always in architecture of this kind) a few options, ranging from the hacky workaround to the elegant rearchitecture.
The hacky workarounds involve coding around the problem, either with JMS message birth times or with defensive coding in the MDB. If the code that creates the JMS message sets the birth time to some point in the future, the JMS system will introduce a delay into the processing path before it releases the message. This delay should give the commit processing enough time to complete. That's a great theory as far as it goes, but how long should the delay be? As I already said, the required window will depend on system load and physical architecture, and it might vary radically in some failure conditions. Using this method in a production system will sooner or later lead to sporadic failures as loads and deployments vary, and will incur a support cost and a reliability loss.
So, on to the defensive coding. A simple-minded approach might be to roll back the MDB's transaction, have the message redelivered, and try again; or simply to try again after a pause in the MDB logic itself. That's all well and good, but what if the scenario isn't object creation, but modification? Now you can't be certain that the data you're updating is the current data (at least, you'll not be certain that you're certain - it depends on the database's locking strategy); to code around this, you add a version field to the object and implement some kind of optimistic concurrency so that the MDB can wait until it's sure it's operating on the right version of the data.
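To make the defensive option concrete, here is a minimal sketch of a bounded retry that also checks a version field. It assumes the sender puts the expected version into the message; `Entity`, `awaitVersion`, and the `lookup` function are hypothetical names, and a `null` result stands in for the ObjectNotFoundException a real finder would throw.

```java
import java.util.Optional;
import java.util.function.Function;

// Sketch of the defensive lookup; all names are hypothetical.
class DefensiveLookupSketch {

    /** Minimal entity carrying the version field used for optimistic concurrency. */
    static final class Entity {
        final String id;
        final long version;
        Entity(String id, long version) { this.id = id; this.version = version; }
    }

    /**
     * Retries the lookup until the entity exists AND has reached the version
     * named in the message, or until the attempts run out.
     */
    static Optional<Entity> awaitVersion(Function<String, Entity> lookup, String id,
                                         long expectedVersion, int maxAttempts,
                                         long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Entity e = lookup.apply(id);   // null models "not found yet"
            if (e != null && e.version >= expectedVersion) {
                return Optional.of(e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);   // give the commit time to land
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();   // caller should roll back so the message is redelivered
    }
}
```

Note that the sketch still has to pick `maxAttempts` and `pauseMillis` out of thin air, which is the same guessing game as choosing a birth time delay.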
The fact that you're doing all this frantic coding to work around this issue should be ringing alarm bells - clearly the architecture of the application does not mesh well with the architecture of the infrastructure. The best solution is to get to the bottom of why...
It's Not a Mesh, It's a Mess!
JMS is all about allowing processes to run
asynchronously with respect to one another.
JTA is all about making updates that logically
execute atomically, which in turn
implies synchronously (or as near synchronously
as reality allows). In this scenario, an
attempt is being made to use JMS as a synchronous
calling mechanism - the operations
on the data are clearly related to one
another (synchronous) but for some reason
we have interposed an asynchronous messaging
system into the processing flow.
Maybe the most elegant solution would be
to implement the next processing step as an
Entity EJB, call it via RMI, and have it participate
in the original transaction. All the
updates would be visible to all the processing
steps then, since the updated data is visible
before the commit within the transaction.
But what if there's another requirement
that necessitates the asynchronous
path to the "stage 2 processing"? Well, wrap
the Entity EJB you created in this use case in
an MDB facade and the logic can then be
executed synchronously or asynchronously,
depending on the use case (even better,
maybe the "stage 2 Entity" only offers a
local interface).
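The shape of that design can be sketched without any container. In the toy model below, `Tx` is a hypothetical stand-in for a transaction whose uncommitted writes are visible to its own participants, `Stage2` is the single piece of processing logic, and `Stage2MessageFacade` plays the part of the MDB wrapper. None of these are EJB or JMS types; a real implementation would rely on the container for transaction propagation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "one piece of logic, two entry points"; all names are hypothetical.
class Stage2FacadeSketch {

    /** Toy transaction: its own writes are visible to participants before commit. */
    static class Tx {
        private final Map<String, String> committed;
        private final Map<String, String> uncommitted = new HashMap<>();
        Tx(Map<String, String> committed) { this.committed = committed; }
        void put(String id, String state) { uncommitted.put(id, state); }
        String get(String id) {
            return uncommitted.containsKey(id) ? uncommitted.get(id) : committed.get(id);
        }
        void commit() { committed.putAll(uncommitted); uncommitted.clear(); }
    }

    /** The stage 2 logic itself, written once. */
    static class Stage2 {
        void process(Tx tx, String id) {
            String state = tx.get(id);
            if (state == null) throw new IllegalStateException("not found: " + id);
            tx.put(id, state + "+stage2");
        }
    }

    /** Asynchronous entry point: starts its own transaction, as an MDB would. */
    static class Stage2MessageFacade {
        private final Stage2 stage2 = new Stage2();
        private final Map<String, String> store;
        Stage2MessageFacade(Map<String, String> store) { this.store = store; }
        void onMessage(String id) {
            Tx tx = new Tx(store);
            stage2.process(tx, id);
            tx.commit();
        }
    }

    /** Synchronous path: stage 1 and stage 2 share a single transaction. */
    static String createAndProcess(Map<String, String> store, String id) {
        Tx tx = new Tx(store);
        tx.put(id, "created");
        new Stage2().process(tx, id);   // sees "created" before anything is committed
        tx.commit();
        return store.get(id);
    }
}
```

On the synchronous path there is no window in which stage 2 can miss the new entity, because it reads through the same transaction that created it. The facade reuses the same logic for callers that genuinely need the asynchronous route.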
As a parting observation, this kind of tricky asynchronous corner case is not at all uncommon in building transactional systems - in fact, it's more like the norm. TP systems like Tuxedo, CICS, and others all offer facilities analogous to the design pattern I just described to handle this kind of thing. So does the BEA WebLogic Workshop framework - it builds in this style atop J2EE and provides a natural, event-driven programming model while taking care of this kind of implementation detail in the framework, again demonstrating the power and potential of using such a framework to simplify implementation while increasing reliability.
REPRODUCED WITH PERMISSION FROM BEA SYSTEMS.
Copyright © 2003 SYS-CON Media, Inc. — All Rights Reserved.
Peter Holditch is a senior presales engineer in the UK for Azul Systems. Prior to joining Azul he spent nine years at BEA Systems, starting as one of their first Professional Services consultants in Europe and finishing up as a principal presales engineer. He has an R&D background (originally having worked on BEA's Tuxedo product) and his technical interests are in high-throughput transaction systems. Off the pitch, Peter likes to brew beer, build furniture, and undertake other ludicrously ambitious projects - but (generally) not all at the same time!