Friday, April 21, 2006

AJAX hammers servers?

I read a question posed on James Governor's blog and Tim Bray's always insightful response to it, yet was left feeling unfulfilled. It led me to this blog, which provided some further information. Perhaps it is the jetlag, or perhaps I just haven't abused enough caffeine today, but I feel compelled to write more.

The author does not define several things, which makes his statements difficult to verify. First of all, there is no such thing as a spec for the Web 2.0, so his claim that RIAs are a crucial aspect of it is a flawed assumption from a pragmatic standpoint, although somewhat orthogonal to the question asked.

So let's look at what AJAX does and why it is favored. In the past, if you wanted a webpage that displayed up-to-date information, you had to force the page to refresh itself every few minutes or prompt the user to do so. Since HTTP is stateless, that means firing off an HTTP GET request every few minutes to retrieve the entire page. The easiest way to do this was the meta refresh tag of HTML 4.0 Transitional. Since GET requests are idempotent, the server dutifully returns the full page every time it receives one. Tim and I had a conversation about this back in 1999-2000, when he called it TAXI (the father of AJAX). The question posed was "wouldn't it be much more efficient to only refresh parts of a page rather than the whole page?" Out of that necessity, AJAX was born. It was actually Microsoft that cemented it by putting the XMLHttpRequest object into IE.
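For the sake of illustration, here is a minimal sketch of the difference. The /stock-price endpoint and the 30-second interval are made up for the example; the point is simply that the partial update asks the server for a few bytes instead of the whole page.

```typescript
// Old approach: force the whole page to reload on a timer.
//   <meta http-equiv="refresh" content="30">
// Every byte of markup, styling and boilerplate comes back over the wire,
// even though only one number on the page actually changed.

// AJAX approach: ask the server for just the piece of data that changes.
function refreshStockPrice(): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", "/stock-price?symbol=ADBE", true); // asynchronous GET
  xhr.onreadystatechange = () => {
    if (xhr.readyState === 4 && xhr.status === 200) {
      // Update only the element that shows the price; the rest of the page
      // (layout, images, scripts) is never re-requested.
      const el = document.getElementById("price");
      if (el) {
        el.textContent = xhr.responseText;
      }
    }
  };
  xhr.send();
}

// Same freshness as the meta refresh, a fraction of the bytes per request.
setInterval(refreshStockPrice, 30000);
```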

Now let's separate business needs from technology. If you have a business requirement to provide current, up-to-date information to your end users via the internet, you are going to do it, regardless of the underlying technology and the cost to your servers. From a purely pragmatic standpoint, one should want to do this in the most efficient manner possible. There are several models available.

The first is that the server "pushes" information to clients when some event happens (perhaps a stock price changes). The problem with this is that it often means the server has to dispatch a large number of concurrent messages when that event occurs, even if the clients themselves are not requesting it at that moment. Suboptimal - this can cause server overload, scare small children and bruise fruit.
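To make the fan-out concrete, here is a rough sketch of a push server (written Node-style for brevity, which is admittedly anachronistic for 2006, when this would have been a Comet-style hack). The single broadcast() call is the part that has to write to every connected client at once.

```typescript
import * as http from "http";

// Every client that connects holds a long-lived response open.
const subscribers: http.ServerResponse[] = [];

http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/event-stream" });
  subscribers.push(res);
  // Drop the client from the list when it disconnects.
  req.on("close", () => {
    const i = subscribers.indexOf(res);
    if (i !== -1) subscribers.splice(i, 1);
  });
}).listen(8080);

// When the event fires (say, a stock price change), the server must write
// to every open connection at the same moment -- that is the spike.
function broadcast(price: number): void {
  for (const res of subscribers) {
    res.write(`data: ${price}\n\n`);
  }
}
```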

The second model is the client-side pull. This usually makes more sense, given that the client controls the request frequency, while the service's architects determine the content size and the policies for requests. This makes it much easier on the server to balance its outbound messages, since not all clients are likely to make their requests at the exact same time.
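A small sketch of that idea, reusing the made-up /stock-price endpoint from above: each client adds a bit of random jitter to its polling interval and only schedules the next request after the previous one completes, so requests spread themselves out and a slow server naturally throttles its own load.

```typescript
const BASE_INTERVAL_MS = 30000; // polling policy; whatever the service dictates

function scheduleNextPoll(): void {
  const jitter = Math.random() * 10000; // up to 10 seconds of spread per client
  setTimeout(() => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/stock-price?symbol=ADBE", true);
    xhr.onreadystatechange = () => {
      if (xhr.readyState === 4) {
        if (xhr.status === 200) {
          const el = document.getElementById("price");
          if (el) el.textContent = xhr.responseText;
        }
        // Schedule the next poll only after this one finishes, so clients
        // back off automatically when the server slows down.
        scheduleNextPoll();
      }
    };
    xhr.send();
  }, BASE_INTERVAL_MS + jitter);
}

scheduleNextPoll();
```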

Reloading only part of a page, or even just the data for that part of the page, is much more efficient than reloading the entire page with both the data and the presentation aspects of every component on it. Therefore, I would state that AJAX (or other AJAXian methodologies) is probably the most efficient way to handle the business requirements placed on the web today.

If you were to ban AJAX from the web, we would have to revert to full page reloads, which would certainly be more bandwidth- and processor-intensive than the partial updates AJAX enables.

My opinion - online gamers, porn surfers and MP3 downloaders are all more likely to cause scalability problems than AJAX developers are. Carefully thought-out architecture should be used where possible. AJAX solves more problems than it creates.

Duane

Wednesday, April 19, 2006

Circumventing PDF with gmail? Not!!

I was recently amused by reading a blog from a group who apparently defeated PDF's DRM system by using Gmail's "convert to HTML" option. I nearly fell off my chair when I read the claim that "(it) works regardless of the file's usage restrictions..". Yes - under certain circumstances you can gain access to the text or other components of a PDF document that has policy protection on it, but *only* if the person applying the policies set them to allow this type of access AND did not encrypt the PDF. Keep in mind that PDF is a completely free, open and available standard that anyone can implement, and there are several third-party SDKs for manipulating PDF documents. Before you read the blog above, it is extremely helpful to understand how the encryption and DRM mechanisms work.

In general, if you do not want someone other than the intended recipient to view a PDF, you should encrypt it. By default, the encryption level for compatibility with Acrobat 5.0 and later is 128-bit RC4. Encrypting the contents of a PDF with a strong key means there is no way Gmail or any other application can crack it open by brute force. The PDF is turned into cipher text that is completely incomprehensible to anyone without the key to open it. I am so certain of this that I will provide $500 USD to the first person who can open this document within one year.
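To illustrate the principle (and only the principle - this is generic symmetric encryption using Node's crypto module, not the PDF spec's actual encryption dictionary or key-derivation scheme), here is what "cipher text that is incomprehensible without the key" looks like:

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "crypto";

// A random 256-bit key: 2^256 possibilities. Enumerating them by brute
// force is not practical on any hardware, which is the whole point.
const key = randomBytes(32);
const iv = randomBytes(16);

const cipher = createCipheriv("aes-256-cbc", key, iv);
const ciphertext = Buffer.concat([
  cipher.update("The document contents", "utf8"),
  cipher.final(),
]);
console.log(ciphertext.toString("hex")); // gibberish without the key

// Only someone holding the key can turn the cipher text back into the original.
const decipher = createDecipheriv("aes-256-cbc", key, iv);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
console.log(plaintext.toString("utf8")); // "The document contents"
```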

A person encrypting a PDF document has several options. First, you can choose compatibility with earlier versions of Acrobat (5, 6) or jump straight to Acrobat 7.0 and higher. If you choose to encrypt it for Acrobat 7, the default encryption method is AES, which is much harder (read: effectively impossible) to crack using brute force.



You can also opt to encrypt all of the document contents, or to leave the metadata unencrypted. The latter is useful should you want the document to remain searchable based on its metadata. Note the lower section of the screenshot above - by default, the box is checked to allow text access to the document. If you leave this selected, some PDF applications can access the text; if you don't want this, deselect the option. After setting all of the options and pressing next, you will still be given a generic warning that certain non-Adobe products might not enforce this document's policies. Note that if you do not select "require a password to open the document", the usefulness of encrypting it is moot. Others will still not be able to copy the document using the text copy tool or Ctrl-C, but other means can be employed.

To summarize so far, Acrobat has DRM capabilities to limit the following interactions with documents:

1. ability to disable printing
2. ability to disable cut and paste
3. ability to disable screen capture (Print Screen)
4. ability to disable local file saving
5. ability to disable accessibility (programmatic text access for screen readers)
6. ability to make a document no longer exist

A person must comprehend the frame and scope of the intended use of each of these and their built-in restrictions. PDFs are like music - if you can render it once, it is possible to capture it and render it again. Even if we figured out a way to prevent all third-party screen-scraping software from capturing what you see on a computer screen, someone who has both access to the document for a single view AND the intent to distribute it further can simply take a digital photo of their computer screen to circumvent all of these. There is simply no way to stop someone who is intent on doing this using 1-5 above.

Another option is to place a dynamic watermark on the page, perhaps stating the user's name and address in bold gray text across the document. This too can be defeated if one took a screenshot of the document and used a great tool like ... err, "Adobe Photoshop" to take care of that nasty watermark. I am guessing the magic wand tool is your best friend here ;-)

So how can you protect a PDF? If you really want to make it secure and also track the user's interaction with it, you would be wise to use Adobe Policy Server. The policy server uses a model of persistent DRM that follows the document everywhere it goes. If you feel the document is out of control and you want to stop it, you can simply "destroy" the document, which will cause it to fail to decrypt itself when someone opens it. Is there a way around that? Sure - sneak into the office of the person who made the policy, install a tiny pinhole camera near their desk and capture their authentication.

See what I am getting at? No matter what you do, there is a way around it if someone is really intent. The easier method is "social engineering" rather than brute force.

So here is a challenge. Take this document here (link to APS protected document) and try to render it with Gmail (or any other method). I will pay $500 USD to the first person who can show me the unencrypted content of this document within one year of this post.

How would I do it? I would probably try to lure myself into providing a password to a site that offered me some form of membership, and hope that I was lazy enough to reuse the same password for this document. D'oh!! Not gonna work - I typed a random phrase of about 13 characters to encrypt this using AES.
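For the curious, the back-of-envelope arithmetic on why a random 13-character phrase resists guessing (assuming the characters are drawn from the roughly 95 printable ASCII characters - my assumption for illustration, not a disclosure about the actual phrase):

```typescript
// Keyspace for a random 13-character passphrase over ~95 printable characters.
const alphabet = 95;
const length = 13;
const keyspace = Math.pow(alphabet, length); // ~5.1e25 possibilities

// Even at a very generous one billion guesses per second:
const guessesPerSecond = 1e9;
const seconds = keyspace / guessesPerSecond;
const years = seconds / (60 * 60 * 24 * 365);
console.log(years.toExponential(2)); // on the order of 1.6e9 years
```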

Good luck!

Tuesday, April 18, 2006

The Web 2.0

I now feel compelled to write on this subject after holding my breath and counting to 10 several times. I recently read YABA* about building a "Web 2.0 Meter". Not a bad idea *if* you had some sort of defined criteria that those being judged could adhere to. Another site claims to be a Web 2.0 Validator.

Enough! I can even hear Homer Simpson saying “D’oh!!” when he thinks about this. Time to rant a bit from Logic 101. You cannot measure something with two independent “meters” without some distinct set of metrics around the subject. Sorry folks, it is that simple. I would like to point out that the Web 2.0 Validator site had the sense to state the rules it uses, and that the gist of the article at O'Reilly was not about the Web 2.0 meter. It was a realization that the concepts we have come to “associate” with the Web 2.0 were really the Web 1.0’s original goals. GAH!!! I just had an unpleasant realization that now I am trying to quantify the Web 2.0. To solve this problem, I think we have to look to the past as well as the future (yeah yeah – so what does that rule out? Thinking about the exact present moment?). The O'Reilly folks are pretty smart IMO, so please don't take this as some petty stab - more of a friendly prod :-)

If “Web 2.0” is to be used as a catch-all term for where we are going, let's put some substance behind it. If not, it will suffer the same symptoms as SOA and Web Services - both very meaningful to most people, just with differing semantics. In fact, the lack of clarity around SOA led a group of almost 200 people to get together and write a formal Reference Model for SOA under the auspices of OASIS. Similarly, a group got together within the W3C and worked on a Reference Architecture for Web Services. I am proud to state that I worked on both projects.

So what can be done to put substance in the Web 2.0?

1. Write an abstract reference model to show the components of the Web 2.0; and
2. Create sets of high-level abstract patterns, mid-level patterns and low-level idioms to illustrate what is really meant by the Web 2.0; and
3. Create reference architectures (plural and somewhat generic) for all components of the Web 2.0, describing their externally visible properties and their relationships with other components.

An example would be to illustrate the syndication-subscription pattern using some architectural patterns template. The abstract notion is that subscribers notify a syndication component of their wish to receive content. When the syndication component has content it is ready to push out, it configures a list of recipients based on some criteria then proceeds to push the content out. A lower level idiom could show this implemented using Apache components, perhaps even with options for content formatting based on device, reliable messaging protocols, security and end user authentication with a persistent security model for the content itself.
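To make that less hand-wavy, here is a tiny sketch of the abstract pattern itself (the Subscriber and SyndicationFeed names are mine, purely for illustration - an implementer's black box could look completely different):

```typescript
// Abstract syndication-subscription pattern: subscribers register their
// interest; when the syndicator has content ready, it builds a recipient
// list from some criteria and pushes the content out.
interface Subscriber {
  id: string;
  topics: string[];
  receive(content: string): void;
}

class SyndicationFeed {
  private subscribers: Subscriber[] = [];

  subscribe(s: Subscriber): void {
    this.subscribers.push(s);
  }

  publish(topic: string, content: string): void {
    // Select recipients based on some criteria -- here, a simple topic match.
    const recipients = this.subscribers.filter(s => s.topics.includes(topic));
    for (const r of recipients) {
      r.receive(content);
    }
  }
}

// Usage: one reader subscribes to "security" items and gets notified.
const feed = new SyndicationFeed();
feed.subscribe({
  id: "reader-1",
  topics: ["security"],
  receive: content => console.log("reader-1 got:", content),
});
feed.publish("security", "New post: Circumventing PDF with gmail? Not!!");
```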

The cool thing about this approach is that it still gives each and every implementer the freedom to build their own black-box components whilst preserving a common layer of understanding. It also provides documentation about what is really meant; granted, those who cannot distinguish abstract from concrete may still be confused.

Of course, to do this you would require an architectural patterns meta-model and template that allowed you to go from the very abstract to the very concrete, but I think I know where one is that can be donated to some organization.

Why should this be done? Simple – without this, “Web 2.0” is nothing more than a marketing term. Sure – several people will say “no – it means X and exactly X”, but the chances of Boolean Y = eval(personA.X == personB.X) evaluating to “1” in every instance are very low IMHO.

The Web 1.0, aka the “internet”, has achieved a common definition though. Even though it is not concisely written, there is general consensus on what a web server does, what the layered wire protocols do, how security works and how people interact with websites (via browsers). If someone says “this server is internet enabled”, people take that to mean it can accept HTTP requests and return content in compliance with the requester's requirements.

Sorry, folks. I just don’t believe that the Web 2.0 will inherit an implied reference model the way the Web 1.0 did. The culprit for this is the Web 1.0 as it exists – it allows anyone, almost anywhere, to write what they think the Web 2.0 is and share it with others. Also, unlike the Web 1.0, the Web 2.0 is not mandatory. The basic components of the Web 1.0, such as HTTP, TCP/IP, SMTP, MIME, HTML etc., were all mandatory, therefore it was fairly easy to draw a box and state: this is the Web 1.0. One could even throw in some common non-mandatory components (scripting languages such as ASP, VBScript, JavaScript and ActionScript, plus CSS, XML et al) and still make a solid statement.

A Reference Model for the Web 2.0 might want to declare some form of compliance and conformance statements. Such might be a weighted test, or it could be a bar that you must pass. Regardless, before this exists, what is the point of building “validators” and “meters”? Harrumph – end of rant. Take it all with a grain of salt – the O'Reilly folks are smart and I’m sure we’ll see something soon ;-)

If not – does anyone feel compelled to take a stab at a formal definition? I will gladly jump in the fray and donate my time to help.

*Yet Another Blog Article – in case you didn’t figure it out ;-)