Wednesday, August 16, 2006

Semaphore's mystery: Background on Cypher Creation

So, as Adobe's Technical Evangelist for Security, I should probably share some background with you Semaphore solvers. One thing that seems more than likely is that the cryptograph is encrypted using sequential "rounds". The concept of rounds is used in most modern cryptographic techniques, such as AES. My earlier post on building AES cyphertext with Rijndael's key algorithms explains the process. For those of you who don't want to wade through the gobbledygook in that blog entry, here are the basics.

The first round takes your normal text and renders it as cypher text using a "key". An example of this might be to take the phrase

"I have solved the San Jose Semaphore"

and encrypt it by moving every character forward (+) 8 positions in the ASCII table, using each character's decimal code as a guide. The resulting encrypted text would be:

"Q(pi~m({wt~ml(|pm([iv(Rw{m([muixpwzm"

You could now employ a second "round" in which each cypher text character is substituted for another using some lookup table or algorithm. In fact, in many second rounds, the key for the first round is actually encrypted as part of the cypher. An example could be to take the string above and swap lower case for upper case, or to substitute each character with the key two places to its left on a QWERTY keyboard.
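Here is a minimal Python sketch of the two rounds described so far. It assumes the round-one shift simply adds 8 to each character code with no wraparound, and that round two replaces each letter with the key two places to its left on the same QWERTY row, wrapping at the row's end; everything that is not a letter key passes through untouched. The function names are illustrative only.

```python
# Round one: move every character `key` places along the ASCII table (no wraparound).
def shift_round(text: str, key: int) -> str:
    return "".join(chr(ord(ch) + key) for ch in text)


# Letter rows of a QWERTY keyboard. Assumption: only letter keys are substituted and
# each row wraps around, so "q" (the left end of its row) maps to "o".
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def keyboard_shift(text: str, offset: int) -> str:
    """Round two: replace each letter with the key `offset` places along its row, keeping case."""
    out = []
    for ch in text:
        row = next((r for r in ROWS if ch.lower() in r), None)
        if row is None:
            out.append(ch)  # digits, punctuation and symbols pass through unchanged
            continue
        new = row[(row.index(ch.lower()) + offset) % len(row)]
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


def two_round_encrypt(plaintext: str) -> str:
    return keyboard_shift(shift_round(plaintext, 8), -2)  # +8 in ASCII, then two keys left


def two_round_decrypt(ciphertext: str) -> str:
    return shift_round(keyboard_shift(ciphertext, +2), -8)  # undo the rounds in reverse order


print(shift_round("I have solved the San Jose Semaphore", 8))  # Q(pi~m({wt~ml(|pm([iv(Rw{m([muixpwzm
secret = two_round_encrypt("I have solved the San Jose Semaphore")
print(secret)
print(two_round_decrypt(secret))  # I have solved the San Jose Semaphore
```

Decryption has to peel the rounds off in reverse order, which is why `two_round_decrypt` undoes the keyboard shift before the ASCII shift.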

The advantage of using two rounds is that the possible keys for each round multiply together, an exponential gain in complexity for those trying to solve the puzzle. Adobe uses AES in most of its LiveCycle products. AES itself applies 10 rounds of transformation to each block at 128-bit key strength (12 or 14 rounds for longer keys), and its 2^128 possible keys are so many that cracking the code by brute force is computationally infeasible. In fact, assuming you could build a machine that could crack one DES key every second by brute force, it would take that machine roughly 149 trillion years to crack a 128-bit AES key. That is why I scoffed in my earlier blog entry at some people who were under the mistaken impression that they could circumvent Adobe's PDF encryption techniques using Gmail. The idea is preposterous!!!
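The brute-force estimate is easy to check with a little arithmetic. This sketch assumes, following the scenario in NIST's AES fact sheet, a machine that recovers a DES key in one second by testing 2^55 keys per second, and that a brute-force search succeeds on average after trying half of the 2^128 possible AES-128 keys.

```python
# Assumed attacker: fast enough to find a 56-bit DES key in one second on average,
# i.e. it tests 2**55 keys per second (half of the 2**56 DES keyspace).
KEYS_PER_SECOND = 2 ** 55
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# On average a brute-force search hits the right key halfway through the keyspace.
expected_tries = 2 ** 128 // 2
years = expected_tries / KEYS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years / 1e12:.1f} trillion years")  # 149.6 trillion years
```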

The linear substitution demonstrated above is, of course, a very poor encryption algorithm. For starters, it is linear, so any patterns in the original text are preserved in the cyphertext. For example, every space in the original text appears as a "(" character. This makes patterns easy to recognize, such as double letters in words (for example, the two "t"s in "pattern" become "||").
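One way to see the problem is to reduce a string to its "shape": number each distinct character in the order it first appears. Because the +8 shift maps every plaintext character to exactly one ciphertext character, plaintext and ciphertext always share the same shape, which is exactly the foothold a pattern hunter needs. A short Python sketch (the helper name `shape` is just for illustration):

```python
def shape(text: str) -> list:
    """Number each character by the order in which it first appears.

    For example "pattern" -> [0, 1, 2, 2, 3, 4, 5]: the doubled "t" is the doubled 2.
    """
    first_seen = {}
    return [first_seen.setdefault(ch, len(first_seen)) for ch in text]


plain = "pattern"
cipher = "".join(chr(ord(ch) + 8) for ch in plain)  # round one: shift +8
print(cipher)                          # xi||mzv
print(shape(plain) == shape(cipher))   # True: the shift preserves every repeat
```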

A much better approach is called "non-linear substitution", whereby each character is substituted with its cipher text character using a dynamic map rather than a static linear map (such as shifting +8 characters on the ASCII chart). There are many ways to build this that keep patterns in the original text from being recognized, and that even introduce confusing patterns for those who want to crack the cipher. In the Semaphore broadcast, the tone of the woman's voice could be a key that acts as a "shift", with the tone serving as an offset on a chromatic scale. A great example of this might be to create a 16 by 16 grid and allow several grid positions for each character being substituted. The chart could look like this:


Now imagine you wanted to substitute the same phrase, "I have solved the San Jose Semaphore". You could take it through the first cypher round and transmit it as a series of string-integer pairs (ignoring space characters for now):

Alpha 9
Delta 13
Lima 7
Lima 2
Delta 9
Oscar 15
Kilo 15
Foxtrot 10
Juliet 8
Kilo 5
...etc...

Notice that patterns that may have existed before can be dissolved. For example, the "e" in "have" and the "e" in "solved" can be keyed to two completely different coordinates (both Delta 9 and Kilo 5 correspond to "e"). I have also introduced what appears to be a pattern by using two Lima coordinates side by side.
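The grid idea is a form of homophonic substitution, where each letter has several possible cipher symbols. Since the original chart is not reproduced here, this Python sketch assumes a hypothetical layout: columns named Alpha through Papa, rows 1 to 16, filled row by row with a-z repeated, which gives every letter 9 or 10 cells. The encoder picks one of a letter's cells at random; the decoder only needs the grid.

```python
import random
import string

# Hypothetical grid layout (not the post's actual chart): 16 columns named Alpha..Papa,
# rows 1-16, filled row by row with a-z repeated.
COLUMNS = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
           "India", "Juliet", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa"]

GRID = {}       # (column, row) -> letter
POSITIONS = {}  # letter -> every (column, row) cell that holds it
for row in range(1, 17):
    for col_index, column in enumerate(COLUMNS):
        letter = string.ascii_lowercase[((row - 1) * 16 + col_index) % 26]
        GRID[(column, row)] = letter
        POSITIONS.setdefault(letter, []).append((column, row))


def encrypt(text, rng):
    """Each letter picks one of its several cells at random; spaces and symbols are dropped."""
    return [rng.choice(POSITIONS[ch]) for ch in text.lower() if ch in POSITIONS]


def decrypt(pairs):
    return "".join(GRID[pair] for pair in pairs)


rng = random.Random(2006)  # seeded only so the run is repeatable
pairs = encrypt("I have solved the San Jose Semaphore", rng)
print(pairs[:4])
print(decrypt(pairs))  # ihavesolvedthesanjosesemaphore
```

Repeated letters rarely produce repeated pairs here, which is the point: the frequency pattern of the plaintext is spread across many cells.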

The text above is very similar to what comes out of Semaphore if you listen to the live broadcast. I mapped the above grid onto the broadcast from Semaphore earlier today and found some astounding patterns. By "astounding", I really mean statistically abnormal. Here is what I decrypted:

Lhsvezh vbdktppbtk vbdktppbtk lbpelsxhsllbpelsxhilhbl tnfddnnhbl tnfddln

Note that the repetition of the characters "vbdktppbtk" is statistically abnormal and probably deserves more attention. I am not 100% sure this mapping is correct, as I was on a conference call at the time and may have misheard a character or two, but it is substantially accurate.
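Hunting for repeats like this by hand is error-prone. Here is a small Python sketch that counts every substring of a given length in the transcription and keeps the ones that occur more than once. It assumes the spaces in the transcription carry no meaning and strips them first.

```python
from collections import Counter

# Transcription exactly as given above (possibly with a misheard character or two).
decoded = ("Lhsvezh vbdktppbtk vbdktppbtk lbpelsxhsllbpelsxhilhbl "
           "tnfddnnhbl tnfddln")


def repeated_substrings(text, length):
    """Return every run of `length` letters (spaces ignored) seen more than once, with its count."""
    letters = text.lower().replace(" ", "")
    counts = Counter(letters[i:i + length] for i in range(len(letters) - length + 1))
    return {chunk: n for chunk, n in counts.items() if n > 1}


for length in (10, 8):
    print(length, sorted(repeated_substrings(decoded, length)))
```

Besides "vbdktppbtk" at length 10, the same check at length 8 also flags "lbpelsxh", which appears twice in the transcription.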

So what should the table contain? If Semaphore is in fact using a 16 by 16 grid for one round of encryption, the grid probably doesn't just contain A-Z repeated. An 8-bit extended ASCII table has 256 code points (0 to 255), which is exactly 16 times 16. Could that be the answer? If someone cares to take the ASCII table, overlay it on the grid, and map out some of the broadcast, I would love to see the results, although I suspect they are still going to be in cypher. One thing that *could* be gained by such an exercise is finding out the finite length of the broadcast. If you think I should do it, please leave a comment on this blog.
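For anyone who wants to try the overlay, here is one way to set it up in Python. Letting the column word supply the high four bits and the number the low four, and reading the 256 cells as Latin-1 rather than some other 8-bit "extended ASCII" code page, are both assumptions; the sample pairs are the first six from the broadcast list further down this page.

```python
COLUMNS = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
           "India", "Juliet", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa"]


def cell_to_byte(column: str, number: int) -> int:
    """Hypothetical overlay: column Alpha..Papa gives the high 4 bits, number 1-16 the low 4."""
    return COLUMNS.index(column) * 16 + (number - 1)


def overlay(pairs) -> str:
    """Read each cell as a Latin-1 character; unprintable control codes are shown as '.'."""
    chars = []
    for column, number in pairs:
        ch = bytes([cell_to_byte(column, number)]).decode("latin-1")
        chars.append(ch if ch.isprintable() else ".")
    return "".join(chars)


sample = [("India", 2), ("Kilo", 8), ("Echo", 6), ("Delta", 1), ("Charlie", 5), ("Mike", 3)]
print(overlay(sample))
```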

So what are the second rounds? I picked up my guitar last night, played the notes as they are broadcast, and found that a lot of them are middle "C". Perhaps the offset on the chromatic scale from C is a second-round key? What part does the airplane detection play? Why are the patterns broadcast exactly 7.2 seconds apart? Why am I writing on this?
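If the note does act as a second-round key, one simple reading is a Caesar-style rotation of the letters by the note's distance from C in semitones. This Python sketch is purely hypothetical: it only knows sharp names (C# rather than Db), works within a single octave, and its helper names are invented for illustration.

```python
# Pitch classes of the chromatic scale, in semitones above C (sharp names only).
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_offset(note: str) -> int:
    """Distance from C to `note` within one octave: C -> 0, G -> 7, B -> 11."""
    return CHROMATIC.index(note)


def rotate_letters(text: str, offset: int) -> str:
    """Rotate each a-z letter by `offset` places (wrapping z -> a); other characters are unchanged."""
    return "".join(
        chr((ord(ch) - ord("a") + offset) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text.lower()
    )


def undo_note_round(segment: str, note: str) -> str:
    """Hypothetical second-round decryption: rotate back by the note's offset from C."""
    return rotate_letters(segment, -semitone_offset(note))


print(undo_note_round("vbdktppbtk", "C"))  # vbdktppbtk (offset 0 leaves it unchanged)
print(undo_note_round("vbdktppbtk", "E"))
```

Under this reading, a middle C gives an offset of 0, so any segment introduced by C would pass through the second round unchanged.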

More to ponder.... Back to the drawing board.

I think I'm getting hooked on this project.

More Semaphore clues - cracking the Adobe Semaphore Cryptograph

In response to a question I received from a woman on my earlier blog entry, I need to correct a statement I made (although I did say it was a guess). Joann asked if "semaphore's discs spun faster when an airplane flew overhead". I replied that it was probably coincidental, but I have since been corrected.

Siri (no last name), who worked on the project internally at Adobe, emailed me and stated: "I saw in the last blog you were discussing the reactions of the semaphore when planes flew by - there actually is an antenna on the roof that registers when a plane goes by and causes the semaphores to spin wildly". I stand corrected. The question that comes to mind is what this has to do with the cryptogram. Theories anyone?

Stay tuned for another blog post with some new theories on the subject.

Ciao!

Tuesday, August 15, 2006

Semaphore Solution? A 16 by 16 grid?

After thinking about cracking the Semaphore cryptograph a bit more, I noticed that both halves of the string-integer codes use values that seem to range from 1 to 16. No numeric value I have encountered is over 16, and neither is any letter when counted as a=1, b=2, and so on.

This may imply a 16 by 16 grid. The changes in the tone of the voice could be markers for new words or other punctuation, case, etc. The lighted glyphs could easily be used to demarcate a 16 * 16 grid too, given that each of the four glyphs can indicate 4 positions (vertical, horizontal, slanted right, slanted left). This is consistent with the semaphore flag system. I would presume that a pair of glyphs (4 * 4 = 16 combinations) can communicate either A-P or 1-16, so the four glyphs together can pick out any one of the 256 cells. How does this sync up with the broadcast message?
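The counting behind the 16 by 16 idea can be checked in a few lines of Python. Assigning glyphs 1-2 to the column letter and glyphs 3-4 to the row number, and the numbering of the four orientations, are assumptions made only for this sketch.

```python
from itertools import product

# Assumed numbering of the four glyph orientations named in the post (0-3).
ORIENTATIONS = ["vertical", "horizontal", "slanted right", "slanted left"]
COLUMN_LETTERS = "ABCDEFGHIJKLMNOP"  # 16 letters, A-P


def pair_value(first: str, second: str) -> int:
    """Two four-position glyphs give 4 * 4 = 16 combinations, numbered 1-16."""
    return ORIENTATIONS.index(first) * 4 + ORIENTATIONS.index(second) + 1


def grid_cell(g1: str, g2: str, g3: str, g4: str) -> tuple:
    """Hypothetical reading: glyphs 1-2 choose the column (A-P), glyphs 3-4 the row (1-16)."""
    return COLUMN_LETTERS[pair_value(g1, g2) - 1], pair_value(g3, g4)


cells = {grid_cell(*combo) for combo in product(ORIENTATIONS, repeat=4)}
print(len(cells))  # 256: every cell of a 16 by 16 grid is reachable exactly once
```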

I have made a 16 * 16 grid and laid out the letters of the alphabet on it in a number of ways to reference the targets, but the message is still cypher text. Perhaps there is a dual stage to the encryption. This leaves the tones of the music and the lady's voice to use as clues. The tones that begin each segment seem limited to only a few notes, as does the lady's voice.

More later. Anyone have any theories?

Adobe LiveCycle 8 gets SOA right

A lot of vendors are claiming to be “SOA compliant”, whatever that means. Unless such a claim is made within the context of an established metric, such as the OASIS Reference Model for SOA or IBM’s Big Red Book of SOA Patterns, it is largely hollow. At a recent Syscon event in New York, my colleagues and I howled with laughter as we overheard someone explaining that, with their company’s offerings, within 3 years many people might be able to do SOA, AJAX and ESB together. So before I make any claims, I want to quantify a few things about SOA.

The OASIS Reference Model for SOA, a well-scrutinized committee specification, provides the context for the rest of this blog entry. The RM alone is not sufficient to support the claim I make in the title, so I will augment it a bit. For SOA to be done right, a ubiquitous set of protocols must exist to facilitate the widest possible range of service bindings with minimal complexity. Once a large group of services exists using a common set of protocols and/or standards, a virtual bus exists where those services may be used by any consumer that supports the same protocols and standards. This is what is often called a “service bus”. Additionally, a common processing model and a common set of invocation patterns help the virtual bus give consumers an easy on-ramp to the services on the bus.

Adobe LiveCycle has matured a lot in version 7.0+, and lots of SOA-ish behavior is present in the current version, but some real breakthroughs are coming in LiveCycle 8, which gives service developers a specification to use to create such a bus. This is partly due to SOA in general maturing and partly due to Adobe’s vigilance in keeping atop the standards around SOA and Web Services.

For starters, simple tasks like standard naming conventions for services across the entire platform are part of the core design specifications. This makes it far easier for developers to find services intuitively via the service registry. The service registry itself is a major step forward from LiveCycle 7. Even though LC7 employed a service registry, it was likely under-optimized for use within the platform and was largely tasked with form management functions. In LC8, the core service registry is a central resource that the entire platform can use for registering and referencing services.
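To make the registry idea concrete, here is a toy Python sketch of a central registry that enforces a naming convention and supports lookup by prefix. It is not LiveCycle's API; the class, the "Component.operation" convention, and the service names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ServiceRegistry:
    """Hypothetical central registry: one place where every component registers its services."""
    services: Dict[str, Callable] = field(default_factory=dict)

    def register(self, name: str, operation: Callable) -> None:
        # A shared naming convention ("Component.operation") keeps lookups predictable.
        if name.count(".") != 1:
            raise ValueError(f"service name {name!r} must look like 'Component.operation'")
        self.services[name] = operation

    def find(self, prefix: str) -> list:
        """Discover services by name, e.g. everything one component exposes."""
        return sorted(n for n in self.services if n.startswith(prefix))

    def invoke(self, name: str, *args, **kwargs):
        return self.services[name](*args, **kwargs)


registry = ServiceRegistry()
registry.register("Forms.render", lambda template: f"<rendered {template}>")
registry.register("Forms.validate", lambda data: bool(data))
registry.register("Security.encrypt", lambda doc: f"encrypted({doc})")
print(registry.find("Forms."))                      # ['Forms.render', 'Forms.validate']
print(registry.invoke("Forms.render", "invoice"))   # <rendered invoice>
```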

In LC8, services are layered more consistently. Rather than each component determining its own level of abstraction for an API, the unified service design classifies services into different layers. The lowest layer is the native API for a component (such as a Java interface), and connectors bridge the gaps by mapping endpoints for various wire protocols to the underlying interfaces. Managed transparency is present, and the interfaces are agnostic to the wire protocols consumers use to deliver their requests to the service container.
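The layering can be sketched in Python (LiveCycle's native layer is Java, but the shape is the same): one native interface, plus connectors that each translate a wire format into a call on that interface. The `FormService` interface, its `render` operation, and both connectors are hypothetical names for illustration, not LiveCycle components.

```python
import json
from typing import Protocol
from urllib.parse import parse_qs


class FormService(Protocol):
    """Lowest layer: the component's native interface."""
    def render(self, template: str, locale: str) -> str: ...


class SimpleFormService:
    def render(self, template: str, locale: str) -> str:
        return f"{template} rendered for {locale}"


# Connectors: each maps one wire format onto the same native call, so the service
# itself never needs to know how the consumer delivered the request.
def json_connector(service: FormService, body: str) -> str:
    request = json.loads(body)
    return service.render(request["template"], request["locale"])


def query_string_connector(service: FormService, query: str) -> str:
    params = parse_qs(query)  # values come back as lists, e.g. {"template": ["invoice"]}
    return service.render(params["template"][0], params["locale"][0])


service = SimpleFormService()
print(json_connector(service, '{"template": "invoice", "locale": "en_US"}'))
print(query_string_connector(service, "template=invoice&locale=en_US"))
```

Adding a new wire protocol then means writing one more connector; the native service and its other consumers are untouched.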

LiveCycle 8 will be previewed at the MAX Conference this year in Las Vegas by Matt Butler. I look forward to attending the talk and finding out more. It should be a pivotal moment in the life cycle of Adobe LiveCycle as the SOA story continues to mature.

Semaphore - Clues to cracking the Cypher: is Slashdot smart enough?

For those of you who love a challenge, Adobe has sponsored a whopper. The Semaphore art project in San Jose is where art meets technology. Four large round glyphs rotate their position every 7.2 seconds while a simultaneous low-power radio broadcast emits a coded message. Artist Ben Rubin’s mind-shredding message seems to follow a pattern. Each broadcast segment contains an audible analog tone and an audible analog pattern, followed by a string-integer pair. Several items vary during the broadcast, including the tone of the woman’s voice as she speaks the integers. The tones also change.

Here is a pattern:

Tone, dot pattern, click(ping), string, integer, ping

Here are some general observations that might help those trying to decode it. I also want to state that while I do work for Adobe, I have in no way had any internal knowledge of this project nor do I have any keys to the answer.

Background:

Semaphore is a long-established flag-based signaling system. A person holds two flags and uses one rotational angle to act as a key while using the second flag to indicate a specific value. The comparison to the rotating glyphs cannot be ignored.

1. What is the significance of the glyphs changing position every 7.2 seconds? This could be a key, or it could be incidental to the entire exercise. Given its precise timing, I suspect it is a key.

2. Ben Rubin’s education should probably be factored in. There are no details of him ever studying cryptographic techniques. Accordingly, I would presume the cypher’s key to be less complex than Rijndael (AES) et al. I did find his master’s thesis, entitled “Constraint based cinematic editing”, which may offer a clue into his mind.

3. What possible significance does the tone of the woman’s voice have? She seems to speak in two tones, one about an octave higher than the other. Is this significant, perhaps as some kind of logic gate?

4. What are the string-integer pairs? Here is an example:
India 02
Kilo 08
Echo 06
Delta 01
Charlie 05
Mike 03
Mike 14
Echo 06
Delta 04
Delta 04 (note repeat)
India 02
Kilo 08
Echo 06
Delta 01
Charlie 05
Mike 03
India 02
Delta 15
Delta 04
Mike 14
Alpha 10
Delta 04
Delta 04
Alpha 10
Charlie 16
Delta 15
India 02
Delta 15
Delta 04
Mike 14
Alpha 10
Delta 04
Delta 04
Alpha 10
Charlie 16
Delta 15
Delta 01
Pumpkin 02 ??
Kilo 03
November 04
Charlie 11
Charlie 16
Lima 03
Echo 06
…..

Note that certain pairs repeat (Delta 04s seem popular). There are also patterns of repetition that occur more often than a statistically normal rate would suggest. Based on this, I would aver that the answer is a piece of text. Identical consecutive values suggest double-letter combinations in the resulting text (for example, "challenge" has two "l"s).
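Repetition claims like these can be checked mechanically. This Python sketch transcribes the pairs listed above (keeping "Pumpkin" as heard), counts how often each pair occurs, and looks for runs of consecutive pairs that recur. The helper names are illustrative.

```python
from collections import Counter, defaultdict

# The pairs from the list above, in broadcast order ("Pumpkin" kept as heard).
BROADCAST = """
India 02, Kilo 08, Echo 06, Delta 01, Charlie 05, Mike 03, Mike 14, Echo 06,
Delta 04, Delta 04, India 02, Kilo 08, Echo 06, Delta 01, Charlie 05, Mike 03,
India 02, Delta 15, Delta 04, Mike 14, Alpha 10, Delta 04, Delta 04, Alpha 10,
Charlie 16, Delta 15, India 02, Delta 15, Delta 04, Mike 14, Alpha 10, Delta 04,
Delta 04, Alpha 10, Charlie 16, Delta 15, Delta 01, Pumpkin 02, Kilo 03,
November 04, Charlie 11, Charlie 16, Lima 03, Echo 06
"""


def parse(text):
    pairs = []
    for item in text.split(","):
        item = item.strip()
        if item:
            word, number = item.split()
            pairs.append((word, int(number)))
    return pairs


def repeated_runs(pairs, length):
    """Map each run of `length` consecutive pairs that occurs more than once to its start indices."""
    starts = defaultdict(list)
    for i in range(len(pairs) - length + 1):
        starts[tuple(pairs[i:i + length])].append(i)
    return {run: idx for run, idx in starts.items() if len(idx) > 1}


pairs = parse(BROADCAST)
print(Counter(pairs).most_common(3))
print(max(number for _, number in pairs))  # 16
for run, idx in repeated_runs(pairs, 10).items():
    print(idx, run)
```

On this transcription, Delta 04 comes out on top with 8 of the 44 pairs, the run of ten pairs from India 02 through Delta 15 is broadcast twice back to back, and the six pairs India 02 through Mike 03 recur ten pairs later.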

While the semaphore flag code uses only 9 positions, note that the numeric values go much higher. Could this be a revision of the code, based on some key (7.2), that reflects the glyphs’ ability to provide a more precise rotational index? I did not encounter any numeric value over 16 while listening.

The Semaphore art uses the NATO phonetic alphabet.

A: Alpha
B: Bravo
C: Charlie
D: Delta
E: Echo
F: Foxtrot
G: Golf
H: Hotel
I: India
J: Juliet
K: Kilo
L: Lima
M: Mike
N: November
O: Oscar
P: Papa
Q: Quebec
R: Romeo
S: Sierra
T: Tango
U: Uniform
V: Victor
W: Whiskey
X: X-ray
Y: Yankee
Z: Zulu
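To line these code words up with the a=1, b=2 numbering used in the 16 by 16 grid theory, a small lookup table is enough. In this Python sketch, unknown words such as "Pumpkin" map to None so a possible mishearing stays visible instead of being guessed; the function name is illustrative.

```python
NATO = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel",
        "India", "Juliet", "Kilo", "Lima", "Mike", "November", "Oscar", "Papa",
        "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor", "Whiskey",
        "X-ray", "Yankee", "Zulu"]
LETTER_NUMBER = {word: i + 1 for i, word in enumerate(NATO)}  # Alpha -> 1 ... Zulu -> 26


def to_numbers(pairs):
    """Turn (code word, integer) pairs into (letter number, integer) pairs; unknown words give None."""
    return [(LETTER_NUMBER.get(word), number) for word, number in pairs]


print(to_numbers([("India", 2), ("Kilo", 8), ("Pumpkin", 2), ("Papa", 16)]))
# [(9, 2), (11, 8), (None, 2), (16, 16)]
```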

Note that “Pumpkin” is not actually part of the phonetic alphabet. Perhaps I heard it wrong.

Good luck! Anyone with theories, please post them back to this blog. Maybe we can get lucky....