In Which I Attempt to Reason with the Establishment

Dear Mr. Djanogly,

Having written to MPs in the past, even to those who are more closely aligned with my views, I am not confident that this letter will be met with anything more than token acknowledgement, platitude or unwavering adherence to your party rhetoric. Indeed, I am so disenchanted with our oligarchic system that I’m sure that, even if I am able to persuade you of my argument, you will submit to the whip when it comes to legislation. That notwithstanding, I have increasingly grave concerns over the direction that government policy on data and communication security is taking, such that remaining silent is no longer an option.

I didn’t vote Conservative in the recent general election and my convictions are probably unrepresentative of the 36% of the Huntingdonshire constituency that won you your seat. That said, as a senior software developer for a world-class scientific institute, having a background in mathematics, a working knowledge of cryptographic engineering and being a member of the Open Rights Group, I at least consider myself to be well-informed on the subject.

As you yourself are someone of professional legal standing, I would hope that the policies being put forward at least give you pause. How, for example, is this recent dystopian quote from David Cameron even reasonable, let alone justifiable?

For too long, we have been a passively tolerant society, saying to our citizens: as long as you obey the law, we will leave you alone.

Your manifesto calls for:

  • New communications data legislation: the so-called “Snooper’s Charter”, which has been repeatedly dismissed by the House of Lords for being too sweeping and lacking in objective consultation, was announced by Theresa May within hours of the election result. The proposed bill increases the remit of the Regulation of Investigatory Powers Act and, along with last year’s rushed and heedless Data Retention and Investigatory Powers Act, legitimises the broad, warrantless surveillance of citizens by GCHQ and other signal intelligence agencies around the world, as exposed by Edward Snowden. It enables draconian powers, without check, that could be used to silence freedom of speech arbitrarily. At the very least, it is an expensive system that, by its very nature, probably won’t work that well, is open to abuse and provides no guarantees for the people it’s trying to protect.

  • The scrapping of the Human Rights Act, which amounts to the dissolution of habeas corpus. Whatever happened to “checks and balances” or the presumption of innocence until proven otherwise by due process? This, as with everything in the politics of fear, is aimed at “extremism”, but how is this defined and what’s to stop the corruption of this definition at the whim and convenience of whosoever happens to be in power?

  • Censorship, blocking and restriction of certain online content, even though the UN described Internet access as a human right in 2011. What gives anyone the moral authority to deem things inappropriate? How can such a system be justified when it demonstrably produces false positives (e.g., when the recently mandated adult content filter blocked charities that help those struggling with sexual abuse)? When did North Korea become the model to emulate?

Moreover, there have been repeated calls for the weakening of cryptographic systems (e.g., using “backdoors” or key escrow) from government, security and law-enforcement agencies. However, any cryptanalyst will tell you that weakened security is broken security. Imagine, for example, that every lock were made to accept two keys — your own and a master key, for the authorities — how long do you think it would take before that master key was reverse engineered? I predict it would be a matter of hours. Cryptography is more complex, but the principle is the same and not without precedent (e.g., the breaking of the CSS key used by DVDs).

While you may be able to legislate that commercial security products can be compromised, given that open cryptography algorithms won’t (or even “can’t”) be jeopardised in such a way, does this imply that they will be outlawed in the UK? Outlawing an algorithm is outlawing thought itself. Again, quoting a recent statement from the UN:

Encryption and anonymity, and the security concepts behind them, provide the privacy and security necessary for the exercise of the right to freedom of opinion and expression in the digital age. Such security may be essential for the exercise of other rights, including economic rights, privacy, due process, freedom of peaceful assembly and association, and the right to life and bodily integrity.

I appreciate the need for security, but omnipresent state surveillance is not the solution. This just breeds an air of distrust and erodes whatever cohesion still exists within society. As Benjamin Franklin is often quoted:

Those who would give up essential liberty, to purchase a little temporary safety, deserve neither liberty nor safety.

The trite argument of having nothing to fear if you have nothing to hide is spurious. No one lives their life entirely in the open, for the whole world to see, and for good reason. Whatever a person’s intent — malign or inconsequential — their trust must be earnt. A government is no more implicitly trustworthy than any other entity and if one is suspected of a crime, then their privacy may only be surrendered by judicial warrant.

To be perfectly clear, I don’t condone violence or hatred, but the freedom of the many is of far greater importance than the threat posed by the few.

At a time when our environment is on the brink, the husbandry of our home is a much higher priority than control in the (false) name of security. As for governance, it seems clear to me that the promotion of cultural pluralism, to combat the growth of a sectarian society, economic diversity outside the realm of the financial industry (e.g., investment in science and technology) and advocacy of critical thinking over the rise of propaganda and misinformation should be your greatest concerns. Of course, it’s harder to govern an opinionated and emancipated electorate, but your interests should be those of the people, not of yourselves or your benefactors.

In the meantime, I for one will be taking steps to secure my digital footprint — using strong cryptography wherever possible, hosting data in digital havens such as Iceland, etc. — and will actively encourage others to follow suit. I hope that will not brand me as an extremist!

Yours concernedly,
Christopher Harrison

P.S., I have taken the liberty of publishing this letter openly on my website. I invite you to respond, therein. Otherwise, unless you explicitly forbid it, I shall faithfully publish any written response I receive from you there.

Pad Thai Fluke

Despite what I said last year, my New Year’s Resolution for 2015 has been to cook a new dish once a week. Even if I only got a 10% success rate, that would add five new recipes to my repertoire. So far, it’s proven to be extremely successful: just one or two rejects, mostly good stuff and a few really great new things.

Maybe I’ll do a post about all the new food I’ve been cooking at some point, but let’s focus on today. The week before last’s (4th April) new recipe was pad thai, a classic Thai noodle dish, based largely on (and simplified from) this article I found in the Guardian. The result wasn’t bad, but I was convinced I could do better…and better, I did!

Let me share with you my recipe!

n.b., “Fluke” as in lucky, rather than the parasitic flatworm…probably shouldn’t have shared that 😕

Journalistic Integrity

Yesterday, on Newsbeat, there was a story about the phenomenon of young Europeans, particularly girls and women, defecting to Syria. During it, a counter-terrorism specialist lambasted their use of cryptographic software and urged that, while “privacy is important to us”™, backdoors for law enforcement should be readily available to make their jobs easier.

Won't somebody please think of the children!

Now I don’t want to paint myself as a target (and, by that, I mean with respect to certain signal intelligence agencies) but, as a digital rights advocate with a passing knowledge of cryptography, it troubled me that the report completely neglected any discussion of the potential repercussions of compromising such systems. Setting a precedent for surveillance is a slippery slope, the consequences of which ought to be made abundantly clear to all.

Maybe you think Newsbeat is dumbed-down, so I shouldn’t expect too much. I think that’s a bit patronising: Newsbeat may cater to its demographic (e.g., today’s story on the new, racially diverse emoji), but I wouldn’t say it’s dumbed-down. Indeed, the reason I listen to Radio One on my commute is that it’s entertaining. I gave Radio Four a try and, despite being very dry, it only served to highlight the problem I have with the BBC’s news and current affairs: it panders to its audience. While it’s true that my views often align with its slant, preaching to the choir does nothing to broaden my horizons, or anyone else’s.

This isn’t the first time such one-sidedness has been peddled by them. Indeed, I notice that it’s often the case when I have some non-trivial knowledge of the subject in question. (I’ve heard similar things from others: Mrs. Xoph often questions the impression of her kin given by the BBC.) As such, without evidence to the contrary, I think it would be reasonable to extrapolate this bias to any of their stories.

Of course, it goes without saying that one shouldn’t believe everything they read, hear or see — critical thinking and all — but in my mind, the BBC always had a reputation for impartiality. When did this change?

Genomics 101

Nearly four months ago, I started working at the Wellcome Trust Sanger Institute. To say it’s a sweet gig would be something of an understatement! I can only think of a handful of cooler things: e.g., ESA, NASA, CERN; mostly by virtue of me understanding physics better than I do biology. Indeed, I’m not a genomicist — or even a bioinformatician — I’m just a run-of-the-mill software engineer (grunt). To be honest, being surrounded by alphageeks is a bit intimidating (think Stuart from The Big Bang Theory) with my barely-layman’s knowledge of genetics, but that’s entirely my problem.

Anyway, since joining, I’ve had a question about genomics that’s been bugging me. I think I’ve figured it out… (I probably should have asked my colleagues from the start, but in retrospect — presuming my understanding is correct — it’s a bit of a stupid question!)

If everyone’s DNA is different, how can there be a single human genome?

The clue to this is largely in the name: genomics is about genes, not nucleotides. DNA is a very long string of nucleotide pairs — the A, C, G and Ts you’re probably familiar with — but certain substrings of basepairs function to delimit genes. To analogise, written sentences begin with a capital letter and end with a full stop; likewise genes begin with a particular sequence and end with another. Genes are what define species and that’s what genomics is all about: Determining the manifest of genes for a species, what they do and how they interact. So every human has the same set of genes, but the “parameters” of those genes will be different. For simplification’s sake, if there were a single gene for eye colour: every person would have it, but some would have the blue version (allele), while others would have brown, etc.

To continue with my literary analogy, think of a genome for a particular species as a constricted poem, like a sonnet or limerick. The sentences (genes) within the poem must have the same meter, syllable count or rhyming scheme, but the actual words can be different. Thus we can have a multitude of poems with the same structure, but with vastly different content. Such is the genome to DNA.

I actually have another, related query, that is probably more of a thought-experiment rather than answerable…

The DNA molecule is a polymer, like plastic or starch. Indeed, the two strands that hold the nucleotides in its quintessential double-helix are built from a sugar, deoxyribose (hence the name deoxyribonucleic acid), linked by phosphate groups. In principle, polymers can be arbitrarily long (the human genome is over three billion basepairs), but are there non-chemical constraints on its length? For example, is there a point where it becomes too long to hold itself together, or where biochemical processes become too inefficient to be useful?

The reason I ask is because, as we are operating over an alphabet of just four symbols, any theoretical maximum genome size \(N\) would give us an upper-bound on all varieties of DNA-based life at \(\frac{4}{3}(4^N - 1)\). Granted this upper-bound would unimaginably exceed the number of atoms in the universe, but presumably vast swathes of DNA don’t translate to viable lifeforms.

My point being, if this were true and in a very nit-picky way, life couldn’t therefore be truly infinite in variety.
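As a quick sanity check on that bound, here is a minimal Python sketch. It assumes the count covers every non-empty sequence of length 1 up to \(N\), which makes the total a geometric series, \(\sum_{k=1}^{N} 4^k = \frac{4}{3}(4^N - 1)\). The function names are made up for illustration, and the figure of \(10^{80}\) atoms is a commonly quoted order-of-magnitude estimate rather than anything from this post.

```python
def closed_form(max_length):
    """Count non-empty sequences of at most max_length bases via (4/3)(4^N - 1)."""
    # 4**N - 1 is always divisible by 3, so the integer division is exact.
    return 4 * (4 ** max_length - 1) // 3


def brute_force(max_length):
    """The same count, summing 4**k for every length k from 1 to max_length."""
    return sum(4 ** length for length in range(1, max_length + 1))


def smallest_length_exceeding(threshold):
    """Smallest N whose sequence count is strictly greater than threshold."""
    length = 1
    while closed_form(length) <= threshold:
        length += 1
    return length


# Assumption: a rough, commonly quoted estimate, used only for scale.
ATOMS_IN_UNIVERSE = 10 ** 80

if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        # The closed form and the explicit sum should always agree.
        assert closed_form(n) == brute_force(n)
        print(n, closed_form(n))
    print(smallest_length_exceeding(ATOMS_IN_UNIVERSE))
```

Under that assumed estimate, the bound overtakes the atom count after little more than a hundred bases, which is vanishingly short next to three billion.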

Found 404s

When you get a 404 Not Found error on Stack Overflow, you are presented with some obfuscated code:

Even with syntax highlighting turned off, the preprocessor macros immediately mark this as C to me, but with extra bits encoded into comments. Let’s deal with the C first:

The first line aliases putchar to v. putchar is part of the C standard library: it takes an int representing a character (here, an ASCII code), prints that character to the screen and returns the character it wrote. It’s kind of like an identity function with IO side-effects! Then, in the second line, the main function is simply aliased to print, taking a parameter x. The important thing to note is that x is not used anywhere in the macro’s body, so it is effectively redundant. Anyway, the main function deobfuscates to:

A function’s arguments must be evaluated before it can be called, so the innermost call, putchar(52), gets evaluated first. ASCII point 52 is the character “4”, which is printed to the screen. This returns 52 and so putchar(52 - 4) outputs ASCII point 48 (the character “0”). Then, in the final call, we simply add four back to get ASCII 52. That is, the print(202*2) call at line 4 just outputs the string “404” to the screen and then exits.
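To make that evaluation order concrete, here is a small Python emulation. The nested expression is my reading of the description above, not a copy of the original source, and the emulate_main wrapper is a made-up name.

```python
def emulate_main():
    """Emulate the nested putchar calls and return everything they 'print'."""
    output = []

    def putchar(code):
        # Mimic C's putchar: emit the character for `code`, then return `code`.
        output.append(chr(code))
        return code

    v = putchar  # mirrors the alias described for the first line

    # The innermost call runs first:
    #   v(52)          emits "4" and returns 52
    #   v(52 - 4)      emits "0" and returns 48
    #   v(4 + 48)      emits "4"
    v(4 + v(v(52) - 4))

    return "".join(output)


if __name__ == "__main__":
    print(emulate_main())
```

Because each call hands its argument straight back, the arithmetic between calls is all that changes the next character.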

As I said, the argument passed to print is irrelevant to C, but in some scripting languages “#” starts a comment, so the lines beginning with it are ignored. That makes the source, in its entirety, valid Python and, I think, Perl:

i.e., Outputs \(202\times 2 = 404\).

Next up is the third line, which is a comment in the C source that starts on the end of the second. It’s in Brainfuck:

In Brainfuck, the only valid symbols are “>” (move up a memory address), “<” (move down an address), “+” (increase the value at the current address), “-” (decrease the value), “[” (loop until zero), “]” (close loop), “.” (output current address value) and “,” (get input); anything else (like that random “4”) should just be ignored. So we have:

  1. Move to address 1
  2. Increase the value by 8
  3. Start loop
  4. Move to address 2
  5. Increase by 6
  6. Return to address 1
  7. Decrease by one
  8. Loop until zero
  9. Move to address 2
  10. Increase by 4
  11. Output
  12. Decrease by 4
  13. Output
  14. Increase by 4
  15. Output

In steps 1 to 8, we create an iteration count of eight in address 1 and increase address 2 by six on each pass of the loop. Thus, when we exit the loop, the value at address 2 is \(8\times 6 = 48\). We then add four to that (i.e., 52, ASCII “4” again) and output, then reduce by four and output (ASCII “0”), then add four once more and output the final “4”.
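The fifteen steps can be checked with a tiny interpreter. This is a minimal Python sketch: the PROGRAM string is rebuilt from the numbered steps rather than copied from the original source, and run_brainfuck is a made-up helper that assumes balanced brackets and 8-bit wrapping cells.

```python
def run_brainfuck(source, input_bytes=b""):
    """Run a Brainfuck program and return its output as a string."""
    # Pair up matching brackets so the loops can jump directly.
    stack, jumps = [], {}
    for position, symbol in enumerate(source):
        if symbol == "[":
            stack.append(position)
        elif symbol == "]":
            start = stack.pop()
            jumps[start], jumps[position] = position, start

    tape = [0] * 30000
    pointer = 0
    counter = 0
    output = []
    inputs = iter(input_bytes)

    while counter < len(source):
        symbol = source[counter]
        if symbol == ">":
            pointer += 1
        elif symbol == "<":
            pointer -= 1
        elif symbol == "+":
            tape[pointer] = (tape[pointer] + 1) % 256
        elif symbol == "-":
            tape[pointer] = (tape[pointer] - 1) % 256
        elif symbol == ".":
            output.append(chr(tape[pointer]))
        elif symbol == ",":
            tape[pointer] = next(inputs, 0)
        elif symbol == "[" and tape[pointer] == 0:
            counter = jumps[counter]  # skip the loop body
        elif symbol == "]" and tape[pointer] != 0:
            counter = jumps[counter]  # go back to the start of the loop
        # Any other character (such as a stray "4") is simply ignored.
        counter += 1

    return "".join(output)


# Steps 1 to 15, written out as Brainfuck symbols.
PROGRAM = ">++++++++[>++++++<-]>++++.----.++++."

if __name__ == "__main__":
    print(run_brainfuck(PROGRAM))
```

Dropping non-command characters into PROGRAM leaves the output unchanged, which is why the rest of the source can sit around the Brainfuck without disturbing it.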

In the entirety of this source, there are only four other valid Brainfuck symbols: the “+” and “-” on line 2, which increment then decrement the value at address 0 (i.e., do nothing); then the “>.” on line 5, which outputs the value at address 3 (i.e., effectively nothing, because the value here is 0 and ASCII point 0 is just the null string terminator). Thus, again, the source in its entirety can be run through a Brainfuck interpreter to output “404”.

Now there’s the question of the final line: in C, Python, Perl and Brainfuck, it has no effect. There’s also the curious spacing of the #define macros on the first two lines. Also, what’s up with that random “4” in the Brainfuck code on line 3, anyway?… It may all just be a red herring, but I have a feeling that there is a bit more to this!

Either way, this is neat. Kudos to whoever wrote it! It reminds me of those “How many triangles can you see?” puzzles, but an order of magnitude more complex.

Movies of 2014

Last year’s rating system was an attempt to give more justifiable scores, but it was too complicated. Moreover, neither of us is a technical expert on some of the things we were judging, so that seemed a tad disingenuous. I was tempted to introduce an even more sophisticated system, but Mrs. Xoph and I had a very busy year and ended up rating everything we saw retrospectively (i.e., just now), using a simple ten-point scale!

It works well enough :)

Syntactic Shibboleths

A shibboleth is a word or phrase that can be used to identify a speaker’s background based on the way they pronounce it. So, for example, one could vaguely determine which end of the UK someone was from by how they pronounce the vowel in the word “bath”: [æ] would point towards the north, while it’s [ɑ] in the south (east). It’s actually possible to isolate a person’s origin quite precisely by testing a number of shibboleths, with respective, intersecting regions, together. A kind of linguistic Guess Who, if you like.

Four years ago, for my MA dissertation (available on request), I conducted an experiment that called for native English speakers. However, because it was disseminated over the Internet, I had no way of certifying this, while still needing a way to improve the signal-to-noise ratio in my data. My innovation was to test subjects’ responses (i.e., grammaticality judgements) on subtly warped syntactic structures which are peculiar to English (or, at least, Germanic languages). Subjects who baulk at the dodgy constructions are more likely to be native speakers, against those who accept them because they look “about right”.

These are my syntactic shibboleths (actually, “morphosyntactic”, but that breaks the alliteration). The first four have been “battle tested” in research — and proved very useful in sanitising my data — the others (and there are certainly more) seem like viable candidates from my experience. For each, I give three examples of corruption, largely to make native speakers cringe!

London Calling

Fortunately for me, I have now escaped from London. Not so fortunately, I still work there…which is a bit of an epic commute, but I digress!

Anyway, regarding the day job, recently people have been rather confused about how they can determine whether an address is in London. I immediately suggested using the postcode, as postcodes generally follow a regular pattern and, in the capital, the “outward” part corresponds to the compass points. That is:

Outward  Area
E        East
EC       East Central
WC       West Central
W        West
N        North
NW       North West
SE       South East
SW       South West

This can easily be turned into a regular expression that identifies London postcodes, presuming the input has already been validated as a postcode:
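A minimal sketch of such an expression, in Python. The pattern is my own guess built from the table above, not necessarily the one originally used, and is_inner_london is a made-up helper name.

```python
import re

# One alternative per outward area in the table. The trailing \d is what
# stops "E" from matching "EN" (Enfield) or "W" from matching "WD" (Watford).
INNER_LONDON = re.compile(r"^(?:EC|WC|NW|SE|SW|E|W|N)\d")


def is_inner_london(postcode):
    """Return True if an already-valid postcode has an Inner London outward."""
    return INNER_LONDON.match(postcode.strip().upper()) is not None


if __name__ == "__main__":
    for candidate in ("SW1A 1AA", "EC1A 1BB", "EN1 1AA", "CB10 1SA"):
        print(candidate, is_inner_london(candidate))
```

The check leans entirely on the input being a well-formed postcode already; on arbitrary strings it would happily accept anything that starts with the right letters and a digit.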

This is fine for Inner London, but as the city slowly absorbs its suburban neighbours, this won’t work for the foetid glory that is its greater metropolitan area. We have to expand into the orbiting postal code regions:

Outward  Area
BR       Bromley
CM       Chelmsford
CR       Croydon
DA       Dartford
EN       Enfield
HA       Harrow
IG       Ilford
KT       Kingston-upon-Thames
RM       Romford
SM       Sutton
TW       Twickenham
UB       Southall
WD       Watford

Now it would be straightforward to put together an alternation group of these thirteen (unlucky for some) codes, but apparently that would overgeneralise as some areas covered are not considered within Greater London. To solve this problem, I was given a 64MB CSV file of every valid London postcode and let loose!

I wasn’t about to upload over 300,000 postcode records into a database table, so I figured I would proceed down my regular expression route. Fortunately, with a bit of command line fu, I was able to reduce those records down to 130 unique, Greater London outward postcodes:
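For anyone without the relevant command line fu, the same kind of reduction can be sketched in Python. The layout of the real CSV file isn't described here, so this assumes the postcode sits in the first column with no header row; the sample records and the unique_outwards name are made up for illustration.

```python
import csv
import io


def unique_outwards(csv_file, column=0):
    """Collect the distinct outward codes from one column of postcode records."""
    outwards = set()
    for row in csv.reader(csv_file):
        if len(row) <= column:
            continue  # skip blank or short rows
        postcode = row[column].replace(" ", "").upper()
        if len(postcode) <= 3:
            continue  # too short to contain both parts
        # The inward part is always three characters (a digit and two letters),
        # so everything before it is the outward code.
        outwards.add(postcode[:-3])
    return sorted(outwards)


# Made-up records standing in for the real file.
SAMPLE = "BR1 1AA,Bromley\nBR1 2AB,Bromley\nCR44 1AA,Croydon\nha0  1ab,Harrow\n"

if __name__ == "__main__":
    print(unique_outwards(io.StringIO(SAMPLE)))
```

Stripping the spaces before slicing means records with a missing or doubled space still reduce to the same outward code.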

The 130 outwards that I obtained represent the actually valid Greater London postcodes for the above districts. As postcodes follow a quite simple format, I was able to condense these into 13 regular expressions:

Outward  Area                  Regular Expression
BR       Bromley               ^BR[1-8]
CM       Chelmsford            ^CM1[34]
CR       Croydon               ^CR([02-9]|44|90)
DA       Dartford              ^DA([15-8]|1[4-8])
EN       Enfield               ^EN[1-9]
HA       Harrow                ^HA\d
IG       Ilford                ^IG([1-9]|11)
KT       Kingston-upon-Thames  ^KT([1-9]|1[7-9]|22)
RM       Romford               ^RM([1-9]|1[0-5]|50)
SM       Sutton                ^SM[1-7]
TW       Twickenham            ^TW([1-9]|1[0-59])
UB       Southall              ^UB([1-9]|1[018])
WD       Watford               ^WD([236]|23)

We know that the inward part of a postcode is always of the format \d[A-Z]{2}$ and that it should be separated from the outward part by a space, although the space is often missed or doubled up. So we can take the alternation group of the above outwards, factor and include the inward pattern to obtain this beast:
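A hedged sketch of how those pieces could be assembled in Python. It joins the thirteen outward patterns from the table as they stand, rather than factoring them into a single hand-tuned expression, so it is not the original beast; the classify function and the dictionary names are made up for illustration.

```python
import re

# Outward patterns from the table above, minus the leading "^".
OUTWARDS = {
    "Bromley": r"BR[1-8]",
    "Chelmsford": r"CM1[34]",
    "Croydon": r"CR(?:[02-9]|44|90)",
    "Dartford": r"DA(?:[15-8]|1[4-8])",
    "Enfield": r"EN[1-9]",
    "Harrow": r"HA\d",
    "Ilford": r"IG(?:[1-9]|11)",
    "Kingston-upon-Thames": r"KT(?:[1-9]|1[7-9]|22)",
    "Romford": r"RM(?:[1-9]|1[0-5]|50)",
    "Sutton": r"SM[1-7]",
    "Twickenham": r"TW(?:[1-9]|1[0-59])",
    "Southall": r"UB(?:[1-9]|1[018])",
    "Watford": r"WD(?:[236]|23)",
}

# Inward part, allowing the separating space to be missing or doubled up.
INWARD = r"\s*\d[A-Z]{2}"

# One expression to decide membership...
GREATER_LONDON = re.compile(
    "^(?:" + "|".join(OUTWARDS.values()) + ")" + INWARD + "$"
)

# ...and one per area to classify a match.
BY_AREA = {
    area: re.compile("^" + outward + INWARD + "$")
    for area, outward in OUTWARDS.items()
}


def classify(postcode):
    """Return the postal area for an outer Greater London postcode, else None."""
    candidate = postcode.strip().upper()
    if not GREATER_LONDON.match(candidate):
        return None
    for area, pattern in BY_AREA.items():
        if pattern.match(candidate):
            return area
    return None


if __name__ == "__main__":
    for candidate in ("BR1 1AA", "CR44 1AA", "wd23  1ab", "WD4 8AA"):
        print(candidate, classify(candidate))
```

Anchoring the inward pattern with "$" matters: it forces the regex engine to backtrack from a one-digit outward to a two-digit one, so that "WD23 1AB" is read as WD23 rather than rejected after matching WD2.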

Now we can both determine and classify London addresses without having to resort to an enormous lookup table :)

Not to gloat, but it’s most satisfying to know one’s tools well. What would have taken my colleagues (even those who claim to be developers) literally days, I was able to do in less than an hour.