Learning Machine

Four years ago I wrote Datenkrake Google (“Google, the data octopus”) because I considered the common notion of Google as one big database inaccurate. In reality, I argued, machine learning was the core of Google. By now there is little left to doubt about that. Google made headlines with AlphaGo, an AI that beats human Go masters. With TensorFlow, Google provides an AI library as open source. Two weeks ago it became known that the company has even developed special hardware for deep learning applications: Tensor Processing Units, on which AlphaGo ran its computations. Fittingly, Intel has just acquired the startup Nervana, which has likewise developed hardware architectures optimized for machine learning.

This can go on at this pace for a while yet. Are our debates keeping up with the development?

Classifying Vehicles

Security is a classification problem: Security mechanisms, or combinations of mechanisms, need to distinguish that which they should allow to happen from that which they should deny. Two aspects complicate this task. First, security mechanisms often only solve a proxy problem. Authentication mechanisms, for example, usually distinguish some form of token – passwords, keys, sensor input, etc. – rather than the actual actors. Second, adversaries attempt to shape their appearance to pass security mechanisms. To be effective, a security mechanism needs to cover these adaptations, at least the feasible ones.

An everyday problem illustrates this: closing roads for some vehicles but not for others. As a universal but costly solution one might install retractable bollards, issue means to operate them to the drivers of permitted vehicles, and prosecute abuse. This approach is very precise, because classification rests on an artificial feature designed solely for security purposes.

Simpler mechanisms can work sufficiently well if (a) intrinsic features of vehicles are correlated with the desired classification well enough, and (b) modification of these features is subject to constraints so that evading the classifier is infeasible within the adversary model.

Bus traps and sump busters classify vehicles by size, letting lorries and buses pass while stopping common passenger cars. The real intention is to classify vehicles by purpose and operator, but physical dimensions happen to constitute a sufficiently good approximation. Vehicle size correlates with purpose. The distribution of sizes is skewed; there are many more passenger cars than buses, so keeping even just most of them out does a lot. Vehicle dimensions do not change on the fly, and are interdependent with other features and requirements. Although a straightforward way exists to defeat a bus trap – get a car that can pass – this is too expensive for most potential adversaries and their possible gain from the attack.
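The bus-trap argument can be made concrete as a toy classifier. The sketch below is illustrative only: the `Vehicle` type, the 1.9 m threshold and the traffic mix are hypothetical numbers chosen to mirror the skewed distribution described above, not measurements. The trap decides on an intrinsic proxy feature (wheel track width), while the intended policy decides on what the vehicle is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    kind: str             # what we actually care about: "bus", "lorry", "car"
    track_width_m: float  # intrinsic proxy feature the trap "measures"


# Hypothetical threshold: vehicles whose wheels straddle the pit may pass.
TRAP_WIDTH_M = 1.9


def trap_allows(v: Vehicle) -> bool:
    """Proxy classifier: decides on physical width alone."""
    return v.track_width_m >= TRAP_WIDTH_M


def policy_allows(v: Vehicle) -> bool:
    """Intended policy: admit buses and lorries only."""
    return v.kind in ("bus", "lorry")


def confusion(vehicles):
    """Count true/false positives/negatives of the trap against the policy."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for v in vehicles:
        allowed, wanted = trap_allows(v), policy_allows(v)
        key = ("t" if allowed == wanted else "f") + ("p" if allowed else "n")
        counts[key] += 1
    return counts


# Hypothetical, deliberately skewed traffic: mostly ordinary cars.
traffic = (
    [Vehicle("car", 1.55)] * 90
    + [Vehicle("car", 1.95)] * 5   # a few unusually wide cars slip through
    + [Vehicle("bus", 2.05)] * 3
    + [Vehicle("lorry", 2.0)] * 2
)

print(confusion(traffic))  # {'tp': 5, 'fp': 5, 'tn': 90, 'fn': 0}
```

With these made-up numbers the trap stops 90 of 95 cars without rejecting a single bus or lorry. The obvious evasion, getting a car wider than the threshold, changes the feature only at considerable cost, which is condition (b) above.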

Unexpected Moves

When AlphaGo played and won against Lee Sedol, it made innovative moves that were not only unexpected by human experts but also not easily understandable for humans. Apparently this shocked and scared some folks.

However, AI coming up with different concepts than humans is nothing new. Consider this article recounting the story of Eurisko, a genetic programming experiment in the late 1970s. This experiment, too, aimed at competing in a tournament; the game played, Traveller TCS, was apparently about designing fleets of ships and letting them fight against each other. Even this early, simple, and small-scale AI thing surprised human observers:

“To the humans in the tournament, the program’s solution to Traveller must have seemed bizarre. Most of the contestants squandered their trillion-credit budgets on fancy weaponry, designing agile fleets of about twenty lightly armored ships, each armed with one enormous gun and numerous beam weapons.”

(G. Johnson: Eurisko, The Computer With A Mind Of Its Own)

Keep in mind that there was nothing scary about the algorithm; it was really just simulated evolution in a rather small design space, and the computer needed some help from its programmers to succeed.

The Eurisko “AI” even rediscovered the concept of outnumbering the enemy instead of overpowering him, a concept humans might associate with Lanchester’s laws of combat attrition:

“Eurisko, however, had judged that defense was more important than offense, that many cheap, invulnerable ships would outlast fleets consisting of a few high-priced, sophisticated vessels. (…) In any single exchange of gunfire, Eurisko would lose more ships than it destroyed, but it had plenty to spare.”

(G. Johnson: Eurisko, The Computer With A Mind Of Its Own)
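Lanchester’s square law shows why numbers can beat quality. In aimed-fire combat each side loses units at a rate proportional to the number of enemy units (times their per-unit effectiveness), so the quantity α·A² − β·B² stays constant and fighting strength grows with the square of the unit count. The sketch below integrates the square law numerically. It is a textbook simplification that ignores armor, range and everything else that made Traveller TCS interesting, and the fleet sizes and kill rates are hypothetical.

```python
import math


def lanchester_square(a0, b0, alpha, beta, dt=1e-3):
    """Integrate Lanchester's square law with explicit Euler steps.

    dA/dt = -beta * B,  dB/dt = -alpha * A
    a0, b0: initial unit counts; alpha, beta: per-unit kill rates of A and B
    (both assumed > 0). Returns the (fractional) survivors once one side is
    annihilated.
    """
    a, b = float(a0), float(b0)
    while a > 0 and b > 0:
        # Simultaneous update: both right-hand sides use the old values.
        a, b = a - beta * b * dt, b - alpha * a * dt
    return max(a, 0.0), max(b, 0.0)


def square_law_survivors(a0, b0, alpha, beta):
    """Closed form from the invariant alpha*A**2 - beta*B**2."""
    k = alpha * a0**2 - beta * b0**2
    if k > 0:
        return math.sqrt(k / alpha), 0.0
    return 0.0, math.sqrt(-k / beta)


# Hypothetical fleets with equal total firepower (100 on each side):
# A has 100 cheap ships of strength 1, B has 20 ships of strength 5.
print(lanchester_square(100, 20, alpha=1.0, beta=5.0))     # A wins, about 89 ships left
print(square_law_survivors(100, 20, alpha=1.0, beta=5.0))  # about (89.4, 0.0)
```

Although both fleets carry the same total firepower, the side with five times as many ships wins with most of its fleet intact, which is the effect the quoted passage describes.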

Although Eurisko’s approach seemed “un-human”, it really was not. Eurisko only ignored all human biases and intuition, making decisions strictly by cold, hard data. This is a common theme in data mining, machine learning, and AI applications. Recommender systems, for example, create and use concepts unlike those a human would apply to the same situation; an article in IEEE Spectrum a couple of years ago (J. A. Konstan, J. Riedl: Deconstructing Recommender Systems) outlined a food recommender example and pointed out that concepts like “salty” would not appear in their models.

Transparency and auditability are surely problems if such technology is being used in critical applications. Whether we should be scared beyond this particular problem remains an open question.

 

(This is a slightly revised version of my G+ post, https://plus.google.com/+SvenT%C3%BCrpe/posts/5QE9KeFKKch)

The Key-Under-the-Doormat Analogy Has a Flaw

The crypto wars are back, and with them the analogy of putting keys under the doormat:

… you can’t build a backdoor into our digital devices that only good guys can use. Just like you can’t put a key under a doormat that only the FBI will ever find.

(Rainey Reitman: An Open Letter to President Obama: This is About Math, Not Politics)

This is only truthy. The problem of distinguishing desirable from undesirable interactions, so as to permit the former and deny the latter, does indeed lie at the heart of any security problem. I have been arguing for years that security is a classification problem; any key management challenge reminds us of it. I have no doubt that designing a crypto backdoor that only law enforcement can use, and only for legitimate purposes, or any sufficiently close approximation, is a problem that will remain far from solved for the foreseeable future.

However, the key-under-the-doormat analogy misrepresents the consequences of not putting keys under the doormat, or at least does not properly explain them. Unlike (idealized) crypto, our houses and apartments are not particularly secure to begin with. Even without finding a key under the doormat, SWAT teams and burglars alike can enter with moderate effort. This allows legitimate law enforcement to take place at the price of some risk of burglary and the like.

Cryptography can be different. Although real-world implementations often have just as many weaknesses as the physical security of our homes, cryptography can create situations where only a backdoor would allow access to plaintext. If all we have is a properly encrypted blob, there is little hope of finding out anything about its plaintext. This does not imply we must have provisions to avoid that situation no matter what the downsides are, but it does contain a valid problem statement: How should we regulate technology that has the potential to reliably deny law enforcement access to certain data?
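The one-time pad is the idealized extreme of this situation and shows why an encrypted blob alone reveals nothing: for any candidate plaintext of the right length there is a key that “decrypts” the ciphertext to exactly that candidate. A minimal sketch, assuming a truly random key that is used only once; practical ciphers such as AES rely on computational hardness instead, but without the key an investigator ends up in a comparable position.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (encryption and decryption alike)."""
    if len(a) != len(b):
        raise ValueError("one-time pad needs a key as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"meet at noon"
key = secrets.token_bytes(len(plaintext))  # random, never reused
ciphertext = xor_bytes(plaintext, key)

# Without the key, every 12-byte message is equally consistent with the
# ciphertext: for any guess there is a key that "decrypts" to exactly it.
guess = b"stay at home"
matching_key = xor_bytes(ciphertext, guess)
print(xor_bytes(ciphertext, matching_key))  # b'stay at home'
```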

The answer will probably remain the same, but acknowledging the problem makes the answer more powerful. The idea that crypto is non-negotiable is fundamentalist and therefore wrong. Crypto must be negotiated, and all objective evidence speaks in favor of strong crypto.

Apple, the FBI, and the Omnipotence Paradox

“Can God create a rock so heavy that He could not lift it?” This is one version of the omnipotence paradox. To make a long story short, omnipotence as a concept leads to logical problems similar to those of the naïve set-of-sets and sets-containing-themselves constructions in Russell’s paradox. Some use this paradox to question religion; others use it to question logic; and pondering such questions generally seems to belong to the realm of philosophy. But the ongoing new round of (civil) crypto wars is bringing a transformed version of this paradox into everyone’s pocket.

Can Apple create an encryption mechanism so strong that even Apple cannot break it? Apple claims so, at least for the particular situation, in their defense against the FBI’s request for help with unlocking a dead terrorist’s iPhone: “As a result of these stronger protections that require data encryption, we are no longer able to use the data extraction process on an iPhone running iOS 8 or later.” Although some residual risk of unknown vulnerabilities remains, this claim seems believable as far as it concerns retroactive circumvention of security defenses. Just as a locksmith can make a lock that will be as hard to break for its maker as for any other locksmith, a gadgetsmith can make gadgets without known backdoors or weaknesses that this gadgetsmith might exploit. This is challenging, but possible.

However, the security of any encryption mechanism hinges on the integrity of key components, such as the encryption algorithm, its implementation, auxiliary functions like key generation and their implementation, and the execution environment of these functions. The maker of a gadget can always weaken it for future access.
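A deliberately weakened key generator illustrates how a maker could prepare for future access without touching the cipher itself. The sketch is hypothetical and does not describe any real product: `weakened_key` derives a key that looks like 256 random bits from a 16-bit seed, so whoever knows the construction can simply try all 65,536 seeds.

```python
import hashlib
import secrets


def strong_key() -> bytes:
    """256 bits from the operating system's CSPRNG."""
    return secrets.token_bytes(32)


def weakened_key(seed: int) -> bytes:
    """Hypothetical sabotaged generator: output looks like a 256-bit key,
    but only 2**16 distinct keys can ever come out of it."""
    return hashlib.sha256(seed.to_bytes(2, "big")).digest()


def recover_seed(key: bytes):
    """Brute force by someone who knows how weakened_key works."""
    for seed in range(2**16):
        if weakened_key(seed) == key:
            return seed
    return None


victim_key = weakened_key(secrets.randbelow(2**16))
found = recover_seed(victim_key)
print(found is not None and weakened_key(found) == victim_key)  # True
```

Both kinds of key have the same length and look equally random; the difference lies entirely in the generator, which is why the integrity of such auxiliary components matters as much as that of the cipher.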

Should Apple be allowed to make and sell devices with security mechanisms so strong that neither Apple nor anyone else can break or circumvent them in the course of legitimate investigations? This is the real question here, and a democratic state based on justice and integrity has established institutions and procedures to come to a decision and enforce it. As long as Apple does not rise above states and governments, they will have to comply with laws and regulations if they are not to become the VW of Silicon Valley.

Thus far we do not understand very well how to design systems that allow legitimate law enforcement access while also keeping data secure against illegitimate access and abuse or excessive use of legitimate means. Perhaps in the end we will have to conclude that too much security would have to be sacrificed for guaranteed law enforcement access, as security experts warn almost in unison, or that a smartphone is too personal a mind extension for anyone to access it without its user’s permission. But this debate we must have: What should the FBI be allowed to access, what would be the design implications of guaranteed access requirements, and which side effects would we need to consider?

For all we know, security experts have a point when they warn against weakening what already breaks more often than not. To expect that companies could stand above the law “because security”, however, is just silly.

PS: Remember Clarke’s first law: “When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”

PPS: Last Week Tonight with John Oliver: Encryption

Special Types of Personal Data

The Federal Data Protection Act (Bundesdatenschutzgesetz, BDSG) singles out certain types of personal data and, in several places, places them under even stricter protection. The definition:

“Special types of personal data means information on racial and ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, health or sex life.” (§ 3 (9) BDSG)

The idea behind this is as simple as it is plausible: data protection not only derives from fundamental rights itself, it is also meant to protect against interference with other fundamental rights.

Angela Merkel at the lectern at a CDU party conference
(Source: CDU/CSU-Bundestagsfraktion, CC-BY-SA, https://commons.wikimedia.org/wiki/File:Cdu_parteitag_dezember_2012_merkel_rede_04.JPG)

What does this mean? Let us take our Federal Chancellor as an example. That she is German, close to the CDU, and professes the Protestant branch of the Christian faith: all of this belongs to the special types of personal data. Depending on the interpretation, these statements might even fall under the BDSG right here on this blog; after all, a blog is something with computers.

Not among the special types of personal data are, for example, her home address or the retained data of her private mobile phone (whose retention, however, has its own legal basis besides the BDSG).

Is this a sensible way of making data protection risk-oriented? For the individual, not necessarily. Which data imply which risks cannot be determined in the individual case from such a coarse classification. In the Chancellor’s case, there would surely be plenty of people who would like to pay her a visit at home (although it is her close protection, not data protection, that guards her against unwanted visits), and when she calls whom would probably interest many more people than her religious denomination or details of her biography. This remains so if we choose a person less in the public eye; then, too, individual risks do not necessarily correspond to the data types defined in the BDSG.

The special types of personal data make more sense if we adopt a collective perspective. What we actually want to achieve is that certain kinds of data processing are not attempted at all or are made very difficult, such as health scoring by employers as a basis for hiring and firing decisions. It is no coincidence that the categories in § 3 (9) BDSG resemble those of the anti-discrimination law AGG (Allgemeines Gleichbehandlungsgesetz).

Societal developments always exert a certain pressure on individuals to conform. The collective perspective therefore belongs in data protection; individual rights are only worth something if one can exercise them in practice without suffering disadvantages. That this perspective nevertheless seems out of place there at first glance (and paternalistic in some of its forms) is due to the strong emphasis on fundamental rights. Data protection presents itself as a fundamental right that I invoke personally. Collectively tending the social climate is a prerequisite for it, but the connection is not obvious.

Eat Less Bread?

“Eat less bread” requests a British poster from WWI. We all know it makes sense, don’t we? Resources become scarce in wartime, so wasting them weakens one’s own position. Yet this kind of advice can be utterly useless: tell a hungry person to eat less bread and you will earn, at best, a blank stare. However reasonable your advice may seem to you and everyone else, a hungry person will be physically and mentally unable to comply.

“Do not call system()” or “Do not read uninitialized memory” request secure coding guides. Such advice is equally useless if directed at a person who lacks the cognitive ability to comply. Cognitive limitations do not mean a person is stupid. We all are limited in our respective ability to process information, and we are more similar to than dissimilar from each other in this regard.

Secure coding guidelines all too often dictate a large set of arbitrary dos and don’ts, but fail to take human factors into account. Do X! Don’t do Y, do Z instead! Each of these recommendations has a sound technical basis; code becomes more secure if everyone follows this advice. However, only some of these recommendations are realistic for programmers to follow. Their sheer number alone should make us doubtful and lead us to expect that only a subset will ever be adopted by a substantial number of programmers.

Some rules are better suited for adoption than others. Programmers often acquire idioms and conventions they perceive as helpful. Using additional parentheses for clarity, for example, even where not strictly necessary, improves readability; and the const == var convention prevents certain defects that are easy to introduce and sometimes hard to debug.

Other rules seem, from a programmer’s point of view, just ridiculous. Why is there a system() function in the first place if programmers are not supposed to use it? And if developers should not read uninitialized memory, what is supposed to warn them when memory has not been initialized? Such advice is inexpensive, and likely ineffective. If we want programmers to write secure code, we must offer them platforms that make secure programming easy and straightforward and insecure programming hard.
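Python’s standard library illustrates what such a platform-level answer to the system() problem can look like. `os.system()` and `subprocess.run(..., shell=True)` hand one string to the shell, which re-parses it, so data can turn into commands; passing an argument list instead never involves a shell at all. A minimal sketch, assuming a POSIX system with `/bin/sh` and an `echo` executable on the `PATH`; the `file_name` value is a hypothetical attacker-controlled input.

```python
import subprocess


def run_with_shell(user_input: str) -> str:
    # system()-style: the shell parses the whole string, including ';'.
    result = subprocess.run("echo " + user_input, shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout


def run_without_shell(user_input: str) -> str:
    # argv-style: user_input reaches echo as one literal argument.
    result = subprocess.run(["echo", user_input],
                            capture_output=True, text=True, check=True)
    return result.stdout


file_name = "report.txt; echo INJECTED"  # hypothetical hostile input
print(run_with_shell(file_name))     # two lines: report.txt, then INJECTED
print(run_without_shell(file_name))  # one line: report.txt; echo INJECTED
```

The safe variant is no harder to write than the unsafe one, so secure behavior comes from the API design rather than from yet another rule in a guideline.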

Security and protection systems guard persons and property against a broad range of hazards, including crime; fire and attendant risks, such as explosion; accidents; disasters; espionage; sabotage; subversion; civil disturbances; bombings (both actual and threatened); and, in some systems, attack by external enemies. Most security and protection systems emphasize certain hazards more than others. In a retail store, for example, the principal security concerns are shoplifting and employee dishonesty (e.g., pilferage, embezzlement, and fraud). A typical set of categories to be protected includes the personal safety of people in the organization, such as employees, customers, or residents; tangible property, such as the plant, equipment, finished products, cash, and securities; and intangible property, such as highly classified national security information or “proprietary” information (e.g., trade secrets) of private organizations. An important distinction between a security and protection system and public services such as police and fire departments is that the former employs means that emphasize passive and preventive measures.

(Encyclopædia Britannica)

Public Lobbying for Electronic Voting

A vendor of digital voting technology explains to us how insecure conventional elections supposedly are and what advantages machine-run elections would bring in the future. To us? Well, more precisely, to the readers of European View, the journal of the Centre for European Studies. That is the think tank of the European People’s Party, the Euro CDU/CSU.

Just read the article and mark every passage that makes you stop and wonder. Done? Then you will find the model answer here and there.