When data protection by design and data subject rights clash

• Data Protection by Design (DPbD), a holistic approach to embedding principles 
in technical and organisational measures undertaken by data controllers, 
building on the notion of Privacy by Design, is now a qualified 
duty in the GDPR. 
• Practitioners have seen DPbD less holistically, instead framing it through 
the confidentiality-focussed lens of Privacy Enhancing Technologies 
(PETs). 
• While focussing primarily on confidentiality risk, we show that some 
DPbD strategies deployed by large data controllers result in personal data 
which, despite remaining clearly reidentifiable by a capable adversary, 
make it difficult for the controller to grant data subjects rights (eg access, 
erasure, objection) over for the purposes of managing this risk. 
• Informed by case studies of Apple’s Siri voice assistant and Transport for 
London’s Wi-Fi analytics, we suggest three main ways to make deployed 
DPbD more accountable and data subject–centric: building parallel systems 
to fulfil rights, including dealing with volunteered data; making inevitable 
trade-offs more explicit and transparent through Data Protection 
Impact Assessments; and through ex ante and ex post information rights 
(arts 13–15), which we argue may require the provision of information 
concerning DPbD trade-offs. 
• Despite steep technical hurdles, we call both for researchers in PETs to 
develop rigorous techniques to balance privacy-as-control with privacyas-confidentiality, 
and for DPAs to consider tailoring guidance and future 
frameworks to better oversee the trade-offs being made by primarily wellintentioned 
data controllers employing DPbD.


Introduction
Data protection law has historically faced significant enforcement challenges.Data protection authorities (DPAs) have classically been underfunded and outgunned, possessing limited ability to scrutinize the onthe-ground practices of data controllers and restricted capacity to meaningfully act when transgressions are suspected.In response to these governance challenges, concerned communities have advocated a range of technological approaches that allow effective but noninvasive use of data, or 'DIY' protections which data subjects can adopt unilaterally. 1hese approaches, often called 'privacy-enhancing technologies' (PETs), are commonly discussed in regulatory circles within the context of 'privacy by design' (PbD).PbD emphasizes that issues of privacy should be considered from the start and throughout the design process through creative social and technical means.Most point to its intellectual home in a report undertaken by the Dutch Data Protection Authority and TNO, with support of the then Information and Privacy Commissioner for Ontario, Tom Wright, 2 although its heritage can be traced further back to the considerations given to 'technical and organizational measures' in the

Key Points
Data protection by design (DPbD), a holistic approach to embedding principles in technical and organizational measures undertaken by data controllers, building on the notion of Privacy by Design, is now a qualified duty in the GDPR.
Practitioners have seen DPbD less holistically, instead framing it through the confidentialityfocussed lens of privacy enhancing technologies (PETs).
We show that some confidentiality-focussed DPbD strategies used by large data controllers leave data reidentifiable by capable adversaries while heavily limiting controllers' ability to provide data subject rights, such as access, erasure and objection, to manage this risk.
Informed by case studies of Apple's Siri voice assistant and Transport for London's Wi-Fi analytics, we suggest three main ways to make deployed DPbD more accountable and data subject-centric: building parallel systems to fulfil rights, including dealing with volunteered data; making inevitable trade-offs more explicit and transparent through Data Protection Impact Assessments; and through ex ante and ex post information rights (Articles 13-15), which we argue may require the provision of information concerning DPbD trade-offs.
Despite steep technical hurdles, we call both for researchers in PETs to develop rigorous techniques to balance privacy-as-control with privacy-asconfidentiality, and for DPAs to consider tailoring guidance and future frameworks to better oversee the trade-offs being made by primarily wellintentioned data controllers employing DPbD.data protection directive (DPD) 3 and in the national and regional laws that preceded it. 4The term PbD entered use around 2000, with the Workshop on Freedom and Privacy by Design at the Computers, Freedom and Privacy 2000 conference in Toronto, 5 and a variety of papers made use of the term around that time. 6As laid out by the Information and Privacy Commissioner for Ontario from 1998-2014, Ann Cavoukian, PbD is not simply a set of organizational and technical measures to prevent information disclosure, but maps more broadly onto a wider idea of privacy as represented by the Fair Information Practices (FIPs) and even extends beyond them, aiming at a 'significant "raising" of the bar in the area of privacy protection'. 7hile recommendations of PbD by regulators have significant history, 8 the concept has only recently made it onto the statute books in Europe as part of the General Data Protection Regulation (GDPR). 9In doing so, it underwent a shrewd transformation into 'data protection by design' (DPbD).This metamorphosis, which some scholars have commented on as wise, 10 makes it clear that the aim is to ensure privacy as enshrined in data protection rights and principles, rather than the flexible, multi-layered and hard to pin down concept of privacy in general. 11While the European Commission has historically referred to the two concepts synonymously, 12 the focus on DPbD alone provides scope for further clarity.Lee Bygrave summarizes that DPbD requirements, as now enshrined in Article 25 of the GDPR (and also in Article 20 of the Law Enforcement DP Directive), 13 impose a 'qualified duty on controllers to put in place technical and organisational measures that are designed to implement data protection principles effectively and to integrate necessary safeguards into the processing of personal data so that such processing will meet the Regulation's requirements and otherwise ensure protection of data subjects' rights'. 14The relevant article, Article 25 of the GDPR, reads: 1. Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural persons posed by the processing, the controller shall, both at the time of the determination of the means for processing and at the time of the processing itself, implement appropriate technical and organisational measures, such as pseudonymization, which are designed to implement data-protection principles, such as data minimization, in an effective manner and to integrate the necessary safeguards into the processing in order to meet the requirements of this Regulation and protect the rights of data subjects.2. The controller shall implement appropriate technical and organizational measures for ensuring that, by default, only personal data which are necessary for each specific purpose of the processing are processed.That obligation applies to the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility.In particular, such measures shall ensure that by default personal data are not made accessible without the individual's intervention to an indefinite number of natural persons.

An approved certification mechanism pursuant to
Article 42 may be used as an element to demonstrate compliance with the requirements set out in paragraphs 1 and 2 of this article.What are these principles that technical and organizational measures should take aim at?They are found primarily in Article 5(1): lawful, fair and transparent processing; purpose limitation; data minimization; accuracy; storage limitation; and integrity and confidentiality.Article 5(2) introduces a further, additional overarching principle to the GDPR, 'accountability', laying the burden of proof on the controller to prove compliance with the six principles in Article 5(1).Yet, in contrast to these wide-ranging principles, within which reside the rights and obligations the legislation details, the PETs literature takes relatively singleminded aim at information disclosure.It focusses in particular on guarantees rooted in either information theory or the computational "hardness" of the resultant re-identification or disclosure problem. 15Despite attempts in related literature on complementary approaches to coin terms such as 'transparency-enhancing technologies' and 'profile transparency by design', the PET paradigm has dominated the 'by design' discussion in data protection contexts. 16Unlike the data protection paradigm, which has increasingly shifted to placing accountability obligations upon data controllers in an effort to make them trusted custodians of personal data, the PET paradigm departs from a 'diametrically opposed perception', not of the data controller as a trusted third party, but as an adversary. 17In a similar vein, recent taxonomies of privacy-enhancing technologies claiming to be 'comprehensive' consider privacy primarily in terms of disclosure risks present at different levels, rather than in terms of the multi-faceted nature of privacy espoused by Cavoukian and the European Commission. 18his notion of privacy-as-confidentiality sits at least apart from, and potentially at tension with, the notion of privacy-as-control as espoused by the FIPs and the GDPR. 19As the Article 29 Working Party notes, PbD incorporates rights such as erasure, noting that 'functionality should be included facilitating the data subjects' right to revoke consent, with subsequent data erasure in all servers involved (including proxies and mirroring).'They note that in addition to data confidentiality, 'controllability', 'transparency', 'data minimization', and 'user friendly systems' should be considered under the PbD umbrella. 20espite these clarifications by regulators, the renaming of the term to emphasize its focus, and the commentary in the literature on the wide array of protection goals that privacy engineering should have, 21 'privacy by design' in practice is often a narrower affair.Where data are of high dimensionality (where they have many distinct variables), many PbD approaches aimed at the 'unlinkability' of data 22 will inevitably fail to prevent information disclosure where faced with a capable adversary.This does not mean that PETs cannot be used to minimize or reduce risk in this way, but we argue that this minimization comes at a cost.That cost can be, as we demonstrate with case studies, the effective ability to wield data protection rights-the 'intervenability' promoted by privacy-as-control-over such data. 23The important rights of access and portability (Articles 15, 20), erasure (Articles 17), and the right to object to processing (Article 21) suffer in particular as a result.
There is a danger that data controllers implement privacy design strategies 24 that leave them with data that is difficult for them to re-identify, but far from trivial for an adversary to, given that adversaries likely have a high tolerance for inaccuracy and access to many additional, possibly illegal, databases to triangulate individuals with.The situation is worsened by the fact that a data controller may have relatively little technical reidentification capacity, while also having a very low tolerance for inaccuracy when it comes to their provision of core data protection rights, such as access or erasure.Indeed, to erroneously provide a data subject sensitive personal data of another in response to a subject access request would usually be in breach of the same law the controller would be seeking to comply with.These controllers, some of which we will illustrate below in case studies, have bound their hands in a very particular way.Their actions have reduced their own data protection obligations and shifted a risk onto the data subject, who has been stripped of her ability to manage the risk herself.When the data subject concerned loses trust in a previously trusted controller, there is nothing she can do but wait for a breach and hope that her record is unable to be effectively triangulated.
We do not intend to suggest that this is a deliberate tactic by the data controllers in our case studies (even though it might be an effective one).However, it does not need to be deliberate to be problematic.Trade-offs are a natural part of all complex decision-making, and the need to make them clearly rather than implicitly is a core component of good decision-making in valueladen contexts. 25Where there are very few organizational or technical measures supporting data protection deployed, DPbD is likely to benefit everyone.But where basic safeguards are already in place, satisfying everyone and their varying privacy preferences26 may become more difficult, as 'privacy' is no longer a case of Paretoimprovement (under which it can masquerade as a unified concept), but requires choosing a certain approach (eg confidentiality) to the detriment of others (eg control).Thinking in terms of data protection rights and obligations as we do in this article can make this challenge clearer: achieving one makes it more difficult, or even impossible, to achieve others.Not engaging with these trade-offs does not make them disappear, it simply means they have been determined in an arbitrary fashion.In this article, we do however present some vignettes which indicate that certain controllers do pursue an interpretation of these provisions, deliberately or not, which is unfavourable to the effective exercise of data subject rights.
Deliberate or not, these implicit trade-offs are not even contemplated by pre-emptive provisions in data protection law, such as data protection impact assessments (DPIAs).We believe that there are indeed grounds in the GDPR to support more consideration and transparency regarding the way these trade-offs are determined and communicates-and it is important we identify and use them in this way-but it requires new readings of many of the relevant obligations which this article aims to provide.Firstly, however, we turn to real-world case studies to explore this concern in context.

WiFi analytics on the London Underground
Between 21 November and 19 December 2016, Transport for London (TfL), the public transit agency for the UK's capital, ran an in-house trial using the WiFi networks installed at 54 of the stations they manage.They collected more than 500 million connection requests from devices passively transmitting their MAC addresses, with the aim of improving (i) customer information for journey planning and congestion; (ii) management of events and disruption; (iii) timetable planning and station upgrades; (iv) retail unit and advertising positioning. 27ransport for London, unlike many undertaking WiFi analytics, 28 were aware of legal obligations in this area, data protection in particular.TfL undertook a data protection impact assessment (DPIA) and met with the UK's DPA, the Information Commissioner's Office (ICO). 29They cite the ICO's WiFi Analytics Guidance 30 in their use of salting MAC addresses to make reidentification on the basis of device hardware data highly challenging for an attacker.In consultation with the ICO, users were informed using a 'layered approach', which included a press release picked up by the media, a news story on 21 November in the Metro (a free morning newspaper widely distributed and read on London transport), a linked website (tfl.gov.uk/privacy)adapted throughout the trial on the basis of feedback with users, 300 large posters on platforms and at station entrances, through social media and through briefings packs issued to station staff and stakeholder organizations. 31s location data is high dimensional, it is highly likely to be unique and easy to re-identify.A now classic study showed that only four spatiotemporal points are needed to single out the vast majority of individuals in a dataset, even where records are rendered significantly coarser (something that often heavily diminishes the data's utility). 32Unsurprisingly, TfL are therefore uncomfortable with releasing the dataset, refusing it on privacy grounds when requested under Freedom of Information law.They note, correctly, that: Although the MAC address data has been pseudonymised [. ..] given the possibility that the pseudonymised data could, if it was matched against other data sets, in certain circumstances enable the identification of an individual, it is personal data.The likelihood of this identification of an individual occurring would be increased by a disclosure of the data into the public domain, which would increase the range of other data sets against which it could be matched. 33me concerns have been raised over the nature of the 'salt' added to the MAC address or other identifier to generate the string to be hashed. 34While the ICO recommends that a salt be changed after 'a short period of time', 35 and the Article 29 Working Party recommends that a unique device identifier should only be stored 'for a maximum period of 24 hours for operational purposes', 36 it appears that TfL used a constant salt, generated by once typing letters at random on the keyboard with averted eyes. 37Such an approach creates two risks.Firstly, anyone who knew or discovered this salt could reverse engineer the process.Secondly, and arguably more probably, a constant salt links devices across days, making attacks not aimed at cryptography but based on external sources of data, such as knowing where someone was at four particular times in a week, more feasible.
One approach would seek to make extra efforts to de-identify the held data.The main way to make data more difficult to re-identify would be to give records more frequently-changing, difficult-to-reverse hashed identifiers.But this would likely be unacceptable for some controllers, as it makes the purpose of the analysis they seek to undertake difficult to fulfil, and so data subjects might suspect that data controllers would wish to transform the data in this way.For example, it would preclude the use of analysis to understand longitudinal patterns in data, restricting them only to what can be learned in a snapshot of time.This is far from the logic of the A/B testing style trials favoured in both industry and policy circles right now. 38et, another approach sits on the side of the data subject, rather than the controller.More specifically, it sits with capabilities and behaviours of the hardware used.Much of data protection law aims to build trust in data controllers as responsible stewards of sensitive information.Yet, proponents of personal PETs take what some may consider as a contrasting, comparatively dismal view of the world-a gloomy planet where every other actor is a potential adversary that wants to do harm to them with their data-and as such seek to adopt technical practices in order to minimize the information that any third party can learn about them.These practices are increasingly popular with some software and hardware producers.Apple's portable devices include MAC address randomization, which seeks to foil third parties working to build a longitudinal record of a particular device's network scanning activity.Some Android devices utilize this, although many manufacturers, such as Samsung, do not support or practice it. 39his has a similar, although not identical, 40 effect to regularly changing the salt, and serves to make persistent tracking harder.Yet, even with these approaches enabled, researchers are consistently finding ways, both statistical and based on technical implementation or other features of smartphones, to link individuals across contexts. 41As TfL recognize, despite protections placed at either the controller side or the device side, such data is not safe from re-identification attacks.
Given this risk of reidentification, particularly from adversaries were data to leak, does a data subject not have a right to understand the data that is being collected about them, and utilize their rights, such as the right to object to processing, or the right to erase data relating to them?It is not difficult to imagine a situation where a previously trusted data controller now loses trust, either to be a well-intentioned custodian of data, or to be capable of keeping it confidential with high certainty. 42While a data subject may well wish to do so, these protections, whilst not fully mitigating any risk, do effectively remove the ability of data controllers to provide the full range of data protection rights usually afforded to data subjects.Indeed, TfL note: The salt is not known by any individual and was destroyed on the day the data collection ended.Therefore, we consider the data to be anonymous and are unable to identify any specific device.As we cannot process known MAC addresses in the same manner as we did in the pilot, we are unable to complete any Subject Access Request for the data we collected. 43re TfL to attempt this, they would find that in the cases of some hardware, the difficulty would be compounded by the device MAC randomization practices described above.In particular, while devices can be identified with acceptable levels of accuracy for an attacker,44 the levels of identification achieved would be insufficient for providing guaranteed and comprehensive erasure, or accurate access (including avoiding divulging information about others). 45This reduces the protection afforded by law to the security provisions in the GDPR, as well as the trust in the controller to adhere to the principle of purpose limitation, giving the data subject little-to-no control over the data observed about them after the fact.
As mentioned, beyond subject access requests, another provision in the GDPR relates to the right to object to processing. 46Where the legitimate or public interest grounds are relied upon, data subjects should be 'entitled to object to the processing of any personal data' unless the controller can 'demonstrate that its compelling legitimate interest overrides the interests or the fundamental rights and freedoms of the data subjects'. 47This manifests as an 'opt-out' provision, recommended in relation to 'big data' analytics grounded in legitimate interests by both the ICO and the European Data Protection Supervisor (EDPS). 48Opt-outs from Wi-Fi analytics in particular feature in the ICO's guidance on the matter, 49 although whether they are mandatory under European law is unclear. 50This may yet change in the proposed updated ePrivacy regulation (at the time of writing entering trialogue negotiations), which has been amended to require opt-outs when WiFi analytics have been used. 51The Dutch Data Protection authority stopped short of mandating WiFi tracking firm Bluetrace to be required to offer opt-outs, instead settling for the company to undertake research into their technical feasibility, and pointing them towards opt-out registers being developed by Dutch civil society organizations. 52To the authors' knowledge, the company never did so, having instead opted to cease WiFi analytics entirely, its business model being incompatible with the requirements of the regulator. 53n addition, the 'Mobile Location Analytics Code of Conduct' proposed by the Future of Privacy Forum (FPF) has opting-out as one of its principles, noting that the option should be available on the website of an operator. 54Indeed, FPF themselves run an opt-out service which partners with some organisations selling Wi-Fi tracking technologies to provide a global opt-out list (https://optout.smart-places.org/).FPF note on their website, however, that Owners of iOS 8 devices that wish to opt-out of Mobile Location Analytics can still do so by visiting the Smart Store Privacy Opt Out Page.However, since this opt out works by recognizing the MAC address of an opted-out device, in the case of iOS 8 devices, any such opt out will be reset when the device's MAC address changes. 55is highlights another rights issue-that the Privacy by Design approach taken in the development of Apple devices, among others 56 prevents effective opting out without necessarily providing effective privacy.The ambient environment, much of which is rightfully untrusted, as anybody could silently set up a device capturing MAC addresses, leads hardware providers to make a value choice for data subjects.Whether opting out is possible given MAC randomization is a research question in and of itself.Legally, enforceable Do Not Track signals may be required-something which raise many issue in and of themselves that we do not seek to unpack in this article, suffice to say that they would require unprecedented coordination between the manufacturers of wireless tracking systems and those of mobile devices. 57

Apple's 'Siri' voice assistant
Voice assistants are commonplace in a range of devices.Typically, these systems, including Microsoft's Cortana, Google's Assistant and Apple's Siri, work by recording and compressing audio data, processing it for transcription on the company's servers, and returning the transcript to the phone, where a local speech synthesis system may 'reply' to the user.The use of this approach has allowed unprecedented accuracy in speech recognition, as well as avoiding energy, resource and space-intensive processing on the terminal device.Many people use these technologies to activate device functionalities, or to dictate messages or documents.
Firms may provide the recording data to data subjects.Google, for example, provide a tool where voice and audio data can be searched and managed. 58These can be seen as meeting their access obligations under European data protection law, although unlike many implementations of access rights, there does not appear to be a difference in these tools inside or outside the USA.
Other firms, notably Apple, despite providing a nearidentical service to their competitors in this regard, do not provide these data to data subjects automatically, nor do they provide such data upon explicit request under the Irish Data Protection Acts. 59In correspondence with one of the authors, they cite privacy-by-design as the reason for this.Apple's notion of Privacy by Design in relation to voice assistant data seems to hinge on three aspects.
Firstly, Apple claims that voice identifier data is divorced from the usual identifiers that Apple users are familiar with.While Google users log-in with their account details, under which all their voice data are then listed, Apple generate device-specific identifiers that are separate from these identities.
When Siri is turned on, the device creates random identifiers for use with the voice recognition and Siri servers.These identifiers are used only within Siri and are utilized to improve the service.If Siri is subsequently turned off, the device will generate a new random identifier to be used if Siri is turned back on. 60vertheless, these are persistent identifiers.It appears that if the user never disables Siri in the device's settings, as we might expect few users to do rather than simply opting not to use it, the identifier persists throughout the lifetime of the device.Apple claimed in correspondence that they do not have a technical means to access the Siri identifier on the device, nor to search the data by identifier, as they have chosen not to build one. 61econdly, Apple claim that data usually have their linked identifiers scrubbed, and are eventually deleted after certain times have elapsed: User voice recordings are saved for a six-month period so that the recognition system can utilize them to better understand the user's voice.After six months, another copy is saved, without its identifier, for use by Apple in improving and developing Siri for up to two years.A small sub-set of recordings, transcripts and associated data without identifiers may continue to be used by Apple for ongoing improvement and quality assurance of Siri beyond two years.Additionally, some recordings that reference music, sports teams and players, and businesses or points of interest are similarly saved for purposes of improving Siri. 62irdly, Apple claim that while Siri is able to recognize your name, it does this by sending such details from your phone each time Siri is used, until such a time where it has not been used for ten minutes, upon which it is deleted from the remote server. 63

Issues with this conception of privacy by design
Upon first glance, the above may seem like privacypromoting design features.Yet, there are significant conceptual flaws with each, as well as the entire system, that means while Apple currently find it difficult to access this data, re-identification would be possible, if not relatively trivial in some cases.
Firstly, refusing to build a database retrieval tool is no basis on which to refuse data subject rights.Retrieval is generally a standard feature of database systems.Indeed, it is arguably their very purpose.In most cases, data controllers would have to proactively modify their systems in order to remove such functionality from standard database software. 64econdly, refusing to make the device identifier accessible to the data subject through the design of the software while still enabling it to be transmitted regularly to the data controller serves little practical purpose other than obstructing the data subject's ability to verify it is indeed them requesting the data. 65Indeed, this seems to be doing more to stand in the way of data protection rights than provide privacy by design.Recital 30 is quite clear that such identifiers would be considered associated with a natural person, noting that 'online identifiers provided by their devices', including those provided by RFID (which, being imperceptible, are similarly inaccessible to the average data subject), may either directly enable profiling or identification, or may do so indirectly, such as in combination 'with other information received by the servers'.
Thirdly, while Apple note that they do not permanently save the name you provide on the server, they do save many kinds of information of similar or even greater use in re-identification alongside your identifier.Indeed, Apple note that because it is onerous to send details such as relationships with family members, reminders, and playlists to the server each time a Siri session is started (and would likely introduce unwanted lag and/or data use), they send those initially, and store them there.Even if we were to accept that a device specific identifier was not personal data (despite the rulings surrounding MAC addresses and even dynamic IP addresses), a list of their contacts and their relations to you is relatively trivial even for non-experts to use to re-identify individuals by using easily accessible data sources, like social media.It seems similarly likely that simple re-identification attacks could be formulated against things such as reminders, particularly as they often mention the names of organizations or individuals.
Fourthly, a significant body of research has demonstrated that individuals can be re-identified and clustered by voiceprints alone, which have such reidentification potential that they are being used and proposed for biometric authentication. 66Apple themselves even possess several patents in this area from their own in-house research activities. 67Even based on text transcripts without the voice data, researchers have demonstrated attacks that can re-identify or cluster individuals stylometrically, based on the words and grammar they use. 68ompounding this, it is not just how things are said, but what is being said.Sensitive data can be said and held in textual form.How to redact terms that might disclose sensitive data is an active field of research. 69his is very challenging even when the forms of text are relatively standardized, such as in medical documents 70 -standardization not present in messages or other spoken interactions.As a recent review notes, '[g]eneral-purpose privacy solutions for plain text are scarce and they only focus on the protection of sensitive terms, which are assumed to be manually identified beforehand'. 71These systems have not been developed with conversation transcripts in mind; it is unclear that there are effective privacy mechanisms in place here that would defend against re-identification.Furthermore, sensitive data is likely to be recorded, including special categories of data under the GDPR, such as political opinions.Without guarantees that private and reidentifiable parts of a conversation have been redacted, which seem technically difficult, if not currently impossible, to provide, little assurance can be given.

Breaches of data which identify a user but do not contain contact information
Many of the most high-profile data breaches concern data which is conventionally personally identifying, such as full names, home or email addresses, or telephone numbers.In such cases, contacting the individuals affected is relatively straightforward since the controller holds relevant details which would enable a communication channel.However such data types needn't be involved for significant negative effects to occur.As discussed in section 'Case studies of rights lost in the balance', there might be unique identifiers from an end-user's device such as a MAC address, IMEI number, and device-generated advertising IDs.Other examples of data sources that would allow a data subject to be identified but not easily communicated with include high-dimensional data, such as web browsing history or lists of available plugins. 72Web tracking is a common practice, often using browser fingerprints rather than an explicit, provided identifier.Data such as these have, unsurprisingly, been shown to contain sensitive insights.Facebook has in the past used such approaches to profile individuals by 'ethnic affinity', for example. 73These high-dimensional data are also often highly identifying.One 2010 study found that 83.6% of users visiting a website had a unique device fingerprint, with an additional 5.3% sharing their fingerprint with just one other record.A breach involving web browsing data could enable an attacker to single out an individual's entire browsing history based on supplementary data about just a few pages they visited, an attack demonstrated by researchers on German members of parliament, judges and other public figures. 74Other devices and modalities than the web are similarly vulnerable.For example, researchers have been able to identify individuals from their gait using just the gyroscopic sensors on a phone for over ten years. 75The resulting societal situation has meant we are seeing growing instances of high dimensional datasets covering many data subjects' activities that are capable of being mined for their sensitive information, but lacking straightforward nominal identifiers or contact information.
What are the consequences of this?Imagine a breach of such data.What would be required to occur?A notification to the data protection authority under Article 33(1) GDPR would be required.But what about a notification to the individuals whose data has been accessed?That falls under Article 34, which states that '[w]hen the personal data breach is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall communicate the personal data breach to the data subject without undue delay'.This is the case unless there were appropriate technical and organizational protection measures that 'render the personal data unintelligible to any person who is not authorised to access it'; subsequent measures to mitigate the risk had been taken; or it would involve 'disproportionate effort', in which case a 'public communication or similar measure' could be substituted.
Firstly, in some cases an argument could be made that the personal data were unintelligible, and therefore they do not trigger individual notification requirements under Article 34.This might indeed be the case where data were suitably encrypted with state-of-the-art technologies.It is important, however, to draw a distinction between unintelligibility owing to encryption, and merely not being able to associate a record with a particular individual without supplementary data points.What may seem unintelligible, and therefore nonidentifiable from the data controller's perspective, might in fact be identifiable with access to minimal additional data, as demonstrated in the aforementioned attack on the browsing habits of German public figures.As attackers will likely hold different and usually illicitly acquired, personal datasets not available to the breached data controller, this distinction becomes highly important to consider.
Secondly comes the thornier topic of the 'disproportionate effort' that may be required to communicate to individuals.Data such as cookies, browser fingerprints, or device-specific identifiers are not traditional means of identifying someone for the purposes of communicating with them.However, they are frequently assigned and collected by behavioural advertising networks precisely to 'communicate' with individuals through the specific medium of in-browser or in-app advertisements.In a lighter vein, some well-known internet pranks involve buying eerily targeted adverts aimed at individual friends, intending to unsettle them. 76Such examples demonstrate that even without personal contact details for traditional communication channels, such organizations may have ways to contact their data subjects despite lacking traditional contact details.
Thus, one option for such organizations to communicate breaches to affected individuals in a manner that could be deemed proportionate might even be to purchase advertising space, using the same data to target them once more, in order to tell them their data had been breached.Particularly when it comes to shadowy data brokers that the majority of data subjects are unaware of the identity of, let alone details of their practices or how to contact them, a 'public communication' would be unlikely to be effective, not least because these companies have purposefully never developed channels or a capacity to communicate with data subjects.
Another option would be to facilitate communication via those service providers who do have the capacity to link non-traditional identifiers with traditional communication channels, and who could facilitate the communication of a data breach from an organization to the affected individual.Selecting an appropriate intermediary would depend on the context, but for instance, a device manufacturer/operating system provider like Apple or Google can readily link a device ID to an email address, and thus allow an ad network who identifies users by device IDs to communicate a breach to the affected individuals by email.Another example might be a cellphone network service provider who can easily link SIM numbers to IMEI numbers and thus facilitate breach notifications via SMS or phone call.
While the viability of various breach notification measures is highly context-dependent, these examples demonstrate another way in which privacy-asconfidentiality is in tension with data protection principles-in this case, transparency regarding breach notifications.Technical choices which promote privacyas-unlinkability (eg not being able to associate highdimensional browsing data to a relevant contact address), could end up denying data subjects the right to know about breaches which have a high risk of affecting them.

Putting the data protection in DPbD
If DPbD risks taking away rights, as in the cases illustrated above, how might we rectify or ameliorate this?Here, we propose some approaches that might help do this, assess their possibilities and pitfalls, and place them in legal context.

Parallel systems to fulfil DP rights
One set of options would be to maintain parallel systems with the explicit purpose of upholding these rights.Here, we outline two main types of these systems in legal and technical context: systems designed to retain data to provide access and better enable erasure and objection, and systems designed to process additional data, which may be provided by the data subject, to make re-identification possible.

Obligations to retain data
In College van burgemeester en wethouders van Rotterdam v MEE Rijkeboer (hereafter Rijkeboer), 77 the CJEU was referred a question relating to a case where Mr Rijkeboer, a Dutch citizen, asked the Mayor and Executive Board of Rotterdam to provide him with details of the third parties to which any information relating to him held by the municipality had been communicated to.In Mr Rijkeboer's case, the data controller had replied positively but partially, providing only information relating to the previous year, as Dutch law and practice provided that the data from the year preceding had been wiped.The question to the court was whether, in the absence of a timeframe provided within the access rights of Article 12 of the DPD, Member States could impose deletion of such datawhich was in a sense metadata about the data held on Mr Rijkeboer 78 -after a certain period of timemeaning that such access rights could not refer to data outside this time period.
The Mayor and Executive Board of Rotterdam, the United Kingdom, the Czech, Spanish, and Dutch governments submitted that the right of access 'exists only in the present and not in the past', while the Greek government and the European Commission submitted that it applies not only to the present but also extends into the past. 79The court ruled that such a right must necessarily relate to the past to ensure the practical effect of access, erasure and rectification provisions, 80 that the exact time limitation was up to further Member State rule-making, but that a period of one year alone does 'not constitute a fair balance of the interest and obligation at issue' unless it can be shown that anything longer would lead to an 'excessive burden' on the controller. 81Indeed, any time limit upon this metadata should constitute 'a fair balance between, on the one hand, the interest of the data subject in protecting his privacy, in particular by way of his rights to object and to bring legal proceedings and, on the other, the burden which the obligation to store that information represents for the controller'. 82is ruling is pertinent to the current discussion, as it directly places the question of data subjects' rights against what the court described as the 'burden' that data storage places on the controller-a burden which consists increasingly of securing this data against adversaries, rather than just the simple cost of storage media.The Court acknowledged that this was a trade-off that the DPD did not contemplate explicitly; the same can be said of the GDPR.
Another relevant point from this ruling is the distinction made by Court between two types of data in light of the right of access. 83Firstly, that the 'basic data', used for the functionality of local service provision, was being stored for a longer period than the data regarding the transfers (which of course may be sensitive to the data subject), was noted to be a source of the unfair balance that had been struck by the Rotterdam Mayor and Executive Board. 84Put differently, one could say the controller adopted a different retention policy for 'content data' (ie the actual personal data such as individuals' names) as opposed to 'metadata' (eg information relating to how the personal data was used and its source).This has an interesting, although not exact, parallel to some alleged PETs.In these technologies, we can also distinguish between different types of data; the full, potentially identifiable data collected, and the transformed data which is now more difficult to link to data subjects.The former is erased after a certain timeframe, 85 often at the time that it is transformed into the latter for retention.Is this erasure a "fair balance"?
Distinguishing different types of information for the purposes of data management is a common industry practice.Yet, the ways in which data are classified within organizations do not always have neat analogues the legal framework.Take the Siri identifiers.Siri identifiers clearly single out a data subject, as they are persistent identifiers that link to a device typically only used by one person.One of the major purposes of this system is to deliver a personalized voice assistant to a data subject.As a result, Article 11(1) GDPR, which relieves data controllers from having to 'maintain, acquire or process additional information in order to identify the data ARTICLE Downloaded from https://academic.oup.com/idpl/article-abstract/8/2/105/4960902 by Said Business School user on 04 July 2018 subject for the sole purpose of complying' with the Regulation (notably accommodating data subject rights), does not apply.And even if it were, Article 11(2) still enables data subjects to have their data subject rights accommodated upon providing additional information that does allow the controller to (re-) identify them.However, the Regulation does not seem to contemplate that technological developments have allowed identification of a data subject for the purpose of service delivery and data processing, but not for the purposes of data access.
In this case, how might a data controller use a parallel system to augment the identification process being undertaken for service delivery to also allow data access?Apple IDs could be stored alongside Siri identifiers in a separate database.While Recital 64 of the GDPR does note that a controller 'should not retain personal data for the sole purpose of being able to react to potential requests', here the controller already holds both sets of personal data, and only needs to establish a link between them.Asking controllers to purposively make it difficult to consistently find a data subject across many datasets held seems problematic in light of the practical challenges of GDPR implementation.A mechanism could also be implemented on the device to obtain the identifiers used.The core question that relates to these approaches is of security.A centralized list in the first case presents a significantly heightened re-identification risk were attackers to gain access to this data.Both options jeopardize what one might suspect to be among Apple's deeper aims-to claim their hands are tied when faced with law enforcement or intelligence services requests, as they have done publicly before. 86hile these approaches rely on burying data in a haystack, it may also be possible for Apple to provide this data to users in a form only they can access, using encryption techniques.In this case, the data controller would not be retaining the data in a form they could access, but instead providing portability from the outset.Indeed, users might find it useful to have a repository of speech data and transcripts in order to quickly train any new system, were they to change providers.If data protection by design means access and portability by design, there are feasible design solutions that could form part of the strategy from the outset.This may also allow a user the effective right of erasure; the data they hold could be automatically compared with the de-identified database, and the matches removed.
Just as Rijkeboer made the data controller have to rethink their data retention process, it seems feasible that future rulings could also take aim further upstream, at the "fair balances" being struck in the design process.
Acquiring additional information (Articles 11; 12(2) and Recital 57, GDPR) Re-identifying data to an acceptable percentage of certainty to exercise data protection rights may be difficult in practice for any data controller practicing certain types of DPbD, regardless of intention.The GDPR recognizes this in Articles 11 and 12(2), which exempt the data controller from having to accommodate data subject rights if it can demonstrate it is not in a position to identify the data subject.Article 11(2) however, grants data subjects the ability to provide additional data to enable such (re-)identification, something not every data subject might be inclined to do. 87The final call though, seems to be in the hands of the controller.Pursuant to Article 12(2) in fine, the data controller still has an opportunity to demonstrate not being in a position to (re-)identify, even after being provided with additional information by the data subject.
Having said all that, it would still require a considerable burden of proof to adequately demonstrate reidentification is not possible, even despite additional information being provided by the data subject.This does not only appear from the GDPR's general emphasis on accountability and weightier focus on data controllers' responsibilities, but also manifests itself through Recital 57.This Recital notes that while the data controller 'should not be obliged to acquire additional information in order to identify the data subject' to comply with the regulation, they 'should not refuse to take additional information provided by the data subject in order to support the exercise or his or her rights'. 88Taking a step back, it is of course important to emphasize that an (alleged) inability to fully accommodate data subject rights cannot be exploited to evade data protection law altogether, and that all other provisions (notably those in Article 5 and 6) still apply in full. 89n unanswered question remains-such acquired or volunteered additional information still requires a reidentification process that while very possible, may not be straightforward to the data controller to undertake.Indeed, data controllers may not have expertise in this space, particularly when it does not form part of their core processing activities.While Recital 57 shines little light on this, Recital 26 provides some guidance, suggesting that factors such as cost, time, available and emerging technology should be taken into consideration. 90his provides an interesting avenue for a policy intervention-a possibly controversial one-that could support data subjects' rights.While theoretical attacks for re-identification are often possible, and would likely undermine the privacy-by-design approaches taken above, there is a valid argument about whether these technologies are 'available' in the context of Recital 26.While they might be available to attackers, and traded, like stolen data, on shady online markets, this creates an imbalance between the deployable technologies available to data controllers and those available to their adversaries.Is there an obligation on data controllers to develop (or to procure from security companies) state-of-theart re-identification tools in order to make data subject rights possible?
There is a parallel here with other examples, albeit not all in the EU, in which certain individuals are owed redress by an organization who lacks the means of identifying them.After a U.S. financial lender, Ally Financial, was found to have racially discriminatory car loan pricing, they were ordered to used census data to estimate which of their borrowers were Black, Hispanic, or Asian in order to (imperfectly) identify the rightful recipients of compensation. 91Similar efforts might be beneficial in the case of data breaches, where publicly available information could be mined in order to identify a means of contacting affected individuals who are otherwise only known to the controller by their browsing history, device fingerprint, or other data.
Data protection law, in an attempt to be technologically neutral, is silent on imposing specific innovation requirements on data controllers-which is probably a good thing, as mandating technological advancement through legislating it seems like a misguided idea. 92But were governments, academia, or civil society to develop and make re-identification tools for high-dimensional data publicly available, with a codebase compatible with many types of commercial systems, it would be hard to deny these technologies were 'available' in the sense of Recital 26.Additionally, the possibility for certification bodies outlined in the GDPR may provide a further avenue for keeping up-to-speed on the state of the art technologies in this space. 93Yet, this comes with its own security risks.Not only are these tools then available to attackers, but they may even be installed and calibrated on the systems that data is being illicitly obtained from, leaving adversaries a little like 'a kid in a candy store'.
When these re-identification mechanisms are already designed however, and out in the published research literature, 'putting a lid on them' would appear to be a poor policy approach.Even where the codebase is scrappy and unreliable, these are precisely the types of tools that 'script kiddie' adversaries are used to working with.Imagining that making these tools more useful and deployable would only serve to help attackers is likely to underestimate adversaries' existing capacity to use and generate knowledge to valorize stolen personal data, as well as to understate the benefit of such tools for giving data subjects more control over the data they are entitled to legal rights over.Indeed, making them more usable may not vastly increase the capabilities of attackers that were always willing to string together unreliable code, and may primarily serve to empower data subjects to manage risks relating to them.

Making trade-offs with Data Protection Impact Assessments
Given the value-laden nature of these trade-offs, it is important that they are made in an explicit way, with care and with rigour.As it stands, the GDPR, being extremely vague about what DPbD means, does not acknowledge either in recitals or the enacting terms the existence of trade-offs within design approaches.When these trade-offs, as we have shown, involve the fundamental rights of data subjects, this is unacceptable. 94ata Protection Impact Assessments 95 (DPIAs) are positioned as a potentially apt point in the compliance process to consider the trade-offs present when employing DPbD strategies.DPIAs are the main form of ARTICLE Downloaded from https://academic.oup.com/idpl/article-abstract/8/2/105/4960902 by Said Business School user on 04 July 2018 preemptive analysis and documentation requirement in the GDPR, taking particular aim at high-risk processing. 96The GDPR explicitly, albeit in the recitals, notes that the 'risk to the rights and freedoms of natural persons, of varying likelihood and severity, may result from personal data processing which could lead to physical, material or non-material damage, in particular: [. ..] whre data subjects might be deprived of their rights and freedoms or prevented from exercising control over their personal data'.97 Accordingly, the Article 29 Working Party identifies, as one of the criteria leading to high risks to data subjects, situations where 'the processing in itself "prevents data subjects from exercising a right or using a service or a contract"'. 98et in the same guidance, it is worth noting that DPbD gets only a fleeting mention as another preemptive approach comparable to DPIA; the Working Party is silent on including DPbD itself as a topic within DPIAs. 99It is furthermore easy to see how DPbD measures such as those discussed could be seen as one of several 'measures envisaged to address the risks, including safeguards, security measures and mechanisms'. 100s processing undertaken in response to otherwise risky processing with the intention of decreasing that risk, they might escape the scrutiny applied to the original concern.While an infinitely recursive DPIA is highly undesirable, so is one that lacks appropriate reflexivity.
While a DPIA might, potentially and with further clarification, provide a venue for considering trade-offs, this approach has a number of limitations owing in particular to the weakening of certain key provisions in the final text of the GDPR. 101While the requirement to 'seek the views of data subjects or their representatives' during the DPIA process suggests that those individuals affected can articulate their views about appropriate trade-offs, this obligation is limited.It is only required 'where appropriate' and 'without prejudice to the protection of commercial or public interests'.The exemption from consulting data subjects where it might affect 'the security of the processing operations', presents yet another situation in which protection of data through obscurity could excuse the pursuit of other substantive data protection obligations.As a result of these limitations, such consultations may often in practice constitute a form-filling task, particularly as these views to be sought are not grounded in any particular task or question, and are not (as we discuss further below) required to be published or publicized.

Right to information about privacy architectures
As we have described, 102 we can increasingly locate examples where data subjects' personal data is being processed without the accompanying data subject rights effectively being enabled.Yet it seems rare for data subjects to be informed before the time of collection or processing that such rights will not apply.Where they are, claims seem highly generalized.Apple's Privacy Policy, for example, simply states that they 'may decline to process [access] requests that are frivolous/vexatious, jeopardize the privacy of others, are extremely impractical, or for which access is not otherwise required by local law.' 103Which data will be 'extremely impractical' to exercise rights over?Which will be considered to 'jeopardize the privacy of others'?Without this information, it seems unclear that a proper evaluation could be made by a data subject as to whether she wishes to entrust her personal data to such a controller.
Must a data controller, explicitly and without request at the time data are obtained, warn a data subject that the rights they might expect do not exist?This would seem critical if, as data protection law expects, data subjects are to play a part in managing the risks in accordance with their own preferences.There is a requirement to provide 'information necessary to ensure fair and transparent processing', including 'the existence of the right to request from the controller access to and rectification or erasure of personal data'. 104Yet, it is unclear whether this is a provision that requires the existence of these rights in a general sense-an awareness raising measure, as well as one seeking to provide logistical support (eg through pointing to the relevant controller contact details)-or whether this is an existence of these rights in applied context, considering each type of data processed by the controller.We feel that in light of the overarching transparency principle in Article 5(1), linked explicitly to Articles 13-14 in Article 12(1), the latter reading is well-supported.Considering that Article 11(2) contemplates times when there might be no 'existence' of these rights, it makes sense that this requirement would not apply in those cases.Does this mean that data controllers would have to invert the obligation, and explicitly tell data subjects that their rights will not be honoured?That would be very useful, but it seems less clear.If this was the case, would controllers also have to tell data subjects of the non-existence of automated decision-making, which in Articles 13(2)(f)/ 14(2)(g) is phrased in a very similar way?Given that 'solely automated', 'significant' automated decisions seem rare in practice, 105 more often than not this would serve to bulk up information notices in a quite meaningless way.
However, it is clear that there is an obligation on data controllers to provide the reasons for their nonfulfilment of a specific access request ex post.In contrast to the unclear scope of ex ante information requirements, relevant ex post information requirements expect controllers to provide more detailed information on when subject rights are not available and why. 106But there is no explicit hook in Articles 13-14 for a data controller to provide information ex ante about the DPbD measures that might be restricting such rights so that a data subject might assess them, nor provide information on the safeguards being applied to their data that might affect their ability to exercise their rights.Given the importance of these rights to the data protection regime as a whole, this seems problematic in relation to the transparency and accountability principles of the GDPR. 107e argue however that such a requirement can be read into the obligation to provide 'meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject' of automated decision making. 108Commentators have historically viewed the potential of automated decision rights in relation to 'algorithmic accountability' discussions, through the lens of 'decisions' individuals encounter in their day-to-daylives such as credit scoring or behavioural targeting. 109et, in relation to the envisaged removal of fundamental rights using automated processing, we argue that an automated decision ('which may include a measure') 110 could also be considered in relation to processing that happens internally, within a data controller or processor.These rights have been considered strongly restricted by both a restriction to be 'solely' automated, and to trigger 'legal' or 'similarly significant' effects on individuals.There is strong reason to believe that the systems we have been discussing in this paper meet both conditions.Firstly, privacy enhancing technologies rarely have humans in-the-loop after their initial setup-usually this would undermine mechanisms reducing information disclosure-and as a result, we can broadly think of these technologies as 'solely' automated.Secondly, the removal of rights would arguably have both a 'legal effect', in the sense of changing a data subject's position with respect to Article 11, and a 'similarly [significant]' effect, impacting on fundamental rights and freedoms.
As there appear to be grounds to meet this condition, we lastly have to consider whether or not a discernable 'decision' has been made.The most clear indication that it has in the case of DPbD is that Recital 71 specifically includes that the scope of automated decisions 'may include a measure'-the precise terminology in Article 25(1) describing DPbD as 'technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented'.Some might say that these measures happened at the moment of system design, not at the point of processing, and therefore, not being solely automated nor affecting a single data subject at that point, no information obligation exists.Yet, to apply this reasoning to profiling systems, such as behavioural advertising, would be absurd.While at a mechanical level, visiting a webpage might trigger the application of a pre-built profile to deliver advertising, 111 the 'logic involved' would presumably not (and seemingly not in the eyes of the A29WP) 112 be restricted to the last leg alone-that a user requested online components, which matched a browser fingerprint to a profile accessed a database, and therefore was provided specific content-but would refer to the broader system insofar as it was relevant to the final decision, including the construction of the profile in question.In a similar manner, rights to understand DPbD systems which are applied automatically would presumably have some broader, systemic notion applicable to them as well.
In the case of a 'measure' such as DPbD, it might be hard to imagine what a right to a human 'in-the-loop', the core remedy offered in Article 22 that Articles 13-15 refer to, would look like in this situation.Yet, these information rights are not explicitly fully restricted by the compatibility of the remedy in a separate article.Indeed, we note that not only does the terminology in Articles 13-15 refer to 'automated decision-making', without either of the conditions in Article 22, it also counsels that it is 'at least', not only, in the context of Article 22 that these rights trigger, opening the door for less restrictive judicial interpretations in the future.
The effect of this reading of automated decision information rights on DPbD measures which prevent the effective exercise of other rights would be twofold.It would firstly oblige controllers to provide 'meaningful information about the [. ..] significance and envisaged consequences' of such processing-the loss of data protection rights.They would have to do this ex ante-the Article 29 Working Party has recently taken the view that the 'meaningful information' rights in Articles 13-14 should provide identical information to those in Article 15. 113 At minimum, this provision would have the same effect we argue is present in Article 13(2)(b) and 14(2)(c) above, reinforcing our reading of the GDPR that to inform data subjects of these lack of rights is mandatory.A more generous reading could even see it go beyond this.The 'consequences' of the loss of data protection rights include a loss of control, and as such this might entail a discussion of the re-identification risk were such data to be accessed without authorization.Insofar as 'envisaged' is understood as 'intended' 114 rather than 'foreseen', it could be countered that the data controller does not 'intend' a data breach, and therefore would not be required to inform data subjects about its potential consequences.Yet, given that such a breach could be highly damaging to data subjects, it is likely to trigger the separate 'significance' requirement, even were it to dodge the 'envisaged' one.
The second consequence relates to the 'meaningful information about the logic involved' requirement.This gets us closer to an obligation on the data controller to provide information about the extent, form and structure of relevant safeguards in a way that can be assessed by the data subject-or indeed, given these are ex ante information rights not requiring an existing data subject to trigger, by interested parties more broadly.The Oxford English Dictionary defines 'logic' as 'a system or set of principles underlying the arrangements of elements in a computer or electronic device so as to perform a specified task.'Where the task is partial deidentification or some other computational transformation to render such data difficult to single out, and thus deprive and individual data subject of certain data protection rights, this would indicate that a basic-and importantly, a 'meaningful'-schematic would be provided.The 'meaningful' condition, one of the few changes to these rights from the DPD, obliges information about this logic to relate to the data subject in a useful way-but given a lack of detailed requirements written in the GDPR, we will likely have to wait for this right to be tested to understand how far it will take data subjects.
Despite a lack of detailed requirements relating to information rights, following Articles 5(2) and 24(1), data controllers are expected to be able to adequately demonstrate compliance with all GDPR provisions, which includes security and DPbD obligations.As discussed above, 115 DPIAs might be an important venue for demonstrating this compliance and hammering out the trade-offs faced.As it stands however, they do not seem to be a reliable transparency mechanism-there is no obligation to publish these documents under the GDPR, and indeed industry opinion is highly opposed to such an obligation, usually on grounds of their potential to contain trade secrets and proprietary information. 116When passed to a DPA as part of prior The problem with the lack of publishing of this information does not relate to an imaginary world where engaged data subjects pore over the minutiae of DPIAs, but in general the lack of rigorous scrutiny expected of a pluralist society afforded to organizational or technical approaches to privacy.Oversight is unlikely to be useful if only provided at an individual level.Just as individuals suffer from consent fatigue, many of the solutions for the increasingly complex processing ecosystem today risk of a 'transparency fallacy', where the responsibility for obtaining and digesting complex information about computational systems falls, unhelpfully, on the data subject. 118Instead, having third parties placed as beneficiaries of some information rights would be a useful future step.While DPAs have significantly increased powers to investigate data controllers, this usually happens only after a complaint has been raised. 119It is difficult to raise a complaint about improper or ineffective applications of privacy or data protection by design without some insight into the system infrastructurethat is, until such systems fail in a large and noticeable way, at which point transparency is hardly a helpful remedy.In theory, high-risk processing where risks cannot be sufficiently mitigated must involve consultation and prior authorization with the responsible DPA. 120et, the precise risks we have been discussing occur during this mitigation process, and the trade-offs are commonly ignored-because they present separate risks from the initial high-risk processing, we are not optimistic they will be flagged and identified.Article 80(1) and 80(2) envisage some role for bodies to exercise rights either mandated by a data subject or (optionally, subject to member state derogations) without them, but these rights do not include those concerning information provision.Aligning information rights with those that can understand, investigate and report potential breaches, or simply to publicly highlight the existence of state-of-the-art technologies that can better make the trade-offs between different aspects of privacy and data protection in the context of DPbD, seems like a requirement for the future, particularly as systems become more pervasive, invisible and complex.

Conclusions
Privacy by design was initially defined as a holistic concept.Amidst the vigorous and welcome research in how technological approaches can help us achieve it, its wellrounded nature has been somewhat lost.PbD, and the DPbD now mandated by law, is seen increasingly as a synonym for the formal privacy enhancing technologies literature that take reducing unwanted information disclosure as their sole goal-partly, at least, as it is a mathematically tractable, single optimization target.These literatures undoubtedly provide useful tools for both data subjects and controllers-we do not contest thatbut they are not designed with data protection in mind, when we are sorely in need of such technologies to help us uphold data protection principles in today's datarich world.
Because data protection does not take sole aim at information disclosure, but a framework of rights and obligations intended to strike fair balances between a wide array of societal aims, fundamental rights and personal freedoms, we argue that the way that deployed PbD solutions trade-off against these rights while leaving significant residual risk to data subjects is problematic.We accept it is often impossible to have everything at once, but believe that this means decisions about which rights and risks to prioritize over others must be openly discussed and decisions rendered accountable.At a high level, the GDPR's transparency and accountability principles would appear to necessitate it.DPIAs are a good venue to make trade-offs, but are intransparent from a lack of publishing requirements, and recent guidance around them has omitted obligations to consider DPbD specifically.This is particularly important as controllers do have economic incentives to minimize obligations to fulfil data subject requests.While we believe that they are welcome to maximize their economic logics within the boundaries of the law, and that doing so is not a nefarious aim, there is a danger that a range of practices emerge that go unscrutinized as a result.Scrutinizing the effectiveness of technical privacy strategies is important to ensure that trade-offs made, implicitly or explicitly, deliberately or not, are compatible with both overarching data protection principles and Article 8 of the Charter.If the form of DPbD is itself shrouded in secrecy, it seems difficult to believe that meaningful oversight is possible.
We urge DPAs and other relevant actors to update their guidance to ensure that ex ante transparency rights are enforced to include specific information as to where and why data subjects can expect to lose their rights to DPbD trade-offs.While it is clear in the GDPR that data subjects are entitled to this information ex post, when an attempt to use a right such as access, erasure or objection has been refused, we have argued that the Article 13-14 ex ante information rights may be more powerful than considered in this domain.Not one but two sources of the GDPR can be drawn upon to support this claim-notification of the existence of rights (Articles 13(2)(b)/14(2)(c)), and meaningful information about significant solely automated decisions and measures (Articles 13(2)(f)/14(2)(g)), both of which to differing degrees appear to oblige data controllers to lay out beforehand which rights and obligations are not being provided, and why.
We urge technical communities, civil society and regulators to support the development of re-identification technologies in order to enable data subjects to be located in partly de-identified datasets so they can better manage their own risks.Re-identification tools are constantly developed by both academic researchers as well as more nefarious actors, yet the codebases for these tools are often too scrappy for easy use by controllers intending to be legally compliant.We argue that making tools to enable rights available will increase data subject agency (by increasing the number of 'state-of-the-art' technologies that are 'available' to controllers) more than it will meaningfully increase the capabilities of adversaries, who, as adept users of scrappy code, are core beneficiaries of the current imbalance.
We also urge the technical community developing PETs to consider, in addition to their current research, how to get more out of the current trade-offs between control and confidentiality in the solutions they engineer.These trade-offs are only likely to get more common, in particular with the growth in technologies such as secure classification and multi-party computation, 121 which are likely to tie controllers' hands in new and interesting ways, or even call our current understanding of a 'controller' and their competencies into question.In calling for this greater examination, we are not seeking to echo Australian Prime Minister Malcolm Turnbull's much-mocked recent proclamation that whilst the 'laws of mathematics are very commendable [. ..] the only law that applies in Australia is the law of Australia'. 122But such practical difficulties data controllers face in trying to accommodate data subject rights do form part of a broader disconnect between technical and legal definitions/interpretations of key data protection notions.Indeed, a 2012 Enisa Report highlighted a mismatch between the right to erasure in the law and in practice. 123There may well be times where legal requirements are technically or mathematically intractable, but without interdisciplinary research and funding challenges to understand the true limits of attempts to maximize these trade-offs, we will not know.While not all PETs research should focus in this way, and there is a lot to be gained from even deeper research into how to reduce information disclosure further in more complex application areas, the lack of research into this area is stark and sorely in need of rectification.
In other cases, such as the WiFi analytics described above, the issue may be both technical and due to a lack of coordination in the use of privacy-enhancing tools in an untrusted environment.Forcing users to make their own devices difficult to track, such as through MAC randomization, serves to chastise them for sensible precaution by stripping them of their rights to manage data that is still risky and fraught with re-identification potential.Recent legislative moves, such as those in the proposed ePrivacy Regulation to make Do Not Track signals from browsers and devices legally binding 124 have some promise in this area, but depending on an individual's threat model, may be of little use.An individual may trust established data controllers, but be using privacy enhancing technologies, such as MAC address randomization, to prevent 'cowboys' with little regard for data protection law scraping their passively emitted data.This type of problem appears to be difficult to solve, and likely deserves more legal, social, and technical examination than it has currently been afforded.
Emerging problems with characteristics such as those we are describing would likely benefit from broad interdisciplinary engagement, including Human-Computer Interaction (HCI) and Responsible Research and Innovation (RRI) which have significant experience in this field.Like all fields filled with trade-offs, they cannot be 'solved'-but we are confident that they can be better navigated and managed.Data protection law can surely help with the first step-acknowledging, both internally and externally, that these trade-offs exist.Making 'better' trade-offs promises to be considerably harder, but surely an important task for those spanning roles, disciplines and sectors in the years to come.doi:10.1093/idpl/ipy002Advance Access Publication 4 April 2018

3
Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, OJ 1995 L 281/31 (Data Protection Directive, hereafter 'DPD').4 See generally, Gloria Gonza ´lez Fuster, The Emergence of Personal Data Protection as a Fundamental Right of the EU (Springer, Dordrecht 2014).
7 Ann Cavoukian, Privacy by Design: The 7 Foundational Principles (Information and Privacy Commissioner of Ontario, Toronto, Canada 2010) 1. 8 van Rossum and others (n 2).9 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement competent authorities for the purposes of the prevention, investigation, detection or prosecution of criminal offences or the execution of criminal penalties, and on the free movement of such data, and repealing Council Framework Decision 2008/977/JHA, OJ 2016 L 119/89 (hereafter the 'Law Enforcement DP Directive').14 Lee A Bygrave, 'Data Protection by Design and by Default: Deciphering the EU's Legislative Requirements' (2017) 1 Oslo Law Review 105, 114.
The controller should be obliged to respond to requests from the data subject without undue delay and at the latest within one month and to give reasons where the controller does not intend to comply with any such requests.';Art 12(4) stating if 'the controller does not take action on the request of the data subject, the controller shall inform the data subject without delay and at the latest within one month of receipt of the request of the reasons for not taking action'.107 Indeed, it could be argued that fundamental rights are at stake here in some situations: art 8 Charter in particular, but potentially also other rights and freedoms such as non-discrimination (art 21 Charter) or and freedom of expression (Art.11Charter).
,117the documents may become subject to local Freedom of Information laws, but given that data controllers can avoid prior consultation by claiming they have mitigated the risk, it is yet to be seen how common prior consultation will be in practice.
'trade secrets or intellectual property and in particular the copyright protecting the software,' but also mentions that such arguments cannot be (ab)used to refuse access altogether.Similarly, the EU Trade Secrets Directive also notes that its provisions 'should not affect the rights and obligations laid down' in the DPD.See Recital 35, Directive (EU) 2016/ 943 of the European Parliament and of the Council of 8 June 2016 on the protection of undisclosed know-how and business information (trade consultation secrets) against their unlawful acquisition, use and disclosure, 2016 OJ (L 157) 1. 117 Art 36(3)(g), GDPR.118 Edwards and Veale (n 105).119 Indeed, given the resource limitations of DPAs, it is hard to see proactive investigations affecting anything but the most high profile actors.The UK Information Commissioner has noted that her office has a history of taking forward complaints even where there is no data subject mandating them, in relation to national debate around whether the UK makes a derogation to incorporate Art 80(2), but even this remains very different from solo investigation.See Information Commissioner's Office, The Information Commissioner's Office (ICO) response to DCMS General Data Protection Regulation (GDPR) derogations call for views.(ICO 2017) para 113.120 Art 36, GDPR.Michael Veale et al.Á When data protection by design and data subject rights clash 121 ARTICLE Downloaded from https://academic.oup.com/idpl/article-abstract/8/2/105/4960902 by Said Business School user on 04 July 2018