
Data protection digest 3 – 16 Aug 2024: data labelling for LLMs, third-party cookies as a cause of leaks

In this issue: X’s AI Grok training suspended in the EU, third-party cookies may lead to data breaches, Uniqlo ‘payroll’ mistake, car rental refusal based on a client’s income, and AI non-transparency – data scraping, maximisation, risks of regurgitation, and what is behind data labelling for the LLM industry.

Stay up to date! Sign up to receive our fortnightly digest via email.

LLMs, data labelling and data protection

A fundamental principle of data protection law is data minimisation. Privacy International, however, insists that LLMs are trained through indiscriminate data scraping and that their developers take a maximising approach to data collection. Under data protection laws, individuals have the right to assert control over data relating to them. However, LLMs are unable to adequately uphold these rights, as the information is held within the parameters of a model, in addition to more traditional forms such as a database. ‘Regurgitation’ poses a further risk of personal data being spat out by LLMs: because training data is enmeshed in a model’s parameters, it can be extracted, or regurgitated, by feeding in the right prompts.
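
How such extraction can be probed is easy to sketch. Below is a minimal, hypothetical example, assuming a placeholder generate() completion call (no real model or API is implied): it feeds the model the prefix of a record suspected to be in the training set and checks whether the rest comes back verbatim.

```python
# Hypothetical probe for training-data regurgitation: feed the model the
# prefix of a string suspected to be in its training set and check whether
# the completion reproduces the rest verbatim. `generate` is a placeholder
# for any LLM text-completion call, not a real API.

def generate(prompt: str) -> str:
    """Placeholder for a text-completion call to some LLM."""
    raise NotImplementedError("wire up a real model here")

def regurgitation_probe(record: str, prefix_len: int = 40) -> bool:
    """Return True if the model completes the record's suffix verbatim."""
    prefix, suffix = record[:prefix_len], record[prefix_len:]
    completion = generate(prefix)
    # Compare only the first characters of the suffix, to tolerate
    # trailing differences in the completion.
    return completion.lstrip().startswith(suffix.strip()[:30])

# Example with a made-up record (running it requires a real `generate`):
# regurgitation_probe("Jane Doe, 12 Example Street, jane.doe@example.com")
```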

PI also investigated the digital labour platforms that have arisen to supply data labelling for LLM training. Such training includes fitting an AI model to a labelled dataset, supplemented by reinforcement learning from human feedback. For example, data labellers mark raw data points (images, text, sensor data, etc.) with ‘labels’ that help the AI model make crucial decisions, such as enabling an autonomous vehicle to distinguish a pedestrian from a cyclist. Many such labellers appear to be completely disconnected from the AI developers, and are often not told who or what they are labelling raw datasets for. They are also subject to algorithmic surveillance and precarious job stability.
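
For illustration only, a labelled record of the kind described might look like the sketch below; the schema and field names are invented, not taken from any platform PI studied.

```python
# A minimal, invented schema for image annotations of the kind a data
# labeller might produce: each raw image gets bounding boxes with class
# labels, which a supervised model is then trained against.
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: int       # top-left corner, in pixels
    y: int
    width: int
    height: int
    label: str   # e.g. "pedestrian" or "cyclist"

@dataclass
class LabelledImage:
    image_path: str
    boxes: list[BoundingBox]

sample = LabelledImage(
    image_path="frames/000123.jpg",
    boxes=[
        BoundingBox(x=410, y=220, width=60, height=140, label="pedestrian"),
        BoundingBox(x=640, y=230, width=90, height=150, label="cyclist"),
    ],
)
# A training loop would pair each image with its boxes as the supervision
# signal, e.g. for an object detector in an autonomous-driving stack.
```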

Third-party cookies as a cause of data breaches

JDSupra legal insights look at how the disclosure of data through website cookies may facilitate a data breach in California. In the related court case, the plaintiff claimed that an online counselling service, through which website users can find and seek therapy, violated the California Consumer Privacy Act by allowing tracking software to retarget website users with ads. The court refused to dismiss the data breach claim. Specifically, the simple fact that a user visited the website may qualify as sensitive information, because such a visit could indicate they were seeking therapy.

The court refrained from ruling on whether the use of retargeting cookies is inherently illegal.

US Child privacy bill

On 30 July, the Senate passed the Kids Online Safety and Privacy Act. KOSPA combines two previously proposed bills: the Kids Online Safety Act (KOSA) and the amended Children’s Online Privacy Protection Act (COPPA 2.0). The act applies to digital platforms, particularly those with more than 10 million active monthly users. The duty of care includes options for minors to protect their data, a prohibition on the use of dark patterns, transparency regarding the use of opaque algorithms, and more. KOSPA now heads to the House, where it faces debate over potential censorship and the possibility of minors losing access to vital information.

Oncological oblivion

The Italian data protection authority, the Garante, looks at “the right to be forgotten” in oncology: whether banks, insurance companies, credit bodies, and employers can ask for information on the oncological pathology of an individual in remission. Also, can a clinically recovered person adopt a child? These and other questions are answered in the FAQs published by the regulator (in Italian). The aim is to prevent discrimination and protect the rights of people who have recovered from oncological diseases.

Chatbots and customer data

Employees sharing patient or consumer personal information with AI chatbots has resulted in reports of data leaks to the Dutch Data Protection Authority (AP). The majority of chatbot developers store all data entered. Organisations must therefore make clear agreements with their employees about the use of AI chatbots. They could also arrange with the provider of a chatbot that it does not store the entered data.
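
One technical safeguard, sketched below with purely illustrative regex patterns (a real deployment would use a proper PII-detection service), is to redact obvious personal identifiers before a prompt ever leaves the organisation:

```python
# A minimal sketch: strip obvious personal identifiers from text before it
# is sent to an external chatbot. The patterns are illustrative only and
# will miss many real-world identifiers.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def redact(text: str) -> str:
    """Replace matches of each pattern with a placeholder like [EMAIL]."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name}]", text)
    return text

prompt = "Draft a reply to jan.jansen@example.nl, phone +31 6 12345678."
print(redact(prompt))
# -> Draft a reply to [EMAIL], phone [PHONE].
```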

More official guidance

Avoiding outages and system failures: The US Federal Trade Commission insists that many common types of software flaws can be preemptively addressed through systematic, well-known processes that minimise the likelihood of outages. These include rigorous testing of both code and configuration, and incremental rollout procedures. For instance, when deploying changes to automatically updating software, vendors could initially deploy them to a small subset of machines, and then roll them out to more users once it is confirmed that the smaller subset continues to function without interruption.
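
A minimal sketch of that staged-rollout idea, with invented deploy() and healthy() helpers standing in for real fleet-management and monitoring calls:

```python
# A sketch of the incremental rollout the FTC describes: push an update
# to a small canary group first, verify it keeps working, then widen the
# rollout stage by stage. `deploy` and `healthy` are assumed placeholders
# for real fleet-management and monitoring calls.
import time

STAGES = [0.01, 0.10, 0.50, 1.00]   # fraction of the fleet per stage
SOAK_SECONDS = 3600                 # observation window between stages

def deploy(update_id: str, fraction: float) -> None:
    print(f"deploying {update_id} to {fraction:.0%} of machines")

def healthy(fraction: float) -> bool:
    return True  # placeholder: query monitoring for the deployed subset

def staged_rollout(update_id: str) -> None:
    for fraction in STAGES:
        deploy(update_id, fraction)
        time.sleep(SOAK_SECONDS)    # let the subset run before widening
        if not healthy(fraction):
            print(f"halting rollout of {update_id} at {fraction:.0%}")
            return                  # stop before the flaw reaches everyone
    print(f"{update_id} fully rolled out")
```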


Surveys at schools: The Latvian data protection authority investigates whether a teacher can ask students to complete surveys. The educational process has long extended beyond learning the subject matter to the psychological state of the child. Student surveys can take standard, personalised or anonymous forms. However, children are often unable to assess how much private information to share with others. Thus, security requirements, such as data non-disclosure and storage limitations, must be applied in most cases.

Additional parental consent should be required if a survey relates only indirectly to the organisation of the learning process.

AI systems transparency: The German Federal Office for Information Security (BSI) published a white paper on the “Transparency of AI systems”. It says the increasing complexity of ‘black box’ AI systems, as well as missing or inadequate information about them, makes it difficult to assess such systems or to judge the trustworthiness of their outputs. The paper defines the term transparency for various stakeholders, from users to developers, and discusses the opportunities and risks of transparent AI systems, both positive (promoting safety and data protection, avoiding copyright infringements) and negative (the possible disclosure of attack vectors).


Uniqlo ‘payroll’ mistake


The Spanish regulator imposed a fine of 450,000 euros (reduced to 270,000 euros) on the UNIQLO branch in Spain, DataGuidance reports. The complainant, who provided services to UNIQLO, requested their payroll data and received an email containing a PDF document with payroll information on the entire 446-strong workforce. The document contained names, surnames, social security numbers, bank account numbers, and more.

The breach was caused by human error within the human resources department, and the employee in question had not informed their superior. The regulator confirmed that the negligent action of an employee does not exempt the data controller from liability.

Healthcare IT provider fine

The UK Information Commissioner’s Office has provisionally decided to fine Advanced Computer Software Group 6.09 million pounds. Advanced provides IT and software services to the NHS and other healthcare providers, and handles people’s personal information on behalf of these organisations as their data processor. The decision relates to a ransomware incident in 2022, when hackers accessed several of Advanced’s health and care systems (containing the personal information of 82,946 people) via a customer account that did not have multi-factor authentication.

More enforcement decisions

Car rental and client’s income: The Italian Garante imposed a one-million-euro fine on Credit Agricole Auto Bank for the illicit processing of personal and income data of customers who requested financing for the long-term rental of a car. The bank accessed the centralised fraud prevention system, also on behalf of its subsidiary, a car leasing company, despite not having the necessary authorisation from the Ministry of Finance.

The complainant had contacted the bank to learn the reasons behind the denial of the long-term rental and the inclusion of their name on a credit risk list. The bank stated these were due to the client’s negative income situation. Furthermore, the bank had not first acquired the client’s tax return form, an essential document for comparison with the information contained in the database.

Dark patterns in the gambling industry: The Guernsey privacy regulator reviewed 19 online gaming sites for indicators of deceptive design. In 42% of cases, the analysis was unable to find the website or app’s privacy settings (and most of those that were found were unnecessarily lengthy and complex). It was also more difficult to delete an account than to create one. In one instance, a user had to make their account deletion request through an on-site chatbot, as they were unable to find a ‘delete account’ option on the site. In another case, the organisation asked that a form be completed and returned to it, along with identity verification documents; neither the documents nor the form had been required to create an account.

Data security

Lack of encryption: The Danish regulator has reprimanded the Vejen Municipality for insufficient security measures. Three stolen computers holding information about children were not encrypted – and the same turned out to be true of up to 300 other computers in the municipality. The computers were intended for use by teachers only as part of the teaching process. In practice, however, teachers also used them to write status descriptions of students, class handovers, and so on. The regulator also issued a reminder that encryption of portable devices is a very basic security measure that is relatively easy and inexpensive to implement.
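
Full-disk encryption is normally enabled at the operating-system level (e.g. BitLocker, FileVault or LUKS), but the underlying principle can be sketched in a few lines with Python’s cryptography package; the data here is purely illustrative:

```python
# Illustrates encryption at rest using the cryptography package's Fernet
# recipe. Real laptop protection would rely on OS-level full-disk
# encryption; this sketch only shows why encrypted data is useless to a
# thief without the key. The sample data is invented.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # keep secret, e.g. in a key vault
fernet = Fernet(key)

plaintext = b"status description: pupil 17, class handover notes"
ciphertext = fernet.encrypt(plaintext)

# A stolen disk yields only ciphertext; decryption requires the key.
assert fernet.decrypt(ciphertext) == plaintext
```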

GPS tracking: A court in Slovenia confirmed the Information Commissioner’s decision to restrict the systematic, automated and continuous GPS tracking of company vehicles. The company did not demonstrate that such tracking is a suitable and necessary measure for protecting company vehicles and the equipment and documentation they contain, nor for ensuring employee safety, nor for enforcing or defending against potential legal claims.

Among other things, the court confirmed that the data obtained by the operator through GPS tracking of company vehicles constitutes employees’ data, even though it is not recorded and stored in the tracking system itself, as the employees, as drivers, can be identified with the help of other documents (e.g., travel orders).

AI Grok

X agreed with the Irish Data Protection Commission to suspend the processing of the personal data contained in the public posts of X’s EU/EEA users (processed between 7 May and 1 August) to train its AI ‘Grok’. The suspension will last while the DPC examines, together with other regulators, the extent to which the processing complies with the GDPR. The agreement was reached after the regulator brought the case before the Irish High Court.

In June, Meta also agreed with the DPC to delay processing EU/EEA user data for its AI tools. However, unlike Meta, X did not even notify its users beforehand. To ensure that X’s AI training is properly handled, the privacy advocacy group NOYB has now filed complaints with the data protection authorities in nine countries, questioning what happened to EU data that had already been ingested into the systems, and how X can effectively distinguish between EU and non-EU data.

Do you need support on data protection, privacy or GDPR? TechGDPR can help.

Request your free consultation
