Secrecy over a machine learning process led to the ban of an artificial intelligence tool in the Netherlands

The Hague Court ruling on the System Risk Indication (SyRI)

On February 5th, the District Court of The Hague ruled on a case concerning the System Risk Indication (SyRI). This technology has raised many questions about its purpose and the way it works. It is not clear whether SyRI is deep learning, machine learning, a profiling tool, or just an algorithm.

According to the Court, SyRI did not meet the ‘fair balance’ test required by the European Convention on Human Rights (ECHR). The Court held that the legislation governing the deployment of SyRI did not comply with Article 8 of the ECHR, which protects the right to privacy.

But this is not only a ruling about a traditional balancing test on fundamental human rights. This case is about the impossibility for a court to determine the specific methods and procedures used by this technology. Is it Artificial Intelligence, Machine Learning, Deep Learning, or just software that uses programming and algorithms? That is the question the Court could not answer, and the one that ultimately led it to ban this technology.

Before going into the specifics of the case, let us take a look at how this technology functions.

How SyRI creates a corpus and then applies a risk selection process

SyRI works upon request of two or more government bodies that join in partnership for a specific project, such as the present one: combating fraud in areas such as benefits, allowances, and taxes. The process follows two phases: processing (step 1) and analysis (step 2).

The first Phase: Collecting big data

In the first step, all the existing files are put together to create a corpus: all the information and data collected for the specific purpose or project. In this particular case, the corpus included all the personal and company information, social security numbers, addresses, and tax information. It also included gender and integration data, educational data, pension data, indebtedness data, eligibility for benefits, and health insurance data.

Once the corpus is put together, the data is uploaded to the processor (computer). A “risk selection algorithm” is then applied to the corpus, generating potential hits. A potential hit means an increased risk of fraud: a first possible human suspect of fraud.
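The two steps above can be sketched in code: linking records from multiple agencies into a single corpus keyed by citizen, then applying a risk-selection rule that flags potential hits. All field names, agencies, and thresholds below are invented for illustration; the actual SyRI indicators and risk model were never disclosed.

```python
# Hypothetical sketch of corpus creation and risk selection.
# Field names ("citizen_id", "declared_income", etc.) and the
# threshold are assumptions, not the real (undisclosed) SyRI model.

def build_corpus(tax_records, benefit_records):
    """Join per-citizen records from two agencies on a shared ID."""
    corpus = {}
    for rec in tax_records + benefit_records:
        corpus.setdefault(rec["citizen_id"], {}).update(rec)
    return corpus

def risk_selection(corpus, income_threshold=10_000):
    """Flag citizens whose reported income conflicts with received benefits."""
    hits = []
    for citizen_id, record in corpus.items():
        if record.get("declared_income", 0) > income_threshold and record.get("receives_benefits"):
            hits.append(citizen_id)  # a "potential hit": increased risk of fraud
    return hits

tax = [{"citizen_id": "A1", "declared_income": 25_000}]
benefits = [{"citizen_id": "A1", "receives_benefits": True}]
print(risk_selection(build_corpus(tax, benefits)))  # → ['A1']
```

The point of the sketch is that the flagged citizen is a product of both the data linkage and the (secret) rule applied to it: change either, and different people become suspects.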

The second Phase: let’s recalibrate the machine learning process

The algorithm that was applied is reviewed and evaluated in the second phase. The first results are sent to the Ministry of Social Affairs for risk analysis, a job done by the analysis unit of the Inspectorate SZW. If the data is satisfactory, the Minister draws up a risk report based on the final risk selection. This means the process was successful in identifying possible fraud suspects.

But what happens if the results are not what the agencies were looking for?

This is when SyRI gets new algorithms to match more of the expected results. Let us stop here for a quick second. At this point, it is hard not to notice how close SyRI comes to a Machine Learning or Deep Learning process.

The life cycle of machine learning starts with asking a question. The next steps are collecting the data, training the algorithm, trying it out, and collecting feedback. Depending on the results, the feedback can help improve the algorithm, giving it increased accuracy and performance. There is no explicit programming; instead, a mix of data and algorithms makes the machine smarter.
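That lifecycle can be illustrated with a toy recalibration loop: train a trivial "risk model", score it against labeled feedback, and keep the best-performing parameter. The model, data, and scores below are all invented assumptions, far simpler than anything SyRI would use, but they show the feedback-driven adjustment the text describes.

```python
# Toy illustration of the train → evaluate → recalibrate loop.
# The "model" is just a threshold rule; real systems would use
# far richer models, but the feedback cycle is the same shape.

def train(threshold):
    """A trivial 'model': flags any risk score above the threshold."""
    return lambda score: score > threshold

def evaluate(model, labeled_data):
    """Fraction of labeled examples the model classifies correctly."""
    correct = sum(model(score) == is_fraud for score, is_fraud in labeled_data)
    return correct / len(labeled_data)

# Feedback data: (risk score, was it actually fraud?) — invented values.
feedback = [(0.9, True), (0.8, True), (0.3, False), (0.2, False)]

# Recalibration: try candidate thresholds, keep the best-performing one.
best_threshold = max((t / 10 for t in range(1, 10)),
                     key=lambda t: evaluate(train(t), feedback))
print(evaluate(train(best_threshold), feedback))  # → 1.0
```

No rule here was hand-programmed to recognize fraud; the threshold was chosen by the feedback loop, which is exactly the "no programming, just data and algorithms" pattern described above.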

So let’s get back to step 2. According to the Court, “the risk model that is applied to the file link and the analysis phase can subsequently be adjusted by the analysis unit of the Inspectorate SZW if necessary.” In short, they can adjust the risk model (algorithm) that SyRI applies to the corpus so the results come closer to the final purpose. So is this machine learning? If there is no explicit programming, this is machine learning.

How did the Court understand this process?

The Court is not clear on what SyRI really is

The State (defendant) argued that SyRI is a digital tool to combat fraud in areas such as benefits, allowances, and taxes. Other parties in the case claimed that SyRI did much more, including the use of deep learning and citizen profiling.

However, the Court could not tell for certain what this technology is about: “The State (defendant) neither provided objectively verifiable information in order to enable the Court to test the views of the State on what SyRI is.” In other words, the State never revealed how exactly this technology works or processes the collected corpus.

The Court also found that the State never made public the indicators that make up the risk model. Surprisingly, the reason given by the State was “that the citizens could tune their behavior accordingly.” The same criterion is found in the SUWI Act concerning the information that could be shared about SyRI.

The only way to possibly foresee SyRI’s scope is by analyzing the legislation that authorized it

SyRI’s law explicitly provides for the possibility of adopting a risk model based on the evaluation. However, new risk models can also be developed with new indicators. The Court acknowledged: “SyRI legislation leaves open the possibility that the use of SyRI uses predictive analyzes, deep learning, and data mining. The definition of risk model in the SUWI Decree does not preclude this.” And in the Court’s own words: “SyRI ‘fits’ with ‘deep learning’ and ‘self-learning’ systems.”

The Ruling: a balance test between machine learning and the right to privacy

The Dutch government has a special responsibility under Article 8(2) to balance the advantages of using the technology against the interference it could constitute in people’s private lives. The current SyRI regulation does not pass this test because there is no fair balance: SyRI is neither transparent nor verifiable. Therefore, Article 8 is violated.

The Court held that SyRI is unable to provide sufficient safeguards to protect the right to privacy. The risk indicators and the risk model usable in SyRI conflict with what is required by Article 8, paragraph 2 of the ECHR.

The Court also found that the SyRI regulation does not pay sufficient attention to the purpose limitation principle and the principle of data minimization. Let us explain these two a bit more.

The legal limitation of the data set lies in the exhaustive list of categories of data that are eligible for processing within the specific SyRI project. However, according to the Court, there is hardly any personal data that cannot be considered for processing in SyRI.

Furthermore, only the government bodies participating in the partnership can determine what type of data is selected for every project. The problem lies in the fact that the data selection is done without any independent third-party supervision or control.

The Court ultimately struck down Article 65 of the SUWI Act and Chapter 5a of the SUWI Decree due to conflict with Article 8, paragraph 2 of the ECHR.

The positives of the Ruling: a gap in the law, difficulties to identify the technology, and a possible bias

The evidence identified a gap in the law that set almost no limits on this technology tool. The tool has no limitations on Text and Data Mining, nor on Deep Learning processes. The Court had no other choice but to state that the law was too broad. The Court was not comfortable allowing possible developments or uses without any safeguards or controls.

This case also identifies a technology far more advanced than what the defendant initially described. The State stated that SyRI was neither a deep learning application nor a self-learning system. However, the Court was able to see that the tool had all the options to use deep learning methods.

The Court also brought out the problem of bias in machine learning. SyRI is eligible to process large amounts of data, including special personal data. If risk profiles are used, there is a danger that unintentional connections will be made based on bias. According to the ruling, SyRI has only been used in so-called ‘problem neighborhoods’. Criteria such as a lower socioeconomic status or an immigration background could produce biased results.
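The selection-bias danger can be made concrete with a toy example: if a model only ever learns from data gathered in flagged neighborhoods, a correlated attribute such as a postcode can become an unintended proxy for fraud. The records, postcodes, and the naive "model" below are all invented for illustration.

```python
# Illustration of selection bias: training data drawn only from
# "problem neighborhoods" makes postcode look predictive of fraud,
# purely because of where the data was collected. All data invented.
from collections import Counter

training = [
    {"postcode": "1234", "fraud": True},
    {"postcode": "1234", "fraud": True},
    {"postcode": "5678", "fraud": False},
]

# A naive "model": flag the postcode most associated with fraud.
fraud_postcodes = Counter(r["postcode"] for r in training if r["fraud"])
risky_postcode = fraud_postcodes.most_common(1)[0][0]

# The model now flags everyone in that postcode, regardless of behavior.
print(risky_postcode)  # → 1234
```

Nothing about postcode 1234 causes fraud; the correlation exists only because the training data was collected there, which is precisely the unintentional connection the ruling warns about.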

The negatives: running short on the fair balance approach, and a lack of tools to address this type of case

The Court does not make clear whether this technology would pass the fair balance test if it were open and disclosed. Questions remain about a possible alternative outcome had the State not concealed the risk model used by SyRI. If the algorithm is open and disclosed, does that mean this technology would not violate any human rights?

It is also clear that the Court does not have enough tools to address these new technologies. This has happened numerous times with copyright cases in courts all around the world. This case shows how courts need a legislative regulatory body on all matters related to Artificial Intelligence, Machine Learning, or Text and Data Mining.

The future of machine learning, deep learning, and text and data mining

This ruling brought along many questions that are being discussed extensively in the copyright arena. But these questions are also related to human rights law. Shall we open the doors to data so machines can learn more and produce more? Or shall we keep them half-closed while we regulate? Shall we subject the process of feeding AI machines to a human rights scrutiny test?

This Court ruling also reminds us of the problems that governments, courts, policymakers, and legislators are having with definitions related to Artificial Intelligence, Machine Learning, and Text and Data Mining. The difference between each classification is not always easy to follow or identify, as clearly happened in this case decided by the District Court of The Hague.
