Ethics and machine learning

Machine learning raises many new questions that society needs to understand, since the technology is expected to be widely used. One variant, called supervised learning, is used for applications such as object detection in pictures and classification. Let's take a quick look at some of the basic concepts below.


A model defines how input data is processed. Depending on the use case, input data may consist of digital pictures, digitized sounds, or numerical data, for example. The model consists of mathematical operations which transform the input data into the desired output (e.g. the classification "cat" / "dog" for a picture). Many algorithms can be used for the model, ranging from simple mathematical formulae to fashionable deep learning networks.

A model is basically neutral and generic, so that an individual model may be used for classifying cats vs. dogs on the one hand, or types of trees on the other. For demanding applications, performance may be improved by tailoring the model to a special purpose.

Training data

Supervised learning models are trained with data which reflects the kind of inputs that they are expected to see in production use. If one wants to classify pictures into "cat", "dog", and "neither", then the training data will most likely contain pictures of all three categories, taken from different angles and in varying illumination conditions. The same model may be used for a different kind of classification task by using a different training data set, say, for detecting types of trees.

The important aspect here is that the training data fundamentally defines what the model will detect. It is tailored to the intended usage, whereas the model can be more generic.
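The split between a generic model and task-specific training data can be sketched in a few lines of Python. This is only a minimal illustration, not a realistic image classifier: the `NearestCentroid` class, the two-number feature vectors, and the labels are all made up for this example.

```python
from collections import defaultdict


class NearestCentroid:
    """A tiny generic classifier: it knows nothing about cats, dogs, or trees."""

    def fit(self, samples, labels):
        # Average the feature vectors of each label to get one centroid per class.
        grouped = defaultdict(list)
        for features, label in zip(samples, labels):
            grouped[label].append(features)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, features):
        # Return the label whose centroid is closest (squared Euclidean distance).
        def dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: dist(self.centroids[label]))


# Hypothetical features, e.g. (ear pointiness, snout length).
pets = NearestCentroid().fit(
    [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.3, 0.8]],
    ["cat", "cat", "dog", "dog"],
)

# Same model code, different training data, e.g. (leaf width, needle-likeness).
trees = NearestCentroid().fit(
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
    ["birch", "birch", "pine", "pine"],
)

print(pets.predict([0.85, 0.25]))
print(trees.predict([0.15, 0.85]))
```

The class itself never changes between the two uses. What each instance detects is decided entirely by the data passed to `fit`.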


There are ethical questions which arise from the use of machine learning in society. This is a broad field, and I shall highlight just a couple of examples here.

The most obvious aspect is the content of the training data. It defines how the machine learning system performs in action, including any biases which are inherent in the data. For example, automated natural language processing (NLP) systems may learn outdated biases about gender roles if they are trained on literary classics.
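How a system picks up such a bias can be shown with a deliberately crude sketch. The toy corpus below is invented for illustration and is not real data, and counting co-occurrences is a stand-in for what a real NLP system does, not a description of one. The helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up toy corpus standing in for older texts; not real data.
corpus = [
    "the doctor said he would come",
    "the doctor said he was busy",
    "the nurse said she would come",
    "the nurse said she was busy",
    "the doctor said she would come",
]

PRONOUNS = {"he", "she"}


def learn_associations(sentences):
    # Count how often each pronoun shares a sentence with each other word.
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        pronouns = [w for w in words if w in PRONOUNS]
        for word in words:
            if word not in PRONOUNS:
                counts[word].update(pronouns)
    return counts


def most_likely_pronoun(counts, word):
    # The "prediction" is simply the most frequent pronoun seen with the word.
    return counts[word].most_common(1)[0][0]


associations = learn_associations(corpus)
print(most_likely_pronoun(associations, "doctor"))
print(most_likely_pronoun(associations, "nurse"))
```

Nothing in the code mentions gender roles. The skew in the output comes only from the skew in the sentences it was given, which is the point made above about training data.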

When machine learning is used in military systems, the training data plays a more central role than the model, since it needs to represent the objects of interest to the military. The model, on the other hand, may be basically general purpose and also have civilian uses. For example, the same image processing model may serve both military and civilian purposes, depending on the training data it is given.

