PoVeJMo

On Monday, September 16, 2024, a public debate was held at the Faculty of Computer and Information Science of the University of Ljubljana on the legal, ethical and technical issues surrounding the large language model for Slovenian being developed within the PoVeJMo project.

The goal of the research project Adaptive Natural Language Processing with Large Language Models (PoVeJMo) is to develop an open large language model for Slovenian, which will later serve as the basis for advanced applications in medicine, the humanities, industrial settings and software development. The success of the project depends on how well the “machines” will speak Slovenian.

The participants discussed the quantity and quality of the texts needed to build such a model, as well as the societal perspective and responsibility for how the algorithms operate. Dr. Maja Bogataj Jančič pointed out that such a public project is extremely important and can serve as an example of how the “data” (including authors’ texts) needed to create artificial intelligence that works for the public good can be collected and managed. In this case, it is society as a whole, and not corporations, that decides what kind of language model will be built. Several obstacles stand in the way of success, such as the small number of Slovenian speakers and the many texts that have not been digitized.

Maja emphasized that copyright can also be a major obstacle (more on this: Can copyright bring AI to its knees?). In her opinion, the national legal environment (mainly the exceptions for text and data mining) is generally favorable to the development of large language models. The big problem is that the Slovenian legislator has not defined lawful access in accordance with the Directive on copyright and related rights in the Digital Single Market (DSM Directive): it intentionally omitted the provision that lawful access also includes access to content that is freely available online. It would make sense to correct this error as soon as possible.

You are invited to read the article in Delo.