PoVeJMo
On Monday, September 16, 2024, a public debate took place at the Faculty of Computer and Information Science of the University of Ljubljana on the legal, ethical and technical issues surrounding the large language model for the Slovenian language that is being developed as part of the PoVeJMo project.
The goal of the research project Adaptive Natural Language Processing with Large Language Models (PoVeJMo) is to develop an open large language model for Slovenian, which will later serve as the basis for advanced applications in medicine, the humanities, industrial environments and software development. The success of the project depends on how well the “machines” will speak Slovenian.
The participants discussed the quantity and quality of texts needed to build such a model, as well as the societal perspective and responsibility for the operation of algorithms. Dr. Maja Bogataj Jančič pointed out that such a public project is extremely important and can serve as an example of how the “data” (including authors’ texts) needed to create artificial intelligence that works for the public good can be collected and managed. In this case, society as a whole, and not corporations, decides what language model will be built. Several obstacles stand in the way of success, such as the small number of Slovenian speakers and the many texts that have not been digitized. She emphasized that copyright can also be a big obstacle (more on this: Can copyright bring AI to its knees?). In her opinion, the national legal environment (mainly the exceptions for text and data mining) is generally favorable to the creation of large language models. The big problem is that the Slovenian legislator has not defined lawful access in accordance with the Directive on copyright and related rights in the Digital Single Market (DSM Directive): it intentionally omitted the point that lawful access also covers content that is freely available online. It would make sense to correct this error as soon as possible.
You are invited to read the article in Delo.
ODIPI is organizing the ERA KR21 Conference: Barriers and Incentives for Open Science in the Copyright Law, which will take place on 2 December 2024 at the Hotel Four Points by Sheraton (Mons) in Ljubljana and online.
The District Court of Hamburg ruled in the case of Kneschke v. LAION e.V. that LAION did not infringe the copyright of photographer Kneschke, as the use of his photograph was covered by the exception for text and data mining (TDM) for scientific purposes.
Can copyright bring artificial intelligence to its knees? Which other circumstances could dramatically change the “making” of generative AI in the (near) future? This short paper presents the potential challenges that copyright poses to training machines on large amounts of data. Different jurisdictions address these issues differently. In the USA, the legality of these activities is being tested in several court cases. Do gentlemen’s agreements and the pragmatic symbiosis known from the “search engine business model” provide a sufficient basis and/or incentive for the business model of “making” generative AI as well?