Evolution of AI and its Integration into Translation
As AI has experienced major advances over the past few decades, it has impacted several sectors, including language translation. The use of AI in translation originated in the 1950s with the creation of rule-based machine translation (RBMT) (Vinson, 2025). It relied on predefined linguistic rules to translate text from one language into another. However, the rules had to be input manually, which took a lot of time, and the translations were not very accurate.
Then, statistical machine translation (SMT) emerged in the 1980s. It allowed computers to analyse large bilingual corpora to align words and phrases by using statistics. However, it was difficult to accurately translate between languages with major grammatical differences, for example English and Japanese.
In the 2010s, neural machine translation (NMT) brought a major change to the field. Like SMT, NMT systems were trained on large bilingual corpora, but thanks to deep-learning algorithms and neural networks, translations were more accurate than ever before (“History of AI Translation,” 2022). This accuracy made NMT widely popular, creating a major shift in the translation industry.
Since the beginning of the 2020s, generative AI models have emerged. The European Union’s AI Act (2023, Article 28 b(4)) defines generative AI as “foundation models used in AI systems specifically intended to generate, with varying levels of autonomy, content such as complex text, images, audio, or video.” Generative AI differs from traditional AI, which focuses on specific tasks, such as classification, prediction, or defined problem-solving; it aims instead to produce new data that resemble human-created content. Generative models, like OpenAI’s ChatGPT, have shown proficiency in language understanding and translation, and can produce contextually accurate translations, though they are not translation software per se.
Translation Tools and Their Reliance on AI
The translation industry has seen an increase in AI-powered tools to help make translation more efficient and more accessible.
Translation tools, such as DeepL or Google Translate, use AI to improve their translations. While these tools offer some advantages, they still have their limitations. For example, context misinterpretation, misunderstanding of cultural nuances, and inaccurate translation of idiomatic expressions are recurring problems in AI translation. Moreover, AI-generated translations can struggle with highly specialized or sensitive content where human expertise remains essential.
Though AI has deeply changed the translation industry, it is not yet capable of fully replacing human translators. While it excels in handling large volumes of text quickly, human translators provide critical elements, such as cultural sensitivity, creative adaptation, and deep contextual understanding.
As with every digital system, AI or translation models need to be “educated.” In this context, we talk about “training” the system, which requires collecting ready-to-use data wherever they are available. But in our digitalized era, where everything needs to be faster and more efficient, laws governing the way these systems work tend to lag behind. AI programmers and providers therefore assume they have carte blanche to operate as they wish until regulatory decisions are made. Recently, a growing number of prominent organizations have tried to raise awareness of the ethical issues posed by the operation of AI systems.
The Need for Large Data Sets in AI Training
AI models need a large data set in order to function properly. Let’s take the explanation given on the wonk.ai website (Mohamed et al., 2024), which offers AI translation models for various companies. According to the company, its translation model is trained in the following five steps. The first step is to collect language data from websites, glossaries, language databases, documents, etc. This helps the system to integrate language rules, defined terms from glossaries, tone of voice, or writing style. The second step involves extracting language pairs from the collected data, i.e., finding pairs of sentences that help the system better understand the context, which enhances the translation output. The third step is processing, i.e., validating, cleaning, and combining the language data for the training. This is necessary because the translations of some collected texts are found elsewhere and need to be paired with their source texts. The fourth step is the AI training itself, when the collected data are brought together in a training corpus; the training goes on until the AI output is good enough for evaluation. The fifth and last step is the rating by the customers, who are translation managers.
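The extraction and processing steps described above (steps two and three) can be illustrated with a minimal sketch. It is not wonk.ai's implementation: the names `SentencePair`, `extract_pairs`, `is_valid`, and `build_corpus` are hypothetical, the sketch assumes the source and target documents are already aligned line by line, and the length-ratio threshold is an arbitrary illustrative value.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One source sentence and its translation (hypothetical structure)."""
    source: str
    target: str


def normalize(text: str) -> str:
    """Cleaning: collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def extract_pairs(source_lines, target_lines):
    """Step 2: pair sentences by position (assumes line-aligned documents)."""
    return [SentencePair(normalize(s), normalize(t))
            for s, t in zip(source_lines, target_lines)]


def is_valid(pair, max_ratio=3.0):
    """Step 3, validation: reject empty sides and implausible length ratios."""
    if not pair.source or not pair.target:
        return False
    longer = max(len(pair.source), len(pair.target))
    shorter = min(len(pair.source), len(pair.target))
    return longer / shorter <= max_ratio


def build_corpus(documents):
    """Steps 2 and 3: extract, validate, deduplicate, and combine pairs."""
    seen = set()
    corpus = []
    for source_lines, target_lines in documents:
        for pair in extract_pairs(source_lines, target_lines):
            if is_valid(pair) and pair not in seen:
                seen.add(pair)
                corpus.append(pair)
    return corpus


# Illustrative data only: one pair has an empty source, one is a duplicate.
documents = [
    (["Hello  world.", "Good morning.", ""],
     ["Bonjour le monde.", "Bonjour.", "Ligne orpheline."]),
    (["Good morning."], ["Bonjour."]),
]
corpus = build_corpus(documents)
for pair in corpus:
    print(pair.source, "->", pair.target)
```

In this sketch, the pair with an empty source side and the repeated pair are both discarded, so only two sentence pairs reach the training corpus. A production pipeline would presumably add sentence alignment, language identification, and domain-specific checks, none of which are shown here.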
After all this, the AI model continues to learn, which is why it is thought to be useful in various fields. For this to happen, proofreading is critical, as it gives the system the feedback it needs to improve. Of course, it takes time and costs money to obtain a good AI translation model, and every AI provider is trying to reach the “human-quality level.”
However, this collection of data is highly dependent on the end user of the system and the language pair. In translation, the tone, terminology, and phraseology vary a lot from one domain to another. In the last few years, AI translation systems have been increasingly used for legal translation, mostly to reduce costs and improve efficiency. This is where the training phase is crucial: legal texts are needed to train the translation system, but they cannot be found or used that easily. Moreover, legal systems differ from one country to another, which is another parameter to account for during the training phase. As concluded in a study published in March 2024 (Moneus & Sahari, 2024), there is also the issue of dissimilarity between languages: Chinese is abstract and metaphorical while English is linear and logical. This means AI systems still need some improvement, which could be helped by greater access to bilingual data for additional and rarer languages.
Ethical Implications of Data Sourcing
Generative AI systems, such as ChatGPT, are based on a range of data pulled from books, articles, websites, social media posts, etc. As we said before, they require a training phase during which “a vast corpus of textual data [are] employed to instruct the speech-processing algorithms” (Lucchi, 2024, p. 617). This raises a variety of intellectual property (IP) issues and other legal considerations, as the sources used may contain copyrighted works. In this context, “the programmers accountable for the development and training of ChatGPT hold the responsibility for ensuring that the training data remain free from any copyright violations” (Lucchi, 2024, p. 617). That is why the most recent recommendations ask for more transparency concerning the sources used or the way these systems work.
From the programmers’ point of view, it is acceptable to use freely accessible copyrighted data because the system uses the information as a source of inspiration to present new material and inventive results. AI algorithms rely mostly on huge amounts of data that are essential to enhance the performance of their systems. That is why the first step would be to establish explicit data-sharing agreements between data providers and AI programmers, which would allow the legal use of copyrighted data for training purposes.
The main issue lies in the fact that AI cannot generate authentic ideas. Instead, it relies on the data it has been trained on to generate reshuffled texts. When a text is written by a human, citing the sources used is viewed as a moral responsibility, as well as a way to avoid plagiarism and ensure the reliability of the work. However, if we take ChatGPT as an example, while its answer is based on a large corpus of training data, it is not always accurate and may “forget” to cite its sources. Even when the user asks for them, it sometimes invents non-existent works, which further undermines its credibility. That is why uninformed users may not know they have used the work of someone else. Moreover, the original creators of these training data are not aware their work is being stolen!
When discussing the unauthorized use of a creator’s work, we refer to the infringement of IP rights. At the core of IP law lies the concept of copyright. Originating from the Anglo-Saxon legal tradition, this concept grants creators exclusive rights over their original work, ensuring they have control over reproduction, distribution, and adaptation. Nowadays, it aligns with the European concept of droits d’auteur, the equivalent of copyright, adding the dimension of “moral rights.” These rights emphasize an author’s personal connection to their work, including the right to be recognized as the creator (Blésius, 2008). We will see how these are relevant in the context of a translation.
Ownership of Translations: Human Translation
Translation ownership raises important issues. It is worth knowing to whom the copyrights attached to a translation belong, from both a financial and a recognition standpoint. In fact, the ownership issue is twofold, especially if a tool like SDL Trados Studio is used: who owns the final translation, the translator or the commissioner? Can ownership rights be attributed to AI-generated translations, and who owns the content generated from a prompt?
Translation is not only a means of expression, but also an art form and, as such, it is protected by various legal frameworks that safeguard the copyrights of its creators. For example, under Article 2(3) of the Berne Convention for the Protection of Literary and Artistic Works (n.d., Section F I, .2), “Translations, adaptations, arrangements of music and other alterations of a literary or artistic work shall be protected as original works without prejudice to the copyright in the original work.” Likewise, Article 10(2) of the TRIPS Agreement of 1994 states that “[c]ompilations of data or other material, whether in machine-readable or other form, which by reason of the selection or arrangement of their contents constitute intellectual creations shall be protected as such.”
As stated in these international conventions, the translation produced by a translator is protected like any other artistic work and is, therefore, also a source of copyright. The answer to the first question should be easy: as a translation is considered an original work and is protected by copyright, that copyright belongs to the translator.
However, the answer is not that simple. First, we need to differentiate between freelance translators and translators employed by an agency. For translators employed by an agency, “under the employment legislation of many countries, workers automatically assign intellectual property rights in the works they create to their employers” (Smith, 2009, p. 8). In this situation, it is clear that the translation created belongs to the company, which then sells it to the client who commissioned it. This is also true for translation memories (TMs), whether they are provided by the agency or by the client: “in the case of salaried employees who create term banks or TMs, these rights will automatically pass to the organisations for which they work” (op. cit.).
For freelance translators, it is all a question of the contract agreed with the client. A translator will always be the first owner of the copyright. By selling their work to their client, they give that copyright away. However, even when copyright is transferred, the translator is not liable for unauthorized modifications made by the client (Blésius, 2008). This is also valid for translation memories and term banks created by a translator for a given work: “unless copyright is previously transferred under contract, translation memories belong to the translators who create them” (Smith, 2009, p. 8).
But what about a translation generated by AI?
Ownership of Translations: AI Systems
As stated earlier in this article, generative AI systems work by training on large data sets and incorporating them into their algorithms. These data are not always acquired legally, and algorithms do not, in most cases, cite their sources when providing an answer to a prompt. Systems like ChatGPT (American) or Mistral (French) are able to provide almost human-like translation and, thus, create the fear of “the end of human translation.” AI is a fast-developing technology present in almost every field and has become an integral part of translation work. With it, new legal questions arise that should be considered: to whom should the ownership of such a translation be granted? To the customer of the AI tool, to the developer, or simply to the system itself?
In January 2023, a class-action lawsuit was filed in the United States against Stability AI by visual artists claiming that the company had used their copyrighted work to train its AI model without their consent. The court granted in part and denied in part the defendants’ motions to dismiss. It allowed the direct copyright infringement claim to proceed, recognizing that the issue of whether AI models infringe copyright is unsettled and depends on the specifics of each case (Madigan, 2024).
More recently, on January 29, 2025, the US Copyright Office published a report (Dreyfus Law Firm, 2025) that sets out the requirements for AI-generated content to be accepted as a copyrightable work. According to this report, AI-generated content may be copyrighted if there is sufficient human involvement in the creative process, i.e., the material is not generated by AI alone, and AI is used as a tool to further human creativity. The report also addresses the role of the prompts given to the system, which need to be sufficiently creative. Accordingly, if an artist modifies, arranges, or selects elements of AI-generated content, this content could be eligible for partial copyright protection.
From a global perspective, countries address AI and copyright issues in different ways. For example, the European Union AI Act, published in the Official Journal of the European Union on July 12, 2024, requires AI systems to comply with IP rights, which means that AI model providers are required to “publicly share a detailed summary of the text and data used in training their AI models” (Fitzpatrick, 2025).
As the legal framework is still evolving to accommodate these new technologies, we may not be able to give a clear answer regarding copyrights involving AI in art or translation. AI developers should, however, ensure that they comply with the law regarding the data they acquire for their training models. This involves obtaining proper licenses and compensating the individuals who own the IP that they wish to incorporate into their training data sets (Deloitte AI Institute, n.d.).
The world of AI is constantly changing. The technology itself is being enhanced day by day and incorporated into more and more domains and aspects of our lives. Unfortunately, laws cannot evolve as fast, even in a digitalized world. AI has real potential for creativity and for speeding up work, but because of the way it is developed, it infringes existing rights. As we saw, IP plays a great role in creative fields, but AI programmers do not seem to care much; more than that, they are not forced to comply, as there is a clear lack of regulation regarding AI and copyright. This technology can only use what it has been fed, which, for the most part, consists of copyrighted works.
Several solutions have already been proposed, and an increasing number of companies, organizations, and countries are currently trying to highlight the legal issues concerning AI in various fields. The first demand is for more transparency regarding the sources used by AI systems for training or generating answers, and regarding the way these systems function overall, which might prove damaging for AI programmers. Some countries have their own solutions, and the European Union is leading the way in the fight for transparency.
More recently, in February 2025, the AI Action Summit was held in Paris with more than 800 participants. It aimed to “collectively establish scientific foundations, solutions and standards for more sustainable AI working for collective progress and in the public interest” (France Diplomacy, 2025). The outcome was a shared commitment to create sustainable, safe, trustworthy, and transparent AI and to use it wisely where it is most needed, such as in healthcare and education. While 62 countries signed the final agreement, the US, despite being one of the leaders in the domain of AI, did not.
Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement). (n.d.). WIPO Lex. Retrieved February 16, 2025, from https://www.wipo.int/wipolex/en/treaties/details/231
AI Action Summit (10 & 11 Feb. 2025). (2025). France Diplomacy – Ministry for Europe and Foreign Affairs. https://www.diplomatie.gouv.fr/en/french-foreign-policy/digital-diplomacy/news/article/ai-action-summit-10-11-feb-2025
AI Action Summit Conference: AI, Science, and Society. (2025, February 6). Institut Polytechnique de Paris. https://www.ip-paris.fr/en/news/ai-action-summit-conference-ai-science-and-society-ip-paris
AI and Copyright: Understanding the U.S. Copyright Office’s Second Report on Copyrightability. (2025, February 10). Dreyfus. https://www.dreyfus.fr/en/2025/02/10/ai-and-copyright-understanding-the-u-s-copyright-offices-second-report-on-copyrightability/
Artificial Intelligence and Intellectual Property. (n.d.). WIPO Pearl. Retrieved February 16, 2025, from https://www.wipo.int/about-ip/en/frontier_technologies/ai_and_ip.html
Berne Convention for the Protection of Literary and Artistic Works. (n.d.). WIPO Pearl. Retrieved February 16, 2025, from https://www.wipo.int/treaties/en/ip/berne/index.html
Bharati, R. K. (2024). AI and intellectual property: Legal frameworks and future directions. International Journal of Law, Justice and Jurisprudence, 4(2), 207–215. https://doi.org/10.22271/2790-0673.2024.v4.i2c.141
Bird & Bird LLP, Directorate-General for Translation (European Commission), Debussche, J., & Troussel, J.-C. (2014). Translation and intellectual property rights: Final report. Publications Office of the European Union. https://data.europa.eu/doi/10.2782/72107
Blésius, C. (n.d.). Copyright and the translator. Who owns your translations? Retrieved February 16, 2025, from https://cblesius.co.uk/articles/CopyrightAndTheTranslator-WhoOwnsYourTranslations.html
Creamer, E. (2024, April 16). Survey finds generative AI proving major threat to the work of translators. The Guardian. https://www.theguardian.com/books/2024/apr/16/survey-finds-generative-ai-proving-major-threat-to-the-work-of-translators
Vinson, D. (2025, January 29). From the Past to the Future: The Impact of AI on Translation Technology. Localize. https://localizejs.com/articles/the-impact-of-ai-on-translation-technology
Fitzpatrick, D. (2025, February 3). New Copyright Ruling Just Made AI Skills The Biggest Advantage. Forbes. https://www.forbes.com/sites/danfitzpatrick/2025/02/03/new-copyright-ruling-just-made-ai-skills-the-biggest-advantage/
Appel, G., Neelbauer, J., & Schweidel, D. A. (2023, April 7). Generative AI Has an Intellectual Property Problem. Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem
Guadamuz, A. (2017, October 1). L’intelligence artificielle et le droit d’auteur. Magazine de l’OMPI. https://www.wipo.int/fr/web/wipo-magazine/article-details/?assetRef=40141&title=artificial-intelligence-and-copyright
Hartley, V. (n.d.). AI Translation or Machine Translation: What’s the difference? Language Wire. Retrieved February 16, 2025, from https://www.languagewire.com/en/blog/ai-translation-vs-machine-translation
How to Use AI for Legal Translations: Benefits, Limitations, and Best Practices. (2024, July 1). LegalTranslations.com. https://www.legaltranslations.com/fr/blog/how-to-use-ai-for-legal-translations
Intellectual Property in ChatGPT. (2023, February 20). European Commission. https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/intellectual-property-chatgpt-2023-02-20_en
Introduction to the History of AI Translation and Popular Software. (2022, December 19). Human Science Co., Ltd. https://www.science.co.jp/nmt/blog/32553/
Kupferschmid, K. (2024, December 12). Insights from Court Orders in AI Copyright Infringement Cases. Copyright Alliance. https://copyrightalliance.org/ai-copyright-infringement-cases-insights/
Lacruz Mantecón, M. L. (2023). Authorship and Rights Ownership in the Machine Translation Era. In H. Moniz & C. Parra Escartín (Eds.), Towards Responsible Machine Translation: Ethical and Legal Considerations in Machine Translation (pp. 71–92). Springer International Publishing. https://doi.org/10.1007/978-3-031-14689-3_5
Leschen, S. (2024, September 27). Translation, an art worth protecting. Institute of Translation and Interpreting. https://www.iti.org.uk/resource/translation-an-art-worth-protecting.html
Lucchi, N. (2024, September). ChatGPT: A Case Study on Copyright Challenges for Generative Artificial Intelligence Systems. Cambridge University Press. https://www.cambridge.org/core/journals/european-journal-of-risk-regulation/article/chatgpt-a-case-study-on-copyright-challenges-for-generative-artificial-intelligence-systems/CEDCE34DED599CC4EB201289BB161965
Madigan, K. (2024, August 29). Top Takeaways from Order in the Andersen v. Stability AI Copyright Case. Copyright Alliance. https://copyrightalliance.org/andersen-v-stability-ai-copyright-case/
Mohamed, Y. A., Khanan, A., Bashir, M., Mohamed, A. H. H. M., Adiel, M. A. E., & Elsadig, M. A. (2024). The Impact of Artificial Intelligence on Language Translation: A Review. IEEE Access, 12, 25553–25579. https://doi.org/10.1109/ACCESS.2024.3366802
Moneus, A. M., & Sahari, Y. (2024). Artificial intelligence and human translation: A contrastive study based on legal texts. Heliyon, 10(6). https://doi.org/10.1016/j.heliyon.2024.e28106
Ong, J., Lo Khai Yi, & Winn Wong, H. W. (2024, September 3). EU AI Act: The Essential Guide to Copyright Compliance for General-Purpose AI Models. Chambers and Partners. https://chambers.com/articles/eu-ai-act-the-essential-guide-to-copyright-compliance-for-general-purpose-ai-models
Smith, R. (2009, November 19). Copyright issues in translation memory ownership. Proceedings of Translating and the Computer 31. TC 2009, London, UK. https://aclanthology.org/2009.tc-1.13/
Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet. (2025, February 11). Elysée. https://www.elysee.fr/en/emmanuel-macron/2025/02/11/statement-on-inclusive-and-sustainable-artificial-intelligence-for-people-and-the-planet
The legal implications of Generative AI. (n.d.). Deloitte AI Institute. Retrieved February 16, 2025, from https://www2.deloitte.com/us/en/pages/consulting/articles/generative-ai-legal-issues.html
Trained Translation Models. (n.d.). Wonk.Ai. Retrieved February 16, 2025, from https://wonk.ai/en/training-of-translation-models/
Wang, L. (2023). The Impacts and Challenges of Artificial Intelligence Translation Tool on Translation Professionals. SHS Web of Conferences, 163.