BotXO has trained the most advanced Danish BERT model to date 

BotXO has trained the most advanced Danish AI language model yet. The BotXO BERT model has been fed with a staggering 1.6 billion Danish words and it is also available as open source.

Google’s BERT model is one of the best-known machine learning models for text analysis. Google has released English, Chinese and multilingual models, but BotXO is now the first to release an extensive open-source Danish BERT model. The code is downloadable from

Language models are constantly evolving

Remember our blog post about Google’s XLNet, where we explained how the Chinese company Baidu was beating Google’s BERT model with its new language model ERNIE 2.0? We also mentioned that BotXO would continue experimenting with what can be achieved with the current state of the art in Natural Language Processing (NLP).

This has led BotXO to build upon Google’s BERT model, as any Danish private company, educational institution, NGO or public organisation in need of AI in Danish can greatly benefit from it. It makes BotXO one of the very few companies in Denmark to improve and support Danish AI by publishing open-source code, and that is no small thing. On one hand, this next step in the development of the Danish BERT model is highly useful for the whole Danish AI and machine learning ecosystem; on the other, it inspires the whole industry to move towards democratising AI and making new advances publicly available to everyone.

“BotXO might be the single company in Denmark lifting the community with open source code. It is both important and inspirational for the industry.” – Jacob Knobel, CEO of the AI consultancy firm Datapult and winner of Forbes 30 Under 30, 2016.

But first, what is a BERT model?

BERT is an acronym for “Bidirectional Encoder Representations from Transformers”. It is a deep neural network that can be used for Natural Language Processing (NLP). The network has learnt about Danish grammar and semantics by reading vast amounts of Danish text.

Deep neural networks can be used for Natural Language Processing (NLP). Photo: ThVideostudio @ Envatoelements.

How much text has the Danish model read?

When working with AI language models, part of the challenge is to collect huge amounts of text needed to make an extensive model. BotXO has managed to overcome the obstacle by turning the model into a massive bookworm.

BotXO’s Danish BERT model has read 1.6 billion words, equivalent to more than 30,000 novels. Although this might sound like a lot, the model could have read even more; it is simply difficult to find much more publicly available Danish text.

What can a BERT model be used for?

The general language understanding capabilities of the model can be used as the first step in text analysis pipelines. The model reads texts and returns vectors, which are points in a coordinate system. The shorter the distance between the points returned for two different texts, the more similar their meaning. The vectors can thus be used to figure out whether different pieces of text are related. By combining the model’s general language understanding with, for example, data on the positivity or negativity of texts, the BERT model can help with sentiment analysis, entity extraction and all the other disciplines in Natural Language Processing.
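As a sketch of how such vectors can be compared, here is a minimal similarity check in Python. The vectors below are invented for illustration; a real BERT model returns much longer vectors (typically 768 numbers per text):

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors:
    close to 1.0 means similar meaning, close to 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional vectors for three short texts.
vec_spring_rolls = [0.9, 0.1, 0.3, 0.0]
vec_chinese_food = [0.8, 0.2, 0.4, 0.1]
vec_arrest       = [0.0, 0.9, 0.1, 0.8]

# Texts with related meanings lie closer together, so the first
# similarity comes out higher than the second.
print(cosine_similarity(vec_spring_rolls, vec_chinese_food))
print(cosine_similarity(vec_spring_rolls, vec_arrest))
```

A downstream classifier, such as a sentiment model, can then be trained on top of these vectors instead of on raw text.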

The Danish BERT model can be used for sentiment analysis in Danish. For instance, it can detect bias in a text, determine a text’s purpose and context, and point out relevant words. This is useful to multiple industries, such as e-commerce, finance, the tech industry and the public sector.

Why is it so important to Denmark?

At BotXO we believe that it is crucial for countries with ‘smaller’, less widespread languages to secure their place in the global economy by taking advantage of the endless opportunities that come with Artificial Intelligence. Furthermore, we think it is important that it is not left solely to the big international players to determine where, when and how Danish organisations can benefit from these technological achievements, as that would risk leaving Denmark behind in the AI race.

“It’s vitally important for people in Denmark to have access to the benefits that language technology has brought to the English-speaking world, and seeing game-changing advances like Danish BERT come from the commercial sector, through BotXO, is a hugely positive sign. It clearly puts the company ahead of the curve in today’s Danish AI.” – Leon Derczynski, PhD., Associate Professor, Natural Language Processing, Department of Computer Science, IT-University of Denmark.

With the new Danish BERT model, BotXO is making sure that Denmark is not left behind in the AI race. (Christiansborg Palace in Copenhagen, Denmark.) Photo: stevanovicigor @ Envatoelements.

Why do we need a Danish BERT model?

Google has released a multilingual BERT model, but it is trained on more than a hundred different languages, so Danish text constitutes only about 1% of the total data. The model has a vocabulary of 120,000 words*, which leaves room for only about 1,200 Danish words. BotXO’s model, on the other hand, has a vocabulary of 32,000 Danish words.

(* In fact, “words” is a bit inaccurate. In reality, the model divides rare words into pieces, so that “inconsequential”, for example, becomes “in-”, “-con-” and “-sequential”. Since these kinds of word pieces are shared across languages, there is room for more than 1,200 Danish “words” in Google’s multilingual model.)
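The idea behind these word pieces can be sketched with a toy greedy longest-match splitter. The mini-vocabulary below is invented for the example; the real model uses the WordPiece algorithm with tens of thousands of pieces, where continuation pieces are marked with a leading “##”:

```python
def wordpiece_split(word, vocab):
    """Greedily split a word into the longest pieces found in the
    vocabulary; pieces after the first get a '##' continuation marker."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no vocabulary piece matched at all
        start = end
    return pieces

# Invented mini-vocabulary covering the example from the text.
vocab = {"in", "##con", "##sequential"}
print(wordpiece_split("inconsequential", vocab))
# ['in', '##con', '##sequential']
```

A bigger vocabulary means fewer splits, which is one reason a dedicated Danish vocabulary of 32,000 pieces handles Danish better than ~1,200 pieces inside a multilingual model.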

It requires a lot of computing power to learn from so much collected text. Photo: grafvision @ Envatoelements.

How does the model learn from text?

The model learns in two different ways:

First, it reads a sentence, e.g. “I like Chinese food, especially spring rolls.” Then it hides some of the words from itself: “I like [HIDDEN] food, especially spring rolls.” and tries to guess the hidden word. If it guesses wrong, it adjusts its internal weights so that it does better next time. If, on the other hand, it guesses correctly, the model knows that it has understood the meaning of the text. In this example, the model learns that spring rolls belong to Chinese cuisine.
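The masking step itself can be sketched as follows. This only prepares the training examples; the guessing is done by the neural network, and the roughly 15% mask rate matches what BERT uses:

```python
import random

def mask_sentence(tokens, mask_rate=0.15, rng=None):
    """Replace roughly mask_rate of the tokens with '[MASK]' and
    return the masked sentence plus the hidden answers by position."""
    rng = rng or random.Random(0)
    masked, answers = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            answers[i] = tok  # remember what was hidden at position i
        else:
            masked.append(tok)
    return masked, answers

tokens = "I like Chinese food , especially spring rolls .".split()
masked, answers = mask_sentence(tokens)
print(" ".join(masked))
# The model then tries to predict each hidden token; every wrong
# guess produces a training signal that nudges its weights.
```

Because the answers are taken from the text itself, no human labelling is needed, which is what makes training on 1.6 billion words feasible.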

Afterwards, the model reads the next sentence in the text, for example: “That’s why I often do my grocery shopping in the Asian supermarket”. The model also reads a random sentence from another book: “At 7 p.m., Mads Jensen was arrested”. It then tries to figure out which of the two sentences would logically follow the first sentence, “I like Chinese food, especially spring rolls.”.
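Building the training pairs for this second task can be sketched like this: half the time the true next sentence is kept (label 1), and half the time a random sentence from the corpus is swapped in (label 0), so the model must learn to tell them apart:

```python
import random

def make_nsp_pairs(sentences, rng=None):
    """For each sentence, pair it with either the true next sentence
    (label 1) or a random sentence from the corpus (label 0)."""
    rng = rng or random.Random(0)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            pairs.append((sentences[i], rng.choice(sentences), 0))
    return pairs

corpus = [
    "I like Chinese food, especially spring rolls.",
    "That's why I often do my grocery shopping in the Asian supermarket.",
    "At 7 p.m., Mads Jensen was arrested.",
]
for first, second, label in make_nsp_pairs(corpus):
    print(label, "|", first, "->", second)
```

Like the masking task, the labels come for free from the text itself, so the two tasks can be trained together on the same corpus.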

How can we use the BERT model?

In line with our mission at BotXO to develop Danish AI and make it publicly available, it only made sense that the Danish BotXO BERT model would be open source. This means that others can develop it further and use it to improve their products and services, as well as produce new solutions.

The model and the instructions for data scientists and engineers are available for free here: We hope that you will support Danish AI by sharing the link in your organisation. If your organisation needs something industry-specific and you don’t have the time, ability or resources to build it yourself – i.e. you’re not a developer – we can set it up for you on our platform. Just get in touch with us at

Follow our blog to keep an eye on the latest AI news, chatbot best practices, and much more.