Creating a SharePoint page translation extension using Azure Cognitive Services

Robin Agten
delaware
May 5, 2020

In this blog post I will explore how to build a translation extension as an SPFx application customizer. The goal of this extension is to translate the content of a SharePoint page using Azure Cognitive Services. There are browser extensions available to translate web pages, but they mainly focus on translating the entire UI. In this case we are only interested in translating the actual content of a SharePoint page.

About the Azure Translator Text API

The Translator Text API is part of the Azure Cognitive Services portfolio and is described by Microsoft as follows:

Microsoft Translator API is a neural machine translation service that developers can easily integrate into their applications, websites, tools, or any solution requiring multi-language support such as website localization, e-commerce, customer support, messaging applications, internal communication, and more.

I will not discuss how this API works in this post, but it is well documented here: https://docs.microsoft.com/en-us/azure/cognitive-services/Translator/

The SPFx application customizer

As already mentioned, I am using an SPFx application customizer extension to inject a translation bar on all SharePoint pages. The available languages can be configured in the extension properties. Once a language is selected, the page is translated and a disclaimer is shown. After a translation you also have the option to reload the original version. The following steps are needed to make the translation work:

  1. Setting up the Translator Text API
  2. Detect the language of the page
  3. Get the available languages
  4. Get the Text web parts on the page
  5. Translate the content of the Text web part

Setting up the Translator Text API

To make this solution work, a Translator Text Azure resource is needed. There are different pricing tiers available, including a free plan that gives you 2 million characters per month of any combination of standard translation and custom training.

Once you have the Azure Resource, you will get an API key which you can use to make the API calls. The API is served out of multiple datacenter locations. It is possible to target a specific region using the base URL of the API.

Both the API key and the region specifier are configurable in the application customizer.
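As a rough illustration of how these two settings could end up in a request, here is a minimal sketch. The host names and the `Ocp-Apim-Subscription-Key` header come from the Translator v3 documentation; the interface and function names (`ITranslatorConfig`, `getBaseUrl`, `buildTranslatorRequest`) and the shape of the configuration are assumptions made for this example, not the actual code of the extension.

```typescript
// Hypothetical configuration shape; the property names are assumptions.
interface ITranslatorConfig {
  apiKey: string;
  // Geography prefix used in the Translator v3 host name.
  // Leave undefined to use the global endpoint.
  regionSpecifier?: "nam" | "eur" | "apc";
}

// Plain description of an HTTP request, so it can be handed to any HTTP client.
interface ITranslatorRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the base URL, optionally targeting a specific geography.
function getBaseUrl(config: ITranslatorConfig): string {
  const host = config.regionSpecifier
    ? `api-${config.regionSpecifier}.cognitive.microsofttranslator.com`
    : "api.cognitive.microsofttranslator.com";
  return `https://${host}`;
}

// Builds a POST request for endpoints such as "/detect" or "/translate".
function buildTranslatorRequest(
  config: ITranslatorConfig,
  path: string,
  query: Record<string, string>,
  texts: string[]
): ITranslatorRequest {
  const queryString = Object.keys(query)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(query[key])}`)
    .join("&");
  const suffix = queryString ? `&${queryString}` : "";
  return {
    url: `${getBaseUrl(config)}${path}?api-version=3.0${suffix}`,
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": config.apiKey,
      "Content-Type": "application/json",
    },
    // The v3 API expects an array of objects with a "Text" property.
    body: JSON.stringify(texts.map((text) => ({ Text: text }))),
  };
}
```

The resulting object can be passed to whichever HTTP client the extension uses, for example `fetch(request.url, request)`.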

Detecting the language of the page

When the page is loaded, the current language is determined based on the page description (or the page title if the description field is not populated). The PnPjs V2 library is used to fetch the current page item.

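import { sp } from "@pnp/sp";
import "@pnp/sp/webs";
import "@pnp/sp/lists";
import "@pnp/sp/items";

// Fetch the current page item, including the fields used for language detection
const page = await sp.web.lists
  .getById(this.props.currentListId)
  .items
  .getById(this.props.currentPageId)
  .select("Title", "FileLeafRef", "FileRef", "Description")
  .get();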

The description is then passed to the ‘/detect’ endpoint. This returns an array of possible languages, each with a confidence score and a flag indicating whether translation is supported. An example is shown below:

[
  {
    "language": "de",
    "score": 0.92,
    "isTranslationSupported": true,
    "isTransliterationSupported": false,
    "alternatives": [
      {
        "language": "pt",
        "score": 0.23,
        "isTranslationSupported": true,
        "isTransliterationSupported": false
      },
      {
        "language": "sk",
        "score": 0.23,
        "isTranslationSupported": true,
        "isTransliterationSupported": false
      }
    ]
  }
]
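To show how such a response could be consumed, here is a small sketch that picks the page language from the result. The field names mirror the response above; the function name `pickDetectedLanguage` and the `minScore` threshold are assumptions for illustration, not something the extension necessarily does.

```typescript
// Shape of one entry in the '/detect' response shown above.
interface IDetectedLanguage {
  language: string;
  score: number;
  isTranslationSupported: boolean;
  isTransliterationSupported: boolean;
  alternatives?: IDetectedLanguage[];
}

// Returns the language code of the top result, or undefined when there is
// no result, translation is not supported, or the score is below minScore.
// The 0.5 default is an arbitrary value chosen for this sketch.
function pickDetectedLanguage(
  results: IDetectedLanguage[],
  minScore: number = 0.5
): string | undefined {
  const best = results[0];
  if (!best || !best.isTranslationSupported || best.score < minScore) {
    return undefined;
  }
  return best.language;
}
```

When the function returns `undefined`, the extension could fall back to a default language or simply not preselect anything in the translation bar.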

Get the available languages

The languages a page can be translated to are also configurable. The ‘/languages’ endpoint of the API is used to fetch the languages the service supports, and these are then matched against the list of languages provided in the configuration. If a configured language is not supported by the API, it will not appear in the list of translation options. A list of supported languages can be found in the Translator documentation.
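The matching step could look roughly like the sketch below. The response shape follows the `translation` scope of the ‘/languages’ endpoint (an object keyed by language code); the names `ILanguageOption` and `getSupportedOptions` are made up for this example.

```typescript
// One language entry as returned by '/languages' for the translation scope.
interface ILanguageInfo {
  name: string;
  nativeName: string;
  dir: "ltr" | "rtl";
}

interface ILanguagesResponse {
  translation: Record<string, ILanguageInfo>;
}

// Hypothetical shape of an entry in the translation bar dropdown.
interface ILanguageOption {
  code: string;
  label: string;
}

// Keeps only the configured language codes that the API actually supports,
// preserving the order of the configuration.
function getSupportedOptions(
  configuredCodes: string[],
  response: ILanguagesResponse
): ILanguageOption[] {
  const options: ILanguageOption[] = [];
  for (const code of configuredCodes) {
    const info = response.translation[code];
    if (info) {
      options.push({ code, label: info.nativeName });
    }
  }
  return options;
}
```

Using the native name as the label is a design choice of this sketch: it lets readers recognize their own language even when the UI is in a language they do not read.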

Get the Text web parts on the page

For this solution we only want to translate the actual written content on the page and no other UI elements. To do this, the PnPjs V2 library is used to first load the client-side page, and then collect all the Text web parts:

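import { sp } from "@pnp/sp";
import "@pnp/sp/webs";
import { ClientsideText, ColumnControl, IClientsidePage } from "@pnp/sp/clientside-pages";

sp.web.loadClientsidePage(relativePageUrl).then(
  async (clientSidePage: IClientsidePage) => {
    // Collect all Text web parts; returning false keeps findControl iterating
    const textControls: ColumnControl<any>[] = [];
    clientSidePage.findControl((c) => {
      if (c instanceof ClientsideText) {
        textControls.push(c);
      }
      return false;
    });
  });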

Translate the content of the Text web part

Once all the Text web parts are loaded, the corresponding HTML element is fetched based on the id of the client-side web part:

const element = document.querySelector(
  `[data-sp-feature-instance-id='${textControl.id}']`
);

Before we can do the actual translation we need to check the length of the inner HTML, because the ‘/translate’ endpoint can only handle 5000 characters at a time. If the inner HTML exceeds this limit, the extension loops over the HTML child elements and translates each one separately. If the inner HTML of an element without any children is still longer than 5000 characters (one big paragraph of more than 5000 characters, for example), the ‘/breaksentence’ endpoint is used. This returns an array of numbers, each representing the length of one sentence. Using these numbers we can split the element's text into sentences and translate each sentence separately. A recursive function checks the length and translates accordingly:

private _translateHtmlElement = async (
  element: Element,
  languageCode: string
): Promise<void> => {
  // If the inner HTML is 5000 characters or more, the API call will fail,
  // so translate each HTML child element separately
  if (element.innerHTML.length > 4999) {
    const childElements = [].slice.call(element.children);
    if (childElements.length > 0) {
      for (const childElement of childElements) {
        await this._translateHtmlElement(
          childElement,
          languageCode
        );
      }
    } else {
      // Fallback: translate each sentence individually if the
      // content of a single HTML tag is longer than 4999 characters
      const breakSentenceResult = await this.props.translationService.breakSentence(element.textContent);
      // Both indexes must start at 0 so the first substring call is well defined
      let startIndex = 0;
      let endIndex = 0;
      const fullTextToTranslate = element.textContent;
      for (const sentenceLength of breakSentenceResult.sentLen) {
        endIndex += sentenceLength;
        const sentenceToTranslate =
          fullTextToTranslate.substring(startIndex, endIndex);
        const translationResult = await this.props.translationService.translate(
          sentenceToTranslate,
          languageCode,
          false
        );
        element.textContent = element.textContent.replace(
          sentenceToTranslate,
          translationResult.translations[0].text
        );
        startIndex = endIndex;
      }
    }
  } else {
    const translationResult = await this.props.translationService.translate(
      element.innerHTML,
      languageCode,
      true
    );
    element.innerHTML = translationResult.translations[0].text;
  }
}
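The sentence-splitting part of the fallback branch can also be isolated into a small pure function, which makes it easier to test without calling the API. The sketch below is one possible way to do that; `splitBySentenceLengths` is a hypothetical helper name, and `sentLen` is the array of sentence lengths returned by ‘/breaksentence’.

```typescript
// Splits a text into sentences using the lengths returned by '/breaksentence'.
// Any trailing text not covered by the lengths is kept as a final piece,
// so joining the result always reproduces the original text.
function splitBySentenceLengths(text: string, sentLen: number[]): string[] {
  const sentences: string[] = [];
  let startIndex = 0;
  for (const length of sentLen) {
    sentences.push(text.substring(startIndex, startIndex + length));
    startIndex += length;
  }
  if (startIndex < text.length) {
    sentences.push(text.substring(startIndex));
  }
  return sentences;
}

// Example: two sentences of 13 and 12 characters.
const pieces = splitBySentenceLengths("Hello world. How are you?", [13, 12]);
// pieces[0] === "Hello world. " and pieces[1] === "How are you?"
```

With the sentences in an array, each one can be translated in turn and the translated pieces joined back together and written to `textContent` once. This avoids searching for each source sentence again with `replace`, which can match the wrong occurrence when a sentence is repeated.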

Resources

Originally published at http://digitalworkplace365.wordpress.com on May 5, 2020.
