Note: This blog post was originally written in Japanese for our Japanese website. We used our machine translation platform Translation Designer to translate it and post-edit the content in English. The original Japanese post can be found here.

Have you ever heard the word "annotation"? The word appears in a variety of fields. In PDF software such as Adobe Acrobat, it refers to a function that allows you to add comments, lines, circles, and other symbols to files. In genomics, it refers to attaching descriptive information to gene sequences.

What you associate annotation with will likely depend on the industry you work in. In this post, we will explain the meaning and use of data annotation in the context of artificial intelligence (AI) and machine learning.

The meaning of annotation

Annotation in the translation industry refers to labeling, or attaching metadata, to data such as text, images, and videos. Labeling makes it possible to give meaning to data and associate one piece of information with another. This makes it possible to classify and analyze data and to create learning data for AI.
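As a minimal sketch of what a label attached to data can look like, the Python snippet below stores each annotation as a small record and groups the items by label. The `Annotation` class, its field names, and the sample file names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One label attached to one piece of data (hypothetical structure)."""
    item: str    # the data being annotated, e.g. a file name or a sentence
    label: str   # the meaning assigned to the item
    metadata: dict = field(default_factory=dict)  # extra descriptive information


def group_by_label(annotations):
    """Return a dict mapping each label to the list of items that carry it."""
    groups = defaultdict(list)
    for ann in annotations:
        groups[ann.label].append(ann.item)
    return dict(groups)


# Illustrative sample data, not taken from a real project.
annotations = [
    Annotation("photo_001.jpg", "cat", {"annotator": "A"}),
    Annotation("photo_002.jpg", "dog", {"annotator": "B"}),
    Annotation("photo_003.jpg", "cat", {"annotator": "A"}),
]

grouped = group_by_label(annotations)
print(grouped)
```

Once every item carries a label, classifying the data is a simple lookup, and the same records can later serve as learning data.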

Main use

One area where annotation is used especially actively these days is machine learning for AI. A large amount of annotated data is used as training data.*

*Training data is data that allows AI to perform machine learning. Based on this data, AI learns and judges what is and what is not correct.

AI cannot learn from data that carries no meaning. And if the AI learns from inaccurate data, the effort is wasted, because it will simply output inaccurate results. It is extremely important to prepare a large amount of usable data and train the AI with it. For this reason, it is essential to annotate the data accurately and create correct data that the AI can learn from.
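One simple way to guard against unusable training data is to check each record before training. The sketch below is an assumption for illustration, not a description of any particular tool: the label set, the `split_valid_invalid` function, and the sample records are all hypothetical.

```python
ALLOWED_LABELS = {"positive", "negative"}  # assumed label set for this sketch


def split_valid_invalid(records, allowed_labels=ALLOWED_LABELS):
    """Separate usable training records from ones that need re-annotation.

    A record is a (text, label) pair. It counts as usable only if the text
    is non-empty and the label is one of the allowed labels.
    """
    valid, invalid = [], []
    for text, label in records:
        if text.strip() and label in allowed_labels:
            valid.append((text, label))
        else:
            invalid.append((text, label))
    return valid, invalid


# Illustrative sample records.
records = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
    ("Delivery was quick.", "postive"),  # typo in the label
    ("", "negative"),                    # empty text
]

valid, invalid = split_valid_invalid(records)
print(len(valid), "usable,", len(invalid), "to review")
```

A check like this only catches formal problems such as misspelled labels or empty text. A label that is well-formed but wrong still has to be caught by a human reviewer.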

The use of annotation described above has been covered by various media outlets. In this post, we'd like to share uses of annotation that are not often brought up in mainstream media but that may help your daily work. We’ll focus on annotation for utilizing and managing your data.

Annotations for utilizing and managing data

In addition to machine learning, as mentioned above, annotations can also be used for data utilization and management. For example, you can use annotations to:

1. Convert images and audio into recognizable text data, and then extract necessary terms and information to create a list.
2. Add metadata to images and audio, and then classify the data according to the desired categories.
3. Extract only adjectives and adverbs using morphological analysis, and then classify them into positive and negative responses. 

Regarding example 1, the text in scanned documents, such as handwritten ones, is stored as an image and cannot be handled as text as is. Optical character recognition (OCR) is performed with a dedicated tool to convert it into recognizable text. Audio and video data can likewise be converted to text with speech recognition technology.

Documents and data digitized by character recognition and speech recognition technologies are easier to manage. By annotating this data, you can extract the necessary terms and information and create an at-a-glance list. This extraction uses tools and techniques designed for annotation, so it is much more accurate and efficient than doing it manually.
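The extraction step in example 1 can be sketched as follows, assuming the text has already been produced by an OCR or speech recognition step. The `extract_terms` function, the glossary, and the sample text are hypothetical. The word-boundary matching works for space-separated languages such as English and would need a different approach for Japanese.

```python
import re
from collections import Counter


def extract_terms(text, glossary):
    """Count how often each glossary term appears in the text (case-insensitive)."""
    counts = Counter()
    for term in glossary:
        # \b keeps a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        matches = re.findall(pattern, text, flags=re.IGNORECASE)
        if matches:
            counts[term] = len(matches)
    return counts


# Stand-in for the output of an OCR or speech recognition step.
recognized_text = (
    "Invoice number 1024. The invoice covers translation and annotation work. "
    "Payment for the translation is due in 30 days."
)
glossary = ["invoice", "translation", "annotation", "refund"]

term_counts = extract_terms(recognized_text, glossary)
for term, count in term_counts.most_common():
    print(f"{term}: {count}")
```

The resulting counts form the kind of at-a-glance list described above. Terms that never appear, such as "refund" here, are simply left out.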

Regarding example 2, data such as images and audio usually contain certain metadata when they are created. Metadata, in this case, is information that describes the contents of the data.

For example, if you check the details of an image file of a photo taken with a smartphone, you will see the date and time the photo was taken and where it was taken. This information is added automatically, but with nothing else attached, it is not enough to put the data to other uses.

Therefore, it is necessary to categorize and define the data first, and then implement the annotation process. By doing this annotation work, it is possible to create a classification list of image and audio data, enabling data management and data utilization that precisely match your needs.
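Example 2 can be sketched in the same spirit: categories are defined first, an annotator assigns one to each file, and the files are then grouped into a classification list. The category names, file names, and the `build_classification_list` function are assumptions made for illustration.

```python
from collections import defaultdict

# Metadata that a device typically records automatically (sample values).
photos = [
    {"file": "IMG_0001.jpg", "taken_at": "2023-04-01T10:15:00", "place": "Tokyo"},
    {"file": "IMG_0002.jpg", "taken_at": "2023-04-01T11:40:00", "place": "Tokyo"},
    {"file": "IMG_0003.jpg", "taken_at": "2023-04-02T09:05:00", "place": "Osaka"},
    {"file": "IMG_0004.jpg", "taken_at": "2023-04-03T14:20:00", "place": "Osaka"},
]

# Step 1: define the categories (hypothetical).
CATEGORIES = {"product", "storefront", "receipt"}

# Step 2: labels assigned by an annotator (hypothetical).
manual_labels = {
    "IMG_0001.jpg": "product",
    "IMG_0002.jpg": "receipt",
    "IMG_0003.jpg": "product",
}


def build_classification_list(photos, labels, categories):
    """Group file names by category; unlabeled files go to 'unclassified'."""
    classified = defaultdict(list)
    for photo in photos:
        category = labels.get(photo["file"])
        if category not in categories:
            category = "unclassified"
        classified[category].append(photo["file"])
    return dict(classified)


classification = build_classification_list(photos, manual_labels, CATEGORIES)
print(classification)
```

Keeping an explicit "unclassified" group makes it easy to see which files still need annotation.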

As for example 3, when morphological analysis* is combined with annotation, you can, for example, classify social media posts into positive and negative responses and keep a count of each. The aggregated results can then be analyzed and used for product development and improvement.

*Morphological analysis is a technology that divides sentences into words and determines their parts of speech, making it possible to extract useful information from text data.
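The idea behind example 3 can be sketched as below. A real project would use a morphological analyzer to split the text and tag parts of speech. Here a tiny hand-made lexicon stands in for both the analyzer and the sentiment dictionary, so `LEXICON`, `classify_post`, and the sample posts are purely hypothetical.

```python
import re
from collections import Counter

# Toy lexicon standing in for a real morphological analyzer and sentiment
# dictionary: word -> (part of speech, polarity). Entirely hypothetical.
LEXICON = {
    "great": ("adjective", 1),
    "comfortable": ("adjective", 1),
    "quickly": ("adverb", 1),
    "slow": ("adjective", -1),
    "poorly": ("adverb", -1),
    "problem": ("noun", -1),  # a noun, so it is ignored by the filter below
}
TARGET_POS = {"adjective", "adverb"}


def classify_post(post):
    """Label a post using only the adjectives and adverbs it contains."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(
        LEXICON[w][1]
        for w in words
        if w in LEXICON and LEXICON[w][0] in TARGET_POS
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Illustrative sample posts.
posts = [
    "The new chair is great and comfortable.",
    "Support answered quickly.",
    "The app is slow and the update went poorly.",
    "I set it up yesterday without a problem.",
]

counts = Counter(classify_post(p) for p in posts)
print(counts)
```

Posts with no matching adjectives or adverbs fall into a "neutral" group, which is an addition of this sketch rather than part of the positive and negative split described above.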

What all of the above examples have in common is that if you use annotations wisely, you can greatly reduce the time spent on data confirmation and data management. We support our clients by providing data annotation services alongside translation work. Our services are not limited to Japanese and English, so feel free to reach out to us about annotation for documents in any language you work with!

Kawamura's annotation services

Some of you might not have been familiar with the term “annotation,” but we hope this post gave you a hint of what to expect.

As a company made up of technology-savvy linguistics specialists, we provide annotation services such as those mentioned above. We have a dedicated team that develops language-focused technology and provides support that takes the characteristics of your content into account.

Even if you don’t know where to start with using annotations, feel free to get in touch with us. We will work with you to find a solution to your problem. We are happy to start the conversation with you at any time.