Top Open Source Data Annotation Tools That Should Be On Your Radar

essidsolutions

Whether you are developing a machine learning model for a self-driving car or a product recommendation app, the first step is to label your training data. In other words, every dataset should be labeled (or annotated) so that, once the model is deployed, it can recognize similar patterns in unannotated data and take the appropriate action. The data annotation tools you use to label your data can go a long way toward determining whether you end up with a high-performing ML model or a failed project.

What Are Data Annotation Tools?

According to CloudFactory, a data annotation tool is “a cloud-based, on-premise, or containerized software solution that annotates training data for machine learning.” While many commercial tools are available for purchase, open-source or freeware data annotation tools are often preferred. Not only are they available at no cost, but they also let you customize the tool’s annotation accuracy thresholds and security features.

Baseline Features to Look for in Open Source Annotation Tools

When choosing among the myriad of open-source projects, you should evaluate them in terms of the following:

Dataset management

To get the most out of your machine learning project, make sure the annotation tool can work with every file type you will need to label. You should be able to search, filter, sort, clone and merge datasets, whether they are stored locally or in the cloud.

Different tools store annotations in different output formats, such as Pascal VOC XML, TFRecords, or plain-text files (CSV, TXT), to name a few. Choose a tool that meets your format requirements; otherwise, you will need to spend additional time converting your annotations to your target format.
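To give a sense of what such a conversion involves, here is a minimal sketch that turns bounding boxes from a Pascal VOC-style XML file into YOLO-style text lines. It assumes the usual VOC layout (a size element plus object/bndbox elements with pixel corner coordinates) and the usual YOLO layout (class index followed by a normalized box center, width and height). The function names and the class list are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to a normalized center/size box."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def voc_xml_to_yolo_lines(xml_text, class_names):
    """Return one YOLO-style line per object found in a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        # The class index is the label's position in the caller-supplied list.
        class_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        x, y, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines
```

Even a conversion this small has to be written, tested and maintained for every format pair, which is why it pays to pick a tool that exports your target format directly.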

Annotation methods

Ensure that the tool’s methods for building and managing ontologies, such as classes and attributes, meet the requirements of your use case. While many tools can handle a range of use cases, others focus only on specific types of labeling. According to CloudFactory, your chosen annotation tool should be able to annotate images for every computer vision task you will employ, such as classification, object detection, or semantic segmentation.

Application

As far as the annotation app itself is concerned, not all tools can be used both online and offline. While some tools can run both as desktop apps (on Windows, for example) and as web-based apps, most are web-based only, so choose according to how and where your annotators will work.

Also, consider the privacy risks before committing to a web-only tool. Working with a third-party web app may expose your data to a breach. Likewise, you will want your tool to prevent unauthorized viewing or downloading of your data by annotators. In short, ensure that the annotation tool will help you meet any regulatory compliance requirements that apply to your use cases.

Efficiency

Finally, look for tools that include hotkeys and a streamlined user interface to make manual annotation more efficient and less time-consuming.

Five Widely Used Open Source/Freeware Tools to Consider

Computer Vision Annotation Tool (CVAT)

CVAT can annotate both images and video. It can be self-hosted with Docker, either on a local network or on a single machine running any operating system, or used entirely online from CVAT’s website. With CVAT, you have a variety of annotation shapes to choose from, including rectangles, polygons, polylines, points, cuboids, tags and tracks. It also supports a wide range of annotation formats, such as CVAT’s own XML, Pascal VOC, MS COCO, YOLO and TFRecords. Hotkeys and semantic segmentation are also supported.

Image quality for annotation can be set anywhere from full resolution to heavily compressed. Its collaborative features include the ability to divide annotation tasks among team members and to monitor, visualize, and analyze annotation jobs. It also supports automated annotation using pre-trained models.
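Conceptually, dividing an annotation task among team members comes down to splitting the list of images into fixed-size jobs and handing them out in turn. The sketch below illustrates that idea only; it is not CVAT’s API, and the function name, annotator names and job size are all hypothetical.

```python
def split_into_jobs(items, annotators, job_size):
    """Split items into consecutive jobs and assign them round-robin."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    if not annotators:
        raise ValueError("at least one annotator is required")

    jobs = []
    for start in range(0, len(items), job_size):
        # Rotate through the annotators so the workload stays roughly even.
        annotator = annotators[len(jobs) % len(annotators)]
        jobs.append({
            "annotator": annotator,
            "items": items[start:start + job_size],
        })
    return jobs


if __name__ == "__main__":
    frames = [f"img_{i}.jpg" for i in range(5)]
    for job in split_into_jobs(frames, ["ana", "ben"], job_size=2):
        print(job["annotator"], job["items"])
```

A dedicated tool adds what this sketch leaves out: tracking the status of each job, reviewing finished work, and reassigning jobs when someone falls behind.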

LabelImg

LabelImg is a desktop application with a Qt graphical interface that you can install locally on any operating system. It is available for Windows, Linux and macOS, and it can also be installed as a Python package or run through Anaconda or Docker. It supports a number of output formats, such as Pascal VOC, YOLO text files, CSV and TFRecords. Hotkeys and image verification are supported, but the only annotation shape is the bounding box, and there is no browser support.

VGG Image Annotator (VIA)

VIA runs in a browser window and can label image, audio, and video data. Supported annotation shapes include bounding boxes, circles, ellipses, polygons, points and polylines. You can also use it for text annotation. Supported output formats include COCO JSON, Pascal VOC and CSV; exporting to other formats requires external conversion. Along with hotkeys, VIA includes project management functionality for setting up multiple jobs for annotators and tracking their progress.
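As an illustration of the kind of external transformation involved, the sketch below flattens COCO-style detection annotations into CSV rows. It assumes the standard COCO object-detection layout (images, categories and annotations lists, with each bbox stored as [x, y, width, height] from the top-left corner). The CSV column layout and the function name are hypothetical, not a format that any particular tool prescribes.

```python
import csv
import io


def coco_to_csv(coco):
    """Flatten a COCO-style dict into CSV text with corner coordinates."""
    # Lookup tables from numeric ids to readable names.
    images = {img["id"]: img["file_name"] for img in coco["images"]}
    categories = {cat["id"]: cat["name"] for cat in coco["categories"]}

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file_name", "label", "xmin", "ymin", "xmax", "ymax"])
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores top-left corner plus size
        writer.writerow([
            images[ann["image_id"]],
            categories[ann["category_id"]],
            x, y, x + w, y + h,
        ])
    return buf.getvalue()
```

In practice you would load the exported file with json.load, pass the resulting dict to the function, and write the returned text to disk.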

Visual Object Tagging Tool (VoTT)

VoTT can import data from both local and cloud storage and export labeled data back to local or cloud storage. It can be run from source or installed on Windows, Linux or macOS. It is also available as a standalone web application that runs in any web browser. However, the web app requires that the dataset be uploaded to the cloud, as it cannot access a local file system. It supports two types of annotation shapes: polygons and rectangles. Features include project tracking metrics and keyboard shortcuts. Along with the common output formats CSV, generic JSON, Pascal VOC, and TFRecords, VoTT also supports Microsoft Cognitive Toolkit (CNTK) and Azure Custom Vision Service.

CoLabeler

CoLabeler is freeware: like its open-source counterparts, it is free to download, install, use, and share. It offers bounding box and 2-D point annotation shapes and also supports text annotation.

Key Questions to Ask Vendors When Making the Move to Commercial Solutions

Most small and medium-sized teams prefer free open-source tools. However, there may come a time when commercial solutions become the better value. For example, open-source tools are difficult to scale, as they typically do not offer the workflow features that enterprise-scale annotation teams need. Additionally, while they are not free, commercial solutions can greatly reduce the hidden ownership costs of open-source tools, such as workflow development and ongoing support, because these are typically built into commercial tools.

However, unlike open-source tools that you build on and manage yourself, commercial tools are not typically built to a customer’s specifications. You must decide what custom features you are willing to forgo both now and in the future. With that in mind, CloudFactory lists some key questions you should ask vendors before making a move to a commercial solution. For example, how does a vendor’s tool differ from other commercially available tools? What aspects of the data labeling process does their tool support? Are they open to making changes and feature enhancements to better serve your use cases?

In terms of dataset management, what features do they offer? Where can files be stored? What volume of data can the tool handle? Will you be able to upload pre-annotated images into the tool? 

Does the tool come with an API and/or SDK? Can you upload custom-built classes and attributes into the tool? Can your own algorithms be plugged into the tool? Finally, what enterprise features are built into the tool, including security compliance or certifications, quality control, quality assurance or AI?

Bottom Line

If you need control, data security, and the agility to make feature enhancements or other changes, open-source tools that you build on and manage yourself may end up being a better option than commercial tools.