Data Quality in the Internet of Things

By on Aug 29, 2013

“On the Internet, nobody knows you’re a dog.” More appropriately: “On the Internet, nothing knows you’re a dog.”

An emerging risk in the Internet of Things (IoT) is associated with the quality of data related to “things”. The risk is not that the data on or in the thing is itself corrupted. The risk is that the meta-data sources on the Internet referenced by the thing are fouled in one way or another, and that critical decisions are made by people or machines based on bad data.

This past week I attended an IoT standards meeting: ISO/IEC JTC 1 SWG IOT. I was fortunate enough to meet many interesting and knowledgeable people working on the IoT, including someone from the barcode (aka GS1) community. Barcodes have been around for 40 years and represent possibly the earliest, most successful example of creating “smart things”. The codes lead retailers and consumers to a database of information describing much more about the product than what is necessarily on the label. Barcodes enable efficiency gains from inventory management to shipping to checkout.

A barcode is essentially a machine-readable number identifying a manufacturer and product. Barcode numbers and related meta-data are managed on an international basis by GS1, which allocates the manufacturer prefix and registers the product ID suffix assigned by the manufacturer. GS1 then maintains the meta-data associated with the barcoded number. Product meta-data might include information related to ingredients, product bulletins, points of contact, etc. GS1 is also developing a publicly accessible, on-line database of the meta-data called GS1 Source. A relative of the barcode is the trendy but unmanaged QR (Quick Response) code, which typically encodes text-based URLs for accessing more information.
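
To make the structure of such a number concrete, here is a minimal Python sketch, not GS1's own tooling. It validates the trailing check digit using the standard mod-10 scheme with alternating weights of 3 and 1, and then splits the number into prefix and product parts. The function names and the `prefix_len` parameter are assumptions made for illustration: manufacturer prefixes vary in length, so the caller has to supply that length from an authoritative source rather than guess it.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the mod-10 check digit for the digits that precede it."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for i, ch in enumerate(reversed(body)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10


def parse_barcode(code: str, prefix_len: int) -> dict:
    """Validate a scanned number and split it; raise instead of guessing.

    prefix_len is a hypothetical parameter: the length of the manufacturer
    prefix is not encoded in the number itself.
    """
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        raise ValueError(f"not a barcode-shaped number: {code!r}")
    if not 1 <= prefix_len < len(code) - 1:
        raise ValueError(f"implausible prefix length: {prefix_len}")
    if gtin_check_digit(code[:-1]) != int(code[-1]):
        raise ValueError(f"check digit mismatch: {code!r}")
    return {
        "prefix": code[:prefix_len],    # allocated to the manufacturer
        "item": code[prefix_len:-1],    # assigned by the manufacturer
        "check": code[-1],
    }
```

For example, `parse_barcode("4006381333931", 7)` passes the check-digit test and returns the three parts, while changing any single digit raises a `ValueError`. A valid check digit only shows that the number was read correctly; it says nothing about the quality of the meta-data looked up afterwards.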

There are many free, third-party barcode and QR scanning applications on the Internet, allowing people and automated systems to extend the IoT to goods by scanning barcodes and QR codes. However, the methods and sources by which most of these third-party applications access product meta-data are ad hoc, and therefore subject to question.

A data quality vulnerability develops when third-party applications, whether operated by people or integrated into processes (machines), use barcodes and/or QR codes for automated decisions without reliable reputation, attribution and provenance information.

Data quality can be critically important for IoT services involving physical changes triggered by the introduction of products: for example, allergic reactions (food preparation or delivery), environmental conditions (air quality, water quality), and chemical reactions (cooking, process control catalysts).
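
As a rough illustration of what is at stake, the Python sketch below shows an allergen check that declines to decide when provenance or data is missing, rather than defaulting to an answer. Everything named here is hypothetical: the `ProductRecord` shape, the `TRUSTED_SOURCES` labels and the exact-match comparison are assumptions for the example, not a real product database API, and a real allergen check would need far more than string matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for sources whose attribution the application accepts.
TRUSTED_SOURCES = {"manufacturer", "gs1"}


@dataclass
class ProductRecord:
    code: str                # the scanned barcode number
    source: Optional[str]    # who supplied the meta-data, if known
    fields: dict             # e.g. {"ingredients": ["Wheat", "Salt"]}


def allergen_status(record: Optional[ProductRecord], allergen: str) -> str:
    """Return 'listed', 'not_listed' or 'unknown'.

    'unknown' is the answer whenever the record is missing, its source is
    not trusted, or the ingredient list is absent or empty.
    """
    if record is None or record.source not in TRUSTED_SOURCES:
        return "unknown"
    ingredients = record.fields.get("ingredients")
    if not ingredients:
        return "unknown"
    listed = {item.strip().lower() for item in ingredients}
    return "listed" if allergen.strip().lower() in listed else "not_listed"
```

The design choice is that a downstream process must treat "unknown" as a reason to stop or to ask a person, never as permission to proceed.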

Some of the threats in the IoT associated with product data quality and proliferating third-party applications include:

  • Inappropriate summarization of product data to serve application-specific purposes, such as reducing memory consumption.
  • Application coding errors which parse product data incorrectly, and inadvertently change, truncate or omit information.
  • Applications designed to execute (complete the task) in the absence of sufficient information, rather than “fail safe”. (“Git ‘er done!”)
  • Applications that link product codes to falsified information for competitive or fraudulent purposes.

For instance, we know QR codes are being employed to propagate malware on the Internet, and many third-party barcode reader apps offer information that is clearly not from product manufacturers. So who is in charge of monitoring or mitigating harm caused by third-party applications leveraging the growing wealth of barcode- and QR-supplied meta-data in the IoT? No one.

Until the day the Internet is governed by unified international laws and regulations,  the reputation of data sources in the IoT will become a larger and larger issue.

Data quality is arguably about the reputation of the source, and reputation is a tough thing to judge without the perspective of massive intelligence and sampling, such as McAfee Global Threat Intelligence.

The conclusion of this post is that reputation and threat intelligence have a significant role in the IoT.

Postscript: where do you apply intelligence about data quality in the IoT?  There are options: at the device, at the gateway or in the network.   We will explore these in a follow-up posting.
