Starting small with Big Data

‘Big Data’ sits high up in the premier league of buzzwords that can confuse discussions in today’s shipping industry, but it remains true that the amount of data harvested from ships continues to rise rapidly.

Of the 380,000 or so messages from 5,600 vessels that pass through GTMaritime’s filtering solutions on a typical day, for example, the proportion carrying data payloads now far exceeds the chatter between ships’ officers and engineers and their shore-based colleagues. This reflects a broader shift in how business is done in an increasingly digital world.

Proponents of digitalisation are prone to see Big Data as a silver bullet capable of solving multiple problems confronting ship owners, from optimising port arrivals and maximising vessel utilisation to predicting machinery failures and identifying inefficient fuel consumption. However, for all the fervour, it is a bullet at risk of missing the target completely.

Alluring as it is, ‘Big Data’ is also amorphous. Translating the promise of analytics and algorithms into outputs robust enough to work and keep on working in shipping’s harsh and unpredictable environment therefore demands a clear view of purpose and priorities.

Paradoxical as it seems, the best way forward is to start small. Rather than gathering data for its own sake, the endgame should be clear at the outset so that it is possible to work backwards to determine how to reach it.

At some point, the data beneficiary will have to decide whether to build an in-house solution or to work with third parties. While home-brew solutions can be tailored to meet precise needs, they can also be hard to maintain over the long term, especially if in-house developers move on. There can also be a temptation to bolt on new functionality over time, or to let quick fixes become permanent. Either can introduce security weaknesses into the code.

Despite it being a new field, there is a thriving digital start-up culture in the marine domain. If acceptable compromises can be found, turning to an external provider can be simpler, while also saving internal costs.

Another important decision concerns how data is hosted. Even if starting small, it pays to think ahead and lay the foundations for future evolution and development. In this respect, a cloud-based hosting solution offers greater long-term flexibility than a company setting up its own database servers. Modern hosting systems provide granular control over who can access and manipulate which data, as well as well-defined APIs, or software hooks, for easier integration and data retrieval with third parties.

A golden rule of data management is to store any piece of information only once, to minimise the risk of data conflicts or inconsistencies and to ensure transparency. A dedicated hosting platform will serve as a single point of truth, able to share consistent data across multiple applications as requirements grow or change. Moreover, a dedicated hosting platform will typically have stronger safeguards against attempted data theft or attack by cyber-criminals. In short, it will be harder to hack.
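
To make the 'store it once' rule concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `VesselStore` class, the field names and the two reporting functions are hypothetical and do not describe any particular hosting platform. The point is only that both reports read from the same record, so a single update reaches every application that depends on it.

```python
from dataclasses import dataclass


@dataclass
class VesselRecord:
    imo_number: str      # hypothetical unique key for a vessel
    name: str
    speed_knots: float   # latest reported speed


class VesselStore:
    """Holds each vessel record exactly once, keyed by IMO number."""

    def __init__(self) -> None:
        self._records: dict[str, VesselRecord] = {}

    def upsert(self, record: VesselRecord) -> None:
        # A new record for the same key replaces the old one, so there is
        # never a second, possibly stale, copy to reconcile.
        self._records[record.imo_number] = record

    def get(self, imo_number: str) -> VesselRecord:
        return self._records[imo_number]


def arrival_summary(store: VesselStore, imo_number: str, distance_nm: float) -> str:
    """One 'application': estimate hours to port from the shared record."""
    vessel = store.get(imo_number)
    hours = distance_nm / vessel.speed_knots
    return f"{vessel.name}: about {hours:.1f} h to port"


def speed_summary(store: VesselStore, imo_number: str) -> str:
    """A second 'application' reading the very same record."""
    vessel = store.get(imo_number)
    return f"{vessel.name}: {vessel.speed_knots:.1f} kn"
```

If the speed is updated once through `upsert`, both summaries reflect the new value on their next call. Had each report kept its own copy of the speed, the two could quietly drift apart.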

All of this brings us to data itself which, in many cases, continues to be wasted, lost or scattered across different parts of the business. Even if initial plans to employ analytic tools are relatively modest, it should be clear that the data supporting the analytics represents a valuable asset which, regardless of its source or whether it’s needed immediately, merits care. Irrespective of individual points of view on data-crunching today, the analytical approach to ship operation is here to stay. It is also a general rule of machine learning that the more historical or training data is available, the greater its potential.

A recurring question during the initial development of Big Data projects concerns data dirtiness, or cleanliness. A common refrain among software developers is that garbage in results in garbage out. While there is truth in this axiom, seeking data perfection is both futile and unnecessary. Instead, data needs to be fit for purpose and, as in so many things, prevention is better than cure. Stopping data getting ‘dirty’ at source is easier than attempting to fix it afterward. Correctly calibrating sensors and fitting them consistently across the fleet provide a sound basis for comparing like with like, for example.
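
As a rough illustration of catching dirty data at source, the Python sketch below checks each sensor reading against a plausible range before it is accepted. The sensor names and limits in `PLAUSIBLE_RANGES` are invented for the example; real limits would come from the equipment specification. It aims for fit for purpose rather than perfect: readings are only rejected when they are clearly unusable.

```python
import math
from dataclasses import dataclass

# Hypothetical sensors and limits, chosen only to illustrate the idea.
PLAUSIBLE_RANGES = {
    "fuel_flow_kg_per_h": (0.0, 10_000.0),
    "shaft_rpm": (0.0, 200.0),
}


@dataclass
class Reading:
    sensor: str
    value: float


def check_reading(reading: Reading) -> list[str]:
    """Return a list of problems; an empty list means the reading is usable."""
    limits = PLAUSIBLE_RANGES.get(reading.sensor)
    if limits is None:
        return [f"unknown sensor '{reading.sensor}'"]
    if math.isnan(reading.value):
        return ["value is not a number"]
    low, high = limits
    if not low <= reading.value <= high:
        return [f"value {reading.value} outside {low}..{high}"]
    return []


def split_readings(readings: list[Reading]) -> tuple[list[Reading], list[Reading]]:
    """Separate usable readings from rejected ones before anything is stored."""
    accepted, rejected = [], []
    for reading in readings:
        if check_reading(reading):
            rejected.append(reading)
        else:
            accepted.append(reading)
    return accepted, rejected
```

Running the check where the data is collected, rather than in the analytics layer, means a faulty or mislabelled sensor is flagged when it fails instead of months later when a report looks odd.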

In short, Big Data holds enormous potential for shipping and the whole maritime supply chain, but unlocking its benefits depends on setting realistic goals and preparing well. This pragmatism provides the basis for GTMaritime’s approach to ensuring that its clients sail safely into the digital era, and for its new product and service development.