Metadata is data about data. It’s like a roadmap that gives you a bird’s-eye view of all your data, without needing to access the data directly.
Traditional infrastructure-based approaches are like planning a trip by first driving all available routes before deciding which is best. With a roadmap, the decision is simple, and you can be confident of selecting the best possible route.
Storage-centric solutions to data management simply cannot provide the intelligence about the data they store, nor were they designed to. Coalescing metadata provides an intelligent roadmap to data management and enables new insights without altering the underlying infrastructure.
Every digital file contains multiple types of metadata. Many systems manage the file-system metadata that describes a file’s basic attributes, such as its size, location, name and last-modified time.
Very few systems are able to classify and manage the richer, more descriptive metadata that enriches the roadmap and gives you more information to work with.
These richer metadata types include the geospatial metadata found in satellite imagery and the domain-specific metadata found in MRI files, genome sequences, medical records and so on.
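To make the basic layer concrete, here is a minimal sketch in Python that reads file-system metadata (name, location, size and last-modified time) using only the standard library, without opening the file’s contents. The function name `basic_file_metadata` and the dictionary keys are illustrative assumptions, not part of any product. Richer metadata, such as geospatial tags or medical imaging headers, needs format-specific readers and is not covered here.

```python
import os
import tempfile
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return basic file-system metadata for `path` without reading its contents.

    The function name and dictionary keys are illustrative assumptions.
    """
    info = os.stat(path)
    absolute = os.path.abspath(path)
    return {
        "name": os.path.basename(absolute),
        "location": os.path.dirname(absolute),
        "size_bytes": info.st_size,
        # st_mtime is seconds since the epoch; convert to an aware UTC datetime.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Create a small throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as handle:
        handle.write(b"example")
        sample_path = handle.name
    try:
        print(basic_file_metadata(sample_path))
    finally:
        os.remove(sample_path)
```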
So rather than creating a giant data lake of all physical data, why not create a virtual lake of metadata? Why not use the metadata roadmap to plot the journey?
The available metadata from many different data types and data locations can be made centrally searchable, allowing new patterns to be found and new discoveries to be made.
This approach can drive decisions about how to manage data, without needing to physically move it, or to alter the underlying storage infrastructure.
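The idea of a virtual metadata lake can be sketched in a few lines of Python. This is a toy in-memory catalogue under stated assumptions, not how any particular product implements it: the class names `MetadataRecord` and `MetadataCatalogue`, the site names, the paths and the attribute values are all made up for illustration. The point it shows is that only metadata is collected centrally, while each file stays where it is.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Metadata describing one file; the file itself is never copied."""
    site: str   # hypothetical name of the data store holding the file
    path: str   # hypothetical location of the file at that site
    attributes: dict = field(default_factory=dict)


class MetadataCatalogue:
    """A toy central index over metadata harvested from many sites."""

    def __init__(self):
        self._records = []

    def harvest(self, record):
        # Only the metadata record is stored centrally.
        self._records.append(record)

    def search(self, **criteria):
        # Return records whose attributes match every given key/value pair.
        return [
            record
            for record in self._records
            if all(record.attributes.get(k) == v for k, v in criteria.items())
        ]


# Made-up sample records from three hypothetical sites.
catalogue = MetadataCatalogue()
catalogue.harvest(MetadataRecord(
    "site-a", "/imagery/scene_001.tif", {"type": "satellite", "region": "coast"}))
catalogue.harvest(MetadataRecord(
    "site-b", "/scans/patient_17.dcm", {"type": "mri", "body_part": "knee"}))
catalogue.harvest(MetadataRecord(
    "site-c", "/imagery/scene_204.tif", {"type": "satellite", "region": "desert"}))

# One query spans all sites, yet no underlying file has been moved.
for match in catalogue.search(type="satellite"):
    print(match.site, match.path)
```

A production system would add persistence, indexing and access control, but the design choice is the same: queries run against the metadata index, and the results point back to where the data already lives.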
This approach provides the Research Data Storage Infrastructure (RDSI) project with the ability to manage 55 petabytes of nationally significant data across eight nodes (data stores), representing hundreds of data sets, from multiple higher education institutions across Australia.
Mediaflux, the powerful data management platform from Arcitecta, is the engine that leverages the power of metadata to enable seamless collaboration across these eight sites.
Although each location has its own data centres, with different storage environments and use cases, researchers can now search across all sites as though everything were consolidated into a single infrastructure.
Mediaflux harvests the metadata of all kinds of files, enabling rich queries and data mining across otherwise incompatible environments.
Whether in a small enterprise or a nationwide research network such as RDSI, Mediaflux empowers organisations to create a virtual metadata lake to get the advantages of Big Data methodologies without recreating their entire infrastructures – addressing the issue of data variety, not just volume and velocity.
By Jason Lohrey, CTO, Arcitecta & Floyd Christofferson, CMO, Arcitecta