But getting that metadata is not so simple. While there are products that claim to address this need, they have taken a very academic approach. Indeed, the concept of “information about information” raises the question of what “information” is in the first place. So these metadata products have focused on defining concepts and how the concepts relate to each other. While these concepts eventually connect with actual information, the connections are tenuous. This metadata must be manually entered by humans and is therefore subjective, incomplete, subject to human error, and inevitably obsolete, since it trails the real systems, which are changing constantly.
Ab Initio has taken a very different approach by focusing on operational metadata. Operational metadata is actionable by business and IT management. It is about the systems that process data, the applications in those systems, and the rules in those applications. It is about the datasets throughout the enterprise, what is in them, how they got there, and who uses them. It is about the quality of the data, and how that quality has changed over time. It is about all the many things in all the IT systems.
Ab Initio also ties this operational metadata together with business metadata – business definitions, created by business people, of the various pieces of information throughout the enterprise. The result is a true enterprise metadata management system – the Ab Initio® Enterprise Meta>Environment®, or EME®.
An enterprise metadata management system must be many things to many people:
There is no end to these kinds of questions. Getting useful answers quickly is essential. These are questions that can be answered with the Ab Initio Enterprise Meta>Environment.
The term “metadata” has different meanings across industries. Ab Initio uses the term in the context of the business computing world. In the image processing world, for example, it means something altogether different: information such as when an image was captured, what kind of device took the picture, what the lighting was, and so on. Web pages have metadata too: the language the page was written in, the tools used to create it, and where to find more information on the topic.
Ab Initio’s metadata graphical user interface, the EME Metadata Portal™, allows one to start at any point in the system and explore in any direction one chooses. All this metadata is presented at the appropriate level of detail for each audience. Business users are not overwhelmed with technical minutiae when they are trying to answer business questions, while developers and operational staff can easily find the details of interest to them.
Consider a file that the EME has identified as the ultimate source for a calculation used in a report. What can the EME tell you, a user, about this file? Through Ab Initio’s approach of relating elements of metadata, one to the other, you can glean interesting and important information about the file from the intuitive graphical interface, including:
Below is a screen shot of the EME in the process of navigating metadata. The underlying screen is a lineage diagram that displays a number of datasets and their processing relationships. Each overlay shows a different type of metadata, and all of them are linked to the same metadata element.
Capturing so much metadata and storing it in separate buckets would be an accomplishment in and of itself, but the EME does more than that. It establishes relationships between elements of metadata, which effectively enriches their value, revealing deeper meaning about the business to the real-world users of metadata at a company.
The challenge, of course, is how to gather all this metadata in a way that is actually useful. In large, complex organizations with heterogeneous, distributed (even global) environments, this challenge is particularly hard. There are issues of scalability and integration. How to gather metadata from such a disparate set of sources and technologies? How to process so much information? How to store it and display it intelligently, without overwhelming the user or dumbing down the content? How to marry metadata across lines of business, countries, even languages?
The EME integrates all the different kinds of metadata stored in it and, as a result, multiplies the value of each. For example, this integration enables end-to-end data lineage across technologies, consolidated operational statistics for comprehensive capacity planning, and fully linked data profile statistics and data quality metrics.
To begin with, all information about the definition and execution of Ab Initio applications is automatically captured and loaded into the EME. This includes business rules, data structures, application structure, documentation, and run-time statistics. Because users build end-to-end operational applications with the Co>Operating System®, everything about those applications is automatically captured.
This metadata is then integrated with external metadata through a combination of the EME’s Metadata Importer and sophisticated metadata processing with the Co>Operating System.
Ab Initio’s support for combining metadata from multiple sources allows metadata from one source system to be enriched with metadata from other sources. For example, the Metadata Importer might load the core details of database tables and columns from a database catalog, then enrich the metadata with descriptions and logical links from a modeling tool, and finally link the imported metadata to data quality metrics. The Metadata Importer can load external metadata such as:
Non-standard and custom metadata sources can also be imported and integrated into the EME. Users can apply the Co>Operating System’s powerful data processing capabilities to arbitrarily complex sources of metadata. The Co>Operating System can extract metadata from these non-standard systems, process it as necessary, and load and integrate it with other metadata in the EME.
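The enrichment sequence described above (core details from a database catalog, then descriptions from a modeling tool, then data quality metrics) can be pictured as layering attributes onto records that share a key. The following is only a minimal sketch of that idea in plain Python. The table names, attribute names, and the `enrich` function are hypothetical and are not part of the Metadata Importer or any Ab Initio API.

```python
# Hypothetical sample records keyed by (table, column); not taken from any real system.
catalog = {
    ("CUSTOMER", "CUST_ID"): {"type": "NUMBER(10)", "nullable": False},
    ("CUSTOMER", "DOB"): {"type": "DATE", "nullable": True},
}
model_descriptions = {
    ("CUSTOMER", "CUST_ID"): {
        "description": "Unique customer identifier",
        "logical_name": "Customer Id",
    },
}
quality_metrics = {
    ("CUSTOMER", "DOB"): {"null_rate": 0.12},
}


def enrich(core, *layers):
    """Return a copy of core in which each later layer adds attributes to matching keys.

    Keys that appear only in an enrichment layer are ignored, so the core
    source (here, the catalog) stays the authority on which columns exist.
    """
    merged = {key: dict(attrs) for key, attrs in core.items()}
    for layer in layers:
        for key, extra in layer.items():
            if key in merged:
                merged[key].update(extra)
    return merged


columns = enrich(catalog, model_descriptions, quality_metrics)
```

Ignoring keys that exist only in an enrichment layer is a design assumption of this sketch; a real integration might instead report them as unmatched.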
The EME integrates a very wide range of metadata and is fully extensible. From the home page of the Metadata Portal, the user can navigate directly to the type of metadata of interest:
From this page you can select an area of interest and dive in to see:
Metadata about projects and applications. The EME stores and manages all information about Ab Initio projects and the applications they contain. Projects are organized in hierarchies and can be shared or kept private. The EME keeps track of which projects reference other projects, as well as tracking all objects within a project.
Details about application versions. The EME maintains complete version information and history about every detail of Ab Initio applications. Differences between versions of graphs, record formats, and transform rules are displayed graphically. Users can see details about the exact versions that are being used in production.
Users, groups, locks, and permissions. The EME provides access control management for all metadata. Furthermore, as part of a complete source code management system, the EME’s exclusive locking mechanism for whole applications or pieces of applications prevents developers from interfering with each other.
Hierarchical organization of metadata. Metadata can be organized into arbitrary hierarchies and folders to help capture business meaning and to provide targeted navigation.
Data dictionaries. The EME supports the creation of one or more data dictionaries or conceptual data models. Data dictionaries can be a simple hierarchical list of business terms, or a more complex semantic model with complex relationships between business terms.
Enterprise-wide deployments typically have multiple data dictionaries – one for each division or product area, as well as an enterprise model. In the EME, divisional business terms link directly to columns and fields, and have relationships back into the enterprise model. This allows companies to harmonize business concepts across the enterprise without forcing each division to abandon its own data dictionary.
Metadata from reporting tools. The EME imports metadata from all the major business intelligence (BI) reporting tools, including MicroStrategy, Business Objects, and Cognos. This includes details about reports and report fields, as well as internal reporting objects such as Facts, Metrics, Attributes, and Aggregates. Lineage queries can trace the calculations of various report fields back through the BI tools into the data mart or data warehouse, and from there all the way back to the ultimate sources.
Metadata from database systems. The EME imports metadata (schemas, tables, columns, views, keys, indices, and stored procedures) from many database systems. The EME performs lineage analysis through multiple levels of views and stored procedures. For large database systems, the EME is often the only way to understand the interrelationship of database tables, views, and procedures – especially for impact analysis queries, table reuse exercises, and consolidation projects.
Metadata from files. The EME imports metadata about files, including complex hierarchical record formats such as XML and COBOL copybooks.
End-to-end data lineage. The EME builds complete models of the flow of data through an enterprise by harvesting metadata from a large number of different operational systems, reporting tools, database systems, ETL products, SQL scripts, etc. This integrated model allows users to query the system about data lineage – how data was computed, and what is impacted by a change.
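The two lineage questions named above, how data was computed and what is impacted by a change, correspond to walking a directed graph of data flows in opposite directions. The sketch below illustrates that idea only. The dataset names and the functions `lineage` and `impact` are hypothetical, and nothing here reflects how the EME stores or queries its model.

```python
from collections import defaultdict, deque

# Hypothetical flow edges: (source, target) means "target is computed from source".
edges = [
    ("orders.csv", "stg_orders"),
    ("customers_db.customer", "stg_customers"),
    ("stg_orders", "dw.fact_sales"),
    ("stg_customers", "dw.fact_sales"),
    ("dw.fact_sales", "report.revenue"),
]

downstream = defaultdict(set)
upstream = defaultdict(set)
for src, dst in edges:
    downstream[src].add(dst)
    upstream[dst].add(src)


def reachable(start, neighbors):
    """Breadth-first search; returns every other node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def lineage(element):
    """Everything the element was computed from, back to the ultimate sources."""
    return reachable(element, upstream)


def impact(element):
    """Everything that could be affected if the element changes."""
    return reachable(element, downstream)
```

For example, `lineage("report.revenue")` returns all five upstream datasets, while `impact("orders.csv")` returns only the staging table, the fact table, and the report.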
System diagrams. The EME stores graphical pictures that can represent system diagrams or other diagrams of metadata organization. In the Metadata Portal, clicking on a “hot-linked” graphical item within a diagram navigates the user to the connected metadata object.
Logical models. The EME imports logical and physical models from common modeling tools. It models links from logical models to physical models, which are then merged with the schema information in the actual databases.
Domains and reference data. The EME stores reference data, including domains and reference code values. It can be the primary manager for certain reference data, or can track and maintain a copy of reference data from a different system. It also supports code mappings between logical domain values and multiple physical encodings.
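A code mapping between a logical domain value and multiple physical encodings can be thought of as a lookup table per system, with the logical value as the common pivot. This is a minimal sketch under that assumption. The domain, the system names, and the codes are invented for illustration.

```python
# Hypothetical domain: each logical value has a different physical code per system.
GENDER_ENCODINGS = {
    "crm": {"MALE": "M", "FEMALE": "F", "UNKNOWN": "U"},
    "billing": {"MALE": "1", "FEMALE": "2", "UNKNOWN": "9"},
}


def to_logical(system, code):
    """Translate a physical code from one system into the logical domain value."""
    for logical, physical in GENDER_ENCODINGS[system].items():
        if physical == code:
            return logical
    raise KeyError(f"{code!r} is not a known code in system {system!r}")


def translate(code, source, target):
    """Convert a physical code between two systems by way of the logical value."""
    return GENDER_ENCODINGS[target][to_logical(source, code)]
```

Routing every translation through the logical value means each system needs only one mapping, rather than one mapping per pair of systems.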
Data profiles. The EME stores data profile results and links them with datasets and individual fields. Many statistics are computed, such as common values and data distributions. These statistics can be computed on demand or automatically as part of an Ab Initio application.
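To make "common values and data distributions" concrete, here is a small sketch of the kind of per-field statistics a profile might contain. The `profile` function and its output keys are assumptions made for illustration and do not describe the statistics Ab Initio actually computes.

```python
from collections import Counter


def profile(values, top_n=3):
    """Compute a few basic profile statistics for one field."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "count": len(values),                      # total records seen
        "nulls": len(values) - len(present),       # records with no value
        "distinct": len(counts),                   # number of different values
        "common_values": counts.most_common(top_n),  # (value, frequency) pairs
    }


stats = profile(["NY", "CA", "NY", None, "TX", "NY", "CA"])
```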
Operational statistics. The Co>Operating System produces runtime statistics for every job and for every dataset that is read or written. These statistics can be stored in the EME for trend analysis, capacity planning, and general operational queries.
Data quality metrics. To support a comprehensive data quality program, Ab Initio computes data quality statistics and error aggregates and stores them in the EME. The EME can analyze and display data quality metrics for datasets and for collections of datasets. Data quality metrics can also be combined with data lineage to see a “heat map” showing where there are data quality problems in the enterprise.
Pre-development specifications. The EME can capture mapping specifications as part of the development process. The Metadata Portal allows analysts to specify existing or proposed sources and targets along with arbitrary mapping expressions. By using the EME for defining mappings, users can see how the mappings fit into a larger enterprise lineage picture.
These specifications can then be used to guide a development team and to permanently record requirements. After production deployment, the EME will continue to show these specifications in lineage diagrams alongside their actual implementations.
Data masking rules. The EME stores data masking rules, which can then be applied to data flowing through Ab Initio applications. Ab Initio provides many built-in rules, and users can define their own custom masking algorithms. These rules can be associated with fields or columns, or with business terms in the conceptual model. When linked at the conceptual level, data masking rules are automatically applied to the corresponding physical columns and fields.
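The inheritance described above, where a rule attached to a business term reaches every physical field linked to that term, can be sketched as a two-step lookup. Everything in this example is hypothetical: the masking functions are not Ab Initio's built-in rules, the field and term names are invented, and the choice to let a rule set directly on a field take precedence over an inherited one is an assumption.

```python
import hashlib


# Hypothetical masking functions.
def redact(value):
    return "*" * len(value)


def hash_value(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Rules attached to business terms (the conceptual level).
term_rules = {"Social Security Number": redact, "Email Address": hash_value}

# Links from physical fields to business terms.
field_terms = {
    "crm.customer.ssn": "Social Security Number",
    "billing.acct.tax_id": "Social Security Number",
    "crm.customer.email": "Email Address",
}

# Rules attached directly to a physical field (none in this example).
field_rules = {}


def rule_for(field):
    """Return the field's own rule if it has one, else the rule of its linked term."""
    if field in field_rules:
        return field_rules[field]
    return term_rules.get(field_terms.get(field))


def mask_record(dataset, record):
    """Mask every field of record that has a rule; leave the rest unchanged."""
    masked = {}
    for name, value in record.items():
        rule = rule_for(f"{dataset}.{name}")
        masked[name] = rule(value) if rule else value
    return masked
```

Because both `crm.customer.ssn` and `billing.acct.tax_id` link to the same term, a single rule on "Social Security Number" covers both without being repeated per field.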
Data stewards and metadata about people and groups. The EME stores metadata about people and groups. This metadata can be linked to other metadata objects to document data governance roles such as data stewardship. Metadata about people and groups can be automatically imported from external systems such as corporate LDAP servers.
Built-in and custom metadata reports. The EME provides many built-in reports. Users can also define custom reports that run against metadata stored in the EME and that are accessible from the Metadata Portal.
Custom metadata. Users can extend the EME schema to allow a wide variety of additional metadata to be integrated into the EME. Schema extensions include the addition of attributes to existing objects, as well as the creation of new metadata objects that can be linked to other existing metadata. Users can easily customize the EME user interface to allow for tabular and graphical views on both standard and custom metadata.
The EME is an open system based on industry-standard technologies:
The EME provides sophisticated governance processes that can be customized to meet the needs of large enterprises.
For technical metadata (applications and business rules), the EME supports a complete source code management system, with checkin/checkout, locking, versioning, branching, and differencing.
For business and operational metadata, the EME comes with a built-in metadata governance workflow, including work queues, approvals, and audit trails. The EME can also interface with external approval workflow tools. The EME’s proposal/approval workflow mechanism is based on changesets. Users create changesets to propose metadata additions, updates, and/or deletions, and then submit them for approval.
Below is a screen shot of the changeset submission process:
When a user submits a changeset for approval, the EME sends an email message to the appropriate metadata stewards. These stewards can inspect the proposed changes and approve or reject them. If approved, the changeset is applied and becomes visible to the general user population.
The EME also supports integration of changesets through its web services API and with external workflow approval/BPM systems, such as Oracle’s AquaLogic. In this case the external workflow system is responsible for communicating items in work queues, documenting communications, managing escalations, and resolving final status.
All approved changesets result in new versions of metadata in the EME. The EME maintains a complete history of all previous versions and their details.
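The proposal and approval cycle described in this section amounts to a small state machine: a changeset moves from draft to submitted, a steward approves or rejects it, and only approval produces a new version while history is retained. The sketch below models just that behavior. The class names, status strings, and the convention of using `None` to mean deletion are all assumptions, and email notification is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    author: str
    changes: dict            # key -> new value, or None to delete the key
    status: str = "draft"    # draft -> submitted -> approved | rejected


@dataclass
class Repository:
    # Full history, oldest first; the last entry is the visible version.
    versions: list = field(default_factory=lambda: [{}])
    audit: list = field(default_factory=list)

    @property
    def current(self):
        return self.versions[-1]

    def submit(self, cs):
        if cs.status != "draft":
            raise ValueError("only a draft changeset can be submitted")
        cs.status = "submitted"
        self.audit.append(("submitted", cs.author))

    def review(self, cs, steward, approve):
        if cs.status != "submitted":
            raise ValueError("only a submitted changeset can be reviewed")
        cs.status = "approved" if approve else "rejected"
        self.audit.append((cs.status, steward))
        if approve:
            # Apply the changes to a copy so earlier versions stay intact.
            new = dict(self.current)
            for key, value in cs.changes.items():
                if value is None:
                    new.pop(key, None)
                else:
                    new[key] = value
            self.versions.append(new)
```

A submitted changeset leaves `current` untouched until a steward approves it, and a rejected changeset adds an audit entry but no new version.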
Enterprise metadata management was long a goal of large companies, but an unattainable, impractical one. Passive “repositories” (in many cases simply glorified data dictionaries) held only a fraction of the relevant metadata and soon became stale, out-of-date “islands” of metadata. The organizations that most needed a comprehensive approach to managing metadata – complex, global companies with inherent problems of scalability, with diverse metadata sources, with security issues that cross lines of business, and with huge amounts of information to display and navigate – were the least likely to succeed.
But Ab Initio's Enterprise Meta>Environment has finally made enterprise metadata management possible, even in the largest of companies. Some examples:
The Ab Initio EME didn’t happen overnight, and it didn’t come from an ivory tower: it’s the result of years of serious engagement with companies like these.