Data Intelligence

What is Data Discovery?

Actian Corporation

July 3, 2020


In this age where data is all around us, organizations have increasingly been investing in data management strategies to create value and gain a competitive advantage. However, according to a 2018 study by Gemalto, 65% of organizations can't analyze or categorize all the consumer data they store.

It is therefore crucial for enterprises to look for solutions that unlock the value of their data, from metrics to insights to information, by facilitating the data discovery journey.

Data Discovery Definition

Data discovery problems are everywhere in the enterprise, whether it’s in the IT, Business Intelligence, or Innovation department. By integrating data discovery solutions, enterprises provide data access to all employees, enabling Data teams and Business analysts to understand and thus collaborate on data-related topics.

It is also very useful for enterprises seeking better compliance management. It allows organizations to know what data is personal or sensitive and where it can be found. In addition, data discovery can bolster innovation, as it unlocks essential information for satisfying customers and gaining a competitive advantage.

From Manual to Smart Data Discovery

For two decades, before advanced machine learning techniques were available, data specialists mapped their data using human brainpower alone. They thought through what data they had, where it was stored, and what needed to be delivered to the end customer. Data Stewards usually took care of the data asset documentation, rules, and standards that guided the data discovery process. In these manual approaches, usually carried out in Excel sheets, people conceptualized and drew out maps to comprehend their data.

Nowadays, with the advancement of technology, the definition of data discovery includes automated ways of presenting data. Smart Data Discovery represents a new wave of data technologies that use augmented analytics, Machine Learning and Artificial Intelligence. It not only prepares, conceptualizes and integrates data, but also presents it through intelligent dashboards to reveal hidden patterns and business insights.

The Benefits of Data Discovery

Enterprise data moves from one location to another at the speed of light and is stored in various data sources and storage applications. Employees and partners access this data from anywhere, at any time, so identifying, locating, and classifying your data in order to protect it and gain insights from it should be the priority!

The benefits of data discovery include:

  • A better understanding of enterprise data, where it is, who can access it and where, and how it will be transmitted.
  • Automatic data classification based on context.
  • Risk management and regulatory compliance.
  • Complete data visibility.
  • Identification, classification, and tracking of sensitive data.
  • The ability to apply protective controls to data in real time based on predefined policies and contextual factors.

Data discovery enables enterprises to adequately assess the full data picture.

On one hand, it helps implement the appropriate security measures to prevent the loss of sensitive data and avoid devastating financial and reputational consequences for the enterprise. On the other, it enables teams to dig deeper into the data to identify the specific items that reveal answers and to find effective ways to present them. It's a win-win situation.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Management

SQLite: Not Faster, Not Better, But Cheaper?

Actian Corporation

July 2, 2020


Understanding SQLite’s Total Cost of Ownership (TCO)

Over the past three months, this blog series has explored why developers gravitated toward SQLite for embedded data management. Some developers chose SQLite because members of the extended team knew SQL and wanted to leverage that knowledge to support data management or the extraction of data for visualization and reporting. Most developers, though, adopted it to overcome the limitations of existing flat file management systems.

That all makes sense in hindsight. The adoption of new products and technologies very often turns on the answer to a simple question: is this replacement an improvement over what I’ve got now? Even more categorically, is the new thing faster, better, or cheaper? An ideal replacement would be faster, better, and cheaper but that’s a trifecta that usually eludes us. Rarely does a proposed change take place if at least one of these characteristics is not present, though. So what prompted the adoption of SQLite? Was it faster, better, or cheaper than the available alternatives? And if it was any of these things, is it still faster, better, or cheaper than the database alternatives that are available today?

Faster?

Once, yes, SQLite was faster—compared to operations involving a flat file. Today? Hardly.

SQLite is positively lethargic on several fronts. In a head-to-head comparison of SQLite and Actian Zen Core, data access via SQL may be comparable, but accessing the same data through Actian Zen's NoSQL API delivers an order-of-magnitude performance boost over SQLite. Or consider speed in terms of the optimized client-server interactions demanded by applications in the realm of modern edge data management. Client-server interactions in IoT and mobile scenarios depend on high-performance data collection and the inline processing of transactions from multiple external channels. But because SQLite operates exclusively in a serverless mode, the data must be transformed (the “T” in ETL) before it can move to or from any server-based companion, such as Microsoft SQL Server. That step not only incurs a measurable performance hit, but it also creates a potential chokepoint that can constrain application scalability. Add into the mix a requirement for data encryption and decryption as part of that client-server transformation—and is there really a question about whether encryption will be required in any modern edge data management scenario?—and you can see the speedometer on the SQLite dashboard slipping further back towards zero.
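To make that extra hop concrete, here is a minimal sketch using Python's standard sqlite3 module of the transform step a serverless SQLite deployment forces on you. The file name, schema, and load_to_server stand-in are invented for the example; the point is that every sync with a server-based store becomes application code you own.

```python
import sqlite3

# Extract: pull unsynced rows out of the local, serverless SQLite file.
src = sqlite3.connect("sensor_cache.db")
rows = src.execute(
    "SELECT device_id, reading, recorded_at FROM readings WHERE synced = 0"
).fetchall()
src.close()

# Transform: reshape each row to match the server-side schema before shipping.
# Type coercion, field renaming, and encryption would also live here.
payload = [
    {"deviceId": device_id, "value": float(reading), "ts": recorded_at}
    for device_id, reading, recorded_at in rows
]

# Load: hand the batch to whatever server-based store you actually run
# (pyodbc, a REST endpoint, etc.). SQLite itself cannot talk to the server.
def load_to_server(batch):
    raise NotImplementedError("stand-in for your real server client")

load_to_server(payload)
```

None of this code exists in a true client-server setup, where the database engine itself mediates the exchange.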

SQLite was a speed demon in its day—but so was the Intel 80x86 architecture. Need I say more?

Better?

Well, unless you’re still interacting exclusively with the underlying file system, the answer is another easy “no.” We examined the limitations of the SQLite serverless architecture extensively in installments 5, 6, and 7 of this series. While the architecture was a breakthrough at the time, it was also a breakthrough for its time. It met the then-emerging need for a simple mobile and web application data cache. But that’s not today’s need. Today’s mobile and IoT scenarios require an architecture designed for high speed, multi-channel, multi-process, multi-threaded applications. While some early IoT applications were built on an assumption that the vast majority of data would be sent to the cloud for processing and analytics—a scenario in which SQLite seemed viable as a local data cache—it has become apparent that the underlying assumption itself was flawed. With the emergence of a modern edge data management topology in which analysis and transaction processing can take place at the edge rather than deep in the cloud, an optimized client-server architecture designed for streamlined performance along the entire continuum of device-to-edge-to-cloud/data center redefines the concept of “better.”

As with faster, SQLite was once better than other database alternatives when it came to single-user mobile applications and local data caching. But serverless architectures aren’t meant to address the tasks of our time. Ours is a multi-verse, with multi-machine-to-multi-machine and multi-human-to-machine interactions and transactions occurring all the time. That world demands more than SQLite can deliver.

Cheaper?

Okay, SQLite gets that one. It's open source, and it's free. You can't get cheaper than that. For do-it-yourselfers who eye the cost of externally produced and purchased software with hawk-like vigilance, SQLite may still exert a pull. The same goes for business decision-makers when they hear that SQLite is free: that could mean more left in the budget for other line items in the BOM, or to pay for additional service hours.

But “free,” here, is as misleading as “free puppies.” If you only look at the upfront cost of SQLite, you can’t beat it. But you’re decoupling that assessment from any consideration of the internal cost of that decision. If you factor in the costs of software design, development, testing, updates, ongoing support, and so forth—all of which, as we have previously discussed, involve a significant amount of hoop-jumping, given the inherent limitations of the architecture—then the cost calculation changes dramatically.

We could devote an entire blog just to DIY cost estimates, so we're not going to dive deep here. Anyone who still has COBOL assets or other legacy tools and systems understands completely how difficult it can be to maintain and support code that was originally designed to meet the challenges of an earlier era. If cheaper remains the prime mover for you, and if you're determined to do the work to extend the capabilities of SQLite to meet today's needs, there are various tools you can use to model the cost burden per line of code that this effort will incur. The models vary by the size of the code body, regulatory guidance, projected lifecycle, and many other factors, but they may be able to help you assess in advance the true cost of this folly—sorry—I mean effort.
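As a back-of-the-envelope illustration of how such per-line models work, here is a toy calculation in Python. Every figure below is an assumption invented for the example, not an industry benchmark; plug in your own code size, change rate, and loaded cost.

```python
# A deliberately simplified sketch of a per-line maintenance cost model.
# All figures are illustrative assumptions, not benchmarks.
workaround_loc = 12_000        # custom code written around SQLite's limits
annual_change_rate = 0.15      # fraction of that code touched each year
cost_per_changed_loc = 25.0    # fully loaded dollars per modified line
lifecycle_years = 7

maintenance = (workaround_loc * annual_change_rate
               * cost_per_changed_loc * lifecycle_years)
print(f"Estimated lifecycle maintenance burden: ${maintenance:,.0f}")
# -> Estimated lifecycle maintenance burden: $315,000
```

Even with modest assumptions, the "free" database accrues a six-figure tail; real models add regulatory and testing multipliers on top.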

Of course, you may be smiling smugly and thinking that, no, you're not actually going to do it yourself. There's an entire industry of boutique developers that specialize in SQLite tools and add-on components, including SQL query editors, utilities for encrypting data at rest and in transit, transformation tools for synchronizing data with Microsoft SQL Server, and much more. But this approach only introduces a different dimension of cost to your undertaking. Not only do these add-ons effectively nullify the “free” aspect of SQLite (since they're not free), but reliance upon these small vendors introduces an element of risk over which you have no control. Any bugs or shortcomings in their code become an inherent part of your application. If the boutique developer disappears—and the majority of them have historically disappeared in fairly short order—then you're suddenly back to the DIY model you thought you were avoiding. This time, though, you're having to DIY without a full understanding of the code you've incorporated, which often means suboptimal patching and extending of code that you probably would have designed differently from the ground up if it were yours.

Oh, and between the lines above—no pun intended—you can clearly see that you still need to DIY the integration of these boutique add-ons into the solution you’re developing. Do we need to puncture your bubble still further by noting that the burden of troubleshooting any issues or conflicts arising from the incorporation of these add-ons also falls to you? You won’t necessarily have the insight required to resolve these issues easily, but you’ll ultimately be responsible for the solution you’ve delivered and yours will be the throat they reach for when users are unhappy.

Time to Retire That Number

Ultimately, SQLite is not faster, not better, and not cheaper. Not anymore. We’ll give SQLite its due: It was a brilliant addition to the tech team in its youth, but it is time to hoist that jersey to the rafters and retire the number. If it’s not faster, better, or cheaper, why would you still adopt it? Given the demands of modern edge data management, faster, better, and cheaper all point to Actian Zen.

Data Intelligence

Data Governance Tool: Lean Data Governance Canvas

Actian Corporation

June 30, 2020


Inspired by Ash Maurya's “Lean Canvas” business model, the Actian Data Intelligence Platform's Lean Data Governance Canvas is intended for Data Managers whose mission is to clarify and orchestrate data governance within their organizations. From a methodological point of view, the Lean Data Governance Canvas is composed of two main parts:

  • The elements on the left represent those of a governance system.
  • The elements on the right represent those inherent to an organization.

It's important to know that the Lean Data Governance Canvas is a toolkit for implementing data governance. Participants will have to iterate on the canvas over time so that it rests on as few assumptions as possible.

However, be aware: there should not be one unique canvas representing the entire enterprise; rather, there should be separate canvases for the strategic and operational levels.

The insights highlighted in the canvas must be consistent with the company's strategic objectives.

Method: Lean Data Governance Canvas

0 – Strategic Objectives

Before beginning your journey with the Lean Data Governance Canvas, it is important to highlight the enterprise’s strategic expectations and ask yourself:

What are the enterprise's and the board's strategic objectives? How do they apply to the data and IT departments?

1 & 2 – Segment Data Citizens & Problems

Start by thinking about a type of persona. After this, you can take the time to come up with up to 3 challenges that this group faces:

Who are the data citizens you wish to address?
What are the top 3 problems/risks that data governance seeks to solve for the defined data citizen segment?

Your data citizens are either the ones in charge of your data governance (Data Owners, Data Managers, IT Custodians, etc.) or the producers/consumers of data (Management, Supply Chain, CRM, Data Science, Marketing, etc.). Your risks may concern one or more of these personas.

3 – Regulatory Compliance

Digital transformation brings more regulatory compliance requirements (the GDPR, for example). To keep your constraints in mind, write down your regulatory requirements and ask yourself this question:

What are the risks stemming from regulatory (including supervisory) requirements?

4 – Value Proposition

This part of the LDGC focuses on the value that data governance will bring to the segmented data citizens.

Why should data governance be implemented for the defined data citizen segments?

The value proposition must be unique, coherent, and engaging for the data citizens concerned. Communication or marketing support can sometimes be a valuable aid in formalizing a value proposition; do not hesitate to reach out to the relevant internal teams.

5 – Solutions

In this section, define the means and principles that will make it possible to overcome the problems of your data citizen segments and move toward the value proposition. Without going into too much detail:

What are the 3 main principles that will address the data citizen segments' problems?

In this canvas, a solution should not take into account what already exists, nor is it determined by time or budget. The canvas is not a schedule; it describes an upcoming project that should be treated as an MVP (minimum viable product) for a first milestone.

6.1 – Targeted Metrics

These indicators define the performance of the established data governance in the data citizen segment. They will measure the resolution of the problem and the value of your governance rules.

What key indicators should be measured to validate the progress of the sought-out value proposition?

6.2 – Connectivity Metrics

These metrics are indicators that define the performance of the implemented data governance on the sources of information that you previously listed.

What key indicators should be measured to validate the performance of the data governance rules on a source?

7 – Data Sources

What are the “absolutely necessary” data sources that will bring the most value at the start for your defined data citizen segments?

Data sources are valuable assets for data-centric teams. The goal, therefore, is to find that value. Mass production and exhaustiveness induce an immediate complexity that cannot be easily controlled. The choice must be made based on the data's value according to its business uses.

8 – Technological Needs

Identify the technological needs that must be acquired to measure governance metrics and/or achieve the value proposition.

What are the technologies and tools needed to measure the associated metrics?

9 – People Needs

Identify the skills and resources needed to bring data governance to life, and to animate and measure it within the targeted data citizen segment.

Who are the people concerned, and what contributions and interactions are necessary to strive for the value proposition and maintain it?

The Evolution of the Lean Data Governance Canvas Over Time

After focusing on these first steps, it is important to test the canvas! We encourage Lean Data Governance Canvas users to rework the canvas as much as possible, through iteration, and to test it; after enough passes, a winning data governance model should appear. Despite the difficulty of these workshops, we are convinced that this work will save you time, energy, and money. Think about it: with the Lean Data Governance Canvas, it is possible to build something everyone in the enterprise wants and respects.

Data Integration

Connected Data in the Cloud Increases AI System Performance

Actian Corporation

June 29, 2020


Artificial Intelligence (AI) is the core of the next wave of IT systems, and it is time to get ready. The past year has seen tremendous growth in the adoption of artificial intelligence systems to help companies improve operational insights and provide enhanced customer service experiences. If you aren't leveraging AI to support your business already, you should at least be investigating the possibilities. Here are some steps you can take to establish a solid data foundation to support high-performance AI.

Add More Data Sources

Artificial Intelligence systems run on data. They have a tremendous ability to analyze information, perform pattern matching and correlation analysis, and provide real-time analytics and natural language responses—but AI systems are only as “smart” as the data they are given access to. Most of the AI systems in use today are nowhere near maxed out on compute capabilities; they are limited by their data sources. If you want to increase the capabilities of your AI systems, the first thing you need to do is grant them access to more diverse data sources.

Connect Your Data

Once you've collected data from various sources, you need to connect it to the systems that will analyze and use it. Actian DataConnect can help you do this. DataConnect enables you to connect all your data sources, whether they are IT applications, deployed infrastructure, remote sensors, IoT devices, or third-party data feeds. DataConnect can be used to move data into a cloud data warehouse like the Actian Data Platform, or it can connect your AI system to the individual data sources directly.

Stream, Stream, Stream

Two of the highest-value use cases for AI are real-time analytics and natural language interactions (things like chatbots and voice response systems). Both of these use cases require real-time information to be effective, and that comes from streaming data. Traditional analytics systems struggled with streaming data because of the volume involved and the latency of processing. Because AI systems run in the cloud, they can scale compute elastically to process vast amounts of streaming data in real time and help you understand what it means. So, once you get your data sources connected, turn on the streams of data.

Move Historical Data to the Cloud for Real-Time Analytics

Real-time streaming data isn't the only source that AI systems can leverage. Historical data is a valuable source for performing trend and correlation analysis and for developing projections about future events. Co-locating your data warehouse in the cloud with your AI engine enables both big-data analytics and AI-enabled real-time processing to access cloud-scale compute and storage resources with very little network latency. Actian provides a modern cloud data warehouse ideal for supporting AI system processing.

Most companies have some sort of data warehouse today where they are storing and archiving transactional data from each of their IT systems. Migrating your data warehouse to the cloud is a great way to accelerate value from AI systems, as on-premises data warehouses introduce compute and network constraints that can significantly limit your AI performance. If you have some data that you need to keep on-premises, that's okay. The Actian Data Platform can be deployed as a hybrid data warehouse, supporting both your cloud and on-premises needs.

What if You Aren’t Ready for AI Quite Yet?

Maybe your company isn't quite ready to make the jump to leveraging Artificial Intelligence. That's okay—the steps outlined above support Online Transaction Processing (OLTP) integration and traditional analytics as well. Connecting your data and unlocking the processing power of cloud compute in your data warehouse enables you to access more data and harvest actionable insights that help your leaders make better decisions. When you are ready for an AI system in the future, the data foundation will be ready for you to move forward confidently.

To learn more about how Actian can help you improve your AI performance and support the next wave of IT capabilities, visit https://www.actian.com/data-platform/

Data Intelligence

Data Management is Embracing Cloud Technologies

Actian Corporation

June 29, 2020


Contemporary business initiatives such as digital transformation are facing an explosion of data volume and diversity. In this context, organizations are looking for more flexibility and agility in their data management.

This is where Cloud strategies come in…

Data Management Definition

Before we begin, let's define what data management is. Data management, as described by TechTarget, is “the process of ingesting, storing, organizing and maintaining the data created and collected by an organization”. It is a crucial part of an enterprise's business and IT strategy and provides the analytical support that drives overall decision-making by executives.

As mentioned above, data is seen as a corporate asset that can be used to make better and faster decisions, improve marketing campaigns, increase overall revenue and profits, and above all: innovate. As a result, organizations are seeing cloud technologies as a way to improve their data initiatives.

Cloud Strategies are the New Black in Data Management Disciplines

It is an undeniable fact that Cloud service providers are becoming the new default platform for database management. This phenomenon provides data management teams with great advantages:

  • Cost-Effective Deployment: Greater flexibility and a more rapid configuration.
  • Consumption-Based Spending: Pay for what you use and do not over-provision.
  • Easy Maintenance: Better control over the associated costs and investments.

Knowing this, there is no doubt that data leaders perceive the cloud as a less expensive technology, which drives this choice even more.

Data leaders will embrace the cloud as an integral part of their IT landscape in the coming months and years. However, we strongly believe that the rate at which organizations migrate to the cloud will differ by organization size. Small and midsize organizations will migrate more quickly, while larger organizations will take months, even years, to migrate.

Thus, the cloud is going to become a default option for all data management technologies. Many strategies are emerging, spanning various deployment types and approaches. We have identified three main ones:

  • Hybrid Cloud: Two or more separate cloud infrastructures, private or public, that remain distinct entities.
  • Multicloud: The use of more than one cloud service provider's infrastructure, as well as on-premises solutions.
  • Intercloud: Data integrated or exchanged between cloud service providers as part of a logical application deployment.

The Cloud is also Seen as an Opportunity for Data Analytics Leaders

The increased adoption of cloud deployments for data management has important implications for data and analytics strategies. As data moves to the cloud, the data and analytics applications that use it must follow.

Indeed, the emphasis on speed of value delivery has made cloud technologies the first choice for vendors developing new data management solutions and for enterprises deploying them. Thus, enterprises and data leaders are choosing next-generation data management solutions. They will migrate their assets by selecting applications that fit future cloud strategies and by preparing their teams and budgets for the challenges ahead.

Data leaders who use analytics, business intelligence (BI), and data science solutions see cloud solutions as opportunities to:

  • Use a cloud sandbox environment to trial onboarding, usage, and connectivity, and to prototype an analytics environment before actually buying the solution.
  • Facilitate application access from anywhere and improve collaboration between peers.
  • Access new and emerging capabilities easily over time, thanks to continuous delivery approaches.
  • Support heavy lifting with the cloud's elasticity and scalability throughout the analytics process.

A Data Catalog, the new Essential Solution for Cloud Data Management Strategies

Data and analytics leaders will inevitably engage with more than one cloud, where data management, governance, and integration become more complex than ever before. Thus, data leaders must equip their organizations with metadata management solutions that assist in finding and inventorying data distributed across a hybrid and multi-cloud ecosystem. Failure to do so will result in a proliferation of data silos, and ultimately in derailed data management, analytics, and data science projects.

Data management teams will have to choose the most relevant data catalog from the wide range on the market.

We like to define a data catalog as a way to create and maintain an inventory of data assets through the discovery, description and organization of distributed datasets.

If you are working on a data catalog project, you will find the market occupied:

  • On the one hand, by fairly old players initially positioned in the Data Governance market. These players provide on-premises solutions with rich but complex offers, which are expensive, difficult, and time-consuming to deploy and maintain, and are designed for cross-functional governance teams. Their value proposition is focused on control, risk management, and compliance.
  • On the other hand, by suppliers of data infrastructure (Amazon, Google, Microsoft, Cloudera, etc.) or data processing solutions (Tableau, Talend, Qlik, etc.), for which metadata management is an essential building block to complete their offer. They offer much more pragmatic (and less costly) solutions, but these are often highly technical and limited to their own ecosystems.

We consider those alternatives insufficient. Here are some essential guidelines for finding your future data catalog. It must:

  • Be a cloud data catalog enabling competitive pricing and rapid ROI for your organization.
  • Have universal connectivity, adapting to all systems and all data strategies (edge, cloud, multi-cloud, cross-cloud, hybrid).
  • Have very advanced automation for the collection and enrichment of data assets, as well as their attributes and links (an augmented catalog). Automatic feeding mechanisms, together with suggestion and correction algorithms, reduce the overall cost of the catalog and guarantee the quality of the information it contains.
  • Be strongly focused on user experience, especially for business users, to improve solution adoption.

To conclude, data management capabilities are becoming more and more cloud-first and in some cases cloud-only.

Data leaders who want to drive innovation in analytics will need to leverage cloud technologies for their data assets. They will have to cover everything from ingestion to transformation, without forgetting to invest in an efficient data catalog in order to find their way in an ever more complex data world.

Data Analytics

Marketing Agility Through Real-Time Analytics

Actian Corporation

June 24, 2020


Today's marketplace is complex, competitive, and changes quickly. New competitive products and services enter the market every day. Customer preferences shift with social trends, and pricing dynamics continuously evolve. To win in this type of environment, marketing agility is critical: you must identify opportunities and threats and respond to them quickly, and that requires data. The key to enabling marketing agility is real-time analytics. The question you must ask is: “Do your marketers have the tools they need to succeed?”

Real-Time Market Insights

In a highly dynamic marketplace, conditions can change frequently, and without notice. Competitors adjust their offerings (new features, price changes, promotions, etc.). Customers influence each other’s preferences through social media discussions and reviews. Media coverage causes wide swings in customer sentiment about both your product and your company. Each of these forces represents a potential opportunity or threat to your marketing efforts.

Real-time marketing analytics is a powerful tool to monitor your business environment, listen to the chatter and identify when action is required. The faster you identify the change and respond to it, the better outcome you will be able to achieve. Actian can help you quickly learn when to adjust and adapt to changes in the market or your customer base.

Develop Personalized Marketing Campaigns

Customers want to feel like you understand and care about them as individuals. Help your company get noticed in a crowded market and capture more wallet share by deploying effective, innovative, and highly personalized campaigns informed by deep analysis. Traditional campaign optimization models use limited samples of transactional data, which can lead to incomplete customer views. Actian allows you to connect to a wide variety of diverse data sources, including social media and competitors' websites, in real time to learn which competitive offerings are gaining traction in the marketplace. Web purchasing patterns and call center text logs stored on Hadoop provide valuable insights into customer interactions. Marketing and campaign data ensure any recommended actions comply with company goals, rules, and regulations.

By combining these data sets, your marketing team will be able to develop a richer understanding of your customers' needs, motivations, influences, and buying behaviors. These insights can then be used to develop targeted market segmentations and personalized marketing campaigns that speak directly to the target customer. Actian helps you build, test, and deploy campaigns in rapid succession. Real-time analytics enables your marketing team to monitor the effectiveness of these campaigns, adjust to market dynamics, and fine-tune messaging for peak performance.

Achieve Results

With your customer scores and optimized lists in hand, you can design innovative campaigns that allow you to create and sustain a competitive advantage. Increase campaign revenue and minimize marketing costs by focusing your resources on opportunities that will create the most value. Increase your customer satisfaction and loyalty by demonstrating that you understand their needs and are focused on solving their problems. Improve your product development processes and supply chain by providing marketing insights upstream that lead to better products and services.

The real-time analytics provided by Actian Data Platform can help your marketing team succeed in a highly competitive and rapidly changing business environment. Identify opportunities faster and achieve first-mover advantage. Neutralize threats through decisive action to prevent encroachment by competitors into your customer base. To learn how Actian can help your company achieve marketing agility powered by real-time cloud-based analytics, visit www.actian.com/data-platform

Data Integration

Connected Data to Drive Service Assurance in IT

Actian Corporation

June 22, 2020


IT Service Management (ITSM) staff have a unique challenge to provide service assurance across a broad and diverse technology ecosystem. Connected data is essential to enable them to do their job effectively. Your incident and problem management staff don’t know what they don’t know. They can only see the data that is made available to them through your ITSM system and other administrative consoles. If the data they are looking at is incomplete or fragmented, it is difficult for them to know where things are broken and need attention. All the while, your business processes, employees, and potentially your customers are being affected. Connected data is the key to service assurance in IT.

Doesn’t My ITSM Platform Take Care of This Already?

Modern ITSM platforms from companies like ServiceNow, Cherwell, BMC, and FreshWorks have a lot of great capabilities to help you orchestrate your incident and problem management workflows. They also have some slick visualizations to provide the “single pane of glass” user experience that your ITSM staff needs. But what is that single pane of glass showing, and where is that data coming from?  Often the data comes from other systems such as your operations management tools, cloud dashboards, synthetic transaction monitors, and other utility-type services and devices deployed across your IT environment.

The large ITSM platform products have many out-of-the-box connectors. However, many that you need are often missing, and you have to address those gaps on your own. This is understandable: an ITSM platform vendor isn't going to know all of the different technologies you have deployed in your IT environment, or what management and diagnostic tools you have available to support them. Even if they did understand your needs, they would need time to develop the connectors, test them, and build them into their product release cycle. Eventually, you might get what you need, but relying entirely on your ITSM vendor for integration doesn't give you the flexibility and agility that most companies need.

DataConnect Provides a Flexible Solution to Connecting Your IT Data

This is where a platform like Actian DataConnect can help. Offered as an Integration Platform as a Service (iPaaS), DataConnect supplies a flexible and easy-to-implement platform that enables you to design, deploy, and execute data integrations across your whole IT environment. Do you have streaming data from monitoring tools? No problem. Do you have administrative console data that you want to make available in your ITSM system? DataConnect can help you do that. Do you have embedded telemetry built into your in-house developed applications? Your ITSM vendor doesn't even know about those data sources, but with DataConnect, you can add them to your IT monitoring solution. Are you leveraging services, SaaS, or network infrastructure managed by third parties? DataConnect can make those data sources available too. The DataConnect platform gives you the flexibility to source data from almost anywhere (inside or outside your company), and if you need to add something, you can just add it.

Ease of Deployment

Application development teams and IT project teams deploying new third-party systems can stream telemetry data to your ITSM system from day one of a system's release, without a lot of custom coding or waiting for the ITSM vendor to build or update a connector. If you have existing components and services deployed in your IT environment that aren't currently sending data to your ITSM system, you can use DataConnect to get them connected and make their telemetry data available quickly. By combining the data integration capabilities of Actian DataConnect with the single-pane-of-glass and workflow orchestration capabilities of your ITSM platform, you can give your IT staff access to the rich, accurate, and connected data they need to do their jobs effectively.

To learn more, visit DataConnect.

Data Integration

Don’t Just Move Data, Integrate It!

Actian Corporation

June 18, 2020


If you are simply “lifting and shifting” data from one place to another, you are missing out on the power that a data integration platform can bring you. It is time to look beyond extract, transform, load (ETL) from individual source systems, and expand your integrations to include multi-source joins that enable you to see across source systems. Don't just move data, integrate it!

Data integration is more than moving data from source to target systems. It is part of the greater data value chain that transforms raw source data into information and actionable insights that help drive decisions and operational processes. Like any other value chain, each step in the process moves the data one step closer to consumption by transforming it in ways that add value to the end user. One might argue that moving data into a centralized repository or a downstream database adds value. Yes, it does, and if all you have is essentially a “data forklift,” this may be the best you can do. If you have a true data integration platform like Actian DataConnect, you can do a whole lot more (and you should).

Multi-Source Joins

A data integration platform like Actian DataConnect puts a powerful set of tools at your fingertips to help you not just move data from one system to another, but integrate it along the way. You might be familiar with the ability to create SQL inner, outer, left, and right joins within a database, but did you know you can access data from multiple source systems in the same query? The DataConnect Studio IDE was recently re-engineered with regard to how joins are implemented, taking advantage of the ability to leverage multiple source connections in your queries.
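DataConnect Studio expresses this graphically, but the underlying idea is easy to sketch. Here is a language-neutral illustration of a multi-source join in Python with pandas; the file names and schemas are invented for the example, and this is the concept rather than DataConnect's actual API.

```python
import sqlite3
import pandas as pd

# Two independent sources: a CRM database and a billing extract.
crm = sqlite3.connect("crm.db")
customers = pd.read_sql_query(
    "SELECT customer_id, name, region FROM customers", crm
)
crm.close()

# Second source: a flat-file export with customer_id, amount, due_date.
invoices = pd.read_csv("billing_extract.csv")

# The multi-source join: an inner join across systems, keyed on customer_id.
unified = customers.merge(invoices, on="customer_id", how="inner")

# Land the reconciled output in the target system as one unified set.
target = sqlite3.connect("warehouse.db")
unified.to_sql("customer_invoices", target, if_exists="replace", index=False)
target.close()
```

The join happens while the data is in flight, so the target only ever sees the reconciled result.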

With DataConnect Studio, you can build integrations that span multiple data sources, reconciling them together into a unified output set in the target system. Let’s consider where you might want to do this.

Analytics and Reporting

By merging data across source systems earlier in the data value chain, you can normalize your data into a canonical data model that is easier for your analysts and business users to understand.  This means they can spend less time finding data and more time interpreting data to determine its relevance to your business.

eCommerce Systems

Customer-facing systems, whether they be on a website or a mobile app, should provide a consistent and simple interface to users.  Multi-source joins in your data queries enable you to combine data from different systems, so your users get a high-quality experience without having to deal with whatever complexity is taking place behind the scenes.

Customer Support

Any company that has tried to develop a 360-degree view of its customers knows that the data comes from many different source systems.  Actian DataConnect enables you to join data from different customer records and transactional systems to give you the big picture perspective you are looking for.

Operations Monitoring

Many companies are integrating IoT devices, mobile apps, and embedded sensors into their operations processes.  The multi-source join capability can enable you to leverage data from different types of monitoring devices and more easily reconstruct the virtual process flows that your operations staff need to monitor your operations.

Data in motion presents one of the best opportunities to perform integration. If you merge data at rest, you either have to copy data into a merged table, or you create views and don't really integrate the data until later in the data value chain – your options are limited. When you are moving data, you have the opportunity to transform it: changing data structures, summarizing, categorizing, and aggregating data from different sources. Each time data moves, you should be seeking ways to make it even more valuable for your organization.
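As a small illustration of transforming in motion, here is a hedged Python/pandas sketch. The point-of-sale schema is invented for the example; the idea is that the target receives analysis-ready summaries instead of raw rows.

```python
import pandas as pd

# Raw events captured while the data is in motion (illustrative schema).
events = pd.DataFrame({
    "store_id": [1, 1, 2, 2, 2],
    "category": ["grocery", "grocery", "apparel", "apparel", "grocery"],
    "amount":   [12.50, 8.00, 40.00, 25.00, 5.25],
})

# Transform during movement: categorize and aggregate before landing the
# data, so downstream consumers get summaries rather than raw event rows.
summary = (
    events.groupby(["store_id", "category"], as_index=False)
          .agg(total_sales=("amount", "sum"), transactions=("amount", "count"))
)
print(summary)
```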

Actian DataConnect can help make managing data easier – not just moving data, but really integrating it. To learn more, visit DataConnect.

Data Management

SQLite’s Serverless Architecture Doesn’t Serve IoT Well

Actian Corporation

June 17, 2020


Part Three: SQLite, the “Flat File” of Databases

Over the past few articles, our SQLite blog series has been looking at SQLite's serverless architecture and why it is unsuitable for IoT environments. Those of you who have been following can jump ahead to the next section, but if you're new to this discussion, you may want to review the preceding parts.

  • In part one, “Mobile May Be IoT, But IoT is Not Mobile When It Comes to Data,” we examined the fact that though SQLite is the most popular database on the planet—largely due to its ubiquitous deployment on mobile smartphones and tablets, where it supports embedded applications for a single user—it cannot support the multi-connection, multi-user, multi-application requirements of the IoT use cases that are proliferating with viral ferocity in every industry. In a world that calls for the performance of cheetahs and peregrine falcons, SQLite is a banana slug.
  • In part two, “Rethinking What Client-Server Means for Edge Data Management,” we considered key features and characteristics of the SQLite serverless architecture (portability, little-to-no configuration, small footprint, SQL API, and an initially free version to seed adoption) in light of the needs of modern edge data management, and discussed the shortcomings of the SQLite architecture in terms of its ability to integrate the critical features found in traditional client-server databases (chiefly those multi-point qualifiers above).

In our final analysis of this serverless architecture, I’d very much like to explore (read: clarify) what will happen if a developer ignores these cautionary points and doubles down on SQLite as a way to handle IoT use cases.

Don’t Mistake Multi-Connection and Multi-Threaded for Client Server

In the late 90s, applications became more sophisticated, generated and ingested more data, and performed more complex operations on that data internally. Consequently, app developers had to create a lot of workarounds to deal with the limitations of routine, operating-system-based file management services. Instead of spending time on all these DIY efforts, application developers were clamoring for a dedicated database they could embed into an application to support their specific data management needs.

At the turn of the 21st century, SQLite appeared and seemed tailor-made to meet these needs. SQLite enabled indexing, querying, and other data management functionality through a series of standard SQL calls that could be inserted into the application code, with the entire database bundled as a set of libraries that became part of the final deployed executable. Keep in mind that the majority of these applications tended to be monolithic, single-purpose, single-user applications designed for the simpler CPU architectures in use at the time. They were not designed to run multiple processes, let alone multiple threads. End-user and data security were not yet the high priorities they are today. And as for performance in a networked environment? Wireless networks were reactive and spotty at best. Multiple, external, high-bandwidth data connections were uncommon.

So it’s really no surprise that SQLite wasn’t able to service simultaneous read and write requests for a single connection (let alone for multiple connections) when it was designed. Designers were thrilled to have an embeddable database that would allow multiple processes to have sequential read and write access to a data table within an application. They were not looking for enterprise-grade client-server capabilities. They were not designing stand-alone database systems that would support multiple applications simultaneously. They simply needed more than flat-file access mediated by an operating system.

And there lies the heart of the issue with SQLite. It was never intended to handle multiple external applications or their connections asynchronously, as would a traditional client-server database. Modern networked applications commonly have multiple processes and/or multiple threads. When you throw SQLite into a situation with multiple connections and the potential for multiple simultaneous read and write requests, you quickly encounter the possibility of race conditions and data corruption.

To be fair, SQLite has tried to accommodate these evolving demands. The current version of SQLite handles multiple connections through its thread-mode options: single-thread, multi-thread, and serialized. Single-thread is the original SQLite processing mode, handling one transaction at a time, either a read or a write, from one and only one connection. Multi-thread will support multiple connections, but still one at a time for a read or a write. Serialized—the default mode for the most current SQLite versions—can support multiple concurrent connections (and, therefore, a multi-threaded or multi-process application), but it cannot service all of them simultaneously. SQLite can handle simultaneous read connections in multi-thread and serialized modes, but it locks the data tables to prevent attempts at simultaneous writes. Nor can SQLite orchestrate writes from several connections.
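The write-lock behavior is easy to see with Python's built-in sqlite3 module. A minimal sketch, assuming an illustrative gateway.db file and readings table:

```python
import sqlite3

# Two connections to the same SQLite file, mimicking two writers on an IoT
# gateway. isolation_level=None puts the driver in autocommit mode so we
# control transactions explicitly; timeout keeps the demo snappy.
conn_a = sqlite3.connect("gateway.db", timeout=0.1, isolation_level=None)
conn_b = sqlite3.connect("gateway.db", timeout=0.1, isolation_level=None)
conn_a.execute("CREATE TABLE IF NOT EXISTS readings (device TEXT, value REAL)")

# Writer A takes the database-wide write lock...
conn_a.execute("BEGIN IMMEDIATE")
conn_a.execute("INSERT INTO readings VALUES ('sensor-1', 21.5)")

# ...so Writer B cannot start its own write transaction concurrently.
try:
    conn_b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as exc:
    print(f"writer B rejected: {exc}")  # prints 'database is locked'

conn_a.execute("COMMIT")  # only now can writer B acquire the write lock
```

One lock for the whole database: the second writer waits or fails, no matter which table or row it wants.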

Compare that to the architecture of a true client-server database that is built to manage simultaneous writes. The client-server database evaluates each write request and, if attempts are made to write to the same data within a table, it blocks the request until the current operation on that data is completed. If the writes target different parts of the data table, the server allows them to proceed. That's true orchestration. Locking the entire table and holding off writes (or approximating concurrency by letting sequential writes occur alongside multiple reads with WAL) is not the same thing.

Why is this a showstopper for SQLite in an IoT environment? One of the most basic operations with IoT devices and gateways involves writing data from a variety of devices into your data repository, and the write locks imposed during multi-threaded/multi-connection operations render it non-viable in a production environment. Furthermore, a second basic operation taking place within an IoT environment involves performing data processing and analytics on previously collected datasets. While these may be read-intensive operations that are executed independently (either as separate processes or as separate threads) of the write-intensive operations just described, they still cannot occur concurrently in an SQLite environment and maintain ACID compliance.

As you scale up your deployments, or as system complexity increases—say you want to instrument more and more within an environment, be that an autonomous car or a smart building—you will invariably add more data connection points downstream or within your local environment. Each of these entities will have one or more additional database connections, if not their own database that needs a connection. You could try to establish these connections, but they will need to be handled through add-on application logic that will likely result in response times that are outside the design constraints for your IoT system.

Workarounds Designed to Deny (or Defy) Reality

SQLite partisans will wave their hands with dismissive nonchalance and tell you that SQLite is fast enough (it's not; we've already discussed how slow SQLite is) and that you can build your own functionality to handle simultaneous reads and writes across multiple connections—in effect, manually synchronizing them specific to the use case being handled. One method by which they manage this scenario involves using the serialized mode mentioned above and building functionality to handle synchronization and orchestration within the application threads. This approach tries to avoid the transmission of read and write requests on multiple channels (thereby avoiding race conditions and the potential for data corruption). However, this approach also requires a high degree of skill, the assumption of long-term responsibility for the code, and extensive testing and validation to ensure that operations are transpiring properly.
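For concreteness, here is a minimal sketch of what that DIY synchronization typically looks like: a single writer thread owns the connection while application threads enqueue work. The names and schema are invented for the example, and note how much now belongs to the application rather than the database: the queue, the thread lifecycle, error handling, and backpressure.

```python
import queue
import sqlite3
import threading

write_queue = queue.Queue()   # one funnel for every writer in the app
STOP = object()               # sentinel to shut the writer down

def writer_loop(db_path):
    # The only thread allowed to touch SQLite; everyone else enqueues.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS readings (device TEXT, value REAL)")
    while True:
        item = write_queue.get()
        if item is STOP:
            break
        conn.execute("INSERT INTO readings VALUES (?, ?)", item)
        conn.commit()
    conn.close()

writer = threading.Thread(target=writer_loop, args=("gateway.db",))
writer.start()

# Any thread can now "write" without touching SQLite directly.
write_queue.put(("sensor-1", 21.5))
write_queue.put(("sensor-2", 19.8))

write_queue.put(STOP)
writer.join()
```

Every application in your IoT portfolio would need its own copy of this machinery, which is exactly the hoop-jumping a client-server engine exists to eliminate.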

An alternative approach would be to build the equivalent of a client-server orchestration front-end and use the single-thread option within SQLite, which would preclude race conditions or data corruption. But dropping back to a single-thread option would be like watching this banana slug move in even slower motion. That’s not a viable approach, given the high-speed, parallel write operations needed to accommodate multiple high-resolution data feeds or large-scale sensor grids. Moreover, all you’ve done is to accommodate the weaknesses of the database architecture by forcing the application to do something that the database should be doing. And you’d have to do that over and over, for every app in your IoT portfolio.

There are several sets of code and a couple of small shops that have tried to productize this latter approach, but with limited success. They work only with certain development platforms on a few of the SQLite-supported platforms. Even if those platforms are a match for your use case, the performance issues may still increase the risk and difficulty of coding this workaround into your application.

We’ve Seen This Iceberg Before

This cautionary tale isn’t just about the amount of DIY that will be incurred with the unquestioned reliance on SQLite for a given application. Like the IoT itself, it’s much bigger than that. For example, if you commit to handling this in your own code, how will you handle the movement of data from a device to the edge on-premises? How will you handle moving data to or from the cloud? The requirements for interacting with servers on either tier may be different, requiring you to write more code to perform data transformations (remember the blog on SQLite and ETL?). You might try to avoid the ETL bottleneck by using SQLite on both ends, but that would just kick the virtual can down the virtual road. You would still have to write code to handle SQLite masquerading as a server-based database on the gateway and in the cloud.

Ultimately, you can’t escape the need to write more code to make SQLite work in any of these scenarios. And that’s just the tip of this iceberg. You would need to make trade-off comparisons between DIY and partial-DIY plus code modules/libraries for other functionality—from data encryption and public key management to SQL query editing, and more. The list of features that a true client-server infrastructure brings to the table—all lacking in SQLite—goes on and on.

Back in the day, SQLite enabled developers to avoid much of the DIY that flat-file management had required. For the use cases that were emerging back then, it was an ideal solution. For today’s use cases, though, even more DIY would be required to make SQLite work—and even then it would not work all that well. The vast majority of IoT use cases require a level of client-server functionality that SQLite cannot provide without incurring significant costs—in performance, in development time, and in risk. In a nutshell, it’s déjà vu, but now SQLite is the flat file whose deficiencies we must leave in the past.

Oh, and if you think that all this is just an issue for developers, think again. In the next and final blog in this series, we’ll widen the lens a bit and look at what this means for the business and the bottom line.

If you’re ready to reconsider SQLite and learn more about Actian Zen, you can just kick the tires for free with Zen Core, which is royalty-free for development and distribution.

Data Intelligence

Data Governance and Data From ERP/CRM Packages: A Must Have

Actian Corporation

June 16, 2020

data-governance-erp-crm

For the last 3 decades, companies have been relying on ERP and CRM packages to run their operations.

To comply with regulations, reduce risk, and improve profitability, competitiveness, and customer engagement, these companies now have to become data-driven.

Beyond leveraging the wide variety of new data assets now being produced, any Data Initiative must also involve the strategic data held in these historical systems.

Challenges Faced by Companies Trying to Leverage Data From ERP/CRM to Feed Their Digital Initiatives

In the gold rush toward Artificial Intelligence, Advanced Analytics, and Digital Transformation programs, understanding and leveraging data from ERP/CRM packages sits on the critical path of any Data Governance journey.

First, these packages have large, complex, hard-to-understand, and heavily customized database models. Understanding their descriptions, relationship definitions, and other semantics well enough to serve Data Citizens is almost impossible without an appropriate Data Catalog like the Actian Data Intelligence Platform with its dedicated ERP/CRM connectors.

As an example, SAP has more than 90,000 tables. As a consequence, a Data Scientist is unlikely to make sense of the so-called TF120 table in SAP or the F060116 table in JD Edwards.
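To make the problem concrete, here is a toy sketch, in Python, of the hand-maintained glossary teams fall back on without a catalog; the descriptions below are invented for illustration:

```python
# A toy, hand-maintained glossary: without a catalog, someone must curate
# entries like these for tens of thousands of tables (descriptions invented).
TABLE_GLOSSARY = {
    "TF120": "SAP table whose business meaning is opaque to non-experts",
    "F060116": "JD Edwards employee master data",
}

def describe(table_name):
    return TABLE_GLOSSARY.get(table_name, "Unknown: ask an ERP/CRM expert")

print(describe("TF120"))
print(describe("VBAK"))   # any unmapped table sends you back to the experts
```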

Secondly, identifying a comprehensive subset of accurate datasets to serve a specific Data initiative is an obstacle course.

Indeed, a large percentage of the tables in those systems are empty, may appear redundant, or are linked in ways that look opaque to anyone who is not an expert in the ERP/CRM domain.

Thirdly, the demand for fast, agile, and ROI-focused data-driven initiatives puts ERP/CRM-knowledgeable personnel in the middle of the game.

ERP/CRM experts are rare, busy, and expensive, and companies cannot afford to grow those teams or have them lose their focus.

And finally, if a Data Catalog cannot properly store metadata for those systems in a smooth, comprehensive, and effective way, any data initiative will be deprived of a large part of its capabilities.

The need for financial data, manufacturing data, and customer data, to take a few examples, is obvious, and it therefore makes ERP/CRM systems mandatory data sources for any Metadata Management program.

Actian Data Intelligence Platform Value Proposition

An Agile and Easy Way

We believe in a Data Democracy world, where any employee of a company can discover, understand, and trust any dataset that is useful.

This is only possible with a reality-proven data catalog that connects easily and straightforwardly to any data source, including those from ERP/CRM packages.

Above all, a Data Catalog has to be smart, easy to use, easy to implement, and easy to scale across a complex IT landscape.

A Wide Connectivity

The Actian Data Intelligence Platform provides Premium ERP/CRM connectors for the following packages:

  • SAP and SAP S/4HANA
  • SAP BW
  • Salesforce
  • Oracle E-Business Suite
  • JD Edwards
  • Siebel
  • PeopleSoft
  • MS Dynamics AX
  • MS Dynamics CRM

Premium ERP/CRM Connectors Help Companies in Various Aspects

Discovering and Assessing

The Actian Data Intelligence Platform connectors help companies build an automatic translation layer that hides the complexity of the underlying database tables and automatically feeds the metadata registry with accurate, useful information, saving the Data Governance team time and money.

Scoping Useful Metadata Information for Specific Cases

In a world with thousands of datasets, the platform provides a means to build accurate and self-sufficient models that serve focused business needs by extracting, in a comprehensive way (see the sketch after this list):

  • Business and technical names for tables.
  • Business and technical names for columns in tables.
  • Relationships between tables.
  • Data elements.
  • Domains.
  • Views.
  • Indexes.
  • Table row counts.
  • Application hierarchy (where available from the package).
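For illustration only, a single extracted record might look something like the following sketch; the field names and values are assumptions, not the platform's actual schema:

```python
# A hypothetical extracted metadata record (illustrative field names/values).
table_metadata = {
    "technical_name": "F060116",
    "business_name": "Employee Master",
    "columns": [
        {"technical_name": "YAAN8", "business_name": "Address Number"},
    ],
    "relationships": ["F0101"],     # e.g., a link to an address book table
    "data_elements": ["AN8"],
    "row_count": 12842,
    "application_hierarchy": "Human Resources > Employee Management",
}
```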

Compliance

The Actian Data Intelligence Platform's Premium ERP/CRM connectors are able to identify and tag any personal data or Personally Identifiable Information (PII) coming from the supported CRM/ERP packages in its Data Catalog, helping you comply with GDPR/CCPA regulations.
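In principle, even a crude rule-based pass hints at what such tagging does. The sketch below is a deliberate simplification with invented column names and does not describe the platform's actual detection logic:

```python
# A deliberately crude, rule-based sketch of personal-data tagging.
PII_HINTS = ("name", "email", "phone", "birth", "address", "ssn")

def tag_pii(columns):
    return {col: any(hint in col.lower() for hint in PII_HINTS) for col in columns}

print(tag_pii(["customer_email", "order_total", "date_of_birth"]))
# {'customer_email': True, 'order_total': False, 'date_of_birth': True}
```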

Data Integration

Deploy and Manage Your Integrations Anywhere, Anytime

Actian Corporation

June 15, 2020

deploy and manage data integrations

With Actian DataConnect Integration Manager, you can deploy, configure, manage, and repair your integrations anywhere and anytime – whether they reside in the cloud, on-premises, or embedded in your SaaS applications. The latest release of Actian DataConnect Integration Manager includes an important set of enhancements to the Integration Manager API that will increase your organization’s ability to define integrations and enable them for either synchronous or asynchronous execution. Okay, this may sound like a bunch of technical jargon, but let’s break it down so you can see why this new feature is so important. Two primary execution patterns are used for data integration – synchronous and asynchronous.

Request-Response Integration

Synchronous integrations, sometimes called “request-response” integrations, are used when you want to tightly couple two applications together. In this pattern, one system sends a message to the other, waits for a response, and when it receives the response, it sends the next message. You can think of this much like a chat conversation where two parties are communicating back and forth with each other. Another example is a user interacting with a website – issuing a command or clicking a button and waiting for a response from the server. This is the most common type of data integration because it is the most intuitive to implement and affords the sending system the ability to verify receipt of the message before continuing to the next step in a workflow.

The benefit of synchronous communication is that it works well for real-time integration and complex workflows with many back-and-forth interactions. We see this a lot when multiple applications serve as components of an overarching system or when the integration is part of a transactional workflow (such as a CRM system looking up the status of a customer order in an ERP system). The drawback is that both systems must remain actively engaged in the messaging interaction to avoid processing delays.
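As a generic illustration of the pattern (not DataConnect’s API), the caller below blocks until the other system answers before moving on; the ERP endpoint is hypothetical:

```python
# Request-response: the caller waits for the answer before the next step.
import urllib.request

def lookup_order_status(order_id):
    url = f"https://erp.example.com/orders/{order_id}/status"  # hypothetical endpoint
    with urllib.request.urlopen(url) as resp:   # blocks until the ERP replies
        return resp.read().decode()

status = lookup_order_status("SO-1001")  # the workflow continues only after this returns
```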

Set and Forget Integration

Asynchronous integrations, sometimes called “set and forget” integrations, are used when you want to loosely couple applications together. In this pattern, one system sends out a message, then moves on with doing other things – it is not waiting for a response. The receiving system may have a listener configured, waiting to receive the message in real-time, or it may process incoming messages periodically (in batches). You can think of this much like a news agency publishing a story. Some readers may be watching the news feed for updates in real-time while others may check for news updates once per day. In either case, there is no expectation that the receiver of the communication will respond to the sender or even acknowledge receipt of the message.

The benefit of asynchronous communication is that it enables the publishing of data to many recipients at the same time. We see this pattern used often when a system performs batch processing of reports or pushes data to downstream systems. Asynchronous messaging is also used for things like event logs, alerts, and system status messages that do not interfere with transactional processing. The drawback of this method is that the sending system has no visibility into the acceptance and subsequent processing of the message that is sent. Was it received? How long was the message waiting before processing? It is difficult to build transactional workflows using asynchronous integration because of time delays and the inability to monitor the quality of service.
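Again as a generic illustration rather than DataConnect’s API, the sketch below uses an in-process queue as a stand-in for a message broker: the sender publishes and moves on, while the receiver processes at its own pace:

```python
# Set-and-forget: the sender publishes and does not wait for a reply.
import queue
import threading

events = queue.Queue()

def listener():
    while True:
        msg = events.get()              # receiver consumes at its own pace
        print("processing:", msg)
        events.task_done()

threading.Thread(target=listener, daemon=True).start()
events.put("nightly-report-ready")      # sender moves on immediately
events.join()                           # demo only: wait so the output is visible
```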

Your Integration Platform Needs to Support Both

As you can see, there are different situations where you might want to use one of these integration patterns over the other. That is why the enhancements to the Actian DataConnect Integration Manager are so important. You now have the flexibility to use both of these patterns in your integrations, depending on the unique needs of your business. There may even be times when you need both synchronous and asynchronous integration between the same systems. That is okay: Actian DataConnect can help you do that.

To learn more, visit DataConnect.

To download the latest DataConnect Integration Manager, visit Actian ESD.

Data Intelligence

Data Science: Accelerate Your Data Lake Initiatives With Metadata

Actian Corporation

June 15, 2020

data-science

Data lakes offer unlimited storage for data and present lots of potential benefits for data scientists in the exploration and creation of new analytical models. However, structured, unstructured, and semi-structured data are mashed together in the lake, and the business insights they contain are often overlooked or misunderstood by data users.

The reason for this is that many technologies used to implement data lakes lack the information capabilities that organizations usually take for granted. It is therefore necessary for these enterprises to manage their data lakes by putting in place effective metadata management that covers metadata discovery, data cataloguing, and overall enterprise metadata management applied to the company’s data lake.

2020 is the year that most data and analytics use cases will require connecting to distributed data sources, leading enterprises to double their investments in metadata management. – Gartner 2019.

How to Leverage Your Data Lake With Metadata Management

To get value from their data lake, companies need both skilled users (such as data scientists or citizen data scientists) and effective metadata management for their data science initiatives. To begin with, an organization could focus on a specific dataset and its related metadata, then leverage this metadata as more data is added to the data lake. Setting up metadata management early makes it easier for data lake users to get started on this task.

Here are the Areas of Focus for Successful Metadata Management in Your Data Lake

Creating a Metadata Repository

Semantic tagging is essential for discovering enterprise metadata. Metadata discovery is defined as the process of using solutions to discover the semantics of data elements in datasets. This process usually results in a set of mappings between different data elements in a centralized metadata repository. This allows data science users to understand their data and have visibility into whether it is clean, up-to-date, trustworthy, etc.
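As a rough sketch of what such a repository centralizes, consider the mapping below; the element names, source paths, and quality flags are invented for the example:

```python
# A toy metadata repository entry: one semantic element mapped to its
# physical homes, with quality flags users can inspect (values invented).
metadata_repository = {
    "customer_id": {
        "mappings": ["crm.contacts.contact_id", "erp.orders.cust_no"],
        "is_clean": True,
        "last_updated": "2020-06-01",
        "trusted": True,
    },
}

print(metadata_repository["customer_id"]["mappings"])
```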

Automating Metadata Discovery

As numerous and diverse data sets are added to a data lake on a daily basis, keeping up with ingestion can be quite a challenge! Automated solutions not only make it easier for data scientists or citizen data scientists to find their information, but also support metadata discovery.
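As a toy sketch of the idea, assuming a file-based lake of CSV datasets, automated discovery might walk newly landed files and register basic metadata with no manual effort; the paths and registry structure are illustrative:

```python
# Walk a (hypothetical) lake directory and register basic metadata per file.
import csv
import pathlib

def discover(lake_root):
    registry = []
    for path in pathlib.Path(lake_root).glob("**/*.csv"):
        with path.open(newline="") as f:
            header = next(csv.reader(f), [])   # first row as a column sample
        registry.append({"dataset": path.name, "columns": header,
                         "source_path": str(path)})
    return registry
```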

Data Cataloguing

A data catalog consists of metadata in which various data objects, categories, properties, and fields are stored. Data cataloguing is used for both internal and external data (from partners or suppliers, for example). In a data lake, it captures a robust set of attributes for every piece of content within the lake and enriches the metadata catalog with these information assets. This enables data science users to view the flow of the data, perform impact analysis, and share a common business vocabulary, with accountability and an audit trail for compliance.

Data and Analytics Governance

Data and analytics governance is an important use case when it comes to metadata management. Applied to data lakes, the question “could it be exposed?” must become an essential part of the organization’s governance model. Enterprises must therefore extend their existing information governance models to specifically address business analytics and data science use cases that are built on the data lakes. Enterprise metadata management helps in providing the means to better understand the current governance rules that relate to strategic types of information assets.

Contrary to traditional approaches, the key objective of metadata management is to drive a consistent approach to the management of information assets. The more consistent the metadata semantics are across all assets, the greater the shared understanding, allowing knowledge to be leveraged across the company. When investing in data lakes, organizations need to consider an effective metadata strategy so that those information assets can be leveraged from the data lake.

Start Metadata Management

As mentioned above, implementing metadata management into your organization’s data strategy is not only beneficial, but essential for enterprises looking to create business value with their data. Data science teams working with vast amounts of data in a data lake need the right solutions to be able to trust and understand their information assets. To support this emerging discipline, the Actian Data Intelligence Platform gives you everything you need to collect, update, and leverage your metadata.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.