Data Analytics

Connecting to Actian Data Platform From Pentaho via JDBC

Actian Corporation

May 21, 2019

Actian delivers an operational data warehouse as a managed service in the cloud. Its developer-friendly, high-performance analytics engine has been proven to outperform other popular analytics platforms without specialized hardware or complicated software development. Actian requires minimal setup, provides automatic tuning to reduce database administration effort, and enables highly responsive end-user BI reporting.

This article shows how to connect to Actian from Pentaho using the Actian JDBC driver.

As a prerequisite, download the Actian Avalanche JDBC driver and make it available in the environment where the tools run. The JDBC driver is available via the Drivers and Tools link in the Actian console.

 

Extract the iijdbc.jar file from the downloaded JDBC package.

If you have not done so already, add your IP addresses to the Allow List in the Actian console (Manage > Update Allow List IPs).

The steps below are for Pentaho 8.2 and Pentaho Server 8.2. A similar configuration should work for other versions.

This is how to connect from Pentaho and Pentaho Server to Actian. 

Pentaho

Copy the iijdbc.jar file to the data-integration/lib directory.

Start Pentaho (spoon.bat or spoon.sh, depending on the OS). 

Create a new transformation. 

Go to Tools > Wizard > Create database connection…

Provide a name for the database connection, e.g. Actian JDBC.

Option 1

Select Generic database for the type of database and Native (JDBC) for the type of access. 

Click Next. 

Specify the JDBC connection URL, which is provided in the Actian Data Cloud console (Manage > Connect). 

Example: 
jdbc:ingres://01c77d46010046ec7.vpaasstage.actiandatacloud.com:27839/db;encryption=on;

 

For the driver class, type com.ingres.jdbc.IngresDriver. 

Click Next. 

Enter your user name (dbuser) and password.  

Click Finish to complete the connection setup. 
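As a rough illustration of the URL format used in Option 1, the following Python sketch assembles a JDBC URL of the same shape as the example above. The function name build_actian_jdbc_url and the placeholder host are hypothetical; the real host, port, and database name are the ones shown in the Actian console.

```python
# Minimal sketch: compose a JDBC URL shaped like the example in this article.
# The helper name and the sample host below are illustrative assumptions.

def build_actian_jdbc_url(host: str, port: int, database: str,
                          encryption: bool = True) -> str:
    """Return a URL of the form jdbc:ingres://host:port/db;encryption=on;"""
    url = f"jdbc:ingres://{host}:{port}/{database};"
    if encryption:
        # The article's example URL carries the encryption=on attribute.
        url += "encryption=on;"
    return url


# Driver class named in the steps above.
DRIVER_CLASS = "com.ingres.jdbc.IngresDriver"

if __name__ == "__main__":
    # Placeholder host; replace with the value from Manage > Connect.
    print(build_actian_jdbc_url("example-host.invalid", 27839, "db"))
    print(DRIVER_CLASS)
```

The resulting string is what goes into the connection URL field, and the driver class goes into the driver class field.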

Option 2

Alternatively, you can use the Ingres JDBC Connector.

Use the connection details from the Common Properties tab of the Actian Data Cloud console (Manage > Connect).

After the connection setup is completed, edit the newly created connection, go to Options, and add a new parameter named encryption with the value on.

At this point the connection should test successfully.

To use the newly created connection, for example, add a Table input step. Select the connection that was just created as your Table input connection. 

Pentaho Server

Copy the iijdbc.jar file to the pentaho-server/tomcat/lib directory.

Start Pentaho Server by running start-pentaho.

Go to the Pentaho User Console and log in as an administrator.

Go to Create New > Data Source.

 

Select the desired source type, e.g. Database Table(s). 

Click on the “+” sign to add a new connection.  

 

Provide a name for the connection, e.g. Avalanche JDBC Connection. 

 

Select Generic database for the Database type.

Select Native (JDBC) for the Access parameter. 

Populate the URL value from the Actian Data Cloud console details (Manage > Connect > JDBC).  

Add the driver class name value as com.ingres.jdbc.IngresDriver. 

Also fill out the user name and password.  

Now test the connection to confirm that it is successful, and that’s it!

To learn more about Actian, our fully managed cloud data warehouse service, visit https://www.actian.com/avalanche/.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Data Analytics

Is Your Data Making a Difference?

Actian Corporation

May 21, 2019

For many years, companies have been accumulating large amounts of data with an intuitive feeling that it has value and would be put to good use to make more informed business decisions. As we transition into a new era where machine learning and artificial intelligence are enabling more robust analysis of a company’s data assets, it is a good time to assess your current data and whether you have the tools and processes to transform your data into actual business value.

Data as a Tool, Instead of an Asset

To understand fully how well your company is transforming data into business value, you must first re-orient your thinking about data and its purpose within your organization. For the past 25 years, industry leaders have been describing data as a company asset – sometimes a strategic asset, other times an operational asset. The asset designation included the perception that data is something your company should strive to collect and stockpile. Unfortunately, simply possessing data doesn’t mean it is creating value for you. On the contrary, storing and maintaining data you aren’t using is actually a liability.

Data only creates value for a company when it is used to drive business decisions, establish sustainable competitive advantage, and enable business agility. Data is a tool (not an asset) and value is only created when data is being consumed. This is an important mindset shift for many business and IT leaders, but essential if you want your data to make an actual difference. Instead of focusing on collecting more data (for the sake of having it), companies should be focusing on using their current data more effectively to drive greater impact.

Refining Data into Insights

Companies acquire data in raw form from many different sources – transactional systems, social platforms, 3rd party data feeds, data from the market, etc. Harvesting value from these data sources requires a process of refinement to convert the raw data into actionable insights. Data transformation is no different than the process of transforming raw materials into finished goods via a value stream. In this case, actionable business insights are the finished product you are seeking to provide to your data consumers.

The refinement process starts with the ingestion and aggregation of data from each of the source systems. This is often done in some sort of data warehouse. Once the data is in a common place, it must be merged and reconciled into a common data model, addressing, for example, duplication, gaps, time differences, and conflicts. The unified operational data set can then be processed through a variety of analysis functions to aggregate, summarize, correlate, and create forecasts that are meaningful to data consumers. Once the data is organized and processed into informational insights, it must then be presented to data consumers in a way that is easily understood and usable for their business tasks.
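The refinement steps described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the two source lists, the field names, and the "first occurrence wins" reconciliation rule are all invented for the example, and a real pipeline would run in a data warehouse rather than in memory.

```python
from collections import defaultdict

# Hypothetical raw records from two source systems (illustrative data only).
crm_rows = [
    {"customer_id": "C1", "region": "EMEA", "revenue": 100.0},
    {"customer_id": "C2", "region": "APAC", "revenue": 250.0},
]
billing_rows = [
    {"customer_id": "C2", "region": "APAC", "revenue": 250.0},  # duplicate
    {"customer_id": "C3", "region": None, "revenue": 75.0},     # gap: no region
]


def ingest(*sources):
    """Ingestion: gather rows from every source system into one list."""
    return [row for source in sources for row in source]


def reconcile(rows):
    """Reconciliation: fill gaps and drop duplicates (first occurrence wins)."""
    unified = {}
    for row in rows:
        cleaned = dict(row, region=row["region"] or "UNKNOWN")
        unified.setdefault(cleaned["customer_id"], cleaned)
    return list(unified.values())


def summarize(rows):
    """Analysis: total revenue per region, ready to present to consumers."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


unified = reconcile(ingest(crm_rows, billing_rows))
revenue_by_region = summarize(unified)
print(revenue_by_region)
```

Each function corresponds to one stage of the value stream: ingestion, reconciliation into a common model, and analysis.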

Big-Data and Real-Time Insights

What makes the modern era of data processing different from the past few decades is the increasing business demand for real-time insights that are informed by the holistic set of data a company has available to it. Every company now has a big-data scenario on its hands, and when this is combined with the data demands of business processes that have undergone digital transformation, the need for massively scalable data processing solutions is apparent. Cloud services and distributed solution architectures, such as those Actian Data Platform leverages, provide companies with both the scale and speed they need to address big-data and real-time demands from business users.

Transforming data from a set of assets that you simply possess into a set of actionable insights that are actively being used across your business to make decisions is the key to developing sustainable competitive advantage in the modern business climate. You’ve been collecting data for many years; isn’t it time you used it to make an actual difference for your company? Actian can help. To learn more, visit www.actian.com/data-platform.

Data Governance

All You Need to Know About Data Governance

Actian Corporation

May 21, 2019

Whether it’s to accelerate your time-to-market, address your customer experience challenges, or put your company on the path to operational excellence, you’ve entered the data-driven era. At the heart of your approach is a demanding discipline: data governance. Here’s a complete overview of this essential discipline of your data strategy, from vision to definition to methodology.

Data governance is an essential discipline to adopt for companies that want to become data-driven. It was already a priority in 2021 and will be even more so in 2022. 

We define data governance as the exercise of authority with decision-making power (planning, monitoring, and enforcement of rules) and controls over data management.

On the one hand, ensuring effective data governance guarantees that data is consistent, reliable, and not misused. On the other hand, data governance allows you to ensure that your data is well-documented. The challenge is to never expose your company to the risk of data that does not comply with new data regulations. 

Indeed, a company’s data is a “shared asset” and must be treated as such. That’s why data governance is essential. But data governance is more than just a concept or a code of conduct; it is a strategic activity that sets the ambitions, the path to follow, and the technical solutions needed for your data-driven strategy.

Why is Data Governance Important?

In the past, data governance implementations within organizations were rarely successful. Data Stewards have too often focused on technical management or strict control of data.

For users who aspire to experiment and innovate around data, governance can evoke a set of restrictions, limitations, and unnecessary bureaucracy. These users sometimes have frightening visions of data locked away in dark catacombs, accessible only after months of struggling with administrative hassles. Others painfully recall the energy they wasted in meetings, updating spreadsheets, and maintaining wikis, only to find that no one benefits from the fruits of their hard work.

It’s clear that companies are constrained by regulatory compliance: ensuring data privacy, security, and risk management. However, it is also crucial to pursue an offensive approach that improves how a company’s data is used (by guaranteeing data that is useful, usable, and used) and that adds value to this asset.

Offensive vs. Defensive Data Governance Strategies

There are two approaches to data governance: defensive and offensive. The aim is to align business strategy with IT requirements for data security while promoting data exploitation and analysis to generate business value. Here are some examples of the objectives set by each of these two strategic approaches to data governance:

Defensive Data Governance:

  • Comply with regulations, such as the General Data Protection Regulation (GDPR), which took effect in May 2018, to avoid penalties from national authorities.
  • Meet internal obligations and rules to which the organization’s data is subject.
  • Ensure data security, integrity, and quality for proper use.

Offensive Data Governance:

  • Increase a company’s profitability and competitive position with the help of data.
  • Optimize data analysis, modeling, visualization, transformations, and enrichment.
  • Increase the flexibility of the company in the use of its data.

What are the Main Benefits of Good Data Governance?

The more data occupies an important place in corporate strategies, the more it is subject to demanding standards and regulations: SOX in the United States, the GDPR in Europe… On the one hand, it is essential not to expose yourself to the wrath of the legislator, and on the other hand, it is essential not to betray the trust of your customers and partners who accept that you collect and use data. 

Data governance allows you to continuously monitor data compliance at all stages of its life cycle (from collection to exploitation). Ensuring data compliance has other benefits as well. Compliance with regulations automatically contributes to the strengthening of data security. Data governance includes tasks such as locating critical data and identifying the owners and users of the data.

Data governance also sets the framework for data quality. More quality means a more efficient and effective use of data, especially in decision-making processes. Good data governance is also an asset for reducing and controlling management and storage costs.

Who are the Key Players in Data Governance?

Ensuring good data governance requires a little bit of methodology. To begin with, it is recommended that a precise charter of values be drawn up: A charter that sets out the principles and defines the means and technical solutions to be implemented in order to begin the data governance process. 

But data governance is also a matter of people, whose actions contribute to the excellence of your strategy. While the Chief Data Officer obviously plays a key role, they must be able to rely on Data Owners and Data Stewards. While the CDO supervises the entire system and reports directly to the CEO, the Data Steward is responsible for data quality. The Data Stewards are responsible for ensuring that the principles laid down in your charter are respected, but also for distilling the message to all the teams. Because, on a daily basis, data governance is everyone’s business.

 
Data Intelligence

Data Governance: A Competitive Advantage

Actian Corporation

May 16, 2019

For the past few years, in the wake of GAFA (Google, Apple, Facebook, and Amazon), data has been perceived as a crucial asset for enterprises. This asset is enhanced by digital services and new uses that disrupt our daily lives and weaken more traditional businesses.

This transformation, whether we like it or not, concerns all structures and all sectors. Enterprises have understood that in order to face up to innovative startups and powerful web giants, they must capitalize on their data. This awareness is leading enterprises, large and small, to start a digital transformation to become what we call data-driven.

In order to be data-driven, an enterprise should treat data as a business asset that must be mastered in order to be enhanced.

Data governance is a means to collect and safeguard data assets and to ensure they are of the highest quality and security. In other words, users must have access to accurate, intelligible, complete, and consistent data in order to detect proven business opportunities, to minimize time-to-market, and also to achieve regulatory compliance.

The road to the Promised Land of Data Innovation is full of obstacles. Between data siloed across the enterprise and tribal knowledge, this legacy contributes nothing to the overall quality of data.

The advent of Big Data has also reinforced the sentiment that the life cycle of each piece of data must be mastered in order to find your way through the influx and the massive volume of the enterprise’s stored data. Talk about a challenge encompassing roles and responsibilities, processes and tools!

The implementation of such data governance is a chapter that a data-driven company must write.

However, in our experience, exchanges with major data players and the talks they have given confirm our observation that the approaches to data governance from recent years have not kept their promises.

Data Integration

Connect SaaS Services Across Your Organization

Actian Corporation

May 15, 2019

Increasingly, companies are embracing the use of SaaS and other cloud services to give their employees the feature-rich capabilities they require at an affordable cost. IT leaders are finding it much more efficient to buy IT services already built than to build and operate them internally.

Buying SaaS solutions and other technology capabilities from 3rd parties does not eliminate the need for, or the importance of, your IT organization and its staff. On the contrary, it makes the role of IT more important than ever. SaaS applications may come and go, but the holistic integrity of your company’s IT systems must be maintained, including policy control, governance, security, and the consistency of user interfaces and of the integrated data your company produces.

It has been stated many times during the past few years that “IT organizations are transitioning from being design/build shops to being brokers of services from 3rd parties.” If you look throughout your company, you will likely see signs of the truth of this statement. Hardware manufacturers build the devices with which users interact. Telecom companies manage the networks users rely on to connect to company resources. Third parties even develop the business applications used to manage sales processes, manufacturing, and HR tasks.

When a company builds all its IT applications in-house, it has full control of what data is created, where it is stored, how it is managed and who can use it. Data is the lifeblood of a company’s business processes and a strategic asset for achieving profitability and a sustainable competitive advantage. SaaS applications (by design) are self-contained islands of IT capabilities with pre-defined data structures and a limited set of integration options (for simplicity and security).

Unfortunately, few business processes are supported by a single SaaS application; rather, they use a set of applications linked into workflows that leverage each other’s data.

While your company’s IT staff may not spend as much time and effort developing the building-block technology components when SaaS is involved, they are likely to spend considerably more time focusing on integrating the SaaS capabilities with other systems. One of the biggest challenges is data integration. Two primary types of data integrations must be managed with SaaS applications.

  1. Transactional Integrations
    These are the business workflows that pull data from source systems and push data to downstream systems, enabling end-to-end business processes to function. Transactional integrations are also important for establishing a consistent user experience for your staff. Digital transformation of business has increased the need for integrated transactional workflows and the frictionless flow of information among transactional systems.
  2. Data Aggregation
    In addition to the operational transactions that use your IT systems, SaaS software is the source for much of your enterprise data that is needed for analytics, reporting and harvesting actionable business insights to improve operations. To perform analytics effectively, companies will often transfer (copy) data from the various source systems into a data warehouse where it can be aggregated, integrated and further refined.

SaaS services are making IT’s data integration challenges more difficult. Service providers host and manage most SaaS components, including the underlying data stores. SaaS services can be added to the company’s IT ecosystem quickly, and they can leave it just as fast when a business decides it wants something new.

Just because the software is no longer needed doesn’t mean the data it created can vanish. Software and hardware may be disposable, but data is a durable asset with enduring business value.

Companies are addressing the data integration challenges of SaaS services by using an integration platform (like Actian DataConnect) that can connect to all the various data sources across the IT environment and serve as a data-integration hub to facilitate the efficient exchange of data between systems. Once the data in SaaS systems is unlocked through an integration platform, it can be easily connected to other transactions and replicated to an enterprise data warehouse for analysis.

The shift towards SaaS and other cloud services is projected to continue as the IT marketplace becomes more specialized and companies realize the value these components contribute to their business agility goals. Actian DataConnect can help you embrace the use of SaaS within your company by giving you the tools you need to integrate your data. To learn more, visit DataConnect.

Data Intelligence

Google Goods: The Management and Data Democratization Tool of Google

Actian Corporation

April 10, 2019

When you’re called Google, the data issue is more than just central. A colossal amount of information is generated every day throughout the world by all teams in this American empire. Google Goods, a centralized data catalog, was implemented to cross-reference, prioritize, and unify data.

This article is a part of a series dedicated to data-driven enterprises. We highlight successful examples of democratization and mastery of data within inspiring companies. You can find the Airbnb example here. These trailblazing enterprises demonstrate the Actian Data Intelligence Platform’s ambition and its data catalog: to help organizations better understand and use their data assets.

Google in a Few Figures

The most-used search engine on the planet doesn’t need any introduction. But what is behind this familiar interface? What does Google represent in terms of market share, infrastructure, employees, and global presence?

In 2018, Google had [1]:

  • 90.6% market share worldwide.
  • 30 million indexed sites.
  • 500 million new requests every day.

In terms of infrastructure and employment, Google represented in 2017 [2]:

  • 70,053 employees.
  • 21 offices in 11 countries.
  • 2 million computers in 60 datacenters.
  • 850 terabytes to cache all indexed pages.

Given such a large scale, the amount of data generated is inevitably huge. Faced with the constant redundancy of data and the need for precision for its usage, Google implemented Google Goods, a data catalog working behind the scenes to organize and facilitate data comprehension.

The Insights That Led to Google Goods

Google possesses more than 26 billion internal data sets [3]. And this includes only the data accessible to all the company’s employees.

Taking into account sensitive data that requires secure access, the number could double. This amount of data was bound to generate problems and questions, which Google listed as reasons for designing its tool:

An Enormous Data Scale

Considering the figure previously mentioned, Google was faced with a problem that couldn’t be ignored. The sheer quantity and size of the data made it impossible to process it all. It was hence essential to determine which data sets are useful and which aren’t.

The system already excludes certain information deemed unnecessary and is successful in identifying some redundancies. Therefore, it’s possible to create unique access paths through data without it being stored in different places within the catalog.

Data Variety

Data sets are stored in a number of formats and in very different storage systems. This makes it difficult to unify data. For Goods, it is a real challenge with a crucial objective: to provide a consistent way to query and access information without revealing the infrastructure’s complexity.

Data Relevance

Google estimates that 1 million data sets are both created and erased on a daily basis. This emphasizes the need to prioritize data and establish its relevance. Some data sets are crucial in processing chains but only have value for a few days; others have a scheduled lifetime that can range from a few hours to several weeks.

The Uncertain Nature of Metadata

Many of the data sets cataloged come from different protocols, making metadata certification complex. Goods therefore proceeds by trial and error to create hypotheses. This is due to the fact that it operates on a post hoc basis. In other words, collaborators don’t have to change the way they work. They are not asked to attach metadata to data sets when they are created. It is up to Goods to collect and analyze data in order to bring it together and clarify it for future use.

A Priority Scale

After working on discovery and cataloging, the question of prioritization arises. The challenge is to answer the question: “What makes a data set important?” Answering it is much less simple for an enterprise’s data than for ranking web search results, for example. In an attempt to establish a relevant ranking, Goods relies on the interactions between data, metadata, and other criteria. For instance, the tool considers a data set more important if its author has written a description for it, or if several teams consult, use, or annotate it.
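To make the ranking idea concrete, here is a toy Python sketch based on the two signals named above: the presence of a description and the number of teams using a data set. The class, the weights, and the sample catalog are assumptions made for illustration; they are not Google’s actual ranking.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    """Hypothetical catalog entry holding only the signals used below."""
    name: str
    description: str = ""
    teams_using: set = field(default_factory=set)


def importance_score(ds: DatasetInfo) -> int:
    """Toy score: +1 for a non-empty description, +1 per team using the data."""
    has_description = 1 if ds.description.strip() else 0
    return has_description + len(ds.teams_using)


catalog = [
    DatasetInfo("tmp_test_output"),
    DatasetInfo("sales_daily", "Daily sales totals", {"finance", "marketing"}),
    DatasetInfo("clicks_raw", teams_using={"ads"}),
]

# Highest score first, as a search result page would list them.
ranked = sorted(catalog, key=importance_score, reverse=True)
print([ds.name for ds in ranked])
```

A described data set used by two teams ranks above an undescribed one used by a single team, which in turn ranks above an unused test output.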

Semantic Data Analysis

Carrying out this analysis makes it possible, in particular, to better classify and describe the data in the search tool, so that the catalog can return the right information for a request. The Google Goods reference article [3] gives an example: suppose the schema of a data set is known and certain fields of the schema take on integer values. Through inference on the data set’s content, it is possible to identify that these integer values are IDs of known geographical landmarks, and then to use this content semantics to improve geographical data search in the tool.
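The landmark example can be approximated with a very simple heuristic, sketched below in Python. The set of known IDs, the 90% threshold, and the function name are assumptions chosen for illustration; the reference article does not specify how the inference is implemented.

```python
# Hypothetical set of known landmark IDs (illustrative values only).
KNOWN_LANDMARK_IDS = {101, 102, 103, 104}


def looks_like_landmark_ids(values, known_ids=KNOWN_LANDMARK_IDS, threshold=0.9):
    """Guess that an integer field holds landmark IDs if most values are known IDs."""
    values = list(values)
    if not values:
        return False
    hits = sum(1 for v in values if v in known_ids)
    return hits / len(values) >= threshold


column_a = [101, 103, 103, 104]  # every value is a known landmark ID
column_b = [7, 101, 55555, 12]   # mostly unrelated integers

print(looks_like_landmark_ids(column_a))
print(looks_like_landmark_ids(column_b))
```

A field flagged this way could then be labeled as geographical in the catalog, so that it surfaces in geography-related searches.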

Google Goods Features

Google Goods catalogs and analyzes the data to present it in a unified manner. The tool collects the basic metadata and tries to enrich them by analyzing a number of parameters. By repeatedly revisiting data and metadata, Goods is able to enrich itself and evolve.

The main functions offered to users are:

A Search Engine

Like the Google we know, Goods offers a keyword search engine to query data sets. This is where the challenge of data prioritization comes into play. The search engine presents data ranked according to different criteria, such as the number of processing chains involved, the presence or absence of a description, etc.

Data Presentation Page

Each data set has its own page containing as much information as possible. Because certain data sets can be linked to thousands of others, Google first compresses the information recognized as most crucial to make it more comprehensible on the presentation page. If the compressed version remains too large, only the most recent entries are shown.

Team Boards

Goods creates boards to display all the data generated by a team. This makes it possible, for example, to obtain different metrics and to connect with other boards. The board is updated each time Goods adds metadata. The board can be easily integrated into different documents so that teams can then share it.

In addition, it is also possible to implement monitoring actions and alerts on certain data. Goods is in charge of the verifications and can notify the teams in case of an alert.

Goods Usage by Google Employees

Over time, Google’s teams came to realize that the use of the tool, as well as its scope, was not necessarily what the company had expected.

Google was thus able to determine that employees’ principal uses and favorite features of Goods were:

Audit Protocol Buffers

Protocol Buffers is a serialization format with an interface description language, developed by Google. It is widely used at Google for storing and exchanging all kinds of structured information.

Certain processes contain personal information and are a part of specific privacy policies. The audit of these protocols makes it possible to alert the owners of these data in the event of a breach of confidentiality.

Data Retrieval

Engineers generate a lot of data in the course of their tests and often forget where it is stored when they need to access it again. Thanks to the search engine, they can easily find it.

Understanding Legacy Code

It isn’t easy to find up-to-date information on code or data sets. Goods maintains graphs that engineers can use to track previous code executions, as well as the input and output data sets, and to find the logic that links them.

Utilization of the Annotation System

The bookmark system of data pages is fully integrated to find important information quickly and to easily share them.

Use of Page Markers

It’s possible to annotate data and attribute different degrees of confidentiality to them. This is so that others at Google can better understand the data they have in front of them.

With Goods, Google succeeds in prioritizing and unifying data access for all its teams. The system is meant to be non-intrusive and therefore operates continuously and invisibly for users in order to provide them with organized and explicit data. Thanks to this, the company improves team performance and avoids redundancy. It saves on resources and accelerates access to data essential to the company’s growth and development.

[1] Moderator’s blog: https://www.blogdumoderateur.com/chiffres-google/
[2] Web Rank Info: https://www.webrankinfo.com/dossiers/google/chiffres-cles
[3] https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/45390.pdf

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.
Insights

Moving Your Data to the Cloud? Read These Helpful Tips.

Actian Corporation

April 1, 2019

When you talk to IT professionals about cloud migration, the conversation is likely to gravitate towards migration of applications from on-premises infrastructure to public and/or private cloud environments.

The application-centric perspective is understandable, as applications are often the visible interface between business users and the technology platforms they leverage. What many IT practitioners forget is that applications aren’t the whole solution; they are more like the visible tip of an iceberg with an expansive mass hidden below the surface.

As companies approach their cloud migration strategies, they need to address the whole iceberg, not just what users see.

Below the surface of the IT environment, abstracted and obscured by polished application interfaces, is a complex web of dependent components – both technology and data. This is where the meaningful information is aggregated, stored and processed into insights. It’s also where workflows are constructed and business capabilities are enabled.

Unfortunately, it is also the source of many of the application performance issues that companies encounter. Moving applications to the cloud is great, but if the data and all of the connections are left on-premises and rely on outdated integration methods, users aren’t going to realize the full benefits that the cloud can offer. Moving your company’s data to the cloud is an important step in the overall cloud migration process.

When planning your cloud data migration effort, there are a few important decisions that you need to make.

1. Do You Want to Just “Lift and Shift” or Do You Want to Modernize the Way You Store and Manage Data?

Some companies move their data to the cloud with the simple goal of reducing IT infrastructure costs. The approach they take is to simply move their current databases, data, and connections from on-premises hardware into an IaaS (Infrastructure as a service) or PaaS (Platform as a service) environment – leaving functionality essentially the same.

This is often the “quick and easy” way to check the box and say that data has been moved to the cloud, but these companies will soon find that they aren’t able to leverage the full benefits that the cloud can offer them and end up regretting the decision later.

As a better alternative, many companies are combining their cloud migration with modernization efforts – moving data to the cloud but adopting new ways of storing, managing, integrating and processing it.

There is a new generation of cloud-based data management tools that leverage the unique scaling properties of cloud environments to manage large quantities of data (including streaming data from things like IoT devices) and provide it to target applications with remarkable speed that isn’t attainable using traditional approaches and on-premises infrastructure.

The effort to modernize your infrastructure at the same time as your data migration may require a bit more time and resources, but the business results will be easily apparent.

2. Are You Only Concerned With Storing Data, or Do You Want to Improve How You Process Data Too?

Managing data is more than storing it in a database and using applications to perform queries. Most modern applications, both robust platforms (like CRM, ERP, HCM and ITSM) and simple interfaces that users interact with (things like eCommerce and mobile applications), have their functionality largely driven by data, not the application code itself. Applications and software are essentially specialized data processing engines.

Whether it is your logistics system tracking orders through your supply chain, a call center app providing agents with customer information or a marketing system identifying efficient means of targeting customers, most of the heavy lifting in modern applications is about processing and integrating data from various sources and applications.

When migrating data to the cloud, this is a good time for companies to look at modernizing how they process the data that their applications rely on.

Traditional relational databases are optimized for the storage of data while systems like Actian Vector are optimized for processing and consuming data.  Upgrading your data processing capabilities can significantly improve your application performance.

3. How Do You Want to Manage Data Integration, Both in the Cloud and With On-Premises Systems?

The choice you make about how to manage the complex web of data connections between data sources and applications is arguably the most important decision you will make in your cloud data migration project.

The approach that has been used for decades in on-premises systems is for each application and database to maintain point-to-point connections with each of the systems it needs to share data with.

That approach worked fine when applications had a couple of data sources that they relied on and perhaps a few upstream or downstream systems they needed to integrate with for workflows. Modern applications are different, and this is making data integration a real headache for IT departments.

With the proliferation of SaaS and other third-party software offerings, companies’ IT ecosystems are getting more complex. As companies embark on digital transformation journeys, business processes become dependent not just on a few, but on many applications. Those applications are also dependent on many data sources and share data with a lot of other applications.

Using the traditional point-to-point integration method, it is easy for even a mid-size company to have thousands of data connections that need to be maintained in order for their business to operate.
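A quick back-of-the-envelope calculation shows how fast point-to-point connections add up. The sketch below assumes the worst case, in which every system keeps a direct connection to every other system, and compares it with a hub model in which each system connects only to the hub. Real environments are rarely fully connected, so treat the numbers as an upper bound; the function names are purely illustrative.

```python
def point_to_point_links(systems: int) -> int:
    """Worst case: every system keeps a direct connection to every other one."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """With a central hub, each system needs only one connection to the hub."""
    return systems


for n in (10, 50, 100):
    print(f"{n} systems: {point_to_point_links(n)} point-to-point links, "
          f"{hub_links(n)} hub links")
# 10 systems: 45 point-to-point links, 10 hub links
# 50 systems: 1225 point-to-point links, 50 hub links
# 100 systems: 4950 point-to-point links, 100 hub links
```

The point-to-point count grows with the square of the number of systems, while the hub count grows linearly.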

Cloud migration of your data is the ideal time to address the data integration challenge by implementing a system like Actian DataConnect.

DataConnect serves as a hub for managing all your data connections and integrations in one place, so you can manage where your data is going and make changes confidently.  Most cloud migration projects aren’t done as a “big bang” approach but happen in phases. Actian DataConnect enables you to manage your connections regardless of where the data source or application resides – on-premises, in the cloud, or with a 3rd party provider.

Moving your data to the cloud is an important part of your overall cloud migration strategy. To achieve the full potential that the cloud has to offer, you need to look beyond a simple lift and shift of your legacy solutions and leverage the modern data management capabilities that solutions like Actian can offer.

Data Integration

5 Ways a Hybrid Integration Platform Can Make Your Data More Secure

Actian Corporation

April 1, 2019

Core to the success of any digital business transformation is the ability to rapidly and securely bridge intra- and inter-enterprise hybrid IT environments, including cloud, on-premises, and embedded applications and their associated data.

Digital supply chain integration and management is a strategic imperative for the modern enterprise and the Industrial Internet of Things (IIoT) alike.

Data security is one of the top concerns for modern CIOs.  As companies digitally transform their business processes, users across the company become more reliant on IT systems to do their daily jobs.  Those systems need to communicate with each other to create seamless user experiences that are both effective and secure.

The web of data connections in your IT environment is an area prone to security vulnerabilities, potentially exposing your company to intellectual property risk and data loss.

The following are 5 ways a hybrid integration platform like Actian DataConnect can help IT organizations better manage their data connections and keep the company secure.

1. Manage All Your Connections in One Place Through a Management Console

Modern IT environments are complex and for them to work properly in support of your business, individual systems need to share data with each other. Managing a bunch of point-to-point interactions not only requires a lot of administrative overhead, it also makes it difficult for you to effectively manage changes.

Managing your data connections through a hybrid integration platform gives you a “single pane of glass” or centralized place to manage, monitor and update all of your data connections so you can keep your IT environment and business processes running smoothly.

The management console is the user interface that orchestrates all components of the iPaaS solution. The console is generally hosted in the iPaaS vendor’s cloud. Similar to a design studio, the console is used to define data paths and transformations. It also deploys and manages integration engines, and it monitors performance and operational functions, including the creation of alerts. Metadata travels between each integration engine and the console.

Citizen integrators use the console to create nontechnical data integrations. Vendor connector ecosystems are also accessed via this console.

2. Control What Systems Are Accessing Your Data

Managing data access in individual systems creates a high risk of unauthorized access. While a source system may be restricted, data in downstream systems may still be exposed.

A hybrid integration platform enables you to establish policies and centrally manage which systems and users have access to individual data sets. Keep your sensitive data secure while enabling authorized systems and users to access it through governed data connections.

3. Ensure Secure Connection Protocols Between Systems

The wide variety of system components in your IT ecosystem makes securing data connections difficult. While some enterprise-class components may use robust security protocols, consumer-grade devices, some in-house developed applications and 3rd party services may not have the security management capabilities that you need.

A hybrid integration platform enables you to use the most robust security protocols available for each component and isolate your organization from potential risks.

4. Respond Confidently to Security Issues

Companies are being bombarded with information security threats continuously. Even with the best designed solutions, incidents will happen and your IT team needs to be ready to respond.

Managing data connections through a hybrid integration platform enables you to quickly and confidently disconnect compromised components from your IT ecosystem to prevent hackers from leveraging back-doors into your other IT systems.  Once the threat has been neutralized, the component can be easily re-connected to enable normal processing to resume.

5. Update Credentials

Information security best practices suggest that system passwords and other credentials be updated periodically to ensure they aren’t misused. If your environment has a lot of point-to-point data connections, password updates require changes to multiple systems, and there is a good chance that something will get missed and your business processes will be impacted.

A hybrid integration platform provides you a centralized place to manage and update credentials so you can be confident that password changes are performed effectively and safely.

Securing company data is essential for mitigating risks and ensuring continuity of your digital business processes. Actian DataConnect has built-in security measures to address modern, risk-based security in hybrid cloud environments where traditional perimeter security falls short.

DataConnect’s architecture has been designed to focus on security at the user and application levels. It leverages existing customer security systems and policies, isolates running processes, and uses in-depth user- and role-based permission schemas, token-based authentication and encrypted macro files to keep passwords and metadata secure.

Actian DataConnect can help you do that by providing you a centralized system for managing the connections between all your IT systems, whether they be on-premises, in the cloud, 3rd party services or IoT devices.

To learn more, visit DataConnect.

Data Intelligence

Metacat: Netflix Makes Their Big Data Accessible and Useful

Actian Corporation

March 29, 2019

Like many other companies, Netflix has a large amount of data that comes from many different data sources in various formats. For the leading subscription video-on-demand (SVOD) company, exploiting data is, of course, a major strategic asset. Given the diversity of its data sources, the streaming platform wanted a way to federate and interact with these assets using a single tool. This led to the creation of Metacat.

This article explains the motivations behind creating Metacat, a metadata solution intended to facilitate the discovery, treatment, and management of Netflix’s data.

Read our previous articles on Google and Airbnb.

Netflix’s Key Figures

Netflix has come a long way since its days as a DVD rental company in the 1990s. Video consumption on Netflix accounts for 15% of global internet traffic. But Netflix today is also:

  • 130 million paying subscribers worldwide (400% increase since 2011).
  • $10 billion turnover, including $403 million in profits.
  • $100 billion market capitalization, or the sum of all the leading television groups in Europe.
  • $6 billion investment in original creations (TV shows and movies).

Netflix also runs a data warehouse of 60 petabytes (60 million billion bytes); exploiting and federating this data is a real challenge for the firm.

Netflix’s Big Data Platform Architecture

Its basic architecture includes three key services. These are the Execution Service (Genie), the Metadata Service (Metacat), and the Event Service (Microbot).

Metacat was born out of the need to operate across Netflix’s different languages and data sources, which are not very compatible with each other. The tool acts as a data and metadata access layer for Netflix’s data sources: a centralized service, accessible to any data user, that facilitates data discovery, processing, and management.

Metacat and its Features

Netflix uses query tools, such as Hive, Pig, and Spark, that are not interoperable. By introducing a common abstraction layer, Netflix can provide data access to its users regardless of the underlying storage systems.

In addition, Metacat goes so far as to simplify moving a dataset from one datastore to another.

Business Metadata

Hand-written, user-defined, business-oriented metadata in free format can be added via Metacat. It mainly covers the connections, configurations, metrics, and life cycle of each dataset.

Data Discovery

By creating Metacat, Netflix makes it easy for consumers to find business datasets. The tool publishes schema and business metadata defined by its users in Elasticsearch, which makes full-text search across its data sources easier.

Data Modification and Audit

As a cross-functional tool for all data stores, Metacat registers and notifies all changes made to the metadata and the data itself from its storage systems.

Metacat and the Future of Netflix

According to Netflix, the current version of Metacat is a step towards the new features they are working on. They still want to improve the visualization of their metadata, as it would be very useful for restoration purposes.

According to Netflix, Metacat should also get a plug-in architecture so that the tool can validate and maintain all of its metadata. Because users define metadata in free form, Netflix needs to put in place a validation process that runs before the metadata is stored.

As a centralizing tool for multi-source and multi-format data, Netflix’s Metacat has clearly made progress.

This in-house service was developed to fit all the tools used by the company, allowing Netflix to become data-driven.

Data Intelligence

Metadata Management: A Trending Topic in the Data Community

Actian Corporation

March 28, 2019

On the 4th, 5th, and 6th of March, Actian had the opportunity to attend the famous Data & Analytics Summit in London, organized by Gartner. This is an indispensable and inspiring event for Chief Data Officers and their teams in the implementation of their data strategy.

This article outlines many concepts from the conference session “Metadata Management is a Must-Have Discipline” by Alan Dayley, Gartner Analyst. This subject has attracted the attention of many C-level executives, confirming that metadata management is a top priority for the years, even months, to come.

The Concept of Metadata Applied to Our Daily Lives

To introduce the concept of metadata, the speaker made an analogy to a situation that is known to all of us and that is becoming more and more important in our daily lives: identifying and selecting what we eat.

Take the example of a meal composed of many different ingredients that have been significantly modified. It’s thanks to the different labels, pricing schemes, and descriptions on a product’s packaging that consumers can identify what they have on their plates.

This information is what we call metadata.

How Does Metadata Bring Value to an Enterprise?

Applying metadata to data allows the enterprise to contextualize its data assets. Metadata addresses different subjects gathered within four different categories:

  • Data Trust
  • Regulations & Privacy
  • Data Security
  • Data Quality

The implementation of a metadata management strategy depends on finding the balance between the identified business needs within the company and the regulations associated with data risks.

In other words, where should you invest your time and money? Should you democratize data access for your data teams (data scientists, data engineers, data analysts or data experts) to increase productivity, or concentrate on the demands of regulations such as the GDPR to avoid a hefty fine?

The answer to these questions is specific to each enterprise. Nevertheless, Alan Dayley highlights four use cases, identified as top priority cases by CDOs, where metadata management should be the key:

1. Data Governance

In this particular use case, the speaker confirms that data governance can no longer be thought of in a “top-down” manner. Data cuts across different teams and profiles with distinct roles and responsibilities. In light of this, everyone must work together to document and complete the information about their data (its uses, its origin, its processing, etc.). Contextualizing data is a fundamental element in establishing effective and easy data governance!

2. Risk Management and Compliance

The requirements below have been enforced since the arrival of the GDPR. Enterprises and their CDOs must:

  • Define the responsibilities linked to their data sets.
  • Map their data sets.
  • Understand and identify the processing operations on the data and associated risks.
  • Have a processing and/or a data lineage register.

3. Data Analysis

By addressing data governance in a more collaborative way and by favoring interactions between data users, the enterprise will benefit from collective intelligence and continuous improvement in the understanding and analysis of a data set. In other words, it means extracting pertinent information from previous discoveries and experiments for the next data users.

4. Data Value

In the quest for data monetization, data will have no value, so to speak, unless the information around it is:

  • Measured: By its quality, its economic characteristics, etc.
  • Managed: The persons in charge, documentation provided, its updates, etc.

How to Establish Metadata Management?

No matter your enterprise’s objectives, you cannot reach them without metadata management.

To undertake this exercise, we recommend that you:

  • Hire the right sponsor that values a metadata-centric approach in the enterprise.
  • Identify the main use case that you want to treat first (as defined above).
  • Check that the efforts made in terms of metadata are not isolated but are centralized and unified.
  • Select a key metadata management solution on the market, such as a data catalog.
  • Define where, who, and how you will start.

To conclude this article, not having metadata management is like driving on a road with no signs. Be careful not to get lost!

Databases

Customer Buying Behavior to Improve Customer Targeting

Actian Corporation

March 27, 2019

Imagine what you could do if you actually understood your customers’ buying behavior. You could predict future purchases, identify cross-sell and upsell opportunities, and personalize messaging so that it resonates and drives the behavior you desire. Instead of taking broad strokes to your marketing campaigns, you could focus on specific opportunities and pursue them more aggressively. This would be a game changer for your marketing function.

Your Target Audience is Too Big

Most companies today are doing market segmentation based on basic account information and demographics. They are looking for groups of customers based on high-level account-and-behavior metrics that are designed to identify similarities, not differences. The result is a grouping of customers that fit into some conceptual “bucket,” but not necessarily a set of high-potential leads.

This is because it is each customer’s unique challenges that lead to highly motivated buying behavior, not general characteristics. For example, a grouping might be “females living in a certain geographical region with no children and an income between $50k and $100k/year.” The metrics used for targeting are accurate, but they may not be meaningful to identify customers likely to purchase the product you are offering during the next 3–6 months.

The marketing campaign you develop based on the group demographics is likely to be both expensive (due to the size of the target audience) and generate poor results (because the audience is too broad). You are effectively gambling with your marketing investment, hoping to be lucky and see a return.

Focus on Specific Customer Needs

True customer engagement is built on a deep understanding of specific needs and wants, which then leads to more satisfied customers and longer-lasting relationships, increasing revenue and wallet share for your business. If you can narrow your target audience by analyzing a richer and more diverse set of reference data, then you will be able to focus on a smaller target-market segment that is more likely to make a purchase.

With a smaller and more focused audience, you can create a better customer experience with targeted offers, appropriate responses, and effective dialogue.

Using Data to Understand Why Customers Behave the Way They Do

The key to understanding customer buying behavior is analyzing cause-and-effect relationships to determine what situations, characteristics or influences lead a customer to perform a specific action – statisticians refer to this as correlation analysis. Effective correlation analysis requires a diverse dataset and is most effective with a large amount of historical data.
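As a small illustration of what correlation analysis measures, the sketch below computes the Pearson correlation coefficient between two series in plain Python. The data is invented for the example (monthly site visits and monthly purchases for eight hypothetical customers), and this is a generic textbook formula, not Actian’s method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one value per customer.
visits = [1, 2, 3, 4, 5, 6, 7, 8]
purchases = [0, 1, 1, 2, 2, 3, 5, 6]

print(round(pearson(visits, purchases), 3))
```

A value close to 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. A strong correlation on its own does not prove that one factor causes the other, which is one reason a diverse dataset matters: it lets you check whether a relationship holds across many situations.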

Unfortunately, most companies don’t have the right systems to process large volumes of historical data and diverse data sets, so they instead work with samples. While sampling techniques work fine to identify averages and commonalities, you actually need to process the entire population of data for effective correlation analysis and to understand the causes of behaviors. This is where Actian Vector can help.

Better Tools Lead to Better Results

With Actian, you can connect and mine all your data, including big data sources, to obtain a detailed, holistic view of the customer. By including not just their actions (sales history), but also contextual data, such as individual profile demographics, social influencing networks, location data and sentiment (reviews of products/services), Actian can help you uncover relationships between customers and key purchase drivers.

You can use this information to predict the value of each customer according to thousands of customer attributes. You can uncover new segments that your competition has overlooked, and you can engage in more meaningful customer conversations that generate higher returns on your marketing investment. To learn more, visit www.actian.com/databases/vector/.

Data Intelligence

How Big Data & Machine Learning Contributed to Zalando’s Success

Actian Corporation

March 21, 2019

For the second year in a row, Actian participated in Big Data Paris as a sponsor this past 11th and 12th of March to present the Actian Data Intelligence Platform data catalog.

During the event, we were able to attend many different conferences presented by professionals in the data field: chief data officers, business analysts, data science managers, etc.

Among those conferences, we had the opportunity to attend the Zalando conference, presented by Kshitij Kumar, VP of Data Infrastructure.

Zalando: The Biggest eCommerce Platform in Europe

With more than 2,000 different brands and 300,000 items available, the German online fashion platform has won over 24 million active users in 17 European countries since its creation in 2008 [1].

In 2018, Zalando earned about €5.4 billion, a 20% increase over 2017 [2].

With these positive results, Zalando has high hopes for the future. Its objective is to become the fashion reference:

“We want to become an essential element in the lives of our customers. Only a handful of apps make it to be part of a customer’s life, such as Netflix for television or Spotify for music. We aim to be this one fashion destination where the customer can fulfil all of their fashion needs. [3]” explains David Schneider, co-CEO of Zalando.

But how was Zalando able to become so successful in such a short time? According to Kshitij Kumar, it is a question of data.

Zalando on the Importance of Being a Data-Driven Enterprise

“Everything is based on data,” stated Kshitij Kumar during his talk at Big Data Paris this past March. For 20 minutes, he explained that everything must revolve around data: business intelligence and machine learning are built on the company’s data.

With more than 2,000 technical employees, Zalando has a Big Data infrastructure that spans several categories:

Data Governance

In response to the GDPR, the VP of Data Infrastructure explains the importance of establishing data governance with the help of a data catalog: “It is essential to an organization in order to have safe and secure data.”

A Machine Learning Platform

It’s by exploring, working, curating and observing your data that a machine learning platform can be efficient.

Business Intelligence

It’s by putting into place visual KPIs and trusted datasets that BI can be proactive.

Zalando’s Machine Learning Evolution

Kshitij Kumar reminds us that with machine learning, it is possible to collect data in real time.

In the online fashion industry, there are many use cases: size recommendation, search experience, discounts, delivery time, etc.

Interesting questions were then brought up: How can you know exactly what a customer’s taste is? How can you know exactly what they might want?

Kumar answers that the way to find out is by repeatedly testing your data:

“Data needs to be first explored, then trained, deployed and monitored in order for it to be qualified. The most important step is the monitoring process. If it is not successful, then you must start the machine learning process again until it is.”
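The loop Kumar describes (explore, train, deploy, monitor, and start again if monitoring fails) can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the function names, the toy "model" (a single threshold equal to the training mean) and the tolerance value are assumptions, not Zalando's actual platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    """Toy stand-in for a trained model (hypothetical)."""
    threshold: float
    version: int


def explore(data: List[float]) -> float:
    """Exploration step: summarise the data, here simply by its mean."""
    return sum(data) / len(data)


def train(data: List[float], version: int) -> Model:
    """Training step: the toy model just remembers the training mean."""
    return Model(threshold=explore(data), version=version)


def monitor(model: Model, live_data: List[float], tolerance: float) -> bool:
    """Monitoring step: pass if live data stays close to what the model learned."""
    live_mean = sum(live_data) / len(live_data)
    return abs(live_mean - model.threshold) <= tolerance


def run_lifecycle(
    training_batches: List[List[float]],
    live_data: List[float],
    tolerance: float = 0.5,
) -> Optional[Model]:
    """Repeat explore/train/deploy/monitor until monitoring succeeds."""
    for version, batch in enumerate(training_batches, start=1):
        model = train(batch, version)  # explore + train
        # "Deployment" is implicit in this sketch: the model is put in
        # front of live data and immediately monitored.
        if monitor(model, live_data, tolerance):
            return model  # monitoring succeeded, keep this model
        # Monitoring failed: start the process again with the next batch.
    return None  # no batch produced a model that passed monitoring


if __name__ == "__main__":
    batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [9.0, 10.0, 11.0]]
    live = [5.0, 5.0, 5.5]
    print(run_lifecycle(batches, live))
```

In this sketch the first model fails monitoring because its training data no longer resembles the live data, so the process restarts with the next batch, mirroring the "start again until it is successful" step in the quote.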

Another asset in Zalando’s data strategy is its return policy. Customers have 100 days to send their items back. These returns give Zalando additional data, which helps it target its customers better.

Zalando’s Future

Kshitij Kumar tells us that by 2020, he hopes to have an evolved data structure.

“In 2020, I envision Zalando to have a software or program that allows any user to be able to search, identify and understand data. The first step in being able to centralize your data is by having a data catalog for example. With this, our data community can grow through internal and external (vendors) communication.”

Sources

[1] “L’allemand Zalando veut habiller l’Europe – JDD.” 18 Oct. 2018, https://www.lejdd.fr/Economie/lallemand-zalando-veuthabiller-leurope-3779498.

[2] “Zalando veut devenir la référence dans le domaine de la mode ….” 1 Mar. 2019, http://www.gondola.be/fr/news/non-food/zalando-veut-devenir-la-reference-dans-le-domaine-de-la-mode.

[3] “Zalando Back in Style as It Bids to Be Netflix of Fashion – The New ….” 28 Feb. 2019, https://www.nytimes.com/reuters/2019/02/28/business/28reuters-zalando-results.html.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale, streamlining complex data environments and accelerating the delivery of AI-ready data. The Actian data intelligence approach combines data discovery, metadata management, and federated governance to enable smarter data usage and enhance compliance. With intuitive self-service capabilities, business and technical users can find, understand, and trust data assets across cloud, hybrid, and on-premises environments. Actian delivers flexible data management solutions to 42 million users at Fortune 100 companies and other enterprises worldwide, while maintaining a 95% customer satisfaction score.