Data Analytics

Affinity Analytics Using Actian Data Platform

Mary Schulte

October 29, 2021


Affinity analytics is the practice of finding relationships and patterns in data. Businesses can put the results of affinity analytics to many positive uses. Here are just two examples from real customer use cases. First, in retail, management wants to know which products typically sell well together for product placement and advertising purposes. This information is critical to successfully upselling additional products. Second, telecommunications providers need to study network traffic data to understand routing patterns and make the best use of equipment and network topology. Like these use cases, your business likely has occurrences of data affinity that you can harness to make better business decisions. Actian provides the data warehouse platform to help you do it.

Despite being clearly useful, affinity is difficult to find in traditional data warehouses because it involves executing one of the most difficult, resource-intensive SQL statements known: the fact-table self-join (also known as a “market-basket” query). This query is difficult because data warehouse “fact” tables often contain billions of rows (as mine does here), and joining billions of rows back to themselves to find affinity takes a lot of processing power. In fact, some platforms can’t do it at all, or it takes so long that the result isn’t usable. That is where the power of the Actian Data Warehouse shines.

In this blog, I discuss how to successfully achieve affinity analytics using solely the built-in functionality of the Actian Data Warehouse, with no other tooling required!

Actian provides industry-leading cloud analytics, purpose-built for high performance. What I will show here is that Actian natively provides the necessary tooling for SQL analytics, allowing you to achieve things like affinity analytics without having to embark on giant, expensive projects involving additional third-party tooling.

Here is My Scenario:

I have a retail data warehouse. Marketing wants to plan an outreach mail campaign to promote sales of products that typically sell well with the store’s best-selling products. In particular, they want to mail coupons to customers who have NOT bought the products that are normally bought together, but HAVE purchased at least one of the best-selling products. They would like me to provide data to support this campaign.

My Analytics Process Will Be As Follows:

  1. Investigate the data.
  2. Find best-selling products (A).
  3. Find products commonly sold with top products (B).
  4. Find the customer population who bought A but not B.
  5. Provide appropriate information to marketing.

For this blog, I have created an 8 AU (Actian Unit) warehouse in the Google Cloud Platform. An Actian Unit is a measure of cloud computing power that can be scaled up or down. See Figure 1.

Figure 1: Avalanche console warehouse definition

My Actian database has a typical retail schema, but for this blog, I will just focus on four tables.  See Figure 2.

Figure 2: Retail ER diagram

I have used a data generator to generate a large amount of data, but I’ve added some artificially superimposed patterns to make this blog more interesting. My tables have the following number of rows in them:

Table           Rows
customer       5,182,631
order      1,421,706,929
lineitem  45,622,951,425
product           16,424


I can now use the tools provided in the Actian console Query Editor to execute my analytics process. You can find the Query Editor in the top right corner of the warehouse definition page. I have circled it in blue in Figure 1.

For all the queries in this blog, I performed the following sequence: I put my query into the query editor pane (1), formatted the query (optional) (2), then executed the query (3), then saved the query (4) for future reference. See sequence layout in Figure 3. Notice that you can also see the layout of my entire schema (red circle) in the Query Editor.

Figure 3: Query Editor layout

Investigate the Data

First, I want to understand my data by executing a few interesting queries.

I want to understand what months of data are in my Actian warehouse and get some overall numbers. (Note: this blog was authored in early 2021.) I execute this query:

Figure 4: Line item statistics

Because of the speed of Actian, in just a few seconds, I gleaned some valuable information from my warehouse. It looks like I have five years’ worth of data including over 45 billion line items sold, showing an average sale of $625. That’s terrific! See Figure 4.
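The actual statistics query is shown in Figure 4, but its general shape is roughly the sketch below. All table and column names here (order_date, price, and so on) are assumptions, since my real schema isn’t reproduced in this post; note that order is a reserved word in SQL, so I quote it as an identifier.

```sql
-- Overall line-item statistics (names assumed): months of data on hand,
-- total line items sold, and the average sale amount.
SELECT MIN(o.order_date) AS first_order,
       MAX(o.order_date) AS last_order,
       COUNT(*)          AS total_lineitems,
       AVG(l.price)      AS avg_sale
FROM   lineitem l
JOIN   "order" o ON o.order_id = l.order_id;
```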

Also, I would like to see trended sales by month. I execute this query:

Figure 5: Trended sales

This query also finished in just a few seconds, but with all these big numbers, it’s a little hard to grasp their relative values. It will be helpful to make a chart using the Actian Query Editor’s charting function.
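For readers who want the shape of the trend query in Figure 5, a hedged sketch follows. Column names are assumptions, and DATE_PART is one common spelling; your SQL dialect may prefer YEAR() and MONTH() instead.

```sql
-- Total sales per month across the whole warehouse (names assumed).
SELECT DATE_PART('year',  o.order_date) AS sales_year,
       DATE_PART('month', o.order_date) AS sales_month,
       SUM(l.price)                     AS total_sales
FROM   lineitem l
JOIN   "order" o ON o.order_id = l.order_id
GROUP  BY 1, 2
ORDER  BY 1, 2;
```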

I’ve used the charting function (see Figure 6) to create a bar chart. I’m essentially running the same query, but I’ve simplified it and limited the output to just last year. It’s easy to see now that my sales really accelerated around Christmas. I’ve shown how I configured this chart in Figure 7.

Figure 6: Trended sales with chart
Figure 7: Chart configuration

Find Best-selling Products (A)

Now that I understand my data, I execute this query to find the best-selling product categories by spend in the last year:

Figure 8: Top categories by spend

In just a few seconds, I learn that Clothing and Electronics were my best-selling product categories overall. I know that marketing always likes to work with Electronics, so I’m going to concentrate there.
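The category ranking in Figure 8 boils down to a join and a GROUP BY. Here is a rough sketch; the table and column names, and the date window, are assumptions made for illustration.

```sql
-- Spend per product category over the last year (names assumed).
SELECT p.category,
       SUM(l.price) AS total_spend
FROM   lineitem l
JOIN   "order"  o ON o.order_id   = l.order_id
JOIN   product  p ON p.product_id = l.product_id
WHERE  o.order_date >= DATE '2020-10-01'   -- illustrative "last year" window
GROUP  BY p.category
ORDER  BY total_spend DESC;
```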

Next, I want to find the top-selling products in Electronics last year. I execute this query:

Figure 9: Top products in Electronics

Again, because of the speed of Actian, in a few seconds, I learn that many of the top products in my Electronics category are Canon products.  See Figure 9.
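The top-products query in Figure 9 narrows the previous sketch to a single category. Again, all names and the date window below are assumptions.

```sql
-- Top-selling products within Electronics over the last year (names assumed).
SELECT p.product_name,
       SUM(l.price) AS total_spend
FROM   lineitem l
JOIN   "order"  o ON o.order_id   = l.order_id
JOIN   product  p ON p.product_id = l.product_id
WHERE  p.category = 'Electronics'
  AND  o.order_date >= DATE '2020-10-01'   -- illustrative window
GROUP  BY p.product_name
ORDER  BY total_spend DESC;
```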

Find Products Commonly Sold with Top Products (B)

Now I want to find the Electronics products that are most often sold with these top-selling Canon products in the last six months. This is the resource-intensive market-basket query that I referred to in my introduction. To find which items are typically bought together, this query will join my 45 billion line items back to the same 45 billion line items. I execute this query:

Figure 10: Market-basket query

This query is much more complex than the previous ones; still, it took a mere 17 seconds to execute in Actian. It is obvious from this query that Canon customers often buy SDHC Memory Cards of different types. That seems logical, of course, but I have now proven it with analytics.
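For reference, the market-basket query in Figure 10 has roughly the shape sketched below. The self-join of lineitem to itself on order_id is what makes it so expensive at 45 billion rows. All table and column names (brand, category, the date window) are assumptions for illustration.

```sql
-- Market-basket self-join (names assumed): for each order containing a
-- top-selling Canon product, count which other Electronics products
-- appear on the same order in the last six months.
SELECT p2.product_name AS companion_product,
       COUNT(*)        AS times_bought_together
FROM   lineitem l1
JOIN   lineitem l2 ON l2.order_id   = l1.order_id      -- the self-join
                  AND l2.product_id <> l1.product_id
JOIN   "order"  o  ON o.order_id    = l1.order_id
JOIN   product  p1 ON p1.product_id = l1.product_id
JOIN   product  p2 ON p2.product_id = l2.product_id
WHERE  p1.brand = 'Canon'
  AND  p2.category = 'Electronics'
  AND  o.order_date >= DATE '2021-04-01'               -- last six months
GROUP  BY p2.product_name
ORDER  BY times_bought_together DESC;
```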

Find the Customer Population Who Bought A But Not B

Now I need to find the names and addresses of customers who have NOT bought memory cards. This is basically a reverse market-basket query. Actian will join the 45-billion-row lineitem table back to itself, this time to find missing relationships: customers who have not bought memory cards. It then needs to join the lineitem and order information back to the customer table to get the corresponding name and address information. Also, I need to make sure I don’t send duplicate mailings to any customer who may have bought multiple Canon products, so I have added the DISTINCT keyword to my SQL. I execute the query below. Once it is finished, I choose the .csv download option to create an output file. See the red circles in Figure 11.

Figure 11: Reverse market-basket.  No affinity.
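A hedged sketch of the reverse market-basket logic follows. Here I express the “did not buy” condition with a NOT EXISTS anti-join, which is one common way to write this kind of query; the actual query in Figure 11 may be phrased differently, and every table and column name below is an assumption.

```sql
-- Customers who bought a Canon product but never bought a memory card
-- (names assumed). DISTINCT prevents duplicate mailings.
SELECT DISTINCT c.name, c.address
FROM   customer c
JOIN   "order"  o ON o.customer_id = c.customer_id
JOIN   lineitem l ON l.order_id    = o.order_id
JOIN   product  p ON p.product_id  = l.product_id
WHERE  p.brand = 'Canon'
  AND  NOT EXISTS (
         SELECT 1
         FROM   "order"  o2
         JOIN   lineitem l2 ON l2.order_id   = o2.order_id
         JOIN   product  p2 ON p2.product_id = l2.product_id
         WHERE  o2.customer_id = c.customer_id
           AND  p2.product_name LIKE '%SDHC Memory Card%'
       );
```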

Provide Appropriate Information to Marketing

I can now easily mail the .csv file of prospective customers to marketing so they can send out their mail campaign.

Figure 12: Email with target list

In conclusion, the Actian Data Warehouse is a very powerful cloud data warehouse platform that also includes the basic tools and speed you need to be productive with affinity analytics in your business.



About Mary Schulte

Mary Schulte is Senior Sales Engineer at Actian, drawing upon decades of experience with powerhouse database vendors like Informix and Netezza. She has written thousands of lines of Informix 4GL and ESQL/C for global clients, including American Airlines' groundbreaking datablade implementation. Mary has delivered countless training sessions, helping organizations optimize their database environments. Her posts on the Actian blog center on query performance, analytics databases like Vector, and practical tips for leveraging Informix. Browse her articles for expert guidance.
Data Architecture

Data Warehouse Best Practices

Teresa Wingfield

October 26, 2021


In every industry, the need to follow best practices exists. Data warehouse best practices are no exception. Best practices are methods or techniques accepted as a good way or best way to accomplish an activity, process, or practice. All practices evolve, but the best way to start is with a foundation of best practices and then adapt those practices to meet the specific needs of an organization. Organizations that continually evolve their best practices based on industry, customer, and internal feedback will create unique best practices resulting in a strategic, tactical, or operational advantage over a similar organization serving the same markets.

Best practices enable assets, capabilities, and resources to deliver value to the organization, stakeholders, and customers. A data warehouse can be a strategic resource for any organization. Developing a data warehouse practice into a unique capability requires aligning the data warehouse with the organizational objectives it supports.

Data Warehouse Best Practices

Data within an organization is sometimes not leveraged as much as it can be. Many organizations find themselves making decisions using best effort or expert opinions in most cases. These decisions can become more powerful and meaningful when backed with intelligent data, information, and knowledge relative to the needs of data consumers. To do this, organizations have to work as a team and remove as many silos as possible related to the services and products they deliver and support. Data exchanges between customers and all the functional units in the organization help make this happen.

Organizations rely on many best practices in various functions to perform as efficiently as possible. There are best practices for managing people, methods, processes, and technologies. Listed below are several best practices and data warehouse considerations that should be adopted within an organization to help enable value from a data warehouse:

  • Identify what decisions need to be made within each functional unit of the organization and how data supports their conclusions. Data should have a purpose. Data collected that does not have a goal is a waste of the organization’s precious resources. The organization must be efficient and effective with data collection, including exchanging data between functional units, transforming data into information, and then shifting it into knowledge for decision support.
  • Create models. Service models, product models, financial models, and process models help organizations understand data exchanges and the data needed by different stakeholders to define and architect the data warehouse data model. The data model helps the organization understand the value chains between different data consumers and how data should be presented.
  • Understand decisions that need to be made by each consumer of the data in the data warehouse. Analyze and understand data needs for each consumer of the data.
  • Decide governance, risk, and compliance (GRC) policies, processes, and procedures. Managing data is very important to any organization and should be treated with utmost care and responsibility. Data activities within the organization have to operate efficiently, effectively, and economically to avoid risk and resource waste.
  • Decide the type of initial data warehouse. Decide whether a 1-, 2-, or 3-tier architecture is the best initial approach for your data warehouse. Remember, data warehouses support analytical processing and are not suitable for transactional processing.
  • Decide if the data warehouse should be on-premises, cloud or hybrid. This includes understanding the budget available for the overall initial program/project and its impact on the decision.
  • Decide initial sources for input into a data warehouse. Remember, data sources can grow over time as the needs of the organization grow. It is essential to make sure adding new data sources is easy to accomplish.
  • Create a project plan to break up the delivery approach into manageable units for showing value quickly using the data warehouse. Don’t try to be perfect with a never-ending project or try a big-bang approach. Show value as soon as possible. Be agile, get good enough and plan for continuous improvement.
  • Decide data warehouse needs for availability, capacity, security, and continuity. The data warehouse has to be available when needed, have enough capacity to support demand, and be secure, maintaining confidentiality and integrity while remaining accessible to those who need it. For continuity, the data warehouse should be included in business impact analysis and risk assessment planning. Usability and performance are also considerations for data warehouse warranties to its consumers.
  • Decide how often data needs to be loaded and reconciled, based on timeliness and relevance of data change, from data warehouse sources for decisions. Use Extract, Transform and Load (ETL) to help migrate data between sources and destinations. Data warehouse staging is a best practice to help stage data on a regular schedule for data warehouse decision needs.
  • Set up data for reporting, analytics, and business intelligence. Data warehouse reporting best practices have to be enabled for ease of use by data consumers. Consumers should be able to create dynamic reports from the data warehouse quickly and easily.
  • Follow agile best practices for change, release, and deployment management to help reduce risks and increase knowledge transfer. These best practices should integrate and align with other best practices in the organization.
  • Make sure to hire experienced people who are experts in data warehouse planning, design, and implementation. Bringing the right team together is one of the most important best practices for data warehouse design, development and implementation. No matter how good the technology is, the overall results will be disappointing without the right people. Project managers, business analysts, data analysts, data engineers, data architects, security analysts, and knowledge managers are key roles that can help with a successful data warehouse.
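As a small illustration of the staging best practice mentioned in the load-and-reconcile bullet above, a staged load step might look roughly like this sketch. All table and column names here are hypothetical.

```sql
-- Staged load (illustrative names): land raw rows in a staging table,
-- transform and filter them into the warehouse fact table, then reset.
INSERT INTO sales_fact (order_id, product_id, sale_date, amount)
SELECT s.order_id,
       s.product_id,
       CAST(s.sale_date AS DATE),
       CAST(s.amount    AS DECIMAL(12, 2))
FROM   stage_sales s
WHERE  s.amount IS NOT NULL;      -- basic data quality gate

TRUNCATE TABLE stage_sales;       -- empty staging for the next batch
```

Running the transform against a staging table on a regular schedule keeps load logic separate from the warehouse tables that consumers query.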

Best practices in business intelligence and data warehousing go hand in hand. The better the data warehouse technical infrastructure, the better the organization can collect, store, analyze, and present the data for consumer intelligence. Organizations have to be careful of insufficient data quality resulting in bad data for business intelligence. The data warehouse should easily support tools or applications that need the data for business intelligence. Reporting, data mining, process analysis, performance benchmarking, and analytical tools support business intelligence and should be quickly implemented without creating homegrown solutions for the data warehouse.

In Summary

This blog has discussed many data warehouse best practices. Depending on the organization and challenges they have experienced, more best practices can be added to the ones listed above. Best practices can come from anywhere in the organization based on experiences, challenges, and the overall market dynamics that the organization faces. Data warehouses and enterprise data hubs are fast becoming a strategic component for many organizations. Since a data warehouse is a large project that will mature over time, it should become a major program in the organization. Data is the blood that runs through the organization; this will not change. Data management will advance with emerging technologies, but the purpose will remain to help the organization make better informed and more timely decisions. Make a plan to start or improve your data warehouse outcomes by using best practices and selecting the right partners, technologies, and software to aid your journey.

Actian combines one of the industry’s fastest hybrid-cloud data warehouses with self-service data integration in the cloud to create better customer insights. With an easy, drag-and-drop interface, Actian Data Platform empowers anyone in the organization – from data scientists to citizen integrators – to easily combine, clean, and analyze customer data from any source, in any location.


About Teresa Wingfield

Teresa Wingfield is Director of Product Marketing at Actian, driving awareness of the Actian Data Platform's integration, management, and analytics capabilities. She brings 20+ years in analytics, security, and cloud solutions marketing at industry leaders such as Cisco, McAfee, and VMware. Teresa focuses on helping customers achieve new levels of innovation and revenue with data. On the Actian blog, Teresa highlights the value of analytics-driven solutions in multiple verticals. Check her posts for real-world transformation stories.
Data Analytics

What is an Enterprise Data Hub?

Actian Corporation

October 25, 2021


When managing big data, organizations will find that there are many consumers of the vast amounts of data, ranging from applications and data repositories to humans using various analytics and reporting tools. After all, the data is an expression of the enterprise, and with digital transformation, that enterprise is increasingly expressed in the form of applications, data, and services delivered. Structured and unstructured data in various formats becomes the source and destination of exchanges between functional units in the organization; these exchanges are no longer handled only manually or with middleware but can now be hosted collaboratively using data lakes, data warehouses, and enterprise data hub technologies.

The choice of which data management solution to use depends on the organization’s needs, capabilities, and set of use cases. In many organizations, particularly large or complex ones, there is a need for all three technologies. Organizations would benefit from understanding each solution and how it can add value to the business, including how each solution can mature into a more comprehensive, higher-performing solution for the entire organization.

What is an Enterprise Data Hub?

An enterprise data hub helps organizations manage data that is directly involved (“in-line”) with various business processes. This sets it apart from data warehouses and data lakes, which are more likely to be used to analyze data before or after its use by various applications. Organizations can better govern data consumption by applications across the enterprise by passing it through an enterprise data hub. Data lakes, data warehouses, legacy databases, and other sources such as enterprise reporting systems can all contribute to the governed data the business needs.

Besides data governance protection, an enterprise data hub also has the following features:

  • Ability to make use of search engines for enterprise data. The enablement of search engines acts as filters to allow quick access to the enormous amounts of data available with an enterprise data hub.
  • Data indexing to enable faster searches of data.
  • Data harmonization enhances the quality and relevance of data for each consumer of data, including improving the transformation of data into information and information into knowledge for decision-making.
  • Data integrity, removing duplication, errors, and other data quality issues to improve and optimize the data’s use by applications.
  • Stream processing binds applications with data analytics, including simplifying data relationships within the enterprise data hub.
  • Data exploration increases the understanding and ease of navigating the vast amount of data in the data hub.
  • Improved batch, artificial intelligence, and machine learning processing of data, thanks to the features listed above.
  • Data storage consolidation from many different data sources.
  • Direct consumer usage or application usage for further processing or immediate business decisions.

Enterprise data hubs can support the rapid growth of data usage in an organization. The flexibility in using multiple and disparate data sources is a massive benefit of selecting a data hub. Leveraging the features mentioned above increases this benefit.

Difference Between Enterprise Data Hub, Data Lake, and Data Warehouse

Data lakes are centralized repositories of unorganized structured and unstructured data, with no governance and no specifications for organizational needs. The primary purpose of a data lake is to store data for later usage, though many data lakes have developer tools that support mining the data for various forward-looking research projects.

A data warehouse organizes the stored data in a prescribed fashion for everyday operational uses, unlike a data lake. Data warehouses can be multitiered to stage data, transform data and reconcile data for usage in data marts for various applications and consumers of the data. A data warehouse is not as optimized for transactional day-to-day business needs as an enterprise data hub.

In addition to drawing data from and pushing data to various enterprise applications, an enterprise data hub can use a data lake, data warehouse, and other data sources as input into, or destinations from, the data hub. Once all the data is available to the hub, the aforementioned features, such as governance, can be applied to it. An enterprise data hub is easily differentiated from a data lake by the hub’s additional capabilities for processing and enriching the enterprise data. The comparison with a data warehouse can be more confusing, but the data hub adds capabilities for using the data in business process-oriented operations rather than business analytics-oriented ones.

Enterprise Data Hub Architecture

The following diagram shows a data hub architecture that includes multiple data sources, the hub itself, and the data consumers.

Enterprise Data Hub

The enterprise data hub architecture is designed for the most current needs of organizations. The architecture itself can grow to accommodate other data management needs, such as the usage of data in emerging technologies for decision support and business intelligence.

In Summary

With the increasing adoption of disparate data and Big Data practices, enterprise data hubs are becoming the architectures used to create a unified, integrated data system that enables better business processes across the enterprise. An enterprise data hub can utilize data from any source and of any type to create a single source of data truth about the organization’s customers, services, and products. This single source of truth can be used collaboratively across the organization to share data for timely, higher-performing business operations, automation, and decision-making.

Organizations with data hubs and supporting data sources can become more competitive than those that do not. Data is the lifeblood of the organization that enables optimized and automated business processes and decision support for organizations to make better decisions. This capability is well worth the time and investment for the organization.

Actian can help you with your cloud data integration challenges. Actian DataConnect is a hybrid integration solution that enables you to quickly and easily design, deploy, and manage integrations on-premises, in the cloud, or hybrid environments.


About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.