Data Engineering

ML UDFs in Actian Data Platform, VectorH, and Vector – Part 1

Actian Corporation

August 6, 2020

Recently in Actian Data Platform, VectorH 6.0, and Vector 6.0, Actian introduced support for scalar user-defined functions (UDFs). This has given Actian Data Platform, VectorH, and Vector a new dimension: running Machine Learning (ML) models in Python and JavaScript within the database. More about UDFs can be found in our documentation.

Creating a model is straightforward thanks to the many available libraries, such as Spark, TensorFlow, and Python scikit-learn (SKlearn), the most commonly used of these. Once a production-grade model is created, it needs to be deployed into production. Here Actian Data Platform, Vector, and VectorH have an advantage: these models can be deployed directly in the database, and therefore the models can be used to score data directly within the database.

To demonstrate this, we used Python SKlearn to train the model. The focus of this blog is to demonstrate how a UDF would work in the context of deploying a machine-learning model.

We found two very interesting projects: sklearn-porter, which transpiles a model to JavaScript, and m2cgen, which can transpile a model to both JavaScript and Python. Actian Data Platform, Vector, and VectorH support both JavaScript and Python UDFs, so we chose m2cgen. Since our UDFs are scalar UDFs, we needed to write some additional code so that the m2cgen output returns scalar values.
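To illustrate the scalar requirement: a transpiled multiclass classifier typically returns one score per class, while a scalar UDF has to return a single value. The sketch below is a hypothetical stand-in, not actual m2cgen output. The `score` function and its coefficients are invented for illustration, and `predict_class` is an assumed helper name that reduces the list of scores to one class label.

```python
# Hypothetical stand-in for transpiled model code: one linear score per class.
# The coefficients are made up for illustration; they are not a trained model.
def score(features):
    sepal_len, sepal_wid, petal_len, petal_wid = features
    return [
        0.4 * sepal_len + 1.5 * sepal_wid - 2.2 * petal_len - 1.0 * petal_wid + 9.0,
        0.5 * sepal_len - 0.3 * sepal_wid - 0.2 * petal_len - 0.8 * petal_wid + 2.0,
        -0.9 * sepal_len - 1.2 * sepal_wid + 2.4 * petal_len + 1.8 * petal_wid - 11.0,
    ]


def predict_class(sepal_len, sepal_wid, petal_len, petal_wid):
    """Reduce the per-class scores to a single scalar: the 1-based class label."""
    scores = score([sepal_len, sepal_wid, petal_len, petal_wid])
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1


# Score one row of measurements and get back a single integer label.
label = predict_class(5.1, 3.5, 1.4, 0.2)
```

The wrapper takes the four measurements as separate arguments because a scalar function is called once per row with column values and must hand back exactly one value.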

For showcasing the ML UDF, I chose the Iris dataset. It has just 4 columns and 150 rows, which makes the use case easy to comprehend. I will demonstrate an end-to-end test case that creates the table, loads data in the database, builds the model using data from the database, and finally runs the model inside the database.

Iris Dataset

The Iris dataset is easily available. It can be downloaded from Kaggle: https://www.kaggle.com/uciml/iris/data#

Its fields are ID (int), SepalLengthCm (float), SepalWidthCm (float), PetalLengthCm (float), PetalWidthCm (float), and Species (varchar(20)).

Details About Python Connection with Vector/VectorH

How to make Python ODBC or JDBC connections is discussed in https://www.actian.com/blog/data-warehouse/integrating-python-vector-ingres/. In this tutorial, I will be using ODBC connections.

Connect to DB

import pyodbc as pdb
import pandas as pd
import numpy as np

# Connect through the ODBC DSN and use UTF-8 for all string data
conn = pdb.connect("dsn=Vector6;uid=actian;pwd=passwd")
conn.setdecoding(pdb.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pdb.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
cursor = conn.cursor()

# Create the table that the later COPY and SELECT statements refer to as iris
iristbl = '''create table iris(
id integer,
sepallengthcm float,
sepalwidthcm float,
petallengthcm float,
petalwidthcm float,
species varchar(20))'''
conn.execute(iristbl)
conn.commit()

I have not partitioned the table because the dataset has just 150 rows.

Load Data to DB

The following statement bulk loads the data from the CSV file we downloaded from Kaggle.

query ="COPY iris() VWLOAD FROM '/home/actian/vidisha/datasets_19_420_Iris.csv' with fdelim=',', insertmode ='Bulk' ,header"
conn.execute(query)
conn.commit()

Note: datasets_19_420_Iris.csv is the dataset I downloaded from Kaggle; I used VWLOAD to load the data into the database.
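Before a bulk load, it can help to check that the CSV matches the table layout. The sketch below is one possible check, not part of the original workflow. It reads a small inline sample rather than the real file, and the helper name `check_iris_csv` and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Column names from the field list above, compared case-insensitively.
EXPECTED_COLUMNS = ["id", "sepallengthcm", "sepalwidthcm",
                    "petallengthcm", "petalwidthcm", "species"]


def check_iris_csv(handle):
    """Return the number of data rows, raising ValueError on a layout mismatch."""
    reader = csv.reader(handle)
    header = [name.strip().lower() for name in next(reader)]
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    row_count = 0
    for row in reader:
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(f"data row {row_count + 1} has {len(row)} fields")
        int(row[0])                      # id must parse as an integer
        [float(v) for v in row[1:5]]     # the four measurements must be numeric
        row_count += 1
    return row_count


# Illustrative sample standing in for the downloaded file.
sample = io.StringIO(
    "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
    "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
    "2,4.9,3.0,1.4,0.2,Iris-setosa\n"
)
rows = check_iris_csv(sample)
```

To run the same check against the real file, pass an open file handle instead of the `io.StringIO` sample.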

Building the Model

Classification and prediction are the two most important aspects of machine learning. With the Iris dataset, we will create a simple logistic regression model for Iris classification. The focus here is not on model building but on showing how the model can be run inside the database.

Checking the Data

# A multi-line SQL string needs triple quotes; the query selects all four measurements
sql_case = """select sepallengthcm, sepalwidthcm, petallengthcm, petalwidthcm,
CASE
WHEN species='Iris-setosa' THEN '1'
WHEN species='Iris-versicolor' THEN '2'
ELSE '3'
END as speciesclass
FROM iris"""
# Load the query result into a DataFrame and inspect it
iris_case = pd.read_sql(sql_case, conn)
print(iris_case.shape)
iris_case.info(verbose=True)
iris_case.describe()
iris_case.head(10)
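The CASE expression in the query encodes the species name as a class label. For readers more used to Python than SQL, the same mapping can be sketched as follows; `species_to_class` is a hypothetical helper introduced only for this illustration.

```python
# Mirror of the SQL CASE expression: two named species, everything else is '3'.
SPECIES_CLASS = {"Iris-setosa": "1", "Iris-versicolor": "2"}


def species_to_class(species):
    """Return the class label as a string, like the SQL CASE expression does."""
    return SPECIES_CLASS.get(species, "3")


labels = [species_to_class(s)
          for s in ("Iris-setosa", "Iris-versicolor", "Iris-virginica")]
```

Note that the ELSE branch assigns '3' to any value that is not one of the first two species, so an unexpected or misspelled species name would also end up in class 3.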

Split the Test and Train Data

# A multi-line SQL string needs triple quotes; the query selects all four measurements
sql_case = """select sepallengthcm, sepalwidthcm, petallengthcm, petalwidthcm,
CASE
WHEN species='Iris-setosa' THEN '1'
WHEN species='Iris-versicolor' THEN '2'
ELSE '3'
END as speciesclass
FROM iris"""
# Load the query result into a DataFrame and inspect it
iris_case = pd.read_sql(sql_case, conn)
print(iris_case.shape)
iris_case.info(verbose=True)
iris_case.describe()
iris_case.head(10)
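The code above loads the data but does not show the split itself. A common choice would be scikit-learn's `train_test_split`; the minimal sketch below shows the same idea using only the standard library so that it runs without a database connection. The helper name `split_rows`, the 80/20 ratio, the seed, and the placeholder rows are all assumptions for illustration, not the author's code.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder data: 150 (features, label) pairs shaped like the query result.
rows = [((5.0, 3.0, 1.5, 0.2), str(i % 3 + 1)) for i in range(150)]
train, test = split_rows(rows)
```

With real data, the rows of the `iris_case` DataFrame would take the place of the placeholder list, and the test portion would be held back to evaluate the trained model.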

In the second part of this two-part article, we will go through the steps to create the UDFs in the database.

To learn more about the capabilities of all the Actian products, visit our website.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

How Well Do You Actually Know Your Customers?

Actian Corporation

August 5, 2020

One of a company’s biggest mistakes is assuming it understands its customers’ needs, preferences, and buying behaviors when it is actually just guessing. The better you understand who your customers are, the more successful you will be in developing products and services to serve their needs, packaging them attractively, and communicating your offers with messages that lead to a sale. A great deal of information is available; the challenge is gathering it in a meaningful way that leads you to actionable marketing insights.

Why Developing a 360-Degree Customer Profile is Difficult

Marketers have known for many years that there is much more customer information available than they have had the tools to manage and use effectively. A true 360-degree view of your customers can’t be sourced from a single system; it must be aggregated from many different sources. Traditional CRM and data warehouse systems just couldn’t handle all the data sources available, and developing a complex set of integrations was both costly and time-consuming. With market dynamics and customer preferences changing continuously, the tools just weren’t available to provide the types of insights that were needed.

Your marketing team needs detailed, near-real-time customer-profile analytics that integrate data from multiple channels. Assembling a complete customer profile requires 3 key components:

  1. Access to data from different sources.
  2. A means of collecting and aggregating data in one place.
  3. Analysis tools to convert raw data into actionable insights.

Once you have these components in place, you will be able to generate the insights you need to improve the performance of your marketing efforts. Actian provides a complete solution to enable marketing success! Employ advanced data modeling to de-duplicate, identify common characteristics and create customer clusters that depict the complete picture of your customers’ purchasing habits and preferences.

Imagine What You Could Do With Better Data

Actian helps you develop a more complete and accurate profile of your customers by mining all available sources of information in any format, whether structured or unstructured, from any location or channel. Combine data from your sales transactions, websites, mobile apps, service history, and more with public information about customers, such as social media posts, product reviews, and customers’ networks of friends and family.

By gathering all this information, you will be able to identify the best methods to connect, assess your customers’ predilection to churn, develop offers and marketing messages that will resonate and learn how to personalize their entire customer experience. The result will be winning more business for your company and increasing customer loyalty by showing customers you understand them and want to solve their problems (instead of just pushing your products and hoping for a response).

If your company wants to understand your customers better and target your marketing efforts according to their individual needs, then Actian can help. Actian Data Connect can help you manage and integrate data sources from inside and outside your company and Actian Vector provides the rich analytics solution to transform your data into marketing insights. To learn more about this and other data analysis use-cases, visit actian.com/solutions/customer360-analytics/

Data Integration

The Evolution of Data Integration With the Internet of Things

Actian Corporation

August 4, 2020

Loraine Lawson always does a wonderful job covering the world of integration, and this article that covers the integration challenges and opportunities of the Internet of Things (IoT) is no exception. Besides big data, I see the IoT as one of the most interesting data integration challenges coming down the road. Perhaps it’s time that we thought a bit more about it.

You can think of the IoT as the concept of having traditionally dumb devices, such as thermostats, car ignition systems, and even your power tools, begin to communicate with systems outside of those devices. These systems gather information that the devices produce and then analyze that information to take action or to understand more about the device itself.

For example, my motorcycle gathers and transmits data using devices that I integrated into the bike’s core systems, including ignition, fuel management, transmission, etc. The devices produce and store data that allows me to determine how the bike is performing, and can even spot forthcoming problems, and perhaps take preventive corrective action.

This information can do a few things. It can provide direct feedback to the core systems on the motorcycle to correct problems proactively using pre-directed processes (e.g., send a text if the oil temp is out of range). Or, the data is gathered en masse and analyzed to determine trends, such as data indicating that one of the air sensors is likely to fail in the near future. It’s handy to have these sorts of conversations with your motorcycle, versus the days gone by when you basically reacted to things as they failed.

If you get this, even if you don’t have a motorcycle, you get what IoT is all about. Just substitute an MRI machine, home thermostat, your car, an aircraft engine, or an industrial robot for my motorcycle, and you’ll find that many of the concepts remain the same. Indeed, the data these devices and machines can create and transmit allows us to understand more about how these devices work, and take proactive action to increase the value of using these devices or machines.

So, what about the data integration problem? As Loraine puts it, “IoT devices are becoming little silos of data, which makes it hard to share access.” We need to get good at sharing data with these devices, or else the value that the IoT is supposed to bring us won’t materialize.

The integration issues are much like the integration issues we dealt with back in the old days. IoT devices send data in very different formats, using very different interfaces. In many instances, they communicate with the outside world as an afterthought. Planning, or a lack of planning, for integration shows in how well the devices participate within a data integration approach, and how well they work and play with data integration technology. While device and machine vendors now deliver APIs, those APIs have a tendency to be proprietary. Thus, you end up writing very different interfaces to communicate with the different IoT devices.

As with any data, establishing open data standards is the key to making data produced by devices more consumable. That’s why HyperCat is big news: it holds the promise of getting these devices on the same page when it comes to data communications and integration.

HyperCat is an open standard developed by a UK-based consortium of 40 technology companies. Like other new standards that hit the industry, there is much additional work that needs to be done to get more IoT device providers onboard, and get the standard finished and implemented.  However, this is a step in the right direction.

On a technical level, HyperCat is a catalog of specifications that provides a common way to describe the information stored on data hubs, or devices. HyperCat tells developers what they need to include in the data to make it easy for apps to search and identify it.

To me, HyperCat seems more like a data abstraction layer that provides a common way to view and understand the data. It’s missing things that deal with the complexities of communicating with the different devices, which I think is a much harder problem to solve.

No matter if it’s HyperCat, or some other standard, there needs to be some common sets of data services that IoT devices provide. Until that occurs, those who integrate with these devices will have to use whatever APIs and/or services that the devices are able to provide.  It will be a bit like the 90s, all over again.

In the meantime, it’s prudent to assemble your data integration strategy. Be sure to include IoT devices, and how the use of this technology will enhance your efficiency. For most enterprises, the ability to include devices and machines into the core business processes is something that’s long overdue, and the data integration technology is ready to get you there. Time to get started with IoT.

Actian Life

Introducing Our Actian Interns!

Actian Corporation

July 29, 2020

Each month, we will be featuring a different employee in our Employee Spotlight series. This series was born with the belief that your life experiences make you the unique person you are today and influence what you bring to the amazing culture here at Actian. We have chosen to feature those incredible individuals with a blog post dedicated to them.

Beginning on June 15th, Actian welcomed 8 new interns across teams in Engineering, Finance, DevOps, and Human Resources. We hope to create an environment where our interns can learn, explore their areas of interest, and bring new innovations to the table. Welcome to the Actian family, interns!

JT Kirages

Software engineer intern

JT is a rising Sophomore at the University of Illinois at Urbana-Champaign studying Computer Science and minoring in technology and management. His interest in computer science first began by attending an eight-week course where he was introduced to Java and game design. Little did he know that this program would mark the beginning of his exploration into the tech world. After taking a few more classes in high school, JT became fascinated with the problem-solving aspect of Computer Science and having the ability to find his own creative solution to a technical problem.

During his time at Actian, JT hopes to improve his web design skills and learn how companies work on large projects together – a vital step that plays a huge role in reaching business goals. Although still exploring his options, JT envisions his future self either as a software engineer or managing the business side of a tech company.

In his free time, you can usually catch JT in a game of ultimate frisbee, playing one of the four instruments he knows (guitar, violin, ukulele, and piano!), or getting ready to head to a 6 am practice for his school’s rowing team. Something you might not know about JT is that he is a huge history buff. Growing up, he would watch all the history crash course videos with John Green. Some of his favorite childhood memories involved traveling to different national parks with his family. He has visited Utah, Colorado, and has even taken a road trip to Tennessee to visit different civil war monuments.

 

Victoria Dunkle

DevOps intern

Victoria is currently finishing up her last year at Southwestern University majoring in Computer Science. A few years back, Victoria was a single mother to three children while working full time at a daycare. Today, she is set to graduate with her degree in December, continues to explore different careers within engineering, and recently got married during this internship! Inspired by her father who works in computer engineering, when considering what career path to pursue, she thought, “well I am pretty much just like my dad, so let me try this out too”. Coming into Actian, she had no idea what the world of DevOps would entail. However, after her fourth week here, Victoria feels as if she has finally found her niche. In the future, she would now love a full-time career in DevOps.

Victoria’s favorite part of the internship so far is being able to learn from her peers while still having a level of autonomy in her work. She loves being able to start from the beginning stages of a project, receive guidance from her team, and then obtain a chance to do things on her own.  Not only has she seen self-growth through this process, but she has found it very valuable to see a project evolve from start to finish.

When she is not working on creating or configuring a VM, she can be found playing with her three children or learning to cook a new recipe in the kitchen (she recently learned how to make some bomb tacos!). If there is one quote that resonates with her the most, it would be love never fails, which fits perfectly with her kind, fun, and caring personality.

Sam Nichols

Software Engineer Intern

Sam is an incoming senior at the University of Wisconsin-Madison and is currently pursuing a double major in electrical engineering and computer science. From a young age, Sam had always been intrigued by technology. After taking a couple of computer science classes in high school, he realized that he was really enjoying what he was learning. Although he has never worked with cloud computing or data storage prior to this internship, Sam felt compelled to apply to Actian because he was excited to be in a space that could challenge him and give him an opportunity to grow.

His favorite part of the internship so far has been the welcoming environment that Actian cultivates – despite the internship being fully remote. From day one, he recalls someone reaching out to him shortly after being introduced to the company, greeting him, and letting him know he could reach out if he had any questions. Working on the cutting-edge technology offered by the Actian Data Platform, Sam looks forward to visualizing data in a meaningful way.

During his days off, you can usually find Sam playing ultimate frisbee, collecting unique Arizona tea cans (which is affordable because they are only a dollar!), or competing nationally on his school’s Rocket League team.

Sam can recall his first job offer at 16. No, it wasn’t walking his neighbors’ dog, working as a cashier, or being a camp counselor. Sam’s first job was interning at IBM as a research associate! Although he was one of the youngest people there, it was a great experience because everyone was very supportive of him.  This is why his advice to any new intern would be to spend the first week of your internship reaching out to as many people as you can. Not only will this make you feel comfortable in a new environment, but now you can direct any questions you have to the appropriate subject matter expert.

Rohan Battula

Software Engineer Intern

Rohan is an incoming Sophomore at UCLA majoring in Computer Science.  At ten years old Rohan received a gift that would change his life forever – a Lego Robotics Kit. Through this kit, he was able to program his first robot. Flash forward to a few years later, Rohan would become the mechanical and electrical lead for his high school’s robotics team which ultimately impacted what he wanted to study in college.

Rohan chose to intern at Actian because he was very interested in Actian’s data warehouse, which appealed to him since it is the fastest solution in the market. Wanting to learn about the technologies behind it, he decided to apply for the internship, landing himself a spot on the performance engineering team.

His favorite part of the internship to date has been receiving hands-on experience. Not only does he enjoy the opportunity to dive deep into his work, but he also appreciates working with low-level optimization programming – which is a different way of thinking than he is used to. Rohan would love to continue the software engineering route in the future and eventually aspires to lead and manage a team.

In his free time, you can usually find Rohan driving lengthy hours for food, biking, or browsing online places for the latest streetwear fashion. Rohan’s advice to any new interns would be to not become discouraged when you reach a roadblock in your work. Instead, you should be proactive – reach out to your teammates and ask lots of questions!

Jonny Ng

Finance Intern

Jonny is a rising junior at the University of Michigan studying Business Administration. His first insight into business began by helping run his family’s donut shop (Shoutout, Donut Delight Express). In fact, Jonny first heard about this internship through one of his parents’ loyal customers who happened to work at Actian! Although Jonny was mostly looking for experience in corporate financing, what truly caught his attention was the mentorship aspect of the internship. He wanted to be in a space where someone could break down financial concepts while exploring the day-to-day operations of a finance team. Additionally, being in finance allows you to work cross-functionally, which is something he appreciates.

Jonny’s favorite part of the internship so far has been meeting everyone through the company’s virtual happy hours and coffee chats. Through these events, he enjoys seeing everyone’s different personalities and interacting with people he usually wouldn’t see on a regular basis.

Outside of work, you can find Jonny practicing Brazilian Jiu-Jitsu, reading, or being outdoors, either hiking or shooting some hoops. One life motto that resonates with him is “not to take life too seriously”. He believes that you should work towards your goals but also have fun throughout the process. In the future, whatever career choice Jonny decides to pursue, there is one thing he undoubtedly would like to accomplish – retire his parents.

Jocelyn Liang

Marketing Data Analyst Intern

Jocelyn is a graduate student at Santa Clara University pursuing a Master of Science in Business Analytics. Born and raised in China, she came to America to pursue her master’s degree at 22 years old. She obtained her first master’s degree in accounting at the University of Illinois at Urbana-Champaign. However, after working in the field, she decided that she wanted to change her career path to a role where she could impact the future. This is when she decided to go back to school to pursue a degree as a data analyst. Being fascinated with data, Jocelyn is intrigued by how data analysts can implement their skills to predict what will happen in the future and produce data driven decisions.

During this internship, Jocelyn hopes to learn practical data analytic skills. Although she has worked on marketing analytic projects in college, often, the data used isn’t real-time and is relatively clean. She is excited to get her hands dirty and analyze some real, raw data that can make a difference for the company. Her favorite part of the internship so far is working with people from different departments as it provides her with a sneak peek on how other branches of the company operate.

One of Jocelyn’s favorite movies is The Lion King. She still remembers the first time her parents bought her the DVD – mostly because it was on repeat for what seemed like her entire childhood.  During her spare time, you can usually find Jocelyn playing with her cat Simba — an adorable short hair mix, who recently turned one year old.

Anna Bai

Software Engineer Intern

Anna is an incoming Sophomore at Rice University studying Computer Science. Coming from a family of engineers, Anna became interested in STEM early on in her life. During her four years of high school, she was involved with her school’s robotics team. Specifically, she worked in outreach, where she would organize STEM events for her local elementary schools. Anna soon realized, however, that although her high school was very fortunate to have strong encouragement of STEM education, other high schools in Texas were lacking this. To help resolve this issue, Anna co-founded a STEM advocacy organization. Through this, Anna was able to hold conferences in Austin and talk to legislators about what could be done to expand STEM education across Texas.

Anna decided to apply to Actian because she wanted to challenge herself in other areas of Computer Science, such as back-end development. Throughout her interview process, what attracted Anna the most to this program was the constant emphasis that interns are here to learn. Anna explains how, “as a freshman, trying to find your first internship can be kind of intimidating, so knowing that I wasn’t expected to know everything made me feel much more comfortable”. Anna looks forward to learning from her experienced peers, working in teams, and presenting her final project to the company.

So far, Anna’s favorite part of the internship has been all the intern events. Whether it is a virtual lip sync battle or tiny campfire, they are all super engaging and a nice break from work. One of Anna’s favorite hobbies outside of work is calligraphy, specifically brush pen and watercolor lettering. Not only is it a nice therapeutic activity, but now she can say goodbye to boring lecture notes!

Amy Vides

HR Programs Intern

Now, transitioning to a rather interesting third person point of view…

Amy is an incoming junior at the University of California, Riverside studying Economics/Administrative studies. Born and raised in the Bay Area, Amy was always in awe of the growing number of tech companies around her, though she had always been more attracted to the business side of a company. Amy chose to study economics to learn more about the decision-making process of individuals, firms, and the economy.

Amy was sitting in an introductory business class when she first heard the phrase, “Human Resources Management”. Curious to learn more, Amy was excited to discover that Actian was hiring an intern for the People Team – a unique role within people programs that offered hands-on opportunities to impact Employee Engagement, Diversity & Inclusion, Employer Branding, and Professional Development. This seemed like the perfect opportunity to combine her passion for working with people and her love for technology. Her favorite part of the internship to date has been getting to know everyone on her team and learning about the different facets of HR. Amy is excited to continue to learn how creativity, collaboration, and communication help contribute to the success of a company.

When Amy is not working, you can usually find her getting lost on hiking trails, searching for the best lattes in town, or trying to perfect a new dessert recipe. Amy also enjoys volunteering and giving back to her community. One of her most memorable summers was teaching web development to middle school girls in her area. It was such a rewarding experience that taught her how to develop a curriculum, accommodate different learning styles, and listen to the needs of her students.

Data Intelligence

The Most Used Data Management Solutions in 2020

Actian Corporation

July 27, 2020

It is no secret that, after various articles from Gartner and other famous data and analytics consulting firms, data catalogs are an essential data management solution. Combining artificial intelligence and human skills, Data Catalogs provide a next-generation workspace for data teams to find, understand, and collaborate on their data assets and usages.

In this article, we focus on the most-used data management solutions with which your enterprise can successfully collaborate with your data catalog software. These vendors have been repeatedly quoted by Gartner and used by many enterprises worldwide. We list the 5 main vendors in the following categories:

  • Data Integration
  • Data Preparation
  • Data Visualization
  • Data Governance

Let’s discover this list:

1. Data Integration Vendors

Data integration is the process of combining data from different sources, typically for analysis, business intelligence, reporting, or loading into an application. Data integration tools should be designed to transform, map, and clean data. They can also be integrated with data governance and data quality tools.

The top data integration vendors of 2020 include:

Informatica Data Integration Hub

Informatica’s data integration tools portfolio includes both on-prem and cloud deployments. The vendor combines advanced hybrid integration and governance functionalities with self-service business access for various analytic functions. Informatica touts strong interoperability between its growing list of data management software products.

IBM Infosphere Information Server

IBM offers several distinct data integration tools, also in both on-prem and cloud deployments, for virtually every enterprise use case. Its on-prem data integration suite features tools for traditional and modern integration requirements. IBM also offers a variety of prebuilt functions and connectors. The mega-vendor’s cloud integration product is widely considered one of the best in the marketplace, and additional functionality is coming in the months ahead.

SAS Data Management

SAS is one of the largest independent vendors in the data integration tools market. The provider offers its core capabilities via SAS Data Management, where data integration and quality tools are interwoven. It includes query language support, metadata integration, push-down database processing, and various optimization capabilities.

SAP Data Services

SAP provides on-prem and cloud integration functionality through two main channels. Traditional capabilities are offered through SAP Data Services, a data management platform that provides capabilities for data integration, quality, and cleansing. Integration Platform as a Service features are available through the SAP Cloud Platform.

Oracle Data Integration Cloud Service

Oracle offers a full spectrum of data integration tools for traditional use cases as well as modern ones, in both on-prem and cloud deployments. The company’s product portfolio features technologies and services that allow organizations to manage full-lifecycle data movement and enrichment.

2. Data Preparation Vendors

As defined in our last article on the topic, data preparation is the process of gathering, combining, structuring, and organizing data so it can be analyzed as part of data visualization, analytics, and machine learning applications. In other words, it is the process of cleaning and transforming raw data prior to analysis.

The top data preparation vendors of 2020 include:

Alteryx Designer

Alteryx Designer features an intuitive user interface that enables users to connect and cleanse data from data warehouses, cloud applications, spreadsheets, and other sources. Users can leverage data quality, integration and transformation features as well.

Talend Data Preparation

Talend Data Preparation utilizes machine learning algorithms for standardization, cleansing, and pattern recognition. The product also provides automated recommendations to guide users through the data preparation process.

IBM Watson Analytics

Together with IBM Watson Machine Learning, IBM Watson Studio is a leading data science and machine learning platform built from the ground up for an AI-powered business. It helps enterprises scale data science operations across the lifecycle–simplifying the process of experimentation to deployment, speeding up data exploration and preparation, as well as model development and training.

Tableau Prep

Tableau Prep empowers more people to get to analysis faster by helping them quickly and confidently combine, shape, and clean their data. A direct and visual experience gives customers a deeper understanding of their data and smart features make data preparation simple.

Trifacta

Trifacta has been ranked as the top vendor in every analyst report published on data preparation to date. A self-service data preparation tool, Trifacta empowers all users, technical or non-technical, to clean & prepare their data efficiently.

3. Data Visualization Vendors

Data visualization is defined as a graphical representation of data. It is used to help people understand the context and significance of their information by showing patterns, trends and correlations that may be difficult to interpret in plain text form.

The top data visualization vendors of 2020 include:

Tableau

Tableau is a data visualization tool that can be used by data analysts, scientists, statisticians, etc. to visualize the data and get a clear opinion based on the data analysis. Tableau is known for being able to take in data and produce the required data visualization output in a very short time. And it can do this while providing the highest level of security with a guarantee to handle security issues as soon as they arise or are found by users.

Looker

Looker data visualization can go in-depth into the data and analyze it to obtain useful insights. It provides real-time dashboards for more in-depth analysis so that businesses can make instant decisions based on the data visualizations obtained. Looker also provides connections with Redshift, Snowflake, and BigQuery, as well as more than 50 supported SQL dialects, so you can connect to multiple databases without any issues.

Zoho Analytics

Zoho Analytics helps you create wonderful-looking data visualizations based on your data in a few minutes. You can obtain data from multiple sources and mesh it together to create multidimensional data visualizations that allow you to view your business data across departments. If you have any questions, you can use Zia, a smart assistant created using artificial intelligence, machine learning, and natural language processing.

Sisense

Sisense provides various tools that allow data analysts to simplify complex data and obtain insights for their organization and outsiders. The solution aims to put data analytics tools in the hands of business teams and data analysts so that they can help make their companies the data-driven companies of the future.

IBM Cognos Analytics

IBM Cognos Analytics is an Artificial Intelligence-based business intelligence platform that supports data analytics. You can visualize as well as analyze your data and share actionable insights with anyone in your organization. Even if you have limited or no knowledge about data analytics, you can use IBM Cognos Analytics easily as it interprets the data for you and presents you with actionable insights in plain language.

4. Data Governance Vendors

We like to define data governance as an exercise of authority over decision-making power (planning, surveillance, and enforcement of rules) and the controls on data management.

In other words, it allows the clear documentation of the different roles and responsibilities around data as well as determining the procedures and the tools supporting data management within an organization.

Cloudera Data Platform

Cloudera Data Platform (CDP) combines the best of Hortonworks’ and Cloudera’s technologies to deliver the industry’s first enterprise data cloud. CDP delivers powerful self-service analytics across hybrid and multi-cloud environments, along with sophisticated and granular security and governance policies that IT and data leaders demand.

Stealthbits

Stealthbits’ Data Access Governance solution discovers where your data lives and then classifies, monitors, and remediates the conditions that make managing data access so difficult in the first place. The result is effective governance that promotes security, compliance, and operational efficiency.

Varonis

Varonis gives you the enterprise-wide visibility you need for effective discovery, auditing, and compliance reporting across a wide variety of regulatory standards. It quickly and accurately classifies sensitive, regulated information stored in on-premises and cloud data stores. Its classification engine prioritizes scans based on risk and exposure to give you actionable results quickly, no matter how much data you have.

Informatica

Informatica provides a quick fix for compliance and data governance that can be implemented on-premises or in the cloud. It offers strong visualization of data lineage and history, master data dashboards for proactive monitoring of data quality, and dynamic masking for data security. It also provides the functionality to detect and protect sensitive customer data, manage GDPR data risks, and ensure contact information is current, accurate, and complete.

And Finally…

Actian Data Intelligence Platform

Our data catalog centralizes all data knowledge in a single, easy-to-use interface. Whether metadata is automatically imported, generated, or added by the administrator, data specialists are able to enrich their data asset documentation directly within our tool.

Give meaning to your data thanks to metadata.

If you are interested in getting more information, getting a free personalized demo, or just want to say hi, do not hesitate to contact our team who will get back to you as soon as we’ve received your request.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Platform

Where You Do Analytics Processing Matters

Actian Corporation

July 20, 2020

Analytics Processing

The Vector for Hadoop offering from Actian delivers increased performance for analytic queries without the associated increase in cost. If you are looking for high-performance analytics processing to drive operational decision-making, where you do your processing matters. By minimizing the movement of data and processing locally, you can drastically reduce latency. By using a system like Actian Vector to perform that local processing, you can achieve even higher levels of performance.

On the Box, in the Datacenter or Across the Country

When people hear the statement “where you do your processing matters,” the first thought that comes to mind is network latency. It’s easy to understand how transmitting data over the internet, across the country, or even across town, can slow down your processing. The same holds true within your data center. Co-locating storage and compute (on the same rack or even the same device) decreases processing latency.

Many companies are leveraging cloud services and distributed systems to increase performance for end-user OLTP operations. When it comes time to perform analytics, the distance issue comes into play again. Where should you be doing your analytics processing? For most companies, the cloud is the right place to host your data warehouse and perform analytics compute because it enables you to locate your analytics closer to your data stores and, at the same time, leverage cloud-scale compute resources.

Assuming you’ve addressed these “big distance” issues, is it possible to optimize further? Yes, it is. If big data processing or real-time analytics to drive operations and decision-making are the goals you are trying to achieve, you need to take your analytics performance to the next level and look at how the databases and software you use can be optimized to take maximum advantage of the resource capacity available.

Disk is Slow. Memory is Better. Chip Cache is the Fastest

Let’s take a look at what happens within an analytics system (the hardware and software you use). These systems typically comprise three hardware components that have a direct influence on performance: disks, memory, and chip cache. When you perform compute operations (which are really just a bunch of mathematical formulas), you are manipulating data that is stored in one of these three places. Chips have some internal cache memory, which offers the fastest performance but the smallest capacity. RAM has more capacity (though it is still limited) and is fairly fast, because data is held temporarily rather than written to a physical medium, but it is much slower than chip cache. Disk storage is the slowest because data is written to a physical medium (a disk) and read back from it whenever it needs to be accessed. With cloud storage, the disk capacity available is nearly unlimited.

Data warehouse and analytics systems utilize each of these types of storage along with the compute capacity of the CPUs in different ways. This is what gives Actian Vector a performance advantage over other solutions. Vector optimizes the use of each layer in the system infrastructure, eliminating the wasted capacity to both maximize performance and minimize costs. Here are a couple of examples:

Maximize Utilization of CPU Cores

Modern CPUs have multiple cores, meaning they can execute multiple operations at the same time. Unfortunately, most software (including data warehouse systems) isn’t designed to take advantage of this parallel processing capability, and as a result, you end up utilizing a small portion of the available capacity. The Actian Data Platform and Actian Vector are designed to efficiently run a large number of concurrent queries requested by a large number of users. Queries are split into small chunks so that they can be executed in parallel. This is important because it maximizes the use of the CPU capacity you have available. CPU cycles are time-based capacity. Think of it like the hours in the day you have for work tasks. The challenge is to use your available capacity most efficiently and avoid idle time because once the time has passed, you can never get it back.
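To make the idea of splitting work into chunks more concrete, here is a minimal Python sketch. It is an illustration only, not how Actian Vector is implemented: the function names, the chunk size, and the sample `sales` column are all hypothetical. Each chunk is aggregated independently, and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a column of values into consecutive fixed-size chunks."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """Aggregate one chunk independently of all the others."""
    return sum(chunk)


def parallel_sum(values, chunk_size=1000, workers=4):
    """Sum a column by aggregating chunks concurrently, then combining them."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


# Hypothetical column of 10,000 values, split into 10 chunks of 1,000.
sales = list(range(1, 10001))
total = parallel_sum(sales)
print(total)
```

Note that in standard CPython builds, threads do not run pure-Python arithmetic on several cores at once, so this sketch shows only the structure of the approach (split, aggregate independently, combine). A database engine written in native code can schedule such chunks across real CPU cores.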

Reducing the Amount of Data That is Written to and Read from Disks

Actian solutions are designed for highly efficient use of disks – reducing I/O operations that can slow down analytics processing. Actian Data Platform is a pure columnar database. Traditional databases are row-based – records are in rows, and you have to read the entire row to perform a query and do analytics. Actian treats data as a series of columns – this is what optimizes it for analytics processing. Because a column of data is all the same data type, analytics operations can be optimized. Going under the hood, you’ll find that each column is stored as files on the disk with various blocks of data. MinMax indexes on data blocks enable faster sorting of data by helping the platform to more efficiently identify what data the user is trying to analyze and what can be ignored.
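The way min/max information lets a system ignore data can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Actian’s actual storage format: the block size, the dictionary layout, and the `order_amounts` column are hypothetical. Each block of a column records its smallest and largest value, and a range query reads only the blocks whose range could contain a match.

```python
def build_blocks(column, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        values = column[start:start + block_size]
        blocks.append({"min": min(values), "max": max(values), "values": values})
    return blocks


def count_in_range(blocks, low, high):
    """Count values in [low, high], skipping blocks that cannot match."""
    matches = 0
    blocks_read = 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # the min/max pair proves nothing in this block qualifies
        blocks_read += 1
        matches += sum(1 for v in block["values"] if low <= v <= high)
    return matches, blocks_read


# Hypothetical column of 100 ordered values, stored as 10 blocks of 10.
order_amounts = list(range(100))
blocks = build_blocks(order_amounts, 10)
matches, blocks_read = count_in_range(blocks, 42, 57)
print(matches, blocks_read)
```

In this example only the blocks that overlap the requested range are scanned and the rest are skipped without reading their values, which is the kind of I/O reduction described above. The benefit depends on how the data is ordered: if every block spans the full value range, nothing can be skipped.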

When you are doing operational analytics and trying to drive real-time decision making with data, you need the best performance you can get. By performing more operations in chip cache and memory and by managing the data stored on disk more efficiently, Actian can optimize the performance and utilization of database hardware while at the same time minimizing the amount of data written to disk. Both of these are important because they directly translate to lower operating costs. What it comes down to is “use the resources you have more efficiently” to achieve peak performance and minimize costs.

To learn more, visit https://www.actian.com/lp/actian-vector-sql-accelerator-for-hadoop/
