Data Integration

Connected Data in the Cloud Increases AI System Performance

Traci Curran

June 29, 2020

Artificial Intelligence (AI) is at the core of the next wave of IT systems, and it is time to get ready. The past year has seen tremendous growth in the adoption of artificial intelligence systems that help companies improve operational insights and deliver enhanced customer service experiences. If you aren’t already leveraging AI to support your business, you should at least be investigating the possibilities. Here are some steps you can take to establish a solid data foundation to support high-performance AI.

Add More Data Sources

Artificial intelligence systems run on data. They have a tremendous ability to analyze information, perform pattern matching and correlation analysis, and provide real-time analytics and natural language responses – but AI systems are only as “smart” as the data they are programmed to access. Most of the AI systems in use today are nowhere near maxed out on compute capability – they are limited by their data sources. If you want to increase the capabilities of your AI systems, the first thing to do is grant them access to more diverse data sources.

Connect Your Data

Once you’ve collected data from various sources, you need to connect it to the systems that will analyze and use it. Actian DataConnect can help you do this. DataConnect enables you to connect all your data sources, whether they are IT applications, deployed infrastructure, remote sensors, IoT devices, or third-party data feeds. It can move data into a cloud data warehouse like the Actian Data Platform, or it can connect your AI system directly to the individual data sources.

Stream, Stream, Stream

Two of the highest-value use cases for AI are real-time analytics and natural language interactions (things like chatbots and voice response systems). Both of these use cases require real-time information to be effective, and that information comes from streaming data. Traditional analytics systems struggled with streaming data because of the volumes involved and the latency of processing. AI systems that run in the cloud can scale their compute capacity to process enormous volumes of streaming data in real time and help you understand what it means. So, once you get your data sources connected, turn on the streams of data.
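
To make the idea concrete, here is a minimal sketch of the kind of computation real-time analytics performs on a stream: a rolling average over a sliding time window. It is illustrative Python only, assuming readings arrive as simple (timestamp, value) pairs rather than through any particular streaming product.

```python
import time
from collections import deque
from statistics import mean

WINDOW_SECONDS = 60
events: deque = deque()  # (timestamp, value) readings inside the window

def ingest(value: float, now: float | None = None) -> float:
    """Accept one streamed reading and return the rolling 60-second average."""
    now = time.time() if now is None else now
    events.append((now, value))
    # Evict readings that have fallen out of the sliding window.
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()
    return mean(v for _, v in events)

# Example: three readings arriving over a simulated stream.
for t, v in [(0.0, 10.0), (30.0, 20.0), (90.0, 30.0)]:
    print(f"t={t:>4}: rolling avg = {ingest(v, now=t):.1f}")
```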

Move Historical Data to the Cloud for Real-Time Analytics

Real-time streaming data isn’t the only source AI systems can leverage. Historical data is a valuable source for performing trend and correlation analysis and for developing projections about future events. Co-locating your data warehouse in the cloud with your AI engine enables both big-data analytics and AI-enabled real-time processing to access cloud-scale compute and storage resources with very little network latency. Actian provides a modern cloud data warehouse that is ideal for supporting AI system processing.

Most companies today have some sort of data warehouse where they store and archive transactional data from their IT systems. Migrating your data warehouse to the cloud is a great way to accelerate value from AI systems, because on-premises data warehouses introduce compute and network constraints that can significantly limit AI performance. If you have some data that you need to keep on-premises, that’s okay: the Actian Data Platform can be deployed as a hybrid data warehouse, supporting both your cloud and on-premises needs.

What if You Aren’t Ready for AI Quite Yet?

Maybe your company isn’t quite ready to make the jump to artificial intelligence. That’s okay—the steps outlined above support Online Transaction Processing (OLTP) integration and traditional analytics as well. Connecting your data and unlocking the processing power of cloud compute in your data warehouse gives you access to more data for harvesting actionable insights that help your leaders make better decisions. When you are ready for an AI system in the future, the data foundation will be in place for you to move forward confidently.

To learn more about how Actian can help you improve your AI performance and support the next wave of IT capabilities, visit https://www.actian.com/data-platform/

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Intelligence

Data Management is Embracing Cloud Technologies

Actian Corporation

June 29, 2020

Contemporary business initiatives such as digital transformation are facing an explosion of data volume and diversity. In this context, organizations are looking for more flexibility and agility in their data management.

This is where Cloud strategies come in…

Data Management Definition

Before we begin, let’s define what data management is. Data management, as described by TechTarget, is “the process of ingesting, storing, organizing and maintaining the data created and collected by an organization”. Data management is a crucial part of an enterprise’s business and IT strategy and provides analytical support that drives overall decision-making by executives.

As mentioned above, data is seen as a corporate asset that can be used to make better and faster decisions, improve marketing campaigns, increase overall revenue and profits, and above all: innovate. As a result, organizations are seeing cloud technologies as a way to improve their data initiatives.

Cloud Strategies are the New Black in Data Management Disciplines

It is undeniable that cloud service providers are becoming the new default platform for database management. This shift gives data management teams significant advantages:

  • Cost-Effective Deployment: Greater flexibility and faster configuration.
  • Consumption-Based Spending: Pay for what you use and avoid over-provisioning.
  • Easy Maintenance: Better control over the associated costs and investments.

Given these advantages, it is no surprise that data leaders perceive the cloud as a less expensive technology, which drives this choice even further.

Data leaders will embrace the cloud as an integral part of their IT landscape in the coming months and years. However, we strongly believe that the rate at which organizations migrate to the cloud will differ by organization size. Small and midsize organizations will migrate more quickly, while larger organizations will take months, even years, to migrate.

Thus, the cloud is going to become a default option for all data management technologies. Many strategies are emerging, spanning various deployment types and approaches. We have identified three main ones:

  • Hybrid Cloud: Made up of two or more separate cloud infrastructures, private or public, that remain distinct entities.
  • Multicloud: The use of more than one cloud service provider’s infrastructure, potentially alongside on-premises solutions.
  • Intercloud: Data is integrated or exchanged between cloud service providers as part of a logical application deployment.

The Cloud is also Seen as an Opportunity for Data Analytics Leaders

The increased adoption of cloud deployments for data management has important implications for data and analytics strategies. As data moves to the cloud, the data and analytics applications that use it must follow.

Indeed, the emphasis on speed of value delivery has made cloud technologies the first choice for vendors developing new data management solutions and for enterprises deploying them. Enterprises and data leaders are therefore choosing next-generation data management solutions, migrating their assets by selecting applications that fit future cloud strategies, and preparing their teams and budgets for the challenges ahead.

Data leaders who use analytics, business intelligence (BI), and data science solutions see cloud solutions as opportunities to:

  • Use a cloud sandbox environment to trial onboarding, usage, and connectivity, and to prototype an analytics environment before actually buying the solution.
  • Facilitate access to applications from anywhere and improve collaboration between peers.
  • Gain access to new, emerging capabilities over time with ease, thanks to continuous delivery.
  • Support heavy lifting throughout the analytics process with the cloud’s elasticity and scalability.

A Data Catalog: The New Essential Solution for Cloud Data Management Strategies

Data and analytics leaders will inevitably engage with more than one cloud, where data management, governance, and integration become more complex than ever before. Data leaders must therefore equip their organizations with metadata management solutions that can find and inventory data distributed across a hybrid and multi-cloud ecosystem. Failure to do so will result in a proliferation of data silos, leading to derailed data management, analytics, and data science projects.

Data management teams will then have to choose the most relevant solution from the wide range of data catalogs on the market.

We like to define a data catalog as a way to create and maintain an inventory of data assets through the discovery, description and organization of distributed datasets.

If you are working on a data catalog project, you will find the market split in two:

  • On one side are fairly old players, initially positioned in the data governance market.
    These players provide on-premises solutions with rich but complex offers that are expensive, difficult, and time-consuming to deploy and maintain, and that are designed for cross-functional governance teams. Their value proposition focuses on control, risk management, and compliance.
  • On the other side are suppliers of data infrastructure (Amazon, Google, Microsoft, Cloudera, etc.) or data processing solutions (Tableau, Talend, Qlik, etc.), for which metadata management is an essential building block that completes their offer. They provide much more pragmatic (and less costly) solutions, but these are often highly technical and limited to their own ecosystems.

We consider those alternatives insufficient. Here are some essential guidelines for finding your future data catalog. It must:

  • Be a cloud data catalog enabling competitive pricing and rapid ROI for your organization.
  • Have universal connectivity, adapting to all systems and all data strategies (edge, cloud, multi-cloud, cross-cloud, hybrid).
  • Have very advanced automation for collecting and enriching data assets, along with their attributes and links (an augmented catalog). Automatic feeding mechanisms, together with suggestion and correction algorithms, reduce the overall cost of the catalog and guarantee the quality of the information it contains (see the sketch after this list).
  • Be strongly focused on user experience, especially for business users, to improve solution adoption.
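
To illustrate what automated collection can look like at its very simplest, here is a toy metadata harvester in Python. It reads technical metadata (tables, columns, row counts) from a single SQLite source; a real augmented catalog would do this continuously across many heterogeneous systems and enrich the results with business context. The database path is a placeholder.

```python
import sqlite3

def harvest(path: str) -> list[dict]:
    """Collect basic technical metadata from one SQLite data source."""
    conn = sqlite3.connect(path)
    inventory = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        inventory.append({"table": table, "columns": columns, "rows": row_count})
    conn.close()
    return inventory

print(harvest("sales.db"))  # hypothetical source database
```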

To conclude, data management capabilities are becoming more and more cloud-first and in some cases cloud-only.

Data leaders who want to drive innovation in analytics will need to leverage cloud technologies across their data assets. They will have to cover everything from ingestion to transformation, without forgetting to invest in an efficient data catalog, in order to find their way in an ever more complex data world.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Analytics

Marketing Agility Through Real-Time Analytics

Actian Corporation

June 24, 2020

Today’s marketplace is complex, competitive, and quick to change. New competitive products and services enter the market every day, customer preferences shift with social trends, and pricing dynamics continuously evolve. To win in this type of environment, marketing agility is critical for identifying opportunities and threats and responding to them quickly – and that requires data. The key to enabling marketing agility is real-time analytics. The question you must ask is: “Do your marketers have the tools they need to succeed?”

Real-Time Market Insights

In a highly dynamic marketplace, conditions can change frequently, and without notice. Competitors adjust their offerings (new features, price changes, promotions, etc.). Customers influence each other’s preferences through social media discussions and reviews. Media coverage causes wide swings in customer sentiment about both your product and your company. Each of these forces represents a potential opportunity or threat to your marketing efforts.

Real-time marketing analytics is a powerful tool to monitor your business environment, listen to the chatter and identify when action is required. The faster you identify the change and respond to it, the better outcome you will be able to achieve. Actian can help you quickly learn when to adjust and adapt to changes in the market or your customer base.

Develop Personalized Marketing Campaigns

Customers want to feel like you understand and care about them as individuals. Help your company get noticed in a crowded market and capture more wallet share by deploying effective, innovative, and highly personalized campaigns informed by deep analysis. Traditional campaign optimization models use limited samples of transactional data, which can lead to incomplete customer views. Actian allows you to connect to a wide variety of data sources, including social media and competitors’ websites, in real time to learn which competitive offerings are gaining traction in the marketplace. Web purchasing patterns and call center text logs stored on Hadoop provide valuable insights into customer interactions. Marketing and campaign data ensure any recommended actions comply with company goals, rules, and regulations.

By combining these data sets, your marketing team can develop a richer understanding of your customers’ needs, motivations, influences, and buying behaviors. These insights can then be used to develop targeted market segmentations and personalized marketing campaigns that speak directly to the target customer. Actian helps you build, test, and deploy campaigns in rapid succession. Real-time analytics enables your marketing team to monitor the effectiveness of these campaigns, adjust to market dynamics, and fine-tune messaging for peak performance.

Achieve Results

With your customer scores and optimized lists in hand, you can design innovative campaigns that allow you to create and sustain a competitive advantage. Increase campaign revenue and minimize marketing costs by focusing your resources on opportunities that will create the most value. Increase your customer satisfaction and loyalty by demonstrating that you understand their needs and are focused on solving their problems. Improve your product development processes and supply chain by providing marketing insights upstream that lead to better products and services.

The real-time analytics provided by Actian Data Platform can help your marketing team succeed in a highly competitive and rapidly changing business environment. Identify opportunities faster and achieve first-mover advantage. Neutralize threats through decisive action to prevent encroachment by competitors into your customer base. To learn how Actian can help your company achieve marketing agility powered by real-time cloud-based analytics, visit www.actian.com/data-platform

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

Connected Data to Drive Service Assurance in IT

Traci Curran

June 22, 2020

IT Service Management (ITSM) staff have a unique challenge to provide service assurance across a broad and diverse technology ecosystem. Connected data is essential to enable them to do their job effectively. Your incident and problem management staff don’t know what they don’t know. They can only see the data that is made available to them through your ITSM system and other administrative consoles. If the data they are looking at is incomplete or fragmented, it is difficult for them to know where things are broken and need attention. All the while, your business processes, employees, and potentially your customers are being affected. Connected data is the key to service assurance in IT.

Doesn’t My ITSM Platform Take Care of This Already?

Modern ITSM platforms from companies like ServiceNow, Cherwell, BMC, and FreshWorks have a lot of great capabilities to help you orchestrate your incident and problem management workflows. They also have some slick visualizations to provide the “single pane of glass” user experience that your ITSM staff needs. But what is that single pane of glass showing, and where is that data coming from?  Often the data comes from other systems such as your operations management tools, cloud dashboards, synthetic transaction monitors, and other utility-type services and devices deployed across your IT environment.

The large ITSM platform products have many out-of-the-box connectors. However, many connectors you need may be missing, leaving you to address the gaps on your own. This is understandable: an ITSM platform vendor isn’t going to know all of the different technologies you have deployed in your IT environment, or what management and diagnostic tools you have available to support them. Even if they did understand your needs, they would need time to develop the connectors, test them, and build them into their product release cycle. Eventually, you might get what you need, but relying entirely on your ITSM vendor for integration doesn’t provide the flexibility and agility that most companies need.

DataConnect Provides a Flexible Solution to Connecting Your IT Data

This is where a platform like Actian DataConnect can help. Offered as an Integration Platform as a Service (iPaaS), DataConnect supplies a flexible and easy-to-implement platform that enables you to design, deploy, and execute data integrations across your whole IT environment. Do you have streaming data from monitoring tools? No problem. Do you have administrative console data that you want to make available in your ITSM system? DataConnect can help you do that. Do you have embedded telemetry built into your in-house developed applications? Your ITSM vendor doesn’t even know about those data sources, but with DataConnect, you can add them to your IT monitoring solution. Are you leveraging services, SaaS applications, or network infrastructure managed by third parties? DataConnect can make those data sources available too. The DataConnect platform gives you the flexibility to source data from almost anywhere (inside or outside your company), and if you need to add something, you can just add it.

Ease of Deployment

Application development teams and IT project teams deploying new third-party systems can stream telemetry data to your ITSM system from the day the system is released, without a lot of custom coding or waiting for the ITSM vendor to build or update a connector. If you have existing components and services deployed in your IT environment that aren’t currently sending data to your ITSM system, you can use DataConnect to get them connected and make their telemetry data available quickly. By combining the data integration capabilities of Actian DataConnect with the single-pane-of-glass and workflow orchestration capabilities of your ITSM platform, you can give your IT staff access to the rich, accurate, and connected data they need to do their jobs effectively.

To learn more, visit DataConnect.

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Integration

Don’t Just Move Data, Integrate It!

Traci Curran

June 18, 2020

If you are simply “lifting and shifting” data from one place to another, you are missing out on the power that a data integration platform can bring you. It is time to look beyond extract, transform, load (ETL) from individual source systems and expand your integrations to include multi-source joins that enable you to see across source systems. Don’t just move data, integrate it!

Data integration is more than moving data from source to target systems. It is part of the greater data value chain that transforms raw source data into information and actionable insights that help drive decisions and operational processes. Like any other value chain, each step in the process moves the data one step closer to consumption by transforming it in ways that add value for the end user. One might argue that moving data into a centralized repository or a downstream database adds value. Yes, it does, and if all you have is essentially a “data forklift,” this may be the best you can do. But if you have a true data integration platform like Actian DataConnect, you can do a whole lot more (and you should).

Multi-Source Joins

A data integration platform like Actian DataConnect gives you a powerful set of tools to help you not just move data from one system to another but integrate it along the way. You might be familiar with creating SQL-style inner, outer, left, and right joins within a database, but did you know you can access data from multiple source systems in the same query? The DataConnect Studio IDE was recently re-engineered in how joins are implemented, taking advantage of the ability to leverage multiple source connections in your queries.
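
DataConnect expresses this visually in Studio, but the underlying idea is easy to show in plain Python. The sketch below is not DataConnect’s API; it simply joins customer records from one relational source with order events from a CSV export into a single unified record set. The file names and fields are hypothetical.

```python
import csv
import sqlite3

# Source 1: customer master data from a relational system (hypothetical).
db = sqlite3.connect("crm.db")
customers = {cid: name for cid, name in
             db.execute("SELECT customer_id, name FROM customers")}

# Source 2: order events exported by another system as CSV (hypothetical).
with open("orders.csv", newline="") as f:
    orders = list(csv.DictReader(f))

# Inner join across both sources on customer_id, producing the unified
# output set that would be written to the target system.
joined = [
    {
        "customer_id": o["customer_id"],
        "customer_name": customers[o["customer_id"]],
        "order_total": float(o["order_total"]),
    }
    for o in orders
    if o["customer_id"] in customers
]
```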

With DataConnect Studio, you can build integrations that span multiple data sources, reconciling them into a unified output set in the target system. Let’s consider where you might want to do this.

Analytics and Reporting

By merging data across source systems earlier in the data value chain, you can normalize your data into a canonical data model that is easier for your analysts and business users to understand.  This means they can spend less time finding data and more time interpreting data to determine its relevance to your business.

eCommerce Systems

Customer-facing systems, whether they be on a website or a mobile app, should provide a consistent and simple interface to users.  Multi-source joins in your data queries enable you to combine data from different systems, so your users get a high-quality experience without having to deal with whatever complexity is taking place behind the scenes.

Customer Support

Any company that has tried to develop a 360-degree view of its customers knows that the data comes from many different source systems.  Actian DataConnect enables you to join data from different customer records and transactional systems to give you the big picture perspective you are looking for.

Operations Monitoring

Many companies are integrating IoT devices, mobile apps, and embedded sensors into their operations processes. The multi-source join capability enables you to leverage data from different types of monitoring devices and more easily reconstruct the virtual process flows that your operations staff need to monitor.

Data in motion presents one of the best opportunities for integration. If you merge data at rest, you either have to copy data into a merged table or create views and defer real integration until later in the data value chain – your options are limited. When you are moving data, you have the opportunity to transform it: changing data structures, summarizing, categorizing, and aggregating data from different sources. Each time data moves, you should be seeking ways to make it even more valuable for your organization.
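
As a small illustration of transforming data in motion, the following sketch categorizes and aggregates records as they pass from source to target, rather than copying them verbatim and merging them later at rest. The field names and the categorization rule are invented for the example.

```python
from collections import defaultdict

def categorize(amount: float) -> str:
    """Toy categorization rule applied while records are in flight."""
    return "large" if amount >= 1000 else "small"

def transform(records):
    """Summarize and reshape records on their way to the target system."""
    totals: defaultdict = defaultdict(float)
    for record in records:  # e.g. rows read from a source connection
        key = (record["region"], categorize(record["amount"]))
        totals[key] += record["amount"]
    return [{"region": region, "bucket": bucket, "total": total}
            for (region, bucket), total in totals.items()]

rows = [{"region": "EMEA", "amount": 1200.0},
        {"region": "EMEA", "amount": 80.0},
        {"region": "APAC", "amount": 2500.0}]
print(transform(rows))
```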

Actian DataConnect can help make managing data easier – not just moving data, but really integrating it. To learn more, visit DataConnect.

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Management

SQLite’s Serverless Architecture Doesn’t Serve IoT Well

Actian Corporation

June 17, 2020

Part Three: SQLite, the “Flat File” of Databases

Over the past few articles, our SQLite blog series has been looking at SQLite’s serverless architecture and why it is unsuitable for IoT environments. Those of you who have been following along can jump ahead to the next section, but if you’re new to this discussion, you may want to review the earlier installments.

  • In part one, mobile may be IoT, but IoT is not mobile when it comes to data, we examined the fact that though SQLite is the most popular database on the planet—largely due to its ubiquitous deployment on mobile smartphones and tablets, where it supports embedded applications for a single user—it cannot support the multi-connection, multi-user, multi-application requirements of the IoT use cases that are proliferating with viral ferocity in every industry. In a world that calls for the performance of cheetahs and peregrine falcons, SQLite is a banana slug.
  • In part two, Rethinking What Client-Server Means for Edge Data Management, we considered key features and characteristics of the SQLite Serverless Architecture (portability, little-to-no configuration, small footprint, SQL API, and some initially free version to seed adoption) in light of the needs of modern edge data management and discussed the shortcomings of the SQLite architecture in terms of its ability to integrate with critical features found in traditional client-server databases (chiefly those multi-point qualifiers above).

In our final analysis of this serverless architecture, I’d very much like to explore (read: clarify) what will happen if a developer ignores these cautionary points and doubles down on SQLite as a way to handle IoT use cases.

Don’t Mistake Multi-Connection and Multi-Threaded for Client Server

In the late 90s, applications became more sophisticated, generated and ingested more data, and performed more complex operations on that data internally. Consequently, app developers had to build a lot of workarounds to deal with the limitations of routine, operating system-based file management services. Instead of spending time on all these DIY efforts, application developers were clamoring for a dedicated database they could embed into an application to support their specific data management needs.

At the turn of the 21st century, SQLite appeared and seemed tailor-made to meet these needs. SQLite enabled indexing, querying, and other data management functionality through a series of standard SQL calls that could be inserted into the application code, with the entire database bundled as a set of libraries that became part of the final deployed executable. Keep in mind that the majority of these applications tended to be monolithic, single-purpose, single-user applications designed for the simpler CPU architectures in use at the time. They were not designed to run multiple processes, let alone multiple threads. End-user and data security were not yet the high priorities they are today. And as for performance in a networked environment? Wireless networks were reactive and spotty at best. Multiple, external, high-bandwidth data connections were uncommon.

So it’s really no surprise that SQLite wasn’t able to service simultaneous read and write requests for a single connection (let alone for multiple connections) when it was designed. Designers were thrilled to have an embeddable database that would allow multiple processes to have sequential read and write access to a data table within an application. They were not looking for enterprise-grade client-server capabilities. They were not designing stand-alone database systems that would support multiple applications simultaneously. They simply needed more than flat-file access mediated by an operating system.

And therein lies the heart of the issue with SQLite. It was never intended to handle multiple external applications or their connections asynchronously, as a traditional client-server database would. Modern networked applications commonly have multiple processes and/or multiple threads. When you throw SQLite into a situation with multiple connections and the potential for multiple simultaneous read and write requests, you quickly encounter the possibility of race conditions and data corruption.

To be fair, SQLite has tried to accommodate these evolving demands. The current version of SQLite handles multiple connections through its thread-mode options: single-thread, multi-thread, and serialized. Single-thread is the original SQLite processing mode, handling one transaction at a time, either a read or a write, from one and only one connection. Multi-thread will support multiple connections, but still one at a time for a read or a write. Serialized—the default mode in the most current SQLite versions—can support multiple concurrent connections (and, therefore, a multi-threaded or multi-process application), but it cannot service all of them simultaneously. SQLite can handle simultaneous read connections in multi-thread and serialized modes, but it locks the data tables to prevent attempts at simultaneous writes. Nor can SQLite orchestrate writes arriving from several connections.
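
The consequence is easy to demonstrate. In the Python sketch below, two connections to the same SQLite file attempt to write at once: the first holds the write lock inside an open transaction, and the second immediately fails with the familiar “database is locked” error.

```python
import sqlite3

PATH = "demo.db"
conn1 = sqlite3.connect(PATH, timeout=0.1)
conn2 = sqlite3.connect(PATH, timeout=0.1)
conn1.execute("CREATE TABLE IF NOT EXISTS readings (device TEXT, value REAL)")
conn1.commit()

# Writer 1 opens a transaction and takes the write lock on the whole database.
conn1.execute("BEGIN IMMEDIATE")
conn1.execute("INSERT INTO readings VALUES ('sensor-a', 1.0)")

# Writer 2, on its own connection, now attempts a concurrent write.
try:
    conn2.execute("INSERT INTO readings VALUES ('sensor-b', 2.0)")
except sqlite3.OperationalError as err:
    print("second writer rejected:", err)  # -> database is locked

conn1.commit()  # releasing the lock finally lets writer 2 through
conn2.execute("INSERT INTO readings VALUES ('sensor-b', 2.0)")
conn2.commit()
```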

Compare that to the architecture of a true client-server database built to manage simultaneous writes. A client-server database evaluates each write request and, if attempts are made to write to the same data within a table, blocks the later request until the current operation on that data is completed. If the attempts target different parts of the data table, the server allows them to go forward. That’s true orchestration. Locking the entire table and holding off writes (or faking it with WAL, which merely lets sequential writes occur alongside multiple reads) is not the same thing.

Why is this a showstopper for SQLite in an IoT environment? One of the most basic operations with IoT devices and gateways involves writing data from a variety of devices into your data repository, and the write locks imposed during multi-threaded/multi-connection operations render SQLite non-viable in a production environment. Furthermore, a second basic operation within an IoT environment involves performing data processing and analytics on previously collected datasets. While these read-intensive operations may execute independently (either as separate processes or as separate threads) of the write-intensive operations just described, they still cannot occur concurrently in an SQLite environment while maintaining ACID compliance.

As you scale up your deployments, or as system complexity increases—say you want to instrument more and more within an environment, be that an autonomous car or a smart building—you will invariably add more data connection points downstream or within your local environment. Each of these entities will have one or more additional database connections, if not its own database that needs a connection. You could try to establish these connections, but they will need to be handled through add-on application logic that will likely push response times outside the design constraints of your IoT system.

Workarounds Designed to Deny (or Defy) Reality

SQLite partisans will wave their hands with dismissive nonchalance and tell you that SQLite is fast enough (it’s not; we’ve already discussed how slow SQLite is) and that you can build your own functionality to handle simultaneous reads and writes across multiple connections—in effect, manually synchronizing them for the specific use case at hand. One method involves using the serialized mode mentioned above and building the synchronization and orchestration functionality into the application threads. This approach avoids transmitting read and write requests on multiple channels (thereby avoiding race conditions and the potential for data corruption). However, it also requires a high degree of skill, the assumption of long-term responsibility for the code, and extensive testing and validation to ensure that operations transpire properly.
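
A common shape for this DIY workaround is the single-writer pattern sketched below: one dedicated thread owns the only write connection and drains a queue, so the application, not SQLite, serializes the writes. This is a simplified sketch; a production version would also need error handling, batching, and backpressure.

```python
import queue
import sqlite3
import threading

write_q: queue.Queue = queue.Queue()

def writer(path: str) -> None:
    # The one and only write connection lives in this thread, so every
    # write is serialized by the queue instead of colliding inside SQLite.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS readings (device TEXT, value REAL)")
    conn.commit()
    while True:
        item = write_q.get()
        if item is None:  # sentinel: shut the writer down
            break
        conn.execute("INSERT INTO readings VALUES (?, ?)", item)
        conn.commit()
    conn.close()

t = threading.Thread(target=writer, args=("demo.db",))
t.start()

# Any number of producer threads can enqueue writes without ever
# touching the database themselves.
for i in range(100):
    write_q.put(("sensor-a", float(i)))

write_q.put(None)
t.join()
```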

An alternative approach would be to build the equivalent of a client-server orchestration front end and use the single-thread option within SQLite, which would preclude race conditions and data corruption. But dropping back to the single-thread option would be like watching the banana slug move in even slower motion. That’s not a viable approach, given the high-speed, parallel write operations needed to accommodate multiple high-resolution data feeds or large-scale sensor grids. Moreover, all you’ve done is accommodate the weaknesses of the database architecture by forcing the application to do something the database should be doing. And you’d have to do that over and over, for every app in your IoT portfolio.

Several code bases and a couple of small shops have tried to productize this latter approach, but with limited success. They work only with certain development frameworks on a few of the platforms SQLite supports. Even if those platforms match your use case, the performance issues may still increase the risk and difficulty of coding this workaround into your application.

We’ve Seen This Iceberg Before

This cautionary tale isn’t just about the amount of DIY that will be incurred with the unquestioned reliance on SQLite for a given application. Like the IoT itself, it’s much bigger than that. For example, if you commit to handling this in your own code, how will you handle the movement of data from a device to the edge on-premises? How will you handle moving data to or from the cloud? The requirements for interacting with servers on either tier may be different, requiring you to write more code to perform data transformations (remember the blog on SQLite and ETL?). You might try to avoid the ETL bottleneck by using SQLite on both ends, but that would just kick the virtual can down the virtual road. You would still have to write code to handle SQLite masquerading as a server-based database on the gateway and in the cloud.

Ultimately, you can’t escape the need to write more code to make SQLite work in any of these scenarios. And that’s just the tip of this iceberg. You would need to make trade-off comparisons between DIY and partial-DIY plus code modules/libraries for other functionality—from data encryption and public key management to SQL query editing, and more. The list of features that a true client-server infrastructure brings to the table—all lacking in SQLite—goes on and on.

Back in the day, SQLite enabled developers to avoid much of the DIY that flat-file management had required. For the use cases that were emerging back then, it was an ideal solution. For today’s use cases, though, even more DIY would be required to make SQLite work—and even then it would not work all that well. The vast majority of IoT use cases require a level of client-server functionality that SQLite cannot provide without incurring significant costs—in performance, in development time, and in risk. In a nutshell, it’s déjà vu, but now SQLite is the flat file whose deficiencies we must leave in the past.

Oh, and if you think that all this is just an issue for developers, think again. In the next and final blog in this series, we’ll widen the lens a bit and look at what this means for the business and the bottom line.

If you’re ready to reconsider SQLite and learn more about Actian Zen, you can just kick the tires for free with Zen Core, which is royalty-free for development and distribution.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Intelligence

Data Governance and Data From ERP/CRM Packages: A Must Have

Actian Corporation

June 16, 2020

For the last three decades, companies have relied on ERP and CRM packages to run their operations.

To comply with regulations, reduce risk, and improve profitability, competitiveness, and customer engagement, they must now become data-driven.

In addition to leveraging the wide variety of new data assets being produced by new means, any data initiative has to involve the strategic data held in those historical systems.

Challenges Faced by Companies Trying to Leverage Data From ERP/CRM to Feed Their Digital Initiatives

In the gold rush that companies are pursuing with artificial intelligence, advanced analytics, and digital transformation programs, understanding and leveraging data from ERP/CRM packages is on the critical path of any data governance journey.

First, these packages have large, complex, hard-to-understand, and heavily customized database models. Understanding their descriptions, relationship definitions, and other structures well enough to serve data citizens is almost impossible without an appropriate data catalog, such as the Actian Data Intelligence Platform with its ad hoc ERP/CRM connectors.

As an example, SAP has more than 90,000 tables. As a consequence, a data scientist will hardly understand the so-called TF120 table in SAP or the F060116 table in JD Edwards.

Secondly, identifying a comprehensive subset of accurate datasets to serve a specific Data initiative is an obstacle course.

Indeed, a large percentage of the tables in those systems are empty, may appear redundant, or have links that are complex for anyone who is not an expert in the ERP/CRM domain.

Thirdly, the demand for fast, agile, and ROI-focused data-driven initiatives puts ERP/CRM-knowledgeable personnel in the middle of the game.

ERP/CRM experts are rare, busy, and expensive, and companies cannot afford to grow those teams or have them lose their focus.

And finally, if a data catalog is not able to properly store metadata for those systems in a smooth, comprehensive, and effective way, any data initiative will be deprived of a large part of its capabilities.

The need for financial, manufacturing, and customer data, to take a few examples, is obvious, making ERP/CRM systems mandatory data sources for any metadata management program.

Actian Data Intelligence Platform Value Proposition

An Agile and Easy Way

We believe in a data democracy in which any employee of a company can discover, understand, and trust any dataset that is useful to them.

This is only possible with a reality-proof data catalog that connects easily and straightforwardly to any data source, including those from ERP/CRM packages.

Above all, a data catalog has to be smart, easy to use, easy to implement, and easy to scale in a complex IT landscape.

A Wide Connectivity

Actian Data Intelligence Platform provides Premium ERP/CRM connectors for the following packages:

  • SAP and SAP S/4HANA
  • SAP BW
  • Salesforce
  • Oracle E-Business Suite
  • JD Edwards
  • Siebel
  • PeopleSoft
  • MS Dynamics AX
  • MS Dynamics CRM

Premium ERP/CRM Connectors Help Companies in Several Ways

Discovering and Assessing

The Actian Data Intelligence Platform connectors help companies build an automatic translation layer that hides the complexity of the underlying database tables and automatically feeds the metadata registry with accurate and useful information, saving the data governance team time and money.
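
Conceptually, the translation layer boils down to a mapping from cryptic technical identifiers to business-friendly descriptions, which the connectors build and maintain automatically. The Python sketch below shows only the idea; the business labels are invented for illustration and are not the actual meanings of these tables.

```python
# Hypothetical technical-to-business translation layer. The labels are
# invented for illustration, not the real definitions of these tables.
BUSINESS_NAMES = {
    "TF120":   "Sales order headers (SAP)",
    "F060116": "Employee master (JD Edwards)",
}

def describe(technical_name: str) -> str:
    """Resolve a technical table name to a business-friendly label."""
    return BUSINESS_NAMES.get(
        technical_name, f"undocumented table: {technical_name}")

print(describe("TF120"))  # -> Sales order headers (SAP)
```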

Scoping Useful Metadata Information for Specific Cases

In a world with thousands of datasets, the platform provides a means to build accurate and self-sufficient models that serve focused business needs by comprehensively extracting:

  • Business and technical names for tables
  • Business and technical names for columns in tables
  • Relationships between tables
  • Data elements
  • Domains
  • Views
  • Indexes
  • Table row counts
  • Application hierarchy (where available from the package)

Compliance

The Actian Data Intelligence Platform’s premium ERP/CRM connectors can identify and tag any personal data or personally identifiable information (PII) coming from the supported CRM/ERP packages in its data catalog, helping you comply with GDPR and CCPA regulations.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Integration

Deploy and Manage Your Integrations Anywhere, Anytime

Traci Curran

June 15, 2020

With Actian DataConnect Integration Manager, you can deploy, configure, manage, and repair your integrations anywhere, anytime – whether they reside in the cloud, on-premises, or embedded in your SaaS applications. The latest release of Actian DataConnect Integration Manager includes an important set of enhancements to the Integration Manager API that increase your organization’s ability to define integrations and enable them for either synchronous or asynchronous execution. This may sound like a bunch of technical jargon, so let’s break it down to see why this new capability is so important. Two primary execution patterns are used for data integration: synchronous and asynchronous.

Request-Response Integration

Synchronous integrations, sometimes called “request-response” integrations, are used when you want to tightly couple two applications together. In this pattern, one system generates a message to the other, waits for a response, and, when it receives the response, sends the next message. You can think of this much like a chat conversation in which two parties communicate back and forth. Another example is a user interacting with a website – issuing a command or clicking a button and waiting for a response from the server. This is the most common type of data integration because it is the most intuitive to implement and affords the sending system the ability to verify receipt of the message before continuing to the next step in a workflow.

The benefit of synchronous communication is that it works well for real-time integration and complex workflows with many back-and-forth interactions. We see this a lot when multiple applications serve as components of an overarching system or when the integration is part of a transactional workflow (such as a CRM system looking up the status of a customer order in an ERP system). The drawback is that both systems must be actively engaged in the messaging interaction to avoid processing delays.
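
To see the pattern in miniature, here is a hedged Python sketch of a synchronous lookup like the CRM-to-ERP example above: the caller sends a request and blocks until the answer comes back. The endpoint is hypothetical, and the requests library stands in for whatever transport the integration actually uses.

```python
import requests  # third-party HTTP client: pip install requests

def lookup_order_status(order_id: str) -> str:
    """Request-response: block until the ERP system answers."""
    resp = requests.get(
        f"https://erp.example.com/api/orders/{order_id}/status",  # hypothetical
        timeout=5,  # a slow or absent responder delays the whole workflow
    )
    resp.raise_for_status()  # receipt is verified before the workflow continues
    return resp.json()["status"]
```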

Set and Forget Integration

Asynchronous integrations, sometimes called “set and forget” integrations, are used when you want to loosely couple applications together. In this pattern, one system sends out a message and then moves on to other work – it does not wait for a response. The receiving system may have a listener configured, waiting to receive the message in real time, or it may process incoming messages periodically (in batches). You can think of this much like a news agency publishing a story. Some readers may be watching the news feed for updates in real time while others may check for news once per day. In either case, there is no expectation that the receiver of the communication will respond to the sender or even acknowledge receipt of the message.

The benefit of asynchronous communication is that it enables the publishing of data to many recipients at the same time. We see this pattern used often when a system performs batch processing of reports or pushes data to downstream systems. Asynchronous messaging is also used for things like event logs, alerts, and system status messages that must not interfere with transactional processing. The drawback of this method is that the sending system has no visibility into the acceptance and subsequent processing of the messages it sends. Was a message received? How long did it wait before being processed? It is difficult to build transactional workflows on asynchronous integration because of time delays and the inability to monitor the quality of service.
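
And here is the asynchronous counterpart as a minimal Python sketch: the sender enqueues a message and moves on immediately, while a listener drains the queue on its own schedule. An in-process queue stands in for whatever messaging layer a real deployment would use.

```python
import json
import queue
import threading
import time

outbox: queue.Queue = queue.Queue()

def publish(event: dict) -> None:
    """Set and forget: enqueue the event and return immediately."""
    outbox.put(json.dumps(event))

def listener() -> None:
    # The receiver processes messages on its own schedule; the sender
    # never waits for, or learns about, this processing.
    while True:
        message = json.loads(outbox.get())
        print("processed event", message["id"])

threading.Thread(target=listener, daemon=True).start()
publish({"id": 1, "type": "status_update"})  # sender moves on at once
time.sleep(0.1)  # give the daemon listener a moment before the demo exits
```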

Your Integration Platform Needs to Support Both

As you can see, there are different situations where you might want to use one of these integration patterns over the other. That is why the enhancements to Actian DataConnect Integration Manager are so important: you now have the flexibility to use both patterns in your integrations, depending on the unique needs of your business. There may even be times when you need both synchronous and asynchronous integration between the same systems. That is okay; Actian DataConnect can help you do that.

To learn more, visit DataConnect.

To download the latest DataConnect Integration Manager, visit Actian ESD.

About Traci Curran

Traci Curran is Director of Product Marketing at Actian, focusing on the Actian Data Platform. With 20+ years in tech marketing, Traci has led launches at startups and established enterprises like CloudBolt Software. She specializes in communicating how digital transformation and cloud technologies drive competitive advantage. Traci's articles on the Actian blog demonstrate how to leverage the Data Platform for agile innovation. Explore her posts to accelerate your data initiatives.
Data Intelligence

Data Science: Accelerate Your Data Lake Initiatives With Metadata

Actian Corporation

June 15, 2020

Data lakes offer unlimited storage for data and present many potential benefits for data scientists in exploring and creating new analytical models. However, structured, unstructured, and semi-structured data are mashed together in the lake, and the business insights they contain are often overlooked or misunderstood by data users.

The reason is that many technologies used to implement data lakes lack the information management capabilities that organizations usually take for granted. Enterprises therefore need to manage their data lakes by putting in place effective metadata management that covers metadata discovery, data cataloguing, and overall enterprise metadata management applied to the company’s data lake.

2020 is the year that most data and analytics use cases will require connecting to distributed data sources, leading enterprises to double their investments in metadata management. – Gartner 2019.

How to Leverage Your Data Lake With Metadata Management

To get value from their data lake, companies need both skilled users (such as data scientists or citizen data scientists) and effective metadata management for their data science initiatives. To begin with, an organization could focus on a specific dataset and its related metadata, then leverage this metadata as more data is added to the data lake. Setting up metadata management early makes this task much easier for data lake users.

Here are the Areas of Focus for Successful Metadata Management in Your Data Lake

Creating a Metadata Repository

Semantic tagging is essential for discovering enterprise metadata. Metadata discovery is defined as the process of using solutions to discover the semantics of data elements in datasets. This process usually results in a set of mappings between different data elements in a centralized metadata repository. It allows data science users to understand their data and to see whether it is clean, up to date, trustworthy, and so on.

Automating Metadata Discovery

As numerous and diverse data gets added to a data lake on a daily basis, keeping up with ingestion can be quite a challenge. Automated solutions not only make it easier for data scientists or citizen data scientists to find their information, they also support metadata discovery.

Data Cataloguing

A data catalog consists of metadata in which various data objects, categories, properties, and fields are stored. Data cataloguing is used for both internal and external data (from partners or suppliers, for example). In a data lake, cataloguing captures a robust set of attributes for every piece of content within the lake and enriches the metadata catalog with these information assets. This gives data science users a view into the flow of the data, the ability to perform impact analysis, a common business vocabulary with accountability, and an audit trail for compliance.
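
As a sketch of what “a robust set of attributes for every piece of content” might look like in practice, here is one possible catalog record expressed as a Python data structure. The fields are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    """One piece of data lake content and the attributes captured for it."""
    name: str
    location: str                                    # where in the lake it lives
    properties: dict = field(default_factory=dict)   # fields, categories, formats
    business_terms: list = field(default_factory=list)  # common vocabulary
    upstream: list = field(default_factory=list)     # lineage: where it came from
    audit_trail: list = field(default_factory=list)  # (timestamp, actor, action)

    def record(self, actor: str, action: str) -> None:
        """Append one compliance-relevant event to the audit trail."""
        self.audit_trail.append((datetime.now(timezone.utc), actor, action))

entry = CatalogEntry(name="web_clickstream", location="s3://lake/raw/clicks/")
entry.record("ingestion-job", "created")
```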

Data and Analytics Governance

Data and analytics governance is an important use case for metadata management. Applied to data lakes, the question “could it be exposed?” must become an essential part of the organization’s governance model. Enterprises must therefore extend their existing information governance models to specifically address the business analytics and data science use cases built on data lakes. Enterprise metadata management helps provide the means to better understand the governance rules that apply to strategic types of information assets.

Contrary to traditional approaches, the key objective of metadata management is to drive a consistent approach to managing information assets. The more consistent metadata semantics are across all assets, the greater the shared understanding, allowing information knowledge to be leveraged across the company. When investing in data lakes, organizations need an effective metadata strategy if those information assets are to be leveraged from the lake.

Start Metadata Management

As mentioned above, implementing metadata management in your organization’s data strategy is not only beneficial but essential for enterprises looking to create business value with their data. Data science teams working with vast amounts of data in a data lake need the right solutions to be able to trust and understand their information assets. To support this emerging discipline, the Actian Data Intelligence Platform gives you everything you need to collect, update, and leverage your metadata through its next-generation platform.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.
Data Management

SQLite’s Serverless Architecture Doesn’t Serve IoT Environments – Part 2

Actian Corporation

June 11, 2020

Part Two: Rethinking What Client-Server Means for Edge Data Management

Over the past few weeks, our SQLite blog series has considered the performance deficiencies of SQLite when handling local persistent data and looked at the performance complications created by the need for ETL when sharing SQLite data with back-end databases. In our last installment—Mobile May Be IoT, but IoT Is Not Mobile—we started to understand why the SQLite serverless architecture doesn’t serve IoT environments very well. SQLite’s standing as the most popular database on the planet rests on the fact that it was inexpensive (read: free) and seemingly sufficient for the single-user embedded applications emerging on mobile smartphones and tablets.

That was yesterday. Tomorrow is a very different story.

The IoT is expanding at an explosive rate, and what’s happening at the edge—in terms of applications, analytics, processing demands, and throughput—will make the world of single-user SQLite deployments seem quaint. As we’ll see in this and the next installment of this blog, the data requirements for modern edge use cases lie far outside SQLite’s wheelhouse.

SQLite Design-Ins for the IoT: Putting the Wrong Foot Forward

As we’ve noted, SQLite is based on an elegant but simple B-tree architecture. It can store any type of data, is implemented in C, and has a very small footprint—a few hundred KB—which makes it portable to virtually any environment with minimal resources. And while it’s not fully ANSI-standard SQL, it’s close enough for horseshoes, hand grenades, and mobile applications.

For all these reasons, and because it has been used ubiquitously as mobile devices have proliferated over the past decade, IoT developers naturally adopted SQLite into many early IoT applications. These early design-ins were almost mirror images of mobile applications (minus the need for much effort at the presentation layer). Data was captured and cached on the device, with the expectation that it would be moved to the cloud for data processing and analytics.

But that expectation was simply an extrapolation of the mobile world that we knew, and it was shortsighted. It didn’t consider how much processing power could be packed into an ever-smaller CPU package nor where those packages might end up. It didn’t envision the edge as a locus for analytics (wasn’t that the domain of the cloud and the data center?). It didn’t envision the true power of AI and ML and the role those would soon begin to play throughout the IoT. And it didn’t count on the sheer volume of data that would soon be washing through the networks like a virtual tsunami.

Have you been to an IoT trade show recently? Three to five years ago, many of the sessions described PoCs and small pilots in which all data was sent up into the cloud. Engineers and developers we spoke to on the trade show floor expressed skepticism about the need for anything more than SQLite. Some even questioned the need for a database at all (let alone databases that were consistent across clients and servers). In the last three years, though, the common theme of the sessions has changed. They now center on scaling up pilots to full production and infusing ML routines into local devices and gateways. The conversations have begun to consider more robust local data management needs. Discussions of client-server configurations (OMG!) have appeared, in hushed tones at first. The realization that the IoT is not the same as mobile is beginning to sink in.

Rethinking Square Pegs and Round Holes

Of course, the rationale for not using a client-server database in an IoT environment (or, for that matter, any embedded environment) made perfect sense—as long as the client-server model you were eschewing was the enterprise client-server model that had been in use since the ‘80s. In that client-server paradigm, databases were designed for the data center. They were built to run on big iron and to support enterprise applications like ERP, with tens, hundreds, even thousands of concurrent users interacting from barely sentient machines. Collect these databases, add in sophisticated management overlays, an army of DBAs, maybe an outside systems integrator, and steep them in millions of dollars of investment monies — and soon you’ve got yourself a nice little enterprise data warehouse.

That’s not something you’re going to squeeze into an embedded application. Square peg, round hole. And that explains why developers and line-of-business technical staff tended to announce that they had pressing business elsewhere whenever the words “client-server” began to pop up in conversations about the IoT. The use cases emerging in what we began to think of as the IoT were not human end-user centric. Unless someone were prototyping or doing some sort of test and maintenance on a device or gateway or some complex instrumentation, little or no ad hoc querying was taking place. Client-server was serious overkill.

In short, given a very limited set of use cases, limited budgets, and an awareness of the cost and complexity of traditional client-server database environments, relying on SQLite made perfect sense.

Reimagining Client-Server With the IoT in Mind

The dynamics of modern edge data management demand that we reframe our notions of client-server, for the demands of the IoT differ from those of distributed computing as envisioned in the ’80s. The old client-server paradigm involved a great deal of ad hoc database interaction—both directly through ad hoc queries and indirectly through applications that involved human end-users. In IoT use cases, data access is more prescribed, often repeated and event-driven; you know exactly which data needs to be accessed, as well as when (or at least under which circumstances) an event will generate the request.
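
A hedged sketch makes the contrast concrete (the schema, table, and handler below are hypothetical): rather than fielding arbitrary ad hoc queries, an IoT node runs the same parameterized statement every time a known event fires.

    import sqlite3

    conn = sqlite3.connect("telemetry.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, ts REAL, value REAL)")

    # The access pattern is fixed at design time: one parameterized query,
    # executed whenever a threshold event arrives. Nothing ad hoc about it.
    RECENT = "SELECT ts, value FROM readings WHERE sensor = ? AND ts >= ?"

    def on_threshold_event(sensor_id: str, since_ts: float):
        # Hypothetical event handler: which data is read, and when,
        # are prescribed by the use case rather than by an end user.
        return conn.execute(RECENT, (sensor_id, since_ts)).fetchall()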

Similarly, in a given IoT use case there are no unknowns about how many applications are running on a device or about how many external devices will be requesting data from (or sending data to) an application and its database pairing (and here, whether the database is embedded or standalone doesn’t really matter). While these numbers vary among use cases and deployments, a virtual team of developers, systems integrators, product managers, and others will design structure, repeatability, and visibility into the system—even if it’s stateless (and more so if it’s stateful).

In the modern IoT space, client-server database requirements look more like well-defined publish-and-subscribe relationships: the publisher posts data that the subscriber reads, and the publisher can also access and write directly to the subscriber. These are automated machine-to-machine relationships, in which publishing/broadcasting and parallel multichannel intake activities often take place concurrently. Indeed, client-server in the IoT is like publish-subscribe—except that every node needs to perform both operations, and most complex devices (including gateways and intelligent equipment) will need to do so not just simultaneously but also across parallel channels.

Let me repeat that for emphasis: most complex IoT devices (read: pretty much anything other than a simple sensor) are going to need to read and write simultaneously.

SQLite cannot do this.
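
You can see the limitation in a few lines with Python’s bundled sqlite3 module (the file name and table are illustrative): SQLite allows only one write transaction at a time, even in WAL mode, so a second connection attempting to write is simply turned away.

    import sqlite3, os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "locked.db")
    a = sqlite3.connect(path, isolation_level=None, timeout=0.1)
    b = sqlite3.connect(path, isolation_level=None, timeout=0.1)
    a.execute("CREATE TABLE t (v INTEGER)")

    a.execute("BEGIN IMMEDIATE")      # connection A takes the single write lock
    a.execute("INSERT INTO t VALUES (1)")
    try:
        b.execute("BEGIN IMMEDIATE")  # connection B tries to write concurrently
    except sqlite3.OperationalError as e:
        print(e)                      # "database is locked" (SQLITE_BUSY)
    a.commit()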

Traditional client-server databases can, but they were not designed with a small footprint in mind. Most cloud and data center client-server databases require hundreds of megabytes, even gigabytes, of storage space. However, the core functions needed to handle simultaneous reads and writes efficiently take up far less space. The Actian Zen edge database, for example, has a footprint of less than 50MB. And while this is 100x the installed footprint of SQLite, it’s merely a sliver of the space attached to the 64-bit ARM and Intel embedded processor-based platforms we see today. Moreover, within that footprint Actian Zen edge provides all the resources necessary for multi-user management, integration with external applications through ODBC and other standards, security management, and other functionality that is a must once you jump from serverless to client-server. A serverless database like SQLite does not provide those services because the need for them—like the edge itself—was simply not envisioned at the time.

If we look at the difference between Actian Zen edge and Actian Zen enterprise (with its footprint under 200MB), we can see that most of the difference has to do with human end-user enablement. For example, Actian Zen enterprise includes an SQL editor that enables ad hoc queries and other data management operations from a command line. While most of that same functionality resides in Zen edge, it is accessed and executed through API calls from an application rather than a CLI.

But Does Every IoT Edge Scenario Need a Server?

Those of you who have been following closely will now sit up and say, “Hey, wait: didn’t you say that not every IoT edge data management scenario needs a client-server architecture?”

Yes, I did. Props to you for paying attention. Not all scenarios do—but that’s not really the question you should be asking. The salient question is: do you really want to master one architecture, implementation, and vendor solution for those serverless use cases and separate architectures, implementations, and vendor solutions for the edge, cloud, and data center? And from which direction do you approach this question?

Historically, the vast majority of data architects and developers have approached this question from the bottom up. That’s why we started with flat files and then moved to SQLite. Rather than continuing to look from the bottom up, I’m arguing that we need to step back, embrace a new understanding of what client-server can be, and then revisit the question from the top down. Don’t just try to force-fit serverless into a world for which it was never intended—or worse, kludge up from serverless to a jury-rigged imitation of a late-20th-century client-server configuration.

That way madness lies, as we’ll see in the final installment of this series, where we’ll look at what happens if developers decide to use SQLite anyway.

Ready to reconsider SQLite? Learn more about Actian Zen. Or, you can just kick the tires for free with Zen Core, which is royalty-free for development and distribution.

About Actian Corporation

Actian empowers enterprises to confidently manage and govern data at scale. Actian data intelligence solutions help streamline complex data environments and accelerate the delivery of AI-ready data. Designed to be flexible, Actian solutions integrate seamlessly and perform reliably across on-premises, cloud, and hybrid environments. Learn more about Actian, the data division of HCLSoftware, at actian.com.

Data Management

SQLite’s Serverless Architecture Doesn’t Serve IoT Environments – Part 1

Actian Corporation

June 11, 2020

Part One: Mobile May Be IoT—But When It Comes to Data, IoT Is Not Mobile

Three weeks ago, we looked at the raw performance—or the lack thereof—of SQLite. After that, we looked at SQLite within the broader context of modern edge data management and discovered that its performance shortcomings were in fact compounded by the demands of the environment. As a serverless database, SQLite requires integration with a server-based database—which inevitably incurs a performance hit as the SQLite data is transformed through an ETL process for compatibility with the server-based database’s architecture.
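
As a rough illustration of that integration hop (the schema is invented, and server_cur stands in for a cursor from whatever DB-API driver the server database provides; placeholder syntax varies by driver), every batch must be read out of SQLite, reshaped, and reinserted on the server side:

    import sqlite3

    def etl_batch(sqlite_path, server_cur, batch=500):
        """Drain locally cached SQLite rows into a server-side database.
        The conversion step below is where the ETL cost is paid."""
        src = sqlite3.connect(sqlite_path)
        rows = src.execute("SELECT ts, sensor, value FROM readings")
        while True:
            chunk = rows.fetchmany(batch)
            if not chunk:
                break
            # Reshape SQLite's loosely typed values for the target schema.
            converted = [(float(ts), str(s), float(v)) for ts, s, v in chunk]
            server_cur.executemany(
                "INSERT INTO readings (ts, sensor, value) VALUES (?, ?, ?)",
                converted)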

SQLite partisans might then adopt a snarky tone and say: “Yeah? Well, if SQLite is so slow and integration is so burdensome, can you remind me why it is the most ubiquitous database out there?”

Well, yeah, we can. And in the same breath, we can provide even partisans with ample reason to doubt that the popularity of SQLite will continue going forward. Spoiler alert: What do the overall growth curves of the IoT look like outside the realm of mobile handsets and tablets?

How the Banana Slug Won the Race

In the first blog in this series, we looked at why embedded developers adopted SQLite over both simple file management systems on one end of the data management spectrum and large, complex RDBMS systems on the other. The key technical reasons, just to recap, include its small footprint; its ability to be embedded in an application; its simple key-value store architecture; its portability to almost any operating system and programming language; and its ability to deliver standard data management functionality through an SQL API. The key non-technical reason—okay, reason—is that, well, it’s free! SQLite thrived in use cases dominated by personal applications that needed built-in data management (including developer tools), web applications that needed a data cache, and mobile applications that needed something with a very small footprint. Combine free with these technical characteristics, consider where and how SQLite has been deployed, and it’s no surprise that, in terms of raw numbers, SQLite found itself more widely deployed than any other database.

What all three of the aforementioned use cases have in common, though, is that they are single-user scenarios in which all of a user’s data can be stored in a single database file (and in SQLite, the file is the database). Demand for data in these use cases generally involves serial reads and writes; there’s little likelihood of concurrent reads, let alone concurrent writes. In fact, it wasn’t until later iterations of SQLite that the product’s developers even felt the need to enable simultaneous reads alongside a single write.
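
That later addition is write-ahead logging (WAL mode, introduced in SQLite 3.7.0). A short sketch shows what it does and does not buy you: a reader can proceed alongside an open write transaction, but there is still only ever one writer.

    import sqlite3, os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "wal.db")
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # readers no longer block on writes
    writer.execute("CREATE TABLE readings (ts REAL, value REAL)")

    writer.execute("BEGIN")
    writer.execute("INSERT INTO readings VALUES (1.0, 42.0)")

    # A reader proceeds while the write transaction is still open, but it
    # sees only the last committed snapshot.
    reader = sqlite3.connect(path)
    print(reader.execute("SELECT COUNT(*) FROM readings").fetchone())  # (0,)

    writer.commit()
    print(reader.execute("SELECT COUNT(*) FROM readings").fetchone())  # (1,)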

But here’s the thing: going forward, those three use cases are not going to be the ones driving the key architectural decisions. Ironically, the characteristics of SQLite that made it so popular among developers helped give rise to a world in which billions of devices are acting, reacting, and interacting in real time—at the edge, in the cloud, and in the data center—and that’s a world for which those same characteristics are singularly ill-suited.

SQLite has essentially worked itself out of a role in the realm of modern edge data management.

As we’ve mentioned earlier, SQLite is based on an elegant but simple key-value store architecture that enables you to store any type of data. It is implemented in C and has a very small footprint—a few hundred KBs—making it portable to virtually any environment with minimal resourcing. And, while it’s not fully ANSI-standard SQL, it’s close enough for horseshoes, hand grenades, and mobile applications.

SQLite was adopted in many early IoT applications because those early design-ins were almost mirror images of mobile applications (minus the need for much effort at the presentation layer): data was cached locally with the expectation that it would be moved to the cloud for processing and analytics. Pilot projects done on the cheap meant designers and developers knee-jerked to what they knew and what was free – ta-dah, SQLite!

Independent of SQLite, the IoT market and its use cases have rapidly moved off this initial trajectory. The proof is readily apparent if you’ve had the opportunity to attend IoT trade shows over the last few years. Three to five years ago, many of the sessions described proof of concepts (PoCs) and small pilots in which all data was sent up into the cloud. When we spoke to engineers and developers on the trade show floor, they were skeptical about the need for anything more than SQLite, or about whether a database was needed at all – let alone a client-server version. In the last three years, however, more of the sessions have centered on scaling up pilots to full production and on infusing ML routines into local devices and gateways. Many more of the conversations have turned to more robust local data management, including client-server options.

Intelligent IoT Is Redefining Edge Data Management

For all its strengths in the single-user application space, SQLite and its serverless architecture are unequal to the demands of autonomous vehicles, smart agriculture, medical instrumentation, and other industrial IoT spaces. The same is true with regard to the horizontal spaces occupied by key industrial IoT components, such as IoT gateways, 5G networking gear, and so forth. Unlike single-user applications designed to support human-to-machine requirements, innumerable IoT applications are being built for machine-to-machine relationships occurring in highly automated environments. Modern machine-to-machine scenarios involve far fewer one-to-one relationships and a far greater number of peer-to-peer and hierarchical relationships (including one-to-many and many-to-one subscription and publication scenarios), all of which have far more complex data management requirements than those for which SQLite was built. Moreover, as CPU power has migrated out of the data center into the cloud and now out to the edge, a far wider array of systems are performing complex software-defined operations, data processing, and analytics than ever before. Processing demands are becoming both far more sophisticated and far more local.

Consider: Tomorrow’s IoT sensor grids will run the gamut from low-speed, low-resolution structured data feeds (capturing tens of thousands of pressure, volume, and temperature readings, for example) to high-speed, high-resolution video feeds from hundreds of streaming UHD cameras. In a chemical processing plant, both sensor grids could be flowing into one or more IoT gateways that, in turn, could flow into a network of edge systems (each with the power one would only have found in a data center a few years ago) for local processing and analysis, after which some or all of the data and analytical information would be passed on to a network of servers in the cloud.

Dive deeper: The raw data streams flowing in from these grids would need to be read and processed in parallel. These activities could involve immediately discarding spurious data points, running signal-to-noise filters, normalizing data, or fusing data from multiple sensors, to name just a few of the obvious data processing functions. Some of the data would be stored as it arrived—either temporarily or permanently, as the use case demanded—while other data might be discarded.
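
As a toy stand-in for the first two of those steps (the window values and thresholds are made up), a despike-then-normalize pass over one sensor window might look like this:

    from statistics import median

    def despike(window, k=5.0):
        # Median-absolute-deviation filter: drop points that sit far from
        # the window median, a crude screen for spurious readings.
        med = median(window)
        mad = median(abs(x - med) for x in window) or 1e-9
        return [x for x in window if abs(x - med) <= k * mad]

    def normalize(window):
        # Min-max normalization onto [0, 1].
        lo, hi = min(window), max(window)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in window]

    raw = [20.1, 20.3, 19.9, 250.0, 20.2]  # one spurious spike
    print(normalize(despike(raw)))         # the 250.0 reading is gone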

A World of Increasing Complexity

Throughout these scenarios we see far more complex operations taking place at every level, including ML inference routines being run locally on devices, at the gateway level, or both. There may be additional operations running in parallel on these same datasets—including downstream device monitoring and management operations, which effectively create new data streams moving in the opposite direction (e.g., reads from the IoT gateway and writes down the hierarchical ladder). Or data could be extracted simultaneously for reporting and analysis by business analysts and data scientists in the cloud or data center. In an environment such as the chemical plant we have envisioned, there may also be more advanced analytics and visualization activities performed at, say, a local operations center.

These scenarios are both increasingly commonplace and wholly unlike the scenarios that propelled SQLite to prominence. They are combinatorial and additive; they present a world of processing and data management demands that is as far from the single-user, single-application world—the sweet spot for SQLite—as one can possibly get:

  • Concurrent writes are a requirement, and not just to a single file or data table—with response times between write requests of as little as a few milliseconds.
  • Multiple applications will be reading and writing data to the same data tables (or joining them) in IoT gateways and other edge devices, requiring the same kind of sophisticated orchestration that would be required with multiple concurrent users.
  • On-premises edge systems may have local human oversight of operations, and those activities will add further complexity to the orchestration of multiple activities reading and writing to the databases and data tables.

If all of this sounds like an environment for which SQLite is inadequately prepared, you’re right. In parts two and three of this blog series, we’ll delve into these issues further.

Ready to reconsider SQLite? Learn more about Actian Zen. Or, you can just kick the tires for free with Zen Core, which is royalty-free for development and distribution.

Data Intelligence

Build Your Citizen Data Scientist Team

Actian Corporation

June 8, 2020

“There aren’t enough expert data scientists to meet data science and machine learning demands, hence the emergence of citizen data scientists. Data and analytics leaders must empower ‘citizens’ to scale efforts, or risk failure to secure data science as a core competency.” – Gartner, 2019

As data science provides competitive advantages for organizations, the demand for expert data scientists is at an all-time high. However, the supply remains scarce relative to that demand. This shortage is a threat to enterprises’ competitiveness and, in some cases, their survival in the market.

In response to this challenge, an important analytical role providing a bridge between data scientists and business functions was born: the citizen data scientist.

What Is a Citizen Data Scientist?

Gartner defines the citizen data scientist as “an emerging set of capabilities and practices that allows users to extract predictive and prescriptive insights from data while not requiring them to be as skilled and technically sophisticated as expert data scientists.” A citizen data scientist is not a job title; rather, citizen data scientists are “power users” who can perform both simple and sophisticated analytical tasks.

Typically, citizen data scientists don’t have coding expertise but can nevertheless build models using drag-and-drop tools and run prebuilt data pipelines and models using tools such as Dataiku. Be aware: citizen data scientists do NOT replace expert data scientists. They bring valuable domain knowledge but lack the specialized skills required for advanced data science.

The citizen data scientist is a role that has evolved as an “extension” of other roles within the organization. This means that organizations must develop a citizen data scientist persona. Potential citizen data scientists will vary based on their skills and interest in data science and machine learning. Roles that filter into the citizen data scientist category include:

  • Business Analysts.
  • BI Analysts/Developers.
  • Data Analysts.
  • Data Engineers.
  • Application Developers.
  • Business Line Managers.

How to Empower Citizen Data Scientists

As expert skills for data science initiatives tend to be quite expensive and difficult to come by, utilizing a citizen data scientist can be an effective way to close the current gap.

Here are ways you can empower your data science teams:

Break Enterprise Silos

As I’m sure you’ve heard many times before, many organizations tend to operate independently, in silos. All of the roles mentioned above are important to an organization’s data management strategy, and all of them have expressed interest in learning data science and machine learning skills. However, most data science and machine learning knowledge is siloed in the data science department or in specific roles. As a result, data science efforts often go unvalidated and unleveraged. This lack of collaboration between data roles makes it difficult for citizen data scientists to access and understand enterprise data.

Establishing a community of both business and IT roles that provides detailed guidelines and resources allows enterprises to empower citizen data scientists. It is important for organizations to encourage the sharing of data science efforts throughout the organization and thus break silos.

Provide Augmented Data Analytics Technology

Technology is fueling the rise of the citizen data scientist. Traditional BI vendors, such as SAP, Microsoft, and Tableau Software, provide advanced statistical and predictive analytics as part of their offerings. Meanwhile, data science and machine learning platforms, such as SAS, H2O.ai, and TIBCO Software, provide users who lack advanced analytics skills with “augmented analytics.” Augmented analytics leverages automated machine learning to transform how analytics content is developed, consumed, and shared. It includes:

Augmented data preparation: Uses machine learning automation to augment data profiling, data quality, modeling, enrichment, and data cataloging.

Augmented data discovery: Enables business and IT users to automatically find, visualize, and analyze relevant information, such as correlations, clusters, segments, and predictions, without having to build models or write algorithms.

Augmented data science and machine learning: Automates key aspects of advanced analytics modeling, such as feature selection, algorithm selection, and other time-consuming steps of the modeling process.

By incorporating the necessary tools and solutions and extending resources and efforts, enterprises can empower citizen data scientists.

Empower Citizen Data Scientists With a Metadata Management Platform

Metadata management is an essential discipline for enterprises wishing to bolster innovation or regulatory compliance initiatives on their data assets. By implementing a metadata management strategy, where metadata is well-managed and correctly documented, citizen data scientists are able to easily find and retrieve relevant information from an intuitive platform.
