If you’re a data engineer or a data architect, you’re probably kept awake at night wondering how to design your data integration, management and analytics support platforms so that your DBAs and IT Ops colleagues can easily manage them while a varied set of data users consume them simultaneously. These users range from new, demanding, hands-on users, such as developers and data scientists, to those who traditionally work with data through SQL and ad hoc queries, such as business analysts. Still others interact with the data only indirectly, through business applications.
Not everyone in your company is a data scientist and, given how scarce they are, you’d be in quite a small company if they made up the majority of your staff. At the risk of over-generalizing the role, data scientists tend to need data that supports designing and training algorithms, which are then deployed downstream, embedded in other applications and used by other people. Data scientists often need large and varied data sets. That data seldom needs to be real-time, but freshness is a paramount requirement as they iterate on training their models.
Application developers, like data scientists, tend to interact with their data through programming APIs. The data sets they operate on tend to be smaller, or time-series and real-time, and are embedded directly in the business process rather than informing it, as is often the case with a data scientist’s work. Business analysts have different needs again, and for end users the goal is to make the data invisible to their operations, even if it is integral and essential to those operations. The point is that designers of data systems must make data available to several different groups that don’t share the same skill sets, roles and responsibilities or interest levels when it comes to data.
What mandate does this place on data engineers and data architects? Simple: make data usable by people of varied skill levels, so that each of them can consume what they need, when they need it and in the form that is most useful to them. Okay, maybe not so simple. If you narrowly cater to each of these constituencies, how do you avoid siloed data sets managed by bespoke systems?
Understand your user community and how it is using data
Everyone within your company has a unique set of data needs, both in terms of the type of data and tools he or she uses and in terms of what counts as effective use of that data. For example, some users may need access to very specific data sets to perform a focused task, while others need big-picture data for planning and strategic decision making. Some users will need detailed raw data, while others need curated dashboards, reports and visualizations. In many cases, the same user fits each of these scenarios, but during different phases of a project. In other cases, these scenarios use the same data in different forms, manipulated in different ways or combined with other data sets.
Making your data users successful depends on understanding their skill levels and the tools and data sets they will need. For example, in addition to the data sets described above, data scientists tend to spend a lot of time preparing data and either hand-coding algorithms or using AI and ML libraries such as TensorFlow. Business analysts, by contrast, are more inclined to use SQL for reporting and to apply popular BI and visualization tools to the query results. Power users on the business side, such as your finance and planning staff, may be able to handle simpler queries but are most comfortable manipulating spreadsheets. Each of these users has a unique set of needs, not just for the data but also for the tools that define how they leverage data to do their jobs. You can learn more about the range of Actian data management solutions on the Actian website.