Many organizations have started using DevOps practices and tools for data warehouse and data lake setups. Data Analysts and Database Managers can follow DevOps practices to manage updates and new database releases across various environments in a uniform, repeatable fashion. Just as application teams create and manage CI/CD pipelines for their applications, the data those applications consume can have its own release pipeline, managed by the database teams. In many cases, cloud-based data warehousing platforms can also host the applications that consume this data, all within the same environment.
Applications that consume data housed in a data warehouse may also leverage Kafka or other messaging and streaming platforms to keep that data fresh and achieve low-latency access to it. As you release updates to your applications, you may also need to account for updates to the service bus layer and the database layer, which makes continuous integration and continuous deployment all the more important. Data teams that institute DevOps practices and tools for data warehousing can promote an agile culture within their silos. This includes the process of fetching or discovering the data for data warehousing, the process of making sure it is current and accurate for the consuming applications, and the process of organizing it for data mining and analysis.
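As a minimal sketch of how such a streaming layer might feed the warehouse, the snippet below publishes change events with the open-source kafka-python client so downstream loaders can pick them up quickly. The broker address, topic name, and event fields are illustrative assumptions, not part of any specific platform discussed here.

```python
# Sketch: publishing change events to Kafka so downstream warehouse
# loaders can consume them with low latency.
# Assumes the kafka-python package and a locally reachable broker;
# the topic name and event fields are hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"table": "orders", "op": "update", "id": 12345}   # illustrative payload
producer.send("warehouse-change-events", value=event)      # hypothetical topic
producer.flush()
```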
You can apply DevOps practices and policies to data automation, just as you would to infrastructure automation. This ranges from self-service models for requesting new data instances, to requesting updates and other data lifecycle steps. Many organizations have built entire data platforms on containers. For infrastructure and database teams, it is imperative to provide data “as-a-service” with measured and tracked SLAs and costs, whether these services are provided on container platforms or otherwise. Public cloud platforms have made it easy for consumers to leverage SaaS data warehousing solutions. Using DevOps practices does not have to be limited to providing the underlying infrastructure or service; it can also be applied to building reports. Jenkins automation can be used to release database updates, integration tools can be used to fetch the relevant data from multiple sources to populate the target systems, and open-source tools like Grafana can be used for dashboards. The primary objective of such a setup is to capture data from various components and locations within the environment into a centralized location via ETL, and to process that data to produce business intelligence.
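As one hedged illustration of the "release database updates" step, the sketch below applies versioned SQL migration files in order and records each one in a tracking table; a Jenkins stage could simply invoke a script like this. It uses Python's standard-library sqlite3 driver only to stay self-contained, and the file layout, table name, and paths are assumptions rather than a specific vendor tool.

```python
# Sketch of a database-release step a Jenkins job might call.
# Applies SQL files from ./migrations in name order, once each,
# recording applied versions in a schema_migrations table.
# sqlite3 keeps the example self-contained; a real warehouse would
# use its own driver. Paths and names are hypothetical.
import pathlib
import sqlite3

MIGRATIONS_DIR = pathlib.Path("migrations")  # assumed layout: 001_init.sql, 002_add_col.sql, ...
DB_PATH = "warehouse.db"                     # placeholder target database

def apply_pending_migrations() -> None:
    conn = sqlite3.connect(DB_PATH)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
        )
        applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
        for script in sorted(MIGRATIONS_DIR.glob("*.sql")):
            if script.name in applied:
                continue  # already released to this environment
            conn.executescript(script.read_text())
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (script.name,))
            conn.commit()
            print(f"applied {script.name}")
    finally:
        conn.close()

if __name__ == "__main__":
    apply_pending_migrations()
```

In practice, the same script would run against development, test, and production targets from successive pipeline stages, which is what makes the releases uniform and repeatable.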
When bringing data in from multiple sources for data warehousing, the exercise of data mapping, reconciliation, and sanitization usually takes the most time and effort upfront. Architectural considerations also include monitoring the data warehouse components, as well as the data within them. Data processing engines like Hadoop MapReduce or Spark, along with the database serving platforms, form the core components of any data warehouse setup. By implementing a best-practices architecture and tuning specifically for your environment, you can optimize your data warehouse setup to strike a balance between performance and cost.
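To make the data mapping and reconciliation step concrete, here is a minimal PySpark sketch (Spark being one of the processing engines mentioned above). The source paths, column names, and join key are hypothetical placeholders, not drawn from any real environment.

```python
# Sketch: mapping and reconciling two source extracts with PySpark
# before loading them into the warehouse. File paths, column names,
# and the join key are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("warehouse-reconciliation").getOrCreate()

# Two extracts of customer data with different column naming conventions.
crm = spark.read.csv("s3://raw/crm/customers.csv", header=True)
billing = spark.read.csv("s3://raw/billing/accounts.csv", header=True)

# Data mapping: align the billing columns to the warehouse schema.
billing = (
    billing.withColumnRenamed("acct_id", "customer_id")
           .withColumnRenamed("acct_email", "email")
)

# Sanitization: normalize emails and drop obvious duplicates.
crm = crm.withColumn("email", F.lower(F.trim(F.col("email")))).dropDuplicates(["customer_id"])
billing = billing.withColumn("email", F.lower(F.trim(F.col("email")))).dropDuplicates(["customer_id"])

# Reconciliation: flag records where the two sources disagree on email.
joined = crm.alias("c").join(billing.alias("b"), "customer_id", "full_outer")
mismatches = joined.filter(F.col("c.email") != F.col("b.email"))
print(f"records needing manual reconciliation: {mismatches.count()}")
```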
Various industry use cases, such as fraud prevention in banking, storing health records and doctors' notes in healthcare, customer profiling in retail, and real-time streaming in media, have already leveraged the benefits of data lakes for capturing and storing unstructured data and of data warehousing for structured data. With the adoption of blockchain technologies, the relevance of Big Data is only anticipated to grow. Most enterprises depend heavily on applications for their business, and have therefore adopted agile processes for application releases. Consuming Big Data with an emphasis on extracting relevant, accurate data at the right time is paramount for business-critical applications. The adoption of DevOps practices and tools for data warehousing within data teams is still in its nascent stage, but it is being picked up by more and more data experts every day.
If you need assistance with data warehousing to move your disparate data from various sources, or need help assessing the feasibility of a data warehouse platform without substantially affecting your business-critical applications, Keyva can help. Associates at Keyva have worked with many different organizations across various verticals on data migration and application modernization projects. These include creating a data migration factory, creating ETL strategies with data mapping, refactoring existing applications, adding a wrapper over current applications so they can be consumed easily by DevOps processes, modifying existing applications to consume data from SaaS platforms, and more.
If you’d like to have us review your environment and provide suggestions on what might work for you, please contact us at [email protected].
Anuj joined Keyva from Tech Data, where he was the Director of Automation Solutions. In this role, he specializes in developing and delivering vendor-agnostic solutions that avoid the “rip-and-replace” of existing IT investments. Tuli has worked on Cloud Automation, DevOps, Cloud Readiness Assessment, and Migration projects for healthcare, banking, ISP, telecommunications, government, and other sectors.
During his previous years at Avnet, Seamless Technologies, and other organizations, he held multiple roles in the Cloud and Automation areas. Most recently, he led the development and management of Cloud Automation IP (intellectual property) and related professional services. He holds certifications for AWS, VMware, HPE, BMC and ITIL, and offers a hands-on perspective on these technologies.
Like what you read? Follow Anuj on LinkedIn at https://www.linkedin.com/in/anujtuli/