Many organizations have started applying DevOps practices within their data warehousing and data lake setups. Data Analysts and Database Managers can follow DevOps methodologies for managing updates and new database releases across various environments in a uniform fashion, to produce repeatable results. Just like application teams create and manage the CI/CD pipeline for applications, the data that these applications consume can have its own release pipeline that is managed by the database teams. In many cases, cloud based data warehouse platforms provide the ability to host the applications that consume this data, all within the same environment.
Applications that consume data housed in a data warehouse may also leverage Kafka or another solutions to achieve low latency query performance. As you release updates to your applications, you may also need to account for the updates to the service bus layer and the database layer. Continuous deployment and continuous integration becomes all the more important. Data teams can institute tools and philosophies that promote an agile culture within their silos. This includes the process of fetching or discovering the data to be put in the data warehouse, the process of making sure it is current and accurate for the consuming applications, and the process of organizing it for reporting purposes.
You can apply DevOps methods and policies to data automation (just like infrastructure automation). Starting from self-service models to request new data instances, to requesting updates, and other data lifecycle steps. There are many organizations that have built entire data platforms on containers. For infrastructure and database teams, it is imperative to provide data “as-a-service” with measured and tracked SLAs and costs – whether these services are provided on container platforms or otherwise. Public cloud platforms have made it easy for consumers to leverage SaaS data warehousing solutions. Using DevOps models do not have to be limited to providing the underlying infrastructure or service, but can also be applied to the building of reports. Jenkins automation can be used to release database updates, integration tools can be used to fetch the relevant data from multiple sources to populate the target systems, and opensource tools like Grafana can be used for dashboards. Primary objective of such a setup would be to capture data from various components and locations within the environment to a centralized location via ETL, and process that data to produce business intelligence.
When bringing data in from multiple sources, the exercise of data mapping and data reconciliation and sanitization usually take the most time and effort upfront. Architectural considerations also include the paradigm of monitoring the data warehouse components, as well as the data within it. Data processing engines like Hadoop MapReduce or Spark, along with the database serving platforms form the core components of any data warehouse setup. By implementing the best practices architecture, and tuning specifically for your environment, you can optimize your data warehouse setup to achieve a balance between performance and cost.
Various industry use cases like fraud prevention in banking, storing health records and doctors notes in healthcare, customer profiling for retail, real time streaming in media, and others, have already leveraged the benefits provided by data lakes for capturing and storing unstructured data, and data warehousing for structured data. With the adoption of blockchain technologies, the relevance of Big Data is only anticipated to grow. Most enterprises depend heavily on applications for their business, and thereby have adopted agile processes for application releases. Combining the consumption of Big Data with emphasis on extracting relevant and accurate data at the right time, is paramount for business critical applications. The adoption of DevOps within data teams is still in its nascent stage, but is being picked up by more and more data experts every day.
If you need assistance in moving your disparate data from various sources in to a data warehouse, or need help assessing the feasibility of a data warehouse platform without substantially affecting your business critical applications, Keyva can help. Associates at Keyva have worked with many different organizations in various verticals to help in data migration and application modernization projects. These include things like creating a data migration factory, creating ETL strategies with data mapping, refactoring existing applications, adding a wrapper over current applications so they can be consumed easily by DevOps processes, modifying existing applications to consume data from SaaS platforms, and more.
If you’d like to have us review your environment and provide suggestions on what might work for you, please contact us at email@example.com.
Anuj joined Keyva from Tech Data where he was the Director of Automation Solutions. In this role, he specializes in developing and delivering vendor-agnostic solutions that avoid the “rip-and-replace” of existing IT investments. Tuli has worked on Cloud Automation, DevOps, Cloud Readiness Assessments and Migrations projects for healthcare, banking, ISP, telecommunications, government and other sectors.
During his previous years at Avnet, Seamless Technologies, and other organizations, he held multiple roles in the Cloud and Automation areas. Most recently, he led the development and management of Cloud Automation IP (intellectual property) and related professional services. He holds certifications for AWS, VMware, HPE, BMC and ITIL, and offers a hands-on perspective on these technologies.
Like what you read? Follow Anuj on LinkedIn at https://www.linkedin.com/in/anujtuli/