Leaf in the Wild: KPMG France Enters the Cloud Era with New MongoDB Data Lake
Love it or loathe it, the term “big data” continues to gain awareness and adoption in every industry. No longer just the preserve of internet companies, traditional businesses are innovating with “big data” applications in ways that were unimaginable just a few years ago.
A great example of this is KPMG France’s deployment of a MongoDB-based data lake to support its accounting suite, Loop, and the release of its industry-first financial benchmarking service, which enables KPMG France customers to unlock new levels of insight into how each of their businesses is really performing. In the true spirit of big data, this application would have overwhelmed the capabilities of traditional data management technologies. I spoke with Christian Taltas, Managing Director of KPMG France Technologies Services, to learn more.
Can you start by telling us about KPMG France?
KPMG is one of the world’s largest professional services firms, operating as independent businesses in 155 countries, with 174,000 staff. KPMG provides audit, tax and advisory services used by corporations, governments and not-for-profit organizations.
KPMG France provides accounting services to 65,000 customers. I am the managing director of KPMG Technologies Services (KTS), a software company subsidiary of KPMG France. KTS developed Loop, a complete collaborative accounting solution which is used by KPMG France’s Certified Public Accountants (CPAs) and their clients.
Please describe how you use MongoDB.
MongoDB is the database powering the Loop accounting suite, used by KPMG’s 4,800 CPAs. The suite is also currently used in collaboration with around 2,000 of KPMG’s customers. We are expecting more than 20,000 customers to adopt Loop’s collaborative accounting within the next 18 months.
What services does MongoDB provide for the accounting suite?
It serves multiple functions for the suite:
Data Lake: All raw accounting data from our customers’ business systems, such as sales data, invoices, bank statements, cash transactions, expenses, payroll and so on, is ingested from Microsoft SQL Server into MongoDB. This data is then accessible to our CPAs to generate the customer’s KPIs. A unique capability we have developed for our customers is financial benchmarking. We can use the data in the MongoDB data lake to allow our customers to benchmark their financial performance against competitors operating in the same industries within a specified geographic region. They can compare salary levels, expenses, margin, marketing costs – in fact almost any financial metric – to help determine their overall market competitiveness against other companies operating in the same industries, regions and markets. The MongoDB data lake enables us to manage large volumes of structured, semi-structured, and unstructured data, against which we can run both ad-hoc and predefined queries supporting advanced analytics and business intelligence dashboards. We are continuously loading new data to the data lake, while simultaneously supporting thousands of concurrent users.
Metadata Management: Another unique feature of our accounting suite is the ability to customize reporting for each customer, based on specific criteria they want to track. For example, a restaurant chain will be interested in different metrics than a construction company. We enable this customization by creating a unique schema for each customer, which is inherited from a standard business application schema and then written to MongoDB. MongoDB stores the schema classes for each customer, which are then applied at run time when accounts and reports are generated. The Loop application has been designed as a business framework that generates reports in real time, running on top of Node.js. MongoDB is helping us manage the entire application codebase in order to deliver the right schemas and application business modules to each user depending on their role and profile, for example bookkeeper, CPA or sales executive. It is a very powerful feature enabled by the flexibility of the MongoDB document data model, one that we could not have implemented with the constraints imposed by a traditional relational data model.
Caching Layer: The user experience is critical, so we use MongoDB as a high-speed layer to manage user authentication and sessions.
Logging Layer: We also use MongoDB to store the millions of client requests the Loop application receives each day. This enables us to build Tableau reports on top of the logs to troubleshoot production performance issues for each user session, and for each of the 220 regional KPMG sites spread across France. We are using the MongoDB Connector for BI to generate these reports in Tableau.
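To make the metadata management idea above more concrete, here is a minimal sketch of how a per-customer schema document might extend a standard one. It is an illustration only, not Loop's actual implementation: the field names (`metrics`, `labels`, `inherits`), the sample documents and the `resolve_schema` helper are all hypothetical.

```python
from copy import deepcopy

# Hypothetical standard business schema shared by every customer.
BASE_SCHEMA = {
    "_id": "standard",
    "metrics": ["revenue", "expenses", "margin"],
    "labels": {"revenue": "Revenue", "expenses": "Expenses", "margin": "Margin"},
}

# Hypothetical per-customer document: it stores only what differs from the base.
RESTAURANT_SCHEMA = {
    "_id": "customer-42",
    "inherits": "standard",
    "metrics": ["food_cost_ratio"],
    "labels": {"food_cost_ratio": "Food cost ratio", "revenue": "Sales"},
}


def resolve_schema(base, custom):
    """Return the effective schema: base fields extended by customer overrides."""
    resolved = deepcopy(base)  # never mutate the shared base schema
    resolved["_id"] = custom["_id"]
    # Append customer-specific metrics, keeping base order and skipping duplicates.
    for metric in custom.get("metrics", []):
        if metric not in resolved["metrics"]:
            resolved["metrics"].append(metric)
    # Customer labels take precedence over the standard ones.
    resolved["labels"].update(custom.get("labels", {}))
    return resolved
```

Because documents in the same collection need not share a fixed set of fields, each customer's overrides can be stored as-is and merged with the base schema at run time, which is the kind of flexibility the document model offers here.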
Why did you choose MongoDB?
When we started development back in 2012, we knew we needed schema flexibility to handle the massive variances in data structures the accounting suite would need to store and process. This requirement disqualified traditional relational databases from handling the caching, metadata management and KPI benchmarking computation. As we explored different NoSQL options, we were concerned that we’d over-complicate our architecture by running separate caches and databases. However, in performance testing MongoDB offered the flexibility and scalability to serve both use cases. It outperformed the other NoSQL databases and dedicated caches we tested, and so we took the decision to build our platform around MongoDB.
As we were developing our new financial benchmarking service last year, we evaluated Microsoft’s Azure Cosmos DB (at the time this was called DocumentDB), but MongoDB offered much richer query and indexing functionality. We also considered building the benchmarking analytics on Hadoop, but the architecture of MongoDB, coupled with the power of the aggregation pipeline, gave us a much simpler solution while delivering the data lake functionality we needed. Aggregation enhancements delivered in MongoDB 3.2, especially the introduction of the $lookup operator, were key to our technology decisions.
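For readers unfamiliar with the aggregation pipeline, the sketch below shows roughly what a benchmarking query using $lookup could look like. It is an assumption-laden illustration, not KPMG's actual query: the collection names (`companies`, `financials`), the field names and the `build_benchmark_pipeline` helper are hypothetical. The function only builds the pipeline as plain Python data, so it runs without a database.

```python
def build_benchmark_pipeline(industry, region, metric):
    """Build an aggregation pipeline averaging one metric across peer companies."""
    return [
        # Keep only companies in the requested industry and region.
        {"$match": {"industry": industry, "region": region}},
        # Join each company to its financial records ($lookup, MongoDB 3.2+).
        # "financials" and "company_id" are hypothetical names.
        {"$lookup": {
            "from": "financials",
            "localField": "_id",
            "foreignField": "company_id",
            "as": "records",
        }},
        # Emit one document per joined financial record.
        {"$unwind": "$records"},
        # Average the chosen metric over all peers in the group.
        {"$group": {
            "_id": {"industry": "$industry", "region": "$region"},
            "peer_average": {"$avg": "$records." + metric},
            "companies": {"$addToSet": "$_id"},
        }},
    ]
```

With a driver such as PyMongo, the result could be passed to something like `db.companies.aggregate(build_benchmark_pipeline("restaurants", "Bretagne", "payroll"))`. Note that in MongoDB 3.2 the `from` collection of a $lookup stage must be unsharded.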
Can you describe what your MongoDB deployment looks like?
Both the caching layer and metadata management are run on dedicated three-node replica sets. This gives the accounting suite fault resilience to ensure always-on availability. The metadata is largely read only, while the caching layer serves a mixed read / write workload.
The data lake is deployed as a sharded cluster, handling large batch loads of data from clients’ business systems while concurrently serving complex analytics queries and reporting to the CPAs.
We are running MongoDB on Windows instances in the Microsoft Azure cloud, after migrating from our own data center. We needed to ensure we could meet the scalability demands of the app, and the cloud is a better place to do that, rather than investing in our own infrastructure.
How do you support and manage your deployment?
We use MongoDB's fully managed database, MongoDB Atlas, and have access to 24x7 proactive support from MongoDB engineers. We have also recently used the Production Readiness package from MongoDB consulting services.
The combination of the cloud database service, professional services, and technical support is proving invaluable:
- The MongoDB consultants reviewed our operational processes and Azure deployment plans, from which they were able to provide guidance and best practices to execute the migration without interruption to the business. They also helped us create an operations playbook to institutionalize best practices going forward.
- MongoDB Atlas automated the configuration and provisioning of MongoDB instances onto Azure, and we rely on it now to handle ongoing upgrades and maintenance. A few simple clicks in the UI eliminate the need for us to develop our own configuration management scripts.
- MongoDB Atlas also provides high-resolution telemetry on the health of our MongoDB databases, enabling us to proactively address any issues before they impact the CPAs.
- Data integrity is obviously key to our business, and so Atlas is invaluable in providing continuous backups of our data lake. We evaluated managing backup ourselves, but ultimately it was much more cost effective for MongoDB to manage it for us as part of the fully managed backup service available through Atlas.
As part of your migration to Azure, you also migrated to the latest MongoDB 3.2 release. Can you share the results of that upgrade?
One word – scalability. With MongoDB 3.2 now using WiredTiger as its default storage engine, we can achieve much higher throughput and scalability on a lower hardware footprint.
The accounting suite supports almost 7,000 internal and external customers today, with half of them connecting for an average of 5 hours every working day. But we plan to roll it out to 20,000 customers over the next 18 months. We’ve been able to load test the suite against our development cluster, and MongoDB has scaled to 5x the current sessions, analytics and data volumes with no issues at all. WiredTiger’s document level concurrency control and storage compression are key to these results.
What future plans do you have for the Loop accounting suite?
We want to automate more of the benchmarking, and enable further data exploration to build predictive analytics models for our customers. This will enable us to provide benchmarks against historic data, as well as to evaluate likely future business outcomes. We plan on using the Azure Machine Learning framework against our MongoDB data lake.
How are you measuring the impact of MongoDB on your business?
The accounting suite’s financial benchmarking service is a highly innovative application that provides KPMG France with significant competitive advantage. We have access to a lot of customer information which becomes actionable with our data lake built on MongoDB. It allows us to store that data cost effectively, while supporting rich analytics to give insights that other accounting practices just can’t match.
Christian, thanks for taking the time to share your story with the MongoDB community.
About the Author - Mat Keep
Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.