Optimizing hierarchical data storage in MongoDB
Vertex Systems have always offered Master’s thesis opportunities to university students to support their academic and professional qualifications. More than 40 Master’s theses have been completed for Vertex over the years. The research results are implemented in our product development to provide cutting-edge software solutions for our customers.
In the “My Master’s Thesis Journey” blog series, our young professionals talk about their Master’s theses and what they have learned and accomplished along the way.
Background
I joined Vertex in the summer of 2023 as a summer trainee and later continued part-time during my studies, eventually becoming part of the cloud development team working on the backend of Vertex Sync. Working alongside my colleagues, I quickly became familiar with how the service handled customer projects and how it behaved under different workloads. Over time, we noticed that some operations were putting heavy demand on the system, affecting both performance and the costs of running the service reliably.
MongoDB has been one of the most resource-intensive parts of our cloud infrastructure. To make sure the service could handle peak operations without disruption, we had to run a high-capacity tier, which meant paying for resources that were only occasionally needed. At the same time, as the product continued to grow, we were aware that the system would need to handle more usage and heavier workloads in the future. These operational challenges were a shared concern for the team and sparked discussions about how we could improve both performance and scalability.
With guidance from my coworkers and our CTO, we identified this as a suitable topic for my master’s thesis. The goal was to study the underlying issues more systematically and explore solutions that could benefit the team and the product, not just from a technical perspective but also in terms of cost efficiency and future growth.
Addressing the Issue
The work began by analyzing how the existing data structures and access patterns affected performance and scalability in Vertex Sync. We looked at how the system used MongoDB in practice and identified the operations that caused the most strain under normal workloads. Based on these insights, I designed and implemented an alternative schema in a prototype version, aiming to restructure how hierarchical data was stored to reduce unnecessary data volume and make queries more efficient. To evaluate whether this redesign would translate into real benefits, I created a set of performance tests, comparing the original and optimized versions under load. The goal was to determine whether schema-level improvements could meaningfully ease the resource demands and increase the system’s capacity for heavier workloads.
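To give a rough idea of how a schema-level change can affect the volume of hierarchical data, here is a minimal Python sketch. It is not the actual Vertex Sync schema: the sample tree, the field names, and the helper functions are all made up for illustration, and the size of the JSON encoding is used only as a stand-in for the stored BSON size. The sketch compares two general ways of modeling trees in a document database, one where every node repeats its full ancestor chain and one where every node stores only a reference to its parent.

```python
import json

# Hypothetical project tree as (node_id, parent_id, name).
# Illustrative data only, not the Vertex Sync schema.
NODES = [
    ("root", None, "Project"),
    ("a", "root", "Assembly A"),
    ("a1", "a", "Part A1"),
    ("a2", "a", "Part A2"),
    ("b", "root", "Assembly B"),
]


def with_ancestor_arrays(nodes):
    """Build documents in which each node repeats its full ancestor chain."""
    parent_of = {node_id: parent for node_id, parent, _ in nodes}
    docs = []
    for node_id, parent, name in nodes:
        ancestors = []
        current = parent
        while current is not None:  # walk up to the root
            ancestors.append(current)
            current = parent_of[current]
        ancestors.reverse()  # store the chain root-first
        docs.append(
            {"_id": node_id, "name": name, "parent": parent, "ancestors": ancestors}
        )
    return docs


def with_parent_references(nodes):
    """Build documents in which each node stores only its direct parent."""
    return [
        {"_id": node_id, "name": name, "parent": parent}
        for node_id, parent, name in nodes
    ]


def total_bytes(docs):
    """Approximate stored size using the UTF-8 JSON encoding (a proxy for BSON)."""
    return sum(len(json.dumps(doc).encode("utf-8")) for doc in docs)


def descendants(docs, root_id):
    """Collect all descendants of root_id by following parent references."""
    children = {}
    for doc in docs:
        children.setdefault(doc["parent"], []).append(doc["_id"])
    result = []
    stack = list(children.get(root_id, []))
    while stack:
        node_id = stack.pop()
        result.append(node_id)
        stack.extend(children.get(node_id, []))
    return result


if __name__ == "__main__":
    wide = with_ancestor_arrays(NODES)
    slim = with_parent_references(NODES)
    print("ancestor arrays:  ", total_bytes(wide), "bytes")
    print("parent references:", total_bytes(slim), "bytes")
    print("descendants of 'a':", sorted(descendants(slim, "a")))
```

The trade-off in this toy example is that the slimmer documents make subtree lookups more work: descendants have to be found by traversal (in MongoDB, for instance, with the `$graphLookup` aggregation stage) rather than by a single match on an ancestors field. A smaller schema can therefore shift the cost from storage to access patterns.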
The testing showed that the optimized schema did deliver measurable improvements. In several cases, data volume decreased, and certain operations handled larger project structures than before without immediately exceeding resource limits. At the same time, the improvements remained modest when viewed on the scale of typical use. Some query latencies improved, and resource utilization dropped, but these changes alone were not sufficient to deliver a noticeably better user experience or a significantly lower operational cost. Equally important, the tests revealed a deeper issue. Once the schema inefficiencies were addressed, the dominant limitation shifted from data structure to access patterns. In other words, the way the service logic read and wrote data did not fully align with MongoDB’s strengths. This pointed to a more fundamental mismatch between the service architecture and the database model than initially expected.
Conclusion
While schema-level optimizations did not solve all performance and scalability challenges on their own, the thesis clarified where the system’s true boundaries lie. The work provided Vertex with important insights into how the data model, access patterns, and infrastructure costs interrelate and what kinds of gains are realistically achievable within the current architecture. It also became clear that some long-term challenges cannot be mitigated by schema changes alone. Depending on how the product evolves, the next steps may involve rethinking access patterns to better fit MongoDB or considering an alternative database model that better supports the service logic.
For me personally, this thesis deepened my understanding of Vertex Sync and how its different layers interact. I have stayed at Vertex as a developer and continued working on the next steps, expanding the focus from the internal data schema toward broader data flow and service interfaces. The findings from the thesis have laid a stronger foundation for that work and help guide future decisions with a clearer understanding of what works and what does not.