The Challenge
The client is a technology company focused on the management and use of digital imagery and media. The company wanted to enhance its core product with a feature that converts entities extracted from unstructured documents, such as PDFs and video transcripts, into a knowledge graph: a network of entity descriptions that puts data into context and enables analytics and sharing. This required a graph-structured data model to represent and operate on the data, as well as relationships between the extracted entities that could then be charted and displayed.
The goal of this development effort was to increase the value of extracted data by providing a structured visualization of relationships and dependencies. Such a visualization can be crucial for deeper analysis and business intelligence applications, offering a significant value-add for end users of the company’s core product.
The Solution
The Persistent solution used large language models, including Google’s Gemini 1.0 Pro foundation model and OpenAI’s GPT-4 model offered through Microsoft, to automate the ingestion of PDFs and video transcripts into Neo4j. Persistent also used its NOVA accelerator to provide a multi-agent framework that sped up solution development. The language models handle document chunking, entity extraction, and the production of embeddings, i.e., numerical vector representations of text, images, and audio that AI and machine learning systems use to understand complex knowledge domains.
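The ingestion flow described above (chunk the document, extract entities, write them to the graph) can be illustrated with a minimal sketch. It is not the client implementation: the `Chunk` class, the `chunk_text`, `extract_entities`, and `build_rows` helpers, the `Chunk`/`Entity` labels, and the `MENTIONS` relationship type are all hypothetical names chosen for illustration, and a simple vocabulary match stands in for the language-model extraction call. The Cypher text is only built here, not executed; in practice it would be passed with its `rows` parameter to a Neo4j driver session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int
    text: str


def chunk_text(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(doc_id, index, text[start:start + size]))
        if start + size >= len(text):
            break  # the window already reaches the end of the text
    return chunks


def extract_entities(text: str, vocabulary: set[str]) -> list[str]:
    """Hypothetical stand-in for the LLM extraction step: case-insensitive
    lookup of known entity names inside the chunk text."""
    lowered = text.lower()
    return sorted(name for name in vocabulary if name.lower() in lowered)


# Assumed graph schema: (:Chunk)-[:MENTIONS]->(:Entity).
# MERGE keeps re-ingestion of the same document from creating duplicates.
MERGE_MENTIONS = """
UNWIND $rows AS row
MERGE (c:Chunk {doc_id: row.doc_id, index: row.index})
SET c.text = row.text
MERGE (e:Entity {name: row.entity})
MERGE (c)-[:MENTIONS]->(e)
"""


def build_rows(chunks: list[Chunk], vocabulary: set[str]) -> list[dict]:
    """Produce one parameter row per (chunk, entity) pair for MERGE_MENTIONS.
    Chunks that mention no known entity produce no rows."""
    rows = []
    for chunk in chunks:
        for entity in extract_entities(chunk.text, vocabulary):
            rows.append({
                "doc_id": chunk.doc_id,
                "index": chunk.index,
                "text": chunk.text,
                "entity": entity,
            })
    return rows


if __name__ == "__main__":
    sample = "Acme acquired Globex in 2020."
    rows = build_rows(chunk_text("doc-1", sample), {"Acme", "Globex", "Initech"})
    for row in rows:
        print(row["index"], row["entity"])
```

Overlapping windows are one common way to avoid cutting an entity mention in half at a chunk boundary; the window and overlap sizes shown are arbitrary.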
The target repository, Neo4j, is a native graph database for data science and analytics with enterprise-grade security. Neo4j data elements include nodes, the relationships (edges) connecting them, and properties (attributes) of both nodes and relationships. Combining these technologies yields a robust, scalable system capable of semantic search and complex data analysis. Furthermore, built-in support for regression testing helps keep the solution reliable and adaptable as data needs evolve.
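To make the data model and the semantic search idea concrete, the following is a small in-memory sketch under stated assumptions. The `Node` and `Relationship` classes and the `neighbors` and `semantic_search` functions are hypothetical illustrations, not Neo4j APIs, and the two-dimensional embeddings are toy values. Semantic search is shown as ranking nodes by cosine similarity between a query embedding and stored node embeddings; a production system would typically delegate this to the database rather than scan in Python.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)
    embedding: list[float] = field(default_factory=list)


@dataclass
class Relationship:
    start: str      # node_id of the source node
    rel_type: str
    end: str        # node_id of the target node
    properties: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(nodes: list[Node], query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the ids and scores of the top_k nodes most similar to the query."""
    scored = [(n.node_id, cosine_similarity(n.embedding, query)) for n in nodes if n.embedding]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def neighbors(relationships: list[Relationship], node_id: str) -> list[tuple[str, str]]:
    """List (relationship type, target id) pairs for outgoing relationships."""
    return [(r.rel_type, r.end) for r in relationships if r.start == node_id]


if __name__ == "__main__":
    nodes = [
        Node("a", "Entity", {"name": "Acme"}, [1.0, 0.0]),
        Node("b", "Entity", {"name": "Globex"}, [0.0, 1.0]),
        Node("c", "Entity", {"name": "Initech"}, [1.0, 1.0]),
    ]
    rels = [Relationship("a", "ACQUIRED", "b", {"year": 2020})]
    print(semantic_search(nodes, [1.0, 0.0], top_k=2))
    print(neighbors(rels, "a"))
```

A regression test for a system like this can be as simple as asserting that a fixed query embedding keeps returning the same top-ranked node ids after the extraction prompts or models change.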
The Outcome
The knowledge graph solution proved remarkable for both its speed of execution and its measurable benefits. Initially, the Persistent consulting and development teams progressed from art-of-the-possible workshops and frameworks to an ROI analysis for a proof of concept in just eight weeks. Within six months, the production solution was fully built, quality checked, and deployed.
Benefits to the client included a ~70% reduction in manual effort, a ~50% increase in data volume, and ~40% cost savings. Given the rapid performance gains seen in Google’s Gemini models and related tools, further improvements, capabilities, and benefits for the client are likely.
Persistent recently announced a Strategic Partnership Agreement with Google Cloud to jointly deliver market-leading solutions that help enterprises maximize the ROI from their cloud investments while modernizing their infrastructure and data stack. Contact us today for more details.