Exploring LLMs: Data Science Practices and Their Impact on the Development Lifecycle
Ahead of The AI Summit London next month, summit chair Bradley Shimmin, Omdia’s chief AI analyst, talked to Mathis Luka, Head of Product at deepset, about how enterprises can get ready to implement LLMs to improve and optimise their current practices and processes.
We had seen data platforms cosy up to data science tools before the ChatGPT craze. Those efforts, like building Jupyter notebooks into data warehouses, seem like child's play right now compared with the emerging demand for companies to incorporate LLMs into their data and data science practices.
From an architectural and best practices viewpoint, how should enterprises "gear up" their existing NLP efforts to take full advantage of LLMs quickly and safely?
The biggest change is that the [ML/NLP] technology is no longer the obstacle. Previously, teams had to invest a lot of time and effort into getting a model to do what they needed it to do to deliver a product. As a result, many teams never actually built a product, let alone shipped anything; instead, they got stuck in an endless cycle of experimentation and research.
Now, with large language models, these teams must adapt to become teams that can ship applications, build products, and deliver something that is useful to their customers.
This will happen faster than anything we've seen before. A typical data science team that might once have spent six to 12 months delivering a proof of concept will now be expected to ship that proof of concept to real users within a week or two, and then release a product within two to three months.
It's also no longer just the model you need to consider; there are many different components to mix and match. You must adapt the application's behaviour to your users’ needs, and [product] teams lack a space in which to do this.
For example:
- Software development typically revolves around ‘places’ such as GitHub, where developers communicate about their code, create issues, and review code.
- Product management and design revolve around places such as Figma, where designs are discussed.
However, for a product team that wants to ship an LLM application, that place, that platform, doesn't exist, because legacy platforms focused on data engineering are not optimised for this. Teams need to get used to operating much faster and shipping often, to getting early feedback from end users about how the model behaves, and to finding a place, a platform, where they can do that work.
It's easy to get swept up in the hype of LLMs right now, particularly in their seemingly emergent capability to take on new, unexpected tasks without IT intervention. Getting models to take on new tasks has been a considerable pain point for practitioners rolling out new NLP functionality in the past.
What advice would you give practitioners just starting their NLP journey who are under the impression that all they need to do is put an LLM in front of a search engine to future-proof enterprise search?
If you want to improve enterprise search, I don't think large language models are the solution.
Large language models provide a new natural-language interface to search. Answers that sound like they come from a real human are a new kind of user interface that people will come to expect. This will be common in consumer apps within a few years, and enterprises must adapt to it.
For the actual search, the hard problem is retrieving the right information to feed into the language model so that it can generate a human-like answer. Vector-based search helps, but it takes a lot of work to make it good for your use case. Honestly, I haven't seen any major enterprise search product where the only missing piece is that new language interface.
We’re not looking at a solved ‘search’ problem just because LLMs might be in use now. It's always a combination of different things. You need information retrieval and vector search in place, and then you need the large language model to provide your users with the experience they expect.
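To make that division of labour concrete, here is a minimal, self-contained sketch of the retrieve-then-generate pattern: a retrieval step ranks documents against the question, and only the best matches are packed into the prompt an LLM would receive. It assumes a toy bag-of-words vector with cosine similarity as a stand-in for a real embedding model and vector database, and it stops at building the prompt rather than calling any model. The function names and sample documents are hypothetical; this is not Haystack's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """The 'search' part: rank documents by similarity to the query, keep the top_k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """The 'language interface' part: the grounded prompt an LLM would receive."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "Expense reports must be submitted within 30 days.",
        "The VPN client is required for remote access.",
        "Travel bookings go through the internal travel portal.",
    ]
    question = "How many days do I have to submit expense reports?"
    print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

The point of the sketch is that answer quality depends mostly on `retrieve`: if the wrong documents come back, the model will phrase a fluent answer from the wrong context.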
With generative AI on the rise, and with it concerns over the value versus the risk of releasing LLMs as open-source software, how do you characterize the "openness" of Haystack, deepset’s NLP framework, and how should enterprise practitioners who are averse to the risks surrounding generative AI and open source view it?
Haystack is an open-source framework for building large language model applications, and as such it will always remain open and free. It’s permissively licensed, so you can use it for commercial projects as well as for research.
ChatGPT is proprietary and ‘closed source,’ and OpenAI doesn’t give much information about how its new model is trained. However, a wave of open-source models is emerging, like Alpaca, Dolly, and LLaMA, trained by a great community of researchers, start-ups, and enterprises, and what we are seeing is the start of the commoditisation of large language models.
In a few years, LLMs will not be the differentiator; they will be commoditised, something you can buy or get for free as open source. The task will be to work out how to apply an LLM to your specific problem and how to mix and match the components to build the application you want. That’s what Haystack does.
With Haystack we are model-agnostic and provider-agnostic; you can take all of these components and simply swap them out. For example, you can start with a model from OpenAI and, if a suitable open-source model is released, switch to it. For enterprises trying to manage dependency risks this is ideal: they are not locked into a single model provider and can switch to an alternative to remain competitive. To manage this kind of risk, my advice is to pick a software architecture that allows you to experiment with different models and switch components easily.
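One common way to get the "switch components easily" property is to have application code depend on a small interface rather than on a specific provider. The sketch below illustrates that idea with Python's `typing.Protocol`; `Generator`, `HostedModelStub`, `OpenSourceModelStub` and `QAPipeline` are hypothetical names, the stubs make no real API or inference calls, and none of this reflects Haystack's actual classes.

```python
from dataclasses import dataclass
from typing import Protocol


class Generator(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class HostedModelStub:
    """Stand-in for a proprietary, API-hosted model (no real API call is made)."""

    name: str = "hosted-model"

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"


@dataclass
class OpenSourceModelStub:
    """Stand-in for a self-hosted open-source model (no real inference is run)."""

    name: str = "open-model"

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"


class QAPipeline:
    """Application logic depends only on the Generator interface, not on a provider."""

    def __init__(self, generator: Generator) -> None:
        self.generator = generator

    def swap_generator(self, generator: Generator) -> None:
        # Replacing the model is a one-line change; the rest of the pipeline is untouched.
        self.generator = generator

    def run(self, question: str) -> str:
        return self.generator.generate(question.strip())


if __name__ == "__main__":
    pipeline = QAPipeline(HostedModelStub())
    print(pipeline.run("What is our refund policy?"))
    pipeline.swap_generator(OpenSourceModelStub())
    print(pipeline.run("What is our refund policy?"))
```

Because `Protocol` uses structural typing, a new provider only has to implement `generate`; it does not need to inherit from anything in the application's codebase.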
Do you see the emerging "skill" of prompt engineering playing a crucial role in how companies go about using LLMs for their many and varied NLP tasks?
There are a few different ways of adapting an NLP model to your specific task. When starting off with a ‘foundation model,’ you have to understand that it has been trained on a lot of text but not on a specific task. It’s very good at ‘understanding’ language, but it might not understand the task you are giving it. To get the model to perform the task, it would usually need to be ‘fine-tuned’ using training data, which is often not an option for proprietary LLMs. Prompt engineering is a different way of adapting a hosted model to the task you want it to perform. Large language models like GPT-4 can respond to instructions written in natural language, and the way you phrase these instructions has a big effect on how well the model performs the task you give it.
Prompt engineering is becoming an essential skill, and it is a bit different from traditional software engineering. For enterprises with an existing team of NLP engineers, it’s just another skill for them to learn.
If you’re leading a data team through the world of LLMs, Mathis’ session on the Practitioners Stage on 14 June 2023 will walk you through what the changes in the application development lifecycle mean day to day. Make sure to add ‘The Changing Role of Enterprise Data Teams in the World of LLMs’ to your agenda.