Inside MySQL: Sakila Speaks
In this episode, leFred and Scott welcome Jayant Sharma and Sanjay Jinturkar to the Sakila Studio for an insightful conversation on machine learning and generative AI within HeatWave. Discover how these cutting-edge technologies are integrated, what makes HeatWave unique, and how organizations can leverage its capabilities to unlock new possibilities in data and AI. Tune in for practical insights, real-world use cases, and a closer look at the future of analytics.
------------------------------------------------------------
Episode Transcript:
00:00:00:00 - 00:00:32:01
Welcome to Inside MySQL: Sakila Speaks. A podcast dedicated to all things MySQL. We bring you the latest news from the MySQL team, MySQL project updates and insightful interviews with members of the MySQL community. Sit back and enjoy as your hosts bring you the latest updates on your favorite open source database. Let's get started!
00:00:32:03 - 00:00:54:17
Hello and welcome to Sakila Speaks, the podcast dedicated to MySQL. I am leFred. And I'm Scott Stroz. Today, for the second episode of season three, dedicated to AI, I am pleased to welcome Sanjay Jinturkar. Sorry if I pronounce it badly. No, you did it right. Hi there. Thank you. So Sanjay is a senior director at Oracle based in New Jersey.
00:00:54:19 - 00:01:21:13
He leads product development for HeatWave AutoML and GenAI, with a strong focus on integrating these technologies directly into the HeatWave database. Sanjay has been instrumental in enhancing HeatWave's machine learning and GenAI tool sets, enabling use cases like predictive maintenance, fraud detection, and intelligent document Q&A. And we also have a second guest today.
00:01:21:13 - 00:01:48:21
It's Jayant Sharma. Hi, Jayant. Hello. So Jayant Sharma is a senior director of product management at Oracle. He has over 20 years of experience in databases, spatial analytics, and application development. He's currently focused on the product strategy and design of the HeatWave MySQL managed services offering. Hey Fred. Thank you, both of you, for joining us today. So I'm going to dive right in with a question for Jayant.
00:01:48:23 - 00:02:12:14
Why did Oracle decide to integrate machine learning and generative AI capabilities directly into HeatWave? Thank you, Scott, first for this opportunity. And yes, we have to start by talking about MySQL, right? MySQL is the world's most popular open source database. And what do all of these customers, the thousands of customers that it has, do with it?
00:02:12:16 - 00:02:47:05
They manage a business process. They manage their enterprise, right? Their focus is on what they want to do and why they want to do it, and not so much the how. That's what MySQL makes easier. And HeatWave is a managed service on MySQL. So as folks are modernizing their applications, taking advantage of new technology, they want to be able to use new workloads, new analytics, and modernize their business processes, make them more efficient, make them more effective.
00:02:47:07 - 00:03:09:17
In order to do that, they want to do things such as machine learning and use the benefits of generative AI. However, what they want to focus on, as we said, is what they want and why they want to do it, not the how. So they don't want to have to think about: I have all of this data that's potentially a goldmine.
00:03:09:19 - 00:03:40:07
How do I extract nuggets from it, and how do I safely move and transfer it between best-of-breed tools? I want to be able to do things where my data is. I want to bring these new capabilities to my data. I don't want to take my data to where those capabilities are exposed, right? That is why we made it possible to do machine learning and GenAI where your gold mine is, where your data is: in MySQL HeatWave.
00:03:40:09 - 00:04:06:07
Awesome. Thank you. So I would like to ask you, Sanjay, then: how does the machine learning engine in HeatWave differ from using external machine learning pipelines with the data we have in the database? It differs in a couple of ways, specifically how the models are built, who builds them, and where they are built.
00:04:06:09 - 00:04:46:09
So we provide an automated pipeline which can take your data in the MySQL database or Lakehouse and then automatically generate the model for you. It does the usual tasks of pre-processing, hyperparameter optimization, data cleansing, and so on automatically, so that the user doesn't have to. We will even go ahead and generate explanations for you in certain use cases. Given that this is automated, a big side effect is that users don't need to be experts in machine learning.
00:04:46:11 - 00:05:16:08
What they need to focus on is their business problem, and how that business problem maps onto one of the features that we provide. From there onwards, the pipeline takes over and generates the models for it. And the third piece is that all of this work is done within HeatWave. We don't take the data out; going back to what Jayant was saying, we have brought machine learning and generative AI to where the data resides, not the other way around.
00:05:16:10 - 00:05:47:20
So we are building the models inside HeatWave, whereby the data is not taken out and thereby it is more secure, and the user does not have to worry about data leakage or track where they have taken the data and how many times they have done it. So these are the three key ways in which we differ. If you use one of the third-party solutions, they will end up asking you to do this on your own, or asking you to take the data out of the database and build the model on your machine, and so on.
00:05:47:22 - 00:06:21:06
But we have made it automated, easy to use, and very secure. So Sanjay, we're going to stay with you to keep talking about AutoML in HeatWave. What are some of the key features of AutoML, and how does it simplify model training and deployment for users? Fantastic question. As I said in the previous conversation, we are hitting the common tasks that are associated with model training and deployment.
00:06:21:08 - 00:06:46:03
So let's take training here. Typically, when the user has to train a model, they are going to take their data. They will clean it up and do some pre-processing. Then they will figure out which particular algorithm they should be using, tune those algorithms by doing hyperparameter tuning, and so on and so forth. All of these are individual tasks.
00:06:46:05 - 00:07:12:15
Our goal is to have the user focus on their business problem, and to take away the engineering piece of it, take away the technology piece of it, and do it automatically for them. So we have this pipeline which does all of it automatically in a single pass. It will do pre-processing. It's going to figure out the appropriate algorithm to use during model building.
00:07:12:17 - 00:07:39:05
It will figure out what the best set of hyperparameters is and what their values should be during the training process, and give you the model. So that's one part. The second part is that we provide an ability to deploy these models via REST interfaces. So once the model is trained, they can deploy it.
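The single-pass flow described above, try each candidate, score it, keep the winner, can be sketched in miniature. The toy "predictors" and data below are invented for illustration; HeatWave's real AutoML searches actual algorithm and hyperparameter spaces:

```python
# Toy sketch of automated model selection: score every candidate on a
# holdout sample in one pass and keep the winner, with no manual tuning.
from statistics import mean, median

def auto_select(train, holdout):
    """Return the best-scoring candidate name and all holdout errors."""
    candidates = {
        "mean_predictor": lambda xs: mean(xs),      # predicts the mean
        "median_predictor": lambda xs: median(xs),  # robust to outliers
    }

    def mse(prediction, data):
        return mean((x - prediction) ** 2 for x in data)

    scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# The outlier (100) skews the mean, so the robust candidate should win.
best, scores = auto_select(train=[1, 2, 3, 100], holdout=[2, 3, 4])
```

The point is the shape of the loop: candidates are evaluated automatically against held-out data, and the user never hand-tunes anything.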
00:07:39:07 - 00:08:09:09
And thirdly, from time to time the user's data is going to drift. What I mean by that is that the data on which the model was trained no longer reflects reality. In that case, you have to retrain the model. So we provide tools to measure that drift, and if it goes beyond a certain threshold, you can go ahead and retrain your model automatically.
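The drift check can be pictured as comparing fresh data against the distribution the model was trained on. This stdlib sketch, with an arbitrarily chosen statistic and threshold, shows only the shape of the idea, not HeatWave's actual drift metric:

```python
# Flag a retrain when live data drifts too far from the training distribution.
from statistics import mean, stdev

def needs_retrain(train_sample, live_sample, threshold=2.0):
    """True when the live mean sits more than `threshold` training
    standard deviations away from the training mean."""
    mu, sigma = mean(train_sample), stdev(train_sample)
    return abs(mean(live_sample) - mu) / sigma > threshold

train = [10, 11, 9, 10, 12, 10, 11]  # data the model was trained on
```

A scheduled job could run a check like this against fresh rows and kick off retraining whenever it returns True.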
00:08:09:11 - 00:08:53:01
So these are a couple of ways in which we have simplified model training and deployment for users. Thank you very much for this detailed answer. Now, we discussed the data not leaving for a third-party product, but I would like to ask Jayant if there are some performance improvements that users have seen by doing this ML natively in HeatWave, instead of moving the data to external platforms. Certainly, Fred.
00:08:53:03 - 00:09:24:01
So there are two aspects to this. There are efficiencies that result, and there are performance improvements, because of the way AutoML is implemented and how it works in HeatWave. Let's start with the efficiency first. The first thing, as Sanjay was saying, is that we've automated the pipeline. You only have to focus on what your business problem is and how that maps to a particular task in machine learning.
00:09:24:01 - 00:09:47:04
So for example: do I want to predict something, and therefore use regression? Do I want to identify or label something, and therefore use classification? And AutoML will figure out which particular algorithm to use. There are multiple ways in which you may do regression, for example; AutoML determines which particular one applies, or is best suited, to the task at hand.
00:09:47:04 - 00:10:15:06
So the efficiency there is that AutoML handles it in a single pass. The normal process requires you to iterate, to do things multiple times: try multiple algorithms, or different ways of solving the same problem, and then evaluate which one does it best. AutoML does this in a single pass, via very smart ways of sampling your data and running quick tests to identify the best approach.
00:10:15:08 - 00:10:35:15
So that's the efficiency. Second, when it does this, why is it so fast? It's so fast because it uses the full capability of the underlying infrastructure, which is the HeatWave nodes: the number of HeatWave nodes you've got, the size of those nodes. It does these things in parallel and fully utilizes the infrastructure.
00:10:35:17 - 00:11:02:22
So what is the benefit of that? You can do things a) faster and b) potentially cheaper, which gives you the luxury of trying multiple what if scenarios. Right. It's not a laborious process. It's more efficient. So if you know exactly what you want, you get it done faster. If you want to try multiple scenarios, you can do that faster and at a lower cost.
00:11:03:00 - 00:11:32:23
So those are the efficiency and performance enhancements that you get. Awesome. All right, let's switch gears a little bit. GenAI is one of the latest additions to HeatWave. What specific GenAI features are currently available, or in development, if you can talk about them? So, Scott, indeed GenAI has been one of the latest additions to our platform, and frankly it encompasses two separate components.
00:11:33:01 - 00:12:04:05
One is the customer usage part, and the second is the technology part. From a customer usage perspective, what people want to do is bring in their knowledge bases. By that I mean bring in PDF documents, PowerPoint documents, and so on, and ask questions of them, or get summaries of that text, or translate that text into another language, or develop a chatbot around it so that they can get answers, things of that nature.
00:12:04:07 - 00:12:36:19
So, keeping this in mind for an enterprise setting, we have developed the technology components which are needed to serve these needs, such that, going back to the earlier conversation, customers focus on their business needs and we provide the tools, the plumbing, to serve them. What we have done is provide a full pipeline to ingest their documents and create the knowledge base.
00:12:36:19 - 00:13:04:06
By that I mean: bring in your PDF documents, which will get converted into embeddings and stored in a vector store. We provide all of that. And then we provide ways to search this knowledge base and give answers to the users via easy-to-use APIs, like retrieval-augmented generation (RAG), or doing just semantic search over those documents, or doing summarization or translation.
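In miniature, that ingest-embed-search pipeline looks like the sketch below. The bag-of-words "embeddings" and two-document "vector store" are crude stand-ins invented for illustration; HeatWave GenAI uses learned embedding models and an in-database vector store:

```python
# Miniature ingest-embed-search flow: documents and the query become
# vectors, and the nearest document is retrieved as context for an answer.
from collections import Counter
from math import sqrt

def embed(text):
    """Bag-of-words 'embedding': a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_search(query, store):
    """Return the name of the stored document closest to the query."""
    q = embed(query)
    return max(store, key=lambda name: cosine(q, store[name]))

# "Ingest" two toy documents into the vector store.
store = {
    "refunds.pdf": embed("customers may request a refund within thirty days"),
    "shipping.pdf": embed("orders ship within two business days worldwide"),
}
hit = semantic_search("how do I get a refund", store)
```

In a RAG flow, the retrieved document would then be passed to an LLM along with the question to ground the generated answer.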
00:13:04:08 - 00:13:32:14
We also have the ability to support chat. Now, one very interesting thing that we have done is to provide users LLMs which run on commodity hardware. Jayant was talking about running this on HeatWave nodes. So we have provided these LLMs, running on commodity hardware, so that people can quickly prototype their application and test it out.
00:13:32:16 - 00:14:00:13
And if they like the results and the performance, they stick with it; or if they want higher performance, they go to our OCI GenAI services and use those LLMs. So: quick prototyping, quick testing, quick evaluation done using the LLMs running on commodity hardware, and then they can use the OCI GenAI LLMs to get high performance.
00:14:00:15 - 00:14:26:07
Now, going to your question about what newer things are coming: OCI is at the forefront of this revolution, and they are providing newer models, newer frameworks, and tools. We are continuously integrating MySQL HeatWave with their tools and technologies, so that we can provide the same to our customers in the coming weeks and months.
00:14:26:08 - 00:14:53:23
Yeah. And an example would be the agent framework, right? Integrating with the agent framework, integrating with the hosted frontier models on GPU infrastructure. So you develop your prototype, and you can choose to deploy. The integration is pre-existing; it's not an after-the-fact exercise. You use the same infrastructure, and we provide the pre-built integration with those AI services.
00:14:54:01 - 00:15:28:22
Excellent. So, since you were talking about using these LLMs on commodity hardware: which models are available among these LLMs, and how are we using them? So, as I mentioned earlier, we provide access to two sets of LLMs. One is the smaller-parameter LLMs, which run on commodity hardware, on the same cluster on which HeatWave runs.
00:15:29:00 - 00:16:00:12
And in that context, we provide models like Mistral 8-billion, or Llama 3-billion and 1-billion. We are continually upgrading these as newer models come into play, since this is a very fast-moving field. Also, once people have prototyped and gotten to see the results, if they want to switch over to OCI GenAI LLMs, then we provide access to the wide variety of LLMs that OCI provides.
00:16:00:14 - 00:16:31:11
Those include, you know, things like all the Llama models, and the upcoming models from all the other vendors; all of that is accessible through MySQL. And these can be used, as I said, for a variety of use cases, be it retrieval-augmented generation, chatbots, summarization, translation, and many other use cases that customers continue to think of.
00:16:31:11 - 00:16:57:06
And we are always amazed at the way they are using this technology. So, when we talked to Matt Quinn in our first episode, he briefly touched on security and privacy issues when it comes to GenAI, or to AI generally. And we all know that OCI's biggest concerns are, and always have been, security and privacy.
00:16:57:07 - 00:17:26:12
How is data privacy managed when training models or generating text using GenAI in HeatWave? Thank you for that, Scott. So security and data privacy are, and have always been, Oracle's primary concern. It is true for OCI, the infrastructure, which was built with security in mind. And it has been true of Oracle ever since its inception.
00:17:26:14 - 00:18:04:09
So that is core; that is in our DNA, including for MySQL and HeatWave. There are two primary ways in which this is a benefit here. Number one, the data doesn't move. Number two, when it does have to move, as in the case of text generation and so on, we only send the relevant snippets. Sanjay talked about the fact that you ingest the documents and then you create embeddings, which essentially means you look at various snippets and the model captures the semantics of them.
00:18:04:09 - 00:18:33:08
So that you can do a similarity search, right? Say you want to search for GenAI: you'll also get documents that talk about retrieval-augmented generation, even though you didn't ask about retrieval-augmented generation, because the embeddings know that these two things are related. So we extract the relevant content, and only that is then sent. For example, if you're using the in-database LLMs, nothing is sent anywhere; everything stays in your environment.
00:18:33:09 - 00:18:58:04
If you're using an OCI service, for example, only the relevant piece is sent, and you get the answer; and OCI is built, as you said, with security in mind, so the relevant piece that you sent is discarded after it has answered your question. Once again, your data doesn't leave the ecosystem. It stays within the Oracle infrastructure, your infrastructure. Great and secure, like we want.
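The privacy property described here, where only the relevant snippet ever leaves, can be sketched as follows. The word-overlap scoring is a deliberately crude stand-in for embedding similarity, and the corpus is invented:

```python
# Sketch of "only the relevant snippet leaves": the corpus stays in place,
# and just the best-matching snippet would be handed to a generation step.
def top_snippet(question, snippets):
    """Return the snippet sharing the most words with the question."""
    q = set(question.lower().split())

    def overlap(snippet):
        return len(q & set(snippet.lower().replace(".", "").split()))

    return max(snippets, key=overlap)

corpus = [
    "RAG retrieves relevant documents and augments the prompt before generation.",
    "Group replication keeps multiple MySQL servers consistent.",
]
# Only this one snippet, never the whole corpus, would be sent onward.
sent_to_llm = top_snippet("what is retrieval augmented generation", corpus)
```

The rest of the knowledge base never leaves the database environment; an external service would only ever see the single retrieved snippet.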
00:18:58:06 - 00:19:29:03
Exactly. So thank you very much. Now, the tricky question that we get every time we present anything new or compelling at conferences: I would like to ask if you could share with us a few compelling customer stories, or real-world applications of the ML engine and GenAI in HeatWave, because it's nice theoretically.
00:19:29:03 - 00:19:47:23
But are people really using it? And who are they, if you can say? I know we cannot always share names, but maybe some are public. So if you can tell us a bit about that, it would be great. Yeah, Fred, that's a very good question. And indeed, all of this makes sense only if customers are using it.
00:19:48:01 - 00:20:31:14
So, yeah, let me share a couple of them. In fact, let me share two of them: one on GenAI and one on AutoML. On GenAI, and this is a public reference, we have a customer, SmarterD. What they provide is a platform to track the compliance of a company's policy against the guidelines released by various bodies, for example the DoD. Now, in the past, what they have done is have an auditor compare the policy with respect to the guideline and then say whether a specific company is adhering to those guidelines or not, which is kind of
00:20:31:14 - 00:21:04:12
a manual process, fairly intensive, right? What they did was start using HeatWave GenAI: they stored both the policies and the guidelines in HeatWave GenAI's knowledge base, and then, using our RAG and our other capabilities, they compared the policy with the guidance automatically. The results of this are then given to the auditor.
00:21:04:14 - 00:21:28:15
So what does this do? It reduces the amount of time the auditor has to spend comparing the two. Instead, he or she gets a very short description of what has been done and what has not been done, whereby they can make a judgment on whether it is indeed adhering or not. So it makes the auditor far more efficient than before.
00:21:28:17 - 00:21:57:04
So that's a use case in the compliance space for HeatWave GenAI. Let me give you another example here. I can't quote the name, but this is in an industrial setting, where the customer wants to have early signals of failures of their computer servers. They are running a large amount of workloads.
00:21:57:07 - 00:22:39:20
And these workloads generate logs, and those logs sometimes are indicative of failures. So what they do is use our log anomaly detection tooling, with models that they have built, actually, and feed their logs continuously to these models. These models are trained to detect abnormal logs, so when a model notices abnormal logs, it's going to go ahead and give you an early warning that a particular system is likely to fail.
00:22:39:22 - 00:23:03:14
An example of that would be: hey, your CPU utilization is trending higher, or your memory utilization is going up, or the number of network connections is going up. All of this the model estimates, on a best-effort basis, from the data it was trained on earlier; then, in real time, it looks at the newer data and figures out: you know what, this is looking problematic.
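The early-warning idea behind that example can be reduced to a few lines: learn what "normal" looks like from historical readings, then flag live readings that sit far outside that band. The numbers here are invented; the customer's real models are trained with HeatWave AutoML on their own logs:

```python
# Learn a "normal" band from historical CPU readings, then flag live
# readings that sit far above it, as an early warning of likely failure.
from statistics import mean, stdev

def early_warnings(history, live, k=3.0):
    """Return live readings more than k standard deviations above normal."""
    mu, sigma = mean(history), stdev(history)
    return [x for x in live if x > mu + k * sigma]

history = [42, 45, 40, 44, 43, 41, 44]      # past CPU utilization (%)
warnings = early_warnings(history, [44, 46, 91])
```

The real models learn far richer patterns than a single threshold, but the payoff is the same: a warning before the failure, not a report after it.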
00:23:03:20 - 00:23:29:22
So let me give you an early warning. And that's what they are doing. So these are two use cases that two of our customers have developed and use. Great. Just to add to what Sanjay was saying on the second case: it's high-tech manufacturing. On their assembly line, if something goes wrong and the manufacturing process is interrupted, that's expensive.
00:23:30:00 - 00:23:52:13
Right. And the benefit here is that they get an early warning and they can mitigate, so they get a chance. They have monitoring systems, obviously, but the monitoring system tells you after the fact that something went wrong: go fix it. The log analytics, the predictive part, gives you an early warning so you can mitigate the likely impact.
00:23:52:15 - 00:24:21:14
Yeah, you can either prevent it or reduce the effect of something going wrong on the assembly line. That is really cool. So it's interesting, Sanjay, that you talked about the compliance issue, because when we talked to Matt in the last episode, that was actually an example that he used as well. But he was just talking generally, not about a specific customer or real-world application of it.
00:24:21:14 - 00:24:50:06
He was just throwing it out there, like: this is what you could do. So it's really cool that you brought that up. To wrap up, what advice would you give to MySQL users who want to start exploring AI with HeatWave? Simple: get started with the Always Free instance. So today, in case y'all aren't aware and folks aren't aware, you can get an Always Free instance in your own region in OCI.
00:24:50:06 - 00:25:14:15
You just, you know, sign up for an OCI account and create an Always Free instance, and that actually has the GenAI and AutoML capability. Now, to help you get started, we've got LiveLabs that walk you through scenarios. So we've got four or five LiveLabs: two on GenAI, two on ML, and a few on specific use cases.
00:25:14:15 - 00:25:45:16
For example, SailGP uses machine learning with MySQL to help their people perform better in a race. But more importantly, we also have a collection of videos, some of them by you, Scott, on, for example, what you can do with AutoML, what you can do with GenAI, what you can do with it in healthcare, what you can do with it in e-commerce, and so on.
00:25:45:16 - 00:26:07:00
And there's sample code on our GitHub site: sample notebooks, sample SQL code. Try them out on your Always Free instance, or, if you have access to a beefier instance because you have credits, try it out on that. And then, if you have a specific thing in mind, feel free to reach out to any one of us.
00:26:07:00 - 00:26:25:06
If you want help to do a prototype or a POC, we'd be more than happy to do that. Nice. Thank you very much. It was a pleasure to have you both here, Sanjay and Jayant. Thank you very much for participating in this podcast. So thank you, and bye bye. Bye bye. Thank you. Bye bye. Thank you.
00:26:25:06 - 00:26:53:23
Guys, that's a wrap on this episode of MySQL: Sakila Speaks. Thanks for hanging out with us. If you enjoyed listening, please click subscribe to get all the latest episodes. We would also love your reviews and ratings on your podcast app. Be sure to join us for the next episode of MySQL: Sakila Speaks.
Episode Transcript:
00:00:00:00 - 00:00:32:01
Unknown
Welcome to Inside MySQL: Sakila Speaks. A podcast dedicated to all things MySQL. We bring you the latest news from the MySQL team, MySQL project updates and insightful interviews with members of the MySQL community. Sit back and enjoy as your hosts bring you the latest updates on your favorite open source database. Let's get started!
00:00:32:03 - 00:00:54:17
Unknown
Hello and welcome to Sakila Speaks, the podcast dedicated to MySQL. I am leFred and I'm Scott Stroz. Today for the second episode of season three dedicated on AI. I am pleased to welcome Sanjay Jinturkar. Sorry if I pronounce it badly. No, you did it right. Hi there. Thank you. So Sanjay is the senior director at Oracle based in New Jersey.
00:00:54:19 - 00:01:21:13
Unknown
He leads product development for it with AutoML and GenAI with a strong focus on integrating these technologies directly into each HeatWave database. And Sanjay has been instrumental in enhancing HeatWave's machine learning and GenAI tool sets, enabling use case like predictive maintenance, fraud detection and intelligent dicument and Q&A. And also we have a second guest today.
00:01:21:13 - 00:01:48:21
Unknown
It's a Jayant Sharma. Hi, Jayant. Hello. So Jayant Sharma is senior director of product management at Oracle. He has over 20 years of experience in databases, spatial analytics and application development. He's currently focused on the product strategy and design of the Heatwave MySQL managed services offering. Hey Fred. Thank you, both of you for joining us today. So I'm going to dive right in with the question for Jayant.
00:01:48:23 - 00:02:12:14
Unknown
Why did Oracle decide to integrate machine learning in generative AI capabilities directly into HeatWave? Thank you Scott, first for this opportunity. And yes, we have to start with first, you know, talking about MySQL, right? MySQL is the world's most popular open source database. And what do all of these customers, the thousands of customers that they have, do with it?
00:02:12:16 - 00:02:47:05
Unknown
They manage a business process. They manage their enterprise, right? Their focus is on what they want to do and why they want to do it, and not so much the how. That's what MySQL makes easier. And HeatWave is a managed service on MySQL. Okay, so as folks are modernizing their applications, taking advantage of new technology, they want to be able to use new workloads, new analytics, and modernize their business processes, make them more efficient, make them more effective.
00:02:47:07 - 00:03:09:17
Unknown
In order to do that, they want to do things such as machine learning and use the benefits of generative AI. However, what they want to focus on, as we said, is what they want and why they want to do it, not the how. So they don't want to have to think: I have all of this data that's potentially a goldmine.
00:03:09:19 - 00:03:40:07
Unknown
How do I extract nuggets from it, and how do I safely move it and transfer it between best-of-breed tools? I want to be able to do things where the data is. I want to bring these new capabilities to my data. I don't want to take my data to where those capabilities are exposed, right? That is why we made it possible to do machine learning and GenAI where your gold mine is, where your data is: in MySQL HeatWave.
00:03:40:09 - 00:04:06:07
Unknown
Awesome. Thank you. So I would like to ask Sanjay, then: how does the machine learning engine in HeatWave differ from using external machine learning pipelines with the data we have in the database? It differs in a few ways, specifically how the models are built, who builds them and where they are built.
00:04:06:09 - 00:04:46:09
Unknown
So we provide an automated pipeline which can take your data in the MySQL database or Lakehouse and then automatically generate the model for you. It does the usual tasks of pre-processing, hyperparameter optimization, data cleansing, etc. automatically, so that the user doesn't have to do that. We even go ahead and generate explanations for you in certain use cases. Given that this is automated, a big side effect is that users don't need to be experts in machine learning.
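For listeners who want to see what this looks like in practice, here is a minimal sketch of the kind of call through which the automated pipeline is exposed. It assumes the `sys.ML_TRAIN` routine and option keys described in the HeatWave AutoML documentation; exact names and signatures can vary by release, and the schema, table and column names are made up for illustration.

```sql
-- Train a model on labeled data already sitting in MySQL. AutoML picks
-- the algorithm, does the pre-processing and tunes the hyperparameters.
CALL sys.ML_TRAIN('mydb.orders_train',                   -- training table
                  'is_fraud',                            -- target column
                  JSON_OBJECT('task', 'classification'), -- kind of problem
                  @fraud_model);                         -- receives the model handle
SELECT @fraud_model;                                     -- name of the generated model
```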
00:04:46:11 - 00:05:16:08
Unknown
What they need to focus on is their business problem, and how that business problem maps onto one of the features that we provide. From there onwards, the pipeline takes over and generates the models for it. And the third piece is that all of this work is done within HeatWave. We don't take the data out; going back to what Jayant was saying, we have brought machine learning and generative AI to where the data resides, not the other way around.
00:05:16:10 - 00:05:47:20
Unknown
So we are building the models inside HeatWave, where the data is not taken out, and thereby it is more secure: the user does not have to worry about data leakage, or track where they have taken the data and how many times they have done it. So these are the three key ways in which we differ. If you use one of the third-party solutions, they will end up asking you to do all of this on your own, or asking you to take the data out of the database and build the model on your own machine, and so on.
00:05:47:22 - 00:06:21:06
Unknown
But we have made it automated, easy to use and very secure. So Sanjay, we're going to stay with you to keep talking about AutoML in HeatWave. What are some of the key features of AutoML, and how does it simplify model training and deployment for users? Fantastic question. You know, as I said in the previous conversation, we are hitting the common tasks that are associated with model training and deployment.
00:06:21:08 - 00:06:46:03
Unknown
So let's take training here. Typically, when users have to train a model, they are going to take their data, clean it up and do some pre-processing. Then they will figure out which particular algorithm they should be using, tune those algorithms by doing hyperparameter tuning, and so on. All of these are individual tasks.
00:06:46:05 - 00:07:12:15
Unknown
Our goal is to have the user focus on their business problem and take away the engineering piece of it, take away the technology piece of it, and do it automatically for them. So we have this pipeline which does all of it automatically in a single pass. It will do pre-processing. It's going to figure out the appropriate algorithm to use during model building.
00:07:12:17 - 00:07:39:05
Unknown
It will figure out what the best set of hyperparameters is and what their values should be during the training process, and give you the model. So that's one part. The second part is that we provide the ability to deploy these models via REST interfaces. So once the model is trained, they can deploy it.
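As a sketch of the training-then-scoring flow just described, again assuming the routine names from the HeatWave AutoML documentation and using illustrative table names, batch prediction with a trained model looks roughly like this:

```sql
-- Load the trained model into HeatWave memory, then score new rows
-- into an output table that applications (or a REST layer) can read.
CALL sys.ML_MODEL_LOAD(@fraud_model, NULL);
CALL sys.ML_PREDICT_TABLE('mydb.orders_new',    -- rows to score
                          @fraud_model,
                          'mydb.orders_scored', -- predictions land here
                          NULL);
SELECT * FROM mydb.orders_scored LIMIT 5;
```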
00:07:39:07 - 00:08:09:09
Unknown
And thirdly, from time to time the user's data is going to drift. What I mean by that is that the data on which the model was trained no longer reflects reality. In that case, you have to retrain the model. So we provide tools to measure that drift, and if it goes beyond a certain threshold, you can go ahead and retrain your model automatically.
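The drift-handling loop described here might be sketched as follows. This assumes the `sys.ML_SCORE` routine from the HeatWave AutoML documentation; the table names, metric and threshold logic are illustrative:

```sql
-- Score the existing model against fresh labeled data.
CALL sys.ML_SCORE('mydb.orders_recent', 'is_fraud',
                  @fraud_model, 'balanced_accuracy', @score, NULL);
SELECT @score;
-- If @score has dropped below your threshold, retrain on the newer data.
CALL sys.ML_TRAIN('mydb.orders_recent', 'is_fraud',
                  JSON_OBJECT('task', 'classification'), @fraud_model);
```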
00:08:09:11 - 00:08:53:01
Unknown
So these are a couple of ways in which we have simplified model training and deployment for users. Thank you very much for this detailed answer. So, as we discussed, the data is not leaving to a third-party product. But I would like to ask Jayant if there are some performance improvements that users have seen by doing this ML natively in HeatWave, instead of moving the data to external platforms. Certainly, Fred.
00:08:53:03 - 00:09:24:01
Unknown
So there are two aspects to this. There are efficiencies that result, and there are performance improvements, because of the way AutoML is implemented and how it works in HeatWave. Let's start with the efficiency first. The first thing, as Sanjay was saying, right, is that we've automated the pipeline. You only have to focus on what your business problem is and how that maps to a particular task in machine learning.
00:09:24:01 - 00:09:47:04
Unknown
So, for example: do I want to predict something, and therefore use regression? Do I want to identify or label something, and therefore use classification? And AutoML will figure out which particular algorithm (there are multiple ways in which you may do regression, for example) applies or is best suited for the task at hand. Right.
00:09:47:04 - 00:10:15:06
Unknown
So the efficiency there is that AutoML handles it in a single pass, whereas the normal process requires you to be iterative and do things multiple times: try multiple algorithms or different ways of solving the same problem, and then evaluate which one does it best. AutoML does this in a single pass, by very smart ways of sampling your data and running quick tests to identify the best approach.
00:10:15:08 - 00:10:35:15
Unknown
So that's the efficiency. The second question, when it does this, is: why is it so fast? It's so fast because it uses the full capability of the underlying infrastructure, which is the HeatWave nodes, right: the number of HeatWave nodes you've got, the size of those HeatWave nodes. It does these things in parallel and fully utilizes the infrastructure.
00:10:35:17 - 00:11:02:22
Unknown
So what is the benefit of that? You can do things a) faster and b) potentially cheaper, which gives you the luxury of trying multiple what if scenarios. Right. It's not a laborious process. It's more efficient. So if you know exactly what you want, you get it done faster. If you want to try multiple scenarios, you can do that faster and at a lower cost.
00:11:03:00 - 00:11:32:23
Unknown
So that is the efficiency and the performance enhancement that you get. Awesome. So, all right, let's switch gears a little bit. GenAI is one of the latest additions to HeatWave. What specific GenAI features are currently available or, if you can talk about them, in development? So, Scott, indeed, GenAI has been one of the latest additions to our platform, and frankly it encompasses two separate components.
00:11:33:01 - 00:12:04:05
Unknown
One is the customer usage part and the second is the technology part. So, from a customer usage perspective, what people want to do is bring in their knowledge bases. And by that I mean bring in PDF documents, PowerPoint documents and so on, and ask questions of them, get summaries of that text, translate that text into another language, or develop a chatbot around it so that they can get answers, things of that nature.
00:12:04:07 - 00:12:36:19
Unknown
So, keeping this in mind for an enterprise setting, we have developed the technology components needed to serve these needs, such that, going back to the earlier conversation, customers focus on their business needs and we provide them the tools, the plumbing, to do so. So what we have done is provide a full pipeline to ingest their documents and create the knowledge base.
00:12:36:19 - 00:13:04:06
Unknown
And by that I mean: bring in your PDF documents, which get converted into embeddings and stored in a vector store. So we provide all of that. And then we provide ways to search this knowledge base and give answers to users via easy-to-use APIs, like retrieval augmented generation (RAG), or just semantic search over those documents, or summarization or translation.
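The ingestion-plus-RAG flow described here can be sketched as below. This assumes the vector store loading and `sys.ML_RAG` interfaces from the HeatWave GenAI documentation; the bucket path, table names and option keys are illustrative and may differ by release:

```sql
-- Ingest documents from Object Storage: they are chunked, embedded and
-- stored in a vector store table inside HeatWave.
CALL sys.VECTOR_STORE_LOAD('oci://my-bucket@my-namespace/policies/',
                           '{"table_name": "policy_embeddings"}');

-- Ask a question grounded in that knowledge base (retrieval augmented generation).
SET @options = JSON_OBJECT('vector_store',
                           JSON_ARRAY('vector_store_data.policy_embeddings'));
CALL sys.ML_RAG('What does the data-retention policy require?',
                @answer, @options);
SELECT JSON_PRETTY(@answer);
```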
00:13:04:08 - 00:13:32:14
Unknown
We also have the ability to support chat. Now, one very interesting thing that we have done is to provide users LLMs which run on commodity hardware. Jayant was talking about running this on HeatWave nodes. So we have provided these LLMs, which run on commodity hardware, so that people can quickly prototype their application and test it out.
00:13:32:16 - 00:14:00:13
Unknown
And if they like the results and the performance, they stick with it. Or, if they want higher performance, they go to our OCI GenAI services and use those LLMs. So: quick prototyping, quick testing, quick evaluation done using the LLMs running on commodity hardware, and then they can use the OCI GenAI LLMs to get high performance.
00:14:00:15 - 00:14:26:07
Unknown
Now, going to your question about what newer things are coming in: you know, OCI is at the forefront of this revolution, and they are providing newer models, newer frameworks and tools. And we are continuously integrating MySQL HeatWave with their tools and technologies, so that we can provide the same to our customers in the coming weeks and months.
00:14:26:08 - 00:14:53:23
Unknown
Yeah. An example would be the agent framework, right? Integrating with the agent framework, integrating with the hosted frontier models on GPU infrastructure. So you develop your prototype, then you can choose to deploy. The integration is pre-existing; it's not an after-the-fact exercise. You use the same infrastructure, and we provide the pre-built integration with those AI services.
00:14:54:01 - 00:15:28:22
Unknown
Excellent. So, because you are talking about using these LLMs on commodity hardware: which models are available, and how are we using them? So, as I mentioned earlier, we provide access to two sets of LLMs. One is the smaller-parameter LLMs, which are running on commodity hardware, on the same cluster on which HeatWave runs.
00:15:29:00 - 00:16:00:12
Unknown
And in that context, we provide models like Mistral 8-billion or Llama 3-billion and 1-billion, and we are continually upgrading these as newer models come into play, since this is a very fast-moving field. Also, once people have prototyped and have gotten to see the results, if they want to switch over to OCI GenAI LLMs, then we provide access to the wide variety of LLMs that OCI provides.
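A sketch of what switching between the in-HeatWave LLMs and the OCI GenAI-hosted ones might look like, assuming the `sys.ML_GENERATE` interface from the HeatWave GenAI documentation (the model ID is illustrative; check which models are available in your region and release):

```sql
-- Prototype with an LLM running inside the HeatWave cluster...
SELECT sys.ML_GENERATE('Summarize the benefits of in-database ML.',
       JSON_OBJECT('task', 'generation',
                   'model_id', 'mistral-7b-instruct-v1'));
-- ...then, for higher performance, point the same call at an
-- OCI GenAI-hosted model by changing only the model_id option.
```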
00:16:00:14 - 00:16:31:11
Unknown
Those include, you know, things like all the Llama models, the upcoming models from other vendors; all of that is accessible through MySQL. And these can be used for a wide variety of use cases, be it retrieval augmented generation, chatbots, summarization, translation and many other use cases that, you know, customers continue to think of.
00:16:31:11 - 00:16:57:06
Unknown
And we are always amazed at the way they are using this technology. So, when we talked to Matt Quinn in our first episode, he briefly touched on security and privacy issues when it comes to GenAI, or when it comes to AI in general. And we all know that OCI's biggest concerns are, and always have been, security and privacy.
00:16:57:07 - 00:17:26:12
Unknown
How is data privacy managed when training models or generating text using GenAI in HeatWave? Thank you for that, Scott. So, security and privacy, and data privacy, are and have always been Oracle's primary concern. Right? It is true for OCI, the infrastructure, which was built with security in mind. It has been true of Oracle since its inception.
00:17:26:14 - 00:18:04:09
Unknown
So that is core, that is our DNA, including for MySQL and HeatWave. There are two primary ways in which this is a benefit here. Number one, the data doesn't move. Number two, when it does have to move, as you said, in the case of text generation etc., we only send the relevant snippets. So Sanjay talked about the fact that you ingest the documents and then create embeddings. What that essentially means is you look at various snippets, and the model captures the semantics of them,
00:18:04:09 - 00:18:33:08
Unknown
so that you can do a similarity search. Right? Say you want to search for GenAI: you'll also get documents that talk about retrieval augmented generation, even though you didn't ask about retrieval augmented generation, because the embeddings know that these two things are related. So we extract the relevant content, and only that is sent. For example, if you're using the in-database LLMs, nothing is sent anywhere; everything stays in your environment.
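The similarity idea can be made concrete with a sketch like the following, assuming the embedding and vector distance functions in recent HeatWave GenAI releases (the function names, embedding model ID and table are illustrative assumptions; check the current documentation):

```sql
-- Embed the query, then rank stored document snippets by cosine distance;
-- snippets about retrieval augmented generation can rank near a "GenAI"
-- query even though the exact words differ.
SELECT segment,
       DISTANCE(segment_embedding,
                sys.ML_EMBED_ROW('GenAI',
                    JSON_OBJECT('model_id', 'all_minilm_l12_v2')),
                'COSINE') AS dist
FROM vector_store_data.policy_embeddings
ORDER BY dist
LIMIT 3;
```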
00:18:33:09 - 00:18:58:04
Unknown
If you're using an OCI service, for example, the relevant piece is sent and you get the answer, and OCI is built, as you said, with security in mind. So the relevant piece that you sent is discarded after it has answered your question. So once again, your data doesn't leave the ecosystem. It stays within the Oracle infrastructure, your infrastructure. Great and secure, like we want.
00:18:58:06 - 00:19:29:03
Unknown
Exactly. So thank you very much. Now, the tricky question that we have every time we present anything new or compelling at conferences is this: I would like to ask if you could share with us a few compelling customer stories or real-world applications of the ML engine and GenAI in HeatWave, because it's nice theoretically.
00:19:29:03 - 00:19:47:23
Unknown
But are people using it, really? And who are they, if you can say? I know we cannot always share names, but maybe some are public. So if you can tell us a bit about that, that would be great. Yeah, Fred, that's a very good question. And indeed, all of this makes sense only if customers are using it.
00:19:48:01 - 00:20:31:14
Unknown
So, yeah, let me share a couple of them. In fact, let me share two of them: one on GenAI and one on AutoML. So, on GenAI, and this is a public reference, we have a customer, SmarterD. What they provide is a platform to track the compliance of a company's policies with the guidelines released by various bodies, for example, the DoD. Now, in the past, what they have done is to have an auditor compare the policy with respect to the guideline and then say whether a specific company is adhering to those guidelines or not, which is kind of
00:20:31:14 - 00:21:04:12
Unknown
a manual process, fairly intensive, right? What they did was start using HeatWave GenAI: they stored both the policies and the guidelines in HeatWave GenAI's knowledge base, and then, using our RAG and our other capabilities, they compared the policy with the guidance automatically. The results of this are then given to the auditor.
00:21:04:14 - 00:21:28:15
Unknown
So now, what does this do? It reduces the amount of time the auditor has to spend comparing the two. Instead, he or she gets a very short description of what has been done and what has not been done, whereby they can make a judgment on whether it is indeed adhering or not. So it makes the auditor far more efficient compared to earlier.
00:21:28:17 - 00:21:57:04
Unknown
So that's a use case in the compliance space for HeatWave GenAI. Let me give another example here. I can't quote the name, but this is in an industrial setting where the customer wants to have early signals of failures of their computer servers. So they are running a large amount of workloads.
00:21:57:07 - 00:22:39:20
Unknown
And these workloads generate logs, and those logs are sometimes indicative of failures. So what they do is use our log anomaly detection, models that they have built, actually, and feed their logs continuously to those models. And these models are trained to detect abnormal logs. So when the model notices abnormal logs, it's going to go ahead and give you an early warning that a particular system is likely to fail.
00:22:39:22 - 00:23:03:14
Unknown
An example of that would be: hey, your CPU utilization is trending higher, or your memory utilization is going up, or the number of network connections is going up. The model bases all of this on the data it was trained on earlier, and then in real time it looks at the newer data and figures out: you know what, this is looking problematic.
00:23:03:20 - 00:23:29:22
Unknown
So let me give you an early warning. And that's what they are doing. So these are two use cases that two of our customers have developed and use. Great. So, just to add to what Sanjay was saying on the second case, right, it's high-tech manufacturing: their assembly line. If something goes wrong and that manufacturing process is interrupted, that's expensive.
00:23:30:00 - 00:23:52:13
Unknown
Right. And the benefit here is that they get an early warning and they can mitigate; they get a chance. They have monitoring systems, obviously, but the monitoring system tells you after the fact that something went wrong: go fix it. The log analytics, the predictive part, gives you an early warning so you can mitigate the likely failure.
00:23:52:15 - 00:24:21:14
Unknown
Yeah, you can either prevent it or you can reduce the effect of something being affected on the assembly line. That is really cool. So it's interesting, Sanjay, that you talked about the compliance issue, because when we talked to Matt in the last episode, that was actually an example he used as well. But he was just talking generally, not about a specific customer or real-world application of it.
00:24:21:14 - 00:24:50:06
Unknown
He was just throwing it out there, like, this is what you could do. So it's really cool that you brought that up. To wrap up: what advice would you give to MySQL users who want to start exploring AI with HeatWave? Simple: get started with the Always Free instance. So today, in case y'all aren't aware and folks aren't aware, you can get an Always Free instance in your own region in OCI.
00:24:50:06 - 00:25:14:15
Unknown
If you, you know, sign up for an OCI account and create an Always Free instance, that actually has the GenAI and AutoML capability. Now, to help you get started, we've got LiveLabs that walk you through scenarios. Right. So we've got 4 or 5 LiveLabs - two on GenAI, two on ML and a few on specific use cases.
00:25:14:15 - 00:25:45:16
Unknown
For example, how SailGP uses machine learning with MySQL to help its teams perform better in a race. But more importantly, we also have a collection of videos, some of them by you, Scott, for example, on what you can do with AutoML, what you can do with GenAI, what you can do with it in healthcare, what you can do with it in e-commerce, etc.
00:25:45:16 - 00:26:07:00
Unknown
And there's sample code on our GitHub site: sample notebooks, sample SQL code. Try them out on your Always Free instance. Or, if you have access to a beefier instance because you have credits, try it out on that. And then, if you have a specific thing in mind, feel free to reach out to any one of us.
00:26:07:00 - 00:26:25:06
Unknown
If you want help to do a prototype or a POC, we'd be more than happy to do that. Nice. Thank you very much. It was a pleasure to have you both here, Sanjay and Jayant. Thank you very much for participating in this podcast. So thank you and bye bye. Bye bye. Thank you. Bye bye. Thank you.
00:26:25:06 - 00:26:53:23
Unknown
Guys, that's a wrap on this episode of MySQL: Sakila Speaks. Thanks for hanging out with us. If you enjoyed listening, please click subscribe to get all the latest episodes. We would also love your reviews and ratings on your podcast app. Be sure to join us for the next episode of MySQL: Sakila Speaks.