Inside MySQL: Sakila Speaks
Oracle Ace Alkin Tezuysal joins leFred and Scott to introduce the MyVector plugin for MySQL Community Edition, bringing powerful vector search capabilities to your favorite open-source database. Learn how MyVector enables advanced AI and similarity search features, why this matters for modern applications, and how the MySQL community can easily get started.
-------------------------------------------------------------
Episode Transcript:
00:00.000 --> 00:25.000
Welcome to Inside MySQL: Sakila Speaks, a podcast dedicated to all things MySQL. We bring you the latest news from the MySQL team, MySQL product updates and insightful interviews with members of the MySQL community.
00:25.000 --> 00:32.000
Sit back and enjoy as your hosts bring you the latest updates on your favorite open source database. Let's get started.
00:32.000 --> 00:37.000
Hello and welcome to Sakila Speaks, the podcast dedicated to MySQL. I'm LeFred.
00:37.000 --> 00:38.000
And I'm Scott Stroz.
00:38.000 --> 00:47.000
Joining us today is Alkin Tezuysal. We've known each other for a long time, and Alkin serves as Director of Services at Altinity Inc.
00:47.000 --> 00:55.000
He brings over 30 years of experience in open source relational databases, with deep expertise in MySQL, of course, and ClickHouse.
00:55.000 --> 01:08.000
He co-authored key reference works, including MySQL Cookbook, 4th edition, published in 2022, and Database Design and Modeling with PostgreSQL and MySQL in 2024.
01:08.000 --> 01:21.000
Alkin, you were honored as a MySQL Rockstar in 2023, and as of this year you are also an Oracle Ace Pro for MySQL. Congratulations, and welcome to Inside MySQL: Sakila Speaks.
01:21.000 --> 01:23.000
Thank you very much, everyone.
01:23.000 --> 01:34.000
We're glad you're here. Alkin, as you may not know, this season of the podcast is dedicated to all things AI as it relates to MySQL and HeatWave.
01:34.000 --> 01:43.000
And you actually created a plugin for MySQL Community that helps with that: MyVector.
01:43.000 --> 01:48.000
Can you give us an overview of what MyVector is and what problem it's meant to solve?
01:48.000 --> 01:50.000
Sure. Thank you very much for the question.
01:50.000 --> 02:00.000
And I'm very happy that this season is about AI and HeatWave and everything that contributes to this technology, because it's fairly new.
02:00.000 --> 02:06.000
It's been developing for many years, as we already know, but now it's in our hands.
02:06.000 --> 02:16.000
We can use it. We can definitely use it in our day-to-day activities, whether it's troubleshooting your dishwasher or your washing machine.
02:16.000 --> 02:20.000
But we can also use it with business databases.
02:20.000 --> 02:29.000
So one correction I want to make is that I am a contributor to the MyVector plugin, not the author.
02:29.000 --> 02:34.000
The author is Shankar Iyer, who has been a database developer for many years.
02:34.000 --> 02:40.000
He has a lot of experience, and I've been presenting and supporting this project.
02:40.000 --> 02:49.000
That's the small correction. Other than that, MyVector is a native plugin for MySQL that adds support for storing and searching high-dimensional vectors.
02:49.000 --> 02:55.000
That, in simple terms, is what it does.
02:55.000 --> 03:00.000
And this has been in development for some time.
03:00.000 --> 03:14.000
And as we have seen, other open source databases also moved in this direction with the launch of AI to end users.
03:14.000 --> 03:24.000
Adding approximate nearest neighbor (ANN) search directly in SQL, within the MySQL database, was needed.
03:24.000 --> 03:29.000
And there have been similar implementations with MySQL.
03:29.000 --> 03:33.000
But MyVector is the open source version of that as a plugin.
03:33.000 --> 03:39.000
So, just to wrap up that answer: MyVector provides a column type for embedding storage.
03:39.000 --> 03:41.000
And there's a set of MyVector functions,
03:41.000 --> 03:46.000
such as the MyVector distance function, for the similarity computation.
03:46.000 --> 03:50.000
Of course, it uses an HNSW-based index algorithm, which is very popular.
03:50.000 --> 03:52.000
There's a white paper around it.
03:52.000 --> 04:01.000
It's not rocket science or something invented just for MyVector; it's known science.
04:01.000 --> 04:06.000
And basically, it provides an SQL native interface within MySQL.
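To make "similarity computation" concrete, here is a minimal Python sketch of the three distance measures most vector search systems offer. Which metrics MyVector's distance functions actually support, and what they are called in SQL, is not covered in this episode, so the function names below (`l2_distance`, `inner_product`, `cosine_distance`) are hypothetical and the code shows only the underlying math.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # Distances are only defined between vectors of the same dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: smaller means more similar."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: larger means more similar (for comparable norms)."""
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 when the vectors point the same way."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    q = [1.0, 0.0]
    print(l2_distance(q, [0.0, 1.0]))      # sqrt(2), about 1.414
    print(cosine_distance(q, [2.0, 0.0]))  # 0.0: same direction, different length
```

Cosine distance ignores vector length, which is why it is a common choice for text embeddings; L2 and inner product are sensitive to magnitude.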
04:06.000 --> 04:08.000
Hope that answers that question.
04:08.000 --> 04:10.000
Thank you very much, Alkin, yeah.
04:10.000 --> 04:22.000
It answers everything, and I'm very happy that you also mentioned the author, whom we also met recently in Belgium.
04:22.000 --> 04:31.000
So I would like to ask you: why is it important to have these similarity search indexes in MySQL?
04:31.000 --> 04:40.000
Yeah. So again, going back to AI-driven applications: semantic search, product recommendations, question answering, anomaly detection, etc.
04:40.000 --> 04:43.000
These really require similarity search.
04:43.000 --> 04:47.000
Have we done similarity searches in the past? Yes, we have.
04:47.000 --> 04:52.000
If you remember, this was a long, long time ago, but those technologies are still in use.
04:52.000 --> 05:03.000
And we had search indexes like Solr and Sphinx, if you recall those, where we used to have a replica, generate an index, and search against it.
05:03.000 --> 05:10.000
I used to work for an e-commerce site and users would search for a product.
05:10.000 --> 05:15.000
And then we would also display the similar products.
05:15.000 --> 05:23.000
And in order to do that with MySQL, we had to use external services like the search engines I mentioned.
05:23.000 --> 05:25.000
So it is very important.
05:25.000 --> 05:29.000
But with AI-driven applications, it's not just important anymore.
05:29.000 --> 05:30.000
It's a must-have.
05:30.000 --> 05:35.000
Basically, you don't need to run a separate vector database.
05:35.000 --> 05:45.000
And basically, if the data is already in MySQL, you can use this similarity search functionality directly.
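The search Alkin describes, finding the rows whose embeddings are closest to a query embedding, can be done exactly by scanning every row. The sketch below assumes embeddings are plain Python lists keyed by a hypothetical row id; it is the brute-force baseline that an ANN index such as HNSW approximates, trading a little accuracy for not having to compare the query against every row.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query: Sequence[float],
                rows: Dict[int, Sequence[float]],
                k: int) -> List[Tuple[int, float]]:
    """Return the k closest (row_id, distance) pairs, closest first.

    Cost is O(N * d) per query: every stored vector is compared with the query.
    An ANN index exists to avoid exactly this full scan, and the results of this
    scan are the ground truth that ANN recall is measured against.
    """
    scored = ((l2(query, vec), row_id) for row_id, vec in rows.items())
    best = heapq.nsmallest(k, scored)  # ties broken by row_id
    return [(row_id, dist) for dist, row_id in best]


if __name__ == "__main__":
    # Made-up 2-d "product embeddings" for illustration only.
    products = {1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.8, 0.3]}
    for row_id, dist in exact_top_k([1.0, 0.0], products, k=2):
        print(row_id, round(dist, 3))
```

For the made-up products above and the query `[1.0, 0.0]`, `exact_top_k` returns rows 1 and 3, in that order.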
05:45.000 --> 05:49.000
Back at FOSDEM, you gave a presentation about MyVector.
05:49.000 --> 05:55.000
And over the weekend at FOSDEM, there were a lot of other sessions about vectors and vector indexes.
05:55.000 --> 06:01.000
Has MyVector made any significant changes since you last talked about it in public?
06:01.000 --> 06:07.000
Yes, there was another public talk after FOSDEM, at a vector search conference.
06:07.000 --> 06:14.000
And we had a bunch of talks about vector search and vector technologies around open source databases, including MySQL.
06:14.000 --> 06:19.000
There were, I think, four or five MySQL talks about vector search.
06:19.000 --> 06:33.000
From the development side, yes, there's one important improvement: support for binary distributions in addition to the Docker images.
06:33.000 --> 06:43.000
So we worked on those and built three different MySQL binary distributions for testing, because otherwise it's more like DIY.
06:43.000 --> 06:51.000
You have to compile it, and not everyone is competent enough or has enough time to compile MySQL.
06:51.000 --> 07:02.000
So we built images for 8.0, 8.4, and 9.x for easy testing.
07:02.000 --> 07:12.000
And there were some improvements in performance and index stability, of course, and that's about it.
07:12.000 --> 07:18.000
Maybe it doesn't sound like a lot, but it is a lot of work, considering it's an open source project.
07:18.000 --> 07:21.000
Yeah, thank you. I can imagine it's a lot of work.
07:21.000 --> 07:31.000
So let's now dig a bit deeper into the technical side.
07:31.000 --> 07:41.000
So you said earlier that MyVector uses HNSW, which stands for hierarchical navigable small world indexes, right?
07:41.000 --> 07:48.000
Why was this type chosen over the alternatives?
07:48.000 --> 07:55.000
And do you know if anyone, or you yourself, has tried the alternatives?
07:55.000 --> 07:59.000
We would like to know a bit more about that choice.
07:59.000 --> 08:01.000
That's a great question, actually.
08:01.000 --> 08:11.000
When we first heard about HNSW, hierarchical navigable small world, for ANN search, that is, approximate nearest neighbor search,
08:11.000 --> 08:21.000
that's when I did my research and started reading about it. I think we met in London last year.
08:21.000 --> 08:26.000
We were talking about ANN search and everything else.
08:26.000 --> 08:33.000
Basically, I thought it was more like the de facto standard for ANN search.
08:33.000 --> 08:44.000
And it turned out to be that way, because a lot of the other open source databases and implementations were converging on HNSW.
08:44.000 --> 08:49.000
And that's not to say there are no other options out there.
08:49.000 --> 09:00.000
But usually, when technologies like this launch, you don't reinvent the wheel; you build on existing technology.
09:00.000 --> 09:09.000
Since knowledge about HNSW was widely available, HNSW was chosen.
09:09.000 --> 09:13.000
And it has high accuracy.
09:13.000 --> 09:16.000
It supports dynamic inserts and deletes.
09:16.000 --> 09:19.000
And it has efficient memory usage.
09:19.000 --> 09:21.000
These are the top three things that I know about it.
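The properties Alkin lists follow from HNSW being a graph: each vector is a node linked to nearby vectors, new vectors are linked in as they arrive, and a query walks the links toward its nearest neighbors. The sketch below is a deliberately simplified, single-layer navigable small world graph in Python, not MyVector's implementation and not full HNSW; the class name `ToyNSW` is hypothetical, and its parameters `m` and `ef` only loosely mirror the names used in the HNSW literature.

```python
import bisect
import heapq
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyNSW:
    """Single-layer navigable small world graph: a simplified cousin of HNSW.

    Real HNSW adds sparser upper layers, finds an insert's neighbours by searching
    the graph (here we use brute force for brevity) and prunes each neighbour list
    to a fixed maximum. This sketch keeps only the core graph-walk idea.
    """

    def __init__(self, m: int = 4) -> None:
        self.m = m  # links created for each inserted vector
        self.vectors: Dict[int, Sequence[float]] = {}
        self.edges: Dict[int, Set[int]] = {}

    def insert(self, node_id: int, vec: Sequence[float]) -> None:
        """Link a new vector to its m nearest existing nodes; no training step."""
        if node_id in self.vectors:
            raise ValueError(f"duplicate id {node_id}")
        nearest = sorted(self.vectors, key=lambda n: l2(vec, self.vectors[n]))[: self.m]
        self.vectors[node_id] = vec
        self.edges[node_id] = set(nearest)
        for n in nearest:
            self.edges[n].add(node_id)  # links are bidirectional

    def search(self, query: Sequence[float], k: int = 1, ef: int = 8) -> List[int]:
        """Best-first walk from a fixed entry node, keeping the ef closest nodes seen."""
        if not self.vectors:
            return []
        entry = next(iter(self.vectors))
        visited = {entry}
        start = (l2(query, self.vectors[entry]), entry)
        candidates: List[Tuple[float, int]] = [start]  # min-heap of nodes to expand
        results: List[Tuple[float, int]] = [start]     # sorted, at most ef entries
        while candidates:
            dist, node = heapq.heappop(candidates)
            if dist > results[-1][0]:
                break  # nearest unexpanded node is worse than everything kept
            for nb in self.edges[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                d = l2(query, self.vectors[nb])
                if len(results) < ef or d < results[-1][0]:
                    heapq.heappush(candidates, (d, nb))
                    bisect.insort(results, (d, nb))
                    if len(results) > ef:
                        results.pop()  # drop the farthest kept node
        return [n for _, n in results[:k]]


if __name__ == "__main__":
    random.seed(0)
    index = ToyNSW(m=4)
    for i in range(200):
        index.insert(i, [random.random(), random.random()])
    print(index.search([0.5, 0.5], k=3, ef=16))
```

Raising `ef` makes the search examine more nodes, improving recall at the cost of speed; in this toy, with `ef` at least as large as the number of stored vectors, the walk visits the whole (connected) graph and returns the exact nearest neighbor.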
09:21.000 --> 09:31.000
But, as I said, the benchmarks from the other open source databases were all centered on HNSW.
09:31.000 --> 09:41.000
And if you were to use a different index type, it would be very difficult to make an apples-to-apples comparison.
09:41.000 --> 09:52.000
So again, I'm not saying there are no other ANN methods; there are, but they might be less accurate.
09:52.000 --> 09:54.000
They may have different trade-offs.
09:54.000 --> 10:04.000
But if you want to put something on the market that everybody knows, you're better off using the known methodologies.
10:04.000 --> 10:11.000
So you've given me something I need to look up, so I know what I'm going to be doing over the weekend: HNSW.
10:11.000 --> 10:18.000
So does the data we use need to be trained before it can be indexed?
10:18.000 --> 10:20.000
No, there's no training.
10:20.000 --> 10:23.000
Basically, it's the embeddings.
10:23.000 --> 10:30.000
The difference between training and embeddings is that you just need to generate the embeddings.
10:30.000 --> 10:55.000
And that's an additional step: if your data is already in the database and you want to use this vector search technology with HNSW indexing for ANN search, you need to generate the embeddings, whether externally or internally, with a service or something like that.
10:55.000 --> 11:18.000
I know that some other types of indexes for embeddings, maybe less popular ones, sometimes also need training beforehand. But this one doesn't, which is good, because training every time you want to add data would be quite complicated.
11:18.000 --> 11:19.000
Right.
11:19.000 --> 11:20.000
Yeah.
11:20.000 --> 11:25.000
Basically, you generate the embeddings, insert them into MySQL, and build the HNSW index.
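The "generate, insert, index" flow can be sketched as follows. The embedding function is a hypothetical toy (feature hashing of words) standing in for whatever model or service actually produces embeddings, and `EmbeddingTable` is a hypothetical in-memory stand-in for a MySQL table with a vector column; neither reflects MyVector's API. The point is that nothing is trained on the stored data: each row's vector is computed when it is inserted. Search here is a brute-force scan; an HNSW index would be built over the same vectors.

```python
import hashlib
import math
from typing import Dict, List

DIM = 64  # assumption for the sketch; real embedding models output hundreds of dims


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for an embedding model or service.

    Feature-hashes lowercase words into a fixed-size, L2-normalised vector. It only
    captures word overlap, not meaning, but like a pre-trained model it needs no
    training on your data before use.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class EmbeddingTable:
    """In-memory stand-in for a table with a text column and a vector column."""

    def __init__(self) -> None:
        self.text: Dict[int, str] = {}
        self.embedding: Dict[int, List[float]] = {}

    def insert(self, row_id: int, text: str) -> None:
        # Generate the embedding at insert time and store it alongside the row.
        self.text[row_id] = text
        self.embedding[row_id] = toy_embed(text)

    def most_similar(self, query: str) -> int:
        # Non-empty embeddings are unit length, so the dot product is cosine similarity.
        q = toy_embed(query)
        return max(self.embedding,
                   key=lambda r: sum(a * b for a, b in zip(q, self.embedding[r])))


if __name__ == "__main__":
    table = EmbeddingTable()
    table.insert(1, "washing machine will not drain")
    table.insert(2, "dishwasher leaves spots on glasses")
    print(table.most_similar("my washing machine does not drain"))
```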
11:25.000 --> 11:38.420
So, as we are discussing this index, I'm curious, because I also try out different types of these indexes to understand what they do and how they work.
11:38.420 --> 11:46.160
But I would like to know the size of this index compared to the actual size of the data.
11:46.440 --> 11:59.440
Because I know, and maybe it's the case in your implementation too, that in some or most implementations the full representation of the vector is stored in the index.
11:59.440 --> 12:09.620
So I would like to know if you have done some checks there, and how the size of the full embeddings compares to the index.
12:09.920 --> 12:16.940
Just to recap: the full vector is stored inside the index structure for fast access.
12:17.540 --> 12:22.560
So there's no reference back or anything like that.
12:22.560 --> 12:46.380
As we were discussing, the size of the index depends on the vector dimensions, the number of vectors we're storing, and some of the index parameters, such as links per node. We did some sizing and testing around it, yes.
12:46.380 --> 12:54.160
The accuracy increases when the dimensions are higher, as we know, and the size gets larger too.
12:54.440 --> 13:04.800
So we're also looking into whether there's any option to optimize that or to use some compression technology for this index.
13:05.160 --> 13:16.360
And that's important to know, because this is not in the database itself; it's basically in the file system.
13:16.380 --> 13:19.080
And it needs to be placed correctly.
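Because the full vector lives inside the index, a back-of-the-envelope size estimate is mostly arithmetic. The function below is a rough sketch under stated assumptions (float32 components, 4-byte neighbor ids, and an hnswlib-style bottom layer holding up to `2 * m` links per node); the name `estimate_hnsw_bytes` is hypothetical and the formula has not been checked against MyVector's on-disk format.

```python
def estimate_hnsw_bytes(num_vectors: int, dim: int, m: int = 16,
                        bytes_per_component: int = 4,
                        bytes_per_link: int = 4) -> int:
    """Rough lower-bound size of an HNSW index that stores full vectors.

    Assumptions (typical of hnswlib-style layouts, not measured on MyVector):
      * every vector is stored in full inside the index (float32: 4 bytes each);
      * the bottom layer keeps up to 2 * m neighbour ids per node;
      * upper layers, per-node headers and file overhead are ignored.
    """
    vector_bytes = num_vectors * dim * bytes_per_component
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


if __name__ == "__main__":
    n, dim = 1_000_000, 768
    raw = n * dim * 4  # the embeddings alone, as float32
    index = estimate_hnsw_bytes(n, dim)
    print(f"raw embeddings: {raw / 1e9:.2f} GB, index estimate: {index / 1e9:.2f} GB")
```

For one million 768-dimensional embeddings with `m = 16`, this prints about 3.07 GB for the raw embeddings and 3.20 GB for the index estimate, which illustrates why index size tracks vector dimension so closely and why compression is attractive.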
13:19.360 --> 13:32.200
You mentioned before how MyVector is available. I know it's available as a plugin, and just to clarify for our listeners: do you also provide binary packages, or do we still need to compile it from source?
13:32.980 --> 13:33.480
Yes.
13:33.480 --> 13:46.860
As I mentioned earlier, since FOSDEM, the latest release of MyVector has included binary releases for x86.
13:47.480 --> 13:53.840
And that is published on the GitHub page, as open source.
13:54.100 --> 13:57.220
So you no longer need to compile it from source.
13:57.220 --> 14:02.360
If you want to test it out, and just to put in a plug:
14:02.360 --> 14:16.980
I have launched a new technical blog and started blogging, and I will blog about this so that listeners and readers have a link available in the blog.
14:16.980 --> 14:28.440
So they can go in and test it out. Docker images were already available, but binary releases were also added recently for testing.
14:28.440 --> 14:30.380
Awesome. Thank you, Alkin.
14:30.720 --> 14:42.720
So, last question: do you know if any companies or users are already using MyVector in production?
14:43.260 --> 14:45.500
There are some POCs going on.
14:45.860 --> 14:47.700
And as you know, this is open source.
14:47.700 --> 14:58.360
There was some interest after FOSDEM; people who were doing this type of research and analysis reached out.
14:58.360 --> 15:03.920
It was on their existing MySQL databases, and we provided help and information.
15:04.380 --> 15:05.540
They might have taken it.
15:05.600 --> 15:06.580
They might have forked it.
15:06.640 --> 15:08.980
They might have embedded it into their existing implementation.
15:09.580 --> 15:16.500
We don't know, but as far as I know, there are a few POCs going on, testing with existing data.
15:16.500 --> 15:29.520
And just to add: since FOSDEM, there has been one shift in this type of technology, which is MCP (Model Context Protocol) servers.
15:29.520 --> 15:45.640
That is one thing I wanted to add here: with that shift, generating embeddings and having an MCP server that adds context to the ANN search,
15:45.640 --> 15:53.620
like in a chatbot implementation, has made things not only more useful but also more interesting.
15:54.320 --> 16:01.380
So say you have support tickets or some data related to your internal customers.
16:01.580 --> 16:03.600
You could add an MCP server.
16:03.800 --> 16:06.160
There are some public MCP servers.
16:06.320 --> 16:09.780
There are some open source MCP server implementations.
16:10.080 --> 16:12.300
And we're also looking into that.
16:12.560 --> 16:15.560
And I want to mention here that in my last
16:15.640 --> 16:24.200
talk, during the vector search conference, I presented MyVector with an MCP server demo.
16:24.200 --> 16:27.880
And that recording should be available.
16:27.880 --> 16:30.900
And it actually works.
16:30.900 --> 16:44.520
This is pure open source MySQL, pure open source MyVector, and a pure open source MCP server, with clinical trials data, one of the public data sets that I've used.
16:44.520 --> 16:52.300
You can ask questions, it'll answer, and then you can continue the chat using this technology.
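A retrieval step like the one in the demo can be sketched in Python. The episode does not show how the MCP server's tools are defined, so the MCP protocol wiring is omitted here: `retrieve_context` and `build_prompt` are hypothetical helpers, the similarity ranking is a brute-force scan standing in for an HNSW query in MySQL, and the support-ticket data is made up.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_context(query_vec: Sequence[float],
                     docs: Dict[str, Tuple[Sequence[float], str]],
                     k: int = 2) -> str:
    """Format the k documents most similar to query_vec as '[id] text' lines.

    In the setup described in the episode, a tool on the MCP server would run this
    similarity query inside MySQL; here it is a brute-force scan over a dict.
    """
    ranked = sorted(docs.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1][0]),
                    reverse=True)
    return "\n".join(f"[{doc_id}] {text}" for doc_id, (_, text) in ranked[:k])


def build_prompt(question: str, context: str) -> str:
    """Ground the LLM's answer in the retrieved rows, not its general knowledge."""
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical support tickets with made-up 2-d embeddings.
    tickets = {
        "T-1": ([0.9, 0.1], "Pump resets after firmware update"),
        "T-2": ([0.0, 1.0], "Question about an invoice"),
        "T-3": ([1.0, 0.2], "Pump shows error E21"),
    }
    context = retrieve_context([1.0, 0.0], tickets, k=2)
    print(build_prompt("How do I fix the pump?", context))
```

The retrieved rows become the domain-specific context the LLM answers from, which is the role the MCP server plays in the setup described in this conversation.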
16:52.300 --> 16:54.060
That's awesome.
16:54.060 --> 17:00.060
I've actually been playing around with writing my own MCP servers and having them interact with MySQL.
17:00.060 --> 17:14.880
And I think MCP is going to wind up being pretty big, because it gives the domain-specific context that LLMs can use to generate content, answers, or whatever.
17:14.880 --> 17:18.120
So I'm interested to play around with that.
17:19.000 --> 17:19.520
Absolutely.
17:19.740 --> 17:20.120
Absolutely.
17:20.120 --> 17:23.100
This is a very interesting development, and very recent.
17:23.740 --> 17:28.440
A lot of people are experimenting with MCP servers right now.
17:29.160 --> 17:44.300
MCP servers are going to be, I think, the salt and pepper, or the sauce, of this technology: having the vectors and embeddings, ANN search, the HNSW index, and the MCP server.
17:44.300 --> 17:47.220
So it's going to complete the puzzle, in my opinion.
17:47.600 --> 17:50.920
And also, like I said, I've already given a demo.
17:51.060 --> 17:53.840
We are also looking into this technology.
17:54.120 --> 17:59.220
And, of course, that is also still under development.
17:59.740 --> 18:08.980
If you're opening things up to a public MCP server, then your security and compliance are outside your control again.
18:08.980 --> 18:16.200
If you want to do everything internally, in your own database, with your own security and compliance,
18:16.700 --> 18:20.660
then having something available for yourself is a game changer.
18:21.260 --> 18:21.660
Excellent.
18:21.840 --> 18:23.140
So thank you very much, Alkin.
18:23.660 --> 18:25.060
Thank you for your time.
18:25.460 --> 18:27.740
We know you are on your boat right now, sailing.
18:27.960 --> 18:28.900
So that's awesome.
18:29.180 --> 18:29.580
Thank you.
18:29.640 --> 18:30.420
Thanks a lot, guys.
18:30.600 --> 18:31.120
Thank you, Alkin.
18:31.180 --> 18:33.120
And thank you for all your contributions to the community.
18:33.120 --> 18:36.880
That's a wrap on this episode of Inside MySQL: Sakila Speaks.
18:37.040 --> 18:38.280
Thanks for hanging out with us.
18:38.560 --> 18:42.320
If you enjoyed listening, please click subscribe to get all the latest episodes.
18:42.600 --> 18:45.540
We would also love your reviews and ratings on your podcast app.
18:45.860 --> 18:50.080
Be sure to join us for the next episode of Inside MySQL: Sakila Speaks.