• 30 August 2016More brainstorming of the database performance

    Since the last post I've tried different techniques and tested the performance of various indexes in the database. I thought I had a decent solution now, but I am still seeing queries from 30 to 100 ms. 100 ms is ten times better than 1 second, but it's still pretty bad considering I'm running the queries on my computer, where I am the only user!

    I'm pretty sure there is a significant improvement already if I updated the site with the changes I've made.

    One good improvement already is that the "New Stories" and "Shared Stories" can both be cached together. This wasn't the case before for odd reasons involving how I wrote the SQL queries. So a better caching of the shared stories is a good thing.

    Another thing is that the improved schema and indexing of the stories removed the temporary table step from the queries, and I suspect that was a bottleneck when many users are accessing the site at the same time.

    However today I've just run into the notion of Partitioned Tables. This sounds like it could be very useful. Indeed only 6 to 7 % of 6+ million rows, that is approx. 6 % of 1 GIGABYTE of stories is actually being publicly shared. And I suspect a big performance impact is due to how some of the older stories that have many votes are at beginning of the physical storage, while newer stories that get one or two votes will be towards the ends. Likely what happens is that MySQL, even with a good index, has to collect stories that are spread in a 1 GB physical storage space.

    It would be far more efficient if all the public stories were all clustered together on disk. Even if the server has SSD drives, a 1GB table won't fit in memory and that also limits the performance.

    So I need to look into this Partitioned Tables thingamagic right now. In theory if I can have a partition store only the public stories, it will be less than a hundred megabytes (ie. ~7 % of 1 GB) which may fit into memory. And even if it doesn't fit in memory it means all the stories pulled by the Shared Stories query will be clustered together on the physical storage.

    Assuming the other partition would be private stories, and it takes approx 900+ MB .. it is not impacted in the same way because typically what the website does is simply read ONE row of the database to retrieve whichever story the user is viewing / editing. So in that case the index works fine.

    All in all it's nice to get to grips with MySQL.. and I need to take care of this because it's a problem that only gets worse as the database grows. In fact this problem wasn't there several years ago. I'm surprised actually that no one ever tells me about it... but I suspect the reason for that is that when studying you advance through the pages at a relatively slow pace.

    But even when users themselves don't feel the lag too much when browsing the Study pages, it will affect users who are reviewing! While there are different tables the MySQL server will be busy with slow queries and when people are reviewing this can cause lag too. That is why I really want to solve this problem.

    Gaining this experience now is crucial also because the flashcards table is growing too and it's nearing 500 MB as I write and will run into performance issues as well. In fact today if I want to add any kind of feature to the flashcards or stories, I am invariably expanding the size of those tables.