Blog / May 2022

  • 24 May 2022 - Database Maintenance Completed

    The website was unavailable today (around noon CEST) for ~1h20m for database maintenance, as described in the previous post. I'm sorry I didn't give advance notice today. I had to make do with the time available, as I wasn't able to do this last Friday afternoon.

    The goal of this maintenance was to prevent the database from slowing down due to large indexes, and to keep the full backups to a manageable size. The full backup is now 215 MB smaller (the gzipped file size) - pretty good!

  • 19 May 2022 - Downtime Friday 20th: Spring cleaning time for the database

    The website will be unavailable tomorrow (Friday May 20th, 2022) for up to an hour. The downtime will be sometime between 3 and 6 pm CEST.

    This is an annual "spring cleaning time" where I will run some manual queries on the database to trim stale data from very old, inactive accounts. The goal is to prevent the database from slowing down due to large indexes, and to keep the full backups to a manageable size.

    As I have done in the past, I will use the following criteria:

    • Delete flashcards from accounts inactive for 2 YEARS.
    • Delete private stories where the user's account was inactive for 5 YEARS.

    An inactive account is an account the user did not sign in to within the period. For example, if you signed in just once within the last 5 years, the account data is kept. You don't need to do anything else on the site - just sign in. In other words, what I'm looking at is "the last time the user logged in".
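
    The two criteria above can be sketched as a pair of DELETE queries. This is only an illustration: the post does not show the actual queries or schema, so the table layout here (`users.last_login`, `reviews.user_id`, `stories.is_public`) is hypothetical, and SQLite stands in for whatever database the site really uses.

```python
import sqlite3
from datetime import datetime, timedelta


def trim_stale_data(conn, now):
    """Apply the two inactivity rules; returns (cards deleted, stories deleted).

    Hypothetical schema: users(id, last_login), reviews(user_id),
    stories(user_id, is_public). last_login is an ISO 8601 string.
    """
    # Years are approximated as 365 days for the sake of the sketch.
    card_cutoff = (now - timedelta(days=2 * 365)).isoformat()
    story_cutoff = (now - timedelta(days=5 * 365)).isoformat()
    cur = conn.cursor()

    # Rule 1: flashcards from accounts with no sign-in for 2 years.
    cur.execute(
        "DELETE FROM reviews WHERE user_id IN "
        "(SELECT id FROM users WHERE last_login < ?)",
        (card_cutoff,),
    )
    deleted_cards = cur.rowcount

    # Rule 2: private stories only, from accounts with no sign-in for 5 years.
    cur.execute(
        "DELETE FROM stories WHERE is_public = 0 AND user_id IN "
        "(SELECT id FROM users WHERE last_login < ?)",
        (story_cutoff,),
    )
    deleted_stories = cur.rowcount

    conn.commit()
    return deleted_cards, deleted_stories


def build_demo_db(now):
    """Three made-up users: active, inactive for 3 years, inactive for 6 years."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT);"
        "CREATE TABLE reviews (user_id INTEGER);"
        "CREATE TABLE stories (user_id INTEGER, is_public INTEGER);"
    )
    for user_id, days_ago in [(1, 30), (2, 3 * 365), (3, 6 * 365)]:
        last_login = (now - timedelta(days=days_ago)).isoformat()
        conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, last_login))
        conn.execute("INSERT INTO reviews VALUES (?)", (user_id,))
        conn.execute("INSERT INTO stories VALUES (?, 0)", (user_id,))  # private
        conn.execute("INSERT INTO stories VALUES (?, 1)", (user_id,))  # shared
    conn.commit()
    return conn
```

    In the demo data, only the 6-year-inactive user loses a private story, while both lapsed users lose their flashcards. Shared stories are never touched.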

    For flashcards: I have found in the past, from emails asking "how do I reset my account?", that users who come back after a long break usually want to delete all their cards and start over. It makes sense given how the SRS works, after all. So the 2 YEAR interval seems sensible to me. Keep in mind that on Koohii a "flashcard" refers to the SRS status for each kanji - not stories.

    For stories: we're looking at deleting private stories only, from old inactive accounts. This ensures the shared stories area is unaffected and, as you'll see below, the shared stories only account for 7% of 7 million+ stories! Still, it's not unusual for users to come back after a break of several years, which is why I used a longer, more cautious 5 year interval.

    How to Back Up Your Data

    If you think you may take a long break, or have finished with Kanji Koohii for the foreseeable future - keep in mind your data is not "locked in". You can export all your stories from the My Stories page. This will download a CSV file which you can load in a typical spreadsheet app like Google Sheets or LibreOffice.
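
    Besides a spreadsheet app, an exported CSV can also be read with a few lines of Python. This is a minimal sketch: the column names in `SAMPLE` (`kanji`, `keyword`, `story`) and the file name `my_stories.csv` are placeholders, not the actual export format, so check the header row of your own file.

```python
import csv
import io

# Made-up stand-in for an exported file; the real columns may differ.
SAMPLE = (
    "kanji,keyword,story\n"
    '一,one,"A single stroke, lying flat."\n'
    '二,two,"Two strokes, one above the other."\n'
)


def load_rows(csv_file):
    """Return every row as a dict keyed by the names in the header row."""
    return list(csv.DictReader(csv_file))


# Reading the in-memory sample:
rows = load_rows(io.StringIO(SAMPLE))

# Reading a real download would look like this (file name is hypothetical):
# with open("my_stories.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```

    Because `csv.DictReader` takes its keys from the header row, the same function works whatever the export's columns turn out to be.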

    The stories export could also be useful for e.g. printing and browsing offline.

    You can also export your kanji flashcards from the Manage Flashcards section of the site. This is a CSV export and how you can use that data elsewhere is up to you. To clarify, this is the SRS status data.

    On Koohii, stories are independent of kanji cards. Many users are only interested in the stories section of the site, while others are only interested in the SRS. Some use both. Just remember: stories are stored independently of flashcards. You can remove a card and add it back later, and the corresponding story for that kanji will still be there.

    More database nitty gritty and thoughts

    I have found over the years that as long as the full gzipped database backup is under 1 GB, it is manageable for me. It doesn't take too long to download, and if I need to debug something, it doesn't take forever to "copy" that database into my development environment either.

    By running some statistics I have found in the past that the stories table in the database is, unsurprisingly, the most space consuming.

    As of a recent May backup, there are over 7,294,500 stories! (Yes, that is seven million.) Of those, "only" 513,800 are shared - that is just 7%!

    What this means is that when users go through RTK and input or copy shared stories for 2000 kanji, and then leave and never come back, it adds up to a lot of data over the years.

    Thankfully, due to the nature of this service, I can afford to trim some of that excess data: it turns out that those users who do in fact return (after a break of several years) often want to "reset" their account and start over. I know that because I'd get occasional emails like "hey, I haven't used koohii in a while, how can I reset my account?".

    More recently, the crude admin panel that I have lets me see when users delete accounts and create accounts - useful to pinpoint issues with registration, or abuse by spam bots. And I could see a user initiated an account deletion, and minutes later created a new account under the same name. I had a hunch that not all accounts deleted are necessarily users leaving and this confirmed it - many users want to "reset" their account after a long break.

    While we're here let's have some numbers - as of a recent May backup:

    Table         Rows        data_length   index_length
    ----------------------------------------------------
    reviews       8,513,399    194 MB       273 MB
    stories       7,294,513   1103 MB       191 MB
    custkeywords  1,834,099     58 MB        23 MB
    

    The reviews (i.e. flashcards) table may have an issue with how I designed it. I'm not sure. It looks pretty silly that the index itself is larger than the data... although I think this is not unusual when the average row length is small (24 bytes).
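
    The average row length and the index-to-data ratio can be worked out from the table above. This is a rough sketch that assumes "MB" in the table means 2^20 bytes (the post doesn't say), so treat the results as approximate.

```python
MIB = 1024 * 1024  # assumption: the table's "MB" are mebibytes

# name: (rows, data_length in MB, index_length in MB), copied from the table
TABLES = {
    "reviews": (8_513_399, 194, 273),
    "stories": (7_294_513, 1103, 191),
    "custkeywords": (1_834_099, 58, 23),
}


def table_stats(rows, data_mb, index_mb):
    """Return (average bytes of data per row, index size / data size)."""
    avg_row_bytes = data_mb * MIB / rows
    index_ratio = index_mb / data_mb
    return avg_row_bytes, index_ratio


for name, (rows, data_mb, index_mb) in TABLES.items():
    avg, ratio = table_stats(rows, data_mb, index_mb)
    print(f"{name:<13} avg row {avg:6.1f} bytes, index/data ratio {ratio:.2f}")
```

    For the reviews table this gives roughly 24 bytes per row, with an index about 1.4 times the size of the data. The stories table, with much longer rows, has an index well under a fifth of its data size.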

    Based on some test runs, tomorrow's "spring cleaning" will shave off 2,057,521 private stories (from accounts not used in 5+ years), out of a total of 7,294,513. That is 28% of the stories table, and a not negligible 300 MB trim. It will also shave 40% off the flashcards table (flashcards from accounts not used in 2+ years), another significant ~180 MB trim.
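
    The percentages quoted in this post are easy to double check from the row counts. A small sketch using only the figures given above (the shared stories count is the rounded 513,800):

```python
TOTAL_STORIES = 7_294_513    # rows in the stories table
SHARED_STORIES = 513_800     # rounded figure quoted in the post
TRIMMED_STORIES = 2_057_521  # private stories from accounts idle 5+ years


def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return 100 * part / total


shared_pct = percent(SHARED_STORIES, TOTAL_STORIES)
trimmed_pct = percent(TRIMMED_STORIES, TOTAL_STORIES)
remaining = TOTAL_STORIES - TRIMMED_STORIES

print(f"shared:    {shared_pct:.1f}%")
print(f"trimmed:   {trimmed_pct:.1f}%")
print(f"remaining: {remaining:,} stories")
```

    Both shares round to the 7% and 28% mentioned in the text.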

    Hope you enjoyed this little foray into the database!

    I will post an update when the site is back up tomorrow after the downtime.

  • 9 May 2022 - Minor update to the dashboard progress bar

    A minor update today adds a decimal to the percentage displayed on the main progress bar of the dashboard. Thanks UncomprehendingGale for the suggestion! (issue 261 on GitHub)

  • 6 May 2022 - New: Create a Review Deck from Japanese Text

    The Custom Review page now features a new mode called Create a Review Deck from Japanese Text.

    With this feature you may post raw text from Japanese video subtitles (or even Japanese stories or novels) and review all of the RTK kanji within that text using a one-off study deck.

    You can also use the "Shuffle cards" and "Kanji to Keyword" options in this mode. Please note the "Shuffle cards" option is now checked by default (this is meant to be convenient, assuming most people want to shuffle the kanji cards).

    I made sure the Repeat Review button works (from the Review Summary screen), so you don't have to paste the text again should you want to repeat the review. Keep in mind if you checked Shuffle cards, the kanji cards in that one-off study deck will be shuffled again when you repeat the review.

    When you paste text, you'll see the count of unique RTK kanji below the text area. For now, this feature filters out any non-RTK kanji, since those do not have keywords - and we need some kind of prompt on the flashcards. So keep in mind the count shown is the number of unique kanji in the raw text that are included in the RTK Volume 1 & 3 books (this is also affected by your selection of Old or New Edition in Account Settings). Many kanji used in place names and family names, for example, are not part of RTK.
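
    The counting and filtering described above can be sketched in a few lines. This is not the site's actual code: `RTK_SAMPLE` is a tiny stand-in for the real RTK kanji list, and matching kanji by the basic CJK Unified Ideographs range is a simplification.

```python
import re

# Basic CJK Unified Ideographs block; kana and punctuation fall outside it.
KANJI_RE = re.compile(r"[\u4e00-\u9fff]")

# Tiny stand-in for the RTK kanji list of the selected edition.
RTK_SAMPLE = set("日本語学校")


def unique_rtk_kanji(text, rtk_kanji):
    """Unique kanji from `text` that are in `rtk_kanji`, in order of first use."""
    found = (ch for ch in KANJI_RE.findall(text) if ch in rtk_kanji)
    # dict.fromkeys drops repeats while keeping first-seen order.
    return list(dict.fromkeys(found))


text = "日本語の学校は東京にある。日本語を学ぶ。"
deck = unique_rtk_kanji(text, RTK_SAMPLE)
print(len(deck), "".join(deck))
```

    Here 東 and 京 are dropped only because they are missing from the tiny sample set, and repeated kanji are counted once. With the full list for your edition, the length of `deck` would correspond to the count shown below the text area.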

    If you have any related ideas or suggestions, you can leave a comment on issue #259 on GitHub.

    You can also simply send feedback via the contact form.

    Thanks to long-time user chamcham for this idea!
