• 21 May 2013 - Remembering the Kanji 6th Edition Support

    I've been procrastinating on this for a long time... it's just one of those things that requires a lot of changes and testing, but isn't really exciting to do.

    Technically speaking, it's not very difficult. It will require an extra table in the database. It will have an impact on the speed of queries, but maybe I'll be able to simplify and reduce the number of queries in some places... the code is really old.

    So far I'm identifying these tasks:

    * Manage Pages: add an explicit "RTK Nth Edition" label anywhere the user is expected to enter a frame number. The label would be a clickable link that takes you to the Options page to set the current index or "sequence" for the characters.

    * Options: a new Options page is required for the user to choose the character sequence. For most people this will be 5th or 6th edition; however, it will also be possible to add community-made sequences, such as "RTK Lite".

    * Refactoring: pretty much 80% of the MySQL queries assume that there is one fixed sequence of characters. They all have to be updated to use an additional table that "maps" the actual UTF-8 character to an arbitrary sequence number (aka "frame number" in Heisig's method).

    * Additional refactoring is required because the total number of characters is not the same: 6th edition includes 2200 characters, while 5th edition includes 2042. Simple logic like 'if frame_num < 2043 then book_title = "Volume 1"' won't work.

    * More refactoring is required because if we want to support custom sequences such as RTK Lite, we can no longer assume that the frame numbers are contiguous: the sequence may have gaps. So instead of 1, 2, 3, 4, 5, a custom sequence could look like 1, 3, 20, 25 (random example).

    * Another thing to consider is to allow for non-digit sequence numbers. I don't remember for sure right now, but I think 6th edition includes frame numbers like "123a" and "123b". At this point, you can see that the frame number can no longer be treated as an index in the database at all. It has to be a completely arbitrary label that can be linked to any UTF-8 character. There is still a sequence, but it is only used internally in the database, for optimizing queries and probably for sorting as well.

    * Testing: eventually I'll have to go through all features, systematically, and test them one by one to check that everything is working. When you refactor a lot of code, it's almost certain that things will be broken.
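
    The mapping-table idea from the refactoring items above could look roughly like the sketch below. It is not the site's actual schema: it uses Python with an in-memory SQLite database instead of MySQL, and every table, column and function name (sequences, sequence_frames, sort_pos, frames_for_user, ...) is made up for illustration. The two "demo" sequences and their labels are placeholder data, not real edition data. The point is only that reviews are keyed by the character itself, the frame is a free-form text label, and a separate integer column carries the ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per selectable sequence (hypothetical names throughout).
    CREATE TABLE sequences (
        seq_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- Maps a character to its label and position within one sequence.
    CREATE TABLE sequence_frames (
        seq_id   INTEGER NOT NULL REFERENCES sequences(seq_id),
        kanji    TEXT    NOT NULL,  -- the actual UTF-8 character
        frame    TEXT    NOT NULL,  -- arbitrary label shown to the user
        sort_pos INTEGER NOT NULL,  -- internal order, used only for sorting
        PRIMARY KEY (seq_id, kanji),
        UNIQUE (seq_id, frame)
    );
    -- User data is keyed by character, never by frame number.
    CREATE TABLE reviews (
        user_id INTEGER NOT NULL,
        kanji   TEXT    NOT NULL,
        PRIMARY KEY (user_id, kanji)
    );
""")

# Placeholder data: two demo sequences, one with gaps and a non-digit label.
DEMO_A, DEMO_CUSTOM = 1, 2
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [(DEMO_A, "demo A"), (DEMO_CUSTOM, "demo custom")],
)
conn.executemany(
    "INSERT INTO sequence_frames VALUES (?, ?, ?, ?)",
    [
        (DEMO_A, "一", "1", 1),
        (DEMO_A, "二", "2", 2),
        (DEMO_A, "三", "3", 3),
        (DEMO_CUSTOM, "一", "1", 1),
        (DEMO_CUSTOM, "三", "3a", 2),
        (DEMO_CUSTOM, "四", "20", 3),
    ],
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(1, "一"), (1, "二"), (1, "四")],
)


def frames_for_user(conn, user_id, seq_id):
    """Return (frame label, kanji) pairs for a user's cards in one sequence."""
    return conn.execute(
        """
        SELECT f.frame, f.kanji
        FROM reviews AS r
        JOIN sequence_frames AS f ON f.kanji = r.kanji
        WHERE r.user_id = ? AND f.seq_id = ?
        ORDER BY f.sort_pos
        """,
        (user_id, seq_id),
    ).fetchall()


def kanji_for_frame(conn, seq_id, frame):
    """Resolve a user-entered frame label to a character, or None."""
    row = conn.execute(
        "SELECT kanji FROM sequence_frames WHERE seq_id = ? AND frame = ?",
        (seq_id, str(frame)),
    ).fetchone()
    return row[0] if row else None


def sequence_size(conn, seq_id):
    """Count the characters in a sequence instead of hard-coding a total."""
    return conn.execute(
        "SELECT COUNT(*) FROM sequence_frames WHERE seq_id = ?",
        (seq_id,),
    ).fetchone()[0]
```

    With something like this, switching the sequence in Options would only change the seq_id passed to the queries. Cards for characters that are not part of the selected sequence simply drop out of the join, and any check that used to compare against a fixed total would ask sequence_size() instead.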
