Database Format

From FreeM Wiki
Jump to navigation Jump to search

From David Wicksell

The first block in my global is a bottom level pointer. Maybe that's what a ROOT block always is? But there is no data in any other block, as it's a brand new global. Confusing.
Oh, I take that back. There is another block!
So with a global with even just one small record in it, there will be more than one block.
I suspect bottom pointer is called that because it's abstracted to be an upside down tree, and it's the root block for the global, which is the "bottom" of the global tree.
I still don't quite understand this format though.
That's block 0 of ^dlw. The 1, 3, and 1 on the top line appear to be pointers to another block, block 1.
[4:33 PM]
The end of that block has the metadata about the type of block it is, but the front of this block has at least one pointer to block number 1.
Yep, that is definitive now. Block 1, or the second block, contains my first key, the only key, in the global, so far.
[4:37 PM]
I will figure this out.
Ok, I see where key compression is done.
[4:46 PM]
I almost have the first block structure figured out.
[4:51 PM]
Ok, if the key you are going to insert isn't in the data block where it sorts to, then FreeM will lseek back in to the ROOT, and add the pointer first. So it has to traverse the path twice.
Ok, that's strange. The key format for canonical numbers is odd. It turns a subscript of (2) in to a 5 in the key.
[4:58 PM]
I believe it also turns a (0) in to a 3 and a (1) in to a 4. I guess it can't use ASCII 0, 1, or 2 for that. (edited)
[5:01 PM]
I can definitelly see ways to speed up globals in FreeM! For one thing, it recalculates the file path to the global over and over, even when it has the file descriptor stored already.
Ok, so I have the ROOT block, which is a bottom pointer block figured out, format wise.
[5:21 PM]
[5:24 PM]
Starting at the first byte of the block, the 1 is the length of the first key in the global. Then it skips a byte, and the 3 is the first key. Then, because the block is a pointer block, all keys are hard-coded with a 3 byte pointer to their data block. So the next three bytes, 00 00 01 are a pointer to the block containing key 3. Now key 3 is actually ^dlw(1), but that 1 is encoded in to a compact key form, and ends up as 3. I'm not 100% sure why yet. So that points to block 1. I'm inserting a new pointer for ^dlw(2) right now, so in a few minutes, this first block will have another key up on the top line, and I'll show that to you once it's there.
[5:25 PM]
FreeM also maintains a path through the blocks in a global, just like RSM does and just like GT.M/YottaDB do.
[5:28 PM]
In RSM, that path is recorded in an array of pointers to global buffer descriptors in shared memory, called blk, it is 12 pointers long. In FreeM, the path seems to be in 2 arrays of longs, the first traceblk stores the block numbers in the path, and the second, traceadr also stores numbers, but shorts instead of unsigned longs. So far, I haven't figured out what it stores semantically. Both hold 32 numbers, so that would support the idea that it can handle a B-tree of 32 levels, which is gigantic
dlw — Yesterday at 6:56 PM
Ok, before inserting the second key, here is what block 0, the ROOT, a BOTTOM pointer for ^dlw looks like:
"\001\000\003\000\000\001", '\000' <repeats 1010 times>, "\001\000\000\000\000\006\000\006"
You can see stuff at the beginning and the end. Starting at the beginning, is 001 then 000, which added together is 1 (because of key compression), the length of the first key. Then the key is 003 (which is the compact form of (1) I guess). Then, since it's a BOTTOM block, which is a pointer, and pointers are always 3 bytes, the next 3 bytes are 000 000 001, which when calculated is a pointer to block 1.

Then, looking at the end of the block, the third to last byte is 006, which is the block type, BOTTOM. Then the last two bytes are 000 006, which, when calculated is an offset in to the block itself, of 6, meaning that it's the beginning of the free space in the block, offset from the beginning of the block. Then, to the left of the 006 type, is 3 bytes of 000 000 000, which would be the right pointer in a data block (maybe a pointer block too), but in the ROOT block, it's the number of free blocks in the global, which is 0, since there are only 2 blocks right now, and neither are empty. Then, one byte to the left of that is 000, which is the collation for the block, and a 0 is just M collation. Then, the three bytes before that are 000 000 001, and if this were a data block (maybe a pointer block too), it would be the left pointer, but since this is the ROOT block, it's the number of total blocks in the global, in this case 1. Since there is the ROOT block and a data block, the total block count must not include the ROOT block, but is how many more blocks.

Next we'll look at the second block, a data block.
dlw — Yesterday at 7:04 PM
Ok, before inserting the second key, here is what block 1, the DATA block, for ^dlw looks like:
"\001\000\003\0011", '\000' <repeats 1016 times>, "\b\000\005"

You can see stuff at the beginning and the end. Starting at the beginning, is 001 then 000, which added together is 1 (because of key compression), the length of the first key. Then the key is 003 (which is the compact form of (1) I guess). So far, identical to the ROOT, but once more keys are there, enough to need more blocks, then that will likely change. Then, since it's a DATA block, which is not a pointer, the next byte is the length of the data record at the key, which is a length of 1. Then, the next byte is 1 (ASCII 49, since the compact form is only in keys, and this is data), the data at ^dlw(1).

Then, looking at the end of the block, the third to last byte is \b (which is 008), which is the block type, DATA. Then the lsat two bytes are 000 005, which, when calculated is an offset in to the block itself, of 5, meaning that it's the beginning of the free space in the block, offset from the beginning of the block. Then, to the left of the 008 type, is 3 bytes of 000 000 000, which would be the number of free blocks in the ROOT, but since this is DATA, is the right pointer. There are no blocks to the right, so it's 0. The, one byte to the left of that is 000, which in a DATA block is a reserved byte, and always 0. You could do something with that in the future. Then, the three bytes before that are 000 000 000, which would be the number of total blocks in the ROOT, but since this is DATA, is the left pointer. There are not blocks to the left, so it's 0.
[7:06 PM]
So that's what I've worked out so far. One observation, currently a key length is stored in two bytes, a byte for the length, and one for how much is compressed. So a key in a global in FreeM is restricted to 255 bytes without changing the format. The more important observatoin is that the length of a data record, or node, is stored in a single byte, also limiting it to 255 bytes without changing the format. Since I know you want to raise your string limits, you'll have to look at that first. I can help you figure out where to go and what to do if you like.

dlw — Yesterday at 7:11 PM
By the way, I'm doing this for two reasons. First, I love this sort of code, so I'm enjoying it. Second, and more importantly, I know you hate it, so I'm trying to help you out, because I want you to succeed at increasing the string length in FreeM.

dlw — Yesterday at 7:55 PM
Ok, I understand the FreeM block structure much better now.
Ok, some weird things happen, they may or may not matter. I'll have to look in to that. But no block joins happen at all. Rather, most of a block that has all of its keys removed stays there empty.
[12:37 AM]
I had a global with 1 pointer block and 3 data blocks, and I killed all the records in the first and third data block, and for the most part, they are just sitting there all empty, while the second block is still full. The weirdness is that they both have one record still in them, and I'm not sure why. It's the last key, and it appears to have been moved to the front of the block. So maybe a FOR loop in FreeM is not inclusive in its END condtion, but exclusive. Or maybe something else is going on. Now I'm going to remove keys from the middle of the second block to see what happens.
[12:41 AM]
Blocks in FreeM do not contain their block number.
[12:41 AM]
I guess I just assumed they would.
Ok, so that weirdness seems to just be an artifact, and I know why.

dlw — 1:06 AM
I was removing records in a loop, and when it got to the last record in a block, it didn't bother to remove it, instead it left it there, and just marked the block as empty. So it definitely does not do joins, it will remove records, and each time it does, it will re-key the block so that the keys after the one that was removed, are moved up to fill in the empty spot. Once it gets to that last record, it just marks the block free and leaves the record there, as the first record in the block.

One thing I don't quite understand is that I created records to fill 3 data blocks, so there was 1 bottom pointer and 3 data blocks, but after all of that, what was left was block 0 was the bottom pointer, it has one pointer to block 2. Block 1 is marked empty and has one record in it, the original last record, as according to my last paragraph. Block 2 which I removed like 50 records from the middle of the block, and the records after it were moved up to fill the hole. Block 3 is marked empty and has one record in it at the beginning, the original last record, as according to my last paragraph. It also has a bunch of records at the end, but it's marked empty, so that seems fine. But why did leave them there, and why don't they match what I'd expect? They have records that were originally in block 3, and some of them are still there in block 3. So that part is a mystery, based on the code I ran, and the state of the blocks at each point. Block 5 is really weird. It was never used for data. But it has a few bits of things in it, and is marked as something I have no clue about, an FBLK. So not an EMPTY, but an FBLK. It is used though, as it has an offset and something before that offset and has that FBLK marker. So I'll have to figure out what an FBLK is and what it's used for. It's the one block type I don't know yet.
[1:07 AM]
Long story short, FreeM will always fill in holes during record deletion, by moving later records up to fill them, and it does not do block joins, it will just mark the block as empty. Both are things RSM does not do.
[1:10 AM]
Ohhhh!
[1:10 AM]
I know what it is!
[1:10 AM]
Clever!
[1:12 AM]
It's the free block list! So once FreeM has freed a block that had data in it, it creates a new block called the free block (FBLK), which is just a list of pointers to free blocks in order. So in my testing data, it just has a pointer to block 1 and block 3, which I told you above are both free. Does it always maintain that block at the end of the global somehow, or does it have another way of knowing where to find it? I'll have to figure that out.

dlw — 1:15 AM
Ooo, that's how!
[1:17 AM]
Remember earlier I told you that the ROOT block, always a BOTTOM pointer, didn't have a left and right pointer, since it has no siblings, so instead it uses those fields to store the number of blocks other than the ROOT that are in the global, and the number of free blocks? Yeah, it's not the number of free blocks, it's the block number that stores the FBLK, or free block list! So it can always find that block when it needs a new block, before it would add a new block to the global. So instead of a map block using a bitmap, like RSM uses, it has a pointer from the ROOT block to a FBLK, which maintains the list of free blocks!
[1:18 AM]
So there you go, other than a few details on internal key formats, I have taught you pretty much the entire globals module tonight! I have not dug in to block splits, but I'll do that soon, and let you know any details.
[1:21 AM]
So, remember that because all of the block pointers are 3 bytes, that means a global in FreeM can be at most, 16 MiB blocks, and at 1 KiB per block, that means no larger than 16 GiB.
[1:22 AM]
Just an FYI or a reminder.
One other thing, FreeM's index/pointer block keys are different than RSM or GT.M/YottaDB.

In GT.M/YottaDB, an index/pointer block stores keys such that the first key, which points to the first data block for that global, is a key that points to every key from the root of the global all the way to, and including, that key. In other words, the keys in the index blocks are the last keys stored in the data block they point to. Then at the end, there is a star pointer (a literal *, for everything else) that points to the last data block, containing any keys that collate after the previous key.

In RSM, an index/pointer block stores keys such that the first key (always the null/empty/root key), which points to the first data block for that global, is a key that points to every key from the root of the global all the way to, and excluding the next key in the index block. In other words, the keys in the index blocks are the first keys stored in the data block they point to, and for the first key, even if there is no data at the root of the global, that key is still stored, just with no data if it doesn't exist (and it flagged as such in the entry for the global in the global directory). So the last key points at the first key in the last block that contains the keys to the end.

In FreeM, an index/pointer block stores keys such that the first key, which points to the first data block for that global, is a key that points to every key, starting with that key, up to, and excluding, the next key in the index block. In other words, the keys in the index blocks are the first keys stored in the data block they point to. But unlike RSM, there is no key for before the first stored key (the null/empty/root key), the first key stored is the first key in the global. Also, unlike GT.M, there is no star pointer at the end. So every index pointer has to be updated if that key is deleted, with another one.
t's basically a separately maintained linked-list that stores pointers to free blocks. I'm not sure if it would reorganize the pointers in order to keep them in block number order or not though.
[11:14 AM]
The root bottom pointer block stores the first free block in its footer, and then that block maintains a right pointer to the next free blook in the linked list, which itself has a left pointer back to the first, and so on. So it's a doubly-linked list.
[11:16 AM]
So if you wanted to calculate how many used blocks there are, you'd have to grab the block total from the ROOT footer, and then grab the block free pointer to the first free block, and walk through it, and its siblings, via the right pointers, marking down every block as free, and then what's left is used. There is no direct way to know what blocks are used, without walking the entire B-tree, other than the free block walk I just mentioned, which would be much faster.

dlw — 11:21 AM
The FBLK format is super simple. The footer maintains a left and right pointer, a block type of FBLK or 1, and an offset to the next available free space in the block. And then is just lists block numbers, 3 bytes at a time.
[11:21 AM]
No keys, or data, or anything else, just block pointers to free blocks, blocks with types of FREE or 0.
[11:23 AM]
The ROOT block's "right" pointer, points to the first FBLK for that global. The ROOT block's "left" pointer, stores the total number of blocks in the global, not including itself.
[11:23 AM]
But that does include FREE blocks and FBLK blocks.
[11:25 AM]
Because no spaces are allowed between keys, you can easily calculate how full a block is by simply subtracting the OFFSET (stored as the last two bytes in the footer) from DATALIM (the size of the block minus the footer, hard-coded), which then gives you how much free space is left in that block. True for all block types, other than FREE, which just have DATALIM of free space.
[11:26 AM]
Unlike RSM, which zeroes out blocks when they are garbage collected, FreeM leaves the data as-is when using or reusing a block, just setting the type to FREE, and the OFFSET to 0.
[11:26 AM]
Though when it goes to fill data in to a formerly FREE block, it will then zero the block I believe.

An older conversation about the key format:

The first byte is 1, that says that the key is 1 byte in length. The second byte is 0, that says there is no key compression. The third byte is 4B, which is 75 in decimal. That is the encoded character that encodes for this first key of length one. Here we use our decoding chops.
[12:09 PM]
75 is greater than 31, so we shift its bit pattern `0100 1011` to the right by 1, resulting in `0010 0101`, which is decimal 37. And since it was greater than 31, that's all you do to decode it (between 0 and 19 you do something else, and between 20 and 31, you do something else as well). And ASCII 37 is a % character, which is the full key for that node!

dlw — 12/16/23, 12:10 PM
The fourth byte is 0C, which is 12 decimal, which is the length of the data at that node, so 12 characters. Then the next 12 characters make up the data, and they don't encode subscript delimiters of course, so it's straight forward. Here that is:

dlw — 12/16/23, 12:18 PM
53 63 72 61  74 63 68 20  61 72 65 61 - which in decimal are - 83 99 114 97  116 99 104 32  97 114 101 97 - which are the ASCII codes for "Scratch area".
[12:18 PM]
So that key and its data are ^XVEMS("%")="Scratch area", which you can see is what FreeM things as well:
USER> w ^XVEMS("%")
Scratch area
USER>
[12:19 PM]
There was your first walk through of how to decode a FreeM key in a global! (edited)
[12:20 PM]
I'll do the next key, just to show you.
[12:21 PM]

[12:21 PM]
This one has a longer key, and a shorter data node.

dlw — 12/16/23, 12:28 PM
So, the first byte is 9, so the key is 9 bytes. The second byte is 0, so no key compression here either. So then the next 9 bytes, the key, are:
4B 12 02 04  12 11 A6 90  99 - which converted to the decimal we find easier to think about is - 75 18 2 4  18 17 166 144  153 - so now we have to decode them each in turn. First up is 75, which is higher than 31, so we take its bit pattern `0100 1011` and shift it to the right by one, leading to `0010 0101`, which is 37 decimal, or `%` again. That really should have been part of key compression, no idea what it wasn't. But,here we remember that these characters can encode that they are a subscript boundary, and they do that in their encoded form, which for `%` here, was 75. Remember the 75 bit pattern? It ended with a 1 in the LSB, which means this encodes as a delimiter too, so we now have `"%",` at this point.

I'll continue on the next message.
[12:31 PM]
The second key byte is 18, with a bit pattern of 0001 0010 which does not end in 1, so this does not encode another , or the end of the key. Since this is between 0 and 19, we have to do an addition after the shift, but the shift is still one to the right, for 0000 1001, which is 9. Then, we add '0' to it, since this is numeric, and '0' is ASCII 48, so we get ASCII 57, which codes the number 9 as a string. So now our key is "%",9. On to the next byte, the third byte, which is 2. The bit pattern is 0000 0010 which doesn't end in 1, so doesn't end the subscript. It's a 2. That gets right shifted, to a 0000 0001, or a 1,  which is then added to '0' (ASCII 48) to get ASCII 49, which is the code for a string of 1, which is not surprising, now that you see the patter. The key is at "%",91 now. (edited)

dlw — 12/16/23, 12:41 PM
Now we do the fourth byte, which is 4, same story, bit pattern is 0000 0100, not the end of a subscript, and it shifts to 0000 0010 a 2, which when added to '0' 48 again, becomes the string 2. Notice the pattern? When dealing with numerics or numbers, FreeM encodes them as what their ASCII code number is, and then just adds 48 to them, to get back their string equivalent. But since the get right shifted, and lose their last bit, they take up two places in the ASCII code.

For example, ASCII 0 and 1 will result in a 0, 2 and 3 will result in a 1, 4 and 5 will result in a 2, and so on. So, you can take the ASCII number, if it's even, and cut it in half to get right to the number in a string form. And if it's odd, that means it is the last character for that subscript, and you can then subtract 1, and then cut it half to get the character, so an 18 is the number 9, while a 19 is the number 9, and it's the last character of the subsript!

So to finish up the key, we are now at the fifth character, an 18, followed by a sixth, which is a 17. After that, the numbers are way over the 0 to 19 range for numbers. So an 18 is even and cut in half is 9, the next character decoded. The 17 is the first odd one, so it's the end of the subscript, take off 1, and cut in half, and you have 16 to 8, the last character. So our key now looks like "%",91298,.
[12:41 PM]
Now on to the last three characters, 166 144 153! (edited)
[12:46 PM]
So all are over 31, so only get right shifted. The first two are even, so they are part of a subscript, and the last one is odd, so it's the end of a subscript. But since the key was only 9 characters, FreeM won't put a , after them, but will end it with the ) instead.

So 166 is 1010 0110, which right shifted is 0101 0011, which is a decimal 83, or the character S. 144 is 1001 0000 shifted right is 0100 1000a decimal 72, ASCII H. Then a 153, which is 1001 1001, shifted right is 0100 1100, a decimal 76, an ASCII L, so the last subscript is "SHL", and our key is finished!

It is ("%",91298,"SHL").
The next byte is a 3, so the data node is 3 characters, which are 52 55 4E, convert them to decimal and we have 82 85 78, which are the ASCII codes for R U N, and as data nodes aren't encoded with any shifting or masking or anything, that's just what they are! So the ("%",91298,"SHL") key stores "RUN". Now let's see if I was correct.
[12:50 PM]
USER> w $q(^XVEMS)
^XVEMS("%")
USER> w $q(^XVEMS("%"))
^XVEMS("%",91298,"SHL")
USER> w ^XVEMS("%")
Scratch area
USER> w ^XVEMS("%",91298,"SHL")
RUN
USER>

Yep! So that is how you decode global data in FreeM!
Canonical numbers are shifted down in the control character range, so that they collate before strings, and that means control characters are unavailable for use. . and - are likewise moved to the area right after that, taking up a few other control characters. And strings are shifted up to over ASCII 127, taking over the higher bit, meaning only 7-bit ASCII is supported, no extended character sets.

So the characters that it intends to allow in keys are ASCII 32 - 126, and that's it. Though it does something with 127, but I haven't figured it out yet.

Serena Bloodfeather — 12/16/23, 1:22 PM
in light of everything you learned, have you gotten any new insights into how difficult it would be to get us to where we can support block sizes larger than 1KiB and a commensurate increase in key and node length? strictly in the globals themselves, not accounting for the interpreter having its own issues with long strings.

dlw — 12/16/23, 1:25 PM
There are only two things preventing a larger block size. First, BLOCKLEN is hard-coded, but you know that. Second, all the macros that offset off of BLOCKLEN are hard-coded, and would need to support scaling.

If you fix those two things, you can have bigger block sizes. But, for longer keys and longer data, right now, each stores their lengths in a byte, limiting them to 255 characters. So you'd have to adjust the data size for key and/or record lengths to support larger ones.

Serena Bloodfeather — 12/16/23, 1:25 PM
that's basically what I was thinking

dlw — 12/16/23, 1:26 PM
Sorry, they would not need to support scaling, they just need to follow the offset of a different BLOCKLEN. That's even easier than I was saying.

Serena Bloodfeather — 12/16/23, 1:26 PM
haha that's what I thought you meant by scaling

dlw — 12/16/23, 1:26 PM
So if you set BLOCKLEN based on a user configuration, that should support block sizes larger than 1024 without much else.
[1:26 PM]
Scaling is multiplying, offsetting is adding.

Serena Bloodfeather — 12/16/23, 1:26 PM
I've tried changing the BLOCKLEN macro to larger values, and FreeM did not behave well at all

dlw — 12/16/23, 1:27 PM
I'd have to follow all the places that use it to see why.
[1:30 PM]
BLOCKLEN is defined in 3 places. It should just work if you change it in all three to the same size.

Serena Bloodfeather — 12/16/23, 1:31 PM
gfix.c, fma_globals.h, and mpsdef0.h?

dlw — 12/16/23, 1:31 PM
Yes
[1:31 PM]
But, you'd have to blow away every global you already have, and start with a brand new instance environment.

Serena Bloodfeather — 12/16/23, 1:31 PM
yeah
[1:32 PM]
it wouldn't have much utility until nodes and/or keys can be longer than 255 characters

dlw — 12/16/23, 1:32 PM
Until you store the BLOCKLEN itself in block 0, which is totally possible, and probably easy.