Database Format
Jump to navigation
Jump to search
From David Wicksell
The first block in my global is a bottom level pointer. Maybe that's what a ROOT block always is? But there is no data in any other block, as it's a brand new global. Confusing. Oh, I take that back. There is another block! So with a global with even just one small record in it, there will be more than one block. I suspect bottom pointer is called that because it's abstracted to be an upside down tree, and it's the root block for the global, which is the "bottom" of the global tree. I still don't quite understand this format though. That's block 0 of ^dlw. The 1, 3, and 1 on the top line appear to be pointers to another block, block 1. [4:33 PM] The end of that block has the metadata about the type of block it is, but the front of this block has at least one pointer to block number 1. Yep, that is definitive now. Block 1, or the second block, contains my first key, the only key, in the global, so far. [4:37 PM] I will figure this out. Ok, I see where key compression is done. [4:46 PM] I almost have the first block structure figured out. [4:51 PM] Ok, if the key you are going to insert isn't in the data block where it sorts to, then FreeM will lseek back in to the ROOT, and add the pointer first. So it has to traverse the path twice. Ok, that's strange. The key format for canonical numbers is odd. It turns a subscript of (2) in to a 5 in the key. [4:58 PM] I believe it also turns a (0) in to a 3 and a (1) in to a 4. I guess it can't use ASCII 0, 1, or 2 for that. (edited) [5:01 PM] I can definitelly see ways to speed up globals in FreeM! For one thing, it recalculates the file path to the global over and over, even when it has the file descriptor stored already. Ok, so I have the ROOT block, which is a bottom pointer block figured out, format wise. [5:21 PM] [5:24 PM] Starting at the first byte of the block, the 1 is the length of the first key in the global. Then it skips a byte, and the 3 is the first key. Then, because the block is a pointer block, all keys are hard-coded with a 3 byte pointer to their data block. So the next three bytes, 00 00 01 are a pointer to the block containing key 3. Now key 3 is actually ^dlw(1), but that 1 is encoded in to a compact key form, and ends up as 3. I'm not 100% sure why yet. So that points to block 1. I'm inserting a new pointer for ^dlw(2) right now, so in a few minutes, this first block will have another key up on the top line, and I'll show that to you once it's there. [5:25 PM] FreeM also maintains a path through the blocks in a global, just like RSM does and just like GT.M/YottaDB do. [5:28 PM] In RSM, that path is recorded in an array of pointers to global buffer descriptors in shared memory, called blk, it is 12 pointers long. In FreeM, the path seems to be in 2 arrays of longs, the first traceblk stores the block numbers in the path, and the second, traceadr also stores numbers, but shorts instead of unsigned longs. So far, I haven't figured out what it stores semantically. Both hold 32 numbers, so that would support the idea that it can handle a B-tree of 32 levels, which is gigantic dlw — Yesterday at 6:56 PM Ok, before inserting the second key, here is what block 0, the ROOT, a BOTTOM pointer for ^dlw looks like: "\001\000\003\000\000\001", '\000' <repeats 1010 times>, "\001\000\000\000\000\006\000\006" You can see stuff at the beginning and the end. Starting at the beginning, is 001 then 000, which added together is 1 (because of key compression), the length of the first key. Then the key is 003 (which is the compact form of (1) I guess). Then, since it's a BOTTOM block, which is a pointer, and pointers are always 3 bytes, the next 3 bytes are 000 000 001, which when calculated is a pointer to block 1. Then, looking at the end of the block, the third to last byte is 006, which is the block type, BOTTOM. Then the last two bytes are 000 006, which, when calculated is an offset in to the block itself, of 6, meaning that it's the beginning of the free space in the block, offset from the beginning of the block. Then, to the left of the 006 type, is 3 bytes of 000 000 000, which would be the right pointer in a data block (maybe a pointer block too), but in the ROOT block, it's the number of free blocks in the global, which is 0, since there are only 2 blocks right now, and neither are empty. Then, one byte to the left of that is 000, which is the collation for the block, and a 0 is just M collation. Then, the three bytes before that are 000 000 001, and if this were a data block (maybe a pointer block too), it would be the left pointer, but since this is the ROOT block, it's the number of total blocks in the global, in this case 1. Since there is the ROOT block and a data block, the total block count must not include the ROOT block, but is how many more blocks. Next we'll look at the second block, a data block. dlw — Yesterday at 7:04 PM Ok, before inserting the second key, here is what block 1, the DATA block, for ^dlw looks like: "\001\000\003\0011", '\000' <repeats 1016 times>, "\b\000\005" You can see stuff at the beginning and the end. Starting at the beginning, is 001 then 000, which added together is 1 (because of key compression), the length of the first key. Then the key is 003 (which is the compact form of (1) I guess). So far, identical to the ROOT, but once more keys are there, enough to need more blocks, then that will likely change. Then, since it's a DATA block, which is not a pointer, the next byte is the length of the data record at the key, which is a length of 1. Then, the next byte is 1 (ASCII 49, since the compact form is only in keys, and this is data), the data at ^dlw(1). Then, looking at the end of the block, the third to last byte is \b (which is 008), which is the block type, DATA. Then the lsat two bytes are 000 005, which, when calculated is an offset in to the block itself, of 5, meaning that it's the beginning of the free space in the block, offset from the beginning of the block. Then, to the left of the 008 type, is 3 bytes of 000 000 000, which would be the number of free blocks in the ROOT, but since this is DATA, is the right pointer. There are no blocks to the right, so it's 0. The, one byte to the left of that is 000, which in a DATA block is a reserved byte, and always 0. You could do something with that in the future. Then, the three bytes before that are 000 000 000, which would be the number of total blocks in the ROOT, but since this is DATA, is the left pointer. There are not blocks to the left, so it's 0. [7:06 PM] So that's what I've worked out so far. One observation, currently a key length is stored in two bytes, a byte for the length, and one for how much is compressed. So a key in a global in FreeM is restricted to 255 bytes without changing the format. The more important observatoin is that the length of a data record, or node, is stored in a single byte, also limiting it to 255 bytes without changing the format. Since I know you want to raise your string limits, you'll have to look at that first. I can help you figure out where to go and what to do if you like. dlw — Yesterday at 7:11 PM By the way, I'm doing this for two reasons. First, I love this sort of code, so I'm enjoying it. Second, and more importantly, I know you hate it, so I'm trying to help you out, because I want you to succeed at increasing the string length in FreeM. dlw — Yesterday at 7:55 PM Ok, I understand the FreeM block structure much better now. Ok, some weird things happen, they may or may not matter. I'll have to look in to that. But no block joins happen at all. Rather, most of a block that has all of its keys removed stays there empty. [12:37 AM] I had a global with 1 pointer block and 3 data blocks, and I killed all the records in the first and third data block, and for the most part, they are just sitting there all empty, while the second block is still full. The weirdness is that they both have one record still in them, and I'm not sure why. It's the last key, and it appears to have been moved to the front of the block. So maybe a FOR loop in FreeM is not inclusive in its END condtion, but exclusive. Or maybe something else is going on. Now I'm going to remove keys from the middle of the second block to see what happens. [12:41 AM] Blocks in FreeM do not contain their block number. [12:41 AM] I guess I just assumed they would. Ok, so that weirdness seems to just be an artifact, and I know why. dlw — 1:06 AM I was removing records in a loop, and when it got to the last record in a block, it didn't bother to remove it, instead it left it there, and just marked the block as empty. So it definitely does not do joins, it will remove records, and each time it does, it will re-key the block so that the keys after the one that was removed, are moved up to fill in the empty spot. Once it gets to that last record, it just marks the block free and leaves the record there, as the first record in the block. One thing I don't quite understand is that I created records to fill 3 data blocks, so there was 1 bottom pointer and 3 data blocks, but after all of that, what was left was block 0 was the bottom pointer, it has one pointer to block 2. Block 1 is marked empty and has one record in it, the original last record, as according to my last paragraph. Block 2 which I removed like 50 records from the middle of the block, and the records after it were moved up to fill the hole. Block 3 is marked empty and has one record in it at the beginning, the original last record, as according to my last paragraph. It also has a bunch of records at the end, but it's marked empty, so that seems fine. But why did leave them there, and why don't they match what I'd expect? They have records that were originally in block 3, and some of them are still there in block 3. So that part is a mystery, based on the code I ran, and the state of the blocks at each point. Block 5 is really weird. It was never used for data. But it has a few bits of things in it, and is marked as something I have no clue about, an FBLK. So not an EMPTY, but an FBLK. It is used though, as it has an offset and something before that offset and has that FBLK marker. So I'll have to figure out what an FBLK is and what it's used for. It's the one block type I don't know yet. [1:07 AM] Long story short, FreeM will always fill in holes during record deletion, by moving later records up to fill them, and it does not do block joins, it will just mark the block as empty. Both are things RSM does not do. [1:10 AM] Ohhhh! [1:10 AM] I know what it is! [1:10 AM] Clever! [1:12 AM] It's the free block list! So once FreeM has freed a block that had data in it, it creates a new block called the free block (FBLK), which is just a list of pointers to free blocks in order. So in my testing data, it just has a pointer to block 1 and block 3, which I told you above are both free. Does it always maintain that block at the end of the global somehow, or does it have another way of knowing where to find it? I'll have to figure that out. dlw — 1:15 AM Ooo, that's how! [1:17 AM] Remember earlier I told you that the ROOT block, always a BOTTOM pointer, didn't have a left and right pointer, since it has no siblings, so instead it uses those fields to store the number of blocks other than the ROOT that are in the global, and the number of free blocks? Yeah, it's not the number of free blocks, it's the block number that stores the FBLK, or free block list! So it can always find that block when it needs a new block, before it would add a new block to the global. So instead of a map block using a bitmap, like RSM uses, it has a pointer from the ROOT block to a FBLK, which maintains the list of free blocks! [1:18 AM] So there you go, other than a few details on internal key formats, I have taught you pretty much the entire globals module tonight! I have not dug in to block splits, but I'll do that soon, and let you know any details. [1:21 AM] So, remember that because all of the block pointers are 3 bytes, that means a global in FreeM can be at most, 16 MiB blocks, and at 1 KiB per block, that means no larger than 16 GiB. [1:22 AM] Just an FYI or a reminder. One other thing, FreeM's index/pointer block keys are different than RSM or GT.M/YottaDB. In GT.M/YottaDB, an index/pointer block stores keys such that the first key, which points to the first data block for that global, is a key that points to every key from the root of the global all the way to, and including, that key. In other words, the keys in the index blocks are the last keys stored in the data block they point to. Then at the end, there is a star pointer (a literal *, for everything else) that points to the last data block, containing any keys that collate after the previous key. In RSM, an index/pointer block stores keys such that the first key (always the null/empty/root key), which points to the first data block for that global, is a key that points to every key from the root of the global all the way to, and excluding the next key in the index block. In other words, the keys in the index blocks are the first keys stored in the data block they point to, and for the first key, even if there is no data at the root of the global, that key is still stored, just with no data if it doesn't exist (and it flagged as such in the entry for the global in the global directory). So the last key points at the first key in the last block that contains the keys to the end. In FreeM, an index/pointer block stores keys such that the first key, which points to the first data block for that global, is a key that points to every key, starting with that key, up to, and excluding, the next key in the index block. In other words, the keys in the index blocks are the first keys stored in the data block they point to. But unlike RSM, there is no key for before the first stored key (the null/empty/root key), the first key stored is the first key in the global. Also, unlike GT.M, there is no star pointer at the end. So every index pointer has to be updated if that key is deleted, with another one. t's basically a separately maintained linked-list that stores pointers to free blocks. I'm not sure if it would reorganize the pointers in order to keep them in block number order or not though. [11:14 AM] The root bottom pointer block stores the first free block in its footer, and then that block maintains a right pointer to the next free blook in the linked list, which itself has a left pointer back to the first, and so on. So it's a doubly-linked list. [11:16 AM] So if you wanted to calculate how many used blocks there are, you'd have to grab the block total from the ROOT footer, and then grab the block free pointer to the first free block, and walk through it, and its siblings, via the right pointers, marking down every block as free, and then what's left is used. There is no direct way to know what blocks are used, without walking the entire B-tree, other than the free block walk I just mentioned, which would be much faster.  dlw — 11:21 AM The FBLK format is super simple. The footer maintains a left and right pointer, a block type of FBLK or 1, and an offset to the next available free space in the block. And then is just lists block numbers, 3 bytes at a time. [11:21 AM] No keys, or data, or anything else, just block pointers to free blocks, blocks with types of FREE or 0. [11:23 AM] The ROOT block's "right" pointer, points to the first FBLK for that global. The ROOT block's "left" pointer, stores the total number of blocks in the global, not including itself. [11:23 AM] But that does include FREE blocks and FBLK blocks. [11:25 AM] Because no spaces are allowed between keys, you can easily calculate how full a block is by simply subtracting the OFFSET (stored as the last two bytes in the footer) from DATALIM (the size of the block minus the footer, hard-coded), which then gives you how much free space is left in that block. True for all block types, other than FREE, which just have DATALIM of free space. [11:26 AM] Unlike RSM, which zeroes out blocks when they are garbage collected, FreeM leaves the data as-is when using or reusing a block, just setting the type to FREE, and the OFFSET to 0. [11:26 AM] Though when it goes to fill data in to a formerly FREE block, it will then zero the block I believe.