Friday, June 10, 2016

Friday, June 10

Very productive day today. I finished debugging my compressors (there were a few bugs, including one pretty hefty one that I managed to pound out) and implemented the optimal compression. It compresses the bitmap using segment lengths of 7, 14, and 28 and picks the length that results in the smallest file. It does this by storing the shortest bitmap so far in a buffer file (same path as the real output, but under a column number that doesn't exist) and then simply renaming that file to the appropriate name. It all works perfectly on all of the small files but fails at various points on the larger files. I suspect it is running out of memory, which would mean I have a memory leak somewhere. I will look into it, but I'm not terribly concerned. I will be moving on to querying next week!
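As a rough illustration of the "write candidates, then rename the winner" idea (not the actual code from this project), here is a minimal C sketch. It assumes each segment length has already been written to its own temporary file; the names keep_smallest and file_size and the example paths are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Size of a file in bytes, or -1 if it cannot be opened. */
static long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0) {
        size = ftell(fp);
    }
    fclose(fp);
    return size;
}

/* Hypothetical helper: rename the smallest candidate file to final_path
 * and delete the other candidates. Ties go to the earlier candidate.
 * Returns the index of the winner, or -1 on failure. */
int keep_smallest(const char *const candidates[], int n, const char *final_path)
{
    assert(n > 0);
    int best = -1;
    long best_size = -1;
    for (int i = 0; i < n; i++) {
        long size = file_size(candidates[i]);
        if (size < 0) {
            continue; /* skip candidates that could not be read */
        }
        if (best < 0 || size < best_size) {
            best = i;
            best_size = size;
        }
    }
    if (best < 0) {
        return -1;
    }
    /* ISO C leaves rename() onto an existing file implementation-defined,
     * so remove any stale output first (failure here is fine). */
    remove(final_path);
    if (rename(candidates[best], final_path) != 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (i != best) {
            remove(candidates[i]);
        }
    }
    return best;
}
```

Calling keep_smallest with three temporary paths (one per segment length) and the real column path leaves only the smallest file on disk under the real name.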

I also briefly talked to David to check in. Not very eventful, but I did confirm with him that after implementing the query engine, I should extend the implementation I have to support VAL64, not just my current VAL32 implementation. That shouldn't be terribly difficult, as all of my current utilities adapt easily to different segment lengths. It's more just a matter of variable and type declarations throughout the program.

Thursday, June 9, 2016

Thursday, June 9

I actually decided to take the day off today. I had to go to the hospital on Tuesday and wasn't feeling so great today.

Wednesday, June 8, 2016

Wednesday, June 8

I ran tests today to make sure my VAL compressor engine is working correctly. The main issue I wanted to address was the extra padding of 0s at the end of each column. I pored over it trying to figure out why it was wrong until I remembered that my computer is little-endian, so I was just reading the bytes backwards and the output was, in fact, correct all along.
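To make the little-endian mix-up concrete, here is a small C sketch (illustrative only, with made-up function names) that checks the host byte order and writes a 32-bit word out most significant byte first, which is the order a person usually expects when reading a hex dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a word is stored first. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1); /* look at the first byte in memory */
    return first == 1;
}

/* Hypothetical helper: copy a word into out[] most significant byte
 * first, regardless of how the host stores it in memory. */
void word_to_big_endian(uint32_t word, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)(word >> 24);
    out[1] = (unsigned char)(word >> 16);
    out[2] = (unsigned char)(word >> 8);
    out[3] = (unsigned char)(word);
}
```

On a little-endian machine, dumping the raw memory of the word 0x12345600 shows the bytes 00 56 34 12, so trailing zero padding appears at the front of the dump rather than the end.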

While running all the tests I found a bug that caused the first word in every column but the first to be incorrect. The cause was not clearing the memory left over from the previous column, so I now set it to 0 at the beginning of compressing each column.
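The stale-memory bug can be sketched roughly like this in C. The struct and function names here are invented for illustration and are not the ones in my compressor; the point is only that the partially built word has to be zeroed before each new column.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state for the output word currently being built. */
struct wordBuilder {
    uint32_t word; /* bits accumulated so far */
    int bitsUsed;  /* how many bits of word are filled */
};

/* Call at the start of every column so no bits leak over
 * from the previous column. */
void beginColumn(struct wordBuilder *wb)
{
    assert(wb != NULL);
    memset(wb, 0, sizeof *wb);
}

/* Append the low count bits of bits to the word being built.
 * Returns 1 once the word holds a full 32 bits. */
int appendBits(struct wordBuilder *wb, uint32_t bits, int count)
{
    assert(wb != NULL);
    assert(count > 0 && count < 32 && wb->bitsUsed + count <= 32);
    wb->word = (wb->word << count) | (bits & ((1u << count) - 1u));
    wb->bitsUsed += count;
    return wb->bitsUsed == 32;
}
```

Without the beginColumn call, the first word of the second column would start out with whatever bits the first column left behind.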

Other than that, it looks like it is working properly! The rest of this week I will be focusing on implementing the optimal length compressor, which compresses each column using segment lengths of 7, 14, and 28, picks the shortest result, saves that as the compressed column, and uses it for querying.

Tuesday, June 7, 2016

Tuesday, June 7

Sorry for not checking in yesterday. Time got the best of me.

Big day today! It looks like my VAL compressor is working! I've only been running it with a segment length of 7 to be consistent, but it should work with 14- and 28-bit segments as well. There's one little issue with the padding on the last word that I want to look into, but otherwise it seems to be functioning very well! The only thing left to do with it, really, is to implement the optimal segment length compressor. That basically involves running the 7-, 14-, and 28-bit compressors and then selecting the smallest compressed column. As a result, the compressed bitmap as a whole will contain compressed columns with different segment lengths. It will be interesting to see the distribution of segment lengths within each bitmap and find out which length is chosen most often.
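Here is a minimal C sketch of the selection step and of tallying which length wins, assuming the compressed size of each column at each segment length is already known. The function names and the tie-breaking rule (ties go to the shorter segment length) are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_LENGTHS = 3 };
static const int SEG_LENGTHS[NUM_LENGTHS] = {7, 14, 28};

/* Index (0, 1, or 2) of the smallest compressed size for one column.
 * Ties go to the shorter segment length (an assumption). */
int pickShortest(const size_t sizes[NUM_LENGTHS])
{
    assert(sizes != NULL);
    int best = 0;
    for (int i = 1; i < NUM_LENGTHS; i++) {
        if (sizes[i] < sizes[best]) {
            best = i;
        }
    }
    return best;
}

/* Segment length in bits for an index returned by pickShortest. */
int segLengthAt(int index)
{
    assert(index >= 0 && index < NUM_LENGTHS);
    return SEG_LENGTHS[index];
}

/* Count how many columns were won by each segment length. */
void tallyWinners(const size_t sizes[][NUM_LENGTHS], size_t numCols,
                  size_t counts[NUM_LENGTHS])
{
    for (int i = 0; i < NUM_LENGTHS; i++) {
        counts[i] = 0;
    }
    for (size_t c = 0; c < numCols; c++) {
        counts[pickShortest(sizes[c])]++;
    }
}
```

After tallyWinners runs, counts[0], counts[1], and counts[2] hold the number of columns that ended up using 7-, 14-, and 28-bit segments respectively.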

The rest of the week I want to dedicate not only to finishing up these last few things but also to running tests with the compressor to make sure it is working EXACTLY like it should. The last thing I want is for a random bit to make its way somewhere in the compressed columns...

Friday, June 3, 2016

Friday, June 3

Still working on implementing VAL. I had to make what felt like a significant decision regarding file formatting and variable declarations. You see, the original bitmap formatter I wrote last summer rewrites text files as binary files with appropriate padding at the beginning of words. WAH is easy to reformat, as you just add an extra 0 every 31 or 63 bits. However, with VAL, the segments can vary in length, so I thought a lot about what the best way to format these files is. I eventually decided to pad the segments every 7 bits, since all of the lengths are divisible by 7. This also allows the compressor to read the file byte by byte and recreate the 14- and 28-bit segments if necessary by reading two or four bytes (7 bits of data per byte) and then building a 14- or 28-bit segment respectively.
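Rebuilding a longer segment from 7-bit bytes can be sketched in C as follows. This assumes the padding bit is the top bit of each byte and that the first byte read holds the most significant 7 bits of the segment; both are assumptions for this sketch, and buildSegment is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine segLen / 7 bytes, each carrying 7 data
 * bits in its low bits, into one segment of segLen bits. */
uint32_t buildSegment(const unsigned char *bytes, int segLen)
{
    assert(bytes != NULL);
    assert(segLen == 7 || segLen == 14 || segLen == 28);
    uint32_t seg = 0;
    for (int i = 0; i < segLen / 7; i++) {
        /* drop the padding bit, then shift in the 7 data bits */
        seg = (seg << 7) | (uint32_t)(bytes[i] & 0x7Fu);
    }
    return seg;
}
```

With this layout a 14-bit segment consumes two bytes and a 28-bit segment consumes four, so the same file can feed all three segment lengths.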

This also raised an issue regarding the type definition of a word. WAH 32 and WAH 64 each read and write words of a single size (32 or 64 bits). However, VAL reads 8 bits at a time but writes 32-bit words. The WAH engine I wrote uses the same type for both reading and writing, so I had to add separate type definitions for reading and for writing, which are the same as each other for both WAH compressors but different for the VAL compressor.
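One way to express separate read and write word types in C is with a pair of typedefs chosen at compile time. This is only a sketch: the SCHEME macros and the names readWord and writeWord are invented here, and the 64-bit case assumes WAH 64 uses 64-bit words throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compile-time switch between compression schemes. */
#define SCHEME_WAH32 0
#define SCHEME_WAH64 1
#define SCHEME_VAL   2

#ifndef SCHEME
#define SCHEME SCHEME_VAL
#endif

#if SCHEME == SCHEME_VAL
typedef uint8_t  readWord;  /* input is read one padded byte at a time */
typedef uint32_t writeWord; /* output is written as 32-bit words */
#elif SCHEME == SCHEME_WAH64
typedef uint64_t readWord;  /* assumed: WAH 64 reads and writes 64 bits */
typedef uint64_t writeWord;
#else
typedef uint32_t readWord;  /* WAH 32 reads and writes 32 bits */
typedef uint32_t writeWord;
#endif

/* The input unit should never be wider than the output word. */
static_assert(sizeof(readWord) <= sizeof(writeWord),
              "read word must fit in write word");
```

Keeping the two names separate means the shared utilities can be written against readWord and writeWord once, and a later VAL64 variant would mostly be a matter of adding another branch.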

Thursday, June 2, 2016

Thursday, June 2

Nothing terribly exciting to report today. Still working on the core of the compressBlock(struct blockSeg *) method. I've decided to focus on just compressing with one segment length first. Then, once I have that working properly, I will add in the optimal segment length compression, which compresses the column using 7-, 14-, and 28-bit segments and then chooses the length that resulted in the smallest compressed column.

Wednesday, June 1, 2016

Wednesday, June 1

I spent today reading through my WAH code. I want to mimic the tools and format of that as much as possible, so I need to have a good understanding of how it maps out. Specifically, I use a struct I call a "blockSeg" that holds all of the information needed to begin to compress a column, like the column file pointer, the current block being compressed, etc. I wanted to read through how I used the blockSeg struct in the WAH code so I could easily mirror how it is handled but with the VAL algorithm underneath instead of WAH.
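For readers who want a picture of what such a struct might look like, here is a guess at its shape in C. Only the column file pointer and the current block come from the description above; every other field, the field names, and the initBlockSeg helper are hypothetical and not taken from the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative layout only; the real blockSeg may differ. */
struct blockSeg {
    FILE *colFile;           /* file the compressed column is written to */
    const uint8_t *curBlock; /* block of input currently being compressed */
    size_t blockLen;         /* hypothetical: number of bytes in curBlock */
    int segLen;              /* hypothetical: segment length (7, 14, or 28) */
};

/* Hypothetical helper: put a blockSeg into a known starting state. */
void initBlockSeg(struct blockSeg *bs, FILE *colFile, int segLen)
{
    assert(bs != NULL);
    bs->colFile = colFile;
    bs->curBlock = NULL;
    bs->blockLen = 0;
    bs->segLen = segLen;
}
```

Bundling this state into one struct is what lets a function like compressBlock take a single pointer argument no matter which algorithm runs underneath.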