Thursday, August 18, 2016

Thursday, August 18

Pardon the silence. I've been driving back to Seattle this week so have been bad with communication. This week I've been polishing all of my code for the final submission deadline which is approaching and now everything is done! All of my final code, in addition to some sample bitmap files and query files, can be found at https://github.com/aingerson/GSoC_2016_Compressor. It's been a great summer and I am extremely satisfied with all of the work I have done on this. Many thanks for the support!

Wednesday, August 10, 2016

Wednesday, August 10

As I mentioned in a previous post, I'm driving to Seattle so my posts this week are going to be rather spotty, but I'm currently developing tests to compare the capabilities of VAL and WAH.

Monday, August 8, 2016

Monday, August 8

Preparing my code for the final submission. All of the code is pretty much good to go. I'm driving back to Seattle on Wednesday so I won't be able to code for the second half of the week which is fine because the coding is done. But during my driving I will be coming up with tests to run in order to evaluate the performance of the new algorithm against the old. 

Thursday, August 4, 2016

Thursday, August 4

Letting my computer recover from yesterday's adventure. I went to the Apple Store on the advice Bart gave me (thank you, Bart!) and they took it apart and could not find any trace of liquid damage, so that's very good news!!

Wednesday, August 3, 2016

Wednesday, August 3

I decided that I should design and run some tests to evaluate the performance of VAL against WAH so I was working on that today when I spilled coffee all over my computer. Needless to say today I was panicking a bit but managed to keep my cool. All of the code is backed up though and my computer seems to work perfectly fine so I'm not terribly concerned. But that was definitely the thrill of the day!

Tuesday, August 2, 2016

Tuesday, August 2

Finished commenting and organizing my code today. Haven't heard from David yet so I'll reach out again tomorrow.

Monday, August 1, 2016

Monday, August 1

I messaged David on Friday to let him know I have finished coding VAL and to ask him what he thinks the best way to test the code will be, but I haven't heard anything from him, so I spent the day commenting and cleaning my code. I'll message him again tomorrow.

Friday, July 29, 2016

Friday, July 29

Finished debugging today. Also put the final touches on the optimized compression length mode (that compresses each column using all segment lengths and then picks the one that resulted in the smallest file). Also cleaned up my code a lot. It's looking good!

Thursday, July 28, 2016

Thursday, July 28

Good day today! Started debugging my code and found the bug. Turns out my code was fine; I had just forgotten to reformat the original file into a format that supports the 64 bit compression. You see, the base length for VAL 32 is 7, so I reformat the file by chars so that I can read in one byte at a time (with an extra 0 at the beginning to signify a literal). However, to compress in VAL 64, it makes more sense to reformat the file by shorts (2 bytes) since the base length is 15. Once I reformatted the file correctly, it compressed the file perfectly in 64 bit. As a result, I changed some of the reformatter and compressor code to make sure that it selects the correctly formatted version before compressing into the desired format.

Wednesday, July 27, 2016

Wednesday, July 27

Unfortunately I had to go to urgent care for the second time this summer so I took another rest day to recover.

Tuesday, July 26, 2016

Tuesday, July 26

Feeling good after today! I finished reorganizing all of my code for simplicity and to extend it into VAL 64. I started debugging but there's more to be done the rest of the week but it's looking good!

Monday, July 25, 2016

Monday, July 25

Last week was an interview-heavy week so coding took a bit of a back seat, but now I'm back at it full force! Continued streamlining it all. Fairly close to my end goal. Just need to do some more overall reorganization and refactoring of all of the code.

Thursday, July 21, 2016

Thursday, July 21

I think I mentioned this in my last post but I have a few job interviews this week so I've been stepping back from my project to prepare and focus on those but I should be done by tomorrow morning and then I'll be jumping all the way in again!

Tuesday, July 19, 2016

Tuesday, July 19

Continued my work from yesterday but didn't work as much as I had hoped. I have a job interview on Thursday that I have been preparing for. 

Monday, July 18, 2016

Monday, July 18

Putting the final touches on the extension to 64-bit compression. I think all of the variables are set up but I want to streamline the entire program and move the old WAH compressor to use the active block struct I used for VAL. I think it should be finished by tomorrow and debugged/polished by the end of the week. 

Friday, July 15, 2016

Friday, July 15

Still continuing the extension to VAL 64. It's a bit trickier than I anticipated as it requires the addition of quite a few different global variables that I need to integrate throughout the entire compressor and query engine. Number of segments and word length are only two. The biggest change that I need to make is the reformatting segment length. For VAL 32, it was simple enough to reformat the bitmap byte by byte: the segment length was 7, so every 7 bits would be written with a leading 0. However, now that the segment length is 15, I need to reformat the file 15 bits at a time, writing each group with a leading 0 to make 16 bits. This means I need to use a short (2 bytes) as the unit when reformatting the file. A bit tedious but it should be done next week.
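A minimal sketch of the padding step described above, assuming the input is a run of '0'/'1' characters and that a short group at the end of a column is padded with 0s on the right. The name pack_unit and its signature are made up for illustration and are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to payload_bits characters ('0' or '1') from
   text into one unit. With payload_bits == 7 the result fits in a byte, with
   payload_bits == 15 it fits in a 2-byte short. The top bit of the unit is
   never set, which gives the leading 0. Missing characters count as 0. */
uint16_t pack_unit(const char *text, size_t n, unsigned payload_bits)
{
    assert(payload_bits == 7 || payload_bits == 15);
    uint16_t unit = 0;
    for (unsigned i = 0; i < payload_bits; i++) {
        unsigned bit = (i < n && text[i] == '1') ? 1u : 0u;
        unit = (uint16_t)((unit << 1) | bit);
    }
    return unit;
}
```

Calling it with payload_bits set to 7 mirrors the VAL 32 formatting and 15 mirrors the VAL 64 formatting, so one routine could serve both cases.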

Thursday, July 14, 2016

Thursday, July 14

I've just been working on extending my code to support 64 bit compression. It's mainly a matter of changing variables throughout my code. For example, instead of 7/14/28 bit segments it will use 15/30/60 bit segments. I just need to make sure I make these changes everywhere in the code and then it should be good to go!

Tuesday, July 12, 2016

Tuesday, July 12

After running tests on my VAL query engine, I have decided that I am satisfied with the code I have written. Now I can move on to extending the entire system into a 64-bit compression scheme using VAL. 

Monday, July 11, 2016

Monday, July 11

I went through my code today and debugged it, finding some errors. But I think everything is running properly! I have started running some tests to make sure it is all working and am going to continue doing so tomorrow. After that, it will only be a matter of extending it into 64-bit compression and then editing the old WAH code to use the activeWord struct if I have time, which I think I will since I am running ahead of schedule!

Wednesday, July 6, 2016

Wednesday, July 6

Today I worked extra hours to make up for the next few days because I am going to the mountains with my family for a few days. I got A LOT done today and I believe I'm pretty much done with all of the logic and flow of the program. There are a bunch of kinks to work out like parameter consistencies and which variables are being passed through but that will be all debugging work for next week. For now, I think the program as a whole is fairly close to working which is very exciting!

As I mentioned, I will be out of town Thursday and Friday so am taking those days off of coding. Until Monday!

Tuesday, July 5, 2016

Tuesday, July 5

Took the day off yesterday for the holiday. Today I continued adapting my WAH compressor to utilize the activeWord struct I'm using for the VAL compressor. For now I've decided to keep the WAH compressor using the old code but eventually I would like to alter the WAH query engine to use the new struct. The appending method is proving to be harder than I was expecting it to be but it's going.

Friday, July 1, 2016

Friday, July 1

I finished debugging the decoder and have moved on to incorporating that into the query engine. This requires a bit of reorganization of my old code so it uses the active word struct instead of the raw word as in the WAH query code. It will take some finagling but shouldn't be awfully difficult. 

Thursday, June 30, 2016

Thursday, June 30

Finished writing the decode up! I also started debugging it for a while. I had a few hiccups but it seems to be working well now. I'll keep debugging it and then start putting it together with the existing ANDing and ORing code I have and then it should be pretty set!

Tuesday, June 28, 2016

Tuesday, June 28

Today I finished half of the decode up method. I finished coding the decoding of a fill word and started working on how to decode a partial fill. Tomorrow I'll finish that up, which overlaps with decoding a literal, so that will lead into the last part nicely.

Monday, June 27, 2016

Monday, June 27

Heard back from David today. It looks like there was an error in the VAL paper but my approach was correct, so I'm glad I continued on and debugged the version I have. Today, I planned out the decode up algorithm by hand and I'm going to write it into my code the rest of the week. This one is much harder than the decode down since it takes multiple segments to make a single larger segment, so there are many more combinations of fills and literals that can result from the subsegments of the pre-decoded words.

Friday, June 24, 2016

Friday, June 24

Still haven't heard from David so I've decided to go forward with my current implementation. Today I worked on thoroughly debugging my decode down method and it seems to be working well now, after a few hurdles. I also started planning out each of the four cases of the decode up and will finish implementing it next week. 

Thursday, June 23, 2016

Thursday, June 23rd

Still waiting for advice regarding the issue I posted about yesterday. There are two different ways to proceed depending on what David says, so I took the day off to make sure I'm going in the right direction. If I don't have an answer by tomorrow I will proceed how I was planning on doing it and fix it later if needed.

Wednesday, June 22, 2016

Wednesday, June 22

Alright, so I finished writing the decode down. I want to test it some more to make sure it works how I want it to though.

After I finished writing it, I wanted to reread the VAL papers to make sure my code was doing exactly what the papers explain, but I think I ran into a discrepancy. The paper illustrates a word VAL(s=60) being decoded into a segment length of 15. The VAL(s=60) word is as follows: 1000 000000000...00000 0100 0000 0000 1110. This word represents a fill of 0100 0000 0000 1110 (16,398) segments of 0s. The decoding example shows its reduction into a VAL(s=15) word as follows: 1100 011111111111111 00000000001111... which represents a fill of 16,383 segments of 0s followed by a fill of 15 segments of 0s. While this adds up to 16,398 segments of 0s like the original word before decoding, the segment length has changed from 60 to 15. This means that while the original VAL(s=60) word represents 60*16,398 0s (983,880), the VAL(s=15) segments represent 15*16,383 (245,745) and 15*15 (225) 0s, or 245,970 in total. So if we assume the segment length changes from 60 to 15 in the decoding process, the decoded word in the example does not represent the same number of 0s that the original word represents. I wrote to David to clarify the issue, so hopefully I will have an answer to this tomorrow!
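The arithmetic above can be checked with a small sketch. It assumes that a fill's count is a number of segments, so the bits it covers are the count times the segment length, and that decoding down scales the count by the ratio of the two lengths. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits covered by a fill of `runs` segments of length seg_len. */
uint64_t fill_bits(uint64_t runs, uint64_t seg_len)
{
    return runs * seg_len;
}

/* Segment count that covers the same bits after decoding a fill down from
   from_len to to_len. Assumes to_len divides from_len evenly. */
uint64_t decode_down_runs(uint64_t runs, uint64_t from_len, uint64_t to_len)
{
    assert(to_len > 0 && from_len % to_len == 0);
    return runs * (from_len / to_len);
}
```

Under these assumptions decode_down_runs(16398, 60, 15) gives 65,592 segments, which covers the same 983,880 bits as the original word. That count is larger than the 16,383 used for a single fill in the paper's example, so it would presumably have to be spread over several fill segments.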

Tuesday, June 21, 2016

Tuesday, June 21

Today I continued work on the decoding. I am almost done with writing the decoding down. When I'm done with that I will work on the more difficult decoding up. A big part of today was writing various different utilities involving the activeWord struct to hold the decoded words, including a loop that I'm rather proud of:

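void updateActiveWord(activeWord *toUpdate, word_32 newWord){
     int i;
     for(i=0;i<toUpdate->numSegs;i++){
          //flag bit of segment i: shift it up to the top bit, then back down to bit 0
          toUpdate->flag[i]=(newWord<<(4-(toUpdate->numSegs)+i))>>((toUpdate->length*toUpdate->numSegs)+3);
          //payload of segment i: shift out the 4 header bits and earlier segments, then right-align
          toUpdate->seg[i]=(newWord<<(4+(toUpdate->length*i)))>>(WORD_LENGTH-toUpdate->length);
     }
     //start reading again from the first segment
     toUpdate->currSeg=0;
}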

It's not actually that complicated; the main reason I'm proud of this is because all of that looped bit work worked the first time through so I did not have to debug! What a day!
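To see what the loop pulls out of a word, here is a self-contained sketch that wraps the same function with a struct definition. The struct layout, the array sizes, and the uint32_t typedef are assumptions inferred from how the function uses them; the real definitions in the project may differ.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_LENGTH 32
typedef uint32_t word_32;        /* assumed: unsigned 32-bit word */

/* Assumed layout, inferred from the fields the function touches. */
typedef struct {
    word_32 flag[4];             /* one flag bit per segment */
    word_32 seg[4];              /* right-aligned segment payloads */
    int numSegs;                 /* segments per word: 1, 2, or 4 */
    int length;                  /* bits per segment: 28, 14, or 7 */
    int currSeg;                 /* index of the next segment to read */
} activeWord;

void updateActiveWord(activeWord *toUpdate, word_32 newWord)
{
    int i;
    assert(toUpdate->length * toUpdate->numSegs == 28);
    for (i = 0; i < toUpdate->numSegs; i++) {
        /* flag bit of segment i, moved to the top bit and then to bit 0 */
        toUpdate->flag[i] = (newWord << (4 - toUpdate->numSegs + i))
                            >> ((toUpdate->length * toUpdate->numSegs) + 3);
        /* payload of segment i, right-aligned */
        toUpdate->seg[i] = (newWord << (4 + (toUpdate->length * i)))
                           >> (WORD_LENGTH - toUpdate->length);
    }
    toUpdate->currSeg = 0;
}
```

For a word holding two 14-bit segments, the two flags come from the last two of the four header bits, and the payloads come from the upper and lower halves of the remaining 28 bits.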

Monday, June 20, 2016

Monday, June 20

Nothing much to report today. I started working on the decoding. That said, I can't even begin to describe how excited I am about my rewriting of the query engine. Not only is it much more organized and cleaner than before but it's working so much better. All of the work was worth it!

Friday, June 17, 2016

Friday, June 17

Just putting the final touches on my complete revamp of my project. It sure was a headache but I'm excited to write the decoding methods for the VAL querying using my streamlined edits!

Thursday, June 16, 2016

Thursday, June 16

I have debugged the existing issue in the WAH query engine and it works fine (very silly mistake on my part, not initializing some of the utility package).
Today I did a MAJOR reorganization of the entire query engine. Though it currently feels like I am essentially rewriting my entire WAH query engine, I have decided it is very much worth the time because it will allow me to implement the VAL engine very easily (as well as any other algorithms). My old implementation was a pretty naive one, solely focused on making WAH work. Now I am adding a layer of abstraction using the activeWord struct. The main purpose is that I can use the activeWord struct as a parameter for my existing utilities for ORing and ANDing segments (there are 6 to be precise: fillORfill, fillORlit, litORlit, fillANDfill, fillANDlit, and litANDlit). They currently take in a word_32 (an entire compressed word). However, if I change them to work on activeWord structs instead, I can use the same methods for VAL. The WAH implementation should not change much as each activeWord will be essentially equivalent to one WAH word. The change is needed because each word_32 read in from the compressed files can hold 1, 2, or 4 segments, depending on the segment length (28, 14, or 7 bits), so this struct will allow me to separate the segments as needed and store the individual parts in the activeWord, which can then be ORed or ANDed.
Once I rewrite everything in the format that I'm aiming for, all I have to do is write the decodeNext method which calls the decodeUp and decodeDown methods that coordinate the flow of segments from the compressed files through the activeWord struct.
It feels slow rewriting this and often involves breaking a lot of what I have but I know this is going to make everything A LOT easier in the future.
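A rough sketch of what ORing and ANDing one aligned segment position means, whichever mix of fill and literal is on each side. This only shows the semantics: it expands fills into full bit patterns, which a real fillORfill or fillANDlit would avoid. The seg_view type and the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SEG_LITERAL, SEG_FILL } seg_kind;

/* Hypothetical view of one segment position: for a literal, bits holds the
   payload; for a fill, bits holds the fill bit (0 or 1). */
typedef struct {
    seg_kind kind;
    uint32_t bits;
} seg_view;

/* Expand a segment view into its len-bit pattern. */
static uint32_t expand(seg_view s, unsigned len)
{
    assert(len >= 1 && len <= 28);
    uint32_t mask = (1u << len) - 1u;
    if (s.kind == SEG_FILL) {
        return s.bits ? mask : 0u;
    }
    return s.bits & mask;
}

uint32_t or_one_segment(seg_view a, seg_view b, unsigned len)
{
    return expand(a, len) | expand(b, len);
}

uint32_t and_one_segment(seg_view a, seg_view b, unsigned len)
{
    return expand(a, len) & expand(b, len);
}
```

The six named utilities correspond to the possible pairings of kinds here; the point of keeping them separate would be that a fill paired with a fill can be resolved for many segments at once without touching individual bits.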

Wednesday, June 15, 2016

Wednesday, June 15

I think I have isolated this bug. I believe it has to do with how I am allocating and freeing up my results data space as the query engine does fine for the first query but seems to be failing on the second one. I think tomorrow I should be able to move on.

Tuesday, June 14, 2016

Tuesday, June 14

I began structuring my VAL query engine and realized that to make this much more usable and extendable in the future, I would need to reorganize the path of my existing program. The way I wrote it before was solely for a WAH compressor and WAH query engine. Now that I'm extending its usage to VAL, it would be very messy to add in VAL because I wrote all of the query utilities in one file, even though half of it is used for both WAH and VAL. For this reason, I spent 7 hours today reorganizing and rewriting my existing code to separate the strictly WAH functionality. Now, all queries pass through Query.c which performs tasks that both algorithms need and then passes them off to the appropriate engine as needed. It was a headache and a half and I managed to create a slight bug in the existing WAH implementation, but I think it is all for the better and is much more streamlined as well. Tomorrow I will fix the bug I left today and then mirror my rewritten implementation for VAL.

Monday, June 13, 2016

Monday, June 13

I spent the day reading papers on VAL and planning on how to tackle the querying. VAL querying is very tricky and there are two different ways to go about it, both of which I will be implementing. The first method is to "decode down" which means that if we query two columns of different segment lengths, we essentially translate the column using the longer segment length down into words of the smaller segment lengths. The alternate method is to "decode up" which means doing the opposite (translating the word with a smaller length into an equivalent using a longer length). I want to make sure that I set up my query engine correctly. I brainstormed different tools to facilitate my implementation but what I have decided to do is add a new struct activeWord which contains the aligned information ready for ANDing and ORing between columns of different segment lengths. Essentially, my code will read the next word in the compressed file, translate them into activeWord structs (decoding the appropriate column in the user-selected method of decoding), and then query the activeWord structs together (assuming that they are now correctly aligned).
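As a small illustration of "decode down" for the literal case, the sketch below splits one longer literal segment into equal shorter ones, most significant part first. It assumes the shorter length divides the longer one evenly (as 7 and 14 divide 28); the function name and output layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a literal of from_len bits into from_len / to_len
   literals of to_len bits each, written to out in most-significant-first order.
   out must have room for from_len / to_len entries. */
void split_literal(uint32_t seg, unsigned from_len, unsigned to_len, uint32_t *out)
{
    assert(to_len > 0 && to_len < 32 && from_len <= 32);
    assert(from_len % to_len == 0);
    unsigned parts = from_len / to_len;
    uint32_t mask = (1u << to_len) - 1u;
    for (unsigned i = 0; i < parts; i++) {
        unsigned shift = from_len - to_len * (i + 1);
        out[i] = (seg >> shift) & mask;
    }
}
```

"Decode up" for literals would run the other way, gluing several short segments back into one long one, which is only possible as a literal when the pieces line up on a long-segment boundary.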

I need to sleep on all of this to make sure this is how I want to set up my query engine but I'm confident this is the way I will proceed with tomorrow.

Friday, June 10, 2016

Friday, June 10

Very productive day today. I finished debugging my compressors (there were a few bugs, including one pretty hefty one that I managed to pound out) and implemented the optimal compression. It compresses the bitmap using a segment length of 7, 14, and 28 and picks the length that resulted in the smallest file size. It does this by keeping the smallest result so far in a temporary file (same path, but under a column number that doesn't exist) and then simply renaming that file to the appropriate name. It all works perfectly on all of the small files but complains at various points on the larger files, I think because it runs out of memory, so I may have a memory leak somewhere that I will look into, but I'm not terribly concerned. I will be moving on to querying next week!
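The selection step can be sketched as below, assuming the compressed size for each candidate length has already been measured. The function name is hypothetical, and ties go to the first candidate listed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given n candidate segment lengths and the compressed
   size each one produced, return the length with the smallest size. */
unsigned pick_best_length(const unsigned *lengths, const size_t *sizes, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (sizes[i] < sizes[best]) {
            best = i;
        }
    }
    return lengths[best];
}
```

With lengths {7, 14, 28} and sizes {120, 96, 96} this returns 14, since the strict comparison keeps the earlier of two equal sizes.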

I also briefly talked to David to check in. Not very eventful but I did confirm with him that after implementing the query engine, I should extend the implementation I have to support VAL64, not just my current VAL32 implementation. That shouldn't be terribly difficult as all of my current utilities are very adaptable depending on segment length. It's more just a matter of variable and type declarations throughout the program.

Thursday, June 9, 2016

Thursday, June 9

I actually decided to take the day off today. I had to go to the hospital on Tuesday and wasn't feeling so great today so took a day off.

Wednesday, June 8, 2016

Wednesday, June 8

I ran tests today to make sure my VAL compressor engine is working correctly. The main issue I wanted to address was regarding the extra padding of 0s at the end of each column which I scoured over trying to figure out why it was wrong until I remembered that my computer is little-endian so I was just reading it backwards and it was, in fact, correct all along.
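A short sketch of why a word can look backwards in a hex dump. On a little-endian machine the least significant byte of a 32-bit word is stored first, so the bytes on disk appear in the reverse of the order the value is usually written. Both helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Write the bytes of w into out in reading order (most significant first),
   independent of how the host stores the word in memory. */
void word_to_reading_order(uint32_t w, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)w;
}
```

Comparing a dump against word_to_reading_order output rather than against raw memory would sidestep the confusion, since the shifts work on the value and not on its storage layout.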

While running all the tests I found a bug that caused the first word in every column but the first to be incorrect because I was not clearing the memory left over from the previous column, so I made sure to set that to 0 at the beginning of compressing each column.

Other than that, it looks like it is working properly! The rest of this week I will be focusing on implementing the optimal length compressor, which compresses each column using a segment length of 7, 14, and 28, then saves the smallest compressed version of each column and uses that one for querying.

Tuesday, June 7, 2016

Tuesday, June 7

Sorry for not checking in yesterday. Time got the best of me.

Big day today! It looks like my VAL compressor is working! I've only been running it with a segment length of 7 to be consistent but it should be working with 14 and 28 bit segments as well. There's one little issue with the padding on the last word that I want to look into but otherwise it seems to be functioning very well! The only thing left to do with it really is to implement the optimal segment length compressor. That basically involves running the 7, 14, and 28-bit compressors and then selecting the smallest compressed column. Then, the entire compressed bitmap will have compressed columns with different segment lengths. It will be interesting to see the distribution of segment lengths within each bitmap and find out which length is chosen more often.

The rest of the week I want to dedicate not only to finishing up these last few things but also to running tests with the compressor to make sure it is working EXACTLY like it should. The last thing I want is for a random bit to make its way somewhere in the compressed columns...

Friday, June 3, 2016

Friday, June 3

Still working on implementing VAL. I had to make what felt like a significant decision regarding file formatting and variable declarations. You see, the original bitmap formatter I wrote last summer rewrites text files as binary files with appropriate padding at the beginning of words. WAH is easy to reformat as you just add an extra 0 every 31 or 63 bits. However, with VAL, the segments can vary in length so I thought a lot about what the best way to format these files is. I eventually decided to pad the segments every 7 bits since all of the lengths are divisible by 7. This also allows the compressor to read the file byte by byte and recreate the 14 and 28 bit segments if necessary by reading two or four bytes and then building a 14 or 28 bit segment respectively.
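A minimal sketch of rebuilding a longer segment from the byte-padded file, assuming each byte holds 7 payload bits under a leading 0 and that earlier bytes are more significant. At 7 bits per byte, a 14-bit segment takes two bytes and a 28-bit segment takes four. The name build_segment is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine count padded bytes into one segment of
   7 * count bits. The leading padding bit of each byte is masked off. */
uint32_t build_segment(const uint8_t *bytes, size_t count)
{
    assert(count == 1 || count == 2 || count == 4);
    uint32_t seg = 0;
    for (size_t i = 0; i < count; i++) {
        seg = (seg << 7) | (uint32_t)(bytes[i] & 0x7F);
    }
    return seg;
}
```

Because the padding is stripped on the way in, the same formatted file can feed any of the three segment lengths without being rewritten.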

This also raised an issue regarding the type definition of a word. WAH 32 and WAH 64 both read and  write 32 bit words. However, VAL reads by 8 bits but writes to 32 bits. The WAH engine I wrote uses the same type for both reading and writing so I had to add another type definition for reading and writing which would be the same for both WAH compressors but different for the VAL compressor. 

Thursday, June 2, 2016

Thursday, June 2

Nothing terribly exciting to report today. Still working on the core of the compressBlock(struct blockSeg *) method. I've decided to focus on just compress using one segment length first. Then once I have that working properly, I will add in the optimal segment length compression that compresses the column using 7, 14, and 28 bit segments and then chooses the length that resulted in the compressed column of the smallest size.

Wednesday, June 1, 2016

Wednesday, June 1

I spent today reading through my WAH code. I want to mimic the tools and format of that as much as possible so I need to have a good understanding of how it maps out. Specifically, I use a struct I call a "blockSeg" that holds all of the information needed to begin to compress a column, like the column file pointer, the current block being compressed, etc. I wanted to read through how I used the blockSeg struct in the WAH code so I could easily mirror how it is handled but with the VAL algorithm underneath instead of WAH.

Tuesday, May 31, 2016

Tuesday, May 31

Today I finished organizing and setting up my old environment to support variable lengths. Specifically, I decided to separate all segment manipulation methods into one package that builds and checks segments of any length given as a variable parameter. I also noticed a few places for optimization in my old code. I realized that this segment building was being called enough over the course of compression that it would be better to run every segment utility once before compression and then access each as needed. I wrote similar code in my new VAL compressor that would load all segment utility in each necessary compression length (7, 14, and 28) accessible at any point during compression. With all of this set up, I can officially begin coding the actual compression algorithm!
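One way to picture the "compute once, look up later" idea is a small table of per-length constants filled in before compression starts. The struct, its fields, and both function names are assumptions made for this sketch, not the contents of the actual segment utility package.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LENGTHS 3

/* Hypothetical per-length constants, computed once up front. */
typedef struct {
    unsigned length;     /* segment length in bits */
    uint32_t all_ones;   /* a segment of this length with every bit set */
} seg_util;

/* Fill the table for the three VAL 32 segment lengths. */
void init_seg_utils(seg_util utils[NUM_LENGTHS])
{
    static const unsigned lengths[NUM_LENGTHS] = {7, 14, 28};
    for (int i = 0; i < NUM_LENGTHS; i++) {
        utils[i].length = lengths[i];
        utils[i].all_ones = (1u << lengths[i]) - 1u;
    }
}

/* Returns 0 for a segment of all 0s, 1 for all 1s, and -1 for a literal. */
int classify_segment(const seg_util *u, uint32_t seg)
{
    assert((seg & ~u->all_ones) == 0);
    if (seg == 0) {
        return 0;
    }
    if (seg == u->all_ones) {
        return 1;
    }
    return -1;
}
```

The compressor would call init_seg_utils once and then pass the matching entry to checks like classify_segment, so no mask is rebuilt inside the compression loop.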

Monday, May 30, 2016

Monday, May 30

I spent a lot of time today reading through my old code to continue integrating the new compression format into the old one. As a result, I decided to separate a lot of the old code into subsections for easier management. For example, I created a new file called SegUtil that consists of a variety of segment construction and checking methods to assist in compression. Before, these utilities were integrated into the WAH Compressor. However, I separated them so that the VAL Compressor can use their functionality in a similar way. The problem is that while WAH maintains a consistent segment length of 32/64, VAL varies between 7, 14, and 28 (or all 3) in one run. Because of this, I separated these functions and am focusing on rewriting all of these utilities to be able to run on a varied word length per run.

Saturday, May 28, 2016

Saturday, May 28

Today I spent the day refactoring my code from last summer to support usage for VAL. I added all of the appropriate variables, outlined the VAL compressor, and set up the compressor to send the bitmap to the appropriate compressor (the WAH compressors that I already have or the VAL compressor that I outlined today).

Friday, May 27, 2016

Friday, May 27

Once again, sorry for the lack of communication. I just got home safely after a week of long driving days so I haven't gotten much time for coding or computer time.  I've been organizing the structure of my code on paper and have been outlining the variables and their usage when possible.

I'm planning on taking this entire weekend to apply all of the non-code work I have been doing on the road to my actual code base and catching up on the coding time I lost this week, so I will be checking in both tomorrow and the next day.

Wednesday, May 25, 2016

Wednesday, May 25

My apologies for not blogging yesterday. I am currently moving halfway across the country so my access to internet is limited. Yesterday and today I was able to read through one of the VAL papers I have to get acquainted with the decodeUp and decodeDown methods to query columns of different lengths. I am not able to use my computer to begin coding these methods but reading up on them and planning how to code them has been helpful preparation for coding them.

I won't be arriving until Friday evening so I will likely not be blogging again tomorrow but I will be sure to check in Friday evening again. I also plan on coding Saturday and Sunday to get my coding in for the week and be on track for Monday.

Until then!

Tuesday, May 24, 2016

Monday, May 23

Met with David today to discuss the extra control parameters necessary for my VAL implementation.

We've decided to include 4 modes for compression: constant 7bit segment length, 14bit, 28bit, or optimal smallest total column compression size.
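The four modes could be represented along these lines. The enum and function names are hypothetical, and returning 0 for the optimal mode (meaning "try every length") is a convention chosen just for this sketch.

```c
#include <assert.h>

/* Hypothetical names for the four compression modes. */
typedef enum {
    VAL_MODE_7,
    VAL_MODE_14,
    VAL_MODE_28,
    VAL_MODE_OPTIMAL
} val_mode;

/* Fixed segment length for a mode, or 0 when the mode means
   "compress with every length and keep the smallest result". */
unsigned mode_segment_length(val_mode mode)
{
    switch (mode) {
    case VAL_MODE_7:       return 7;
    case VAL_MODE_14:      return 14;
    case VAL_MODE_28:      return 28;
    case VAL_MODE_OPTIMAL: return 0;
    }
    assert(!"unknown compression mode");
    return 0;
}
```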

I have started setting up my code base to support the VAL implementation for this summer. 

Friday, May 20, 2016

Thursday, May 19

This weekend I graduated and today I just said goodbye to the last of my family that came to the event. This marks the shifting of gears for me. The GSoC coding start date has not officially begun so I have not started coding but I am doing my reading up on VAL to prepare for the endeavor. My days are filled with meeting with David to make sure we're on the same page and doing my research so I can hit the ground running when coding actually begins. That and packing up all of my belongings...