Action Figure Daniel

Category: Work

An Analysis for Transition from Spreadsheets to Fusion Tables, part 1

This is part of a multipart post taken from my essay entitled “An Analysis for Transition from Spreadsheets to Fusion Tables”. It may sound boring, but a questioning reader may find it rather useful to understand the value of Google Sheets and Google Fusion Tables in the workplace and readers may also find additional ideas for improving their usage of either product.

Beginning the Ecosystem

Back when I first started working for my current employer, they were using an Access database. I say “they” because I did not use it any more than I had to. Instead, I instantly started adapting their systems to something I knew would handle the work better: Google Sheets.

I knew some of the issues the old systems had before even starting work. Being traditional software, the software and the data were bound to the hardware. Employees had to be at work in order to do work. This meant frequent calls to the office while on the road, daring “read-only” copies of the database for extend trips (and corresponding lists of changes that needed to be made) and messing remote control setups when phone calls were not enough.

Even on-site, there were still issues. Certain changes and reports could only be made while everyone was out of the system. Many days there was heavy coordination between users to determine who should access when and for how long and to make what changes.

I am no expert on Access change tracking (or Access in general) but there was no revision history. The database was always live and living. Catastrophic fails could only be recovered from the last back up. For small projects, this is an acceptable risk but considering that the entire company was centered around that single data store made the current revision controls difficult.

Finally, reporting was dismal. This is likely more because I was inexperienced with the tools, but even considering that, I knew that there were easier ways to liberate the data and combine it in new and interesting ways.

My first project was to create what would be dubbed “The Ecosystem”. It was so named because it would be a collection of spreadsheets, forms, documents, scripts and drawings that were all designed to flow and interact together. While the old database had provided a mostly unified data front, it was difficult to get elegant reports (again, more likely an issue with my skill than the actual software) and, perhaps more importantly, it was difficult to get the database to talk to anything beyond itself, let alone beyond our network.

Google Drive

The first leg of the project was actually not the data, it was the files. We had an onsite file server which was used to store all client documents. PDF files galore! There were folders for adding units, folders for removing units, folders for thinking about changing units, folders for actually doing it and more. Inside each folder were the pertinent files for the work done. Accuracy of records is rarely an issue because we tend to keep a record of everything.

The transition was smooth and easy. One day everything was on the server, the next day it started moving off. Files that were in transit to Google Drive were simultaneously moved to a new location on the server so that everyone knew that the files were in transit and not to make changes to the original documents. Using Google Drive, files were downloaded as soon as they had been uploaded so shortly everyone had access to the files.

There are a few added bonuses to using Google Drive for file storage. This method provides a form of distributed backup in that the software synchronizes a file across all of the computers to which the file has been shared. Previously, all data was hosted on the server which meant that a server crash would be detrimental to our operations. Now a server loss would be unfortunate but not detrimental. In fact, loss of any one computer would not affect operations beyond the function of the assigned user. This distributed scenario also works well for traveling users who can now access all but the most recent documents offline.

On the negative side, we have noticed that mass file changes can take an exorbitant amount of time to synchronize and has a high failure rate.

Company Data

The key component of the Ecosystem is a massive spreadsheet called “Company data”. This spreadsheet replaced the original database. While we lost some flexibility by converting a database into a spreadsheet, we gained far more power than we lost.

Access: Anytime, Anywhere

Putting Company data online instantly evolved our access options. Users can now access our data on their phones, tablets and computers while at home, across the state and, of course, at the office. No matter where we are, we can access the system. Google Drive apps provide adequate offline capability so that even if a device does not have an active internet connection, users can still see the data and edit it when connection is reestablished.

(While it would be superior to be able to edit the data while offline and sync when connected again, I can understand the immense technical issues with such a feat—not the least of which is figuring out conflict notification and resolutions—and we are quite content with the portability of data we currently enjoy; a vast improvement over our previous options.)

Most of the data transitioned very easily from the database into the spreadsheet. A separate sheet was created within Company data for each major table (i.e. “Truck data”, “Trailer data”, “Driver data”, etc.). While in some cases this combined multiple tables into a single sheet, we found that it generally worked with efficiency while reducing overlap and duplication of information.

Separate spreadsheets could have been created to handle the different tables and data types but we found it more efficient (for formulas, reporting and users) to put everything “under the same roof.” It is easier to say, “Everything that used to be in the old database can now be found here.”

Ad Hoc Modification

An unexpected benefit of Company data being a spreadsheet is that it became easy to modify the data structure. We began rearranging the column based on their logical groupings (i.e. keeping specialized columns together), moving columns most frequently referenced to the front and removing superfluous columns entirely.

While this was possible with the old database, such changes could not be done during live operations while everyone was in the database. The ease of change the structure in the spreadsheet means that changes are often instigated more by active users than the data keepers.

The changes could also be done by anyone, which is not necessarily a good thing. It is possible for a disgruntled or absent minded employee to easily rearrange the spreadsheet. We estimated the risk of such happening to be minimal as the changes are likely be noticed quickly and are easily recovered.

Formulas

One of my greatest qualms with a database is the incredibly static nature of the data or rather what can be done with it. Data in a spreadsheet is also inherently static but, being a spreadsheet, can easily be brought to life with formulas.

It is better to not think of formulas in the boring algebraic sense but in the more dynamic programmatic sense. A simple example is an early column that we added to Company data to indicate whether a client had registered for our online services or not. The first formula we had was an example of simple algebra: “=if(isblank(onlineKey), “No”, “Yes”)”, this simply puts a “Yes” or “No” in the column indicating if the client has an online key (an indication of online services registration). This formula was eventually replaced with a more programmatic one: “=if(isblank(onlineKey), “Not available”, hyperlink(‘http://spreadsheetURL’ & onlineKey, ‘Open’))”, which would provides users with a link to the client’s online services menu—a much more helpful feature in troubleshooting than simply saying, “Yes, the client is registered but you have to figure out how to get into their account before you can walk them through their problem.”

The ability to include not just basic information but more advanced features (such as the hyperlink) based on certain information allowed for an already valuable tool to become more useful and allowed for quick integration with new features and components of the Ecosystem as they came online. It should also be noted that this rapid adaptation via formulation was only possible when used in conjunction with the ability to simply and quickly alter the columns of the tables.

Collaboration

In the old database each user worked in a high degree of isolation. While we could all be social offline, there was no sociality online. Previous to my experiences, I would likely have scoffed at someone suggesting the need to be socially engaged online with people who sit within a few feet of your desk. I have learned, however, that it is more comfortable to work collaboratively online with people when there is a social presence online too.

Our Company data spreadsheet provides two tools that make online collaboration easy and natural. The first (and less useful of the two) in an in-sheet chat feature. With a click and a few keystrokes we can send a message to everyone looking at the spreadsheet. This is useful for moments when we are about to rearrange a sheet or, more frequently, resort the data (while the data will remain, its row will likely change, so it is a nice courtesy to inform other users before doing it).

The second tool (and, by far, the more useful of the two) is the cell highlight. It is an automatic feature that puts a brightly colored border around whatever cell each of the other users has clicked on. The highlight is a unique color for each user (though not each instance), moves in near real-time and includes a cell shading feature when the user is actually editing the cell (useful for knowing when something is about to update). While this presence feature may seem almost trivial, I have seen it profoundly affect our online workflow: reducing the online mystery, streamlining data input and work assistance.

Online collaboration is fraught with many pitfalls, one of which is overlapping work and conflict resolution. It is frustrating to be working through some data updates only to find out that someone else has been updating the same data, either right before or after you. In either case, time and effort were wasted. The cell highlight lets the users know where each other are working and, by inference, what they are working one. Duplication of work almost never happens with real-time collaboration tools.

This reduction of mystery also allows us to streamline data input. For example, if a user is adding a new row of data, another user can jump in and copy down any formatting and formulas to complete the input faster. The first user can focus on inputting the information (which only they have) while the second user can focus on the mundane (which they can access). This allows for more efficient work on a project when previously we would have been forced to separate the work for fear of overlap.

Beyond streamlining there are many times when working on a project with other users across the office where someone will be inquiring about a piece of data. In these cases, being able to see their highlight makes it easier to help quickly understand the context of their inquiry and to guide them (through one’s own highlight) to the answers they are looking for. This is especially true when they are on the phone with a client and we can casually identify the information they are looking at, anticipate what they will probably need and show them the relevant information.

Colors

While visual appeal should by no means be a “deal breaker,” it is certainly nice to be able to manipulate the presentation of Company data to look nice. Beyond just looking nice, we are able to adapt the spreadsheet to include the color schemes used in our branding. This helps to maintain a consistent image internally and thus externally. In general, we have matched our colors consistently across the work flow process so that banners printed on forms (and sometimes the color of the form itself) and shown in ancillary spreadsheets coordinate with the banner atop the appropriate Company data sheet.

In addition to making the spreadsheets look prettier, we also use their conditional formatting to quickly communicate important information through color coding. For example, cells with missing, non-vital information is flagged in a subtle yellow while missing vital information is flagged in a dull red (not too distracting but very apparent). Ratings and status updates also benefit from color coding; good is green, bad is red. This color coding helps users to quickly identify the most critical information, or lack thereof, in a data set.

Reporting

With formulas that incorporate programmatic formulas, we found a greatly reduced need for many basic reporting functions. Who wants to wait for a report to run when they can instantly sort, filter and read information they are looking for in the sheet already open on the screen?

Advanced reporting, however, has blossomed in the Ecosystem. This is for two reasons: the first is a semi-dedicated report designer (being myself) and a robust information infrastructure designed to be tapped for data.

Naming

This reporting friendly design came in the transition process from the old database to the new spreadsheet and was comprised mostly of consistent naming. For example, each of the sheets are labeled with a single plural descriptor and the word “data”. Additionally, each of the column names from the old database were tuned for optimal communication of the contents of the columns. Sometimes the column names are a bit lengthy, but that is preferable to not knowing what information can be found in the column.

While naming may seem something of a trivial nature, there can be a vast difference between building a report with highly cryptic placeholders that leave little indication for the casual user as to the data that should be expected and a robust naming scheme that removes most doubt about the expected data.

In a similar vein, placeholder tags in template reports are also verbosely named. For example, all client data related placeholders are flagged with a preceding “co” and our company data related placeholders (e.g. ratings generated using proprietary methods) are flagged with a preceding “tt”. While not as friendly as reading a completed report, this method still allows easy reading.

Styles

We identified two possible venues for running the reports and we use both depending on the usage case.

The first method generates the report entirely using internal process. This is best used for shorter reports that do not need to be visually sophisticated but where functionality is preferred. Most frequently, these are reports that we email out (either for employee usage or directly to clients). While such reports lack visual impact because we strip them of almost all visualizations, the bland environment actually improves readability because there are fewer distractions. In this case, blandness works very favorably for us: simple visual effects, such as font size changes, bolding or italicizing, can be used to quickly guide the reader to the most critical information.

The rule of simple visual presentation is particularly important considering that this style is most frequently used for emailed reports in which advanced visuals garner little respect and often discarded entirely leaving the recipient with a horrid jumbled mess of code and words. Keeping the presentation simple with infrequent accents reduces the chance for an unreadable message while keeping the message clean and readable.

Templates

The second method is almost the complete opposite of the first. While the data arranged and concatenated internally, the report layout is prebuilt in a documented template. This allows for the templates to be highly visually sophisticated, though in many cases less functional. Such a tradeoff is acceptable because generally these reports are delivered through print or specialized online methods and are intended to be more attention grabbing than day-to-day communications.

While we still strive to keep the layout clean and simple by reducing visual clutter and mitigating unnecessary distraction (i.e. not bolding and italicizing without good cause) we do allow for added accents to make it look better. For example, headers vary in color (to coordinate with the Company data color scheme) to differentiate the sections and a standardize company footer may be included.

Perhaps most importantly with the templates is that they are shared documents which means that they can be edited by anyone. That is, it does not take a feat of programming to alter the presentation of the reports. This was an important feature to prevent “personnel lock-in” while also greatly simplifying the process of programming the reports. Even simple visual alteration can be difficult to program, especially when compared with the ease of changing the look in even a basic text editor. Using templates allows for more time to be spent programming functionality instead of troubleshooting design through code.

Custom Menu Options

Using Google Apps Scripts, we were able to add a custom menu to the Company data spreadsheet (as well as some of our other ancillary sheets). These custom menus allow for the execution of scripts and thus expose the scripts to our users in an easily accessible manner.

One of the greatest learning experiences in building these menus options was building in protection against accidental execution. The first round of scripting merely required activation to run. This quickly led to problem of accidentally activating the wrong script. We added confirmation boxes to almost all scripts that describes what the script is about to do and then requests the user to confirm that they wish for the script to run. This generally adds a few seconds to the script’s running time but greatly reduces the number embarrassing, confusing and frequently redundant emails sent to our clients and users.

Connections

While Company data hosts most of our data, some of it feeds in from other spreadsheets. Google Sheets allows us to import this data nearly live by using a function called importRange. A good example of the usefulness of this function is our proprietary Ratings data. The source data and formulas are too complicated to want to include them in Company data, so they were set up in their own sheet. All the data in this spreadsheet is fed to a Ratings summary page which is then fed to the Company data sheet. While the data is not “live” it is very “fresh” (generally less than a minute old) and updates automatically. Breaking the two data sets apart while still being able to cross reference them made management of the data much easier.

Another example is one in which we “import” account numbers to a shared spreadsheet that tracks our progress, by company, through our work flow. Importing the data saves us the time of flipping between the tracking sheet and the Company data. Instead, all of the information we need is presented in a single view.

A drawback to using importRange is that in sharing the sheet for access, we have to enable full access to the reference sheet and the reference URL is exposed. This means that we cannot “import” data to show clients on spreadsheets that we share with them. To do so would expose other client data and represent a significant data breach. Instead, online reports generated for our clients have to have their data “pushed” to them instead of simply pulling it; this is a much more difficult process that is also less reliable.

17 June 2013
Canary Colours

Since its first release version, Canary Coloring has used the default Adobe color picker:

To the programmer, this color arrangement is very logical and even beautiful: each row and column represents a single step through a six step progression of the three segments of the hexadecimal color system. (Just writing that statement conjures images of precessionally perfect higher orders of programmer paradise.) This is evidenced by the clustering of colors into panels of six by six. The progression of the 36 colors in each panel do not just look nice, they are also representative of three complex processes going on: base color changes, blending and shading.

The first column and top row of each panel represent the shading steps of the pure colors which are shown at the bottom of the first column and end of the top row. Stepping back from the pure color to the upper left most swatch in the panel are the six shades for each respective color. The remaining columns and rows are simply blending the corresponding shades from each column and row together.

An interesting affect to persons who frequently use paint can be observed as brighter shades blend together: they approach white. This is because in light, unlike paint, white is made by combining the three pure primary colors of light: red, green and blue. In paint, the primary colors are red, yellow and blue and when combined make black.

Each panel represents a step in another six step process of changing the base color from no red, to full red. Adjusting the blending of blue and green from the base of red allows for six shades (four actual shades plus the pure color and absence of that color) to interact with of each six swatches of the other primary colors in every possible combination. This gives the color palette a total of 216 colors, which are considered the “web safe colors”.

Back in the early days of the internet (as in: the days of Netscape) those 216 colors were the only colors that a developer could reliably use across all web browsers and computers. Today we are spoiled with the fairly consistent rendering of the full range 16.78 million colors possible for display technology (not to mention the animation, music and videos we also get to enjoy).

216 colors is not accidental nor is the frequent usage of six arbitrary. Indeed, as I have frequently been taught by programmers, few numbers in the computer sciences are incidental.

Numbers in the computer world are generally stored in a hexadecimal format. For definition, we normally use a decimal number system. This is not referring to the numbers that come after a “.” but rather the number of items that increment the count to the next category. In a decimal system (also called “base ten”), there are 10 digits (zero through nine) before the next level is incremented. This means that the highest two digit number can be is equivalent to 99. In a hexadecimal system (also called “based 16”), there are 16 digits before the next level is incremented. To effectively communicate this, the letters “a” through “f” are used to substitute digits greater than nine. For example, “a” is equivalent to 10 and “f” is equivalent to 15 (15 plus a count for zero gives us sixteen). This means that the highest two digit number can be is equivalent to 256 (or “ff”, 16 times 16). 256 is a favorite number for computers because it can be easily built using 8-bits (while use of 8-bits is important, it is beyond the scope of this essay).

With the hexadecimal system, all 16.78 million colors a can easily be represent with a color code such as “ffffff” (pure white) or “0000ee” (the blue used by most web browsers to highlight a link). These same colors using a decimal system would be “000000000” and “000000238” respectively. Fewer digits in a sequences means less code, faster programs and more accurate communication, all very good things when under the tight constraints of sending data across the world.

Going back to the color picker: the color progression is not simply shading and blending but is actually a mathematical progression in hexadecimal. Following are the color codes from the first panel, notice how each row in the first column and each column in the first row increase by “33” (“33” in hexadecimal is actually 51 which, when adding zero makes 256 shades) while each of the other rows and columns is simply a combination of the first row and first column:

000000 000033 000066 000099 0000cc 0000ff
003300 003333 003366 003399 0033cc 0033ff
006600 006633 006666 006699 0066cc 0066ff
009900 009933 009966 009999 0099cc 0099ff
00cc00 00cc33 00cc66 00cc99 00cccc 00ccff
00ff00 00ff33 00ff66 00ff99 00ffcc 00ffff

While most hexadecimal color systems order the colors as red, green, blue, this system orders the colors as red, blue, green. This first block has no red (hence the “00” as the first two digits) and then progress the blue down by row (the third and fourth digits) and green by column (the final two digits). I will not go through the boring process of iterating the second (or anything other) block of numbers because they are almost identical except the first two digits are “33” and thus red is incremental in each panel with the blending and shading of blue and green remaining the same.

Unfortunately, beautiful numbers does not necessarily translate into a beautiful interface and the color picker did not do a good job of letting artists find the color and shade they were looking for (or, as my nephew said, “I need grey in my palette”.)

To the artist, this color arrangement is almost useless. For example, and addressing my nephew’s point, finding grey is a chore. There are four shades of grey plus black and white. Black and white are easy to find in the top left and bottom right corners respectively. Greys however, are a bit harder to find: the first grey from black is in the top row, middle panel, second column, second row. The second grey is in the third panel, the next column right and the next row down from the previous grey. This pattern progresses through the panels to white in the bottom row, last panel, last column, last row. This makes perfect sense when looking at the numbers, but not when looking at the colors.

Most often, when picking a color, we have a general idea of what color we want and we want to compare the shades and blends within a group of colors to get the right match. Think of paint swatches. They are clustered by base color with each swatch strip containing different shades of a given color. This system allows us to quickly compare a particular color against another particular color. Noting this, I set out to rebuild the palette to be more useful for artist (not programmers).

My first thought to present a more artistically appealing color picker was to rearrange the colors by hand and sight. I got to my third swatch before I gave up. My problem: the colors progressed three dimensionally (base color, blending and shading) making it rather daunting to rearrange 216 swatches. I stopped working on the colors after rearranging three swatches; then I realized that if math got me into this predicament the math could get me out of it.

My second thought was to transpose the data twice. What do you get when you transpose a data set twice? The same thing you started with (like rotating an image 180 degrees, twice).

My third thought was to transpose the entire data set and then transpose and mirror every other panel. This transformed the original color palette into something beautiful:

Notice how the colors now gradually (well, as gradually as can be done with a 20 percentage point increment) change between base colors by transitioning through the various shades and blends. This arrangement makes finding the right red (out of the available 30 variants of red) or green or blue much easier because the colors are generally grouped together.

(Oh, and I added the greys and basic colors at the bottom to make them even easier for my nephew.)

22 May 2013
IFTA: All in the wording…

In my new work we deal with a lot of taxes. In fairness, I do very little with the actual taxes. No, I have fun building the spreadsheets that calculate them–which I can say is rather nerve racking as the Federal Government is not very forgiving when mistakes are made–but I digress.

One of our most frequent taxes is called the International Fuel Tax Agree (or IFTA). It applies to the United States and most of Canada and was developed as a way to avoid the old method of making a truck register for fuel taxes in each state they would be traveling through. So, in general, IFTA was a really good idea all around.

One day my boss asked if I could make a diagram that explains how IFTA works so our clients could better understand it.

I said, “Sure… How does it work?” In defense of my question, I knew how it worked because I reviewed the formulas that make up the calculations that determine the taxes we report. I was really hoping for exactly what my boss gave me: an oral version of the formulas, which follow:

Total miles / Total gallons = MPG
State miles / MPG = Taxable gallons
Taxable gallons – Gallons bought in the State = Net taxable gallons
Net taxable gallons * Tax rate = Tax due

It is no wonder to me that people get confused on how the tax works. After a few minutes, it hit me: IFTA is actually very simple. Instead of thinking about the formulas we use to generate taxes amounts, I just had to think about what the tax was actually taxing (the key is in the MPG calculations). Thus, the grand conclusion is that IFTA is a calculation of fuel consumed while traveling through the state. In other words, how much fuel would you have needed to buy to operate in a given state. Sure, there is some averaging in there, but that is just to make it calculate easier.

It felt good to take something as complicated as the robust formulas and summarize it in a handful of words.

IFTA: A tax on the fuel you used in a state.

(Okay, two handfuls and a toe, but it is still much less complicated than the original formulas. A copy of the diagram is attached for amusement.)

How IFTA is Calculated. ©2012, used by permission. Details available upon request.

20 November 2012
The Ambient Exchange of Knowledge

Note: I wrote this back when I was an intern for a certain government research facility. I wrote it on the long the (2 hour) bus ride from “the site” and edited the next day on the same, long bus ride. I have included some additional thought in italics.

Working for a government contractor (basically the government) has been an interesting experience. Of course there have the normal “we can’t do that” and “we have to submit the change for approval” (it took more than 5 months to get approval to change minor wording on a post-training survey) but there has also been an interesting transition as they have been cutting staff.

The need and want to cut cost is present in every company I can think of and has the oblivious benefit of making the company more profitable. There is, however, an interesting and unfortunate side effect to these cost cutting efforts: innovation is stifled.

Time and again, there is a clear pattern of disparate entities getting together and sparking evolutionary or revolutionary changes seemingly based on their proximity alone. (Duncan Hines revolutionary invention of cake mix came when a baking guy was having lunch with a powdered soap guy and they developed a method of creating the liquid mix and then spraying it onto screen for powdering like the soap guys did; Post-it notes’ evolutionary invention came about when one inventor created the weak, reusable glue and the other found a use for it; and many more, if you do some digging.) Ideas that otherwise would have taken a long time to come about, if they ever emerged. Which then confuses me as to why you would be stripping away most of the opportunities for new, innovative ideas by saddling down employees with so much work (by cutting down the help for those tasks) that they have no chance to interact with others. It would seem that instead of cutting everything down, companies should set more people free: free to interact with and learn about other departments, free to dream, free to bear those dreams into reality, free to create new revenue streams.

But then, what do I know about “the real world,” I am just an intern who created a website in a week that took other departments several months to do.

P.S. Can I mention that because the web programmer (namely me) was working with “the site” management while hanging out with the Training department instructors and cavorting with the PR people, he was able to completely redesign their daily internal publication in a way that greatly improved the appeal and readability of the content? How? By asking why people never read the publication. No magic, just someone who could do something being around people with good ideas.

9 November 2012
root.scaleX

I recently completed an image creation project using Adobe Flash and Adobe Air. After spending many hours on the project I was exciting to find that I could get Adobe Air to work with the device’s built-in image saving mechanism (called the “CameraRoll”). That excited quickly disappeared as I found out that images saved with the CameraRoll were really lousy quality, small but lousy. After brainstorming on how to fix the issue, I discovered that if I changed the resolution of the project from 480×320 to 1440×960 (a threefold increase) that the quality issue became a moot point and the image size was still small.

Enter problem two: I adjusted the stage to the new resolution easily enough, but I had to figure out how to increase all the assets threefold. This project has a large number of assets and the thought of scaling them all up was overwhelmingly daunting. Then I realized that I could just scale up the root object and, because everything but the stage is part of root, everything that mattered would also be scaled up.

So, in an effort to share information with the world, if you find that your Adobe Flash project needs to be a higher resolution than when you first built it, and you want to avoid rescaling every element in your project, simply increase the stage size and add in some Action Script to scale up root (root.scaleX = 2; root.scaleY = 2;).

11 June 2012