Current Data Serialization Formats May Be a Waste of Money

- Programming, Business


Storing data. Transmitting data. Processing data. These fundamental topics of computer science are often overlooked nowadays, thanks to the historical exponential growth of processing power, storage capacity and bandwidth, along with a myriad of existing solutions to tackle them. So much so that we tend to assume these technologies are well suited to today's needs.

Specifically, we're going to look at the cloud computing costs of data serialization, and question whether current data serialization technologies are well suited to them. (Spoiler: They're probably not.)

The money problem

Let's consider a scenario where we want to offer a service that sends and receives data over the Internet. We would have to deal with the following expenses:

As such, we would like to minimize the total sum of these costs over the lifetime of the service. We would also like to minimize these same costs for our consumers, to give ourselves a competitive advantage.

Picking optimal data serialization formats is therefore critical to achieving this objective, because that choice affects every one of these costs.

As for implementation and maintenance, once a data serialization format becomes popular, many people will already have done the base work for it, so those costs will not be considered here.

Current technologies

Human-readable formats

CSV, XML, JSON, YAML... those are all great data serialization formats because anyone can read and modify them using a simple text editor. In terms of compactness, however, they are pretty terrible, because they are verbose by design.

Let's say, for example, that you would like to represent an object with 5 boolean properties. Writing the values alone would take several bytes each, just to spell out "True" or "False", plus delimiters between them, even though each value carries only a single bit of information. Similarly, if the property names must be included in the format, writing them consumes even more bytes.

As such, not only do these formats take up a lot of space, they also require parsing text to deserialize the data, which is not very efficient. Removing optional whitespace may help, but only up to a point.
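To make the overhead concrete, here is a small Python sketch, assuming a hypothetical object with five boolean properties named `a` through `e`. It compares an indented JSON encoding, a minified one, a values-only JSON array, and a hand-rolled bit-packed byte. The `pack_flags`/`unpack_flags` helpers are hypothetical and only work if both sides agree on the property order in advance.

```python
import json

# Hypothetical object with five boolean properties (names are illustrative).
record = {"a": True, "b": False, "c": True, "d": True, "e": False}

pretty = json.dumps(record, indent=2).encode("utf-8")                 # human-friendly layout
minified = json.dumps(record, separators=(",", ":")).encode("utf-8")  # no optional whitespace
values_only = json.dumps(list(record.values()), separators=(",", ":")).encode("utf-8")


def pack_flags(values) -> bytes:
    """Pack up to 8 booleans into one byte, bit i holding value i.
    Assumes both sides agree on the property order beforehand."""
    packed = 0
    for i, value in enumerate(values):
        if value:
            packed |= 1 << i
    return bytes([packed])


def unpack_flags(data: bytes, count: int) -> list:
    """Inverse of pack_flags: read the first `count` bits of the first byte."""
    return [bool((data[0] >> i) & 1) for i in range(count)]


packed = pack_flags(record.values())
print(len(pretty), len(minified), len(values_only), len(packed))  # prints: 69 48 28 1

# Round trips: the binary form needs the agreed order, the text form needs parsing.
assert unpack_flags(packed, len(record)) == list(record.values())
assert json.loads(minified) == record
```

Removing whitespace saves about 30% in this case, but the property names and the `true`/`false` literals remain; dropping the names helps more, and only a binary layout gets down to a single byte.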

Data compression

One quick fix for bandwidth and storage consumption is to apply data compression to the text data. However, general-purpose compression produces relatively generic results that are usually not optimal for a given data structure. Also, while it saves bandwidth and storage, it requires additional processing power, although the net result is usually worth it in terms of raw expenses.
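One rough way to see this is to compress a text payload with a general-purpose algorithm and compare it with a layout that knows the schema. The sketch below assumes a hypothetical, deterministically generated dataset of 1,000 five-boolean records and uses Python's standard `zlib` module (DEFLATE). The exact sizes depend on the data and settings, so it only prints them rather than claiming particular figures.

```python
import json
import random
import zlib

# Deterministic, hypothetical dataset: 1,000 records with five boolean properties.
rng = random.Random(42)
records = [{name: rng.random() < 0.5 for name in "abcde"} for _ in range(1000)]

text = json.dumps(records, separators=(",", ":")).encode("utf-8")
compressed = zlib.compress(text, 9)  # generic DEFLATE at maximum level

# Schema-aware alternative: one byte per record, bit i = property i.
packed = bytes(
    sum(1 << i for i, name in enumerate("abcde") if record[name])
    for record in records
)

print(f"JSON: {len(text)} B, zlib: {len(compressed)} B, bit-packed: {len(packed)} B")

# Compression is lossless, but it costs CPU time on both ends.
assert json.loads(zlib.decompress(compressed)) == records
```

The compressor has to rediscover the repeated names and literals from the bytes themselves, whereas the packed layout never writes them in the first place.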

As for the existing data compression algorithms themselves, some common issues include:

Protocol Buffers

As the need for pure binary data serialization arose from the above issues, Protocol Buffers rose to fill it. While not the only binary serialization solution, it became popular thanks to its open-source nature, its versatile data encoding, its powerful object definitions, and the possibility of extending it with gRPC to define full web services. However, the encoding of Protocol Buffers is a bit strange, which may lead to some unexpected issues. For example:

As such, it's no surprise that Protocol Buffers became popular, as each potential issue also comes with related advantages. Still, there is room for improvement.
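For readers unfamiliar with the wire format, the following Python sketch hand-encodes two documented rules of Protocol Buffers: base-128 varints preceded by a field key of `(field_number << 3) | wire_type`, and the ZigZag mapping used by `sint32`/`sint64`. It is a simplified illustration, not the official `protobuf` library, and the helper names are hypothetical. It shows one surprising trade-off: a negative `int32` always costs 10 bytes of payload, while the same value declared as `sint32` takes 1.

```python
VARINT = 0  # wire type shared by int32, int64, uint32, sint32, bool, enum, ...


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, least significant group first,
    with the high bit set on every byte except the last."""
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits,
        # so they always occupy 10 bytes on the wire.
        value += 1 << 64
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(value: int) -> int:
    """ZigZag mapping used by sint32/sint64: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (value << 1) ^ (value >> 63)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Field key (field number and wire type) followed by the varint payload."""
    key = (field_number << 3) | VARINT
    return encode_varint(key) + encode_varint(value)


print(encode_varint_field(1, 150).hex())        # prints: 089601
print(len(encode_varint_field(1, -1)))          # int32 -1, prints: 11 (1-byte key + 10)
print(len(encode_varint_field(1, zigzag(-1))))  # sint32 -1, prints: 2
```

The varint keeps small positive numbers compact, but the field's declared type, not the value alone, decides how a negative number is encoded.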

Future technologies?

Based on the above, here are some ideas I identified as potential optimizations toward the original objective of minimizing costs:

This is far from an exhaustive list, and I do not know whether these ideas could lead to a significantly better solution than those that currently exist, but I believe they are certainly worth considering for future designs and prototypes.

Disclaimer: I originally wrote this article on 2020-10-12 at the request of Steeve Leblanc, as an independent analysis of his data encoding invention, but he asked me to refrain from publishing it at the time due to a pending patent application. As this is no longer an issue, I have released the above article in its exact original wording. Note that he has since founded TS-Alpha, a company in which I have acquired shares and which I later joined as a full-time employee to help him realize said future technologies.
