
AI has a massive efficiency problem. It uses too much compute. It costs too much. It uses too much energy. And it is too slow.
Today, it can take a serious cluster of GPUs and a non-trivial amount of electricity just to answer a simple question like “Can you summarize this document?” or “What should I reply to this email?” The question is simple; the machinery underneath is anything but.
This is why we invested in ByteShape. The company was co-founded by a world-class team out of the University of Toronto: Professor Andreas Moshovos [link]—whose group’s papers have amassed more than 10,000 citations—together with scientists Miloš Nikolić [link], Enrique Torres Sánchez [link], and Ali Hadi Zadeh [link], whose life’s work is making computation more efficient. Both Ali and Miloš were also postgraduate affiliates of the Vector Institute, and Miloš’s PhD research formed the foundation of ByteShape’s core technology—work that earned him recognition as an “ML and Systems Rising Star” by MLCommons last year.
They are building the kind of deep technology that changes the economics of AI deployment, then changes what products become possible.
Quantization, In Plain English
Many techniques underpin what ByteShape does. One stands out: quantization.
Quantization is about using fewer bits to represent the numbers inside a model. Models are typically trained in higher-precision formats because the extra precision keeps learning stable and accurate. But inference often does not need that much precision everywhere. If you can safely represent weights and activations with fewer bits, you shrink memory use and speed up compute, often dramatically, while keeping outputs essentially the same.
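To make that concrete, here is a minimal sketch of the simplest flavor: symmetric 8-bit quantization with a single per-tensor scale. This is illustrative only, not ByteShape’s technique; production schemes are far more sophisticated.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 codes with one per-tensor scale."""
    scale = np.abs(weights).max() / 127.0   # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

print(f"storage: {weights.nbytes} bytes -> {q.nbytes} bytes (4x smaller)")
print(f"max reconstruction error: {np.abs(weights - recovered).max():.5f}")
```

Four bytes per number become one, and the worst-case rounding error is half a quantization step. The hard part, and the interesting part, is deciding where a model can tolerate fewer bits and where it cannot.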
ByteShape’s approach, ShapeLearn, does this in a way that feels obvious in hindsight but is very hard in practice. ShapeLearn adaptively taps into the training process to learn optimal datatypes for parameters and inputs. The result can be 7x faster training and 10x faster inference.
In layman’s terms, the idea is simple and powerful: fewer bits, less work, and smaller models, without sacrificing results, all done adaptively.
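ShapeLearn’s internals are not public, so the sketch below is rough intuition only: it shows what “learning the datatype during training” can look like in general, using a quantizer whose step size is itself a trainable parameter (in the spirit of published learned-step-size quantization), with a straight-through estimator so gradients can flow through rounding. It is a stand-in for the idea, not ByteShape’s method.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: round in the forward pass, but let
    # gradients pass through as if rounding were the identity.
    return x + (x.round() - x).detach()

class LearnedScaleQuant(torch.nn.Module):
    """Fake-quantize a tensor with a learnable step size."""
    def __init__(self, bits: int = 4, init_scale: float = 0.1):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.tensor(init_scale))
        self.qmax = 2 ** (bits - 1) - 1        # e.g. 7 for signed 4-bit

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        q = torch.clamp(ste_round(w / self.scale), -self.qmax, self.qmax)
        return q * self.scale                  # dequantized value flows onward

# The optimizer updates the step size alongside the model weights, so
# the effective "datatype" adapts to what training actually needs.
quant = LearnedScaleQuant(bits=4)
w = torch.randn(256, requires_grad=True)
loss = (quant(w) ** 2).sum()
loss.backward()
print(quant.scale.grad)   # nonzero: the step size itself is being learned
```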
ByteShape then takes it one step further. ShapeSqueeze, their lossless compression layer, applies per-value encoding to minimize off-chip data transfers, delivering up to 40% extra compression.
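ShapeSqueeze’s encoding is likewise proprietary, but the underlying observation is easy to demonstrate: quantized weights skew heavily toward small values, so a generic lossless coder already shrinks them well below their nominal width, and every byte saved is a byte the hardware never has to move. The sketch below uses zlib purely as a stand-in for a real per-value encoder; the numbers are illustrative.

```python
import numpy as np
import zlib

rng = np.random.default_rng(0)
weights = rng.normal(0, 1, 1_000_000).astype(np.float32)

# Quantize to int8 codes; most codes cluster near zero.
q = np.clip(np.round(weights / 0.25), -127, 127).astype(np.int8)

raw = q.tobytes()
packed = zlib.compress(raw, 9)         # stand-in for a real entropy coder
assert zlib.decompress(packed) == raw  # bit-exact: nothing is lost

print(f"int8 payload: {len(raw):,} bytes")
print(f"compressed:   {len(packed):,} bytes "
      f"({100 * (1 - len(packed) / len(raw)):.0f}% smaller)")
```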
Put the two together, and you get something that really matters in the real world. ShapeLearn reduces what the model needs to store and compute. ShapeSqueeze reduces what the hardware needs to move around. Less compute and less data movement mean faster AI, lower cost, and lower energy.
This is not limited to savings in cloud data centres. It is a step-function improvement in what can run locally, which means a step-function improvement in what products can exist. It opens the door to privacy-sensitive and offline workflows, on-device agents, and embedded intelligence in robots and machines where speed, power, and thermals matter.
Why TSF invested
Two Small Fish Ventures is an early-stage deep tech venture capital firm investing globally in the next frontier of computing and its applications. We invest where foundational breakthroughs create the conditions for new category-defining companies, and we back founders at the earliest stages when the technology is ready for commercialization.
ByteShape fits that thesis perfectly. They are building a foundational efficiency layer for AI that can reshape performance and cost across cloud, enterprise, and edge deployments. And because all TSF partners are engineers with deep operating experience, we do not just evaluate the science; we help carry the technology through commercialization, with hands-on support informed by having built and scaled companies ourselves.
With ByteShape, the future is models that run faster, use less energy, and fit on far smaller hardware, without sacrificing the quality that makes them worth using.
Try it yourself on Hugging Face! [link]