Tiny ML and the future of on-device AI | by Jeremie Harris | Nov, 2020


Jeremie (00:00):
Hey everyone, Jeremie here. I’m the host of the Towards Data Science podcast and I’m also on the team over at the SharperScience mentorship program, and today I am really excited, because we’re talking to Matthew Stewart, who is a PhD student at Harvard. He’s working on a series of different problems in the environmental monitoring space, but specifically he started to focus lately on something called TinyML, and this is an area of machine learning that’s focused on getting machine learning onto embedded devices, drones and sort of IoT type devices where you have strong constraints in terms of the amount of energy that can be used for computation, the amount of space, the amount of memory that can be put on these devices, and there are all kinds of implications. Ethical, philosophical, engineering implications, even, of this technology. So what I want to do is talk to Matthew about what he’s been working on, where he sees the space going, and what some of the sort of ethical constraints are around this work, what ethical problems we might see emerging in the space going into the future. So really excited about this one, thanks so much for joining us, and without further ado, here we go.

Jeremie (01:05):
Well, Matthew, thanks so much for joining us for the podcast.

Matthew Stewart (01:07):
Thanks for having me, yeah, good to be back.

Jeremie (01:11):
Yeah, we’re really happy to have you here again. Last time we spoke, we talked a little bit about machine learning as applied to environmental monitoring. I think that’s still kind of part of what you’re working on, but your work right now also involves this relatively new category of machine learning applications called TinyML, and I think, at least for me, I hadn’t really heard of TinyML, I think I might’ve read a blog post or two about it or something like that, but I had no idea really what it amounted to until we talked about doing this podcast. So one thing I was hoping you could start with is just introducing the idea of TinyML and what it is, if you don’t mind.

Matthew Stewart (01:47):
Yeah, sure. So it’s kind of a difficult topic to introduce, because no one really knows what it is at this stage. Even myself, I was in a TinyML talk with one of the creators of, I think it was a Qualcomm engineer, and he was talking about TinyML, and I asked him, “What is it? Is it a framework, is it a system of functions or something?” And he said it’s basically just industry best practices at this stage. It’s trying to run things like machine learning models, especially neural networks, on memory- and compute-constrained devices like microcontrollers, mainly for [inaudible 00:02:31] purposes, and he talked about the fact that mainly it’s using it on devices that are constrained to one milliwatt, because if you have a microcontroller that’s running on a coin battery, if you’re only using one milliwatt of power, it can last for about a year, and that opens a lot of interesting applications.

Matthew Stewart (02:50):
So yeah, that’s kind of what it is. The reason that it’s sort of become important is because, I mean we’ve all seen over the past few years, like cloud platforms get more important, you’ve got things like Amazon Web Services, Microsoft Azure now, and we’ve seen very much a push towards compute-centric machine learning. And now it’s interesting, because you’ve got this weird sort of bifurcation where you’ve got big data going one direction, and now you’ve got this TinyML and microcontrollers going in the other direction, but they’re not necessarily unrelated, because you need to use the models that you made using the big data in order to sort of train and produce and analyze data on the microcontrollers. So you got this sort of weird hybrid mode going on here, where you’re using it for applications that are edge-based, but they still need some sort of interaction with this machine learning system.

Jeremie (03:51):
Right, so that machine learning system, I mean, you pump out models because you said they could be really big models, you could train a computer vision algorithm, you mentioned VGGNet before we started recording here, but it could be anything, and some of these models are gigabytes in size. How do you put a model that’s multiple gigabytes in size onto a small edge device?

Matthew Stewart (04:15):
Yes, that’s a great question. And I guess the first thing you have to, before you even get to the model itself and all of the parameters involved, you also have to think about the framework itself. So it’s not like there’s a TinyML TensorFlow version going on, or at least there wasn’t until someone went ahead and built it. So I know one of the principal developers of that, Pete Warden, he works at Google on TensorFlow for mobile applications, and he helped develop this TensorFlow Lite, which is sort of a very lightweight version of TensorFlow, which removes a lot of the higher level things you can do, like debugging and visualizations, so that you just have the raw things that you really need in order to do the learning itself.

Matthew Stewart (05:06):
But for some microcontrollers, even that’s too big, and that’s, we’re only talking about 500 kilobytes at this point, that’s how big this TensorFlow Lite is. When you go to something like an Arduino Nano, you’re looking at very small amounts of flash memory, it can be 28 kilobytes or something, and they’ve actually made now something … I think it’s TensorFlow Lite Nano or Micro, and it’s like 20 kilobytes, so it’s-

Matthew Stewart (05:33):
… really, really tiny. But yeah, you can run neural networks on this tiny framework on a microcontroller, it’s actually quite incredible.

Jeremie (05:43):
But they’re much smaller-

Matthew Stewart (05:44):
[crosstalk 00:05:44]-

Jeremie (05:44):
… [crosstalk 00:05:44] the full size neural networks that you would, say, train without constraints?

Matthew Stewart (05:50):
Sorry, you mean the network itself, or the framework?

Jeremie (05:54):
Yeah, the network, so when you put the, let’s say you want an Arduino with some kind of visualization capability so it can navigate its environment and know its way around. The model it’s running, obviously, is constrained to something like 28 kilobytes, so there’s some process that gets you from a really large VGGNet or ResNet or something like that that gets you down to Arduino size, like is that …

Matthew Stewart (06:19):
Yes, definitely. So once you have this framework, this TensorFlow Lite Micro or Nano, so that’s, we’re talking about just the libraries here that you’re running the functions on the neural network, but then you’ve also got the problem of the neural network itself, which is the configuration of the neural network, and also the inference that’s going to be going on, and then you’ve got to store the parameters somewhere, so you’ve got a lot of issues to deal with.

Matthew Stewart (06:44):
And this is one reason we don’t do the training, or at least it’s not commonly done at the moment to do training on the devices, because there’s just not enough bandwidth in the sensors or microcontrollers to be able to do this. We were talking before about when you have an automatic differentiation algorithm, which is how neural networks are trained, you need to get to sort of machine level precision, so we’re talking maybe zero with 15 or 16 decimal points after it, that level of accuracy in terms of the gradient to train it accurately on these large neural networks, and you just can’t do that when you’re running these tiny microcontrollers that have eight bit, what’s it called …

Jeremie (07:33):
Floating point.

Matthew Stewart (07:35):
Yes. Eight bit architectures, you just can’t do it. You need at least 16 or 32 bit architectures. So typically what’s done is you train things in the cloud, and then you have to just compress them onto a microcontroller, and this is done using sort of a pipeline that some people call, there’s a famous paper called Deep Compression which describes how to do this through several means. So what you can do is you can go from a 32 bit representation down to something like eight bit, and that’s going to reduce the size of the network by a factor of four or something. Then you can prune the neural network, so you can cut out parameters that aren’t useful. You can do things like network distillation, where you’re basically compressing the knowledge that’s in the neural network into a smaller representation, and that can severely reduce how many parameters you have to [inaudible 00:08:27].
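
As a rough illustration of the 32-bit to 8-bit step described here, the sketch below maps a handful of floating-point weights onto signed 8-bit integers using a single scale and zero point. The weight values and the function names `quantize_int8` and `dequantize` are made up for this example; it is not taken from the Deep Compression paper or from any TensorFlow Lite API, and real toolchains typically quantize per layer or per channel.

```python
from array import array

def quantize_int8(weights):
    """Map floats to int8 with one scale and zero point (affine quantization)."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where hi == lo would give a zero scale.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    q = array("b", (max(-128, min(127, round(w / scale) + zero_point))
                    for w in weights))
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

# Illustrative weights only; a real layer would have thousands of them.
weights = [-0.52, -0.1, 0.0, 0.03, 0.27, 0.91]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

fp32_bytes = len(weights) * array("f").itemsize   # 4 bytes per 32-bit float
int8_bytes = len(q) * q.itemsize                  # 1 byte per int8 value
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(fp32_bytes, int8_bytes, max_error)
```

In this example the six weights shrink from 24 bytes to 6, the factor of four mentioned above, and each restored value differs from its original by no more than about half of `scale`.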

Matthew Stewart (08:27):
And you can do further things like weight sharing so you’re not saving all these individual parameters, you’re grouping them via k-means clustering and stuff. And then you can also encode that even further using something called Huffman encoding, which is, I like to think of it analogously to like jpeg compression, it can reduce the size by like 90% and you still maintain all of the information, so it’s pretty incredible.
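
To make the last step more concrete, here is a small sketch of Huffman coding applied to cluster indices, the kind of data left over once weight sharing has replaced each weight with the index of its nearest centroid. The indices below are hypothetical and the clustering step itself is not shown; the helper names are also just for this example. Huffman coding on its own is lossless, so decoding gives back exactly the indices that went in.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code that gives frequent symbols shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical cluster indices: index 0 (say, near-zero weights) dominates.
indices = [0] * 12 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(indices)
bits = encode(indices, code)
fixed_width_bits = len(indices) * 2   # 4 clusters need 2 bits each otherwise
print(len(bits), fixed_width_bits, decode(bits, code) == indices)
```

The saving depends entirely on how skewed the index distribution is. With four equally common clusters, Huffman coding would give no benefit over fixed two-bit indices.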

Jeremie (08:52):
So that point-

Matthew Stewart (08:53):
But you can-

Jeremie (08:53):
Sorry, that point actually seems pretty important, right? This idea that you can conserve the qualities, the characteristics, and the behavior of a large model when you compress it down to a much smaller size. Is that a source of concern, like if you look at a very safety-conscious application, like let’s say we were making self-driving cars, and we wanted a self-driving car that for whatever reason had much smaller capacity than the system that did all the training, in that case I would imagine you’d want to be pretty sure that you haven’t just allowed some edge case to sneak by when you went through that process. Are there ways of checking that performance, of making sure that some of those edge cases are captured? I imagine it’s a really difficult problem.

Matthew Stewart (09:41):
So I guess the first thing I would mention is you probably wouldn’t use TinyML for something so safety critical.

Matthew Stewart (09:48):
So when you have things like Tesla’s Autopilot in its car, it’s running a very powerful computer inside there, and I mean, you can do that because the car has the constraints available to it, or it’s not constrained by things like location and availability, it can be powered by the batteries that are in the car. Whereas if you have a sensor system that you’re putting on a tree, and you’re trying to get it to listen to whether there are birds around and what kind of birds those are, maybe someone’s only going to look at that every year or so physically, but they want to be able to get data from it and they want to be able to sort of relay information.

Matthew Stewart (10:31):
So they’re slightly different use cases, but in terms of dealing with edge cases, it’s the same with other machine learning frameworks, it’s all about the data. 90% of any sort of data science or machine learning is getting a good dataset and making sure that it accounts for all of these potential edge cases, or at least has a way to deal with them. So a recent example I heard about is when you have a sound detection system, so for example, let’s say that you have a TinyML system outside your house which is listening for glass breaking. Say if it hears glass breaking it’s going to call the police, because it thinks someone’s broken into your house. But it’s very difficult for it to distinguish between glass breaking on your window and maybe you dropping a glass in the kitchen by accident. So you could just drop a kitchen glass and then all of a sudden you have the police turn up at your door, if you didn’t specifically train it on that sound.

Matthew Stewart (11:32):
But it’s pretty hard to interpret all of these different edge cases that can happen, which is probably why they’re called edge cases, and that’s something that definitely someone who’s making the system, or specifically the dataset has to think about, like what are the potential issues that could occur? And I would say it’s an important thing to consider just when you have event-driven TinyML systems like this TinyML device that’s calling the police.

Jeremie (12:00):
Yeah. Well, and I guess, I mean that’s a very general category of issue, too, like out of distribution learning, when you run into a sample that just was not captured in the subspace of data that it was trained on initially, I guess what I’m wondering with respect to TinyML is how can people be confident that the behavior of a large model maps well onto the behavior of a smaller model? Like I guess in a way you could show both, like a certain validation set that neither model has seen before during training, and see how they correlate in terms of performance. Is that roughly what’s done, or …

Matthew Stewart (12:38):
Yeah, so basically what’s done is you compare, so you take the network, and then you basically try and run it as if it’s on this framework. I mean, there’s various ways you could do this. You could do this on an emulator, on your local machine, you can take your local machine and you can download this TensorFlow Lite Nano or Micro and you can run and just see what sort of accuracy you’re going to get. The only thing that you don’t have is you don’t have debugging capabilities and all of these things, but you’re at least going to get the output, and you can encode that however you want.

Matthew Stewart (13:13):
But yeah, so when you’re looking at the performance between the two, a common thing to do in papers is they say, “Okay, we did a certain level of sparsity,” so let’s say we removed 50% of the parameters, or we compressed it by a factor of 10, how does that affect the top one accuracy or the top five accuracy, is pretty common for classification models, so top one accuracy is, that’s your vanilla accuracy on a binary classification, it’s like, I’ve gotten 80% on this set, whereas top five is, say I have 100 classes, the chances of the correct class being in the top five is this accuracy. And you can see that you can compress it a lot, you can compress it maybe, sort of 80, 90%, and you’re only going to lose maybe two or 3% of the accuracy.
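
For readers who want the two metrics pinned down, the sketch below computes top-k accuracy from per-class scores: an example counts as correct if its true label is among the k highest-scoring classes, so top one is ordinary accuracy and top five is the more forgiving version. The scores, labels, and the function name `top_k_accuracy` are invented for illustration, and the example uses four classes and k = 2 only to keep it small.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Made-up model outputs: one row of class scores per example.
scores = [
    [0.6, 0.3, 0.1, 0.0],   # true class 0: the top guess is right
    [0.2, 0.5, 0.3, 0.0],   # true class 2: only the second guess is right
    [0.1, 0.2, 0.3, 0.4],   # true class 0: ranked last
]
labels = [0, 2, 0]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
print(top1, top2)
```

Running the same function on the original and the compressed model's outputs for one held-out set gives the kind of before-and-after comparison described here.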

Jeremie (14:04):
Which is really astonishing, I mean-

Matthew Stewart (14:05):
It’s very astonishing, yeah. So yeah, I’m not too worried in terms of those kind of issues, of, well, we’re going to compress it and all of a sudden it’s going to become useless, that does inevitably happen once you compress it too much, if you start going into the realms of like, seven, six, five bits, yeah, you’re going to get a lot of quantization losses.

Jeremie (14:27):
But at the same time, though, I imagine, it almost seems like there are two questions from a safety or ethical standpoint. The first is about performance, like how does the, as you say, how does the top one performance change when you move from the large to the compressed model, but then the second is just like, how correlated are the large and small models? If we had to do a lot of, let’s say, not customization, but manually catching edge cases with a large model, to what extent can we be confident that … Like, do we have to redo all our safety assessment on the smaller model? Maybe there are weird quirks that we’d want to conserve that aren’t necessarily reflected explicitly in top one performance that then map onto the smaller one, just any model idiosyncrasies that might change in that way? And I don’t know if this is something that is being tackled right now, I know it’s early days, but …
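
One simple way to separate those two questions, offered here only as a sketch and not as an established TinyML practice, is to measure how often the two models agree with each other on the same held-out examples, alongside how often each agrees with the labels. The predictions below are invented and `agreement_report` is a hypothetical helper.

```python
def agreement_report(large_preds, small_preds, labels):
    """Compare two models' predicted labels on the same held-out examples."""
    n = len(labels)
    large_acc = sum(p == y for p, y in zip(large_preds, labels)) / n
    small_acc = sum(p == y for p, y in zip(small_preds, labels)) / n
    # Indices where the compressed model changed the answer, right or wrong.
    disagreements = [i for i, (a, b) in enumerate(zip(large_preds, small_preds))
                     if a != b]
    return {
        "large_accuracy": large_acc,
        "small_accuracy": small_acc,
        "agreement": 1 - len(disagreements) / n,
        "disagreement_indices": disagreements,
    }

# Invented predictions for six validation examples.
labels      = [0, 1, 1, 0, 2, 2]
large_preds = [0, 1, 1, 0, 2, 1]
small_preds = [0, 1, 0, 0, 1, 2]

report = agreement_report(large_preds, small_preds, labels)
print(report)
```

In this toy case the accuracies are close (five of six versus four of six) while the models agree on only half the examples, which is exactly the kind of gap that an accuracy comparison alone would hide. The disagreement indices point to the examples worth re-checking on the smaller model.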

Matthew Stewart (15:22):
Yeah, I think it’s hard to say. I haven’t really seen … So there’s definitely issues. So I’m thinking about the voice applications mostly, because, so let’s say, there’s a famous framework at the moment that people have been developing an open source project called Common Voice, where you can go on the website and you can record a few sentences in your native language or whatever, and then it gets added to this dataset via some kind of voting system, and …

Jeremie (15:48):
Oh, cool.

Matthew Stewart (15:50):
When you go to other things like sound detection, for example, there’s a lot less data available. So like the example I said of maybe a gunshot firing or something. That’s very difficult to get that information, and it’s distorted by the environment, maybe background noise, so it’s hard to get a sort of representative sound of what that is. And that definitely makes it difficult, because I don’t think I’ve seen anyone that’s tried to put this into practice and try to detect a gunshot, because there’s a lot of challenges just in the data collection itself.

Matthew Stewart (16:29):
And then you’ve also got, you run into imbalances, so most of the dataset is not going to be this gunshot, it’s going to be background noises of city traffic or whatever, and as we kind of know from basic statistics, the in group error reduces by a factor of 1/n, so if you have a small amount of positive samples of this gunshot, but you have a very large amount of the other samples, which is just sort of general traffic, you’re going to end up having low accuracy on the gunshots compared to the other parameters, so that presents its own issues itself. And then you have things like false triggers, like a champagne cork sounds kind of like a gunshot. So those are definitely issues that we’re going to see them, and it will be interesting to see how people deal with it, but I would not say I’m an expert enough to tell people how they should deal with these issues.
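
The imbalance problem is easy to see with a toy calculation. In the sketch below, the class names, the 98-to-2 split, and the degenerate always-"traffic" model are all assumptions chosen to make the point; the per-class figure computed is recall, the fraction of true examples of a class that the model catches.

```python
def recall(preds, labels, positive):
    """Fraction of true `positive` examples that were predicted as `positive`."""
    relevant = [p for p, y in zip(preds, labels) if y == positive]
    return sum(p == positive for p in relevant) / len(relevant)

# Invented, heavily imbalanced dataset: 98 traffic clips and 2 gunshot clips.
labels = ["traffic"] * 98 + ["gunshot"] * 2
# A degenerate model that never predicts "gunshot".
preds = ["traffic"] * 100

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
gunshot_recall = recall(preds, labels, "gunshot")
traffic_recall = recall(preds, labels, "traffic")
print(accuracy, gunshot_recall, traffic_recall)
```

Overall accuracy comes out at 98% even though the model misses every gunshot, which is why per-class metrics matter more than a single accuracy number for rare-event detection.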

Jeremie (17:26):
Yeah, well it may be too early for those experts necessarily to exist, I guess, because TinyML’s so young. But it’s interesting how tightly coupled this idea of TinyML is to environmental monitoring, because to some degree, the whole point of creating TinyML systems, doing machine learning on-device is so that this device can then navigate an environment, really be out there in some meaningful sense. What are some of the applications that you see coming around the horizon, let’s say first off with just on-device inference, rather than any kind of training? With just the inference piece, what are some of the applications that you’re seeing right now, and what do you see maybe happening in the future?

Matthew Stewart (18:07):
Sure. So obviously, this is very much industry-driven, so I think we’re going to see mostly things that are sort of lucrative things to be able to predict. Those are the items we’re going to see first, an example of some of those would be predictive maintenance. And Pete Warden, so this is the guy in Google who’s, he wrote the TinyML book which came out in February, and he talks in one of his blog posts, one of the first ones about TinyML about the concept of a Cyber Hans, which is basically, so he talks about Hans being like a guy who can just touch a machine and go, “Oh, this machine needs this thing repairing,” and it’s the idea of having all of this experience and just knowing when a certain vibration feels wrong, like what’s going on inside a machine? But the idea is you could literally have maybe sort of a machine learning model which is built for this specific machine, and it knows when it’s got a certain frequency of vibration, or you’ve got certain temperature readings, that there’s a specific issue wrong with the machine, that can tell the operator beforehand, before it breaks, that this issue is there, and it can save them a lot of money because they can fix it before it becomes a problem.

Matthew Stewart (19:22):
So obviously that’s a very useful application, because you could put that in essentially any mechanical device. You could put it in a car, power plants, machinery, all sorts of things.

Jeremie (19:34):
And would one of the advantages here be the fact that you don’t have to send the data to some central server somewhere where presumably it could be intercepted along the way, there are privacy implications to that, is that sort of one of the big advantages of this approach, or is it something else?

Matthew Stewart (19:50):
That is definitely one of the benefits of TinyML. So I mean, you’re basically taking the data, you’re taking the compute from the cloud, you’re putting it very much data centric. So in the cloud, we’ve seen very much compute centric, you bring all of your data into like a data warehouse and you run your inference on it and your training. In this sense, we’re taking the model and we’re putting it right where the data is, and then it can just do its thing. And it doesn’t necessarily have to send anything unless there’s an issue. So in this sense of the predictive maintenance, it would basically just do a red flag when something’s wrong, like maybe the sensor’s broken or maybe the machine itself is broken, which means that you have a lot less crowding of the airwaves, if you like, doesn’t need to be constantly sending to and from this information.
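
The "only raise a red flag" pattern can be sketched in a few lines. The example below stands in for the on-device model with a deliberately simple rule, a running mean and standard deviation with a fixed threshold, which is an assumption made for illustration and much cruder than a trained network. The class name `AnomalyFlagger`, its parameters, and the vibration readings are all hypothetical.

```python
import math

class AnomalyFlagger:
    """Track a running mean and variance; flag readings far from the mean."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold   # standard deviations that count as anomalous
        self.warmup = warmup         # readings to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                # sum of squared deviations (Welford's method)

    def update(self, x):
        """Return True if x should be reported; otherwise fold it into the baseline."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                return True          # anomalous: report it, keep the baseline clean
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False

# Invented vibration amplitudes; only the spike should leave the device.
flagger = AnomalyFlagger()
readings = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 4.0, 0.98]
sent = [x for x in readings if flagger.update(x)]
print(sent)
```

Of the nine readings, only the spike is transmitted. The rest never leave the device, which is where both the bandwidth saving and the privacy benefit come from.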

Matthew Stewart (20:43):
But yeah, if that information is sensitive, then that could be very useful for certain applications, and you still have issues of people trying to hack edge devices, but it’s also very difficult to hack a device when it has such little compute capabilities and stuff, like there’s not too much you can do with a lot of these end devices. But if you make that even less by reducing the ability to communicate and the frequency at which they communicate, it definitely makes it more secure.

Jeremie (21:18):
I guess the privacy thing is also sort of a double-edged sword. I mean, so on the one hand you’re not sending data out, as you said, going to the compute, you’re bringing the compute to the data. But at the same time, it implies having a lot more devices around us collecting data and going about their work potentially collecting, as you say, picking up sounds from the environment that include people talking about things or doing any number of the horribly embarrassing things that we all do with our day to day lives. So is that something that’s been a focus yet? I mean, I can understand, I guess right now you’re probably just focused on, how can we get these systems to actually work, and we’ll worry about the details later type thing. But is there any thought so far in that direction?

Matthew Stewart (22:04):
Yeah, I would say, so the idea of the quantified self springs to mind, so sensors that are on or within your body, which are relaying information, so in my field obviously we care about air quality, so people for a long time have been trying to make these tiny devices that people can wear so it monitors how much CO2 and CO and NOx and stuff they’re breathing in, because it can be related to specific diseases, for example, or exposure in the workplace. And yeah, I suppose if someone like Facebook got access to that, they would have a field day being like, “Oh, this person’s,” I don’t know, “We can sell this to an insurance company,” or something. Yeah, so there’s definitely going to be issues with that, but I would say yeah, it’s too early to say exactly what those could be. So yeah, before the podcast started, we were talking about Neuralink being sort of a TinyML application, because it’s sort of like a tiny chip implanted in your brain, and we saw it in the demo a few weeks ago in pigs. And if you were able to access that information, you could probably hack it in such a way that you could control someone’s limbs, which would be very unfortunate.

Matthew Stewart (23:22):
And yeah, we’re getting into Black Mirror territory, and it’s very hard to say whether that would be possible in the future, and also I’m sure the people developing this technology would be thinking about this, and they would want to dispel that thought in people’s minds, because otherwise no one would buy this technology.

Jeremie (23:41):
Yeah, and it’s an interesting thing that just keeps coming up over and over again. I guess the cliché way of expressing it is with great power comes great responsibility, but in the AI alignment community, for example, people talk a lot about this idea that there’s trade-off between how much you want your machine learning model to be able to do, and how confident you can be in its safety. Because almost by definition, part of the value of machine learning is the part where we abdicate our ability to oversee its actions. Like, machine learning wouldn’t be useful if humans had to double check every calculation, and so we kind of have to step away at a certain point and say, “All right, now you’re on your own,” type thing, let’s just hope that we’ve programmed the system well.

Jeremie (24:26):
I guess that’s one of those things with Neuralink and whatnot. At a certain point, if you do want to be able to fix blindness, Alzheimer’s, and all these things, inevitably you’re going to be tampering with the kind of circuitry, neural circuitry that is, that precisely controls other super important functions that then implicitly could be hacked.

Matthew Stewart (24:46):
Yeah, it’s going to be interesting in the future, because we are playing with things we’re starting to not really understand, and again, that’s obviously why the field of explainability has become a lot more prevalent in recent years. Because if you go to a business, like a boardroom, and you tell people you have this fancy new algorithm which is going to save millions of lives because it can predict every type of cancer, they’re going to be pretty skeptical unless you can visually explain to them. And these are businesspeople, so you can’t just throw mean squared errors at them and things, you have to have good explanations.

Matthew Stewart (25:24):
I think that sort of goes without saying in any AI field, and equally so for TinyML. Or perhaps even more so, because you don’t even have these debugging capabilities and you have these other constraints, and the sensor could be far away in another country and it’s relaying strange information, and do you believe it? Yeah.

Jeremie (25:47):
So is there an implication there, I mean, when you compress a model down, you go from something with let’s say billions, even, of parameters, down to something with maybe tens of thousands if you’re looking at a couple kilobytes. As you do that, I imagine that … Well, actually I guess I’m a little unclear about this. Like, I could imagine it being, to some degree, helpful for interpretability, just because you’re reducing the number of parameters in the system, and thereby reducing, presumably, the amount of noise to some degree. Maybe that makes it more interpretable, and yet at the same time doing interpretable ML on-device seems like its own challenge with a whole other … I mean, TensorFlow Lite probably wouldn’t cut it at that point. But do you have any thoughts on, is that something that you see over the horizon? Would people be able to do explainable ML on-device?
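
The parameter counts in question follow from simple arithmetic on the memory budget. The sketch below takes the 28-kilobyte figure mentioned earlier in the conversation, reads it as 28 KiB, and assumes the whole budget is spent on weights, which overstates what is available in practice since the runtime and activations need room too. The function name is hypothetical.

```python
def max_parameters(memory_bytes, bits_per_parameter):
    """Upper bound on parameter count if memory held nothing but weights."""
    return (memory_bytes * 8) // bits_per_parameter

budget = 28 * 1024   # 28 kilobytes, taken as 28 KiB for this example

params_fp32 = max_parameters(budget, 32)   # 32-bit floating-point weights
params_int8 = max_parameters(budget, 8)    # 8-bit quantized weights
print(params_fp32, params_int8)
```

That works out to at most 7,168 parameters at 32 bits each, or 28,672 at 8 bits, so "tens of thousands" is about right for a quantized model of that size.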

Matthew Stewart (26:37):
Yeah, this is …

Jeremie (26:39):
This is like niche niche, so …

Matthew Stewart (26:43):
Yeah, this is very niche. So I haven’t seen that yet, but it would not surprise me, because there’s just so many applications, and like I said, it’s industry-driven. And industry, when you get into the AI/machine learning realm, they’re very skeptical unless you can visually explain things like maybe what, like if you have a random forest model, what is it doing? You have to have those, what’s the word, the importance values for each parameter, saying what it’s actually using to make these inferences. Because they can be wrong sometimes, and you don’t want to be wrong if it’s someone’s life on the line because you’re deciding whether to give them surgery or no. So yeah, I think, like I said before, because these are such compute constrained devices, I don’t know how critical they’re going to be in environments like that, but also I would say because you’ve got, we’ve mentioned before, you’re going to have a lot of these sensors at once, you’re not just going to have one, really.

Matthew Stewart (27:43):
So let’s imagine that you have, I don’t know, a farm, and you’ve got all these sensors in the ground which are sort of looking at soil contents and sunlight, like solar radiance, things like that, and it’s trying to predict when you should harvest, and then you also have other sensors which have gas sensors on, perhaps it’s like an apple orchard and they’re measuring ethene, which is an emission you get from apples when they’re ripe, they’re ready for picking. If you combine all those together, maybe you would be much more sure. So you could argue that you might not actually need explainable ML if you have enough of those functionalities added together.
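
A back-of-the-envelope way to see why combining sensors helps is to ask how often a majority of them is right. The sketch below assumes every sensor is correct with the same probability and that their errors are independent. Both are strong assumptions, since sensors in one orchard share weather and soil conditions, so the numbers are an optimistic illustration and not a claim about any real deployment.

```python
from math import comb

def majority_correct(p, n):
    """Probability that a strict majority of n independent sensors is correct,
    when each sensor is correct with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Assumed per-sensor accuracy of 80%, for one, three, and five sensors.
single = majority_correct(0.8, 1)
three = majority_correct(0.8, 3)
five = majority_correct(0.8, 5)
print(single, three, five)
```

Under these assumptions the chance of a correct majority rises from 80% with one sensor to roughly 90% with three and 94% with five. Correlated errors would shrink that gain.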

Jeremie (28:19):
It’s sort of an ensembling approach to triangulating the ground truth of an environment.

Matthew Stewart (28:26):
Yeah. I mean, I’m just speculating, but that’s sort of … I don’t know, it makes sense in my mind.

Jeremie (28:32):
Well, it’s interesting the kinds of choices, that just the very fact of moving something on-device, doing TinyML kind of forces people to make. Because in a way, what it makes me think of is like, back in the early days of machine learning, when we really didn’t have much compute horsepower available, much data available, every tiny bit of compute we had, we had to direct towards functionality, we had to try to say like, “All right, well, we’ll sacrifice everything to make good predictions,” and it’s only to some degree been more recently as people have realized, like, okay, now there’s almost a glut of compute power, we can use this for a lot of other things, people have gone, “Okay, let’s now try to redirect some of it to interpretability, safety, and even in some cases, alignment.”

Jeremie (29:14):
It’s interesting that this is starting to happen with TinyML in a context where we’re really directly porting some of the super abilities we already have into that environment, so in a way it’s almost like starting this process over again, except with all the power that we’ve accumulated from decades of advanced research, and none of the ability to do the kind of, the safety stuff that might otherwise accompany this stuff. So it’s kind of an interesting, I guess, wild west phase for the space.

Matthew Stewart (29:44):
Yeah, so I guess it is interesting, because it makes us go back to think what are the really important factors that we need to do this inference or this learning on-device, because when you only have 28 kilobytes of memory, you really have to think about what’s important to you, because you can’t even debug things, so you really have to know what’s going on. So yeah, it presents, I wouldn’t say … They’re certainly different problems to what we had before, but they’re analogous in a lot of ways. It’s almost like the opposite, right, because before we had sort of the capabilities, but we just didn’t have the systems in place, whereas now we have the systems in place, but the capabilities are just not on the systems. Yeah, so it will be interesting to see how it develops, especially because we have a very wide array of devices that can do TinyML. I’m reluctant to call applications on mobile phones TinyML, but that’s certainly, one of the first sort of applications of it was you take a neural network and you want to compress it into an application, and let’s say you’re trying to put VGGNet on an app, and that’s like 100 megabytes, and every time you have a 100 megabyte increase on the App Store, Apple gets very, very, like, “What are you doing with this update?” They look at it a lot and it takes a lot of time.

Matthew Stewart (31:10):
And also, you have to be connected to wifi to download an app that big, so it’s obviously a lot better if you can reduce that by a factor of 10 or 100, so these are just common reasons for wanting to be able to compress things. But yeah, mobile phones are a sort of under-realized example of TinyML, but they also, even in the mobile phone themselves, is probably the most prominent version of TinyML that you’ve actually been exposed to, because I can think of two applications off the bat, which one of them is when you pick up your phone, if you have one of the most recent iPhones and suddenly the screen turns on. I mean, that’s TinyML working itself, because it’s using the inertial measurement unit inside, with a very tiny, I don’t exactly know what it is, it might be a neural network, but there’s this tiny chip which is constantly waiting for that to change, and then once it changes, it relays to the screen to turn on, and that in itself is a TinyML application.

Matthew Stewart (32:12):
And the same when you have a phone waiting to hear you say, “Okay Google,” or “Hey Siri.” Yeah, so this is called wake word detection, so the “Hey Siri” is a wake word, and you try to make them esoteric so that people don’t just randomly say them. But yeah, these run on DSPs, digital signal processors, which have very small amounts of memory, so I think the Google one has like a 14 kilobyte neural network and it’s operating on a digital signal processor, so it’s incredibly tiny.
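To give a feel for how small a 14 kilobyte network is, here is a back-of-the-envelope sketch of a memory budget check. The layer sizes are hypothetical and are not the architecture of any real wake word model; only the approximate 14 kilobyte figure comes from the conversation.

```python
# Hypothetical layer sizes; not the architecture of any real wake word model.
LAYER_SIZES = [40, 64, 64, 2]  # inputs, two hidden layers, 2 outputs (wake word or not)
BUDGET_BYTES = 14 * 1024       # the roughly 14 kilobyte figure from the conversation

def parameter_count(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

params = parameter_count(LAYER_SIZES)
int8_size = params * 1     # one byte per parameter after 8-bit quantization
float32_size = params * 4  # four bytes per parameter as 32-bit floats

fits_int8 = int8_size <= BUDGET_BYTES
fits_float32 = float32_size <= BUDGET_BYTES
```

With these made-up sizes the network has a few thousand parameters. It fits the budget at one byte per parameter but not at four, which is one reason quantization matters so much at this scale.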

Matthew Stewart (32:48):
But you even get these applications on devices that you already use, you just don’t realize that that is an application of it, it’s [crosstalk 00:32:55]-

Jeremie (32:56):
[crosstalk 00:32:56]-

Matthew Stewart (32:56):
… devices we already have these capabilities.

Jeremie (32:59):
Yeah, right. Well I guess it’s, a lot of these use cases, I don’t want to call them trivial, but they’re certainly not, as you said, they’re not mission critical as far as safety goes, and it’s interesting, like the picking up of the phone, these are marginally useful things, and we’re already kind of at that limit, and you can almost see that tension between safety criticality, capabilities, and then interrogatability? Explainability, basically. Like you’re trading off the compute power on the one hand, and then you’re also, I guess the other lever you have to pull is how safety critical is this application? I guess those three things have to dance a very delicate dance to some degree.

Jeremie (33:44):
And I guess that dance gets harder and harder as you move smaller, too, right? Because talking about phones, they have at least some amount of compute power. What are some of the smaller devices that you’ve looked at in your research?

Matthew Stewart (33:55):
So yeah, I guess like phones, phones are pretty powerful these days. They’re more powerful than the first laptops I ever got. I think the most recent iPhones have 4 gigabytes of static RAM, I could be wrong on that, but that’s a lot. And even things like the most recent Raspberry Pis, so this is like a microcomputer, for people that don’t know what that is, but the most recent one I think also has several gigabytes of RAM, and it’s the size of my hand, it’s a very small device.

Matthew Stewart (34:28):
But then you can go to actual microcontrollers, or things like Arduinos and Arduino Nanos and Micros and they have all sorts of names, and then yeah, you’re getting to flash memory use where you can’t just connect it to a computer, you have to go into an IDE and you have to compile the code and then flash it onto the devices, and that’s when you get into the issues of, I can’t debug things, because you can’t communicate backwards and forwards, really, except with maybe a sensor.

Matthew Stewart (35:00):
And actually, that was a big issue for when they first started out trying to make TinyML, because the issue was how you get it to say “Hello world,” because that’s like the first thing you do on a device, and what is the TinyML equivalent to hello world? And yeah, I think Pete Warden and some other really smart guys, they came up with the idea of, we don’t need it to say, “Hello world,” we just need it to be able to blink an LED.

Matthew Stewart (35:26):
So what they did was they took a sine wave and they trained an algorithm, a neural network, to be able to reproduce a sine wave. So possibly the most trivial machine learning algorithm you could ever imagine, it’s very contrived, but yeah, so then it obviously goes through this sort of … Yeah, I’m not going to say continuous, but it’s like going between a bit value of 256 and zero, and that correlates to a light intensity on the LED and you can see it sort of flashing on and off.
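The last step described here, turning the model’s output into an LED brightness, can be sketched roughly as follows. This is an editorial illustration with assumptions: `math.sin` stands in for the trained network, the function names and step count are hypothetical, and an 8-bit brightness value is taken to span 0 to 255.

```python
import math

def model(x):
    """Stand-in for the trained network: returns sin(x)."""
    return math.sin(x)

def to_brightness(y):
    """Map a model output in [-1, 1] to an 8-bit brightness in 0..255."""
    y = max(-1.0, min(1.0, y))  # clamp in case the model overshoots
    return round((y + 1.0) / 2.0 * 255)

STEPS = 20  # hypothetical number of inference steps per sine cycle
brightness = [to_brightness(model(2 * math.pi * i / STEPS)) for i in range(STEPS)]
```

On a real microcontroller, each value would be written out as a PWM duty cycle on every pass through the main loop, so the LED fades up and down as the model traces out the sine wave.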

Matthew Stewart (36:03):
But yeah, I mean, the inability to communicate with these devices makes it very, sort of, difficult to do a lot of the typical, traditional things that we’re used to doing with these machine learning models.

Jeremie (36:20):
Yeah, it’s always the paradox of new inventions, because you have this early rush to do stuff with it, which in a sense is necessary in order to promote interest so that enough people notice it to go, “Oh, hey, maybe there’s a problem of interpretability here or explainability here,” and then invest resources in that direction. That’s really cool. So what do you think are the developments that we can … Maybe this is not a fair question. What are the developments that we can expect from this field in the next, let’s say, 12 months? Are people, is the average person going to be surprised by what we’re able to do with on-device machine learning?

Matthew Stewart (36:59):
Yeah, so that’s a tough question, because I really don’t know what people are planning to do with it. I’m sure we’ll see more of things like on phones, and myself, I work with drones, so maybe a lot more drone-based machine learning capabilities on nano drones specifically, but I think agriculture is going to be a huge-

Jeremie (37:22):
Sorry, how big is a nano drone? Just out of curiosity, what’s the scale?

Matthew Stewart (37:25):
Sort of like 10 centimeters square.

Jeremie (37:29):
Okay, so like pretty big, but pretty small.

Matthew Stewart (37:32):
Yeah. So it’s actually a mildly controversial thing in the field, because there’s lots of different classification systems for drones that have been developed, and they overlap, some of them, and some of them use weight, and some of them use wing span, some of them use propulsion system. But yeah, so the reason a nano drone is nice is because it’s small enough that if it hits somebody, it won’t take them out. It’ll just give you a bit of a shock, and then it’ll probably break the drone, but it certainly won’t hurt the human. So it’s like you have the kinetic energy of the rotors, it’s not particularly high if you have a very small payload, so they’d have to spin very fast. Yeah, so that’s one of the reasons nano drones are good, so if you want to look at air quality in a warehouse, like an Amazon warehouse or whatever, you can easily do that using nano drones, which is a good way for looking at things like emissions compliance, which is becoming a big deal these days with all these new regulations coming in. So that’s something that I’m expecting to see.

Matthew Stewart (38:47):
But again, this is a pretty niche application. I think agriculture is a big one, because that’s very much an edge intensive area. You want to know what’s going on in the field, you want to know if you have to water something, if there’s bugs going on, maybe you can detect stressors in plants and know that they need some love.

Jeremie (39:12):
Yeah. Well, I mean, personally, I’m fascinated by the idea of regulatory enforcement using these mechanisms, especially in industries where historically, enforcement of regulations has basically been impossible. You mention checking air quality in an Amazon warehouse or something like that, or going out into a field, our entire system of laws and jurisprudence has evolved in a context where implicitly, an awful lot of these laws simply can’t be enforced because we don’t have the data to do that enforcement. I’m really curious whether having the data is going to result in a shift in laws where we realize, oh my God, if we’re actually going to enforce the letter of the law here, now that we can, this is a totally unrealistic law to impose. You might find companies just collapsing under the weight of the law that’s already out there just with better enforcement.

Matthew Stewart (40:07):
Yeah, I mean, you kind of already see that, actually. So yeah, this is maybe a bit out of scope, but this is a very important aspect in the environmental science field, so I forget whether it’s New Mexico or California, but they have an ozone limit that’s been set by some international regulation of 60 parts per billion, which is reasonably low, and it’s much lower than most other countries, because the US is very developed compared to a lot of countries. But you get into difficult territory, because you actually get a lot of emissions from China that come over the Pacific Ocean. They partition into this sort of chemical which is able to withstand the journey without degrading across the Pacific, then it reforms on the western coastline, and it reforms ozone. So then actually you have, California cannot meet its own emission regulations because of emissions from China, and then you get into some weird, like, is this law okay, should we blame China for this? You get into some international battles there.

Jeremie (41:13):
I imagine there’s, yeah, I could, if I were a betting man, I could place a bet on the next decade bringing about a field of research into something like quantitative law, where people have to basically apply data science professionally to legal practice and start to say, “Well, what’s realistic here?” If we had this ability to monitor systems, monitor people even, which is I guess a whole other direction that this might take. It’s like, yeah, I mean, how do you crunch these numbers, how do you determine whether somebody is net net in violation of something, because presumably, I mean, if you look at a warehouse, I’m sure you’ll find a distribution of different emissions in some pockets. Maybe it’ll be above threshold, maybe in the vast majority it’ll be below, and then you have to get a little bit more nuanced. Like, is it the average, is it the fact that you have one pocket that’s dangerous that matters, and so on. Anyway, I’m really fascinated to see where this all takes us in the next few years.
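The question raised here, whether compliance should be judged on the average or on the worst pocket, can be made concrete with a small sketch. Everything in it is made up for illustration: the zone names, the readings, and the use of a 60 parts per billion threshold.

```python
from statistics import mean

LIMIT_PPB = 60.0  # illustrative threshold only

# Made-up readings (parts per billion) from different pockets of one warehouse.
readings = {"dock": 41.0, "aisle_a": 38.5, "aisle_b": 44.0, "packing": 72.5, "office": 35.0}

average = mean(list(readings.values()))
worst_zone, worst = max(readings.items(), key=lambda item: item[1])

compliant_on_average = average <= LIMIT_PPB  # rule 1: judge the mean reading
compliant_everywhere = worst <= LIMIT_PPB    # rule 2: judge the worst pocket
over_limit = sorted(zone for zone, ppb in readings.items() if ppb > LIMIT_PPB)
```

With these numbers the warehouse passes on the average but fails on the worst pocket, so the same data gives opposite answers depending on which rule the law adopts.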

Matthew Stewart (42:09):
Yeah, I could honestly see that happening. I mean, it might sound like a sort of batshit crazy idea initially, but yeah, I guess-

Jeremie (42:18):
[crosstalk 00:42:18]-

Matthew Stewart (42:18):
… the data science application to the legal field.

Jeremie (42:21):
Yeah. Well, you’re definitely playing a big part in building this future, hopefully for better, because so much of this is environmental applications. But are there any final thoughts you wanted to share on the sort of TinyML or environmental data science kind of things?

Matthew Stewart (42:36):
I guess, only that it’s sort of very, very early stages, and there’s not many resources on it. But certainly, if anyone wants to sort of play around with TinyML, all you need is something like an Arduino Nano, and perhaps an Arducam, and then you can go on to TensorFlow Lite and have a bunch of examples, you can sort of play around with it, and yeah, you could actually develop devices that could be used in your house as security cameras, magic wands I think was one of the examples. So yeah, I encourage people if they’re interested to play around with that and see what they can create, because everyone’s on a level playing field at this point, there’s no real experts on this area.

Jeremie (43:22):
A comforting thought to wrap up the conversation on. Thanks so much Matt, I really appreciate it. Do you want to share your Twitter handle? And we’ll share it on the blog post that’ll come with the podcast as well, just [crosstalk 00:43:32]-

Matthew Stewart (43:32):
Oh yeah, sure. It’s just MatthewPStewart. Very simple, yeah.

Jeremie (43:38):
Nice, awesome. Well, really appreciate it and thanks for coming on.

Matthew Stewart (43:40):
Thanks for having me. Take care.

