Lessons from the first broad testing #1

Last week we introduced the Calibur pocket box. It is once again a major leap in product design and features. Testing it is the last step before actual production.

The plan is roughly this:
1. Build a very reliable, stable and accurate device.
2. Use that as the basis for all the smart features in the application.

We ran a broad épée test back in November, which served to pinpoint the weaknesses, analyze the data and incorporate the findings into a new device. It’s worth summarizing the major takeaways. Today we’ll cover the technical issues and how we addressed them:

Connectivity

Let’s jump straight to connectivity. Connectivity is the bread and butter of a wireless system: wireless can only be as good as the connection behind it. Connection quality is judged by two factors, stability and latency; put more simply, how often the connection drops and how long it takes for a hit to be displayed. The boxes we shipped in November were inconsistent on both counts, so we put a lot of effort into fixing them.

Testing connectivity in different setups and distances

Stability:

How we measured it

A simple way to measure it: how far can you get without disconnecting? Mark out a 15-meter strip every 5 meters (with the scoring device at the middle of the strip, neither fencer gets farther away than 7 m, and in most cases much less), put the devices at each mark and note the results. Then repeat with the device in the back pocket, then again while covering it, then with walls in between, then underwater in a tin suit… on each and every phone.

In general, Apple devices disconnected much sooner than Android ones, but we were able to improve on both platforms. The progress here is obvious: we drove connection drops down to practically zero. Across a dozen different phones and tablets, from low-end to high-end, the connection always remained active. The challenge now is to find an otherwise well-functioning phone that fails this test, and I’m happy to say we haven’t found one yet.

Latency:

Latency is a more complex issue. How short is short enough? We found the threshold for an unnoticeable delay to be around 100 ms.

How we measured it

Latency consists of two parts: how long it takes to send the data and how long it takes the phone to process it. For the first part, it’s back to the marked strip: put two synchronized clocks on the pocket box and the phone, and repeat the stability test, but this time send the same data packet at each mark.
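With synchronized clocks on both ends, the transmission part of latency is simply the phone’s receive timestamp minus the box’s send timestamp. The sketch below groups those differences by mark and checks how much of the roughly 100 ms budget is left for processing on the phone. The tuple layout and function names are assumptions made for illustration, and the timestamps in the usage note are invented only to show the shape of the data.

```python
from statistics import median

# Threshold for an unnoticeable delay, as stated in the post.
UNNOTICEABLE_MS = 100.0


def transmission_latency_ms(samples):
    """Per-distance transmission latency from synchronized timestamps.

    `samples` is a list of (distance_m, sent_ms, received_ms) tuples:
    sent_ms is read from the clock on the pocket box, received_ms from the
    clock on the phone. Both clocks are assumed to be synchronized.
    Returns {distance_m: (median_ms, worst_ms)} ordered by distance.
    """
    by_distance = {}
    for distance, sent, received in samples:
        by_distance.setdefault(distance, []).append(received - sent)
    return {d: (median(v), max(v)) for d, v in sorted(by_distance.items())}


def processing_budget_ms(transmission_ms, threshold_ms=UNNOTICEABLE_MS):
    """Time left for the phone to process a hit before the delay is noticeable."""
    return max(0.0, threshold_ms - transmission_ms)
```

With invented samples `[(5, 0, 18), (5, 100, 121), (10, 200, 224), (10, 300, 322)]`, `transmission_latency_ms` returns `{5: (19.5, 21), 10: (23.0, 24)}`, and `processing_budget_ms(24)` leaves 76.0 ms for the app. Feeding the worst case rather than the median into the budget is the more conservative choice.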

The results are again very promising. The delay dropped significantly and stayed low up to around 15 meters. So the remaining delay is now down to the phone’s processing.

Bringing the phone’s processing time down proved harder than anticipated. But it’s clear that the delay is now mainly caused by the application, which will perform much better after restructuring. More on that later.

Accuracy

Basing accuracy on a growing data pool creates a chicken-and-egg problem: for people to use the devices, they need to be enjoyable, and for the devices to become enjoyable, people need to use them. So the deal with the broad épée test was that people would keep using them even when it wasn’t so enjoyable, we would quickly follow up with updates, and by the end of the test the devices would become a quasi-replacement for any épée scoring system.

After a very strong start, pandemic restrictions kicked in basically everywhere in December and the flow of incoming data slowed down. But we still collected thousands of bouts and tens of thousands of touches. Huge thanks to our testers for keeping it up despite all the hurdles! Two things became clear: 1. with the current model the data would not be sufficient (it’s not just the total volume that matters but a constant flow, so we can compare results after every version), and 2. we had certainly underestimated how much work it would be to maintain one system while developing a new one in parallel. So we moved on and focused on incorporating the existing data into the next version.

Touching the plastron, mask and bellguard, then swapping the weapons.

The goal was to make accuracy high enough that fencers would enjoy using the devices in training sessions, so the rest of the data would come much more easily. The two major questions for accuracy are: does it go off when it shouldn’t, and do we need calibration to avoid that? We improved tremendously on both.

How we measured it

How it works: This is actually the most complicated part. Touches are validated by a model that depends on the electrical properties of the weapon. Put simply, we measure these properties and gather them in the cloud, where machine learning algorithms process them. The validation model is then updated based on the new data and fed back to the devices through the application. Where the previous iteration measured one property, we now measure three, and the machine learning combines them.
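The post doesn’t say which algorithm combines the three properties. As a stand-in, here is a minimal logistic-regression sketch: three measurements per touch go in, a valid/invalid decision comes out, and the fitted weights are the small payload that could be sent back to the devices through the app. The features (assumed pre-scaled to roughly 0..1), the training data, the learning rate and the 0.5 decision threshold are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression validator on 3 hypothetical features.

    samples: list of (f1, f2, f3) measurements per touch.
    labels:  1 for a valid touch, 0 for one that should be rejected.
    Returns (weights, bias): the model that would be pushed to devices.
    """
    w = [0.0, 0.0, 0.0]
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of log loss w.r.t. the linear score
            for i in range(3):
                grad_w[i] += err * x[i]
            grad_b += err
        # Batch gradient descent step on the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def is_valid_touch(model, x, threshold=0.5):
    """Device-side check: combine the three measurements into one decision."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold
```

Retraining on the enlarged data set and pushing the new `(weights, bias)` tuple is one simple way to picture the update cycle described above.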

TL;DR: The general direction is very clear: latency is down, accuracy is up. We are going through the third wave (so far) of the pandemic, and with local clubs shut down it’s trickier to find ways to test sufficiently. In our testing the system works well in 90-95% of cases, so the job for the AI is to close the remaining gap, particularly on rare cases. Global testing will start next month with the goal of finding these outliers. As soon as the data feed starts to improve again, so will the system.

In the next posts we will cover the general feedback and future features, but to finish, here is a video of a short bout in the office.

Short bout in the office

Product evolution #2

We nicknamed the project “wire eater” among ourselves, so I will refer to the devices as WE versions. The plan was set in motion: ship WE-1, incorporate the feedback and develop WE-2 within 3 months. The goals for WE-2: make the app cross-platform (Android and iOS), get bellguard grounding accuracy to 90-95% on épée and deliver over 100 devices for the clubs to test. In other words, a broad beta test of WE-2, which we planned for November.

The sprint started in mid-August, and the first clubs buying WE-1 planned to restart fencing in September. We agreed to deliver in time for the reopening. For the first few weeks development and production had to run in parallel, but delivering a product to actual customers was very exciting. We finished just in time:

Some pictures from a training session @ PSE. The kids had fun, and we gathered valuable insights.

Posted by Calibur Fencing on Thursday, 1 October 2020

The kids intuitively started using our products and really enjoyed themselves

In the meantime we drew up a detailed roadmap for WE-2 around the three goals above. Let’s go through them one by one.

Making the app cross-platform

It shouldn’t take particularly precise planning: we take what we have for Android and replicate it for iPhones, right? Well, Apple strictly controls everything, so why wouldn’t that apply to wireless devices too? If we want to connect something to iPhones, we need to use a wireless chip approved by Apple. If you had to guess whether we had used one, where would you put your money? Right, on “no”. WE-1 only supports épée and has no grounding capability, but it has a very stable and fast connection with the phones, and that part was already fine-tuned. Changing the chip means throwing all that away and starting over.

Getting accuracy to 90-95%

Bellguard grounding should work 9+ times out of 10 in a test environment. Our model relies on a large data pool to operate. How would we get that? Obviously, gathering it at the clubs every time we change something is not an option. So a lot of nights went like this:

999 green bottles on the wall, if one green bottle should accidentally fall, 998 green bottles on the wall…
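All that repetition matters because a hit rate measured on a few dozen touches says little about whether the true rate clears 9 out of 10. One standard way to check is the lower bound of the Wilson score interval; this is our choice of method for illustration, not necessarily how the team evaluates its tests. The 90% target comes from the text, while z = 1.96 (about 95% confidence) is an assumption.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a success rate.

    z=1.96 corresponds to roughly 95% two-sided confidence.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - margin


def clears_target(successes, trials, target=0.9):
    """True if the measured touches make it fairly safe to claim >= target."""
    return wilson_lower_bound(successes, trials) >= target
```

With these defaults, 95 successful groundings out of 100 touches give a lower bound of roughly 0.89, which does not yet clear 90%, while 950 out of 1000 give roughly 0.93, which does. The same 95% rate only becomes convincing with many more repetitions.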

Both goals turned out to be an uphill battle, especially in the given timeframe. All the new wireless chips were either too slow or dropped the connection to the phones too easily. Every time we tested something in the lab and then took it to a real training session, it failed completely. Special thanks to OSC and their fencers for letting us put on our it-worked-yesterday-I-swear show twice a week. The connection drops proved to be a very stubborn issue, disrupting basically every other test we tried to run.


OSC helped a lot by sharing their space for testing

Producing 100 pieces

We were approaching the end of October. The clubs that had signed up for testing were all set and waiting for their devices, we had just finished the new enclosure design, and for the first time the application was approved on both platforms.


It's much more convenient to connect the yellow penguin than a QR code

But the connection issues were still with us. With only a week left, we bet on a last-resort solution: changing the antenna design. We also urgently needed a plan B, and getting the devices out of the pockets made the connection slightly better.


Desperate times call for desperate measures

I’d better not go into detail about the things we tried on that last weekend in a rush to find a way to attach the devices to the body. The new antenna design, though far from perfect, proved to be above the threshold, and we decided to give it a go.

We got everything ready, and after an intensely laborious week on 3 hours of sleep a night on average, we had tested, packed and labelled all the packages for shipping. And the épée beta started.

Final touches before sending out the beta test packages (Nov. 11).

Posted by Calibur Fencing on Sunday, 22 November 2020

Getting to test our devices

As the next stage of testing approaches, we are looking for avid testers on foil and épée. You can take part in 2 main ways: by signing up or by preordering. (For more info, see the bottom of this post.)

How is the testing planned?

Due to COVID there are still many uncertainties around development. First, most fencing venues have been forced to close, so it’s hard to test in real-life conditions. Second, packages are very often delayed; for some parts we had to wait nearly 2 months. Moreover, shortages occur much more often, and sometimes even basic parts are unavailable. Third, if someone among our family or friends shows symptoms associated with the virus, we also need to take measures, and hardware development can’t really be done remotely.

These factors make planning significantly more complicated. However, we do as much as possible to keep development on track, so we made a more detailed addition (below) to the public roadmap leading up to product release. To understand the roadmap, some background on testing stages helps. Testing can be split into 2 major parts: alpha and beta testing.

Alpha testing

We test mostly in-house. It usually goes like this: we build something, get into fencing gear, try it out, fail miserably, then start over. The goal is to get a proof of concept, so we go through a wide range of ideas and possible features. (We have a small desk dedicated to our attempts, which we call the Calibur museum; we’ll go through some of them in a future post.) When we have done this enough times and feel the need to share the experience, we go out to local fencing venues. This is where beta testing begins.

Beta testing

This can also be split into 2 steps: narrow and broad beta testing. In narrow beta testing we go to local venues and test a specific part of the device, such as the connection or certain types of situations during bouts. We have 2 goals here: to make the concept accurate enough and stable enough. As soon as we cross the threshold we have set, we can start broad beta testing. In this phase we send out devices to all our partners to use as they would during normal fencing. The goal is to get broad usage feedback and to collect data.

As you can imagine, broad testing is the most resource-hungry part of testing, and since there are 3 weapons in fencing, we would need to do it 3 times.

Fortunately, we set up a model that takes much of that burden off this phase. Since the product will constantly adapt to the data it receives, if you buy the device you will, in a way, be a broad beta tester and help all the other fencers get a better product. Thanks for your great work!

Testing roadmap for the latest version of the device

On the bigger scale, we have already beta tested 3 versions, most recently in November with a broad épée beta on 100+ devices. That went well, and since then we’ve rebuilt the system from the ground up to better suit all 3 weapons. We plan the product release for April, so the devices in this run will be very close to the final product. If things go as planned, only the software will need adjusting after that. The red lines just before product release represent the final broad beta test.

How can you participate?

Signing up

First, sign up below. Second, hop on a video call with us: we aim for the most diverse set of testers possible, so we will send out video call invitations to go through the details. Third, receive your package and fence. The basic requirements are to use each device for at least 5 hours a week and to be available for weekly or bi-weekly feedback calls over about 4-8 weeks.

Preordering

Another way to participate is to preorder a set and request a test device. In that case you will receive a test set first and your final set later. Anyone in the first batch of preorders can request test devices. Preorders will start this month.

Giveaways

The third way is to receive one in a giveaway. We will give devices away to the most active members of the community, to university clubs and others. Giveaways will be announced by e-mail; to receive the announcements, just sign up below.

So please sign up if you haven’t already, and help us get the word out by sharing this article.