Announcing OpenTestData.org, plus Q&A with Jason Arbon on Why This Matters

Jul 22

Test.ai is excited to introduce OpenTestData.org, an open-source repository of test data that helps teams speed up testing.

If you've ever tested an app, you have probably wondered: wouldn't it be a lot easier if I could just go to one place and find the test artifacts I need, labeled and ready to go? That's the idea behind OpenTestData.org. As apps become more powerful and AI-centric, app teams need more test data than ever to ensure their apps work the way they're supposed to. Rather than always creating test plans from scratch, testers can use the existing artifacts in OpenTestData.org to get a jump-start on established testing patterns and avoid reinventing the wheel.

OpenTestData.org is built from the contributions of professionals in the testing community. We are looking for contributors to help build a foundation of test data for use by the community. Those interested can get started by visiting http://opentestdata.org and clicking Join.

This idea has been marinating in the minds of both Jason Arbon, CEO of test.ai, and Jonathan Lipps, founder of Cloud Grey, for quite some time. I sat down with Jason to chat about why this matters.

Why did you and Jonathan Lipps of Cloud Grey start OpenTestData.org (OTD)?

Jonathan and I have both seen test teams struggle to come up with great test input data. Sure, they can write a lot of manual or automated tests, but coming up with test input values is a ‘creative’ task today when it should be far more of an ‘engineering’ task. We want to elevate this aspect of testing into something more akin to an engineering discipline, with standards, consistency, and benchmarking.

For example, have you ever caught yourself dreaming up different values to test an address field in your app? You try your home address, 1600 Pennsylvania Ave, and your work address, and then you start to run out of good ideas. What is the longest valid address in the U.S.? The shortest? What does an invalid address look like? And what about test input for that email field?

All these test input values will be stored on OpenTestData.org, so you can download all sorts of great test addresses, emails, and more. More importantly, it's not just quick test data; it's data curated by the community. The best address field testers in the world (e.g., the folks at FedEx, UPS, and Amazon) know a lot more about address test inputs than the average tester, so it makes sense for them to share their knowledge with the rest of the testing community. They are the experts, and everyone can benefit from their experience.

As some folks may have noticed, I’ve been noodling on this idea for a while now. I bought the domain name perhaps three years ago. There were two real triggers for us to get moving on it. First, Jonathan and I were working together on the new AI-based selectors for Appium, found that we shared a lot of the same thinking about how to design test automation, and discussed this idea. Second, a selfish interest: I now have a lot of new AI bots to feed with data. :) These bots are testing thousands of login screens, and they could be testing even more if they had great test data to pull from. So, opportunity, validation, and value all coincided.

What is the mission of OTD?

The mission of OTD is to facilitate the sharing of test artifacts among testers. Most test automation starts from scratch every time, meaning that your test case design is limited by the people you happened to hire and how creative they are feeling at the time.

It is a two-sided market. We need testers willing to contribute their test data and testers willing to trust other testers’ data enough to download and use it. We spent a lot of time thinking about how to motivate both sides of the market. Testers who upload will gain ‘points’ and ‘badges’ when other testers upvote and download their test data. Testers who download great test data will benefit from easy-to-use and best-of-breed test inputs to parameterize their test cases. The badges and points are designed to motivate testers to also share OTD with the community.

At its root, the mission of OTD is to build a community among testers, driven by real-world test artifacts themselves, not just by meetups, conferences, or social media.

In an ideal scenario, what’s the best outcome you could hope for with OTD?

The best outcome for OTD would be one where two things happen. One, the best test data designers in the world are recognized for their contributions; and two, 80% of the test input data for manual or automated testing is pulled from OTD, and those test results are used to benchmark the basic quality of an app. If teams use the same test input, you can see how your app's pass/fail results compare with those of other notable applications. Ultimately, that is the beginning of quantifying quality, which will help turn software testing into more of an engineering (vs. artisan) discipline and improve quality at scale.

How do you anticipate OTD will be used?

There are two primary usage scenarios. Manual testers will download lists of these test data inputs and use them to create variations in their regression or exploratory testing cycles. Automation test developers will connect in real time to an OTD API, asking for test data with a particular ‘texture’. For example, the test automation code for the address field of an application can query for [‘address’, ‘U.S.’, ‘negative’, ‘edge’], retrieve 10 address values that shouldn’t ‘work’, and try all 10 of them in a simple loop, delivering more coverage.
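The query-then-loop pattern described above might look something like the following Python sketch. It is illustrative only: the in-memory `RECORDS` list stands in for an OTD API response, and the record shape, the `query` helper, and the toy `looks_like_us_address` validator are all hypothetical, not part of OTD.

```python
# Hypothetical record shape: each test value carries a set of tags.
# These records are illustrative placeholders, not real OTD data.
RECORDS = [
    {"value": "1600 Pennsylvania Ave NW, Washington, DC 20500",
     "tags": {"address", "U.S.", "positive"}},
    {"value": "", "tags": {"address", "U.S.", "negative", "edge"}},
    {"value": "123", "tags": {"address", "U.S.", "negative", "edge"}},
    {"value": "!!!!, ??, 00000", "tags": {"address", "U.S.", "negative", "edge"}},
]


def query(records, texture, limit=10):
    """Return up to `limit` values whose tags include every tag in `texture`."""
    wanted = set(texture)
    return [r["value"] for r in records if wanted <= r["tags"]][:limit]


def looks_like_us_address(text):
    """Toy stand-in for the app's address validation (hypothetical)."""
    parts = [p.strip() for p in text.split(",")]
    return (
        len(parts) >= 3
        and any(ch.isalpha() for ch in parts[0])  # street part has letters
        and parts[-1][-5:].isdigit()              # ends in a 5-digit ZIP
    )


# Ask for a 'texture' of test data, then try every value in a simple loop.
negative_edge = query(RECORDS, ["address", "U.S.", "negative", "edge"])
for value in negative_edge:
    # Each negative value should be rejected by the code under test.
    assert not looks_like_us_address(value), f"accepted invalid address: {value!r}"
```

In a real suite, `query` would be replaced by whatever call the OTD API actually exposes, and the validator by a call into the application under test; the loop itself stays the same.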

The test case data is tagged with some built-in primitives, such as ‘positive’ or ‘negative’, indicating whether the app should be expected to accept or reject the input. The community can also add arbitrary tags to the data to further refine test data queries.
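One way to put the built-in primitives to work is to derive each check's expected outcome from the tag itself, so positive and negative inputs can share a single parameterized test. The Python sketch below assumes that ‘positive’ means "should be accepted" and ‘negative’ means "should be rejected"; the `EMAILS` records, the ‘missing-at’ tag, and the toy `naive_email_check` validator are hypothetical.

```python
def expected_to_accept(tags):
    """Map the built-in primitives to an expectation (assumed semantics)."""
    if "positive" in tags and "negative" in tags:
        raise ValueError("record is tagged both positive and negative")
    if "positive" in tags:
        return True
    if "negative" in tags:
        return False
    raise ValueError("record has no positive/negative primitive")


def check_all(records, validate):
    """Return the records where the validator's verdict contradicts the tag."""
    return [r for r in records
            if validate(r["value"]) != expected_to_accept(r["tags"])]


# Hypothetical data; 'missing-at' is an example of an arbitrary community tag.
EMAILS = [
    {"value": "tester@example.com", "tags": {"email", "positive"}},
    {"value": "tester@", "tags": {"email", "negative"}},
    {"value": "no-at-sign.example.com", "tags": {"email", "negative", "missing-at"}},
]


def naive_email_check(text):
    """Toy validator standing in for the app under test."""
    return "@" in text and "." in text.split("@")[-1]


mismatches = check_all(EMAILS, naive_email_check)
# Arbitrary tags narrow the selection further:
missing_at = [r for r in EMAILS if "missing-at" in r["tags"]]
```

An empty `mismatches` list means the validator agrees with every tag; a validator that accepts everything would be flagged on both negative records.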

What are the pros and cons of test data being openly available for use via OTD?

Testers always think about the cons. :) There are cons, or at least I can anticipate some angst based on early discussions with folks.

IP: What if testers upload the intellectual property of the company they work for? In practice, companies just don’t see test artifacts as IP they want or need to protect. Often they are eager to share their test cases if someone else is trying to help. Testers who are paranoid can simply ‘clean room’ the test input values on their own weekend time, or they can choose not to participate and go about their day.

PII: Personally identifiable information could be uploaded. Perhaps a tester accidentally uploads patient medical records or other ‘dangerous’ information as test input data. A small group of reviewers will check the data before it is made widely available, and there will also be a simple mechanism to request a takedown of data.

Offensive content: As with PII, any offensive material folks upload will be reviewed before it becomes publicly available, and there will be quick ‘take down’ buttons.

Dumbing down testers: Perhaps the most anticipated, and most ridiculous, concern will be that OTD data could ‘dumb down’ testing, removing the creative human aspect of testing because testers can simply download hundreds of tests without thinking. My response is that people who raise this must not have much faith in their fellow testers. Testers will actually learn the patterns of highly rated test input data by studying the experts’ data; from there, they should use the OTD data and spend the time they save coming up with application-specific test cases. Not having to worry about the mundane should make their testing better, not worse.

The pros? Better testing, better software quality, and awesomeness.

How can people and/or companies contribute to the OpenTestData.org project?

Anyone can sign up with just an email address. Folks can even be pseudonymous or have multiple accounts. Uploading test data can be done via a simple CSV file or through the API. We are working to make the process of contributing as easy as possible.
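For contributors preparing a CSV upload, the Python sketch below shows one way to generate the file. The column layout (a `value` column plus a `tags` column with tags joined by `;`) is an assumption for illustration, not OTD's documented format. Using the standard `csv` module rather than string concatenation matters because test inputs often contain commas or quotes, and the module quotes those fields correctly.

```python
import csv
import io


def to_contribution_csv(rows):
    """Serialize (value, tags) pairs to CSV text.

    Assumed layout (hypothetical): header 'value,tags', with tags joined by ';'.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["value", "tags"])
    for value, tags in rows:
        writer.writerow([value, ";".join(sorted(tags))])
    return buf.getvalue()


csv_text = to_contribution_csv([
    ("tester@example.com", {"email", "positive"}),
    ("a,b@example.com", {"email", "negative"}),  # embedded comma gets quoted
])
```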

The code behind the project, not just the data, is also open-source, so folks can request to contribute to the GitHub project and add even easier ways to upload data or make the site or API better.

How will the test data submitted to OTD be available to people and companies that want to use it in their own projects?

The data in OTD is open source and available to all, both humans and robots. Humans can use the website to search for the test data they want (by tag) and then simply click the download button to get a CSV file. For all the robots out there, there is a REST API they can call to search for data, and the server will respond with that data in real time.
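Once a CSV file has been downloaded, a few lines of Python can turn it into parameter tuples for a test runner. This sketch assumes the same hypothetical layout as above (a `value` column and a `;`-separated `tags` column); the actual column names in OTD downloads may differ, and `SAMPLE` is made-up data.

```python
import csv
import io


def load_test_data(csv_text, required_tags=()):
    """Parse a downloaded CSV (assumed columns: value, tags) and keep the
    rows that carry every tag in `required_tags`."""
    wanted = set(required_tags)
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = {t for t in row["tags"].split(";") if t}
        if wanted <= tags:
            rows.append((row["value"], tags))
    return rows


# Hypothetical downloaded file contents.
SAMPLE = (
    "value,tags\n"
    "1600 Pennsylvania Ave NW,address;U.S.;positive\n"
    "\"123, , 00000\",address;U.S.;negative;edge\n"
)
negatives = load_test_data(SAMPLE, ["negative"])
```

The resulting `(value, tags)` tuples can be passed straight to a parameterization mechanism such as pytest's `pytest.mark.parametrize`.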

Note: Please be nice and smart about caching; test.ai is footing the bill for hosting.
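Being "smart about caching" can be as simple as remembering each query's response for a while, so repeated test runs don't hit the server again. The Python sketch below wraps any fetch function in a per-query cache with a time-to-live. The `fetch` callable is a placeholder for the real API request, since the actual OTD endpoint isn't specified here, and the 24-hour default TTL is an arbitrary choice.

```python
import time


class CachedFetcher:
    """Cache responses per tag query for `ttl` seconds.

    `fetch` is any callable taking a tuple of tags and returning data; here it
    stands in for a real (unspecified) OTD API request.
    """

    def __init__(self, fetch, ttl=24 * 3600, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # key -> (timestamp, data)

    def get(self, tags):
        key = tuple(sorted(tags))  # same tags in any order share one entry
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no network call
        data = self._fetch(key)
        self._cache[key] = (now, data)
        return data
```

Injecting `clock` keeps the cache testable without waiting in real time; a persistent variant could write the same entries to disk between runs.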

You are the CEO of test.ai. How does OTD relate to your company?

Materially, we are funding the development of the project, covering hosting, and will be doing a bit of marketing work to get the word out.

Just like human testers, our AI bots will soon use this data to parameterize their testing.

Most importantly, I think that in the future, AI-based testing artifacts such as object classifiers, AI-based test automation, and test results could also be shared on OTD. For now, though, we are starting humbly with sharing test data.

Ryan Chan