OUR MISSION OF RADICAL TRANSPARENCY
Where We Get Our Data
Data. It’s everywhere. But which data can you trust, and which will lead you down the dark path toward despair and demise? We use scraping to extract data from “trusted” sites on the web and compare them to one another, looking for inconsistencies that help us gauge qualities like trustworthiness and confidence.
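As a hedged illustration of that cross-source comparison, here is a minimal sketch in Python. The source names, brightness figures, and the 20% tolerance are assumptions for demonstration, not our actual configuration.

```python
# A toy consistency check: flag sources whose reported value strays too far
# from the cross-source median. All names and numbers below are made up.
from statistics import median

def flag_inconsistencies(values_by_source: dict[str, float],
                         tolerance: float = 0.20) -> list[str]:
    """Return sources deviating from the median by more than `tolerance`."""
    mid = median(values_by_source.values())
    return [
        source
        for source, value in values_by_source.items()
        if mid and abs(value - mid) / mid > tolerance
    ]

# Example: three sites report a TV's measured peak brightness (in nits).
print(flag_inconsistencies({"site_a": 980.0, "site_b": 1010.0, "site_c": 620.0}))
# -> ['site_c']: it disagrees with the consensus and warrants a closer look
```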
What makes us different? Our data collection process follows a rigorous, evidence-based scientific framework. The data we present is both reliable and insightful, helping you navigate the vast information landscape with confidence.
EXTRACTING DATA FROM THE WEB IS DECEPTIVELY SIMPLE.
Since the internet is filled with over 6 billion indexable pages and 1.2 million terabytes of data, identifying reliable information is paramount.
While the internet uses many complex technologies to deliver webpages to your browser, my focus here is to give you a clear understanding of how we gather raw data and transform it into meaningful information that’s accurate and relevant.
We employ a team of humans (yes, real people) who collect the data manually. Scraping with machines can be difficult, time-consuming, and, most importantly, error-prone. Humans make fewer mistakes here, even though, as individuals, we all make preventable errors every day.
Here’s an overview of our data scraping methodology:
Expert Test Results
Our team manually scrapes trusted expert test results across the web, collecting quantitative data for key Performance Criteria.
We then average the test results, review them carefully, and compare them against other products in our guides and reviews. This process provides you with a comprehensive perspective on each product’s performance, as the short example below shows.
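To make the averaging step concrete, here is a minimal sketch. The products, criterion, and scores are invented for illustration.

```python
# Illustrative only: average scraped expert test results for one
# Performance Criterion (brightness), then compare across products.
results = {
    "TV-A": [8.7, 9.1, 8.4],  # scores collected from three expert reviews
    "TV-B": [7.9, 8.2, 8.0],
}

for product, scores in results.items():
    print(f"{product}: {sum(scores) / len(scores):.2f}")
# TV-A: 8.73
# TV-B: 8.03
```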
Before any of this, we must first develop a Testing Methodology for each category. This groundwork determines which Performance Criteria are truly important to test or evaluate.
Expert & Customer Product Ratings
The team scrapes Expert Product Ratings and Customer Ratings.
We manually collect Expert Product Ratings only from trusted experts who have earned a Trust Rating of at least 60%. The Trust Rating evaluation is a completely manual process.
The Customer Ratings are collected with an AI tool that analyzes customer sentiment, relevance, and authenticity.
Together, the ratings are synthesized using our Bayesian model to calculate the True Score for each product.
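One common way to synthesize ratings of varying volume is a Bayesian average, sketched below. The prior mean, prior weight, and sample ratings are assumptions for illustration; this is not the published True Score formula.

```python
# A Bayesian average pulls small rating sets toward a prior; as the number
# of ratings grows, the observed data dominates. All numbers are made up.

def bayesian_average(ratings: list[float],
                     prior_mean: float = 7.0,
                     prior_weight: float = 10.0) -> float:
    """Blend a prior belief with observed ratings, weighted by volume."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))

expert_ratings = [8.5, 9.0, 8.8]            # few, but trusted
customer_ratings = [9.0] * 40 + [6.0] * 10  # many, noisier
print(f"{bayesian_average(expert_ratings):.2f}")    # 7.41: sparse data stays near the prior
print(f"{bayesian_average(customer_ratings):.2f}")  # 8.17: volume moves the estimate
```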
How do we know what to test? First, our team consists of testing experts with years of experience, so we know exactly what needs to be tested and which test data to collect.
Second, the Testing Methodologies are the foundation of our entire process, from the Expert Test Results to the Trust Ratings. They synthesize trusted expert testing processes and real customer pain points to determine which Performance Criteria matter when evaluating a product’s performance.
You’re welcome to explore our testing methodologies in each category, such as TVs, soundbars and speakers.
By being transparent about our data collection methods, I aim to provide you with insights you can trust, helping you make informed decisions in a complex digital world.
WHAT IS SCRAPING?
In a nutshell, scraping is the harvesting of data from a web page or similar resource. It is sometimes referred to as ‘web scraping’, ‘web harvesting’ or ‘web data extraction’.
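To ground the definition, here is a minimal, generic scraping sketch in Python using the requests and BeautifulSoup libraries. The URL and the h1 selector are placeholders, not a real target of ours.

```python
# Fetch a page and extract one piece of information from its HTML.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/review", timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")
headline = soup.select_one("h1")  # e.g. the review's headline
print(headline.get_text(strip=True) if headline else "No headline found")
```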
HOW DO WE USE IT?
PRICE MONITORING
We look for price changes, sales, and new, refurbished, and discontinued products, among other things (see the sketch after this list).
NEWS AGGREGATION
We use sentiment analysis of news coverage as an alternative data source for the products and services featured on the site.
SOCIAL MEDIA
We look for social signals and influencer activity, including follower growth and other engagement metrics.
REVIEW AGGREGATION
We extract reviews from a range of websites related to products, services and more.
SEARCH ENGINE RESULTS
We monitor search engine results page (SERP) activity, including videos, images, and marketplaces.
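As a concrete illustration of the price-monitoring use case above, here is a minimal sketch. The prices and the 5% threshold are invented for demonstration.

```python
# Compare the latest scraped price against the previously stored one and
# report meaningful changes. Numbers and threshold are illustrative.
def price_alert(previous: float, current: float,
                threshold: float = 0.05) -> str | None:
    """Return a human-readable alert if the price moved at least `threshold`."""
    change = (current - previous) / previous
    if abs(change) >= threshold:
        direction = "drop" if change < 0 else "increase"
        return f"Price {direction} of {abs(change):.0%}"
    return None  # change too small to care about

print(price_alert(499.99, 449.99))  # Price drop of 10%
```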
HOW DO WE DO IT?
To power our system, we created a multifaceted process we call TAVNO (Target, Acquisition, Validation, Normalization & Output). All of its elements play a key part in building and maintaining our model.
In principle, the process runs sequentially. However, each segment of the process has its own set of parameters and arguments that must be satisfied before it can complete its cycle. I won’t go into too much detail on the individual segments, but here is a breakdown of the model, followed by a simplified sketch:
TARGET: Websites are selected based on a list of predefined criteria; we then analyze the structure of those sites and identify the valuable information and metrics they contain. Each site is typically unique, so the process can be wide-ranging.
ACQUISITION: The second stage involves using the information we have identified in the previous stage to define data points. We then scrape the resource for the most insightful and important information.
VALIDATION: Accuracy of information is of the utmost importance. We adhere strictly to the computer-science principle of GIGO (Garbage In, Garbage Out), so we validate all the information we collect.
NORMALIZATION: Once we have scraped and validated the information, we use a variety of techniques to normalize the data and create meaningful relationships between various metrics.
OUTPUT: To turn the data into meaningful information, we use a set of predefined associative data structures to map it onto our site. This enables us to provide actionable insights you can use for a wide range of decisions.
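Here is the promised sketch of a TAVNO-style pipeline. Every stage is a toy stand-in: the sites, scores, validity rule, and 0-10 scale are assumptions for illustration, and the real parameters and data formats are internal.

```python
# A simplified, self-contained TAVNO pipeline. Each function mirrors one
# stage of the process; all data below is invented.

SITES = {  # assumed inputs: site URL -> the score it reports for a product
    "https://site-a.example/review": {"product": "TV-X", "score": 8.6},
    "https://site-b.example/review": {"product": "TV-X", "score": 87.0},  # 0-100 scale
    "https://site-c.example/review": {"product": "TV-X", "score": -3.0},  # garbage
}

def target() -> list[str]:
    """TARGET: select sites that meet predefined criteria (all of them, here)."""
    return list(SITES)

def acquire(url: str) -> dict:
    """ACQUISITION: scrape the defined data points (a simple lookup, here)."""
    return dict(SITES[url])

def validate(record: dict) -> bool:
    """VALIDATION: reject garbage before it enters the model (GIGO)."""
    return 0 <= record["score"] <= 100

def normalize(record: dict) -> dict:
    """NORMALIZATION: map every score onto a common 0-10 scale."""
    if record["score"] > 10:
        record["score"] /= 10
    return record

def output(records: list[dict]) -> None:
    """OUTPUT: surface the cleaned data as meaningful information."""
    for r in records:
        print(f"{r['product']}: {r['score']:.1f}/10")

records = [acquire(url) for url in target()]
output([normalize(r) for r in records if validate(r)])
# TV-X: 8.6/10
# TV-X: 8.7/10
```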
PUBLIC DATA VS PRIVATE DATA
We’ve been throwing the word “data” around a lot. Everyone has it, and the conversation around collecting it hasn’t always been positive. So let’s get ahead of some questions and make things clear.
HERE’S WHAT WE’RE NOT DOING
If the content is behind a paywall and not publicly available, we don’t collect it. It’s private, and we treat it as such. However, if the data is accessible by public means, such as the schema data that Google routinely collects, then we’ll collect it.
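For context, schema data usually means schema.org markup embedded in a page as JSON-LD, the same structured data search engines read. Here is a hedged sketch of extracting it; the URL is a placeholder.

```python
# Pull public schema.org Product data (JSON-LD) out of a page's HTML.
import json

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/product", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(tag.string or "")
    except json.JSONDecodeError:
        continue  # skip malformed blocks
    if isinstance(data, dict) and data.get("@type") == "Product":
        print(data.get("name"), data.get("aggregateRating"))
```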
HERE’S WHAT WE ARE DOING
Generally speaking, most review websites are public; some are paywalled, and some require an email sign-up. Even so, we don’t collect all the content and then republish it. We summarize the findings of publications and use their scores and test data to expedite and simplify the customer journey, saving you, the reader (or consumer), hours of research before you buy your next product.
Public data is the name of the game. If it’s public, it’s legal to scrape. That’s the upshot of the US Ninth Circuit Court of Appeals’ ruling in hiQ Labs, Inc. v. LinkedIn Corp.
If it’s not public, it’s not legal to scrape. That’s it. We have no use for personally identifiable information. We’re helping you make decisions on products by giving you facts about products, testers, testing criteria, and markets. We’re not interested in whether or not you like the color red.