If "Online RE Listings Fall Short", Imagine The Reports

Posted by urbandigs

Sun Jun 12th, 2011 12:30 PM

A: Long-time readers of UD know how long we spent scrubbing the RLS data to build the real-time tracking platform you see here today. I know progress was slow, but when it comes to engineering out data integrity issues embedded in 30,000,000+ broker status updates over 6 years, you can imagine how daunting a task it was. Make no mistake about it: the reason we spent 14 months data mining (mid 2009 to late 2010; the site launched in October) was that we knew the reports brokers and consumers use to track the marketplace are 100% dependent on the quality of the listings data. The UD 'data cleansing process' is the secret behind all the tools available to subscribers, and chart accuracy is our ongoing primary mission. After all, the reports are only as good as the data behind the scenes.

The Wall Street Journal discusses how "Online Real-Estate Listings Can Fall Short" today and states:

As home buyers cautiously re-enter the market, they're arming themselves with information found online far more than what existed pre-housing crash. A record nine out of 10 house hunters searched online last year, according to the National Association of Realtors.

But with this great migration online has come a new set of obstacles, including errors, out-of-date information and properties that are listed on the Web but aren't actually for sale. The most common problems are simply errors...

For example, some real-estate agents keep listings on their personal websites long after they've sold; when home buyers contact the agent inquiring about the property, they're instead pitched new properties that might not meet their criteria, says Leonard Baron, principal of real-estate consulting firm LPB Services.

Such lagging information is more common with smaller firms' websites and could be a function of real-estate agents simply forgetting to update those listings, says a spokesman for the National Association of Realtors.

Either way, for buyers, it's a waste of time.
Granted, this article was written about MLSs across the country, but it still applies to the way Manhattan listings are shared.

We should always remember that when it comes to processing 1000s of listing updates a day from 100s of different agencies, errors occur! The goal should be to consistently stay on top of the new issues that pop up over time. Trust me, the firms and the brokers are aware of it, and I always speak out to the industry to UPDATE YOUR LISTINGS!! It's better for everyone when the data is up to date and accurate! Deep down, I believe that all the major firms care way more about data quality than many think.

Without divulging too much, I can tell you a few integrity issues that have affected the Manhattan listings data:

1) Errors in the sharing process
2) Internal system triggers / Redundancy
3) Stale/Obsolete Listings
4) Violation of natural flow of listing process / ACRIS as verify point

Let me explain.

1. Errors in Sharing Process

To join the REBNY Listing Service (RLS) you must be a Broker A and choose one of the vendors that handles data sharing. Your choices are Realplus, OLR, BrokersNYC, and RealtyMX. Most firms use either Realplus or OLR. All data and status updates from these vendors' clients (the brokerage firms) are sent to a processing mechanism that takes in each update and then delivers it back out to all REBNY member firms. This is how an agent's listing at, say, Halstead is eventually shared with a broker at another firm.

Sometimes there are errors in this vendor sharing process. Here is an example of one individual listing exposed to this integrity issue at 505 West 47th, Unit 4EN:


Imagine how a reporting platform would deal with such issues. Due to this error alone, this listing can be, and likely is, counted dozens of times as each new status change is processed. Even the price and the maintenance are constantly switching. There are many varieties of this kind of problem and each had to be dealt with individually. It's not as simple as removing duplicates! If you want accurate charts, you need to adapt to the poisons that are doing more damage than good to the measured data.
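To make the double-counting problem concrete, here is a minimal Python sketch. It is not our actual cleansing process (which, as noted, goes well beyond removing duplicates). The listing IDs, dates, and prices are made up for illustration, and the field layout is an assumption. It only shows how a naive tally of status updates differs from a tally of each listing's most recent state.

```python
from collections import Counter

# Hypothetical feed rows: (listing_id, update_date, status, price).
# All values are invented for illustration only.
updates = [
    ("A-1", "2011-05-01", "ACTIVE", 599000),
    ("A-1", "2011-05-02", "ACTIVE", 615000),
    ("A-1", "2011-05-03", "ACTIVE", 599000),
    ("B-2", "2011-05-02", "IN CONTRACT", 850000),
]

def latest_state(rows):
    """Keep only the most recent update for each listing ID."""
    latest = {}
    # ISO-formatted dates sort correctly as plain strings.
    for listing_id, update_date, status, price in sorted(rows, key=lambda r: r[1]):
        latest[listing_id] = (update_date, status, price)
    return latest

# Naive tally: every status update is counted as its own listing.
naive_counts = Counter(status for _, _, status, _ in updates)

# Cleaner tally: one state per listing, taken from its latest update.
clean_counts = Counter(status for _, status, _ in latest_state(updates).values())

print(naive_counts["ACTIVE"], clean_counts["ACTIVE"])
```

In the naive tally the same unit shows up as three active listings; collapsing to the latest update counts it once.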

2. Internal System Triggers

Our sharing system has internal triggers when a listing expires that can affect the status of that listing. This will result in some listings switching to an Off-Market state, when in reality the listing is still either ACTIVE or IN CONTRACT. Unless this is properly engineered, a listing can be placed in the wrong category and measured incorrectly in the data reports.

3. Stale/Obsolete Listings

A major issue. As I write this post I see that OLR shows Total Manhattan Inventory at 10,800. Our systems show Manhattan Active Inventory at 7,824. That means that OLR is roughly 38% higher than UrbanDigs' measure of Manhattan supply (10,800 / 7,824 ≈ 1.38).

I would guess the main reason lies in how the settings to measure what is counted as an 'Active Unit of Inventory' are tweaked. In the UrbanDigs platform, every listing state has rules that define when a listing should NOT be counted due to a lack of status updates by the listing agent.

I don't want to give away our proprietary settings, but I assure you the proper research was done in order to figure out the best settings for each listing state (i.e., active inventory, pending sales, and off-market inventory). Just as the WSJ.com article states, "around 21% of the data agents individually submit for posting on real-estate websites isn't updated when changes are made to the price or when the property is sold". That applies in our market as well, and the UrbanDigs real time platform was engineered to focus on the freshly updated listing information being processed in the front end, while ignoring the ongoing obsolete information going out the tail end. Every day our charts/data tables take in fresh updates and spit out obsolete ones! That is why our platform is so sensitive to real time changes that are occurring as you read this. It is also why our charts tend to 'fit' together and complement each other; e.g., pending sales properly lead the pace of ACRIS sales.
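The general idea of a per-state staleness rule can be sketched in a few lines of Python. The day limits below are placeholders invented for this example and are NOT our real settings, which remain proprietary; the function name and status labels are likewise assumptions.

```python
from datetime import date

# Hypothetical cutoffs, in days without a status update.
# These numbers are placeholders, not the real (proprietary) settings.
MAX_DAYS_WITHOUT_UPDATE = {
    "ACTIVE": 90,
    "IN CONTRACT": 180,
    "OFF MARKET": 365,
}

def is_counted(status, last_update, today):
    """Return True if a listing is fresh enough to count in its state."""
    limit = MAX_DAYS_WITHOUT_UPDATE.get(status)
    if limit is None:
        # Unknown states are never counted in this sketch.
        return False
    return (today - last_update).days <= limit

today = date(2011, 6, 12)
fresh = is_counted("ACTIVE", date(2011, 6, 1), today)   # updated 11 days ago
stale = is_counted("ACTIVE", date(2011, 1, 1), today)   # no update in 162 days
print(fresh, stale)
```

Each listing state gets its own limit, so a listing that has gone quiet drops out of the count even if the agent never marks it as sold or withdrawn.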

4. Violation of natural flow of listing process / ACRIS as verify point

Our daily ACRIS sales feed has multiple functions in the UD tracking platform. One of these functions is to act as the "verify point" for real time inventory data.

No longer can stale listings "infect" our charts. We engineered our systems so that as sales data becomes publicly available, it will override any listing state in the Manhattan RLS feed. So, if a broker fails to update a listing to SOLD, we will do it as soon as the sale is filed with the city register, and all charts will be updated accordingly. In addition, no individual listing can ever be counted in more than one listing state (active, pending, off mkt, closed) at any given point in time. In the end, it's all about data integrity.
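Here is a rough Python sketch of the "verify point" idea, under simplifying assumptions: listings are keyed by a made-up ID, and a recorded sale is represented by that same ID appearing in a set. The function names and data are hypothetical, and matching a public sale record to a listing is much harder in practice than a set lookup.

```python
def resolve_state(listing_status, has_recorded_sale):
    """A recorded public sale overrides whatever the listing feed says."""
    return "CLOSED" if has_recorded_sale else listing_status

def count_by_state(listings, recorded_sales):
    """Tally listings so that each one lands in exactly one state."""
    counts = {}
    for listing_id, status in listings.items():
        state = resolve_state(status, listing_id in recorded_sales)
        counts[state] = counts.get(state, 0) + 1
    return counts

# Hypothetical data: listing "B-2" sold, but the broker never updated it.
listings = {"A-1": "ACTIVE", "B-2": "ACTIVE", "C-3": "PENDING"}
recorded_sales = {"B-2"}

state_counts = count_by_state(listings, recorded_sales)
print(state_counts)
```

Because each listing ID maps to a single resolved state, the counts always add up to the number of listings, and the stale "ACTIVE" entry is moved to closed once the sale is on record.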

Which brings me to the final point. If the data is not up to UD quality standards, then we will NOT build reports/charts around it. This is what brokers and consumers need to get used to. Here are two biggies:

1. Price per square foot - Fact is, 70% or so of the Manhattan housing stock is co-op. Most of the firms/agents in the RLS leave the "SFT" datafield blank. The figures that are entered are estimates and usually artificially inflated. Condo listings are much better about providing this information. Either way, building any chart that shows avg price per sft or median price per sft is going to be exposed to

a) big time incomplete information and,
b) inflated square footage for co-ops, where the figure was estimated and published.

There is a reason every website has a variation of the following disclaimer: "All measurements and square footages are approximate and all information should be confirmed by customer."

QUESTION: Do you really want to spend your time analyzing a chart that uses such incomplete or inflated data?

The answer should be no. The numbers may look pretty, but the foundation underneath that supplies the information for the chart is very weak. Try to only use price per sft trends for condo buildings where the square footage is confirmed in the offering plan.

2. Using # of Bathrooms, not bedrooms, for Inventory Breakdowns - Looking at all the source data, we see it all. I can't tell you how many 2BR apartments are listed with only one bathroom. There are also plenty of 3BR apartments with fewer than two bathrooms. The reason is that sellers want agents to market their property in the best light possible, even if that means adding a 'BR' to the total count when there is a possible conversion space available. This flaw leads me to believe that 'Bedroom Count' is over-inflated.

QUESTION: When generating a chart for 3BR apartments in the UES, do you really want units of inventory with fewer than two bathrooms to be included? What about when generating a chart for 2BR apartments in Midtown? Do you really want listings with only 1 or 1 1/2 bathrooms to be included?

The answer should be no. You want to generate a chart that shows you a trend that is representative of the actual submarket you are buying or selling into. That is what it's all about, and that is what we focused all our attention on delivering: real-time, dynamic, yet very accurate charts that the user can customize. If you have a 2br/2bth apt in the UES, you only care about other 2br/2bth apartments and how the supply/demand trends have been in that granular submarket! You don't want units of inventory that don't belong infecting the trend you are analyzing.

So, we looked at every unit we can break down apartments by and settled on 'bathrooms'. Think about it, when do agents over-inflate the number of bathrooms in an apartment? Hardly ever! It's as high quality a unit to build a chart with as possible from the source database.
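As a simple illustration of why the bathroom count makes a cleaner filter, here is a short Python sketch. The listings and field names are invented for the example, and the function is a hypothetical stand-in rather than the platform's actual query.

```python
# Hypothetical listings; IDs and counts are invented for illustration.
listings = [
    {"id": "u1", "beds": 2, "baths": 2.0},
    {"id": "u2", "beds": 2, "baths": 1.0},   # marketed as 2BR, one bath
    {"id": "u3", "beds": 3, "baths": 1.5},   # marketed as 3BR, 1.5 baths
    {"id": "u4", "beds": 2, "baths": 2.0},
]

def submarket(rows, beds, min_baths):
    """Keep listings with the given bedroom count and at least min_baths."""
    return [r for r in rows if r["beds"] == beds and r["baths"] >= min_baths]

# Bedroom count alone pulls in the questionable 2BR listing.
by_beds_only = [r["id"] for r in listings if r["beds"] == 2]

# Requiring two bathrooms keeps only the true 2br/2bth units.
two_bed_two_bath = [r["id"] for r in submarket(listings, beds=2, min_baths=2.0)]

print(by_beds_only, two_bed_two_bath)
```

Filtering on bedrooms alone returns three units here, while adding the bathroom requirement drops the one that likely counts a conversion space as a bedroom.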

This site is still a work in progress, but the hard part, the data mining and scrubbing process, is done. Now at least you have real time tools that you can trust, specifically engineered for the Manhattan housing marketplace. That is where our mission started and where it will continue in the years ahead. Expect tons more reporting tools in the next 6-12 months.