Teaching AI from Search
Then:
Enterprises spend a lot of time curating their content so that it can be exposed through a search engine.
Now:
Enterprises spend a lot of time curating their content so that it can be used to train machine learning models and power AI.
Search Architecture is AI Architecture.
I feel these two statements speak for themselves: there is a huge overlap between the two use cases. It’s no secret that enterprises are focusing a lot of development attention on projects that involve machine learning and artificial intelligence. Some projects are more successful than others. I believe that enterprises with a firm grasp on their enterprise search infrastructure are positioned to be exceedingly successful in their AI and ML development efforts.
As it turns out, one of the biggest challenges in creating artificial intelligence is access to data. Data scientists need to be empowered to assemble training data sets as they see fit for the task at hand. Unfortunately, on many projects this data access has been a blocker that prevents teams from reaching development milestones.
It’s often the case that a data scientist is handed a single static set of training data from which to build a machine-learned model or classifier. Data scientists often have little option other than to work with the data provided.
Data scientists on a project may hear that some data set exists somewhere within the enterprise, but the exact location of that data is unknown, or access to it is prohibitively difficult. The models produced are only as good as the data provided. Additionally, data in the real world changes over time, yet the feedback loop back to the data scientist is often left unimplemented.
Businesses that have taken control of their enterprise search solution have a clear advantage here. Those enterprises discovered long ago that connecting to all of the different data repositories and pulling them into a centralized search index is the only way to provide uniform, consistent access to all of the firm’s digital assets.
The reality is that a robust, agile enterprise connector framework that makes it easy to onboard new data sources is exactly the framework data scientists need in order to generate their training data sets. Mapping disparate data from different sources into a consistent format suitable for machine learning is often the most difficult task a data scientist faces in their day-to-day work, and it is precisely what such a framework provides (see the sketch below).
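As a concrete illustration, here is a minimal sketch of such a mapping layer in Python. The source systems, field names, and record shapes are all hypothetical:

```python
# Each connector maps its source's native record shape onto one
# common document schema. Source systems and fields are hypothetical.

def from_crm(record: dict) -> dict:
    """Map a CRM export row onto the common schema."""
    return {
        "id": f"crm-{record['AccountId']}",
        "title": record.get("AccountName", ""),
        "body": record.get("Notes", ""),
        "source": "crm",
    }

def from_wiki(page: dict) -> dict:
    """Map a wiki page onto the same schema."""
    return {
        "id": f"wiki-{page['pageId']}",
        "title": page.get("heading", ""),
        "body": page.get("content", ""),
        "source": "wiki",
    }
```

Once every connector emits the same shape, the search index and the training-set builder consume identical documents, regardless of origin.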
This raises the question: why should a data scientist have to worry about which systems the data they need is stored in? A data scientist delivers the most value when they have easy access to all of the data through a consistent API. The more time a data scientist spends debugging database connectivity issues, the less time they can spend adding value with their specialized skill set.
Perhaps at this point some light bulbs are going off: why not use the search engine to create the training sets? I believe this use case is addressed by search platforms that support “deep pagination”: running a search against the engine and then iterating through a very large number of results, sometimes all the way to the end of the result set.
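For instance, Solr implements deep pagination with cursors. Here is a minimal sketch in Python, assuming a local Solr instance and a hypothetical documents collection whose uniqueKey field is id:

```python
import requests

# Hypothetical Solr host and collection.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"

def iterate_all(query="*:*", fields="id,title,body", page_size=500):
    """Deep-paginate through every match using Solr's cursorMark.

    The sort must include the uniqueKey field (here: id) so the
    cursor stays stable from one request to the next.
    """
    cursor = "*"
    while True:
        resp = requests.get(SOLR_SELECT, params={
            "q": query,
            "fl": fields,
            "sort": "id asc",
            "rows": page_size,
            "cursorMark": cursor,
        }).json()
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # cursor stopped advancing: end of results
            break
        cursor = next_cursor
```

Unlike paging with start and rows, the cursor approach keeps the cost of each page roughly constant no matter how deep into the result set you go.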
Search engines have always excelled at extremely high-performance retrieval of data, and modern engines have added specialized APIs to facilitate exporting it: Solr introduced the export handler, and Elasticsearch has the Scroll API. Both can export all of the documents that match a query at an extremely fast rate. The engines can also report important statistics about the data being exported, such as the min, max, and standard deviation, which many machine learning algorithms need in order to properly normalize the data for training.
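Here is a similar sketch against Elasticsearch, pairing the Scroll API with an extended_stats aggregation so the exported values can be z-score normalized. The host, index name, and price field are hypothetical:

```python
import requests

ES = "http://localhost:9200"   # hypothetical cluster
INDEX = "documents"            # hypothetical index

# One aggregation call returns min, max, avg, and std_deviation
# for a numeric field across the whole index.
stats = requests.post(f"{ES}/{INDEX}/_search", json={
    "size": 0,
    "aggs": {"price_stats": {"extended_stats": {"field": "price"}}},
}).json()["aggregations"]["price_stats"]

def scroll_all(page_size=1000, keep_alive="2m"):
    """Export every matching document with the Scroll API."""
    resp = requests.post(
        f"{ES}/{INDEX}/_search",
        params={"scroll": keep_alive},
        json={"size": page_size, "query": {"match_all": {}}},
    ).json()
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            yield hit["_source"]
        resp = requests.post(f"{ES}/_search/scroll", json={
            "scroll": keep_alive,
            "scroll_id": resp["_scroll_id"],
        }).json()

# z-score normalization using the engine-supplied statistics
# (assumes std_deviation is non-zero for the field)
for doc in scroll_all():
    doc["price_norm"] = (doc["price"] - stats["avg"]) / stats["std_deviation"]
```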
Are you using search to power your AI and ML projects? Do your data scientists complain about access to data, or about having to normalize it? The solution to those problems might be easier to find than you think.