Artificial intelligence (AI) is now everywhere in Data Management, BI, and Data Science software, according to Mike Ferguson, Managing Director of Intelligent Business Strategies. The AI field is still young and will continue to improve as increased adoption of AI enables data and analytics software to predict, automate, and optimize, thus shortening time to value. Ferguson spoke at the DATAVERSITY® Enterprise Data World Conference about the use of AI in Data Management and analytics.
AI Impacts in Data Management and Analytics
Ferguson explained that AI is currently being used to provide three capabilities: prediction, automation, and optimization.
- Prediction: AI can predict the data needed to get a more accurate machine learning model, or flag fraudulent queries from an unauthorized source, such as SQL injection attempts
- Automation: AI can increase speed, remove the need for manual work, and save time on laborious tasks
- Optimization: AI provides new ways to improve and apply best practices
How AI Is Being Used
- Reduction of costs due to better use of Data Management and analytics infrastructure
- Improved database query optimization
- Automated data and domain discovery across a distributed data landscape
- Assisted data stewardship for smarter Data Governance
- Self-healing data
- Recommendations for better machine learning model development
- Assisted self-service data prep
- Query recommendations
AI in the Database
“Database management system vendors are now deploying artificial intelligence, particularly machine learning, into the database itself,” he said. Diagnosis, monitoring, alerting, and protecting the database can now be done automatically by the software.
Self-configuring databases can manage modeling, scheduling, patching, upgrading, and predicting. A self-optimizing database can understand queries and find the best way to execute or rewrite them, the best execution engine to use, and the best infrastructure setup to run them on. A self-healing database can automatically fix itself, stay alive, and stay highly available without any need for human intervention.
AI can monitor query logs and identify anomalies in queries that could indicate SQL injection or fraudulent queries coming in, and can potentially block those. AI can make sure that any personally identifiable data is automatically masked, so that there’s no inadvertent disclosure of people’s identity.
AI can be used to predict resource usage ahead of time, Ferguson said. Using trained models, AI can make automatic decisions around resource use on behalf of administrators, without necessarily consulting them.
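As a rough illustration of predicting resource usage ahead of time, the sketch below fits a straight line to a short history of CPU readings and extrapolates one step forward. The function name `forecast_next`, the sample readings, and the 65% scale-out threshold are all assumptions made for this example rather than anything described in the talk, and a real database would likely rely on a much richer model.

```python
def forecast_next(values):
    """Fit y = a + b*x by ordinary least squares and predict the next point."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The next observation sits at x = n.
    return intercept + slope * n


# Hypothetical CPU utilisation (%) over the last six hours.
cpu_history = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
predicted = forecast_next(cpu_history)

# Assumed policy: add capacity when the forecast exceeds 65%.
SCALE_OUT_THRESHOLD = 65.0
should_scale_out = predicted > SCALE_OUT_THRESHOLD
print(f"forecast: {predicted:.1f}%  scale out: {should_scale_out}")
```

The decision is made from the forecast rather than the current reading, which is what lets the system act before an administrator would notice the trend.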
AI in the Data Catalogs
“This is a pretty crowded market right now,” he said. Pre-trained models are being shipped with data catalogs to automate the discovery and classification of data and relationships, the detection of PII, lineage, and non-compliance, and the inference of logical entities across a distributed data landscape. Trained models can be linked to terms in a business glossary to automate mapping.
Ferguson sees the possibility to go further than just automatically detecting and discovering data — to discover what it means if there’s no metadata. It’s also possible to link models that come shipped in software and automatically match them to common definitions in the glossary, even across a distributed landscape, he said.
AI in Data Governance
Ferguson said that 12 years ago most of the discussion around Data Governance was purely about Data Quality. Now the conversation has expanded into other disciplines such as data access security, data privacy, data lifecycle management, and data lineage.
As a result, the use of machine learning has expanded into those areas as well. AI can help with data access protection using both unsupervised and supervised machine learning, recognizing abnormal data use that could indicate a potential threat, as well as automated outlier detection to prevent cyber threats. Security teams can now have eyes all across the enterprise, rather than expecting people to see everything, he said.
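One simple way to picture automated outlier detection on data access is a robust statistical score over per-user activity. The sketch below is a stand-in for the unsupervised models mentioned above, not a description of any vendor's method: it uses the median absolute deviation (MAD) with the commonly cited 3.5 cutoff for the modified z-score. The user names, row counts, and the `flag_outliers` helper are hypothetical.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Return users whose activity has a modified z-score above the threshold."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that extreme values barely move.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return sorted(
        user
        for user, v in counts.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical rows read per user in one day.
rows_read = {"alice": 120, "bob": 135, "carol": 110, "dave": 128, "erin": 9500}
suspicious = flag_outliers(rows_read)
print(suspicious)
```

Using the median instead of the mean matters here: a single very large reader would otherwise inflate the average and the standard deviation enough to hide itself.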
AI is also assisting with Master Data Management, helping data stewards detect poor quality data, which can be a “horrendously mundane task,” Ferguson said. Automatic data correction, the ability to make recommendations to stewards, or even completely automating tasks pave the way for self-learning. “This takes us into not just predictive or prescriptive, but a new generation of what’s called ‘reinforcement learning’ coming through now,” he said. Although it’s very early days for that field, it’s certainly one to watch.
ETL and Self-Service Data Preparation
AI is being used for anomaly detection, clustering values based on similarity, auto-correction, self-learning transformation, and for calling attention to recommended standardizations. It can also help users with self-service data preparation, using prompts to streamline and speed up the exploration process.
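Clustering values by similarity and recommending a standard form can be sketched with nothing more than string comparison. The example below uses `difflib.SequenceMatcher` from the Python standard library as a simple similarity measure; the 0.85 cutoff, the sample city names, and the helper names are assumptions for illustration, and commercial data prep tools are likely to use learned models rather than this greedy approach.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def cluster_values(values, min_ratio=0.85):
    """Greedily group each value with the first cluster whose seed it resembles."""
    clusters = []
    for value in values:
        for cluster in clusters:
            seed = cluster[0]
            ratio = SequenceMatcher(None, normalize(value), normalize(seed)).ratio()
            if ratio >= min_ratio:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def recommend_standard(cluster):
    """Suggest the most frequent spelling in the cluster as the standard form."""
    return Counter(cluster).most_common(1)[0][0]


# Hypothetical messy column values.
raw = ["New York", "new york", "New York", "New  York", "Nw York",
       "Boston", "boston", "Boston", "Bostn"]
clusters = cluster_values(raw)
recommendations = [recommend_standard(c) for c in clusters]
print(recommendations)
```

In a self-service tool the recommendation would be shown to the user as a prompt to accept or reject rather than applied silently.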
AI and Knowledge Graphs
There are an increasing number of data catalogs that store metadata in a graph database. AI can identify closely related data across a distributed landscape of sources and make relevant data recommendations. AI can do graph analytics on data that, if joined, could reveal identity, preventing potential privacy compliance issues. It can use graph analytics on a data set to show other data related to that set, enrich it, and make recommendations to make it a stronger dataset.
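The idea of finding data that, if joined, could reveal identity can be illustrated as a reachability question on a graph of datasets. In the sketch below, datasets are nodes and an edge exists when two datasets share a join key; a breadth-first search from the identity-bearing datasets then lists everything that could be linked back to a person. The catalog contents, column names, and the sets `IDENTIFYING` and `JOIN_KEYS` are hypothetical, and a real catalog would infer them rather than have them listed by hand.

```python
from collections import deque
from itertools import combinations

# Hypothetical catalog: dataset name -> column names.
catalog = {
    "crm_customers": {"customer_id", "name", "email"},
    "web_orders": {"order_id", "customer_id", "postcode"},
    "clickstream": {"session_id", "order_id", "page"},
    "weather": {"station_id", "temperature"},
}
IDENTIFYING = {"name", "email"}  # assumed direct identifiers
JOIN_KEYS = {"customer_id", "order_id", "session_id", "station_id"}


def build_graph(catalog, join_keys):
    """Connect two datasets when they share at least one join key."""
    graph = {name: set() for name in catalog}
    for a, b in combinations(catalog, 2):
        if catalog[a] & catalog[b] & join_keys:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def linkable_to_identity(catalog, graph, identifying):
    """Breadth-first search outward from datasets that hold direct identifiers."""
    start = [name for name, cols in catalog.items() if cols & identifying]
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Report only the datasets that are not themselves identity-bearing.
    return seen - set(start)


graph = build_graph(catalog, JOIN_KEYS)
at_risk = linkable_to_identity(catalog, graph, IDENTIFYING)
print(sorted(at_risk))
```

Here `clickstream` holds no customer key of its own, yet it is flagged because it joins to `web_orders`, which in turn joins to `crm_customers`; that transitive link is the kind of risk a graph view makes visible.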
AI in BI Tools and Data Science
AI can improve productivity by assisting with self-service data preparation, report and dashboard development, visual analysis, data interpretation during exploration, and planning and forecasting. It can also be used to broaden the user base by simplifying the user interface through voice or a chatbot.
Users don’t need to be trained to use a BI tool, and can just ask questions and get answers back, Ferguson said. A user can say “explain” and it will generate text that explains the business implications of what they are seeing. “This is all done using natural language processing, or in this case, natural language generation,” he said.
Although AI is now being used in Data Management, BI, and Data Science software, reliance upon AI is still a new idea. Increased adoption of AI will enable data and analytics software to predict, automate, and optimize, all shortening time to value.
Ferguson believes that the industry is just at the beginning of maximizing the use of AI. He’d like to see more automation in BI and Data Science using machine learning, automatic testing and selection of algorithms, and other areas. “It’s all pretty new,” he said, “but there are pretty exciting times ahead.”