At a recent presentation for a local post-secondary institution, I fielded a number of questions related to the use of language, primarily English language texts, as training data for generative AI. There were questions around cultural impacts and related ethical concerns. These queries were more nuanced than the usual ones I get around copyright or […]
How Reducing Bias in AI Models Boosts Success
Artificial intelligence (AI) has the potential to revolutionize industries and improve decision-making processes, but it is not without challenges. One challenge is how to address the issue of bias in AI models to ensure fairness, equity, and satisfying outcomes. AI bias can arise from various sources, including training data, algorithm design, and human influence during […]
What Is Naïve Bayes Classification and How Is It Used for Enterprise Analysis?
By Kartik Patel. Naïve Bayes is a classification algorithm that is suitable for binary and multiclass classification. It is a supervised classification technique used to classify future objects by assigning class labels to instances/records using conditional probability. In supervised classification, training data is already labeled with […]
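To make the idea in the excerpt above concrete, here is a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is not taken from the article: the function names (`train_naive_bayes`, `predict`), the toy records, and the use of add-one (Laplace) smoothing are all assumptions chosen for illustration. It shows how labeled training data yields class priors and per-class conditional probabilities, which are then combined to label a new record.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature index
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, record):
    """Return the label with the highest log posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Start from the log prior P(label).
        score = math.log(count / total)
        for i, value in enumerate(record):
            seen = value_counts[label][i][value]
            # Add-one smoothing (an assumption of this sketch) avoids log(0).
            score += math.log((seen + 1) / (count + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training data: (outlook, wind) -> label
records = [
    ("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"),
    ("overcast", "calm"), ("overcast", "windy"), ("rainy", "calm"),
]
labels = ["yes", "yes", "no", "yes", "yes", "no"]

model = train_naive_bayes(records, labels)
print(predict(model, ("rainy", "calm")))
print(predict(model, ("sunny", "windy")))
```

Working in log space keeps the product of many small probabilities from underflowing. A production system would more likely rely on an established library implementation than on hand-rolled counting like this.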
Is Your AI Model Leaking Intellectual Property?
By Sameer Vadera. Businesses often employ AI in applications to unlock intelligent functionality like predicting relevant product recommendations for customers. Recently, businesses have started building AI-powered applications that provide predictive functionality using sensitive information, a significant benefit to users. For instance, today, there are AI applications trained using medical […]
Assessing the Risks and Challenges on the Road to Owning Training Data
By Sameer Vadera. Artificial intelligence (AI) applications have an insatiable appetite for consuming data. Today’s AI models for business applications are built to ingest massive amounts of complex data sets. The cost of collecting and curating data for training AI models, however, can be staggering. In the context of the […]
Mindtech Introduce Chameleon Synthetic Data Generator and AI Tools
According to a recent press release, “Mindtech Global Ltd, a UK based start-up, has announced the availability of Mindtech Chameleon Simulator, creating synthetic vision datasets for training neural networks, and Mindtech Chameleon AI Tools, providing end to end data management for deep learning systems. Effective training of neural networks for visual processing requires very large […]
Innodata Launches Data Annotation and Labeling Solution
A recent press release reports, “Innodata Inc. today announced the launch of its expertly managed data annotation and labeling services to accelerate the creation of training data for customers in key industries such as financial services, legal, healthcare, and pharma. Data preparation and labeling is essential for training AI and machine learning models; it’s what […]
Fujitsu Develops Automatic Labeling Technology to Accelerate AI Use of Time-Series Data
According to a recent press release, “Fujitsu Laboratories Ltd. and Kumamoto University today announced the development of technology to easily create the training data necessary to apply AI to time-series data, such as those from accelerometers and gyroscopic sensors. Time-series data obtained from sensors does not include anything other than ever-changing numerical data. Therefore, in order […]
Appen to Acquire Figure Eight to Create Solution for High-Quality Machine Learning Training Data
According to a recent press release, “Appen Limited, a global leader in the provision of high-quality, human-annotated datasets for machine learning and AI, today announced it has signed a definitive agreement to acquire Figure Eight, a best in class machine learning software platform which uses highly automated tools to transform unlabeled text, image, audio, and […]
Strategies for Acquiring High-Quality Training Data
By Angela Guess. Moritz Mueller-Freitag recently wrote in Dataconomy, “Access to high-quality training data is critical for startups that use machine learning as the core technology of their business. While many algorithms and software tools are open sourced and shared across the research community, good datasets are usually proprietary and hard to build. Owning a […]