Data Science Central

Co-founded by Vincent Granville and part of the DSC community, our focus is on data science, ML, AI,

30 Python Libraries that I Often Use - DataScienceCentral.com 18/02/2024

30 Python Libraries that I Often Use https://mltblog.com/3ONhMWi

This list covers well-known as well as specialized libraries that I use rather frequently. Applications include GenAI, data animations, LLM, synthetic data generation and evaluation, ML optimization, scientific computing, statistics, web crawling, APIs, SQL, and more. I also mention my own libraries, and issues that I faced with standard ones. In several instances, such as sound generation, I did not use any library. In addition, I included some functions that I regularly call. In many cases, I explain why I had to create my own home-made versions.

30 Python Libraries that I Often Use - DataScienceCentral.com 30 Python libraries to solve most AI problems, including GenAI, data videos, synthetization, model evaluation, computer vision and more.

15/02/2024

Gemini Ultra Unleashed: Google's Best LLM Now Available https://mltblog.com/3SBZzMz

A lot has changed for the better since the first announcement not long ago.

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Live demo and code-sharing session to see Gemini Ultra in action. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

Probabilistic ANN: The Swiss Army Knife of GenAI - Machine Learning Techniques 11/02/2024

Probabilistic ANN: The Swiss Army Knife of GenAI https://mltblog.com/48hQWfY

ANN — Approximate Nearest Neighbors — is at the core of fast vector search, itself central to GenAI, especially GPT and LLM. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, […]

10/02/2024

Actions in GPTs: Developer Tips, Tricks & Techniques https://mltblog.com/3utzlDZ

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

How to Automate Data Cleaning, in a Nutshell - DataScienceCentral.com 07/02/2024

How to Automate Data Cleaning, in a Nutshell

How to Automate Data Cleaning, in a Nutshell - DataScienceCentral.com Issues and solutions to automate data cleaning. Free your data scientists from the most boring tasks, making them happier and reducing costs.

Massively Speed-Up your Learning Algorithm, with Stochastic Thinning - Machine Learning Techniques 06/02/2024

Massively Speed-Up your Learning Algorithm, with Stochastic Thinning. Includes use case, Python code, regression and neural network illustrations.

More Fun Math Problems for Machine Learning Practitioners - DataScienceCentral.com 06/02/2024

More Fun Math Problems for Machine Learning Practitioners

More Fun Math Problems for Machine Learning Practitioners - DataScienceCentral.com This is part of a series featuring the following aspects of machine learning: Mathematics, simulations, benchmarking algorithms based on synthetic data (in short, experimental data science); Opinions, for instance about the value of a PhD in our field, or the use of some techniques; Methods, principle...

NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco - Machine Learning Techniques 05/02/2024

Better, Faster, Less Expensive Synthetic Data Without Deep Learning

NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco - Machine Learning Techniques My talk at the ODSC Conference, San Francisco, October 2023. Includes Notebook demonstration, using our open-source Python libraries. View or download the PowerPoint presentation, here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistent...

05/02/2024

AI-based Object/Image Detection for Inventory Management https://mltblog.com/3SMRJRC

Hands-on workshop for developers and AI professionals, on state-of-the-art technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

This is one of the AI applications where many companies recognize the value and are ready to invest, with guaranteed return thanks to low costs, proven technology, and automation.

Many of the requests we get from potential enterprise clients - even brick and mortar companies - are actually focused on this topic: automated classification and management of inventory or digital content, with an interest in automated image labeling and classification, as well as creating document taxonomies and better search tools (sometimes with automated data analysis) to help internal customers quickly find what they need.

GenAI Breakthrough Fast, High Quality Tabular Data Synthetization 05/02/2024

NoGAN: Ultrafast Data Synthesizer and New Evaluation Metric - My Presentation at ODSC San Francisco

GenAI Breakthrough Fast, High Quality Tabular Data Synthetization Our presentation/workshop about NoGAN at ODSC San Francisco, October 2023. Runs 1000x faster than GAN, consistently delivering better results according to th...

The Riemann Hypothesis in One Picture - DataScienceCentral.com 05/02/2024

The Riemann Hypothesis in One Picture

The Riemann Hypothesis in One Picture - DataScienceCentral.com With visual, simple, intuitive method for supervised classification

Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations - DataScienceCentral.com 04/02/2024

Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations

Simple Introduction to Public-Key Cryptography and Cryptanalysis: Illustration with Random Permutations - DataScienceCentral.com In this article, I illustrate the concept of asymmetric key with a simple example. Rather than discussing algorithms such as RSA, (still widely used, for instance to set up a secure website) I focus on a system easier to understand, based on random permutations. I discuss how to generate these rando...

03/02/2024

GenAI: Fast Vector Search at Scale (Demo on AWS)

Register at https://mltblog.com/3UGF0l5.

ANN stands for Approximate Nearest Neighbors, a faster yet high-quality alternative to exact but slow KNN, for vector search in GenAI contexts (LLM, GPT, multimodal, and so on). My team is actually developing proprietary technology on this topic, with a paper coming soon. In the meantime, if you want to see real enterprise case studies, and an existing fully scaled algorithm in action, this hands-on workshop is for you.
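To make the trade-off between exact and approximate search concrete, here is a minimal Python sketch. It is not the proprietary technology mentioned above, nor the algorithm shown in the workshop. It assumes a toy random-hyperplane hashing scheme, and the names exact_nn and HyperplaneIndex are hypothetical, chosen for illustration only.

```python
import math
import random


def exact_nn(query, vectors):
    """Exact nearest neighbor: scans every vector, so cost grows with the dataset."""
    return min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


class HyperplaneIndex:
    """Toy approximate index (illustrative assumption): random-hyperplane hashing."""

    def __init__(self, vectors, n_planes=8, seed=0):
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.vectors = vectors
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._key(v), []).append(i)

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def query(self, q):
        # Scan only the bucket the query hashes to; fall back to a full scan if it is empty.
        candidates = self.buckets.get(self._key(q)) or range(len(self.vectors))
        return min(candidates, key=lambda i: math.dist(q, self.vectors[i]))


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
index = HyperplaneIndex(data)
q = [rng.gauss(0, 1) for _ in range(16)]
# The two answers may differ: the index trades some accuracy for a much smaller scan.
print(exact_nn(q, data), index.query(q))
```

The approximate search only inspects vectors that share the query's hash bucket, which is why it is faster and why it can miss the true nearest neighbor. Production systems typically use several hash tables or graph-based indexes to recover accuracy.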

Intended for developers and AI professionals, featuring state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

Synthetizing the Insurance Dataset Using Copulas: Towards Better Synthetization - Machine Learning Techniques 02/02/2024

Synthetizing the Insurance Dataset Using Copulas - Towards Better Synthetization

Synthetizing the Insurance Dataset Using Copulas: Towards Better Synthetization - Machine Learning Techniques This article is an extract from my book “Synthetic Data and Generative AI”, available here. In the context of synthetic data generation, I've been asked a few times to provide a case study focusing on real-life tabular data used in the finance or health industry. Here we go: this article fills t...

A Simple Regression Problem - DataScienceCentral.com 02/02/2024

A Simple Regression Problem

A Simple Regression Problem - DataScienceCentral.com This article is part of a new series featuring problems with solutions, to help you hone your machine learning and pattern recognition skills. Try to solve this problem by yourself first, before looking at the solution. Today’s problem also has an intriguing mathematical appeal and solution: this a...

Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices - Machine Learning Techniques 01/02/2024

Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices

Generative AI: Synthetic Data Vendor Comparison and Benchmarking Best Practices - Machine Learning Techniques The goal of data synthetization is to produce artificial data that mimics the patterns and features present in existing, real data. Many generation methods and evaluation techniques are available, depending on purposes, the type of data, and the application field. Everyone is familiar with synthetic...

New Book: Intuitive Machine Learning - DataScienceCentral.com 01/02/2024

Book: Intuitive Machine Learning and Explainable AI

New Book: Intuitive Machine Learning - DataScienceCentral.com Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications.

Machine Learning Cloud Regression: The Swiss Army Knife of Optimization - Machine Learning Techniques 31/01/2024

Machine Learning Cloud Regression: The Swiss Army Knife of Optimization

Machine Learning Cloud Regression: The Swiss Army Knife of Optimization - Machine Learning Techniques Entitled “Machine Learning Cloud Regression: The Swiss Army Knife of Optimization”, the full version in PDF format is accessible in the “Free Books and Articles” section, here. Also discussed in detail, with Python code, in chapter 1 of my book “Intuitive Machine Learning and Explainable AI...

Better LLMs with Shorter Embeddings: Part 3 - DataScienceCentral.com 31/01/2024

Better LLMs with Shorter Embeddings: Part 3 https://mltblog.com/3HGj6Xi

Variable Length Embeddings and fast ANN-like search (approximate nearest neighbors) for better, lighter and less expensive LLMs

18 Differences Between Good and Great Data Scientists - DataScienceCentral.com 31/01/2024

18 Differences Between Good and Great Data Scientists

18 Differences Between Good and Great Data Scientists - DataScienceCentral.com machine learning, data science career, business analytics, data science lifecycle, data visualizations

How to Choose the Best Machine Learning Technique: Comparison Table - DataScienceCentral.com 30/01/2024

How to Choose the Best Machine Learning Technique: Comparison Table

30/01/2024

Creating Embeddings on Large, Real-Time Data with OpenAI https://mltblog.com/3SiMGXF

Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

I recently showed how to optimize embeddings and RAG architecture in LLMs and GPT-like applications, with home-made systems. This webinar discusses a real business case, with much larger input data in real time, using efficient tools. Embeddings are the central piece.

New Python Library to Evaluate AI-generated Data and Compare Models - Machine Learning Techniques 30/01/2024

New Python Library to Evaluate AI-generated Data and Compare Models

New Python Library to Evaluate AI-generated Data and Compare Models - Machine Learning Techniques Called GenAI-Evaluation, you use it for instance to assess the quality of tabular synthetic data. In this case, it measures how faithfully the synthetization mimics the real data it is derived from, by comparing the full joint empirical distributions (ECDF) attached to the two datasets. It works both...
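The idea of comparing the joint empirical distributions of two datasets can be sketched in a few lines of Python. This is not the library's API: the functions ecdf and ks_distance are hypothetical, and the choice of evaluation points (sampled from the pooled rows) is an assumption made for this illustration.

```python
import random


def ecdf(data, point):
    """Joint empirical CDF: fraction of rows that are <= point in every coordinate."""
    hits = sum(all(x <= p for x, p in zip(row, point)) for row in data)
    return hits / len(data)


def ks_distance(real, synth, n_points=500, seed=0):
    """Largest gap between the two joint ECDFs over sampled evaluation points.

    Evaluation points are drawn from the pooled rows (an assumption of this sketch).
    The result lies in [0, 1]; 0 means the ECDFs agree at every sampled point.
    """
    rng = random.Random(seed)
    pooled = list(real) + list(synth)
    points = [rng.choice(pooled) for _ in range(n_points)]
    return max(abs(ecdf(real, p) - ecdf(synth, p)) for p in points)


rng = random.Random(7)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]   # same distribution
poor = [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(300)]   # shifted first feature
print(ks_distance(real, good), ks_distance(real, poor))
```

A faithful synthetization should yield a small distance, while a poor one (here, a shifted feature) should yield a larger one. Because the ECDF is evaluated jointly across all features, this kind of comparison can also capture dependencies between features that one-dimensional summaries miss.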

A Synthetic Stock Exchange Played with Real Money - Machine Learning Techniques 29/01/2024

A Synthetic Stock Exchange Played with Real Money. Includes Python code dealing with gigantic numbers using exact arithmetic.

A Synthetic Stock Exchange Played with Real Money - Machine Learning Techniques Not only that, but you can predict -- more precisely compute with absolute certainty -- what the value of any stock will be tomorrow. Transaction fees are well below 0.05% and the market, at least in the version presented here, is fair: in other words, a zero-sum game if you play by luck. If instead

GitHub - VincentGranville/Point-Processes: This repository contains the material (datasets, code, videos, spreadsheets) related to my book Stochastic Processes and Simulations - A Machine Learning Perspective. 29/01/2024

Python Code and Material from the Book "Stochastic Processes and Simulations" - GitHub Repository

An Intriguing Job Interview Question for AI/ML Professionals - DataScienceCentral.com 29/01/2024

An Intriguing Job Interview Question for AI/ML Professionals

An Intriguing Job Interview Question for AI/ML Professionals - DataScienceCentral.com Intriguing technical job interview questions for candidates applying to machine learning and AI jobs, with 4 difficulty levels.

New Book: Intuitive Machine Learning and Explainable AI - Machine Learning Techniques 28/01/2024

Book: Interpretable Machine Learning

New Book: Intuitive Machine Learning and Explainable AI - Machine Learning Techniques Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications. By Vincent Granville Ph.D, published in September 2022. PDF format, 156 pages. Version 1.0 with Python code. The book is available here. For my upcoming course based on thi...

27/01/2024

Build Document/Image Analytics with GPT-4 Vision https://mltblog.com/48Odh69

Showcasing a conceptual application demo that can analyze insurance claims data, interpret PDF documents and photos of car accidents to infer damage types and estimate payouts.

Hands-on workshop for developers and AI professionals, on state-of-the-art GenAI technology. Recording and GitHub material will be available to registrants who cannot attend the free 60-min session.

New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness - Machine Learning Techniques 27/01/2024

New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness

New GenAI Evaluation Metric, Ultrafast Search, and Perfect Randomness - Machine Learning Techniques This article covers three different GenAI topics. First, I introduce one of the best random number generators (PRNG) with infinite period. Then I show how to evaluate the synthesized numbers using the full multivariate empirical distribution (same as KS that I used for NoGAN evaluation), but this ti...

Stochastic Processes, 2nd Edition, now with Python Code - Machine Learning Techniques 26/01/2024

My Book on Poisson-binomial Stochastic Processes and Simulations

Stochastic Processes, 2nd Edition, now with Python Code - Machine Learning Techniques The book covers supervised classification, including fractal classification, as well as unsupervised clustering, using an innovative approach. Datasets are first mapped onto an image, then processed using image filtering techniques. I discuss the analogy with neural networks, comparing very deep but...