Data Science Digest 2

In case you missed them the first time, be sure to bookmark these articles!

Title: The 5 Clustering Algorithms Data Scientists Need to Know

Author: George Seif
Source: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
How: Step-by-step instructions for each: K-Means/K-Medians, Mean-Shift, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM), and Agglomerative Hierarchical Clustering (also called Hierarchical Agglomerative Clustering, or HAC)
When to use this: For Machine Learning projects, to gain initial insights by grouping data points with similar or dissimilar features or properties
Why it’s helpful: This article summarizes the pros and cons of each approach and provides step-by-step instructions and an animated example of each in action.
Suggested application: When you need quick processing or an early read on how to approach further analysis.
Business impact or insights to be gained: Understanding the advantages and disadvantages of each approach can have significant impacts on your final analysis. This cheat sheet helps you quickly make a decision about which to use in which circumstance, based on your data and resource options.
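
To make the first algorithm on that list concrete, here is a minimal sketch of the K-Means loop in plain Python. It is not code from the article: the function name, the optional initial_centers parameter and the sample points are assumptions chosen for illustration, and a real project would more likely use a library implementation.

```python
import math
import random


def kmeans(points, k, initial_centers=None, iterations=20, seed=0):
    """Group points into k clusters with a basic K-Means loop (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: start from the given centers, or pick k input points at random.
    if initial_centers is not None:
        centers = [tuple(c) for c in initial_centers]
    else:
        centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Step 2: assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Made-up sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Because the starting centers are random by default, different seeds can produce different groupings, which is one of the K-Means drawbacks the article points out.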

Title: Data Extraction from PDF Tools: Tabula vs ByteScout PDF Multitool

Author: Tirthajyoti Sarkar, ON Semiconductor
Source: https://bytescout.com/blog/2015/10/data-extraction-from-pdf-tools-tabula.html
How: Download app from either supplier
When to use this: When you need to extract data from a PDF file
Why it’s helpful: Copying and pasting data from a PDF or rekeying information is frequently impractical, time-consuming and prone to errors.
Suggested application: Tabula only works with PDFs that were originally electronic. ByteScout uses OCR to also work with scanned documents and has additional export options, including XML, text and CSV. Tabula is free; ByteScout has a free trial.
Business impact or insights to be gained: Data sources aren’t always formatted in a readily usable format, especially when looking at publicly accessible competitor data. These tools can help accelerate speed to analysis, which can support gaining a competitive edge in your industry.

Title: Designing conversational experiences with sentiment analysis in Amazon Lex

Author: Anubhav Mishra and Kevin Choh
Source: https://aws.amazon.com/blogs/machine-learning/designing-conversational-experiences-with-sentiment-analysis-in-amazon-lex/
How: Use the Amazon Lex console and follow the step-by-step process to build your bot; be sure to designate “yes” under Settings/General/Sentiment Analysis
When to use this: For creating bots that incorporate sentiment analysis natively rather than through a custom integration or API
Why it’s helpful: Create a more positive customer or client interaction by acknowledging the emotion behind comments and performance failures.
Suggested application: eCommerce or retail feedback; acknowledging shipping delays, poor service, etc.; and workflows where an interaction needs to be handed off to a human.
Business impact or insights to be gained: With pay-per-use pricing and no development or infrastructure costs, any business can leverage the automatic speech recognition (ASR), natural language understanding (NLU) and deep learning technologies that power Amazon Alexa.
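
As a rough illustration of the human hand-off idea, the sketch below shows how a fulfillment function might branch on the sentiment Lex attaches to a request. It is not taken from the AWS post: it assumes the original (V1) Lex event shape with a sentimentResponse.sentimentLabel field, and the function name, reply text and handoffToHuman session attribute are hypothetical.

```python
def route_by_sentiment(event):
    """Build a Lex V1-style Close response, flagging negative sentiment for hand-off."""
    # Assumption: the event carries {"sentimentResponse": {"sentimentLabel": ...}}.
    sentiment = event.get("sentimentResponse") or {}
    label = str(sentiment.get("sentimentLabel", "NEUTRAL")).upper()

    if label == "NEGATIVE":
        # Acknowledge the frustration and mark the session for a human agent.
        text = "I am sorry for the trouble. Let me connect you with a team member."
        handoff = True
    else:
        text = "Thanks! Is there anything else I can help with?"
        handoff = False

    return {
        # Hypothetical attribute a downstream workflow could check.
        "sessionAttributes": {"handoffToHuman": str(handoff).lower()},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": text},
        },
    }


# Example call with a hand-built event.
response = route_by_sentiment({"sentimentResponse": {"sentimentLabel": "NEGATIVE"}})
print(response["dialogAction"]["message"]["content"])
```

Only the sentiment label is used here; a production bot would likely also weigh the sentiment score and the conversation history before escalating.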

Bonus article: Trying to explain Data Governance to leaders in your organization? Wondering where your company stacks up against others? This reference can help you with some stats, some definitions and some best practices to help make the argument in your place of work. https://bi-survey.com/data-governance

Subscribe to get more tips and references. Have an article you’d like featured? Send us a note at Contact Us.
