Natural Language Processing in Python – Tomasz Kuczmarski, STX Summit, Poland

STX Next are predominantly a Python shop based in Poland, offering superhero-like problem-solving capability. Their premise is that clients get full live access to the scrum board, along with expertise in assembling a fully Agile team in 21 days.

In March they invited more than 200 guests to attend the first STX Summit for dinner and talks on some of the cool things that they have been doing.

Python Developer Tomasz Kuczmarski kicked off the talks with natural language processing in Python and a bit of machine learning, outlining the tools that are needed and introducing the relevant packages and libraries.

Kuczmarski recommends downloading Anaconda, a free distribution of Python that includes about 200 of the most popular Python packages for science, math, engineering and data analysis. The packages needed for natural language processing are NumPy, SciPy, Matplotlib, scikit-learn and NLTK, all of which are contained in Anaconda.
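As a quick sanity check after installing, the sketch below reports which of the five packages named above can be found by the current interpreter. The helper name `check_packages` and the label-to-module mapping are illustrative assumptions, not something shown in the talk; note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Display name -> importable module name (scikit-learn is imported as "sklearn").
REQUIRED = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "NLTK": "nltk",
}


def check_packages(required=REQUIRED):
    """Return {display name: True/False} without actually importing anything."""
    return {label: find_spec(module) is not None for label, module in required.items()}


status = check_packages()
for label, ok in status.items():
    print(f"{label}: {'available' if ok else 'missing'}")
```

In a standard Anaconda environment all five would be expected to show as available; anything missing can be installed with the environment's package manager.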

NLTK has the cool tools for all the tasks in natural language processing: classification, stemming, tagging, you name it. It also has tools for discourse analysis, so with NLTK you can basically cover all the tasks involved in constructing a fully functioning dialogue system.
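To give a flavour of two of those tasks, here is a minimal sketch of stemming and tagging with NLTK. It deliberately uses `PorterStemmer` and a rule-based `RegexpTagger` because neither needs extra corpus downloads. The three suffix rules are a toy assumption made for illustration; the talk may well have used NLTK's pretrained tagger instead.

```python
from nltk.stem import PorterStemmer
from nltk.tag import RegexpTagger

tokens = ["running", "cats", "dog"]

# Stemming: strip suffixes to reduce each word to a common base form.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Tagging: toy suffix rules, tried in order; the last pattern is a catch-all.
patterns = [
    (r".*ing$", "VBG"),  # gerund / present participle
    (r".*s$", "NNS"),    # plural noun
    (r".*", "NN"),       # default: singular noun
]
tagger = RegexpTagger(patterns)
tagged = tagger.tag(tokens)

print(stems)   # ['run', 'cat', 'dog']
print(tagged)  # [('running', 'VBG'), ('cats', 'NNS'), ('dog', 'NN')]
```

A rule-based tagger like this is far too crude for real text, but it shows the shape of the API: a list of tokens goes in, and a list of (token, tag) pairs comes out.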

For machine learning, Kuczmarski gives a refresher on supervised and unsupervised learning from data and explains how scikit-learn can be used for text classification with a Naive Bayes classifier: first data loading, then feature extraction and classifier training. He walks through slides showing how it is done.
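The three steps can be sketched roughly as follows. This is not the code from the slides: the four training sentences and their labels are made up for illustration, and the choice of `CountVectorizer` (bag-of-words counts) with `MultinomialNB` is an assumption about which scikit-learn components fit the description.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 1. Data loading: a tiny in-memory corpus (hypothetical examples).
train_texts = [
    "the match ended with a late goal",
    "the team won the football game",
    "the new phone has a fast processor",
    "the laptop ships with more memory",
]
train_labels = ["sport", "sport", "tech", "tech"]

# 2. Feature extraction: CountVectorizer turns each text into word counts.
# 3. Classifier training: MultinomialNB learns per-class word likelihoods.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["a late goal won the game", "fast processor and memory"])
print(list(predictions))  # ['sport', 'tech']
```

Wrapping both steps in a pipeline means new text is vectorized with the same vocabulary that was learned during training, which is an easy thing to get wrong when the steps are run by hand.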

The talk also looks at using NLTK for discourse analysis, where threads of language can be interpreted in different ways depending on context and tone. In his example, Kuczmarski shows how NLTK is able to use its toolset to reason about which interpretations of a sentence are likely to be true.
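The underlying idea can be illustrated with a toy sketch in plain Python. This is not NLTK's API: NLTK's own discourse module (`nltk.inference.discourse`) relies on the external Prover9 and Mace4 tools, so the version below simply models each reading of a sentence as a set of true/false propositions and keeps only the combinations that do not contradict each other. All proposition names and the example sentences are hypothetical.

```python
from itertools import product


def merge(readings):
    """Combine readings into one assignment, or return None on a contradiction."""
    combined = {}
    for reading in readings:
        for prop, value in reading.items():
            if combined.setdefault(prop, value) != value:
                return None
    return combined


def admissible(discourse):
    """Yield every consistent way of picking one reading per sentence."""
    for choice in product(*discourse):
        merged = merge(choice)
        if merged is not None:
            yield merged


discourse = [
    # "Vincent is a boxer": either an athlete or a dog.
    [
        {"vincent_is_human": True, "vincent_boxes": True},
        {"vincent_is_human": False},
    ],
    # "Vincent is married": only makes sense for a human.
    [{"vincent_is_human": True, "vincent_is_married": True}],
]

results = list(admissible(discourse))
print(results)
```

Only the athlete reading survives, because the dog reading contradicts the second sentence. A real system does the same kind of filtering with first-order logic and a theorem prover rather than a dictionary lookup.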

