As a follow-up to my article on handling character strings, this first part takes a progressive approach to processing this type of data. Setting aside any semantic approach (which will be the subject of a later post), we will discuss here the bag-of-words technique.
If you take an analytical approach to your data, you have certainly run into the difficulty of working with character strings, so much so that you have probably had to set some of them aside. The usual reasons are a lack of tools and the complexity of handling semantics. In this article (the first in a series) we will tackle these problems and, above all, see how to solve them.