Tuesday, 16 January 2018

Python - HTML to Text for sending SMS - SMS Safe characters - remove \xa0

The following code shows some alternatives for removing special characters such as \xa0 from a string extracted from HTML:
from bs4 import BeautifulSoup

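# Sample markup; the <p> tags and &nbsp; entities are assumed, chosen to match the output below
raw_html = ('<p>Dear Parent,&nbsp;</p>'
            '<p>This is a test message,&nbsp;kindly ignore it.&nbsp;</p>'
            '<p>Thanks</p>')

clean_text = BeautifulSoup(raw_html, "lxml").text
print(repr(clean_text))
# u'Dear Parent,\xa0This is a test message,\xa0kindly ignore it.\xa0Thanks'
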
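The resulting string still contains \xa0 characters, the non-breaking spaces that &nbsp; entities decode to. There are two ways to deal with them. The first is BeautifulSoup's get_text method with strip=True, which trims whitespace (including \xa0) from both ends of each text fragment and joins the fragments with no separator. A \xa0 in the middle of a fragment is left in place: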
clean_text = BeautifulSoup(raw_html, "lxml").get_text(strip=True)

print(repr(clean_text))
# u'Dear Parent,This is a test message,\xa0kindly ignore it.Thanks'
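
For intuition, the sketch below imitates what get_text(strip=True) does, using plain Python 3 strings and only the standard library instead of BeautifulSoup. The fragments list is an assumption: it is what the text nodes of the sample message would look like after parsing. It shows that &nbsp; decodes to \xa0, that strip() only trims the ends of a string, and that joining with a space (roughly what get_text(" ", strip=True) would do) keeps the sentences apart.

```python
import html

# &nbsp; decodes to U+00A0, the non-breaking space.
assert html.unescape("&nbsp;") == "\xa0"

# Assumed text fragments of the sample message, one per paragraph.
fragments = [
    "Dear Parent,\xa0",
    "This is a test message,\xa0kindly ignore it.\xa0",
    "Thanks",
]

# strip() treats \xa0 as whitespace, but only at the ends of a string.
glued = "".join(f.strip() for f in fragments)    # no separator
spaced = " ".join(f.strip() for f in fragments)  # single-space separator

print(repr(glued))
print(repr(spaced))
```

In both results the \xa0 between "message," and "kindly" survives, so stripping alone may not be enough for an SMS gateway that rejects that character.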

The other option is Python's standard unicodedata module. NFKD normalization replaces every \xa0 with an ordinary space, wherever it occurs in the string:
import unicodedata

clean_text = BeautifulSoup(raw_html, "lxml").text
print(repr(clean_text))
# u'Dear Parent,\xa0This is a test message,\xa0kindly ignore it.\xa0Thanks'

# NFKD maps each \xa0 to a plain space
new_str = unicodedata.normalize("NFKD", clean_text)
print(repr(new_str))
# u'Dear Parent, This is a test message, kindly ignore it. Thanks'
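
One thing to keep in mind is that NFKD changes more than \xa0: it also rewrites other compatibility characters. The Python 3 sketch below compares it with a plain str.replace that touches only the non-breaking space. The helper names replace_nbsp and normalize_nfkd are made up for this illustration and are not part of any library.

```python
import unicodedata


def replace_nbsp(text):
    """Hypothetical helper: swap only U+00A0 for a plain space."""
    return text.replace("\xa0", " ")


def normalize_nfkd(text):
    """Hypothetical helper: apply NFKD compatibility normalization."""
    return unicodedata.normalize("NFKD", text)


message = "Dear Parent,\xa0This is a test message,\xa0kindly ignore it.\xa0Thanks"

# Both approaches give the same result for the sample message.
print(replace_nbsp(message))
print(normalize_nfkd(message))

# They differ on other characters: NFKD expands the "fi" ligature (U+FB01),
# while replace_nbsp leaves it untouched.
print(repr(replace_nbsp("\ufb01ne")))
print(repr(normalize_nfkd("\ufb01ne")))
```

If the message may contain accented letters or symbols that should reach the recipient unchanged, the narrower replace is probably the safer choice. NFKD is handy when the goal is to flatten as many unusual characters as possible.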
