Brief Published: 18 Apr 2018

Alibaba Adds Visual Recognition To Smart Speaker


Visual recognition is the big add-on feature for AliGenie 2.0, launched by China’s Alibaba Group in March.

AliGenie is the artificial intelligence (AI) platform powering Tmall Genie, a smart speaker through which consumers can order items by voice from Alibaba’s Tmall online retail platform, which has 500 million active users per month.

With AliGenie 2.0, Tmall Genie is arguably the most sophisticated smart speaker currently available. Children can scan the covers of more than 100 books and have the stories read to them. Elderly users, or people with visual impairments, can scan 40,000 medicines for accurate identification. The device also recognises certain flash cards, helping language learners read Chinese characters.

Users begin by downloading the Genie FireEye app. Then, they attach their smartphones to a phone holder called the XHolder, which is connected to Tmall Genie. Two million units have been sold since the holder launched in mid-2017.

The device promotes an emotional connection through an activation screen that lets users interact with it by touch. It offers a suite of more than 20 simulated, animated facial expressions. For example, tickling the head of Tmall Cat when it appears on the screen will cause it to giggle and purr.

AliGenie 2.0 was developed by Alibaba’s AI Labs, which focuses on theoretical research and product commercialisation in areas such as speech recognition, natural language processing, deep learning and voice identification.

For more on voice-controlled retail, see Reflexive Retail, part of our Liquid Retail Industry Trend.