Wednesday, 23 November 2011 12:46

How to battle the Chinese Water army

Written by Nick Farell


Battle of the spammers

A Chinese bloke from Canada has been infiltrating the Chinese water army to find out what makes it tick.

Cheng Chen at the University of Victoria in Canada and a few of his mates have been investigating how the water army, which floods websites with spam, works. They wanted to use what they learnt to spot paid posters automatically.

Soldiers in the Water Army are given a task to register on a website and then start generating content in the form of posts, articles, links to websites and videos. The content is either pre-prepared or the posters receive detailed instructions on the type of things they can say. The soldiers are monitored by a QA team who check that posts meet a certain 'quality' threshold. A post is not validated if the host deletes it.

Cheng's team studied the pattern of posts that appeared on a couple of big Chinese websites, Sina.com and Sohu.com. They identified the posts they believed were from paid posters and then looked for patterns of behaviour that differentiate paid posters from legitimate users. Paid posters tend to post more new comments than replies to other comments. They also post more often, with half posting every 2.5 minutes on average. They move on from a discussion more quickly than legitimate users, dumping their IDs and never using them again. Their content is different too: because they are paid by volume, they often take shortcuts, cutting and pasting the same content many times.
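To make the behavioural signals above concrete, here is a minimal Python sketch of how they might be computed per user ID. It is not the researchers' code: the `Post` record, its field names and the `behaviour_features` function are all hypothetical, and the sketch only measures the share of new comments, the typical gap between posts and how long an ID stays active.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Post:
    # Hypothetical record layout; the real Sina/Sohu data format is not given.
    user_id: str
    timestamp: float  # seconds since some fixed starting point
    is_reply: bool    # True if the post replies to another comment


def behaviour_features(posts):
    """Return a dict of simple per-user behaviour measurements."""
    by_user = defaultdict(list)
    for post in posts:
        by_user[post.user_id].append(post)

    features = {}
    for user_id, items in by_user.items():
        items.sort(key=lambda p: p.timestamp)
        times = [p.timestamp for p in items]
        # Gaps between consecutive posts from the same ID.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[user_id] = {
            # Share of posts that start a new comment rather than reply.
            "new_comment_ratio": sum(not p.is_reply for p in items) / len(items),
            # Typical time between posts; None if the ID posted only once.
            "median_gap_seconds": median(gaps) if gaps else None,
            # How long the ID was in use before being abandoned.
            "active_span_seconds": times[-1] - times[0],
        }
    return features
```

An ID with a high share of new comments, short gaps and a short active span would match the paid-poster pattern described above, although any cut-off values would have to be tuned on labelled data.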

Cheng's team built some software to look for repetitions and similarities in messages as well as the other behaviours they had identified. They then tested it on the dataset they had downloaded from Sina and Sohu and found it to be remarkably good, with an accuracy of 88 per cent in spotting paid posters. They think they have the basis of a software package that will weed out a significant fraction of paid posters.
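The article does not say how the software measures repetition, so the following Python sketch is only one plausible approach, not the team's method. It compares messages by the overlap of their character pairs (Jaccard similarity), which also works for Chinese text that has no spaces between words. The function names and the 0.8 threshold are assumptions made for illustration.

```python
def shingles(text, n=2):
    """Return the set of overlapping n-character pieces in text, ignoring whitespace."""
    cleaned = "".join(text.split())
    if len(cleaned) < n:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a, b):
    """Similarity between two messages, from 0.0 (nothing shared) to 1.0 (same shingle set)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 0.0  # two empty messages carry no evidence of copying
    return len(sa & sb) / len(sa | sb)


def duplicate_fraction(messages, threshold=0.8):
    """Fraction of one user's messages that closely match another of their messages.

    The threshold is an illustrative guess, not a value from the study.
    """
    if not messages:
        return 0.0
    flagged = 0
    for i, message in enumerate(messages):
        if any(
            jaccard(message, other) >= threshold
            for j, other in enumerate(messages)
            if j != i
        ):
            flagged += 1
    return flagged / len(messages)
```

A user whose `duplicate_fraction` is close to 1.0 is pasting the same text over and over, which is the shortcut paid posters are said to take. The pairwise comparison is quadratic in the number of messages, so a large dataset would need something faster.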

More here.

