I’d like to explain why I feel that the Semantic Web is a great solution for helping professional contractors and freelancers find work, and for helping companies source talent as well. I’m making assumptions in line with my experiences in the IT sector, but any solution this different technologically will benefit from being re-usable across sectors, and I plan to touch on this too.
At the moment we freelancers spam agencies with Word documents that get indexed for keywords, and they spam us whenever a job comes in with the same keywords on it. I get no visibility or acknowledgement that my data has been updated, and agencies continue to spam out roles apparently based on an out-of-date home address and rate expectations. There seems to be no systematic way for me to express that I am or am not looking for roles, or that I dislike being interrupted while on site.
It just doesn’t seem to work!
For example, my CV mentions the challenging T-SQL and PL/SQL work I did on a data warehousing project at Virgin Mobile using data taken out of SingleView and Strategix, and I get regularly spammed by people looking for “SingleView developers”, which as far as I’m aware is actually a UNIX scripting role, if not a role involving a proprietary technology stack.
What if an HR person at a telecommunications company wants an SQL developer with experience in that sector? The relevant keywords are “Virgin Mobile”, which should be a strong match (a telecommunications company), and “SingleView”, also a strong match (a telecommunications billing platform). Other words like “Data Warehousing”, “T-SQL” and “PL/SQL” are unequivocal, but notice I do not claim to be a data warehousing expert; I merely worked on a project of that variety. On its own, that should be a weak match based on the type of relationship between the abstract concept and my career.
Likewise there is the problem of “SQL” appearing in the job spec and “T-SQL” and “PL/SQL” on the CV, which should probably still be strong matches despite being less similar in terms of matching tokens. Different ways of breaking up words automatically will lead to different answers, which does not inspire confidence that a keyword-driven solution can scale while retaining full accuracy.
An Information Retrieval solution that does not understand the difference between a skill and a project goal, and which is not combined with an accurate taxonomy of companies, technologies, and products, is not going to rank matches as well as one that does. People’s lives are too interesting and varied to be captured by one consistent schema. For instance, consider medicine – an entirely different taxonomy of specialist skills, not project-based like freelancing, and therefore involving different relationships between concepts. This makes the problem essentially semantic, and on a large scale as well, since there are many jobs and candidates in the economy.
I’ve just finished reading an article describing exactly this kind of IR solution. It is not fundamentally a Semantic Web problem, merely a semantic one, but an IR approach that is both schema-agnostic and graph-based is undoubtedly suited to searching career data across sectors.
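A minimal sketch of the distinction drawn above, in Python. Everything in it is an assumption made for illustration: the relation names, the numeric weights and the tiny taxonomy are hypothetical, not taken from any real vocabulary or product. It only shows how the type of relationship and a "broader concept" lookup could change a match score where plain keyword matching cannot.

```python
# Hypothetical relation weights: how strongly each kind of link between a
# person and a concept should count. The numbers are illustrative only.
RELATION_WEIGHT = {
    "hasSkill": 1.0,               # a skill actually exercised
    "workedFor": 0.8,              # an employer or client
    "usedDataFrom": 0.6,           # a system the candidate took data from
    "workedOnProjectOfType": 0.3,  # a project goal, not a claimed expertise
}

# Hypothetical taxonomy: concept -> set of broader concepts.
BROADER = {
    "T-SQL": {"SQL"},
    "PL/SQL": {"SQL"},
    "Virgin Mobile": {"Telecommunications"},
    "SingleView": {"Telecom billing platform"},
    "Telecom billing platform": {"Telecommunications"},
}

# The CV from the example, as (relation, concept) facts.
CV = [
    ("hasSkill", "T-SQL"),
    ("hasSkill", "PL/SQL"),
    ("workedFor", "Virgin Mobile"),
    ("usedDataFrom", "SingleView"),
    ("workedOnProjectOfType", "Data Warehousing"),
]


def ancestors(concept):
    """Return the concept itself plus everything reachable via BROADER."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return seen


def score(cv, wanted):
    """For each wanted concept, keep the best-weighted CV fact implying it."""
    result = {}
    for concept in wanted:
        best = 0.0
        for relation, target in cv:
            if concept in ancestors(target):
                best = max(best, RELATION_WEIGHT[relation])
        result[concept] = best
    return result


scores = score(CV, ["SQL", "Telecommunications", "Data Warehousing", "UNIX scripting"])
```

With these made-up weights, "SQL" scores as a strong match through T-SQL and PL/SQL even though the token never appears on the CV, "Data Warehousing" scores as a weak match because it is only a project type, and "UNIX scripting" scores zero. Swapping the `BROADER` table for one from another sector would leave the scoring code unchanged.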
There is no evidence in the spam I get that these problems are solved, let alone the problems of owning and controlling my own data and controlling which agencies I want to trade with.
Then of course, Virgin Mobile no longer exists, which is a clue that it is less relevant and an opportunity for an incomplete taxonomy to do financial damage.
These last requirements make the problem obviously a web problem. A problem screaming for Linked Data.
By retaining control of my CV at its own URI, I would not need to keep spamming agencies; instead, they would come to me via HTTP, to where my data is, to get the latest version. If I deny an agency access to my career progress – as I would if I no longer wished to trade with them – then even if they copied the data without authorisation I’m going to be decreasingly likely to get spammed, simply because my data is old. As each agency tries to update and gets the bad news, cost incentives will help to ensure they comply and delete cached data.
I’m assuming an ecosystem of tool vendors and community sites like LinkedIn, or the sites of professional associations such as the Professional Contractors Group or the ACM, would have a role in providing tools for me to put my CV on-line semantically and would offer search interfaces to agencies according to my preferences. I would have no problem with the ACM gaining profit from doing so, and it is natural that I tell them about my skills and interests. SPARQL endpoints are a natural publishing medium for data at this scale, and with what I imagine would be a high level of heterogeneity you would want to use something like Pellet to drive federated search while supporting reasoning and custom rules to add value.
Likewise, where product taxonomies are owned by trusted companies such as Oracle and Microsoft, they can be kept up to date by the firm, as well as being expanded by other companies and organisations such as SourceForge, Freshmeat, Codehaus, or DBpedia in a web-like way. In particular, while project hosting sites can add coverage for the projects they host, sites like DBpedia that are more general can expand the schema with arbitrary information. If the taxonomies are expressed in a common language, then unplugging technology taxonomies and replacing them with medical taxonomies would be straightforward.
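The refresh cycle described above can be sketched from the agency's side. This is a minimal illustration under stated assumptions, not a description of any existing service: `fetch` is a hypothetical callable standing in for a conditional HTTP GET against the CV's URI, `make_owner` is a stub for my own server, and the example URI is made up. The status codes are the standard ones from Python's `http.HTTPStatus`, and the sketch models an agency that chooses to comply.

```python
from http import HTTPStatus


def refresh(cache, uri, fetch):
    """Re-fetch one CV and update the agency's cache in place.

    `fetch(uri, etag)` is a hypothetical callable returning
    (status, etag, body); it stands in for a conditional HTTP GET.
    """
    cached = cache.get(uri)
    etag = cached["etag"] if cached else None
    status, new_etag, body = fetch(uri, etag)

    if status == HTTPStatus.OK:
        # Fresh copy from the owner: replace whatever was held before.
        cache[uri] = {"etag": new_etag, "body": body}
        return "updated"
    if status == HTTPStatus.NOT_MODIFIED:
        # The cached copy is still the latest version.
        return "unchanged"
    if status in (HTTPStatus.UNAUTHORIZED, HTTPStatus.FORBIDDEN,
                  HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        # Access denied or CV withdrawn: a compliant agency drops its copy.
        cache.pop(uri, None)
        return "deleted"
    # Anything else (for example a server error) is treated as temporary.
    return "retry-later"


def make_owner(body, etag, blocked=False):
    """Stub for the CV owner's server, used here instead of a real network."""
    def fetch(uri, known_etag):
        if blocked:
            return HTTPStatus.FORBIDDEN, None, None
        if known_etag == etag:
            return HTTPStatus.NOT_MODIFIED, etag, None
        return HTTPStatus.OK, etag, body
    return fetch


cache = {}
cv_uri = "https://example.org/me/cv"  # hypothetical URI
first = refresh(cache, cv_uri, make_owner("CV v1", "e1"))
second = refresh(cache, cv_uri, make_owner("CV v1", "e1"))
third = refresh(cache, cv_uri, make_owner("CV v1", "e1", blocked=True))
```

In this run the first call stores the CV, the second finds nothing new, and the third, after access is withdrawn, removes the cached copy. An agency that ignores the refusal simply keeps an ageing copy, which is the situation the paragraph above relies on.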
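To make the idea of searching career data held by several hosts a little more concrete, here is a toy sketch in plain Python. It is not SPARQL and does no reasoning; it only imitates the basic graph-pattern matching that a SPARQL query performs, over two small hypothetical graphs that are simply concatenated as a crude stand-in for federation. The identifiers (`ex:alice`, `ex:bob`), the predicates and the two hosts are invented for illustration.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`.

    Terms starting with "?" are variables; returns None on a mismatch.
    """
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data published by two different hosts.
association_graph = [
    ("ex:alice", "hasSkill", "T-SQL"),
    ("ex:alice", "availability", "looking"),
]
community_graph = [
    ("ex:bob", "hasSkill", "T-SQL"),
    ("ex:bob", "availability", "not-looking"),
]

# Who has T-SQL as a skill and has said they are looking for work?
results = query(
    association_graph + community_graph,
    [("?who", "hasSkill", "T-SQL"), ("?who", "availability", "looking")],
)
```

Here only `ex:alice` is returned, because the candidate's stated availability is part of the data being queried rather than something an agency has to guess. A real deployment would presumably use SPARQL endpoints and a reasoner, as suggested above, rather than anything like this loop.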
There is a similar opportunity for technology vendors to step in and deny taxonomy data to agencies that have angered their developer community,and likewise for tool vendors and intermediaries (the PCG for example,or unions) to deny access to candidate data that they host for members.
The combined social influence of the data owners and communities, the increase in IR accuracy, and distributed data maintenance would improve the work-seeking and worker-sourcing experience. Those seeking workers would have fewer weak or false results to contact, and those seeking work would benefit from receiving less spam.
Agencies – especially bad agencies – are the obvious losers, as they would be automated away by tools and pressured to avoid the spam they rely on today. Good agencies would be able to compete by developing better ways to deal with heterogeneity, by superior match-making, by providing better billing services, favourable contracts and better transparency, and by helping to sell opportunities to workers – i.e. to do more of what they should be doing now.
So, I hope I have explained the current problems, why solutions are necessarily semantic in a graph-oriented model, and why economically efficient and pleasant solutions are necessarily web-like, as well as the social and economic pressures that would apply to improve service quality.