Archive for the 'research' Category

Going multi-lingual

It’s a hard problem and not many people are tackling it in the web development world. Here’s some research into presenting content in different languages via HTML.

To frame the problem, here’s a good breakdown of the three technical issues:

There are three considerations for presenting HTML in non-English languages. First, that the document is delivered in the desired natural language (such as English, French, etc.) and dialect (US, British, etc.). Second, that the document is presented in the correct character set. This is a requirement for most Eastern languages (Russian, Japanese, etc.). Third, that the document is presented in the correct directionality. This is a consideration for languages such as Hebrew, Arabic, Japanese that are customarily written right-to-left or top-to-bottom.
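
In mark-up terms the three considerations come down to a language declaration, a character encoding declaration and a direction attribute. Here is a rough sketch in Python of a page shell that sets all three. The function name `html_shell` and the small `RTL_LANGUAGES` set are made up for illustration, UTF-8 is assumed as the character set, and top-to-bottom scripts are not covered because the `dir` attribute only distinguishes left-to-right from right-to-left.

```python
from html import escape

# Hypothetical, incomplete set of right-to-left language codes (illustration only).
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def html_shell(body_text: str, lang: str = "en-GB") -> str:
    """Wrap text in a page that declares language, encoding and direction."""
    primary = lang.split("-")[0].lower()
    direction = "rtl" if primary in RTL_LANGUAGES else "ltr"
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}" dir="{direction}">\n'
        '<head><meta charset="utf-8"><title></title></head>\n'
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )


page = html_shell("שלום עולם", lang="he")
# Encode explicitly so the bytes sent match the declared character set.
page_bytes = page.encode("utf-8")
```

The dialect part of the first consideration is carried by the region subtag, e.g. `en-US` versus `en-GB`.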

Continue reading ‘Going multi-lingual’

A definition of Consciousness

Suppose ‘consciousness’ could be defined as the ability of a ‘being that can learn’ to understand how it itself learns. It then has to ‘decide’, a rudimentary idea in our perception of ‘consciousness’.

I think this explains, to a degree, our ‘personality’, which we use to direct our experiences in the world and thus our learning. We, in a manner, feed our learning what it likes best: pleasurable experiences.

If this is true, what does it say about the ‘Turing test’ and therefore the way forward for AI research?

Flash as a background and the object tag drama

The Mission

Use Flash as a background image.

The Problems

There are a multitude of problems with embedding Flash into valid mark-up. Basically:

  • You can’t use the embed tag nowadays; it has been dropped in favor of the object tag. You actually can use the embed tag, but it’s not future compatible and your page won’t validate.
  • This presents problems because IE and Netscape/Firefox based browsers handle the object tag differently. If you manage to get a single object tag to load a Flash movie in both browsers, then IE seems not to stream the movie anymore.

Continue reading ‘Flash as a background and the object tag drama’

Notes for an MP3 manager

I have a large music collection in MP3 format. I’m having a hard time keeping it organised. iTunes wigs out after about 100 gig. Anyway, I hate the way it handles compilations, or maybe I just hate the way I have no control over its filing method. It’s also a pain syncing with others who also have a large collection.

Here are some notes on some features I’d like to see in MP3 collection manager software. I’m almost ready to write it myself so these might be the beginnings of a new venture.

  • Syncing collections! Choose a host and a target and list all the tracks not in the host collection. List which tracks are of a higher quality.
  • List missing tracks from incomplete albums.
  • Flag tracks as a bad copy or corrupt so that, when syncing, it will look for replacements and get them.
  • Integrate with MusicBrainz.org, freedb.org, discogs.com
  • Browse albums by cover
  • Cover art fetcher
  • Album cover is stored once in the folder, not in the tracks (optional?). Perhaps either, or transparently.
  • When there is ambiguity as to which album a track belongs to, use the other tracks in the directory to guess, i.e. try to minimise the number of albums for a given set of tracks.
  • Add extra metadata tagging such as ‘remix by’, more than one artist per track, tagging for genre and/or add a style like in discogs.com. Search, filter browse by these extra fields.
  • From this metadata, change the tagging scheme used throughout the library easily at will. Make policies like: The Cure -> Cure, The; Dj -> DJ; etc. Adapt from one library to another when syncing.
  • Store ratings for tracks. Possibly download others’ ratings and comments in an XML format (FOAF?). Perhaps upload to last.fm, or make my own site if they aren’t up for it?
  • Find people who have albums/tracks you want for trading (naughty)
  • Update iTunes when tags are edited in the library
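
The syncing idea at the top of the list could start out as something like the sketch below. Everything in it is an assumption for illustration: the `Track` record, matching tracks on lower-cased artist/album/title, and using bitrate as a crude stand-in for ‘quality’. Real matching would need fuzzier rules (and probably MusicBrainz IDs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    artist: str
    album: str
    title: str
    bitrate_kbps: int  # crude stand-in for "quality" (an assumption)

    @property
    def key(self) -> tuple:
        # Case-insensitive identity of a track across collections.
        return (self.artist.lower(), self.album.lower(), self.title.lower())


def compare_collections(host, target):
    """Return (target tracks missing from host, target tracks of higher quality)."""
    host_by_key = {t.key: t for t in host}
    missing = [t for t in target if t.key not in host_by_key]
    better = [
        t for t in target
        if t.key in host_by_key
        and t.bitrate_kbps > host_by_key[t.key].bitrate_kbps
    ]
    return missing, better


host = [Track("The Cure", "Disintegration", "Lullaby", 128)]
target = [
    Track("the cure", "Disintegration", "Lullaby", 320),
    Track("The Cure", "Disintegration", "Plainsong", 192),
]
missing, better = compare_collections(host, target)
# missing holds "Plainsong"; better holds the 320 kbps copy of "Lullaby".
```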

Blog to log

Just sniffing around I came across:

  1. Reblog: filter and republish other people’s feeds on your own site. “Useful to individuals who want to maintain a weblog but prefer curating content to writing original posts”. This is a poor way of describing a Planet or Portal, but with more editorial control.
  2. TagCloud extracts keywords from a set of given RSS feeds and builds a ‘tag cloud’ (or a ‘keyword cloud’ technically).

Rise of the editor

Beyond the self-publishing revolution to the rise of the editor: as more content comes online, good editorial will be needed to sift through it. This role will become more and more important, and sites like Slashdot might be knocked off the top of the ‘most read geek portal’ list.

I envisaged sites like Vibewire being a collection of blogs, either hosted by them or wherever, that feed into the individual channels, with the editors simply selecting from what is already published on these blogs. These channels in turn would feed into the main page.

Reblog gives anyone the power to be their own Slashdot or Vibewire.

Update: Looks like someone already has.

RSS needs Tags

TagCloud illustrates one of the main problems with RSS: it provides raw information without any means to contextualise it. With the recent rise of tags or Folksonomy (I hate that word), feeds need to be extended to allow keywords/tags to be provided with the feed. Tools to map these remote taxonomies to local ones would then be needed, but they would provide a means to manage the large volumes of information effectively.
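
As a very rough sketch of that mapping step, assume the tags arrive with each feed item as a plain list of strings and that the local vocabulary is a hand-made lookup table. `REMOTE_TO_LOCAL` and `localise_tags` are names made up for this example, not part of any feed reader.

```python
# Hypothetical mapping from other people's tags to my own vocabulary.
REMOTE_TO_LOCAL = {
    "folksonomy": "tagging",
    "tags": "tagging",
    "feeds": "rss",
    "syndication": "rss",
}


def localise_tags(remote_tags, mapping=REMOTE_TO_LOCAL):
    """Translate a feed item's tags into local ones; report the unknowns."""
    local, unmapped = set(), set()
    for tag in remote_tags:
        key = tag.strip().lower()
        if key in mapping:
            local.add(mapping[key])
        else:
            unmapped.add(key)
    return local, unmapped


local, unmapped = localise_tags(["Folksonomy", "Syndication", "drupal"])
# local is {"tagging", "rss"}; unmapped is {"drupal"}
```

The unmapped set is the interesting part: it is the list of remote tags a person (or a trusted community) still has to place in the local scheme.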

Update:
Seems someone is trying to put a name on this problem: Feed Overload Syndrome, and its solution ‘Meta-Feeds’ (I guess everyone is trying to be the first to name the next big fad). But let’s break Mr Burnham down a bit:

Burnham reckons tagging is no good because it makes tag soup, as everyone has their own tags, which is OK and that’s what so-called ‘folksonomies’ are about. Individually they are worth little, but en masse these little bits add up to more than the expert-made taxonomies, which are worth a lot (if you know how to use them). The goal, however, is to map from whatever incoming taxonomy to a local personal one, which will ultimately have more meaning to the viewer (the advantage of folksonomies).

Burnham’s solution breaks down to what Reblog is doing:

…the posts are categorized and placed into a taxonomy using advanced statistical processes such as Bayesian analysis and natural language processing

So basically machine keyword scraping and the mythical ‘natural language processing’ (oh for a computer that can understand!). This is never going to be as valuable to the end user as human tagging and won’t map perfectly to an expert taxonomy. What we really want is some sort of collaborative filtering process that maps from one folksonomy/taxonomy to another based on trust networks that the end user subscribes to. As a result of this Peer to Peer Social Networking looks promising and more like a realistic solution, or even some sort of Google ranking based on community and author.

Burnham did get one thing right, however: this mapping process will have to be external to the actual feeds, for example with a smart feed reader client that talks to a service/community to do the mapping.

Problems with Drupal and the way forward.

Drupal treats taxonomies like any other entity and so you can have as many as you like. You then associate them with module types, and when someone creates an instance of a new module (Node) they are given the option to select which term(s) (i.e. category) they want to put the data in. Modules can have more than one taxonomy associated with them. In reality all data is treated the same and the taxonomies make the bumps in the landscape. We had the problem that you couldn’t associate one bit of content with another, but someone wrote a handy module to do this. It basically allowed parent-child relationships between the data of different module types, i.e. so an ‘Article’ about a course could have ‘Events’ listed with it to book those courses. The interface became unintuitive because the admin had to make them both separately and then separately make the association between the course description and the booking listings.

I was thinking that you can globally use a flat, tag-like taxonomy for organising data and have small fixed hierarchies for building composite data types, e.g. if you want to do a booking system you could have a ‘Gig’ which can be made up of three smaller modules: ‘Address’, ‘Event’ and ‘Price Tag’ (for buying tickets). This would be a small hierarchy but grouped as one entity, and perhaps appear on the site under ‘Festival listings’ and also with the band when you look at them. Parts of the Gig could be used separately, e.g. the Event, which has a Date, could be used in a Calendar of what’s on in the Festival, and the Price Tag might appear in the shop with other band merchandise.

I’m thinking this mini hierarchy thing will present one face for entering the data for a ‘Gig’ to the three ‘sub modules’ of Gig, either as a series of forms or somehow as one form (avoiding conflicts with field names etc.) and processed as one.

The way that the mini hierarchy can be put together using Tags, I’m thinking, is that they will all be given Node IDs (all module instances will have one) and have the same Tags as the ‘Gig’ is given, but the sub modules will automatically be given the ‘Gig’ Node ID as a tag, so when you view the ‘Gig’ anything with the Gig’s Node ID as a Tag will appear with it. The only problem this leaves is ordering of content on the screen, which is no trivial matter and a problem that Drupal had too, but I’m working on an answer.
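
To make the idea concrete, here is a small sketch in Python rather than in Drupal itself. All the names (`Node`, `make_composite`, `parts_of`, the `node:<id>` tag format) are invented for illustration and nothing here is Drupal’s API. The parent’s Node ID is stored as just another tag on each sub module.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # stand-in for a database sequence


@dataclass
class Node:
    kind: str  # e.g. 'Gig', 'Event', 'Address'
    tags: set = field(default_factory=set)
    node_id: int = field(default_factory=lambda: next(_next_id))


def node_tag(node_id: int) -> str:
    # Hypothetical convention for turning a Node ID into a tag.
    return f"node:{node_id}"


def make_composite(kind, part_kinds, tags):
    """Create a parent node plus parts that share its tags and carry its ID."""
    parent = Node(kind, set(tags))
    parts = [
        Node(k, set(tags) | {node_tag(parent.node_id)}) for k in part_kinds
    ]
    return parent, parts


def parts_of(parent, all_nodes):
    """Everything tagged with the parent's Node ID appears with the parent."""
    wanted = node_tag(parent.node_id)
    return [n for n in all_nodes if wanted in n.tags]


gig, parts = make_composite(
    "Gig", ["Address", "Event", "Price Tag"], {"festival listings"}
)
site = [gig, *parts, Node("Article", {"festival listings"})]
gig_parts = [n.kind for n in parts_of(gig, site)]
calendar = [n for n in site if n.kind == "Event"]  # parts are usable on their own
```

Note that the parts come back in whatever order the store returns them, which is exactly the ordering problem mentioned above.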

There is no shelf

Shirky: Ontology is Overrated — Categories, Links, and Tags

This article looks at ontologies and compares traditional, predefined, fixed expert ontologies with the current trend of individually defined, organically growing tagging of web content after publishing, which is becoming ever more popular on the web.

It reviews the traditional methods and looks at how they are based on libraries, whose systems were designed to find a book on a shelf, so each book had to have only one place in the catalog system.

The essence of a book isn’t the ideas it contains. The essence of a book is “book.” Thinking that library catalogs exist to organize concepts confuses the container for the thing contained.

The article basically compares hierarchical systems with a flat, unstructured one. It arrives at the realisation that:

One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: mind reading, and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future.

It ends with an analysis of del.icio.us tagging and provides some statistical views of it.

My ideas from this

  • Perhaps you can get the users on a community driven site to classify the content on that site. If users are offered a system to mark articles (items) as ‘favourites/bookmarks’, they could also be offered an option to add tags so they can find them again. These tags could be globally pooled and a dynamic thesaurus could be generated between like terms/tags.
  • Music managers, such as iTunes or Winamp, should allow tag-type labelling for genre. Discogs handles this well by having ‘Genre’ as a top-level type label and then ‘Style’, which can be a list of styles an album falls into. Better would be to have this per song, as often tracks on the same album will be in different styles.
  • the semantics here are in the users, not in the system. This is not a way to get computers to understand things…The tag overlap is in the system, but the tag semantics are in the users. This is not a way to inject linguistic meaning into the machine.

    I disagree here. I believe that the ‘meaning’ of words is individual to everyone and is based on the examples of our own experience. del.icio.us is providing individual opinions of word meanings/groupings and giving an example (the URL). I think together this is closer to how our brains work than anything else. It might be argued that del.icio.us doesn’t understand this information, but what is understanding? Because del.icio.us doesn’t have a means to express its ‘understanding’, how can we say it doesn’t?

    Within its domain and only mode of expression (i.e. recommending tags based on a given one) it does understand, because it can equate tags that are similar. If this is not understanding then what is? I mean, isn’t this what we do when we are asked what a ‘dog’ is? Don’t we recall our experiences of ‘dog’ to create an ‘understanding’ in our minds, which is converted to words to relate this? Words that come together to form related meaning, as a dictionary uses words to describe words.

    I think in this way Google is in fact intelligent as it is like a concept dictionary of all the information online. ‘Intelligent’ within its domain and mode of communication.
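
The ‘dynamic thesaurus’ and the ‘recommending tags based on a given one’ ideas can both be sketched with simple co-occurrence counting. This is only a guess at a workable approach, not how del.icio.us actually does it, and the function names and sample bookmarks are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(bookmarks):
    """bookmarks: iterable of tag collections, one per bookmarked item."""
    related = defaultdict(Counter)
    for tags in bookmarks:
        unique = {t.strip().lower() for t in tags}
        # Count every ordered pair of distinct tags used on the same item.
        for a, b in permutations(unique, 2):
            related[a][b] += 1
    return related


def related_tags(related, tag, limit=3):
    """Tags most often used alongside the given one, most frequent first."""
    counts = related.get(tag.strip().lower(), Counter())
    return [t for t, _ in counts.most_common(limit)]


bookmarks = [
    {"dog", "pets", "animals"},
    {"Dog", "pets"},
    {"dog", "training"},
    {"cats", "pets"},
]
related = build_cooccurrence(bookmarks)
print(related_tags(related, "dog", limit=1))  # ['pets']
```

The more people tag the same items, the more the counts reflect shared usage rather than one person’s vocabulary.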

Assessment of Foreign Language Instructional Software

This is a summary of the article titled ‘Criteria for the Assessment of Foreign Language Instructional Software and Web Sites‘.

This article develops standards ‘for assessing language-learning software and Web sites’. It gives examples of assessments, using these standards, of all the noteworthy language packages/websites for the learning of Russian.

A note on general language acquisition:

research in second language acquisition shows us that learners need to have good, authentic input—listening to and reading comprehensible texts—and many opportunities to practice speaking by using the language to negotiate meaning in situations that resemble culturally authentic communicative contexts.

Three criteria for effective learning software are stated to be:

  1. More of the students will reach higher proficiency levels in one or another modality in the same amount of time. (This is a cognitive goal.)
  2. More of the students will be sufficiently engaged and energized in the learning process to want to continue for a longer period of time. Students will thus attain higher proficiency levels in one or more modalities than they would have if they had stopped the learning process earlier. (This is an affective goal leading indirectly to a cognitive goal.)
  3. More of the students will be able to organize their studies and thus achieve better learning outcomes. (This is a metacognitive goal leading indirectly to a cognitive goal. Software and Web sites that meet this criterion usually put at the learners’ disposal resources that might otherwise not be available or as accessible, such as online dictionaries and strategy tutorials.)

Also stipulated are:

five characteristics of pedagogical design for multimedia applications or classroom lessons:

  1. Learners must know what they are expected to do and what goals they are expected to achieve by completing the task.
  2. Learners must be adequately prepared to begin the task.
  3. Learners must be adequately trained to complete the task.
  4. Learners must be adequately tested to assess their completion of the task.
  5. Learners must be given opportunities to expand their learning beyond the task.

Finally he offers ‘two caveats’ for software designers in addition to the above which are:

  1. the integration of image and sound
  2. the power, given to the learner, to move back and forth through the lesson as he or she sees fit.

Sound advice.

Pedagogy and Customized Curriculum

The hope here is to produce some software that can be used as a learning aid that can not only track the student’s progress but adapt to it and augment it. Ideally such a method could be applied to subjects other than language acquisition, providing students with a completely personalised teacher that uses continual testing to provide feedback and direction.
Continue reading ‘Pedagogy and Customized Curriculum’