and you can look at the talk here
Well, of course I got delayed, but here are some samples of the output from the demo I presented. The demo focused on highlighting gradient intensity to determine image focus and potentially identify blurred images. This method works well with an initial test image:
The second test image is also a success:
But the third image shows how much noise this method picks up, mostly because it's a rough equivalent of the Canny edge detector, minus the initial Gaussian blur:
Overall, it serves better as a visual illustration of the methods employed than as a decent heuristic for detecting blur.
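The gradient-intensity idea boils down to a few lines. This is a minimal NumPy sketch of the metric (not the demo's actual code): a mean gradient magnitude that rises with sharp edges, but also with noise, since there is no Gaussian pre-blur as Canny would use.

```python
import numpy as np

def gradient_blur_score(image: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale image.

    Sharp images have strong edges and a high score; blurred images
    have a low one. Like the demo, this skips the Gaussian pre-blur
    that Canny uses, so high-frequency noise also inflates the score.
    """
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.hypot(gx, gy)))

# A sharp checkerboard vs. a flat (fully blurred) image of the same mean:
sharp = np.indices((64, 64)).sum(axis=0) % 2 * 255.0
blurred = np.full((64, 64), sharp.mean())
print(gradient_blur_score(sharp) > gradient_blur_score(blurred))  # True
```

A common refinement, for exactly the noise problem noted above, is to smooth the image first or to use the variance of the Laplacian instead.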
Well, this last Saturday was a blast, and even though my talk got moved, it was a pretty fun experience.
You can look at the slides from the talk here.
And you can download the demo here.
I’ll be doing a postmortem later this week, reviewing where the demo works and where it doesn’t.
Well, I’ve had an issue for a while with the speed of my downloads (around 200 kbps on all wireless downloads). Since I’ve had download-speed problems with DD-WRT in the past as well, I decided to install the Tomato RAF firmware on my router. The funny thing is that even after I changed the firmware, download speeds still hovered around 200 kbps. After Googling for a bit I started tweaking the settings, and I found that speeds went from 200 kbps to 15 Mbps after I did three things: I changed the Transmit Power to 70 mW, I changed the transmission rate from *auto to 54 Mbps, and then I disabled QoS. It turns out (on my router, at least) that QoS is the biggest culprit in reduced throughput. Anyway, I had a hard time finding a guide to optimizing the Tomato firmware, so I figured I’d put this up.
Alright, this is long overdue, but I’ve finally created a sample project for a bit of code I’ve found very useful over the last two-plus years: a generic method of searching a sizable amount of JSON data on the client side. I’ve made many optimizations to this codebase over the years, but I figured I should start at the beginning and share how I’ve optimized this control. I think this illustrates a process common to many of us: the project originally fulfilled a simple need, and then, as the scope grew and changed, I had to expand the control’s flexibility and extensibility.
Originally, I was tasked with a simple side project: a one-off internal web app to search contact information for the company I was contracting with. The one strict requirement was full-text search: if the text entered on the client side matched any portion, in any position, of a row, that row would be displayed. I started with a normal database search, but when I saw the amount of data being searched, I wondered whether it wouldn’t be more prudent to constrain the app to the client and just search and load the results as the user typed.
I took a small amount of JSON data and made a prototype app to search it. When I saw the speed with which the search code found a match, I realized I had hit upon something. I expanded the amount of data and saw that even a full multi-token search of the JSON dataset took milliseconds on the client side, even on modest hardware.
With the version I’m posting right now, the code searches around 5k records essentially instantaneously. In the most recent incarnations, the searchable JSON dataset is approaching 40k+ rows with 7+ columns. That seems like a very useful tool when you consider JSON data cached from external links. With URL rewriting and some other optimizations, I foresee the code searching tens of thousands, and potentially hundreds of thousands, of records near-instantaneously on the client side.
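The core matching rule is simple enough to sketch. The real control is JavaScript running in the browser, so this Python version (with a made-up contact schema) is only an illustration of the essential logic: every typed token must appear as a substring somewhere in the row, in any position, case-insensitively.

```python
def search(rows, query):
    """Full-text search: a row matches if every query token appears
    as a substring anywhere in the row's concatenated field values."""
    tokens = [t.lower() for t in query.split() if t]

    def row_matches(row):
        haystack = " ".join(str(v) for v in row.values()).lower()
        return all(t in haystack for t in tokens)

    return [r for r in rows if row_matches(r)]

# Hypothetical contact rows, not the project's actual schema:
contacts = [
    {"name": "Ada Lovelace", "dept": "Engineering", "ext": "4102"},
    {"name": "Grace Hopper", "dept": "Compilers", "ext": "4190"},
]
print(search(contacts, "eng ada"))  # matches the first row only
```

A linear scan like this is exactly why the approach stays fast at tens of thousands of rows: each keystroke is just substring checks over in-memory strings, with no server round trip.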
Needless to say, this kind of technology could make sites like Google infinitely more accessible. An intuitive way of searching data instantly would be a boon for many users, especially once they grasp the capabilities of such a system. I have started writing a series of posts to come after this one on the various ways I have upgraded and extended the code. I hope to post them soon. Oh, also, within the next week I’ll have a demo area up too, that way you can see what the code is doing.
You can download the project here.
Request.ServerVariables["HTTP_HOST"] + Request.ServerVariables["URL"]
Request.UserHostAddress
Request.UserAgent
Request.Browser.Browser
Request.Browser.Crawler
Request.UrlReferrer
which yields results resembling this:
Page Url: skynetsoftware.net/default.aspx
Host Address: (some IP address)
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; WOW64; Trident/4.0; Comcast Install 1.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET CLR 4.0.20506)
After this I just used my existing methods for serializing and reporting data, and it was a snap. I now have a generic page for reporting this data, but I will write some LINQ queries in the near future to give myself more detailed reports on pageview data. I’ll probably use canvas for visualization and do some other cool stuff. I’ve got a few ideas I’m rolling around, but for now I’m just satisfied that I have accurate data to analyze. In retrospect, I could contact Google and try to figure out why their data is inaccurate, but I feel slightly guilty requesting support for a feature I’m not paying for! 😉
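As a rough illustration of the kind of grouping those reports would do, here is a sketch (in Python, with hypothetical records; the real data is serialized server-side from the fields captured above) of counting pageviews per URL while filtering out crawler traffic:

```python
from collections import Counter

# Hypothetical pageview records mirroring the captured fields
# (URL plus the Request.Browser.Crawler flag); illustrative only.
views = [
    {"url": "/default.aspx", "crawler": False},
    {"url": "/default.aspx", "crawler": True},
    {"url": "/about.aspx", "crawler": False},
]

# Group non-crawler views by URL -- roughly what a LINQ
# group-by over the serialized log would produce.
human_views = Counter(v["url"] for v in views if not v["crawler"])
print(human_views.most_common())  # [('/default.aspx', 1), ('/about.aspx', 1)]
```

Separating out the crawler flag is the key step: it is exactly the distinction that makes server-side numbers diverge from script-based analytics, since crawlers never execute the tracking JavaScript.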
This entry is related to my earlier post on modeling the human brain. I was reading a post on biomimicry, and it seemed to eerily reflect my statement that we do not know much about natural mechanisms. Granted, these are great examples of optimizations we have made from observing nature, but we still have a lot to learn and build on. Progress in this area excites me, but it also aggravates me to think of what we still don’t know…
According to Professor Henry Markram, a digital model of the brain is merely 10 years away. I’ve read a few books (The Age of Spiritual Machines and The Singularity Is Near) by Ray Kurzweil, and it looks like he was pretty close when he predicted the date of the first model of the human brain. If I recall correctly, Kurzweil estimated that, per Moore’s Law, the computing power required to run a simulation of a human brain would be reached around the year 2020.
Of course I’m being optimistic in my reception of the news, but I can’t help but be excited. I’m receptive to the idea that once we can accurately scan the human brain and create a realistic simulation based on that data, we can start to analyze and replicate human consciousness from a bottom-up approach. I realize that there is a lot more to creating a true AI than merely simulating neurons, but it’s a decent start on the path to a possible singularity. I realize that from a skeptic’s point of view, Moore’s Law is broken, but with advances being made in parallel programming, I believe we can still achieve some phenomenal feats in computing power. As Professor Markram points out, as long as there are enough processors, a realistic number of neurons can be simulated.
The fact of the matter is that our knowledge of natural mechanisms is still very limited at this time. I’m sure that once we can model and analyze these mental processes, we can develop simpler algorithms to achieve the same behaviors. At that time, I expect us to make great inroads towards using subsets of these processes to accomplish tasks which seem impossible for computers to achieve now. Machine vision, fine motor skills, and many other domains would more than likely be drastically advanced using this technology. Again, the skeptic in me wants to debate the possibility of this actually occurring, but the optimist in me wants to realize the benefits this technology possibly offers. As with all matters this complex, there are many opinions, and only time will prove the correct parties right. Here’s hoping to an AI that passes the Turing test within my lifetime!
So before my blog even gets a decent following, I’m noticing spam. This is just a little information that caught my attention, so I figured I’d share it.
At 4:45 PM on 7/6/2009, I got the second comment on my blog, on a post (Behind The Blog: An Inside Look At What It Takes To Develop A Blog Engine From Scratch). I was excited until I looked at the text of the message:
“How soon will you update your blog? I’m interested in reading some more information on this issue.”
from a certain KonstantinMiller with a .cn email address and a homepage of http://www.google.com… My curiosity piqued, I fired up Google and searched for the email address entered with the comment. Lo and behold, the top result was a post from a blogger who had noticed the same thing as me and provided some pretty detailed info on the party behind the spamming (including the likelihood that this person is located in Moldova). If that weren’t enough, the rest of the first page of results had the word spam in pretty much every description.
It doesn’t stop there, though. Having verified that this seemingly innocuous comment was a seed for future spam, I was interested in figuring out the details behind the tactic. I put FEEDJIT on my site from the very beginning, since I wanted to see where people were coming from and what they were searching for to get to my blog. Knowing I could get info on recent visitors to the site, I pulled up the tracking page and looked through the log. I saw two very odd entries: one visitor who reached my site from search.live.com via the phrase “about”, and one from search.live.com via the phrase “contact”. Both visits came within 24 hours of the curious comment being posted, and apparently both originated from Moldova.
So now I’ve formed a pattern in my head. From what I can tell, the spammer starts by spidering one or more search engines for phrases common to web sites (and blogs in particular, I’m guessing). Almost every blog has an about me/us page and a contact page (where an email address can likely be obtained), so a vague search term like “about” can dredge up tons of blogs in a targeted fashion. The spider then adds a vague, innocuous-looking comment with a unique user name and email address that can be searched for at a later date. I’m guessing that if the initial comment survives long enough to get indexed by Google, the blog is probably worth spamming, as its owner is likely absent, oblivious, or not too sharp. Then they commence the full-scale assault.
The bothersome thing about this tactic is that if the party involved used a .com or another common TLD that didn’t draw attention, and a contact name and email address randomly generated from a preset list or stored after a test post, it would be nigh impossible to block them proactively. This type of initial post would slip through any Bayesian filter you could set up, and unless you flagged generic posts as spam, there’s really no way to stop it, short of manually approving every comment on your blog. I have the luxury of being able to approve comments manually, but blogs with a large following will be bothered by this immensely.
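To see why a Bayesian filter is helpless here, consider a toy log-odds score (made-up corpora and a naive whitespace tokenizer, purely illustrative): a comment built entirely from words the filter has never seen in either class scores exactly neutral.

```python
import math

# Tiny hypothetical training corpora -- real filters use far more data.
spam_docs = ["buy cheap pills now", "cheap replica watches now"]
ham_docs = ["great post about compilers", "interested in more information"]

def token_counts(docs):
    counts = {}
    for doc in docs:
        for token in doc.split():
            counts[token] = counts.get(token, 0) + 1
    return counts

spam_counts, ham_counts = token_counts(spam_docs), token_counts(ham_docs)

def log_odds(comment):
    # Sum of log P(token|spam)/P(token|ham) with add-one smoothing;
    # positive reads as spammy, negative as hammy, zero as neutral.
    score = 0.0
    s_total, h_total = sum(spam_counts.values()), sum(ham_counts.values())
    for token in comment.lower().split():
        p_spam = (spam_counts.get(token, 0) + 1) / (s_total + 1)
        p_ham = (ham_counts.get(token, 0) + 1) / (h_total + 1)
        score += math.log(p_spam / p_ham)
    return score

print(log_odds("cheap pills now") > 0)           # True: known spam words
print(log_odds("how soon will you update"))      # 0.0: every token unknown
```

Every token of the generic comment gets the same smoothed probability under both classes, so its log-odds are exactly zero, indistinguishable from ham, which is the whole point of the seed-comment tactic.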
Update: I’m still seeing generic search terms resulting in visitations, but now it’s coming from an IP in the US… Either this person is changing tactics or someone else is using a similar plan of attack.