Sunday, October 18, 2009

Category-key-words

Keywords are superior to directory hierarchies because things often fit into more than one category and there's more than one way of categorizing things.

But categories are kind of easier to browse without having to think too much.

Solution: each entry is entered using keywords, but the category system is generated automatically

-a top-level category for each key word
-the second level of each category has a sub-category for each other keyword existing in all its elements
-and so on

for example

object a, keywords b, c, e
object b, keywords d, e
object c, keywords q

hierarchy:

b (object a)
c (object a)
e (object a)
e (object a)
c (object a)
c (object a)
b (object a)
e (object a)
e (object a)
b (object a)
d (object b)
e (object b)
e (object a, object b)
d (object b)
c (object a)
b (object a)
b (object a)
c (object a)
q (object c)

it might look like a really complicated hierarchy, but my theory is that it's not so complicated when you're collapsing and expanding branches at will from the top down.

Thursday, October 15, 2009

Efficient searching for substrings

I don't know if this idea's ever been done before, so I'll post it.

It's a way to search for substrings within a large dataset. Google's code, for example, I'm sure is very efficient, but doesn't do substrings. It just indexes words. The same method wouldn't work for substrings, because the sentence "I heard it through the grape vine" has 561, give or take, possible substrings.

So here's my way. It uses SQL, but obviously using hash tables or binary trees directly would also work. It's just easier to implement it in SQL, and possibly even more efficient on account of the DBE's query optimizer and so forth.

So I'm indexing the string "I heard it through the grape vine", say it's in Record 1.

my fields are (autonumber id, int nextid, char letter, int record)
we'll call the table t, just because it's convenient.

here's what i index:

(0, 1, 'I', 1) //or normalize it to lower-case so we can do non-case-sensitive searches
(1, 2, ' ', 1)
(2, 3, 'h', 1)
(3, 4, 'e', 1)
(4, 5, 'a', 1)
(5, 6, 'r', 1)
(6, 7, 'd', 1)
(7, 8, ' ', 1)
(8, 9, 'i', 1)
(9, 10, 't', 1)
(10, 11, ' ', 1)
(11, 12, 't', 1)
(12, 13, 'h', 1)
(13, 14, 'r', 1)
(14, 15, 'o', 1)
(15, 16, 'u', 1)
(16, 17, 'g', 1)
(17, 18, 'h', 1)
(18, 19, ' ', 1)
(19, 20, 't', 1)
(20, 21, 'h', 1)
(21, 22, 'e', 1)
(22, 23, ' ', 1)
(23, 24, 'g', 1)
(24, 25, 'r', 1)
(25, 26, 'a', 1)
(26, 27, 'p', 1)
(27, 28, 'e', 1)
(28, 29, ' ', 1)
(29, 30, 'v', 1)
(30, 31, 'i', 1)
(31, 32, 'n', 1)
(32, null, 'e', 1)

now i'll search for "rape".

Now here's my search query.

select a.record from t a, t b, t c, t d on a.nextid = b.id inner join b.nextid = c.id inner join c.nextid = d.id inner join a.letter = "r" inner join b.letter = "a" inner join c.letter = "p" inner join d.letter = "e"

Or something like that. I'm actually not that experienced with SQL. But you know what I'm getting at here.

The result of that search (after it's turned into sql that will actually work) will be 1, because "rape" occurs in record 1.

---

Another thing we can do is this

(0, 1, 'I', 1, 0)
(1, 2, ' ', 1, 1)
(2, 3, 'h', 1, 2)
(3, 4, 'e', 1, 3)
(4, 5, 'a', 1, 4)
(5, 6, 'r', 1, 5)
(6, 7, 'd', 1, 6)
(7, 8, ' ', 1, 7)
(8, 9, 'i', 1, 8)

etc.

where the last field is the index within the record at which that letter occurs.
it just happens to match id and nextid-1 in this example, but it wouldn't necessarily, because the same table would include many records.

that way if I searched for "rape" i would instantly know that it occurs in Record 1, at index 24, without having to do a string search through record 1's text (which is stored as a normal sting in some other table).

Also, we might want to use ints instead of chars for the letters to speed it up.

Also, for unicode you might have to do it a little differently. The same principle still applies.

It might also be safe to forgo the "nextid" field and do "...a.id = b.id-1 inner join a.id = c.id-2 inner join a.id = d.id-3...", but i don't know which way is more efficient. i suspect this way would be much less optimized.

oh, one other thing: obviously index the "nextid" and "letter" fields.

Saturday, October 10, 2009

*Possibly* Crack-Pot Idea

Have millions of users take random photos of the night sky with their digital cameras. Let them send them in to a central server. The server uses software to automatically align/rotate them with each other and group by time of year. It averages them out in order to get a very precise image of the entire night sky. It would probably want to use this precision to be able to remove the effect of light pollution from the result.
Consider this an anti-idea..

The Virtual Species: Post-Modernism Gets Physical
or Humanity's Launch into Hyper-Reality.. Next Stop: Oblivion

It's an interesting phenomenon whereby some people actually infiltrate home computers to install programs like SETI@home or Folding@home on them without their owners' permissions. It would invoke mixed feelings in me if it happened to me: it has all the nuisance of a trojan, but the intention behind it is different. It's something like, "I COMMAND that your computer work to serve humanity (and without your knowledge or consent)." Add to that the fact that I don't even agree with the Folding@home-and-ilk initiative, and I almost want to roll my eyes at such a person's choice to boldly get on that kind of a high horse.

I don't run Folding@home, and not because I want to save electricity, but because I don't believe in it. It's just a bunch of fucking manipulation. Einstein said, "the significant problems we have cannot be solved at the same level of thinking with which we created them," and I think that maxim really applies here. Forget the Power of Creation. Forget the Heart. Forget even all the modern causes of disease. Just solve the problem with manipulation. And not even physical manpilation, just some sort of abstract conceptual manipulation. But don't use human thought to do this, use computers. But don't use *one* computer, use hundreds of thousands of these silicone machines, not even individually created by humans but rather mass-produced in factories, solving our health problems by searching for virtual solutions in some ridiculously huge abstract mathematical possibility space. If the Greys really exist, and if they actually are infertile and hence creating human-Grey hybrids in order to continue their species, then this is why: they went down the same road we're going down now a long time ago.

Fuck Folding@home, Fuck Rosetta@home, Fuck Predictor@home, Fuck Foldit, and Fuck Robetta.

Fuck It Fuck It Fuck It Fuck It Fuck It!

I can feel the healing already..

Friday, October 02, 2009

More efficient color-adjusted bulbs

Give incandescent lights a better color temperature by using phosphors on the glass, instead of the typical way of using neodymium.
If Bank Robbers Were Smart..
Bank robbers should wear all black with horizontal orange or red stripes going all the way around the body. Orange and red are 'danger' colors in nature, and this would trigger the same unconscious response in the employees and customers. The robbers would be less likely to be challenged.