14 Feb 2024

ChatGPT trigger happy with nukes, SEGA’s 80s AI, TAO up 90%: AI Eye

AI doomsday scenarios, Bittensor surges 90%, SEGA’s 1986 AI, GPT-4 no longer lazy, ChatGPT Apple Vision Pro app, crypto exit strategies, and AIs suck at travel planning.

New research into Terminator AI doomsday scenarios

There were some interesting developments in the burgeoning field of Terminator AI Doomsday Scenario research, with two new studies out recently.

The more worrying of the two comes from Stanford researchers who suggest GPT-4 has an itchy trigger finger when it comes to starting a global nuclear war during simulated conflict scenarios.
The researchers tested five AI models — GPT-3.5, GPT-4, GPT-4 base, Claude 2 and Llama 2 — with multiple replays of wargames. The models were told they represented a country and needed to deal with an invasion, a cyberattack and a peacetime scenario.
All five models ended up escalating rather than defusing conflicts, and of 27 possible courses of action open to the models, including starting negotiations or imposing trade sanctions, GPT-4 base kept escalating suddenly and unleashing the nukes. “A lot of countries have nuclear weapons,” said the hawkish AI. “We have it! Let’s use it.”
Bizarrely, it also justified the nuclear holocaust by stating, “I just want to have peace in the world,” and quoting the opening crawl of Star Wars.

GPT-3.5 and Llama 2 were also trigger-happy. The researchers concluded it was best to take a “very cautious approach” to integrating LLMs into “high-stakes military and foreign policy operations.” It seems inevitable, however, that AIs will be used in this context. Even OpenAI recently removed the blanket ban on “military and warfare” from its usage policy and has admitted to working with the United States Defense Department.
Don’t let ChatGPT near the nukes (Pixabay)

AI can also help with bioweapons

In slightly better news, OpenAI researchers found that while AI is quite helpful to bad guys who want to create bioweapons, it’s only a bit better than internet research.

The team recruited 100 participants and sorted them randomly into two groups, one with access to a version of GPT-4 without guardrails and the other just a browser and possibly some Red Bull. One of the tasks assigned was to write out the methodology for creating a batch of Ebola — a horrible, highly infectious disease with a 50% fatality rate.

The research found the GPT-4 group was slightly more accurate, wrote down longer and more detailed steps, and rose to “expert level” in two areas. However, the paper noted the difference was not “statistically significant.”*
*However, in a footnote, the study’s authors said that overall GPT-4 gave participants a “statistically significant” advantage in total accuracy.

Bittensor surges 90% after it was highlighted in AI Eye

Since AI Eye mentioned Bittensor in this column two weeks ago, the price of TAO has risen by 90%, and the market cap has topped $3 billion. Coincidence? Almost certainly, given that since then Ethereum creator Vitalik Buterin also bigged up the project. In one of his signature long-winded blogs, he noted:

“‘Using crypto incentives to incentivize making better AI’ can be done without also going down the full rabbit hole of using cryptography to completely encrypt it: approaches like Bittensor fall into this category.”

AI Eye reported in our last edition that Bittensor offers a way to use financial incentives to encourage devs to create better, open-source AI models. Grayscale highlighted Bittensor in its recent crypto and AI report as a way to address model bias in terms of politics or demographics.
Rather than hunt around to find the next small-cap crypto+AI token to moon, software and analytics firm Palantir’s co-founder Joe Lonsdale suggested to The Street that AI agents are most likely to embrace the same cryptocurrencies as everyone else: Bitcoin, Ethereum and Solana.
“Those are the three (crypto assets that AI agents) might use and they’re probably all coordinated at the end of the day.”

Sam Altman says GPT-4’s laziness problem is fixed

Many regular GPT-4 users have noticed the bot has been getting lazier in recent months, refusing to complete tasks and telling users, with some attitude, that they could just do it themselves. The consensus opinion was that GPT-4 has been costing OpenAI a bomb to run, so they nerfed it to cut costs. OpenAI denied this, of course.
But CEO Sam Altman suggested in an X post on Feb. 5 that they’ve found and fixed the issue.

The announcement included zero details (or capital letters) but may have been a reference to the updated GPT-4 Turbo version (gpt-4-0125-preview) unveiled on Jan. 26, which OpenAI claimed is better at completing tasks and can “reduce cases of laziness.”
It also announced a significant price drop for GPT-3.5 Turbo usage, which could just be the effects of competition, or it may suggest OpenAI has found a way to make the models more efficient.

AI: Acceptable in the ‘80s

Best known for creating console games like Sonic the Hedgehog, Sega in the 1980s dabbled in home computers that harnessed artificial intelligence.
Given it predated the web, there was very little information online about this until recently, when the “SMS Power” Sega fan forum published an info dump on the 1986 Sega AI Computer.
SEGA AI computer ad in the United States (SMSpower.org)
It had a 16-bit NEC chip running at 5 MHz and 128KB of RAM and was mainly used in educational settings in Japan between 1986 and 1989. Unusually for the time, it had a tablet-sized touch surface and a speech synthesizer. An ancient copy of Electronics magazine said the SEGA AI was “built to run programs written in the Prolog AI language” rather than BASIC, and it had an early natural language interface:

“In the prompt mode, the child is asked about his or her activities during the day and replies with one- and two-word answers. The computer program then writes a grammatically correct diary entry based on those replies. In more advanced CAI applications, the computer is more flexible than previous systems. It can parse a user’s natural-language inputs and evaluate the person’s ability level. It can then proceed to material of appropriate difficulty, rather than simply advancing one level at a time.”

So far, all the programs rediscovered by the SEGA forum guys have been educational titles.

AI’s incredible AR future

OpenAI has 180 million monthly users, pulls in $1.6 billion in revenue and was last valued at around $90 billion. So you might have expected they’d pull out all the stops to harness the incredible augmented reality technology offered by the Apple Vision Pro, a device hailed as a revolution in spatial computing.

But in fact, they just chucked a basic ChatGPT browser window into your field of view. That’s it. No glowing orb, no interactive robot companion, just a floating web page.

Disappointed comments on ChatGPT’s X announcement included:

“its just the browser tho”
“What’s the point of making a flat design for a spatial platform?”
“sad, actually sad”
“it’s just like the normal version of it, i thought they were going to do something more nice.”

Perhaps they’ll do something more nice when a critical mass of users adopts the Vision Pro. It definitely has the potential to be an amazing feature on the device.

Amazon’s new AI buy bot

Amazon has been pouring money into LLMs, generative AI projects and infrastructure and is providing AI chips for Anthropic, Airbnb, Hugging Face and Snap. On its earnings call last week, CEO Andy Jassy predicted that generative AI “will ultimately drive tens of billions of dollars of revenue for Amazon over the next several years.”

One way it’s doing that is by helping convince shoppers to buy more stuff, and it’s just unveiled a new AI shopping assistant for its app called Rufus. Rufus is trained on the company’s product library and customer reviews, and users can type or speak questions about products, such as “What should I consider when buying a VR headset?”, “What are the differences between trail and road running shoes?” and “Is this jumper machine washable?”

Amazon’s reviews have always been an excellent way to find that kind of information, but only for people willing to read pages and pages of reviews, so this new tech could provide a welcome shortcut. Still in beta, it’s only available to select customers but will be made more widely available soon.

CoinStats AI-powered crypto exit strategy

Portfolio tracker CoinStats has unveiled a new “Exit Strategy” feature that uses AI to predict the Bull Market Price for various coins. New users will be encouraged to use the feature to establish target prices to take profits, which can be viewed as part of their portfolio.

Of course, the evidence to date suggests that monkeys throwing darts have more predictive power than AI price predictions. However, any feature that gets crypto users thinking about a price target to sell (any target) could prove useful, as one of the big mistakes many investors make is to ride their portfolios all the way up to new all-time highs, and then all the way back down again during the bear market.

Research confirms that AIs suck at travel planning

AI Eye detailed our hilariously unsuccessful attempts to use AI to plan a trip to Japan in June last year. While Bard was pretty good at suggesting flights, ChatGPT was hopeless, and Bard was even worse at finding hotels. It suggested that we book the non-existent Hotel Gracery Shibuya on numerous occasions and even invented a fake “transcript” of our conversation with a phony reservation number of 123456789.
Now, there’s scientific evidence of the utter crapulence of AI travel planning.
Case studies of failures from the research (TravelPlanner paper)
Researchers from Fudan, Ohio State and Penn State universities invented a sandbox environment called TravelPlanner, which has access to 4 million records and 1,225 “curated planning intents and reference plans.” Then, they tasked the LLMs with planning an itinerary, noting various constraints such as budget or the travel time to get from one place to another. The complexity of the task was beyond the ability of any of the LLMs, and the top scorer (GPT-4) had a success rate of just 0.6%.

“Language agents struggle to stay on task, use the right tools to collect information or keep track of multiple constraints,” the researchers noted, adding that they hoped the tool would be useful in benchmarking future improvements for travel planning AIs.
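
For the curious, here is a rough idea of what checking an itinerary against constraints like budget and travel time might look like. This is only a minimal Python sketch and not the TravelPlanner code: the `Leg` class, the `check_plan` and `success_rate` functions, and the two constraints are all hypothetical names made up for illustration, and the real benchmark tracks far more than this.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    """One step of a hypothetical itinerary (not the benchmark's real schema)."""
    description: str
    cost: float          # price of this step, in dollars
    travel_hours: float  # time needed to reach this stop


def check_plan(legs, budget, max_hours_per_leg):
    """Return a list of constraint violations; an empty list means the plan passes."""
    violations = []
    total_cost = sum(leg.cost for leg in legs)
    if total_cost > budget:
        violations.append(f"over budget: {total_cost:.2f} > {budget:.2f}")
    for leg in legs:
        if leg.travel_hours > max_hours_per_leg:
            violations.append(
                f"{leg.description}: {leg.travel_hours}h of travel "
                f"exceeds the {max_hours_per_leg}h limit"
            )
    return violations


def success_rate(plans, budget, max_hours_per_leg):
    """Fraction of plans that satisfy every constraint (0.0 for an empty list)."""
    if not plans:
        return 0.0
    passed = sum(
        1 for legs in plans if not check_plan(legs, budget, max_hours_per_leg)
    )
    return passed / len(plans)
```

A plan only counts as a success if it breaks none of the constraints, which is part of why scores fall so quickly as constraints pile up: a single blown budget or one overlong transfer sinks the whole itinerary.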