AIDeveloper https://www.webpronews.com/developer/aideveloper/ Breaking News in Tech, Search, Social, & Business Tue, 18 Feb 2025 11:09:58 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 https://i0.wp.com/www.webpronews.com/wp-content/uploads/2020/03/cropped-wpn_siteidentity-7.png?fit=32%2C32&ssl=1 AIDeveloper https://www.webpronews.com/developer/aideveloper/ 32 32 138578674 Grok 3.0 Unveiled: A Technical Leap Forward in the AI Arms Race https://www.webpronews.com/grok-3-0-unveiled-a-technical-leap-forward-in-the-ai-arms-race/ Tue, 18 Feb 2025 11:09:11 +0000 https://www.webpronews.com/?p=611628 February 18, 2025 – The artificial intelligence landscape has been set ablaze with the official launch of Grok 3.0, the latest flagship model from Elon Musk’s xAI. Announced on Monday, February 17, at 8:00 PM Pacific Time via a live demo streamed on X, Grok 3.0 is being heralded as a game-changer in the fiercely competitive world of generative AI. With Musk dubbing it the “smartest AI on Earth” and tech leaders buzzing about its potential, this release marks a pivotal moment in AI development. From its unprecedented computational scale to its innovative training methodologies, here’s a deep dive into what makes Grok 3.0 a technical marvel—and how it stacks up against its rivals.

A Monumental Technical Achievement

At the heart of Grok 3.0’s prowess is its training infrastructure: xAI’s Colossus supercomputer, a behemoth powered by 200,000 Nvidia H100 GPUs. During the launch event, Musk revealed that Grok 3.0 was trained with ten times the computational power of its predecessor, Grok 2, and that the cluster size had doubled in just 92 days after an initial deployment of 100,000 GPUs in 122 days. This makes it the largest fully connected H100 cluster ever built, a feat xAI engineers described as “monumental” given the tight timeline.

“We didn’t have much time because we wanted to launch Grok 3 as quickly as possible,” an xAI executive explained during the demo. “We’ve used all this computing power to continuously improve the product along the way.” This scale is a significant escalation in the AI arms race, testing the limits of scaling laws—principles suggesting that larger compute and data lead to proportionally better performance. Gavin Baker, a prominent tech investor, noted on X in December 2024, “This will be the first real test of scaling laws for training, arguably since GPT-4. If scaling laws hold, Grok 3 should be a major leap forward in AI’s state of the art.”

Unlike many competitors relying on real-world data scraped from the web, Grok 3.0 leverages synthetic datasets designed to simulate diverse scenarios. Musk emphasized this shift during the World Governments Summit in Dubai on February 13, stating, “It’s trained on a lot of synthetic data and can reflect on its mistakes to achieve logical consistency.” This approach, combined with reinforcement learning and human feedback loops, aims to minimize “hallucinations”—AI-generated inaccuracies—by enabling the model to self-correct in real time. Early benchmarks showcased at the launch suggest this strategy is paying off, with Grok 3.0 outperforming rivals in science, math, and coding tasks.

What Tech and AI Leaders Are Saying

The announcement has sparked a flurry of reactions from industry luminaries. Elon Musk, ever the provocateur, claimed at the Dubai summit, “This might be the last time that an AI is better than Grok,” a bold assertion reflecting his confidence in xAI’s trajectory. During the launch, he praised the team’s efforts, saying, “Grok 3 is an order of magnitude more capable than Grok 2 in a very short period of time. It’s scary smart.”

Ethan Mollick, an AI researcher, commented on X post-launch: “Based on the announcement… X has caught up with the frontier of released models VERY quickly. If they continue to scale this fast, they are a major player.” Mollick also noted parallels with OpenAI’s playbook, suggesting xAI is adopting proven strategies while pushing boundaries with compute scale.

Not all feedback was universally glowing. Benjamin De Kraker, a former xAI engineer, had previously ranked Grok 3.0 below OpenAI’s o1 models in coding ability based on internal tests, a post that led to his resignation after xAI reportedly demanded its deletion. While this critique predates the final release, it underscores the high stakes and scrutiny surrounding Grok 3.0’s claims.

AI expert Dr. Alan D. Thompson praised Grok’s real-time data access via X integration, stating, “This feature sets it apart from competitors, offering fresh insights and potentially enhancing user experience with continuously updated information.” Meanwhile, posts on X from users like

@iruletheworldmo, claiming insider knowledge, hyped a reasoning model that “blows past full o3 scores,” amplifying anticipation.

Comparing Grok 3.0 to Rivals

Grok 3.0 enters a crowded field dominated by OpenAI’s ChatGPT (GPT-4o), Google’s Gemini, Anthropic’s Claude, and China’s DeepSeek R1. xAI showcased comparison benchmarks at the launch, asserting Grok 3.0 Reasoning surpasses Gemini 2 Pro, DeepSeek V3, and ChatGPT-4o in standardized tests like AIME 2025 (math), alongside coding and science tasks. A standout claim came from Chatbot Arena, where an early Grok 3.0 iteration (codename “chocolate”) scored 1402, the first model to break 1400, edging out OpenAI’s ChatGPT-4o-latest at 1377.

Technical Differentiators

  • Compute Scale: Grok 3.0’s 200,000-GPU training dwarfs ChatGPT-4o’s estimated 10,000–20,000 GPU cluster and DeepSeek’s leaner, cost-efficient approach. This brute-force scaling aligns with Musk’s vision of accelerating AI breakthroughs.
  • Synthetic Data & Self-Correction: Unlike GPT-4o and Gemini, which rely heavily on web-scraped data, Grok 3.0’s synthetic training reduces legal risks and biases, while its self-correcting mechanism aims for higher logical accuracy. OpenAI’s o1 and DeepSeek’s R1 also feature reasoning capabilities, but xAI claims Grok 3.0’s “Big Brain” mode offers superior adaptability.
  • Real-Time X Integration: A native advantage over rivals, Grok 3.0 pulls live data from X, making it uniquely responsive to current events—a capability ChatGPT and Gemini lack without external plugins.
  • Reasoning Models: Grok 3.0 Reasoning and its smaller sibling, Grok 3 mini Reasoning, mimic OpenAI’s o1 series by “thinking through” problems step-by-step. xAI asserts Grok 3.0 Reasoning beats o1-mini-high on AIME 2025, though independent verification is pending.

Features and Accessibility

Grok 3.0 introduces “DeepSearch,” a next-generation search engine rivaling OpenAI’s Deep Research, scanning X and the web for comprehensive answers. Multimodal capabilities—analyzing images alongside text—mirror ChatGPT-4o and Gemini, but xAI’s Flux-based image generation (enhanced by the new Auroria model) promises photorealistic precision. Voice mode, teased for release within a week, could challenge ChatGPT’s conversational edge.

Initially rolled out to X Premium+ subscribers ($50/month), Grok 3.0 also offers a standalone “SuperGrok” subscription ($30/month or $300/year) for unlimited queries and early feature access. This tiered model contrasts with ChatGPT’s broader free tier and DeepSeek’s open-source approach, potentially limiting Grok’s immediate reach.

Rival Responses

OpenAI, facing Musk’s $97.4 billion buyout bid (rejected in February), has doubled down with free reasoning models like o1. DeepSeek’s R1, built on a fraction of Western budgets, has disrupted the market, prompting xAI to accelerate Grok 3.0’s timeline. Google’s Gemini 2.0 series remains a formidable contender with its vast parameter count, though it lacks Grok’s real-time data edge.

The Bigger Picture

Grok 3.0’s launch isn’t just a technical milestone—it’s a statement. Musk’s xAI, founded in 2023, has catapulted from underdog to frontrunner in under two years, leveraging massive compute, synthetic data innovation, and X’s ecosystem. The model’s beta status—expect “imperfections at first,” Musk cautioned—belies its ambition: daily improvements aim to outpace rivals’ static updates.

Yet challenges loom. Grok’s X-centric data raises misinformation risks, a concern amplified by its less restrictive content policies. Independent benchmarks will determine if its performance claims hold against OpenAI’s polish, Google’s scale, and DeepSeek’s efficiency. Mollick’s X post hints at an API play, but its adoption remains uncertain amidst established ecosystems.

For now, Grok 3.0 stands as a testament to scaling laws’ enduring power and xAI’s relentless pace. As Musk mused during the demo, referencing Robert Heinlein’s “Stranger in a Strange Land,” “To grok is to deeply understand—and empathy is part of that.” Whether Grok 3.0 truly “groks” the world better than its rivals, it’s undeniably redefined the AI frontier. The race is far from over, but xAI has just fired a shot heard across the tech universe.

]]>
611628
Thomson Reuters Win AI Copyright Case, Spelling Trouble for AI Firms https://www.webpronews.com/thomson-reuters-win-ai-copyright-case-spelling-trouble-for-ai-firms/ Wed, 12 Feb 2025 20:35:27 +0000 https://www.webpronews.com/?p=611536 Thomson Reuters has won its case against Ross Intelligence, setting a legal precedent for how AI firms collect and use the vast quantities of data their models rely on.

The vast majority of AI companies have engaged in legally questionable behavior, hoovering up vast quantities of copyrighted data to use for training purposes. The firms have argued that fair use covers their activity, but that hasn’t stopped multiple companies and media outlets from suing various AI firms.

Thomson Reuters sued Ross Intelligence, a startup that has since shut down because of the cost of the legal battle, alleging copyright infringement. Specifically, Ross Intelligence was accused of using Thomson Reuters’ legal database as the basis for some of its AI-generated materials.

Notably, in his ruling, U.S. Circuit Judge Stephanos Bibas reversed his original decision, in which he initially ruled that a jury would need to decide the fair use aspect of the case.

A smart man knows when he is right; a wise man knows when he is wrong. Wisdom does not always find me, so I try to embrace it when it does––even if it comes late, as it did here.

I thus revise my 2023 summary judgment opinion and order in this case. See Fed. R. Civ. P. 54(b); D.I. 547, 548; Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., 694 F. Supp. 3d 467 (D. Del. 2023). Now I (1) grant most of Thomson Reuters’s motion for partial summary judgment on direct copyright infringement and related defenses, D.I. 674; (2) grant Thomson Reuters’s motion for partial summary judgment on fair use, D.I. 672; (3) deny Ross’s motion for summary judgment on fair use, D.I. 676; and (4) deny Ross’s motion for summary judgment on Thomson Reuters’s copyright claims, D.I. 683.

Case Background

Judge Biba then goes on to summarize the case, acknowledging that Thomson Reuters’ Westlaw database is one of the largest legal databases in the U.S., with the company licensing its contents to users. In an effort to build a competing database, Ross asked to license Westlaw content. Because Ross’ stated goal was to build a competitor to Westlaw, Thomson Reuters understandably declined to license its content to the firm.

In what has been a common refrain among AI firms when they can’t legally access data they want/need for their AI models, Ross moved ahead anyway.

So to train its AI, Ross made a deal with LegalEase to get training data in the form of “Bulk Memos.” Id. at 5. Bulk Memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions. D.I. 678-36 at 5–9. LegalEase sold Ross roughly 25,000 Bulk Memos, which Ross used to train its AI search tool. See D.I. 752-1 at 5; D.I. 769 at 30 (10:48:35). In other words, Ross built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes. When Thomson Reuters found out, it sued Ross for copyright infringement.

The Headnotes and Key Number System Questions

At the heart of the case was whether Ross infringed copyright by copying Westlaw headnotes based on their originality.

The headnotes are original. A headnote is a short, key point of law chiseled out of a lengthy judicial opinion. The text of judicial opinions is not copyrightable. Banks v. Manchester, 128 U.S. 244, 253–54 (1888). And even if it were, Thomson Reuters would not get that copyright because it did not write the opinions. But a headnote can introduce creativity by distilling, synthesizing, or explaining part of an opinion, and thus be copyrightable. That is why I have changed my mind.

First, the headnotes are a compilation. “Factual compilations” are original if the compiler makes “choices as to selection and arrangement” using “a minimal degree of creativity.” Feist, 499 U.S. at 348. Thomson Reuters’s selection and arrangement of its headnotes easily clears that low bar.

More than that, each headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer’s editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. 17 U.S.C. § 102(a)(5). So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor’s idea about what the important point of law from the opinion is. That editorial expression has enough “creative spark” to be original. Feist, 499 U.S. at 345. So all headnotes, even any that quote judicial opinions verbatim, have original value as individual works. That belated insight explains my change of heart. In my 2023 opinion, I wrongly viewed the degree of overlap between the headnote text and the case opinion text as dispositive of originality. 694 F. Supp. 3d at 478. I no longer think that is so. But I am still not granting summary judgment on any headnotes that are verbatim copies of the case opinion (for reasons that I explain below).

Similarly, Ross Intelligence copied Westlaw’s Key Number System, although it did not present the Key Number System to customers.

“The Key Number System is original too. There is no genuine issue of material fact about the Key Number System’s originality. Recall that Westlaw uses this taxonomy to organize its materials. Even if “most of the organization decisions are made by a rote computer program and the high-level topics largely track common doctrinal topics taught as law school courses,” it still has the minimum “spark” of originality. Id. at 477 (internal quotation marks omitted); Feist, 499 U.S. at 345. The question is whether the system is original, not how hard Thomas Reuters worked to create it. Feist, 499 U.S. at 359–60. So whether a rote computer program did the work is not dispositive. And it does not matter if the Key Number System categorizes opinions into legal buckets that any first-year law student would recognize. To be original, a compilation need not be “novel,” just “independently created by” Thomson Reuters. Id. at 345–46. There are many possible, logical ways to organize legal topics by level of granularity. It is enough that Thomson Reuters chose a particular one.

The Fair Use Issue

The biggest issue of all, however, was whether Ross’ actions fell under Fair Use, a legal doctrine that allows copyrighted material to be used under specific circumstances. In his ruling, Judge Biba reiterated that he was reversing his initial ruling, including on the fair use question, before outlining the four specific factors that must be considered.

I must consider at least four fair-use factors: (1) the use’s purpose and character, including whether it is commercial or nonprofit; (2) the copyrighted work’s nature; (3) how much of the work was used and how substantial a part it was relative to the copyrighted work’s whole; and (4) how Ross’s use affected the copyrighted work’s value or potential market. 17 U.S.C. § 107(1)–(4). The first and fourth factors weigh most heavily in the analysis. Authors Guild v. Google, Inc., 804 F.3d 202, 220 (2d Cir. 2015) (Leval, J.).

Factor One – The Purpose and Character of Ross’ Use

Judge Biba said the first factor went in favor of Thomson Rueters, ruling that Ross’ commercial intentions and lack of any type of transformative nature of Ross’ use of Westlaw data argued against fair use.

Ross’s use is not transformative. Transformativeness is about the purpose of the use. “If an original work and a secondary use share the same or highly similar purposes, and the second use is of a commercial nature, the first factor is likely to weigh against fair use, absent some other justification for copying.” Warhol, 598 U.S. at 532–33. It weighs against fair use here. Ross’s use is not transformative because it does not have a “further purpose or different character” from Thomson Reuters’s. Id. at 529.

But because Ross’s use was commercial and not transformative, I need not consider this possible element. Even if I found no bad faith, that finding would not outweigh the other two considerations.

Factor Two – The Nature of the Original Work

The second factor went in favor of Ross. This factor went back to the creativity involved in Westlaw’s headnotes, and whether they met the threshold to warrant fair use protection.

Westlaw’s material has more than the minimal spark of originality required for copyright validity. But the material is not that creative. Though the headnotes required editorial creativity and judgment, that creativity is less than that of a novelist or artist drafting a work from scratch. And the Key Number System is a factual compilation, so its creativity is limited.

So factor two goes for Ross. Note, though, that this factor “has rarely played a significant role in the determination of a fair use dispute.

Factor Three – How the Work Was Used and Was Relative to the Whole

The third factor also went in favor of Ross.

My prior opinion did not decide factor three but suggested that it leaned towards Ross. The opinion focused on Ross’s claim that its output to an end user is a judicial opinion, not a West headnote, so it “communicates little sense of the original.” 649 F. Supp. 3d at 485 (quoting Authors Guild, 804 F.3d at 223).

I stand by that reasoning, but now go a step further and decide factor three for Ross. There is no factual dispute: Ross’s output to an end user does not include a West headnote. What matters is not “the amount and substantiality of the portion used in making a copy, but rather the amount and substantiality of what is thereby made accessible to a public for which it may serve as a competing substitute.” Authors Guild, 804 F.3d at 222 (internal quotation marks omitted). Because Ross did not make West headnotes available to the public, Ross benefits from factor three.

Factor Four – The Effect of Ross Copying Westlake

Judge Biba cites Harper & Row in saying this fourth factor “is undoubtedly the single most important element of fair use.”

My prior opinion left this factor for the jury. I thought that “Ross’s use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw.” 694 F. Supp. 3d at 486. If that were true, then Ross would not be a market substitute for Westlaw. Plus, I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data. Id. And I thought a jury ought to sort out “whether the public’s interest is better served by protecting a creator or a copier.” Id.

In hindsight, those concerns are unpersuasive. Even taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute. D.I. 752-1 at 4. And it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough. Ross bears the burden of proof. It has not put forward enough facts to show that these markets do not exist and would not be affected.

The Decision

Ultimately, when taking the above factors into consideration, Judge Biba rejected Ross’ fair-use defense.

Factors one and four favor Thomson Reuters. Factors two and three favor Ross. Factor two matters less than the others, and factor four matters more. Weighing them all together, I grant summary judgment for Thomson Reuters on fair use.

I grant partial summary judgment to Thomson Reuters on direct copyright infringement for the headnotes in Appendix A. For those headnotes, the only remaining factual issue on liability is that some of those copyrights may have expired or been untimely created. This factual question underlying copyright validity is for the jury. I also grant summary judgment to Thomson Reuters against Ross’s defenses of innocent infringement, copyright misuse, merger, scenes à faire, and fair use. I deny Ross’s motions for summary judgment on direct copyright infringement and fair use. I revise all parts of my prior opinions that conflict with this one. I leave undisturbed the parts of my prior opinion not addressed in this one, such as my rulings on contributory liability, vicarious liability, and tortious interference with contract.

“We are pleased that the court granted summary judgment in our favor and concluded that Westlaw’s editorial content, created and maintained by our attorney editors, is protected by copyright and cannot be used without our consent. The copying of our content was not ‘fair use,'” the company said in a statement.

The Implications of the Decision

The implications of Judge Biba’s decision will reach far and wide within the AI industry, and should serve as a warning to AI companies throughout the industry who have engaged in similar practices.

Meta, for example, is involved in a court case in which its own internal emails detail the questions and concerns staff had about pirating more than 80 TB of tens of millions of books. Those same emails implicate OpenAI for allegedly engaging in the same behavior, including pirating books from the same sources.

AI firms have maintained that fair use covers their activities, making it legal to hoover up any and all data, regardless of copyright status. Judge Biba’s decision, on the other hand, raises major questions about that argument.

If Judge Biba’s ruling is used as a legal precedent by the many other AI copyright cases being litigated, it could spell disaster for the AI industry, leaving firms and their executives liable for untold sums in damages and even facing potential criminal charges.

]]>
611536
California AG Puts AI Firms On Notice https://www.webpronews.com/california-ag-puts-ai-firms-on-notice/ Mon, 10 Feb 2025 18:23:51 +0000 https://www.webpronews.com/?p=611502 California Attorney General Rob Bonta has issued a legal advisory, putting AI firms on notice about activities that may not be legal.

California is at the epicenter of much of the AI development within the U.S., with Silicon Valley serving as home of many of the leading AI firms. As a result, the firms fall within the jurisdiction of California, which has some of the strictest privacy laws in the country.

The legal advisory acknowledges the good that AI can be used to accomplish.

AI systems are at the forefront of the technology industry, and hold great potential to achieve scientific breakthroughs, boost economic growth, and benefit consumers. As home to the world’s leading technology companies and many of the most compelling recent developments in AI, California has a vested interest in the development and growth of AI tools. The AGO encourages the responsible use of AI in ways that are safe, ethical, and consistent with human dignity to help solve urgent challenges, increase efficiencies, and unlock access to information—consistent with state and federal law.

The advisory then goes on the describe the challenges A systems pose, and the potential threats they may bring.

AI systems are proliferating at an exponential rate and already affect nearly all aspects of everyday life. Businesses are using AI systems to evaluate consumers’ credit risk and guide loan decisions, screen tenants for rentals, and target consumers with ads and offers. AI systems are also used in the workplace to guide employment decisions, in educational settings to provide new learning systems, and in healthcare settings to inform medical diagnoses. But many consumers are not aware of when and how AI systems are used in their lives or by institutions that they rely on. Moreover, AI systems are novel and complex, and their inner workings are often not understood by developers and entities that use AI, let alone consumers. The rapid deployment of such tools has resulted in situations where AI tools have generated false information or biased and discriminatory results, often while being represented as neutral and free from human bias.

The AG’s office outlines a number of laws that govern AI use, including the state’s Unfair Competition Law, False Advertising Law, several competition laws, a number of civil rights laws, and the state’s election misinformation prevention laws.

The advisory also delves into California’s data protection laws and they role they play in AI development and use cases.

AI developers and users that collect and use Californians’ personal information must comply with CCPA’s protections for consumers, including by ensuring that their collection, use, retention, and sharing of consumer personal information is reasonably necessary and proportionate to achieve the purposes for which the personal information was collected and processed. (Id. § 1798.100.) Businesses are prohibited from processing personal information for non-disclosed purposes, and even the collection, use, retention, and sharing of personal information for disclosed purposes must be compatible with the context in which the personal information was collected. (Ibid.) AI developers and users should also be aware that using personal information for research is also subject to several requirements and limitations. (Id. § 1798.140(ab).) A new bill signed into law in September 2024 confirms that the protections for personal information in the CCPA apply to personal information in AI systems that are capable of outputting personal information. (Civ. Code, § 1798.140, added by AB 1008, Stats. 2024, ch. 804.) A second bill expands the definition of sensitive personal information to include “neural data.” (Civ. Code, § 1798.140, added by SB 1223, Stats. 2024, ch. 887.)

The California Invasion of Privacy Act (CIPA) may also impact AI training data, inputs, or outputs. CIPA restricts recording or listening to private electronic communication, including wiretapping, eavesdropping on or recording communications without the consent of all parties, and recording or intercepting cellular communications without the consent of all parties. (Pen. Code, § 630 et seq.) CIPA also prohibits use of systems that examine or record voice prints to determine the truth or falsity of statements without consent. (Id. § 637.3.) Developers and users should ensure that their AI systems, or any data used by the system, do not violate CIPA.

California law contains heightened protection for particular types of consumer data, including education and healthcare data that may be processed or used by AI systems. The Student Online Personal Information Protection Act (SOPIPA) broadly prohibits education technology service providers from selling student data, engaging in targeted advertising using student data, and amassing profiles about students, except for specified school purposes. (Bus. & Prof. Code, § 22584 et seq.) SOPIPA applies to services and apps used primarily for “K-12 school purposes.” This includes services and apps for home or remote instruction, as well as those intended for use at a public or private school. Developers and users should ensure any educational AI systems comply with SOPIPA, even if they are marketed directly to consumers.

The advisory also cites the state’s Confidentiality of Medical Information Act (CMIA) which governs how patient data is used, as well as the required disclosures before that data can be shared with outside companies.

The AG’s notice concludes by emphasizing the need or AI companies to remain vigilant about the various laws and regulations that may impact their work.

Beyond the laws and regulations discussed in this advisory, other California laws—including tort, public nuisance, environmental and business regulation, and criminal law—apply equally to AI systems and to conduct and business activities that involve the use of AI. Conduct that is illegal if engaged in without the involvement of AI is equally unlawful if AI is involved, and the fact that AI is involved is not a defense to liability under any law.

This overview is not intended to be exhaustive. Entities that develop or use AI have a duty to ensure that they understand and are in compliance with all state, federal, and local laws that may apply to them or their activities. That is particularly so when AI is used or developed for applications that could carry a potential risk of harm to people, organizations, physical or virtual infrastructure, or the environment.

Conclusion

The AG’s notice serves as a warning shot to AI firms, emphasizing that they are not above existing law, just because they are creating industry-defining technology.

Many legal issues surrounding AI are currently being decided in the court system, although some experts fear AI companies are moving so fast that any legal decisions clarifying the legality of their actions may come too late to have any appreciable effect.

California, at least, appears to be taking a tougher stance, putting firms on notice that they must adhere to existing law, or face the consequences.

]]>
611502
Red Hat Working to Integrate AI Into Fedora and GNOME https://www.webpronews.com/red-hat-working-to-integrate-ai-into-fedora-and-gnome/ Wed, 05 Feb 2025 12:00:00 +0000 https://www.webpronews.com/?p=611425 Christian F.K. Schaller, Director of Software Engineering at Red Hat, says the company is working to integrate IBM’s AI models into Fedora Workstation and GNOME.

IBM, which owns Red Hat, has been developing its Granite line of AI models, designed specifically for business applications. IBM has released Granite 3.0, its latest version, under the Apache 2.0 license, a permissive license that makes it ideal for open source projects.

Schaller says Red Hat is working to incorporate Granite into Fedora and GNOME, giving Linux users access to a variety of AI-powered tools.

One big item on our list for the year is looking at ways Fedora Workstation can make use of artificial intelligence. Thanks to IBMs Granite effort we know have an AI engine that is available under proper open source licensing terms and which can be extended for many different usecases. Also the IBM Granite team has an aggressive plan for releasing updated versions of Granite, incorporating new features of special interest to developers, like making Granite a great engine to power IDEs and similar tools. We been brainstorming various ideas in the team for how we can make use of AI to provide improved or new features to users of GNOME and Fedora Workstation. This includes making sure Fedora Workstation users have access to great tools like RamaLama, that we make sure setting up accelerated AI inside Toolbx is simple, that we offer a good Code Assistant based on Granite and that we come up with other cool integration points.

Wayland Improvements

Schaller goes on to detail several other improvements, starting with Wayland, the successor to the X11 window manager. Last year saw a bit of drama with Wayland development, with GNOME developers often accused of holding up progress, or blocking protocols they don’t see a need for within GNOME itself.

Schaller addresses those issues, highlighting the value of the “ext” namespace for extensions to Wayland that may not appeal to every desktop environment, but still serve a valuable purpose for some.

The Wayland community had some challenges last year with frustrations boiling over a few times due to new protocol development taking a long time. Some of it was simply the challenge of finding enough people across multiple projects having the time to follow up and help review while other parts are genuine disagreements of what kind of things should be Wayland protocols or not. That said I think that problem has been somewhat resolved with a general understanding now that we have the ‘ext’ namespace for a reason, to allow people to have a space to review and make protocols without an expectation that they will be universally implemented. This allows for protocols of interest only to a subset of the community going into ‘ext’ and thus allowing protocols that might not be of interest to GNOME and KDE for instance to still have a place to live.

Flatpak Improvements

Similarly, Flatpak saw major improvements in 2024. Flatpak is a containerized application format that includes all necessary dependencies, rather than rely on the underlying system. As a result, Flatpak is ideal for installing the latest and greatest version of a package—even on stable releases like Debian—without worrying about conflicts or risking destabilizing the system.

Because of its containerized nature, however, Flatpaks have traditionally had some limitations, such as connecting to USB devices. Schaller highlights the progress that was made, thanks to the USB portal implementation.

Some major improvements to the Flatpak stack has happened recently with the USB portal merged upstream. The USB portal came out of the Sovereign fund funding for GNOME and it gives us a more secure way to give sandboxed applications access to you USB devcices. In a somewhat related note we are still working on making system daemons installable through Flatpak, with the usecase being applications that has a system daemon to communicate with a specific piece of hardware for example (usually through USB). Christian Hergert got this on his todo list, but we are at the moment waiting for Lennart Poettering to merge some pre-requisite work into systemd that we want to base this on.

Other Improvements

Schaller touts the additional improvements being made, including to High Dynamic Range (HDR), PipeWire audio server, MIPI camera support, accessibility, Firefox, and the GNOME Software software app.

Fedora’s developers have made it clear that they want the distro, which serves as an upstream for Red Hat Enterprise Linux, to be “the best community platform for AI.” Integrating IBM’s Granite is a major step in that direction.

]]>
611425
SoftBank May Invest $25 Billion In OpenAI https://www.webpronews.com/softbank-may-invest-25-billion-in-openai/ Fri, 31 Jan 2025 17:57:58 +0000 https://www.webpronews.com/?p=611363 SoftBank is the latest company interested in making a massive investment in OpenAI, with the company reportedly looking to invest as much as $25 billion.

OpenAI has been wooing investors as it continues to spend money at an extraordinary rate in its quest for true artificial intelligence. Microsoft has been one of the company’s largest investors, but the relationship between the two companies appears to be cooling.

According to Financial Times, via TechCrunch, SoftBank could invest between $15 and $25 billion dollars in the AI firm. The investment would be in addition to the $15 billion it plans to invest in the US Stargate AI project.

As the outlets point out, the investment would be SoftBank’s largest since its failed WeWork bet. What’s more, the investment would also serve to give OpenAI more independence from Microsoft.

]]>
611363
AI Supercharges Developer Productivity: Transforming Code Creation to System Maintenance https://www.webpronews.com/ai-supercharges-developer-productivity-transforming-code-creation-to-system-maintenance/ Thu, 30 Jan 2025 11:38:42 +0000 https://www.webpronews.com/?p=611293 Artificial Intelligence (AI) has become the catalyst for a productivity renaissance in the high-velocity world of software development, where demand outstrips supply. For professional developers, AI isn’t just another tool; it’s a transformative force that reshapes the entire software lifecycle. Here’s how AI revolutionizes development for those at the forefront of code creation, testing, maintenance, and beyond.

Code Creation: Beyond Autocomplete

AI has transcended simple code suggestions to become an integral part of the coding process. Tools like GitHub Copilot or DeepMind’s AlphaCode now offer intelligent code completion beyond syntax, proposing entire functions or algorithms based on context, project history, and global codebases.

What was once a solitary task has evolved into pair programming with AI, where the machine suggests alternative implementations, highlights potential improvements, or alerts to security vulnerabilities in real time. This shift allows developers to bypass boilerplate code, focusing instead on high-level logic and innovative architecture.

Testing: Comprehensive and Predictive

In the realm of testing, AI has introduced a predictive element. It generates test cases, including those that human testers might not conceive, by learning from vast datasets of code, bugs, and fixes. This results in enhanced test coverage with less manual effort. AI also optimizes CI/CD pipelines by predicting which tests are most likely to fail, prioritizing them, or suggesting which tests can be safely removed, accelerating deployment cycles and improving release reliability.

Maintenance and Monitoring: From Reactive to Predictive

The maintenance phase has significantly shifted from reactive to predictive thanks to AI. Systems now monitor applications in production, detecting anomalies in performance, security, or user behavior. AI can predict potential issues before they escalate, alerting developers in time to take preventative actions. Moreover, when vulnerabilities or bugs surface, AI can suggest patches based on historical data, dramatically speeding up the resolution process. The pinnacle of this trend is self-healing systems where AI autonomously implements fixes, reducing downtime and the urgency for human intervention.

Documentation and Knowledge Management

AI also plays a crucial role in documentation, automatically updating or generating documentation as code changes, ensuring that technical documentation remains both current and comprehensive. Beyond documentation, AI enhances knowledge management by analyzing code, commit messages, and issues to build a dynamic knowledge base, which can answer developer queries about project history or architectural decisions.

Challenges and Considerations

While AI’s integration into development is largely beneficial, it presents some challenges. Developers must adapt to this new paradigm, learning to critically interpret AI’s suggestions while maintaining their creativity and problem-solving skills. There’s a delicate balance to strike to avoid over-reliance on AI, which could potentially stifle innovation or introduce biases if not managed with ethical considerations in mind.

AI is Not Replacing Developers

AI is not replacing developers but augmenting their capabilities, making them more efficient, creative, and focused on delivering value through complex problem-solving. The future of development is a symbiotic relationship between AI and human developers, where each enhances the other’s strengths. For the professional developer, mastering this integration is not just about keeping up; it’s about leading in an industry that’s increasingly intertwined with artificial intelligence.

]]>
611293
Italy Investigates DeepSeek Over Privacy Concerns https://www.webpronews.com/italy-investigates-deepseek-over-privacy-concerns/ Wed, 29 Jan 2025 14:27:54 +0000 https://www.webpronews.com/?p=611280 The Italian government is joining the growing list of entities concerned about Chinese AI startup DeepSeek, launching an investigation over privacy issues.

DeepSeek has quickly gained recognition for its impressive AI model, one that rivals the best OpenAI has to offer. Even more impressive is the fact that DeepSeek built its model for a mere $3-$5 million, a fraction of the $100 million it cost OpenAI, while doing it with second-rate Nvidia hardware.

The Italian data and privacy watchdog, the Garante Per La Protezione Dei Dati Personali (GPDP), announced it was launching an investigation of DeepSeek over “possible risk for data from millions of people in Italy.”

The GPDP made the announcement on its official website (machine translated):

The Guarantor for the protection of personal data has sent a request for information to Hangzhou DeepSeek Artificial Intelligence and to Beijing DeepSeek Artificial Intelligence, the companies that provide the DeepSeek chatbot service, both on the web platform and on the App.

Given the possible high risk for the data of millions of people in Italy, the Authority asked the two companies and their affiliates to confirm what personal data are collected, from which sources, for what purposes, what the basis is legal treatment, and whether they are stored on servers located in China.

The Guarantor also asked the companies what type of information is used to train the artificial intelligence system and, in the event that personal data is collected through web scraping activities, to clarify how users registered and those not registered in the service have been or are informed about the processing of their data.

Given that DeepSeek is a Chinese AI firm, it’s a safe bet this is not the last investigation it will face.

]]>
611280
OpenAI Unveils ‘Operator’: Your New Digital Assistant for Web Tasks https://www.webpronews.com/openai-unveils-operator-your-new-digital-assistant-for-web-tasks/ Fri, 24 Jan 2025 15:45:19 +0000 https://www.webpronews.com/?p=611176 In the hyper-competitive world of artificial intelligence, where the race for the most advanced AI agent is akin to the gold rush of yesteryears, OpenAI has just struck a new vein with the release of “Operator.” This isn’t just another AI tool; it’s your new digital sidekick, capable of navigating the internet and performing tasks for you, from booking travel to managing your online shopping list.

Launched on January 23, 2025, Operator starts its journey as a “research preview” available only to those who subscribe to OpenAI’s ChatGPT Pro tier, a $200 monthly investment into the future of AI interaction. But what does this mean for the average tech-savvy individual or enterprise? It means having an AI that isn’t just about answering questions but acting on them.

The Mechanics of Operator

Operator leverages a novel model called the Computer-Using Agent (CUA), which utilizes the vision capabilities of OpenAI’s GPT-4o model alongside advanced reasoning skills honed by reinforcement learning. This combination allows Operator to “see” websites through screenshots and interact with them via clicks, scrolls, and keystrokes, essentially emulating human navigation of the web.

The CUA model is designed to understand and manipulate graphical user interfaces (GUIs) by interpreting visual cues from browser windows. Here’s a deeper dive for the developers:

  • Vision and Interaction: Operator uses a convolutional neural network (CNN) layer to process visual inputs from screenshots, identifying actionable elements like buttons or text fields. The model then applies a decision-making algorithm, which could be likened to a mix of deep Q-learning for action selection and a transformer-based approach for understanding context.
  • API Integration: While Operator doesn’t rely on traditional APIs for interaction, developers can expect an API release that allows for integration of CUA capabilities into other applications. This API will likely include endpoints for initiating tasks, monitoring progress, and managing session data.
  • Performance Metrics: In benchmarks like OSWorld, where AI models are tested on their ability to mimic human computer use, Operator scored a 38.1%, surpassing competitors like Anthropic’s model but not yet reaching human levels (72.4%). In web navigation tasks, it boasts an 87% success rate on WebVoyager, suggesting robust performance in real-world scenarios.
  • Limitations and Adaptability: Operator’s current limitations include struggles with complex interfaces or tasks requiring nuanced human judgment. However, its design includes mechanisms for learning from user feedback, potentially improving over time through online learning techniques.

Safety in an Autonomous World

With great power comes great responsibility, and OpenAI is acutely aware of this. Operator isn’t given free rein; it operates under stringent safety protocols. For instance, it won’t send emails or alter calendar events without user intervention, aiming to prevent potential misuse or privacy breaches. OpenAI’s safety net includes both automated and human-reviewed monitoring to pause any suspicious activity, reflecting broader concerns about AI autonomy.

  • User Control: Before executing tasks with significant consequences, like making purchases, Operator requests confirmation from the user, ensuring a layer of human oversight.
  • Privacy: Operator’s design includes options to clear browsing data, manage cookies, and opt out of data collection for model improvement, all accessible through a dedicated settings panel.

The Competitive Scene

The tech world isn’t short of AI agents; Anthropic has its “Computer Use” feature, and Google is rumored to be working on similar tech. But Operator’s immediate integration into the ChatGPT ecosystem gives it a head start. The buzz on X has been palpable, with users and tech analysts alike weighing in on its potential. One notable post from

@MatthewBerman highlights, “OpenAI’s first AGENTS are here! ‘Operator’ can control a browser and accomplish real-world tasks on your behalf,” showcasing the community’s excitement and the platform’s capabilities.

Looking Ahead

OpenAI’s move with Operator isn’t just about adding another tool to its belt; it’s about redefining how we interact with technology. The company has teased further integration of Operator’s capabilities across its product lineup, hinting at a future where AI agents handle the mundane, allowing humans to focus on the creative and strategic.

  • Developer Opportunities: With plans to make CUA available through an API, developers can look forward to building applications that leverage Operator’s capabilities for automation in sectors like customer service, e-commerce, and personal productivity.
  • Scalability and Customization: The model’s architecture allows for scaling down to smaller, more specific tasks or scaling up for broader, more complex workflows, offering flexibility for different use cases.

However, the path forward for Operator is dotted with challenges. Adapting to the ever-evolving web, ensuring privacy, and managing the ethical implications of autonomous agents will be critical. Developers and tech enthusiasts are watching closely, eager to see how Operator will evolve, adapt, and perhaps, revolutionize our daily digital interactions.

As we stand on this new frontier, one thing is clear: with Operator, OpenAI isn’t just aiming to assist but to transform our digital lives, one task at a time.

]]>
611176
Microsoft: AI Will Never Be 100% Secure https://www.webpronews.com/microsoft-ai-will-never-be-100-secure/ Fri, 17 Jan 2025 20:30:48 +0000 https://www.webpronews.com/?p=611124 Anyone holding out hope that AI can be made inherently secure are in for a disappointment, with Microsoft’s research team saying it is an impossible task.

AI has been both a blessing and a curse for cybersecurity. While it can be a useful tool for analyzing code and finding vulnerabilities, bad actors are already working overtime to use AI in ever-increasingly sophisticated attacks. Beyond direct cyberattacks, AI also poses everyday risks to data security and trade secrets, thanks to how AI consumes and indexes data.

To better understand the burgeoning field, a group of Microsoft researchers tackled the question of AI security, publishing their findings in a new paper. The research was done using Microsoft’s own AI models.

In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. Due to the nascency of the field, there are many open questions about how red teaming operations should be conducted.

The team focused on eight specific areas.

  1. Understand what the system can do and where it is applied
  2. You don’t have to compute gradients to break an AI system
  3. AI red teaming is not safety benchmarking
  4. Automation can help cover more of the risk landscape
  5. The human element of AI red teaming is crucial
  6. Responsible AI harms are pervasive but difficult to measure
  7. LLMs amplify existing security risks and introduce new ones
  8. The work of securing AI systems will never be complete

AI Will Never Be 100% Secure

When discussing the eighth point, the researchers highlighted the issues involved in securing AI systems.

Engineering and scientific breakthroughs are much needed and will certainly help mitigate the risks of powerful AI systems. However, the idea that it is possible to guarantee or “solve” AI safety through technical advances alone is unrealistic and overlooks the roles that can be played by economics, break-fix cycles, and regulation.

Ultimately, the researchers conclude that the key to AI security is raising the cost of attacking, using the three specific methods cited.

Economics of cybersecurity. A well-known epigram in cybersecurity is that “no system is completely foolproof” [2]. Even if a system is engineered to be as secure as possible, it will always be subject to the fallibility of humans and vulnerable to sufficiently well-resourced adversaries. Therefore, the goal of operational cybersecurity is to increase the cost required to successfully attack a system (ideally, well beyond the value that would be gained by the attacker) [2, 26]. Fundamental limitations of AI models give rise to similar cost-benefit tradeoffs in the context of AI alignment. For example, it has been demonstrated theoretically [50] and experimentally [9] that for any output which has a non-zero probability of being generated by an LLM, there exists a sufficiently long prompt that will elicit this response. Techniques like reinforcement learning from human feedback (RLHF) therefore make it more difficult, but by no means impossible, to jailbreak models. Currently, the cost of jailbreaking most models is low, which explains why real-world adversaries usually do not use expensive attacks to achieve their objectives.

Break-fix cycles. In the absence of safety and security guarantees, we need methods to develop AI systems that are as difficult to break as possible. One way to do this is using break-fix cycles, which perform multiple rounds of red teaming and mitigation until the system is robust to a wide range of attacks. We applied this approach to safety-align Microsoft’s Phi-3 language models and covered a wide variety of harms and scenarios [11]. Given that mitigations may also inadvertently introduce new risks, purple teaming methods that continually apply both offensive and defensive strategies [3] may be more effective at raising the cost of attacks than a single round of red teaming.

Policy and regulation. Finally, regulation can also raise the cost of an attack in multiple ways. For example, it can require organizations to adhere to stringent security practices, creating better defenses across the industry. Laws can also deter attackers by establishing clear consequences for engaging in illegal activities. Regulating the development and usage of AI is complicated, and governments around the world are deliberating on how to control these powerful technologies without stifling innovation. Even if it were possible to guarantee the adherence of an AI system to some agreed upon set of rules, those rules will inevitably change over time in response to shifting priorities.

The work of building safe and secure AI systems will never be complete. But by raising the cost of attacks, we believe that the prompt injections of today will eventually become the buffer overflows of the early 2000s – though not eliminated entirely, now largely mitigated through defense-in-depth measures and secure-first design.

Additional Findings

The study cites a number of issues involved in securing AI systems, not the least of which is the common scenario of integrating AI with legacy systems. Unfortunately, trying to marry the two often results in serious security issues.

The integration of generative AI models into a variety of applications has introduced novel attack vectors and shifted the security risk landscape. However, many discussions around GenAI security overlook existing vulnerabilities. As elaborated in Lesson 2, attacks that target end-to-end systems, rather than just underlying models, often work best in practice. We therefore encourage AI red teams to consider both existing (typically system-level) and novel (typically model-level) risks.

Existing security risks. Application security risks often stem from improper security engineering practices including outdated dependencies, improper error handling, lack of input/output sanitization, credentials in source, insecure packet encryption, etc. These vulnerabilities can have major consequences. For example, Weiss et al., [49] discovered a token-length side channel in GPT-4 and Microsoft Copilot that enabled an adversary to accurately reconstruct encrypted LLM responses and infer private user interactions. Notably, this attack did not exploit any weakness in the underlying AI model and could only be mitigated by more secure methods of data transmission. In case study #5, we provide an example of a well-known security vulnerability (SSRF) identified by one of our operations.

Model-level weaknesses. Of course, AI models also introduce new security vulnerabilities and have expanded the attack surface. For example, AI systems that use retrieval augmented generation (RAG) architectures are often susceptible to cross-prompt injection attacks (XPIA), which hide malicious instructions in documents, exploiting the fact that LLMs are trained to follow user instructions and struggle to distinguish among multiple inputs [13]. We have leveraged this attack in a variety of operations to alter model behavior and exfiltrate private data. Better defenses will likely rely on both system-level mitigations (e.g., input sanitization) and model-level improvements (e.g., instruction hierarchies [43]).

While techniques like these are helpful, it is important to remember that they can only mitigate, and not eliminate, security risk. Due to fundamental limitations of language models [50], one must assume that if an LLM is supplied with untrusted input, it will produce arbitrary output. When that input includes private information, one must also assume that the model will output private information. In the next lesson, we discuss how these limitations inform our thinking around how to develop AI systems that are as safe and secure as possible

Conclusion

The study’s conclusion is a fascinating look into the challenges involved in securing AI systems, and sets realistic expectations that organizations must account for.

Ultimately, AI is proving to be much like any other computer system, where it will become a never-ending battle of one-upmanship between security professionals and bad actors.

]]>
611124
Microsoft Creates CoreAI, a New AI Division Led by Former Facebook Exec https://www.webpronews.com/microsoft-creates-coreai-a-new-ai-division-led-by-former-facebook-exec/ Tue, 14 Jan 2025 13:00:00 +0000 https://www.webpronews.com/?p=610886 Microsoft announced it has created a new AI division, CoreAI, led by former Facebook global head of engineering Jay Parikh.

Parikh joined Microsoft in October 2024, with CEO Satya Nadella making clear at the time that Parikh was being tapped to assist with the company’s AI initiatives.

When I look to the next phase of Microsoft, both in terms of our scale and our massive opportunity ahead, it’s clear that we need to continue adding exceptional talent at every level of the organization to increase our depth and capability across our business priorities – spanning security, quality, and AI innovation.

In its latest blog post, Microsoft says the new division will focus on AI agents.

We will build agentic applications with memory, entitlements, and action space that will inherit powerful model capabilities. And we will adapt these capabilities for enhanced performance and safety across roles, business processes, and industry domains. Further, how we build, deploy, and maintain code for these AI applications is also fundamentally changing and becoming agentic.

This is leading to a new AI-first app stack — one with new UI/UX patterns, runtimes to build with agents, orchestrate multiple agents, and a reimagined management and observability layer. In this world, Azure must become the infrastructure for AI, while we build our AI platform and developer tools — spanning Azure AI Foundry, GitHub, and VS Code — on top of it. In other words, our AI platform and tools will come together to create agents, and these agents will come together to change every SaaS application category, and building custom applications will be driven by software (i.e. “service as software”).

Parikh will lead the new team as Executive Vice President.

This new division will bring together Dev Div, AI Platform, and some key teams from the Office of the CTO (AI Supercomputer, AI Agentic Runtimes, and Engineering Thrive), with the mission to build the end-to-end Copilot & AI stack for both our first-party and third-party customers to build and run AI apps and agents. This group will also build out GitHub Copilot, thus having a tight feedback loop between the leading AI-first product and the AI platform to motivate the stack and its roadmap.

Jay Parikh will lead this group as EVP of CoreAI – Platform and Tools, with Eric Boyd, Jason Taylor, Julia Liuson, Tim Bozarth, and their respective teams reporting to Jay.

Jay will work closely with Scott, Rajesh, Charlie, Mustafa, and Kevin to optimize our entire tech stack for both performance and efficiency. Additionally, Jay and team will lead our progress and work around developer productivity and Engineering Thrive across the company.

Microsoft is already one of the leaders in the burgeoning AI industry, and the company’s new division underscores its efforts to continuing innovating in the field.

]]>
610886
AI Crushes Paperwork: How Machine Learning is Revolutionizing Document Processing https://www.webpronews.com/machine-learning-document-processing/ Sun, 08 Dec 2024 11:15:00 +0000 https://www.webpronews.com/?p=610520 Paperwork sucks. It’s a time-consuming, error-prone, soul-crushing necessity in every industry. But AI is changing the game. Large Language Models and computer vision are enabling businesses to automate document processing like never before.

Take mortgage processing for example. Startups like ClosingWTF are using AI to analyze loan estimates, closing disclosures, and other mortgage documents. Their platform flags costly hidden fees, unfavorable terms, and predatory fine print so homebuyers can secure the best deal and avoid getting screwed by lenders.

Under the hood, closing.wtf leverages large language models (LLMs) to process mortgage documents. The platform uses Anthropic’s API to convert unstructured PDFs into structured JSON objects based on a predefined schema for mortgage data.

First, the documents are passed through an LLM pipeline built on state-of-the-art transformer architectures like GPT-3. 

Here’s how it works under the hood

Closing.wtf leverages large language models (LLMs) to process mortgage documents.  The platform uses Anthropic’s API to convert PDFs into structured JSON objects based on a predefined schema for mortgage data. The LLM extracts key data points such as interest rates, APR, closing costs, and PMI. Other LLM calls then check for common red flags and benchmark against industry standards to provide user-friendly insights.

It’s a killer app for the $2T+ mortgage industry, where margins are razor-thin and every basis point counts. Lenders and brokers also use it to optimize their offerings and streamline operations. The result? Lower costs and better outcomes for everyone involved.

But that’s just the tip of the iceberg. AI-powered document processing is driving innovation across sectors:

  • Healthcare: NLP models extract insights from unstructured medical records, doctors’ notes, and research papers to enable precision medicine and accelerate drug discovery.
  • Law: AI contract review platforms like Kira, eBrevia, and LawGeex reduce document review time by 20-90%. NLP identifies key clauses, analyzes risk, and ensures compliance.
  • Finance: Intelligent OCR and NLP automate trade confirmations, KYC/AML checks, loan applications, and invoice processing. ML detects fraud and enables real-time risk assessment.
  • Logistics: Computer vision and NLP process shipping labels, waybills, and customs forms to optimize supply chains. Deep learning forecasts demand and routes shipments efficiently.

The common thread?

AI is crushing paperwork across the board, and it’s a damn good thing. McKinsey estimates that 50% of current work activities are automatable with existing tech. Document processing is low-hanging fruit that’s ripe for disruption.

But automating paperwork is just the first step. The real magic happens when you mine all that data for insights. With AI, businesses can audit 100% of their documents, not just a sample. They can run deep analytics to negotiate better deals, streamline compliance, and make smarter data-driven decisions.

We’re in the early innings of the AI revolution, and document processing is one of the most impactful applications. If you’re an entrepreneur looking for your next idea, this space is on fire.

Some practical tips if you’re building a document processing AI:

  1. Leverage OpenAI, Anthropic, Coheres, or similar LLM APIs which can handle document processing.
  2. Prompt engineering is key. Craft your prompts to extract entities, map relationships, and surface key insights. Vercel’s AI SDK is particularly useful for this.
  3. Invest in human-in-the-loop processes for edge cases and continuous model improvement. Aim for 80/20 automation out of the gate.
  4. Be transparent about AI-generated insights and provide audit trails for compliance-sensitive use cases.
  5. Invest in a kick-ass UI/UX. Vercel’s V0, Bolt.New, and Loveable are great for quick react prototypes. Make it dead simple for users to upload docs, view insights, and take action. The best AI is invisible AI.

The Bottom Line

AI document processing is already driving massive efficiency gains and enabling new products and business models. And we’re just scratching the surface. It’s an exciting time to be in tech, and an even better time to let the robots handle the paperwork.

]]>
610520
Francois Chollet, Creator of Keras, Leaves Google https://www.webpronews.com/francois-chollet-creator-of-keras-leaves-google/ Sat, 16 Nov 2024 21:57:49 +0000 https://www.webpronews.com/?p=610075 Francois Chollet—an AI pioneer and the creator of Keras—is leaving Google, the latest in a string of AI pioneers to leave the company.

Keras is a Python deep learning API that bills itself as “a superpower for developers.”

The purpose of Keras is to give an unfair advantage to any developer looking to ship Machine Learning-powered apps. Keras focuses on debugging speed, code elegance & conciseness, maintainability, and deployability. When you choose Keras, your codebase is smaller, more readable, easier to iterate on. Your models run faster thanks to XLA compilation with JAX and TensorFlow, and are easier to deploy across every surface (server, mobile, browser, embedded) thanks to the serving components from the TensorFlow and PyTorch ecosystems, such as TF Serving, TorchServe, TF Lite, TF.js, and more.

According to a blog post by Bill Jia, VP of Engineering for Core ML, and Xavi Amatriain, VP of ACE (AI and Compute Enablement) said Chollet is leaving the company.

Today, we’re announcing that Francois Chollet, the creator of Keras and a leading figure in the AI world, is embarking on a new chapter in his career outside of Google. While we are sad to see him go, we are incredibly proud of his immense contributions and excited to see what he accomplishes next.

With over two million users, Keras has become a cornerstone of AI development, streamlining complex workflows and democratizing access to cutting-edge technology. It powers numerous applications at Google and across the world, from the Waymo autonomous cars, to your daily YouTube, Netflix, and Spotify recommendations.

The two executives say Chollet remains committed to contributing to Keras, while Google will continue to invest in it.

Importantly, Francois remains deeply committed to the future of Keras and its continued support for JAX, TensorFlow, and PyTorch. He will continue contributing to the project and overseeing its roadmap. The Keras team at Google will continue to collaborate with Francois in the open-source community, and wish him all the best in his future endeavors.

Google’s continued investment in Keras 3 demonstrates our commitment to support major ML frameworks and offer ML developers framework optionality. Our recent launch of Keras Hub is also a significant step towards democratizing access to powerful AI tools and accelerating the development of innovative multimodal applications.

Google has lost a number of high-profile AI researchers and pioneers in the last couple of years, a trend it needs to address if it is to remain competitive.

]]>
610075
Caitlin Kalinowski, Former Meta AR Hardwar Exec, Joins OpenAI https://www.webpronews.com/caitlin-kalinowski-former-meta-ar-hardwar-exec-joins-openai/ Thu, 07 Nov 2024 16:55:30 +0000 https://www.webpronews.com/?p=609929 Caitlin Kalinowski, former Head of AR Glasses Hardware at Meta, announced that she has joined OpenAI to lead the company’s hardware efforts.

Kalinowski made the announcement in a LinkedIn post.

I’m delighted to share that I’m joining OpenAI to lead hardware!

OpenAI and ChatGPT have already changed the world, improving how people get and interact with information and delivering meaningful benefits around the globe. AI is the most exciting engineering frontier in tech right now, and I could not be more excited to be part of this team.

In my new role, I will initially focus on OpenAI’s robotics work and partnerships to help bring AI into the physical world and unlock its benefits for humanity.

Thank you to the OpenAI team, Sam, Kevin Weil, PW, and to my friends and colleagues in engineering and beyond!

OpenAI is known to be working on AI-powered hardware, even partnering with former Apple hardware designer Sir Jony Ive. The partnership was first reported in late 2023, although there was little known about what the collaboration might produce.

Although details about the collaboration remain nearly nonexistent, OpenAI CEO Sam Altman did say he believes the partnership could redefine how people interact with technology.

“We are at a point where generative AI can not only complement but enhance user experiences in ways that were once unimaginable,” Altman stated. “Our discussions with Jony made it clear that we could do more than just create another gadget—we could redefine how people interact with technology.”

OpenAI hiring Kalinowski is a good sign that things are moving forward.

]]>
609929
Sam Altman Blames Compute Scaling for Lack of GPT-5 https://www.webpronews.com/sam-altman-blames-compute-scaling-for-lack-of-gpt-5/ Sun, 03 Nov 2024 23:03:00 +0000 https://www.webpronews.com/?p=609824 In a Reddit AMA with OpenAI’s Sam Altman, Kevin Weil, Srinivas Narayanan, and Mark Chen, Altman blamed compute scaling for the lack of newer AI models.

OpenAI, Anthropic, and Google have been in an AI arms race, each one working to unlock the next major AI breakthrough. While OpenAI has continued to iterate on GPT-4, it no longer has a dominant lead, with Anthropic’s Claude going toe-to-toe with ChatGPT and besting it at times.

In the AMA, Altman was asked why the company has not yet released GPT-5.

we are prioritizing shipping o1 and its successors.

all of these models have gotten quite complex and we can’t ship as many things in parallel as we’d like to. (we also face a lot of limitations and hard decisions about we allocated our compute towards many great ideas.)

don’t have a date for AVM vision yet.

Compute and energy demands are increasingly becoming an issue for AI firms, threatening to derail many companies’ climate change commitments. Microsoft and Amazon have been investing in nuclear energy in an effort to meet those needs.

On OpenAI’s Leadership Exodus

Altman also addressed a question regarding OpenAI’s ongoing loss of some of its co-founders and top researchers.

While we are sad to not have some of the people we had worked with closely, we have an incredibly talented team and many new amazing people who have joined us recently as well. And we keep shipping which is really important 🙂

On AI’s Ability to Change a Founder’s Role

Another user asked AI could be used to augment a founder’s role and entrepreneurship.

extremely excited about this!

if a founder can be 10x as productive, we will have a lot more (and better startups). this works better than having a founding team of 10 people in many ways (less coordination overhead, for example).

although a 10x productivity gain is still far in the future, i believe it will happen. the resulting economic acceleration in general, and for startups in particular, will be great.

The entire AMA provides quite a bit of insight into OpenAI and Sam Altman’s take on the AI field.

]]>
609824
Google Using AI to Write More Than a Quarter of Its Code https://www.webpronews.com/google-using-ai-to-write-more-than-a-quarter-of-its-code/ Wed, 30 Oct 2024 21:54:44 +0000 https://www.webpronews.com/?p=609658 Alphabet CEO Sunda Pichai made a surprising revelation, saying Google is now using AI to write more than a quarter of all new code.

Pichai made the statement in the company’s Q3 2024 earnings call.

We’re also using AI internally to improve our coding processes, which is boosting productivity and efficiency. Today, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers. This helps our engineers do more and move faster.

Pichai’s statement echoes a recent interview of Google co-founder Sergey Brin, in which he touted the benefits of using AI to generate code, even saying he didn’t feel the company’s engineers were using it enough.

“I think that AI touches so many different elements of day-to-day life, and sure, search is one of them,” [Brin said](“I think that AI touches so many different elements of day-to-day life, and sure, search is one of them,” Brin said. “But it kind of covers everything. For example, programming itself, the way that I think about it is very different now.). “But it kind of covers everything. For example, programming itself, the way that I think about it is very different now.

“Writing code from scratch feels really hard, compared to just asking the AI to do it,” Brin continued, to laughter from the audience. “I’ve written a little bit of code myself, just for kicks, just for fun. And then sometimes I’ve had the AI write the code for me, which was fun.”

“They were kind of impressed because they don’t honestly use the AI tools for their own coding as much as I think they ought to,” he added.

Gemini Insights

Pichai said Gemini use was growing dramatically, both in users and across the company’s platforms.

Our research teams also drive our industry-leading Gemini model capabilities, including long context understanding, multimodality, and agentive capabilities. By any measure — token volume, API calls, consumer usage, business adoption — usage of the Gemini models is in a period of dramatic growth. And our teams are actively working on performance improvements and new capabilities for our range of models. Stay tuned!

And they’re building out experiences where AI can see and reason about the world around you. Project Astra is a glimpse of that future. We’re working to ship experiences like this as early as 2025.

We then work to bring those advances to consumers and businesses: Today, all seven of our products and platforms with more than 2 billion monthly users use Gemini models. That includes the latest product to surpass the 2 billion user milestone, Google Maps. Beyond Google’s own platforms, following strong demand, we’re making Gemini even more broadly available to developers. Today we shared that Gemini is now available on GitHub Copilot, with more to come.

Google has made a change to the Gemini team’s organization, in an effort to speed up further development.

We recently moved the Gemini app team to Google DeepMind to speed up deployment of new models, and streamline post-training work. This follows other structural changes that have unified teams in research, machine learning infrastructure and our developer teams, as well as our security efforts and our Platforms and Devices team. This is all helping us move faster. For instance, it was a small, dedicated team that built Notebook LM, an incredibly popular product that has so much promise.

Conclusion

Overall, Pichai’s comments are a fascinating look into how AI is revolutionizing Big Tech, giving companies the ability to develop and innovate at a faster pace.

]]>
609658
How AI-Driven Amazon Q Developer Streamlines Code, Testing, and Security https://www.webpronews.com/how-ai-driven-amazon-q-developer-streamlines-code-testing-and-security/ Sun, 20 Oct 2024 13:49:32 +0000 https://www.webpronews.com/?p=609165 As development teams face increasing pressure to deliver high-quality code rapidly, tools that help streamline processes are becoming essential. Amazon Q Developer, an AI-powered assistant from AWS, is one such tool that promises to transform the development landscape by automating tasks such as code comprehension, testing, and debugging, while enhancing overall productivity.

In a recent demonstration, Betty Zheng, Senior Developer Advocate at AWS, showcased the potential of Amazon Q Developer to optimize various development tasks, offering a glimpse of what AI-driven development can achieve for developers working on cloud-native applications.

Catch our conversation on AI-Driven Amazon Q Developer!

 

Understanding Complex Code with Amazon Q Developer

One of the standout features of Amazon Q Developer is its ability to comprehend and summarize code in ways that allow developers to quickly grasp the architecture of new projects. Developers often face the challenge of onboarding into large, unfamiliar codebases, but Amazon Q mitigates this by parsing complex files like pom.xml and generating clear, actionable summaries. As Zheng points out, “Amazon Q helps us quickly understand the project metadata, dependencies, and build configurations in a matter of seconds.”

In her demonstration, Zheng explains how Amazon Q integrates seamlessly with popular IDEs such as VS Code and JetBrains, providing real-time explanations of the code at hand. For example, when inspecting a Spring Framework-based application, developers can simply highlight a section of code and ask Amazon Q to explain it. “This helps reduce the cognitive load on developers and allows them to focus on building and improving the application,” says Zheng.

The ability to break down complex code into simple, understandable steps is particularly useful when collaborating across teams. Amazon Q’s conversational AI can generate documentation on the fly, creating comments or JavaDoc strings for public methods. As Zheng illustrates, this feature significantly reduces the time needed for documentation, enhancing collaboration between team members.

Automated Debugging and Unit Testing

Debugging and testing are integral but time-consuming parts of software development. Amazon Q accelerates these tasks by identifying bugs, suggesting fixes, and even generating unit tests to ensure code quality. Zheng demonstrates how Amazon Q spotted an issue in a word-guessing game application, where the word selection was not functioning as expected. “By simply sending the problem code to Amazon Q, the tool provided a corrected version of the function, which we could immediately test and deploy,” Zheng explains.

The automated generation of unit tests is another powerful capability. Amazon Q creates comprehensive test cases to verify the correctness of functions, which not only improves code reliability but also boosts developer productivity by eliminating the need for manual test creation. “Unit testing is essential, but it can be a tedious task. With Amazon Q, we can generate these tests much more efficiently, ensuring higher code quality without slowing down the development process,” adds Zheng.

Additionally, Amazon Q enables continuous feedback during the development process by performing security scans. As Zheng notes, “The AI detects potential vulnerabilities and suggests fixes, ensuring that developers are writing secure code from the start.” This early detection of security risks helps teams maintain secure code without waiting until later stages of development when the cost of fixing issues is higher.

Streamlined Feature Development with Natural Language

Perhaps one of the most transformative features of Amazon Q Developer is its ability to take natural language input and translate it into functional code. In her demo, Zheng illustrates how developers can simply describe a new feature in plain English—such as adding a difficulty selection to the word-guessing game—and Amazon Q will automatically break down the request into logical steps. “The tool follows existing code patterns, reuses code where appropriate, and generates the necessary code to implement the new feature,” Zheng explains.

This capability allows teams to iterate quickly on new ideas without getting bogged down in the details of implementation. By interacting with Amazon Q using natural language, developers can go from concept to deployment in a fraction of the time it would take using traditional methods. As Zheng puts it, “You can build and test new features without leaving your IDE, making the entire development process more fluid and efficient.”

Improving Code Quality and Security

In addition to streamlining development tasks, Amazon Q helps improve overall code quality and security. Its real-time code scanning capabilities allow it to identify inefficiencies and potential vulnerabilities as developers write code. Zheng demonstrated how the tool scans for common security issues, offers best practices for remediation, and provides detailed explanations of the detected problems.

The value of this continuous scanning cannot be overstated. Longer feedback loops, especially when it comes to security issues, can lead to costly context-switching for developers. Amazon Q eliminates these delays by providing immediate feedback within the IDE, ensuring that developers can address issues as they arise rather than waiting until a formal code review or testing phase.

Moreover, Amazon Q ensures that developers are always working with the latest, most secure versions of their dependencies by automating package upgrades. This feature is especially critical for teams managing large projects with numerous dependencies, as it helps mitigate risks associated with outdated or vulnerable packages.

AI-Driven Development is Just Getting Started

Amazon Q Developer exemplifies the direction in which modern development workflows are headed. By leveraging AI, Amazon Q enhances every stage of the development lifecycle—from code comprehension and debugging to feature creation and security optimization. As Zheng highlights, “It turns tasks that would have taken days into actions that can be completed in just a few minutes.”

The implications for development teams are profound. With AI handling much of the heavy lifting, developers can focus on innovation and strategy rather than getting bogged down in routine tasks. This acceleration in the development process not only reduces time to market but also improves code quality, security, and maintainability.

In a fast-paced, competitive landscape, tools like Amazon Q Developer will be essential for teams looking to stay ahead. Whether you’re working on cloud-native applications or complex enterprise solutions, the integration of AI into your workflow can provide a critical advantage. Amazon Q Developer is leading this charge, demonstrating that AI-driven development is not a distant future—it’s happening now.

]]>
609165
Nobel Prize Winner Geoffrey Hinton Proud Ilya Sutskever Fired Sam Altman https://www.webpronews.com/nobel-prize-winner-geoffrey-hinton-proud-ilya-sutskever-fired-sam-altman/ Thu, 10 Oct 2024 18:00:59 +0000 https://www.webpronews.com/?p=609353 Dr. Geoffrey Hinton, widely considered the “Godfather of AI,” says he is particularly proud of former student Ilya Sutskever for firing OpenAI CEO Sam Altman in 2023.

Sutskever was one of several OpenAI board members who led a coup against Altman in 2023, ousting him from the company. Pressure, from both inside and outside the company, ultimately led to Altman’s return, with Sutskever eventually leaving himself.

At the time of Altman’s ouster, reports indicated that Sutskever and the other board members were concerned that Altman was straying too far from OpenAI’s primary goal of safe AI development. The board felt Altman was pursuing profit at the expense of safety, a narrative that has been repeated by other executives who have left the company in recent months.

Hinton is the latest to lend weight those concerns. In a video post following his Nobel Prize win, Hinton touted the students he had over the years, particularly calling out Sutskever.

“I’d also like to acknowledge my students,” Hinton says in the video. “I was particularly fortunate to have many very clever students, much clever than me, who actually made things work. They’ve gone on to do great things.

“I’m particularly proud of the fact that one of my students fired Sam Altman, and I think I better leave it there and leave it for questions.”

Hinton then goes on to describe why Sutskever was involved in firing Altman.

“So OpenAI was set up with a big emphasis on safety,” he continues. “Its primary objective was to develop artificial general intelligence and ensure that it was safe.

“One of my former students Ilya Sutskever, was the chief scientist. And over time, it turned out that Sam Altman was much less concerned with safety than with profits. And I think that’s unfortunate.”

Hinton has long been a vocal advocate for need to develop AI with safety concerns front and center. He previously worked on AI at Google, before leaving the company and sounding the alarm over its rushed efforts to catch up with OpenAI and Microsoft.

Since leaving Google, Hinton has warned of the danger AI poses, saying efforts need to be taken to ensure it doesn’t gain the upper hand.

“The idea that this stuff could actually get smarter than people — a few people believed that,” Dr. Hinton said. “But most people thought it was way off. And I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that.

“I don’t think they should scale this up more until they have understood whether they can control it,” he added.

]]>
609353
The Unstoppable Rise of OpenAI’s o1 Models—And Why Experts Are Worried https://www.webpronews.com/the-unstoppable-rise-of-openais-o1-models-and-why-experts-are-worried/ Sat, 21 Sep 2024 11:05:04 +0000 https://www.webpronews.com/?p=608660 OpenAI’s newest release of the o1 models is nothing short of a game-changer in the artificial intelligence (AI) landscape. With capabilities far beyond anything seen before, these models are poised to revolutionize industries like healthcare, finance, and education. But along with these extraordinary abilities come serious questions about potential risks, including concerns over AI safety and the implications of wielding such power without sufficient oversight.

Tech executives across sectors are watching these developments closely, as the o1 models represent a significant leap in AI’s ability to handle complex reasoning tasks. However, the models also challenge established notions about the future of AI governance and raise questions about the ethical implications of deploying such powerful technology.

Listen to our conversation on the rise of OpenAI’s o1 models. Should you be worried?

 

The Unprecedented Capabilities of the o1 Models

The o1 series, which includes the o1-preview and o1-mini models, is a significant breakthrough in generative AI. As Timothy B. Lee, an AI journalist with a master’s in computer science, noted in a recent article, “o1 is by far the biggest jump in reasoning capabilities since GPT-4. It’s in a class of its own.” These models have demonstrated the ability to solve complex reasoning problems that were previously beyond the reach of earlier iterations of AI.

One of the most impressive aspects of the o1 models is their ability to handle multi-step reasoning tasks. For example, the models excel at breaking down complex programming problems into manageable steps, as OpenAI demonstrated during the launch event. By thinking step-by-step, the o1-preview model can solve intricate problems in fields like computer programming and mathematics, offering solutions far faster and with more accuracy than previous models.

This improvement is largely due to OpenAI’s use of reinforcement learning, which teaches the model to “think” through problems and find solutions in a more focused, precise manner. The shift from imitation learning, which involved mimicking human behavior, to reinforcement learning has allowed o1 to excel where other models struggle, such as in logic-heavy tasks like writing bash scripts or solving math problems.

A Double-Edged Sword: Are the o1 Models a Threat?

Despite these extraordinary capabilities, concerns about the potential dangers of the o1 models have been raised within the AI community. While OpenAI has been relatively reserved in discussing the risks, an internal letter from OpenAI researchers last year sparked considerable debate. The letter, which was leaked to Reuters, warned that the Q* project—which evolved into the o1 models—could “threaten humanity” if not properly managed. Although this might sound like a plot from a science fiction novel, the fears stem from the growing autonomy and reasoning power of these systems.

Much of the concern revolves around the speed and scale at which the o1 models can operate. By solving problems that require advanced reasoning—tasks once thought to be the exclusive domain of human intellect—the o1 models may introduce new risks if deployed irresponsibly. As Lee wrote in his analysis, “The o1 models aren’t perfect, but they’re a lot better at this [complex reasoning] than other frontier models.”

This has led to a broader conversation about AI safety and governance. While OpenAI has implemented safety protocols to mitigate risks, many industry leaders and researchers are pushing for more robust regulations to prevent the misuse of such powerful technologies. The question remains: Are we ready for AI systems that can think more critically and deeply than any model before?

Why Reinforcement Learning Makes o1 Different

The technical foundation of the o1 models is a significant departure from earlier AI systems. As Lee explains, the key to o1’s success lies in the use of reinforcement learning. Unlike imitation learning, which trains models to replicate human behavior based on predefined examples, reinforcement learning enables the model to learn from its mistakes and adapt in real-time. This capability is crucial for handling multi-step reasoning tasks, where a single mistake could derail the entire process.

To illustrate the difference, consider a basic math problem: “2+2=4.” In imitation learning, the model would simply memorize this equation and reproduce it when prompted. However, if the model were asked to solve a more complex equation, like “2+5+4+5-12+7-5=,” it might struggle because it has not learned how to break down complex problems into simpler parts.

Reinforcement learning addresses this issue by teaching the model to solve problems step by step. In the case of the o1 models, this has resulted in the ability to solve advanced math problems and write complex code, as seen in OpenAI’s demonstrations. This approach has allowed the o1 models to outperform even human experts in specific tasks, making them an invaluable tool for businesses that require deep, multi-step reasoning capabilities.

The Limitations: Where o1 Still Falls Short

Despite its many strengths, the o1 models are not without limitations. One of the most notable areas where the models struggle is spatial reasoning. In tests involving tasks that required a visual or spatial understanding—such as navigation puzzles or chess problems—both the o1-preview and o1-mini models produced incorrect or nonsensical answers.

For example, when asked to solve a chess problem, the o1-preview model recommended a move that was not only incorrect but also illegal in the game of chess. This highlights a broader issue with current AI systems: while they can excel at text-based reasoning tasks, they struggle with problems that require an understanding of physical or spatial relationships.

This limitation is a reminder that, despite the advancements in AI, we are still far from achieving a truly general artificial intelligence that can reason about the world in the same way humans do. As Lee pointed out, “The real world is far messier than math problems.” While o1’s ability to solve complex reasoning problems is impressive, it remains limited in its ability to navigate the complexities of real-world scenarios that involve spatial reasoning or long-term memory.

The Implications for Tech Executives: A Call for AI Governance

For tech executives, the release of the o1 models presents both an opportunity and a challenge. On one hand, the models’ extraordinary capabilities could revolutionize industries ranging from finance to healthcare by automating complex, multi-step reasoning tasks. On the other hand, the potential risks associated with such powerful systems cannot be ignored.

Executives must carefully consider how to integrate these models into their operations while ensuring that robust safety protocols are in place. This is especially important in industries where AI is used to make high-stakes decisions, such as healthcare or finance. The power of the o1 models to handle complex data and offer rapid solutions is unmatched, but without proper oversight, the risks could outweigh the benefits.

OpenAI’s efforts to collaborate with AI safety institutes in the U.S. and U.K. are a step in the right direction, but more needs to be done to ensure that AI systems are developed and deployed responsibly. As the capabilities of AI continue to grow, tech executives will play a crucial role in shaping the future of AI governance and ensuring that these technologies are used for the greater good.

The o1 Models Represent a New Era for AI

The o1 models represent a new era in artificial intelligence—one where AI systems are capable of deep, multi-step reasoning that was once thought to be the exclusive domain of human cognition. For businesses, these models offer unprecedented opportunities to automate complex tasks and unlock new insights from their data. But with this power comes a responsibility to ensure that AI is used ethically and safely.

As OpenAI continues to push the boundaries of what AI can do, the question for tech executives is not just how to leverage these models for growth, but also how to navigate the ethical and regulatory challenges that come with such extraordinary technology. The future of AI is here, and it’s both exciting and uncertain.

]]>
608660
OpenAI Establishes New Safety Board—Without Sam Altman https://www.webpronews.com/openai-establishes-new-safety-board-without-sam-altman/ Tue, 17 Sep 2024 14:35:18 +0000 https://www.webpronews.com/?p=608335 OpenAI has taken a major step toward improving its safety governance, establishing a new Safety and Security Committee that does not include Sam Altman. Altman has been CEO of OpenAI since 2019, outside of a short time in November 2023. 

OpenAI has faced ongoing criticism regarding its safety processes, with notable scientists and executives leaving the company over concerns it is not doing enough to address potential threats AI may pose. The company fueled concerns even more when it disbanded the “superalignment team” responsible for evaluating potential existential threats from AI.

Listen to a podcast conversation on OpenAI’s new safety board—Without Sam Altman!

 

In a move that is sure to allay fears, the company has unveiled the new Safety and Security Committee, and provided insight into how much power it has.

As one of its initial mandates, the Safety and Security Committee conducted a 90-day review of safety and security-related processes and safeguards and made recommendations to the full Board.

Following the full Board’s review, we are now sharing the Safety and Security Committee’s recommendations across five key areas, which we are adopting. These include enhancements we have made to build on our governance, safety, and security practices.

  • Establishing independent governance for safety & security
  • Enhancing security measures
  • Being transparent about our work
  • Collaborating with external organizations
  • Unifying our safety frameworks for model development and monitoring

The first recommendation is of particular note, as it gives the Safety and Security Committee far more power than previous safety oversight measures.

The Safety and Security Committee will become an independent Board oversight committee focused on safety and security, to be chaired by Zico Kolter, Director of the Machine Learning Department with the School of Computer Science at Carnegie Mellon University, and including Adam D’Angelo, Quora co-founder and CEO, retired US Army General Paul Nakasone, and Nicole Seligman, former EVP and General Counsel of Sony Corporation. It will oversee, among other things, the safety and security processes guiding OpenAI’s model development and deployment.

The Safety and Security Committee will be briefed by company leadership on safety evaluations for major model releases, and will, along with the full board, exercise oversight over model launches, including having the authority to delay a release until safety concerns are addressed. As part of its work, the Safety and Security Committee and the Board reviewed the safety assessment of the o1 release and will continue to receive regular reports on technical assessments for current and future models, as well as reports of ongoing post-release monitoring. The Safety and Security Committee will also benefit from regular engagement with representatives from OpenAI’s safety and security teams. Periodic briefings on safety and security matters will also be provided to the full Board.

The announcement is a welcome one and represents a major shift in OpenAI’s operations. The absence of Sam Altman from the committee is another welcome move. Altman has repeatedly come under fire for decisions, such as OpenAI releasing a voice the “Sky” voice that sounded eerily like Scarlett Johansson, despite the actor declining to lend her voice to the project. Altman even sent a tweet that seemed to indicate the intention to mimic Johansson’s voice. Similarly, Altman was ousted from OpenAI in 2023 amid growing concerns that he was prioritizing the commercialization of OpenAI’s work over safe development.

In view of Altman’s past, it will be a relief to investors and employees alike that he—and the rest of OpenAI leadership—finally have proper and independent oversight.

]]>
608335
OpenAI o1 Released: A New Paradigm in AI with Advanced Reasoning Capabilities https://www.webpronews.com/openai-o1-released-a-new-paradigm-in-ai-with-advanced-reasoning-capabilities/ Thu, 12 Sep 2024 19:30:35 +0000 https://www.webpronews.com/?p=607969 In a significant leap for artificial intelligence, OpenAI has introduced its latest model, o1, which represents a major advancement in how AI approaches complex reasoning tasks. Released on September 12, 2024, OpenAI o1 is designed to “think before responding,” employing a structured process known as chain-of-thought reasoning. Unlike previous models, o1 is trained using reinforcement learning to develop problem-solving strategies that mirror human cognitive processes. This enables the model to outperform its predecessors, including GPT-4o, on a variety of tasks in mathematics, science, and coding. OpenAI’s o1 is a preview of what could be a new era of AI, where models do not simply generate answers but reason their way to solutions.

The Foundations of OpenAI o1: Reinforcement Learning and Chain-of-Thought Processing

The critical distinction between o1 and earlier models like GPT-4o lies in its use of reinforcement learning (RL), which allows the model to iteratively improve its reasoning abilities. Traditional large language models (LLMs), including GPT-4o, are trained on massive datasets to predict the next word or token in a sequence, relying heavily on statistical patterns in the data. In contrast, OpenAI o1 uses RL to solve problems more dynamically, rewarding the model for correct solutions and penalizing incorrect ones. This method enables o1 to refine its internal decision-making process.

According to Mark Chen, OpenAI’s Vice President of Research, “The model sharpens its thinking and fine-tunes the strategies it uses to get to the answer.” This approach allows o1 to break down complex problems into smaller, manageable steps, similar to how a human might approach a challenging puzzle. In other words, the model doesn’t simply produce an answer—it “reasons” through the problem by analyzing multiple paths and revising its strategy as needed.

This chain-of-thought (CoT) method provides several advantages. First, it allows the model to be more transparent in its decision-making. Users can observe the step-by-step reasoning process as it unfolds, which increases the interpretability of the model’s outputs. Second, it enhances the model’s ability to handle multi-step problems. For example, when solving a mathematical problem or writing complex code, o1 iterates through each step, checking for logical consistency and correctness before moving on.

Chen explains: “The model is learning to think for itself, rather than trying to imitate the way humans would think. It’s the first time we’ve seen this level of self-reasoning in an LLM.”

Performance Benchmarks: Outperforming Humans in Science, Math, and Coding

The chain-of-thought and reinforcement learning techniques used by o1 have led to impressive results in competitive benchmarks. The model was tested against both human and machine intelligence on several reasoning-heavy tasks, and the outcomes were striking.

On the American Invitational Mathematics Examination (AIME), a test designed to challenge the brightest high school math students in the U.S., o1 achieved a 74% success rate when given a single attempt per problem, increasing to 83% with consensus voting across multiple samples. For context, GPT-4o averaged only 12% on the same exam. Notably, when allowed to process 1,000 samples with a learned scoring function, o1 achieved a 93% success rate, placing it among the top 500 students in the country.

In scientific domains, o1 demonstrated similar superiority. On GPQA Diamond, a benchmark for PhD-level expertise in biology, chemistry, and physics, o1 outperformed human PhDs for the first time. Bob McGrew, OpenAI’s Chief Research Officer, noted, “o1 was able to surpass human experts in several key tasks, which is a significant milestone for AI in academic research and problem-solving.”

In the realm of coding, o1 ranked in the 89th percentile on Codeforces, a competitive programming platform. This places the model among the top participants in real-time coding competitions, where solutions to algorithmic problems must be developed under tight constraints. The ability to apply reasoning across domains—whether in coding, math, or scientific inquiry—sets o1 apart from previous models, which often struggled with reasoning-heavy tasks.

Overcoming Traditional AI Limitations

One of the long-standing issues with AI models has been their tendency to “hallucinate”—generating plausible but incorrect information. OpenAI o1’s reinforcement learning and chain-of-thought processes help mitigate this issue by encouraging the model to fact-check its outputs during reasoning. According to Jerry Tworek, OpenAI’s Research Lead, “We have noticed that this model hallucinates less. While hallucinations still occur, o1 spends more time thinking through its responses, which reduces the likelihood of errors.”

In this sense, o1 introduces a more methodical approach to problem-solving. By considering multiple strategies and self-correcting as needed, the model minimizes the errors that plagued previous iterations of GPT models. Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School, who tested o1, remarked, “In using the model for a month, I saw it tackle more substantive, multi-faceted problems and generate fewer hallucinations, even in tasks that traditionally trip up AI.”

Technical Challenges and Future Development

Despite its advancements, o1 is not without its challenges. The model requires significantly more compute resources than its predecessors, making it both slower and more expensive to operate. OpenAI has priced o1-preview at $15 per 1 million input tokens and $60 per 1 million output tokens, approximately 3-4 times the cost of GPT-4o. These costs may limit the immediate accessibility of o1, particularly for smaller developers and enterprises.

Additionally, while o1 excels at reasoning-heavy tasks, it is less effective in other areas compared to GPT-4o. For instance, o1 lacks web-browsing capabilities and cannot process multimodal inputs, such as images or audio. This positions o1 as a specialized model for reasoning rather than a general-purpose AI. OpenAI has indicated that future iterations will address these limitations, with plans to integrate reasoning and scaling paradigms in upcoming models like GPT-5.

Looking ahead, OpenAI envisions further improvements to o1’s reasoning capabilities. Sam Altman, OpenAI’s CEO, hinted at the company’s ambitions, stating, “We are experimenting with models that can reason for hours, days, or even weeks to solve the most difficult problems. This could represent a new frontier in AI development, where machine intelligence approaches the complexity of human thought.”

Implications for AI Development

The release of OpenAI o1 signals a paradigm shift in how AI models are built and deployed. By focusing on reasoning, rather than simply scaling model size, OpenAI is paving the way for more intelligent, reliable AI systems. The ability to think through problems and self-correct has the potential to transform how AI is used in high-stakes domains like medicine, engineering, and legal analysis.

As Noah Goodman, a professor at Stanford, put it, “This is a significant step toward generalizing AI reasoning capabilities. The implications for fields that require careful deliberation—like diagnostics or legal research—are profound. But we still need to be confident in how these models arrive at their decisions, especially as they become more autonomous.”

OpenAI o1 represents a breakthrough in AI’s ability to reason, marking a new era in model development. As OpenAI continues to refine this technology, the potential applications are vast, from academic research to real-world decision-making systems. While challenges remain, the advancements made by o1 show that AI is on the cusp of achieving human-like levels of reasoning, with profound implications for the future of technology and the world.

]]>
607969