I like some of the blogs that post about fashion from different countries, but I wish they would stop tagging these posts as “ethnic models”. If you are featuring a runway show from India, with Indian models, how is that an “ethnic” group?! This is all very shady, and there is no need for such things.
I hate shalwaar-kameez fashion snobs tbh. Like, people who act shitty if you are not wearing the right “in” type of one — long kameez, short kameez, churidaar not shalwaar, transparent dupatta, blah blah blah.
I just like my run-of-the-mill normal kinds, thank you.
Big Data, big business and bioscience
There’s a funny sense of déjà vu when a passing thought reappears in a respected publication only a few hours after having popped into your head.
This happened today after I saw this morning’s news from the Science and Technology Facilities Council (STFC): Big Data Is Big Business. The update was heavy on buzzwords (big data, open data) and even asked “Is there an app for that?”, despite the piece having nothing to do with apps…
Big data is big business, with the British government estimating that it will have created 58,000 new jobs and added £216 billion to the UK economy by 2017. The UK has vast data sets that are open for public use, generated through world-class research activity and data-intensive public sector organisations. Research has shown that allowing unfettered access is likely to stimulate novel uses of the data, resulting in the emergence of many new companies selling new services.
Allowing unfettered access to public data has implications beyond what “research” can show, and there’s rightly contention over whether government bodies should be acting as stewards for eager entrepreneurs, handing over what may be confidential data.
The news piece focussed on the Sentinel-1A satellite, launched a fortnight ago, and its uses for flood monitoring.
This isn’t really what I understand by the term Big Data. Smaller companies such as Zillabyte use it to mean watching trends, predicting markets and generally keeping a close eye on the populace; monoliths like IBM and Palantir use it to mean doing the same on a grandiose scale, with more emphasis on really strong software engineering.
This sort of tacit meaning left me with a vague unease. It takes a very minor change of perspective to see “predictive analytics” as Orwellian, and I had a moment of wondering whether groups pushing such initiatives on this public sector data might do so by intermingling them with (or disguising them behind) tech trends for “openness”.
I should clarify that this unease has nothing to do with science working with industry as such. It stems from the proven track record of some of these tech companies in acting reprehensibly behind closed doors when left to decide for themselves what to do with analytics, and from a gut feeling that this echoes previous scandals involving unregulated industries.
That the current UK government is in favour of these plans isn’t a great surprise after the recently exposed sale of NHS patient data, which we can only hope isn’t part of a greater push for privatisation. The event STFC are hosting (the reason behind said press release) will feature talks by the heads of all 5 UK Research Councils, in the Daresbury lab at Harwell, Oxford.
The piece that made me think back to their news item appeared in Nature later today: Colin Macilwain, editor of Research Europe, warning readers to “Beware of backroom deals in the name of ‘science’”. It’s hard to think of another article I’ve read in Nature in recent times that has been so outright political.
He describes the work of a neo-conservative lobby in the US Congress, using the word ‘sound’ to give unearned respectability to a policy (nothing new in Congress): the ‘sound science’ provision of the farm bill. AAAS helped block its passage to the President, stating that:
The Section would also require that agencies favor data that are “experimental, empirical, quantifiable, and reproducible,” although not all scientific research could meet each of these criteria. For example, some experiments are theoretical or statistical rather than experimental, and others are so large-scale that they may not be reproducible. The new regulation could also prevent policymakers from using science based on new technologies
this provision could “further hamstring agencies already under significant budgetary pressure.”
In short, the Section, if passed, may slow or even paralyze agencies’ rule-making abilities by complicating an already thorough review process, making it exceedingly difficult to implement new regulations pertaining to agricultural, environmental, or public health practices, among other things, the statement said.
Leshner echoed concerns of Sen. Edward Markey (D-MA), who released a statement of his own in December. The Consortium for Ocean Leadership and UCAR also released a joint letter opposing Section 12307 and calling for removal of the provision from the Farm Bill.
Experimental, empirical, quantifiable and reproducible are lofty goals, but as Macilwain points out, “the approach would discount, for example, the use of weather modelling, or of data collected from one-off events, such as natural disasters.” It’s a good example of scientific language being used to shut down critical thinking in the scientifically-minded, but the appropriate groups took note and had it excised from the bill.
Dealing with such provisions is a bit like whack-a-mole. There is another mole already in sight on Capitol Hill: the Secret Science Reform Act, now under consideration by the House science committee, to stop the Environmental Protection Agency from using data that are not publicly available in its assessments.
And who could argue with that? Well, one issue with making all such data public is that it gives industry grounds for refusing to hand confidential data over, as it would then become public.
In the end, regulatory arguments are more philosophical than scientific in their nature. Environmentalists advocate caution in the face of uncertainty; industry wants cost-benefit analysis.
The natural sciences have little to say on which approach is wiser. Industry, however, has become adroit at using the concept of sound science to advocate the latter path. Too many researchers, as well as the wider public, are taken in by the claim that when someone says they are seeking the scientific answer to a regulatory question, they mean what they say. They very rarely do.
While this piece made no mention of ‘Big Data’, ‘Big Business’ cropped up several times, as did concerns around data sharing in the public and private sectors. I’m not here to espouse any particular political view; personally, I’m more interested in what each of the Research Councils means by ‘Big Data’, most of all the Biotechnology and Biological Sciences Research Council (BBSRC).
The BBSRC Strategic Plan (2010-2015) covers their early intentions to exploit ‘big data’, but none of this has ever been called Big Data when I’ve heard it discussed.
The slide (just one, but with a lot crammed onto one page) highlighted a need for:
- computationally proficient biologists
- software engineers that understand the heterogeneity of biological data
- biological engineers that can deploy computational models to design and manipulate biological systems
In 2012, work began on a new BBSRC bioinformatics ‘Technical Hub’ in Cambridgeshire, which holds a training centre, EMBL-EBI office space and ‘an industry-led clinical translation suite for bioinformatics’. This all seems quite vague, and again it’d be interesting to hear what ‘Big Data’ BBSRC are going to discuss as ‘now available’ in the presentation to UK businesses.
Again, a presentation related to the 2010-15 plan makes no clear reference to industry other than citing them as partners. What’s more, the statement that “bioscience is big data science” leaves me a little bemused as to what the term is being used to mean.
Bioscience ‘big data’ is quite different from the behavioural / socioeconomic ‘Big Data’ that raises concerns of social engineering
Recently here in the UK, there has been a row over patient privacy in a system known as care.data, which was to allow mining of GP records, amounting to entire medical histories. A few years ago the NHS was described as a huge, untapped source of health informatics, and the potential benefits to research are great if it is handled well. Trust in the system is vital because patients can opt out, so poor handling could doom it.
Fears over privacy were downplayed because the records would be “anonymised”, though Ben Goldacre quickly highlighted how the data could in fact be used to identify individuals.
Seemingly contrary to that reassurance, health secretary Jeremy Hunt outlined plans to link care.data to genomic sequencing, in what looks like preparation for personalised medicine in the NHS (‘Genomics England’). As far as I can see, this is currently the most obvious instantiation of ‘Big Data’ in UK science, and the biosciences would attract the accompanying privacy and social concerns largely through their links to the medical profession (i.e. patients and those in clinical trials), rather than through lab research data.
Looking at the jobs listed on NatureJobs to get an idea of what sort of work currently comes under the title, I found the reality gives little cause for alarm. No sign of bioscience-NSA partnerships, just a buzzword: bioinformatics by another name, with an element of bragging rights (similarly: why are we still calling it Next-Generation Sequencing, other than to make it sound cool?)
It’s also interesting to see the diversity of desired backgrounds, with mathematics, statistics and computer science sought more often than bioscience.
- PhD student, Computational Discovery of Genetic Variation in Genomic ‘Big Data’ (Amsterdam, Netherlands): development of computational and statistical algorithmic frameworks for “finding needles in genomic big data haystacks”. This poses intriguing and challenging computational and/or statistical questions. The project will be linked to the “Genome of the Netherlands” (GoNL) project, which is concerned with the genomes of 769 Dutch individuals, grouped into families. The purpose of this project and the arrangement of its data aims at spotting and characterizing genetic variation in the light of evolution most favorably. It is based on 60 terabytes of genome data and provides a most relevant link to current genomics research, as being largest family-oriented such sequencing project worldwide. Beyond GoNL, applications to diseases such as cancer and virus genomes are also of great interest.
- Develop software and methods to explore, analyze, and visualize clinical and biological data sets including genomic, neuroimaging, and electronic health record data
- STFC’s Hartree Centre, based within the Scientific Computing Department, specializes in exploring advanced computing techniques, including Big Data and HPC. The Hartree Centre was funded through the UK Government’s e-Leadership council to work closely with industrial applications, and some of its work is commercial in confidence and subject to IPR rules regarding public disclosure. We are looking for someone with an aptitude for solving ‘Big Data’ problems. A background with exposure to data analytics systems and infrastructure such as Hadoop, HDFS, IBM Infosphere, IBM Streams, and MarkLogic is required, with an underlying scientific/engineering discipline.
- The research will focus on the development of novel statistical methodology for the analysis of large-scale single cell genomics data and offers an opportunity to be at the analytical forefront of institutional expansion in this area. Ideally, you will have experience of statistical methods development gained through a recently obtained (or soon to be) PhD (or equivalent) in a quantitative subject (e.g. mathematics, statistics, physics, engineering or computer science). Experience of Bayesian Statistics and machine learning techniques is highly desirable as is evidence of prior experience of developing bioinformatics software and/or analysing genomic data sets.
- Over the next few years, more than 35 new endowed professorships and chairs will be established, which will provide incredible opportunities for world-renowned researchers. The two main research areas are Big Data and Bioengineering.
I won’t be able to make it to Hartree, but I’m looking forward to Genomics England’s ‘town hall engagement meetings’, held across the UK from later this month. They are aimed at the public and patients, with afterparties (i.e. technical talks) for clinicians and anyone interested in healthcare data and the realities of sequencing 100,000 Brits’ genomes over the next 5 years.
There was a nice Leading Edge Analysis report in Cell this week touching on this exact issue, interviewing Gene Myers, a mathematician and computer scientist turned cell biologist (indeed, a director at the Max Planck Institute of Molecular Cell Biology and Genetics).
“There was a time when [biological] data was really hard-fought and you wanted to preserve it,” Myers says, “but now that’s not true anymore.”
The piece underscores that data sharing is not just trendy but necessary for any hope of statistically significant findings on rare diseases and the intricacies of biology.
The piece also covers David Haussler’s work with the Genome 10K project, which aims to “assemble a genomic zoo” by collecting sequence data representing the genomes of 10,000 vertebrate species, which corresponds to approximately one genome per vertebrate genus. As of December 2013, the group has data for 94 species complete or in progress.
“I’m extremely passionate about the fact that we have an opportunity to understand life on this planet, how it evolved, how living systems are built by molecular evolution,” says Haussler. “This is a watershed for science.”
For research like this to flourish, though, certain cultural changes will be needed, including broader data sharing, improved computational training for biologists, and providing more support for data scientists in the traditional academic structure.
which currently counts among its members more than 100 healthcare, research, and disease advocacy organizations from across the world. The initiative aims to support an infrastructure for sharing patient genetic data, keeping in mind concerns about patient privacy and security, to help push medical research forward.
“Individual researchers need to be committed to the idea of participating in collaborations to collect enough data together to do a truly deep analysis, to have the numbers to be able to investigate their individual cases in the context of other cases,” he says.
But convincing researchers to share their data has not been easy. “Scientists always want to have their cake and eat it too,” Haussler says. “They would like to have total control over all the genomes from their patients, yet they realize virtually everything we’re talking about becomes rare when you get down to the precise molecular characterization.”
The reported fear among researchers of being “scooped” through sharing data (after publication, i.e. others reinterpreting data from the literature) has been criticised as careerism of one sort or another, and as something the scientific community should not be making allowances for, given the benefits to science as a whole.
Scooping here takes on a different meaning from the usual one, where a researcher beats you to publication off their own back: the issue is more about keeping what you generated, since you paid for it.
Systems biologist Uri Alon even wrote a song about it… an inside joke if ever there was one.
The alternative view is that if your research was paid for with public money, it’s not really yours to hoard in this way.
Others have suggested that the confidence not to worry about being scooped is a luxury unavailable to early-career researchers, and that richer nations, with more secure funding, may have less to lose (as seen in trends in the ‘openness’ of government data).
More discussion can be found here. It seems that genomics is one of the fields less prone to this fear, and ecology perhaps more so, for social and cultural reasons that need to be treated more carefully than with simple mandates to deposit data.
A third interview was with Philip Bourne, leader of the National Institutes of Health’s Big Data to Knowledge (BD2K) initiative. In March he became the first Associate Director for Data Science, described as the “so-called data czar for the NIH.”
“The notion of being a data scientist is crucially important, yet these people are typically not well looked after” in a university setting, Bourne says. “They don’t last in the system.” He cites as an example his own University of California, San Diego research group, which earned the nickname of “the Google bus” because so many of its alumni ended up working at the nearby Google office. “Every morning half the people on that bus were people from my lab. They weren’t even looking for jobs outside academia, but they were just attracted away. We need to raise awareness of the importance of these people in the system.”
It goes the other way as well, with current researchers requiring training in data science if they hope to be successful in the new research environment. “We’ve got to figure out how to train the next generation and our current generation,” says Eric Green [director of the National Human Genome Research Institute]. “Mid-career scientists are going to be practicing their trade for another 20 to 30 years, yet they’re woefully untrained when it comes to data science. The train is just going to pass them by if we’re not careful. Then we need to think about the next generation: for the run-of-the-mill biologist, what is the minimum competency they need to function in the new world of data science? We want to raise everyone’s floor.”
“On the one hand, it’s everybody’s problem, but at the same time it becomes nobody’s problem because it slips through the cracks.”
On the technical side, Bourne and others are also considering the best way to build systems that can support the huge data sets currently being generated. Bourne says it is likely that public-private partnerships between research institutions and corporations like Amazon, Microsoft, and Google are likely to solve these problems. Scientists have been interested in such partnerships for a long time, but only recently have the data reached the scale where computer scientists are really interested in getting involved. “During the Human Genome Project, we would invite in computer scientists and entice them to get involved,” Green recalls, “but they really weren’t very interested.” Now that researchers are generating data three or four orders of magnitude larger than the human genome though, “all of a sudden we have the technology to generate the scale of data to get them going.”
Infrastructure development is certainly important, but it’s also useful to step back and remember the real reason big data is scientifically important in the first place. “Applications drive everything,” says Schatz. “For the rest of our days storage and transfer will be a problem, but I’m really excited for the days where we’ll have those systems in place and can ask some really exciting questions.”
This final point was echoed in another post today from Razib Khan: in his closing notes on a cattle genetics paper, he got tangibly excited about the shifting ground Bernstein noted above.
Over the next decade it seems inevitable that the clusters at the heart of “genomics cores” across the world will be gorging on whole sequences of thousands of individuals for many organisms. It will be a “flood the zone” era for attempting to understand the tree of life. An army of bioinformaticists will be thrown at the data in human waves, absorbing shock after shock, slowly transforming the ad hoc kludge pipelines of the pre-Model T era of genomics into simpler turnkey solutions. And then the biology will come back to the fore, and the deep wellspring of knowledge by those who focus on specific organisms and is going to be the essence of the enterprise once more.
Bernstein R (2014) Shifting Ground for Big Data Researchers. Cell, 157, 283–284.
If a book told you something when you were fifteen, it will tell you it again when you’re fifty, though you may understand it so differently that it seems you’re reading a whole new book.
Ursula K. Le Guin (via handatthelevelofyoureye)
I love her so much. I’ve been reading only her books for the past two months.
I want to re-read A Wizard of Earthsea and the other books in the series. I read them so long ago… I’ve forgotten most of the plot. I think I’ll read them over the holidays *so excited*
theastrolibrarian replied to your post “I stopped watching GoT midway through season 2. Now I hear all this…”
Rape tw, racism tw, torture tw, misogyny tw, racism tw, sexualised violence tw, soap opera dialogue tw - but if you got as far as season 2 you probably knew that anyway. I watched all seasons so far and I don’t know why I put myself through it.
Yeh, exactly, although back then I was pretty unaware of the racist elements. I only watched like… two episodes of the second season before giving up.
dromedarypenguin replied to your post “I stopped watching GoT midway through season 2. Now I hear all this…”
It’s pretty great! Why did you stop?
There is way too much sex and sexual violence in it. Like a lot of HBO shows, tbh. And they just add it for gratuitous viewing, since it doesn’t advance the plot in any way.
I wonder how vocal the media will be in calling out the ~new government~ once it takes office. There are already instances of editors and journalists worshipping the rising sun and whitewashing Modi’s image by constantly harping on about the so-called “clean chit” absolving him of any wrongdoing. It says something when the foreign media (Guardian, NYTimes, Telegraph, The Economist) have been harder on Modi for his role in 2002 than the Indian media. Also, the corporate houses running news channels (Network 18 anyone?) have been constantly urging their editors to tone down their criticism and give more footage to Modi than to his rivals.
The recent farcical India TV “interview” was a prime example of this: the stage-managed crowd, the softball questions. Even his interview with ANI yesterday involved no cross-questioning or countering of his points by the interviewer.
I wonder if this will continue after the elections too. As this piece in Business Standard states, institutions like the courts and the media are more than happy to be compliant when an autocratic leader is at the helm. There are still some journalists who are sticking to their guns, but past instances prove that Modi doesn’t take too kindly to people questioning his authority. It seems highly likely that he will go after the activists and journalists who’ve chosen to speak out against him once he’s in the top position. The firing of Hartosh Singh Bal and Manu Joseph was the beginning of things to come.
They will all lick his shoes, including the international media. Even the last time BJP formed the main part of the government, everyone was going on about how Vajpayee and Co. had completely abandoned their hardliner Hindutva logic, and were more ~open~ and ~democratic~ even to the Muslims!!1!! All past sins forgiven and forgotten. The same will happen to Modi: people will bang on about how the new BJP-led government is not as bad, while the truly horrific use of state violence on those dispossessed from their lands is hidden. Of course, Gujarat 2002 will be completely buried, as will the 100,000 Muslims who were forced into Muslim ghettos and still live there to this day.
"Indira Gandhi suspended democratic procedures and declared Emergency! Congress is corrupt"
I agree, but rather than looking at how the rights of Indian citizens have been systematically eroded in India, the logical conclusion you reach is that we should all welcome BJP with open arms? Erm…
I stopped watching GoT midway through season 2. Now I hear all this hullabaloo about it on tumblr, and I’m tempted to start watching it again…
Cubes of heaven: Chilled tofu with a (mainly) roasted salad
Tofu with avocado, blanched spinach, sauteed onion and roasted pumpkin, roasted tomato and roasted bell peppers, with a balsamic vinegar dressing
My family has taken to roasting things. Big time. Every time we buy a batch of cherry tomatoes from the market, before I can even pop them in my mouth they’ll be roasting in the oven, becoming succulent and sweet and juicy and heavenly. Yesterday we had a roasting day: pumpkin, tomatoes and bell peppers in all 3 colors, so I used them today for my salad and it was awesome. Paired with the coolness of the tofu (for protein), it couldn’t have been a better lunch. Just one note: don’t forget to season the salad well before eating!
I can’t believe I can’t find rice paper anywhere in this entire country?! What is the point of living here, someone tell me.
I’m always astounded at the sheer size of Maharashtra. It goes right from the west coast into the very heart of India.